I recently bought a Mini-ITX computer with two Western Digital Scorpio Blue 640 GB 2.5" disks. I'll cover the hardware specs and the buying experience in another post that should come soon enough (if I can finally get the last part shipped!), but for now I want to say a word about a weird issue with the disks.

Since I bought the disks, I've been hearing clicking sounds coming from them, as if one of them was defective. I was going to open an RMA request with the manufacturer to get the clicking one replaced. However, my big brother visited today and told me "Oh no, don't RMA them, the next ones'll do the same thing: it's a bug in the firmware and I've had the same thing in the NAS at work."

I was intrigued. We searched for "hdparm linux head count" on Google and found this very useful page about exactly this issue. It seems that some recent hard disks are configured, either to save energy or to protect the disk somehow, to park their read/write heads very frequently... in fact, so frequently that it wears out the drive and hurts I/O performance. The clicking sound I was hearing is apparently produced when the heads are parked.

Disks are generally rated to withstand between 200k and 600k head parking operations. Beyond that, the heads may break or misbehave. We checked the current values on both disks:

$ sudo smartctl -A /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       532
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
191 G-Sense_Error_Rate      0x0032   095   095   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81681
194 Temperature_Celsius     0x0022   087   076   000    Old_age   Always       -       7484 (0 0 0 71)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       499
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
191 G-Sense_Error_Rate      0x0032   096   096   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   172   172   000    Old_age   Always       -       85877
194 Temperature_Celsius     0x0022   102   092   000    Old_age   Always       -       7469 (0 0 0 55)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

Look at the line starting with "193 Load_Cycle_Count" for each disk: the raw value in the last column is the number of times the heads have been parked. More than 80k head parks in about a month and a half! WTF!
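To put a number on how aggressive that is, here is a quick sketch that divides each disk's Load_Cycle_Count by its Power_On_Hours, both taken from the output above. The function name parks_per_hour is made up for this example, and the result is only an average over the time the disks have been powered on.

```shell
# Average head parks per power-on hour, from two SMART raw values.
# Usage: parks_per_hour LOAD_CYCLE_COUNT POWER_ON_HOURS
# (parks_per_hour is a hypothetical helper name, not part of smartmontools.)
parks_per_hour() {
    awk -v cycles="$1" -v hours="$2" 'BEGIN { printf "%.1f\n", cycles / hours }'
}

parks_per_hour 81681 532   # /dev/sda -> 153.5
parks_per_hour 85877 499   # /dev/sdb -> 172.1
```

That works out to between two and three head parks per minute on each disk, which matches the constant clicking.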

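So we raised the drives' Advanced Power Management (APM) level with hdparm's -B option to make the power management less aggressive (254 is the highest-performance setting that still leaves APM enabled):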

$ sudo hdparm -B 254 /dev/sda
$ sudo hdparm -B 254 /dev/sdb

and the clicking stopped instantly. I'll have to monitor the Load_Cycle_Count value over the coming weeks to make sure that this workaround actually slows the counter down and doesn't just silence the noise.
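For the monitoring part, a small helper along these lines should be enough. It is a minimal sketch that assumes the column layout of the smartctl output shown earlier, where the raw value is the tenth field; load_cycle_count is a name invented for this example.

```shell
# Print the raw Load_Cycle_Count (SMART attribute 193) from `smartctl -A`
# output read on stdin. Assumes the raw value is the 10th whitespace-separated
# field, as in the listings above.
# (load_cycle_count is a hypothetical helper name.)
load_cycle_count() {
    awk '$1 == 193 { print $10 }'
}

# Example (needs root and smartmontools installed):
#   sudo smartctl -A /dev/sda | load_cycle_count
```

Running it once a day and writing the number down next to the date is enough to see whether the counter has slowed down.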

Hopefully, this will make my disks last a bit longer. At the rate they were going, they would probably both have died in about 3 to 6 months.
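As a rough sanity check on that kind of estimate, the sketch below projects how many more power-on hours a disk would need to reach a rated limit, assuming the parking rate seen so far stays constant. The 200k and 600k limits are the general range quoted earlier, not figures from this drive's datasheet, and hours_until_limit is a made-up name.

```shell
# Power-on hours left before a drive reaches a rated load-cycle limit,
# assuming the average parking rate observed so far stays constant.
# Usage: hours_until_limit LIMIT LOAD_CYCLE_COUNT POWER_ON_HOURS
# (hours_until_limit is a hypothetical helper name.)
hours_until_limit() {
    awk -v limit="$1" -v cycles="$2" -v hours="$3" \
        'BEGIN { printf "%.0f\n", (limit - cycles) * hours / cycles }'
}

hours_until_limit 200000 81681 532   # /dev/sda, low end of the range  -> 771
hours_until_limit 600000 81681 532   # /dev/sda, high end of the range -> 3376
```

These are power-on hours, not calendar time: how many months they represent depends on how many hours a day the machine runs (this one logged 532 power-on hours in about a month and a half).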