[aklug] Re: hard drive issues

From: Shane Spencer <shane@bogomip.com>
Date: Sun Oct 04 2009 - 17:28:53 AKDT

Some of SMART's trends are not very accurate, but they aren't insane
either. Your data is on a hard drive that is reporting problems with
factor X. Do you want to continue using the drive? It's completely
up to you - don't say we didn't warn you.

The only data I want from SMART, which it gives me, is whether a block
was ever reallocated and whether I am working with a drive that has
blocks that cannot be reallocated. I work with a lot of used equipment
and our quality point is often balanced against the price point: no
bad blocks and not unreasonably slow = good buy.
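That used-gear check can be sketched as a one-liner over `smartctl -A`
output. The sample below is made-up output (on real hardware you would
pipe in `smartctl -A /dev/sda` instead, and the device name is an
assumption):

```shell
# Made-up excerpt of `smartctl -A` output; a real check would use
# `smartctl -A /dev/sda` in place of the sample variable.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Always       -       0'

# Sum the raw values (last field) of the three attributes that matter
# most when buying used drives.
bad=$(printf '%s\n' "$sample" | awk '
    /Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable/ { sum += $NF }
    END { print sum + 0 }')

if [ "$bad" -eq 0 ]; then
    echo "no remapped or pending sectors reported"
else
    echo "drive reports $bad remapped/pending sectors"
fi
```

A drive that prints the first line passes my "no bad blocks" test; the
speed half of the bargain still has to be checked separately.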

Each SMART-compatible device conforms to the SMART reporting and
command standard, but the logic for how and when to reallocate sectors
is up to the manufacturer. If, say, reallocating block 0 to block
10045005 is possible but falls outside the performance specification
for the drive, the result is a block that is not re-allocatable
according to the manufacturer. I often wonder if the cache of
initially unallocated sectors is distributed evenly in small sections
all over the drive. Anyway, the ifs, ands, and buts of how the drive
determines re-allocatable sectors don't change the fact that it's got
bad blocks, and its reliability compared to a drive that reports no
faults is not 100% the same.

I believe the Google report is referring to many of the metrics above
and beyond failed block count and other complete-failure conditions.
Thankfully SMART reports faulty drives (a drive with any unrecoverable
faults at all) based on bad sector count, high temperature, or
inability to do anything useful like spin up or access any data
whatsoever. Those are the critical ones that I don't mind being
annoyed about (with respect to the other thread about Ubuntu 9.10 and
the drive-failure popups). To which I say: please run SMART on your
drives and determine faults on your own, set up its email reporting,
and run smartd on all critical machines, or machines you are pestered
about often. (/me looks at girlfriend's laptop)
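For the smartd suggestion, a minimal /etc/smartd.conf sketch - the
device name, email address, and test schedule here are placeholders,
not a recommendation for any particular machine:

```
# Monitor all SMART attributes (-a) on /dev/sda, mail warnings to the
# address given with -m, send a test mail at daemon startup (-M test),
# and schedule a short self-test daily at 2am plus a long self-test
# Saturdays at 3am (-s regex).
/dev/sda -a -m admin@example.com -M test -s (S/../.././02|L/../../6/03)
```

With that in place, smartd nags you by email instead of the drive
nagging you with popups.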

- Shane

On Sun, Oct 4, 2009 at 4:36 PM, Greg Madden <gomadtroll@acsalaska.net> wrote:
> On Sun, 4 Oct 2009 15:50:19 -0500 (CDT)
> "Christopher E. Brown" <cbrown@woods.net> wrote:
>
>> On Sat, 3 Oct 2009, Greg Madden wrote:
>>
>> > A few cosmic convergences happening with my IDE storage.
>> > 1. I have hda & hdb, both lose DMA settings reliably.
>> > 2. Smartmontools 'smartctl' reports a high count :
>> > 'Raw_Read_Error_Rate', 'Seek_Error_Rate', fortunately? the
>> > 'Hardware_ECC_Recovered' equals the 'Raw_Read_Error_Rate'.
>> >
>> > The 'Reallocated_Sector_Ct' is at zero, which means, afaict,
>> > that nothing is physically happening, yet. Also the
>> > 'Current_Pending_Sector' & 'Offline_Uncorrectable' count is zero.
>> > It should be noted that the smartmon stuff reports errors but does
>> > not fix them.
>> >
>> > 3. It was mentioned Friday night that Steve Gibson's 'Spinrite'
>> > tool fixes magnetic media issues. Unwilling to spend $89 on a
>> > 'black box' solution, I found the tools in Linux that do the same.
>> > http://smartmontools.sourceforge.net/badblockhowto.html
>> >
>> > A brief list of tools:
>> > smartctl -l selftest /dev/hda
>> > smartctl -A /dev/hda
>> > fdisk -lu /dev/hda
>> > tune2fs -l /dev/hda3 | grep Block
>> > debugfs, cool tool.
>> > and 'dd'
>> >
>> > A quick fix: 'e2fsck' with, at least, the -c option,
>> > which uses 'badblocks', moves data, and remaps bad blocks :-)
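The badblockhowto linked above mostly boils down to translating the
failing LBA into a filesystem block before debugfs or dd can touch it.
A sketch with made-up numbers - on a real drive the three inputs come
from `smartctl -l selftest`, `fdisk -lu`, and `tune2fs -l`:

```shell
# All three values are hypothetical, for illustration only.
lba=1841249         # failing LBA reported by the SMART self-test log
part_start=64       # first sector of the partition (fdisk -lu)
block_size=4096     # filesystem block size (tune2fs -l ... | grep Block)

# Sectors are 512 bytes; integer division rounds down to the block
# containing the bad sector.
fs_block=$(( (lba - part_start) * 512 / block_size ))
echo "filesystem block: $fs_block"
```

That block number is what you hand to debugfs (to see which file owns
it) or to dd (to force the drive to remap it by rewriting it).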
>> >
>> > Google is full of hits on the results of SMART enabled drives. Me
>> > thinks it causes much wasted time and anxiety. I mention this
>> > because the new Ubuntu 9.10 is using a new hard drive health tool
>> > that pops up on every boot telling the user that the hard drive is
>> > ABOUT to fail.
>> >
>> > I am maybe a little closer to understanding hard drive issues :-)
>> > I have renewed interest in backups.
>> >
>> > ps, I used the 'System Rescue Cd' on an Ext4 partition.
>>
>>
>> You should keep in mind that modern drives keep an internal spare
>> area and bad block map. When the drive itself detects a failing
>> sector it is transparently remapped to another sector in the spare
>> area.
>>
>>
>> Normally they only start showing permanent bad sectors _after_ they
>> run out of remap space. If badblocks via e2fsck or mkfs locks out a
>> number of blocks as "bad at the presentation layer" it normally means
>> the drive is on its last legs.
>
> Thanks, Google did a study on predicting hard drive failures from SMART
> data. It was done about 5 years ago but, I believe, their conclusion
> was SMART wasn't all that accurate of a predictor.
>
> http://research.google.com/archive/disk_failures.pdf
>>
>>
>> As I recall, this internal remap on multiple fail behavior became
>> pretty much standard back when standard drive sizes were < 500MB,
>> before that a low level format was needed to get the drive to do the
>> internal mapping.
>>
>> I remember using badblocks to lock out sectors on 40 - 500MB IDE and
>> SCSI drives until I could copy out the data and low-level format, and
>> on MFM and RLL drives before that, but not since then.
>
>
> --
> Peace with Love
>
> Greg Madden
---------
To unsubscribe, send email to <aklug-request@aklug.org>
with 'unsubscribe' in the message body.
Received on Sun Oct 4 17:29:03 2009

This archive was generated by hypermail 2.1.8 : Sun Oct 04 2009 - 17:29:03 AKDT