[aklug] Re: RAID 1: FailSpare event

From: Christopher Howard <christopher.howard@frigidcode.com>
Date: Fri Aug 24 2012 - 13:31:26 AKDT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/24/2012 12:07 AM, Christopher Howard wrote:
> I figured it out through the empirical approach: I added a
> partition (raid autodetect) to the new drive about the same size as
> the other drive and added the partition to the array, and it seems
> to have synced up fine.
>
> --------- To unsubscribe, send email to <aklug-request@aklug.org>
> with 'unsubscribe' in the message body.
>

Updated log (for posterity, at least): After the next reboot, I kept
having all kinds of weird behavior from the new drive, resulting in
more FailSpare situations, or the drive not registering with the
system at all. When the drive wasn't registering with the system (its
device file not accessible), I'd generally find messages like this in
the kernel log:

quote:
- --------
Aug 24 12:35:40 enigma kernel: [ 116.211157] ata2: hard resetting link
Aug 24 12:35:42 enigma kernel: [ 118.409499] ata2: COMRESET failed (errno=-32)
Aug 24 12:35:42 enigma kernel: [ 118.409507] ata2: reset failed, giving up
Aug 24 12:35:42 enigma kernel: [ 118.409525] ata2: exception Emask 0x10 SAct 0x0 SErr 0x40d0202 action 0xe frozen t4
Aug 24 12:35:42 enigma kernel: [ 118.409526] ata2: irq_stat 0x00400040, connection status changed
Aug 24 12:35:42 enigma kernel: [ 118.409529] ata2: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
Aug 24 12:35:42 enigma kernel: [ 118.409533] ata2: limiting SATA link speed to 1.5 Gbps
Aug 24 12:35:42 enigma kernel: [ 118.409536] ata2: hard resetting link
Aug 24 12:35:45 enigma kernel: [ 120.998681] ata2: COMRESET failed (errno=-32)
Aug 24 12:35:45 enigma kernel: [ 120.998687] ata2: reset failed (errno=-32), retrying in 8 secs
Aug 24 12:35:52 enigma kernel: [ 128.379362] ata2: hard resetting link
...snip...
Aug 24 12:39:31 enigma kernel: [ 346.617991] ata2: reset failed, giving up
Aug 24 12:39:31 enigma kernel: [ 346.618004] ata2: EH pending after 5 tries, giving up
Aug 24 12:39:31 enigma kernel: [ 346.618006] ata2: EH complete
- --------
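
For anyone wanting to spot this pattern quickly, here is a minimal
Python sketch that counts "COMRESET failed" messages per ATA port. The
function name and the sample list are my own illustration; it assumes
syslog-style lines like the ones quoted above and that the port name
and "COMRESET failed" appear on the same line.

```python
import re
from collections import Counter

# Matches the port name and message text in lines such as
# "Aug 24 12:35:42 enigma kernel: [ 118.409499] ata2: COMRESET failed"
LINE_RE = re.compile(r"kernel: \[\s*\d+\.\d+\] (ata\d+): (.*)")


def count_reset_failures(lines):
    """Count 'COMRESET failed' messages per ATA port (hypothetical helper)."""
    failures = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "COMRESET failed" in match.group(2):
            failures[match.group(1)] += 1
    return failures


# Three lines copied from the kernel log excerpt above.
sample = [
    "Aug 24 12:35:40 enigma kernel: [ 116.211157] ata2: hard resetting link",
    "Aug 24 12:35:42 enigma kernel: [ 118.409499] ata2: COMRESET failed (errno=-32)",
    "Aug 24 12:35:45 enigma kernel: [ 120.998681] ata2: COMRESET failed (errno=-32)",
]

for port, count in count_reset_failures(sample).items():
    print(f"{port}: {count} failed COMRESET attempts")
```

A port that keeps showing up here after cables and controller ports
have been swapped points at the drive or the link speed rather than
the wiring.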

I tried replacing cables and using different SATA controller ports, but
to no avail. The drive's documentation was not particularly
helpful (all Windows- and Mac-centric); however, I eventually
discovered that there is a pin pair on the drive that limits the data
transfer speed to 1.5 Gb/s. Having no other ideas, I stuck a jumper on
it and booted again.
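
To confirm what speed a link actually negotiated after setting the
jumper, something like the following sketch may help. It assumes a
kernel that exposes the libata attributes sata_spd and sata_spd_limit
under /sys/class/ata_link (not every kernel does, and I have not
verified this on the machine above); the function name is hypothetical.

```python
from pathlib import Path


def read_link_speeds(base="/sys/class/ata_link"):
    """Return {link_name: (negotiated_speed, speed_limit)} per ATA link.

    Assumes each link directory holds 'sata_spd' and 'sata_spd_limit'
    files; links missing either file are skipped.
    """
    speeds = {}
    base_path = Path(base)
    if not base_path.is_dir():
        # Kernel does not expose this sysfs class (or wrong path).
        return speeds
    for link in sorted(base_path.iterdir()):
        current = link / "sata_spd"
        limit = link / "sata_spd_limit"
        if current.is_file() and limit.is_file():
            speeds[link.name] = (
                current.read_text().strip(),
                limit.read_text().strip(),
            )
    return speeds


if __name__ == "__main__":
    for name, (current, limit) in read_link_speeds().items():
        print(f"{name}: negotiated {current}, limit {limit}")
```

If the jumper took effect, the link for the new drive should report a
1.5 Gbps speed rather than a higher one.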

Against all my expectations, this seems to have worked. At least, the
drive is now registered with the system, and the RAID system is now in
recovery mode mirroring everything to the new drive. So, good sign, I
guess... we'll see if this survives through the next reboot.
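
For keeping an eye on the rebuild, here is a small sketch that pulls
the recovery percentage out of /proc/mdstat-style text. The sample
text is made up for illustration (array name, device names, and
numbers are not from my machine), and the function name is
hypothetical.

```python
import re

# An array stanza starts with a line like "md0 : active raid1 ..."
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# The progress line contains e.g. "recovery =  8.5% (...)"
RECOVERY_RE = re.compile(r"recovery\s*=\s*([\d.]+)%")

# Illustrative sample only; real output comes from /proc/mdstat.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
      976630336 blocks [2/1] [U_]
      [=>...................]  recovery =  8.5% (83642304/976630336) finish=120.3min speed=123456K/sec

unused devices: <none>
"""


def recovery_progress(mdstat_text):
    """Return {array_name: percent_done} for arrays currently recovering."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            continue
        recovery = RECOVERY_RE.search(line)
        if recovery and current is not None:
            progress[current] = float(recovery.group(1))
    return progress


for name, percent in recovery_progress(SAMPLE).items():
    print(f"{name}: recovery {percent}% done")
```

On a live system the text would come from reading /proc/mdstat instead
of SAMPLE; an array that has finished syncing simply does not appear
in the result.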

- --
frigidcode.com
indicium.us
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJQN/KuAAoJEI2DxlFxTtgdFk0H/0DK58LwD/yikvhmWekxKKdF
45D+4bP2UWz0YJ+4Asn1PHWH1uu5rMRAwR7sK8pMe+5WzRfEzTW1vMqUbDeEsZPP
90vCFIIEZY/zGpD5PmB8ZTrGRuFwCWEuYYnTpUUiyGlnbBJGU3nvDvNh9vNxVju9
mQOuf+/vQtDny8eK+hXDWQfh9p9caW37T+FrFt3HcKMbVKdNXVBHP5FXJXUYymvp
M168YGO9MXE9OGt2t74bejI47qmeGol5QoRODsfvaJj0WebINsuJwU+OeOMcsKHe
09KyFgYvRhd5shAbpNy1vfFVIvVkoAX5j741ZVCXyJKLhUuOSoDNjAsGF9in3iI=
=zxLY
-----END PGP SIGNATURE-----
Received on Fri Aug 24 13:25:51 2012

This archive was generated by hypermail 2.1.8 : Fri Aug 24 2012 - 13:25:51 AKDT