Koozali.org: home of the SME Server

Software Raid - Problems following disk failure

nkevans

« on: April 30, 2004, 04:03:16 PM »
Yesterday my server (SME 6.0.1) started reporting errors relating to one of my HDDs. Typical error messages were:

"Apr 29 01:13:45 kalmar kernel: hdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Apr 29 01:13:45 kalmar kernel: hdb: read_intr: error=0x40 { UncorrectableError }, LBAsect=79506093, sector=79297248
Apr 29 01:13:46 kalmar kernel: end_request: I/O error, dev 03:42 (hdb), sector 79297248
Apr 29 01:13:46 kalmar kernel: raid1: hdb2: rescheduling block 79297248
Apr 29 01:13:46 kalmar kernel: raid1: hdb2: unrecoverable I/O read error for block 79297248"

The server then would not shut down cleanly and reported problems with the file system. It suggested running fsck on root, which I did.

fsck in turn reported errors, which it fixed.

Going by the error messages above, my 2nd HDD (hdb) would seem to be faulty. I would have expected MDSTAT to show that the 2nd HDD was no longer being used in the array.

However, when I view MDSTAT, I get the following:

"Personalities : [raid1] read_ahead 1024 sectors
md2 : active raid1 hda3[0]  264960 blocks [2/1] [U_]
md1 : active raid1 hdb2[1]  39776832 blocks [2/1] [_U]
md0 : active raid1 hda1[0]  104320 blocks [2/1] [U_]
unused devices: <none>"

This would suggest that md0 and md2 are on my Master HDD and md1 on the Slave.  This does not seem healthy to me.

Have I understood the position correctly and, if so, how do I now replace the Slave HDD, assuming that it is faulty?

TIA

Nick

raem

« Reply #1 on: April 30, 2004, 05:36:18 PM »
Go to the contribs.org contribs area, look for dmay, and find the RAID Recovery HOWTO.

Regs
Ray

Reinhold

« Reply #2 on: May 02, 2004, 05:37:23 PM »
nkevans,

Actually md0, md1 and md2 are each defined across both HDs .-)
Look at:  pico /etc/raidtab
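
Side note on reading the mdstat paste from the first post: `[2/1]` means two members are configured and one is active, and the `_` in `[U_]` or `[_U]` marks the missing slot. A rough Python sketch that pulls this out is below. The function name `summarize` and the regex are my own, not from any HOWTO, and it assumes the one-line-per-array layout as pasted in this thread (a live /proc/mdstat normally puts the block count on a second line, so the pattern would need adjusting there).

```python
import re

# The mdstat text as pasted earlier in this thread (one line per array).
MDSTAT = """\
Personalities : [raid1] read_ahead 1024 sectors
md2 : active raid1 hda3[0]  264960 blocks [2/1] [U_]
md1 : active raid1 hdb2[1]  39776832 blocks [2/1] [_U]
md0 : active raid1 hda1[0]  104320 blocks [2/1] [U_]
unused devices: <none>
"""

# mdN : active <level> <members>  <n> blocks [configured/active] [slot map]
ARRAY_RE = re.compile(
    r"^(md\d+) : active \S+ (.*?)\s+\d+ blocks \[(\d+)/(\d+)\] \[([U_]+)\]",
    re.MULTILINE,
)


def summarize(mdstat_text):
    """Return one dict per array: members present, slot counts, degraded flag."""
    arrays = []
    for name, members, total, working, slots in ARRAY_RE.findall(mdstat_text):
        arrays.append({
            "array": name,
            # Strip the slot suffix: "hda3[0]" -> "hda3"
            "members": [m.split("[")[0] for m in members.split()],
            "configured": int(total),
            "working": int(working),
            "slots": slots,
            "degraded": int(working) < int(total),
        })
    return arrays


if __name__ == "__main__":
    for a in summarize(MDSTAT):
        state = "DEGRADED" if a["degraded"] else "ok"
        print(f'{a["array"]}: {a["working"]}/{a["configured"]} active '
              f'({", ".join(a["members"])}) [{a["slots"]}] {state}')
```

On the pasted output this flags all three arrays as degraded: md0 and md2 have only their hda member active, and md1 has only hdb2.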

I'm pretty sure you still only have hardware problems with /dev/hdb ...
NOTE: PLEASE CHECK THE CABLE CONNECTING YOUR TWO HDs ...
As a test, physically remove hdb and look at (any) error messages when booting (since there may be inconsistencies now).
Read "dmesg" and scan for errors (via sme panel).

If it comes up error-free (IMO highly likely):

CHANGE THAT MIRROR HD TO HDC (/dev/hdc)!!! ...
or hdd if needed (e.g. if your CD-ROM is already hdc).
(The mirror should be on the other IDE channel, preferably as master .-)

Follow the recovery howto by darrel (a bit old .-)
or just partition your new hdc identically to your primary hda...
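
A hedged sketch of what "partition identically, then re-add" could look like, written as a Python dry run that only prints commands and executes nothing. Assumptions: raidtools is in use (this thread mentions /etc/raidtab), so `raidhotadd` is the hot-add command; `sfdisk -d` is used to copy the partition table; the names `SOURCE_DISK`, `NEW_DISK`, `ACTIVE_MEMBER` and `plan` are purely illustrative. The array-to-partition mapping is taken from the mdstat output in the first post. Check every printed line against the recovery HOWTO before typing any of it.

```python
# Sketch only: builds the command list as strings and prints it; nothing is run.
# SOURCE_DISK, NEW_DISK, ACTIVE_MEMBER and plan() are illustrative names.
SOURCE_DISK = "hda"   # healthy disk whose partition layout is copied
NEW_DISK = "hdc"      # replacement disk, as suggested in this reply

# Currently active member of each array, from the mdstat paste in this thread.
ACTIVE_MEMBER = {"md0": "hda1", "md1": "hdb2", "md2": "hda3"}


def plan(active_member, source=SOURCE_DISK, new=NEW_DISK):
    """Return (commands, warnings) for mirroring `source` onto `new`."""
    # Copy the partition table from the source disk to the new disk.
    commands = [f"sfdisk -d /dev/{source} | sfdisk /dev/{new}"]
    warnings = []
    for array in sorted(active_member):
        member = active_member[array]
        if member.startswith(source):
            # "hda1" -> "1": same partition number on the new disk.
            partition = member[len(source):]
            commands.append(f"raidhotadd /dev/{array} /dev/{new}{partition}")
        else:
            # The array's only live copy is not on the source disk.
            warnings.append(
                f"{array} is running only from {member}, not from {source}: "
                "its current data is not on the source disk"
            )
    return commands, warnings


if __name__ == "__main__":
    commands, warnings = plan(ACTIVE_MEMBER)
    for line in commands:
        print(line)
    for line in warnings:
        print("WARNING:", line)
```

The warning branch matters for the mdstat shown in this thread: md1 lists only hdb2 as active, so the sketch deliberately refuses to plan a hot-add for it rather than guess.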

Regards
Reinhold

http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
http://www.samag.com/documents/s=9102/sam0404a/0404a.htm