Koozali.org: home of the SME Server

Obsolete Releases => SME Server 7.x => Topic started by: wjhobbs on October 23, 2007, 03:08:45 AM

Title: A Bad Drive -- or something else?
Post by: wjhobbs on October 23, 2007, 03:08:45 AM
I am seeking advice.

On my daughter's machine I have been getting intermittent RAID rebuild sequences, with email messages generated during each re-sync (e.g., "A Rebuild20 event has been detected on md device /dev/md2." ending with "A RebuildFinished event has been detected on md device /dev/md2.").

There was one sequence in April, another in June and one in July. All referenced /dev/md2.

Last week, however, the "RebuildFinished" event was followed almost immediately by an "A Fail event has been detected on md device /dev/md2." message. A day later I initiated a manual re-sync. It successfully went through all of the rebuild messages, ending with the "RebuildFinished" followed by a "SpareActive" event. BUT, 4 hours later I got another "A Fail event has been detected on md device /dev/md2." message.

It's a fairly old box but the drives are only a couple of years old.

Is this just a bad block somewhere on the second drive that I could get around by running fsck (or something else)? Or is it possibly a controller problem? Or do I need to bite the bullet and replace the drive?

Thanks for your input.

John
Title: Re: A Bad Drive -- or something else?
Post by: jfarschman on October 23, 2007, 05:59:14 PM
John,

  It seems like the first thing to try is fsck. Can't you just disconnect the members of the md2 device with something like this:

  mdadm --verbose /dev/md2 -f /dev/sda2 -r /dev/sda2
  mdadm --verbose /dev/md2 -f /dev/sdb2 -r /dev/sdb2

  This basically tells the system that both sda2 and sdb2 are faulty and removes them from the array. Then you can run your fsck.
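
  Before failing anything, it may be worth checking which array md currently considers degraded. Here is a minimal sketch, assuming the usual /proc/mdstat layout in which each array has a status string such as [UU] (all members up) or [U_] (one member missing). The function name degraded_arrays is made up for illustration, and the sample device names in the comments are only the ones mentioned in this thread.

```shell
# Print the name of every md array whose status string (e.g. [U_])
# shows at least one missing member. Reads /proc/mdstat-formatted
# text on stdin, so it can also be run against a saved copy.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status string follows on a later line, e.g. "[2/1] [U_]"
        /\[[U_]+\]/ && name != "" {
            if ($0 ~ /\[[U_]*_[U_]*\]/) print name
            name = ""
        }
    '
}

# Typical use on the live system (run as root):
#   degraded_arrays < /proc/mdstat
#   mdadm --detail /dev/md2     # shows which member is marked faulty
#   smartctl -a /dev/sdb        # assumes smartmontools is installed
```

  If the same partition keeps showing up as faulty and the SMART output for that disk reports reallocated or pending sectors, that would point toward the drive rather than the filesystem.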