Koozali.org: home of the SME Server

A Bad Drive -- or something else?

wjhobbs
« on: October 23, 2007, 03:08:45 AM »
I am seeking advice.

On my daughter's machine I have been getting intermittent RAID rebuild sequences, where email messages get generated during a re-sync. (e.g., "A Rebuild20 event has been detected on md device /dev/md2." ending with "A RebuildFinished event has been detected on md device /dev/md2.")

There was one sequence in April, another in June and one in July. All referenced /dev/md2.

Last week, however, the "RebuildFinished" event was followed almost immediately by the message "A Fail event has been detected on md device /dev/md2." A day later I initiated a manual re-sync. It went through all of the rebuild messages successfully, ending with "RebuildFinished" followed by a "SpareActive" event. But 4 hours later I again got "A Fail event has been detected on md device /dev/md2."

It's a fairly old box but the drives are only a couple of years old.

Is this just a bad block somewhere on the second drive that I could get around by running fsck (or something else)? Or is it possibly a controller problem? Or do I need to bite the bullet and replace the drive?

Thanks for your input.

John
...

jfarschman
Re: A Bad Drive -- or something else?
« Reply #1 on: October 23, 2007, 05:59:14 PM »
John,

  Seems like the first thing to try is fsck. Can't you just disconnect the members of the md2 device with something like this:

  mdadm --manage --verbose /dev/md2 --fail /dev/sda2 --remove /dev/sda2
  mdadm --manage --verbose /dev/md2 --fail /dev/sdb2 --remove /dev/sdb2

  This basically tells the system that both sda2 and sdb2 are broken and disconnects them from the array. Then you can run your fsck.
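
  Before failing any members, it may help to confirm which arrays the kernel currently reports as degraded. The following is only a minimal sketch, not a tested procedure for this box: degraded_arrays is a hypothetical helper name, and it assumes the usual /proc/mdstat layout, in which a status field such as [U_] marks a missing member with an underscore.

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the name of every md array whose member status field (e.g. [U_])
# contains an underscore, meaning at least one member is missing or failed.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }

        # The status field consists only of U and _ inside square brackets
        /\[[U_]+\]/ && name != "" {
            if (match($0, /\[[U_]+\]/)) {
                status = substr($0, RSTART, RLENGTH)
                if (index(status, "_") > 0) print name
            }
            name = ""
        }
    '
}

# Typical use on the server itself:
#   degraded_arrays < /proc/mdstat
```

  If md2 shows up in the output, mdadm --detail /dev/md2 lists which member is marked faulty. Assuming smartmontools is installed, smartctl -a on that member's disk (for example /dev/sdb) can suggest whether the drive itself is logging errors, which would point toward the drive rather than the controller.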
Jay Farschman
ICQ - 60448985
jay@hitechsavvy.com