Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: wjhobbs on October 23, 2007, 03:08:45 AM
-
I am seeking advice.
On my daughter's machine I have been getting intermittent RAID rebuild sequences, where email messages get generated during a re-sync. (e.g., "A Rebuild20 event has been detected on md device /dev/md2." ending with "A RebuildFinished event has been detected on md device /dev/md2.")
There was one sequence in April, another in June and one in July. All referenced /dev/md2.
Last week, however, the "RebuildFinished" event was followed almost immediately by the message "A Fail event has been detected on md device /dev/md2." A day later I initiated a manual re-sync. It successfully went through all of the rebuild messages, ending with the "RebuildFinished" followed by a "SpareActive" event. But 4 hours later I got another "A Fail event has been detected on md device /dev/md2." message.
It's a fairly old box but the drives are only a couple of years old.
Is this just a bad block somewhere on the second drive that I could get around by running fsck (or something else)? Or is it possibly a controller problem? Or do I need to bite the bullet and replace the drive?
Thanks for your input.
John
-
John,
Seems like the first thing to try is fsck. Can't you just disconnect the md2 device with something like this:
mdadm --verbose /dev/md2 -f /dev/sda2 -r /dev/sda2
mdadm --verbose /dev/md2 -f /dev/sdb2 -r /dev/sdb2
This basically tells the system that both sda2 and sdb2 have failed and removes them from the md2 array. Then you can run your fsck.
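Before failing any members, it may help to confirm which arrays the kernel currently considers degraded. The following is only a minimal sketch, assuming the usual /proc/mdstat layout in which a degraded mirror shows an underscore in its status field (for example [U_] instead of [UU]). The function name degraded_arrays is a hypothetical helper, not part of mdadm.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the name of every md array whose status field contains an underscore,
# which is how a missing or failed member is shown (e.g. [U_]).
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1; next }
        # A status field like [U_] or [_U] means a member is missing or failed
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    '
}

# Usage on a live system (prints nothing if every array is healthy):
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

If md2 shows up in the output, mdadm --detail /dev/md2 should show which member is marked faulty, which helps narrow down whether it is always the same drive.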