Koozali.org: home of the SME Server
Legacy Forums => General Discussion (Legacy) => Topic started by: rick1908 on January 03, 2006, 08:33:20 PM
-
Server: 6.01
Problem: I am getting a raidmonitor email every 15 mins.
Discussion: It appears that only one partition is getting the error. This has been a recurring issue over the past couple of years; I would say it has happened about three times, and it always happens to one drive only. It has always worked out that I would be upgrading the server anyhow and would start a server installation from scratch with a new hard drive. Everything seems to work OK for a few months, sometimes a year, then I get the raidmonitor report indicating that only one partition is down. I have been successful in just forcing a resync on at least one occasion (but can't remember how to do it any more). The server worked well for a few months, then the error came back. Does anyone know what the problem might be, or can someone at least refresh my memory on how to force a resync?
I have gone through the information here:
http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-monitor-howto.html
and here
http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-recovery-howto.html
I am confused as to which drive has the problem. I don't want to erase the good one and keep the bad one. Can anyone help me with that?
Attached is the text from the raid alarm. Thanks for your help.
-Rick Evans
ALARM! RAID configuration problem
Current configuration is:
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
262016 blocks [2/2] [UU]
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
md0 : active raid1 hda1[0] hdc1[1]
102208 blocks [2/2] [UU]
unused devices: <none>
Last known good configuration was:
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
262016 blocks [2/2] [UU]
md1 : active raid1 hda2[0] hdc2[1]
79666240 blocks [2/2] [UU]
md0 : active raid1 hda1[0] hdc1[1]
102208 blocks [2/2] [UU]
unused devices: <none>
-
I am confused as to which drive has the problem.
...
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.
I won't spoil all your fun by telling you which drive has the problem.
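If you want a second opinion before touching either disk, the kernel normally logs the event when it fails a mirror member, so a quick search of the logs should name the partition that was kicked out (nothing SME-specific here, just the standard 2.4-kernel RAID1 messages):
# grep -i raid /var/log/messages
Look for a "disk failure" / "disabling device" line that mentions one of the md1 members.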
-
The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.
I won't spoil all your fun by telling you which drive has the problem.
I see that as well... but if the (F) is on hda2, why wouldn't the blocks line read [1/2]?
I just don't want to leave any doubt before I wipe the drive and start again.
Thanks,
Rick
-
I see that as well... but if the (F) is on hda2, why wouldn't the blocks line read [1/2]?
Because the blocks line should be read as [number of disks / working disks]. The preceding line is the one that tells you where the problem is.
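To put numbers on that using the alarm output above (just re-reading the lines already posted, nothing new assumed):
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
reads as two members configured, one still active, so md1 is degraded, and the member flagged (F) on the md1 line is the one the kernel has failed out; the healthy arrays md0 and md2 both read [2/2] [UU].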
FWIW, I run many RAID1 SMEs & I've seen exactly your situation more times than I care to count. IME, you have 3 options:
1) Reboot - unless the disk really is trashed, this will normally rebuild the array automatically.
2) To re-activate the disk without a reboot (the full sequence is sketched after this list), remove the broken member from the array with:
# /sbin/raidhotremove /dev/md1 /dev/hda2
Now add it back:
# /sbin/raidhotadd /dev/md1 /dev/hda2
Check it's rebuilding with:
# cat /proc/mdstat
3) Replace the drive. This is a "last resort" and something I have only had to do once in the last 5 years.
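Putting option 2 together, the whole sequence looks roughly like this (using the same /dev/md1 and /dev/hda2 names as in Rick's alarm; on another box substitute whichever member /proc/mdstat marks with (F)):
# /sbin/raidhotremove /dev/md1 /dev/hda2
# /sbin/raidhotadd /dev/md1 /dev/hda2
# cat /proc/mdstat
While the resync runs, the md1 entry in /proc/mdstat gains a progress line along the lines of:
[=>...................]  recovery =  7.3% (5815636/79666240) finish=38.2min
and once it completes md1 should be back to [2/2] [UU].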
-
NickR,
Thank you so much for your reply. It did the trick... both disks are now in sync again.
I really appreciate the help that you and everyone else here provide.
Regards,
Rick
-
Good stuff, glad I could be of help.
Out of interest, which method did you use?
-
raidhotremove/add...
2) To re-activate the disk without a reboot, remove the broken member from the array with:
# /sbin/raidhotremove /dev/md1 /dev/hda2
Now add it back:
# /sbin/raidhotadd /dev/md1 /dev/hda2
Check it's rebuilding with:
# cat /proc/mdstat
...everything seems to be working great now.
Thanks again.
-Rick