Koozali.org: home of the SME Server
Legacy Forums => General Discussion (Legacy) => Topic started by: rick1908 on January 03, 2006, 08:33:20 PM
-
Server: 6.01
Problem: I am getting a raidmonitor email every 15 mins.
Discussion: It appears that only one partition is getting the error. This has been a recurring issue over the past couple of years; I would say it has happened about three times, and it always happens to one drive only. It has always worked out that I would be upgrading the server anyhow and would start a server installation from scratch with a new hard drive. Everything seems to work OK for a few months, sometimes a year, then I get the raidmonitor report indicating that only one partition is down. I have been successful in just forcing a resync on at least one occasion (but can't remember how to do it any more). The server worked well for a few months, then the error came back. Does anyone know what the problem might be, or can someone at least refresh my memory on how to force a resync?
I have gone through the information here:
http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-monitor-howto.html
and here
http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-recovery-howto.html
I am confused as to which drive has the problem. I don't want to erase the good one and keep the bad one. Can anyone help me with that?
Attached is the text from the raid alarm. Thanks for your help.
-Rick Evans
ALARM! RAID configuration problem
Current configuration is:
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
262016 blocks [2/2] [UU]
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
md0 : active raid1 hda1[0] hdc1[1]
102208 blocks [2/2] [UU]
unused devices: <none>
Last known good configuration was:
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
262016 blocks [2/2] [UU]
md1 : active raid1 hda2[0] hdc2[1]
79666240 blocks [2/2] [UU]
md0 : active raid1 hda1[0] hdc1[1]
102208 blocks [2/2] [UU]
unused devices: <none>
-
I am confused as to which drive has the problem.
...
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.
I won't spoil all your fun by telling you which drive has the problem.
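If you want a second opinion before touching either disk, the kernel normally logs the event when it fails a mirror member, so a quick search of the logs should name the partition that was kicked out (nothing SME-specific here, just the standard 2.4-kernel RAID1 messages):
# grep -i raid /var/log/messages
Look for a "disk failure" / "disabling device" line that mentions one of the md1 members.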
-
The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.
I won't spoil all your fun by telling you which drive has the problem.
I see that as well... but if the (F) is on hda2, why wouldn't the blocks line read [1/2]?
I just don't want to leave any doubt before I wipe the drive and start again.
Thanks,
Rick
-
I see that as well... but if the (F) is on hda2, why wouldn't the blocks line read [1/2]?
Because the blocks line should be read as [number of disks / working disks]. The preceding line is the one that tells you where the problem is.
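To put numbers on that using the alarm output above (just re-reading the lines already posted, nothing new assumed):
md1 : active raid1 hda2[0](F) hdc2[1]
79666240 blocks [2/1] [_U]
reads as two members configured, one still active, so md1 is degraded, and the member flagged (F) on the md1 line is the one the kernel has failed out; the healthy arrays md0 and md2 both read [2/2] [UU].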
FWIW, I run many RAID1 SMEs & I've seen exactly your situation more times than I care to count. IME, you have 3 options:
1) Reboot - unless the disk really is trashed, this will normally rebuild the array automatically.
2) To re-activate the disk without a reboot (the full sequence is sketched after this list), remove the broken member from the array with:
# /sbin/raidhotremove /dev/md1 /dev/hda2
Now add it back:
# /sbin/raidhotadd /dev/md1 /dev/hda2
Check it's rebuilding with:
# cat /proc/mdstat
3) Replace the drive. This is a "last resort" and something I have only had to do once in the last 5 years.
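Putting option 2 together, the whole sequence looks roughly like this (using the same /dev/md1 and /dev/hda2 names as in Rick's alarm; on another box substitute whichever member /proc/mdstat marks with (F)):
# /sbin/raidhotremove /dev/md1 /dev/hda2
# /sbin/raidhotadd /dev/md1 /dev/hda2
# cat /proc/mdstat
While the resync runs, the md1 entry in /proc/mdstat gains a progress line along the lines of:
[=>...................]  recovery =  7.3% (5815636/79666240) finish=38.2min
and once it completes md1 should be back to [2/2] [UU].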
-
NickR,
Thank you so much for your reply. It did the trick... both disks are now in sync again.
I really appreciate the help that you and everyone else here provide.
Regards,
Rick
-
Good stuff, glad I could be of help.
Out of interest, which method did you use?
-
raidhotremove/add...
2) To re-activate the disk without a reboot, remove the broken member from the array with:
# /sbin/raidhotremove /dev/md1 /dev/hda2
Now add it back:
# /sbin/raidhotadd /dev/md1 /dev/hda2
Check it's rebuilding with:
# cat /proc/mdstat
...everything seems to be working great now.
Thanks again.
-Rick