Koozali.org: home of the SME Server

RAID1 Failure Help

Offline rick1908

RAID1 Failure Help
« on: January 03, 2006, 08:33:20 PM »
Server: 6.01

Problem: I am getting a raidmonitor email every 15 mins.

Discussion: Only one partition appears to be reporting the error. This has been a recurring issue over the last couple of years; I would say it has happened about three times, and it always affects only one drive. Each time, it has worked out that I was about to upgrade the server anyway, so I started a server installation from scratch with a new hard drive. Everything seems to work OK for a few months, sometimes a year, and then I get the raidmonitor report indicating that one partition is down. I have been successful in simply forcing a resync on at least one occasion (but can't remember how to do it any more), and the server then worked well for a few months before the error came back. Does anyone know what the problem might be, or can anyone at least refresh my memory on how to force a resync?

I have gone through the information here:

http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-monitor-howto.html

and here

http://mirror.contribs.org/smeserver/contribs//dmay/smeserver/5.x/contrib/raidmonitor/raid-recovery-howto.html

I am confused as to which drive has the problem. I don't want to erase the good one and keep the bad one. Can anyone help me with that?

Attached is the text from the raid alarm. Thanks for your help.

-Rick Evans

ALARM! RAID configuration problem

Current configuration is:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
      262016 blocks [2/2] [UU]
     
md1 : active raid1 hda2[0](F) hdc2[1]
      79666240 blocks [2/1] [_U]
     
md0 : active raid1 hda1[0] hdc1[1]
      102208 blocks [2/2] [UU]
     
unused devices: <none>

Last known good configuration was:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdc3[1]
      262016 blocks [2/2] [UU]
     
md1 : active raid1 hda2[0] hdc2[1]
      79666240 blocks [2/2] [UU]
     
md0 : active raid1 hda1[0] hdc1[1]
      102208 blocks [2/2] [UU]
     
unused devices: <none>
Seeeeeeeeeeeeeeee ya,
Rick :pint:

Offline CharlieBrady

Re: RAID1 Failure Help
« Reply #1 on: January 03, 2006, 10:12:00 PM »
Quote from: "rick1908"

I am confused as to which drive has the problem.
...
md1 : active raid1 hda2[0](F) hdc2[1]
      79666240 blocks [2/1] [_U]


The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.

I won't spoil all your fun by telling you which drive has the problem.

Offline rick1908

RAID1 Failure Help
« Reply #2 on: January 04, 2006, 12:01:33 AM »
Quote
The "_" is the bad drive and the "U" is the good one. I suspect "F" is a failing grade.

I won't spoil all your fun by telling you which drive has the problem.


I see that as well....but if the F is with hda2 why wouldn't the blocks line read [1/2]?

I just don't want to leave any doubt before I wipe the drive and start again.

Thanks,
Rick

Offline NickR

RAID1 Failure Help
« Reply #3 on: January 06, 2006, 11:01:39 AM »
Quote from: "rick1908"
I see that as well....but if the F is with hda2 why wouldn't the blocks line read [1/2]?


Because the blocks line should be read as [number of disks / working disks]. The preceding line, where hda2 carries the (F) flag, is the one that tells you where the problem is.
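
For anyone who would rather not pick the failed member out by eye, here is a minimal sketch that does it from the mdstat text. It assumes the array lines look like the ones quoted in this thread (`mdN : active raid1 member[index](F) ...`); `failed_members` is a hypothetical helper name, not something shipped with SME Server or raidmonitor.

```shell
#!/bin/sh
# Sketch: list array members flagged (F) in mdstat-style text read from stdin.
# failed_members is a hypothetical helper name used only for illustration.
failed_members() {
    awk '
        # Array lines look like: md1 : active raid1 hda2[0](F) hdc2[1]
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 5; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[[0-9]+\]\(F\)$/, "", dev)   # strip the "[0](F)" suffix
                    print $1, "/dev/" dev
                }
            }
        }
    '
}

# On a live system: failed_members < /proc/mdstat
```

Fed the "Current configuration" text from the first post, this prints `md1 /dev/hda2`.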

FWIW, I run many RAID1 SME servers and I've seen exactly your situation more times than I care to count. IME, you have 3 options:

1) Reboot - unless the disk really is trashed, this will normally rebuild the array automatically.

2) To re-activate the disk without a reboot, remove the broken member from the array with:

# /sbin/raidhotremove /dev/md1 /dev/hda2

Now add it back:

# /sbin/raidhotadd /dev/md1 /dev/hda2

Check it's rebuilding with:

# cat /proc/mdstat

3) Replace the drive.  This is "last resort" and something I have only had to do once in the last 5 years.
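
Option 2 can be wrapped up as a small script. This is only a sketch: `readd_member`, `is_rebuilding` and the `DRY_RUN` variable are hypothetical names of mine, and the rebuild check assumes the kernel reports progress in /proc/mdstat using the word "recovery" or "resync". By default it only prints the commands so you can confirm the array and member before anything is touched.

```shell
#!/bin/sh
# Sketch: print (or run) the remove / re-add sequence from option 2.
# readd_member, is_rebuilding and DRY_RUN are hypothetical names for illustration.
readd_member() {
    array=$1    # e.g. /dev/md1
    member=$2   # e.g. /dev/hda2
    if [ -z "$array" ] || [ -z "$member" ]; then
        echo "usage: readd_member ARRAY MEMBER" >&2
        return 1
    fi
    for cmd in raidhotremove raidhotadd; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Default: show the command only, change nothing.
            echo "/sbin/$cmd $array $member"
        else
            "/sbin/$cmd" "$array" "$member" || return 1
        fi
    done
}

# Succeeds while mdstat-style text on stdin mentions a rebuild in progress.
# Assumption: progress lines contain "recovery" or "resync".
is_rebuilding() {
    grep -Eq 'recovery|resync'
}
```

Run `readd_member /dev/md1 /dev/hda2` first to see what would be executed, then repeat as root with `DRY_RUN=0` to actually do it. Afterwards, `is_rebuilding < /proc/mdstat` can be polled until it stops succeeding.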
--
Nick......

Offline rick1908

RAID1 Failure Help
« Reply #4 on: January 06, 2006, 11:38:07 PM »
NickR,

Thank you so much for your reply. It did the trick...both disks are now in sync again.

I really appreciate the help you and everyone else here provide.

Regards,
Rick

Offline NickR

RAID1 Failure Help
« Reply #5 on: January 07, 2006, 10:42:10 AM »
Good stuff, glad I could be of help.  

Out of interest, which method did you use?

Offline rick1908

RAID1 Failure Help
« Reply #6 on: January 07, 2006, 03:30:17 PM »
raidhotremove/add...

Quote
2) To re-activate the disk without a reboot, remove the broken member from the array with:

# /sbin/raidhotremove /dev/md1 /dev/hda2

Now add it back:

# /sbin/raidhotadd /dev/md1 /dev/hda2

Check it's rebuilding with:

# cat /proc/mdstat


...everything seems to be working great now.

Thanks again.

-Rick