Koozali.org: home of the SME Server

Raid 1 - RAID configuration problem - both disks failing???

avery

I'm in serious need of some help here... My raidmonitor is sending me the following every 15 minutes:

Quote

ALARM! RAID configuration problem

Current configuration is:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdb3[1]
      264960 blocks [2/1] [_U]
     
md1 : active raid1 hdb2[1]
      155918784 blocks [2/1] [_U]
     
md0 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]
     
unused devices: <none>

Last known good configuration was:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0] hdb3[1]
      264960 blocks [2/2] [UU]
     
md1 : active raid1 hda2[0] hdb2[1]
      155918784 blocks [2/2] [UU]
     
md0 : active raid1 hda1[0] hdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>
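As a rough aid for reading output like the above: in /proc/mdstat, "[2/1]" means the array expects two members and has one active, and an underscore in "[_U]" marks the slot with no active member. The sketch below lists only the arrays that are running short. It assumes the mdstat layout quoted in this post, and `degraded_arrays` is a hypothetical helper name, not part of raidmonitor.

```shell
#!/bin/sh
# Sketch: list md arrays that are running with a member missing.
# Assumes the /proc/mdstat layout quoted above (header line "mdN : ...",
# followed by a status line ending in e.g. "[2/1] [_U]").
# degraded_arrays is a hypothetical helper name, not part of raidmonitor.
degraded_arrays() {
    awk '
        # Header line: remember which array the following lines belong to.
        /^md[0-9]+ :/ { array = $1 }
        # Status line, e.g. "264960 blocks [2/1] [_U]".
        /\[[0-9]+\/[0-9]+\]/ {
            # An underscore marks a slot with no active member.
            if ($NF ~ /_/) { print array, $(NF - 1), $NF }
        }
    ' "${1:-/proc/mdstat}"
}
```

Once the function is loaded, `degraded_arrays` reads the live /proc/mdstat and `degraded_arrays saved-mdstat.txt` inspects a saved copy. For the current configuration quoted above it would report md2 and md1 with slot 0 missing and md0 with slot 1 missing, i.e. each array has lost one member, which is not by itself proof that both disks are bad.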


Please tell me I'm reading this wrong ... both hda and hdb have issues ....?  I've searched the forum and FAQ and found some help, e.g.

http://mirror.contribs.org/smeserver/contribs/dmay/smeserver/5.x/contrib/raidmonitor/raid-monitor-howto.html

http://mirror.contribs.org/smeserver/contribs/dmay/smeserver/5.x/contrib/raidmonitor/raid-recovery-howto.html

http://mirror.contribs.org/smeserver/contribs/jbennett/howto/Recoving%20From%20Raid%201%20Failure.htm


From these am I right in thinking my only solutions are:

1/ get another disk - restore the active partitions back to the new one - then use new disk to rebuild the array (assuming the original disks are OK anyway)

2/ Back up all data then start again - would this be easier??

Or is there an easier way to restore the original disk partitions??  Advice appreciated! :-?

avery

« Reply #1 on: July 24, 2005, 08:31:00 PM »
I may have just found a "fix" - but I'm still concerned about the cause.

Reading older posts I thought I'd try:

/sbin/raidhotadd /dev/md2 /dev/hda3
/sbin/raidhotadd /dev/md0 /dev/hdb1
/sbin/raidhotadd /dev/md1 /dev/hda2

The first two seemed to go OK; the final one is rebuilding now - I'll know in 4 hours...
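The three commands above each re-add the partition that is present in the last known good configuration but absent from the current one. A minimal sketch of that comparison, assuming both configurations have been saved as mdstat-style text, could look like the following. `suggest_hotadd` is a hypothetical helper; it only prints candidate commands and never runs them.

```shell
#!/bin/sh
# Sketch: compare a saved known-good mdstat with the current one and PRINT
# (never run) a raidhotadd line for each member that has dropped out.
# suggest_hotadd is a hypothetical helper; review its output before use.
suggest_hotadd() {
    good=$1                      # file holding the last known good config
    current=${2:-/proc/mdstat}   # current state, live by default
    awk -v cur="$current" '
        # Strip the slot suffix: "hda3[0]" becomes "hda3".
        function member(field) { sub(/\[[0-9]+\].*/, "", field); return field }
        /^md[0-9]+ :/ {
            # Fields 1-4 are "mdN : active raid1"; members start at field 5.
            for (i = 5; i <= NF; i++) {
                key = $1 " " member($i)
                if (FILENAME == cur) {
                    present[key] = 1
                } else if (!(key in present)) {
                    printf "/sbin/raidhotadd /dev/%s /dev/%s\n", $1, member($i)
                }
            }
        }
    ' "$current" "$good"
}
```

Run as `suggest_hotadd last-good.txt` (the file name is an assumption), it reads the current file first to record which members are present, then walks the known-good file and prints one line per missing member. Printing rather than executing is deliberate: as later replies in this thread show, re-adding a partition is worth a manual check first.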

But this leaves me with more questions than answers ..

1/  Why did this happen?   Will it happen again?  Are my Seagate disks prone to this, as other posts suggest?
2/  Have I cured it?  or just put a sticking plaster over it?
3/  Was this the right thing to do?

I'll make sure I run /usr/local/bin/raidmonitor -iv after it's finished and will cross my fingers...

But in the absence of experience of my own, I'd be grateful if anyone could help with any of this.
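For the plan of running raidmonitor once the rebuild has finished, a small polling loop can stand in for watching the clock. This is only a sketch: it assumes a rebuild shows up in /proc/mdstat as a progress line containing "recovery" or "resync", and both function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: wait until no array is rebuilding, then run the raidmonitor check.
# Assumes a rebuild appears in /proc/mdstat as a progress line containing
# "recovery" or "resync". Both function names are hypothetical helpers.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

wait_for_rebuild() {
    # $1: mdstat file (default /proc/mdstat); $2: seconds between polls (default 60)
    while rebuild_in_progress "$1"; do
        sleep "${2:-60}"
    done
}
```

With the functions loaded, `wait_for_rebuild && /usr/local/bin/raidmonitor -iv` would block until the progress line disappears and then run the check mentioned above.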

raem

« Reply #2 on: July 25, 2005, 08:06:05 AM »
The message suggests one disk failed.

I would run a drive fitness test on both disks to make sure before you rely on the supposedly failed disk that you are rebuilding.

Follow the Recovery howto from dmay exactly.

NickCritten

« Reply #3 on: July 25, 2005, 03:06:52 PM »
Hi all,

I also recently had this problem: a brand new pair of 80 GB HDDs came up as degraded.

I followed the raidhotadd instructions and it screwed up my install. It completely failed to boot and just kept asking me to run fsck, which refused to run.

I ran through the full Seagate diagnostics (as found on the latest Ultimate Boot CD) and they pronounced both drives fine. I also ran Memtest+ for 24 hours, and that came back OK. I even ran a prime number generator for 24 hours to test the CPU, which also came back OK. So hardware-wise, the system is fine.

I have the same questions: why did this happen, and is it likely to happen again?

I think SME7 needs better RAID utilities built into the server-manager. For a system that supports RAID from the install, the built-in features are pretty abysmal.

Nick
...
Nick

"No good deed goes unpunished." :-x...

NickCritten

« Reply #4 on: July 25, 2005, 03:08:29 PM »
By the way, my system is a 6.0.1-01 with the SMEPlus script installed.

avery

« Reply #5 on: July 26, 2005, 01:06:06 AM »
Nick - so sorry to hear of your problems ... :-(

I "think" I'm sorted now - just hotadding the disks back and letting them rebuild seems to have done the trick.  Both drives tested out OK, and I can't believe different bits of both failed at the same time...

So what caused it?  Power cut?  No idea!  

Hopefully I can sort out my Clam issues on the other thread; then all in SME land will be sweetness and light again!

NickCritten

« Reply #6 on: July 26, 2005, 09:25:01 AM »
Yeah, I've rebuilt mine now. To get the data off, I had to boot off an Insert CD (live Linux distro), smbmount to my Windows box and transfer the files that way...

Then I had to manually set up all the users, ibays & addons again, transfer all the data back, copy the users' mail back across and reset all the permissions :hammer:

It took one HELL of a long time.

My backups were out of date you see.

Motto: Keep regular backups.

raem

« Reply #7 on: July 26, 2005, 09:56:21 AM »
NickCritten

> My backups were out of date you see.
> Motto: Keep regular backups.

...and you can do that relatively easily by keeping a third hard disk, swapping it in and out of the server occasionally (daily, weekly or monthly as desired) and rebuilding the array each time onto the freshly inserted drive.
That way you have the whole server backed up; in the event of a major failure, just plug in the spare drive and run it in degraded mode.

NickCritten

« Reply #8 on: July 26, 2005, 09:10:45 PM »
That is a good idea; unfortunately the server in question is on a client site, and it's one particular client I don't like to visit much.

He MOANS about EVERYTHING... Absolutely no appreciation of how much work is put in to getting his systems up and running.

Ahh well...

I'm going to set up backup2ws and tell him to regularly burn the rar to CDRW.

If it goes wrong again and he's not got backups next time, I'm charging him full whack for the setup again... :-D

NickCritten

« Reply #9 on: July 29, 2005, 10:42:59 AM »
Oh HELL!  :evil:

The drives on that server have degraded again.

Both drives are brand new, identical Seagate 80 GB units.
They are connected as Primary Master & Secondary Master with a CDRW as Secondary Slave.

For some reason, when I checked fstab, the partition sizes look different?

Both drives test OK using Seagate diagnostics Full Test.
Server memory & CPU both test OK (24-hour burn-in tests)
Server is running on a UPS, with Surge Protection.

Server is now running 6.5, with only 2 contribs, webdav and Swerts-Knudsens Clam AV.


I am completely at a loss; I can't find any info on contribs.org or Google that mentions repeated degradation that doesn't relate to faulty hardware.

Has anyone got any ideas?

Many Thanks

raem

« Reply #10 on: July 29, 2005, 03:20:20 PM »
I have seen this happen on a server with mrtg installed. I removed mrtg and no more corruption. I don't believe the disks were actually corrupted in any way, just that cat /proc/mdstat did not report correctly.

pfloor

« Reply #11 on: July 30, 2005, 08:29:06 AM »
I have had the same repeated problem with IBM Deathstar drives.  They test OK with IBM test software but for some reason will not stay synced.  I have tried everything, but after a while (sometimes a few days and other times several weeks) they just fall out of sync.

I threw the IBMs in the trash can and replaced them with WD drives.  They have been perfect for almost 2 years now.  Raid monitor has never sent me an email, so I manually check every now and then just to make sure everything is OK.

My test server HAD IBMs in it, but they both started giving me problems and off to the trash can they also went about a week ago.  I'm using an OLD 20 gig WD drive now that has never failed me and will avoid anything else for the time being.

I have not tried Seagates in an SME server but I can tell you to stay away from IBMs.

Paul
In life, you must either "Push, Pull or Get out of the way!"