Koozali.org: home of the SME Server

Contribs.org Forums => General Discussion => Topic started by: portedaix on January 07, 2011, 09:50:02 PM

Title: Raid disk dead. Need help for replacement
Post by: portedaix on January 07, 2011, 09:50:02 PM
Hello,
I believe one of my hard disks is dead. I am trying to recover, but I am confused by the RAID behavior and need some help.
In my server, an old AMD K7, I have the following devices:
•   Disk 1: hda is a 40 GB hard disk
•   Disk 2: hdb is a 40 GB hard disk
•   Disk 3: hdc is a 60 GB hard disk
•   hdd is a cdrom
I first installed SME 7.4 and then upgraded it to 8.0 beta 6.

I think disk 1 is dead because of the following messages (an extract only, there are quite a few):
•   hda: dma_intr : error = 0x40 {UncorrectableError} LBA sect = ……., sector = ……..
•   /dev
•   /dev/root
•   Switchroot : mount failed : No such file or directory
•   Kernel panic – not syncing : attempted to kill init
And then the server hangs. This happened during normal use: no upgrade done, no new contribs, no restart.

I tried to replace disk 1 with disk 2 on the hda channel, with disk 3 disconnected: I got a GRUB command line and no boot. Then I booted the server with disk 3 only (moved to hda) and got a kernel panic. Finally I removed disk 1 and put disk 3 in its place, with disk 2 still in place. The server boots again and tells me that it is mounting a new root filesystem. Then I shut down the server. I hope that was not a bad idea… But some files, such as the MySQL databases, are not up to date: they are a few weeks old.

QUESTION 1: After checking the RAID howtos, it seems that disk 1 (hda) was mirrored with disk 3 (hdc)? What is disk 2 (hdb) used for?

QUESTION 2: I am very confused by the kernel versions. When I boot from the faulty disk 1, there is only one kernel version, 2.6.18-194.11.1.el5. When I finally booted the server with disk 3 as the hda device, I had the choice between 2.6.18-194.17.1.el5 and 2.6.9-89.0.25.El. Why are they different from those on the faulty disk 1?

QUESTION 3: Is there a way to recover the MySQL files? Where should I look? I have not yet tried to mount the faulty disk 1 on another Linux box, but I guess the ext3 filesystem is dead.

QUESTION 4: What is the next move? Leave disk 3 in its current place as hda, and install a new hard disk on the hdc channel?

Thanks
Olivier
Title: Re: Raid disk dead. Need help for replacement
Post by: mmccarn on January 10, 2011, 01:41:33 PM
[caveat] I have done *no* data recovery from damaged sme hard drives [/caveat]

Recent-looking RAID information from the wiki: http://wiki.contribs.org/Raid

This section looks like it may have information germane to your situation: http://wiki.contribs.org/Raid#Convert_Software_RAID1_to_RAID5

I found this with Google; it looks like it has way more info than I would want to deal with:
http://pve.proxmox.com/wiki/SMEServer_LVM_Recovery_using_Knoppix_LiveCD#SMEServer_Data_Info_for_recovery

I've seen the behavior you describe -- where the available data from a RAID array is older than expected after a failure.  In my case, it turned out to be a flaky RAID 1 member that had failed a while back but worked again after a fresh reboot.  I could boot from the remaining good RAID member and get current data, or boot from the originally-failed-but-good-enough-to-boot-up member and get old data (I was working on a hardware RAID in a Windows server, so I didn't learn anything about rebuilding SME).  If I rebooted with both RAID members connected, the system would report that the RAID was rebuilding, then fail before the rebuild completed.
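
For what it's worth, on a Linux software RAID box a mirror member that has dropped out like this normally shows up in /proc/mdstat as a member count such as [2/1], with an underscore in the status map (for example [_U]). Below is a rough Python sketch that flags degraded arrays in mdstat-style text. The sample text and the function name degraded_arrays are made up for illustration (they are not taken from the server in this thread), and the parsing only assumes the usual two-line layout that RAID1 arrays get in that file.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout; NOT output from the
# server discussed in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 hda1[0] hdc1[1]
      104320 blocks [2/2] [UU]

md2 : active raid1 hdc2[1]
      38973568 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, members)} for arrays missing members."""
    result = {}
    current = None
    members = []
    for line in mdstat_text.splitlines():
        # Header line, e.g. "md2 : active raid1 hdc2[1]"
        header = re.match(r"^(md\d+) : \S+ \S+ (.*)$", line)
        if header:
            current = header.group(1)
            members = re.findall(r"(\w+)\[\d+\]", header.group(2))
            continue
        # Status line, e.g. "38973568 blocks [2/1] [_U]"
        status = re.search(r"\[(\d+)/(\d+)\] \[[U_]+\]", line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected:
                result[current] = (expected, active, members)
            current = None
    return result


if __name__ == "__main__":
    for name, (expected, active, members) in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"{name}: {active} of {expected} members active, present: {members}")
```

On a live system you would pass in the contents of /proc/mdstat instead of the sample string. An array reported here is running on fewer disks than it expects, which is the state where one member can quietly fall behind and hold old data.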

Good luck!
Title: Re: Raid disk dead. Need help for replacement
Post by: portedaix on January 13, 2011, 09:03:03 AM
Hello,
Since the dead disk is hda, I am having real trouble booting the RAID array. I read loads of howtos and tried my favorite, supergrub, as suggested, but no luck. :sad:

If someone has an idea...

Regards
Olivier
Title: Re: Raid disk dead. Need help for replacement
Post by: portedaix on January 24, 2011, 05:16:07 PM
To close the subject, my mobo died as well :sad:. Strange... Mains power is not so good here... A UPS does not help with surges.
So I installed my three hard disks in another PC running Ubuntu from a SATA disk, connected as before on two IDE channels. Then I followed this howto:
http://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives
So far, I have been able to recover my MySQL data up to D-day minus 4. I am happy with that.
Ciao