Koozali.org: home of the SME Server

Dead RAID

John Crisp

Dead RAID
« on: May 06, 2003, 05:09:48 PM »
Hi,

Have been having a few problems with a RAID setup.

I have a 5.1.2 SP3 server which has run pretty well faultlessly for 18 months 24/7. Well done to the developers for such a good piece of code.

On Sunday, a user (one of the bosses) found they couldn't log in, so rather than check what the problem was, they just rebooted the server.

On reboot, /dev/hda reports that inittab is not found and asks for a runlevel. Whatever you enter, it reports that there are no more processes and just sits and sulks.

OK, so I dropped hda and rebooted off hdc, which gets as far as the first 'L' of LILO and refuses to go any further. A quick boot with tomsrtbt and I can happily mount the main data partitions on either disk, so I can get the data without too much trouble (HUGE sigh of relief at 2 am this morning).

However, trying to mount /dev/hda1 or /dev/hdc1 results in much gnashing of teeth and bad/missing superblocks etc.
(moans of exasperation by 4 am)

I'm sure that /dev/hdc would boot/run if I had a rescue disk, but the rescue floppy had committed hara-kiri, and I understand that I need the system up to create another one.

Is there any way to salvage the setup easily, for instance by running 5.1.2 as an upgrade, or even upgrading to, say, 5.6?

I'm just loath to spend hours recreating it all. They do have a tape backup from a few days earlier, but there is likely to be a certain amount of information that wasn't on the last one. I also wonder: how do I get any mail off the machine?

Any help would be appreciated. It was a long night last night and I don't fancy another one!

B. Rgds
John


John Crisp

Re: Dead RAID
« Reply #2 on: May 06, 2003, 09:08:42 PM »
Sorry Ray - been there and done that.

Machine had raidmonitor running - doesn't help when they don't read the messages.....!

I have used the recovery howto before (fantastic). However, neither document covers my current predicament, as both need one booting/working disk.

Both of mine have decided to screw around at the same time, with either the boot partition, the MBR or something similar, resulting in a complete failure to boot.

The data partitions look OK, although possibly out of synch. I need to check the logs etc to see what happened and when.

I think that hdc would boot if I had a working rescue floppy to bypass LILO. If I could get that disk running the server, I could back up the data and settings and then just reinstall onto a fresh installation.

I might try creating the rescue disk on another machine and see if it will boot this one (I guess someone will tell me that won't work).

If you have any other thoughts or suggestions I'd be grateful. I spent hours reading online last night and got just about nowhere!

B. Rgds
John

Per Sørensen

Re: Dead RAID
« Reply #3 on: May 07, 2003, 02:56:32 AM »
Hi

I had a very similar situation a few days ago. This is what I did:

"So I had to make a new install on another HD, mount one of the main ones and copy the file onto that, then it booted again."

And then I rebuilt the RAID.

Here is the thread:

http://e-smith.org/bboard/read.php?f=3&i=30858&t=30858

Rgds
Per

Paul

Re: Dead RAID
« Reply #4 on: May 07, 2003, 09:49:21 AM »
I had a similar problem when hda crashed. I couldn't boot from hdc (it gave me the dreaded "L"). So I rewrote the MBR on hdc as a boot device and bingo, it booted from hdc and I was able to repair/restore.

See if you can mount hdc and try:

/sbin/lilo -C /etc/lilo.conf -b /dev/hdc

If lilo.conf is trashed and you have raidmonitor installed, you can also use:

/sbin/lilo -C /root/raidmonitor/lilo.conf -b /dev/hdc

This should tell lilo to make hdc a boot device. I think it will only work if hdc1 was set up as bootable in fdisk, but I'm not sure.
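Run from a rescue boot such as tomsrtbt, that suggestion might look roughly like the sketch below. It is only a sketch under stated assumptions: ROOT_PART and MNT are hypothetical placeholders for the partition holding hdc's root filesystem and an empty mount point, it assumes /boot is not a separate partition (if it is, mount it under $MNT/boot as well), and it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: rewrite LILO onto /dev/hdc from a rescue boot (e.g. tomsrtbt).
# ASSUMPTION: ROOT_PART and MNT are hypothetical defaults; set them to match
# your own partition layout before running anything for real.
# Commands are only printed unless DRY_RUN=0 is exported.

DRY_RUN=${DRY_RUN:-1}
ROOT_PART=${ROOT_PART:-/dev/hdc1}   # hypothetical: partition with hdc's root fs
MNT=${MNT:-/mnt/hdc}                # hypothetical: empty mount point

# Print a command, and execute it only when DRY_RUN=0 (stop on failure).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

relilo_hdc() {
    run mkdir -p "$MNT"
    run mount "$ROOT_PART" "$MNT"
    # chroot so lilo uses hdc's own /etc/lilo.conf and the kernels it lists;
    # -b /dev/hdc writes the boot record to the MBR of the whole hdc disk.
    run chroot "$MNT" /sbin/lilo -C /etc/lilo.conf -b /dev/hdc
    run umount "$MNT"
}

relilo_hdc
```

If only raidmonitor's copy of the config survived, the lilo line would use -C /root/raidmonitor/lilo.conf instead.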

Just a suggestion, you probably already tried it!

Paul

John Crisp

Re: Dead RAID
« Reply #5 on: May 07, 2003, 08:03:55 PM »
Thanks for the replies.

/dev/hda gets as far as stating that inittab is missing and asking for a runlevel.

/dev/hdc just gives 'L'

I have tried a few things with /dev/hdc1 to no avail.

I do have an older disk that was swapped out, and wondered if I could use it to boot and then re-mirror the relevant partitions. If I make the old disk (which boots happily) /dev/hda and use either of the other disks as /dev/hdc, they seem to take control and the same errors occur. Take them off and the old disk reboots happily.

I have tried to restore from the backup tape onto a fresh install on a new disk, but that keeps getting a segmentation fault.

Tonight I was going to:

Fresh install on a single drive
Install old drive as /dev/hdc
boot with tomsrtbt
mount both drives and copy all data from hdc to new hda

backup etc.

chuck in new pair of mirrored drives

restore data.
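The copy step of that plan, run from tomsrtbt, might look something like this. It is a sketch only: the partition names, mount points and data path are hypothetical (the DATA_DIR default assumes SME's data lives under home/e-smith/files on that partition, which should be checked), the old drive is mounted read-only so nothing is written to it, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: copy the data tree from the old drive (hdc) onto a fresh
# single-drive install (hda) while booted from a rescue system.
# ASSUMPTIONS (hypothetical defaults, adjust to your layout):
#   OLD_PART - partition on the old drive holding the data
#   NEW_PART - matching partition on the freshly installed drive
#   DATA_DIR - data tree to copy, relative to each partition's mount point
# Commands are only printed unless DRY_RUN=0 is exported.

DRY_RUN=${DRY_RUN:-1}
OLD_PART=${OLD_PART:-/dev/hdc3}
NEW_PART=${NEW_PART:-/dev/hda3}
DATA_DIR=${DATA_DIR:-home/e-smith/files}
OLD=/mnt/old
NEW=/mnt/new

# Print a command, and execute it only when DRY_RUN=0 (stop on failure).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

copy_data() {
    run mkdir -p "$OLD" "$NEW"
    run mount -o ro "$OLD_PART" "$OLD"   # read-only: never write to the suspect disk
    run mount "$NEW_PART" "$NEW"
    # -a keeps ownership, permissions, timestamps and symlinks;
    # the trailing /. copies the directory's contents, hidden files included.
    run cp -a "$OLD/$DATA_DIR/." "$NEW/$DATA_DIR/"
    run umount "$OLD"
    run umount "$NEW"
}

copy_data
```

Note that cp -a keeps numeric owner and group IDs, so if accounts are recreated on the fresh install with different UIDs, the copied files will end up owned by the wrong users; that is worth checking before restoring onto the new mirrored pair.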

Not sure what settings I will save/lose, but at least the data is back.

I am not sure what happens if I mount the old drive onto a mirrored pair and copy to them, particularly all the configuration settings.

Any suggestions on other methods would be gratefully received.

B. Rgds
John

Per Sørensen

Re: Dead RAID
« Reply #6 on: May 07, 2003, 09:22:26 PM »
John Crisp wrote:
 
> Fresh install on a single drive
> Install old drive as /dev/hdc
> boot with tomsrtbt
> mount both drives and copy all data from hdc to new hda

Why not try the other way to get hdc to boot again, and then rebuild the array?
 
> backup etc.
>
> chuck in new pair of mirrored drives
>
> restore data.
>
> Not sure what settings I will save/lose, but at least the
> data is back.

That's a good plan B.

>
> I am not sure what happens if I mount the old drive onto a
> mirrored pair and copy to them, particularly all the
> configuration settings.
>
I'm not sure, but why not try it as plan C?


Per

Bill

Re: Dead RAID
« Reply #7 on: May 08, 2003, 08:17:46 AM »
Take a peek at http://www.knoppix.com . It's a self-booting Linux OS on CD, which I used to rescue Windows data. It may save your job. Also check out http://fire.dmzs.com/

Good Luck and report back !

Bill

John Crisp

Re: Dead RAID
« Reply #8 on: May 08, 2003, 10:20:20 AM »
It was a long night in hell...

Thanks for the suggestions. Look, I did it the hard way, which is not something I wish to repeat. But what can you do when your decent tape backup keeps getting a segmentation fault?

Going to have a serious rethink on the backup strategy.

Most stuff now works, except the damn printer, which seems to have its permissions in a muddle. I get 'cannot open lp0 permission denied'.

If anyone has any suggestions I'd be grateful.

When I have had some sleep I'll do a full report...


B. Rgds
John

Ray Mitchell

Re: Dead RAID
« Reply #9 on: May 08, 2003, 05:19:51 PM »
John

Your problem has highlighted the need for a good backup strategy.

You need to prove your backup system, i.e. do a backup and then emulate a worst-case scenario: reinstall the OS and then restore your backup. That proves that your procedures will work if a REAL failure arises.
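Part of that proof can be scripted. A minimal sketch, assuming the backup is a plain tar archive made with tar -cf ARCHIVE -C SRC . (an assumption for illustration, not the SME backup format): restore it into an empty scratch directory and compare the result with the original tree. It is a lighter check than a full reinstall-and-restore, and only meaningful right after the backup is taken, before the live data changes.

```shell
#!/bin/sh
# Sketch: prove a backup by restoring it into an empty scratch directory
# and comparing the result with the original tree.
# ASSUMPTION: the backup is a tar archive created with
#   tar -cf ARCHIVE -C SRC .
# Usage: verify_restore ARCHIVE SRC
verify_restore() {
    archive=$1
    src=$2
    scratch=$(mktemp -d) || return 1
    # Restore, then compare recursively; any difference counts as a failure.
    if tar -xf "$archive" -C "$scratch" && diff -r "$src" "$scratch" >/dev/null 2>&1; then
        echo "restore OK: $archive"
        rc=0
    else
        echo "restore FAILED: $archive"
        rc=1
    fi
    rm -rf "$scratch"
    return "$rc"
}
```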

Relying on one backup is not a good idea either. As a minimum, 2 backups should be used, alternating between them; then, in the event of a major failure with one backup (e.g. segmentation errors on restore), you can resort to the other.

Personally, I think multiple backups are safer, as they allow you to reinstate a certain point in time if required. Also, a file (or more) may be corrupted or even missing on the most recent few backups, but if you can go back in time, you can find the missing files on an earlier backup.
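The rotation described above could be driven by a small helper that picks today's backup set. This is a hypothetical sketch, not part of SME: it simply cycles through however many sets (disks, tapes, directories) it is given, keyed on a day number, so the newest backup is never the only one.

```shell
#!/bin/sh
# Sketch: choose which backup set to use today, cycling through the sets.
# Usage: pick_backup_set DAY_NUMBER SET1 SET2 [SET3 ...]
pick_backup_set() {
    day=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "pick_backup_set: no backup sets given" >&2
        return 1
    fi
    idx=$(( day % $# ))   # 0-based position of today's set
    i=0
    for name in "$@"; do
        if [ "$i" -eq "$idx" ]; then
            echo "$name"
            return 0
        fi
        i=$(( i + 1 ))
    done
}
```

For example, pick_backup_set "$(( $(date +%s) / 86400 ))" /mnt/backupA /mnt/backupB alternates daily between two hypothetical mount points. Counting days since the epoch with GNU date's %s avoids %j, whose leading zeros shell arithmetic would read as octal.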

There are different ways to achieve this (tape etc.), but backing up to a couple of swappable hard disks, on which you can also retain a number of old backups, is quite cost-effective on a MB per $ basis compared to tape. Put the drives in a removable caddy and you can also take them off site for security.

Norton Ghost is another effective way to save your whole hard disk configuration, e.g. by burning an image to CDs.

Regards
Ray

Kevin

Re: Dead RAID
« Reply #10 on: May 08, 2003, 07:28:51 PM »
Here is a good tool for the toolbox. It only works with ext2fs, though:

http://www.r-tt.com/RLinux.shtml

Kevin

Re: Dead RAID
« Reply #11 on: May 08, 2003, 07:30:33 PM »
Sorry, you need to buy this one for RAID reconstruction:

http://www.r-tt.com/RStudio.shtml

Ed Form

Re: Dead RAID
« Reply #12 on: May 08, 2003, 09:21:11 PM »
Ray Mitchell wrote:
>
> There are different ways to achieve this (tape etc), but
> backup to a couple of swappable hard disks, on which you can
> also retain a number of old backups, is quite cost effective
> on a Mb per $ basis compared to tape. Have the drives in a
> removable caddie and you can also take them off site for
> security.

I'd sound a considerable note of caution over this approach, since hard disks are fragile, and modern disks are actually becoming *more* fragile as their capacity rises: witness the recent downgrading of warranty periods from 3 years to 1 year by several of the big manufacturers.

If the disks used for backup, removal and rotation are treated with very great care, this is a good system, but once you appoint an ordinary user to carry them off site, they *will* fail sooner or later.

Ed Form

Ray Mitchell

Re: Dead RAID
« Reply #13 on: May 08, 2003, 10:12:52 PM »
Ed

> I'd sound a considerable note of caution over this approach
> since hard disks are fragile

Your point is quite a valid one, and I did consider this issue of fragility at the outset.

> witness the recent downgrading of warranty periods from
> 3 years to 1 year by several of the big manufacturers.

Yes, that 3-year to 1-year warranty trick, which all the manufacturers seemed to pull, was a bit nasty.
 
> If the disks used for backup, remove, and rotate are treated
> with very great care this is a good system, but once you appoint
> an ordinary user to carry them off site they *will*
> fail sooner or later.

My wife, a trusted staff member or I put the drive straight into a small case (actually a ladies' beauty case from Target) lined with medium-density foam, with nice secure locks to stop it accidentally springing open. It's about as good as I can do. Hopefully this gives the drives some added shock protection during transport.

I should also add that a regular backup to CD-Rs is also a good idea, which I do as well. The CD backup (actually it's 4 or 5 CDs every month) is good for long-term data archiving, and is again a standby backup in case the backup hard disks should fail, be lost or be stolen.

The SME Server config is also saved to disk every night, so it could easily be rebuilt in a worst-case scenario.

Hopefully, with RAID1 redundancy, 2 removable backup disks that retain 3 to 6 months' worth of daily backups, and monthly backups to CD, if something nasty should happen I should be up and running with at least everything except today's data intact (and I will probably have that on the other RAID disk anyway).

A lot of different problems would have to occur at the same time to create a situation where lots of data was lost, and I think the chances of that happening are extremely small.

Regards
Ray

Michiel

Re: Dead RAID
« Reply #14 on: May 08, 2003, 10:53:26 PM »
> A lot of different problems would have to occur at the same
> time to create a situation where lots of data was lost, and I
> think the percentage chances of that happening are extremely
> small.

Trust Murphy to prove you wrong one of these days ;-)
(hopefully not)