Koozali.org: home of the SME Server

Big trouble with raid array

Offline Sijhtam

    • http://www.sijhtam.nl
« on: September 13, 2007, 11:06:30 AM »
Hello everyone,
I’m having a big fight with my SME 7.2 server. After a power failure I tried to restart the server, but on power-up it reported an unclean shutdown.
I then repaired the filesystem with fsck. After another reboot I get the following message:

Error:  /bin/lvm exited abnormally (pid 461)
Creating root device
Mounting root filesystem
Mount error 6 mounting ext3
Mount error 2 mounting none
Switching to new root
Switchingroot: mount failed: 22
Unmount /initrd/dev failed:2
Kernel panic – not syncing: attempted to kill init!



I did not make any backup  :( but the data includes pictures of my daughter being born.
So please help me get some data back.

Mathijs

Offline gilsaa

« Reply #1 on: September 13, 2007, 11:16:44 AM »
What if you try to boot from only one of the disks? (I assume the disks are mirrored.)

Offline Sijhtam

« Reply #2 on: September 13, 2007, 12:59:49 PM »
I have 4 disks. I found out that one of them had failed, so I removed it and replaced it with a new one.

This is the message I get:

md: md1 stopped
md: bind<hde1>
raid1: raid set md1 active with 3 out of 4 mirrors
md: md2 stopped
mdadm: /dev/md2 assembled from 2 drives - not enough to start the array.
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
cdrom: open failed
Volume group "main" not found
Error: /bin/lvm exited abnormally (pid 459)
Creating root device
Mounting root filesystem
Mount error 6 mounting ext3
Mount error 2 mounting none
Switching to new root
Switchingroot: mount failed: 22
Unmount /initrd/dev failed:2
Kernel panic – not syncing: attempted to kill init!


I have tried to start up from each of the disks separately, and also with 2 and with 3 disks. Nothing works. Is there a way to recover data by putting some of the disks in a Windows machine and using some kind of tool?
Some of the data I can't lose!

Mathijs

Offline byte

« Reply #3 on: September 13, 2007, 01:24:06 PM »
Quote from: Sijhtam
Is there a way to recover data by putting some of the disks in a Windows machine and using some kind of tool?
Some of the data I can't lose!

I normally keep a burned copy of Knoppix as a last resort; you can boot it as a live CD and recover (if possible) any data.
--[byte]--

Have you filled in a Bug Report over @ http://bugs.contribs.org ? Please don't wait to be told this way you help us to help you/others - Thanks!

Offline Sijhtam

« Reply #4 on: September 13, 2007, 01:54:39 PM »
I will download a copy of the Knoppix CD and try to get some data back.
Hope it works.
Thanks

Offline Sijhtam

« Reply #5 on: September 13, 2007, 02:43:47 PM »
Just logged in with Knoppix, but I can't get to the disks. This is the output I get:
dmesg | tail
Buffer I/O error on device hdg, logical block 6
end_request: I/O error, dev hdg, sector 56
Buffer I/O error on device hdg, logical block 7
end_request: I/O error, dev hdg, sector 0
end_request: I/O error, dev hdg, sector 0
EFS: cannot read volume header
EXT3-fs error (device hde1): ext3_check_descriptors: Inode bitmap for group 7 not in group (block 1073799428)!
EXT3-fs: group descriptors corrupted!
EXT3-fs error (device hde1): ext3_check_descriptors: Inode bitmap for group 7 not in group (block 1073799428)!
EXT3-fs: group descriptors corrupted!



I don't understand the error. Does anyone else?
Mathijs

Offline Stefano

« Reply #6 on: September 13, 2007, 03:21:16 PM »

Quote from: Sijhtam
I have tried to start up from each of the disks separately, and also with 2 and with 3 disks. Nothing works. Is there a way to recover data by putting some of the disks in a Windows machine and using some kind of tool?
Some of the data I can't lose!

Try booting your server (with all 3 good hard disks installed) from the SME CD and use
Code: [Select]
sme rescue
at the boot prompt.

HTH

Stefano

Offline Sijhtam

« Reply #7 on: September 13, 2007, 03:34:49 PM »
Quote from: Stefano
Try booting your server (with all 3 good hard disks installed) from the SME CD and use
Code: [Select]
sme rescue
at the boot prompt.

HTH

Stefano

I already tried that one. It gives an error (kernel panic) and then the server needs to reboot.
When I try sme update, I get an error. It says it is a bug and that I have to report it on Bugzilla, but when I try to save the report, it fails.

Isn't there a way to look at the raw data? I have tried putting the disk in my Windows machine and loading the Ext2 IFS drivers. Then I can see the bootloader (GRUB) and some boot image files.

Still nothing works!
Mathijs

Offline Sijhtam

« Reply #8 on: September 13, 2007, 03:52:11 PM »
I have tried one other thing:
I put in only the malfunctioning disk, and this is the server's message:
ide: failed opcode was: unknown
end_request: I/O error, dev hda, sector 160071501
/dev/hda2: read failed after 0 of 512 at 8184967976: Input/output error


To me this doesn't make sense.
Mathijs

Offline Gaston94

« Reply #9 on: September 14, 2007, 11:27:43 PM »
Hi,
I am not sure this will help, but it might help you understand where you stand and how you can go further.

Let me try to describe your configuration:
 You built your server with 4 disks.
 The installation set up a RAID5 configuration on 3 disks + 1 hot spare.
 You are using an LVM layout.

Accessing the data in such a configuration, outside normal operation, requires that you can
 - start the RAID array (automatically or manually)
 - activate your LVM layout
Only once these steps are completed will you be able to mount the filesystem and access the data (and as far as I know this can't be done on a non-Linux system). Any attempt to access your LVM volumes without these prerequisites will fail (hence your last error messages...).

You should not have got errors, because you had a hot spare disk, unless you chose "no spare" at the installation stage (did you?).
Some of the information looks strange.
Quote from: Sijhtam
mdadm: /dev/md2 assembled from 2 drives - not enough to start the array.
as if you were not using the correct initrd image.
Are you sure you replaced the right faulty drive?
I also don't see any good reason for SME not booting in rescue mode  :?
I would try the following:
 - have your good disks in your box (let's say hda and hdb)
 - boot from a live CD that supports LVM2 and RAID devices (a Linux rescue CD, for instance)
 - try starting the RAID array: mdadm -AR /dev/md5 /dev/hda2 /dev/hdb2
Try every combination of your disks (I am calling the former hda disk1, the former hdb disk2, and so on): disk1+disk2 / disk1+disk3 / disk1+disk4 / disk2+disk3 / disk2+disk4 / disk3+disk4
Then go on to activate your LVM and mount your filesystem:
Code: [Select]
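vgscan                           # scan all disks for LVM volume groups
vgchange -a y main               # activate the "main" volume group
mkdir /mnt/tmp                   # create a mount point
mount /dev/main/root /mnt/tmp    # mount the root logical volume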
and back up your data.
If you can't restart the RAID array with your existing disks, I have no further ideas.
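
The "try every combination" step above can be written out so that no pair is forgotten. This is only a sketch: the helper name list_pair_attempts is made up for illustration, the partition names are the ones that appear later in this thread and may differ on a rescue CD, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Print (do not run) one assembly attempt for every pair of member partitions.
# "mdadm --assemble --run" is the long form of "mdadm -AR".
list_pair_attempts() {
    md=$1
    shift
    # Take the first remaining partition and pair it with each one after it.
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for second in "$@"; do
            echo "mdadm --assemble --run $md $first $second"
        done
    done
}

# Placeholder device names; replace them with what your rescue system reports.
list_pair_attempts /dev/md5 /dev/hde2 /dev/hdf2 /dev/hdg2 /dev/hdh2
```

With four partitions this prints six commands, one per pair. If an attempt leaves a half-assembled array behind, it probably has to be stopped (mdadm --stop /dev/md5) before the next attempt.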

Gaston
« Last Edit: September 14, 2007, 11:29:44 PM by Gaston94 »

Offline Sijhtam

« Reply #10 on: September 15, 2007, 09:35:26 AM »
Thanks for your suggestions, I will try them today. I'll let you all know the result.
Mathijs

Offline Sijhtam

« Reply #11 on: September 15, 2007, 02:30:03 PM »
Quote from: Gaston94
You should not have got errors, because you had a hot spare disk, unless you chose "no spare" at the installation stage (did you?).
No, I did not choose "no spare".


Quote from: Gaston94
Are you sure you replaced the right faulty drive?
Do I need to put a good drive in the server to replace the faulty drive?


After I took out the faulty drive without putting in a new one, this is the message given by the server:

md: kicking non-fresh hde2 from array
md: md5: raid array is not clean -- starting background reconstruction
raid5: not enough operational devices for md5 (3/4 failed)
raid5 conf printout:
--- rd:4 wd:1
disk 2, o:1, dev:hdh2
raid5: failed to run raid set md5
md: pers->run() failed
mdadm: failed to RUN_ARRAY /dev/md5: Input/output error
mdadm: not enough devices to start the array

What does this mean, and what do I have to do to make it work?

Mathijs

Offline Gaston94

« Reply #12 on: September 15, 2007, 07:49:32 PM »
Hi,
Things are quite bad.
Quote from: Sijhtam
md: kicking non-fresh hde2 from array
md: md5: raid array is not clean -- starting background reconstruction
raid5: not enough operational devices for md5 (3/4 failed)
raid5 conf printout:
--- rd:4 wd:1
disk 2, o:1, dev:hdh2
raid5: failed to run raid set md5
md: pers->run() failed
mdadm: failed to RUN_ARRAY /dev/md5: Input/output error
mdadm: not enough devices to start the array
This means that your hdh disk is OK and that the hde drive is the faulty one!
It also means that you originally had an array of 4 disks, but only one is recognized as good enough to continue, and the RAID mechanism cannot restart a RAID5 on a single disk: you need at least two good ones (not counting the spare).
Either you made a mistake and removed one of the good disks, or you had more than one faulty disk, in which case I know of nothing that can help you.
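
The first line of the conf printout can also be checked mechanically. A minimal sketch, assuming that rd is the number of member slots the array was created with and wd the number of members currently working (that reading of the kernel's abbreviations is an assumption here, and the function name raid5_can_run is invented for illustration). RAID5 tolerates the loss of exactly one member, so it needs at least rd - 1 working members.

```shell
#!/bin/sh
# Decide from a "--- rd:N wd:M" line whether a RAID5 array could start.
# Assumption: rd = configured member slots, wd = members currently working.
raid5_can_run() {
    line=$1
    # Pull the two numbers out of the line; empty if the field is missing.
    rd=$(printf '%s\n' "$line" | sed -n 's/.*rd:\([0-9][0-9]*\).*/\1/p')
    wd=$(printf '%s\n' "$line" | sed -n 's/.*wd:\([0-9][0-9]*\).*/\1/p')
    if [ -z "$rd" ] || [ -z "$wd" ]; then
        echo "unparsable"
    elif [ "$wd" -ge $((rd - 1)) ]; then
        # RAID5 can run with at most one member missing.
        echo "startable: $wd of $rd members working"
    else
        echo "not startable: $wd of $rd members working, need $((rd - 1))"
    fi
}

raid5_can_run "--- rd:4 wd:1"
```

For the line quoted above this prints "not startable: 1 of 4 members working, need 3". Under this reading a 4-slot array would need three working members rather than two, but with only one working member the conclusion is the same either way.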

Assuming you had only a single faulty disk, let's try one last thing: put all your original disks in the box and use
          mdadm -AR /dev/md5 /dev/hde2 /dev/hdf2 /dev/hdg2 /dev/hdh2
(yes, I know trying to add hde2 will not succeed, but we can give it a chance...)

You might need to use the --force option and start the array manually (mdadm -R /dev/md5).

You should see something like this on the console:
raid5 conf printout:
--- rd:4 wd:3 fd:1
disk 1, o:1, dev:hdf2
disk 2, o:1, dev:hdg2
disk 3, o:1, dev:hdh2
...
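
The forced attempt described above could be wrapped in a small dry-run script, so the exact commands can be read before anything touches the disks. This is a sketch under assumptions: the helper names run and force_assemble are made up, the partition names are the ones from this thread, and the DRY_RUN switch is an addition for safety, not something mdadm provides.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

force_assemble() {
    md=$1
    shift
    # Inspect each member's RAID superblock before changing anything.
    for part in "$@"; do
        run mdadm --examine "$part"
    done
    # --force accepts members whose superblocks look out of date; --run starts
    # the array even if it is degraded.
    run mdadm --assemble --force --run "$md" "$@"
    # Show the resulting array state.
    run cat /proc/mdstat
}

force_assemble /dev/md5 /dev/hde2 /dev/hdf2 /dev/hdg2 /dev/hdh2
```

Run as is, it prints the four examine commands, the assemble command and the final status check without executing any of them.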


I don't think I can help you further  :?
G.

Offline Confucius

« Reply #13 on: September 15, 2007, 08:01:16 PM »
Mathijs,

Silly question maybe, but I didn't see anything about the "Linux raid autodetect" partition type being set on the newly attached drive. Might that be the problem?

Harro

Offline Sijhtam

« Reply #14 on: September 16, 2007, 12:32:14 PM »
Quote from: Confucius
Mathijs,

Silly question maybe, but I didn't see anything about the "Linux raid autodetect" partition type being set on the newly attached drive. Might that be the problem?

Harro

@Harro, when should I see the "Linux raid autodetect"? At startup with a new drive?


Quote from: Gaston94
Assuming you had only a single faulty disk, let's try one last thing: put all your original disks in the box and use
          mdadm -AR /dev/md5 /dev/hde2 /dev/hdf2 /dev/hdg2 /dev/hdh2
(yes, I know trying to add hde2 will not succeed, but we can give it a chance...)

You might need to use the --force option and start the array manually (mdadm -R /dev/md5).

You should see something like this on the console:
raid5 conf printout:
--- rd:4 wd:3 fd:1
disk 1, o:1, dev:hdf2
disk 2, o:1, dev:hdg2
disk 3, o:1, dev:hdh2
...



@ Gaston,

I put in all my disks. When booting from the rescue CD, this is the message:

ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 160071506
hdf: dma_intr: status = 0x51 {DriveReady Seek Complete Error}
hdf: dma_intr: error = 0x40 {uncorrectable error}, LBAsect = 160071507, sector = 160071507

I get this error exactly 22 times, on multiple sectors.

Then, after
Code: [Select]
mdadm -AR /dev/md5 /dev/hde2 /dev/hdf2 /dev/hdg2 /dev/hdh2

I get the same error 22 times on different sectors, and it ends with:
mdadm: no RAID superblock on /dev/hdf2
mdadm: /dev/hdf2 has no superblock - assembly aborted

I hope someone has another suggestion, or else I'm going to build a new server and lose my data!

Mathijs

Offline Confucius

« Reply #15 on: September 16, 2007, 12:40:35 PM »
Mathijs,

Every new disk comes with its own factory-set partition type (what fdisk lists in its "System" column). In almost all cases this will not be the one used for RAID.
With fdisk -l you can see which type your newly attached drive has. You'll also see the type used on the other disks.
Since I'm doing RAID1 and not RAID5 I might be thinking the wrong way, but my impression is that RAID = RAID and the basics for the disks are therefore the same.
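
The fdisk check can be narrowed down to the partitions that carry the RAID type. A small sketch, assuming the classic fdisk -l table layout; the function name raid_partitions is invented, and the sample table is made up for illustration rather than taken from Mathijs's server.

```shell
#!/bin/sh
# Print the partitions that fdisk lists as "Linux raid autodetect" (id fd).
# Reads "fdisk -l" output on standard input.
raid_partitions() {
    awk '/Linux raid autodetect/ { print $1 }'
}

# Illustrative sample only; the numbers below are invented.
raid_partitions <<'EOF'
   Device Boot      Start         End      Blocks   Id  System
/dev/hde1   *           1          13      104391   fd  Linux raid autodetect
/dev/hde2              14       19457   156183930   fd  Linux raid autodetect
/dev/hdf1               1       19457   156288321    b  W95 FAT32
EOF
```

On a real system the input would come from something like fdisk -l /dev/hde | raid_partitions. A new drive that does not show up in the result has not been given the RAID partition type yet.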

Harro

Offline Gaston94

« Reply #16 on: September 16, 2007, 08:56:28 PM »
Hi,
Was hdf one of your old drives? If so, I am afraid it's over.
Everything in the messages you are reporting indicates that the disk "hdf" is not (or no longer?) initialised, neither as a member of a RAID array nor with any suitable partition.
This would not worry me if it were a newly inserted disk :-?

The only things I am certain about are:
 - if you had 4 disks in your array,
 - and if ONLY one of these 4 disks failed (in fact one of the active ones, since there is little chance that the failed disk was the spare), you might recover your data
 - if more than one disk failed, nothing I know of can be done (some companies might recover your failed device and then rebuild the array, but this is beyond my competence and might cost a lot, something like $200 and up...)

Regarding the "Linux raid autodetect" partition system id, I really don't think it has anything to do with this so far. Your disks were partitioned and set with this id at the initial installation stage. As far as I have experienced, it only helps with automatic actions (and I know I have already built RAID over "Linux" partitions, id=83, even if this might not be recommended).
We would only have to care about it when integrating a new, uninitialised disk into the array.

But maybe some other people around here have other advice.

G.
« Last Edit: September 16, 2007, 09:01:31 PM by Gaston94 »

Offline kryptos

« Reply #17 on: September 18, 2007, 10:30:59 AM »
Hi,

I have experienced this twice on the same server, but luckily I was able to restore my data. My 3rd disk had malfunctioned. What I did was remove the defective drive, replace it with a good one (of the same capacity, of course) and boot from my SME CD in rescue mode. To assemble the arrays again, I first tried to fail, remove and re-add the members using the mdadm utility, but that failed. Then I tried the --assemble --force options, and it started rebuilding the arrays. I also found that the server's power supply was not very stable, so I replaced it with a higher-wattage one. Now the server is running smoothly. I hope this helps a little in recovering your data.
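
The first step of that procedure (fail, remove, re-add) might look like the sketch below. The exact commands kryptos typed are not given in the post, so this is an assumption about the usual mdadm manage-mode sequence; the helper names run and readd_member and the device names are placeholders, and by default the script only prints what it would do.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Mark a member as failed, take it out of the array, then add it back.
readd_member() {
    md=$1
    part=$2
    run mdadm "$md" --fail "$part"
    run mdadm "$md" --remove "$part"
    run mdadm "$md" --add "$part"
}

# Placeholder names; use the array and partition your rescue system reports.
readd_member /dev/md5 /dev/hdg2
```

If this sequence fails, as it did in the post above, the fallback described there is assembling with the --assemble --force options.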

Regards,
Rocel

Offline Sijhtam

« Reply #18 on: September 21, 2007, 08:02:43 PM »
Thanks for all the suggestions, but I have made a fresh install.
I will never know whether your option would have worked, kryptos.
Thank you for your help.
Mathijs