Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: SoftDux on January 25, 2008, 05:57:05 PM
-
Hi all
I hope someone can help me with this. The posts in the forum on similar topics didn't help me at all. I just moved approx. 280GB worth of data to a new server, which has 3x 250GB HDDs in it. The 4th drive had a problem and was sent back to the supplier, so I was going to add it later.
Anyway, this morning we had a power failure (UPS also faulty :( ), and now the server doesn't start up. It seems like /dev/md1 is fine, but /dev/md2 isn't.
Here's the output that I can currently see on the monitor:
RAID 5 conf printout:
--- rd:3 wd:2 fd:1
disk 0, o:1, dev:hdb2
disk 2, o:1, dev:sdb2
raid5: failed to run raid set md2
md: pers->run() failed ...
mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while ...
cdrom: open failed.
No volume groups found
Activating logical volumes
cdrom: open failed
Volume group "main" not found
ERROR: /bin/lvm exited abnormally! (pid 485)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
Booting from the CD, and specifying "sme rescue" wasn't very helpful, as it tells me there's no installation found. So, I can't even try and rebuild a failed RAID: http://wiki.contribs.org/Raid#Raid_Notes
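For reference, that sort of rebuild amounts to roughly the following (device names are placeholders: sdb the replacement disk, sda a surviving member) - but I can't even get that far:
# copy the partition table from a good member onto the replacement disk
sfdisk -d /dev/sda | sfdisk /dev/sdb
# add the new partitions back into the arrays
mdadm --add /dev/md1 /dev/sdb1
mdadm --add /dev/md2 /dev/sdb2
# watch the resync progress
cat /proc/mdstat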
I have booted up with System rescue CD, and found some interesting info:
sysresccd ~ # fdisk -ul
Disk /dev/sda: 250.0 GB, 250058268160 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488395055 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 63 208844 104391 fd Linux raid autodetect
/dev/sda2 208845 488392064 244091610 fd Linux raid autodetect
Disk /dev/sdb: 250.0 GB, 250058268160 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488395055 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 63 208844 104391 fd Linux raid autodetect
/dev/sdb2 208845 488392064 244091610 fd Linux raid autodetect
Disk /dev/sdc: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 * 63 208844 104391 fd Linux raid autodetect
/dev/sdc2 208845 488392064 244091610 fd Linux raid autodetect
One thing which I don't understand though, is that it detects all 3 HDD's as sdX, even though I have 1x IDE & 2x SATA HDD's in the server.
Can someone please give me some pointers on this?
-
Have you checked the CMOS on the motherboard? I lost a Vista system when the MB RAID settings mysteriously changed without human intervention. I suspect that if I'd recognized the BIOS issue soon enough, I could have repaired it; instead, everything I did made it worse, right up till I repaired it with XP!
-
There's no RAID in the BIOS, I have double checked it....
-
SoftDux,
I hope you "simply" lost your disk1 (sda1) boot track.
mdadm still almost knows about a "reasonable" setting
for your "data" device
disk 0, o:1, dev:hdb2
disk 1, o:1, dev:sda2
disk 2, o:1, dev:sdb2
...added
sysresccd ~ # fdisk -ul
seems either "bull*" or disastrous :shock:
You need to tell us more about your controller & disks + layout ...
sda: 255 heads, 63 sectors/track, 30401 cylinders, total 488395055 sectors
sdb: 255 heads, 63 sectors/track, 30401 cylinders, total 488395055 sectors
sdc: 255 heads, 63 sectors/track, 30401 cylinders, total 488397168 sectors
...makes us believe you not only have 3 "sd" devices (scsi/sata) but that they are really different
Shortcut:
(assuming you did boot from sata i.e. sda)
(if you boot from hdb & the hd under repair was the 2nd boot device don't try this)
If you can change your setup (bios!) so booting may occur from sdb
you may be on track ... and after a successful boot, fix things with what you already read.
Regards
Reinhold
P.S.: 2 notes however
- of course md1 is operational in that any raid1 is operational with just a single device
- you do realize that you have been in degraded raid5 ;/
-
Hi Reinhold
I have tried changing the boot order but I still get the kernel panic.
It also seems to me like one drive might be missing, but I don't know how to get it back; the SME boot CD doesn't pick it up at all.
How do I get SSH to work with the SME CD in rescue mode?
sysrescuecd is still running, and running dmesg, I noticed the following:
ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
ata1.00: ATAPI: SAMSUNG CD-R/RW DRIVE SW-224B, VE002R50, max MWDMA2
ata1.01: Host Protected Area detected:
current size: 488395055 sectors
native size: 488397168 sectors
ata1.01: ATA-7: ST3250820A, 3.AAF, max UDMA/100
ata1.01: 488395055 sectors, multi 16: LBA48
ata1.00: configured for MWDMA2
ata1.01: Host Protected Area detected:
current size: 488395055 sectors
native size: 488397168 sectors
ata1.01: configured for UDMA/100
scsi 2:0:0:0: CD-ROM SAMSUNG CD-R/RW SW-224B R205 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 2:0:0:0: Attached scsi CD-ROM sr0
sr 2:0:0:0: Attached scsi generic sg0 type 5
scsi 2:0:1:0: Direct-Access ATA ST3250820A 3.AA PQ: 0 ANSI: 5
sd 2:0:1:0: [sda] 488395055 512-byte hardware sectors (250058 MB)
sd 2:0:1:0: [sda] Write Protect is off
sd 2:0:1:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:1:0: [sda] 488395055 512-byte hardware sectors (250058 MB)
sd 2:0:1:0: [sda] Write Protect is off
sd 2:0:1:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2
sd 2:0:1:0: [sda] Attached SCSI disk
sd 2:0:1:0: Attached scsi generic sg1 type 0
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2 -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi4 : ata_piix
scsi5 : ata_piix
ata3: SATA max UDMA/133 cmd 0x0001e900 ctl 0x0001ea02 bmdma 0x0001ed00 irq 19
ata4: SATA max UDMA/133 cmd 0x0001eb00 ctl 0x0001ec02 bmdma 0x0001ed08 irq 19
ata3.00: Host Protected Area detected:
current size: 488395055 sectors
native size: 488397168 sectors
ata3.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
ata3.00: 488395055 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: Host Protected Area detected:
current size: 488395055 sectors
native size: 488397168 sectors
ata3.00: configured for UDMA/133
ata4.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
ata4.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/133
scsi 4:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
sd 4:0:0:0: [sdb] 488395055 512-byte hardware sectors (250058 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: [sdb] 488395055 512-byte hardware sectors (250058 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2
sd 4:0:0:0: [sdb] Attached SCSI disk
sd 4:0:0:0: Attached scsi generic sg2 type 0
scsi 5:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
sd 5:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: sdc1 sdc2
sd 5:0:0:0: [sdc] Attached SCSI disk
sd 5:0:0:0: Attached scsi generic sg3 type 0
It's as if the IDE drive is reporting a different size than it's supposed to.
-
Here's a disk geometry breakdown:
sysresccd ~ # dmesg |grep sdc
sd 5:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: sdc1 sdc2
sd 5:0:0:0: [sdc] Attached SCSI disk
sysresccd ~ # dmesg |grep sda
sd 2:0:1:0: [sda] 488395055 512-byte hardware sectors (250058 MB)
sd 2:0:1:0: [sda] Write Protect is off
sd 2:0:1:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:1:0: [sda] 488395055 512-byte hardware sectors (250058 MB)
sd 2:0:1:0: [sda] Write Protect is off
sd 2:0:1:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2
sd 2:0:1:0: [sda] Attached SCSI disk
sysresccd ~ # dmesg |grep sdb
sd 4:0:0:0: [sdb] 488395055 512-byte hardware sectors (250058 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: [sdb] 488395055 512-byte hardware sectors (250058 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2
sd 4:0:0:0: [sdb] Attached SCSI disk
-
SoftDUX,
Seems your hdb is "identified" as sda because it sits behind your CDROM
(which is master on IDE 0 or pATA1, and CDROMs are scsi-fied in Linux anyway)
Two things are consistent:
...we are having missing/redundant sectors ... 2113 of them
... mdadm doesn't "see" the 3rd needed drive, most likely your 1st SATA
(which in turn allows no md2 device to be formed/autoassembled ... no "/" root to be found by the kernel)
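(That 2113 comes straight from your two reported sizes: 488397168 - 488395055 = 2113 sectors, i.e. exactly the HPA-clipped area from the dmesg output.)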
Even though your partition tables seem correct (and Linux Software Raid is partition based)
we SHOULD know why you have (intentionally?) active/activated HPA (http://en.wikipedia.org/wiki/Host_Protected_Area) (click)
on (only) one of the SATA drives and the remaining (?) ATA
(2) md Autodetecting normally works ...
...mdadm is supposed to even catch when the disks not just completely changed names due to
a controller change, but change from hd* to sd* ... but just let us know what layout (order and naming) worked
when your md devices were operating fully (4 disks) and in degraded mode (1 pata 2 sata ?)
(3)Do you remember the layout of your SATA/ATA md devices ?
(i.e. what was the result of)
cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
1234567 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
12345 blocks [4/4] [UUUU]
(4) Exactly when did you see your RAID 5 conf printout: with the missing disk 1, o:1, dev:sda2
(5) Has that CDROM drive been always located on that cable and always been active ? (or do you power it off /remove it ?)
(6) What drive did you ship to the factory? hdc? hdd?
(7) Some problem is that mdadm in SME 7.x is really old ;-/ ... you could try and check what is found as "available" mdadm --detail --scan ... maybe try to use a new live Linux (knoppix, ubuntu) ... ( maybe UUID on one sata hd is 'gone')
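By way of illustration, the kind of sleuthing I mean, from a live cd with all three disks attached (device names are whatever that kernel assigns):
cat /proc/mdstat              # what the kernel has assembled so far
mdadm --examine /dev/sda2     # read the md superblock on each raid member
mdadm --examine /dev/sdb2
mdadm --examine /dev/sdc2
mdadm --examine --scan        # the arrays mdadm can piece together from those superblocks
mdadm --detail /dev/md2       # only works once/if md2 actually exists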
Regards
Reinhold
-
Just re-read what you posted first:
Booting from the CD, and specifying "sme rescue" wasn't very helpful, as it tells me there's no installation found.
So, I can't even try and rebuild a failed RAID: http://wiki.contribs.org/Raid#Raid_Notes
You did notice that with all your data on your already degraded RAID5 md device you wouldn't be able to rebuild in any case!!!
(rebuilding does imply you do have a working system - iow in raid5 this means you 'just' add redundancy again) 8-) :???:
currently THERE IS NO SME INSTALLATION just two unreadable HDs
... in your system everything starting from "/" isn't available :(
- and please do not blame SME for that <eg>
Regards
Reinhold
-
One thing which I don't understand though, is that it detects all 3 HDD's as sdX, even though I have 1x IDE & 2x SATA HDD's in the server.
With later kernels, all drives show up as sdX.
-
Hello!
I have had the same problem!
I tried everything I could without success; I even bought a program to rescue the data, but that did not help, as some of my databases and emails were never recovered correctly. After a week with many sleepless nights this is what saved me:
http://www.howtoforge.com/recover_data_from_raid_lvm_partitions
This works perfectly; note that the guide is written for IDE drives and mine were SATA. Also note that the SME volume group is named 'main'.
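Roughly, the gist of that guide, adjusted for the SME volume group name (the LV name 'root' is the SME default - check yours with lvs):
mdadm --assemble /dev/md2 /dev/sda2 /dev/sdb2 /dev/sdc2   # bring the data array up from a rescue system
vgscan                                    # look for volume groups on the assembled array
vgchange -ay main                         # activate SME's "main" volume group
lvs                                       # list the logical volumes (root and swap by default)
mkdir /mnt/recovery
mount -o ro /dev/main/root /mnt/recovery  # mount read-only and copy your data off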
I hope this can help you too!
-
Recover Data From RAID1 LVM Partitions
...will not (really) help him to recover his data from a RAID5 md2 device with 2 drives 8-)
But THE LINK IS good reading ! :grin:
Regards
Reinhold
-
SoftDUX,
Seems your hdb is "identified" as sda because it sits behind your CDROM
(which is master on IDE 0 or pATA1, and CDROMs are scsi-fied in Linux anyway)
I don't think that's the reason, since the CDROM & HDD are both IDE type, and that shouldn't make a difference. This isn't a train-smash though, since I can identify them from the model numbers.
Two things are consistent:
...we are having missing/redundant sectors ... 2113 of them
... mdadm doesn't "see" the 3rd needed drive, most likely your 1st SATA
(which in turn allows no md2 device to be formed/autoassembled ... no "/" root to be found by the kernel)
Even though your partition tables seem correct (and Linux Software Raid is partition based)
we SHOULD know why you have (intentionally?) active/activated HPA (http://en.wikipedia.org/wiki/Host_Protected_Area) (click)
on (only) one of the SATA drives and the remaining (?) ATA
To be honest with you, I don't know what HPA is, or how / where to configure it. I just purchase the equipment, put it together & install the software :) If the BIOS has software RAID, I'd disable it, but that's it. There are 2 different types of drives in the machine though, SATA II & IDE. If you look at the info above, you'll see the following:
sda = ST3250820A - Which is IDE
sdb & sdc = ST3250410AS - which is SATA
SMART is enabled by default on this machine, so I don't know if that plays a role.
I'm in the BIOS right now, and this is what I see:
IDE Channel 0 Master [SAMSUNG CD-R/RW DRVI]
IDE Channel 0 Slave [ST3250820A]
IDE Channel 2 Master [ None ]
IDE Channel 2 Slave [ST3250820AS]
IDE Channel 3 Master [ None ]
IDE Channel 3 Slave [ST3250820AS]
The motherboard is a Gigabyte GA-G31MX-S2, with 4x SATA ports & 1x IDE channel (which only gives me 1 HDD, since the CD-ROM shares it). The reason for the 1x IDE & 2x SATA drives is that they're going into another server, with a mobo that has only 2x SATA & 2x IDE (4 drives). The sad part is, I just moved all my data from that server to this one, as I'm going to use the 80GB drives in that machine elsewhere. And the backup HDD is full, so I didn't have time yet to make a backup. I was going to do it just before the power failure struck the machine.
(2) md Autodetecting normally works ...
...mdadm is supposed to even catch when the disks not just completely changed names due to
a controller change, but change from hd* to sd* ... but just let us know what layout (order and naming) worked
when your md devices were operating fully (4 disks) and in degraded mode (1 pata 2 sata ?)
see above
(3)Do you remember the layout of your SATA/ATA md devices ?
(i.e. what was the result of)
cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
1234567 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
12345 blocks [4/4] [UUUU]
Nope, sorry. But maybe the hardware layout above helps a bit?
(4) Exactly when did you see your RAID 5 conf printout: with the missing disk 1, o:1, dev:sda2
This is when I boot up the machine; it then freezes with that last bit of info from the kernel. I had to type it out from the other monitor...
(5) Has that CDROM drive been always located on that cable and always been active ? (or do you power it off /remove it ?)
Yup, this is a 2U server case, with 2 drive bays at the back, where the 2x SATA HDDs are located, and 1x CD-ROM bay & 1x stiffy bay, where the IDE HDD is attached. Since both the CD-ROM & the IDE HDD need to use the same IDE channel & cable, this is the only way. And taking out the CD-ROM is too much work, so I just left it in. I have plenty lying around.
(6) What drive did you ship to the factory? hdc? hdd?
The one I shipped back to the factory is another IDE drive, which wasn't part of the SME installation to begin with. I fried the motherboard before I even assembled this server, but the drive would go back into the machine. At this stage I'd be prepared to go and buy another drive if I have to; I just don't know if I need to get a SATA or an IDE one this time.
(7) Some problem is that mdadm in SME 7.x is really old ;-/ ... you could try and check what is found as "available" mdadm --detail --scan ... maybe try to use a new live Linux (knoppix, ubuntu) ... ( maybe UUID on one sata hd is 'gone')
Regards
Reinhold
I have a SystemRescueCd (http://www.sysresccd.org/) which I used to SSH into the machine and get more info from it. It has a lot of tools on it which I could use, I just don't know what exactly to do though.
-
Just re-read what you posted first:
You did notice that with all your data on your already degraded RAID5 md device you wouldn't be able to rebuild in any case!!!
(rebuilding does imply you do have a working system - iow in raid5 this means you 'just' add redundancy again) 8-) :???:
currently THERE IS NO SME INSTALLATION just two unreadable HDs
... in your system everything starting from "/" isn't available :(
- and please do not blame SME for that <eg>
Regards
Reinhold
I'm not blaming sme :)
I'm blaming our country's pathetic power stations which don't supply proper, clean & reliable power. And due to this, we have a national shortage of UPS's.....
Anyway, I hear what you're saying. What would happen if I put in another HDD, which I partitioned & formatted in another machine?
P.S. I've just tried booting from each of the 3 HDDs, by changing the boot order in the BIOS, and it does the same thing.
-
Hello!
I have had the same problem!
I tried everything I could without success; I even bought a program to rescue the data, but that did not help, as some of my databases and emails were never recovered correctly. After a week with many sleepless nights this is what saved me:
http://www.howtoforge.com/recover_data_from_raid_lvm_partitions
This works perfectly; note that the guide is written for IDE drives and mine were SATA. Also note that the SME volume group is named 'main'.
I hope this can help you too!
Thanx, I've read through that section as well, and it wasn't very helpful in my case
-
I'm not blaming sme :)
I'm blaming our country's pathetic power stations which don't supply proper, clean & reliable power. And due to this, we have a national shortage of UPS's.....
Eishkom strikes again :twisted: :evil:
-
yup, and the damage due to lost data is worth MUCH MUCH more than just replacing the HDD :(
-
Something that came to mind:
If my system is in degraded RAID 5, with 2 out of 3 drives working, then why can't I boot up? I thought RAID stripes the parity over all the drives for that exact purpose?
-
If my system is in degraded RAID 5, with 2 out of 3 drives working, then why can't I boot up? I thought RAID stripes the parity over all the drives for that exact purpose?
Sry - I have little time to reply/help now - irresistible family time 8) :D
BUT ... what you have now is a DEAD RAID5 :shock:
(from what I deduce now)
- You had a 4 drive RAID5
- it went degraded when you sent back that IDE drive
- it went dead (fails to autoassemble) when it couldn't find the 3rd drive i.e sda2 (or disk 1 in the conf printout)
In your configuration the RAID1 is boot area only
...after the kernel is loaded it can't continue and panics, because root "/" cannot be found (as it is on md2, which does not get assembled ... see above)
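For context, the default SME 7 layout looks roughly like this (LV names are the installer defaults):
sdX1 partitions -> /dev/md1 (raid1) -> /boot
sdX2 partitions -> /dev/md2 (raid5) -> LVM PV -> VG "main" -> LVs root + swap -> "/"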
Please have a closer look at that (non performing) drive.
- If possible use a new mdadm on a live linux cd for sleuthing
(keep in mind that the arrangement of your disks is nonstandard and the uuid fails to show correctly)
- Check why you have "all of a sudden" 488397168 sectors on one disk (hpa)
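One way to check that, if the hdparm on your rescue cd supports -N (treat the "set" form as a last resort and double-check the sector counts first):
hdparm -N /dev/sda    # prints "max sectors = current/native" and whether HPA is enabled
hdparm -N /dev/sdb
hdparm -N /dev/sdc
# if one drive really is clipped by those 2113 sectors, the HPA can be lifted with e.g.
#   hdparm -N 488397168 /dev/sdX    (temporary, until the next power cycle)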
Regards
Reinhold
-
I never had a 4 drive RAID. Forget about the 4th HDD. I only had 3 drives when I set up the RAID, with the sme nospare option.
I have a spare 320GB (busy emptying it right now) - can I add this to the system, partition it the same as the rest, & rebuild the RAID?
-
Hi SoftDux,
In this case let's force with just 2 drives...
(I do still recommend booting a live cd with a newer mdadm - SME has 1.2 instead of a current 2.6 :-( )
sudo mdadm --assemble --scan
...and if that fails (likely) we just push it through
sudo mdadm --assemble --force /dev/md2
If/when md2 does assemble you may just mount and save your data
...if you had an lvm on top just follow the lvm link above starting at "Recovering LVM Setup"
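If --scan turns up nothing, naming the surviving members explicitly sometimes helps (use whatever device names your live cd assigned, and only the members --examine shows as belonging to md2):
mdadm --assemble --force /dev/md2 /dev/sda2 /dev/sdb2 /dev/sdc2
cat /proc/mdstat          # md2 should show up active but degraded, e.g. [3/2] [UU_]
# then activate and mount the lvm read-only to copy your data off:
vgchange -ay main
mount -o ro /dev/main/root /mnt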
Regards
Reinhold
P.S.: sudo is there just in case - you need to be root for this
P.P.S.: you can ignore md1 - none of your data is inside
-
I have solved the problem, and about 90% of my data is back!! :)
After reading a lot of stuff on the net, and playing around with SystemRescueCd, I discovered that I could recreate the HDD partition tables without formatting them, and then rebuild the RAID5 array. So I've only lost about 1 or 2% of my 280GB of data. Most of SME itself didn't work properly afterwards, but I was able to move all the data to a USB HDD, then reformatted all 3 HDDs and reinstalled SME 7.2 from scratch!
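For anyone landing on this thread later, the general shape of that kind of recovery looks something like this (a sketch only: every device name is an example, the recreated partitions must start/end on exactly the same sectors as the originals, and the commented-out mdadm --create is a last resort that destroys the data if the member order or chunk size is wrong):
# dump the partition layout from a disk whose table survived and copy it to the damaged one
sfdisk -d /dev/sdc > parts.txt
sfdisk /dev/sda < parts.txt        # same start/end sectors, data area untouched
# then try assembling from the existing md superblocks
mdadm --assemble --force /dev/md2 /dev/sda2 /dev/sdb2 /dev/sdc2
# only if the superblocks are truly gone: recreate the array in place over the old data
#   mdadm --create /dev/md2 --level=5 --raid-devices=3 --assume-clean /dev/sda2 /dev/sdb2 /dev/sdc2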
Thanx for all your help :)