Koozali.org: home of the SME Server
Obsolete Releases => SME Server 8.x => Topic started by: jameswilson on October 25, 2013, 02:21:26 PM
-
I have a 32-bit SME install running RAID 6 with a spare across 7 drives.
We had a power failure this morning which lasted too long for the UPS.
On power-up the kernel panics ("not syncing").
Using a rescue CD, it says it can't find any Linux partitions.
I have no clue what to do to rescue this thing; any help gratefully received.
Yours desperately,
James
-
Quote from: jameswilson
I have no clue what to do to rescue this thing; any help gratefully received.
One approach is to rebuild the system from CD (new install) and restore from backup.
You should then configure the UPS/SME Server/NUT to gracefully shut your SME Server down so that a similar issue does not happen in the future.
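For reference, a minimal NUT sketch (assuming a USB UPS and a stock nut package; the UPS name "myups" and the credentials are placeholders, and on SME Server the templates normally manage these files, so treat this as illustrative only):

# /etc/ups/ups.conf - declare the UPS (usbhid-ups covers most USB units)
[myups]
    driver = usbhid-ups
    port = auto

# /etc/ups/upsmon.conf - shut down cleanly when the battery runs low
# (the user/password must match an entry in /etc/ups/upsd.users)
MONITOR myups@localhost 1 upsmonuser secret master
SHUTDOWNCMD "/sbin/shutdown -h +0"

Once the daemons are running, upsc myups@localhost should report the battery state.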
-
I can restore the SME bits, but I was running various Windows VMs in VirtualBox. These were not backed up and I need a file from one of them.
-
On investigating, all 7 drives appear to have no md superblock, on sda through sdg.
-
mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]1
I found in other places that this command worked, but I'm not brave enough to use it.
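A read-only check is safe to run first and shows whether the superblocks really are gone (this sketch uses the same device list as the command above; repeat with the "2" partitions for the data array):

# --examine only reads the md superblock; it changes nothing on disk
mdadm --examine /dev/sd[abcdefg]1
# quick summary of just the state and event counters
mdadm --examine /dev/sd[abcdefg]1 | grep -E 'Events|State'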
-
Using a rescue CD, it says it can't find any Linux partitions.
Do you mean SystemRescueCd?
http://www.sysresccd.org/SystemRescueCd_Homepage
Normally it can mount the RAID automatically; see https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives
I suppose that you have LVM activated.
-
I was using the SME CD and selecting rescue mode.
It's a standard install; I didn't select "no spare" etc. at install time.
-
I was using the SME CD and selecting rescue mode.
It's a standard install; I didn't select "no spare" etc. at install time.
Therefore you should try this howto: https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives#Method_A_with_SystemRescueCd
After the boot you can check whether the RAID is activated by running
cat /proc/mdstat
-
It can't assemble the array because none of the drives have any superblock info.
On boot,
cat /proc/mdstat gives
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
-
Well, that is not good, really not good.
-
It's not a superblock problem; I was using the wrong command.
I have issued the command
mdadm --stop /dev/md2
then
mdadm --assemble --force /dev/md2 /dev/sd[abcdefg]2
mdadm then reports
mdadm: /dev/md2 has been started with 5 drives (out of 6).
cat /proc/mdstat then reports the array as active.
But I can't mount it.
Any other ideas?
-
So the array starts; I'm hopeful I can save this.
When doing
mdadm --examine /dev/sda2 I get different output depending on the drive.
Can I make SME do a force on boot, or stop it thinking it has a problem so it doesn't need the force?
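The differing output is usually the event counter: members that dropped out of the array earlier stop counting, so mdadm refuses to auto-assemble them without --force. A quick way to compare them (a sketch, using the same device set as the assemble command):

# the member(s) with a lower Events value are the stale ones
mdadm --examine /dev/sd[abcdefg]2 | grep -E '/dev/sd|Events'

Once the array has been assembled with --force and has resynced cleanly, subsequent boots should normally assemble it without any forcing.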
-
If the RAID starts, now you have to start the LVM as described in the howto.
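A minimal sketch of that step from the rescue environment (on a default SME install the volume group is normally called "main", but check the vgscan/lvscan output rather than assuming it; the mount point is arbitrary):

# scan for volume groups on the assembled md2 and activate them
lvm vgscan
lvm vgchange -ay
# list the logical volumes, then mount the root LV somewhere convenient
lvm lvscan
mkdir -p /mnt/sysimage
mount /dev/main/root /mnt/sysimage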
-
That has worked. I can now browse the array. How do I make SME boot so I can save the data I need, or replace drives etc. as needed?
-
i.e. the fact that I can now access the data means I should be able to boot SME normally and replace the drive that appears to be causing the problem.
-
I mean I should be able to boot it in a degraded state.
-
You should back up your data now; a degraded RAID may not survive a reboot.
Once done, check the condition of all your disks with smartctl (and maybe badblocks if you want to double-check).
You may try to rebuild your failed RAID (re-add the kicked disk), but personally, I would rebuild the whole thing.
Ah, and don't forget to set up a daily backup and upsd. ;)
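For the disk checks, something along these lines (a sketch; some controllers need a -d option, and badblocks in its default read-only mode is slow but non-destructive):

# SMART health, attributes and error log for one disk
smartctl -a /dev/sda
# kick off a long self-test, then check the result later with smartctl -a
smartctl -t long /dev/sda
# optional read-only surface scan (takes hours on large drives)
badblocks -sv /dev/sda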
-
Well, I used hdparm to identify the bad sdb, replaced the drive, and used the rescue CD to resync md1. Rebooted, it came up into SME and added the hot spare (sdg) into the array. It's now resyncing. I have other issues, but in 2000 mins I should have a stable md2.
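For anyone repeating this, the hdparm step is essentially reading the drive's model and serial number so you can match it to the label on the physical disk (a sketch; smartctl -i gives much the same information):

# print the model and serial number of the suspect drive
hdparm -I /dev/sdb | grep -iE 'model|serial'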
-
You should set up a wiki page on your adventure for the next person faced with this kind of issue.
-
Good shout.
It's far from finished yet; I've never had this happen before, but the arrays are weird atm:
-
[root@sme-big ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid1 sdb[1]
104320 blocks [7/1] [_U_____]
md127 : active raid1 sda1[0] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6]
104320 blocks [7/6] [U_UUUUU]
resync=DELAYED
md2 : active raid6 sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sda2[0]
7813629952 blocks level 6, 256k chunk, algorithm 2 [6/5] [U_UUUU]
[================>....] recovery = 82.3% (1609272576/1953407488) finish=326.8min speed=17546K/sec
unused devices: <none>
Need to sort out the sdb mistake I made.
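If the "sdb mistake" is that the new sdb1 ended up in its own one-disk mirror (md1) instead of joining the original /boot array (md127), one possible cleanup is sketched below. This is an assumption based on the mdstat output above: confirm with mdadm --detail and the mounted /boot which array the system really uses before touching anything, because --zero-superblock destroys the md metadata on that partition.

# stop the stray single-disk mirror and wipe its md metadata
mdadm --stop /dev/md1
mdadm --zero-superblock /dev/sdb1
# add sdb1 back into the real /boot mirror
mdadm --add /dev/md127 /dev/sdb1
# once md2 has finished resyncing, sdb2 can go back in as the hot spare
mdadm --add /dev/md2 /dev/sdb2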