Koozali.org: home of the SME Server

Obsolete Releases => SME Server 8.x => Topic started by: jameswilson on October 25, 2013, 02:21:26 PM

Title: broken system after power fail
Post by: jameswilson on October 25, 2013, 02:21:26 PM
I have a 32-bit SME install running RAID 6 with a spare across 7 drives.
We had a power failure this morning which lasted too long for the UPS.
On power-up the kernel panics with "not syncing".
Using a rescue CD, it says it can't find any Linux partitions.

I have no clue what to do to rescue this thing; any help gratefully received.

Yours desperately,
James
Title: Re: broken system after power fail
Post by: janet on October 25, 2013, 02:35:05 PM
jameswilson

Quote
I have no clue what to do to rescue this thing; any help gratefully received.

One approach is to rebuild the system from CD (new install) and restore from backup.
You should then configure the UPS/SME server/NUT to gracefully shut your SME server down so that a similar issue does not happen in the future.
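
For reference, a minimal sketch of enabling the bundled NUT service from the SME console; the driver and port settings depend on the UPS model (see the contribs.org wiki UPS page before relying on this), and "UPS" in the upsc call is a placeholder for whatever name your configuration ends up using.

Quote
# sketch only: enable the nut service and apply the change; set the
# driver/port properties for your particular UPS model per the wiki first
config setprop nut status enabled
signal-event post-upgrade
signal-event reboot

# after the reboot, check that the server can actually read the UPS status
upsc UPS@localhost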
Title: Re: broken system after power fail
Post by: jameswilson on October 25, 2013, 04:16:38 PM
I can restore the SME bits, but I was running various Windows VMs in VirtualBox. These were not backed up, and I need a file from one of them.
Title: Re: broken system after power fail
Post by: jameswilson on October 25, 2013, 04:29:13 PM
On researching, all 7 of the drives appear to have no md superblock, on sda through sdg.
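
For anyone checking the same thing: on a standard SME layout the md superblocks live on the partitions, not the whole disks, so examining /dev/sda will report no superblock even when /dev/sda1 and /dev/sda2 are intact (device names here follow this thread).

Quote
# examine the partitions, not the bare disk
mdadm --examine /dev/sda1
mdadm --examine /dev/sda2
# a bare-disk query like this reports "no md superblock" on a standard layout
mdadm --examine /dev/sda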
Title: Re: broken system after power fail
Post by: jameswilson on October 25, 2013, 07:25:52 PM
Quote
mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]1

I found in other places that this command worked, but I'm not brave enough to use it.
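
A reasonable pre-flight check before resorting to --force is to record each member's superblock and compare the Events counters; members with a lower count are the stale ones the forced assemble would pull back in (partition numbers here follow the quoted command and are an assumption about the layout).

Quote
# dump every member's superblock to a file, then compare event counters
mdadm --examine /dev/sd[abcdefg]1 > /root/md0-members.txt
grep -E '^/dev/sd|Events' /root/md0-members.txt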
Title: Re: broken system after power fail
Post by: stephdl on October 26, 2013, 01:17:58 AM
Quote
Using a rescue CD, it says it can't find any Linux partitions.

Do you mean SystemRescueCd?

http://www.sysresccd.org/SystemRescueCd_Homepage

Normally it can assemble the RAID automatically; see https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives

I suppose that you have LVM activated.
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 04:30:51 PM
I was using the SME CD and selecting rescue mode.

It's a standard install; I didn't select "no spare" etc. at install time.
Title: Re: broken system after power fail
Post by: stephdl on October 26, 2013, 05:02:23 PM
Quote
I was using the SME CD and selecting rescue mode.

It's a standard install; I didn't select "no spare" etc. at install time.

Therefore you should try this howto: https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives#Method_A_with_SystemRescueCd

After the boot you can check whether the RAID is activated by running

cat /proc/mdstat
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 05:15:11 PM
It can't assemble the array because none of the drives have any superblock info.

On boot,
cat /proc/mdstat gives

Quote
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
Title: Re: broken system after power fail
Post by: stephdl on October 26, 2013, 06:00:03 PM
Well, that is not good, really not good.
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 07:49:47 PM
It's not a superblock problem; I was using the wrong command.

I have issued the command

mdadm --stop /dev/md2

then

mdadm --assemble --force /dev/md2 /dev/sd[abcdefg]2

mdadm then reports

mdadm: /dev/md2 has been started with 5 drives (out of 6).

cat /proc/mdstat then reports the array as active.

But I can't mount it.

Any other ideas?
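
(For anyone following along: on a default SME install md2 is an LVM physical volume rather than a filesystem, which is why a plain mount fails. That can be confirmed from the rescue environment, assuming the same device names as above.)

Quote
# show what actually lives on the assembled device
file -s /dev/md2     # a default SME install reports an LVM2 physical volume here
lvm pvscan           # list LVM physical volumes; /dev/md2 should appear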
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 08:01:01 PM
So the array starts; I'm hopeful I can save this.
When running
mdadm --examine /dev/sda2 (and the equivalent for each drive) I get different output depending on the drive.
Can I make SME do a force on boot, or stop it thinking it has a problem so it doesn't need the force?
Title: Re: broken system after power fail
Post by: stephdl on October 26, 2013, 08:36:14 PM
If the RAID starts, now you have to start the LVM as described in the howto.
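
For completeness, a sketch of those activation steps from a rescue shell; the volume group "main" and logical volume "root" are the SME defaults and may differ on other systems, so check the lvscan output before mounting.

Quote
lvm vgscan                           # find volume groups on the assembled array
lvm vgchange -ay                     # activate them
lvm lvscan                           # list logical volumes, e.g. /dev/main/root
mkdir -p /mnt/sysimage
mount /dev/main/root /mnt/sysimage   # mount the root LV to reach the data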
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 09:29:05 PM
That has worked. I can now browse the array. How do I make SME boot so I can save the data I need, or replace drives etc. as needed?
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 09:33:42 PM
i.e. the fact that I can now access the data means I should be able to boot SME normally and replace the drive that appears to be causing the problem
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 09:34:29 PM
I mean I should be able to boot it in a degraded state.
Title: Re: broken system after power fail
Post by: _alex on October 26, 2013, 11:07:09 PM
You should back up your data now; a degraded RAID may not survive a reboot.
Once done, check the condition of all your disks with smartctl (and maybe badblocks if you want to double-check).

You may try to rebuild your failed RAID (add the failed disk back), but personally, I would rebuild the whole thing.
Ah, and don't forget to set up a daily backup and upsd. ;)
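
As a rough checklist for the checks mentioned above (sdX stands for whichever disk is being tested, and the partition layout is assumed to match the rest of this thread):

Quote
smartctl -H /dev/sdX       # quick overall health verdict
smartctl -a /dev/sdX       # full attributes; watch reallocated/pending sectors
badblocks -sv /dev/sdX     # read-only surface scan, slow but non-destructive
# if the disk checks out, its partitions can be re-added to the degraded arrays
mdadm --add /dev/md2 /dev/sdX2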
Title: Re: broken system after power fail
Post by: jameswilson on October 26, 2013, 11:10:22 PM
Well, I used hdparm to identify the bad sdb, replaced the drive, and used the rescue CD to resync md1. Rebooted; it came up into SME and added the hot spare (sdg) into the array. It's now resyncing. I have other issues, but in 2000 minutes I should have a stable md2.
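
For the record, matching a kernel device name like sdb to the physical disk is usually done by serial number; either of these works (hdparm being what the post above used):

Quote
hdparm -i /dev/sdb | grep -i serial      # prints the Model/FwRev/SerialNo line
smartctl -i /dev/sdb | grep -i serial    # same information via smartmontools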
Title: Re: broken system after power fail
Post by: stephdl on October 27, 2013, 10:26:55 AM
You should set up a wiki page on your adventure for the next person faced with this kind of issue.
Title: Re: broken system after power fail
Post by: jameswilson on October 27, 2013, 06:15:50 PM
good shout

It's far from finished yet; I've never had this happen before, but the arrays are weird atm.
Title: Re: broken system after power fail
Post by: jameswilson on October 27, 2013, 07:00:33 PM
Quote
[root@sme-big ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid1 sdb[1]
      104320 blocks [7/1] [_U_____]

md127 : active raid1 sda1[0] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6]
      104320 blocks [7/6] [U_UUUUU]
        resync=DELAYED

md2 : active raid6 sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sda2[0]
      7813629952 blocks level 6, 256k chunk, algorithm 2 [6/5] [U_UUUU]
      [================>....]  recovery = 82.3% (1609272576/1953407488) finish=326.8min speed=17546K/sec

unused devices: <none>

Need to sort out the sdb mistake I made.
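
Purely as a diagnostic sketch for the md1/md127 split shown above (these commands change nothing; which array actually holds the current /boot should be confirmed before merging anything):

Quote
mdadm --detail /dev/md1      # the new mirror holding only sdb1
mdadm --detail /dev/md127    # the original mirror with the other six members
mdadm --examine /dev/sda1 | grep -E 'UUID|Update Time'   # compare member superblocks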