Koozali.org: home of the SME Server

broken system after power fail

Offline jameswilson

  • *
  • 795
  • +0/-0
    • Security Warehouse, professional security equipment
broken system after power fail
« on: October 25, 2013, 02:21:26 PM »
I have a 32-bit SME install running RAID 6 with a spare on 7 drives.
We had a power failure this morning which lasted too long for the UPS.
On power-up the kernel panics ("not syncing").
Using a rescue CD, it says it can't find any Linux partitions.

I have no clue what to do to rescue this thing; any help gratefully received.

Yours desperately
James

Offline janet

  • *****
  • 4,812
  • +0/-0
Re: broken system after power fail
« Reply #1 on: October 25, 2013, 02:35:05 PM »
jameswilson

Quote
I have no clue what to do to rescue this thing; any help gratefully received.

One approach is to rebuild the system from CD (new install) and restore from backup.
You should then configure the UPS, SME Server and NUT to shut your SME Server down gracefully, so that a similar issue does not happen in the future.
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline jameswilson

Re: broken system after power fail
« Reply #2 on: October 25, 2013, 04:16:38 PM »
I can restore the SME bits, but I was running various Windows VMs in VirtualBox. These were not backed up and I need a file from one of them.

Offline jameswilson

Re: broken system after power fail
« Reply #3 on: October 25, 2013, 04:29:13 PM »
On investigating, all 7 drives (sda through to sdg) report no md superblock.

Offline jameswilson

Re: broken system after power fail
« Reply #4 on: October 25, 2013, 07:25:52 PM »
Quote
mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]1

I found in other places that this command worked, but I'm not brave enough to use it.

Offline stephdl

  • *
  • 1,523
  • +0/-0
    • Linux et Geekeries
Re: broken system after power fail
« Reply #5 on: October 26, 2013, 01:17:58 AM »
Quote
Using a rescue CD, it says it can't find any Linux partitions.

Do you mean SystemRescueCd?

http://www.sysresccd.org/SystemRescueCd_Homepage

Normally it can mount the RAID system automatically; see https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives

I suppose that you have LVM activated.
See http://wiki.contribs.org/Koozali_Foundation
irc : Freenode #sme_server #sme-fr

!!! Please write your knowledge to the Wiki !!!

Offline jameswilson

Re: broken system after power fail
« Reply #6 on: October 26, 2013, 04:30:51 PM »
I was using the SME CD and selecting rescue mode.

It's a standard install; I didn't select "no spare" etc. at install time.

Offline stephdl

Re: broken system after power fail
« Reply #7 on: October 26, 2013, 05:02:23 PM »
Quote
I was using the SME CD and selecting rescue mode.

It's a standard install; I didn't select "no spare" etc. at install time.

In that case you should try this howto: https://wiki.contribs.org/Recovering_SME_Server_with_lvm_drives#Method_A_with_SystemRescueCd

After booting you can check whether the RAID is activated by running

cat /proc/mdstat

Offline jameswilson

Re: broken system after power fail
« Reply #8 on: October 26, 2013, 05:15:11 PM »
It can't assemble the array because none of the drives have any superblock info.

On boot,
cat /proc/mdstat gives

Quote
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>

Offline stephdl

Re: broken system after power fail
« Reply #9 on: October 26, 2013, 06:00:03 PM »
Well, that is not good, really not.

Offline jameswilson

Re: broken system after power fail
« Reply #10 on: October 26, 2013, 07:49:47 PM »
It's not a superblock problem; I was using the wrong command.

I have issued the command

mdadm --stop /dev/md2

then

mdadm --assemble --force /dev/md2 /dev/sd[abcdefg]2

mdadm then reports

mdadm: /dev/md2 has been started with 5 drives (out of 6).

cat /proc/mdstat then reports the array as active.

But I can't mount it.

Any other ideas?

Offline jameswilson

Re: broken system after power fail
« Reply #11 on: October 26, 2013, 08:01:01 PM »
So the array starts; I'm hopeful I can save this.
When running
mdadm --examine /dev/sda2 I get different output depending on the drive.
Can I make SME do a forced assemble on boot, or stop it thinking it has a problem so that it doesn't need the force?
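
One way to compare the per-drive output mentioned above is to look only at the "Events" counter that mdadm --examine prints for each member; members whose counter lags behind the others are the ones a plain assemble refuses and --force accepts. The following is a minimal sketch, not a tested recovery procedure: events_of and show_events are hypothetical helper names, and the partition names are simply those used in this thread.

```shell
#!/bin/sh
# Sketch only: list the md event counter stored in each member's superblock.
# events_of and show_events are hypothetical helpers, not part of mdadm.

# Read `mdadm --examine` output on stdin and print the value of its "Events" field.
events_of() {
    awk -F' : ' '/^ *Events/ { print $2; exit }'
}

# Examine each partition given as an argument (run as root from the rescue CD).
# --examine only reads the superblock; it does not modify the disks.
show_events() {
    for part in "$@"; do
        ev=$(mdadm --examine "$part" 2>/dev/null | events_of)
        printf '%s: %s\n' "$part" "${ev:-unknown}"
    done
}

# Example: show_events /dev/sd[abcdefg]2
```

A member printing "unknown" here either has no readable superblock or could not be examined at all, so it is worth running mdadm --examine on it by hand before forcing anything.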

Offline stephdl

Re: broken system after power fail
« Reply #12 on: October 26, 2013, 08:36:14 PM »
If the RAID starts, you now have to start the LVM as described in the howto.
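
For readers without the howto to hand, the LVM step can be sketched roughly as below. This is a hedged outline rather than the wiki's exact procedure: the volume group name "main", the logical volume name "root" and the mount point /mnt/sme are assumptions (confirm the real names with vgscan and lvs), and run and mount_sme_root are hypothetical helpers. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: activate LVM on the assembled array and mount the root volume.
# ASSUMPTIONS: VG "main", LV "root", mount point /mnt/sme. Verify before use.

# Print each command; execute it only when DRY_RUN=0 (the default is print-only).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" != "0" ] || "$@"
}

# Usage: mount_sme_root [volume-group] [logical-volume] [mount-point]
mount_sme_root() {
    vg=${1:-main}
    lv=${2:-root}
    mnt=${3:-/mnt/sme}
    run vgscan &&                          # scan for volume groups on the md device
    run vgchange -ay "$vg" &&              # activate the group's logical volumes
    run lvs "$vg" &&                       # list them to confirm the names
    run mkdir -p "$mnt" &&
    run mount -o ro "/dev/$vg/$lv" "$mnt"  # read-only mount for copying data off
}

# Example (print only): mount_sme_root
# Example (execute):    DRY_RUN=0 mount_sme_root main root /mnt/sme
```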

Offline jameswilson

Re: broken system after power fail
« Reply #13 on: October 26, 2013, 09:29:05 PM »
That has worked. I can now browse the array. How do I make SME boot so I can save the data I need, or replace drives etc. as needed?

Offline jameswilson

Re: broken system after power fail
« Reply #14 on: October 26, 2013, 09:33:42 PM »
That is, the fact that I can now access the data means I should be able to boot SME normally and replace the drive that appears to be causing the problem.

Offline jameswilson

Re: broken system after power fail
« Reply #15 on: October 26, 2013, 09:34:29 PM »
I mean I should be able to boot it in a degraded state.

Offline _alex

  • ****
  • 103
  • +0/-0
Re: broken system after power fail
« Reply #16 on: October 26, 2013, 11:07:09 PM »
You should back up your data now; a degraded RAID may not survive a reboot.
Once done, check the condition of all your disks with smartctl (and maybe badblocks if you want to double-check).

You may try to rebuild your failed RAID (add the affected disk back), but personally, I would rebuild the whole thing.
Ah, and don't forget to set up a daily backup and upsd ;)
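
The disk check suggested above could look something like this. It is only a sketch: smart_passed and check_disks are hypothetical helper names, the PASSED line is the one smartctl -H prints for ATA disks, and a PASSED verdict on its own does not prove a disk is healthy, so the full smartctl -a report is still worth reading.

```shell
#!/bin/sh
# Sketch only: quick SMART health pass over a set of disks.
# smart_passed and check_disks are hypothetical helpers.

# Succeed if `smartctl -H` output on stdin contains the overall PASSED verdict.
smart_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Query each disk given as an argument (run as root).
check_disks() {
    for disk in "$@"; do
        if smartctl -H "$disk" 2>/dev/null | smart_passed; then
            echo "$disk: SMART health PASSED"
        else
            echo "$disk: SMART health not confirmed, inspect with smartctl -a $disk"
        fi
    done
}

# Example: check_disks /dev/sd[a-g]
```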

Offline jameswilson

Re: broken system after power fail
« Reply #17 on: October 26, 2013, 11:10:22 PM »
Well, I used hdparm to identify the bad sdb, replaced the drive and used the rescue CD to resync md1. Rebooted; it came up into SME and added the hot spare (sdg) into the array. It's now resyncing. I have other issues, but in 2000 mins I should have a stable md2.

Offline stephdl

Re: broken system after power fail
« Reply #18 on: October 27, 2013, 10:26:55 AM »
You should set up a wiki page on your adventure for the next person faced with this kind of issue.

Offline jameswilson

Re: broken system after power fail
« Reply #19 on: October 27, 2013, 06:15:50 PM »
Good shout.

It's far from finished yet. I've never had this happen before, and the arrays are weird at the moment.

Offline jameswilson

Re: broken system after power fail
« Reply #20 on: October 27, 2013, 07:00:33 PM »
Quote
[root@sme-big ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid1 sdb[1]
      104320 blocks [7/1] [_U_____]

md127 : active raid1 sda1[0] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6]
      104320 blocks [7/6] [U_UUUUU]
        resync=DELAYED

md2 : active raid6 sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sda2[0]
      7813629952 blocks level 6, 256k chunk, algorithm 2 [6/5] [U_UUUU]
      [================>....]  recovery = 82.3% (1609272576/1953407488) finish=326.8min speed=17546K/sec

unused devices: <none>

I need to sort out the sdb mistake I made.
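
To keep an eye on which arrays are still short of members while this gets sorted out, the [total/active] counts in /proc/mdstat can be pulled out with a few lines of awk. A minimal sketch, with degraded_arrays as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: list md arrays that have fewer active members than configured.
# degraded_arrays is a hypothetical helper; it reads /proc/mdstat by default,
# or a saved copy of it passed as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid6 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status line that follows carries "[total/active]", e.g. "[6/5]"
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] "/" n[1]
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays
```

Run against the output quoted above, this would print md1 1/7, md127 6/7 and md2 5/6, one per line, showing active members out of the configured total.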