Koozali.org: home of the SME Server

SME 7 pre1 melts - all data lost instantly (long)

timb
« on: February 13, 2006, 02:43:51 AM »
Hi,
I am in some kind of special hell - go easy.
This is a sad tale of 2 servers totally losing every damn file in an instant.
It may be that I have a bug, or it may be me - perhaps I am missing something.

I need to tell a bit of a story...

On 22Dec05 my SME7 box (my main server - no lectures please) melted - total loss of the file system. I had a couple of lock-ups and then everything was gone. In the end all the hardware was fine (guess what I am typing this on...) except for the power supply - bulging capacitors.

I built a new machine - 2 * enterprise class SATA 300GB mirrored with an Intel mobo, Intel 1G NIC, P4 2.8GHz - nice little box. All new server - all of it.


It ran fantastic till the house fire Tuesday night.

Yes house fire.

The fire was in the power box outside and the main distribution board was totalled. Nothing happened in my office or anywhere else in the house.

I kept the fire at bay till the fire brigade arrived (make sure you have a fire extinguisher - it's worth it to be able to say "I saved my house" - trust me, even your wife loves you!)

Whilst the pros were putting out the fire, I pushed the power button on my server. It's on a UPS and was still running long after the street power was removed from the house.


It never booted again. @#$%@% Now over the years I have dumped power on all sorts of Linux distros - they can take it. (I know you can be unlucky but generally you're OK.)

Rescue time!
Insert the SME7 CD - run up till the installation wants to delete partitions - switch to another console -

lvm pvscan - no volumes
fdisk reports partitions
e2fsck says no magic number - but I am not surprised, as I can't get the LVM to work.
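
For anyone retracing the checks above from a rescue console, here is a rough sketch of the usual LVM2 discovery order. It only prints the commands so they can be read before anything is typed. The helper name lvm_rescue_steps is made up for illustration, and vg_primary is simply the volume group named in the boot error quoted later in this post; whether the group can actually be found depends on the rescue environment seeing the disks at all.

```shell
#!/bin/sh
# Sketch only: print the usual LVM2 discovery steps, in order, for review.
# Nothing here touches the disks; the commands are printed, not executed.
# lvm_rescue_steps is a hypothetical helper name, not part of SME Server.
lvm_rescue_steps() {
    vg="${1:-vg_primary}"          # assumed default: the group named in the boot error
    echo "lvm pvscan"              # look for LVM physical volumes
    echo "lvm vgscan"              # search those physical volumes for volume groups
    echo "lvm vgchange -ay $vg"    # activate the group so its device nodes appear
    echo "lvm lvscan"              # list logical volumes and their device paths
}
```

The "no magic number" complaint is what you would expect if e2fsck was pointed at the raw partition rather than at a logical volume. Once a group activates, e2fsck -n against a device path reported by lvm lvscan is a read-only check.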

Oh hell

BUT - I have another little server that backs up my main server every day! Ha ha ha - I ain't stupid (although there is rising doubt).

Install SME7 on one disk (after all, RAID didn't do me much good and I can RAID later, plus I'd like to try to rescue the old build)
Wait for 250GB of data to transfer - hours and hours
Move the ibays and users into place
Copy across the passwd and shadow etc, templates-custom
signal-event post-upgrade
signal-event reboot

Never boots
lvm pvscan - no volumes

It's all gone.

Um, I am at a loss and serverless - at the moment it looks like back to SME6 after lunch - ah no - won't work with my SATA - give me a break.

The exact error message I get (picking through things)
unable to find volume group vg_primary
/bin/lvm exited abnormally.

Naturally things go straight to hell from there.

Can anyone suggest:
A rescue strategy for the disk?
How to install SME 7 WITHOUT LVM? (btw I love the concept of LVM and can use it - but right now I can't rescue it)


HELP!

timb
Is this bug 698 at work?
« Reply #1 on: February 13, 2006, 03:43:21 AM »
Just learning about bug 698

Is my build updating to pre2 and then I am stumbling into bug 698?

Or do the server gods have it in for me :-(

timb
Application of bug 698's workaround resolved this issue
« Reply #2 on: February 13, 2006, 04:26:28 AM »
I have just applied the workaround for bug 698 to my second build and it booted.

Massive sigh of relief

Let's hope it's the same for the first disk lost a couple of days ago - fingers crossed.

Here's the workaround from bug 698

WORKAROUND for failed upgrades (normally SCSI and SATA systems, but possibly
others):

Boot from CDROM, holding the SHIFT key
At the boot: prompt, type "text rescue"
Accept default answers, but do not start the networking interfaces
At the shell prompt, type these commands:

chroot /mnt/sysimage
kudzu -q
mkinitrd -f /boot/initrd-2.6.9-22.0.2.ELsmp.img 2.6.9-22.0.2.ELsmp
mkinitrd -f /boot/initrd-2.6.9-22.0.2.EL.img 2.6.9-22.0.2.EL
sync
exit
exit

System should now reboot normally and complete the system reconfiguration.
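
The two mkinitrd lines above are tied to the kernel versions on that particular build. As a rough sketch only, the helper below prints one command of the same form for each kernel directory it finds. It assumes every directory under /lib/modules is an installed kernel and that the initrd is named /boot/initrd-<version>.img, as in the workaround. The name print_mkinitrd_cmds is made up for illustration, and the commands are printed for review, not run.

```shell
#!/bin/sh
# Sketch only: print one mkinitrd command per kernel found under a modules
# directory, in the same form as the workaround. Commands are printed, not run.
# print_mkinitrd_cmds is a hypothetical helper name.
print_mkinitrd_cmds() {
    moddir="${1:-/lib/modules}"      # assumed location of per-kernel module trees
    for d in "$moddir"/*; do
        [ -d "$d" ] || continue      # skip plain files and an unmatched glob
        kver=$(basename "$d")        # directory name is taken as the kernel version
        echo "mkinitrd -f /boot/initrd-${kver}.img ${kver}"
    done
}
```

Run inside the chroot, print_mkinitrd_cmds with no argument lists the commands for whatever kernels are installed there. Check the output against the bug report before typing any of it.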

gordonr
Re: Application of bug 698's workaround resolved this issue
« Reply #3 on: February 13, 2006, 04:52:47 AM »
Quote from: "timb"

Here's the workaround from bug 698.

No, please don't copy the WORKAROUND out of the bug. If we update the bug with further information, we end up with stale information in the forums, and there's already way too much of that...

Please, please, please raise problems in the bug tracker, and only there.