Koozali.org: home of the SME Server

Contribs.org Forums => Koozali SME Server 10.x => Topic started by: waldviertler on August 20, 2022, 03:26:59 PM

Title: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 03:26:59 PM
Can somebody please help?
I have an up-to-date SME 10 server that crashed today during a power loss.
Now when I try to reboot, I get the errors shown in the pictures below.
The RAID system says it's optimal!

(https://perchtoldsdorf.com/IMG_6189.jpg)
and
(https://perchtoldsdorf.com/IMG_6168.jpg)
and
(https://perchtoldsdorf.com/IMG_6169.jpg)
What can that be?

Edit: I have tried "recovery mode" but it also gets stuck!

Martin
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 04:37:45 PM
Better tell us something about your server.

Hardware, and server history - upgraded from previous versions, how, what else have you got installed, is it high or low volume.

Anything else that might be useful.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 04:52:50 PM
It's a Dell PowerEdge 1430 Xeon with an iPerc 6 RAID card. The array is a RAID 5 with 4 drives.
I upgraded from previous versions. I have nothing special installed.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 05:10:24 PM
I managed to boot into single mode, but
Code: [Select]
fsck
or
Code: [Select]
e2fsck -D -tt -y /dev/main/root
does not find a drive...
Title: Re: Server crashed - and does not boot again...
Post by: mophilly on August 20, 2022, 05:22:38 PM
Hardware failure comes to mind, given that the device check fails in single mode. Perhaps the power outage was preceded by a spike. Do you have power line conditioning in place? If not, a component may have been damaged.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 05:52:52 PM
In single mode I see all files.
But /dev is empty.
Is that OK?
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 06:04:41 PM
Can you get into the PERC controller at boot and check the RAID status?

Just to be sure all your drives are up and OK.

And have you got a good backup in case this is fatal?
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 06:16:17 PM
Yes, I have checked the PERC: RAID status is optimal.
And I have a backup. I hope it's a good backup.

Can you give me a hint what to do after booting to single mode?
For checking the filesystem?

Thx



Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 06:24:34 PM
Even in single mode I got this after a while:

(https://perchtoldsdorf.com/IMG_6190.jpg)
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 06:59:02 PM
From what I have seen it is a kernel corruption.

The guy who may know is out and about at the minute - he should be around later.

[Edited the typos - am in the middle of a couple of big upgrades :neutral: ]
Title: Re: Server crashed - and does not boot again...
Post by: Jean-Philippe Pialasse on August 20, 2022, 07:04:14 PM

If you do not have a backup, now is the time to clone the disk onto a fresh one. If you have backups, check that they are healthy.

Boot from the SME install disc and use the recovery mode.

Do not mount the filesystem.
Run a filesystem check for XFS (xfs_repair) and hope it repairs things.

Search the internet for the XFS repair procedure, and use the error you showed to learn more about your chances of success.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 21, 2022, 03:24:05 AM
I have done a fresh install and restored my backup.
Thanks all.
Title: Re: Server crashed - and does not boot again...
Post by: Jean-Philippe Pialasse on August 21, 2022, 06:58:24 AM
Good news to hear.

I was not suggesting you start by restoring, but if you were in a hurry to get it back, I understand.
As suggested by ReetP, you need a power backup; a corrupted filesystem is not a good situation.

For completeness, now that I am able to type on a regular user/computer interface:

For reference, here is Red Hat's procedure for trying to fix the filesystem:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide

After rebooting into rescue mode from the install disc, without mounting the filesystem, run (if you use LVM):

Code: [Select]
xfs_repair /dev/mapper/main-root
The lsblk command might help you find where your root (/) partition is:

Code: [Select]
# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0            11:0    1 1024M  0 rom 
vda           252:0    0   30G  0 disk
├─vda1        252:1    0  500M  0 part /boot
└─vda2        252:2    0 29,5G  0 part
  ├─main-root 253:0    0 27,5G  0 lvm  /
  └─main-swap 253:1    0    2G  0 lvm  [SWAP]

Code: [Select]
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  1,0T  0 disk
├─sda1   8:1    0 1023M  0 part /boot
├─sda2   8:2    0    4G  0 part [SWAP]
├─sda3   8:3    0  956G  0 part /
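
In the rescue shell nothing is mounted, so the MOUNTPOINT column shown above will be empty. A minimal sketch for listing candidate devices by filesystem type instead: it assumes your lsblk supports the -P (key="value" pairs) output and the FSTYPE column, the helper name list_xfs_devices is made up for illustration, and the /dev/mapper/ prefix for LVM volumes simply follows the main-root example above.

```shell
# Reads `lsblk -P -o NAME,TYPE,FSTYPE` output on stdin and prints the
# device path of every entry whose filesystem type is xfs.
# (Hypothetical helper, not part of SME Server or util-linux.)
list_xfs_devices() {
    # Keep only xfs lines and reduce them to "TYPE NAME".
    sed -n 's/^NAME="\([^"]*\)" TYPE="\([^"]*\)" FSTYPE="xfs"[[:space:]]*$/\2 \1/p' |
    while read -r type name; do
        if [ "$type" = "lvm" ]; then
            # Assumption: LVM volumes appear under /dev/mapper/<vg>-<lv>.
            echo "/dev/mapper/$name"
        else
            echo "/dev/$name"
        fi
    done
}
```

Usage from the rescue shell would be something like: lsblk -P -o NAME,TYPE,FSTYPE | list_xfs_devices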

Quote
The xfs_repair utility cannot repair an XFS file system with a dirty log. To clear the log, mount and unmount the XFS file system. If the log is corrupt and cannot be replayed, use the -L option ("force log zeroing") to clear the log, that is, xfs_repair -L /dev/device. Be aware that this may result in further corruption or data loss.

The reason to have a fresh backup is that sometimes the problem is the disk itself, or the journal is broken and the error is not recoverable.
If the problem is one or more bad disk sectors, cloning the disk lets you work on the copy on a healthy disk, which increases the chance of recovering data without killing the old disk faster through the recovery procedure.
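
To make the order of operations from the Red Hat note quoted above easier to follow, here is a rough dry-run sketch. It only prints the commands and runs none of them. The function name xfs_repair_plan and the mount point /mnt/check are assumptions for illustration. The initial xfs_repair -n pass (no-modify mode) is my own addition as a cautious first look, not part of the quoted text.

```shell
# Prints (does NOT execute) a cautious repair sequence for one device.
# $1 = block device, $2 = temporary mount point (hypothetical default).
xfs_repair_plan() {
    dev=$1
    mnt=${2:-/mnt/check}
    cat <<EOF
# 1. Inspect only: -n reports problems without changing anything
xfs_repair -n $dev
# 2. Dirty log: replay it by mounting and unmounting once
mkdir -p $mnt && mount $dev $mnt && umount $mnt
# 3. Normal repair
xfs_repair $dev
# 4. LAST RESORT if the log cannot be replayed: -L zeroes the log and may lose data
xfs_repair -L $dev
EOF
}
```

For example, xfs_repair_plan /dev/mapper/main-root prints the four steps for the LVM root volume. Step 4 should only be considered after step 2 fails, and ideally on a cloned disk.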
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 21, 2022, 06:27:37 PM
Thank you very much!!

Martin
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 21, 2022, 07:28:21 PM
Good that you had a decent backup - other readers should note this fact!

Treat yourself to a UPS - saves a lot of disasters :-) Even more so with the complexities of RAID systems.

Note also a RAID 5 with a few large drives is really not a good thing to have. The risk of a second disk failure while rebuilding a replacement drive is very high. The bigger the drives the higher the chance. Be very cautious there. Plenty of reading online about it.
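
A back-of-the-envelope sketch of why drive size matters, using one common simplified model: the chance of hitting at least one unrecoverable read error (URE) while reading every surviving drive in full during a rebuild. The drive sizes below are hypothetical, the one-error-per-10^14-bits figure is a typical datasheet worst case rather than a measured rate, and the function name ure_risk is made up. Treat the result as an illustration, not a prediction.

```shell
# Estimated probability of at least one URE during a RAID 5 rebuild.
# $1 = number of surviving drives that must be read in full
# $2 = size of each drive in TB (decimal, 10^12 bytes)
# $3 = exponent of the URE spec (default 14 = one error per 10^14 bits)
ure_risk() {
    # LC_ALL=C keeps the decimal point a "." regardless of locale.
    LC_ALL=C awk -v n="$1" -v tb="$2" -v e="${3:-14}" 'BEGIN {
        bits = n * tb * 8e12                    # total bits read
        printf "%.2f\n", 1 - exp(-bits / 10^e)  # Poisson approximation
    }'
}
```

For a 4-drive array with hypothetical 4 TB disks, three drives are read during the rebuild: ure_risk 3 4 prints 0.62, while the same array with a 10^15 spec (ure_risk 3 4 15) prints 0.09.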