Koozali.org: home of the SME Server

Contribs.org Forums => Koozali SME Server 10.x => Topic started by: waldviertler on August 20, 2022, 03:26:59 PM

Title: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 03:26:59 PM: Can somebody please help?
I have an up-to-date SME 10 Server that crashed today during a power loss.
Now when I try to reboot, I get the errors shown in the pictures below.
The RAID system says it is optimal!

(https://perchtoldsdorf.com/IMG_6189.jpg)
and
(https://perchtoldsdorf.com/IMG_6168.jpg)
and
(https://perchtoldsdorf.com/IMG_6169.jpg)
What can that be?

Edit: I have tried "recovery mode" but it also gets stuck!

Martin
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 04:37:45 PM: Better tell us something about your server.

Hardware, and server history - upgraded from previous versions, how, what else have you got installed, is it high or low volume.

Anything else that might be useful.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 04:52:50 PM: It's a Dell PowerEdge 1430 Xeon with an iPerc 6 RAID card. The array is a RAID 5 with 4 drives.
I upgraded from previous versions. I have nothing special installed.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 05:10:24 PM: I managed to boot into single-user mode, but neither
Code: [Select]
fsck
nor
Code: [Select]
e2fsck -D -tt -y /dev/main/root
finds a drive...
Title: Re: Server crashed - and does not boot again...
Post by: mophilly on August 20, 2022, 05:22:38 PM: Hardware failure comes to mind, given device check fails in safe mode. Perhaps the power outage was preceded by a spike. Do you have power line conditioning in place? If not, a piece may have been damaged.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 05:52:52 PM: In single mode I see all files.
But /dev is empty.
Is that ok?
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 06:04:41 PM: Can you get into the PERC controller at boot and check the RAID status?

Just to be sure all your drives are up and OK.

And have you got a good backup in case this is fatal?
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 06:16:17 PM: Yes, I have checked the PERC: the RAID status is optimal.
And I have a backup. I hope it's a good backup.

Can you give me a hint what to do after booting to single mode?
For checking the filesystem?

Thx
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 20, 2022, 06:24:34 PM: Even in single mode I got this after a while:

(https://perchtoldsdorf.com/IMG_6190.jpg)
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 20, 2022, 06:59:02 PM: From what I have seen it is a kernel corruption.

The guy who may know is out and about at the minute - he should be around later.

[Edited the typos - am in the middle of a couple of big upgrades :neutral: ]
Title: Re: Server crashed - and does not boot again...
Post by: Jean-Philippe Pialasse on August 20, 2022, 07:04:14 PM: If you do not have a backup, now is the time to clone the disk onto a fresh one. If you have backups, check that they are healthy.

Boot from the SME install disc and use the recovery mode.

Do not mount the filesystem.
Run the XFS equivalent of fsck (xfs_repair) and hope it will repair the filesystem.

Search the internet for the XFS repair procedure, and use the error you posted to get more information on your chances of success.
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 21, 2022, 03:24:05 AM: I have done a fresh install and restored my backup.
Thanks all.
Title: Re: Server crashed - and does not boot again...
Post by: Jean-Philippe Pialasse on August 21, 2022, 06:58:24 AM: Good news to hear.

I was not suggesting that you start with restoring, but if you were in a hurry to get it back I understand.
As suggested by Reetp, you need a power backup; a corrupted filesystem is not a good situation.

For completeness, now that I am able to type on a regular user/computer interface:

For reference, here is the Red Hat documentation describing the procedure to try to fix the filesystem:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide

After rebooting into rescue mode from the install disc, without mounting the filesystem, run (if you use LVM):

Code: [Select]
xfs_repair /dev/mapper/main-root
The lsblk command can help you find where your root (/) partition is:

Code: [Select]
# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0              11:0    1 1024M  0 rom
vda             252:0    0   30G  0 disk
├─vda1          252:1    0  500M  0 part /boot
└─vda2          252:2    0 29,5G  0 part
  ├─main-root   253:0    0 27,5G  0 lvm  /
  └─main-swap   253:1    0    2G  0 lvm  [SWAP]
Code: [Select]
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  1,0T  0 disk
├─sda1   8:1    0 1023M  0 part /boot
├─sda2   8:2    0    4G  0 part [SWAP]
├─sda3   8:3    0  956G  0 part /
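
In rescue mode with the filesystem left unmounted, the root volume shows no mountpoint, so the filesystem type is a more useful clue. A minimal sketch, assuming the lsblk on the rescue image supports the -r (raw), -n (no headings) and -o (column list) options; the helper name xfs_devices is made up for illustration and only filters text, it touches no disk:

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME FSTYPE" pairs on stdin and prints
# the names of the devices whose filesystem type is xfs.
xfs_devices() {
    awk '$2 == "xfs" { print $1 }'
}

# Usage on the rescue system (not run here):
#   lsblk -rno NAME,FSTYPE | xfs_devices
```

With the LVM layout from the first example, this would be expected to print main-root, which corresponds to /dev/mapper/main-root.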
Quote
The xfs_repair utility cannot repair an XFS file system with a dirty log. To clear the log, mount and unmount the XFS file system. If the log is corrupt and cannot be replayed, use the -L option ("force log zeroing") to clear the log, that is, xfs_repair -L /dev/device. Be aware that this may result in further corruption or data loss.
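
The order of operations in the quote can be sketched as a small script. This is only an illustration under assumptions: the device path and the scratch mount point /mnt/repair are placeholders, the function and variable names are invented here, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the repair order described in the quote above.
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

try_xfs_repair() {
    dev=$1
    mnt=${2:-/mnt/repair}   # hypothetical scratch mount point

    # 1. Check only: -n reports problems without modifying anything.
    run xfs_repair -n "$dev"

    # 2. Replay a dirty log by mounting and unmounting once.
    run mkdir -p "$mnt"
    run mount "$dev" "$mnt"
    run umount "$mnt"

    # 3. Actual repair on the unmounted device.
    run xfs_repair "$dev"

    # Last resort only, if the log cannot be replayed (may lose data):
    #   xfs_repair -L "$dev"
}

# Example: try_xfs_repair /dev/mapper/main-root
```

The -L step is deliberately left as a comment so that it is never run by accident; it should only be tried once a clone or a verified backup exists.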

The reason to have a fresh backup is that sometimes the issue is the disk itself, or the journal is broken and the error is not recoverable.
If the issue is one or more broken disk sectors, cloning the disk allows you to work on the cloned copy on a healthy disk and increases your chance of recovering data, without killing the old disk faster with the recovery procedure.
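
For the cloning step, a minimal sketch using plain dd. /dev/sdX (failing source) and /dev/sdY (fresh target) are placeholders, and the helper name clone_cmd is invented for this example. It only prints the command so it can be reviewed before anything is overwritten. If GNU ddrescue happens to be available on the rescue system (an assumption, it may not be), it is usually a better fit for disks with bad sectors.

```shell
#!/bin/sh
# Prints (does not run) a dd command that clones SRC onto DST.
# /dev/sdX and /dev/sdY below are placeholders, not real device names.
clone_cmd() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: clone_cmd SRC DST" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are the same device" >&2
        return 1
    fi
    # conv=noerror,sync keeps going on read errors and pads the
    # unreadable blocks with zeros so offsets stay aligned.
    echo "dd if=$src of=$dst bs=64K conv=noerror,sync"
}

# Example: clone_cmd /dev/sdX /dev/sdY
```

The target must be at least as large as the source, and everything on it is overwritten, so double-check both names with lsblk first.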
Title: Re: Server crashed - and does not boot again...
Post by: waldviertler on August 21, 2022, 06:27:37 PM: Thank you very much!!

Martin
Title: Re: Server crashed - and does not boot again...
Post by: ReetP on August 21, 2022, 07:28:21 PM: Good that you had a decent backup - other readers should note this fact!

Treat yourself to a UPS - saves a lot of disasters :-) Even more so with the complexities of RAID systems.

Note also a RAID 5 with a few large drives is really not a good thing to have. The risk of a second disk failure while rebuilding a replacement drive is very high. The bigger the drives the higher the chance. Be very cautious there. Plenty of reading online about it.
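
As a rough illustration of why drive size matters, here is a back-of-the-envelope sketch. It assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits read, a figure commonly quoted on consumer drive datasheets and not measured on this server, and it assumes errors are independent. The drive sizes are hypothetical, since the thread does not state them. It estimates the chance of at least one URE while every surviving drive is read in full during a rebuild, using P = 1 - exp(-bits * rate). It covers read errors only, not a whole second drive dying.

```shell
#!/bin/sh
# rebuild_ure_risk SURVIVING_DRIVES DRIVE_TB [URE_RATE]
# Prints the estimated probability (0..1) of at least one
# unrecoverable read error while reading every surviving drive in full.
rebuild_ure_risk() {
    awk -v n="$1" -v tb="$2" -v rate="${3:-1e-14}" 'BEGIN {
        bits = n * tb * 1e12 * 8        # decimal terabytes to bits
        printf "%.3f\n", 1 - exp(-bits * rate)
    }'
}

# Example: 4-drive RAID 5 with hypothetical 1 TB drives, one failed,
# so 3 drives must be read:  rebuild_ure_risk 3 1
```

Under these assumptions three 1 TB drives give roughly 0.21, and three 4 TB drives roughly 0.62, which is the "bigger drives, higher chance" effect described above.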