Koozali.org: home of the SME Server

Server crashed - and does not boot again...

Offline waldviertler

Server crashed - and does not boot again...
« on: August 20, 2022, 03:26:59 PM »
Can somebody please help?
I have an up-to-date SME 10 server that crashed today during a power loss.
Now when I try to reboot, I get the errors shown in the pictures.
The RAID system says it's optimal!

[Screenshots of the boot errors were attached here; the images are not preserved.]

What can that be?

Edit: I have also tried "recovery mode", but it gets stuck as well!

Martin
« Last Edit: August 20, 2022, 04:26:25 PM by waldviertler »

Offline ReetP

Re: Server crashed - and does not boot again...
« Reply #1 on: August 20, 2022, 04:37:45 PM »
Better tell us something about your server.

Hardware and server history: was it upgraded from previous versions, and how? What else have you got installed? Is it high or low volume?

Anything else that might be useful.
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #2 on: August 20, 2022, 04:52:50 PM »
It's a Dell PowerEdge 1430 Xeon with an iPerc 6 RAID card. The array is a RAID 5 with 4 drives.
I upgraded from previous versions. I have nothing special installed.
« Last Edit: August 20, 2022, 04:58:43 PM by waldviertler »

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #3 on: August 20, 2022, 05:10:24 PM »
I managed to boot into single-user mode, but neither fsck nor e2fsck -D -tt -y /dev/main/root can find a drive...

Offline Mophilly

Re: Server crashed - and does not boot again...
« Reply #4 on: August 20, 2022, 05:22:38 PM »
Hardware failure comes to mind, given that the device check fails in safe mode. Perhaps the power outage was preceded by a spike. Do you have power line conditioning in place? If not, a component may have been damaged.
- Mark

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #5 on: August 20, 2022, 05:52:52 PM »
In single-user mode I can see all the files.
But /dev is empty.
Is that OK?

Offline ReetP

Re: Server crashed - and does not boot again...
« Reply #6 on: August 20, 2022, 06:04:41 PM »
Can you get into the PERC controller at boot and check the RAID status?

Just to be sure all your drives are up and OK.

And have you got a good backup in case this is fatal?

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #7 on: August 20, 2022, 06:16:17 PM »
Yes, I have checked the PERC: the RAID status is optimal.
And I have a backup. I hope it's a good backup.

Can you give me a hint on what to do after booting into single-user mode to check the filesystem?

Thx




Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #8 on: August 20, 2022, 06:24:34 PM »
Even in single-user mode I got this after a while:
[Screenshot was attached here; the image is not preserved.]


Offline ReetP

Re: Server crashed - and does not boot again...
« Reply #9 on: August 20, 2022, 06:59:02 PM »
From what I have seen, it is kernel corruption.

The guy who may know is out and about at the minute - he should be around later.

[Edited the typos - am in the middle of a couple of big upgrades :neutral: ]
« Last Edit: August 20, 2022, 07:02:29 PM by ReetP »

Offline Jean-Philippe Pialasse

  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: Server crashed - and does not boot again...
« Reply #10 on: August 20, 2022, 07:04:14 PM »

If you do not have a backup, now is the time to clone the disk onto a fresh one. If you do have backups, check that they are healthy.

Boot from the SME install disc and use the rescue mode.

Do not mount the filesystem.
Run the XFS equivalent of fsck (xfs_repair) and hope it can repair the filesystem.

Search the internet for the XFS repair procedure, and use the error you showed to learn more about your chances of success.

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #11 on: August 21, 2022, 03:24:05 AM »
I have done a fresh install and restored my backup.
Thanks all.

Offline Jean-Philippe Pialasse

  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: Server crashed - and does not boot again...
« Reply #12 on: August 21, 2022, 06:58:24 AM »
Good to hear.

I was not suggesting that you start by restoring, but I understand if you were in a hurry to get the server back.
As suggested by ReetP, you need a power backup; a corrupted filesystem is not a good situation to be in.

For the sake of completeness, now that I am able to type on a regular user/computer interface:

For reference, here is the Red Hat procedure for trying to fix the filesystem:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide

After rebooting into rescue mode from the install disc, without mounting the filesystem, run (if you use LVM):

Code:
xfs_repair /dev/mapper/main-root

The lsblk command may help you find where your root (/) partition is:

Code:
# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0            11:0    1 1024M  0 rom 
vda           252:0    0   30G  0 disk
├─vda1        252:1    0  500M  0 part /boot
└─vda2        252:2    0 29,5G  0 part
  ├─main-root 253:0    0 27,5G  0 lvm  /
  └─main-swap 253:1    0    2G  0 lvm  [SWAP]

Code:
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  1,0T  0 disk
├─sda1   8:1    0 1023M  0 part /boot
├─sda2   8:2    0    4G  0 part [SWAP]
├─sda3   8:3    0  956G  0 part /
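
In a rescue shell the LVM volume groups are not always activated, which may be one reason a path such as /dev/main/root cannot be found. As a minimal sketch, assuming LVM is in use and that the installed lsblk supports the -r, -n, -p and -o options, the helper below filters lsblk output for XFS filesystems. The name xfs_devices is hypothetical and is not part of SME Server.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME FSTYPE" pairs on stdin and print the
# device paths whose filesystem type is xfs.
xfs_devices() {
    awk '$2 == "xfs" { print $1 }'
}

# Typical use from the rescue shell (not executed here):
#   vgchange -ay                              # activate LVM volume groups, if any
#   lsblk -rnp -o NAME,FSTYPE | xfs_devices   # list candidate XFS devices
```

Any device it prints is only a candidate; compare it with the sizes shown by lsblk before running xfs_repair on it.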

Quote
The xfs_repair utility cannot repair an XFS file system with a dirty log. To clear the log, mount and unmount the XFS file system. If the log is corrupt and cannot be replayed, use the -L option ("force log zeroing") to clear the log, that is, xfs_repair -L /dev/device. Be aware that this may result in further corruption or data loss.
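
The order described in the quote can be sketched as a small script. This is only an illustration under stated assumptions: repair_xfs, run and the scratch directory /mnt/xfs-replay are hypothetical names, and DRY_RUN defaults to 1 so the commands are printed rather than executed. The -L option is deliberately left out because of the data loss risk mentioned above.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# repair_xfs DEVICE [MOUNTPOINT]
# MOUNTPOINT defaults to /mnt/xfs-replay, a hypothetical scratch directory.
repair_xfs() {
    dev="$1"
    mnt="${2:-/mnt/xfs-replay}"
    if [ -z "$dev" ]; then
        echo "usage: repair_xfs DEVICE [MOUNTPOINT]" >&2
        return 2
    fi

    # Step 1: mounting and unmounting once replays (clears) a dirty log.
    if run mkdir -p "$mnt" && run mount -t xfs "$dev" "$mnt"; then
        run umount "$mnt" || return 1
    else
        echo "mount failed: the log may be corrupt; -L is a last resort" >&2
        return 1
    fi

    # Step 2: repair the now unmounted filesystem.
    run xfs_repair "$dev"
}
```

Calling repair_xfs /dev/mapper/main-root with the default DRY_RUN=1 only lists the four commands; set DRY_RUN=0 once the device name has been double-checked.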

You need a fresh backup because sometimes the problem is the disk itself, or the journal is broken and the error is not recoverable.
If the problem is one or more bad disk sectors, cloning the disk lets you work on the copy on a healthy disk. That increases your chances of recovering data, and the recovery procedure does not kill the old disk even faster.
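
One possible way to make such a clone is plain dd, sketched below. This is an assumption, not the method used in this thread: a dedicated tool such as GNU ddrescue, where it is available, is generally better suited to failing disks. The function name clone_disk is hypothetical, the device names in the usage note are placeholders, and DRY_RUN defaults to 1 so nothing is written unless you opt in.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so the dd command is printed, not run.
DRY_RUN="${DRY_RUN:-1}"

# clone_disk SOURCE TARGET
# Copies SOURCE onto TARGET block by block, overwriting TARGET completely.
clone_disk() {
    src="$1"
    dst="$2"
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: clone_disk SOURCE TARGET" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "source and target must differ" >&2
        return 2
    fi
    # conv=noerror,sync: keep going after read errors and pad unreadable
    # blocks with zeros so that offsets on the copy stay aligned.
    set -- dd if="$src" of="$dst" bs=64K conv=noerror,sync
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}
```

For example, clone_disk /dev/sda /dev/sdb prints the dd command it would run. The target must be at least as large as the source, and everything on it is overwritten.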

Offline waldviertler

Re: Server crashed - and does not boot again...
« Reply #13 on: August 21, 2022, 06:27:37 PM »
Thank you very much!!

Martin

Offline ReetP

Re: Server crashed - and does not boot again...
« Reply #14 on: August 21, 2022, 07:28:21 PM »
Good that you had a decent backup - other readers should note this fact!

Treat yourself to a UPS - saves a lot of disasters :-) Even more so with the complexities of RAID systems.

Note also a RAID 5 with a few large drives is really not a good thing to have. The risk of a second disk failure while rebuilding a replacement drive is very high. The bigger the drives the higher the chance. Be very cautious there. Plenty of reading online about it.
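
To put rough numbers on this, one commonly cited (and debated) model looks at unrecoverable read errors (URE) rather than whole-disk failure: a rebuild has to read every surviving drive in full. The sketch below assumes independent errors at a rate of 1 per 1e14 bits read, a figure often quoted for consumer disks, and illustrative drive sizes; the drive sizes in this thread were not stated. The function name ure_risk_percent is hypothetical.

```shell
#!/bin/sh
# ure_risk_percent DRIVES_READ TB_PER_DRIVE [URE_RATE]
# URE_RATE is errors per bit read; 1e-14 is an assumed figure, not a
# measured one. Prints the chance (in percent) of at least one URE.
ure_risk_percent() {
    awk -v n="$1" -v tb="$2" -v rate="${3:-1e-14}" 'BEGIN {
        bits = n * tb * 1e12 * 8       # total bits read during the rebuild
        p = 1 - exp(-bits * rate)      # P(at least one unreadable sector)
        printf "%.1f\n", p * 100
    }'
}

# Example: a 4-drive RAID 5 rebuild reads the 3 surviving drives.
#   ure_risk_percent 3 1    # 1 TB drives
#   ure_risk_percent 3 4    # 4 TB drives
```

Under these assumptions the estimate rises from about 21% with 1 TB drives to about 62% with 4 TB drives, which is the sense in which bigger drives raise the risk. Real drives often do better than the datasheet rate, so treat this as an upper-end illustration only.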

