Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: waldviertler on August 20, 2022, 03:26:59 PM
-
Can somebody please help?
I have an up-to-date SME 10 server that crashed today during a power loss.
Now I try to reboot, but I get errors - please look at the pictures.
The RAID system says it is optimal!
(https://perchtoldsdorf.com/IMG_6189.jpg)
and
(https://perchtoldsdorf.com/IMG_6168.jpg)
and
(https://perchtoldsdorf.com/IMG_6169.jpg)
What can that be?
Edit: I have tried "recovery mode" but it gets stuck as well!!
Martin
-
Better tell us something about your server.
Hardware, and server history - upgraded from previous versions, how, what else have you got installed, is it high or low volume.
Anything else that might be useful.
-
It's a Dell PowerEdge 1430 Xeon with a PERC 6/i RAID card. The array is RAID 5 with 4 drives.
I upgraded from previous versions. I have nothing special installed.
-
I managed to boot into single mode, but
fsck
or e2fsck -D -tt -y /dev/main/root
cannot find the drive...
-
Hardware failure comes to mind, given that the device check fails in safe mode. Perhaps the power outage was preceded by a spike. Do you have power line conditioning in place? If not, a component may have been damaged.
-
In single mode I see all files.
But /dev is empty.
Is that OK?
-
Can you get into the PERC controller at boot and check the RAID status?
Just to be sure all your drives are up and OK.
And have you got a good backup in case this is fatal?
-
Yes, I have checked the PERC: RAID status is optimal.
And I have a backup. I hope it's a good backup.
Can you give me a hint what to do after booting into single mode to check the filesystem?
Thx
-
Even in single mode I got this after a while:
(https://perchtoldsdorf.com/IMG_6190.jpg)
-
From what I have seen it is kernel corruption.
The guy who may know is out and about at the minute - he should be around later.
[Edited the typos - am in the middle of a couple of big upgrades :neutral: ]
-
If you do not have a backup, it is time to clone the disk onto a fresh one. If you do have some, check that they are healthy.
Boot from the SME install disc and use recovery mode.
Do not mount the filesystem.
Run a repair on the XFS filesystem and hope it fixes things (XFS has no fsck as such; the tool is xfs_repair).
Search the internet for the XFS repair procedure and use the error you showed to get more information on your chances of success.
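One note on the earlier "cannot find the drive" symptom: in rescue/single mode the LVM volumes are often not activated yet, which is why /dev/main/root is missing. A hedged sketch of the activation steps (the volume-group name "main" is assumed from the /dev/main/root path used earlier; the script only prints the commands so you can review them first):

```shell
#!/bin/sh
# Dry-run sketch: activate LVM so the /dev/main/* nodes appear.
# It only PRINTS the commands; replace 'echo' in run() with "$@"
# to execute them for real.
run() { echo "would run: $*"; }

run vgscan                # scan the disks for volume groups
run vgchange -ay main     # activate every LV in the "main" VG (assumed name)
run ls /dev/mapper        # main-root and main-swap should now exist
```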
-
I have done a fresh install and restored my backup.
Thanks all.
-
Good to hear.
I was not suggesting starting with a restore, but if you were in a hurry to get it back, I understand.
As suggested by ReetP, you need power backup - a corrupted filesystem is not a good situation.
For completeness, now that I am able to type at a regular user/computer interface:
For reference, here is the procedure from Red Hat to try to fix the filesystem:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide
After rebooting into rescue mode from the install disc, without mounting the filesystem, run (if you use LVM):
xfs_repair /dev/mapper/main-root
The lsblk command might help you find where your root (/) partition is:
# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0              11:0    1 1024M  0 rom
vda             252:0    0   30G  0 disk
├─vda1          252:1    0  500M  0 part /boot
└─vda2          252:2    0 29,5G  0 part
  ├─main-root   253:0    0 27,5G  0 lvm  /
  └─main-swap   253:1    0    2G  0 lvm  [SWAP]
# lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda        8:0    0  1,0T  0 disk
├─sda1     8:1    0 1023M  0 part /boot
├─sda2     8:2    0    4G  0 part [SWAP]
└─sda3     8:3    0  956G  0 part /
The xfs_repair utility cannot repair an XFS file system with a dirty log. To clear the log, mount and unmount the XFS file system. If the log is corrupt and cannot be replayed, use the -L option ("force log zeroing") to clear the log, that is, xfs_repair -L /dev/device. Be aware that this may result in further corruption or data loss.
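The dirty-log workflow above can be sketched as a dry run (hedged: the device path assumes the LVM layout from the lsblk example; the script only prints the commands so you can review them before running anything destructive):

```shell
#!/bin/sh
# Dry-run sketch of the dirty-log repair workflow. It only PRINTS the
# commands; replace 'echo' in run() with "$@" to actually execute.
DEV=${DEV:-/dev/mapper/main-root}   # assumed LVM root, per lsblk above

run() { echo "would run: $*"; }

run xfs_repair "$DEV"      # plain repair; refuses to run if the log is dirty
run mount "$DEV" /mnt      # mounting replays the journal...
run umount /mnt            # ...and unmounting leaves it clean
run xfs_repair "$DEV"      # retry after the log replay
run xfs_repair -L "$DEV"   # LAST resort: zeroes the log, may lose data
```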
The reason to have a fresh backup is that sometimes the issue is the disk itself, or the journal is broken and the error is not recoverable.
If the issue is a broken disk sector (or several), cloning the disk lets you work on the copy on a healthy disk and increases your chances of recovering data, without wearing out the old disk further during the recovery procedure.
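For the cloning step, GNU ddrescue is the usual tool, because it skips and maps bad sectors instead of aborting. A hedged sketch (sdX/sdY are placeholder device names - substitute your own; again, the script only prints the commands):

```shell
#!/bin/sh
# Dry-run sketch of cloning a failing disk with GNU ddrescue.
# It only PRINTS the commands; replace 'echo' in run() with "$@"
# to execute them for real.
SRC=${SRC:-/dev/sdX}    # failing disk (placeholder name)
DST=${DST:-/dev/sdY}    # fresh disk of equal or larger size (placeholder)

run() { echo "would run: $*"; }

# Pass 1: copy the easy areas quickly, skipping bad spots (-n = no scrape).
run ddrescue -f -n "$SRC" "$DST" rescue.map
# Pass 2: go back and retry only the bad areas a few times (-r3).
run ddrescue -f -r3 "$SRC" "$DST" rescue.map
```

The rescue.map file lets ddrescue resume and remember which sectors were bad, so the two passes together touch the dying disk as little as possible.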
-
Thank you very much!!
Martin
-
Good that you had a decent backup - other readers should note this fact!
Treat yourself to a UPS - saves a lot of disasters :-) Even more so with the complexities of RAID systems.
Note also a RAID 5 with a few large drives is really not a good thing to have. The risk of a second disk failure while rebuilding a replacement drive is very high. The bigger the drives the higher the chance. Be very cautious there. Plenty of reading online about it.