Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: axessit on August 06, 2025, 02:18:02 AM
-
I am swapping out a failing drive (SMART failure count too high). When I failed the drive in my RAID arrays and then removed it (logically), all was fine. I then removed the drive (hot-swap) and inserted the new drive. I had issues trying to partition it properly, so I zeroed it out with dd to start again. The RAID manager then couldn't see the drive as a block device, so I decided to reboot, but it failed to boot with some grub error message (yeah, I should have taken a screenshot, but as always I was in a rush to get it back up).
I inserted the old drive and it booted properly (phew!). I checked the RAID status, and it showed the arrays were all running on only one drive, as expected. I removed the old drive (hot-swapped), then inserted the new drive. The admin console -> Manage raid picked up the new drive and asked to use it; this time all went smoothly, and the first two partitions have rebuilt while the third is chugging away.
But I am looking into the boot manager to see why it didn't boot off the first drive. I thought grub might have been confused or not installed on the second drive, so, following the section at the bottom of https://wiki.koozali.org/Raid:Manual_Rebuild, I went to run grub, but it is not installed. Running yum install grub does not install it either.
So does 10.1 use grub?
If not, do I need to fix up the UEFI boot entries?
efibootmgr -v shows
BootCurrent: 000E
Timeout: 10 seconds
BootOrder: 000E,000D,0004,0000,0002,0001,0003,0005,0006,0007,0008,0009,000A,000B,000C
Boot0000* CD/DVD Rom /Pci(0x1f,0x5)/Ata(0,0,0)/CDROM(1,0x1a7,0x44c0)
Boot0001* Floppy Disk VenMedia(0c588db8-6af4-11dd-a992-00197d890238,00)
Boot0002* Hard Disk 0 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,08)
Boot0003* PXE Network VenMedia(0c588db8-6af4-11dd-a992-00197d890238,06)
Boot0004* Hard Disk 1 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,09)
Boot0005* Hard Disk 2 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,0a)
Boot0006* Hard Disk 3 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,0b)
Boot0007* USB Storage VenMedia(0c588db8-6af4-11dd-a992-00197d890238,03)
Boot0008* Diagnostics VenMedia(0c588db8-6af4-11dd-a992-00197d890238,da)
Boot0009* iSCSI VenMedia(0c588db8-6af4-11dd-a992-00197d890238,04)
Boot000A* iSCSI Critical VenMedia(0c588db8-6af4-11dd-a992-00197d890238,05)
Boot000B* Legacy Only VenMedia(0c588db8-6af4-11dd-a992-00197d890238,ee)
Boot000C* Embedded Hypervisor VenMedia(0c588db8-6af4-11dd-a992-00197d890238,01)
Boot000D* Koozali SME Server HD(2,GPT,879b3e24-c266-4750-b3cf-e316013439dc,0xfa800,0x64000)/File(\EFI\centos\shimx64.efi)
Boot000E* Koozali SME Server HD(2,GPT,b7cf5585-be10-4eab-9b5b-8ebf32693552,0xfa800,0x64000)/File(\EFI\centos\shimx64.efi)
and blkid shows
/dev/sda1: UUID="96e0aec0-45fe-c4e6-a094-963c31f1aaa3" UUID_SUB="99a22d6b-d36f-56d4-5355-3f3c130cfb4f" LABEL="localhost:0" TYPE="linux_raid_member" PARTUUID="20e7ef88-593c-4d52-9fcb-df328921c0c0"
/dev/sda2: UUID="c925b002-b649-a422-8007-42b06b40e564" UUID_SUB="f0d50525-0337-68a5-4cc6-3407dd9ca199" LABEL="localhost:9" TYPE="linux_raid_member" PARTUUID="879b3e24-c266-4750-b3cf-e316013439dc"
/dev/sda3: UUID="da4811b5-358c-b68a-aa92-42d3118f7a9e" UUID_SUB="b8f24ce1-7bf1-874a-d905-8b7d76e9db8d" LABEL="localhost:1" TYPE="linux_raid_member" PARTUUID="b19a1d91-f9d0-4ba1-b7ab-177637b229c9"
/dev/sdb1: UUID="96e0aec0-45fe-c4e6-a094-963c31f1aaa3" UUID_SUB="f5ca957b-e678-5eb8-a36b-3165acb9ef83" LABEL="localhost:0" TYPE="linux_raid_member" PARTUUID="897276d9-f2cf-48b0-9c2e-8f6f96e74f27"
/dev/sdb2: UUID="c925b002-b649-a422-8007-42b06b40e564" UUID_SUB="4e583425-7652-53c8-6669-f8b495b2335e" LABEL="localhost:9" TYPE="linux_raid_member" PARTUUID="2fe4653b-1aef-4a7a-8110-b6cc5ab7955c"
/dev/sdb3: UUID="da4811b5-358c-b68a-aa92-42d3118f7a9e" UUID_SUB="a4cf7d9e-a3f0-c75f-9fc6-8627da3fcecf" LABEL="localhost:1" TYPE="linux_raid_member" PARTUUID="d8df6d16-1a00-4a81-a5fb-341f2aab5e44"
/dev/md1: UUID="ApjO4o-uG9G-EH0v-61l6-kIGU-1ieq-nnndnS" TYPE="LVM2_member"
/dev/md0: UUID="ca90d06c-8400-4867-9753-b13cb3af184e" TYPE="xfs"
/dev/mapper/main-root: UUID="a471c912-e2b8-436f-8522-ee6cc6a090c3" TYPE="xfs"
/dev/mapper/main-swap: UUID="d10ac112-cf7f-4924-af55-758086918bce" TYPE="swap"
/dev/md9: SEC_TYPE="msdos" UUID="0232-D8DC" TYPE="vfat"
The working disk sda is
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0C423252-8575-4602-BE6E-6720198965DE
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2157 sectors (1.1 MiB)
Number Start (sector) End (sector) Size Code Name
1 2048 1026047 500.0 MiB FD00
2 1026048 1435647 200.0 MiB FD00
3 1435648 3907028991 1.8 TiB FD00
while the new disk sdb is
Disk /dev/sdb: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 03C1D62F-49DB-4FAF-ADFE-7C6359D915EC
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2157 sectors (1.1 MiB)
Number Start (sector) End (sector) Size Code Name
1 2048 1026047 500.0 MiB FD00
2 1026048 1435647 200.0 MiB FD00
3 1435648 3907028991 1.8 TiB FD00
I note that the UEFI boot UUID does not show up on any of the drives or RAID partitions. So does this need fixing?
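One way to check this mechanically, as a rough sketch: the GUID inside each HD(...) boot entry is a partition GUID, so it can be compared against the PARTUUID values that blkid prints. The function names and the two file arguments below are made up for illustration. Going by the listings above, such a check would pair Boot000D with /dev/sda2 and find no partition for Boot000E.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SME Server or efibootmgr.

# Print "BootNNNN <partition-guid>" for every HD(n,GPT,<guid>,...) entry.
# Reads "efibootmgr -v" style text on stdin.
boot_entry_guids() {
    sed -n 's/^\(Boot[0-9A-Fa-f]\{4\}\)\*\{0,1\} .*HD([0-9]*,GPT,\([0-9A-Fa-f-]*\),.*/\1 \2/p'
}

# For each boot entry GUID, report which device (if any) carries it as PARTUUID.
# $1: file holding "efibootmgr -v" output, $2: file holding "blkid" output.
check_boot_guids() {
    boot_entry_guids < "$1" | while read -r entry guid; do
        dev=$(grep -i "PARTUUID=\"$guid\"" "$2" | cut -d: -f1)
        if [ -n "$dev" ]; then
            echo "$entry $guid -> $dev"
        else
            echo "$entry $guid -> no matching partition"
        fi
    done
}
```

To use it, save the two outputs first, for example efibootmgr -v > efi.txt and blkid > blkid.txt, then run check_boot_guids efi.txt blkid.txt. An entry with no matching partition most likely points at a drive that is no longer installed.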
-
We use grub.
The EFI partition cannot be on a RAID partition; EFI partitions should be copied from disk to disk and are usually FAT32.
Looking at your blkid output, I would say you do not have UEFI enabled.
But these commands should help to better understand what is what:
df -h
lsblk --fs
cat /proc/mdstat
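When reading the /proc/mdstat output, the member map at the end of each "blocks" line (for example [UU] or [U_]) shows whether an array is missing a member. A small sketch that picks out degraded arrays, assuming the usual two-line layout of an md entry; the function name is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member map
# (e.g. "[U_]") shows a missing member. Reads /proc/mdstat style text on stdin.
degraded_arrays() {
    awk '
        # Remember the array name from a line like "md1 : active raid1 ...".
        /^md[0-9]+ :/ { name = $1; next }
        # The first following line with a [UU]-style map belongs to that array.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    '
}
```

Run it as degraded_arrays < /proc/mdstat; it prints nothing when every array has all of its members.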
I would suggest fixing grub using the grub CLI (not dd): https://wiki.koozali.org/Raid:Manual_Rebuild#HowTo:_Write_the_GRUB_boot_sector
An alternative, if your reboot fails, could be to boot from the ISO/USB installer and choose to repair the installation.
-
After waiting a couple of days for the RAID to rebuild (2 TB drive), the new drive sdb had joined the RAID arrays nicely. So I moved on to the original sda, which I wanted to swap out too as it has also thrown some SMART errors. As before, I first hot-added the new drive as sdc, then zeroed it with dd to wipe any previous data (the drives were pulled from a retired server from work, just under 3 years old). Once the drive had zeroed out, I hot-removed it, then failed and removed sda from the RAID arrays. I hot-removed sda, then hot-inserted the blank drive; the admin console RAID manager picked it up and asked to use it, so I did. It started syncing, and after a couple more days it was up to speed.
I then rebooted to check it would work. Upon reboot, I got the same message: Boot Failed. Koozali SME Server
As before, I just let it continue, then I got another, identical line. Then, after leaving it (and frantically starting to look for my boot CD), the boot screen appeared and the system fired up. Once it booted, I confirmed everything was happy and RAID was all happy, then rebooted into the UEFI startup screen. Under the boot devices, it showed CentOS Linux as the first boot entry, then two entries for Koozali SME Server.
Not wanting to poke the bear, I carried on booting up. I then ran efibootmgr -v to check the boot entries, and now there was an entry for CentOS Linux, and it was the first boot option. So I used efibootmgr -b 000E -B and removed the Koozali SME Server entries. I then rebooted into UEFI setup and checked the settings: the Koozali entries were gone, and the CentOS entry was present and first.
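For anyone repeating this, a cautious sketch of the clean-up step: rather than deleting entries by hand, list the delete command for every entry with a given label and review it first. The function name is made up for illustration; it only echoes commands and deletes nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an "efibootmgr -b NNNN -B" command
# for every boot entry whose label starts with $1.
# Reads efibootmgr output on stdin.
print_delete_commands() {
    label=$1
    # Turn "Boot000D* Some Label ..." into "000D Some Label ...".
    sed -n 's/^Boot\([0-9A-Fa-f]\{4\}\)\*\{0,1\} \(.*\)$/\1 \2/p' |
    while read -r num rest; do
        case $rest in
            "$label"*) echo "efibootmgr -b $num -B" ;;
        esac
    done
}
```

Example: efibootmgr -v | print_delete_commands "Koozali SME Server". Check the printed lines against the entry you are currently booted from (BootCurrent) before running any of them as root.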
So now my efibootmgr -v looks like
BootCurrent: 000F
Timeout: 10 seconds
BootOrder: 000F,0004,0000,0002,0001,0003,0005,0006,0007,0008,0009,000A,000B,000C
Boot0000* CD/DVD Rom /Pci(0x1f,0x5)/Ata(0,0,0)/CDROM(1,0x1a7,0x44c0)
Boot0001* Floppy Disk VenMedia(0c588db8-6af4-11dd-a992-00197d890238,00)
Boot0002* Hard Disk 0 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,08)
Boot0003* PXE Network VenMedia(0c588db8-6af4-11dd-a992-00197d890238,06)
Boot0004* Hard Disk 1 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,09)
Boot0005* Hard Disk 2 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,0a)
Boot0006* Hard Disk 3 VenMedia(0c588db8-6af4-11dd-a992-00197d890238,0b)
Boot0007* USB Storage VenMedia(0c588db8-6af4-11dd-a992-00197d890238,03)
Boot0008* Diagnostics VenMedia(0c588db8-6af4-11dd-a992-00197d890238,da)
Boot0009* iSCSI VenMedia(0c588db8-6af4-11dd-a992-00197d890238,04)
Boot000A* iSCSI Critical VenMedia(0c588db8-6af4-11dd-a992-00197d890238,05)
Boot000B* Legacy Only VenMedia(0c588db8-6af4-11dd-a992-00197d890238,ee)
Boot000C* Embedded Hypervisor VenMedia(0c588db8-6af4-11dd-a992-00197d890238,01)
Boot000F* CentOS Linux HD(2,GPT,ff21ce51-3be6-47a2-bd43-3fd1b6d07175,0xfa800,0x64000)/File(\EFI\centos\shimx64.efi)
So it would appear I am all happy, and I am using UEFI boot. When I run grub from the CLI, I get "grub not found", so somehow I am not using grub. When I run yum install grub, it responds No Package grub available.
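A possible explanation, offered as an assumption rather than something confirmed in this thread: SME Server 10 is built on CentOS 7, where GRUB 2 is packaged and installed under grub2-prefixed names, so a plain grub command or package would not exist. A quick sketch to see which of the candidate commands are present; the function name is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: report whether each named command exists on PATH.
report_commands() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: not found"
        fi
    done
}
```

For example: report_commands grub grub-install grub2-install grub2-mkconfig. Running rpm -qa 'grub2*' should additionally list any installed GRUB 2 packages.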
Server hardware is IBM System x3200 M3.
So I'm not quite sure whether the manual RAID rebuild is relevant, but anyway, I'm back up and running, and all is well.