Koozali.org: home of the SME Server

GRUB error while booting - RAID error

SchulzStefan
« on: May 25, 2007, 09:03:49 PM »
I had a power loss during the night. In the morning I tried to boot the server (with the latest updates installed), with no luck. The server has two SATA drives, which are mirrored in hardware and (!) in software.

I disconnected the drives from the board and tried to boot the server from the install CD, again with no luck. I cleared the BIOS settings and tried the same again. This time it worked.

I reconnected the two hard drives to the board, and the machine booted. I got the following messages:

May 25 15:23:20 warp kernel: ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xCC00 irq 185
May 25 15:23:20 warp kernel: ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xCC08 irq 185
May 25 15:23:20 warp kernel: ata1: SATA link up 1.5 Gbps (SStatus 113)
May 25 15:23:20 warp kernel: ata1: dev 0 cfg 49:2f00 82:746b 83:7f61 84:4163 85:7469 86:3c41 87:4163 88:407f
May 25 15:23:20 warp kernel: ata1: dev 0 ATA-7, max UDMA/133, 490234752 sectors: LBA48
May 25 15:23:20 warp kernel: nv_sata: Primary device added
May 25 15:23:20 warp kernel: nv_sata: Primary device removed
May 25 15:23:20 warp kernel: nv_sata: Secondary device added
May 25 15:23:20 warp kernel: nv_sata: Secondary device removed
May 25 15:23:20 warp kernel: ata1: dev 0 configured for UDMA/133
May 25 15:23:20 warp kernel: scsi0 : sata_nv
May 25 15:23:20 warp kernel: ata2: SATA link up 1.5 Gbps (SStatus 113)
May 25 15:23:20 warp kernel: ata2: dev 0 cfg 49:2f00 82:746b 83:7f61 84:4163 85:7469 86:3c41 87:4163 88:407f
May 25 15:23:20 warp kernel: ata2: dev 0 ATA-7, max UDMA/133, 490234752 sectors: LBA48
May 25 15:23:20 warp kernel: nv_sata: Primary device added
May 25 15:23:20 warp kernel: nv_sata: Primary device removed
May 25 15:23:20 warp kernel: nv_sata: Secondary device added
May 25 15:23:20 warp kernel: nv_sata: Secondary device removed
May 25 15:23:20 warp kernel: ata2: dev 0 configured for UDMA/133
May 25 15:23:20 warp kernel: scsi1 : sata_nv
May 25 15:23:20 warp kernel:   Vendor: ATA       Model: WDC WD2500YS-01S  Rev: 20.0
May 25 15:23:20 warp kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
May 25 15:23:20 warp kernel: SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
May 25 15:23:20 warp kernel: SCSI device sda: drive cache: write back
May 25 15:23:20 warp kernel: SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
May 25 15:23:20 warp kernel: SCSI device sda: drive cache: write back
May 25 15:23:20 warp kernel:  sda:<4>nv_sata: Primary device added
May 25 15:23:20 warp kernel: nv_sata: Primary device removed
May 25 15:23:20 warp kernel: nv_sata: Secondary device added
May 25 15:23:20 warp kernel: nv_sata: Secondary device removed
May 25 15:23:20 warp kernel:  sda1 sda2
May 25 15:23:20 warp kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
May 25 15:23:20 warp kernel:   Vendor: ATA       Model: WDC WD2500YS-01S  Rev: 20.0
May 25 15:23:20 warp kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
May 25 15:23:20 warp kernel: SCSI device sdb: 490234752 512-byte hdwr sectors (251000 MB)
May 25 15:23:20 warp kernel: SCSI device sdb: drive cache: write back
May 25 15:23:20 warp kernel: SCSI device sdb: 490234752 512-byte hdwr sectors (251000 MB)
May 25 15:23:20 warp kernel: SCSI device sdb: drive cache: write back
May 25 15:23:20 warp kernel:  sdb:<4>nv_sata: Primary device added
May 25 15:23:20 warp kernel: nv_sata: Primary device removed
May 25 15:23:20 warp kernel: nv_sata: Secondary device added
May 25 15:23:20 warp kernel: nv_sata: Secondary device removed
May 25 15:23:20 warp kernel:  sdb1 sdb2
May 25 15:23:20 warp kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
May 25 15:23:20 warp kernel: ACPI: PCI interrupt 0000:00:08.0[A] -> GSI 21 (level, low) -> IRQ 193
May 25 15:23:20 warp kernel: PCI: Setting latency timer of device 0000:00:08.0 to 64
May 25 15:23:20 warp kernel: ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xB800 irq 193
May 25 15:23:20 warp kernel: ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xB808 irq 193
May 25 15:23:20 warp kernel: ata3: SATA link down (SStatus 0)
May 25 15:23:20 warp kernel: scsi2 : sata_nv
May 25 15:23:20 warp kernel: ata4: SATA link down (SStatus 0)
May 25 15:23:20 warp kernel: scsi3 : sata_nv
May 25 15:23:20 warp kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
May 25 15:23:20 warp kernel: md: raid1 personality registered as nr 3
May 25 15:23:20 warp kernel: md: md2 stopped.
May 25 15:23:20 warp kernel: md: bind<sda2>
May 25 15:23:20 warp kernel: md: bind<sdb2>
May 25 15:23:20 warp kernel: md: kicking non-fresh sda2 from array!
May 25 15:23:20 warp kernel: md: unbind<sda2>
May 25 15:23:20 warp kernel: md: export_rdev(sda2)
May 25 15:23:20 warp kernel: raid1: raid set md2 active with 1 out of 2 mirrors
May 25 15:23:20 warp kernel: md: md1 stopped.
May 25 15:23:20 warp kernel: md: bind<sda1>
May 25 15:23:20 warp kernel: md: bind<sdb1>
May 25 15:23:20 warp kernel: md: kicking non-fresh sda1 from array!
May 25 15:23:20 warp kernel: md: unbind<sda1>
May 25 15:23:20 warp kernel: md: export_rdev(sda1)
May 25 15:23:20 warp kernel: raid1: raid set md1 active with 1 out of 2 mirrors
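
The "kicking non-fresh" lines mean that md found an older event counter in the superblock of sda1/sda2 than in sdb1/sdb2, so it started both arrays on sdb alone. A minimal sketch for comparing the counters by hand, assuming the 0.90 superblock layout where `mdadm --examine` prints a line of the form `Events : <value>`; the helper name `md_events` is only an illustration, not part of mdadm:

```shell
# Print the value of the "Events" line from `mdadm --examine` (or
# `mdadm -D`) output supplied on stdin. md_events is a hypothetical helper.
md_events() {
    awk '$1 == "Events" && $2 == ":" { print $3; exit }'
}

# Possible usage as root (not executed here):
#   a=$(mdadm --examine /dev/sda2 | md_events)
#   b=$(mdadm --examine /dev/sdb2 | md_events)
#   echo "sda2=$a sdb2=$b"   # the member with the lower counter is the stale one
```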

The hardware monitor (shown while the machine is booting) tells me that the two disks are in sync and healthy.

Next, I checked the following:

[]# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Mar 14 17:14:43 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri May 25 15:24:59 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : fc90b8e1:22605614:45bac827:c9135624
         Events : 0.5044

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        -      removed



[]# mdadm -D /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Wed Mar 14 17:14:43 2007
     Raid Level : raid1
     Array Size : 245007232 (233.66 GiB 250.89 GB)
    Device Size : 245007232 (233.66 GiB 250.89 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri May 25 17:49:24 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : a3af15df:9956af04:351c0bd7:434d8617
         Events : 0.6069372

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       0        0        -      removed

O.K., I decided to start the mirroring from the console. The result now is:

[]# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Mar 14 17:14:43 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri May 25 20:13:10 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : fc90b8e1:22605614:45bac827:c9135624
         Events : 0.5084

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8        1        1      active sync   /dev/sda1

[]# mdadm -D /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Wed Mar 14 17:14:43 2007
     Raid Level : raid1
     Array Size : 245007232 (233.66 GiB 250.89 GB)
    Device Size : 245007232 (233.66 GiB 250.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri May 25 20:41:50 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : a3af15df:9956af04:351c0bd7:434d8617
         Events : 0.6072516

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8        2        1      active sync   /dev/sda2
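
The exact steps used to restart the mirroring are not recorded in this post. A minimal sketch of the manual equivalent with mdadm, assuming the kicked partitions are sda1 and sda2 as in the kernel log; the `run` wrapper and the `DRY_RUN` switch are hypothetical additions so that nothing touches the arrays unless asked to:

```shell
# Print each command by default; set DRY_RUN=0 (as root) to really execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Add the stale partitions back to their arrays; md then resyncs them
# from the active members (sdb1 and sdb2).
readd_mirrors() {
    run mdadm /dev/md1 --add /dev/sda1
    run mdadm /dev/md2 --add /dev/sda2
}

# Resync progress can be followed with: cat /proc/mdstat
```

Calling `readd_mirrors` without `DRY_RUN=0` only prints the two mdadm commands, which makes it easy to check the device names before running them for real.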

So far so good; everything looked fine. I decided to reboot the machine, but it would not boot again. I got a GRUB error:

Grub loading Stage 1.5

The machine hangs.

After swapping the two drive connections on the board, I got an error from the board saying that the RAID is NOT in sync, but I was able to boot the server without any problems.

I decided to reboot the machine again and got the same result. I fixed it in the same way as described above.

This error has occurred since the last update (the system was always kept up to date). Before that, I never had a problem rebooting. The configuration was the same, with no changes at all.

A few weeks ago, I had a similar problem that ended with me buying a new server. One of the disks crashed, and I was not able to boot from the other disk. Everything ended in a GRUB error. The setup of that machine was a RAID1 (two identical disks). There was no on-board mirroring, only software RAID.
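
Not being able to boot from the surviving disk could point to the boot loader being present on only one of the two drives. This is untested on this machine; the following is only a sketch of how GRUB (legacy) could be written to each disk in turn, assuming /boot is the first partition of each drive (md1 is built from sda1 and sdb1). The helper name `grub_batch_for` is hypothetical:

```shell
# Emit GRUB legacy shell commands that map the given disk to (hd0) and
# install the boot loader into its MBR. Assumes /boot is partition 1.
grub_batch_for() {
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}

# Possible usage as root (not executed here):
#   grub_batch_for /dev/sda | grub --batch
#   grub_batch_for /dev/sdb | grub --batch
```

Mapping each disk to (hd0) in turn is meant to let either drive boot on its own if the other one is missing.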

I have no idea what the cause could be. Any hints are welcome.

Bug 3029 opened.

stefan