Koozali.org: home of the SME Server

Software Raid Failure

Mike Drummond

Software Raid Failure
« on: May 07, 2002, 01:44:21 PM »
I have attempted to use software RAID on my latest install of SME 5.1.2.
The setup is two identical Quantum Fireball 20GB IDE drives, both set as master, one on each of the two IDE ports on the motherboard.  Only one port, however, is a fast ATA100-compatible port.  I did add a slave drive to the second controller and mounted it to restore the home directories and ibays to the server.  This has since been removed.

The following messages are from the messages log on the server and indicate a problem, but I don't know what I need to do to force a rebuild of the array without losing the data.

Any pointers would be appreciated.

Regards, Mike Drummond



Apr 26 04:35:57 LinuxServer kernel:     ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
Apr 26 04:35:57 LinuxServer kernel:     ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:pio
Apr 26 04:35:57 LinuxServer kernel: hda: QUANTUM FIREBALLP AS20.5, ATA DISK drive
Apr 26 04:35:57 LinuxServer kernel: hdc: AA@DDP, ATA DISK drive
Apr 26 04:35:57 LinuxServer kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Apr 26 04:35:57 LinuxServer kernel: ide1 at 0x170-0x177,0x376 on irq 15
Apr 26 04:35:57 LinuxServer kernel: hda: QUANTUM FIREBALLP AS20.5, 19595MB w/1902kB Cache, CHS=2498/255/63
Apr 26 04:35:57 LinuxServer kernel: hdc: AA@DDP, 19457MB w/0kB Cache, CHS=39532/16/63
Apr 26 04:35:57 LinuxServer kernel: Floppy drive(s): fd0 is 1.44M
Apr 26 04:35:57 LinuxServer kernel: FDC 0 is a post-1991 82077
Apr 26 04:35:57 LinuxServer kernel: md driver 0.90.0 MAX_MD_DEVS=256, MAX_REAL=12
Apr 26 04:35:57 LinuxServer kernel: raid5: measuring checksumming speed
Apr 26 04:35:57 LinuxServer kernel: raid5: MMX detected, trying high-speed MMX checksum routines
Apr 26 04:35:57 LinuxServer kernel:    pII_mmx   :   748.665 MB/sec
Apr 26 04:35:57 LinuxServer kernel:    p5_mmx    :   781.812 MB/sec
Apr 26 04:35:57 LinuxServer kernel:    8regs     :   573.786 MB/sec
Apr 26 04:35:57 LinuxServer kernel:    32regs    :   337.185 MB/sec
Apr 26 04:35:57 LinuxServer kernel: using fastest function: p5_mmx (781.812 MB/sec)
Apr 26 04:35:57 LinuxServer kernel: scsi : 0 hosts.
Apr 26 04:35:57 LinuxServer kernel: scsi : detected total.
Apr 26 04:35:57 LinuxServer kernel: md.c: sizeof(mdp_super_t) = 4096
Apr 26 04:35:57 LinuxServer kernel: Partition check:
Apr 26 04:35:57 LinuxServer kernel:  hda: hda1 hda2 < hda5 hda6 >
Apr 26 04:35:57 LinuxServer kernel:  hdc: [PTBL] [2480/255/63] hdc1@ hdc2 < >
Apr 26 04:35:57 LinuxServer kernel: RAMDISK: Compressed image found at block 0
Apr 26 04:35:57 LinuxServer kernel: autodetecting RAID arrays
Apr 26 04:35:57 LinuxServer kernel: (read) hda1's sb offset: 264960 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: (read) hda5's sb offset: 15936 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: (read) hda6's sb offset: 19783936 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: autorun ...
Apr 26 04:35:57 LinuxServer kernel: considering hda6 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda6 ...
Apr 26 04:35:57 LinuxServer kernel: created md1
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda6's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md1: former device hdc6 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md: md1: raid array is not clean -- starting background reconstruction
Apr 26 04:35:57 LinuxServer kernel: request_module[md-personality-3]: Root fs not mounted
Apr 26 04:35:57 LinuxServer kernel: do_md_run() returned -22
Apr 26 04:35:57 LinuxServer kernel: unbind
Apr 26 04:35:57 LinuxServer kernel: export_rdev(hda6)
Apr 26 04:35:57 LinuxServer kernel: md1 stopped.
Apr 26 04:35:57 LinuxServer kernel: considering hda5 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda5 ...
Apr 26 04:35:57 LinuxServer kernel: created md0
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda5's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md0: former device hdc5 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: request_module[md-personality-3]: Root fs not mounted
Apr 26 04:35:57 LinuxServer kernel: do_md_run() returned -22
Apr 26 04:35:57 LinuxServer kernel: unbind
Apr 26 04:35:57 LinuxServer kernel: export_rdev(hda5)
Apr 26 04:35:57 LinuxServer kernel: md0 stopped.
Apr 26 04:35:57 LinuxServer kernel: considering hda1 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda1 ...
Apr 26 04:35:57 LinuxServer kernel: created md2
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda1's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md2: former device hdc1 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md: md2: raid array is not clean -- starting background reconstruction
Apr 26 04:35:57 LinuxServer kernel: request_module[md-personality-3]: Root fs not mounted
Apr 26 04:35:57 LinuxServer kernel: do_md_run() returned -22
Apr 26 04:35:57 LinuxServer kernel: unbind
Apr 26 04:35:57 LinuxServer kernel: export_rdev(hda1)
Apr 26 04:35:57 LinuxServer kernel: md2 stopped.
Apr 26 04:35:57 LinuxServer kernel: ... autorun DONE.
Apr 26 04:35:57 LinuxServer kernel: apm: BIOS version 1.2 Flags 0x03 (Driver version 1.13)
Apr 26 04:35:57 LinuxServer kernel: VFS: Mounted root (ext2 filesystem).
Apr 26 04:35:57 LinuxServer kernel: i91u: PCI Base=0x9800, IRQ=11, BIOS=0xFF000, SCSI ID=7
Apr 26 04:35:57 LinuxServer kernel: i91u: Reset SCSI Bus ...
Apr 26 04:35:57 LinuxServer kernel: scsi0 : Initio INI-9X00U/UW SCSI device driver; Revision: 1.03g
Apr 26 04:35:57 LinuxServer kernel: scsi : 1 host.
Apr 26 04:35:57 LinuxServer kernel: raid1 personality registered
Apr 26 04:35:57 LinuxServer kernel: autodetecting RAID arrays
Apr 26 04:35:57 LinuxServer kernel: (read) hda1's sb offset: 264960 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: (read) hda5's sb offset: 15936 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: (read) hda6's sb offset: 19783936 [events: 00000004]
Apr 26 04:35:57 LinuxServer kernel: autorun ...
Apr 26 04:35:57 LinuxServer kernel: considering hda6 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda6 ...
Apr 26 04:35:57 LinuxServer kernel: created md1
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda6's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md1: former device hdc6 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md: md1: raid array is not clean -- starting background reconstruction
Apr 26 04:35:57 LinuxServer kernel: md1: max total readahead window set to 128k
Apr 26 04:35:57 LinuxServer kernel: md1: 1 data-disks, max readahead per data-disk: 128k
Apr 26 04:35:57 LinuxServer kernel: raid1: device hda6 operational as mirror 0
Apr 26 04:35:57 LinuxServer kernel: raid1: md1, not all disks are operational -- trying to recover array
Apr 26 04:35:57 LinuxServer kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Apr 26 04:35:57 LinuxServer kernel: md: updating md1 RAID superblock on device
Apr 26 04:35:57 LinuxServer kernel: hda6 [events: 00000005](write) hda6's sb offset: 19783936
Apr 26 04:35:57 LinuxServer kernel: .
Apr 26 04:35:57 LinuxServer kernel: considering hda5 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda5 ...
Apr 26 04:35:57 LinuxServer kernel: created md0
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda5's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md0: former device hdc5 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md0: max total readahead window set to 128k
Apr 26 04:35:57 LinuxServer kernel: md0: 1 data-disks, max readahead per data-disk: 128k
Apr 26 04:35:57 LinuxServer kernel: raid1: device hda5 operational as mirror 0
Apr 26 04:35:57 LinuxServer kernel: raid1: md0, not all disks are operational -- trying to recover array
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread got woken up ...
Apr 26 04:35:57 LinuxServer kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread finished ...
Apr 26 04:35:57 LinuxServer kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Apr 26 04:35:57 LinuxServer kernel: md: updating md0 RAID superblock on device
Apr 26 04:35:57 LinuxServer kernel: hda5 [events: 00000005](write) hda5's sb offset: 15936
Apr 26 04:35:57 LinuxServer kernel: .
Apr 26 04:35:57 LinuxServer kernel: considering hda1 ...
Apr 26 04:35:57 LinuxServer kernel:   adding hda1 ...
Apr 26 04:35:57 LinuxServer kernel: created md2
Apr 26 04:35:57 LinuxServer kernel: bind
Apr 26 04:35:57 LinuxServer kernel: running:
Apr 26 04:35:57 LinuxServer kernel: now!
Apr 26 04:35:57 LinuxServer kernel: hda1's event counter: 00000004
Apr 26 04:35:57 LinuxServer kernel: md2: former device hdc1 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md: md2: raid array is not clean -- starting background reconstruction
Apr 26 04:35:57 LinuxServer kernel: md2: max total readahead window set to 128k
Apr 26 04:35:57 LinuxServer kernel: md2: 1 data-disks, max readahead per data-disk: 128k
Apr 26 04:35:57 LinuxServer kernel: raid1: device hda1 operational as mirror 0
Apr 26 04:35:57 LinuxServer kernel: raid1: md2, not all disks are operational -- trying to recover array
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread got woken up ...
Apr 26 04:35:57 LinuxServer kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread finished ...
Apr 26 04:35:57 LinuxServer kernel: raid1: raid set md2 active with 1 out of 2 mirrors
Apr 26 04:35:57 LinuxServer kernel: md: updating md2 RAID superblock on device
Apr 26 04:35:57 LinuxServer kernel: hda1 [events: 00000005](write) hda1's sb offset: 264960
Apr 26 04:35:57 LinuxServer kernel: .
Apr 26 04:35:57 LinuxServer kernel: ... autorun DONE.
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread got woken up ...
Apr 26 04:35:57 LinuxServer kernel: md2: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 26 04:35:57 LinuxServer kernel: md: recovery thread finished ...
Apr 26 04:35:57 LinuxServer kernel: VFS: Mounted root (ext2 filesystem) readonly.
Apr 26 04:35:57 LinuxServer kernel: change_root: old root has d_count=1
Apr 26 04:35:57 LinuxServer kernel: Trying to unmount old root ... okay

Filippo Carletti

Re: Software Raid Failure
« Reply #1 on: May 07, 2002, 03:07:15 PM »
> Apr 26 04:35:57 LinuxServer kernel: hda: QUANTUM FIREBALLP
> AS20.5, ATA DISK drive
> Apr 26 04:35:57 LinuxServer kernel: hdc: AA@DDP, ATA DISK drive

This is suspicious: hdc should be identified with the same model string as hda.

> Apr 26 04:35:57 LinuxServer kernel: hda: QUANTUM FIREBALLP
> AS20.5, 19595MB w/1902kB Cache, CHS=2498/255/63
> Apr 26 04:35:57 LinuxServer kernel: hdc: AA@DDP, 19457MB
> w/0kB Cache, CHS=39532/16/63

Use the same geometry for both disks.

Nathan Fowler

Re: Software Raid Failure
« Reply #2 on: May 07, 2002, 08:34:36 PM »
You can always re-add the arrays.

raidhotadd /dev/mdY /dev/hdcX

Where Y is the array number and X is the partition number on the drive that was dropped (hdc in your log).
Check your /etc/raidtab to make sure you match each array with the right drive/partition.

To check the status of your array you can always:

cat /proc/mdstat

UU means both mirrors are up, U_ means one is missing from the array.  /proc/mdstat will give you the array name and the attached drive/partition.
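
As a rough sketch of that check, the status field can be picked out of mdstat-style text with awk. The function name degraded_arrays and the sample input are made up for illustration, and the exact layout of /proc/mdstat varies between kernel versions, so treat the pattern as an assumption to verify on the real server.

```shell
#!/bin/sh
# List md arrays whose status field (e.g. [U_]) shows a missing member.
# Reads the file given as an argument, or stdin when none is given.
degraded_arrays() {
    awk '
        /^md[0-9]+ / { name = $1 }              # remember the current array
        match($0, /\[[U_]+\]/) {                # status field such as [UU] or [U_]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name, status
        }
    ' "$@"
}

# Illustrative input, not taken from the poster's machine.
degraded_arrays <<'EOF'
md0 : active raid1 hdc5[1] hda5[0]
      15936 blocks [2/2] [UU]
md1 : active raid1 hda6[0]
      19783936 blocks [2/1] [U_]
EOF
```

On the server itself this would be called as degraded_arrays /proc/mdstat, and any line it prints names an array that still needs a partition added back.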

If you need any more help let me know.
Nathan

Ari Novikoff

Re: Software Raid Failure
« Reply #3 on: May 07, 2002, 08:53:05 PM »
>Apr 26 04:35:57 LinuxServer kernel: hda: QUANTUM FIREBALLP AS20.5, 19595MB w/1902kB Cache, CHS=2498/255/63
>Apr 26 04:35:57 LinuxServer kernel: hdc: AA@DDP, 19457MB w/0kB Cache, CHS=39532/16/63

Both of these drives should be identical. The system, for whatever reason, does not see them as such, and your mirrored disk is reported as being smaller (not by much, mind you) than the primary disk.

This is why your software RAID-1 is failing.
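
To put a number on "not by much", here is a small sketch that multiplies out the two CHS geometries from the log. It assumes 512-byte sectors (so 2048 sectors per MiB); the helper name chs_sectors is made up for illustration. The results only roughly match the sizes the kernel printed, since the kernel need not derive its figure from the CHS values.

```shell
#!/bin/sh
# Sectors addressable through a CHS geometry: cylinders * heads * sectors per track.
chs_sectors() {
    echo $(( $1 * $2 * $3 ))
}

hda=$(chs_sectors 2498 255 63)     # geometry the kernel reported for hda
hdc=$(chs_sectors 39532 16 63)     # geometry the kernel reported for hdc

# Assumes 512-byte sectors, i.e. 2048 sectors per MiB (integer division).
echo "hda: $hda sectors ($(( hda / 2048 )) MiB)"
echo "hdc: $hdc sectors ($(( hdc / 2048 )) MiB)"
echo "difference: $(( hda - hdc )) sectors ($(( (hda - hdc) / 2048 )) MiB)"
```

By this arithmetic hdc comes out 282114 sectors (about 137 MiB) short of hda, which is enough for a partition sized for hda not to fit on hdc.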

Have you set up both drives in the BIOS properly? i.e. do NOT autodetect - set them both to NORMAL?

Ari

Darrell May

Re: Software Raid Failure
« Reply #4 on: May 08, 2002, 05:33:55 AM »
If I read this correctly the problem is because you do not have both drives on the primary controller.  You are using two independent and different IDE controllers.

Think of the primary ATA100 controller and the secondary controller as two separate expansion cards.  They both are completely different and utilize different bus speeds and logic.

Next you never want to mix a high speed IDE device and a slow speed IDE device on the same channel.  The result is the channel will drop down to match the slow speed device.

Answer 1:

Perform a tape backup.  Put both hard drives as MASTER/SLAVE on the primary ATA100 channel.  Put your CD-ROM on the secondary channel.  Perform a fresh RAID1 install and a server-manager restore from tape.

Answer 2:

Switch to hardware raid.  Here is a great SME compatible product that I sell.

http://myezserver.com/mrs-raid.html

Regards,

Darrell

Wil Johnson

Re: Software Raid Failure
« Reply #5 on: May 08, 2002, 09:24:25 PM »
Or try here for the MRS hardware RAID direct from the distributor...

http://mrseries.ca
http://mrseries.ca/sb_c98.htm
http://mrseries.ca/contact.htm

brian read

Re: Software Raid Failure
« Reply #6 on: May 14, 2002, 03:38:46 PM »
Doesn't this answer contradict one of the basic tenets of RAID 1, i.e. to avoid any single point of failure?  So the normal configuration would be to keep each disc on a separate controller?

I am currently experimenting with a 2 x 20GB installation, so I'll try various combinations.

Mike Drummond

Re: Software Raid Failure Resolved
« Reply #7 on: June 24, 2002, 02:20:51 PM »
Thanks for all the good advice and sorry for the delay coming back online.

I appear to have resolved the problem using a combination of:

1. Changed the motherboard BIOS settings so that the two HDDs are manually configured identically (they are identical drives) rather than using the Autodetect option.

2. Checked which drive had been kicked out of the array and then used "raidhotadd" to add the partitions on the second drive back to the array.

The raid array checks out OK now.
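
For anyone repeating step 2, the dropped partitions can be read straight out of the "former device ... is unavailable" lines in the log. The sketch below only prints candidate commands and does not run them. The function name suggest_hotadd is made up for illustration, and the argument order (array first, then partition) is my understanding of raidtools, so check it against the raidhotadd man page on your system before running anything.

```shell
#!/bin/sh
# Print (do not run) one raidhotadd command per partition the kernel dropped.
# Reads the log file given as an argument, or stdin when none is given.
suggest_hotadd() {
    sed -n 's#.*kernel: \(md[0-9]*\): former device \([a-z0-9]*\) is unavailable.*#raidhotadd /dev/\1 /dev/\2#p' "$@" |
        sort -u    # the same message repeats on every autodetect pass
}

# Two lines copied from the log at the top of this thread.
suggest_hotadd <<'EOF'
Apr 26 04:35:57 LinuxServer kernel: md1: former device hdc6 is unavailable, removing from array!
Apr 26 04:35:57 LinuxServer kernel: md0: former device hdc5 is unavailable, removing from array!
EOF
```

Against the real log it would be called as suggest_hotadd /var/log/messages (assuming that is where the messages log lives), and each printed line should be compared with /etc/raidtab before it is typed in as root.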

I checked the software RAID HOWTO and there was no mention of needing to put both drives on the same controller.  Wouldn't that be more risky, as the controller could become somewhat confused if, say, the primary HDD disappeared due to a failure?  BTW these HDDs are the only device on each IDE controller.  The CD-ROM and DAT tape drive are both SCSI.

This home server should be on a UPS but it is not :-( .  Power cuts drop the server.  I suspect that even if the automatic fsck check on reboot is successful, one disk will still be dropped out of the array because there will be a superblock update time inconsistency.  If this is correct, is there any way to automatically set off the rebuild of the RAID array, or does it need to be a manual process?

regards

Mike

brian read

Re: Software Raid Failure Resolved
« Reply #8 on: June 24, 2002, 02:38:13 PM »
Mike

I have had exactly what you describe happen to me, and after fixing the disc with fsck -f /dev/md1, the system re-synchronised itself automatically.

Cheers

Brian