Koozali.org: home of the SME Server

RAID drive replacement problem

John Crisp

Posted on: January 27, 2003, 01:13:09 AM
Hi,

I have a drive replacement problem on my 5.1.2 server. Last year I replaced hdc with a new drive when the old one failed, using a 60G drive in place of a 40G and following Darrell May's excellent RAID recovery howto. I used exactly the same partition sizes and left the additional space for a rainy day. The only thing I didn't do was run the RAID test at the end...

Now hda is slowly dying and needs replacing; it has already been dropped from the array. The system boots off hda and runs off hdc quite happily. However, if I pull hda and leave the machine to boot from hdc, LILO only gets as far as 'L'. According to the LILO documentation, a bare 'L' means the first-stage loader on hdc has started but cannot load the second-stage loader, which usually points to a disk geometry mismatch or a media error (a map file problem would normally show up later, as 'LIL'). I would guess that the system is booting off hda and then, when it tries to build the array, it fails and falls back to hdc.
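LILO prints the letters of its own name one at a time as each loading stage succeeds, so the point where the output stops tells you which stage failed. The table below paraphrases the boot-error section of the LILO user documentation; the diagnose helper is a hypothetical convenience for looking the prefix up, not part of LILO itself.

```python
# Meanings paraphrased from the "boot errors" section of the LILO docs.
LILO_STAGES = {
    "": "Nothing loaded: LILO is not installed or its partition is not active.",
    "L": "First stage started but could not load the second stage; "
         "usually a media failure or a disk geometry mismatch.",
    "LI": "Second stage loaded but could not be executed; geometry mismatch "
          "or /boot/boot.b moved without re-running /sbin/lilo.",
    "LIL": "Second stage running but could not read the descriptor table "
           "from the map file; media failure or geometry mismatch.",
    "LIL?": "Second stage loaded at the wrong address; geometry mismatch "
            "or /boot/boot.b moved without re-running /sbin/lilo.",
    "LIL-": "Descriptor table is corrupt; geometry mismatch or /boot/map "
            "moved without re-running /sbin/lilo.",
    "LILO": "All parts of LILO loaded successfully.",
}


def diagnose(screen_text: str) -> str:
    """Explain where LILO stopped, given the text left on screen.

    A bare 'L' may be followed by two-digit hex error codes,
    so only the first word on screen is looked up.
    """
    words = screen_text.split()
    prefix = words[0] if words else ""
    return LILO_STAGES.get(prefix, "Unrecognised LILO output: " + screen_text)


print(diagnose("L"))
```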

When I run the command from the howto for installing LILO onto the second drive, I get the following:


Welcome to the Mitel Networks SME Server V5.1.2
Kernel 2.2.19-7.0.8 on an i686

# /sbin/lilo -v -C /root/raidmonitor/lilo.conf -b /dev/hdc

LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
'lba32' extensions Copyright (C) 1999,2000 John Coffman

Ignoring entry 'boot'
Reading boot sector from /dev/hdc
Warning: /dev/hdc is not on the first disk
Merging with /boot/boot.b
Mapping message file /boot/mitel.pcx
Boot image: /boot/vmlinuz-2.2.19-7.0.8
Mapping RAM disk /boot/initrd-2.2.19-7.0.8.img
Added esmith *
/boot/boot.1600 exists - no backup copy made.
Writing boot sector.
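The line "Warning: /dev/hdc is not on the first disk" is LILO noting that it is mapping hdc as a secondary BIOS drive, whereas once hda is pulled hdc will be the first BIOS drive (0x80). A commonly used remedy, not tested on this SME 5.1.2 / LILO 21.4-4 combination, is to add a disk= section with bios=0x80 to the copy of lilo.conf before running the howto's /sbin/lilo -v -C ... -b /dev/hdc command. The sketch below only rewrites text; add_bios_mapping is a hypothetical helper, and the assumption that the BIOS will present hdc as 0x80 once hda is removed should be checked in the BIOS setup.

```python
def add_bios_mapping(conf_text: str, disk: str = "/dev/hdc", bios: str = "0x80") -> str:
    """Return a copy of lilo.conf with a disk=/bios= section after boot=.

    disk= is a global lilo.conf option, so it is placed right after the
    boot= line, safely before the first image= section.
    Assumption: the chosen disk becomes BIOS drive 0x80 once hda is gone.
    """
    out = []
    inserted = False
    for line in conf_text.splitlines():
        out.append(line)
        if not inserted and line.strip().startswith("boot="):
            out.append(f"disk={disk}")
            out.append(f"    bios={bios}")
            inserted = True
    if not inserted:
        raise ValueError("no boot= line found; cannot place the disk= section")
    return "\n".join(out) + "\n"


sample = """boot=/dev/md0
map=/boot/map
install=/boot/boot.b
image=/boot/vmlinuz-2.2.19-7.0.8
        label=esmith
        root=/dev/md1
"""
print(add_bios_mapping(sample))
```

The rewritten text would then be saved to a file and passed to /sbin/lilo with -C and -b /dev/hdc, exactly as in the howto command above; the original /etc/lilo.conf is left alone.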


If I just run lilo, I get the following:

[root@server /root]# /sbin/lilo -v
LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
'lba32' extensions Copyright (C) 1999,2000 John Coffman

boot = /dev/hdc, map = /boot/map.1605
Reading boot sector from /dev/hdc
Merging with /boot/boot.b
Mapping message file /boot/mitel.pcx
Boot image: /boot/vmlinuz-2.2.19-7.0.8
Mapping RAM disk /boot/initrd-2.2.19-7.0.8.img
Added esmith *
/boot/boot.1600 exists - no backup copy made.
Writing boot sector.
[root@server /root]#

One thing I noticed was three map files in /boot: map, dated today; map.1305, dated last year (probably from when I replaced the drive previously); and map.1605, also dated today.
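The name map.1605 also appears in the lilo -v output above ("boot = /dev/hdc, map = /boot/map.1605"). One plausible reading, an inference from this output rather than documented LILO behaviour, is that the suffix is the hex device number of the RAID member LILO wrote to: 0x16 = 22 is the major number of the secondary IDE channel (hdc/hdd) and 0x05 is partition 5, i.e. hdc5, which the mdstat report further down lists as the member of md0, the boot= device in lilo.conf. The hypothetical helper below performs that decoding for the IDE devices on this machine.

```python
# Linux major numbers of the first two IDE channels:
# 3 -> hda/hdb, 22 -> hdc/hdd. Each disk gets 64 minors (0 = whole disk).
IDE_MAJORS = {3: ("hda", "hdb"), 22: ("hdc", "hdd")}


def decode_map_suffix(suffix: str) -> str:
    """Decode a suffix like '1605' as a hex major/minor pair (assumption)."""
    number = int(suffix, 16)
    major, minor = number >> 8, number & 0xFF
    if major not in IDE_MAJORS:
        return f"major {major}, minor {minor} (not a hda-hdd device)"
    master, slave = IDE_MAJORS[major]
    disk = master if minor < 64 else slave
    partition = minor % 64
    return f"/dev/{disk}{partition}" if partition else f"/dev/{disk}"


print(decode_map_suffix("1605"))  # -> /dev/hdc5
```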

The question is: how do I get hdc to boot so that I can sling in the new drive as hda and re-mirror the lot? I am loath to tinker too much myself at the moment for fear that the machine won't come up at all.

Any advice gratefully received. From the sound of things, the IBM 'Deathstar' is singing its death song, and it won't be long before it dies entirely and leaves me in a right mess!

B. Rgds

John



Various reports:




[root@server raidmonitor]# fdisk /dev/hdc

The number of cylinders for this disk is set to 7476.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hdc: 255 heads, 63 sectors, 7476 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *         1        33    265041   fd  Linux raid autodetect
/dev/hdc2            34      5005  39937590    5  Extended
/dev/hdc5            34        35     16033+  fd  Linux raid autodetect
/dev/hdc6            36      5005  39921493+  fd  Linux raid autodetect

Command (m for help):
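The fdisk listing shows how much of the 60G drive the mirrored partitions actually use. A quick check of the arithmetic, using only the figures printed above (cylinders of 16065 * 512 bytes, 7476 cylinders, last partition ending at cylinder 5005) and cross-checked against the sector offsets in the sfdisk dump further down:

```python
# Geometry from the fdisk listing of /dev/hdc.
HEADS, SECTORS, SECTOR_BYTES = 255, 63, 512
CYLINDER_BYTES = HEADS * SECTORS * SECTOR_BYTES  # 16065 * 512 bytes

total_cylinders = 7476
last_used_cylinder = 5005  # end of hdc2 (extended) and hdc6

disk_bytes = total_cylinders * CYLINDER_BYTES
used_bytes = last_used_cylinder * CYLINDER_BYTES
spare_bytes = disk_bytes - used_bytes

# Cross-check with sfdisk: hdc2 starts at sector 530145 and is 79875180 sectors long.
assert used_bytes == (530145 + 79875180) * SECTOR_BYTES

for label, value in [("disk", disk_bytes), ("used", used_bytes), ("spare", spare_bytes)]:
    print(f"{label:5} {value / 1e9:6.2f} GB")
```

That works out to about 61.5 GB for the whole drive, 41.2 GB inside the mirrored partitions and 20.3 GB left unallocated (decimal gigabytes). hdc1 and hdc5 (the member of md0) both end by cylinder 35, well below the 1024-cylinder limit the fdisk warning refers to.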

-------------------------------------------------------------
...................
RAID Monitor Report
...................

Current /proc/mdstat saved in /root/raidmonitor/mdstat:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdc1[0] 264960 blocks [2/1] [U_]
md0 : active raid1 hdc5[0] 15936 blocks [2/1] [U_]
md1 : active raid1 hdc6[0] 39921408 blocks [2/1] [U_]
unused devices: <none>
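In this /proc/mdstat format, [2/1] means an array expects two members but only one is active, and [U_] marks the first slot as up and the second as missing, so all three arrays are running degraded on hdc alone. A small parser, written for the single-line layout of this 2.2-kernel report (newer kernels print the block count on a separate line, so the regular expression would need adjusting), can pick out the degraded arrays; degraded_arrays is a hypothetical helper name.

```python
import re

# One array per line, as in the 2.2-kernel report above.
ARRAY_LINE = re.compile(
    r"^(md\d+) : active (\w+) (.+?) (\d+) blocks \[(\d+)/(\d+)\] \[([U_]+)\]"
)


def degraded_arrays(mdstat_text: str) -> list:
    """Return (array, members, status) for arrays with a missing member."""
    found = []
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line.strip())
        if not match:
            continue
        name, _level, members, _blocks, wanted, active, status = match.groups()
        if int(active) < int(wanted) or "_" in status:
            found.append((name, members.split(), status))
    return found


report = """md2 : active raid1 hdc1[0] 264960 blocks [2/1] [U_]
md0 : active raid1 hdc5[0] 15936 blocks [2/1] [U_]
md1 : active raid1 hdc6[0] 39921408 blocks [2/1] [U_]"""

for array, members, status in degraded_arrays(report):
    print(array, members, status)
```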

Current partition info saved in /root/raidmonitor/sfdisk.out:

# partition table of /dev/hda
unit: sectors

/dev/hda1 : start=       63, size=  530082, Id=fd, bootable
/dev/hda2 : start=   530145, size=79875180, Id= 5
/dev/hda3 : start=        0, size=       0, Id= 0
/dev/hda4 : start=        0, size=       0, Id= 0
/dev/hda5 : start=   530208, size=   32067, Id=fd
/dev/hda6 : start=   562338, size=79842987, Id=fd
# partition table of /dev/hdc
unit: sectors

/dev/hdc1 : start=       63, size=  530082, Id=fd, bootable
/dev/hdc2 : start=   530145, size=79875180, Id= 5
/dev/hdc3 : start=        0, size=       0, Id= 0
/dev/hdc4 : start=        0, size=       0, Id= 0
/dev/hdc5 : start=   530208, size=   32067, Id=fd
/dev/hdc6 : start=   562338, size=79842987, Id=fd
 /dev/hdd: unrecognized partition
No partitions found
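The two dumps show that hda and hdc carry identical partition tables, which is what lets hdc's partitions stand in for hda's in the mirrors. When the replacement drive goes in as hda, it needs partitions at least as large as hdc's before it can be re-mirrored; copying the same layout, as was done last time, is the simplest way. Comparing two dumps in this sfdisk text format can be sketched as follows; parse_sfdisk is a hypothetical helper that only reads the format shown above.

```python
import re

# Matches lines like "/dev/hda1 : start=       63, size=  530082, Id=fd, bootable".
ENTRY = re.compile(
    r"\s*/dev/[a-z]+(\d+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+),\s*Id=\s*(\w+)"
)


def parse_sfdisk(dump: str) -> dict:
    """Map partition number -> (start, size, Id) for one disk's dump."""
    table = {}
    for line in dump.splitlines():
        match = ENTRY.match(line)
        if match:
            number, start, size, part_id = match.groups()
            table[int(number)] = (int(start), int(size), part_id)
    return table


# Non-empty entries from the report above (empty entries 3 and 4 omitted).
hda = """/dev/hda1 : start=       63, size=  530082, Id=fd, bootable
/dev/hda2 : start=   530145, size=79875180, Id= 5
/dev/hda5 : start=   530208, size=   32067, Id=fd
/dev/hda6 : start=   562338, size=79842987, Id=fd"""
hdc = """/dev/hdc1 : start=       63, size=  530082, Id=fd, bootable
/dev/hdc2 : start=   530145, size=79875180, Id= 5
/dev/hdc5 : start=   530208, size=   32067, Id=fd
/dev/hdc6 : start=   562338, size=79842987, Id=fd"""

same = parse_sfdisk(hda) == parse_sfdisk(hdc)
print("layouts match" if same else "layouts differ")
```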

Current /etc/lilo.conf saved in /root/raidmonitor/lilo.conf:

#------------------------------------------------------------
# DO NOT MODIFY THIS FILE! It is updated automatically by the
# e-smith server and gateway software. Instead, modify the source
# template in the /etc/e-smith/templates directory. For more
# information, see http://www.e-smith.org.
#
# copyright (C) 1999, 2000 e-smith, inc.
#------------------------------------------------------------

boot=/dev/md0
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
message=/boot/mitel.pcx
linear
default=esmith

image=/boot/vmlinuz-2.2.19-7.0.8
        label=esmith
        read-only
        root=/dev/md1


        initrd=/boot/initrd-2.2.19-7.0.8.img


#------------------------------------------------------------
# TEMPLATE END
#------------------------------------------------------------