Koozali.org: home of the SME Server

Legacy Forums => Experienced User Forum => Topic started by: ReetP on March 31, 2005, 03:27:04 AM

Title: RAID - disk upgrade problems
Post by: ReetP on March 31, 2005, 03:27:04 AM
Trying to upgrade a pair of software mirrored 40Gb disks to 80s.

Using Acronis True Image 8 I cloned both disks and let it do its thing resizing the main partition. It appears to leave /boot and swap as is.

When the machine reboots I get the following :

Blah...
md : Autodetecting RAID arrays
md : Autorun ...
Mounting root filesytem
EXT3-fs: unable to read superblock
mount error 22 mounting ext3
pivotroot: pivot_root(.sysroot,sysroot/initrd) failed: 2
Freeing unused kernel memory: 120k freed
Kernel panic: No init found. Try passing init= option to kernel


Have read all round about this but still can't figure out what to do to sort it out :-(

Have used tomsrtbt, e2fsck on the drives, mounted them, read all around, checked the initrd folder permissions etc.

It looks like lilo is OK as we get way past that, so I am at a loss to know what else to do. I guess it is something to do with the data (hda2/hdc2) partitions being resized.

If I could I'd run an upgrade which I understand can sometimes sort things out, but as I am at 6.0.1 I am not keen on moving to 6.5 just yet.

Anyone have an idea how to sort this out because I am at a loss ? Worse still, google.co.uk seems out :-(

I don't want to reinstall from backup if I can help it. Surely there must be a relatively easy/straightforward answer ?
Title: Re: RAID - disk upgrade problems
Post by: raem on March 31, 2005, 10:52:38 AM
ReetP

??? Is the problem that the drives do not have the same partition and/or boot information (due to cloning inaccuracies), or is it some other issue?

You could try cloning one drive only and then see if the server starts up in degraded mode.
If it does start up OK, then add the 2nd drive and rebuild the RAID array using the RAID recovery HOWTO, raidhotadd etc etc.
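
A rough sketch of what that re-add step could look like, assuming the layout described later in this thread (md0 on partition 1 for /boot, md1 on partition 2 for /, md2 on partition 3 for swap). The function name hotadd_plan is made up for illustration, and it only prints the raidhotadd commands rather than running them, so the plan can be checked against the real /etc/raidtab first:

```shell
# Print (do not run) the raidhotadd commands that would re-add the
# partitions of one disk to the three mirrors.
# Assumed mapping: md0 <- partition 1, md1 <- partition 2, md2 <- partition 3.
hotadd_plan() {
    disk=$1                      # e.g. hdc
    n=0
    while [ "$n" -lt 3 ]; do
        echo "raidhotadd /dev/md$n /dev/$disk$((n + 1))"
        n=$((n + 1))
    done
}

hotadd_plan hdc
```

If the printed lines match the raidtab, they can be run one at a time as root, watching /proc/mdstat between each.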
Title: RAID - disk upgrade problems
Post by: ReetP on March 31, 2005, 06:56:54 PM
Hi Ray,

You may have a point about the cloning but I'm not sure - I'll have to try it again tonight. One disk goes through the normal routine and hangs as indicated previously whereas the other hangs at LIL-

Not sure if that is of any relevance. Could be the original disks aren't quite right either or Acronis has got something wrong. I did clone the disks one to one (A-A and B-B if you get my meaning). I'll try and  check this tonight.


With the single disk that hangs at LIL- I did try to mount it and run lilo to sort that out but it then complains:

open /dev/md0: No such file or directory

or Fatal: Unable to open /dev/md0

depending on whether you run lilo as chroot

I guess this is because the drive has been mounted as a normal partition and not as a raid partition and presumably something in the boot map or somesuch tells it that it has to be written to /dev/md0 and not /dev/hda1 or /dev/hdc1
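
A small check along those lines, as a sketch only: it reads the boot= line from a lilo.conf and reports whether that device node exists in the environment you are currently booted into. The function name lilo_boot_check and the example path in the comment are assumptions for illustration, not anything SME Server provides.

```shell
# Report whether the device named on the boot= line of a lilo.conf
# exists in the environment we are currently running in.
lilo_boot_check() {
    conf=$1
    dev=$(sed -n 's/^[[:space:]]*boot[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
    if [ -z "$dev" ]; then
        echo "no boot= line found"
    elif [ -e "$dev" ]; then
        echo "$dev exists"
    else
        echo "$dev is missing"
    fi
}
# From a rescue disk this might be called as (hypothetical mount point):
#   lilo_boot_check /mnt/hda2/etc/lilo.conf
```

If it reports the md device as missing, lilo has nowhere to write, which would fit the "Unable to open /dev/md0" message above.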

I'm not experienced enough to figure this one out so any help would be gratefully received.

Let me know if you need further information. If you want me to try a number of steps to see what happens I am happy to oblige. Got a whole weekend of fun ahead :-(

It just strikes me that there should be an easy way to do this and undoubtedly from the various posts it is something that people do (or want to) quite often. It should in theory be quicker than a reinstall / backup restore as well

B. Rgds
John
Title: RAID - disk upgrade problems
Post by: ReetP on March 31, 2005, 09:17:20 PM
Update.

Just cloned hda. I noticed that on automatic it resized hda1 from abt 101 to 220 odd MB

I then chose manual space allocation, kept hda1 (/boot) and hda3 (swap) at the same sizes and only resized hda2

I then tried to boot just off this disk but it failed at LIL-

Also tried unplugging each of the old disks in turn and both seemed to get to the SME boot menu without stopping. Didn't go any further as it will take all night to remirror them both.

Currently cloning hdc. Report follows in a couple of hours......  :-)
Title: RAID - disk upgrade problems
Post by: raem on April 01, 2005, 12:17:35 AM
ReetP

My knowledge is similar to yours about mirrored drives & cloning etc, but I agree it's something that should be easy to do & desirable.

Your (failure) result after cloning only one RAID drive suggests the cloning process is the problem, i.e. it is not copying the boot and partition information correctly.
Search for LILO issues. There have been many posts, and there is also some good reference material on the net; use Google to find it. The LIL result indicates the point in the process at which things are failing, so read up on it.
If you can resolve the LILO issue then there is a good chance the cloning will work as expected.

If cloning only hda does not work, then I doubt that hdc will work either.
Acronis may be the problem. Have you tried other cloning software?

I have used Norton Ghost 2003 to clone single SME drives and had a hang-on-reboot issue. It is mentioned on Symantec's web site, so it's a limitation of the cloning software (when used with Linux formatted drives).

http://mirror.contribs.org/smeserver/contribs/rmitchell/smeserver/howto/Cloning%20drive%20with%20Ghost%20lilo.conf%20fix%20HOWTO%20for%20sme%20server.htm


My other thought (ie workaround) is to use one of the larger drives (say as hdc) and rebuild the array, albeit with a lot of unused space of the new drive. Then repartition that drive to use all the unused space (the System Rescue CD has a good partition tool).
If that works then you would have a working degraded RAID system with a large drive and you could then connect the other larger drive and rebuild the array.
A bit of a two step process but it may work.

I think there will be some material here for a good HOWTO, are you willing to "put it on paper" ?
Title: RAID - disk upgrade problems
Post by: ReetP on April 02, 2005, 02:04:58 AM
Well, I did reply but somehow didn't. Hmm. Must have been late and pushed the wrong button.

Plugged in the cloned hdc and that got as far as the initrd failure. Strange that they failed at different points. Must be something to do with Acronis I guess. I will try and get hold of Ghost 7.5 + and give that a whirl too.

Another thought. The box is an old P3 450 on a Gigabyte board that originally didn't support 80Gb drives. There was a BIOS patch that I applied and the BIOS seems to recognise them OK. Not sure if that makes a difference.

I am quite happy to write a Howto, if I can find out how to..... :-)

I will use this thread as my notepad.

I will try the following

1. Fix this as is. I need to do some work on Lilo and how it works with RAID devices.

How can it write to /dev/md0 when the drives have not been RAID mounted e.g booting from a rescue disk and then mounting the drives? Not sure about that at all.

I tried changing lilo.conf from /dev/mdx to /dev/hdax, ran lilo which reports that it installed, and then changed it back whereupon it hangs at LIL- again.

I'll try it again but this time leave lilo.conf with the changes in place. Not sure how you then convert back to an array.

2. Try Ghost 7.5+ once I get hold of it

3. Clone at the original size (no partition resizing) and then resize. Then add in the second drive and rebuild the array

Grateful for any ideas on this one. Someone out there must be a lilo guru........

B. Rgds
John
Title: RAID - disk upgrade problems
Post by: raem on April 02, 2005, 06:19:31 AM
ReetP

> Someone out there must be a lilo guru........

Well as Charlie Brady says, google is your friend, he knows more than me and works 24 hours a day !

Searching contribs.org is also your friend as others have possibly already experienced and answered your problem.

Here are some search results that may help you.
The last one (or last few) is where I would head first.
Check some of the contribs.org results.

Just to be sure have you booted to a floppy, checked and modified if necessary the /etc/lilo.conf file and then run the lilo command (as per my HOWTO)?
http://mirror.contribs.org/smeserver/contribs/rmitchell/smeserver/howto/Cloning%20drive%20with%20Ghost%20lilo.conf%20fix%20HOWTO%20for%20sme%20server.htm



LILO
google search results

http://www.google.com.au/search?hl=en&q=LILO&btnG=Google+Search&meta=

http://www.tldp.org/HOWTO/LILO.html

http://www.acm.uiuc.edu/workshops/linux_install/lilo.html


Linux LILO problems
google search results

http://www.google.com.au/search?hl=en&q=Linux+LILO+problems&meta=


contribs.org search
LIL

There are many more than these

http://forums.contribs.org/index.php?topic=25082.msg100884#msg100884

http://no.longer.valid/phpwiki/index.php/TroubleshootingFAQ

http://no.longer.valid/phpwiki/index.php/TroubleshootingFAQ#boot

The "here" link on the page above is incorrect, so here is the real site URL

http://linux-newbie.sunsite.dk/


Look at section 5.1.2 in this next link

http://linux-newbie.sunsite.dk/html/lnag.html#5.1.Startup%20Issues%20(LILO%20and%20GRUB)|outline

This quote is related to your LIL... issue

"When LILO loads itself, it displays the word LILO. Each letter is printed before or after performing some specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem. [...]

LI [...] This is caused either by geometry mismatch or by moving /etc/lilo/boot.b without running the map installer.

LIL [...] This is typically caused by media failure or geometry mismatch."

The geometry means the number of sectors/heads/cylinders used in the hard drive configuration of your BIOS. Hope this helps!

It is a very good idea to have a handbook for Linux or at least a general UNIX handbook. Handbooks for Windows are useless, handbooks for Linux are great! "Red Hat Linux Unleashed" is a very good handbook but I am sure there are many other equally good ones.

With a LILO error like above, you can boot your machine using a Linux or DOS boot floppy. There seems to be several general possibilities to correct such a LILO error, depending on what is wrong:

If LILO simply got corrupted (does not seem very common), you can remove and re-install it. You can remove LILO by running under Linux:

lilo -u /dev/hda

or, under DOS:

FDISK/MBR

which rewrites the hard drive master boot record (MBR), in which LILO resides, and replaces it with "clean" DOS stuff. You will lose access to Linux if you rebooted your computer after removing LILO (if this happened, you can boot Linux from the floppy and re-install LILO on top of the DOS MBR).

To re-install LILO, simply re-run the command lilo (as root).
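
The letter-by-letter diagnosis quoted above can be turned into a small lookup, sketched below. The LI and LIL entries paraphrase the quote; the LIL- and LILO entries are from the LILO documentation as I recall it, so treat them as an assumption and check them against the LILO README. The function name lilo_hang_hint is made up for illustration.

```shell
# Map the characters LILO managed to print before hanging to the usual
# explanation. Only the cases discussed in this thread are covered.
lilo_hang_hint() {
    case "$1" in
        LI)    echo "second stage loaded but not executed: geometry mismatch or boot.b moved" ;;
        LIL)   echo "cannot load descriptor table: media failure or geometry mismatch" ;;
        LIL-)  echo "descriptor table corrupt: geometry mismatch or map file moved" ;;
        LILO)  echo "all parts of LILO loaded" ;;
        *)     echo "not covered here" ;;
    esac
}

lilo_hang_hint LIL-
```

For a cloned disk that stops at LIL-, both listed causes point the same way: the map written by lilo no longer matches where things sit on the new disk, so lilo has to be re-run against it.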
Title: RAID - disk upgrade problems
Post by: raem on April 03, 2005, 04:20:52 AM
This thread may also be of use.

http://forums.contribs.org/index.php?topic=26609.0
Title: RAID - disk upgrade problems
Post by: ldkeen on April 03, 2005, 07:19:55 AM
John,
You may be able to glean some info from the following howto. It has info on resizing the filesystem to avoid corrupt superblock errors etc. I think the only difference between your setup and ours was you used Acronis and we used dd. Here's the howto:
Replacing a Damaged Scsi Software Raid 1 With a Larger Ide Raid 1 on E-Smith

Note: To get the best performance from raided ide drives they should be set as masters on the primary and secondary channels.
ie. hda and hdc

Step-1 - boot knoppix from CD and image good scsi drive to new ide drives

Pull up terminal as root
$ su

Image drives, where sdx is the remaining good drive either sda or sdb
# dd if=/dev/sdx of=/dev/hda
# dd if=/dev/sdx of=/dev/hdc

This step takes quite some time (in the order of hours) so be patient
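
A sketch of how the copy could be checked afterwards, tried out on ordinary files standing in for disks. The function name clone_and_verify is hypothetical, and the -n option of cmp is assumed to be available (it is in GNU cmp). Because the target is larger than the source, only the first BYTES bytes are compared.

```shell
# Copy SRC onto DST in 1 MiB blocks, then compare the first BYTES bytes.
# BYTES is the size of the source; for a real disk it could come from
# `blockdev --getsize64`, here the caller passes it in.
# conv=notrunc keeps a larger regular-file target at its full size,
# which mimics writing onto a bigger disk.
clone_and_verify() {
    src=$1; dst=$2; bytes=$3
    dd if="$src" of="$dst" bs=1M conv=notrunc 2>/dev/null || return 1
    cmp -s -n "$bytes" "$src" "$dst"
}
# Example with image files (hypothetical names):
#   clone_and_verify old-disk.img new-disk.img 4096 && echo "copy matches"
```

The function returns success only if dd completed and the copied region is byte-for-byte identical.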

Step-2 - move and resize partitions to use larger hard drive space

Reboot knoppix with no swap device
# knoppix noswap

Pull up terminal as root
$ su

Use graphical interface to move swap partition to end of the drives
# qtparted
select hda
right click on the swap partition and select move
type 0 in the "Free space after" box and press tab.
click OK
click Device and select commit

repeat for hdc

Resize root partition to take up remaining free space (qtparted doesn't support ext3 so must use parted)
# parted /dev/hda
(parted) print

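write down the starting positions of the root (Minor 2) and swap (Minor 3) partitions and call them r & s. The resize command takes the minor number, the start and the new end, so root keeps its start (r) and now ends where swap begins (s)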
(parted) resize 2 r s
(parted) quit

repeat for hdc
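
To avoid copying r and s by hand, the two values can be pulled out of saved "print" output. This is only a sketch: the function name parted_resize_args is made up, and it assumes the column layout of the parted versions of that era (Minor in column 1, Start in column 2). Note also that, as far as I know, the resize command was removed in parted 3.0, so this applies to the older parted shipped on Knoppix at the time.

```shell
# Read the text of parted's "print" output on stdin and emit the resize
# command for the root partition: start of Minor 2 up to start of Minor 3.
parted_resize_args() {
    awk '$1 == 2 { r = $2 }
         $1 == 3 { s = $2 }
         END     { print "resize 2 " r " " s }'
}
# Example: paste the print output into a file, then
#   parted_resize_args < print-output.txt
```

The result is a line to type at the (parted) prompt; check it against the print output before using it.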


Step-3 - edit raidtab file and copy to knoppix file system

Mount root partition of new ide drive
right click hda2 icon on desktop
click mount
right click hda2 icon again
click Actions, Change read write mode
click ok to change to write

# mcedit /mnt/hda2/etc/raidtab
Change all sda to hda and all sdb to hdc and save

Now copy the raidtab file from your real root filesystem to the current root filesystem.
# cp /mnt/hda2/etc/raidtab /etc/raidtab
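
If you would rather not edit by hand, the same substitution can be sketched as a sed filter. The function name retarget_raidtab is hypothetical; it reads a raidtab on stdin and writes the changed text to stdout, leaving the input file alone.

```shell
# Rewrite SCSI device names to the IDE ones used in this howto:
# sda -> hda and sdb -> hdc. Everything else passes through unchanged.
retarget_raidtab() {
    sed -e 's#/dev/sda#/dev/hda#g' -e 's#/dev/sdb#/dev/hdc#g'
}
# Example: preview the result before changing anything
#   retarget_raidtab < /mnt/hda2/etc/raidtab
```

Previewing first makes it easy to confirm that only the device lines changed.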


Step-4 - unmount filesystems

In order to start the raid devices, and sync the drives, it is necessary to unmount all the temporary filesystems.

# umount /mnt/hd*
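
A quick way to confirm nothing on the IDE drives is still mounted, as a sketch (the function name still_mounted is made up). It lists any /dev/hd* entries found in a mounts table, by default /proc/mounts; no output means the drives are free.

```shell
# List /dev/hd* devices that still appear in the mounts table.
# An optional first argument names a different file to read.
still_mounted() {
    mounts=${1:-/proc/mounts}
    awk '$1 ~ /^\/dev\/hd/ { print $1 " on " $2 }' "$mounts"
}
```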


Step-5 - start raid devices

Because there are filesystems on /dev/hda1, /dev/hda2 and /dev/hda3 it is necessary to force the start of the raid device.
# mkraid --really-force /dev/md2

You can check the completion progress by cat'ing the /proc/mdstat file. It shows you the status of the raid device and the percentage synced so far.
Again this step can take quite some time (in the order of hours) so be patient.

Continue with / and /boot

# mkraid --really-force /dev/md1
# mkraid --really-force /dev/md0

The md driver syncs one device at a time.
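
For watching the sync without reading the whole file each time, here is a sketch that prints just the percentage from any resync or recovery line. The function name resync_progress is hypothetical, and the parsing assumes the usual "resync = 12.5% (...)" layout of /proc/mdstat.

```shell
# Print the percentage shown on any resync/recovery line of an mdstat
# file. An optional first argument names a different file to read.
resync_progress() {
    mdstat=${1:-/proc/mdstat}
    awk '/resync|recovery/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /%$/) print $i
         }' "$mdstat"
}
```

No output means no sync is in progress (or it has finished), so it is safe to carry on with the next mkraid.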


Step-6 - resize filesystem

When we created the raid device, the usable part of the physical partition became slightly smaller because a second superblock (the RAID one) is stored at the end of the partition.
If you reboot the system now, the reboot will fail with an error indicating the superblock is corrupt.
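
To put a number on "slightly smaller", here is a sketch of the arithmetic, assuming the old-style (version 0.90) persistent superblock, which to my knowledge is what raidtools creates: the partition size is rounded down to a multiple of 64 KiB and one further 64 KiB is reserved at the end. The function name md_usable_sectors is made up for illustration.

```shell
# Usable data size, in 512-byte sectors, of a partition that carries a
# version 0.90 md superblock (assumption: that is the format in use).
md_usable_sectors() {
    part_sectors=$1
    reserved=128   # 64 KiB expressed in 512-byte sectors
    echo $(( (part_sectors / reserved) * reserved - reserved ))
}

md_usable_sectors 1000   # prints 768
```

A filesystem that fills the whole partition therefore extends past the end of the md device, which is why it has to be shrunk to fit before rebooting.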

Resize them prior to the reboot

You will be required to fsck each of the md devices except the swap device. The -f flag is required to force fsck to check a clean filesystem.
This will generate the same error about inconsistent sizes and a possibly corrupted superblock. Say N to 'Abort?'.
# e2fsck -f /dev/md0
# e2fsck -f /dev/md1

# resize2fs /dev/md0
# resize2fs /dev/md1

Check again to be sure. Should be no errors now.
# e2fsck -f /dev/md0
# e2fsck -f /dev/md1

Remake swap space
# mkswap -c /dev/md2

Step-7 - Reboot and voilà.

# reboot

Regards Lloyd
Title: RAID - disk upgrade problems
Post by: ReetP on April 04, 2005, 12:25:11 PM
Ray,

thanks for that. It seems that the problem with LILO is writing to a RAID partition e.g. /dev/md0 when you haven't mounted the drives as a RAID, e.g. as /dev/hda1 & /dev/hdc1

Mounting & running LILO on single disks doesn't seem to be an issue.

I shall have some time in the afternoons this week to do some research on it.

Lloyd,

this looks interesting as well. What version of Knoppix were you using ?

The key to this is that you mounted the RAID under Knoppix and file checked it there. This is where I had been getting stuck I think, particularly if you need to run lilo as per my comments above.

I am going to try Knoppix and see if I can mount the  disks as raid devices and then have a play.

Will report back soonest.

B. Rgds
John
Title: RAID - disk upgrade problems
Post by: ldkeen on April 04, 2005, 02:21:27 PM
John,
We spent almost 4 weeks testing and retesting the above procedure on various scsi and ide setups before applying it to a production system. We managed to move a complete  e-smith setup (including all mods - hylafax, ipsec etc) from a 36GB raid1 scsi over to an 80GB raid1 ide in about 4 hours. I think from memory we used knoppix 3.4 but I don't think there'd be any difference between versions. Just follow the procedure very closely and you should be right.
Regards Lloyd
Title: RAID - disk upgrade problems
Post by: mbachmann on April 04, 2005, 04:25:58 PM
Lloyd, would you type your howto into this wiki i've prepared for you here: http://no.longer.valid/phpwiki/index.php/Migrating%20a%20Raid%201%20to%20bigger%20harddisks%20with%20Knoppix
Title: RAID - disk upgrade problems
Post by: ldkeen on April 04, 2005, 10:45:47 PM
mbachmann,
Could you e-mail me the user/pass and I'll do it throughout the day.
Lloyd
Title: RAID - disk upgrade problems
Post by: raem on April 05, 2005, 12:28:50 AM
Lloyd

> Could you e-mail me the user/pass

I think anyone can do it (edit wikis etc) as long as you are a registered contribs.org user.
Title: RAID - disk upgrade problems
Post by: ldkeen on April 05, 2005, 01:40:35 AM
Ray,
I tried my usual login/pass several times but it didn't work?? I thought maybe they had to set you up with a special account to edit the wiki.
Lloyd

It's alright - worked it out.
Title: RAID - disk upgrade problems
Post by: raem on April 05, 2005, 02:03:40 AM
ldkeen

> It's alright - worked it out.

Sorry I missed that last line, just edited it myself (test only)
I'll change it back now
Title: RAID - disk upgrade problems
Post by: mbachmann on April 05, 2005, 08:50:41 AM
Thank you, your work is appreciated very much. RAID migration is a difficult thing, and a howto from someone who spent 4 weeks testing and sorting out the process properly is very helpful. Perhaps you should point that out in the howto.

The easiest way to edit the wiki: just log into the Forums, then you can automatically edit the wiki. There is no special login.

I corrected some minor spelling errors/formatting.
Title: RAID - disk upgrade problems
Post by: ReetP on April 05, 2005, 10:29:21 PM
Finally managed to get a test bench machine setup.

Tried to clone with Ghost 2003 to no avail. Same 'kernel panic : No init found'

It occurred to me when I ran Ghost and it looked at the partitions that it did not recognise the partition type. Now I thought that Ghost 2003 recognised EXT3 partitions. But does it recognise RAID partitions ? And presumably the same goes for Acronis.

Just had a look and found this :

http://service1.symantec.com/SUPPORT/ghost.nsf/docid/1999010613522725?Open&src=sg&docid=2002092510522725&nsf=ghost.nsf&view=8f7dc138830563c888256c2200662ecd/dfb4b017218165c088256c3f00622fae?opendocument&prod=norton%20ghost&ver=2003%20for%20windows%202000/nt/me/98&dtype=&prod=Norton%20Ghost&ver=2003%20for%20Windows%202000/NT/Me/98/XP&osv=&osv_lvl

So it looks as if you cannot use Ghost (or probably Acronis) to directly/easily clone RAID partitions. I am going to try and follow what Lloyd did but use Ghost instead of dd and then follow on with Knoppix.

Anyone else got any thoughts on this ?

Do we have a Wiki page on HowNOTto ;-)
Title: RAID - disk upgrade problems
Post by: ReetP on April 06, 2005, 01:29:04 AM
Ho hum.....

http://service1.symantec.com/SUPPORT/ghost.nsf/8f7dc138830563c888256c2200662ecd/3bb8390b87c494c088256afe00565841?OpenDocument&prod=Norton%20Ghost&ver=2003%20for%20Windows%202000/NT/Me/98/XP&src=sg&pcode=ghost&svy=&csm=no

What fun :-)

I presume that this means convert back to EXT2 with a journal ?
Title: RAID - disk upgrade problems
Post by: ReetP on April 12, 2005, 06:29:31 PM
Lloyd,

Followed your Howto with a few changes and it works admirably.

Not sure if you want to work any of the following into it - only 1 reboot necessary :-) :

For IDE drives, dd from /dev/hdx (instead of /dev/sdx) to /dev/hdy

I left one of the existing drives on /dev/hdb and hung the two new ones on /dev/hda & /dev/hdc

Boot knoppix noswap

Mount hdb, copy the raidtab to local /etc/raidtab, unmount /dev/hd*

IF you are not changing the drive assignments (e.g. you are moving from IDE - IDE) you do not need to alter the raidtab

dd'd hdb to hda & hdc

Carry on with the rest of the howto

Not sure what the resize2fs does. On mine, I left /dev/md0 (boot partition) as is and did not resize.

resize2fs on md1 did its thing, but if I recall correctly, it reported 'nothing to do' on md0. Will check again when I do the real thing this weekend.
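
As far as I understand it, resize2fs run without a size argument makes the filesystem match the size of the device it sits on. A simplified model of that decision is sketched below; the function name resize_action and its three arguments are made up for illustration, and the real tool obviously does far more than compare two numbers.

```shell
# Simplified model: compare the filesystem's current block count with the
# number of blocks that fit on the device, and say what a resize would do.
resize_action() {
    fs_blocks=$1; dev_bytes=$2; block_size=$3
    target=$((dev_bytes / block_size))
    if [ "$fs_blocks" -eq "$target" ]; then
        echo "nothing to do"
    elif [ "$fs_blocks" -gt "$target" ]; then
        echo "shrink to $target blocks"
    else
        echo "grow to $target blocks"
    fi
}

resize_action 100 409600 4096
```

One plausible reading of the md0 result: /boot was copied from a disk that was already a RAID member and its partition was not resized, so the filesystem already matched the md device and there was nothing left to change.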

Hopefully I will get time to try again with Acronis & Ghost as I now have a nice test rig for it.