RAID issue - need help recovering degraded array

jahlewis

151
+0/-0

RAID issue - need help recovering degraded array

« Reply #1 on: October 17, 2006, 03:52:27 AM »

OK, I'm reading like crazy here...

As I interpret this, /dev/md1 is broken, with /dev/hda2 not being mirrored.

However, if I try to add hda2 back to md1, I get an invalid argument error:

Code: [Select]

[root@gluon]# mdadm -a /dev/md1 /dev/hda2
mdadm: hot add failed for /dev/hda2: Invalid argument

So... I tried removing the partition first:

Code: [Select]

[root@gluon]# mdadm /dev/md1 -r /dev/hda2 -a /dev/hda2
mdadm: hot remove failed for /dev/hda2: No such device or address

So now what? the / partition on hda is hosed? How do I rebuild that? I'm quickly diving out of my depth here...

Logged

............

jahlewis

151
+0/-0

RAID issue - need help recovering degraded array

« Reply #3 on: October 17, 2006, 04:40:40 AM »

Last post tonight...

Reading this http://www.linuxquestions.org/questions/showthread.php?t=429857

Suggests running mdadm -C if all else fails... So I did:

Code: [Select]

[root@gluon init.d]# mdadm -C /dev/md1 -l1 -n2 /dev/hda2 /dev/hdb2
mdadm: /dev/hda2 appears to contain an ext2fs file system
    size=155918784K  mtime=Mon Oct 16 18:27:39 2006
mdadm: /dev/hda2 appears to be part of a raid array:
    level=1 devices=2 ctime=Thu Jan 12 19:21:55 2006
mdadm: /dev/hdb2 appears to contain an ext2fs file system
    size=155918784K  mtime=Sun Oct 15 20:33:12 2006
mdadm: /dev/hdb2 appears to be part of a raid array:
    level=1 devices=2 ctime=Thu Jan 12 19:21:55 2006
Continue creating array?

And I chickened out. Afraid of wiping the contends of the surviving partition. Does anyone know if I chose to continue, what whould happen?

Thanks for your patience

Logged

............

crazybob

894
+0/-0

RAID issue - need help recovering degraded array

« Reply #4 on: October 17, 2006, 02:17:33 PM »

When I had a drive that had a failed section in the raid, I removed the problem drive, and ran a program called HDD regenerator (http://www.dposoft.net/) on the drive. When I replaced the drive, it was detected as a new drive, and the raid was rebuilt without issue. You could run the program with the drive in place depending on how long you care to go without the sever being available. Hdd regenerator can take quite a while depending on drive sizw

Logged

If you think you know whats going on, you obviously have no idea whats going on!

jahlewis

151
+0/-0

RAID issue - need help recovering degraded array

« Reply #5 on: October 17, 2006, 10:26:29 PM »

Couple of things...

The drives are OK, since the other partitions on hda are working, so it is just a bad partition (hda2) that is attached to the md1 mirror set.

I have no idea which is hda and which is hdb in my system, so wouldn't know which to unplug.

Is the best course to stop the mirroring, make the the hdb disk the primary, reformat hda, then add it back to the mirror? If this is the case, can anyone point me in the right direction?

Thanks.

Logged

............

ldkeen

405
+0/-0

RAID issue - need help recovering degraded array

« Reply #6 on: October 17, 2006, 11:02:14 PM »

jahlewis,
Can you post the partition info from /dev/hda using

Code: [Select]

#fdisk /dev/hda followed by "p" to print the info.

Quote from: "jahlewis"

I have no idea which is hda

Both your hard drives are on the same cable (which is highly discouraged) so most of the time hda would be the drive at the end of the cable and hdb would be in the middle of the cable, but if unsure you should check the jumper settings on both drives to make sure.

Code: [Select]

#mdadm -a /dev/md1 /dev/hda2
That should have done the trick. I'm trying to work out why you have 3 raid devices instead of 2. Are you running version 7.0? It looks like /dev/md2 must be your swap.
Lloyd

Logged

raem

3,972
+4/-0

RAID issue - need help recovering degraded array

« Reply #7 on: October 17, 2006, 11:35:52 PM »

ldkeen & jahlewis

> I'm trying to work out why you have 3 raid devices instead of 2.
> Are you running version 7.0?

Assuming sme7 (as posted in sme7 forum), then it looks like the server was updated from sme6.x. The 3 partition format has been retained as the upgrade process did not convert it.
It will NOT be possible to simply remove & replace a drive and have the system automatically rebuild the array using the admin console menu. This will only work for new sme7 installs (or new sme7 installs plus restore from 6.x) where there are 2 partitions.

You will have to manually rebuild the array, search the forums as there have been a few good posts recently on this topic.

Logged

...

jahlewis

151
+0/-0

RAID issue - need help recovering degraded array

« Reply #8 on: October 17, 2006, 11:48:33 PM »

I'm pretty sure this was a clean install during the 7.0pre or beta series, then upgraded since. I think they are on the same ide cable, so thanks for that info ray. Is hda usually the master, and hdb the slave? I did copy over a lot of stuff from a 6.0 server, so that may be where this info is from?

My question is, and I guess I'll have to look, how do I break the mirroring/RAID specifying that hdb should be the master?

Yes, md0 is boot, md2 is swap and md1 is /

Code: [Select]

[root@gluon ~]# fdisk /dev/hda

The number of cylinders for this disk is set to 19457.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   fd  Linux raid autodetect
/dev/hda2              14       19424   155918857+  fd  Linux raid autodetect
/dev/hda3           19425       19457      265072+  fd  Linux raid autodetect

aslo, FWIW, here is what the logs say during a boot:

Code: [Select]

Oct 17 06:32:35 gluon kernel: md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Oct 17 06:32:35 gluon kernel: md: raid1 personality registered as nr 3
Oct 17 06:32:35 gluon kernel: md: Autodetecting RAID arrays.
Oct 17 06:32:35 gluon kernel: md: could not bd_claim hda2.
Oct 17 06:32:35 gluon kernel: md: autorun ...
Oct 17 06:32:35 gluon kernel: md: considering hdb3 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb3 ...
Oct 17 06:32:35 gluon kernel: md: hdb2 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md: hdb1 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md:  adding hda3 ...
Oct 17 06:32:35 gluon kernel: md: hda1 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md: created md2
Oct 17 06:32:35 gluon kernel: md: bind<hda3>
Oct 17 06:32:35 gluon kernel: md: bind<hdb3>
Oct 17 06:32:35 gluon kernel: md: running: <hdb3><hda3>
Oct 17 06:32:35 gluon kernel: raid1: raid set md2 active with 2 out of 2 mirrors
Oct 17 06:32:35 gluon kernel: md: considering hdb2 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb2 ...
Oct 17 06:32:35 gluon kernel: md: hdb1 has different UUID to hdb2
Oct 17 06:32:35 gluon kernel: md: hda1 has different UUID to hdb2
Oct 17 06:32:35 gluon kernel: md: created md1
Oct 17 06:32:35 gluon kernel: md: bind<hdb2>
Oct 17 06:32:35 gluon kernel: md: running: <hdb2>
Oct 17 06:32:35 gluon kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Oct 17 06:32:35 gluon kernel: md: considering hdb1 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb1 ...
Oct 17 06:32:35 gluon kernel: md:  adding hda1 ...
Oct 17 06:32:35 gluon kernel: md: created md0
Oct 17 06:32:35 gluon kernel: md: bind<hda1>
Oct 17 06:32:36 gluon kernel: md: bind<hdb1>
Oct 17 06:32:36 gluon kernel: md: running: <hdb1><hda1>
Oct 17 06:32:36 gluon kernel: raid1: raid set md0 active with 2 out of 2 mirrors
Oct 17 06:32:36 gluon kernel: md: ... autorun DONE.
Oct 17 06:32:36 gluon kernel: EXT3 FS on md0, internal journal
Oct 17 06:32:36 gluon kernel: Adding 264952k swap on /dev/md2.  Priority:-1 extents:1

Thanks guys...

Logged

............

raem

3,972
+4/-0

RAID issue - need help recovering degraded array

« Reply #9 on: October 18, 2006, 12:21:44 AM »

jahlewis,

> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

That looks appropriate, also see
man mdadm

Here's a good thread, see the post by Stefano
http://forums.contribs.org/index.php?topic=32572.msg138217#msg138217

Logged

...

cheezeweeze

18
+0/-0

RAID issue - need help recovering degraded array

« Reply #10 on: November 19, 2006, 05:19:13 PM »

> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

Try this:
mdadm --add /dev/md1 /dev/hda2

Logged

CharlieBrady

6,918
+3/-0

RAID issue - need help recovering degraded array

« Reply #11 on: November 19, 2006, 07:10:31 PM »

Quote from: "cheezeweeze"

> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

Try this:
mdadm --add /dev/md1 /dev/hda2

You should only do that if you are certain that the drive is good (and if so, why was it tossed out of the RAID array?) or if you don't care all that much about your data.

Logged

mike_mattos

313
+0/-0

RAID issue - need help recovering degraded array

« Reply #12 on: November 23, 2006, 11:12:40 PM »

Given that vendors have problems deciding if the first drive is 0, 1, or A,
and that sometimes C may be the original drive and D the one added later
(even if D is Primary on Primary )

is there a way to poll SME for the drive serial number?

Really helps when using Ghost to see the drive info!

Mike

Logged

...

CharlieBrady

6,918
+3/-0

RAID issue - need help recovering degraded array

« Reply #13 on: November 23, 2006, 11:33:45 PM »

Quote from: "mike_mattos"

Given that vendors have problems deciding if the first drive is 0, 1, or A,
and that sometimes C may be the original drive and D the one added later
(even if D is Primary on Primary )

Linux doesn't use drive letters A, C, or D, and drives are identified unambiguously by primary/secondary/master/slave. Ask google for details.

Logged

mike_mattos

313
+0/-0

RAID issue - need help recovering degraded array

« Reply #14 on: November 27, 2006, 08:37:33 PM »

SCSI and SATA drives are harder to identify, imagine 7 identical drives on a cable, only difference is a hidden jumper, or 6 red SATA cables neatly bundled with cable ties!

Having the drive serial number allows a printout of diagnostics & after the fact confirmation that the drive being replaced is actually the drive you indended, and that a barin (brain) cramp didn't lead to tracing the wrong cable or enumerating the ID jumpers in the wrong direction!

So I ask again, can you query the drive serial number on SME?

Logged

...