Koozali.org: home of the SME Server

RAID issue - need help recovering degraded array

Offline jahlewis

RAID issue - need help recovering degraded array
« on: October 17, 2006, 03:23:44 AM »
Not sure what is going on here, or what to do.  Can any of you RAID gurus interpret this?

Code: [Select]
Current RAID status:

Personalities : [raid1]
md1 : active raid1 hdb2[1]
      155918784 blocks [2/1] [_U]

md2 : active raid1 hdb3[1] hda3[0]
      264960 blocks [2/2] [UU]

md0 : active raid1 hdb1[1] hda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

There should be two RAID devices, not 3

-----------------------------------------------------------------------------------
I did get an email on reboot from mdadm monitoring:
Code: [Select]
Subject: DegradedArray event on /dev/md1:gluon.arachnerd.org
This is an automatically generated mail message from mdadm running on gluon.arachnerd.org.

A DegradedArray event has been detected on md device /dev/md1.

Here is my current filesystem setup
Code: [Select]
[root@gluon]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md1              147G  8.9G  131G   7% /
/dev/md0               99M   32M   63M  34% /boot
none                  315M     0  315M   0% /dev/shm
/dev/hdd1             230G   63G  156G  29% /mnt/bigdisk

and here are some details on the RAID settings for md0 and md1 (md2 is just like md0)
Code: [Select]
[root@gluon]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Jan 12 19:26:31 2006
     Raid Level : raid1
     Array Size : 104320 (101.88 MiB 106.82 MB)
    Device Size : 104320 (101.88 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Oct 16 18:38:10 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1
           UUID : 5139bc2e:39939d3e:5abd791c:3ce0a6ef
         Events : 0.3834


Code: [Select]
[root@gluon]# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Thu Jan 12 19:21:55 2006
     Raid Level : raid1
     Array Size : 155918784 (148.70 GiB 159.66 GB)
    Device Size : 155918784 (148.70 GiB 159.66 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Oct 16 18:27:38 2006
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       3       66        1      active sync   /dev/hdb2
           UUID : 0a968a22:d1b0d2bd:ab248bae:ec482cc1
         Events : 0.12532934

Offline jahlewis

RAID issue - need help recovering degraded array
« Reply #1 on: October 17, 2006, 03:52:27 AM »
OK, I'm reading like crazy here...

As I interpret this, /dev/md1 is broken, with /dev/hda2 not being mirrored.

However, if I try to add hda2 back to md1, I get an invalid argument error:

Code: [Select]
[root@gluon]# mdadm -a /dev/md1 /dev/hda2
mdadm: hot add failed for /dev/hda2: Invalid argument


So... I tried removing the partition first:
Code: [Select]
[root@gluon]# mdadm /dev/md1 -r /dev/hda2 -a /dev/hda2
mdadm: hot remove failed for /dev/hda2: No such device or address


So now what?  Is the / partition on hda hosed?  How do I rebuild that?  I'm quickly getting out of my depth here...
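
For now I'm just checking whether the kernel actually logged any errors for hda and what state md1 thinks it's in (a sketch, assuming /var/log/messages is the right place to look on SME):
Code: [Select]
# look for I/O errors or md events mentioning hda or md1
grep -iE 'hda|md1' /var/log/messages | tail -50

# current view of the degraded array
cat /proc/mdstat
mdadm --detail /dev/md1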

Offline jahlewis

RAID issue - need help recovering degraded array
« Reply #2 on: October 17, 2006, 04:22:48 AM »
FWIW
Code: [Select]
[root@gluon init.d]# mdadm -E /dev/hdb2
/dev/hdb2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0a968a22:d1b0d2bd:ab248bae:ec482cc1
  Creation Time : Thu Jan 12 19:21:55 2006
     Raid Level : raid1
    Device Size : 155918784 (148.70 GiB 159.66 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1

    Update Time : Mon Oct 16 18:27:38 2006
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b0a4fa9a - correct
         Events : 0.12532934


      Number   Major   Minor   RaidDevice State
this     1       3       66        1      active sync   /dev/hdb2
   0     0       0        0        0      removed
   1     1       3       66        1      active sync   /dev/hdb2


Code: [Select]
[root@gluon init.d]# mdadm -E /dev/hda2
/dev/hda2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0a968a22:d1b0d2bd:ab248bae:ec482cc1
  Creation Time : Thu Jan 12 19:21:55 2006
     Raid Level : raid1
    Device Size : 155918784 (148.70 GiB 159.66 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Sun Oct 15 21:07:07 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b0a3ce33 - correct
         Events : 0.12532928


      Number   Major   Minor   RaidDevice State
this     0       3        2        0      active sync   /dev/hda2
   0     0       3        2        0      active sync   /dev/hda2
   1     1       3       66        1      active sync   /dev/hdb2

Offline jahlewis

RAID issue - need help recovering degraded array
« Reply #3 on: October 17, 2006, 04:40:40 AM »
Last post tonight...

Reading this: http://www.linuxquestions.org/questions/showthread.php?t=429857

It suggests running mdadm -C if all else fails... so I did:
Code: [Select]
[root@gluon init.d]# mdadm -C /dev/md1 -l1 -n2 /dev/hda2 /dev/hdb2
mdadm: /dev/hda2 appears to contain an ext2fs file system
    size=155918784K  mtime=Mon Oct 16 18:27:39 2006
mdadm: /dev/hda2 appears to be part of a raid array:
    level=1 devices=2 ctime=Thu Jan 12 19:21:55 2006
mdadm: /dev/hdb2 appears to contain an ext2fs file system
    size=155918784K  mtime=Sun Oct 15 20:33:12 2006
mdadm: /dev/hdb2 appears to be part of a raid array:
    level=1 devices=2 ctime=Thu Jan 12 19:21:55 2006
Continue creating array?


And I chickened out, afraid of wiping the contents of the surviving partition.  Does anyone know what would happen if I chose to continue?
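
From what I've been reading, the safer variant of that command is to create the array degraded around the member whose data you trust (hdb2 here, going by the later Update Time and higher Events count) and then add the stale partition back so it resyncs from the good copy.  My worry with the command as I typed it is that hda2, the stale copy, is listed first, so the resync might well go the wrong way.  A rough sketch of the safer version (untested, and since md1 is my root filesystem it would presumably have to be done from a rescue environment, with a backup):
Code: [Select]
# sketch only -- from a rescue CD, with a backup, since md1 is /
mdadm --stop /dev/md1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdb2 missing
mdadm --add /dev/md1 /dev/hda2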

Thanks for your patience

Offline crazybob

RAID issue - need help recovering degraded array
« Reply #4 on: October 17, 2006, 02:17:33 PM »
When I had a drive with a failed section in the RAID, I removed the problem drive and ran a program called HDD Regenerator (http://www.dposoft.net/) on it. When I put the drive back in, it was detected as a new drive and the RAID rebuilt without issue. You could run the program with the drive in place, depending on how long you can afford to go without the server being available. HDD Regenerator can take quite a while depending on drive size.
If you think you know whats going on, you obviously have no idea whats going on!

Offline jahlewis

RAID issue - need help recovering degraded array
« Reply #5 on: October 17, 2006, 10:26:29 PM »
Couple of things...

The drives are OK, since the other partitions on hda are working, so it is just a bad partition (hda2) that is attached to the md1 mirror set.

I have no idea which is hda and which is hdb in my system, so I wouldn't know which to unplug.
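
One thing I did find: the IDE driver exposes the model strings, so (assuming /proc/ide is present on a stock SME 7 box) this should at least tell me which model sits on which channel:
Code: [Select]
cat /proc/ide/hda/model
cat /proc/ide/hdb/model
cat /proc/ide/hdd/model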

Is the best course to stop the mirroring, make the hdb disk the primary, reformat hda, then add it back to the mirror?  If this is the case, can anyone point me in the right direction?

Thanks.

Offline ldkeen

RAID issue - need help recovering degraded array
« Reply #6 on: October 17, 2006, 11:02:14 PM »
jahlewis,
Can you post the partition info from /dev/hda using
Code: [Select]
# fdisk /dev/hda
followed by "p" to print the info.
Quote from: "jahlewis"
I have no idea which is hda

Both your hard drives are on the same cable (which is highly discouraged). Most of the time hda would be the drive at the end of the cable and hdb the one in the middle, but if you're unsure you should check the jumper settings on both drives to make sure.
Code: [Select]
#mdadm -a /dev/md1 /dev/hda2
That should have done the trick. I'm trying to work out why you have 3 raid devices instead of 2. Are you running version 7.0? It looks like /dev/md2 must be your swap.
Lloyd

Offline raem

RAID issue - need help recovering degraded array
« Reply #7 on: October 17, 2006, 11:35:52 PM »
ldkeen & jahlewis

>  I'm trying to work out why you have 3 raid devices instead of 2.
> Are you running version 7.0?

Assuming sme7 (as posted in the sme7 forum), it looks like the server was upgraded from sme6.x. The 3-partition format has been retained because the upgrade process did not convert it.
It will NOT be possible to simply remove and replace a drive and have the system automatically rebuild the array from the admin console menu. That only works for new sme7 installs (or new sme7 installs plus a restore from 6.x), where there are 2 partitions.

You will have to rebuild the array manually; search the forums, as there have been a few good posts on this topic recently.
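
As a rough outline (treat it as a sketch and double-check the device names): since md0 and md2 are already clean, the manual rebuild here really only means getting hda2 back into md1. If the plain add keeps failing, clearing the stale superblock on hda2 first is the usual next step, and it only touches hda2:
Code: [Select]
mdadm --zero-superblock /dev/hda2
mdadm --add /dev/md1 /dev/hda2
# then watch the resync progress
cat /proc/mdstat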

Offline jahlewis

RAID issue - need help recovering degraded array
« Reply #8 on: October 17, 2006, 11:48:33 PM »
I'm pretty sure this was a clean install during the 7.0pre or beta series, upgraded since then.  I think they are on the same IDE cable, so thanks for that info, Ray.  Is hda usually the master and hdb the slave? I did copy over a lot of stuff from a 6.0 server, so maybe that's where this data came from?

My question is (and I guess I'll have to look): how do I break the mirroring/RAID while specifying that hdb should be the master?

Yes: md0 is /boot, md2 is swap, and md1 is /.

Code: [Select]
[root@gluon ~]# fdisk /dev/hda

The number of cylinders for this disk is set to 19457.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   fd  Linux raid autodetect
/dev/hda2              14       19424   155918857+  fd  Linux raid autodetect
/dev/hda3           19425       19457      265072+  fd  Linux raid autodetect


Also, FWIW, here is what the logs say during a boot:
Code: [Select]
Oct 17 06:32:35 gluon kernel: md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Oct 17 06:32:35 gluon kernel: md: raid1 personality registered as nr 3
Oct 17 06:32:35 gluon kernel: md: Autodetecting RAID arrays.
Oct 17 06:32:35 gluon kernel: md: could not bd_claim hda2.
Oct 17 06:32:35 gluon kernel: md: autorun ...
Oct 17 06:32:35 gluon kernel: md: considering hdb3 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb3 ...
Oct 17 06:32:35 gluon kernel: md: hdb2 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md: hdb1 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md:  adding hda3 ...
Oct 17 06:32:35 gluon kernel: md: hda1 has different UUID to hdb3
Oct 17 06:32:35 gluon kernel: md: created md2
Oct 17 06:32:35 gluon kernel: md: bind<hda3>
Oct 17 06:32:35 gluon kernel: md: bind<hdb3>
Oct 17 06:32:35 gluon kernel: md: running: <hdb3><hda3>
Oct 17 06:32:35 gluon kernel: raid1: raid set md2 active with 2 out of 2 mirrors
Oct 17 06:32:35 gluon kernel: md: considering hdb2 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb2 ...
Oct 17 06:32:35 gluon kernel: md: hdb1 has different UUID to hdb2
Oct 17 06:32:35 gluon kernel: md: hda1 has different UUID to hdb2
Oct 17 06:32:35 gluon kernel: md: created md1
Oct 17 06:32:35 gluon kernel: md: bind<hdb2>
Oct 17 06:32:35 gluon kernel: md: running: <hdb2>
Oct 17 06:32:35 gluon kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Oct 17 06:32:35 gluon kernel: md: considering hdb1 ...
Oct 17 06:32:35 gluon kernel: md:  adding hdb1 ...
Oct 17 06:32:35 gluon kernel: md:  adding hda1 ...
Oct 17 06:32:35 gluon kernel: md: created md0
Oct 17 06:32:35 gluon kernel: md: bind<hda1>
Oct 17 06:32:36 gluon kernel: md: bind<hdb1>
Oct 17 06:32:36 gluon kernel: md: running: <hdb1><hda1>
Oct 17 06:32:36 gluon kernel: raid1: raid set md0 active with 2 out of 2 mirrors
Oct 17 06:32:36 gluon kernel: md: ... autorun DONE.
Oct 17 06:32:36 gluon kernel: EXT3 FS on md0, internal journal
Oct 17 06:32:36 gluon kernel: Adding 264952k swap on /dev/md2.  Priority:-1 extents:1
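
That "md: could not bd_claim hda2" line presumably means something else had already claimed hda2 when autodetect ran. A couple of quick checks I can think of (assuming nothing exotic is installed on this box):
Code: [Select]
# is hda2 mounted or used as swap directly, outside of md1?
grep hda2 /proc/mounts
grep hda2 /proc/swaps
# is any process holding the block device open?
lsof /dev/hda2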


Thanks guys...

Offline raem

RAID issue - need help recovering degraded array
« Reply #9 on: October 18, 2006, 12:21:44 AM »
jahlewis,

> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

That looks appropriate; also see
man mdadm


Here's a good thread, see the post by Stefano
http://forums.contribs.org/index.php?topic=32572.msg138217#msg138217

Offline cheezeweeze

RAID issue - need help recovering degraded array
« Reply #10 on: November 19, 2006, 05:19:13 PM »
> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

Try this:
mdadm --add /dev/md1 /dev/hda2

Offline CharlieBrady

RAID issue - need help recovering degraded array
« Reply #11 on: November 19, 2006, 07:10:31 PM »
Quote from: "cheezeweeze"
> Lloyd wrote:
> mdadm -a /dev/md1 /dev/hda2
> That should have done the trick.

Try this:
mdadm --add /dev/md1 /dev/hda2


You should only do that if you are certain that the drive is good (and if so, why was it tossed out of the RAID array?) or if you don't care all that much about your data.
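
One way to get some of that certainty (assuming smartmontools is installed, which it may not be on a stock install) is to ask the drive itself before re-adding it:
Code: [Select]
smartctl -H /dev/hda           # overall SMART health verdict
smartctl -l error /dev/hda     # the drive's own error log
smartctl -t long /dev/hda      # start a long self-test...
smartctl -l selftest /dev/hda  # ...and check the result when it finishes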

Offline mike_mattos

RAID issue - need help recovering degraded array
« Reply #12 on: November 23, 2006, 11:12:40 PM »
Given that vendors have problems deciding if the first drive is 0, 1, or A,
and that sometimes C may be the original drive and D the one added later
(even if D is Primary on Primary),

is there a way to poll SME for the drive serial number?  

Really helps when using Ghost to see the drive info!

Mike

Offline CharlieBrady

RAID issue - need help recovering degraded array
« Reply #13 on: November 23, 2006, 11:33:45 PM »
Quote from: "mike_mattos"
Given that vendors have problems deciding if the first drive is 0, 1, or A,
and that sometimes C may be the original drive and D the one added later
(even if D is Primary on Primary )


Linux doesn't use drive letters A, C, or D, and drives are identified unambiguously by primary/secondary and master/slave. Ask Google for details.

Offline mike_mattos

RAID issue - need help recovering degraded array
« Reply #14 on: November 27, 2006, 08:37:33 PM »
SCSI and SATA drives are harder to identify; imagine 7 identical drives on a cable where the only difference is a hidden jumper, or 6 red SATA cables neatly bundled with cable ties!

Having the drive serial number allows a printout of diagnostics and after-the-fact confirmation that the drive being replaced is actually the drive you intended, and that a brain cramp didn't lead to tracing the wrong cable or enumerating the ID jumpers in the wrong direction!

So I ask again, can you query the drive serial number on SME?
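
For IDE drives, hdparm (part of the base system, as far as I know) will report it; for SCSI/SATA, smartctl should do it if smartmontools is installed. A sketch:
Code: [Select]
hdparm -i /dev/hda | grep -i serial   # IDE: prints Model, FwRev, SerialNo
hdparm -i /dev/hdb | grep -i serial
smartctl -i /dev/sda                  # SCSI/SATA, needs smartmontools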