Koozali.org: home of the SME Server

RAID error appearing in email...

Offline ashpenaz

RAID error appearing in email...
« on: October 07, 2007, 08:36:25 PM »
The following message appears in my email only when I run the upgrades listed in Server-Manager:
A DegradedArray event has been detected on md device /dev/md2

I have been searching the forums and bug tracker, and though I have seen some similar situations, I am still at a loss as to how to proceed. I have two Maxtor 160 GB drives (ATA100), one on the primary IDE channel and one on the secondary. I am running SME 7.2.

Based on what I read in the forums, I ran the following commands and got the output shown below:

mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Wed Jun 27 14:03:34 2007

     Raid Level : raid1
     Array Size : 156183808 (148.95 GiB 159.93 GB)
    Device Size : 156183808 (148.95 GiB 159.93 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Oct  7 12:34:49 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 556a3837:b463294d:3d9ce593:24ae7fcc
         Events : 0.3245340

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1      22        2        1      active sync   /dev/hdc2
mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Jun 27 14:03:34 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Oct  7 12:01:28 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : f7fac4c3:0f05ec1d:2eb557c0:2cb5965e
         Events : 0.1560

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1      22        1        1      active sync   /dev/hdc1
fdisk -l

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   fd  Linux raid autodetect
/dev/hda2              14       19457   156183930   fd  Linux raid autodetect

Disk /dev/hdc: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1          13      104391   fd  Linux raid autodetect
/dev/hdc2              14       19457   156183930   fd  Linux raid autodetect

Disk /dev/md1: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md2: 159.9 GB, 159932219392 bytes
2 heads, 4 sectors/track, 39045952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/dm-0: 158.3 GB, 158309810176 bytes
2 heads, 4 sectors/track, 38649856 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 1577 MB, 1577058304 bytes
2 heads, 4 sectors/track, 385024 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/dm-1 doesn't contain a valid partition table

cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdc2[1]
      156183808 blocks [2/1] [_U]

md1 : active raid1 hda1[0] hdc1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>

Server-Manager "Manage Disk Redundancy" shows the following:
Current RAID Status

Personalities : [raid1]
md2 : active raid1 hdc2[1]
      156183808 blocks [2/1] [_U]

md1 : active raid1 hda1[0] hdc1[1]
      104320 blocks [2/2] [UU]
unused devices: <none>
Only some of the RAID devices are unclean.
Manual intervention may be required.

Any help is appreciated. Thanks in advance.
SME 7.4
RAID 1

Offline NickR

Re: RAID error appearing in email...
« Reply #1 on: October 07, 2007, 11:01:24 PM »
You need to add the removed partition /dev/hda2 back into md2:


#mdadm /dev/md2 -a /dev/hda2

Then running

#mdadm --detail --verbose /dev/md2

should show two active disks and a re-mirror in operation.
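
If you want to keep an eye on the re-mirror while it runs, something like the following should show the progress (just a suggestion, using the same device names as above):

#cat /proc/mdstat
#mdadm --detail /dev/md2 | grep -E 'State|Rebuild'

While the rebuild is running, /proc/mdstat shows a progress bar and mdadm reports a "Rebuild Status" percentage; once it completes, the state returns to clean and both members show as active sync.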
--
Nick......

Offline ashpenaz

Re: RAID error appearing in email...
« Reply #2 on: October 07, 2007, 11:50:32 PM »
Thank you Nick. That worked.

I had tried several things I had seen in the forums, but none of them worked. The array is now rebuilding.
SME 7.4
RAID 1

Offline pfloor

Re: RAID error appearing in email...
« Reply #3 on: October 08, 2007, 07:12:41 AM »
You may also want to keep a very close eye on hda.  If it falls out of sync again, replace the disk ASAP!!!
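
One way to keep that eye on it - a rough sketch only, and smartmontools may need to be installed first - is to ask the drive for its SMART status:

#smartctl -H /dev/hda
#smartctl -A /dev/hda

A FAILED overall health result, or climbing Reallocated_Sector_Ct / Current_Pending_Sector counts in the attribute list, is good supporting evidence that the drive itself is on its way out.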
In life, you must either "Push, Pull or Get out of the way!"

Offline NickR

Re: RAID error appearing in email...
« Reply #4 on: October 08, 2007, 12:29:10 PM »
@ashpenaz:

Glad to be of help.

@pfloor:

IME, it's rarely the disk itself that causes this problem, it's the controller.  More accurately, it's putting the disks on different channels.  I (now) always put the disks on the primary controller as master & slave and although it doesn't always work, it does seem to reduce the number of times that spurious RAID problems occur.  Moving to SATA disks seems to be a good move for RAID stability.
--
Nick......

Offline warren

Re: RAID error appearing in email...
« Reply #5 on: October 08, 2007, 01:27:27 PM »
Quote from: NickR
@ashpenaz:

Glad to be of help.

@pfloor:

IME, it's rarely the disk itself that causes this problem, it's the controller.  More accurately, it's putting the disks on different channels.  I (now) always put the disks on the primary controller as master & slave and although it doesn't always work, it does seem to reduce the number of times that spurious RAID problems occur.  Moving to SATA disks seems to be a good move for RAID stability.

NickR,
Surely putting the RAID1 disks on the same controller channel (primary: IDE master = disk 1, IDE slave = disk 2) is more risky! If the controller goes belly up you're left with NO RAID. The advice that's been given before (http://wiki.contribs.org/Raid) regarding RAID1 is to have one disk on the primary master and the other disk on the secondary master, isn't it?

Offline NickR

Re: RAID error appearing in email...
« Reply #6 on: October 08, 2007, 02:51:26 PM »
Quote from: warren
NickR,
Surely putting the RAID1 disks on the same controller channel (primary: IDE master = disk 1, IDE slave = disk 2) is more risky! If the controller goes belly up you're left with NO RAID. The advice that's been given before (http://wiki.contribs.org/Raid) regarding RAID1 is to have one disk on the primary master and the other disk on the secondary master, isn't it?

Notwithstanding that advice, I have installed many tens of SME / E-Smith machines over the years (all of them RAID1 and mostly using IDE drives) and I have never experienced a controller failure - maybe I'm just incredibly lucky!  I've even got one machine that is 12 years old, running Smoothwall on a disk that is at least 10 years old - it's been up for 3 years.  I have only had one (genuine) disk failure but have seen many RAID sync problems.  All I can report from my own experience is that:

a) there is no real-world performance difference between using the same channel & separate channels
b) some chipsets exhibit RAID sync problems when the disks are on separate channels
c) the examples in (b) have often (but not always) been cured by putting the disks on the same channel.

I am merely reporting my own experiences (hence the IME preface to my comment).  Others can ignore me if they wish, but an alternative view can sometimes be valuable.  All I was trying to say is that I wouldn't suspect the disk first without supporting evidence (like seek errors in the messages log).
--
Nick......

Offline Elliott

Re: RAID error appearing in email...
« Reply #7 on: October 08, 2007, 10:18:20 PM »
Quote from: NickR
You need to add the removed partition /dev/hda2 back into md2.

Could you possibly explain what led you to this conclusion? I get the mdadm errors on every reboot of one of my servers, and when I look at the output of all of the commands above I can't see how you worked out what's wrong. I have a similar setup with similar errors, but I don't see what conclusively tells me which device is the problematic one.

My messages are:


Email
This is an automatically generated mail message from mdadm running on mail.dynamictrend.com.

A DegradedArray event has been detected on md device /dev/md2.

Email2
This is an automatically generated mail message from mdadm running on mail.dynamictrend.com.

A DegradedArray event has been detected on md device /dev/md1.



[root@mail ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Jan  3 10:55:40 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Oct  8 14:11:12 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : f554e523:e5d6c5df:d65d57fc:d732c08e
         Events : 0.2528

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       0        0        -      removed


[root@mail ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Wed Jan  3 10:54:59 2007
     Raid Level : raid1
     Array Size : 78043648 (74.43 GiB 79.92 GB)
    Device Size : 78043648 (74.43 GiB 79.92 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Oct  8 15:56:20 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : fdd8ea67:b5564767:410e5042:29d225ba
         Events : 0.3819448

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed

[root@mail ~]# fdisk -l

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   fd  Linux raid autodetect
/dev/hda2              14        9729    78043770   fd  Linux raid autodetect

Disk /dev/hdc: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1          13      104384+  fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/hdc2              13        9729    78043807   fd  Linux raid autodetect

Disk /dev/md1: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md2: 79.9 GB, 79916695552 bytes
2 heads, 4 sectors/track, 19510912 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/dm-0: 77.7 GB, 77779173376 bytes
2 heads, 4 sectors/track, 18989056 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 2080 MB, 2080374784 bytes
2 heads, 4 sectors/track, 507904 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/dm-1 doesn't contain a valid partition table


[root@mail ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda2[0]
      78043648 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
Elliott

Offline NickR

Re: RAID error appearing in email...
« Reply #8 on: October 09, 2007, 12:17:08 AM »
Quote from: Elliott
Could you possibly explain what led you to this conclusion? I get the mdadm errors on every reboot of one of my servers, and when I look at the output of all of the commands above I can't see how you worked out what's wrong. I have a similar setup with similar errors, but I don't see what conclusively tells me which device is the problematic one.

I'll do my best  8)

I'll cut your post down to the salient parts for this exercise:

Quote
/dev/md1:
    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       0        0        -      removed


/dev/md2:
    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed

From the fdisk output below, we know that there are two IDE disks present: /dev/hda and /dev/hdc.

You can see above that only the /dev/hda partitions are still listed as active, and that the other member of each array has been removed - in other words, /dev/hdc1 and /dev/hdc2 have both dropped out of md1 and md2.  That tells us that the problem lies with /dev/hdc.
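
If you want a quick way to pull out just those member-state lines for each array, something along these lines (a sketch only, using the device names above) will do it:

#mdadm --detail /dev/md1 | grep -E 'State :|removed|active sync'
#mdadm --detail /dev/md2 | grep -E 'State :|removed|active sync'

Any array that prints a "removed" line is missing a member, and whichever partitions are not listed as "active sync" point at the suspect disk.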

Quote

[root@mail ~]# fdisk -l

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   fd  Linux raid autodetect
/dev/hda2              14        9729    78043770   fd  Linux raid autodetect

Disk /dev/hdc: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1          13      104384+  fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/hdc2              13        9729    78043807   fd  Linux raid autodetect

Although the disks both have identical sizes and the heads, sectors & cylinders match, the second (/dev/hdc) partition table has a problem - it should exactly match that of /dev/hda, but the block counts differ (104384+ against 104391 for the first partition, and fdisk also warns that /dev/hdc1 does not end on a cylinder boundary).  This is probably why the disk was removed from the array, as one of the prime requirements is an identical number of blocks on the mirrored partitions.

Quote

[root@mail ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda2[0]
      78043648 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>


This is just confirming what mdadm is telling us: namely, that /dev/hdc1 & 2 are missing from the array.
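
For completeness, the /proc/mdstat notation above says the same thing once you know how to read it:

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

Here [2/1] means two members are configured but only one is active, and [U_] means slot 0 (hda1) is up while slot 1 (where hdc1 should be) is missing.  A healthy array reads [2/2] [UU].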

To fix this particular case, you will need to blow away the partition table on /dev/hdc and then re-create it to mirror that on /dev/hda exactly.  Once that has been done, you will be able to manually add /dev/hdc partitions back into the arrays as described earlier in this thread.
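
Roughly speaking, that repair could look something like the sketch below - treat it as an outline only, double-check the device names first, the temporary file name is arbitrary, and bear in mind that writing a new partition table destroys anything still on /dev/hdc:

#sfdisk -d /dev/hda > /tmp/hda-table
#sfdisk /dev/hdc < /tmp/hda-table
#mdadm /dev/md1 -a /dev/hdc1
#mdadm /dev/md2 -a /dev/hdc2
#cat /proc/mdstat

The first command dumps hda's partition table to a file and the second writes an identical table onto hdc (older versions of sfdisk may need --force if they object to the existing layout).  If mdadm complains about a stale superblock on the re-created partitions, running mdadm --zero-superblock /dev/hdc1 (and likewise for /dev/hdc2) before the -a commands should clear it.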

HTH
--
Nick......