DegradedArray event detected on md device /dev/md2.

ntblade

« on: June 28, 2007, 04:18:54 PM »

Hi all,
I got an email this morning whose contents read:
A DegradedArray event had been detected on md device /dev/md2.

After some searching I ran:

Code: [Select]

[root@haddington /]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      78019584 blocks [2/1] [_U]

md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Aug 26 16:15:34 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Jun 28 14:56:36 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : a037cd95:42254ed3:5a88d081:4aa5384c
         Events : 0.9532757

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
[root@haddington /]#
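
For anyone who wants to script this check rather than read /proc/mdstat by eye, here is a minimal sketch. It assumes the mdstat layout shown above, where an underscore in the last bracketed field (as in [_U]) marks a missing mirror. The name degraded_arrays is made up for illustration and is not part of mdadm.

```shell
# List md arrays whose member-status field (e.g. [_U]) contains an underscore,
# which marks a missing or failed member.
# degraded_arrays is a hypothetical helper name; it reads mdstat-formatted
# text from the file given as $1, defaulting to /proc/mdstat.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The following "blocks" line carries the status field
        /blocks/ && /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Run against the /proc/mdstat output above, this would print md2 and nothing for md1.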

I then called the office where this server lives. It turns out that a worker came in this morning, found there was no internet access, and decided to reboot both the ADSL router and the server (Server - Gateway mode).

Looking through the log I can see the server coming up:

Code: [Select]

<snip>

Jun 28 07:57:38 haddington syslog: klogd shutdown succeeded
Jun 28 07:57:38 haddington exiting on signal 15
Jun 28 07:58:38 haddington syslogd 1.4.1: restart.
Jun 28 07:58:38 haddington syslog: syslogd startup succeeded

<snip>

Jun 28 07:58:41 haddington kernel:   Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
Jun 28 07:58:41 haddington kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel:  sda: sda1 sda2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel:   Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
Jun 28 07:58:41 haddington kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel:  sdb: sdb1 sdb2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 07:58:41 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 07:58:42 haddington kernel: md: md1 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb1>
Jun 28 07:58:42 haddington kernel: md: bind<sda1>
Jun 28 07:58:42 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 07:58:42 haddington kernel: md: md2 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb2>
Jun 28 07:58:42 haddington kernel: md: bind<sda2>
Jun 28 07:58:42 haddington kernel: raid1: raid set md2 active with 2 out of 2 mirrors

<snip>

So, it looks to me as though the server was booting fine and the RAID array had been set up. Am I right?

Then I guess the user got bored and power-cycled the server:

Code: [Select]

<snip>

Jun 28 07:58:56 haddington dhcpd: Starting dhcpd:
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd succeeded
Jun 28 07:58:56 haddington dhcpd: [60G
Jun 28 07:58:56 haddington dhcpd: 
Jun 28 07:58:56 haddington rc.e-smith: Starting dhcpd:  succeeded
Jun 28 08:00:24 haddington syslogd 1.4.1: restart.
Jun 28 08:00:24 haddington syslog: syslogd startup succeeded
Jun 28 08:00:24 haddington syslog: 
Jun 28 08:00:24 haddington syslog: Starting kernel logger: 
Jun 28 08:00:24 haddington kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 28 08:00:24 haddington syslog: klogd startup succeeded
Jun 28 08:00:24 haddington syslog: [60G
Jun 28 08:00:24 haddington syslog: 

<snip>

Then:

Code: [Select]

<snip>

Jun 28 08:00:28 haddington kernel:   Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
Jun 28 08:00:28 haddington kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel:  sda: sda1 sda2
Jun 28 08:00:28 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington kernel:   Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
Jun 28 08:00:28 haddington kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Jun 28 08:00:28 haddington network: Setting network parameters:  
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: kernel.sysrq = 0
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.all.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.default.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_dynaddr = 1
Jun 28 08:00:28 haddington kernel:  sdb: sdb1 sdb2
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_forward = 1
Jun 28 08:00:28 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_keepalive_time = 300
Jun 28 08:00:28 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_syncookies = 1
Jun 28 08:00:28 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 08:00:28 haddington network: Setting network parameters:  succeeded
Jun 28 08:00:28 haddington kernel: md: md1 stopped.
Jun 28 08:00:28 haddington network: [60G
Jun 28 08:00:28 haddington kernel: md: bind<sdb1>
Jun 28 08:00:28 haddington network: 
Jun 28 08:00:28 haddington kernel: md: bind<sda1>
Jun 28 08:00:28 haddington network: Bringing up loopback interface:  
Jun 28 08:00:28 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 08:00:28 haddington kernel: md: md2 stopped.
Jun 28 08:00:28 haddington kernel: md: bind<sda2>
Jun 28 08:00:28 haddington kernel: md: bind<sdb2>
Jun 28 08:00:28 haddington kernel: md: kicking non-fresh sda2 from array!
Jun 28 08:00:28 haddington kernel: md: unbind<sda2>
Jun 28 08:00:28 haddington kernel: md: export_rdev(sda2)
Jun 28 08:00:28 haddington kernel: md: md2: raid array is not clean -- starting background reconstruction
Jun 28 08:00:28 haddington kernel: raid1: raid set md2 active with 1 out of 2 mirrors

<snip>

Argghhh!!!
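
The md messages in that last excerpt are buried among sysctl and network lines. A small filter can pull them out. This is only a sketch, assuming the syslog line format shown above; md_log_lines is a hypothetical helper name.

```shell
# Print only the kernel md/raid lines from a syslog-style file.
# md_log_lines is a hypothetical helper name; $1 defaults to
# /var/log/messages, the log path used in this thread.
md_log_lines() {
    # Matches "kernel: md: ..." and "kernel: raid1: ..." style messages
    grep -E 'kernel: (md|raid[0-9]+):' "${1:-/var/log/messages}"
}
```

Piping the result through grep for "non-fresh" would narrow it to the line where sda2 was kicked out.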

Can anyone guide me on what to do next, please?
How can I tell if the array is being rebuilt?
Does it look like the filesystem got damaged by the second reboot?
Am I right in saying that the drive is probably physically OK and can be re-added?

First I'm gonna have to visit the site and check the backups!

Many thanks in advance.
Norrie

Stefano

« Reply #1 on: June 28, 2007, 04:39:45 PM »

Hi Norrie

try

Code: [Select]


mdadm -a /dev/md2 /dev/sda2

then

Code: [Select]


tail -f /var/log/messages

and look for something going wrong..

HTH

Ciao

Stefano

ntblade


« Reply #2 on: June 28, 2007, 05:13:28 PM »

Wow! thanks for the quick reply!

Code: [Select]

[root@haddington /]# mdadm -a /dev/md2 /dev/sda2
mdadm: hot added /dev/sda2
[root@haddington /]# tail -f /var/log/messages
Jun 28 15:53:16 haddington sshd(pam_unix)[4842]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=22.125.186.200.sta.impsat.net.br
Jun 28 16:07:49 haddington kernel: md: bind<sda2>
Jun 28 16:07:49 haddington kernel: RAID1 conf printout:
Jun 28 16:07:49 haddington kernel:  --- wd:1 rd:2
Jun 28 16:07:49 haddington kernel:  disk 0, wo:1, o:1, dev:sda2
Jun 28 16:07:49 haddington kernel:  disk 1, wo:0, o:1, dev:sdb2
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun 28 16:07:49 haddington kernel: md: using 128k window, over a total of 78019584 blocks.

Looks like the array is being reconstructed. Does the output of this look healthy?...

Code: [Select]

[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Aug 26 16:15:34 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Jun 28 16:11:43 2007
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 16% complete

           UUID : a037cd95:42254ed3:5a88d081:4aa5384c
         Events : 0.9534388

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2

       2       8        2        0      spare rebuilding   /dev/sda2
[root@haddington /]#
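
If you only want the two fields that matter here, something like the following could trim the report. It is a rough sketch that assumes the field labels of the mdadm --detail output shown above; detail_summary is a hypothetical name.

```shell
# Pull the "State" and "Rebuild Status" fields out of `mdadm --detail` output.
# detail_summary is a hypothetical helper name; it reads the report on stdin,
# e.g.  mdadm --query --detail /dev/md2 | detail_summary
detail_summary() {
    awk -F ' : ' '
        /^ *State :/          { print "state=" $2 }
        /^ *Rebuild Status :/ { print "rebuild=" $2 }
    '
}
```

The "Rebuild Status" line only appears while a rebuild is running, so a report with no rebuild= line means none is in progress.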

Thanks
Norrie

Stefano


« Reply #3 on: June 28, 2007, 05:40:42 PM »

Quote from: "ntblade"

Looks like the array is being reconstructed.

yes and no.. reconstruction is going on..

look at the log lines below:

Quote

Code: [Select]
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.

Quote

Does the output of this look healthy?

try

Code: [Select]


cat /proc/mdstat

when reconstruction is finished, and if everything is OK, you'd see something like

Code: [Select]


Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[2]
      78019584 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
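
To wait for the rebuild without re-running the command by hand, a sketch like this may help. It assumes the kernel's usual progress line in /proc/mdstat (a line containing "recovery =" or "resync =" while a rebuild runs), which is not shown in this thread. The function names and the 30-second poll interval are arbitrary choices.

```shell
# Succeed while mdstat-formatted text still reports a rebuild in progress.
# Assumption: the kernel prints "recovery = NN.N% ..." or "resync = NN.N% ..."
# under the array while it is syncing.
rebuild_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# Poll until no rebuild is reported, then print the final state.
# The 30-second interval is arbitrary.
wait_for_rebuild() {
    while rebuild_in_progress "$@"; do
        sleep 30
    done
    cat "${1:-/proc/mdstat}"
}
```

Called with no arguments, wait_for_rebuild reads the live /proc/mdstat; a file argument is only there to try it on saved output.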

HTH

ciao

Stefano

ntblade


« Reply #4 on: June 28, 2007, 06:08:08 PM »

Yep, all fine now.
I got the series of rebuild emails as well.

Thank you very much for your help.

Norrie

pauljclarke


« Reply #5 on: February 01, 2008, 10:29:44 AM »

Hi, I have been getting this same error today from my server, for both md1 and md2.

It's showing that it's degraded, but nothing about it being fixed, and the server is running very, very slowly!

Thanks

Code: [Select]

[root@sme ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda2[0]
      32917120 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
[root@sme ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Fri Apr 13 16:03:33 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb  1 08:08:54 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 08fae3d0:4cd4f530:915b5ca6:5314f509
         Events : 0.4607

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       0        0        -      removed
[root@sme ~]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Fri Apr 13 16:03:20 2007
     Raid Level : raid1
     Array Size : 32917120 (31.39 GiB 33.71 GB)
    Device Size : 32917120 (31.39 GiB 33.71 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri Feb  1 09:27:53 2008
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : f46feadc:2443a24b:022a4f8b:4ad30262
         Events : 0.8253406

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed
[root@sme ~]#

Stefano


« Reply #6 on: February 01, 2008, 02:06:16 PM »

hi..

sounds like one of your HDs is gone..
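
One way to check whether the kernel still sees a second disk is to list the partitions that no array is using. This is a minimal sketch, assuming the usual four-column /proc/partitions layout and the sd*/hd* device names seen in this thread; unused_partitions is a hypothetical helper name.

```shell
# Print partitions from /proc/partitions that no md array currently uses.
# unused_partitions is a hypothetical helper name.
# $1 = partitions file (default /proc/partitions)
# $2 = mdstat file     (default /proc/mdstat)
# Assumption: member disks are named sdXN or hdXN, as in this thread.
unused_partitions() {
    awk '
        NR == FNR {                       # first file: mdstat
            if ($1 ~ /^md[0-9]+$/)
                for (i = 3; i <= NF; i++) {
                    sub(/\[.*/, "", $i)   # "hda2[0]" becomes "hda2"
                    used[$i] = 1
                }
            next
        }
        # second file: partitions, device name is the fourth column
        $4 ~ /^(sd|hd)[a-z]+[0-9]+$/ && !($4 in used) { print "/dev/" $4 }
    ' "${2:-/proc/mdstat}" "${1:-/proc/partitions}"
}
```

If it prints nothing, there is no spare partition for mdadm to add, which would point to a disk the kernel no longer detects or a machine that only ever had one.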

btw, try:

- open a shell and write

Code: [Select]

tail -f /var/log/messages

- open another shell and write

Code: [Select]

mdadm -a /dev/md1 /dev/hdc1
mdadm -a /dev/md2 /dev/hdc2

note: the md device comes first; replace hdc with your correct device..

then do

Code: [Select]

cat /proc/mdstat

and wait..

in the first shell you can see logs if there are errors on your added device

HTH

Stefano

pauljclarke


« Reply #7 on: February 01, 2008, 05:27:25 PM »

Hi,

I only have one hard drive, so each command comes back saying that it's not an md device.

Puzzled....???
