Koozali.org: home of the SME Server

Obsolete Releases => SME Server 7.x => Topic started by: ntblade on June 28, 2007, 04:18:54 PM

Title: DegradedArray event detected on md device /dev/md2. - Help
Post by: ntblade on June 28, 2007, 04:18:54 PM: Hi all,
I got an email this morning containing:
A DegradedArray event had been detected on md device /dev/md2.

After some searching I ran:
Code: [Select]
[root@haddington /]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      78019584 blocks [2/1] [_U]
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
unused devices: <none>
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Aug 26 16:15:34 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Thu Jun 28 14:56:36 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : a037cd95:42254ed3:5a88d081:4aa5384c
         Events : 0.9532757
    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
[root@haddington /]#

I then called the office where this server lives, and it turns out that a worker came in this morning, found there was no internet access, and decided to reboot both the ADSL router and the server (Server - Gateway mode).

Looking through the log I can see the server coming up:
Code: [Select]
<snip>
Jun 28 07:57:38 haddington syslog: klogd shutdown succeeded
Jun 28 07:57:38 haddington exiting on signal 15
Jun 28 07:58:38 haddington syslogd 1.4.1: restart.
Jun 28 07:58:38 haddington syslog: syslogd startup succeeded
<snip>
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: sda: sda1 sda2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: sdb: sdb1 sdb2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 07:58:41 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 07:58:42 haddington kernel: md: md1 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb1>
Jun 28 07:58:42 haddington kernel: md: bind<sda1>
Jun 28 07:58:42 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 07:58:42 haddington kernel: md: md2 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb2>
Jun 28 07:58:42 haddington kernel: md: bind<sda2>
Jun 28 07:58:42 haddington kernel: raid1: raid set md2 active with 2 out of 2 mirrors
<snip>

So, it looks to me as though the server was booting fine and the RAID array had been set up. Am I right?

Then I guess the user got bored and power-cycled the server:
Code: [Select]
<snip>
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd:
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd succeeded
Jun 28 07:58:56 haddington dhcpd: [60G
Jun 28 07:58:56 haddington dhcpd:
Jun 28 07:58:56 haddington rc.e-smith: Starting dhcpd: succeeded
Jun 28 08:00:24 haddington syslogd 1.4.1: restart.
Jun 28 08:00:24 haddington syslog: syslogd startup succeeded
Jun 28 08:00:24 haddington syslog:
Jun 28 08:00:24 haddington syslog: Starting kernel logger:
Jun 28 08:00:24 haddington kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 28 08:00:24 haddington syslog: klogd startup succeeded
Jun 28 08:00:24 haddington syslog: [60G
Jun 28 08:00:24 haddington syslog:
<snip>

Then:
Code: [Select]
<snip>
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: sda: sda1 sda2
Jun 28 08:00:28 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington network: Setting network parameters:
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: kernel.sysrq = 0
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.all.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.default.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_dynaddr = 1
Jun 28 08:00:28 haddington kernel: sdb: sdb1 sdb2
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_forward = 1
Jun 28 08:00:28 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_keepalive_time = 300
Jun 28 08:00:28 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_syncookies = 1
Jun 28 08:00:28 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 08:00:28 haddington network: Setting network parameters: succeeded
Jun 28 08:00:28 haddington kernel: md: md1 stopped.
Jun 28 08:00:28 haddington network: [60G
Jun 28 08:00:28 haddington kernel: md: bind<sdb1>
Jun 28 08:00:28 haddington network:
Jun 28 08:00:28 haddington kernel: md: bind<sda1>
Jun 28 08:00:28 haddington network: Bringing up loopback interface:
Jun 28 08:00:28 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 08:00:28 haddington kernel: md: md2 stopped.
Jun 28 08:00:28 haddington kernel: md: bind<sda2>
Jun 28 08:00:28 haddington kernel: md: bind<sdb2>
Jun 28 08:00:28 haddington kernel: md: kicking non-fresh sda2 from array!
Jun 28 08:00:28 haddington kernel: md: unbind<sda2>
Jun 28 08:00:28 haddington kernel: md: export_rdev(sda2)
Jun 28 08:00:28 haddington kernel: md: md2: raid array is not clean -- starting background reconstruction
Jun 28 08:00:28 haddington kernel: raid1: raid set md2 active with 1 out of 2 mirrors
<snip>

Argghhh!!! :evil:

Can anyone advise me on what to do next, please?
How can I tell if the array is being rebuilt?
Does it look like the filesystem got damaged by the second reboot?
Am I right in saying that the drive is probably physically OK and can be re-added?

First I'm gonna have to visit the site and check the backups!

Many thanks in advance.
Norrie
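
The line to look for in a log like the second one above is "md: kicking non-fresh sda2 from array!". Generally md prints it when a member's superblock event counter is behind that of the other member, which an interrupted boot can cause; on its own it does not prove the disk has failed. A small sketch for pulling such events out of a syslog file, assuming the message wording shown above. The name kicked_members is made up for illustration.

```shell
#!/bin/sh
# kicked_members: print each partition that md rejected as "non-fresh"
# during array assembly, based on syslog-format kernel lines.
# Reads the file(s) given as arguments, or standard input if none.
kicked_members() {
    sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array!.*/\1/p' "$@"
}
```

It could be run as: kicked_members /var/log/messages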
Title: DegradedArray event detected on md device /dev/md2. - Help
Post by: Stefano on June 28, 2007, 04:39:45 PM: Hi Norrie

try
Code: [Select]
mdadm -a /dev/md2 /dev/sda2

then

Code: [Select]
tail -f /var/log/messages

and look for something going wrong..

HTH

Ciao

Stefano
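
Before re-adding a member it helps to confirm which arrays are actually degraded. A minimal sketch, assuming the /proc/mdstat layout posted earlier in this thread, where a "_" in the status field (for example [_U]) marks a missing member. The function name degraded_arrays is hypothetical.

```shell
#!/bin/sh
# degraded_arrays: print the name of every md array whose status field
# (the last field on its "blocks" line, e.g. [_U]) contains "_".
# Reads mdstat-formatted text from the file(s) given as arguments,
# or from standard input if none are given.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }             # remember the current array
        /blocks/ && $NF ~ /_/ && name != "" {    # status is the last field
            print name
            name = ""
        }
    ' "$@"
}
```

It could be run as: degraded_arrays /proc/mdstat
With the first mdstat output in this thread it would report md2 only, since md1 shows [UU].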
Title: DegradedArray event detected on md device /dev/md2. - Help
Post by: ntblade on June 28, 2007, 05:13:28 PM: Wow! thanks for the quick reply!
Code: [Select]
[root@haddington /]# mdadm -a /dev/md2 /dev/sda2
mdadm: hot added /dev/sda2
[root@haddington /]# tail -f /var/log/messages
Jun 28 15:53:16 haddington sshd(pam_unix)[4842]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=22.125.186.200.sta.impsat.net.br
Jun 28 16:07:49 haddington kernel: md: bind<sda2>
Jun 28 16:07:49 haddington kernel: RAID1 conf printout:
Jun 28 16:07:49 haddington kernel: --- wd:1 rd:2
Jun 28 16:07:49 haddington kernel: disk 0, wo:1, o:1, dev:sda2
Jun 28 16:07:49 haddington kernel: disk 1, wo:0, o:1, dev:sdb2
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun 28 16:07:49 haddington kernel: md: using 128k window, over a total of 78019584 blocks.

Looks like the array is being reconstructed. Does the output of this look healthy?...
Code: [Select]
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Aug 26 16:15:34 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Thu Jun 28 16:11:43 2007
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
 Rebuild Status : 16% complete
           UUID : a037cd95:42254ed3:5a88d081:4aa5384c
         Events : 0.9534388
    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
       2       8        2        0      spare rebuilding   /dev/sda2
[root@haddington /]#

Thanks
Norrie
Title: DegradedArray event detected on md device /dev/md2. - Help
Post by: Stefano on June 28, 2007, 05:40:42 PM: Quote from: "ntblade"
Looks like the array is being reconstructed.

yes and no.. reconstruction is going on..

look at the log lines below:

Quote

Code: [Select]
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.

Quote

Does the output of this look healthy?

try
Code: [Select]
cat /proc/mdstat

when reconstruction has finished, and if everything is OK, you should see something like

Code: [Select]
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[2]
      78019584 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

HTH

ciao

Stefano
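
To check progress without reading the whole report, the Rebuild Status line of mdadm --query --detail can be pulled out on its own. A rough sketch, assuming the field is labelled exactly as in the output posted above; rebuild_status is a hypothetical helper name.

```shell
#!/bin/sh
# rebuild_status: extract the value of the "Rebuild Status" field from
# `mdadm --detail` output given on standard input or as file arguments.
# Prints e.g. "16% complete"; prints nothing if the field is absent.
rebuild_status() {
    sed -n 's/^[[:space:]]*Rebuild Status[[:space:]]*:[[:space:]]*//p' "$@"
}
```

Typical use would be: mdadm --query --detail /dev/md2 | rebuild_status
It prints nothing when the report has no Rebuild Status line, as in the degraded report at the top of this thread.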
Title: DegradedArray event detected on md device /dev/md2. - Help
Post by: ntblade on June 28, 2007, 06:08:08 PM: Yep, all fine now.
I got the series of rebuild emails as well.

Thank you very much for your help. :D

Norrie
Title: Re: DegradedArray event detected on md device /dev/md2. - Help
Post by: pauljclarke on February 01, 2008, 10:29:44 AM: Hi, I have been getting this same error today from my server - for both md1 and md2.

It's showing that it's degraded, but nothing about it being fixed, and the server is running very, very slowly!

Thanks

Code: [Select]
[root@sme ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda2[0]
      32917120 blocks [2/1] [U_]
md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>
[root@sme ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Fri Apr 13 16:03:33 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Fri Feb 1 08:08:54 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : 08fae3d0:4cd4f530:915b5ca6:5314f509
         Events : 0.4607
    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       0        0        -      removed
[root@sme ~]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Fri Apr 13 16:03:20 2007
     Raid Level : raid1
     Array Size : 32917120 (31.39 GiB 33.71 GB)
    Device Size : 32917120 (31.39 GiB 33.71 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Fri Feb 1 09:27:53 2008
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : f46feadc:2443a24b:022a4f8b:4ad30262
         Events : 0.8253406
    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed
[root@sme ~]#
Title: Re: DegradedArray event detected on md device /dev/md2. - Help
Post by: Stefano on February 01, 2008, 02:06:16 PM: hi..

sounds like one of your hard drives is gone..

btw, try:

- open a shell and run
Code: [Select]
tail -f /var/log/messages

- open another shell and run
Code: [Select]
mdadm -a /dev/md1 /dev/hdc1
mdadm -a /dev/md2 /dev/hdc2

Note: replace hdc with your correct device.

then do
Code: [Select]
cat /proc/mdstat

and wait..

in the first shell you can watch the log for errors on the device you added

HTH

Stefano
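
To find the correct device name to substitute, one option is to list the whole disks the kernel currently sees. A sketch that filters /proc/partitions-style text, assuming traditional hdX/sdX naming where partition names end in a digit and whole-disk names do not; whole_disks is a hypothetical helper.

```shell
#!/bin/sh
# whole_disks: print device names from /proc/partitions-formatted text
# that do not end in a digit (e.g. hda, sdb), skipping partitions such
# as hda1 and md devices such as md2.
# Assumption: hdX/sdX style names; other naming schemes are not handled.
# Reads the file(s) given as arguments, or standard input if none.
whole_disks() {
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ && $4 !~ /[0-9]$/ { print $4 }' "$@"
}
```

It could be run as: whole_disks /proc/partitions
If only one disk is listed, there is no second member to add until another drive is installed.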

Title: Re: DegradedArray event detected on md device /dev/md2. - Help
Post by: pauljclarke on February 01, 2008, 05:27:25 PM: Hi,

I only have one hard drive, so each command comes back saying that it's not an md device.

Puzzled....???