Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: ntblade on June 28, 2007, 04:18:54 PM
-
Hi all,
Got an email this morning containing:
A DegradedArray event had been detected on md device /dev/md2.
After some searching I ran:
[root@haddington /]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
78019584 blocks [2/1] [_U]
md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
unused devices: <none>
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Sat Aug 26 16:15:34 2006
Raid Level : raid1
Array Size : 78019584 (74.41 GiB 79.89 GB)
Device Size : 78019584 (74.41 GiB 79.89 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Thu Jun 28 14:56:36 2007
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : a037cd95:42254ed3:5a88d081:4aa5384c
Events : 0.9532757
Number Major Minor RaidDevice State
0 0 0 - removed
1 8 18 1 active sync /dev/sdb2
[root@haddington /]#
I then called the office where this server lives, and it turns out that a worker came in this morning, found there was no internet access, and decided to reboot both the ADSL router and the server (Server - Gateway mode).
Looking through the log I can see the server coming up:
<snip>
Jun 28 07:57:38 haddington syslog: klogd shutdown succeeded
Jun 28 07:57:38 haddington exiting on signal 15
Jun 28 07:58:38 haddington syslogd 1.4.1: restart.
Jun 28 07:58:38 haddington syslog: syslogd startup succeeded
<snip>
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: sda: sda1 sda2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: sdb: sdb1 sdb2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 07:58:41 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 07:58:42 haddington kernel: md: md1 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb1>
Jun 28 07:58:42 haddington kernel: md: bind<sda1>
Jun 28 07:58:42 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 07:58:42 haddington kernel: md: md2 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb2>
Jun 28 07:58:42 haddington kernel: md: bind<sda2>
Jun 28 07:58:42 haddington kernel: raid1: raid set md2 active with 2 out of 2 mirrors
<snip>
So, it looks to me as if the server was booting fine and the RAID array had been set up. Am I right?
Then I guess the user got bored and power-cycled the server:
<snip>
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd:
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd succeeded
Jun 28 07:58:56 haddington dhcpd: [60G
Jun 28 07:58:56 haddington dhcpd:
Jun 28 07:58:56 haddington rc.e-smith: Starting dhcpd: succeeded
Jun 28 08:00:24 haddington syslogd 1.4.1: restart.
Jun 28 08:00:24 haddington syslog: syslogd startup succeeded
Jun 28 08:00:24 haddington syslog:
Jun 28 08:00:24 haddington syslog: Starting kernel logger:
Jun 28 08:00:24 haddington kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 28 08:00:24 haddington syslog: klogd startup succeeded
Jun 28 08:00:24 haddington syslog: [60G
Jun 28 08:00:24 haddington syslog:
<snip>
Then:
<snip>
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: sda: sda1 sda2
Jun 28 08:00:28 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington network: Setting network parameters:
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: kernel.sysrq = 0
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.all.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.default.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_dynaddr = 1
Jun 28 08:00:28 haddington kernel: sdb: sdb1 sdb2
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_forward = 1
Jun 28 08:00:28 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_keepalive_time = 300
Jun 28 08:00:28 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_syncookies = 1
Jun 28 08:00:28 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 08:00:28 haddington network: Setting network parameters: succeeded
Jun 28 08:00:28 haddington kernel: md: md1 stopped.
Jun 28 08:00:28 haddington network: [60G
Jun 28 08:00:28 haddington kernel: md: bind<sdb1>
Jun 28 08:00:28 haddington network:
Jun 28 08:00:28 haddington kernel: md: bind<sda1>
Jun 28 08:00:28 haddington network: Bringing up loopback interface:
Jun 28 08:00:28 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 08:00:28 haddington kernel: md: md2 stopped.
Jun 28 08:00:28 haddington kernel: md: bind<sda2>
Jun 28 08:00:28 haddington kernel: md: bind<sdb2>
Jun 28 08:00:28 haddington kernel: md: kicking non-fresh sda2 from array!
Jun 28 08:00:28 haddington kernel: md: unbind<sda2>
Jun 28 08:00:28 haddington kernel: md: export_rdev(sda2)
Jun 28 08:00:28 haddington kernel: md: md2: raid array is not clean -- starting background reconstruction
Jun 28 08:00:28 haddington kernel: raid1: raid set md2 active with 1 out of 2 mirrors
<snip>
Argghhh!!! :evil:
Can anyone guide me on what to do next, please?
How can I tell if the array is being rebuilt?
Does it look like the filesystem got damaged by the second reboot?
Am I right in saying that the drive is probably physically OK and can be re-added?
First I'm gonna have to visit the site and check the backups!
Many thanks in advance.
Norrie
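For anyone reading along: the `[2/1] [_U]` field in the /proc/mdstat output above can be checked mechanically. The following is only a rough sketch, assuming the mdstat layout shown in this thread; the function name `degraded_arrays` is made up for illustration and is not part of mdadm or SME Server. It reads mdstat-style text on stdin and prints each array that has a missing mirror.

```shell
#!/bin/sh
# Hypothetical helper (not an mdadm command): print the name of every md
# array whose status field contains "_", i.e. a missing member.
# Reads /proc/mdstat-style text on stdin so it can be tried on saved output.
degraded_arrays() {
    awk '
        # Remember which array the following lines describe.
        /^md[0-9]+ :/ { name = $1 }
        # On the "blocks" line, "_" inside [U_...] marks a missing mirror.
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

Fed the output posted above, this should print md2 only, since md1 shows `[UU]`.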
-
Hi Norrie
try
mdadm -a /dev/md2 /dev/sda2
then
tail -f /var/log/messages
and look for something going wrong..
HTH
Ciao
Stefano
-
Wow! thanks for the quick reply!
[root@haddington /]# mdadm -a /dev/md2 /dev/sda2
mdadm: hot added /dev/sda2
[root@haddington /]# tail -f /var/log/messages
Jun 28 15:53:16 haddington sshd(pam_unix)[4842]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=22.125.186.200.sta.impsat.net.br
Jun 28 16:07:49 haddington kernel: md: bind<sda2>
Jun 28 16:07:49 haddington kernel: RAID1 conf printout:
Jun 28 16:07:49 haddington kernel: --- wd:1 rd:2
Jun 28 16:07:49 haddington kernel: disk 0, wo:1, o:1, dev:sda2
Jun 28 16:07:49 haddington kernel: disk 1, wo:0, o:1, dev:sdb2
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun 28 16:07:49 haddington kernel: md: using 128k window, over a total of 78019584 blocks.
Looks like the array is being reconstructed. Does the output of this look healthy?...
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Sat Aug 26 16:15:34 2006
Raid Level : raid1
Array Size : 78019584 (74.41 GiB 79.89 GB)
Device Size : 78019584 (74.41 GiB 79.89 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Thu Jun 28 16:11:43 2007
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 16% complete
UUID : a037cd95:42254ed3:5a88d081:4aa5384c
Events : 0.9534388
Number Major Minor RaidDevice State
0 0 0 - removed
1 8 18 1 active sync /dev/sdb2
2 8 2 0 spare rebuilding /dev/sda2
[root@haddington /]#
Thanks
Norrie
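A small sketch for keeping an eye on the rebuild, assuming the "Rebuild Status : NN% complete" line keeps the form shown in the output above. The function name `rebuild_percent` is invented for this example. It pulls the percentage out of `mdadm --query --detail` output piped to it and prints nothing when no rebuild line is present.

```shell
#!/bin/sh
# Hypothetical helper: extract the number from a
# "Rebuild Status : NN% complete" line on stdin.
# Prints nothing if the array is not rebuilding.
rebuild_percent() {
    sed -n 's/^[[:space:]]*Rebuild Status[[:space:]]*:[[:space:]]*\([0-9][0-9]*\)% complete.*$/\1/p'
}

# Typical use (as root, with the array name from this thread):
#   mdadm --query --detail /dev/md2 | rebuild_percent
```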
-
"Looks like the array is being reconstructed."
yes and no.. reconstruction is going on, but it is not finished yet..
look at these log lines:
Jun 28 16:07:49 haddington kernel: md: syncing RAID array md2
Jun 28 16:07:49 haddington kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 28 16:07:49 haddington kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
"Does the output of this look healthy?"
try
cat /proc/mdstat
when reconstruction is finished, and if everything is OK, you'd see something like
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[2]
78019584 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
HTH
ciao
Stefano
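The "is it finished?" check described above can be turned into an exit status. This is a minimal sketch, assuming each array's "blocks" line carries a `[total/active]` pair as in the mdstat output in this thread; `mdstat_healthy` is an invented name, not an existing tool. It succeeds only when every array reports as many active members as it expects, for example `[2/2]`.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if every array in /proc/mdstat-style text on
# stdin has all of its members active, exit 1 otherwise.
mdstat_healthy() {
    awk '
        # Find the [total/active] pair on each "blocks" line.
        /blocks/ && match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets and split "2/1" into n[1]=2, n[2]=1.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[1] != n[2]) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use on a live system:
#   if mdstat_healthy < /proc/mdstat; then echo "all mirrors up"; fi
```

While a rebuild is still running the counts differ (for example `[2/1]`), so the check keeps failing until the resync has completed.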
-
Yep, all fine now.
I got the series of rebuilt emails as well.
Thank you very much for your help. :D
Norrie
-
Hi, I have been getting this same error today from my server, for both md1 and md2.
It's showing that the arrays are degraded, but nothing about them being fixed, and the server is running very, very slowly!
Thanks
[root@sme ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda2[0]
32917120 blocks [2/1] [U_]
md1 : active raid1 hda1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
[root@sme ~]# mdadm --query --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Fri Apr 13 16:03:33 2007
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Device Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Feb 1 08:08:54 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 08fae3d0:4cd4f530:915b5ca6:5314f509
Events : 0.4607
Number Major Minor RaidDevice State
0 3 1 0 active sync /dev/hda1
1 0 0 - removed
[root@sme ~]# mdadm --query --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Fri Apr 13 16:03:20 2007
Raid Level : raid1
Array Size : 32917120 (31.39 GiB 33.71 GB)
Device Size : 32917120 (31.39 GiB 33.71 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Feb 1 09:27:53 2008
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : f46feadc:2443a24b:022a4f8b:4ad30262
Events : 0.8253406
Number Major Minor RaidDevice State
0 3 2 0 active sync /dev/hda2
1 0 0 - removed
[root@sme ~]#
-
hi..
sounds like one of your HDs is gone..
btw, try:
- open a shell and run
tail -f /var/log/messages
- open another shell and run
mdadm -a /dev/md1 /dev/hdc1
mdadm -a /dev/md2 /dev/hdc2
note: replace hdc with your correct device..
then do
cat /proc/mdstat
and wait..
in the first shell you can watch the log for errors on your added device
HTH
Stefano
-
Hi,
I only have one hard drive, so each command comes back saying that it's not an md device.
Puzzled....???