Hi all,
I got an email this morning containing the following:
A DegradedArray event had been detected on md device /dev/md2.
After some searching I ran:
[root@haddington /]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
78019584 blocks [2/1] [_U]
md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
unused devices: <none>
[root@haddington /]# mdadm --query --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Sat Aug 26 16:15:34 2006
Raid Level : raid1
Array Size : 78019584 (74.41 GiB 79.89 GB)
Device Size : 78019584 (74.41 GiB 79.89 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Thu Jun 28 14:56:36 2007
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : a037cd95:42254ed3:5a88d081:4aa5384c
Events : 0.9532757
    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
[root@haddington /]#
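My reading of the mdstat output is that the [_U] map on md2 means the first mirror is missing and the second is up. Here is a minimal sketch of how I would check for that from a script. It assumes the map is always the last field on the "blocks" line, as it is above. The function name degraded_arrays is just something I made up:

```shell
# Print the name of every md array whose status map (e.g. [_U]) has a missing member.
# Reads /proc/mdstat by default; pass a file name to check saved output instead.
degraded_arrays() {
    awk '
        # Header line such as "md2 : active raid1 sdb2[1]": remember the array name.
        /^md[0-9]+ :/ { name = $1 }
        # Status line such as "78019584 blocks [2/1] [_U]": an underscore in the
        # final [..] field marks a missing mirror.
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument on the server should print md2 for the output shown above, and nothing once both mirrors are back.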
I then called the office where this server lives. It turns out that a worker came in this morning, found there was no internet access, and decided to reboot both the ADSL router and the server (which runs in Server - Gateway mode).
Looking through the log I can see the server coming up:
<snip>
Jun 28 07:57:38 haddington syslog: klogd shutdown succeeded
Jun 28 07:57:38 haddington exiting on signal 15
Jun 28 07:58:38 haddington syslogd 1.4.1: restart.
Jun 28 07:58:38 haddington syslog: syslogd startup succeeded
<snip>
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 07:58:41 haddington kernel: sda: sda1 sda2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 07:58:41 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 07:58:41 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 07:58:41 haddington kernel: sdb: sdb1 sdb2
Jun 28 07:58:41 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 07:58:41 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 07:58:41 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 07:58:42 haddington kernel: md: md1 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb1>
Jun 28 07:58:42 haddington kernel: md: bind<sda1>
Jun 28 07:58:42 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 07:58:42 haddington kernel: md: md2 stopped.
Jun 28 07:58:42 haddington kernel: md: bind<sdb2>
Jun 28 07:58:42 haddington kernel: md: bind<sda2>
Jun 28 07:58:42 haddington kernel: raid1: raid set md2 active with 2 out of 2 mirrors
<snip>
So it looks to me as though the server was booting fine and both arrays (md1 and md2) had been assembled with 2 out of 2 mirrors. Am I right?
Then I guess the user got bored and power-cycled the server:
<snip>
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd:
Jun 28 07:58:56 haddington dhcpd: Starting dhcpd succeeded
Jun 28 07:58:56 haddington dhcpd: [60G
Jun 28 07:58:56 haddington dhcpd:
Jun 28 07:58:56 haddington rc.e-smith: Starting dhcpd: succeeded
Jun 28 08:00:24 haddington syslogd 1.4.1: restart.
Jun 28 08:00:24 haddington syslog: syslogd startup succeeded
Jun 28 08:00:24 haddington syslog:
Jun 28 08:00:24 haddington syslog: Starting kernel logger:
Jun 28 08:00:24 haddington kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 28 08:00:24 haddington syslog: klogd startup succeeded
Jun 28 08:00:24 haddington syslog: [60G
Jun 28 08:00:24 haddington syslog:
<snip>
Then:
<snip>
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington kernel: SCSI device sda: drive cache: write back
Jun 28 08:00:28 haddington kernel: sda: sda1 sda2
Jun 28 08:00:28 haddington kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington kernel: Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5
Jun 28 08:00:28 haddington kernel: Type: Direct-Access ANSI SCSI revision: 05
Jun 28 08:00:28 haddington network: Setting network parameters:
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: kernel.sysrq = 0
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.all.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: 156250000 512-byte hdwr sectors (80000 MB)
Jun 28 08:00:28 haddington sysctl: net.ipv4.conf.default.rp_filter = 1
Jun 28 08:00:28 haddington kernel: SCSI device sdb: drive cache: write back
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_dynaddr = 1
Jun 28 08:00:28 haddington kernel: sdb: sdb1 sdb2
Jun 28 08:00:28 haddington sysctl: net.ipv4.ip_forward = 1
Jun 28 08:00:28 haddington kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_keepalive_time = 300
Jun 28 08:00:28 haddington kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Jun 28 08:00:28 haddington sysctl: net.ipv4.tcp_syncookies = 1
Jun 28 08:00:28 haddington kernel: md: raid1 personality registered as nr 3
Jun 28 08:00:28 haddington network: Setting network parameters: succeeded
Jun 28 08:00:28 haddington kernel: md: md1 stopped.
Jun 28 08:00:28 haddington network: [60G
Jun 28 08:00:28 haddington kernel: md: bind<sdb1>
Jun 28 08:00:28 haddington network:
Jun 28 08:00:28 haddington kernel: md: bind<sda1>
Jun 28 08:00:28 haddington network: Bringing up loopback interface:
Jun 28 08:00:28 haddington kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jun 28 08:00:28 haddington kernel: md: md2 stopped.
Jun 28 08:00:28 haddington kernel: md: bind<sda2>
Jun 28 08:00:28 haddington kernel: md: bind<sdb2>
Jun 28 08:00:28 haddington kernel: md: kicking non-fresh sda2 from array!
Jun 28 08:00:28 haddington kernel: md: unbind<sda2>
Jun 28 08:00:28 haddington kernel: md: export_rdev(sda2)
Jun 28 08:00:28 haddington kernel: md: md2: raid array is not clean -- starting background reconstruction
Jun 28 08:00:28 haddington kernel: raid1: raid set md2 active with 1 out of 2 mirrors
<snip>
Argghhh!!!
Can anyone guide me on what to do next, please?
How can I tell if the array is being rebuilt?
Does it look like the filesystem got damaged by the second reboot?
Am I right in saying that the drive is probably physically OK and can be re-added?
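As far as I understand it, a running rebuild shows up in /proc/mdstat as a progress line containing "recovery =" or "resync =". Here is a rough sketch of the check I have in mind, assuming that format. The name rebuild_status is made up, and the mdadm line in the comment is only what I believe the re-add would look like. I have not run it:

```shell
# Report whether any md array is currently rebuilding.
# Reads /proc/mdstat by default; pass a file name to check saved output instead.
rebuild_status() {
    # Assumption: the kernel prints a progress line such as
    # "[==>....]  recovery = 12.6% (...)" while a mirror is being synced.
    if grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"; then
        echo "rebuilding"
    else
        echo "idle"
    fi
}

# Assumption: if sda2 turns out to be healthy, it would be put back with
# something like
#   mdadm /dev/md2 --add /dev/sda2
# after which rebuild_status should report "rebuilding" until the sync ends.
```

With the mdstat output I pasted earlier this would report "idle", which matches there being no progress line under md2.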
First I'm gonna have to visit the site and check the backups!
Many thanks in advance.
Norrie