RAID1 out of sync

SchulzStefan

« on: December 08, 2013, 03:22:45 PM »

SME 8.1, up to date. Got an email that the RAID is out of sync. Removed sdb from the array, rebooted, and was dropped to a console. Ran fsck and corrected the errors. Booted the machine and added sdb back to the array. Then I got this email:

Return-Path: <anonymous@ivb.local>
Delivered-To: admin@saturn.ivb.local
Received: (qmail 9289 invoked by alias); 8 Dec 2013 12:09:28 -0000
Delivered-To: alias-localdelivery-admin@ivb.local
Received: (qmail 9286 invoked by uid 0); 8 Dec 2013 12:09:28 -0000
Date: 8 Dec 2013 12:09:28 -0000
Message-ID: <20131208120928.9285.qmail@ivb.local>
From: anonymous@ivb.local
To: admin@ivb.local
Subject: SMART error (CurrentPendingSector) detected on host: saturn

This email was generated by the smartd daemon running on:

host name: saturn
DNS domain: ivb.local
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another email message will be sent in 1 days if the problem persists

Did the following:

[root@saturn new]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>
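
For anyone reading the output above: in the status string at the end of each "blocks" line, an underscore marks a slot with no active member, and (S) marks a spare. A minimal sketch of a helper that lists degraded arrays from that text follows. The function name degraded_arrays is made up for illustration, and the sample input is simply the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every array whose status string (e.g. [_U]) contains an underscore.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }        # remember the array named on the "mdN : ..." line
        /blocks/ && /\[[U_]+\]/ {          # the line carrying the [UU] / [_U] status string
            if ($NF ~ /_/) print name      # last field is the status string
        }
    '
}

# Sample input copied from the post; on a live system use:
#   degraded_arrays < /proc/mdstat
degraded_arrays <<'EOF'
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>
EOF
```

With the sample above this prints md2, the array where sda2 is listed only as a spare.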

and:

[root@saturn new]# mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Used Dev Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Sun Dec 8 10:11:18 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : b5f1b131:fe27265a:85dfe98f:3fb577a2
Events : 0.19554

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1

[root@saturn new]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 14:54:14 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46828276

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - spare /dev/sda2

[root@saturn new]# less /var/log/messages

Dec 8 12:48:28 saturn kernel: ata2.00: BMDMA stat 0x24
Dec 8 12:48:28 saturn kernel: ata2.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 8 12:48:28 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 8 12:48:28 saturn kernel: ata2.00: status: { DRDY ERR }
Dec 8 12:48:28 saturn kernel: ata2.00: error: { UNC }
Dec 8 12:48:28 saturn kernel: ata2.00: configured for UDMA/133
Dec 8 12:48:28 saturn kernel: sd 1:0:0:0: Unhandled sense code
Dec 8 12:48:28 saturn kernel: sd 1:0:0:0: SCSI error: return code = 0x08000002
Dec 8 12:48:28 saturn kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 8 12:48:28 saturn kernel: sdb: Current [descriptor]: sense key: Medium Error
Dec 8 12:48:28 saturn kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Dec 8 12:48:28 saturn kernel:
Dec 8 12:48:28 saturn kernel: Descriptor sense data with sense descriptors (in hex):
Dec 8 12:48:28 saturn kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 8 12:48:28 saturn kernel: 43 a7 2a 48
Dec 8 12:48:28 saturn kernel: ata2: EH complete
Dec 8 12:48:28 saturn kernel: raid1: sdb: unrecoverable I/O read error for block 1134819968
Dec 8 12:48:28 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 8 12:48:28 saturn kernel: sdb: Write Protect is off
Dec 8 12:48:28 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Dec 8 12:48:28 saturn kernel: SCSI device sdb: drive cache: write back
Dec 8 12:48:28 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 8 12:48:28 saturn kernel: sdb: Write Protect is off
Dec 8 12:48:28 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Dec 8 12:48:28 saturn kernel: SCSI device sdb: drive cache: write back
Dec 8 12:48:28 saturn kernel: md: md2: sync done.
Dec 8 12:48:29 saturn kernel: RAID1 conf printout:
Dec 8 12:48:29 saturn kernel: --- wd:1 rd:2
Dec 8 12:48:29 saturn kernel: disk 0, wo:1, o:1, dev:sda2
Dec 8 12:48:29 saturn kernel: disk 1, wo:0, o:1, dev:sdb2
Dec 8 12:48:29 saturn kernel: RAID1 conf printout:
Dec 8 12:48:29 saturn kernel: --- wd:1 rd:2
Dec 8 12:48:29 saturn kernel: disk 1, wo:0, o:1, dev:sdb2
Dec 8 13:09:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 13:09:28 saturn smartd[2768]: Sending warning via mail to admin ...
Dec 8 13:09:28 saturn smartd[2768]: Warning via mail to admin: successful
Dec 8 13:39:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 14:09:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 14:31:44 saturn dhcpd: DHCPINFORM from 192.168.1.93 via br0
Dec 8 14:31:44 saturn dhcpd: DHCPACK to 192.168.1.93 (00:0e:7f:fc:28:46) via br0
Dec 8 14:39:29 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 15:00:29 saturn kernel: raid1: Disk failure on sdb1, disabling device.
Dec 8 15:00:29 saturn kernel: Operation continuing on 1 devices
Dec 8 15:00:29 saturn kernel: RAID1 conf printout:
Dec 8 15:00:29 saturn kernel: --- wd:1 rd:2
Dec 8 15:00:29 saturn kernel: disk 0, wo:0, o:1, dev:sda1
Dec 8 15:00:29 saturn kernel: disk 1, wo:1, o:0, dev:sdb1
Dec 8 15:00:29 saturn kernel: RAID1 conf printout:
Dec 8 15:00:29 saturn kernel: --- wd:1 rd:2
Dec 8 15:00:29 saturn kernel: disk 0, wo:0, o:1, dev:sda1
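
The last four bytes of the sense data in the log above (43 a7 2a 48) match the sector address in the "res" line, so they appear to be the address of the unreadable sector in hex. A small sketch to turn them into a decimal sector number follows. The function name sense_lba_to_decimal is hypothetical, and reading these bytes as the LBA is my assumption based on the log, not something the kernel states explicitly.

```shell
#!/bin/sh
# Hypothetical helper: join space-separated hex bytes (most significant byte
# first) into one hexadecimal number and print it in decimal.
sense_lba_to_decimal() {
    hex=$(printf '%s' "$1" | tr -d ' ')   # "43 a7 2a 48" -> "43a72a48"
    printf '%d\n' "0x$hex"                # printf accepts a 0x-prefixed constant
}

sense_lba_to_decimal "43 a7 2a 48"
```

This prints 1135028808, which is below the 1953525168 sectors the drive reports. It would be a sector number on the whole disk, which is presumably why it differs from the block number in the raid1 line.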

Then I did:

[root@saturn new]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1

[root@saturn new]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set device faulty failed for /dev/sdb2: Device or resource busy

[root@saturn new]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[0]
104320 blocks [2/1] [U_]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>

Next step:

[root@saturn ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

[root@saturn ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: re-added /dev/sdb1

[root@saturn ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>

and:

[root@saturn ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 16:33:44 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46830392

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - spare /dev/sda2

As far as I understand, sdb might be defective and should be replaced. But how do I do this if sda2 is only a spare in md2? Can I add sda2 back to the array as an active member? Isn't it still in md2? I think I'm stuck somewhere along the way. The server is up and running. Could anybody advise how to proceed?

Thanks in advance
stefan

« Last Edit: December 08, 2013, 04:36:55 PM by SchulzStefan »
