Koozali.org: home of the SME Server

Obsolete Releases => SME Server 8.x => Topic started by: SchulzStefan on December 08, 2013, 03:22:45 PM

Title: RAID1 out of sync
Post by: SchulzStefan on December 08, 2013, 03:22:45 PM: SME 8.1, fully up to date. I got an email saying the RAID was out of sync, so I removed sdb from the array and rebooted, which dropped me to a console. I ran fsck and corrected the errors, booted the machine, added sdb back to the array, and then received this email:

Return-Path: <anonymous@ivb.local>
Delivered-To: admin@saturn.ivb.local
Received: (qmail 9289 invoked by alias); 8 Dec 2013 12:09:28 -0000
Delivered-To: alias-localdelivery-admin@ivb.local
Received: (qmail 9286 invoked by uid 0); 8 Dec 2013 12:09:28 -0000
Date: 8 Dec 2013 12:09:28 -0000
Message-ID: <20131208120928.9285.qmail@ivb.local>
From: anonymous@ivb.local
To: admin@ivb.local
Subject: SMART error (CurrentPendingSector) detected on host: saturn

This email was generated by the smartd daemon running on:

host name: saturn
DNS domain: ivb.local
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another email message will be sent in 1 days if the problem persists

Did the following:

[root@saturn new]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>
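For reading the output above: in /proc/mdstat, `[2/1]` means the array has 2 member slots and only 1 working member, `[_U]` shows slot 0 missing and slot 1 up, and a member suffixed `(S)` is a spare (`(F)` marks a failed one). So md2 is running on sdb2 alone, with sda2 attached only as a spare. A minimal bash sketch that flags this automatically; `mdstat_summary` is a hypothetical helper, and it assumes the mdstat layout shown in this thread, where the status line directly follows the array line:

```bash
#!/usr/bin/env bash
# mdstat_summary: hypothetical helper. Reads /proc/mdstat-style text on stdin
# and prints one line per array: name, total/working slots, state, and any
# spare (S) or failed (F) members.
mdstat_summary() {
  awk '
    # Array line, e.g. "md2 : active raid1 sdb2[1] sda2[2](S)"
    /^md[0-9]+ :/ {
      name = $1; extra = ""
      for (i = 3; i <= NF; i++) {
        if ($i ~ /\(S\)$/) extra = extra " spare=" $i
        if ($i ~ /\(F\)$/) extra = extra " failed=" $i
      }
      next
    }
    # Status line, e.g. "976655552 blocks [2/1] [_U]"
    name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
      counts = substr($0, RSTART + 1, RLENGTH - 2)   # e.g. "2/1"
      split(counts, c, "/")
      state = (c[2] + 0 < c[1] + 0) ? "DEGRADED" : "ok"
      print name, counts, state extra
      name = ""
    }
  '
}

# Usage: mdstat_summary < /proc/mdstat
```

On the mdstat output above this prints `md1 2/2 ok` and `md2 2/1 DEGRADED spare=sda2[2](S)`.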

and:

[root@saturn new]# mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Used Dev Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Sun Dec 8 10:11:18 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : b5f1b131:fe27265a:85dfe98f:3fb577a2
Events : 0.19554

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1

[root@saturn new]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 14:54:14 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46828276

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - spare /dev/sda2

[root@saturn new]# less /var/log/messages

Dec 8 12:48:28 saturn kernel: ata2.00: BMDMA stat 0x24
Dec 8 12:48:28 saturn kernel: ata2.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 8 12:48:28 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 8 12:48:28 saturn kernel: ata2.00: status: { DRDY ERR }
Dec 8 12:48:28 saturn kernel: ata2.00: error: { UNC }
Dec 8 12:48:28 saturn kernel: ata2.00: configured for UDMA/133
Dec 8 12:48:28 saturn kernel: sd 1:0:0:0: Unhandled sense code
Dec 8 12:48:28 saturn kernel: sd 1:0:0:0: SCSI error: return code = 0x08000002
Dec 8 12:48:28 saturn kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 8 12:48:28 saturn kernel: sdb: Current [descriptor]: sense key: Medium Error
Dec 8 12:48:28 saturn kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Dec 8 12:48:28 saturn kernel:
Dec 8 12:48:28 saturn kernel: Descriptor sense data with sense descriptors (in hex):
Dec 8 12:48:28 saturn kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 8 12:48:28 saturn kernel: 43 a7 2a 48
Dec 8 12:48:28 saturn kernel: ata2: EH complete
Dec 8 12:48:28 saturn kernel: raid1: sdb: unrecoverable I/O read error for block 1134819968
Dec 8 12:48:28 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 8 12:48:28 saturn kernel: sdb: Write Protect is off
Dec 8 12:48:28 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Dec 8 12:48:28 saturn kernel: SCSI device sdb: drive cache: write back
Dec 8 12:48:28 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 8 12:48:28 saturn kernel: sdb: Write Protect is off
Dec 8 12:48:28 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Dec 8 12:48:28 saturn kernel: SCSI device sdb: drive cache: write back
Dec 8 12:48:28 saturn kernel: md: md2: sync done.
Dec 8 12:48:29 saturn kernel: RAID1 conf printout:
Dec 8 12:48:29 saturn kernel: --- wd:1 rd:2
Dec 8 12:48:29 saturn kernel: disk 0, wo:1, o:1, dev:sda2
Dec 8 12:48:29 saturn kernel: disk 1, wo:0, o:1, dev:sdb2
Dec 8 12:48:29 saturn kernel: RAID1 conf printout:
Dec 8 12:48:29 saturn kernel: --- wd:1 rd:2
Dec 8 12:48:29 saturn kernel: disk 1, wo:0, o:1, dev:sdb2
Dec 8 13:09:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 13:09:28 saturn smartd[2768]: Sending warning via mail to admin ...
Dec 8 13:09:28 saturn smartd[2768]: Warning via mail to admin: successful
Dec 8 13:39:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 14:09:28 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 14:31:44 saturn dhcpd: DHCPINFORM from 192.168.1.93 via br0
Dec 8 14:31:44 saturn dhcpd: DHCPACK to 192.168.1.93 (00:0e:7f:fc:28:46) via br0
Dec 8 14:39:29 saturn smartd[2768]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Dec 8 15:00:29 saturn kernel: raid1: Disk failure on sdb1, disabling device.
Dec 8 15:00:29 saturn kernel: Operation continuing on 1 devices
Dec 8 15:00:29 saturn kernel: RAID1 conf printout:
Dec 8 15:00:29 saturn kernel: --- wd:1 rd:2
Dec 8 15:00:29 saturn kernel: disk 0, wo:0, o:1, dev:sda1
Dec 8 15:00:29 saturn kernel: disk 1, wo:1, o:0, dev:sdb1
Dec 8 15:00:29 saturn kernel: RAID1 conf printout:
Dec 8 15:00:29 saturn kernel: --- wd:1 rd:2
Dec 8 15:00:29 saturn kernel: disk 0, wo:0, o:1, dev:sda1
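The libata `res` line in the log above encodes the sector that failed. libata prints the taskfile as `status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, so the 48-bit LBA is the three "hob" bytes followed by the three low bytes. A small bash sketch of that decoding; `ata_res_lba` is a hypothetical name. For the 12:48:28 error it gives 0x43a72a48 (1135028808), which matches the last four bytes `43 a7 2a 48` of the descriptor sense data:

```bash
#!/usr/bin/env bash
# ata_res_lba: hypothetical helper. Takes a libata taskfile string such as
# "51/40:00:48:2a:a7/40:00:43:00:00/e0" and prints the 48-bit LBA in decimal.
# Assumed layout: status/err:nsect:lbal:lbam:lbah/hfeat:hnsect:hlbal:hlbam:hlbah/dev
ata_res_lba() {
  local tf=$1 low high
  low=${tf#*/}        # drop "51/"      -> "40:00:48:2a:a7/40:00:43:00:00/e0"
  high=${low#*/}      # second group on -> "40:00:43:00:00/e0"
  low=${low%%/*}      # "40:00:48:2a:a7"
  high=${high%%/*}    # "40:00:43:00:00"
  local _f _n lbal lbam lbah hlbal hlbam hlbah
  IFS=: read -r _f _n lbal lbam lbah <<< "$low"
  IFS=: read -r _f _n hlbal hlbam hlbah <<< "$high"
  # Most significant byte first: hob_lbah hob_lbam hob_lbal lbah lbam lbal
  echo $(( 16#${hlbah}${hlbam}${hlbal}${lbah}${lbam}${lbal} ))
}

# Usage: ata_res_lba 51/40:00:48:2a:a7/40:00:43:00:00/e0
```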

Then I did:

[root@saturn new]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1

[root@saturn new]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set device faulty failed for /dev/sdb2: Device or resource busy

[root@saturn new]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[0]
104320 blocks [2/1] [U_]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>

Next step:

[root@saturn ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

[root@saturn ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: re-added /dev/sdb1

[root@saturn ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[2](S)
976655552 blocks [2/1] [_U]

unused devices: <none>

and:

[root@saturn ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 16:33:44 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46830392

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - spare /dev/sda2

As far as I understand, sdb may be defective and should be replaced. But how do I do that if sda2 is a spare? Can I add sda2 back to the array? Isn't it still in md2? I think I'm stuck somewhere along the way. The server is up and running. Could anybody advise how to proceed?

Thanks in advance,
stefan
Title: Re: RAID1 out of sync
Post by: mmccarn on December 08, 2013, 09:03:03 PM: [guess]
You might learn something from
Code:
smartctl -a /dev/sda
and/or
Code:
smartctl -a /dev/sdb
[/guess]
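If you follow this suggestion, the attributes most directly tied to the errors in this thread are usually 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable); a non-zero raw value in 197 is what smartd's "Currently unreadable (pending) sectors" warning refers to. A minimal bash sketch that pulls those raw values out; `smart_watch` is a hypothetical helper, and it assumes the 10-column attribute table that `smartctl -A` (or `-a`) prints, with RAW_VALUE in the last column:

```bash
#!/usr/bin/env bash
# smart_watch: hypothetical helper. Reads `smartctl -A` (or -a) output on stdin
# and prints the raw values of the attributes most tied to bad sectors.
# Exits 1 if any of them is non-zero.
smart_watch() {
  awk '
    # Attribute rows start with the numeric ID and have 10 columns;
    # the NF check skips other tables that also start with small numbers.
    $1 ~ /^(5|197|198)$/ && NF >= 10 {
      print $2 "=" $10
      if ($10 + 0 > 0) bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage: smartctl -A /dev/sdb | smart_watch || echo "check this disk"
```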
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 08, 2013, 09:16:23 PM: Thanks for your reply.

In the meantime I did:

[root@saturn ~]# mdadm --manage /dev/md2 --fail /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md2

[root@saturn ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 21:09:27 2013
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46836377

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - faulty spare /dev/sda2

and:

[root@saturn ~]# mdadm /dev/md2 -r /dev/sda2
mdadm: hot removed /dev/sda2

[root@saturn ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Sun Dec 8 21:10:38 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46836416

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
and:

[root@saturn ~]# mdadm --manage /dev/md2 --add /dev/sda2
mdadm: re-added /dev/sda2

[root@saturn ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sda2[2] sdb2[1]
976655552 blocks [2/1] [_U]
[>....................] recovery = 0.0% (164480/976655552) finish=1187.2min speed=13706K/sec

unused devices: <none>

Time is ticking away...

We'll see what happens.
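The recovery line above reports blocks done out of the total, and the percentage is essentially done/total. A small bash/awk sketch that recomputes it; `md_progress` is a hypothetical helper. For the line above, 164480/976655552 is still 0.0%, while 373489280/976655552 is 38.2%:

```bash
#!/usr/bin/env bash
# md_progress: hypothetical helper. Reads /proc/mdstat-style text on stdin and,
# for each recovery/resync line, prints the percentage recomputed from (done/total).
md_progress() {
  awk '
    /(recovery|resync) =/ {
      # Pull out the "(done/total)" pair, e.g. "(164480/976655552)"
      if (match($0, /\([0-9]+\/[0-9]+\)/)) {
        split(substr($0, RSTART + 1, RLENGTH - 2), p, "/")
        printf "%.1f%% of %s blocks\n", 100 * p[1] / p[2], p[2]
      }
    }
  '
}

# Usage: md_progress < /proc/mdstat
```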
Title: Re: RAID1 out of sync
Post by: janet on December 08, 2013, 10:53:08 PM: SchulzStefan

No matter what, you should run a full diagnostic test on both drives using the drive manufacturer's test utility (see also UBCD).
There is no point in trying (and forcing) to add drives back to the array when there is a fault, whether it is a steady-state or an intermittent one.

Before adding any drive, you should clear it using the dd command.

You can add a clean or new, unused drive to the array using the admin console; there is no need to issue any commands.
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 08:27:22 AM: Janet, thank you for your reply.

Both disks are fairly new; I bought them less than twelve months ago. When the machine boots, the BIOS reports that all disks are healthy. According to the syslog, the disk I should worry about is sdb, not sda, so I'm confused that sda is now a spare drive. How can this be?

Btw, here are the results of my last action:

email to admin:

This is an automatically generated mail message from mdadm running on saturn.ivb.local.
A RebuildFinished event has been detected on md device /dev/md2.

[root@saturn ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md2 : active raid1 sda2[2](S) sdb2[1]
976655552 blocks [2/1] [_U]

unused devices: <none>

[root@saturn ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Aug 8 17:01:14 2008
Raid Level : raid1
Array Size : 976655552 (931.41 GiB 1000.10 GB)
Used Dev Size : 976655552 (931.41 GiB 1000.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent

Update Time : Mon Dec 9 08:14:32 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

UUID : 7be080c3:58e3a9c4:55bdf7e0:ca9607bf
Events : 0.46852604

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2

2 8 2 - spare /dev/sda2

@Janet: did I get this right? Your suggestion is to test and then wipe (dd) the sda disk, put it back in the box, add it back to the RAID via the admin menu, and let it sync?

My first step would be:

mdadm --manage /dev/md1 --fail /dev/sda1
mdadm --manage /dev/md2 --fail /dev/sda2

then:

mdadm /dev/md1 -r /dev/sda1
mdadm /dev/md2 -r /dev/sda2

Power down the machine, take sda out of the box, and restart the server. Wipe and test sda. After the test, put the unformatted disk back in the box and let it sync.

Are those steps safe and correct?
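The fail/remove sequence above can also be generated from /proc/mdstat, so that no member partition of the disk is forgotten. A minimal bash sketch; `retire_disk_cmds` is a hypothetical helper, and it only prints the `mdadm --manage ... --fail` and `--remove` commands for every array that contains a partition of the given disk. It executes nothing, so review its output before running it:

```bash
#!/usr/bin/env bash
# retire_disk_cmds: hypothetical helper. Usage: retire_disk_cmds sda < /proc/mdstat
# Prints (does not run) the mdadm commands to fail and remove every partition
# of the given disk from the md arrays listed in /proc/mdstat.
retire_disk_cmds() {
  local disk=$1
  awk -v disk="$disk" '
    /^md[0-9]+ :/ {
      for (i = 3; i <= NF; i++) {
        # Member tokens look like "sda2[2](S)"; keep only the device name.
        dev = $i
        sub(/\[.*/, "", dev)
        if (dev ~ ("^" disk "[0-9]+$")) {
          print "mdadm --manage /dev/" $1 " --fail /dev/" dev
          print "mdadm --manage /dev/" $1 " --remove /dev/" dev
        }
      }
    }
  '
}

# Usage: retire_disk_cmds sda < /proc/mdstat > retire.txt   (review, then: sh retire.txt)
```

For the mdstat output earlier in this post it prints the same four commands as listed above, in the `--manage --remove` form instead of `-r`.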
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 12:09:54 PM: I did as described in my previous post and pulled out the sda disk. Then I ran:

# smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [i686-linux-3.2.0-4-686-pae] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6374 -

and:

# smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [i686-linux-3.2.0-4-686-pae] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EFRX-68JCSN0
Serial Number: WD-WCC1U0641728
LU WWN Device Id: 5 0014ee 2b29a02e9
Firmware Version: 01.01A01
User Capacity: 1.000.204.886.016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Dec 9 11:58:27 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (13320) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities: (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability: (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     ( 2) minutes.
Extended self-test routine
recommended polling time:     ( 152) minutes.
Conveyance self-test routine
recommended polling time:     ( 5) minutes.
SCT capabilities:      (0x30bd)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 136 135 021 Pre-fail Always - 4200
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 23
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 6374
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 23
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 13
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 9
194 Temperature_Celsius 0x0022 112 107 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
ATA Error Count: 9 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 occurred at disk power-on lifetime: 6336 hours (264 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 41 12 00 e0 Error: UNC 8 sectors at LBA = 0x00001241 = 4673

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 41 12 00 e0 08 30d+18:24:00.240 READ DMA
c8 00 08 41 0e 00 e0 08 30d+18:23:53.848 READ DMA
ec 00 00 00 00 00 a0 08 30d+18:23:53.838 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 30d+18:23:53.838 SET FEATURES [Set transfer mode]

Error 8 occurred at disk power-on lifetime: 6336 hours (264 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 41 0e 00 e0 Error: UNC 8 sectors at LBA = 0x00000e41 = 3649

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 41 0e 00 e0 08 30d+18:23:46.627 READ DMA
c8 00 08 39 0a 00 e0 08 30d+18:23:46.289 READ DMA
ec 00 00 00 00 00 a0 08 30d+18:23:46.279 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 30d+18:23:46.279 SET FEATURES [Set transfer mode]

Error 7 occurred at disk power-on lifetime: 6336 hours (264 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 39 0a 00 e0 Error: UNC 8 sectors at LBA = 0x00000a39 = 2617

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 39 0a 00 e0 08 30d+18:23:33.193 READ DMA
c8 00 08 51 06 00 e0 08 30d+18:23:32.954 READ DMA
c8 00 18 39 06 00 e0 08 30d+18:23:32.954 READ DMA
c8 00 08 31 06 00 e0 08 30d+18:23:32.935 READ DMA

Error 6 occurred at disk power-on lifetime: 6336 hours (264 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 31 06 00 e0 Error: UNC 8 sectors at LBA = 0x00000631 = 1585

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 31 06 00 e0 08 30d+18:23:18.871 READ DMA
c8 00 08 09 02 00 e0 08 30d+18:23:16.191 READ DMA
c8 00 08 01 00 00 e0 08 30d+18:22:34.168 READ DMA
c8 00 01 01 2f 03 e0 08 30d+18:22:30.903 READ DMA

Error 5 occurred at disk power-on lifetime: 6335 hours (263 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 20 82 2f 03 e0 Error: UNC 32 sectors at LBA = 0x00032f82 = 208770

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 20 82 2f 03 e0 08 30d+18:19:58.850 READ DMA
c8 00 08 00 02 00 e0 08 30d+18:19:58.490 READ DMA

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6374 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

To me this means the disk is OK. I will now overwrite the MBR with

dd if=/dev/zero of=/dev/sdb bs=512 count=1

and then delete any remaining data and partitions with gparted.

The next step would then be to put the disk back in the server and add it to the RAID via the admin menu. We'll see what happens.
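A caveat on this plan, hedged because it depends on the metadata version: these arrays use version 0.90 superblocks (`Version : 0.90` in the `mdadm --detail` output earlier), and md stores those near the end of each member partition, not at its start. The dd command above zeroes only the first 512 bytes (the MBR with the partition table), so the old superblocks survive; if identical partitions are recreated, mdadm may find them and treat the disk as a returning member rather than a new one. mdadm's `--zero-superblock` option clears them explicitly, and `mdadm --examine` shows whether one is present. A small bash sketch of the 0.90 offset arithmetic in 512-byte sectors, assuming the 0.90 rule (round the size down to a multiple of 64 KiB, then reserve the last 64 KiB); `md090_sb_offset` is a hypothetical name:

```bash
#!/usr/bin/env bash
# md090_sb_offset: hypothetical helper. Given a member partition size in
# 512-byte sectors, print the sector where an md 0.90 superblock starts.
# Assumed 0.90 layout: round down to a multiple of 128 sectors (64 KiB),
# then reserve the last 128 sectors for the superblock.
md090_sb_offset() {
  local size=$1 reserved=128
  echo $(( (size & ~(reserved - 1)) - reserved ))
}

# Example: a 976655647 KiB partition (the sdb2 size sfdisk writes later in
# this thread) is 1953311294 sectors:
#   md090_sb_offset 1953311294   # -> 1953311104
# 1953311104 sectors = 976655552 KiB, which is md2's "Array Size", so the
# data area ends exactly where the superblock begins.
```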
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 02:34:49 PM: I had to swap the SATA connectors on the board to get the box to boot. Then I added the disk to the RAID via the admin menu.

syslog:

Dec 9 12:35:02 saturn add_drive_to_raid: Warning: partition 1 does not end at a cylinder boundary
Dec 9 12:35:02 saturn add_drive_to_raid:
Dec 9 12:35:02 saturn add_drive_to_raid: Disk /dev/sdb: 121601 cylinders, 255 heads, 63 sectors/track
Dec 9 12:35:02 saturn add_drive_to_raid: Old situation:
Dec 9 12:35:02 saturn add_drive_to_raid: Units = blocks of 1024 bytes, counting from 0
Dec 9 12:35:02 saturn add_drive_to_raid:
Dec 9 12:35:02 saturn add_drive_to_raid: Device Boot Start End #blocks Id System
Dec 9 12:35:02 saturn add_drive_to_raid: /dev/sdb1 0 - 0 0 Empty
Dec 9 12:35:02 saturn add_drive_to_raid: /dev/sdb2 0 - 0 0 Empty
Dec 9 12:35:02 saturn add_drive_to_raid: /dev/sdb3 0 - 0 0 Empty
Dec 9 12:35:02 saturn add_drive_to_raid: /dev/sdb4 0 - 0 0 Empty
Dec 9 12:35:02 saturn add_drive_to_raid: New situation:
Dec 9 12:35:02 saturn add_drive_to_raid: Units = blocks of 1024 bytes, counting from 0
Dec 9 12:35:03 saturn add_drive_to_raid:
Dec 9 12:35:03 saturn add_drive_to_raid: Device Boot Start End #blocks Id System
Dec 9 12:35:03 saturn add_drive_to_raid: /dev/sdb1 * 0+ 104384 104384+ fd Linux raid autodetect
Dec 9 12:35:03 saturn add_drive_to_raid: /dev/sdb2 104385 976760031 976655647 fd Linux raid autodetect
Dec 9 12:35:03 saturn add_drive_to_raid: /dev/sdb3 0 - 0 0 Empty
Dec 9 12:35:03 saturn add_drive_to_raid: /dev/sdb4 0 - 0 0 Empty
Dec 9 12:35:03 saturn add_drive_to_raid: Successfully wrote the new partition table
Dec 9 12:35:03 saturn add_drive_to_raid:
Dec 9 12:35:03 saturn add_drive_to_raid: Re-reading the partition table ...
Dec 9 12:35:50 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 9 12:35:50 saturn kernel: sdb: Write Protect is off
Dec 9 12:35:50 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Dec 9 12:35:50 saturn kernel: SCSI device sdb: drive cache: write back
Dec 9 12:35:50 saturn kernel: sdb: sdb1 sdb2
Dec 9 12:35:50 saturn add_drive_to_raid: If you created or changed a DOS partition, e.g. /dev/foo7,
Dec 9 12:35:50 saturn add_drive_to_raid: then use dd(1) to zero the first 512 bytes:
Dec 9 12:35:50 saturn add_drive_to_raid: "dd if=/dev/zero of=/dev/foo7 bs=512 count=1" (see fdisk(8)).
Dec 9 12:35:50 saturn add_drive_to_raid:
Dec 9 12:36:11 saturn add_drive_to_raid:
Dec 9 12:36:11 saturn add_drive_to_raid: Checking partitions on /dev/sdb...
Dec 9 12:36:11 saturn add_drive_to_raid:
Dec 9 12:36:11 saturn add_drive_to_raid: Going to add /dev/sdb1 to /dev/md1
Dec 9 12:36:11 saturn kernel: md: bind<sdb1>
Dec 9 12:36:11 saturn kernel: RAID1 conf printout:
Dec 9 12:36:11 saturn add_drive_to_raid: mdadm: re-added /dev/sdb1
Dec 9 12:36:11 saturn kernel: --- wd:1 rd:2
Dec 9 12:36:11 saturn kernel: disk 0, wo:1, o:1, dev:sdb1
Dec 9 12:36:11 saturn kernel: disk 1, wo:0, o:1, dev:sda1
Dec 9 12:36:11 saturn kernel: md: syncing RAID array md1
Dec 9 12:36:11 saturn kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 9 12:36:11 saturn kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Dec 9 12:36:11 saturn kernel: md: using 128k window, over a total of 104320 blocks.
Dec 9 12:36:13 saturn add_drive_to_raid: Going to add /dev/sdb2 to /dev/md2
Dec 9 12:36:13 saturn kernel: md: bind<sdb2>
Dec 9 12:36:14 saturn kernel: RAID1 conf printout:
Dec 9 12:36:14 saturn kernel: --- wd:1 rd:2
Dec 9 12:36:14 saturn kernel: disk 0, wo:1, o:1, dev:sdb2
Dec 9 12:36:14 saturn kernel: disk 1, wo:0, o:1, dev:sda2
Dec 9 12:36:14 saturn kernel: md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
Dec 9 12:36:14 saturn add_drive_to_raid: mdadm: re-added /dev/sdb2
Dec 9 12:36:17 saturn kernel: md: md1: sync done.
Dec 9 12:36:17 saturn kernel: md: syncing RAID array md2
Dec 9 12:36:17 saturn kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 9 12:36:17 saturn kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Dec 9 12:36:17 saturn kernel: md: using 128k window, over a total of 976655552 blocks.
Dec 9 12:36:18 saturn kernel: RAID1 conf printout:
Dec 9 12:36:18 saturn kernel: --- wd:2 rd:2
Dec 9 12:36:18 saturn kernel: disk 0, wo:0, o:1, dev:sdb1
Dec 9 12:36:18 saturn kernel: disk 1, wo:0, o:1, dev:sda1
Dec 9 12:36:31 saturn add_drive_to_raid:
Dec 9 12:36:32 saturn add_drive_to_raid: Waiting for boot partition to sync before installing grub...
Dec 9 12:36:34 saturn add_drive_to_raid: Probing devices to guess BIOS drives. This may take a long time.
Dec 9 12:36:34 saturn add_drive_to_raid:
Dec 9 12:36:34 saturn add_drive_to_raid:
Dec 9 12:36:34 saturn add_drive_to_raid: GNU GRUB version 0.97 (640K lower / 3072K upper memory)
Dec 9 12:36:34 saturn add_drive_to_raid:
Dec 9 12:36:34 saturn add_drive_to_raid: [ Minimal BASH-like line editing is supported. For the first word, TAB
Dec 9 12:36:34 saturn add_drive_to_raid: lists possible command completions. Anywhere else TAB lists the possible
Dec 9 12:36:34 saturn add_drive_to_raid: completions of a device/filename.]
Dec 9 12:36:34 saturn add_drive_to_raid: grub> device (hd0) /dev/sdb
Dec 9 12:36:34 saturn add_drive_to_raid: grub> root (hd0,0)
Dec 9 12:36:34 saturn add_drive_to_raid: Filesystem type is ext2fs, partition type 0xfd
Dec 9 12:36:34 saturn add_drive_to_raid: grub> setup (hd0)
Dec 9 12:36:34 saturn add_drive_to_raid: Checking if "/boot/grub/stage1" exists... no
Dec 9 12:36:34 saturn add_drive_to_raid: Checking if "/grub/stage1" exists... yes
Dec 9 12:36:34 saturn add_drive_to_raid: Checking if "/grub/stage2" exists... yes
Dec 9 12:36:34 saturn add_drive_to_raid: Checking if "/grub/e2fs_stage1_5" exists... yes
Dec 9 12:36:34 saturn add_drive_to_raid: Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
Dec 9 12:36:34 saturn add_drive_to_raid: Running "embed /grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
Dec 9 12:36:34 saturn add_drive_to_raid: Running "install /grub/stage1 (hd0) /grub/stage2 p /grub/grub.conf "... succeeded
Dec 9 12:36:34 saturn add_drive_to_raid: Done.
Dec 9 12:36:34 saturn add_drive_to_raid: grub> quit
Dec 9 13:03:19 saturn smartd[2685]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Dec 9 13:33:19 saturn smartd[2685]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Dec 9 14:03:19 saturn smartd[2685]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors

Status:

[root@saturn log]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0] sda1[1]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[2] sda2[1]
976655552 blocks [2/1] [_U]
[=======>.............] recovery = 38.2% (373489280/976655552) finish=128.7min speed=78087K/sec

SMART reports 2 unreadable sectors on sda, the disk that is being mirrored from. Could this be why the RAID finishes with a healthy disk left as a spare, because the original disk cannot read those 2 sectors?
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 03:34:50 PM: Shortly before 60%:

[root@saturn ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0] sda1[1]
104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[2](S) sda2[1]
976655552 blocks [2/1] [_U]

unused devices: <none>

And syslog shows:

Dec 9 14:36:53 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 14:36:53 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 14:36:53 saturn kernel: ata1.00: cmd 25/00:00:82:92:00/00:01:31:00:00/e0 tag 0 dma 131072 in
Dec 9 14:36:53 saturn kernel: res 51/40:5f:18:93:00/40:00:31:00:00/e0 Emask 0x9 (media error)
Dec 9 14:36:53 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 14:36:53 saturn kernel: ata1.00: error: { UNC }
Dec 9 14:36:53 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 14:36:53 saturn kernel: ata1: EH complete
Dec 9 14:36:54 saturn kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 9 14:36:54 saturn kernel: sda: Write Protect is off
Dec 9 14:36:54 saturn kernel: sda: Mode Sense: 00 3a 00 00
Dec 9 14:36:54 saturn kernel: SCSI device sda: drive cache: write back
Dec 9 15:03:19 saturn smartd[2685]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Dec 9 15:18:13 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:13 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:13 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:13 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:13 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:13 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:13 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:13 saturn kernel: ata1: EH complete
Dec 9 15:18:16 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:16 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:16 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:16 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:16 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:46 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:46 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:46 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x25
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:00:02:28:a7/00:04:43:00:00/e0 tag 0 dma 524288 in
Dec 9 15:18:46 saturn kernel: res 51/40:af:48:2a:a7/40:01:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: sd 0:0:0:0: Unhandled sense code
Dec 9 15:18:46 saturn kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Dec 9 15:18:46 saturn kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 9 15:18:46 saturn kernel: sda: Current [descriptor]: sense key: Medium Error
Dec 9 15:18:46 saturn kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Dec 9 15:18:46 saturn kernel:
Dec 9 15:18:46 saturn kernel: Descriptor sense data with sense descriptors (in hex):
Dec 9 15:18:46 saturn kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 9 15:18:46 saturn kernel: 43 a7 2a 48
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 9 15:18:46 saturn kernel: sda: Write Protect is off
Dec 9 15:18:46 saturn kernel: sda: Mode Sense: 00 3a 00 00
Dec 9 15:18:46 saturn kernel: SCSI device sda: drive cache: write back
Dec 9 15:18:46 saturn kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 9 15:18:46 saturn kernel: sda: Write Protect is off
Dec 9 15:18:46 saturn kernel: sda: Mode Sense: 00 3a 00 00
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:46 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:46 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:46 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:46 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:46 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:46 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:46 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:46 saturn kernel: ata1: EH complete
Dec 9 15:18:46 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:46 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:47 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:47 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:47 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:47 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:47 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:47 saturn kernel: ata1: EH complete
Dec 9 15:18:47 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:47 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:47 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:47 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:47 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:47 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:47 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:47 saturn kernel: ata1: EH complete
Dec 9 15:18:47 saturn kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 9 15:18:47 saturn kernel: ata1.00: BMDMA stat 0x24
Dec 9 15:18:47 saturn kernel: ata1.00: cmd 25/00:08:42:2a:a7/00:00:43:00:00/e0 tag 0 dma 4096 in
Dec 9 15:18:47 saturn kernel: res 51/40:00:48:2a:a7/40:00:43:00:00/e0 Emask 0x9 (media error)
Dec 9 15:18:47 saturn kernel: ata1.00: status: { DRDY ERR }
Dec 9 15:18:47 saturn kernel: ata1.00: error: { UNC }
Dec 9 15:18:47 saturn kernel: ata1.00: configured for UDMA/133
Dec 9 15:18:47 saturn kernel: sd 0:0:0:0: Unhandled sense code
Dec 9 15:18:47 saturn kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Dec 9 15:18:47 saturn kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 9 15:18:47 saturn kernel: sda: Current [descriptor]: sense key: Medium Error
Dec 9 15:18:47 saturn kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Dec 9 15:18:47 saturn kernel:
Dec 9 15:18:47 saturn kernel: Descriptor sense data with sense descriptors (in hex):
Dec 9 15:18:47 saturn kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 9 15:18:47 saturn kernel: 43 a7 2a 48
Dec 9 15:18:47 saturn kernel: raid1: sda: unrecoverable I/O read error for block 1134819968
Dec 9 15:18:47 saturn kernel: ata1: EH complete
Dec 9 15:18:47 saturn kernel: SCSI device sda: drive cache: write back
Dec 9 15:18:47 saturn kernel: SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
Dec 9 15:18:47 saturn kernel: sda: Write Protect is off
Dec 9 15:18:47 saturn kernel: sda: Mode Sense: 00 3a 00 00
Dec 9 15:18:47 saturn kernel: SCSI device sda: drive cache: write back
Dec 9 15:18:47 saturn kernel: md: md2: sync done.
Dec 9 15:18:48 saturn kernel: RAID1 conf printout:
Dec 9 15:18:48 saturn kernel: --- wd:1 rd:2
Dec 9 15:18:48 saturn kernel: disk 0, wo:1, o:1, dev:sdb2
Dec 9 15:18:48 saturn kernel: disk 1, wo:0, o:1, dev:sda2
Dec 9 15:18:48 saturn kernel: RAID1 conf printout:
Dec 9 15:18:48 saturn kernel: --- wd:1 rd:2
Dec 9 15:18:48 saturn kernel: disk 1, wo:0, o:1, dev:sda2
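The 20-byte hex dump that the kernel prints after each failed read ("Descriptor sense data with sense descriptors") can be decoded to find the failing sector. Below is a minimal Python sketch. It assumes the SCSI descriptor-format sense layout: response code 0x72, sense key in byte 1, ASC/ASCQ in bytes 2-3, descriptors starting at byte 8, and descriptor type 0x00 ("Information") carrying the LBA. `decode_descriptor_sense` is a hypothetical helper name, and it only decodes the Information descriptor.

```python
def decode_descriptor_sense(hex_dump: str) -> dict:
    """Decode descriptor-format SCSI sense data (response code 0x72/0x73).

    Returns the sense key, ASC, ASCQ and, if an Information descriptor
    with the VALID bit is present, the LBA it carries (else None).
    """
    data = bytes.fromhex(hex_dump)  # spaces between byte pairs are accepted
    if len(data) < 8 or (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {
        "sense_key": data[1] & 0x0F,  # 0x3 = MEDIUM ERROR
        "asc": data[2],               # additional sense code
        "ascq": data[3],              # additional sense code qualifier
        "lba": None,
    }
    # Byte 7 gives the length of the descriptor list after the 8-byte header.
    end = min(8 + data[7], len(data))
    offset = 8
    while offset + 2 <= end:
        desc_type, desc_len = data[offset], data[offset + 1]
        # Information descriptor: type 0x00, VALID bit in byte 2, 8-byte value in bytes 4-11.
        if desc_type == 0x00 and desc_len >= 0x0A and offset + 12 <= end and data[offset + 2] & 0x80:
            result["lba"] = int.from_bytes(data[offset + 4:offset + 12], "big")
        offset += 2 + desc_len
    return result


if __name__ == "__main__":
    # Hex dump copied from the kernel log (the two printed lines joined).
    dump = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 43 a7 2a 48"
    print(decode_descriptor_sense(dump))
    # {'sense_key': 3, 'asc': 17, 'ascq': 4, 'lba': 1135028808}
```

For the dump in this log, the result is sense key 3 (Medium Error), ASC/ASCQ 0x11/0x04 (the "Unrecovered read error - auto reallocate failed" text the kernel printed) and LBA 1135028808 (0x43a72a48). The same LBA bytes also appear in the ATA `res` lines (`48:2a:a7` and `43`).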

It seems that mirroring is not possible. Hmm, what is the point of a RAID in this case?

The mainboard also offers a RAID option. I'll give it a try; we'll see.

Edit:

Bad idea. Intel says: "When creating the RAID volume through CTRL-I (BIOS), this for sure will erase data."
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 06:11:23 PM: I pulled the disk out of the box and am now running the extended test on it with the WD Data Lifeguard Diagnostics tool (in another machine). If the tool finds bad sectors, it should fix (zero) them, and then we'll see whether the RAID works, assuming I'm able to boot from the disk at all.
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 09, 2013, 08:50:33 PM: The tool reported errors and asked whether I wanted to correct them. I chose "yes"; the process started but could not finish: "Please contact the technical service." That's it. I will now restore from my last Affa backup on the other disk.
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 17, 2013, 07:35:18 PM: A final question about RAID1.

I'm not an expert, hence my question: in a RAID1 system with two identical disks, can one disk (the one with no errors at all) be turned into a spare because the other one has bad blocks? Or did I miss something? If that can happen, mirroring disks in a RAID1 is useless, IMHO. I would appreciate an answer.

stefan
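The kernel log earlier in the thread suggests one way this can happen during a rebuild: md was recovering sdb2 (`wo:1`) from sda2, the only in-sync member, hit an unrecoverable read on sda ("raid1: sda: unrecoverable I/O read error"), reported "sync done" and then listed only sda2 as active. Below is a deliberately simplified Python model of that situation, not md's actual resync code. `rebuild_mirror` and its "spare"/"in_sync" states are hypothetical names, and the model assumes (as the log suggests for this kernel) that a read error on the sole source aborts the rebuild rather than skipping the block.

```python
from typing import List, Sequence, Set, Tuple


def rebuild_mirror(source: Sequence[bytes], unreadable: Set[int]) -> Tuple[str, List[bytes]]:
    """Toy model of rebuilding a RAID1 member from the only in-sync member.

    Copies blocks in order. If a source block cannot be read, there is no other
    copy to fall back on, so the rebuild stops and the new member is reported
    as a "spare" holding only the blocks copied so far.
    """
    target: List[bytes] = []
    for index, block in enumerate(source):
        if index in unreadable:
            return "spare", target   # recovery aborted; the array stays degraded
        target.append(block)
    return "in_sync", target         # every block copied; the mirror is complete


if __name__ == "__main__":
    disk = [b"A", b"B", b"C", b"D"]
    print(rebuild_mirror(disk, set()))   # ('in_sync', [b'A', b'B', b'C', b'D'])
    print(rebuild_mirror(disk, {2}))     # ('spare', [b'A', b'B'])
```

The model only illustrates the limit of a two-disk mirror: once the array is degraded, a bad sector on the surviving disk leaves no good copy to rebuild from. This is one reason to check both disks and to keep backups.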
Title: Re: RAID1 out of sync
Post by: janet on December 17, 2013, 11:34:42 PM: SchulzStefan

It is not normal or usual for a hard disk that is an active member of a software RAID1 array to be reverted to a spare disk. As you say that would somewhat defeat the purpose of having RAID.
If one disk fails, the system should continue to run happily in degraded mode on the one good disk.
As far as I can see, that is what happened initially in your system.

The problem is that you then started manually issuing commands to break the array & rebuild it, etc., NONE of which was necessary.

All you SHOULD have done was test both drives "in situ" using the manufacturer's diagnostic software from a bootable CD &/or smartctl, to prove which drive was faulty.
It is always wise to test both disks, because if one is showing signs of failure, the other disk (which is probably of similar age & from the same manufacturing batch) could also be exhibiting problems or early signs of problems.
Then you should have replaced the faulty drive with a brand new drive & rebuilt the array using the admin console menu, without needing to issue any commands & risk doing something wrong. If you need to use an already used but good drive, always run the correct dd command to clear it first.
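For the smartctl route, the attributes smartd was warning about can be checked on each disk. Below is a minimal Python (3.7 or later) sketch that runs `smartctl -A` and flags a drive whose raw Reallocated_Sector_Ct, Current_Pending_Sector or Offline_Uncorrectable count is above zero. The attribute names are the ones smartctl prints for typical SATA drives. Treating any non-zero value as a warning sign is a heuristic assumption, not a vendor rule, and `parse_smart_attributes` / `suspicious` are hypothetical helper names. An extended self-test can be started separately with `smartctl -t long /dev/sdX` and its result read with `smartctl -l selftest /dev/sdX`.

```python
import re
import subprocess
import sys
from typing import Dict

# Attributes usually read as signs of failing media (heuristic assumption).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name -> leading integer of RAW_VALUE in `smartctl -A` output."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def suspicious(attrs: Dict[str, int]) -> Dict[str, int]:
    """Return the watched attributes whose raw count is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    for device in sys.argv[1:]:  # e.g. /dev/sda /dev/sdb; needs root
        output = subprocess.run(["smartctl", "-A", device],
                                capture_output=True, text=True).stdout
        problems = suspicious(parse_smart_attributes(output))
        print(device, problems if problems else "no watched attribute above zero")
```

Running it against both disks (for example `python3 check_smart.py /dev/sda /dev/sdb`, with a hypothetical script name) checks both members rather than only the one smartd complained about.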

I think you have either misinterpreted your test results and/or issued inappropriate or incorrect commands which disrupted your functioning but degraded array.

Your server would/should have kept functioning on one drive in degraded mode (although a little less safely), allowing you ample time to obtain a new drive & replace the faulty one in the server.
Title: Re: RAID1 out of sync
Post by: SchulzStefan on December 18, 2013, 09:02:49 AM: janet

Thank you for the clarification. I wouldn't have manually investigated, if the server would have booted. Next time, the first thing I'll do is remove the faulty disk. Hopefully the machine will boot then.

stefan
Title: Re: RAID1 out of sync
Post by: janet on December 18, 2013, 09:29:08 AM: SchulzStefan

Quote
I wouldn't have manually investigated, if the server would have booted.

The most you typically need to do to get your server to boot in degraded mode is to swap the good drive to the first SATA port (as you eventually did).