janet,
Edit: To me it's prudent to zero out even a new drive; at least that way you have ruled out the possibility of the drive being problematic for related reasons.
I'll try your advice.
I got this from the brand-new drive that I plugged in yesterday:
Feb 13 05:23:22 saturn kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 13 05:23:22 saturn kernel: ata2.00: BMDMA stat 0x25
Feb 13 05:23:22 saturn kernel: ata2.00: cmd 35/00:08:02:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
Feb 13 05:23:22 saturn kernel: res 51/10:08:02:59:70/10:00:74:00:00/e0 Emask 0x81 (invalid argument)
Feb 13 05:23:22 saturn kernel: ata2.00: status: { DRDY ERR }
Feb 13 05:23:22 saturn kernel: ata2.00: error: { IDNF }
Feb 13 05:23:36 saturn kernel: ata2.00: configured for UDMA/133
Feb 13 05:23:36 saturn kernel: sd 1:0:0:0: Unhandled sense code
Feb 13 05:23:36 saturn kernel: sd 1:0:0:0: SCSI error: return code = 0x08000002
Feb 13 05:23:36 saturn kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Feb 13 05:23:36 saturn kernel: sdb: Current [descriptor]: sense key: Aborted Command
Feb 13 05:23:36 saturn kernel: Add. Sense: Recorded entity not found
Feb 13 05:23:36 saturn kernel:
Feb 13 05:23:38 saturn kernel: Descriptor sense data with sense descriptors (in hex):
Feb 13 05:23:38 saturn kernel: 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 13 05:23:39 saturn kernel: 74 70 59 02
Feb 13 05:23:39 saturn kernel: ata2: EH complete
Feb 13 05:23:39 saturn kernel: raid1: Disk failure on sdb2, disabling device.
Feb 13 05:23:39 saturn kernel: Operation continuing on 1 devices
Feb 13 05:23:39 saturn kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Feb 13 05:23:39 saturn kernel: sdb: Write Protect is off
Feb 13 05:23:39 saturn kernel: sdb: Mode Sense: 00 3a 00 00
Feb 13 05:23:39 saturn kernel: SCSI device sdb: drive cache: write back
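For reference, the failing LBA can be read straight out of the "cmd" line in the kernel log above. This is only a minimal Python sketch, assuming the usual libata taskfile layout (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name decode_cmd_line is made up for illustration.

```python
import re

# Twelve hex bytes separated by '/' or ':' after the word "cmd".
# The field order is an assumption based on the usual libata error report.
TASKFILE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd_line(line):
    """Return (command, lba, sector_count) from a libata 'cmd' log line."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no libata 'cmd' taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    # A count of 0 would mean 65536 sectors for 48-bit commands;
    # that special case is not handled in this sketch.
    count = (hob_nsect << 8) | nsect
    return command, lba, count


line = ("Feb 13 05:23:22 saturn kernel: ata2.00: "
        "cmd 35/00:08:02:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out")
command, lba, count = decode_cmd_line(line)
print(hex(command), lba, count)  # 0x35 1953519874 8
```

Command 0x35 is WRITE DMA EXT, and the LBA is the same one that smartctl reports for the logged error (0x74705902 = 1953519874), so both logs seem to describe the same failed write.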
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[2](F) sda2[0]
976655488 blocks [2/1] [U_]
unused devices: <none>
# smartctl -x /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-371.4.1.el5PAE] (local build)
Copyright (C) 2002-11 by Bruce Allen,
http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EFRX-68PJCN0
Serial Number: WD-WMC4J0189055
LU WWN Device Id: 5 0014ee 25ebeaa72
Firmware Version: 01.01A01
User Capacity: 1.000.204.886.016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 (revision not indicated)
Local Time is: Thu Feb 13 09:07:42 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (13980) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 159) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 100 253 021 - 0
4 Start_Stop_Count -O--CK 100 100 000 - 1
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 17
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 1
192 Power-Off_Retract_Count -O--CK 200 200 000 - 0
193 Load_Cycle_Count -O--CK 200 200 000 - 1047
194 Temperature_Celsius -O---K 115 109 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 6 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error log]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP Log at address 0x21 has 1 sectors [Write stream error log]
GP Log at address 0x22 has 1 sectors [Read stream error log]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa0 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa1 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa2 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa3 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa4 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa5 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa6 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa7 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaa has 1 sectors [Device vendor specific log]
GP/S Log at address 0xab has 1 sectors [Device vendor specific log]
GP/S Log at address 0xac has 1 sectors [Device vendor specific log]
GP/S Log at address 0xad has 1 sectors [Device vendor specific log]
GP/S Log at address 0xae has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaf has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb1 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb2 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb3 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb4 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb5 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb6 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb7 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xbd has 1 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP Log at address 0xc1 has 93 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1
- occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 08 00 00 74 70 59 02 e0 00 Error: IDNF 8 sectors at LBA = 0x74705902 = 1953519874
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 08 00 00 74 70 59 02 e0 08 13:45:17.466 WRITE DMA EXT
ea 00 00 00 00 00 00 74 70 59 09 e0 08 13:45:12.148 FLUSH CACHE EXT
ea 00 00 00 00 00 00 74 70 59 09 e0 08 13:45:11.575 FLUSH CACHE EXT
35 00 00 00 08 00 00 74 70 59 02 e0 08 13:45:06.056 WRITE DMA EXT
ea 00 00 00 00 00 00 74 70 59 09 e0 08 13:45:04.380 FLUSH CACHE EXT
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 28 Celsius
Power Cycle Min/Max Temperature: 25/34 Celsius
Lifetime Min/Max Temperature: 25/34 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (92)
Index Estimated Time Temperature Celsius
93 2014-02-13 01:10 29 **********
... ..( 3 skipped). .. **********
97 2014-02-13 01:14 29 **********
98 2014-02-13 01:15 30 ***********
... ..( 7 skipped). .. ***********
106 2014-02-13 01:23 30 ***********
107 2014-02-13 01:24 31 ************
108 2014-02-13 01:25 31 ************
109 2014-02-13 01:26 30 ***********
110 2014-02-13 01:27 31 ************
... ..( 19 skipped). .. ************
130 2014-02-13 01:47 31 ************
131 2014-02-13 01:48 32 *************
... ..( 34 skipped). .. *************
166 2014-02-13 02:23 32 *************
167 2014-02-13 02:24 31 ************
... ..( 14 skipped). .. ************
182 2014-02-13 02:39 31 ************
183 2014-02-13 02:40 30 ***********
... ..( 20 skipped). .. ***********
204 2014-02-13 03:01 30 ***********
205 2014-02-13 03:02 29 **********
... ..( 4 skipped). .. **********
210 2014-02-13 03:07 29 **********
211 2014-02-13 03:08 30 ***********
212 2014-02-13 03:09 30 ***********
213 2014-02-13 03:10 30 ***********
214 2014-02-13 03:11 29 **********
215 2014-02-13 03:12 30 ***********
... ..( 7 skipped). .. ***********
223 2014-02-13 03:20 30 ***********
224 2014-02-13 03:21 29 **********
225 2014-02-13 03:22 30 ***********
226 2014-02-13 03:23 29 **********
... ..( 87 skipped). .. **********
314 2014-02-13 04:51 29 **********
315 2014-02-13 04:52 28 *********
... ..( 10 skipped). .. *********
326 2014-02-13 05:03 28 *********
327 2014-02-13 05:04 29 **********
... ..(239 skipped). .. **********
89 2014-02-13 09:04 29 **********
90 2014-02-13 09:05 28 *********
91 2014-02-13 09:06 28 *********
92 2014-02-13 09:07 28 *********
SCT Error Recovery Control:
Read: 70 (7,0 seconds)
Write: 70 (7,0 seconds)
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 6 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 62972 Vendor specific
It seems to me that, besides the causes you suggested, it could also be a misconfiguration or an incompatibility.
Feb 13 05:23:22 saturn kernel: ata2.00: cmd 35/00:08:02:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
Feb 13 05:23:22 saturn kernel: res 51/10:08:02:59:70/10:00:74:00:00/e0 Emask 0x81 (invalid argument)
It doesn't seem normal to me that a brand-new disk has DMA errors after only a few hours of running.
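To get a feeling for where that write landed on the disk, here is a small Python sketch using only numbers from the output above: 1953525168 sectors of 512 bytes from the kernel log, 4096-byte physical sectors from smartctl, and the 8-sector write at LBA 1953519874 from the SMART error log. The function name locate_write is made up for illustration, and the result only describes the position of the write; it does not by itself explain the IDNF.

```python
TOTAL_SECTORS = 1953525168   # "1953525168 512-byte hdwr sectors" (kernel log)
LOGICAL_SIZE = 512           # bytes, from "Sector Sizes" in smartctl
PHYSICAL_SIZE = 4096         # bytes, from "Sector Sizes" in smartctl


def locate_write(lba, count, total_sectors=TOTAL_SECTORS,
                 logical=LOGICAL_SIZE, physical=PHYSICAL_SIZE):
    """Describe where a write of `count` logical sectors at `lba` lands."""
    per_physical = physical // logical  # logical sectors per physical sector
    end = lba + count                   # first sector after the write
    return {
        "in_range": end <= total_sectors,
        "sectors_to_end": total_sectors - end,
        "offset_in_physical_sector": lba % per_physical,
        "aligned": lba % per_physical == 0 and count % per_physical == 0,
    }


result = locate_write(1953519874, 8)
print(result)
# {'in_range': True, 'sectors_to_end': 5286,
#  'offset_in_physical_sector': 2, 'aligned': False}
```

So the write is inside the disk's reported capacity, about 5286 sectors before the end, and it does not start on a 4096-byte physical sector boundary.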