Koozali.org: home of the SME Server

Contribs.org Forums => General Discussion => Topic started by: gbentley on May 09, 2011, 08:08:27 PM

Title: Help interpreting logs possible errors etc following SCSI disc upgrade
Post by: gbentley on May 09, 2011, 08:08:27 PM: Can anyone give me a hand interpreting this lot after replacing/upgrading a SCSI disc?

From the admin manager panel at the console:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current RAID status:

Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
71577536 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
unused devices: <none>

devices: $VAR1 = {
'/dev/md2' => {
'PreferredMinor' => '2',
'RaidLevel' => 'raid1',
'State' => 'active',
'DeviceSize' => '71577536',
'1' => ' 1 8 18 1 active sync /dev/sdb2
',
'SpareDevices' => '0',
'0' => ' 0 8 2 0 active sync /dev/sda2
',
'RaidDevices' => '2',
'FailedDevices' => '0',
'UpdateTime' => 'Mon May 9 18:48:36 2011',
'ArraySize' => '71577536',
'UUID' => '167d52ff:b2e6714a:b8a80cbe:379c4756',
'CreationTime' => 'Mon Apr 10 13:05:35 2006',
'WorkingDevices' => '2',
'Persistence' => 'Superblock is persistent',
'UsedDisks' => [
'sda',
'sdb'
],
'Version' => '00.90.01',
'TotalDevices' => '2',
'Events' => '0.46268161',
'ActiveDevices' => '2'
},
'/dev/md1' => {
'PreferredMinor' => '1',
'RaidLevel' => 'raid1',
'State' => 'clean',
'DeviceSize' => '104320',
'1' => ' 1 8 17 1 active sync /dev/sdb1
',
'SpareDevices' => '0',
'0' => ' 0 8 1 0 active sync /dev/sda1
',
'RaidDevices' => '2',
'FailedDevices' => '0',
'UpdateTime' => 'Mon May 9 18:25:18 2011',
'ArraySize' => '104320',
'UUID' => '69808efe:17a0f8b4:d600945d:3b9e8e2e',
'CreationTime' => 'Mon Apr 10 13:05:35 2006',
'WorkingDevices' => '2',
'Persistence' => 'Superblock is persistent',
'UsedDisks' => [
'sda',
'sdb'
],
'Version' => '00.90.01',
'TotalDevices' => '2',
'Events' => '0.11701',
'ActiveDevices' => '2'
}
};

used_disks: $VAR1 = {
'sda' => 2,
'sdb' => 2
};

unclean: /dev/md2 => active
recovering:

+-------Disk redundancy status as of Monday May 9, 2011 18:48:34----------+
| Current RAID status: |
| |
| Personalities : [raid1] |
| md2 : active raid1 sda2[0] sdb2[1] |
| 71577536 blocks [2/2] [UU] |
| md1 : active raid1 sda1[0] sdb1[1] |
| 104320 blocks [2/2] [UU] |
| unused devices: <none> |
| |
| |
| Only some of the RAID devices are unclean. |
| |
| Manual intervention may be required. |
| |
| |

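In the dump above, md2 is listed as unclean only because its State is 'active' where md1's is 'clean'. Both arrays show RaidDevices 2, ActiveDevices 2 and FailedDevices 0. As I understand it, md reports an array that is in use with recent writes as "active" and as "clean" once it has been idle for a moment, so on its own the flag does not mean a disc has dropped out. The sketch below is a hypothetical re-creation of that check, using only fields copied from the dump. It is not the admin panel's actual code:

```python
# Field values copied from the $VAR1 dump above (only the ones needed here).
# The selection logic is a hypothetical re-creation, not the panel's own code.
ARRAYS = {
    "/dev/md2": {"State": "active", "RaidDevices": 2, "ActiveDevices": 2,
                 "WorkingDevices": 2, "FailedDevices": 0},
    "/dev/md1": {"State": "clean", "RaidDevices": 2, "ActiveDevices": 2,
                 "WorkingDevices": 2, "FailedDevices": 0},
}


def unclean(arrays):
    """Arrays whose reported state is anything other than 'clean'."""
    return {dev: info["State"] for dev, info in arrays.items()
            if info["State"] != "clean"}


def missing_members(arrays):
    """Arrays that are short of members or report failed devices."""
    return {dev: info["RaidDevices"] - info["ActiveDevices"]
            for dev, info in arrays.items()
            if info["ActiveDevices"] < info["RaidDevices"]
            or info["FailedDevices"] > 0}


print(unclean(ARRAYS))          # {'/dev/md2': 'active'}
print(missing_members(ARRAYS))  # {}
```

Read this way, the warning concerns md2's state flag rather than a missing mirror, which fits the [UU] shown for both arrays.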
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From the CLI:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
71577536 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>
[root@server ~]#
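In /proc/mdstat the useful fields are [2/2], the number of members the array expects versus the number working, and [UU], one U per member that is up (an underscore marks a missing or failed one). Both arrays show all members up, so at this point neither mirror is degraded. A minimal Python sketch of that check follows. The regular expressions are my own assumption about the format and have only been tried on output like the above:

```python
import re

# Copied from the "cat /proc/mdstat" output above.
MDSTAT = """Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
71577536 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>
"""

# "md2 : active raid1 sda2[0] sdb2[1]"
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# "71577536 blocks [2/2] [UU]" (expected/working members, per-member flags)
STATUS_RE = re.compile(r"(\d+) blocks .*\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Map each md array to its state, level, members and member counts."""
    arrays, current = {}, None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            arrays[current] = {"state": m.group(2), "level": m.group(3),
                               "members": m.group(4).split()}
            continue
        s = STATUS_RE.search(line)
        if s and current:
            arrays[current].update(blocks=int(s.group(1)),
                                   expected=int(s.group(2)),
                                   working=int(s.group(3)),
                                   flags=s.group(4))
    return arrays


def degraded(arrays):
    """Names of arrays missing a member or showing '_' in the flags."""
    return [name for name, a in arrays.items()
            if a.get("working", 0) < a.get("expected", 0)
            or "_" in a.get("flags", "")]


print(degraded(parse_mdstat(MDSTAT)))  # []
```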

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From /var/log/dmesg:

scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
(scsi0:A:3): 160.000MB/s transfers (80.000MHz DT, 16bit)
(scsi0:A:15): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
Vendor: SEAGATE Model: ST373207LW Rev: D703
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled. Depth 4
SCSI device sda: 143374650 512-byte hdwr sectors (73408 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 143374650 512-byte hdwr sectors (73408 MB)
SCSI device sda: drive cache: write through
sda: sda1 sda2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: HP Model: C7438A Rev: ZP5A
Type: Sequential-Access ANSI SCSI revision: 03
Vendor: FUJITSU Model: MBA3300NP Rev: 0102
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:15:0: Tagged Queuing enabled. Depth 4
SCSI device sdb: 585937500 512-byte hdwr sectors (300000 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 585937500 512-byte hdwr sectors (300000 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2
Attached scsi disk sdb at scsi0, channel 0, id 15, lun 0
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00ac7
ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0xFE00 ctl 0xFE12 bmdma 0xFEA0 irq 185
ata2: SATA max UDMA/133 cmd 0xFE20 ctl 0xFE32 bmdma 0xFEA8 irq 185
scsi2 : ata_piix
scsi3 : ata_piix
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel@redhat.com
md: raid1 personality registered as nr 3
md: md1 stopped.
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 08 8b 8f 4d 00 00 80 00
Info fld=0x88b8f55, Current sda: sense key Medium Error
Additional sense: Data synchronization mark error
end_request: I/O error, dev sda, sector 143363925
Buffer I/O error on device sda2, logical block 143155080

[last line repeated 30-odd times with different block numbers]

md: bind<sdb1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md: md2 stopped.
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 08 8b 8f 4d 00 00 80 00
Info fld=0x88b8f55, Current sda: sense key Medium Error
Additional sense: Data synchronization mark error
end_request: I/O error, dev sda, sector 143363925
Buffer I/O error on device sda2, logical block 143155080

[last line repeated 30-odd times with different block numbers]

scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 08 8b 8f 4d 00 00 80 00
Info fld=0x88b8f55, Current sda: sense key Medium Error
Additional sense: Data synchronization mark error
end_request: I/O error, dev sda, sector 143363925
Buffer I/O error on device sda2, logical block 143155080

[last line repeated 30-odd times with different block numbers]

md: bind<sdb2>
md: bind<sda2>
raid1: raid set md2 active with 2 out of 2 mirrors

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Have I now found that, when I replaced the older, smaller discs, the one I left in to
rebuild from already had problems?
Title: Re: Help interpreting logs possible errors etc following SCSI disc upgrade
Post by: gbentley on May 10, 2011, 09:03:19 AM: > SCSI device sda: 143374650 512-byte hdwr sectors (73408 MB)

Is the number that was in red (143374650) the total number of blocks on disc sda? If so, this line:

> Buffer I/O error on device sda2, logical block 143155080

Seems to suggest the disc has some problems near the end. This info seems pertinent:

http://lists.us.dell.com/pipermail/linux-poweredge/2007-January/028991.html

About 30 blocks are reported in Buffer I/O messages. Subtracting the first one reported from
the total blocks [sectors?] gives 219570. Hmm, not sure how to interpret that.
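The two numbers are counted from different places. "sector 143363925" in the end_request line counts from the start of the whole disc, while "logical block 143155080" counts from the start of sda2, so 219570 mixes the two. The sketch below checks this under several assumptions that the posted logs do not confirm: the logical block is in 512-byte units, the discs use the old DOS layout (255 heads x 63 sectors = 16065 sectors per cylinder, first partition at sector 63), and md's 0.90 superblock sits in the last 64 KiB-aligned 64 KiB of each member partition.

```python
SECTOR = 512
DISK_SECTORS = 143374650    # "sda: 143374650 512-byte hdwr sectors"
BAD_ABS = 143363925         # "end_request: I/O error, dev sda, sector 143363925"
BAD_REL = 143155080         # "Buffer I/O error on device sda2, logical block 143155080"
READ_START = 143363917      # starting LBA of the failing Read (10) CDB
MD1_KB = 104320             # md1 'DeviceSize' (KiB)
MD2_KB = 71577536           # md2 'DeviceSize' (KiB)
CYL = 255 * 63              # ASSUMED DOS geometry: 16065 sectors per cylinder


def md090_data_kb(partition_kb):
    """Data size (KiB) of a member with a 0.90 superblock: round the partition
    down to 64 KiB, then reserve the last 64 KiB for the superblock."""
    return (partition_kb & ~63) - 64


print(DISK_SECTORS - BAD_REL)            # 219570, the figure from the post

# ASSUMPTION: the logical block is in 512-byte units, so the gap is sda2's start.
sda2_start = BAD_ABS - BAD_REL
print(sda2_start, sda2_start / CYL)      # 208845 13.0

# ASSUMPTION: sda1 starts at sector 63 and ends just before sda2.
sda1_kb = (sda2_start - 63) * SECTOR // 1024
print(md090_data_kb(sda1_kb))            # 104320, same as md1's DeviceSize

# Distance of the bad sector from the end of the disc.
tail = DISK_SECTORS - BAD_ABS
print(tail, tail * SECTOR)               # 10725 5491200 (about 5.2 MiB)

# ASSUMPTION: md2's superblock area on sda2 begins right after DeviceSize of data.
sb_start = sda2_start + MD2_KB * 2
print(sb_start == READ_START, BAD_ABS - sb_start)  # True 8
```

If those assumptions hold, 219570 is the 10725-sector tail plus sda2's 208845-sector offset. The failing 64 KiB read would also start exactly where md2's superblock area begins, with the unreadable sector just past the 4 KiB superblock, i.e. after the end of the array's data. That is consistent with the errors appearing while md assembles the arrays, but it says nothing about the health of the rest of the disc.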

I have done fairly quick checks of the sizes of the data files on the ibays from a Winstation
[but not yet any disc error checks with smartctl etc.], and have copied everything over to a
backup device. There's about 55GB of data on the server.

As I am going to replace the disc anyway, is it worth doing extensive tests?

Or is there a quick way to test data integrity [i.e. check only the areas of the disc that
actually hold data]?
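One quick check that touches only the sectors holding data is to read every file back and note any that fail. There are two caveats. First, with md RAID1 a read can be served from either disc (and, as I understand it, retried on the other if one fails), so a clean pass shows the data is readable from the array rather than that sda is sound. Errors on an individual disc would show up in dmesg rather than here. Second, the sketch is written for Python 3, which the server itself may not have, so it may be easier to run it from another machine against the mounted ibay share. The default path /home/e-smith/files/ibays is an assumption about where SME Server keeps ibay data.

```python
import os
import sys


def read_all(root, chunk=1 << 20):
    """Read every regular file under root in 1 MiB chunks.

    Returns (files_read, bytes_read, failures), where failures is a list of
    (path, error message) for anything that raised OSError.
    """
    files_read, bytes_read, failures = 0, 0, []

    def on_walk_error(err):
        failures.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                with open(path, "rb") as f:
                    while True:
                        data = f.read(chunk)
                        if not data:
                            break
                        bytes_read += len(data)
                files_read += 1
            except OSError as err:
                failures.append((path, str(err)))
    return files_read, bytes_read, failures


if __name__ == "__main__":
    # ASSUMED default location of SME Server ibays; pass a path to override.
    root = sys.argv[1] if len(sys.argv) > 1 else "/home/e-smith/files/ibays"
    files_read, bytes_read, failures = read_all(root)
    print(f"{files_read} files, {bytes_read} bytes read, {len(failures)} failures")
    for path, message in failures:
        print(path, message)
```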

Any help/tips appreciated, thanks!

Edit: just ran # smartctl -d scsi -H /dev/sda

SMART Health Status: OK

Now running the long test; will report back later :)
Title: Re: Help interpreting logs possible errors etc following SCSI disc upgrade
Post by: gbentley on May 10, 2011, 09:48:29 AM: # smartctl -d scsi -a /dev/sda

smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: SEAGATE ST373207LW Version: D703
Serial number: 3KT42CY1
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Tue May 10 08:44:17 2011 BST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature: 37 C
Drive Trip Temperature: 68 C
Vendor (Seagate) cache information
Blocks sent to initiator = 1304467954
Blocks received from initiator = 1336503664
Blocks read from cache and sent to initiator = 3897447114
Number of read and write commands whose size <= segment size = 903394977
Number of read and write commands whose size > segment size = 1

Error counter log:
           Errors Corrected by            Total  Correction      Gigabytes       Total
                EEC         rereads/     errors   algorithm      processed uncorrected
           fast | delayed   rewrites  corrected invocations   [10^9 bytes]      errors
read:   188285484        0         0  188285484   188380624      86913.338         670
write:          0        0         0          0           0       3166.838           0
verify:         0        0         0          0           0          0.000           0

Non-medium error count: 3

Error Events logging not supported

SMART Self-test log
Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                                  number   (hours)
# 1  Background long   Completed, segment failed        -     43319              - [0x4 0x3e 0x3]
# 2  Background short  Completed, segment failed        -     43319              - [0x4 0x3e 0x3]
# 3  Background short  Completed, segment failed        -     43319              - [-   -    -]
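The bracketed triplets are SCSI sense data: the sense key, the additional sense code (ASC) and its qualifier (ASCQ, which smartctl labels "ASQ"). The small lookup below covers only the codes that appear in this thread, taken from the SCSI Primary Commands (SPC) tables. It is not a complete decoder. It also pulls the uncorrected-error count out of the read: row of the error counter log.

```python
# Only the codes seen in this thread; anything else is printed raw.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {
    (0x16, 0x00): "DATA SYNCHRONIZATION MARK ERROR",  # the dmesg read errors
    (0x3E, 0x03): "LOGICAL UNIT FAILED SELF-TEST",    # self-test log #1 and #2
}


def decode_sense(triplet):
    """Decode a smartctl-style '[0x4 0x3e 0x3]'; '-' fields mean not reported."""
    fields = triplet.strip("[]").split()
    if "-" in fields:
        return None
    sk, asc, ascq = (int(x, 16) for x in fields)
    return (SENSE_KEYS.get(sk, f"sense key {sk:#x}"),
            ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}"))


def uncorrected(row):
    """Last column of an 'Error counter log' row, e.g. the 'read:' line."""
    return int(row.split()[-1])


print(decode_sense("[0x4 0x3e 0x3]"))
# ('HARDWARE ERROR', 'LOGICAL UNIT FAILED SELF-TEST')
print(decode_sense("[-   -    -]"))  # None
print(uncorrected("read: 188285484 0 0 188285484 188380624 86913.338 670"))  # 670
```

Read together, both the long and short background self-tests ended with a hardware-error, self-test-failed status, and the counter log shows 670 read errors the drive could not correct. So the drive itself is reporting trouble even though the overall health check says OK.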