Koozali.org: home of the SME Server

Problems with DMA - hdparm please help

Elluminatus
Problems with DMA - hdparm please help
« on: August 27, 2007, 12:36:44 PM »
Hi forum,

When I start my SME Server, I see some DMA-related errors during startup. Can you help me get rid of these failures?

The boot log says:

Quote
[...]
Aug 27 08:03:55 hades rc.sysinit: Dateisysteme prüfen succeeded
Aug 27 08:04:10 hades kernel: md: raid5 personality registered as nr 4
Aug 27 08:03:55 hades rc.sysinit: Lokale Dateisysteme einhängen:  succeeded
Aug 27 08:04:10 hades kernel: md: md1 stopped.
Aug 27 08:03:55 hades rc.sysinit: Quota für lokale Dateisysteme aktivieren:  succeeded
Aug 27 08:04:10 hades kernel: md: bind<hdf1>
Aug 27 08:03:56 hades rc.sysinit: Swap-Bereich aktivieren:  succeeded
Aug 27 08:04:10 hades kernel: md: bind<hdg1>
Aug 27 08:04:02 hades net.agent[2103]: remove event not handled
Aug 27 08:04:10 hades kernel: md: bind<hdh1>
Aug 27 08:04:02 hades net.agent[2133]: remove event not handled
Aug 27 08:04:10 hades kernel: md: bind<hde1>
Aug 27 08:04:02 hades init: Entering runlevel: 7
Aug 27 08:04:10 hades kernel: raid1: raid set md1 active with 3 out of 3 mirrors
Aug 27 08:04:08 hades microcode_ctl: Starten von microcode_ctl succeeded
Aug 27 08:04:10 hades kernel: md: md2 stopped.
Aug 27 08:04:10 hades kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:10 hades kernel: hdg: dma_intr: error=0x84 { DriveStatusError BadCRC }

Aug 27 08:04:10 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:10 hades kernel: md: bind<hdf2>
Aug 27 08:04:10 hades kernel: md: bind<hdh2>
Aug 27 08:04:10 hades kernel: md: bind<hdg2>
Aug 27 08:04:10 hades kernel: md: bind<hde2>
Aug 27 08:04:10 hades kernel: raid5: device hde2 operational as raid disk 0
Aug 27 08:04:10 hades kernel: raid5: device hdh2 operational as raid disk 2
Aug 27 08:04:10 hades kernel: raid5: device hdf2 operational as raid disk 1
Aug 27 08:04:10 hades kernel: raid5: allocated 3166kB for md2
Aug 27 08:04:10 hades kernel: raid5: raid level 5 set md2 active with 3 out of 3 devices, algorithm 2
Aug 27 08:04:10 hades kernel: RAID5 conf printout:
Aug 27 08:04:10 hades kernel:  --- rd:3 wd:3 fd:0
Aug 27 08:04:10 hades kernel:  disk 0, o:1, dev:hde2
Aug 27 08:04:10 hades kernel:  disk 1, o:1, dev:hdf2
Aug 27 08:04:10 hades kernel:  disk 2, o:1, dev:hdh2
Aug 27 08:04:10 hades kernel: cdrom: open failed.
Aug 27 08:04:10 hades kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:10 hades kernel: hdg: dma_intr: error=0x84 { DriveStatusError BadCRC }

Aug 27 08:04:10 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:10 hades kernel: kjournald starting.  Commit interval 5 seconds
Aug 27 08:04:10 hades kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 27 08:04:10 hades kernel: inserting floppy driver for 2.6.9-55.0.2.EL
Aug 27 08:04:10 hades kernel: Floppy drive(s): fd0 is 1.44M
Aug 27 08:04:10 hades kernel: FDC 0 is a post-1991 82077
Aug 27 08:04:10 hades kernel: Ethernet Channel Bonding Driver: v2.6.3-rh (June 8, 2005)
Aug 27 08:04:10 hades kernel: bonding: MII link monitoring set to 200 ms
Aug 27 08:04:10 hades kernel: divert: allocating divert_blk for bond0
Aug 27 08:04:11 hades kernel: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
Aug 27 08:04:11 hades kernel: e100: Copyright(c) 1999-2005 Intel Corporation
Aug 27 08:04:11 hades kernel: PCI: Found IRQ 9 for device 0000:01:08.0
Aug 27 08:04:11 hades kernel: PCI: Sharing IRQ 9 with 0000:00:1f.2
Aug 27 08:04:11 hades kernel: divert: allocating divert_blk for eth0
Aug 27 08:04:11 hades kernel: e100: eth0: e100_probe: addr 0xf4120000, irq 9, MAC addr 00:30:05:07:57:88
Aug 27 08:04:11 hades kernel: hw_random hardware driver 1.0.0 loaded
Aug 27 08:04:11 hades kernel: parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
Aug 27 08:04:11 hades kernel: PCI: Found IRQ 5 for device 0000:01:0f.0
Aug 27 08:04:11 hades kernel: PCI: Sharing IRQ 5 with 0000:00:1f.3
Aug 27 08:04:11 hades kernel: PCI: Sharing IRQ 5 with 0000:01:07.0
Aug 27 08:04:11 hades kernel: PCI parallel port detected: 14d2:8001, I/O at 0x5000(0x4c00)
Aug 27 08:04:11 hades kernel: parport1: PC-style at 0x5000 (0x4c00) [PCSPP,TRISTATE]
Aug 27 08:04:11 hades kernel: PCI parallel port detected: 14d2:8001, I/O at 0x4800(0x4400)
Aug 27 08:04:11 hades kernel: parport2: PC-style at 0x4800 (0x4400) [PCSPP,TRISTATE]
Aug 27 08:04:11 hades kernel: USB Universal Host Controller Interface driver v2.2
Aug 27 08:04:11 hades kernel: PCI: Found IRQ 9 for device 0000:00:1f.2
Aug 27 08:04:11 hades kernel: PCI: Sharing IRQ 9 with 0000:01:08.0
Aug 27 08:04:11 hades kernel: uhci_hcd 0000:00:1f.2: UHCI Host Controller
Aug 27 08:04:11 hades kernel: PCI: Setting latency timer of device 0000:00:1f.2 to 64
Aug 27 08:04:11 hades kernel: uhci_hcd 0000:00:1f.2: irq 9, io base 00001400
Aug 27 08:04:11 hades kernel: uhci_hcd 0000:00:1f.2: new USB bus registered, assigned bus number 1
Aug 27 08:04:11 hades kernel: hub 1-0:1.0: USB hub found
Aug 27 08:04:11 hades kernel: hub 1-0:1.0: 2 ports detected
Aug 27 08:04:11 hades kernel: md: Autodetecting RAID arrays.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hde1.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hde2.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdf1.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdf2.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdg1.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdg2.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdh1.
Aug 27 08:04:11 hades kernel: md: could not bd_claim hdh2.
Aug 27 08:04:11 hades kernel: md: autorun ...
Aug 27 08:04:11 hades kernel: md: ... autorun DONE.
Aug 27 08:04:11 hades kernel: EXT3 FS on dm-0, internal journal
Aug 27 08:04:11 hades kernel: loop: loaded (max 8 devices)
Aug 27 08:04:11 hades kernel: cdrom: open failed.
Aug 27 08:04:11 hades last message repeated 3 times
Aug 27 08:04:11 hades kernel: kjournald starting.  Commit interval 5 seconds
Aug 27 08:04:11 hades kernel: EXT3 FS on md1, internal journal
Aug 27 08:04:11 hades kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 27 08:04:11 hades kernel: Adding 1048568k swap on /dev/main/swap.  Priority:-1 extents:1
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: ide3: reset: success
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 27 08:04:11 hades kernel: ide: failed opcode was: unknown

Aug 27 08:04:11 hades kernel: hdg: DMA disabled
Aug 27 08:04:11 hades kernel: ide3: reset: success

hdparm tells me this:

Quote
/dev/hde:

# hdparm -iv /dev/hde

/dev/hde:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 160041885696, start = 0

 Model=ST3160812A, FwRev=3.AAD, SerialNo=5LS12A0H
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

[root@hades ~]# hdparm -tT /dev/hde

/dev/hde:
 Timing cached reads:   540 MB in  2.00 seconds = 269.91 MB/sec
 Timing buffered disk reads:  134 MB in  3.02 seconds =  44.35 MB/sec
[root@hades ~]# "/sbin/e-smith/config setprop hdparm status enabled
>
[root@hades ~]# "/sbin/e-smith/config setprop hdparm status enable
>
[root@hades ~]# hdparm -iv /dev/hde

/dev/hde:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 160041885696, start = 0

 Model=ST3160812A, FwRev=3.AAD, SerialNo=5LS12A0H
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

[root@hades ~]# hdparm -iv /dev/hdg

/dev/hdg:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 160041885696, start = 0

 Model=ST3160812A, FwRev=3.AAD, SerialNo=5LS12P6M
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 *udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

[root@hades ~]# hdparm -iv /dev/hdf

/dev/hdf:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 160041885696, start = 0

 Model=ST3160812A, FwRev=3.AAD, SerialNo=5LS131HW
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

[root@hades ~]# hdparm -iv /dev/hdh

/dev/hdh:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 160041885696, start = 0

 Model=ST3160812A, FwRev=3.AAJ, SerialNo=5LS5RF2D
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 *udma3 udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

And hdparm -tT shows me that the read throughput of hdg is far too low:

Quote
hdparm -tT /dev/hdg

/dev/hdg:
 Timing cached reads:   516 MB in  2.01 seconds = 256.88 MB/sec
 Timing buffered disk reads:    6 MB in  4.18 seconds =   1.44 MB/sec

The only explanation I can think of is that the fresh SME 7.2 server uses hdg as the hot spare drive, right? But why are there so many
Quote
Aug 27 08:04:11 hades kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 27 08:04:11 hades kernel: hdh: dma_intr: error=0x84 { DriveStatusError BadCRC }
failures?

Please help me
Greetings E.

« Last Edit: August 27, 2007, 12:45:05 PM by Elluminatus »

Reinhold
Re: Problems with DMA - hdparm please help
« Reply #1 on: August 27, 2007, 10:52:33 PM »
Enlightened one,

I do NOT see any hdparm problems.
What I see is your kernel trying to recover from an IDE lockup:
Aug 27 08:04:11 hades kernel: hdg: DMA disabled
Aug 27 08:04:11 hades kernel: ide3: reset: success


Your problems most likely stem from one of the following:

1. Your 2nd IDE interface shares an IRQ with something else
2. Your 3rd IDE cable isn't an 80-conductor cable and/or is defective
3. Your 6th IDE drive happens to be old and slow, or dying?
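
For reference, the status= and error= bytes in the dma_intr lines can be decoded bit by bit. The following is a minimal sketch, assuming the bit labels printed by the legacy Linux IDE driver; the function names (add_if_set, decode_ide_status, decode_ide_error) are made up for illustration. BadCRC is normally reported when the CRC check on an Ultra DMA transfer fails on the interface, which tends to point at the cable or the signal path rather than at the disk surface.

```shell
#!/bin/sh
# Sketch only: the bit-to-label table is an assumption based on the labels
# the old IDE driver prints; check your kernel source if in doubt.

# Append label $3 to $out when bit mask $2 is set in value $1.
add_if_set() {
    [ $(( $1 & $2 )) -ne 0 ] && out="$out $3"
    return 0
}

# Decode the status= byte, e.g. 0x51.
decode_ide_status() {
    val=$(( $1 )); out=""
    add_if_set "$val" 0x80 Busy
    add_if_set "$val" 0x40 DriveReady
    add_if_set "$val" 0x20 DeviceFault
    add_if_set "$val" 0x10 SeekComplete
    add_if_set "$val" 0x08 DataRequest
    add_if_set "$val" 0x04 CorrectedError
    add_if_set "$val" 0x02 Index
    add_if_set "$val" 0x01 Error
    echo "${out# }"
}

# Decode the error= byte, e.g. 0x84.
decode_ide_error() {
    val=$(( $1 )); out=""
    add_if_set "$val" 0x04 DriveStatusError
    if [ $(( val & 0x80 )) -ne 0 ]; then
        # Bit 7 is labelled BadCRC when the abort bit is also set,
        # BadSector otherwise.
        if [ $(( val & 0x04 )) -ne 0 ]; then
            out="$out BadCRC"
        else
            out="$out BadSector"
        fi
    fi
    add_if_set "$val" 0x40 UncorrectableError
    add_if_set "$val" 0x10 SectorIdNotFound
    add_if_set "$val" 0x02 TrackZeroNotFound
    add_if_set "$val" 0x01 AddrMarkNotFound
    echo "${out# }"
}
```

Calling decode_ide_status 0x51 and decode_ide_error 0x84 reproduces the labels shown in braces in the log lines.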

To check, have a look at the kernel boot messages:  cat /var/log/dmesg
Look for your interfaces, for example:
Probing IDE interface ide0...
hda: GCR-8483B, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Also look for errors concerning the drives there.
[Please do not post the whole thing :-)  ]
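
Instead of reading the whole log by eye, the error lines can be tallied per drive. This is a rough sketch, assuming the log uses the same "hdX: dma_intr: error=" wording as the lines quoted in this thread; count_dma_errors is a hypothetical helper name, and the log path you pass in (for example /var/log/messages or /var/log/dmesg) depends on your system.

```shell
#!/bin/sh
# Count "dma_intr: error=" lines per IDE device in a log file.
# $1: path to the log file to scan.
count_dma_errors() {
    awk '
        # Find "hdX: dma_intr: error=" anywhere in the line, with or
        # without a syslog prefix, and take the first 3 characters (hdX).
        match($0, /hd[a-z]: dma_intr: error=/) {
            dev = substr($0, RSTART, 3)
            n[dev]++
        }
        END { for (d in n) print d, n[d] }
    ' "$1" | sort
}
```

A drive or channel that stands out in the tally is the first one whose cable and IRQ assignment deserve a closer look.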

Regards
Reinhold

For the rest:
Quote
http://wiki.contribs.org/Raid
...
# 4-6 Drives - Software RAID 5 + 1 Hot-spare
...

If you need to, check your RAID with:
# cat /proc/mdstat
# mdadm --misc --detail /dev/md1
# mdadm --misc --detail /dev/md2
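
As a quick sanity check on top of the commands above, the member-status field in /proc/mdstat can be scanned for missing members. This is a sketch under the assumption that each array reports a field such as [UUU], with an underscore for every missing member (for example [UU_]); degraded_arrays is a hypothetical helper name.

```shell
#!/bin/sh
# Print the names of md arrays whose status field contains "_",
# i.e. arrays with at least one missing member.
# $1: optional path to an mdstat-style file (defaults to /proc/mdstat).
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md1 : active raid1 ...".
        /^md[0-9]+ :/ { name = $1 }
        # The status field such as [UUU] or [UU_] follows on a later line.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

No output means every array reports all of its active members as up; a hot spare does not appear in this field, so use mdadm --detail to see it.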
............

Reinhold
Re: Problems with DMA - hdparm please help
« Reply #2 on: August 27, 2007, 11:14:21 PM »
Addon:
Since Seagate says your drives are fairly new:
5LS12P6M     In Warranty   Expiration 14-Dec-2010

I'd concentrate on the other two suggestions...

or check each of your drives (hda ... hdh) with:
smartctl -a /dev/hdg
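
To run the check over several drives in one go, something like the sketch below may help. It assumes smartmontools prints the "SMART overall-health self-assessment test result" line for -H and lists an attribute named UDMA_CRC_Error_Count with the raw value in the tenth column for -A; not every drive reports that attribute. The helper names smart_health and crc_error_count are made up for illustration.

```shell
#!/bin/sh
# Print "<device> <result>" for each device given, using smartctl -H.
smart_health() {
    for dev in "$@"; do
        result=$(smartctl -H "$dev" 2>/dev/null |
            sed -n 's/^SMART overall-health self-assessment test result: //p')
        echo "$dev ${result:-unknown}"
    done
}

# Print the raw UDMA CRC error counter of one device, if it reports one.
# A counter that keeps rising usually hints at cabling trouble.
crc_error_count() {
    smartctl -A "$1" 2>/dev/null |
        awk '$2 == "UDMA_CRC_Error_Count" { print $10 }'
}
```

For the controller in this thread the call would be along the lines of: smart_health /dev/hde /dev/hdf /dev/hdg /dev/hdh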

If you still need more answers, you will have to give more details on your HD and md setup ...
(8 disks?! ... and only RAID 5?!)

Regards
Reinhold
............

Elluminatus
Re: Problems with DMA - hdparm please help
« Reply #3 on: August 28, 2007, 03:54:20 PM »
Hi, thanks for your help. I have an onboard controller with a DVD-ROM drive attached, and a separate controller (hde-hdh) with four Seagate 160 GB drives attached.

Quote
1. Your 2nd IDE interface shares an IRQ with something else
2. Your 3rd IDE cable isn't an 80-conductor cable and/or is defective
3. Your 6th IDE drive happens to be old and slow, or dying?

1. Here is the (short)  8-) output:

Code:
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: TSSTcorpDVD-ROM TS-H352A, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide1 at 0x170-0x177,0x376 on irq 15
SiI680: IDE controller at PCI slot 0000:01:07.0
PCI: Found IRQ 5 for device 0000:01:07.0
PCI: Sharing IRQ 5 with 0000:00:1f.3
PCI: Sharing IRQ 5 with 0000:01:0f.0

Is this what you need? And what does it mean for me?


2. I'm not sure; maybe we can rule out cases 1 and 3 first...


3.
Code:
mdadm --misc --detail /dev/md1

Output:
Quote
3      34       65        -      spare   /dev/hdh1

The results of:
Code:
smartctl -a /dev/hdg
Quote
SMART overall-health self-assessment test result: PASSED

So this is a RAID 5 with 4 drives (3 active drives and 1 spare).

Could you help me again, please?
Thank you!

E.


« Last Edit: August 28, 2007, 04:34:30 PM by Elluminatus »

Reinhold
Re: Problems with DMA - hdparm please help
« Reply #4 on: August 29, 2007, 11:39:13 PM »
Quote
SiI680: IDE controller at PCI slot 0000:01:07.0
PCI: Found IRQ 5 for device 0000:01:07.0
PCI: Sharing IRQ 5 with 0000:00:1f.3
PCI: Sharing IRQ 5 with 0000:01:0f.0

means that your SiI680 IDE controller is sharing its IRQ 5
with two other devices on the PCI bus.
THIS is most likely the source of your problems.

Check which device sits at which PCI address with lspci.

Use your BIOS settings (reassign IRQs, turn off unused devices) or
shuffle the PCI boards (physically move them to other slots)
to free up a dedicated IRQ for the 2nd IDE controller.

Note: There was a time when IRQ 5 was "free" and typically went to a sound card, LPT2 or a NIC
(for an explanation, see http://en.wikipedia.org/wiki/Interrupt_request)
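
To see at a glance which IRQs are shared after a change, /proc/interrupts can be filtered. This is a minimal sketch, assuming the usual layout in which handlers sharing an IRQ are listed on one line separated by ", "; shared_irqs is a hypothetical helper name.

```shell
#!/bin/sh
# List IRQs that have more than one handler registered.
# $1: optional path to an interrupts-style file (defaults to /proc/interrupts).
shared_irqs() {
    awk '
        # Numeric IRQ lines whose handler list contains a ", " separator.
        /^ *[0-9]+:/ && /, / {
            irq = $1
            sub(/:/, "", irq)
            # gsub returns the number of separators; handlers = separators + 1.
            n = gsub(/, /, ", ")
            print "IRQ " irq " is shared by " (n + 1) " handlers"
        }
    ' "${1:-/proc/interrupts}"
}
```

If the line for the add-on IDE controller no longer shows up after reassigning IRQs or moving cards, the controller has an interrupt to itself.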

Regards
Reinhold
............