Koozali.org: home of the SME Server

DegradedArray event: how to remove unused devices?

magwm (Gadis Tourist Service Italia SRL)
« on: July 30, 2007, 12:09:02 PM »
Hello!

I am using SME as our company web server, on a P4 machine with two 20 GB SCSI disks (boot + system) and one 80 GB IDE disk (one large ibay).

Regularly, I am getting "A DegradedArray event has been detected on md device /dev/md1." e-mails, but when I look at the raid status all seems OK.

I suspect that the unused RAID devices (number 0 in md1, number 1 in md2) cause the event, but I really don't understand how to remove a device that is already listed as "removed".

Can anyone shed some light on this?

Also, the "manage disk redundancy" option of the admin menu doesn't help much: it only reports that there is an additional disk (hdc) and asks whether I want to add it to a RAID array. I say NO, as it holds data that is backed up elsewhere, but then the menu exits without saying anything about the RAID arrays.

BTW, the SCSI card is an Adaptec 29320 Ultra320 SCSI adapter, which should do RAID too, but I seem to have installed software RAID.

thanks a lot for your help!!

Michel - Gadis

-----

[root@www ~]# cat /proc/mdstat

Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[2]
      17816000 blocks [3/2] [U_U]

md1 : active raid1 sda1[1] sdb1[2]
      104320 blocks [3/2] [_UU]

unused devices: <none>

-----

[root@www ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Tue May  8 19:03:09 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jul 30 09:54:02 2007
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 123f3a9c:9ade4333:c858deb7:b1aea4be
         Events : 0.1459

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        1        1      active sync   /dev/sda1
       2       8       17        2      active sync   /dev/sdb1

-----
       
[root@www ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Tue May  8 19:03:09 2007
     Raid Level : raid1
     Array Size : 17816000 (16.99 GiB 18.24 GB)
    Device Size : 17816000 (16.99 GiB 18.24 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Jul 30 10:51:52 2007
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 94b5168f:f65fe5e2:8f34cf7d:7724b91b
         Events : 0.1653805

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        -      removed
       2       8       18        2      active sync   /dev/sdb2

-----
from dmesg:

SCSI subsystem initialized
ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 19 (level, low) -> IRQ 209
ACPI: PCI Interrupt 0000:00:08.1 -> GSI 16 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec 29320 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec 29320 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

(scsi1:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit)
(scsi1:A:1): 160.000MB/s transfers (80.000MHz DT, 16bit)
  Vendor: IBM       Model: DDYS-T18350M      Rev: SA2A
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:0:0: Tagged Queuing enabled.  Depth 4
SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
  Vendor: IBM       Model: DDYS-T18350M      Rev: SA2A
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:1:0: Tagged Queuing enabled.  Depth 4
SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2
Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel@redhat.com
md: raid1 personality registered as nr 3
md: md1 stopped.
md: bind<sdb1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 3 mirrors
md: md2 stopped.
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
md: bind<sdb2>
md: bind<sda2>
raid1: raid set md2 active with 2 out of 3 mirrors
cdrom: open failed.
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
------
MagWm

p-jones
« Reply #1 on: August 02, 2007, 11:01:53 AM »
...

magwm
thanks but..
« Reply #2 on: August 03, 2007, 12:05:18 PM »
The two disks already seem to be in sync. The problem is that there appears to be a third, non-existent 'removed' RAID device, which I would like to delete. The thread you mention doesn't help much with this, or so it seems to me..

ciao, Michel
MagWm

p-jones
« Reply #3 on: August 03, 2007, 02:06:54 PM »
Sorry it didn't help. I am short of additional suggestions, other than to wonder whether your drives have additional partitions on them, possibly related to housekeeping or vendor system recovery. Is it a proprietary server?
...

p-jones
« Reply #4 on: August 03, 2007, 02:16:08 PM »
That third device is probably your SCSI controller ID 7
...

magwm
« Reply #5 on: August 03, 2007, 02:31:18 PM »
what is that, my scsi ID 7 ??

thanks, Michel
MagWm

magwm
« Reply #6 on: August 03, 2007, 02:43:41 PM »
It is not a proprietary server, and there are no additional partitions on the disks, just sda1/2 and sdb1/2. It is a self-assembled PC. Anyway, it's mdadm that displays this extra entry, but it doesn't show what it is; it only says major 0, minor 0 ...
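
To make the "major 0 minor 0" entry easier to read, here is a minimal Python sketch that parses the device table from the `mdadm --detail /dev/md1` output in the first post and lists the slots marked "removed". Such a row is an empty slot in the array rather than a physical device, which is why no device path is shown. The names `parse_rows` and `empty_slots` are made up for this illustration; they are not part of mdadm or SME.

```python
import re

# Device table copied from the `mdadm --detail /dev/md1` output in the first post.
DETAIL_TABLE = """\
    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        1        1      active sync   /dev/sda1
       2       8       17        2      active sync   /dev/sdb1
"""

# A data row starts with three integers (Number, Major, Minor).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*?)\s*$")


def parse_rows(text):
    """Return one dict per device row; the header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        number, major, minor, raid_device, rest = m.groups()
        parts = rest.split()
        # The last word is a device path only when a device is attached.
        device = parts[-1] if parts and parts[-1].startswith("/dev/") else None
        state = " ".join(parts[:-1] if device else parts)
        rows.append({
            "number": int(number),
            "major": int(major),
            "minor": int(minor),
            "state": state,
            "device": device,
        })
    return rows


def empty_slots(rows):
    """Slot numbers that are marked 'removed' and have no device attached."""
    return [r["number"] for r in rows
            if r["state"] == "removed" and r["device"] is None]


rows = parse_rows(DETAIL_TABLE)
print(empty_slots(rows))
```

For md1 this reports slot 0, the same slot shown as `_` in the `[_UU]` field of /proc/mdstat.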

any help anyone?

BUT, most importantly, do you concur that the messages do NOT indicate a drive failure??

thanks, M
MagWm

Reinhold
« Reply #7 on: August 03, 2007, 05:03:04 PM »
Code:
md2 : active raid1 sda2[0] sdb2[2]
17816000 blocks [3/2] [U_U]

md1 : active raid1 sda1[1] sdb1[2]
104320 blocks [3/2] [_UU]


(as md devices - iow RAID)
You have TWO(2) harddisks:
sda   (IBM Model: DDYS-T18350M)(scsi1, channel 0, id 0, lun 0 )
sdb   (IBM Model: DDYS-T18350M)(scsi1, channel 0, id 1, lun 0 )

You have TWO (2) Raid devices generated (see below)
md1
md2

(... that's what SME usually generates)

BOTH work correctly - no need to worry.

B U T (as you correctly noted):
a third member is missing from md1
a third member is missing from md2

both are used as SPARE in SME ... so you (just) have "no spare"
(on a RAID1, with you confident about your back-up, I'd tend to say "so what" ;-)
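
As a rough illustration of how to read the `[3/2] [U_U]` fields quoted above, the following Python sketch scans /proc/mdstat text and reports every array that has fewer active members than configured slots, together with the index of each empty slot. The sample text is the output from the first post; the function name `degraded_arrays` is an assumption made for this example.

```python
import re

# /proc/mdstat contents as posted at the top of the thread.
MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[2]
      17816000 blocks [3/2] [U_U]

md1 : active raid1 sda1[1] sdb1[2]
      104320 blocks [3/2] [_UU]

unused devices: <none>
"""

ARRAY = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
# [configured/active] followed by one flag per slot: U = up, _ = missing.
STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(text):
    """Map array name -> (configured, active, [indexes of empty slots])."""
    result = {}
    current = None
    for line in text.splitlines():
        m = ARRAY.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS.search(line)
        if s and current:
            configured, active = int(s.group(1)), int(s.group(2))
            flags = s.group(3)
            if active < configured:
                empty = [i for i, flag in enumerate(flags) if flag == "_"]
                result[current] = (configured, active, empty)
            current = None
    return result


print(degraded_arrays(MDSTAT))
```

On the posted output this flags both arrays: md2 with slot 1 empty and md1 with slot 0 empty, which is what the monitor reports as DegradedArray even though both present members are in sync.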

For an explanation of WHAT happened you would need to tell us
how hdc (your IDE drive) is partitioned
HOW & WHEN you installed SME ... and the version no.
...  (but in reality it's moot)

- SME generates md1 for boot purposes as RAID1
(without tricks LINUX SOFTRAID (basically) cannot boot from RAID5)
- SME generates md2 for "the rest of the data" iow RAID1, RAID5 or RAID6
SME (the way it partitions) can only use SAME SIZE DISKS.

Quote
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

...seems to indicate, going by dmesg, that you (physically) REMOVED that drive
(or you have an old HD that takes its time to spin up...)
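
For reference, the two hex values in the quoted dmesg lines are the ATA status and error registers, and they can be decoded bit by bit. The sketch below is only an illustration: the status names follow the wording the kernel prints in the quoted lines (the names for bits that are not set here are given from memory of the old IDE driver and should be treated as an assumption), and the error names use the abbreviations that smartctl prints. The helper `decode` is hypothetical.

```python
# ATA status register bits, highest bit first.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA error register bits, highest bit first (smartctl-style abbreviations).
ERROR_BITS = {
    0x80: "ICRC",   # interface CRC error during an Ultra DMA transfer
    0x40: "UNC",    # uncorrectable data error
    0x20: "MC",     # media changed
    0x10: "IDNF",   # sector ID not found
    0x08: "MCR",    # media change requested
    0x04: "ABRT",   # command aborted
    0x02: "TK0NF",  # track 0 not found
    0x01: "AMNF",   # address mark not found
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


print(decode(0x51, STATUS_BITS))  # status=0x51 from dmesg
print(decode(0x84, ERROR_BITS))   # error=0x84 from dmesg
```

Taken on their own, the bits only say that a DMA transfer failed its interface CRC check and the command was aborted; they do not say whether a drive was removed.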

BTW:
- The third device is hdc  ... an IDE device - so NOT on "S", SCSI, not your controller...

The messages DO NOT indicate a DRIVE failure
- just a missing spare!

There was some fluke when installing SME (adding hdc).
I do not (fore-)see an immediate operating problem.
Sometime later it could be wise to reinstall instead of upgrading (to SME 8.0 - whatever)

Regards
Reinhold

P.S.: No need to remove a device that's "not there"
P.P.S.: You might want to inform Shad Lords (slords), or open a BUG REPORT. It might well be a "quirk" in installation when hdc doesn't fit the (SAME SIZE) 'bill'.
............

magwm
« Reply #8 on: August 03, 2007, 06:12:02 PM »
wow thanks Reinhold that was exhaustive.

On this machine I installed SME 7.1 from CD and upgraded immediately to the newest version available.

I later added the IDE disk to serve as a large ibay. I didn't log in via ssh as admin until much later, because it was already working fine, mounted like this:

/dev/hdc1 on /home/e-smith/files/ibays/Primary/html/download type ext3 (rw)

Disk /dev/hdc: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1        9729    78148161   83  Linux

I am a bit worried about those DMA error messages; maybe I used an old IDE cable?

Is the configuration I use supported at all (SCSI for boot + system, IDE for non-critical data)??

ciao, Michel

PS: I will now try to raise a bug.. bear with me.. how do I notify slords?
MagWm

Reinhold
« Reply #9 on: August 03, 2007, 07:42:02 PM »
Michel,

...seems to me 7.1 already does "autospare" in installation then...
(I haven't looked and been "away" for some time ;) )

THAT would be a feature ... and (most likely) SHAD (slords) has implemented it.
You can reach/ask him via  the Forum - he is (as you might have guessed) one of the chief developers...

Look into the SPECs for your hdc ... it may require an 80-conductor cable.
(You did not give the brand, and (as discussed) it's not in dmesg.)
THAT btw IS a problem!

Mixing SCSI and IDE isn't a problem in SME (not standard, but working quite well).
Be aware that when your Adaptec dies ... your RAID1 isn't worth a penny.... both disks will be offline...

Regards
Reinhold
............

magwm
« Reply #10 on: August 06, 2007, 11:22:43 AM »
Reinhold,

Hey hey, what do you mean by "when your Adaptec dies ... your RAID1 isn't worth a penny"??

I mean, if one of the disks breaks down, I'd buy two larger SCSI disks, rebuild the array onto one of them, then fail the remaining original disk, attach the second large disk, and then grow the RAID... or??

If the controller breaks down (unlikely, as it has worked for some years now) I can attach the disks to another controller (I happen to have a spare one) and happily go on??? Of course I'd be offline, but surely I can recover from the disks, as they are not in hardware RAID??????

Just checked.. the cabling of hdc is a nice long 80-conductor one. I'm attaching the output of smartctl, just to show off.. well, I checked with some googling: SMART clearly tells it is an old drive and it's going to fail anytime. That's OK with me, as it just holds a backup copy of online data.

BTW this is /var/log/raidmonitor.. I don't understand the date format, but it sure talks a lot about spares..

ciao and many thanks again, Michel

[root@www raidmonitor]# ll
total 12
-rw-r--r--  1 smelog smelog 3216 May 10 18:08 @40000000464c1b2d28315c2c.u
-rw-r--r--  1 smelog smelog 6993 Aug  3 15:03 current
-rw-------  1 smelog smelog    0 May  8 19:10 lock
-rw-r--r--  1 smelog smelog    0 May 17 11:06 state
[root@www raidmonitor]# cat current
@40000000464c1b300091f53c mdadm: only specify super-minor once, super-minor=2 ignored.
@40000000464c1b300093934c mdadm: only specify super-minor once, super-minor=1 ignored.
@40000000464c1b303840797c Event: DegradedArray, Device: /dev/md1, Member:
@40000000464c1b312e91516c Event: SparesMissing, Device: /dev/md1, Member:
@40000000464c1b3138ac56c4 Event: DegradedArray, Device: /dev/md2, Member:
@40000000464c1b321aa1e8c4 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046515e1e090b8764 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046515e1e1fac2d5c Event: SparesMissing, Device: /dev/md1, Member:
@4000000046515e1e2e4563ec Event: DegradedArray, Device: /dev/md2, Member:
@4000000046515e1f01baf094 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046518dc506295abc Event: DegradedArray, Device: /dev/md1, Member:
@4000000046518dc51982a99c Event: SparesMissing, Device: /dev/md1, Member:
@4000000046518dc523e110f4 Event: DegradedArray, Device: /dev/md2, Member:
@4000000046518dc5323c021c Event: SparesMissing, Device: /dev/md2, Member:
@400000004651d1010e4f7d84 Event: DegradedArray, Device: /dev/md1, Member:
@400000004651d10202c2d72c Event: SparesMissing, Device: /dev/md1, Member:
@400000004651d1020d66521c Event: DegradedArray, Device: /dev/md2, Member:
@400000004651d1021d660a04 Event: SparesMissing, Device: /dev/md2, Member:
@400000004651da782c939154 Event: DegradedArray, Device: /dev/md1, Member:
@400000004651da790b61ac64 Event: SparesMissing, Device: /dev/md1, Member:
@400000004651da7915a7b18c Event: DegradedArray, Device: /dev/md2, Member:
@400000004651da79261e4724 Event: SparesMissing, Device: /dev/md2, Member:
@40000000465577d5218444d4 Event: DegradedArray, Device: /dev/md1, Member:
@40000000465577d600ee9044 Event: SparesMissing, Device: /dev/md1, Member:
@40000000465577d60ad546fc Event: DegradedArray, Device: /dev/md2, Member:
@40000000465577d61907862c Event: SparesMissing, Device: /dev/md2, Member:
@4000000046558468180cbb0c Event: DegradedArray, Device: /dev/md1, Member:
@40000000465584682ba97f74 Event: SparesMissing, Device: /dev/md1, Member:
@4000000046558468357eb99c Event: DegradedArray, Device: /dev/md2, Member:
@400000004655846905faff8c Event: SparesMissing, Device: /dev/md2, Member:
@4000000046558ba438744f04 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046558ba52bd7ceec Event: SparesMissing, Device: /dev/md1, Member:
@4000000046558ba535ef55bc Event: DegradedArray, Device: /dev/md2, Member:
@4000000046558ba609c6c8e4 Event: SparesMissing, Device: /dev/md2, Member:
@400000004655952028d49d24 Event: DegradedArray, Device: /dev/md1, Member:
@40000000465595211d747954 Event: SparesMissing, Device: /dev/md1, Member:
@400000004655952127f95c3c Event: DegradedArray, Device: /dev/md2, Member:
@4000000046559522030f4d64 Event: SparesMissing, Device: /dev/md2, Member:
@40000000465599a40e28cb44 Event: DegradedArray, Device: /dev/md1, Member:
@40000000465599a438edad54 Event: SparesMissing, Device: /dev/md1, Member:
@40000000465599a50a9918bc Event: DegradedArray, Device: /dev/md2, Member:
@40000000465599a5184f32ac Event: SparesMissing, Device: /dev/md2, Member:
@4000000046559d2b06b3d7dc Event: DegradedArray, Device: /dev/md1, Member:
@4000000046559d2c0393850c Event: SparesMissing, Device: /dev/md1, Member:
@4000000046559d2c1012ccac Event: DegradedArray, Device: /dev/md2, Member:
@4000000046559d2c1e7d3aac Event: SparesMissing, Device: /dev/md2, Member:
@400000004655a18815fcbefc Event: DegradedArray, Device: /dev/md1, Member:
@400000004655a18900906ab4 Event: SparesMissing, Device: /dev/md1, Member:
@400000004655a1890d9db1e4 Event: DegradedArray, Device: /dev/md2, Member:
@400000004655a1891c4fbbec Event: SparesMissing, Device: /dev/md2, Member:
@40000000466d605637a66bfc Event: DegradedArray, Device: /dev/md1, Member:
@40000000466d605711e398f4 Event: SparesMissing, Device: /dev/md1, Member:
@40000000466d60571be096cc Event: DegradedArray, Device: /dev/md2, Member:
@40000000466d60572b037b74 Event: SparesMissing, Device: /dev/md2, Member:
@400000004676870811cf1adc Event: DegradedArray, Device: /dev/md1, Member:
@400000004676870824be8c2c Event: SparesMissing, Device: /dev/md1, Member:
@40000000467687082e9b4c1c Event: DegradedArray, Device: /dev/md2, Member:
@400000004676870900cc3f1c Event: SparesMissing, Device: /dev/md2, Member:
@400000004676a48f18df2254 Event: RebuildStarted, Device: /dev/md2, Member:
@400000004676a50723b1330c Event: Rebuild20, Device: /dev/md2, Member:
@400000004676a5bb3300e85c Event: Rebuild40, Device: /dev/md2, Member:
@400000004676a63408d596cc Event: Rebuild60, Device: /dev/md2, Member:
@400000004676a6e81b3e8e54 Event: Rebuild80, Device: /dev/md2, Member:
@400000004676a79c2c2137e4 Event: RebuildFinished, Device: /dev/md2, Member:
@400000004676a79c39c5e0fc Event: SpareActive, Device: /dev/md2, Member: /dev/sda2
@400000004683a0fb3a7ad14c Event: DegradedArray, Device: /dev/md1, Member:
@400000004683a0fc12f318dc Event: SparesMissing, Device: /dev/md1, Member:
@400000004683a0fc1cd94eac Event: DegradedArray, Device: /dev/md2, Member:
@400000004683a0fc3023ebd4 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046a4a22421794084 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046a4a225042dc16c Event: SparesMissing, Device: /dev/md1, Member:
@4000000046a4a2250f42363c Event: DegradedArray, Device: /dev/md2, Member:
@4000000046a4a2252347f98c Event: SparesMissing, Device: /dev/md2, Member:
@4000000046a4b9112138efac Event: DegradedArray, Device: /dev/md1, Member:
@4000000046a4b91207c4bb3c Event: SparesMissing, Device: /dev/md1, Member:
@4000000046a4b91212c853a4 Event: DegradedArray, Device: /dev/md2, Member:
@4000000046a4b91227bc10d4 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046a5f42728e7ee74 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046a5f4280d2dbdbc Event: SparesMissing, Device: /dev/md1, Member:
@4000000046a5f428176f3ea4 Event: DegradedArray, Device: /dev/md2, Member:
@4000000046a5f42828d8cb74 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046a8549b0b98ba24 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046a8549b31c42684 Event: SparesMissing, Device: /dev/md1, Member:
@4000000046a8549c02e280a4 Event: DegradedArray, Device: /dev/md2, Member:
@4000000046a8549c19a62b74 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046ad98d211a07d44 Event: DegradedArray, Device: /dev/md1, Member:
@4000000046ad98d22c4146ec Event: SparesMissing, Device: /dev/md1, Member:
@4000000046ad98d2366bd8bc Event: DegradedArray, Device: /dev/md2, Member:
@4000000046ad98d30dbb53d4 Event: SparesMissing, Device: /dev/md2, Member:
@4000000046b3279b27a6eeac Event: DegradedArray, Device: /dev/md1, Member:
@4000000046b3279c1dcefd34 Event: SparesMissing, Device: /dev/md1, Member:
@4000000046b3279c28adeae4 Event: DegradedArray, Device: /dev/md2, Member:
@4000000046b3279d03924c8c Event: SparesMissing, Device: /dev/md2, Member:
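
On the date format: the `@4000...` prefixes look like TAI64N labels, the timestamp format written by daemontools' multilog. A label is 16 hex digits of seconds offset by 2^62, followed by 8 hex digits of nanoseconds. The sketch below decodes one label in Python. It assumes the same fixed 10-second TAI/UTC offset that the `tai64nlocal` tool applies, and the function name `decode_tai64n` is made up for this example.

```python
from datetime import datetime, timezone

TAI64_OFFSET = 1 << 62   # a TAI64 label stores 2**62 + seconds
TAI_MINUS_UTC = 10       # fixed offset, as applied by tai64nlocal (assumption)


def decode_tai64n(label):
    """Convert '@' + 24 hex digits into (UTC datetime, nanoseconds)."""
    hexpart = label.lstrip("@")
    if len(hexpart) != 24:
        raise ValueError("expected 24 hex digits after '@'")
    seconds = int(hexpart[:16], 16) - TAI64_OFFSET - TAI_MINUS_UTC
    nanos = int(hexpart[16:], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


# First line of the log above.
when, nanos = decode_tai64n("@40000000464c1b300091f53c")
print(when.isoformat(), nanos)
```

The first log line decodes to 2007-05-17 09:06:46 UTC plus 9565500 ns, which agrees with the 19:06:46.009565500 shown in the converted listing further down the thread if that listing was produced on a machine set to UTC+10.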



[root@www raidmonitor]# smartctl -a /dev/hdc
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     MAXTOR STM3802110A
Serial Number:    9LR0SQXE
Firmware Version: 3.AAK
User Capacity:    80,026,361,856 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Aug  6 11:07:46 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  27) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   074   006    Pre-fail  Always       -       118666658
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       96280985
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1447
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   060   056   045    Old_age   Always       -       740163624
194 Temperature_Celsius     0x0022   040   044   000    Old_age   Always       -       40 (Lifetime Min/Max 0/27)
195 Hardware_ECC_Recovered  0x001a   050   046   000    Old_age   Always       -       3228972
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   187   000    Old_age   Always       -       120
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 167 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 167 occurred at disk power-on lifetime: 1443 hours (60 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 00 e0 e0  Error: ICRC, ABRT at LBA = 0x00e00046 = 14680134

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:21:45.669  RECALIBRATE [OBS-4]
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  25 00 80 08 00 00 e0 00      08:21:45.642  READ DMA EXT

Error 166 occurred at disk power-on lifetime: 1443 hours (60 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 00 e0 e0  Error: ICRC, ABRT at LBA = 0x00e00046 = 14680134

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:21:45.669  RECALIBRATE [OBS-4]
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:21:45.642  FLUSH CACHE EXIT

Error 165 occurred at disk power-on lifetime: 1443 hours (60 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 00 e0 e0  Error: ICRC, ABRT at LBA = 0x00e00046 = 14680134

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:21:45.669  FLUSH CACHE EXIT
  25 00 80 a8 f7 50 e0 00      08:21:45.669  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:21:45.642  FLUSH CACHE EXIT

Error 164 occurred at disk power-on lifetime: 1443 hours (60 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 00 e0 e0  Error: ICRC, ABRT at LBA = 0x00e00046 = 14680134

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 08 00 00 e0 00      08:21:45.669  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:21:45.669  FLUSH CACHE EXIT
  25 00 80 a8 f7 50 e0 00      08:21:45.669  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:21:45.669  FLUSH CACHE EXIT
  25 00 08 a8 f8 50 e0 00      08:21:45.642  READ DMA EXT

Error 163 occurred at disk power-on lifetime: 1443 hours (60 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 00 e0 e0  Error: ICRC, ABRT at LBA = 0x00e00046 = 14680134

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 3f e4 50 e0 00      08:20:48.850  READ DMA EXT
  25 00 80 3f e4 50 e0 00      08:20:48.400  READ DMA EXT
  ea 00 00 00 00 00 e0 00      08:20:48.399  FLUSH CACHE EXIT
  25 00 08 00 00 00 e0 00      08:20:48.399  READ DMA EXT
  25 00 08 00 00 00 e0 00      08:20:48.330  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1447         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
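
A quick way to read the attribute table above is to compare each normalised VALUE with its THRESH: by SMART convention an attribute counts as failed once VALUE drops to or below a non-zero THRESH. The Python sketch below does this for four rows copied from the table. The function names are invented for this example, and raw values are vendor-specific, so they are kept as plain strings.

```python
# Four rows copied from the smartctl attribute table above.
ATTRIBUTES = """\
  1 Raw_Read_Error_Rate     0x000f   117   074   006    Pre-fail  Always       -       118666658
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   187   000    Old_age   Always       -       120
"""


def parse_attributes(text):
    """Return {attribute name: fields} for each 10-column attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the raw value may contain spaces
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9].strip(),
        }
    return attrs


def at_or_below_threshold(attrs):
    """Names of attributes whose VALUE has reached a non-zero THRESH."""
    return [name for name, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


attrs = parse_attributes(ATTRIBUTES)
print(at_or_below_threshold(attrs))
print(attrs["UDMA_CRC_Error_Count"]["raw"])
```

None of these rows has reached its threshold, which is consistent with the PASSED self-assessment. The non-zero raw UDMA_CRC_Error_Count sits alongside the ICRC entries in the error log, both of which concern the transfer between drive and controller rather than the disk surface.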
MagWm

william_syd
« Reply #11 on: August 06, 2007, 01:49:11 PM »
Quote from: "magwm"

BTW this is /var/log/raidmonitor.. I don't understand the date format, but it sure talks a lot about spares..



Code:
2007-05-17 19:06:46.009565500 mdadm: only specify super-minor once, super-minor=2 ignored.
2007-05-17 19:06:46.009671500 mdadm: only specify super-minor once, super-minor=1 ignored.
2007-05-17 19:06:46.943749500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-17 19:06:47.781275500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-17 19:06:47.950818500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-17 19:06:48.446818500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-21 18:53:40.151750500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-21 18:53:40.531377500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-21 18:53:40.776299500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-21 18:53:41.029028500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-21 22:16:59.103373500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-21 22:16:59.427993500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-21 22:16:59.601952500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-21 22:16:59.842793500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-22 03:03:51.240090500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-22 03:03:52.046323500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-22 03:03:52.224809500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-22 03:03:52.493226500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-22 03:44:14.747868500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-22 03:44:15.190950500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-22 03:44:15.363311500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-22 03:44:15.639518500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-24 21:32:27.562316500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-24 21:32:28.015634500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-24 21:32:28.181749500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-24 21:32:28.419923500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-24 22:26:06.403487500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-24 22:26:06.732528500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-24 22:26:06.897497500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-24 22:26:07.100335500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-24 22:56:58.947146500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-24 22:56:59.735563500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-24 22:56:59.904877500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-24 22:57:00.164022500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-24 23:37:26.685022500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-24 23:37:27.494172500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-24 23:37:27.670653500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-24 23:37:28.051334500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-24 23:56:42.237554500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-24 23:56:42.955100500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-24 23:56:43.177805500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-24 23:56:43.407843500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-25 00:11:45.112449500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-25 00:11:46.059999500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-25 00:11:46.269667500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-25 00:11:46.511523500 Event: SparesMissing, Device: /dev/md2, Member:
2007-05-25 00:30:22.368885500 Event: DegradedArray, Device: /dev/md1, Member:
2007-05-25 00:30:23.009464500 Event: SparesMissing, Device: /dev/md1, Member:
2007-05-25 00:30:23.228438500 Event: DegradedArray, Device: /dev/md2, Member:
2007-05-25 00:30:23.474987500 Event: SparesMissing, Device: /dev/md2, Member:
2007-06-12 00:46:36.933653500 Event: DegradedArray, Device: /dev/md1, Member:
2007-06-12 00:46:37.300128500 Event: SparesMissing, Device: /dev/md1, Member:
2007-06-12 00:46:37.467703500 Event: DegradedArray, Device: /dev/md2, Member:
2007-06-12 00:46:37.721648500 Event: SparesMissing, Device: /dev/md2, Member:
2007-06-18 23:22:06.298785500 Event: DegradedArray, Device: /dev/md1, Member:
2007-06-18 23:22:06.616467500 Event: SparesMissing, Device: /dev/md1, Member:
2007-06-18 23:22:06.781929500 Event: DegradedArray, Device: /dev/md2, Member:
2007-06-18 23:22:07.013385500 Event: SparesMissing, Device: /dev/md2, Member:
2007-06-19 01:28:05.417276500 Event: RebuildStarted, Device: /dev/md2, Member:
2007-06-19 01:30:05.598815500 Event: Rebuild20, Device: /dev/md2, Member:
2007-06-19 01:33:05.855697500 Event: Rebuild40, Device: /dev/md2, Member:
2007-06-19 01:35:06.148215500 Event: Rebuild60, Device: /dev/md2, Member:
2007-06-19 01:38:06.457084500 Event: Rebuild80, Device: /dev/md2, Member:
2007-06-19 01:41:06.740374500 Event: RebuildFinished, Device: /dev/md2, Member:
2007-06-19 01:41:06.969269500 Event: SpareActive, Device: /dev/md2, Member: /dev/sda2
2007-06-28 21:52:17.981127500 Event: DegradedArray, Device: /dev/md1, Member:
2007-06-28 21:52:18.317921500 Event: SparesMissing, Device: /dev/md1, Member:
2007-06-28 21:52:18.484003500 Event: DegradedArray, Device: /dev/md2, Member:
2007-06-28 21:52:18.807660500 Event: SparesMissing, Device: /dev/md2, Member:
2007-07-23 22:42:02.561594500 Event: DegradedArray, Device: /dev/md1, Member:
2007-07-23 22:42:03.070107500 Event: SparesMissing, Device: /dev/md1, Member:
2007-07-23 22:42:03.255997500 Event: DegradedArray, Device: /dev/md2, Member:
2007-07-23 22:42:03.591919500 Event: SparesMissing, Device: /dev/md2, Member:
2007-07-24 00:19:51.557379500 Event: DegradedArray, Device: /dev/md1, Member:
2007-07-24 00:19:52.130333500 Event: SparesMissing, Device: /dev/md1, Member:
2007-07-24 00:19:52.315118500 Event: DegradedArray, Device: /dev/md2, Member:
2007-07-24 00:19:52.666636500 Event: SparesMissing, Device: /dev/md2, Member:
2007-07-24 22:44:13.686288500 Event: DegradedArray, Device: /dev/md1, Member:
2007-07-24 22:44:14.221101500 Event: SparesMissing, Device: /dev/md1, Member:
2007-07-24 22:44:14.393166500 Event: DegradedArray, Device: /dev/md2, Member:
2007-07-24 22:44:14.685296500 Event: SparesMissing, Device: /dev/md2, Member:
2007-07-26 18:00:17.194558500 Event: DegradedArray, Device: /dev/md1, Member:
2007-07-26 18:00:17.834938500 Event: SparesMissing, Device: /dev/md1, Member:
2007-07-26 18:00:18.048398500 Event: DegradedArray, Device: /dev/md2, Member:
2007-07-26 18:00:18.430320500 Event: SparesMissing, Device: /dev/md2, Member:
2007-07-30 17:52:40.295730500 Event: DegradedArray, Device: /dev/md1, Member:
2007-07-30 17:52:40.742475500 Event: SparesMissing, Device: /dev/md1, Member:
2007-07-30 17:52:40.913037500 Event: DegradedArray, Device: /dev/md2, Member:
2007-07-30 17:52:41.230380500 Event: SparesMissing, Device: /dev/md2, Member:
2007-08-03 23:03:13.665251500 Event: DegradedArray, Device: /dev/md1, Member:
2007-08-03 23:03:14.500104500 Event: SparesMissing, Device: /dev/md1, Member:
2007-08-03 23:03:14.682486500 Event: DegradedArray, Device: /dev/md2, Member:
2007-08-03 23:03:15.059919500 Event: SparesMissing, Device: /dev/md2, Member:


You need to pipe it through tai64nlocal to get local time.

Code:
cat /var/log/raidmonitor/current | tai64nlocal
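
To see at a glance which array keeps raising which event, the converted log can be tallied. This is only a minimal sketch, assuming the lines look like the ones pasted above (date, time, "Event:", event name, "Device:", device name); the function name summarize_raid_events is made up for illustration.

```shell
# Count events per device from raidmonitor-style log lines read on stdin.
# Lines that are not "Event:" records (e.g. mdadm warnings) are skipped.
summarize_raid_events() {
    awk '
        $3 == "Event:" {
            event = $4;  sub(/,$/, "", event)    # strip trailing comma
            device = $6; sub(/,$/, "", device)
            count[device " " event]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -k2,2 -k3,3    # order by device, then by event name
}
```

It would be used as `tai64nlocal < /var/log/raidmonitor/current | summarize_raid_events`, which prints one line per device/event pair with the count in front.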
Regards,
William

If I give advice... it's only what I'd do if it were me.

Offline Reinhold

  • *
  • 517
  • +0/-0
    • http://127.0.0.1
DegradedArray event: how to remove unused devices?
« Reply #12 on: August 07, 2007, 12:39:30 AM »
Quote
if the controller breaks down (unlikely as it has worked for some years now) I can attach the disks to another controller (happen to have one spare) and happily go on??? of course I'd be offline, but surely i can recover from the disks as they are not in hardware raid ??????


Believe me "Controllers DO die"  :cry:   ... especially older ones - those out of production   :roll:

...and without a spare controller ... that's it...
...which is what most advocates of Hardware Raid DO like to overlook.

YOU on the other hand ARE SAFE (hopefully) :D... at least correct in what you say above.
"grow" btw is used to add a disk to an existing array (hard to do in current SME)
"resize" is what you are referring to (and don't forget the lvm on top)
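
For reference, mdadm's grow mode is also the one that changes the number of member slots of an array via --raid-devices, which is what the [3/2] in /proc/mdstat at the top of this thread is about. The following is only a sketch: the function name suggest_raid_device_count is made up, it prints commands and does not run them, and whether the mdadm and kernel shipped with SME accept the change for these arrays is an assumption to verify first. It only makes sense if the empty slot was never meant to hold a disk.

```shell
# Read /proc/mdstat-style text on stdin and PRINT (never execute) one
# "mdadm --grow" command for each array that has more slots than
# active members, e.g. "[3/2]".
suggest_raid_device_count() {
    awk '
        /^md[0-9]+ :/ { array = $1 }             # remember the array name
        match($0, /\[[0-9]+\/[0-9]+\]/) {        # the "[slots/active]" field
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (array != "" && n[1] + 0 > n[2] + 0)
                printf "mdadm --grow /dev/%s --raid-devices=%d\n", array, n[2]
            array = ""
        }
    '
}
```

Run as `suggest_raid_device_count < /proc/mdstat`, then check each array with `mdadm --detail` before typing anything it suggests.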

Your old Maxtor seems to be limping quite badly ... it will possibly "hang" its slave (install CD?!) sometimes too...
You may want DMA off...

Regards
Reinhold

P.S.: "unlikely as it has worked for some years now"  ...does make me smile...
Your belief that a device that hasn't failed in a long time will not fail in the future ...is a very human point of view
... it clearly contradicts electronic-parts-lifetime-reality   :D  :D  :D
cheers
............

Offline NickR

  • *
  • 283
  • +0/-0
    • http://www.witzendcs.co.uk/
DegradedArray event: how to remove unused devices?
« Reply #13 on: August 07, 2007, 11:55:15 AM »
Quote from: "Reinhold"
Believe me "Controllers DO die"  :cry:   ... especially older ones - those out of production   :roll:

...and without a spare controller ... that's it...
...which is what most advocates of Hardware Raid DO like to overlook.


There's nothing magical about hardware RAID-1 - it's only offloading the duplication of data between 2 disks from a CPU process to onboard firmware.

You can easily move disks to a new controller either singly or in pairs. So long as SME supports the replacement controller there's absolutely no problem.  I've done it several times, going right back to E-Smith 5.x

Now, if it was hardware RAID-5, I'd completely agree  :wink:
--
Nick......

Offline Reinhold

  • *
  • 517
  • +0/-0
    • http://127.0.0.1
DegradedArray event: how to remove unused devices?
« Reply #14 on: August 11, 2007, 08:26:38 PM »
Quote from: "NickR"

Now, if it was hardware RAID-5, I'd completely agree  :wink:


So let us agree on that ;-)

Regards
Reinhold
...who will no longer take out his hardware-paranoia into the public domain  :roll:
............