Koozali.org: home of the SME Server

DegradedArray event - Notifications

Offline davidpfox

« on: October 18, 2009, 12:47:38 AM »
Hi all, I had my first RAID controller issue over the weekend: I received about 3 separate email notifications, 5 minutes apart. From what I read in the documentation, I would have expected to receive an "Everything is now ok" email if the RAID controller resolved the problem itself. Also, why am I no longer receiving the notifications? Does the event only get 'tripped' when it reaches that part of the HDD?

Thanks

Offline Stefano

« Reply #1 on: October 18, 2009, 12:50:52 AM »
hi

do you really have a raid controller? or are you using SME's sw raid?

please post the result of
Code: [Select]
cat /proc/mdstat

thank you

Offline davidpfox

« Reply #2 on: October 18, 2009, 01:21:08 AM »
SME SW

Offline Stefano

« Reply #3 on: October 18, 2009, 01:22:46 AM »
and the output? :-)

Offline davidpfox

« Reply #4 on: October 18, 2009, 02:00:07 AM »
lol ..... sorry ....

Personalities : [raid1]
md2 : active raid1 sdb2[1]
      156183808 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>



Offline Stefano

« Reply #5 on: October 18, 2009, 08:51:20 AM »
your sdda disk is gone... change it asap
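
For anyone reading the mdstat output above: `[2/1] [_U]` means the array expects 2 members but only 1 is active, and the underscore marks the empty slot (slot 0 here, the one sda1[0] fills in md1). A minimal sketch that lists degraded arrays, assuming the mdstat layout shown in this thread; the function name `degraded_arrays` is made up for illustration.

```shell
# List md arrays whose status field (e.g. [_U]) contains an underscore,
# i.e. arrays with at least one missing member.
# Hypothetical helper: reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        /^md[0-9]/           { name = $1 }    # remember the current array name
        /\[[U_]*_[U_]*\]/    { print name }   # status field with a missing slot
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat; fed the output posted above it prints md2.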

Offline davidpfox

« Reply #6 on: October 18, 2009, 09:18:25 AM »
Quote
your sdda disk is gone... change it asap

Hi Stefano ...... are you able to give me a '101' lesson on all sdda, mdda etc. that are used or should be used?

Offline Stefano

« Reply #7 on: October 18, 2009, 09:57:27 AM »
mmmhh... first of all it's 'sda', my bad ;-)

what you ask is a bit complex, so you should first learn something about Linux hard disk naming rules (search with google)

then you should read this wiki page..

if you need to identify your sda disk (assuming you have 2 identical hds), you can get the serial numbers with
Code: [Select]
smartctl -i /dev/sda
smartctl -i /dev/sdb

hth

Offline davidpfox

« Reply #8 on: October 18, 2009, 10:07:12 AM »
I am just reviewing that wiki article now - thanks for the heads up.

So I can see that sdb is the troublesome drive, and from this wiki article it tells me to go ahead with a 'smartctl -i /dev/sdb' command. When I do this I get a 'Smartctl: please specify device type with the -d option'.

Looks like I need to get to know smartctl a bit more.
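
That message generally means smartctl could not work out what kind of device it is talking to. A hedged sketch, assuming the disks are plain SATA/ATA drives: pass the device type explicitly with `-d ata` and pull the serial number out of the identity output. `disk_serial` is a made-up helper name, and the parsing assumes the usual `Serial Number:` line in `smartctl -i` output.

```shell
# Print the serial number found in `smartctl -i` output read from stdin.
# Hypothetical helper; assumes a line of the form "Serial Number:    XXXX".
disk_serial() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Usage (needs root and smartmontools; "-d ata" is an assumption about
# the hardware, other controllers may need a different type):
#   smartctl -i -d ata /dev/sda | disk_serial
#   smartctl -i -d ata /dev/sdb | disk_serial
```

Matching the serial printed here against the label on each physical drive is what tells you which disk to pull.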

Thanks again for your help!!

Offline sekt

« Reply #9 on: October 18, 2009, 01:03:15 PM »
I use skype, contact me at 'svenn-erik'

Offline Stefano

« Reply #10 on: October 18, 2009, 01:41:29 PM »
Quote
So I can see that sdb is the troublesome drive,

biiiiiiip.. wrong guess.. your faulty hd is sda :)

Quote
and from this wiki article it tells me to go ahead with a 'smartctl -i /dev/sdb' command. When I do this I get a 'Smartctl: please specify device type with the -d option'.

strange.. could you please describe your hw a little?
- what kind of hd are you using?
- mb?
- chipset/controller?

please take the time to make a backup of your data.. ;-)

Offline davidpfox

« Reply #11 on: October 18, 2009, 01:52:31 PM »
Thanks Stefano ..... I will be onsite with this client tomorrow (9 hours from now) so will be able to get some specs then - unless you know of anything I can use to pull this information from command-line etc.?

I was about to reply to all posts after some digging around - see below

*********

After doing a lot of reading of documentation and others' experiences, I ended up executing 'mdadm --add /dev/md2 /dev/sda2'. The rebuild looks pretty good, as I can now run 'cat /proc/mdstat':

Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
      156183808 blocks [2/2] [UU]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>

Also, when I access Server Console and check 'Manage disk redundancy', I see that all is in working order.

So it looks like the rebuild was successful and my sda2 is back online. Should I be looking at my sdb drive for errors?


Offline Stefano

« Reply #12 on: October 18, 2009, 01:57:13 PM »
ok.. so your raid arrays are ok..

I would check /var/log/messages for errors to discover why sda was thrown out of the array..

about CLI commands to investigate your hw configuration:
- lspci (optionally with the -v flag)
- dmidecode

HTH

Offline davidpfox

« Reply #13 on: October 18, 2009, 02:12:21 PM »
Quote
ok.. so your raid arrays are ok..

I would check /var/log/messages for errors to discover why sda was thrown out of the array..

Feel like reviewing these with me?

Oct 17 09:58:26 sme-server kernel: SCSI device sda: 312579695 512-byte hdwr sectors (160041 MB)
Oct 17 09:58:26 sme-server kernel: SCSI device sda: drive cache: write back
Oct 17 09:58:26 sme-server kernel: SCSI device sda: 312579695 512-byte hdwr sectors (160041 MB)
Oct 17 09:58:26 sme-server kernel: SCSI device sda: drive cache: write back
Oct 17 09:58:26 sme-server kernel:  sda: sda1 sda2
Oct 17 09:58:26 sme-server kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Oct 17 09:58:26 sme-server kernel: md: bind<sda1>
Oct 17 09:58:26 sme-server kernel: md: bind<sda2>
Oct 17 09:58:26 sme-server kernel: md: kicking non-fresh sda2 from array!
Oct 17 09:58:26 sme-server kernel: md: unbind<sda2>
Oct 17 09:58:26 sme-server kernel: md: export_rdev(sda2)
Oct 17 09:58:27 sme-server kernel: md: could not bd_claim sda1.
Oct 17 09:58:27 sme-server kernel: md: could not bd_claim sda2.
Oct 17 09:58:27 sme-server kernel: md: considering sda2 ...
Oct 17 09:58:27 sme-server kernel: md:  adding sda2 ...
Oct 17 09:58:27 sme-server kernel: md: md2 already running, cannot run sda2
Oct 17 09:58:27 sme-server kernel: md: export_rdev(sda2)
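
The line that matters in a log like this is `md: kicking non-fresh sda2 from array!`. As far as I understand it, "non-fresh" means the member's superblock was behind the other member's at assembly time, so md refused to use it. A rough sketch for pulling those events out of a long log; `kicked_members` is a made-up name and the field handling assumes the message format shown above.

```shell
# Count, per partition, how often md kicked it out as "non-fresh".
# Hypothetical helper: reads /var/log/messages unless another file is given.
kicked_members() {
    awk '/md: kicking non-fresh/ {
        # the partition name is the word right after "non-fresh"
        for (i = 1; i < NF; i++)
            if ($i == "non-fresh") count[$(i + 1)]++
    }
    END { for (dev in count) print dev, count[dev] }' "${1:-/var/log/messages}"
}
```

A partition that shows up here after every reboot is worth a closer look; a single occurrence after an unclean shutdown is less alarming.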

Quote
about CLI commands to investigate your hw configuration:
- lspci (optionally with the -v flag)
- dmidecode

lspci
00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

I did not include output from dmidecode as this is quite lengthy .....  :P

Offline piran

« Reply #14 on: October 19, 2009, 01:31:30 PM »
Water under the bridge I know but it may have been your
sdb (not sda) that was the more likely one to be suspect?
http://wiki.contribs.org/Raid#Resynchronising_a_Failed_RAID
Code: [Select]
(wiki extract)
md2 : active raid1 hdb2[1]        <-- missing partition   
     1048704 blocks [2/1] [_U]    <-- failed

Personalities : [raid1]
md2 : active raid1 sdb2[1]             <-- missing partition 
      156183808 blocks [2/1] [_U]    <-- failed
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]