Koozali.org: home of the SME Server

Obsolete Releases => SME Server 7.x => Topic started by: davidpfox on October 18, 2009, 12:47:38 AM

Title: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 12:47:38 AM: Hi All ....... I had my first raid controller issue over the weekend, when I received approx 3 separate email notifications 5 min apart. From what I read in the documentation, I would have expected to receive an "Everything is now ok" email if the raid controller resolved itself, but also why am I not still receiving the notifications? Does it only get 'tripped' when it gets to that part of the HDD?

Thanks
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 12:50:52 AM: hi

do you really have a raid controller? or are you using SME's sw raid?

please post the result of
Code:
cat /proc/mdstat
thank you
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 01:21:08 AM: SME SW
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 01:22:46 AM: and the output? :-)
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 02:00:07 AM: lol ..... sorry ....

Personalities : [raid1]
md2 : active raid1 sdb2[1]
156183808 blocks [2/1] [_U]

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>

Quote from: Stefano on October 18, 2009, 01:22:46 AM
and the output? :-)
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 08:51:20 AM: your sdda disk is gone... change it asap
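
For anyone reading the output above: in /proc/mdstat each array line lists the members still in the array, and the bracketed status on the next line has one character per slot, with an underscore marking a missing one. Here md2 lists only sdb2[1] and shows [_U], so it is the first slot (the sda side) that has dropped out. A minimal sketch that prints just the degraded arrays follows. The function name degraded_arrays is made up for illustration, and it assumes the simple two-line layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list md arrays whose status field (e.g. [_U]) shows a
# missing member. Reads /proc/mdstat by default; pass a saved copy as $1.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md2 : active raid1 sdb2[1]"
        /^md[0-9]+ :/ {
            name = $1
            members = ""
            # Fields 5 onward are the members still in the array.
            for (i = 5; i <= NF; i++) members = members " " $i
            next
        }
        # Status line, e.g. "156183808 blocks [2/1] [_U]"
        name != "" && /blocks/ {
            # An underscore in the last field marks an empty slot.
            if ($NF ~ /_/) print name ": degraded " $NF ", still active:" members
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Run it with no argument to read the live /proc/mdstat, or pass a saved copy of the output as the first argument. No output means no array is reporting a missing member.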
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 09:18:25 AM: Quote from: Stefano on October 18, 2009, 08:51:20 AM
your sdda disk is gone... change it asap

Hi Stefano ...... are you able to give me a '101' lesson on all sdda, mdda etc. that are used or should be used?
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 09:57:27 AM: mmmhh... first of all it's 'sda', my bad ;-)

what you ask is a bit complex, so you should first learn something about Linux hard disk naming rules (search with Google)

then you should read this (http://wiki.contribs.org/Raid) wiki page..

if you need to identify your sda disk (assuming you have 2 identical hds), you can read the serial numbers with
Code:
smartctl -i /dev/sda
smartctl -i /dev/sdb
hth
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 10:07:12 AM: I am just reviewing that wiki article now - thanks for the heads up.

So I can see that sdb is the troublesome drive, and from this wiki article it tells me to go ahead with a 'smartctl -i /dev/sdb' command. When I do this I get a 'Smartctl: please specify device type with the -d option'.

Looks like I need to get to know smartctl a bit more.

Thanks again for your help!!
Title: Re: DegradedArray event - Notifications
Post by: sekt on October 18, 2009, 01:03:15 PM: try this link http://wiki.contribs.org/Raid
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 01:41:29 PM: Quote from: davidpfox on October 18, 2009, 10:07:12 AM
So I can see that sdb is the troublesome drive,

biiiiiiip.. wrong guess.. your faulty hd is sda :)

Quote
and from this wiki article it tells me to go ahead with a 'smartctl -i /dev/sdb' command. When I do this I get a 'Smartctl: please specify device type with the -d option'.

strange.. could you please describe your hw a little?
- what kind of hd are you using?
- mb?
- chipset/controller?

please take the time to make a backup of your data.. :wink:
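
One thing that may be worth trying in the meantime, offered as a guess rather than a diagnosis: when smartctl cannot work out the device type by itself, the type can be given explicitly with -d, and for SATA disks that show up as /dev/sdX the value is often ata. The sketch below just wraps that call. disk_identity is a hypothetical helper name, and whether -d ata is the right type for this controller is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print identity info (model, serial number) for one disk.
# "-d ata" is an assumption for SATA disks presented as /dev/sdX; other
# controllers may need a different -d type, or none at all.
disk_identity() {
    smartctl -d ata -i "$1"
}

# Example (as root):
#   disk_identity /dev/sda
#   disk_identity /dev/sdb
```

Comparing the serial numbers printed for sda and sdb with the labels on the drives helps make sure the right disk gets pulled.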
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 01:52:31 PM: Thanks Stefano ..... I will be onsite with this client tomorrow (9 hours from now) so will be able to get some specs then - unless you know of anything I can use to pull this information from command-line etc.?

I was about to reply to all posts after some digging around - see below

*********

After doing a lot of reading of documentation and other people's experiences, I ended up executing 'mdadm --add /dev/md2 /dev/sda2'. The rebuild looks pretty good as I can now run 'cat /proc/mdstat':

Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
156183808 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>

Also, when I access Server Console and check 'Manage disk redundancy', I see that all is in working order.

So it looks like the rebuild was successful and my sda2 is back online. Should I be looking at my sdb drive for errors?
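
While a freshly added partition is still syncing, /proc/mdstat carries an extra progress line for that array. A small sketch for checking this, assuming the progress line contains the word recovery or resync; rebuild_progress is an illustrative name, not an SME Server command.

```shell
#!/bin/sh
# Hypothetical helper: print any in-progress recovery/resync lines.
# Reads /proc/mdstat by default; pass a saved copy as $1.
# Prints nothing when no array is rebuilding.
rebuild_progress() {
    awk '/recovery|resync/' "${1:-/proc/mdstat}"
}
```

No output means nothing is rebuilding, which matches the [2/2] [UU] state above.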
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 18, 2009, 01:57:13 PM: ok.. so your raid arrays are ok..

I would check /var/log/messages for errors to discover why sda was thrown out of the array..

about CLI commands to investigate your hw configuration:
- lspci (optionally with the -v flag)
- dmidecode

HTH
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 18, 2009, 02:12:21 PM: Quote from: Stefano on October 18, 2009, 01:57:13 PM
ok.. so your raid arrays are ok..

I would check /var/log/messages for errors to discover why sda was thrown out of the array..
Feel like reviewing these with me?

Oct 17 09:58:26 sme-server kernel: SCSI device sda: 312579695 512-byte hdwr sectors (160041 MB)
Oct 17 09:58:26 sme-server kernel: SCSI device sda: drive cache: write back
Oct 17 09:58:26 sme-server kernel: SCSI device sda: 312579695 512-byte hdwr sectors (160041 MB)
Oct 17 09:58:26 sme-server kernel: SCSI device sda: drive cache: write back
Oct 17 09:58:26 sme-server kernel: sda: sda1 sda2
Oct 17 09:58:26 sme-server kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Oct 17 09:58:26 sme-server kernel: md: bind<sda1>
Oct 17 09:58:26 sme-server kernel: md: bind<sda2>
Oct 17 09:58:26 sme-server kernel: md: kicking non-fresh sda2 from array!
Oct 17 09:58:26 sme-server kernel: md: unbind<sda2>
Oct 17 09:58:26 sme-server kernel: md: export_rdev(sda2)
Oct 17 09:58:27 sme-server kernel: md: could not bd_claim sda1.
Oct 17 09:58:27 sme-server kernel: md: could not bd_claim sda2.
Oct 17 09:58:27 sme-server kernel: md: considering sda2 ...
Oct 17 09:58:27 sme-server kernel: md: adding sda2 ...
Oct 17 09:58:27 sme-server kernel: md: md2 already running, cannot run sda2
Oct 17 09:58:27 sme-server kernel: md: export_rdev(sda2)

Quote from: Stefano on October 18, 2009, 01:57:13 PM
about CLI commands to investigate your hw configuration:
- lspci (optionally with the -v flag)
- dmidecode
lspci
00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

I did not include output from dmidecode as this is quite lengthy ..... :P
Title: Re: DegradedArray event - Notifications
Post by: piran on October 19, 2009, 01:31:30 PM: Water under the bridge I know but it may have been your
sdb (not sda) that was the more likely one to be suspect?
http://wiki.contribs.org/Raid#Resynchronising_a_Failed_RAID
Code:
(wiki extract)
md2 : active raid1 hdb2[1] <-- missing partition
1048704 blocks [2/1] [_U] <-- failed
Quote from: davidpfox on October 18, 2009, 02:00:07 AM
Personalities : [raid1]
md2 : active raid1 sdb2[1] <-- missing partition
156183808 blocks [2/1] [_U] <-- failed

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
Title: Re: DegradedArray event - Notifications
Post by: ReetP on October 19, 2009, 02:23:19 PM: Code:
(wiki extract)
md2 : active raid1 hdb2[1] <-- missing partition
1048704 blocks [2/1] [_U] <-- failed
I think this section of the Wiki is incorrect. I think that it shows that hda2 is missing and that hdb2 is now active. If you read a little further down :

Code:
Determine the missing physical partition, Look carefully, and fill in the gap, in this example, it's hda2, the device being /dev/hda2
md1 : active raid1 hda3[0] hdb3[1]
md2 : active raid1 hda2[0] hdb2[1]
md0 : active raid1 hda1[0] hdb1[1]
In any event you can see in the log that it was sda2 that was kicked:

Quote
Oct 17 09:58:26 sme-server kernel: md: kicking non-fresh sda2 from array!

You should run a proper disk check as well to be sure the drive is OK
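
To find entries like the one quoted above without scrolling through the whole log, something along these lines should do. It is only a sketch: md_kicked is a made-up name, and it assumes the standard syslog layout seen in this thread (month, day, time, host, then the kernel message).

```shell
#!/bin/sh
# Hypothetical helper: print the timestamp and partition for every
# "kicking non-fresh" event. Reads /var/log/messages by default; pass
# another log file as $1.
md_kicked() {
    # Fields: 1-3 = month, day, time; 4 = host; 5 = "kernel:"; 6 = "md:";
    # 7-8 = "kicking non-fresh"; 9 = the partition that was dropped.
    awk '/kernel: md: kicking non-fresh/ { print $1, $2, $3, $9 }' \
        "${1:-/var/log/messages}"
}
```

Each line of output gives the time of the event and the partition that was dropped from its array.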
Title: Re: DegradedArray event - Notifications
Post by: piran on October 19, 2009, 02:26:55 PM: I wanted clarification... the two didn't seem to add up, so
that when I get a similar report I know exactly what to do;~)
Title: Re: DegradedArray event - Notifications
Post by: Stefano on October 19, 2009, 02:37:58 PM: Quote from: ReetP on October 19, 2009, 02:23:19 PM
Code:
(wiki extract)
md2 : active raid1 hdb2[1] <-- missing partition
1048704 blocks [2/1] [_U] <-- failed
I think this section of the Wiki is incorrect. I think that it shows that hda2 is missing and that hdb2 is now active.

please raise a bug for it or create a wiki account and correct it yourself, thank you
Title: Re: DegradedArray event - Notifications
Post by: ReetP on October 19, 2009, 03:20:06 PM: Quote from: Stefano on October 19, 2009, 02:37:58 PM
please raise a bug for it or create a wiki account and correct it yourself, thank you

I used to have a wiki account but I didn't use it enough to justify my 'wikiship' and I think it was withdrawn.

I have put in a request. If that fails I'll post a bug.
Title: Re: DegradedArray event - Notifications
Post by: ReetP on October 19, 2009, 03:41:33 PM: All Wiki'ed up. Edit done - Hope it's a) correct & b) a little clearer !!
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 19, 2009, 03:51:22 PM: Quote from: piran on October 19, 2009, 01:31:30 PM
Water under the bridge I know but it may have been your
sdb (not sda) that was the more likely one to be suspect?
http://wiki.contribs.org/Raid#Resynchronising_a_Failed_RAID
Perhaps my message was worded poorly, but what you have stated above is exactly why I want to be cautious about 'sdb'. By the looks of this article (feels like I have read this 100 times), only if this issue arises again should I be concerned and look at using SMART.

Can someone confirm this for me?

Cheers
Title: Re: DegradedArray event - Notifications
Post by: piran on October 19, 2009, 03:54:12 PM: Quote from: ReetP on October 19, 2009, 03:41:33 PM
All Wiki'ed up. Edit done - Hope it's a) correct & b) a little clearer !!
Much, thank you;~) Will be helpful when the inevitable occurs.
Title: Re: DegradedArray event - Notifications
Post by: piran on October 19, 2009, 03:55:53 PM: Quote from: davidpfox on October 19, 2009, 03:51:22 PM
Can someone confirm this for me?
The wiki was in error, it is more correct now, please follow Stefano.
Title: Re: DegradedArray event - Notifications
Post by: ReetP on October 19, 2009, 04:03:59 PM: Quote from: piran on October 19, 2009, 03:54:12 PM
Much, thank you;~) Will be helpful when the inevitable occurs.

Know the feeling :-)

After that brain strain it must be time for a :pint:
Title: Re: DegradedArray event - Notifications
Post by: purvis on October 20, 2009, 12:13:13 AM: Hi Piran

from the report below, I would have thought sda was the drive with issues and I still believe it is sda

Personalities : [raid1]
md2 : active raid1 sdb2[1] -----> i, paul, see sda is missing here
156183808 blocks [2/1] [_U]

md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>

and if I were you, making a 9 hour trip, I would just upgrade and replace both drives, and grow the array if running raid1.
when buying the drives, look over the serial number, product id and anything else you can, so that you pick the two best drives.

go ahead and yank out the sda drive; in a raid1 you can always afford to be safe rather than sorry. then use sdb for rebuilding

after replacing the drives, you can always use the drives you removed in another system (possibly a workstation or usb backup) once you feel comfortable about the new drives, if you feel they are still good to use. Plus you get a backup if you keep the sdb drive on a shelf.
Title: Re: DegradedArray event - Notifications
Post by: janet on October 20, 2009, 07:15:15 AM: davidpfox

Quote
.... only if this issue arises again should I be concerned and look at using SMART.
Can someone confirm this for me?

No. You should run extensive hard drive diagnostic tests on sda NOW, and while you are doing it also fully test sdb.

Being tossed out of an array could be a sign of a faulty drive.
Use smartctl and the manufacturer's test software, e.g. for Seagate drives use Seatools etc.
You can get a lot of test software on the Ultimate Boot CD (UBCD), google for it.

Also if one drive is showing signs of being faulty, then it's quite possible the other drive is showing signs of age too, and may fail soon, so best to check them both. If you feel it appropriate or convenient to do so, then go ahead and replace the drives. They are cheap compared to your running around time, and downtime & inconvenience for end users etc.

Another possible cause of drives being spat out from an array is motherboard incompatibility. Some not so old motherboards did not support the faster SATA speeds, so it was necessary to slow the drives down by jumpering them so they were compatible with the motherboard.
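
As a starting point for the smartctl part, a minimal sketch follows. The helper names are hypothetical, and -d ata is an assumption that may not suit every controller; drop or change it as needed.

```shell
#!/bin/sh
# Hypothetical helpers around smartctl. "-d ata" is an assumption for SATA
# disks presented as /dev/sdX and may need changing for other hardware.

# Overall health verdict as reported by the drive itself.
smart_health() {
    smartctl -d ata -H "$1"
}

# Start the drive's extended (long) self-test; the drive runs it in the
# background and stays usable meanwhile.
smart_long_test() {
    smartctl -d ata -t long "$1"
}

# Show the self-test log once the test has had time to finish.
smart_test_log() {
    smartctl -d ata -l selftest "$1"
}

# Example (as root), repeated for /dev/sdb:
#   smart_health /dev/sda
#   smart_long_test /dev/sda
#   smart_test_log /dev/sda
```

A clean self-test log is reassuring but not a guarantee, so the manufacturer's own test software is still worth running afterwards.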
Title: Re: DegradedArray event - Notifications
Post by: davidpfox on October 20, 2009, 09:28:20 AM: Mary ..... awesome reply and very valuable.

Thanks!