Koozali.org: home of the SME Server

DegradedArray event

Offline steve288

DegradedArray event
« on: December 22, 2011, 08:26:42 PM »

I have a 7.5.1 system that sends me an email when I reboot, saying:

   From: mdadm monitoring [root@abc.mydomain.com]
   To: admin_raidreport@abc.mydomain.com

   Subject: DegradedArray event on /dev/md2:linux.abc.mydomain.com


   This is an automatically generated mail message from mdadm running on smclinux.abc.mydomain.com.

   A DegradedArray event has been detected on md device /dev/md2.


The GUI says:

   ------ Disk Redundancy status as of Thursday Dec 22 etc. ------
   Current RAID status:

   Personalities : [raid1]
      md2 : active raid1 hda2[0]
         38973568 blocks [2/1] [U_]

      md1 : active raid1 hda1[0] hdb1[1]
               104320 blocks [2/2] [UU]

         unused devices: <none>
   Only Some of the RAID devices are unclean.
   Manual intervention may be required.

I have also run the following commands.

[root@smclinux ~]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Thu Apr 24 13:58:39 2008
     Raid Level : raid1
     Array Size : 38973568 (37.17 GiB 39.91 GB)
    Device Size : 38973568 (37.17 GiB 39.91 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Dec 22 12:01:58 2011
          State : clean, degraded   <--------- NOTE THIS
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 48710ec5:b53317cb:05eafe7f:5ae8df80
         Events : 0.30714056

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed <---------- NOTE THIS

--------------------------------------------------------------------
Here is the command for MD1

[root@smclinux ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Thu Apr 24 13:58:39 2008
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Dec 22 04:05:00 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : da6572c9:181c2142:f8f7fecf:df36a8e2
         Events : 0.14725

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1

I'm not sure what to do. I have read several wikis and am trying to understand this data.

Since this computer is at a remote location, I can't just go there and open it up. Frankly, I installed this computer several years ago and can't remember much about the hardware.


Can someone help me understand the data I have retrieved? First I want to understand the hardware I have and the problem I have. Can someone confirm (or correct) my assumptions 1–5?

1. I'm guessing I have two 40 GB (roughly) IDE drives, i.e. /dev/hda and /dev/hdb.

2. Is the system mirrored, with boot on hda1 (mirrored to hdb1 on the second drive) and a second partition hda2 (mirrored to hdb2 on the second drive)?

3. md1 is the mirrored boot device and md2 is the mirrored device holding the user files.

4. Technically the failure is on the second partition of the second drive, i.e. hdb2.

5. I understand that the drive may be going bad and should be replaced, but initially I would like to rebuild the array if that's an option. Then I will get a new drive and travel out to replace it. Until I can buy new drives and swap them in, I would like to rebuild.

To rebuild the array, or at least try to, should I do the following?

#> mdadm -a /dev/hdb2 /dev/md2

I have also read on the wiki about removing disks. Should I run the following first:

#> mdadm --remove /dev/md2 /dev/hdb2

before running #> mdadm -a /dev/hdb2 /dev/md2?

Obviously this is important and I want to make sure I'm doing it right.

Am I on the right track here?



By the way, I saw a bug report
http://bugs.contribs.org/show_bug.cgi?id=2390
regarding emailing you when a drive fails. I don't really understand it. I'm confused whether I should only get this error when I reboot, or on some sort of schedule.
It seems that if it's important (and it is), it should be sent out regularly? Not sure.

thanks.

Offline cactus

Re: DegradedArray event
« Reply #1 on: December 22, 2011, 10:01:37 PM »
To rebuild the array, or at least try to, do the following????

#> mdadm -a /dev/hdb2 /dev/md2
I don't have much experience with RAID, but if the wiki says so, I think it is sound.

I have also read on the wiki about removing disks. Should I do that first, eg. run the following command.

#> mdadm --remove /dev/md2 /dev/hdb2
I don't think you need to do that as the partition is already removed according to:
[root@smclinux ~]# mdadm --query --detail /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Thu Apr 24 13:58:39 2008
     Raid Level : raid1
     Array Size : 38973568 (37.17 GiB 39.91 GB)
    Device Size : 38973568 (37.17 GiB 39.91 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Dec 22 12:01:58 2011
          State : clean, degraded   <--------- NOTE THIS
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 48710ec5:b53317cb:05eafe7f:5ae8df80
         Events : 0.30714056

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed <---------- NOTE THIS
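
If you wanted to check that from a script, a rough sketch follows. It greps a pasted sample of the listing above rather than calling mdadm live, so the device names are just the ones quoted, not anything it detects:

```shell
# Hypothetical fragment of `mdadm --query --detail /dev/md2` output,
# matching the listing quoted above.
detail='    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        -      removed'

# If a member already shows "removed", mdadm has dropped it from the
# array, so `mdadm --add` alone is enough and `--remove` is redundant.
if printf '%s\n' "$detail" | grep -q 'removed$'; then
  action="add-only"
else
  action="remove-then-add"
fi
echo "$action"
```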
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline steve288

Re: DegradedArray event
« Reply #2 on: December 23, 2011, 03:08:57 PM »
Thanks for the info.
To update you and anyone who is having the same problems.

I was WRONG on the rebuild command I listed above. I read it in a post, not the wiki, and I'm not sure what it does. The proper command for rebuilding your array is:
mdadm --add /dev/md2 /dev/hdb2
(of course, put in the right hdb2 and md2 designators for your drive/partition and md device)
Then I ran:
watch -n .1 cat /proc/mdstat
And merrily watched my RAID rebuild.
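
While it rebuilds, /proc/mdstat shows a recovery line with the percent done. A rough sketch of pulling that number out with sed, using a made-up sample line (the numbers here are invented, not from my system):

```shell
# Hypothetical recovery line as /proc/mdstat prints it mid-rebuild.
line='      [===>.................]  recovery = 18.3% (7133440/38973568) finish=10.4min speed=50952K/sec'

# Capture the number between "recovery = " and "%".
pct=$(printf '%s\n' "$line" | sed -n 's/.*recovery = \([0-9.]*\)%.*/\1/p')
echo "$pct"
```

Against the live file you would replace the sample with `grep recovery /proc/mdstat`.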
And again, thanks to cactus for the advice not to remove. When I run the command:
mdadm --query --detail /dev/md2
Clearly at the bottom of the output I see the line that says "removed".
If it says this then, duh, it's already removed and you don't need to remove it.

Number   Major   Minor   RaidDevice State
  0       3        2        0      active sync   /dev/hda2
  1       0        0        -      removed                     <---------- NOTE THIS

The system has rebuilt and seems to be holding. Next I will be changing the drive, I think, as failures are often signs of a drive going bad. I will most likely get back to the group on how to do this. But for now, it's the holidays.
thanks.
« Last Edit: December 23, 2011, 03:10:53 PM by steve288 »