Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: jim7jim on February 29, 2012, 06:58:52 PM
-
Running SME 7.5.1 - I got an email from mdadm that 'A Fail event has been detected on md device /dev/md2.'
I have two disks in software RAID1. Result of cat /proc/mdstat:
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[2](F)
78043648 blocks [2/1] [U_]
md1 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
unused devices: <none>
================================================
And:
[root@rr-gateway ~]# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Fri Jun 1 21:12:52 2007
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Device Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Wed Feb 29 01:31:58 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : ba509459:ff471e08:288dc218:33702b08
Events : 0.14812
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
[root@rr-gateway ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.01
Creation Time : Fri Jun 1 21:12:52 2007
Raid Level : raid1
Array Size : 78043648 (74.43 GiB 79.92 GB)
Device Size : 78043648 (74.43 GiB 79.92 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Wed Feb 29 12:54:44 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
UUID : 0a13f350:a0027fcb:3ea6bce2:1ca7e3c5
Events : 0.72750928
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 0 0 - removed
2 8 18 - faulty /dev/sdb2
===============================================
Do I just need to add the removed disk back with:
mdadm /dev/md2 --add /dev/sda2
Or are there other steps I should follow?
TIA.
-
Follow the instructions in the RAID howto:
http://wiki.contribs.org/Raid#Resynchronising_a_Failed_RAID
-
Thanks for the response. I read that, but I'm confused whether I should try to add back sda2 or sdb2.
Also, when I run 'smartctl -i /dev/sdb2 -d ata' or 'smartctl -i /dev/sdb1 -d ata',
it tells me that the Device Read Identity Failed. Does this mean I have a bad disk that needs to be replaced?
-
jim7jim
The sdb disk is the problem; in particular, partition sdb2 has been thrown out of the array.
This can happen for a variety of reasons, one of which is a faulty or failing hard disk drive (in this case sdb).
Adding the drive back without thoroughly testing it first is VERY UNWISE!
The error message you are getting already suggests a fault.
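For reference, the two markers that matter in the /proc/mdstat output quoted at the top of the thread are the (F) flag after a member name and an underscore in the [U_] status field. A minimal sketch of pulling both out with awk follows. The function names are made up for illustration, and the parsing assumes the mdstat layout shown in this thread.

```shell
# Print "<array> <member>" for every member flagged (F) in mdstat-style input.
# Reads the file given as $1, or /proc/mdstat when no argument is passed.
failed_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {
                    member = $i
                    sub(/\[.*/, "", member)   # strip the "[2](F)" suffix
                    print $1, member
                }
        }
    ' "${1:-/proc/mdstat}"
}

# Print the name of every array whose status field contains "_",
# i.e. at least one slot is missing, as in [U_].
degraded_arrays() {
    awk '
        /^md[0-9]+ :/         { name = $1 }
        /blocks/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   degraded_arrays
#   failed_members
```

Against the output posted above, this would report md2 as degraded and sdb2 as its failed member.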
To determine which drive is which, use
fdisk -l
and
smartctl -a /dev/sdx
(replace sdx with sda sdb sdc etc, or even hda hdc on older IDE systems)
This lets you identify the serial number of the drive, so that you remove the correct (faulty) drive.
Run thorough tests using the drive manufacturer's diagnostic software (usually a free download, or get the Ultimate Boot CD (UBCD), which is also a free download), and also test with smartctl (long tests preferably). Refer to the wiki Howto on drive health: http://wiki.contribs.org/Monitor_Disk_Health
In this case test /dev/sdb, but it would not hurt to test all drives while you are there, i.e. run a test on /dev/sda as well. The reasoning is that if one drive in a RAID1 array is faulty or failing, the other drive may be becoming problematic too.
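The identification and testing steps above might look roughly like this at the command line. This is only a sketch: the helper name serial_from_smartctl is made up for illustration, the device names are the ones from this thread, and you may need to add '-d ata' as the original poster did if your smartctl version requires it.

```shell
# Print the serial number from `smartctl -i` output read on stdin.
# Assumes the usual "Serial Number:    <value>" line is present.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number/ { print $2 }'
}

# Usage on the live system (run as root):
#   smartctl -i /dev/sdb | serial_from_smartctl   # match this against the drive label
#   smartctl -H /dev/sdb                          # overall health verdict
#   smartctl -t long /dev/sdb                     # start the extended self-test
#   smartctl -l selftest /dev/sdb                 # read the result once it has finished
#   smartctl -t long /dev/sda                     # repeat for the other drive
```

The long test runs in the background on the drive itself, so the result only shows up in the self-test log after the estimated time smartctl prints when the test starts.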
-
I replaced the bad drive and the array rebuilt successfully.
Thanks for the help.
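The poster did not say which steps were used for the replacement, and the RAID howto linked earlier in the thread remains the reference for SME Server. For readers who want to see what a swap typically involves with plain mdadm, here is a sketch that only prints the commands instead of running them. The function name replace_plan is made up for illustration, and the sequence assumes MBR partition tables, a replacement disk at least as large as the old one, and that the new disk shows up under the same device name.

```shell
# Print (do not execute) a generic mdadm sequence for swapping a RAID1 member.
# Arguments: $1 array, $2 failed partition, $3 healthy disk, $4 replacement disk.
replace_plan() {
    cat <<EOF
mdadm $1 --fail $2
mdadm $1 --remove $2
# shut down, swap the physical drive, boot again, then:
sfdisk -d $3 | sfdisk $4
mdadm $1 --add $2
cat /proc/mdstat
EOF
}

# Usage with the device names from this thread:
#   replace_plan /dev/md2 /dev/sdb2 /dev/sda /dev/sdb
```

Printing the plan first makes it easy to check the device names before anything is touched, since mixing up sda and sdb here would overwrite the good disk. Note that sdb1 was still an active member of md1 in the output above, so it would need the same treatment with /dev/md1 before the drive is pulled.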