Koozali.org: home of the SME Server
Obsolete Releases => SME Server 8.x => Topic started by: pablitobs on August 20, 2011, 12:27:29 AM
-
I get a degraded array event mail from my SME server every day...
Ran cat /proc/mdstat
Got:
Personalities : [raid1]
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
975595200 blocks [2/1] [U_]
unused devices: <none>
Meaning md2 is having trouble..
A quick search turned up this: http://wiki.contribs.org/Raid#Resynchronising_a_Failed_RAID
I followed the instructions on removing md2, as U_ means md2 is having the problem:
mdadm --remove /dev/md2 /dev/sda2
but I am getting this error:
mdadm: hot remove failed for /dev/sda2: Device or resource busy
Tried with mdadm --manage /dev/md2 --remove /dev/sda2
same answer... mdadm: hot remove failed for /dev/sda2: Device or resource busy
Tried with mdadm --stop /dev/md2
same answer... mdadm: hot remove failed for /dev/sda2: Device or resource busy
mount got me this:
/dev/mapper/main-root on / type ext3 (rw,usrquota,grpquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb1 on /home/e-smith/files/ibays/hal/files type ext4 (rw)
/dev/sdc1 on /media/usbdisk type ext3 (rw)
As you can see there is no reference to md2 or sda2, so it looks like it is not mounted...
running mdadm --query --detail /dev/md1 I get....
/dev/md1:
Version : 0.90
Creation Time : Mon May 2 08:52:48 2011
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Used Dev Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Aug 19 14:56:57 2011
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 7361e483:63142ad2:773f3338:619f9c18
Events : 0.796
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
running the same at md2
/dev/md2:
Version : 0.90
Creation Time : Mon May 2 08:52:49 2011
Raid Level : raid1
Array Size : 975595200 (930.40 GiB 999.01 GB)
Used Dev Size : 975595200 (930.40 GiB 999.01 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Aug 19 16:25:38 2011
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 080785ae:0db28d57:6a2c4489:a18058a4
Events : 0.5228248
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed
So it looks like md2 is removed... Then I tried to add it again with mdadm --add /dev/md2 /dev/sda2
and got: mdadm: Cannot open /dev/sda2: Device or resource busy
Any help will be appreciated.
Thanks
-
pablitobs
The first raid error means sda is OK, but the second disk in RAID1 is missing (failed). In a typical situation this would be sdb.
I suggest you run the disk manufacturer's diagnostic test on sdb (or even on both sda & sdb),
e.g. from the UBCD (Ultimate Boot CD).
The other errors about drive being busy are probably caused by you when trying to add sda back to the RAID array, when it is still a member of the array, and appears to be functioning correctly too.
So you have misinterpreted the errors and are doing the wrong thing to repair the problem.
It may be that sdb has totally failed and you need to physically replace that drive, and then add & resync the drive/array using the menu in the SME console (log in as admin).
Please read the manual & the various RAID and drive Howtos, there should be enough info there to get you going again.
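For reading the status lines: in "[2/1] [U_]" the first bracket says the array expects 2 devices and has 1 active, and in the second bracket each U is a working member while each _ is a missing slot. The device named on the "mdX :" line (sda1, sda2) is the member that is still present, not the one that failed. A minimal sketch of that reading, assuming the two-line layout shown in this thread (list_degraded is a hypothetical helper name, not part of mdadm or SME Server):

```shell
#!/bin/sh
# Sketch: report md arrays whose status string contains "_" (a missing member).
# list_degraded is a hypothetical helper; it only reads text and changes nothing.
list_degraded() {
    # $1: file in /proc/mdstat format (defaults to the live /proc/mdstat)
    awk '
        # "md1 : active raid1 sda1[0]" -> remember array name and present members
        /^md[0-9]+ :/ {
            name = $1; members = ""
            for (i = 5; i <= NF; i++) members = members " " $i
        }
        # "104320 blocks [2/1] [U_]" -> last field is the per-slot status
        /\[[U_]+\]/ {
            if ($NF ~ /_/) print name ": degraded " $NF ", present:" members
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the /proc/mdstat posted above, this prints "md1: degraded [U_], present: sda1[0]" and "md2: degraded [U_], present: sda2[0]", i.e. both arrays are missing their second member and sda is the surviving disk.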
-
Please tell us what kind of RAID you use.
I also have a hardware RAID, which presents only a single HDD to SME.
In my hardware RAID I have several HDDs combined into one RAID5,
but SME sees only one HDD (sda) and makes 2 partitions on it (sda1 and sda2).
mine looks like this
[root@mail ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
1318221504 blocks [2/1] [U_]
even though there are more HDDs behind it (I can see them in the RAID management interface).
What I see is that you have sda1 and sda2 partitions, which is almost the same behavior.
If you have a hardware RAID that presents a single equivalent HDD to SME, this is what you should expect to see.
What do you see with fdisk -l?
For example, in the same environment I see only one HDD:
fdisk -l
Disk /dev/sda: 1349.9 GB, 1349967151104 bytes
255 heads, 63 sectors/track, 164124 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14      164124  1318221607+  fd  Linux raid autodetect
Disk /dev/md2: 1349.8 GB, 1349858820096 bytes
2 heads, 4 sectors/track, 329555376 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/md1: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md1 doesn't contain a valid partition table
As you can see, I apparently have a problem with md2 as well, but this is only because SME tries to build a software RAID (or at least is prepared to do so).
Please check whether you have a hardware RAID or not.
-
Assuming you are using the built in linux software raid, and your /proc/mdstat is as you indicated:
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
975595200 blocks [2/1] [U_]
then I would start by seeing if you can just re-add the offending disc by:
mdadm --add /dev/md1 /dev/sdb1
and
mdadm --add /dev/md2 /dev/sdb2
If this does not work, then you need to test /dev/sdb (if indeed that is the second drive), and either replace or at least re-format it. You could also try the "p" command from:
fdisk /dev/sdb
this will show you details of the partition table.
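Before running the --add commands it may be worth confirming that the target partition is not already in use. In the mount output posted at the top of the thread, /dev/sdb1 is mounted as ext4 on an ibay, which suggests sdb may be a separate data disk on that machine rather than the missing half of the mirror. A rough sketch of such a check follows; check_not_mounted is a hypothetical helper name, it only inspects text and never calls mdadm, and "not mounted" only rules out the most obvious case (a partition could still be in use as swap or by LVM):

```shell
#!/bin/sh
# Sketch: refuse to suggest "mdadm --add" for a partition that is mounted.
# check_not_mounted is a hypothetical helper; it never modifies any array.
check_not_mounted() {
    # $1: partition such as /dev/sdb1
    # $2: file with `mount` output or /proc/mounts content (default: /proc/mounts)
    # Both formats put the device name in the first field.
    if awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"; then
        echo "$1 is mounted; do not add it to an array"
        return 1
    fi
    echo "$1 is not mounted"
}
```

With the mount output from the first post saved to a file, check_not_mounted /dev/sdb1 on that file reports the partition as mounted and returns a non-zero status.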
-
The other errors about drive being busy are probably caused by you when trying to add sda back to the RAID array, ...
No, from trying to remove one device which is the only device in an array, and from trying to stop an array while it was in use.
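As for why md2 counts as "in use" although it does not appear in the mount output: / is mounted from /dev/mapper/main-root, a device-mapper volume, and on a default SME install that volume most likely sits on top of md2 through LVM (an assumption here, the thread does not show it directly). A small sketch for checking what sits on top of an md device, assuming the kernel exposes the sysfs "holders" directory (who_holds is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: list the devices stacked on top of a block device via sysfs.
# who_holds is a hypothetical helper; it only reads directory listings.
who_holds() {
    # $1: block device name such as md2
    # $2: sysfs root (defaults to /sys; overridable for testing)
    dir="${2:-/sys}/block/$1/holders"
    [ -d "$dir" ] || { echo "$1: no holders directory found" >&2; return 1; }
    holders=$(ls "$dir")
    if [ -n "$holders" ]; then
        # Unquoted on purpose: joins the names onto one line
        echo "$1 is held by:" $holders
    else
        echo "$1 has no holders"
    fi
}
```

If who_holds md2 lists one or more dm-N devices, the array is backing the running system and can only be stopped from a rescue environment. The LVM command pvs should show the same relationship from the other side.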
-
I also find smartmontools very helpful:
http://wiki.contribs.org/Monitor_Disk_Health
It is likely that you (or the server) will find problems with the hard drive, and the admin account may get some emails about them.