Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: wjhobbs on February 05, 2007, 08:19:35 PM
-
Almost exactly a month ago I received the following email message to admin:
This is an automatically generated mail message from mdadm running on chryxus.primary.chryxus.ca.
A DegradedArray event has been detected on md device /dev/md2.
I presumed the drive had gone bad and replaced it.
On the drive that I removed, I performed a full write-read test on the entire extent of the drive -- with no errors. That drive seems fine and is now in use on a test server.
Now, today, I have received exactly the same message, again flagging /dev/md2 (which now contains the brand-new drive) as the source of the problem.
I am beginning to suspect that the physical drive may not be the issue.
Does anyone have any comments/suggestions?
Thanks.
John
-
I have a comment:
A degraded array can be caused by a server crash, a power loss, or a reset. This happens when one drive falls behind; the RAID then rebuilds the array and you're set.
It happened to me when I had to hard-reboot my server after a kernel panic.
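If the rebuild doesn't start on its own you can kick it off by hand; a minimal sketch, assuming the mirror is md2 and the dropped partition is hdc2 (substitute your own device names):

mdadm /dev/md2 --add /dev/hdc2 (re-adds the dropped partition; the resync starts automatically)
cat /proc/mdstat (shows the rebuild progress)

If I remember right, the SME 7 admin console also has a "Manage disk redundancy" option that does the same thing.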
-
Maybe I can help; I recovered 800 GB of my irreplaceable data after a RAID 5 failure with two drives down.
Log on to the console and give me the output of:
cat /proc/mdstat
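For reference, a degraded two-disk mirror shows up in /proc/mdstat roughly like this (device names and block counts illustrative, not your actual output):

Personalities : [raid1]
md2 : active raid1 hda2[0]
      292945152 blocks [2/1] [U_]
unused devices: <none>

The [2/1] and [U_] mean only one of the two members is active; a healthy mirror shows [2/2] and [UU].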
-
Gert,
Thanks for your response.
Too late for anything useful. I am in the process of rebuilding the array and mdstat just shows the resync in progress.
What I am wondering is if anyone suspects a potential problem elsewhere, like the second IDE controller or something else.
John
-
Doesn't matter if it is rebuilding; I wanted to see what your RAID configuration looks like. It is definitely possible that it is the controller. In my case it was, and as a result it took out both drives connected to my primary controller. But the array only became degraded after a reboot, and then it would not boot, so I could not recreate the array. Even the rescue option from the SME CD did not detect the installation. I had to use RIP and Knoppix to save my data. I ended up with a corrupt ext3 file system within a logical volume, within a lost volume group, within a degraded (failed) RAID 5 array.
Are you using RAID 1 or RAID 5? IDE, SCSI, or SATA?
-
"like the second IDE controller"
OK, IDE. Which device failed?
"second IDE controller"
I suppose it would then be hdc or hde.
While it is rebuilding, do:
mdadm --examine /dev/hdX2 (where X is the failed drive)
and see if the checksum fails. If it does, your secondary controller is faulty and your array will become degraded again after you reboot.
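For comparison, a bad superblock prints something like this on the checksum line (values illustrative):

Checksum : 54631d6f - expected 1a2b3c4d

instead of "- correct".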
-
Thanks Gert,
Sorry I didn't get to your post until the rebuild had completed. The results of --examine are:
[root@chryxus ~]# mdadm --examine /dev/hdc2
/dev/hdc2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : a7d745b4:eba3a1a5:78728991:024fc0de
  Creation Time : Sun Jan  7 12:07:23 2007
     Raid Level : raid1
    Device Size : 292945152 (279.37 GiB 299.98 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2

    Update Time : Tue Feb  6 17:47:38 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 54631d6f - correct
         Events : 0.866385

      Number   Major   Minor   RaidDevice State
this     1      22        2        1      active sync   /dev/hdc2

   0     0        3        2        0      active sync   /dev/hda2
   1     1       22        2        1      active sync   /dev/hdc2
Checksum is OK at this point.
John
-
OK, if your drive fails again, take a look at your checksum.
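In the meantime it would not hurt to watch for IDE errors on that channel; a rough sketch, assuming the suspect disk is hdc and the smartmontools package is installed (adjust the names to your setup):

grep -i hdc /var/log/messages (look for DMA timeouts or bus resets on the secondary channel)
smartctl -a /dev/hdc (the drive's own SMART error counters)

If the log shows repeated DMA errors on hdc while SMART stays clean, that points at the controller or cable rather than the disk.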