Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: Tib on February 14, 2009, 07:53:23 AM
-
I've spent a few hours reading up on all the RAID problems here and haven't yet found one that matches what I've come across.
When I run
cat /proc/mdstat
I get:
Personalities : [raid1]
md2 : active raid1 sdb2[1]
292929088 blocks [2/1] [_U]
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
When I run
mdadm --query --detail /dev/md[12]
I get:
/dev/md1:
Version : 00.90.01
Creation Time : Sat Mar 18 05:27:17 2006
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Device Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Feb 14 15:40:53 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 1b2e31c8:d62a150c:18733cf4:452550cd
Events : 0.18551
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 0 0 - removed
/dev/md2:
Version : 00.90.01
Creation Time : Sat Mar 18 05:27:17 2006
Raid Level : raid1
Array Size : 292929088 (279.36 GiB 299.96 GB)
Device Size : 292929088 (279.36 GiB 299.96 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Sat Feb 14 16:31:37 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 6e3d0416:d1e50753:753b2da2:a6d774fc
Events : 0.35274281
Number Major Minor RaidDevice State
0 0 0 - removed
1 8 18 1 active sync /dev/sdb2
I had to run fsck on / to fix the boot problem the server had after a power outage.
In the process of retrieving data I found that the second drive has not been syncing since 2006.
The people who own this server have just ignored the messages because they didn't know what they were :shock:
So they called on me to fix their problem and implement a data backup scheme as well.
OK, so back to the RAID problem ... what would be the best thing for me to do here?
I have disconnected one drive at a time, and both drives boot up and can be accessed, etc.
But only one drive has all the current data and operating system on it.
Regards,
Tib
-
hi
mdadm -a /dev/md1 /dev/sdb1
watch /proc/mdstat and check that it synchronizes... if not, sdb needs to be replaced
repeat the same step for sda2 on md2
if something is wrong with both HDs, then back up your data, buy 2 new disks (or 3, so you have a spare one in the array) and rebuild your server
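Putting that together, something like this (just a sketch; the partition names come from the mdstat output above, so double-check them against your own output first):
mdadm -a /dev/md1 /dev/sdb1    # re-add sdb1 as md1's missing member
mdadm -a /dev/md2 /dev/sda2    # re-add sda2 as md2's missing member
watch -n 10 cat /proc/mdstat   # follow the rebuild; Ctrl-C once both arrays show [UU]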
HTH
Ciao
Stefano
-
hi Stefano
Thanks for the super fast reply :)
I did
mdadm -a /dev/md1 /dev/sdb1
and
mdadm -a /dev/md2 /dev/sda2
then ran
cat /proc/mdstat
I get
Personalities : [raid1]
md2 : active raid1 sda2[2] sdb2[1]
292929088 blocks [2/1] [_U]
[>....................] recovery = 2.1% (6384256/292929088) finish=83.3min speed=57313K/sec
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
So it looks like they're starting to sync.
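Once the recovery finishes (the output above estimates about 83 minutes) I'll double-check that both arrays are healthy again with:
cat /proc/mdstat                     # both should show [2/2] [UU]
mdadm --query --detail /dev/md[12]   # State should read "clean" with 2 working devices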
again ... thank you
regards,
Tib
-
I have disconnected one drive at a time, and both drives boot up and can be accessed, etc.
That was an unwise thing to do. It can cause the RAID layer to be mistaken about which is the most up-to-date drive, and can lead to data loss.
-
I was just about to post the same thing, Charlie, but you beat me by a second or so.
From the mdstat results it looks like it's messed up now, though.
Also, it's not a good idea to suggest that someone replace a drive.
Better to suggest running the MFG diag on it to determine whether the drive needs to be replaced.
From the information in this thread it's impossible to tell whether one drive, the other, or both are bad.
Only the MFG diag would confirm that.
However, one could surmise that a drive swap issue occurred at some point between the OP and the subsequent posts.
That should have been corrected before a resync was started; now it's pretty much water under the bridge.
Not an easy fix now.
-
Also, it's not a good idea to suggest that someone replace a drive.
I think it's good "safety first" advice. Disks are cheap. Only re-use a drive which has been thrown out of a RAID array if you are really confident that there's nothing wrong with the drive.
Pretty much all drives these days are SMART-capable, and smartctl can be used to query their status. I don't know that the MFG diagnostic will tell you more.
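For example (device names are illustrative; substitute your own disks):
smartctl -H /dev/sda        # overall health self-assessment
smartctl -a /dev/sda        # full SMART attributes and error log
smartctl -t long /dev/sda   # start an extended self-test; read the result later with smartctl -l selftest /dev/sda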
-
I think it's good "safety first" advice. Disks are cheap. Only re-use a drive which has been thrown out of a RAID array if you are really confident that there's nothing wrong with the drive.
Note that in this case, however, there wasn't a "failed disk". There was one partition on one disk thrown out of its array, and another partition on the other drive thrown out of the other array. That situation requires more careful recovery - and also makes it less likely that there is really a drive failure issue.
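One way to see which copy of each array is current before re-adding anything (partition names taken from the output earlier in the thread) is to compare the component superblocks:
mdadm --examine /dev/sda1 /dev/sdb1   # compare the Events counter and Update Time for md1's members
mdadm --examine /dev/sda2 /dev/sdb2   # same for md2's members
The member with the higher event count and newer update time is the one the array has actually been writing to.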
-
And the drive positions got swapped somewhere along the way.
Nothing wrong with the drives; more likely a faulty install, cable, or controller.
New drives won't fix those issues.
-
Drives ain't cheap if you're replacing good drives every time you have a RAID issue.
Reading SMART is by no means a one-stop diag shop.
Yes, the MFG diag will confirm whether the drive is actually bad or good; it won't tell you when it might fail in the future.
Only a palm reader can tell you that.
Yes, the diag reads SMART as well and will report any SMART errors as diag codes.
And some MFGs these days will not warranty a drive if you don't run the MFG diag first.
WD, for example, wants to know the failure code from the diag before they will RMA it.
-
Drives ain't cheap if you're replacing good drives every time you have a RAID issue.
You should never see "a RAID issue" in normal operation. If you are frequently seeing "RAID issues" with "good" drives, then you have an undiagnosed problem which should be investigated and rectified.
-
Thanks for the heads up ...
I've checked the data and everything is there, so I didn't have to restore any data.
I'll put all this in my notes just in case one of the drives has a spack attack.
Regards,
Tib