Koozali.org: home of the SME Server

Trying to get my head around RAID

EnglishRob

Trying to get my head around RAID
« on: March 02, 2006, 01:02:30 PM »
Hi there folks,

I've recently set up an SME Server 7 box for a friend.

They have two hard disks in their server, which are in a software RAID configuration.

They have asked me to put together some disaster recovery procedures in case one of the hard disks were to fail.

Now, I haven't played around with RAID much in the past, so I figured that to make life easy I'd have a go in VMware.

I've tried to set up a virtual machine with a similar setup to my friend's server.  The virtual machine has two 8GB hard disks attached on HDA and HDB.  I installed a basic installation of SME Server 7.0 pre1 on this virtual machine.

Everything seemed okay up to that point: both drives were seen and the RAID array was okay.

Now, to simulate a replacement hard disk being added, I shut down the server, removed one of the hard disks (HDB) and added a blank 8GB virtual hard disk in its place.

After doing this, I then had an 8GB HDA with the installation of SME 7 on it and a blank 8GB HDB (no partitions or anything on the drive).

Now this is where I'm getting stuck.  I assumed that if the drive was replaced, the RAID array would be automatically recreated and the second hard disk would be an identical copy of the first hard disk.

When I go into the console and select Manage Disk Redundancy, I get the following message:

Code:

Current RAID status:

Personalities : [raid1]
md2 : active raid1 hda2[0]
      8281408 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>

All RAID devices are in a clean state

and in /var/log/raidmonitor/current I am getting a load of messages which say:

Code:

mdadm: only specify super-minor once, super-minor=2 ignored.

and

Code:

mdadm: only specify super-minor once, super-minor=1 ignored.

So, I'm stumped.  Do I need to run any commands to get the discs to mirror properly, or is it supposed to be automatic?

Regards,

Rob

CharlieBrady

Re: Trying to get my head around RAID
« Reply #1 on: March 02, 2006, 02:58:30 PM »
Quote from: "EnglishRob"

Now this is where I'm getting stuck.  I assumed that if the drive was replaced, the RAID array would be automatically recreated and the second hard disk would be an identical copy of the first hard disk.


You assumed wrongly.

Quote

When I go into the console and select Manage Disk Redundancy, I get the following message:

Code:

Current RAID status:

Personalities : [raid1]
md2 : active raid1 hda2[0]
      8281408 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>

All RAID devices are in a clean state



Which is what you'd expect.  To add the new disk, you'll need to do:

add_mirror hda hdb
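
For anyone reading the status output above: `[2/1]` means the array wants two members and has one active, and `[U_]` shows the first slot up and the second one missing. Here is a minimal sketch, in Python and not part of SME Server, that picks the degraded arrays out of text in that format. The function name `degraded_arrays` is made up for illustration, and the sample is the status output pasted earlier in this thread.

```python
import re

# Sample text copied from the status output quoted in this thread.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 hda2[0]
      8281408 blocks [2/1] [U_]

md1 : active raid1 hda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (wanted, active)} for arrays with fewer active members than wanted."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md2 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following line carries the "[wanted/active]" member counts.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            wanted, active = int(counts.group(1)), int(counts.group(2))
            if active < wanted:
                result[current] = (wanted, active)
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live box you would pass in the contents of /proc/mdstat instead of the pasted sample.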

EnglishRob

Trying to get my head around RAID
« Reply #2 on: March 02, 2006, 03:52:15 PM »
Wow, it's that simple?

Thanks I'll give that a try.

Rob

EnglishRob

Trying to get my head around RAID
« Reply #3 on: March 03, 2006, 12:20:43 PM »
I think it's worked.

I had to use the -f option which I'm assuming forces the process.  Otherwise it bombed out with an error.

Anyway, on checking /proc/mdstat it says something about recovery and it keeps going up when I check it.

Hopefully that's done the job. :)

Rob
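
The "recovery" figure mentioned above is the rebuild percentage that the kernel prints in /proc/mdstat while a mirror is resyncing. Below is a rough sketch of pulling that figure out with Python. It assumes the usual progress-line layout. The array name and size in the sample come from this thread, but the progress numbers are invented for the example, and `rebuild_progress` is a hypothetical helper name.

```python
import re

# Illustrative sample only: the array name and size come from this thread,
# but the progress figures are invented for the example.
SAMPLE = """\
md2 : active raid1 hdb2[2] hda2[0]
      8281408 blocks [2/1] [U_]
      [>....................]  recovery =  3.4% (283648/8281408) finish=4.2min speed=31516K/sec
"""

# Matches e.g. "recovery =  3.4% (283648/8281408)"
PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)")


def rebuild_progress(mdstat_text):
    """Return a list of (array, action, percent, done_blocks, total_blocks)."""
    found = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        match = PROGRESS.search(line)
        if match and current:
            found.append((current, match.group(1), float(match.group(2)),
                          int(match.group(3)), int(match.group(4))))
    return found


if __name__ == "__main__":
    for array, action, percent, done, total in rebuild_progress(SAMPLE):
        print(f"{array}: {action} {percent}% ({done}/{total} blocks)")
```

An empty list means no rebuild is in progress, which covers both "finished" and "never started", so it is worth checking the `[UU]` flags as well.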

CharlieBrady

Trying to get my head around RAID
« Reply #4 on: March 03, 2006, 04:35:49 PM »
Quote from: "EnglishRob"

I had to use the -f option which I'm assuming forces the process.  Otherwise it bombed out with an error.


If the disk was indeed blank, you shouldn't have needed the -f option. What was the "error"?

gordonr

Re: Trying to get my head around RAID
« Reply #5 on: March 06, 2006, 02:11:24 AM »
Quote from: "EnglishRob"

So, I'm stumped.  Do I need to run any commands to get the discs to mirror properly, or is it supposed to be automatic?

Please raise this in the bug tracker.

As Charlie said, you can run add_mirror, but the console should be able to detect that the failed drive has been replaced with a new one.  It doesn't currently, and I class that as a bug.

I wrote that code, and catered for adding a second disk to an existing one-way mirror, but it looks like it's not correctly dealing with replacing a failed mirror component. It should, but we need to be very careful not to clobber a good disk that just happens to be there.

gordonr

Re: Trying to get my head around RAID
« Reply #6 on: March 06, 2006, 03:30:02 AM »
Quote from: "gordonr"

As Charlie said, you can run add_mirror, but the console should be able to detect that the failed drive has been replaced with a new one.  It doesn't currently, and I class that as a bug.

For the record, it does work for me. I just went through the exercise of shutting down a box with a mirrored pair and replacing one with a new disk.

I need details from your system to chase what's happening. Thanks.

EnglishRob

Trying to get my head around RAID
« Reply #7 on: March 06, 2006, 10:40:43 AM »
Well, the system was a virtual machine running under VMware.

I installed SME 7.0pre3 and didn't install any updates (due to lack of internet connection to the PC).

I configured two virtual hard drives of 8GB in size.  They were both set up on the primary IDE channel as master and slave.  The virtual machine also had 256MB memory allocated to it.

I haven't got the PC to hand at the moment. When I get a spare few minutes I'll try to re-create the error and let you know what the error messages are.

Rob

NickCritten

Trying to get my head around RAID
« Reply #8 on: March 06, 2006, 04:43:27 PM »
Hi All,

If it helps at all, I have recently done the same thing (not in a VM though, on a real box).

I had a single drive on PriMaster, then added a second on SecMaster.
I was kind of expecting to be asked during bootup whether I wanted to add it in, but when nothing happened I searched and found this post.

I ran add_mirror hda hdc and everything zoomed past but looked OK. (I also had to use -f to force it, because the second drive had a single unformatted NTFS partition from when I wiped it on my Windows server.)

I've just checked it again now (more than 24 hours later) and I'm getting this:

Code:

Current RAID status:

Personalities : [raid1]
md2 : active raid1 hda2[0]
      117113728 blocks [2/1] [U_]

md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

Only some of the RAID devices are unclean. Manual intervention may be required.


It doesn't look quite right to me... Doesn't SME install with 3 partitions?
Plus, the larger of the two partitions doesn't seem to have a mirror...


Is there any documentation anywhere for all this? EnglishRob and I cannot be the only people who didn't know about the add_mirror command, or the procedure for adding a mirror / replacing a duff drive.
I'd be happy to write a newbie-guide if I had some info to work with.
...
Nick

"No good deed goes unpunished." :-x...

CharlieBrady

Trying to get my head around RAID
« Reply #9 on: March 06, 2006, 05:02:10 PM »
Quote from: "NickCritten"

I ran add_mirror hda hdc and everything zoomed past but looked OK. (I also had to use -f to force it, because the second drive had a single unformatted NTFS partition from when I wiped it on my Windows server.)

I've just checked it again now (more than 24 hours later) and I'm getting this:

Code:

Current RAID status:

Personalities : [raid1]
md2 : active raid1 hda2[0]
      117113728 blocks [2/1] [U_]

md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

Only some of the RAID devices are unclean. Manual intervention may be required.


It doesn't look quite right to me...


My guess is that the second disk has already been booted out of md2. You'll need to dig to find out why.

There's a second issue here. The raid monitor is shut down when the system starts up with only a single disk. It should be started up again when add_mirror is run. If it had been, you might have received notification about the mirror being broken via email. Please report that issue to the bug tracker.

NickCritten

Trying to get my head around RAID
« Reply #10 on: March 06, 2006, 05:19:48 PM »
Quote from: "CharlieBrady"

My guess is that the second disk has already been booted out of md2. You'll need to dig to find out why.

There's a second issue here. The raid monitor is shut down when the system starts up with only a single disk. It should be started up again when add_mirror is run. If it had been, you might have received notification about the mirror being broken via email. Please report that issue to the bug tracker.



The log file shows the following:
Code:
@40000000440b459139744274 Event: SparesMissing, Device: /dev/md1, Member:
@40000000440b45921919b284 Event: DegradedArray, Device: /dev/md2, Member:
@40000000440b4593038bd834 Event: SparesMissing, Device: /dev/md2, Member:


I also received the same messages in the admin email account.
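
A side note on reading those log lines: the `@4000...` prefixes look like TAI64N labels, the timestamp format written by multilog from daemontools. That the raidmonitor log is written by multilog is an assumption here. A TAI64N label is 16 hex digits of seconds (offset by 2^62) followed by 8 hex digits of nanoseconds. The sketch below decodes one in Python; `decode_tai64n` is a made-up helper name, and the conversion ignores the leap-second difference between TAI and UTC, so the wall-clock time may be off by some seconds.

```python
from datetime import datetime, timedelta, timezone

# TAI64 labels store seconds since the 1970 epoch plus 2^62.
TAI64_EPOCH_OFFSET = 2 ** 62


def decode_tai64n(label):
    """Convert an '@' + 24 hex digit TAI64N label to (seconds, nanoseconds, datetime)."""
    hex_digits = label.lstrip("@")
    if len(hex_digits) != 24:
        raise ValueError("expected 24 hex digits after '@'")
    seconds = int(hex_digits[:16], 16) - TAI64_EPOCH_OFFSET
    nanoseconds = int(hex_digits[16:], 16)
    # Approximate: the leap-second offset between TAI and UTC is not removed.
    approx = datetime.fromtimestamp(seconds, tz=timezone.utc)
    approx += timedelta(microseconds=nanoseconds // 1000)
    return seconds, nanoseconds, approx


if __name__ == "__main__":
    # First label from the log excerpt above.
    secs, nanos, when = decode_tai64n("@40000000440b459139744274")
    print(secs, nanos, when.isoformat())
```

If daemontools is installed, its `tai64nlocal` filter does the same conversion from the command line.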


Quote from: "CharlieBrady"
You'll need to dig to find out why


Please define 'dig' :-)
Seriously though, please give me a kick in the right direction. I don't know how it's supposed to work, so it's virtually impossible to know what is a bug and what is my lack of understanding!
Nick

CharlieBrady

Trying to get my head around RAID
« Reply #11 on: March 06, 2006, 05:40:55 PM »
Quote from: "NickCritten"

Quote from: "CharlieBrady"
You'll need to dig to find out why


Please define 'dig' :-)


Look through log files. Run commands to investigate. Google.
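
As one possible starting point for the log-file part, here is a small Python sketch that groups the monitor events from the log excerpt posted earlier by array, so it is easier to see which device each event belongs to. This is only an illustration: `events_by_device` is a hypothetical helper, and the line format is taken purely from the three lines quoted in this thread. For context, the mdadm man page describes DegradedArray as an array that appears degraded when first noticed, and SparesMissing as an array with fewer spares than the config file says it should have.

```python
import re
from collections import defaultdict

# Log lines copied from the excerpt posted earlier in this thread.
LOG = """\
@40000000440b459139744274 Event: SparesMissing, Device: /dev/md1, Member:
@40000000440b45921919b284 Event: DegradedArray, Device: /dev/md2, Member:
@40000000440b4593038bd834 Event: SparesMissing, Device: /dev/md2, Member:
"""

EVENT = re.compile(r"Event:\s*(\w+),\s*Device:\s*(\S+),\s*Member:\s*(\S*)")


def events_by_device(log_text):
    """Return {device: [(event, member_or_None), ...]} in log order."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match:
            event, device, member = match.groups()
            # An empty Member field is reported as None.
            grouped[device].append((event, member or None))
    return dict(grouped)


if __name__ == "__main__":
    for device, events in events_by_device(LOG).items():
        print(device, [name for name, _ in events])
```

In the sample, only /dev/md2 carries a DegradedArray event, which matches the `[2/1] [U_]` shown for md2 in the console output.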

gordonr

Trying to get my head around RAID
« Reply #12 on: March 06, 2006, 10:26:44 PM »
Quote from: "NickCritten"

It doesn't look quite right to me... Doesn't SME install with 3 partitions?
Plus, the larger of the two partitions doesn't seem to have a mirror...

Pretty please - could we take this to the bug tracker?

We obviously have an issue and I want to track it, and gather your system configuration details so we can fix it. I have been through the procedures above many times and they do work for me. We need to work out why they don't work for you, and the bug tracker is the place.
Quote from: "NickCritten"

Is there any documentation anywhere for all this? EnglishRob and I cannot be the only people who didn't know about the add_mirror command, or the procedure for adding a mirror / replacing a duff drive.

There is a category in the bug tracker for SME Server Documentation, and the console menu item is already referenced there, with procedures for adding a mirror drive.

The add_mirror command was the first step and shouldn't normally be required as the console menu item should do the right thing. If it's not, I want to know why and track it in the bug tracker so I can fix it.
Quote from: "NickCritten"

I'd be happy to write a newbie-guide if I had some info to work with.

I'd prefer that this information went straight into the manual. Thanks.

gordonr

Trying to get my head around RAID
« Reply #13 on: March 06, 2006, 10:29:12 PM »
Quote from: "NickCritten"

It doesn't look quite right to me... Doesn't SME install with 3 partitions?
Plus, the larger of the two partitions doesn't seem to have a mirror...

Pretty please - could we take this to the bug tracker?

We obviously have an issue and I want to track it, and gather your system configuration details so we can fix it. I have been through the procedures above many times and they do work for me. We need to work out why they don't work for you, and the bug tracker is the place.
Quote from: "NickCritten"

Is there any documentation anywhere for all this? EnglishRob and I cannot be the only people who didn't know about the add_mirror command, or the procedure for adding a mirror / replacing a duff drive.

There is a category in the bug tracker for SME Server Documentation, and the console menu item is already referenced there, with procedures for adding a mirror drive.

The add_mirror command was the first step and shouldn't normally be required as the console menu item should do the right thing. If it's not, I want to know why and track it in the bug tracker so I can fix it.
Quote from: "NickCritten"

I'd be happy to write a newbie-guide if I had some info to work with.

Excellent, but I think this information should go straight into the manual. I'm sure that docteam would love some extra help with the manual. Thanks.

NickCritten

Trying to get my head around RAID
« Reply #14 on: March 06, 2006, 11:12:41 PM »
Quote from: "gordonr"
Pretty please - could we take this to the bug tracker?


Done :  http://bugs.contribs.org/show_bug.cgi?id=959

The thing is though, I don't know how it's supposed to work!  I've searched and searched through the documentation and the only mention of RAID really is the proclamation that SME7 supports RAID 1, 5 & 6 at boot time. http://no.longer.valid/phpwiki/index.php/SME%207%20Features
If there is procedural info somewhere, I can't find it, and not for lack of trying! Unfortunately RTFM doesn't work in this case... There isn't one yet!!  :-)

Quote from: "GordonR"

There is a category in the bug tracker for SME Server Documentation, and the console menu item is already referenced there, with procedures for adding a mirror drive.


In that case I'll have a look. I never even considered looking in the bug tracker for documentation.
Nick

CharlieBrady

Trying to get my head around RAID
« Reply #15 on: March 06, 2006, 11:16:32 PM »
Quote from: "NickCritten"

 RTFM doesn't work in this case... There isn't one yet!!  :-)


Contributions will be welcomed with open arms.

NickCritten

Trying to get my head around RAID
« Reply #16 on: March 06, 2006, 11:53:33 PM »
Quote from: "CharlieBrady"
Quote from: "NickCritten"

 RTFM doesn't work in this case... There isn't one yet!!  :-)


Contributions will be welcomed with open arms.


Already in the works... once I figure it out for myself.
Sometimes you've got to remember that we aren't all at the Guru level!
Us little Padawans still need to learn from the Masters :-)
Nick

gordonr

Trying to get my head around RAID
« Reply #17 on: March 07, 2006, 12:54:45 AM »
Quote from: "NickCritten"

If there is procedural info somewhere, I can't find it, and not for lack of trying! Unfortunately RTFM doesn't work in this case... There isn't one yet!!  :-)

I'm sure that docteam_at_lists.contribs.org would love some extra help to get it into shape.

See here for the actual work and testing:

http://bugs.contribs.org/show_bug.cgi?id=516

Follow-up to the bug tracker please - either as bugs in the code or the doco. Thanks.

Chrisco781

Trying to get my head around RAID
« Reply #18 on: March 17, 2006, 07:46:41 AM »
Nick, I recently was able to successfully rebuild a blank hdb drive with RAID1 on SME 7.0 pre4 in VMware with 8GB drives. I selected Manage Disk Redundancy in the server console, then logged in as root and ran cat /proc/mdstat, which showed progress in rebuilding hdb. Then I decided to test on a real machine and ran into the same problem you did. The only thing I can see that I did differently is that my two hard drives are not exactly the same size; there is about a 400MB difference. Are the two hard drives that you are using exactly the same?

NickCritten

Trying to get my head around RAID
« Reply #19 on: March 17, 2006, 11:28:05 AM »
Quote from: "Chrisco781"
Nick, I recently was able to successfully rebuild a blank hdb drive with RAID1 on SME 7.0 pre4 in VMware with 8GB drives. I selected Manage Disk Redundancy in the server console, then logged in as root and ran cat /proc/mdstat, which showed progress in rebuilding hdb. Then I decided to test on a real machine and ran into the same problem you did. The only thing I can see that I did differently is that my two hard drives are not exactly the same size; there is about a 400MB difference. Are the two hard drives that you are using exactly the same?


Yes, they're identical. I don't think the menu option to rebuild RAID is supposed to work with non-identical disks yet anyway, though...
Nick