Koozali.org: home of the SME Server

Problem replacing new disk in RAID

judgej
Re: Problem replacing new disk in RAID
« Reply #15 on: April 07, 2011, 05:25:16 PM »
...I have completely hijacked your thread.  My apologies.

No, no - this is all good stuff. I just need to get a production server up and running with larger hard drives as quickly and reliably as I can, and this looks like a way to do it. I'm already half-way through the install process of a new server now.

This server acts as our mail server, firewall, file share host, and backup server for our web-hosted sites (I run some custom scripts and rsync via cron for that). There is nothing particularly custom about it, although there are a few contribs installed to monitor the system. It is a shame that none of them is a SMART warning system, something I am surprised SME Server does not provide out of the box.
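In the meantime, smartd from the smartmontools package can fill that gap: point it at each RAID member and have it mail a warning when a drive starts to fail. Something along these lines in /etc/smartd.conf ought to do it (untested on my box; the drive names and mail address are placeholders):

  # /etc/smartd.conf - watch both RAID members
  # -a : monitor all SMART attributes, the health status and the error logs
  # -m : mail this address when a problem is detected
  /dev/sda -a -m admin@example.com
  /dev/sdb -a -m admin@example.com

Then make sure the smartd service is running and set to come up on boot (service smartd start; chkconfig smartd on); by default it polls the drives every 30 minutes.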

Anyway - we are well off the original topic, but still well on the way to solving my problem :-)
-- Jason

judgej
Re: Problem replacing new disk in RAID
« Reply #16 on: April 08, 2011, 11:06:28 AM »
Okay, I am finding out the hard way that affa is simply not going to work.

On our server, we keep backups from our production web servers. These backups are taken each day using rsync, and a snapshot is then taken using rsync and hard links every couple of days. This works in a similar way to affa, in that files that do not change between backup jobs are not duplicated, but stored once as a single inode.
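For anyone curious, the snapshot part of those scripts is just the usual rsync hard-link trick: mirror into a dated directory, telling rsync to hard-link anything unchanged against the previous snapshot. Stripped right down (the host and paths here are made up), it looks like this:

  #!/bin/sh
  # Pull today's copy, hard-linking unchanged files against the
  # previous snapshot so they take no extra space on disk.
  SRC=webhost:/var/www/        # remote tree to back up (placeholder)
  DEST=/backups/site           # local snapshot area (placeholder)
  TODAY=$(date +%Y-%m-%d)

  rsync -a --delete --link-dest="$DEST/latest" "$SRC" "$DEST/$TODAY"
  rm -f "$DEST/latest"
  ln -s "$DEST/$TODAY" "$DEST/latest"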

Now, the problem here is that affa is not transferring these hard links across. One "index.php" file on our office server, hard-linked into twelve monthly backup snapshots, is being copied to the backup server as twelve individual files. So 40G of backups, with monthly snapshots kept over a year on our main server, becomes a terabyte of files on the affa backup server, which obviously will not do.
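The root of it is that rsync only preserves hard links when given the -H option, and it looks like affa's rsync call doesn't use it (or doesn't let me add it). Running rsync by hand, the difference is just the one flag (paths below are only for illustration):

  # -a copies permissions, times and ownership; -H additionally makes
  # rsync notice files sharing an inode on the sender and recreate them
  # as hard links on the receiver (at some extra memory and CPU cost).
  rsync -aH /backups/site/ backupserver:/backups/site/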

I guess I am going to have to work out how to tell affa to exclude certain ibays, or at least certain 'snapshot' folders within certain ibays, when backing up.

A word of warning to anyone else who may end up going through this pain: NEVER EVER take out a RAID disk to replace it with another unless you are ABSOLUTELY CERTAIN that the remaining disk has NO read errors ANYWHERE on it. It basically means you cannot consider replacing disks or rebuilding a failed RAID unless you are prepared to take the server completely offline while you do it. The risks and pain involved in trying to get the disk array rebuilt are just too high. I've lost several days of work now, and am not happy.
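If you are going to replace a disk anyway, at least make the kernel prove both members are fully readable before you pull one. Assuming the array is /dev/md2 and the disk you plan to keep is /dev/sda (adjust to suit your system), something like:

  # Force md to read every sector of the array; problems show up in
  # dmesg and in the array's error counters.
  echo check > /sys/block/md2/md/sync_action
  cat /proc/mdstat                  # watch the check progress

  # Check the drive's own view of its health
  smartctl -a /dev/sda | grep -i -e pending -e reallocated

Any non-zero pending or reallocated sector count on the disk you intend to keep is a very bad sign.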
-- Jason

judgej
Re: Problem replacing new disk in RAID
« Reply #17 on: April 08, 2011, 11:45:23 AM »
Adding "exclude" folders in the affa job is easy enough. The next time the backup job is run, it will remove the excluded folders from the backup entirely (don't expect them to simply not back up any more - they will go).

If you run the backup job from the command line, it will do the deletions before it sets the backup going as a background job, so it can take a little time to return from "affa --run prodserver".
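For reference, affa drives rsync underneath, and the effect described above is the same one you would get from rsync's own --exclude combined with --delete-excluded. Done by hand it would look something like this (the ibay name and paths are invented for the example):

  # --delete-excluded is what removes previously-backed-up copies of
  # the excluded folders from the destination, as described above.
  rsync -a --delete --delete-excluded \
        --exclude='/ibays/webbackup/files/snapshots/' \
        /home/e-smith/files/ backupserver:/backup/prodserver/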
-- Jason

axessit
Re: Problem replacing new disk in RAID
« Reply #18 on: April 14, 2011, 11:41:26 PM »
I went through this process (adding a larger drive) and thought I was OK. But after the RAID synced up, I tested it by pulling one of the drives out and trying to boot, only to find all sorts of problems (the server wouldn't boot, I got GRUB errors, then the server would crash trying to mount the LVM, etc.).
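For anyone hitting the same GRUB errors: they usually mean the boot loader was only ever written to the first disk's MBR. With the legacy GRUB that SME Server uses, you can put it on the second member by hand - device names here are assumed, so check yours first:

  # Install GRUB into the MBR of the second RAID member so the box
  # can still boot when the first disk is pulled or dead.
  grub
  grub> device (hd0) /dev/sdb   # treat sdb as the first BIOS disk
  grub> root (hd0,0)            # the /boot partition on that disk
  grub> setup (hd0)             # write the MBR and stage files
  grub> quit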

Word of warning: if you're thinking of cloning a disk with g4u, Ghost or whatever, these won't work, as Linux software RAID identifies member disks by metadata on the drive, and the cloning tools can't adjust for that. That may have had something to do with my problems too. It also had my server out of action for a couple of hours while the drives were cloning, but I did that because I was paranoid about stuffing up my RAID, since I thought only one drive held the data.

In the end, I resorted to working through http://wiki.contribs.org/AddExtraHardDisk and http://wiki.contribs.org/Raid:Manual_Rebuild to add the larger disk: create the partitions on the new disk exactly as on the old disks (i.e. a 500GB partition on the 1TB drive), reinstall GRUB, add it into the RAID and let it rebuild, then remove the original 500GB drive and add a second new 1TB drive the same way, giving a proper 500GB RAID on two 1TB drives, then grow the LVM partition. Apart from the cloning, I did all this on the fly. There were a few reboots, but each was only a few minutes until the system came back up.
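Condensed into commands, that sequence is roughly the following - the device names, array numbers and LVM names are from memory and will differ per install (SME Server typically has /boot on /dev/md1 and the LVM on /dev/md2; check /proc/mdstat and lvdisplay first):

  # Copy the old disk's partition layout onto the new 1TB disk
  sfdisk -d /dev/sda | sfdisk /dev/sdb

  # Add the new disk's partitions to the arrays and let them resync
  mdadm --add /dev/md1 /dev/sdb1
  mdadm --add /dev/md2 /dev/sdb2
  watch cat /proc/mdstat          # wait for the rebuild to finish

  # Once both 1TB drives are in and the partitions have been enlarged
  # (e.g. with fdisk), grow the array and the LVM sitting on it
  mdadm --grow /dev/md2 --size=max
  pvresize /dev/md2
  lvextend -l +100%FREE /dev/main/root
  resize2fs /dev/main/root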

I found a good tutorial somewhere on the net about creating a RAID on the fly using mdadm - sorry, I can't put my finger on it now - which complemented the above how-tos quite well.