Koozali.org: home of the SME Server

SME restore questions 6.01-01

kiig
« on: August 24, 2005, 08:18:18 PM »
Hi all.

My SME is crawling above the 2 GB 'limit', so I'm experimenting with other ways of doing a backup, and I have encountered a couple of weird things I'd like your opinions about.

Currently I'm using backup to desktop with the 'backup.exclude' hack, which lets me limit the files I back up along with the db dump files and all that. It lets me restore a 'complete' working server with a few files missing, but right now I'm excluding about half the data, so it won't be an easy task to fill in the blanks afterwards.

Then I found a tape streamer, did a backup with no problems, and restored it onto another machine. All the data was there, but the 'panel' in server-manager showed no ibays, users or anything. I did a post-upgrade/reboot event: same thing. Tried it again: it won't work.

Then I worked with backup2ws. Backup was easy, restore was almost easy. The restore.log says "Errors: 2642" (or some similarly big number) on the last line, though I can't see anything but "ok" lines in the log (which is fairly long, so I could have missed something .-)  )
If I press the "perform" button under Advanced restore options, it does a post-upgrade/reboot event but fails ("call administrator" or something; the code indicates it either works or fails, so the actual message is not that important here). /var/log/messages does not indicate anything special either.

If I do not use the "perform" button in the panel but do it manually, then I get no errors...


After a reboot, everything apparently works...


But when I 'reconfigure' the backup server (it now has the original IP address and name of the production server), change the name and IP address and allow it to reboot, the name does not change. Pressing Ctrl-Alt-F2 on the console after the reboot still gives me the old name. I thought this could be the key to the following problem, but if I reboot it once more, the server name changes correctly. Has anyone seen that before?

The only problem I found was that webmail would not work: I couldn't log on as anyone but admin, even though all the Maildirs had the correct user.user ownership.
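
An ownership check like this can be scripted rather than done by eye. Below is a minimal Python sketch, not part of SME or backup2ws: the function name `foreign_owned` is made up for illustration, and it only reports entries whose owner (and optionally group) differs from what you expect.

```python
import os


def foreign_owned(root, expected_uid, expected_gid=None):
    """Return paths under root (root included) with unexpected ownership.

    expected_gid is optional; when it is None only the owner is checked.
    """

    def wrong(path):
        # lstat looks at the entry itself and does not follow symlinks
        st = os.lstat(path)
        if st.st_uid != expected_uid:
            return True
        return expected_gid is not None and st.st_gid != expected_gid

    mismatches = [root] if wrong(root) else []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if wrong(path):
                mismatches.append(path)
    return mismatches
```

On the server it might be called with a user's Maildir and the uid/gid looked up via `pwd.getpwnam(name)`. The location of the Maildir (for example under /home/e-smith/files/users/NAME/) is an assumption here and should be checked on the actual box. An empty result means every entry is owned as expected.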

I tried reinstalling a 6.01-01 with the pluspackage (php 4.3 + mysql 4.022): same thing. Did it again with a basic 6.01-01: same thing...

Then, for some reason I can't remember, I did a restore once more, that is, onto the already restored disk. This time the restore.log file ended with "all OK" (??), and after a manual post-upgrade/reboot the webmail worked again...


Is this some access rights problem? Somehow the script can't create all the folders or files in the first run, but can the second time?

Has anyone seen this?

I guess I'll have to look very carefully at the test box before I try to reinstall the production machine. (I just want the software RAID, and I assume I need to reinstall to get it? The pluspackage will also be a nice bonus, and that probably also needs a clean 6.01-01 install; it won't work with 6.5RC1 at least.)


And along the lines of the first part of this message: how do you guys do backup, or more interestingly, how do you restore it .-) ?


Thanks for any input you might have.
Kim.

raem
« Reply #1 on: August 25, 2005, 09:03:07 AM »
kiig

> then I worked with backup2ws...

What backup job did you use/create? Is it the 911 Disaster Recovery backup?
Did you split the job into parts of 2 GB or less?

It works for me, although I did experience some mail dir permission problems on one occasion after a restore.

kiig
« Reply #2 on: August 25, 2005, 09:07:02 AM »
Thanks for responding. Yes, it was a 911 disaster recovery job, with the default 'splitsize' of 650 MB.

raem
« Reply #3 on: August 25, 2005, 09:31:25 AM »
kiig

>..it was a 911 disaster recovery job, - with the default 'splitsize' 650 mb

OK

The other thought I had was that the restore job did not actually finish, and that's why you experienced problems, and possibly why it worked OK on the second restore.
Depending on the amount of data, it can take many, many hours to do a big restore.
Some web browsers do not work well with the large multi-GB files associated with backup & restore, e.g. some IE versions are no good whereas Netscape is OK (and other browsers too).

kiig
« Reply #4 on: August 25, 2005, 09:57:02 AM »
Yes, it is a possibility, but I've done this 5 times now, I think (just to be VERY sure about the behaviour), and I've waited a long time on some of the jobs just to see what happened. Either way, I wait until the log finishes and the Samba mount is unmounted, and I have no disk activity at that time...

What about the weird server name change? Have you noticed that before?

P.S. It takes about 30 minutes to restore 3.5 GB.

raem
« Reply #5 on: August 25, 2005, 10:36:58 AM »
kiig

Do you check that the restore complete email message is sent? That's the way to ensure the restore has really completed.

What version of backup2ws? Run
rpm -q smeserver-backup2ws

Your weird config problems seem to me to be symptoms of a restore that did not fully complete and/or a post-upgrade event that did not apply the restored changes, i.e. the post-upgrade did not run or complete, or the data being applied was incomplete due to a restore that did not finish.

What browser & version are you using?

Restores should only ever be made to freshly installed versions of the SME Server operating system, without any user configuration & data etc.
You will have all sorts of unusual combinations of data & users etc. if you restore to an existing system. It won't work correctly doing that.

kiig
« Reply #6 on: August 25, 2005, 10:55:53 AM »
I'm running 0.0.1-22, which was the latest I could find...

Regarding the confirmation e-mail, I didn't really check, as I assumed the test server would have a problem sending emails. But it goes to a local account... hmm, I'll look into that.

I'm running XP SP1 with the standard browser. I could try using Netscape when pressing the final "perform" button in the advanced disaster recovery page. Thanks for the hint.

Still, I wonder why I get the fairly large number of errors the first time I restore with backup2ws (in the last line of the log), though I can't find any lines stating anything but 'ok'...


Regarding the pluspackage (updates php and mysql, among others): should I do a fresh install of SME, apply the pluspackage and then restore the backup (twice :-)  )? Or should the pluspackage wait until after the restore?

I assume the backup _could_ contain some templates or other things that might overwrite whatever the pluspackage updates...

I'll try both scenarios tonight.

Thanks so far for your input, Ray.

raem
« Reply #7 on: August 25, 2005, 11:21:14 AM »
kiig

>I'm running 0.0.1-22 which was the latest I could find...

Looks OK. I have
smeserver-backup2ws-0.0.1-22dmay.noarch.rpm


> regarding the confirmation e-mail I didn't check

Log in as root and check if there is a message in the admin Maildir,
or you could directly check
/var/log/backup2ws/restore.log

> I'm running XP sp1 with the standard browser...
> I could try using Netscape when pressing the
> final "perform" button in the advanced disaster
> recovery page... thanx for the hint..

I would use a different browser than IE for the whole backup and restore process, not just the final post-upgrade function.


>...fairly large amount of errors the first time I
> restore with backup2ws

You need to show us the log file contents to analyse that
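
One way to go through a long restore.log is to filter out every line that looks fine and read what is left. Below is a minimal Python sketch under one big assumption that nothing in this thread confirms: that each successful entry has the standalone word "ok" on its line. The function name `suspicious_lines` and the sample lines are made up for illustration and are not real backup2ws output.

```python
import re

# Assumption: a "good" log line contains the standalone word "ok"
# (case-insensitive). Adjust the pattern to the real log format.
OK_PATTERN = re.compile(r"\bok\b", re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) for every non-empty line without 'ok'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.strip() and not OK_PATTERN.search(text):
            hits.append((number, text))
    return hits


# Made-up sample lines, only to show the filtering.
sample = [
    "some/file1 ok\n",
    "\n",
    "some/file2 FAILED\n",
    "Errors: 1\n",
]
for number, text in suspicious_lines(sample):
    print(f"{number}: {text}")
```

On the server, the same function can be fed the open log file, e.g. `suspicious_lines(open("/var/log/backup2ws/restore.log", errors="replace"))`. A line such as "not ok" would slip through this simple pattern, so the result is a starting point for reading the log, not a verdict.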


> Regarding the Pluspackage.....

The recommended method is to:
install a fresh OS
install approved sme 6 updates
restore from backup
install contribs

As smeplus does other & additional things, I would leave that until last.
Note that the smeplus script is not an approved update and does extra things to your system that may cause complications when upgrading to newer versions, e.g. sme7, newer versions of packages etc.

kangkc
« Reply #8 on: September 12, 2005, 03:34:28 AM »
To tag on to this thread: I'm looking into disaster recovery, as the SME Server is more or less being declared mission critical, even though we have software RAID implemented.

The current server has SMEPlus installed and a couple of LAMP applications, which are in ibays.

I need to establish a recovery procedure, and the forum entries seem to confuse me further. I gather that the steps for my case should be:

1. Install a new SME 6 based contribs
2. Restore from backup
3. Install SMEPlus

My question is about step 3: wouldn't that actually overwrite the config files for the SMEPlus contribs restored in step 2?

raem
« Reply #9 on: September 12, 2005, 04:26:55 AM »
kangkc

Try this order:

1. Install new SME 6 OS
2. Install recommended updates (not smeplus)
3. Install add on contribs
4. Install SMEPlus
5. Restore from backup


If you have a software RAID1 setup, you may want to consider making one of the drives removable, then swap out that drive, rebuild the array, and you have the whole server configuration & data as at the date of swapping.

In the event of a major catastrophe you rebuild the array using the last known good RAID drive and then restore any more recent data from normal daily backups.

That way, at least the whole server configuration is "backed up", and only minimal data restore would be needed to get things back to where they were.

It is an interruption to services when you shutdown & swap out the drive, of maybe 10-15 minutes, depending how quickly you can do it.
The array will rebuild itself while the server is being used, thus keeping downtime to a minimum.
Some scripts can help speed up the RAID rebuild/raidhotadd sequence of events as it is a standard procedure to follow.
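
As an illustration of the kind of helper script meant here, the sketch below only reports which md arrays are missing a member, based on the usual /proc/mdstat layout. It does not run raidhotadd or change anything. The function name `degraded_arrays` and the sample text are assumptions for illustration, not output from a real server.

```python
import re

# Matches the status part of an mdstat line, e.g. "[2/1] [U_]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the start of an array entry, e.g. "md0 : active raid1 ...".
ARRAY = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return names of md arrays that report a missing member ('_')."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current and "_" in status.group(3):
            degraded.append(current)
    return degraded


# Illustrative text in the usual /proc/mdstat layout (not from a real server).
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 hda1[0] hdc1[1]
      104320 blocks [2/2] [UU]
md0 : active raid1 hda2[0]
      39961984 blocks [2/1] [U_]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # prints ['md0']
```

On the server it would be fed the real file, e.g. `degraded_arrays(open("/proc/mdstat").read())`. An array that is still rebuilding also shows a "_" in its status, so a hit means "degraded or rebuilding", which is what you would expect to see right after a drive swap.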

It's a tradeoff between how quickly you want to be up and running again, versus the "inconvenience" of having to swap the drives and rebuild the array regularly (say monthly or as often as you deem necessary).

I have found it to be much less of a hassle to swap a drive every month, and then know that I have a relatively easy and quick process to get the server up and running again, compared to the amount of work & time required to rebuild a server completely & install numerous contribs and restore lots of data as well.

kangkc
« Reply #10 on: September 12, 2005, 05:05:53 AM »
Thanks.
Having someone that is close to the time zone is great as feedback is immediate.  :-D

The proposed raid swap is indeed the easiest way and this is exactly what I have done for M$ based server.

I have not experienced an HDD failure (so far); I'm just curious and want to confirm the steps:
When an HDD fails in an SME soft RAID, in most situations we just shut down the server, replace the faulty HDD, and restart the server so that SME will rebuild the RAID. Are these the correct steps?

raem
« Reply #11 on: September 12, 2005, 08:32:10 AM »
kangkc

> ..shutdown the server, replace the faulty HDD, and
> restart the server so that SME will re-built the
> raid.

The rebuild is not automatic; you need to issue the correct commands.
See the dmay contrib area for a recovery howto (raidmonitor).