Koozali.org: home of the SME Server

Recent outage a timely reminder to have a working backup in place

Offline joshAU

Hi guys. Glad to hear you're back.
What a nightmare, hey.

Happens to us all at some point, bummer it was an array.
Everyone should take heed and make sure you have a working backup of your own systems.

It happened to me on a standalone disk. After a simple power failure the SME Server would not boot: corrupted hard disk, and no recent backup.

A week later, after manually recovering all files, I was back. If I had had a recent backup I could have been up within an hour or two, and then recovered the remaining files later.

Probably somewhat harder in your case, I imagine. :smile:
I guess someone there is really going to enjoy their weekend this weekend.


Offline Hummdis

Re: Recent outage a timely reminder to have a working backup in place
« Reply #1 on: January 16, 2009, 01:07:00 AM »
I could not agree with you more! :D

Just the other day I was reading some pretty scary stats about data loss that should scare any admin enough to implement some type of backup.

I use BRU Server for Linux to back up my systems in-house, including my SME Server system.  Since SME Server is based on CentOS (for which BRU Server has a native Agent), it works great!

I had a hard drive fail about a year ago and, with BRU Server, I had the system back online in about an hour.  Once I had made sure that everything was working as it should and made the changes needed for things that were not included in my backup (because I didn't select them; user error on that one), I was running 100% within two hours of the crash.

A week offline to restore a RAID seems excessive (my 2 cents).  My 4TB RAID was restored in just a few hours when it crashed in July.
« Last Edit: January 16, 2009, 05:27:45 AM by Hummdis »
"It is always darkest just before it goes pitch black."
-Despair, Inc.

Offline mrjhb3

Re: Recent outage a timely reminder to have a working backup in place
« Reply #2 on: January 16, 2009, 04:43:37 AM »
Quote
I could not agree with you more! :D

I use BRU Server for Linux to back up my systems in-house, including my SME Server system.  Since SME Server is based on CentOS, it works great!


Would you provide some details as to how you have integrated BRU with SME Server?  How have you structured the backup and restore to take into account the extra stuff SME Server does before and after a backup and restore?  I'm also curious as to what the price is, as the website doesn't list that.

Thanks,

John
......

Offline Hummdis

Re: Recent outage a timely reminder to have a working backup in place
« Reply #3 on: January 16, 2009, 05:25:26 AM »
Quote
Would you provide some details as to how you have integrated BRU with SME Server?  How have you structured the backup and restore to take into account the extra stuff SME Server does before and after a backup and restore?  I'm also curious as to what the price is, as the website doesn't list that.

Thanks,

John

There was no integration needed!  BRU Server uses a client-server architecture and they have what's called an Agent that is installed onto the system.  Once installed, the main BRU Server system connects to the Agent to back up the files that you selected for backup.

Concerning your question as to the extra stuff SME Server does before and after a backup/restore, BRU Server uses the BRU I/O engine and the SME Server has no idea that BRU Server has backed the system up.

To be honest though, what exactly does the SME Server do before and after a backup/restore process?

I have used the backup/restore function in SME only once and the one time I did I learned of the 4GB (if I remember correctly) archive import limitation.  This put a VERY bitter taste in my mouth as I was not able to restore the archive that I created using SME with the SME restore tool.  That's when I searched for a native Linux backup tool that would be reliable and fast.

As for the price, it depends on the number of client systems that you plan to back up.  TOLIS Group does not charge for the amount of data you back up, just the number of clients (BRU Server only).  I only have five clients so I purchased a 5-client license (they can create custom client numbers upon request).  You can find the standard pricing on the website in a single page (I did some checking on the site to find this: there's a link on the right side that says "BRU Server Pricing").  I bought mine about three years ago and, if my memory serves me right, it was about $850 or so for the five clients.

BRU Server Pricing Page

I had considered Arkeia Network Backup but they wanted almost 7x as much!  Additionally, Arkeia has only been around a short time in comparison to the BRU technology.  Arkeia was founded in 1996 (reference) whereas the BRU technology was started in 1985 (reference).  That alone made me feel more comfortable because the BRU technology is 11 years more mature. :)

I feel like a salesman! :)

I was just curious as to what backup method/product SME Server uses that would cause a one-week downtime.  Do you know?

According to the data loss stats that were in my first post, if SME Server falls into either of these two statistics:
  • 60% of companies that lose their data will shut down within 6 months of the disaster.
  • 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster. 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately. (National Archives & Records Administration in Washington)
It would mean that SME Server could be going out of business within the year.  :cry:

Now, I don't think that will actually happen because of the strong community following, but it makes one wonder....
"It is always darkest just before it goes pitch black."
-Despair, Inc.

Offline slords

Re: Recent outage a timely reminder to have a working backup in place
« Reply #4 on: January 16, 2009, 04:53:05 PM »
Quote
A week offline to restore a RAID seems excessive (my 2 cents).  My 4TB RAID was restored in just a few hours when it crashed in July.

Well, that is a pretty arrogant statement without knowing anything about what happened.  Would you please share with us what system you have that allows you to restore/copy data at 580+MB/s (the rate needed to restore 4TB in 2 hours, assuming you only have to copy/restore the data one time)?  Using a more realistic throughput of 30MB/s average, 4TB should take about 38 hours to copy/restore.
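For anyone who wants to check the arithmetic behind those two figures, here is a minimal sketch. It assumes binary units (1 TB = 1024 × 1024 MB), which is what the 580+MB/s figure implies, and the function names are made up for illustration.

```python
MB_PER_TB = 1024 * 1024  # assumption: binary units, as the 580+MB/s figure implies


def required_rate_mb_s(size_tb: float, hours: float) -> float:
    """Sustained throughput (MB/s) needed to move size_tb within the given hours."""
    return size_tb * MB_PER_TB / (hours * 3600)


def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb at a sustained rate of rate_mb_s."""
    return size_tb * MB_PER_TB / rate_mb_s / 3600


if __name__ == "__main__":
    print(f"4 TB in 2 hours needs {required_rate_mb_s(4, 2):.1f} MB/s")
    print(f"4 TB at 30 MB/s takes {transfer_hours(4, 30):.1f} hours")
```

With decimal units (1 TB = 1,000,000 MB) the results come out slightly lower, roughly 556 MB/s and 37 hours.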

Quote
According to the data loss stats that were in my first post, if SME Server falls into either of these two statistics:
  • 60% of companies that lose their data will shut down within 6 months of the disaster.
  • 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster. 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately. (National Archives & Records Administration in Washington)
It would mean that SME Server could be going out of business within the year.

These numbers represent companies that are making money with the data that they supposedly lost.  SME doesn't fall into this category.  SME also didn't lose ANY data and wasn't down 10 days.
« Last Edit: January 16, 2009, 04:58:25 PM by slords »
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs,
and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." -- Rich Cook

Offline CharlieBrady

Re: Recent outage a timely reminder to have a working backup in place
« Reply #5 on: January 16, 2009, 08:20:16 PM »
Quote
It would mean that SME Server could be going out of business within the year.

SME Server, Inc, is not a business.

Offline christian

Re: Recent outage a timely reminder to have a working backup in place
« Reply #6 on: January 17, 2009, 03:53:14 AM »
Quote
According to the data loss stats that were in my first post, if SME Server falls into either of these two statistics:
  • 60% of companies that lose their data will shut down within 6 months of the disaster.
  • 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster. 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately. (National Archives & Records Administration in Washington)
It would mean that SME Server could be going out of business within the year.  :cry:

Ah yes, those stats. I've used them myself in the past when reinforcing the need for businesses to plan for business continuity.

The problem with those stats is they read as cause and effect where losing data causes business failure. These stats are not in fact cause and effect. They are correlation. It could in fact be that companies doomed to failure also have bad business continuity practices.

In contribs' case it is far more likely that the failure was due to under-funding and best-effort time allocation. Of course I say that in ignorance of the details.

Christian

SME since 2003

Offline kruhm

Re: Recent outage a timely reminder to have a working backup in place
« Reply #7 on: January 19, 2009, 06:17:32 AM »
The whole situation does bring up the question of what's the best backup system for general SME system admins?

I suffered a corrupt filesystem that put me down for 48 hours as I tried to fix it. And another 72 to rsync the data back. I lost important key clients that I won't regain and it took me days to respond to phone calls and email.

In short, my backup system was/is flawed. Problem is, I don't know what to do about it.

Quote
Using a more realistic throughput of 30MB/s average 4TB should take about 38 hours to copy/restore.

And even less if you have it at a datacenter.

Offline smiit

Re: Recent outage a timely reminder to have a working backup in place
« Reply #8 on: January 19, 2009, 05:45:40 PM »
Quote
The whole situation does bring up the question of what's the best backup system for general SME system admins?

I suffered a corrupt filesystem that put me down for 48 hours as I tried to fix it. And another 72 to rsync the data back. I lost important key clients that I won't regain and it took me days to respond to phone calls and email.

In short, my backup system was/is flawed. Problem is, I don't know what to do about it.

And even less if you have it at a datacenter.

I previously used an rsync script that Silasp posted a few years back that worked well:

http://forums.contribs.org/index.php/topic,31246.msg131477.html#msg131477

I then switched to AFFA last year - it's like an automated turbocharged version of the rsync method linked above.

It's worked flawlessly so far - the main production server is a Dell SC440 with 2 identical 500GB drives in the default software raid setup.  It was 'risen' using the AFFA move hardware option.

It replaced a Dell SC420, which is now the offsite AFFA backup with two 1TB drives. It syncs overnight and can be moved onsite and risen in a matter of hours.

Relatively inexpensive and between the duplicate raid drives in each machine and having one physically offsite I feel fairly secure.
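
For readers who have not seen the rsync approach, here is a minimal sketch of a dated snapshot backup. It is not Silasp's script and not how AFFA is implemented; the host name, paths and function name are hypothetical. It only builds the command line, using the standard rsync options `-a` (archive mode) and `--link-dest` (hard-link unchanged files against an earlier copy), so the result can be inspected before anything is run.

```python
import datetime
import shlex


def build_rsync_command(source, dest_root, today, previous=None):
    """Build an rsync command that writes a dated snapshot under dest_root.

    If `previous` (an earlier snapshot directory) is given, files that have
    not changed are hard-linked against it via --link-dest instead of being
    copied again, so each snapshot only costs the space of what changed.
    """
    snapshot = f"{dest_root.rstrip('/')}/{today.isoformat()}"
    cmd = ["rsync", "-a"]
    if previous:
        cmd.append(f"--link-dest={previous}")
    cmd += [source, snapshot]
    return cmd


if __name__ == "__main__":
    cmd = build_rsync_command(
        "root@sme.example.com:/home/e-smith/",  # hypothetical source host and path
        "/backup/sme",                          # hypothetical backup directory
        datetime.date.today(),
        previous="/backup/sme/latest",          # hypothetical earlier snapshot
    )
    # Print the command for review; running it is left to the admin.
    print(shlex.join(cmd))
```

Whatever tool ends up running the copy, the restore is the part worth rehearsing before it is needed.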

Offline Hummdis

Re: Recent outage a timely reminder to have a working backup in place
« Reply #9 on: January 24, 2009, 03:53:46 PM »
Quote
Well, that is a pretty arrogant statement without knowing anything about what happened.  Would you please share with us what system you have that allows you to restore/copy data at 580+MB/s (the rate needed to restore 4TB in 2 hours, assuming you only have to copy/restore the data one time)?  Using a more realistic throughput of 30MB/s average, 4TB should take about 38 hours to copy/restore.

I understand what you're asking about the transfer rate, and the fact that I'm now being considered an 'idiot' by you (don't lie) only indicates your lack of knowledge of data transfer technology.

An LTO-3 drive runs at about 60MB/s on average, LTO-4 does better at about 90-100MB/s sustained when writing to the tape.  Those two transfer mechanisms are SCSI and those speeds are possible.  However, when reading from the tape, it's not uncommon for an LTO-3 user to report 80-90MB/s or 130-140MB/s with LTO-4.

As a quick tangent, understand that LTO-3 has a top native speed of 80MB/s and LTO-4 has a top native speed of 120MB/s. Remember, native.  That does not include any compression that could be going on there, in which case 2:1 compression indicates that the top speed of LTO-3 with 2:1 compression is 160MB/s and for LTO-4 it would be 240MB/s.
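
As a rough sketch of what those tape figures mean for a 4TB restore, the snippet below multiplies the native rate by an assumed compression ratio and converts the result to hours. It assumes decimal units (1 TB = 1,000,000 MB) and a single drive streaming at full speed with no tape changes; both are simplifications, and the names are illustrative only.

```python
# Top native speeds quoted in the post, in MB/s.
NATIVE_MB_S = {"LTO-3": 80, "LTO-4": 120}
MB_PER_TB = 1_000_000  # assumption: decimal units


def effective_rate_mb_s(generation: str, compression_ratio: float = 1.0) -> float:
    """Native speed scaled by the compression ratio the data actually achieves."""
    return NATIVE_MB_S[generation] * compression_ratio


def restore_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to stream size_tb from tape at a sustained rate_mb_s."""
    return size_tb * MB_PER_TB / rate_mb_s / 3600


if __name__ == "__main__":
    for gen in NATIVE_MB_S:
        for ratio in (1.0, 2.0):
            rate = effective_rate_mb_s(gen, ratio)
            print(f"{gen} at {ratio}:1 -> {rate:.0f} MB/s, "
                  f"4 TB in {restore_hours(4, rate):.1f} hours")
```

Even at the 2:1 figures, a single drive would need several hours for 4TB under these assumptions.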

Now, that's tape data transfer.  Let's talk disk...say, 4-Gig Fibre-Channel?

Here's the way the setup works...there's a 4TB RAID that's being mirrored to another 4TB RAID. What I actually back up is the mirror, not the live disk.  However, when the original RAID failed in July, as my initial post indicated, I only needed to restore the data over from the mirror, since the mirror was fine.

Therefore, you repair the initial RAID failure, in this case it was two hard drives in the RAID, and perform the sync back.  4TB of data on 4Gb Fibre Channel....the math works out to about 100 minutes (1 hour, 40 minutes).
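
Here is a hedged way to sanity-check that estimate. It assumes the link is 4 Gbit/s Fibre Channel (4GFC), whose nominal payload rate is about 400 MB/s in one direction, and decimal units (1 TB = 1,000,000 MB). RAID rebuild overhead and disk speed are ignored, so treat the first number as a best case for a single link. The function names are illustrative.

```python
MB_PER_TB = 1_000_000       # assumption: decimal units
FC_4G_NOMINAL_MB_S = 400    # assumption: nominal one-way payload rate of a 4GFC link


def sync_minutes(size_tb: float, rate_mb_s: float) -> float:
    """Minutes to copy size_tb at a sustained rate_mb_s."""
    return size_tb * MB_PER_TB / rate_mb_s / 60


def implied_rate_mb_s(size_tb: float, minutes: float) -> float:
    """Sustained rate (MB/s) implied by copying size_tb in the given minutes."""
    return size_tb * MB_PER_TB / (minutes * 60)


if __name__ == "__main__":
    print(f"4 TB at {FC_4G_NOMINAL_MB_S} MB/s: "
          f"{sync_minutes(4, FC_4G_NOMINAL_MB_S):.0f} minutes")
    print(f"4 TB in 100 minutes implies {implied_rate_mb_s(4, 100):.0f} MB/s")
```

Under these assumptions a single link needs about 167 minutes, and a 100-minute sync corresponds to a sustained rate of roughly 667 MB/s, so the quoted figure would need more than one such link or a faster one.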

I won't get into 8Gb Fibre Channel....

Quote
These numbers represent companies that are making money with the data that they supposedly lost.  SME doesn't fall into this category.  SME also didn't lose ANY data and wasn't down 10 days.

I never said that SME lost data, I was simply inquiring as to what backup software was being used.

SME was not down for 10 days, but are you really criticizing me for 3 days!?  It was a full week that was lost.  Even at your data transfer indications, SME should have been back up within 48 hours if the restore operation only took 38 hours.  So why the six extra days?

I was not trying to be ridiculed or be "arrogant" with my statement and if you'll notice, I was asked for additional details.  I was simply trying to let the admins of Contribs.org know that there are utilities out there that can get you back online in a smaller time frame than the 7 days that Contribs.org was down.

Quote
SME Server, Inc, is not a business.

This is true.  Therefore the stats about going out of business really don't apply.  However, if the SME team lost all of the data today, could they pick up where they left off tomorrow, and how badly does it affect things if they can?

The bottom line in my statement was that your data is important, no matter what you do with the data.  It's lost time, lost money (potentially and realistically), and a downright pain in the a@# when data is lost.  It's not a fun thing for any company or organization.  I'm sorry that SME had to go through that whole ordeal.  I really am! I've done it, it's not fun, and quite frankly it's exhausting.  Sleepless nights, stressful days....I was really just trying to be helpful to the community by informing them of a utility that can help them rest a bit easier.
« Last Edit: January 24, 2009, 03:55:26 PM by Hummdis »
"It is always darkest just before it goes pitch black."
-Despair, Inc.

Offline Stefano

Re: Recent outage a timely reminder to have a working backup in place
« Reply #10 on: January 24, 2009, 04:30:56 PM »
Quote
I was not trying to be ridiculed or be "arrogant" with my statement and if you'll notice, I was asked for additional details.  I was simply trying to let the admins of Contribs.org know that there are utilities out there that can get you back online in a smaller time frame than the 7 days that Contribs.org was down.

Maybe you are missing the point that it was a hardware failure, so maybe NOT a hard disk failure, and that contribs.org runs on hardware donated or paid for with donations.

Anyway, I miss the point: what are you trying to demonstrate? Or what are you curious about?

Ciao
Stefano

Offline slords

Re: Recent outage a timely reminder to have a working backup in place
« Reply #11 on: January 24, 2009, 10:14:38 PM »
I appreciate your responses.  I can see how you could recover your 4TB of data using that setup.  I've got something very similar (and larger, ~500TB) at work so I know the speeds that it can push.

However, if you price out all the equipment you are talking about, you have a price tag that is close to the national budget of some small countries.  Contribs is currently running on the hardware that I have running, which isn't much but is plenty to keep contribs running.  I also run it on my own time and can't spend 24x7 maintaining it.  Throw in my part-time schedule and trying to coordinate with two different vendors, and you are lucky it only took 7 days to get the system back up.
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs,
and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." -- Rich Cook

Offline dgs

Re: Recent outage a timely reminder to have a working backup in place
« Reply #12 on: January 25, 2009, 12:54:08 AM »
Quote
Contribs is currently running on the hardware that I have running, which isn't much but is plenty to keep contribs running.  I also run it on my own time and can't spend 24x7 maintaining it.  Throw in my part-time schedule and trying to coordinate with two different vendors, and you are lucky it only took 7 days to get the system back up.

The effort to provide and maintain that hardware, and the extra efforts taken to rectify and restore after recent failures, are very much appreciated.
We'd all love perfect, state-of-the-art hardware, but that hardware is not free, nor is the time to install and maintain it or the bandwidth that connects it to us, the audience.  Many thanks to all who have contributed in whatever ways, and particular thanks to those who have rushed to get this site back on-line.