Koozali.org: home of the SME Server

Workstation backup failed

DanB35
Workstation backup failed
on: June 06, 2016, 02:24:32 AM
tl;dr: Workstation backup was aborted by a system reboot, and now it won't run at all.

My workstation backup had been running daily over NFS to my FreeNAS server without problems. It's set to run daily at 23:00, with full backups on Fridays and incrementals the rest of the week. It's configured to keep four sets. A full backup takes somewhere around 9 hours.

Yesterday morning, since I'd installed a bunch of updates in the last few days and the server-manager was complaining about unsaved changes, I decided to run signal-event post-upgrade && signal-event reboot, forgetting that the backup hadn't finished yet. Of course, the backup was aborted by the reboot.

This morning, I woke to an error message from 23:49 last night saying the backup had failed.  The failure email stated, "Failed to add set /mnt/smb/e-smith.familybrown.org/set1/full-20160603230023 to catalog. No child processes".

Thinking that I'd need to redo the full backup, I reconfigured it to run the full backup on Sundays and manually kicked off the backup from the shell (/sbin/e-smith/do_backupwk). After it had run for several hours, I got this at the terminal:
Code:
Error met while opening the last slice: Data corruption met at end of slice, unknown flag found. Trying to open the archive using the first slice...
Aborting program. User refused to continue while asking: Found a correct archive header at the beginning of the archive, which does not stands to be an old archive, the end of the archive is thus corrupted. If you have an external catalog given as reference we can continue, OK ?
Failed to add set /mnt/smb/e-smith.familybrown.org/set1/full-20160603230023 to catalog. No child processes
Backup terminated: backup failed - status: 7424

I'd think that set (full-20160603230023) would be the one that was aborted by the reboot.  Currently the /set1/ directory includes 237 parts of full-20160527230009 and 296 parts of full-20160603230023, as well as seven incrementals.  Six of the incremental sets predated the 3 Jun full set, while one was from 4 Jun.

I'm thinking I need to either clean out the 3 Jun full and 4 Jun incremental sets, or possibly everything out of /set1/.  Thoughts?

sages
Re: Workstation backup failed
Reply #1 on: June 06, 2016, 05:15:13 AM
If you have the space on your NAS, why not copy all of the existing backups and start with a fresh sequence of backups?
That way you would only have to work with the corrupted backups if you needed to recover anything. And you'd start off with a clean slate as far as backups go.

DanB35
Re: Workstation backup failed
Reply #2 on: June 06, 2016, 12:32:05 PM
Quote from: sages
> If you have the space on your NAS, why not copy all of the existing backups and start with a fresh sequence of backups?
Seems like quite the "sledgehammer" solution, but there's definite logic there. I have over 20 TB free on the NAS, so space isn't a factor right now. I'll give that a try and see what happens.

DanB35
Re: Workstation backup failed
Reply #3 on: June 07, 2016, 03:06:39 AM
So moving the entire backup tree somewhere else and starting the backup fresh has worked. The backup also completed much faster than some recent full backups, taking just under 6 hours rather than 8-9. I'm guessing the extra time is mainly spent cataloging the other backup sets, but I'm surprised it takes that much longer.
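
For anyone wanting to do the same, here is a minimal shell sketch of the "move the whole tree aside, then start fresh" step. It assumes the backup share is reachable as an ordinary directory (for example, run on the NAS itself or with the share mounted by hand); the helper name archive_backup_tree and the <tree>.old-<timestamp> naming scheme are hypothetical, not part of SME Server. It moves rather than deletes, so the old sets stay available if something ever has to be restored from them.

```shell
#!/bin/sh
# Minimal sketch (assumption): archive an existing workstation-backup tree
# so the next backup run starts a fresh sequence. Nothing is deleted.
# archive_backup_tree is a hypothetical helper, not an SME Server command.
archive_backup_tree() {
    src="${1%/}"                          # backup tree to move aside (trailing slash stripped)
    stamp=$(date +%Y%m%d-%H%M%S)          # timestamp used in the archived copy's name
    dest="$src.old-$stamp"                # hypothetical naming scheme: <tree>.old-<timestamp>

    if [ ! -d "$src" ]; then
        echo "archive_backup_tree: no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "archive_backup_tree: $dest already exists, not overwriting" >&2
        return 1
    fi

    # Move the whole tree (every set plus anything else stored alongside it)
    # in one step, then recreate an empty directory for the next run.
    mv "$src" "$dest" || return 1
    mkdir "$src" || return 1
    echo "$dest"                          # report where the old sets went
}
```

Usage would be along the lines of archive_backup_tree /path/to/share/e-smith.familybrown.org (the path here is illustrative), followed by the next scheduled run or a manual /sbin/e-smith/do_backupwk on the server to build a new set.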

I'm not completely happy with that solution, though. An interrupted backup (particularly when the interruption was an orderly system shutdown) should not result in an irretrievably broken backup set. If that's just what happens with the workstation backup system (i.e., with dar), then it's really too fragile for production use.

sages
Re: Workstation backup failed
Reply #4 on: June 07, 2016, 04:08:23 AM
I agree it's not ideal but at least it has allowed you to get your system operational again. Sometimes starting with a clean slate can be the most expedient way of moving forward.

TerryF
Re: Workstation backup failed
Reply #5 on: June 07, 2016, 06:07:35 AM
Have a read here: https://bugs.contribs.org/show_bug.cgi?id=9159

There are also other related bugs, more than this one lists, and the fix suggested here has been mooted in other bugs as well. Do a search in Bugzilla for dar_manager.

I edited the appropriate file manually and did not have any issues.

DanB35
Re: Workstation backup failed
Reply #6 on: June 07, 2016, 02:31:19 PM
Quote from: sages
> I agree it's not ideal but at least it has allowed you to get your system operational again.
Indeed, and thanks for the suggestion.