Koozali.org: home of the SME Server
Obsolete Releases => SME Server 8.x => Topic started by: p-jones on January 10, 2015, 11:16:00 AM
-
Is this how a workstation backup report should look?
Partial backup stored on backup workstation.
Session cleanly closed by timeout after 82770 seconds.
Not an error, backup process will continue next night.
Received signal: Quit
Archive delayed termination engaged
Disabling signal handler, the next time this signal is received the program will abort immediately
Final memory cleanup...
Program has been aborted for the following reason: Thread cancellation requested, aborting as properly as possible
Error while running dar: 4 at /etc/e-smith/events/actions/workstation-backup-dar line 505.
Backup terminated: backup failed - status: 7424
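A note on the status number in the report: 7424 is an exact multiple of 256, which is what a raw wait status looks like when the exit code sits in the high byte. The sketch below decodes it under that assumption. The thread does not confirm how workstation-backup-dar builds this number, and `decode_status` is a hypothetical helper name.

```shell
# Split a raw wait status into exit code (high byte) and signal (low 7 bits).
# Assumption: the "status" in the report is such a raw value.
decode_status() {
    echo "exit code $(($1 >> 8)), signal $(($1 & 127))"
}

decode_status 7424
```

For 7424 this gives exit code 29 and signal 0, which is not the same value as the 4 shown in "Error while running dar: 4".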
-
Maybe you want to follow this http://bugs.contribs.org/show_bug.cgi?id=8789
-
Thank you. I will follow this bug; however, I am using SME 8.1 and it is not a USB disk, so I am not sure it is fully relevant.
In fact, at this time I am not sure whether it is a bug or only a limitation. I have done a heap of searching on "Dar backup resume" and similar, but have not really turned up anything useful.
-
How much data is there to back up? Where?
-
Do you have your backup set to time out at 23 hrs?
What is your backup config?
db configuration show backupwk
will display your settings from a terminal.
-
Backup is set to time out at 23 hrs (with the expectation that it will resume and complete). Without the timeout, the job tries to restart before the previous job has completed.
backupwk=service
BackupTime=23:19
Compression=6
CompressionProg=gzip
DaysInSet=1
FullDay=7
IncOnlyTimeout=no
Login=
Password=
Program=dar
SetsMax=1
SmbHost=192.168.2.22
SmbShare=nasstore
Timeout=23
VFSType=cifs
status=enabled
Backing up about 1.5 TB of data.
-
How much free space is there on the target? With the default setup, the backup needs twice the size of the dar archive to complete successfully.
Added: can you include the section of the log that covers this period and the error, please?
-
Jan 10 22:24:34 server2 /sbin/e-smith/do_backupwk[25731]: /home/e-smith/db/backups: OLD 1420798741=backup_record|BackupType|workstation|StartEpochTime|1420798741
Jan 10 22:24:34 server2 /sbin/e-smith/do_backupwk[25731]: /home/e-smith/db/backups: NEW 1420798741=backup_record|BackupType|workstation|EndEpochTime|1420881874|StartEpochTime|1420798741
Jan 10 22:24:34 server2 /sbin/e-smith/do_backupwk[25731]: /home/e-smith/db/backups: OLD 1420798741=backup_record|BackupType|workstation|EndEpochTime|1420881874|StartEpochTime|1420798741
Jan 10 22:24:34 server2 /sbin/e-smith/do_backupwk[25731]: /home/e-smith/db/backups: NEW 1420798741=backup_record|BackupType|workstation|EndEpochTime|1420881874|Result|backup:7424|StartEpochTime|1420798741
There is heaps of room: approx 3-3.5 TB.
After re-reading a link about backup timeouts, I see that setting full backups to time out at 23 hrs is wrong - my bad. I have kicked off a new backup without that option. It will be 20 hrs before I know anything.
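The two epoch values in the log lines above give the real run time of the job. A small sketch of the arithmetic (`fmt_duration` is a hypothetical helper, not part of SME Server):

```shell
# Print a number of seconds as hours, minutes and seconds.
fmt_duration() {
    printf '%dh %dm %ds\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

start=1420798741   # StartEpochTime from the log
end=1420881874     # EndEpochTime from the log
fmt_duration $((end - start))
```

This prints 23h 5m 33s, in line with Timeout=23. The 82770 seconds in the first report works out to 22h 59m 30s.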
-
"There is heaps of room - approx 3-3.5 TB"
A full backup with default settings requires space for twice the size of the backup, and there are also memory requirements to consider.
-
The exact amount of data is 1.1 TB. 3-3.5 TB (depending on how you define 1 TB) far exceeds 2x the data. I would have expected 4 GB RAM, an x64 kernel and zero swapfile usage to suggest the RAM resources were adequate?
I am still assuming that the report I initially posted is abnormal, and that my expectation that the backup should simply resume is a reasonable one.
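The 2x rule can be checked against the figures quoted in this thread. This is only a sketch: the sizes are in GB, 1 TB is taken as 1000 GB, the lower free-space figure (3 TB) is used, and `has_room` is a hypothetical helper.

```shell
# has_room DATA_GB FREE_GB
# Succeeds when the free space is at least twice the data size.
has_room() {
    [ $(($1 * 2)) -le "$2" ]
}

if has_room 1100 3000; then
    echo "enough space"
else
    echo "not enough space"
fi
```

With 1100 GB of data the requirement is 2200 GB, so 3000 GB free passes the check.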
-
You may very well be correct in all assumptions; however, without knowing your SME version, config, hardware setup etc., things like this have to be considered. http://sourceforge.net/p/dar/mailman/message/18851530/
It may also simply be the misconfiguration, as you suggest.
-
There are a number of references to issues that MAY be related to the errors we sometimes see with a failed backup.
e.g.
Jan 8 01:12:16 rslserver kernel: CIFS VFS: sends on sock e8f45b00 stuck for 15 seconds
Jan 8 01:12:16 rslserver kernel: CIFS VFS: Error -11 sending data on socket to server
Jan 8 01:12:16 rslserver kernel: CIFS VFS: Write2 ret -11, wrote
and
Jan 11 00:32:43 rslserver kernel: CIFS VFS: No response to cmd 47 mid 32912
Jan 11 00:32:43 rslserver kernel: CIFS VFS: Write2 ret -11, wrote 0
Jan 11 00:32:46 rslserver kernel: CIFS VFS: No response to cmd 47
http://blog.dhampir.no/content/cifs-vfs-no-response-for-cmd-n-mid
I am now running with OpLocks disabled on a system that has displayed intermittent failures for unknown reasons.
Let's see what happens :-)
db configuration setprop smb OpLocks disabled
signal-event ibay-modify
-
Thank you for that information.
At approx 23 hrs 45 min the backup stopped in an orderly manner and appears to have resumed where it left off. That is the result of changing the timeout setting. I guess I can only be patient and see where it goes from here!
Re oplocks, I have been running without oplocks for some significant time to accommodate an "old" database which hates oplocks.
I have a second identical server that also runs without oplocks for the same reason but holds only a tiny amount of data. It has backed up totally uneventfully for a long time with the same config.
At this point, although perhaps a little prematurely, I am leaning towards answering my initial question: the report I posted is abnormal, and I erred in timing out the full backup. Time will confirm.
-
The reason I went looking is that this morning I got this cron error report. Look familiar? :-)
Partial backup stored on backup workstation.
Session cleanly closed by timeout after 88500 seconds.
Not an error, backup process will continue next night.
Received signal: Quit
Archive delayed termination engaged
Disabling signal handler, the next time this signal is received the program will abort immediately
Final memory cleanup...
Program has been aborted for the following reason: Thread cancellation requested, aborting as properly as possible
umount: /mnt/smb: device is busy
umount: /mnt/smb: device is busy
umount: /mnt/smb: device is busy
umount: /mnt/smb: device is busy
Error while running dar: 4 at /etc/e-smith/events/actions/workstation-backup-dar line 505.
Backup terminated: backup failed - status: 7424
The full backup failed, and then an incremental was done. Screwy...
See the log extracts above.
It all seems to be related to Samba issues... maybe :-)
-
It is about time you guys reported this possible issue and continued the discussion over at our Bugzilla :-)
-
Is there any (open) bug about it?
IMO we should change the backup behaviour. A full backup with hundreds or thousands of GB of data needs many hours, even days, to complete, so the backup script should start and continue until the end. It should create a lockfile and check for its existence at start. If the lockfile exists, another backup instance is running: abort and send an informative email to root. When the backup is done, the lockfile is removed.
@p-jones: if you can, try to use NFS. In my experience it is almost 3 times faster than SMB.
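A minimal sketch of the lockfile idea described above. This is not the SME Server backup script: the lock path and the `run_backup` wrapper are hypothetical, a directory created with `mkdir` stands in for the lockfile because `mkdir` is atomic, and the email to root is reduced to a message on stderr.

```shell
# Hypothetical lock location for this sketch; not a path SME Server uses.
LOCKDIR="${LOCKDIR:-/tmp/backupwk-sketch.lock}"

# run_backup COMMAND [ARGS...]
# Runs COMMAND only if no other instance currently holds the lock.
run_backup() {
    # mkdir fails if the directory already exists, so only one caller wins.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "Backup already running (lock $LOCKDIR exists), aborting" >&2
        return 1
    fi
    echo $$ > "$LOCKDIR/pid"   # record the owner for later inspection
    "$@"                       # the real backup command would go here
    rc=$?
    rm -rf "$LOCKDIR"          # remove the lock when the backup is done
    return $rc
}

run_backup echo "backup would run here"
```

A crash between creating and removing the lock would leave a stale lock behind, so a real implementation would also need to check whether the recorded PID is still alive.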
-
Stefano,
there is no open bug for this. My primary issue was a misconfiguration, not a bug, and the balance of my issues are limitations of the software.
Your comments re the lockfile make perfect sense; however, I don't believe it is appropriate to raise an NFR for V8/8.1, and I have not progressed to V9 as yet.
I have trialed NFS rather than CIFS; however, I have not experienced anywhere near a 3x speed improvement. In my experience the speed improvement, if any, is very small.
-
Just an update on my January 12, 2015, 12:11:24 PM post.
It turned out to be a Cat5 Ethernet cable running from a wall socket to a basic D-Link DGS-1005A switch. It was causing an intermittent loss of connectivity, perhaps triggered by changes in ambient temperature in a room that is not air-conditioned or cooled. It was not consistent; I only confirmed it because I was watching the switch as it lost connectivity. A jiggle and it was back, then an hour or so later it was gone again.
I changed the cable and all problems have now gone, hopefully never to return :-) So, as always, make sure the easy, basic things are right first before going digging :-)
-
Please see https://bugs.contribs.org/show_bug.cgi?id=9127