Backup errors...

Thomas Kristensen

Backup errors...

« on: February 23, 2003, 01:09:37 PM »

Hi everyone,

Got up this morning and found a report from my nightly backup filled with this:

DUMP: bread: lseek fails
DUMP: bread: lseek fails
DUMP: short read error from /dev/md1: [block -1713827056]: count=4096, got=0
DUMP: bread: lseek2 fails!
DUMP: short read error from /dev/md1: [sector -1713827056]: count=512, got=0
DUMP: bread: lseek2 fails!
DUMP: short read error from /dev/md1: [sector -1713827055]: count=512, got=0
DUMP: bread: lseek2 fails!
DUMP: short read error from /dev/md1: [sector -1713827054]: count=512, got=0
DUMP: bread: lseek2 fails!

The server is an SME 5.5U2 with software RAID-1. Apparently something is broken in the filesystem, but trying to run fsck or e2fsck produces severe warnings about serious damage to mounted filesystems, so I haven't done that. Is it safe to run fsck or e2fsck?

I tried rebooting the server, hoping that an automatic filesystem check would be run on startup but there were no complaints at all and everything seems to be working fine. I am, however, a little nervous that something serious has happened and besides that, my backups don't work...

Any help is much appreciated...

Thanks in advance,
Thomas


blakeh

Re: Backup errors...

« Reply #1 on: March 09, 2003, 06:35:10 PM »

I started getting these this morning as well. Did anyone reply to you?


blakeh

Re: Backup errors...

« Reply #2 on: March 09, 2003, 07:03:44 PM »

I think I found the solution to this; see this link:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=7960d3ee.0111080951.76cb4e28%40posting.google.com

Here's the text if the link doesn't work, and it's a good little backgrounder on dump.

------
From: Doug Freyburger (dfreybur@yahoo.com)
Subject: Re: DUMP: short read error from
Newsgroups: comp.unix.admin
Date: 2001-11-08 09:51:13 PST

ronen amity wrote:
>
> trying to dump from red hat 6.2 to a tape that is on a solaris 2.6 box
> the disk is mirrord, and tar to /dev/null works fine.

I trimmed like a hundred groups from the distribution list.

I trimmed out most of the lines to highlight the interesting ones:

> DUMP: Dumping /dev/md1 (/var/spool/imap) to /dev/rmt/0cn on host
> guesttp@police

A very active filesystem, contrary to the instructions to have a filesystem
either offline or idle per the dump man page.

> DUMP: mapping (Pass I) [regular files]
> DUMP: mapping (Pass II) [directories]
> DUMP: dumping (Pass III) [directories]
> DUMP: dumping (Pass IV) [regular files]

Few people ever look at the above lines and think about what they imply. Dump
works on a raw device, not through the filesystem. If the filesystem changes
while it is running, it has no way of knowing that.

> DUMP: mapping (Pass I) [regular files]

First (Pass 1) every inode is scanned to see if its mtime/ctime is newer than
the reference time. All inodes to be backed up are put into a list, and the
information stored includes their sizes (critical data later on in the tale).
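
A rough Python sketch of the mapping idea in Pass 1, under a big assumption: it walks the mounted filesystem with `os.walk`, whereas dump scans inodes on the raw device. The function name `map_changed_files` is made up for illustration. What it shows is that the size of each selected file is recorded at mapping time and not looked at again.

```python
import os
import tempfile
import time


def map_changed_files(root, reference_time):
    """Return {path: size} for files whose mtime or ctime is newer
    than reference_time (seconds since the epoch).

    Illustration only: dump reads inodes from the raw device instead
    of walking the directory tree like this.
    """
    selected = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if max(st.st_mtime, st.st_ctime) > reference_time:
                # The size is captured now and trusted later.
                selected[path] = st.st_size
    return selected


def demo():
    """Build a tiny tree and map it like a full (level 0) backup."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "inbox"), "wb") as f:
            f.write(b"hello")
        everything = map_changed_files(tmp, 0)
        nothing = map_changed_files(tmp, time.time() + 3600)
        return len(everything), len(nothing)
```

With a reference time of 0 every file qualifies; with a reference time in the future none do.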

> DUMP: mapping (Pass II) [directories]

Next (Pass 2) the root directory of the device is put in the list and a
dependency tree is built to ensure that all inodes being backed up have
directory entries. If needed, parent directory inodes are found and added to
the list. Once again, the sizes of all inodes involved are stored at that time
(critical data later if a directory shrinks somehow; this is why a directory
retains its blocks when child files are deleted, BTW). In the process all of
the inodes that are directories are sorted into dependency order.
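
The dependency step in Pass 2 can be sketched in Python as follows. This is only an approximation working on path strings (dump works on inode numbers), the helper name `add_parent_directories` is hypothetical, and it assumes every selected path lies under `root`.

```python
import posixpath


def add_parent_directories(selected_paths, root):
    """Return every directory needed to reach the selected paths,
    ordered so that a parent always comes before its children.

    Assumes POSIX-style paths that all live under `root`.
    """
    root = posixpath.normpath(root)
    needed = {root}
    for path in selected_paths:
        parent = posixpath.dirname(posixpath.normpath(path))
        # Climb towards the root, adding directories not yet listed.
        while parent not in needed:
            needed.add(parent)
            parent = posixpath.dirname(parent)
    # Fewer separators means closer to the root, so parents sort first.
    return sorted(needed, key=lambda d: (d.count("/"), d))


dirs = add_parent_directories(
    ["/srv/mail/alice/inbox", "/srv/mail/bob/inbox", "/srv/log/messages"],
    "/srv",
)
# dirs == ["/srv", "/srv/log", "/srv/mail",
#          "/srv/mail/alice", "/srv/mail/bob"]
```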

> DUMP: dumping (Pass III) [directories]

Next (Pass 3) the inodes in the list that are directories are dumped to tape.
If a directory were to shrink between Pass 2 and Pass 3, you would get an
error here.

Most of the time the above 3 passes are fairly fast, and most of the time the
filesystem does not change enough to have much impact even if it is mounted.

> DUMP: dumping (Pass IV) [regular files]

This phase can take a long time in many cases. Each non-directory to be
backed up is copied to tape in inode order. Note that the exact length
to be backed up was recorded in an earlier phase. If a file is unchanged
during that time-span that entire file is backed up. If a file grows, only the
blocks that existed at the start are backed up. If a file is deleted (vi and
IMAP tend to recycle the inode, but emacs and POP tend to create new files and
delete the old ones) then you get a complaint that an inode has vanished.
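
The difference between recycling an inode and creating a new file can be seen with a small Python sketch. The helper names are invented for illustration, and which editors or mail servers behave which way is taken from the text above, not verified here.

```python
import os
import tempfile


def record_inode(path):
    """Remember which inode the path points at right now."""
    return os.stat(path).st_ino


def rewrite_in_place(path, data):
    """Overwrite the existing file, so the inode is reused."""
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(data)
        f.truncate()


def replace_with_new_file(path, data):
    """Write a new file and rename it over the old one, so the
    path ends up pointing at a different inode."""
    tmp_path = path + ".new"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)


def inode_survived(path, recorded_inode):
    """True if the path still refers to the inode recorded earlier."""
    try:
        return os.stat(path).st_ino == recorded_inode
    except FileNotFoundError:
        return False
```

A backup that listed the inode earlier would still find it after `rewrite_in_place`, but not after `replace_with_new_file` or a plain delete.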

If a file is truncated during the time span since the earlier passes, like an
IMAP client reading and deleting a message, dump knows how many bytes to dump,
but the file no longer has that many bytes. Dump tries to read off the end
of the file, past its end. The result is an error like this:

> DUMP: bread: lseek fails
> DUMP: short read error from /dev/md1: [block -1686942968]: count=4096,
> got=0
> DUMP: bread: lseek2 fails!

Yup, you backed up an active filesystem. It takes a very high level of activity
to catch dump between phases.
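
The race described above can be imitated on an ordinary file with a short Python sketch. It is not how dump is implemented (dump reads blocks from the raw device); it only reproduces the pattern of recording a size first and reading that many bytes later. The function names are made up for illustration.

```python
import os
import tempfile


def record_size(path):
    """Mapping step: remember how many bytes the file has right now."""
    return os.stat(path).st_size


def read_recorded_length(path, recorded_size, block=4096):
    """Dumping step: read the length recorded earlier, block by block.

    Returns a list of (offset, wanted, got) for every short read.
    """
    short_reads = []
    with open(path, "rb") as f:
        offset = 0
        while offset < recorded_size:
            wanted = min(block, recorded_size - offset)
            f.seek(offset)
            got = len(f.read(wanted))
            if got < wanted:
                short_reads.append((offset, wanted, got))
            offset += wanted
    return short_reads


def simulate():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mailbox")
        with open(path, "wb") as f:
            f.write(b"m" * 10000)
        size = record_size(path)   # 10000 bytes at mapping time
        os.truncate(path, 4096)    # file shrinks before the dump step
        return size, read_recorded_length(path, size)


# simulate() returns (10000, [(4096, 4096, 0), (8192, 1808, 0)])
```

Every read past the new end of the file comes back with 0 bytes, which matches the `count=4096, got=0` shape of the errors in the report.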

For what it's worth, if you switch from "mailbox format" to "maildir format"
for your e-mail, there will be far more files but each one will be smaller.
You will encounter these errors far less often. Maildir format has its own
price if your users keep a lot of messages, though. It's a tradeoff.
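
To make the tradeoff concrete, here is a small Python sketch using the standard `mailbox` module. It stores the same messages in both formats and compares file counts and sizes. The addresses and message bodies are placeholders, and the function name `compare_formats` is made up for this example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(i):
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = f"message {i}"
    msg.set_content("x" * 200)
    return msg


def compare_formats(n):
    """Store n messages as mbox and as Maildir; report the layout."""
    with tempfile.TemporaryDirectory() as tmp:
        # mbox: every message is appended to one file.
        mbox_path = os.path.join(tmp, "inbox.mbox")
        box = mailbox.mbox(mbox_path)
        for i in range(n):
            box.add(make_message(i))
        box.flush()
        box.close()
        mbox_size = os.path.getsize(mbox_path)

        # Maildir: one file per message.
        md_path = os.path.join(tmp, "Maildir")
        md = mailbox.Maildir(md_path, create=True)
        for i in range(n):
            md.add(make_message(i))
        md.close()
        sizes = [
            os.path.getsize(os.path.join(dirpath, name))
            for dirpath, _dirs, files in os.walk(md_path)
            for name in files
        ]
        return {
            "mbox_files": 1,
            "mbox_size": mbox_size,
            "maildir_files": len(sizes),
            "maildir_largest": max(sizes),
        }
```

With more than one message, the largest Maildir file is smaller than the single mbox file, so a change to one message touches far less data.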


blakeh

Re: Backup errors...

« Reply #3 on: March 09, 2003, 07:07:28 PM »

One note to add: I believe both Thomas and I got these errors because SME does its log rotation and other weekly functions on Sunday mornings. If a long backup is running at that point, then (going by the Google post above) the file list and inode block list will change between the time the backup started and the time the weekly SME events finish, and that will cause these errors.

Just a theory.
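
One way to test a theory like this would be to snapshot the tree before and after the backup window and compare. A minimal Python sketch, assuming nothing about SME itself; the function names are hypothetical:

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime in ns) for every file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(before, after):
    """Return (added, removed, changed) paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        p for p in before.keys() & after.keys() if before[p] != after[p]
    )
    return added, removed, changed
```

Taking one snapshot when the backup starts and another when it ends would show whether rotated logs or mail folders changed in between.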

bh


Thomas Kristensen

Re: Backup errors...

« Reply #4 on: March 11, 2003, 10:52:50 AM »

I never actually got an answer to my original question about "lseek2 fails" and such, but the problem went away by itself during the very next backup.

The day before the errors showed up I had been moving my emails around a lot. I was trying to go from a POP-based mailbox to IMAP, so I was copying all my mailboxes from POP to IMAP folders using Outlook XP.

I did this several times since I had trouble keeping the original timestamps on the messages. At some point the SME server started coughing up errors regarding message sequence (I don't remember precisely, but it was something to do with the internal pointers between the messages).

I let messages be messages and left the machine overnight, and the next morning the backup errors were all over the place. I suspected my IMAP folders, so I removed each and every one and created them again. That made the backup errors go away...

Thanks for your assistance,
Thomas


blakeh

Re: Backup errors...

« Reply #5 on: March 11, 2003, 05:07:13 PM »

Mine went away too the next day, but crept up again another night. I was copying files and moving things around during a backup as well (I was logged in remotely and didn't realize a backup was in progress), and the report had the same lseek errors. I'm looking into tapeware to see how it behaves in these situations.

bh
