[Bug 11521] New: rsync does not use high-resolution timestamps to determine file differences

[Bug 11521] New: rsync does not use high-resolution timestamps to determine file differences

samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=11521

            Bug ID: 11521
           Summary: rsync does not use high-resolution timestamps to
                    determine file differences
           Product: rsync
           Version: 3.1.2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

The sub-second timestamps available on many filesystems are preserved when
requested across copies, but aren't used to determine file differences.

If a file exists at both origin and destination, its contents are the same size
in each place, and the timestamps differ only in the sub-second part, rsync
will treat the files as the same (unless you use --checksum).

So if a file is created, and then a snapshot of its dir is taken, then the
origin file is modified (but the size is preserved) within the same second, an
attempt to update that snapshot using rsync will fail to copy the change.
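
To make the failure mode concrete, here is a minimal shell sketch of the comparison described above. It is not rsync's code: `quick_check_same` is a hypothetical helper, and it assumes GNU `stat` (for `-c` with `%s`, `%Y`, and `%y`). By default it compares size and whole-second modification time; with `ns` as the third argument it compares the full timestamp string, including the sub-second part.

```shell
# Hypothetical helper, not part of rsync: report whether two files look
# "unchanged" by size and modification time alone.
# Usage: quick_check_same FILE1 FILE2 [ns]
quick_check_same() {
    qc_a=$1
    qc_b=$2
    qc_mode=${3:-sec}

    # Different sizes always count as a change.
    [ "$(stat -c %s "$qc_a")" = "$(stat -c %s "$qc_b")" ] || return 1

    if [ "$qc_mode" = ns ]; then
        # %y prints the mtime including its nanosecond fraction.
        [ "$(stat -c %y "$qc_a")" = "$(stat -c %y "$qc_b")" ]
    else
        # %Y prints the mtime in whole seconds since the epoch.
        [ "$(stat -c %Y "$qc_a")" = "$(stat -c %Y "$qc_b")" ]
    fi
}
```

For two same-sized files written a fraction of a second apart, `quick_check_same a b` should succeed while `quick_check_same a b ns` should fail, provided the filesystem stores sub-second timestamps.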

Here's a script that reproduces the issue with high reliability for me:

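#!/bin/bash

set -x

# Scratch directory under the current directory, holding the two trees.
DIR=$(mktemp -d -p "$(pwd)")

mkdir "$DIR/d1"
mkdir "$DIR/d2"

echo dummy > "$DIR/d1/dummy"
echo dummy > "$DIR/d2/dummy"

# Same size, different contents, written 0.1 s apart (usually the same second).
echo one > "$DIR/d1/afile"
sleep 0.1
echo two > "$DIR/d2/afile"

/usr/bin/stat "$DIR/d1/afile" | grep Mod
/usr/bin/stat "$DIR/d2/afile" | grep Mod

# Sync d2 into d1; afile is skipped when only the sub-second part differs.
~/packages/rsync/rsync --delete -a -HAX -vii "$DIR/d2/" "$DIR/d1"

diff -r "$DIR/d1" "$DIR/d2"

/usr/bin/stat "$DIR/d1/afile" | grep Mod
/usr/bin/stat "$DIR/d2/afile" | grep Mod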



If the diff shows a difference, then the rsync didn't copy afile's contents
over. However, note the stat info from the last two lines - the updated modify
timestamp *will* be synced, making an inconsistent sync.

The following patch adds a check of the high-res timestamp to unchanged_file.
This solves the problem for me, and I've guarded it so it shouldn't break on
systems with no high-res timestamp. Please let me know if I can be helpful in
testing it further or making it more robust.



diff --git a/generator.c b/generator.c
index 3a4504f..2f64f5d 100644
--- a/generator.c
+++ b/generator.c
@@ -588,7 +588,11 @@ int unchanged_file(char *fn, struct file_struct *file, STRUCT_STAT *st)
        if (ignore_times)
                return 0;

-       return cmp_time(st->st_mtime, file->modtime) == 0;
+       return cmp_time(st->st_mtime, file->modtime) == 0
+#ifdef ST_MTIME_NSEC
+               && st->ST_MTIME_NSEC == F_MOD_NSEC(file)
+#endif
+               ;
 }

--
You are receiving this mail because:
You are the QA Contact for the bug.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 11521] rsync does not use high-resolution timestamps to determine file differences

samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=11521

--- Comment #1 from Michael McCracken <[hidden email]> ---
Created attachment 11440
  --> https://bugzilla.samba.org/attachment.cgi?id=11440&action=edit
patch to check hi-res timestamp in unchanged_file

[Bug 11521] rsync does not use high-resolution timestamps to determine file differences

samba-bugs
In reply to this post by samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=11521

--- Comment #2 from Andrey Gursky <[hidden email]> ---
(In reply to Michael McCracken from comment #1)

I think the rsync maintainer could at least have commented on this with a
reference to the mailing list [1], where this was already proposed but
ignored (as this bug report has been, too).

Things are not that simple, of course; see [2] and the discussion that follows.

[1] [PATCH] Consider nanoseconds when quick-checking for unchanged files
    https://lists.samba.org/archive/rsync/2014-December/029853.html
[2] [PATCH] Consider nanoseconds when quick-checking for unchanged files
    https://lists.samba.org/archive/rsync/2016-January/030511.html

[Bug 11521] rsync does not use high-resolution timestamps to determine file differences

samba-bugs
In reply to this post by samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=11521

Wayne Davison <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Wayne Davison <[hidden email]> ---
The latest git version has an option that lets you choose to include
nanoseconds in comparisons if you want them. Having it on by default would
likely cause far too many headaches for various backup solutions that use an
older filesystem (e.g. ext3) that doesn't support nanoseconds.
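
Whether a given backup target keeps nanoseconds can be probed directly. The following is a minimal sketch, assuming GNU `touch`, `stat`, and `mktemp`; `fs_keeps_nanoseconds` is a hypothetical helper, not an rsync feature. It sets a known fractional timestamp on a temporary file in the target directory and checks whether the fraction reads back intact.

```shell
# Hypothetical probe: succeed (0) if DIR's filesystem stores the full
# nanosecond part of an explicitly set mtime, fail (1) if the fraction
# is truncated or rounded, and return 2 if the probe file cannot be made.
# Usage: fs_keeps_nanoseconds [DIR]
fs_keeps_nanoseconds() {
    fk_probe=$(mktemp "${1:-.}/nsprobe.XXXXXX") || return 2
    touch -d '2016-01-24 22:57:53.123456789' "$fk_probe"
    fk_stamp=$(stat -c %y "$fk_probe")
    rm -f "$fk_probe"
    case $fk_stamp in
        *.123456789*) return 0 ;;   # full nanosecond value survived
        *)            return 1 ;;   # coarser timestamp resolution
    esac
}
```

A filesystem with coarser but still sub-second resolution would also fail this probe, so a failure only says that exact nanosecond comparisons against that target are not reliable.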

Re: [Bug 11521] rsync does not use high-resolution timestamps to determine file differences

f-rsync
[Included text copied from the commit.]

    > Date: Sun, 24 Jan 2016 19:52:49 +0000
    > From: [hidden email]

    > Auto-Submitted: auto-generated

    > https://bugzilla.samba.org/show_bug.cgi?id=11521

    > Wayne Davison <[hidden email]> changed:

    >            What    |Removed                     |Added
    > ----------------------------------------------------------------------------
    >              Status|NEW                         |RESOLVED
    >          Resolution|---                         |FIXED

    > --- Comment #3 from Wayne Davison <[hidden email]> ---
    > The latest git version has an option that lets you choose to include
    > nanoseconds in comparisons if you want them. Having it on by default would
    > likely cause far too many headaches for various backup solutions that use an
    > older filesystem (e.g. ext3) that doesn't support nanoseconds.

Thanks for the patch!

Just FYI, this comment is true but incomplete---the scenario I was
describing was straight ext4-to-ext4 copies and/or backups.  The
timestamp problem I currently see with those is because rsync was
throwing away the nanosecond information until this patch, even though
both ends supported it.

Anyone who's used dirvish, or presumably similar tools such as
rsnapshot, from and to ext4 or other ns-supporting filesystems, will
be bitten by the problem of non-ns vs ns timestamps bloating backups
and breaking hardlinks, either when they manually use --modify-window=-1,
or when this becomes the default.
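
One way to see whether a snapshot has lost its hardlinks is to compare inode numbers against the previous snapshot. This is a minimal sketch assuming GNU `stat` and `find`; `count_unshared` is a hypothetical helper name, and file names containing newlines are not handled.

```shell
# Hypothetical helper: count regular files in snapshot CUR that also exist
# at the same relative path in snapshot PREV but are stored as a separate
# inode (that is, they were copied rather than hard-linked).
# Usage: count_unshared PREV CUR
count_unshared() {
    cu_prev=$1
    cu_cur=$2
    (cd "$cu_cur" && find . -type f) | while IFS= read -r cu_rel; do
        # Files that are new in CUR cannot have been linked; skip them.
        [ -f "$cu_prev/$cu_rel" ] || continue
        # %d:%i is device plus inode number; equal values mean hard-linked.
        if [ "$(stat -c %d:%i "$cu_prev/$cu_rel")" != \
             "$(stat -c %d:%i "$cu_cur/$cu_rel")" ]; then
            echo "$cu_rel"
        fi
    done | wc -l
}
```

A count that jumps from near zero to nearly every file between two consecutive snapshots would be consistent with the timestamp mismatch described above.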

I'd still love to see either some in-rsync workaround that can be left
in place*, or (second best) some clever by-hand one-time workaround
that uses rsync just once to update all those timestamps while not
breaking the hardlinks (presumably -not- using --link-dest for those),
or (third best) some not-rsync-at-all solution that basically does
a giant ls at the source and a giant touch at the destination.  Yes,
I know that none of those can fix up backed-up files that are no
longer in the same place in the source; those at least won't bloat/
unhardlink later backups.
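
The third option could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested backup tool: `fix_ns_times` is a hypothetical name, GNU `stat`, `find`, and `touch` are assumed, only size and whole-second mtime are compared (no checksum, ownership, or mode check), and file names containing newlines are not handled. It updates a destination file only when its stored fraction is exactly zero, and `touch` changes the inode in place, so existing hardlinks stay intact.

```shell
# Hypothetical helper: copy the full-resolution mtime from SRC to files in
# DST that match in size and whole-second mtime but carry a zero fraction.
# Usage: fix_ns_times SRC DST
fix_ns_times() {
    fn_src=$1
    fn_dst=$2
    (cd "$fn_src" && find . -type f) | while IFS= read -r fn_rel; do
        fn_s=$fn_src/$fn_rel
        fn_d=$fn_dst/$fn_rel
        # Only regular, non-symlink files that exist on both sides.
        [ -f "$fn_d" ] && [ ! -L "$fn_d" ] || continue
        [ "$(stat -c %s "$fn_s")" = "$(stat -c %s "$fn_d")" ] || continue
        [ "$(stat -c %Y "$fn_s")" = "$(stat -c %Y "$fn_d")" ] || continue
        # Leave alone any file whose fraction is already non-zero.
        case $(stat -c %y "$fn_d") in
            *.000000000" "*) ;;
            *) continue ;;
        esac
        # -m: change only the mtime; -r: take it from the source file.
        touch -m -r "$fn_s" "$fn_d"
    done
}
```

Because every snapshot that hard-links the same inode sees the new timestamp at once, running this against the newest snapshot would also adjust the shared files in older ones.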

This solution is likely going to have to be reinvented repeatedly by
people running backups, so working out the right way to do it and then
putting it somewhere other rsync users will find it is likely a useful
exercise.  Before I solve it by hand for myself, it'd be useful to
know if it's likely rsync will come up with some way to solve it for
me.

* E.g., (a) "If modify-window is -1, and --link-dest is in use, and
everything else about the file or directory matches -except- the
timestamp, then update the timestamp," and (b) "Provide a switch to
turn off this heuristic after I'm sure my backups are okay."  (The
intent of (b) is to catch later slight changes of timestamp but
actually record them as -separate files-, which is important so
older snapshots don't magically change out from under you if something
updates a timestamp by a fraction of a second without otherwise
changing anything, which does happen and can occasionally be -very-
important to know about when tracking down issues.)

Re: [Bug 11521] rsync does not use high-resolution timestamps to determine file differences

f-rsync
    > Date: Sun, 24 Jan 2016 15:43:20 -0800
    > From: Wayne Davison <[hidden email]>

A couple questions below; please bear with me.

    > No, if you do a ext4 -> ext4 copy, rsync has set the matching ns info for
    > transferred files since 3.1.0. There was a case prior to rsync 3.1.2 where
    > a brand-new file transferred in the same second it was created wouldn't get
    > the right ns value because rsync was optimizing away the time-set if the
    > file's mod-time matched in the integer part (3.1.2 fixed that).

Oh, I see what happened.  My problem is that no Ubuntu LTS before
14.04 had rsync 3.1.0 or newer, and the original capability took more
than four years to make it into a released rsync version, if I'm
reading the release notes correctly.*  Unfortunately, that means
the vast majority of my machine base predates the fix, including the
machine hosting the backups.  I can obviously install newer rsyncs,
but that gives me a big installed base of pre-fix data that I'm going
to have to fix, and no more rsync security updates unless I track them
manually.  Yet I'd rather do this now, so I'm future-proofed, than
be badly surprised some years down the road when this rsync behavior
becomes the default, and to keep the problem from continuing to get worse.

* I may have tried 3.1.0 at some point and then realized the problems
  it'd give me for backups and didn't install it everywhere, pending
  a better fix; this is starting to ring bells.  I'm really surprised
  that the initial patch of Sep 7, 2009 never made it into a released
  rsync until 3.1.0 of Sep 28 2013; that's a four-year delay, and
  explains why I obviously never tried this out until perhaps an
  experiment with 3.1.0, and no doubt I didn't want to run a private
  version that wouldn't get security updates.  (Ubuntu rsync versions:
  10.04 has 3.0.7 proto 30; 12.04 has 3.0.9 proto 30; 14.04 has 3.1.0 proto 31.)

So what it looks like is that the capability to transfer ns times at
all has existed in CVS (unreleased) since 2009, in a released version
since 2013, and in an Ubuntu LTS since 2014.  And the current patch -seems-
to be an optimization that avoids -comparison- if the ns times match,
but that only affects speed---it doesn't change what gets written in
any event, just how fast.  Right?  But actually I think it changes
behavior besides that---see my test case below.

    > Beginning with this patch you can run rsync -aiv --checksum -@-1 and have
    > it fix the full mod-time on any matching files it finds. But for most newer
    > backups, the ns time will already be set correctly (as long as it was
    > created using a new enough rsync and protocol 31). If someone has a large
    > link-dest hierarchy that predates 3.1.0, then you could be sharing
    > hard-linked matching files from back before the ns info was included (the
    > older files would all have 0 for the ns value).

Wow, that -c really hurts.  Suppose one wanted to live dangerously and
assume that two files are the same if they otherwise match in all
metadata (including obviously length :) and their timestamps differ
only in that one has integer seconds and the other has the same integer
seconds plus nanoseconds.  Can rsync then readjust the dates without
doing a full checksum?  If not, I may write such a tool, or do it the
(very) slow way and have rsync re-checksum a few terabytes of my
backups... :)  [Might find some bitrot that way, of course.]

Also, I actually -can't- use that command to fix my snapshots,
because (if I understand correctly), it will -alter- my existing
snapshots to match the -current- contents of files, destroying them
---I'll no longer be able to go back in time to a previous version.
I only want to update ns times on files in the older snapshots
if and only if changing integer times to ns times would be the
only modification.  I think rsync -ac -@-1 will do far more, yes?

As for -@-1, that introduces a surprising change in behavior when I
try it.  I'm unsure if it's intended, though I think it is.  But it
will -definitely- break my hardlinks and bloat the backups if I try
it without readjusting the dates in --link-dest directories (e.g.,
previous snapshots).  I find that specifying -@-1 copies the ns
timestamp from the source to the destination even if the --link-dest
directory has an integer timestamp, and so I assume this is part of
the purpose of the patch?  Not just an optimization, but a change in
the way --link-dest might work.  Observe:

22:57:22 ~$ mkdir T
22:57:25 ~$ cd T
22:57:26 ~/T$ mkdir 1 2 3 4 5 6
22:57:30 ~/T$ lat() { ls -alF -i --full-time "$@"; }
22:57:49 ~/T$ touch 1/foo
22:57:53 ~/T$ ln 1/foo 1/bar
22:57:56 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
22:57:59 ~/T$ rsync -aviH 1/ 2/
sending incremental file list
.d..t...... ./
>f+++++++++ foo
hf+++++++++ bar => foo

sent 139 bytes  received 53 bytes  384.00 bytes/sec
total size is 0  speedup is 0.00
22:58:13 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 2/bar
1321176 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 2/foo
22:58:17 ~/T$ rsync -aviH --link-dest=../2 1/ 3/
sending incremental file list
.d..t...... ./

sent 89 bytes  received 19 bytes  216.00 bytes/sec
total size is 0  speedup is 0.00
22:59:05 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.013689572 -0500 2/bar
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.013689572 -0500 2/foo
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.013689572 -0500 3/bar
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.013689572 -0500 3/foo
22:59:10 ~/T$ touch --date="2016-01-24 22:57:53 -0500" 2/bar
23:00:07 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.000000000 -0500 2/bar
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.000000000 -0500 2/foo
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.000000000 -0500 3/bar
1321176 -rw-r--r-- 4 user user 0 2016-01-24 22:57:53.000000000 -0500 3/foo
23:00:09 ~/T$ rsync -aviH --link-dest=../2 1/ 4/
sending incremental file list
.d..t...... ./

sent 89 bytes  received 19 bytes  216.00 bytes/sec
total size is 0  speedup is 0.00
23:00:36 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 2/bar
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 2/foo
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 3/bar
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 3/foo
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 4/bar
1321176 -rw-r--r-- 6 user user 0 2016-01-24 22:57:53.000000000 -0500 4/foo
23:00:38 ~/T$ ~/rsync-HEAD-20160124-1917GMT/rsync -aviH --link-dest ../2/ 1/ 5/
sending incremental file list
.d..t...... ./

sent 89 bytes  received 19 bytes  216.00 bytes/sec
total size is 0  speedup is 0.00
23:00:56 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 2/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 2/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 3/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 3/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 4/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 4/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 5/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 5/foo
23:00:59 ~/T$ ~/rsync-HEAD-20160124-1917GMT/rsync -@-1 -aviH --link-dest ../2/ 1/ 6/
sending incremental file list
.d..t...... ./
>f..t...... bar
hf+++++++++ foo => bar

sent 136 bytes  received 50 bytes  372.00 bytes/sec
total size is 0  speedup is 0.00
23:01:13 ~/T$ lat */*
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/bar
1321175 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 1/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 2/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 2/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 3/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 3/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 4/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 4/foo
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 5/bar
1321176 -rw-r--r-- 8 user user 0 2016-01-24 22:57:53.000000000 -0500 5/foo
1321177 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 6/bar
1321177 -rw-r--r-- 2 user user 0 2016-01-24 22:57:53.013689572 -0500 6/foo
23:01:16 ~/T$ rsync --version
rsync  version 3.1.0  protocol version 31
Copyright (C) 1996-2013 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.
23:07:25 ~/T$ ~/rsync-HEAD-20160124-1917GMT/rsync --version
rsync  version 3.1.3dev  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, no ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.
23:07:32 ~/T$ cat /etc/issue
Ubuntu 14.04.3 LTS \n \l

23:08:21 ~/T$
