[PATCH] Consider nanoseconds when quick-checking for unchanged files

[PATCH] Consider nanoseconds when quick-checking for unchanged files

Ingo Brückl
On systems using nanoseconds, differences should be taken into consideration.

--- a/generator.c 2014-06-14 01:05:08.000000000 +0200
+++ b/generator.c 2014-12-25 11:19:54.000000000 +0100
@@ -588,7 +588,13 @@
         if (ignore_times)
                 return 0;

-       return cmp_time(st->st_mtime, file->modtime) == 0;
+       return cmp_time(st->st_mtime, file->modtime) == 0
+#ifdef ST_MTIME_NSEC
+              ? NSEC_BUMP(file) ? (uint32)st->ST_MTIME_NSEC == F_MOD_NSEC(file)
+                                : 1
+              : 0
+#endif
+       ;
 }


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Ingo Brückl
Hi,

I obviously didn't think of --modify-window, so in order to not behave
erratically it should be at least:

--- a/generator.c 2014-06-14 01:05:08.000000000 +0200
+++ b/generator.c 2015-01-02 15:50:30.000000000 +0100
@@ -588,7 +588,14 @@
         if (ignore_times)
                 return 0;

-       return cmp_time(st->st_mtime, file->modtime) == 0;
+       return cmp_time(st->st_mtime, file->modtime) == 0
+#ifdef ST_MTIME_NSEC
+              ? st->st_mtime == file->modtime
+                && NSEC_BUMP(file) ? (uint32)st->ST_MTIME_NSEC == F_MOD_NSEC(file)
+                                   : 1
+              : 0
+#endif
+       ;
 }



Most probably, the check should be part of cmp_time(), but I can't foresee
all the possible consequences, and I don't know whether such a change would
have a chance of being accepted and hence is worth the effort.

Ingo
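
Because && binds more tightly than ?:, the nested conditional in the revised
patch reads as "(seconds equal && NSEC_BUMP) ? compare nanoseconds : 1". A
minimal standalone sketch of that decision with explicit branches follows. The
names within_window(), quick_check_mtime() and has_nsec are hypothetical
stand-ins for cmp_time() == 0, the patched function, and NSEC_BUMP(file); this
is an illustration of the logic, not rsync code.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for "cmp_time(a, b) == 0": true when the two
 * second values differ by no more than the --modify-window tolerance. */
int within_window(time_t a, time_t b, time_t window)
{
	assert(window >= 0);
	return a > b ? a - b <= window : b - a <= window;
}

/* Returns 1 when the mtimes count as unchanged, 0 otherwise.
 * has_nsec plays the role of NSEC_BUMP(file) (assumption: nonzero when
 * the file-list entry carries a nanosecond value). */
int quick_check_mtime(time_t st_sec, uint32_t st_nsec,
		      time_t file_sec, uint32_t file_nsec,
		      int has_nsec, time_t window)
{
	if (!within_window(st_sec, file_sec, window))
		return 0;
	/* Nanoseconds are compared only when the seconds match exactly and
	 * the entry has a nanosecond value; otherwise the window decides. */
	if (st_sec == file_sec && has_nsec)
		return st_nsec == file_nsec;
	return 1;
}
```

With a window of 1, two times that are one second apart still count as
unchanged regardless of their nanosecond parts, which is the --modify-window
behaviour the second version of the patch is meant to keep.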

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Ingo Brückl
I wrote on Fri, 02 Jan 2015 16:02:27 +0100:

> --- a/generator.c       2014-06-14 01:05:08.000000000 +0200
> +++ b/generator.c       2015-01-02 15:50:30.000000000 +0100
> @@ -588,7 +588,14 @@
>         if (ignore_times)
>                 return 0;

> -       return cmp_time(st->st_mtime, file->modtime) == 0;
> +       return cmp_time(st->st_mtime, file->modtime) == 0
> +#ifdef ST_MTIME_NSEC
> +              ? st->st_mtime == file->modtime
> +                && NSEC_BUMP(file) ? (uint32)st->ST_MTIME_NSEC == F_MOD_NSEC(file)
> +                                   : 1
> +              : 0
> +#endif
> +       ;
>  }

Ping?

Unfortunately, there weren't any comments yet.

Ingo

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Andrey Gursky
On Wed, 20 Jan 2016 12:58:51 +0100
Ingo Brückl <[hidden email]> wrote:

> I wrote on Fri, 02 Jan 2015 16:02:27 +0100:
>
> > --- a/generator.c       2014-06-14 01:05:08.000000000 +0200
> > +++ b/generator.c       2015-01-02 15:50:30.000000000 +0100
> > @@ -588,7 +588,14 @@
> >         if (ignore_times)
> >                 return 0;
>
> > -       return cmp_time(st->st_mtime, file->modtime) == 0;
> > +       return cmp_time(st->st_mtime, file->modtime) == 0
> > +#ifdef ST_MTIME_NSEC
> > +              ? st->st_mtime == file->modtime
> > +                && NSEC_BUMP(file) ? (uint32)st->ST_MTIME_NSEC == F_MOD_NSEC(file)
> > +                                   : 1
> > +              : 0
> > +#endif
> > +       ;
> >  }
>
> Ping?
>
> Unfortunately, there weren't any comments yet.

Ingo,

I was just about to implement the same, since nanoseconds are taken
into account when transferring, thus making it obvious not to ignore
them when comparing.

However, I believe the cmp_time() function should be extended and a
few more code adjustments would be needed, which... you also already
addressed in a previous mail.

Regards,
Andrey

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Paul Slootman-5
On Wed 20 Jan 2016, Andrey Gursky wrote:
>
> I was just about to implement the same, since nanoseconds are taken
> into account when transferring, thus making it obvious not to ignore

Really? I thought the protocol only transmits seconds.


Paul

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Andrey Gursky
On Wed, 20 Jan 2016 16:17:57 +0100
Paul Slootman <[hidden email]> wrote:

> On Wed 20 Jan 2016, Andrey Gursky wrote:
> >
> > I was just about to implement the same, since nanoseconds are taken
> > into account when transferring, thus making it obvious not to ignore
>
> Really? I thought the protocol only transmits seconds.

07.09.2009 "Add support for transferring & setting nsec time values."

--
Andrey

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Wayne Davison-2
In reply to this post by Ingo Brückl

On Thu, Dec 25, 2014 at 2:48 AM, Ingo Brückl <[hidden email]> wrote:
> On systems using nanoseconds differences should be taken into consideration.

The problem is that if you transfer from a filesystem that has nanoseconds to one that does not support it, rsync would consider most of the files to be constantly different, since the nanosecond values would only match if the source file happened to have 0 nanoseconds. So, the logic has to be improved to somehow detect such a case and treat the truncated values as equal. One possible improvement would be to skip the nanosecond check if the destination file has a nanosecond value of 0.  That could possibly be improved if we figure out if a particular device ID supports nanoseconds somehow.  I have a potential heuristic in mind that I can code up and see how it works.

..wayne..
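
The first improvement mentioned above (skip the nanosecond check when the
destination's nanosecond value is 0) can be sketched as follows. This is a
hypothetical standalone function, not rsync code: mtime_matches() and its
parameters are invented for illustration, and --modify-window is left out to
keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000u

/* Returns 1 when the destination mtime is considered equal to the source
 * mtime, 0 otherwise. */
int mtime_matches(time_t src_sec, uint32_t src_nsec,
		  time_t dst_sec, uint32_t dst_nsec)
{
	assert(src_nsec < NSEC_PER_SEC && dst_nsec < NSEC_PER_SEC);
	if (src_sec != dst_sec)
		return 0;
	/* A zero on the destination may be a truncated value, so it is
	 * treated as matching any source nanosecond part. */
	if (dst_nsec == 0)
		return 1;
	return src_nsec == dst_nsec;
}
```

The cost of this rule is visible in the second branch: when the destination
shows 0 nanoseconds, a source change within the same second is not detected by
the timestamp alone.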

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Paul Slootman-5
On Wed 20 Jan 2016, Wayne Davison wrote:

> equal. One possible improvement would be to skip the nanosecond check if
> the destination file has a nanosecond value of 0.  That could possibly be
> improved if we figure out if a particular device ID supports nanoseconds
> somehow.  I have a potential heuristic in mind that I can code up and see
> how it works.

It would be very handy if that could be extended to take into account
FAT filesystems, i.e. automagically set --modify-window=2 if it's
detected that you're writing to a FAT filesystem. It's basically the
same problem as detecting that you're writing to a filesystem that
doesn't support nanoseconds IMHO.


Paul

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

f-rsync
In reply to this post by Wayne Davison-2
    > Date: Wed, 20 Jan 2016 23:04:20 -0800
    > From: Wayne Davison <[hidden email]>

    > On Thu, Dec 25, 2014 at 2:48 AM, Ingo Brückl <[hidden email]> wrote:

    > > On systems using nanoseconds differences should be taken into
    > > consideration.

    > The problem is that if you transfer from a filesystem that has nanoseconds
    > to one that does not support it, rsync would consider most of the files to
    > be constantly different, since the nanosecond values would only match if
    > the source file happened to have 0 nanoseconds. So, the logic has to be
    > improved to somehow detect such a case and treat the truncated values as
    > equal. One possible improvement would be to skip the nanosecond check if
    > the destination file has a nanosecond value of 0.  That could possibly be
    > improved if we figure out if a particular device ID supports nanoseconds
    > somehow.  I have a potential heuristic in mind that I can code up and see
    > how it works.

Here's one idea, and note an important issue with ns times and --link-dest:

(a) For each end, see if any of the files being considered already
have nonzero nanosecond parts.  If so, then that end of the transfer
supports nanosecond timing.

(b) If the sending filesystem appears to have nonzero ns parts, and
the receiving filesystem appears to have all-zero ns parts (including
any directories under consideration), the receiver may still support
ns times, but have been synchronized from a filesystem that didn't.
We don't want to perpetuate that on the -next- sync, however, so we
can't just disallow ns times on the receiver, or we'll never try them
again.

(c) In case (b) above, therefore, if any file to be transmitted has a
ns time, transfer it and then immediately check the received file's
timestamp.  If its ns time is still zero, then the receiving
filesystem doesn't support it, so disable ns times during the
transfer.  If it's nonzero, then enable.  (I am eliding the pipelining
that happens during an actual rsync; that may have to be dealt with
somehow.)  Also, check the directory mod time, and see if -that's- now
nonzero; you have a very small chance of it being zero if ns times are
supported, and you can check for being in or near that window.  And
the first time it's nonzero in this filesystem, you know it'll work
for everything else in this fs.*

* "This filesystem" assumes either that you can detect mountpoints,
or that the heuristic should be applied per-directory, and that no
directory has a single-file mountpoint that doesn't support it, etc.
I assume rsync must already have some sort of logic like this for
dealing with xattr support per-fs, etc.  If this is flaky to do,
then you might need --[no]ns-timing switches to force rsync to do
the right thing without complaining on every single file if it guesses
wrong.
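
Steps (a) through (c) amount to a small per-filesystem state machine. A rough
sketch, ignoring pipelining and the directory-mtime check, might look like the
following. All names here (ns_support, ns_observe_existing,
ns_observe_written) are hypothetical and only illustrate the idea; nothing
like this is claimed to exist in rsync.

```c
#include <assert.h>
#include <stdint.h>

enum ns_support { NS_UNKNOWN, NS_SUPPORTED, NS_UNSUPPORTED };

/* Step (a): a nonzero nanosecond part on a file that already exists on
 * the receiver proves that the receiving filesystem stores them. */
enum ns_support ns_observe_existing(enum ns_support state, uint32_t dst_nsec)
{
	if (dst_nsec != 0)
		return NS_SUPPORTED;
	return state;	/* all-zero so far: step (b), still undecided */
}

/* Step (c): after writing a file whose source time had sent_nsec,
 * re-read the timestamp and decide from what came back. */
enum ns_support ns_observe_written(enum ns_support state,
				   uint32_t sent_nsec, uint32_t readback_nsec)
{
	assert(sent_nsec < 1000000000u);
	/* Already decided, or the source had 0 ns so the write proves nothing. */
	if (state != NS_UNKNOWN || sent_nsec == 0)
		return state;
	return readback_nsec != 0 ? NS_SUPPORTED : NS_UNSUPPORTED;
}
```

One such state value would be kept per filesystem (or per directory, as the
footnote suggests) and consulted before comparing nanoseconds.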

I don't know if the rsync protocol is flexible enough to dynamically
enable or disable this capability partway through a transfer.  If it
isn't, then there's an even more hackish approach, which is to add,
and unconditionally attempt to honor, a --ns-times-valid sort of
switch.  Users can then use the heuristics above in a dummy transfer
to know whether to set that switch for the real transfer.  (Or they
may know out-of-band that their FS supports ns times.)  But I'd think
such a switch and workaround should be last resorts.

I would really like to see ns times supported.  I use dirvish to back
up filesystems, which uses rsync, and if I ever have to restore any
files from that (which I do more often due to accidentally deleting or
bashing a file than due to media failure), I lose the ns timestamps,
and they're sometimes extremely valuable forensically when I'm trying
to debug something else.  Having them be 0 when I thought they shouldn't
be has more than once cost me time until I realized that I'm looking
at files that were rsync'ed from another host (either to duplicate
a setup, or from a backup) and rsync didn't preserve the ns times.

Unfortunately, of course, if rsync gets fixed now, it -will- consider
every single backed-up file in my dirvish vaults to be "new" and will
insanely bloat the vault (doubling its size) the next time it runs,
and then I'll have to tell faster-dupemerge how to re-merge all that
stuff, too.  (After all, even if the file contents haven't changed,
its metadata has, so --link-dest is required to create a new copy
of the file rather than hardlink to one with a different timestamp.)

What I'd -really- like is for some sane interaction with --link-dest
as well (which probably requires another switch, alas), which
basically says "a change from ns-0 to ns-other with no other changes
to the file is considered the same file---update the timestamp to the
new ns time but don't break the hardlink", with a way of forcing that
off for people who aren't in my situation and do care about such a
change.  Failing that, I'd need to do something like (a) run a backup
in non-ns mode by force, then (b) immediately re-run the backup in
ns-mode -on the same output directories-, e.g., -not- using
--link-dest to create new dirvish vaults.  This should get the times
resynchronized without breaking all the hardlinks to the previous
backups.  (I suspect that this would force ns times into files dozens
of generations back in the vaults, since those hardlinks would all
share metadata, but that's okay and in fact desirable.)
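
The proposed --link-dest rule can be written down as a three-way decision. The
sketch below is only an illustration under stated assumptions: the enum, the
function name link_dest_action() and the strict parameter (standing in for the
"way of forcing that off") are all hypothetical, and the file contents and all
other metadata are assumed to have been found identical already.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

enum link_action {
	LINK_AS_IS,		/* timestamps identical: hardlink */
	LINK_AND_FIX_TIME,	/* ns-0 to ns-other: hardlink, update the time */
	COPY_NEW		/* genuinely different: new copy */
};

/* ref_* is the timestamp of the candidate file in the --link-dest
 * directory, src_* the timestamp on the sending side. */
enum link_action link_dest_action(time_t src_sec, uint32_t src_nsec,
				  time_t ref_sec, uint32_t ref_nsec,
				  int strict)
{
	assert(src_nsec < 1000000000u && ref_nsec < 1000000000u);
	if (src_sec != ref_sec)
		return COPY_NEW;
	if (src_nsec == ref_nsec)
		return LINK_AS_IS;
	/* Only the direction 0 -> nonzero is forgiven, and only when the
	 * user has not asked for strict comparison. */
	if (ref_nsec == 0 && !strict)
		return LINK_AND_FIX_TIME;
	return COPY_NEW;
}
```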

Note that this change in rsync behavior would thus appear to need a
pretty big warning in the changelog and new-version announcements
warning people that those who use --link-dest (which I assume means
by-hand, via dirvish, and via rsnapshot, at least) need to make some
sort of workaround (TBD) so as not to have their backups suddenly
explode in both time and space.  I -still- think I'd like to see
ns times in rsync, despite this caveat---the longer it's delayed, the
worse the situation gets.  (A coordinated change in the most-popular
tools that use --link-dest to implement a workaround or at least
warn the user also seems wise; otherwise, those who upgrade their
OS and get a new version of rsync that way, without reading release
notes, may be surprised.  Which means such tools need a way of knowing
which rsync implements ns times, presumably by adding it to the
"Capabilities" output of --version or something.  Unless, of course,
the ns-0-to-ns-other-means-same-file-for-link-dest is the default,
which I think is what I'd recommend, as long as there's a way to
turn that off and it's well-documented.)

P.S. The current situation also means that faster-dupemerge can't use
that information, either, because I can't trust it to be correct
across hosts in such situations.  [I made a version of f-d that
respected ns times, only to abandon it when I realized that rsync
wasn't preserving them!]  I merge -across- vaults with f-d to catch
files that are the same on multiple backed-up hosts, or to catch
pushing a file from one host to another and deleting it from the
original host, or to merge identical files on same host in the backup
even if they aren't merged on the host being backed up.

[Paul Slootman's request for FAT filesystems would be a generalization
of this sort of strategy, although I'd think that in that case it's a
lot more obvious to the user invoking rsync that the fs is FAT.]

Re: [PATCH] Consider nanoseconds when quick-checking for unchanged files

Karl O. Pinc
In reply to this post by Wayne Davison-2
On Wed, 20 Jan 2016 23:04:20 -0800
Wayne Davison <[hidden email]> wrote:

>
> The problem is that if you transfer from a filesystem that has
> nanoseconds to one that does not support it, rsync would consider
> most of the files to be constantly different, since the nanosecond
> values would only match if the source file happened to have 0
> nanoseconds. So, the logic has to be improved to somehow detect such
> a case and treat the truncated values as equal. One possible
> improvement would be to skip the nanosecond check if the destination
> file has a nanosecond value of 0.  That could possibly be improved if
> we figure out if a particular device ID supports nanoseconds
> somehow.

Seems to me that nanoseconds are the sort of thing that could cause
sysadmins crazy headaches.   My thought is to have a declaration
in the rsync configuration file (that can be overridden on
the command line).    Something like "--nanosecond".
It'd have the following values:

ignore   : Ignore nanoseconds.  (default)

update   : Ignore nanoseconds, but update destination timestamps
           when nanoseconds differ.

heuristic: Check nanoseconds with Wayne's spiffy heuristic.

check    : Check nanoseconds.

When there is a conflict between the conf files of the 2 endpoints the
topmost of the above options has priority.  (When no configuration
is specified on at least one endpoint there is no conflict.)

To provide control over conflict management you could have another
option, say, --nanosecond-force, to force your endpoint's choice.  If
both ends force then the later of ignore, update, heuristic, check has
priority.
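
One possible reading of these precedence rules, written out as code. Everything
here is hypothetical (the option does not exist in rsync): the enum order
mirrors the list above, NSEC_UNSET marks an endpoint with no configuration, and
the force flags stand for the suggested --nanosecond-force.

```c
#include <assert.h>

/* Values in the order they are listed in the proposal, topmost first. */
enum nsec_mode {
	NSEC_UNSET = -1,
	NSEC_IGNORE,
	NSEC_UPDATE,
	NSEC_HEURISTIC,
	NSEC_CHECK
};

/* Resolve the effective mode from the two endpoints' settings. */
enum nsec_mode nsec_resolve(enum nsec_mode a, int a_force,
			    enum nsec_mode b, int b_force)
{
	assert(a >= NSEC_UNSET && a <= NSEC_CHECK);
	assert(b >= NSEC_UNSET && b <= NSEC_CHECK);

	/* Nothing configured anywhere: the backward-compatible default. */
	if (a == NSEC_UNSET && b == NSEC_UNSET)
		return NSEC_IGNORE;
	/* Only one side configured: no conflict. */
	if (a == NSEC_UNSET)
		return b;
	if (b == NSEC_UNSET)
		return a;
	/* Exactly one side forces: that side's choice is used. */
	if (a_force && !b_force)
		return a;
	if (b_force && !a_force)
		return b;
	/* Both force: the later option in the list has priority. */
	if (a_force && b_force)
		return a > b ? a : b;
	/* Neither forces: the topmost option has priority. */
	return a < b ? a : b;
}
```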

I don't know how this would work with the existing rsync protocol.
Perhaps it'd be easier to have only the destination end's config
matter, although this does not provide a lot of flexibility from the
command line.  The motivation is to be able to keep things simple,
or as simple as they can be.  Already my ideas seem overly complicated.
Perhaps someone can improve them.

It makes some sense to be able to configure nanosecond related
behavior on a per-directory (i.e., mountpoint) basis, as a substitute
for knowing about every possible filesystem type and being able to
detect file system type.  But this introduces yet more complication.

The default is backward compatible.  Distros could set their
own default in the rsync.conf file they install.

---

Maybe the thing to do is to give up on runtime complication and
just do testing on the destination filesystem when rsync initializes.
This would be on by default but could be turned off by command line.

Since the problem is with destination filesystems that don't support
nanoseconds, and destination filesystems are by definition written to,
then test for nanosecond support once when rsync starts.  Write a file
with non-zero nanoseconds and read it back and see if the nanoseconds
are zero.  Then delete the test file.  Easy if the destination is
empty, harder if there's an existing directory hierarchy.  (But rsync
can already tell if it's crossing a filesystem boundary so....)
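
The write-and-read-back test could be sketched like this on a POSIX.1-2008
system. probe_nsec_support() and the probe file name are hypothetical, and the
sketch assumes that the value reported by fstat() reflects what the filesystem
actually stores, which may not hold for every filesystem (cached metadata can
differ).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if files in dir keep a nonzero nanosecond mtime, 0 if the
 * nanoseconds come back as zero, and -1 on any error. */
int probe_nsec_support(const char *dir)
{
	char path[4096];
	struct timespec times[2];
	struct stat st;
	int fd, n, result = -1;

	assert(dir != NULL);
	n = snprintf(path, sizeof path, "%s/.nsec-probe-XXXXXX", dir);
	if (n < 0 || n >= (int)sizeof path)
		return -1;
	fd = mkstemp(path);	/* creates a uniquely named test file */
	if (fd < 0)
		return -1;

	/* Set atime and mtime to a value with a nonzero nanosecond part. */
	times[0].tv_sec = times[1].tv_sec = 1000000000;
	times[0].tv_nsec = times[1].tv_nsec = 123456789;
	if (futimens(fd, times) == 0 && fstat(fd, &st) == 0)
		result = st.st_mtim.tv_nsec != 0;

	close(fd);
	unlink(path);		/* delete the test file again */
	return result;
}
```

Creating and deleting the probe file changes the mtime of dir itself, so a real
implementation would have to save and restore it or accept the side effect.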

Trouble is that unconditionally writing a file would affect directory
timestamps.  If you wait until you know you're writing to a directory
then the approach here is starting to sound suspiciously like Wayne's
heuristic....

Hope these thoughts are helpful to somebody.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
