Yet another filter question

Christoph Biedl
Hello,

Since the very first day I've been using rsync - some 15 years ago -
the filtering rules have caused great grief. Their behaviour is just not
the way I'd expect it to be, or the way I read the manpage. Usually I end
up with some hand-written recipes, carefully documented, including all
the gotchas.

This time however I failed and I see no other way than to ask for
advice.

Given the following structure

        project/
            .rsync-filter
            project.git/
                .git/

Now, the following command (rsync 3.1.1)

    rsync -av --del --delete-excluded -F \
        /path/to/project/ /path/to/backup/project/

should transfer

- sync: project/project.git/.git/
- skip: Everything else in project/project.git/
- sync: Everything else in project/

In other words: Don't transfer the git repo except for .git/ itself.
Yes, this means files not checked in are lost, that's by intention.

Now, how is .rsync-filter supposed to look? The first of many failing
attempts was

    + *.git/.git/*
    - *.git/

This however kills the entire project.git/ directory, in violation of
"the first matching pattern is acted on". Given previous bad
experiences I've tried (using a script) all 128 combinations of

- line order
- '+' or '-' at the start of any line
- '*' appended to any line, or not, with or without a trailing slash

but to no avail.

Any clue?

    Christoph

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Yet another filter question

Kevin Korb
Try:
+ *.git/
+ *.git/.git/***
- *.git/*

(you probably want to add --prune-empty-dirs)

Also, when debugging filters/excludes use -vv as it will tell you which
pattern is causing it to do what.

Also, there is no point in using -v without --itemize-changes.
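
To see why these three rules behave as intended, here is a minimal Python sketch that models the matching on a made-up tree. It is not rsync's implementation: the helpers `to_regex`, `matches`, `verdict` and `transfer`, and all file names, are assumptions for illustration, and only the pattern forms used in this thread are modelled (`*`, `***`, leading and trailing slashes).

```python
import re

def to_regex(glob):
    # A single '*' stops at '/', while '**' and '***' may cross it.
    # '?' and '[...]' are not modelled in this sketch.
    pieces = re.split(r"(\*+)", glob)
    return "".join(
        (".*" if len(p) > 1 else "[^/]*") if p.startswith("*") else re.escape(p)
        for p in pieces
    )

def matches(pattern, path, is_dir):
    if pattern.endswith("/"):              # trailing slash: directories only
        if not is_dir:
            return False
        pattern = pattern[:-1]
    parts = path.split("/")
    if pattern.startswith("/"):            # anchored at the transfer root
        pattern, texts = pattern[1:], [path]
    elif "**" in pattern:                  # may start matching after any slash
        texts = ["/".join(parts[i:]) for i in range(len(parts))]
    else:                                  # compare the last N path components
        n = pattern.count("/") + 1
        texts = ["/".join(parts[-n:])]
    if is_dir and pattern.endswith("/***"):
        texts = [t + "/" for t in texts]   # 'dir/***' also matches 'dir' itself
    regex = to_regex(pattern)
    return any(re.fullmatch(regex, t) for t in texts)

def verdict(rules, path, is_dir):
    for sign, pattern in rules:            # the first matching rule decides
        if matches(pattern, path, is_dir):
            return sign
    return "+"                             # unmatched paths are included

def transfer(rules, tree, prefix=""):
    sent = []
    for name, children in sorted(tree.items()):
        path, is_dir = prefix + name, children is not None
        if verdict(rules, path, is_dir) == "-":
            continue                       # excluded: never descended into
        sent.append(path + ("/" if is_dir else ""))
        if is_dir:
            sent += transfer(rules, children, path + "/")
    return sent

# Hypothetical tree; None marks a file, a dict marks a directory.
TREE = {
    "README": None,
    "project.git": {
        ".git": {"HEAD": None, "objects": {"pack.idx": None}},
        "main.c": None,
        "src": {"util.c": None},
    },
}
KEVIN = [("+", "*.git/"), ("+", "*.git/.git/***"), ("-", "*.git/*")]

for entry in transfer(KEVIN, TREE):
    print(entry)
```

In this model the first rule keeps project.git/ (and .git/, since `*` may match an empty string) from being pruned, the second keeps everything below .git/, and the third drops main.c and src/.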

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
        Kevin Korb Phone:    (407) 252-6853
        Systems Administrator Internet:
        FutureQuest, Inc. [hidden email]  (work)
        Orlando, Florida [hidden email] (personal)
        Web page: http://www.sanitarium.net/
        PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,



Re: Yet another filter question

Wayne Davison-2
In reply to this post by Christoph Biedl

On Tue, May 3, 2016 at 1:07 PM, Christoph Biedl <[hidden email]> wrote:

> + *.git/.git/*
> - *.git/

From the man page near the start of the "INCLUDE/EXCLUDE PATTERN RULES" section:

> Note that, when using the --recursive (-r) option (which is implied by
> -a), every subcomponent of every path is visited from the top down, so
> include/exclude patterns get applied recursively to each subcomponent’s
> full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and
> "/foo/bar" must not be excluded). The exclude patterns actually
> short-circuit the directory traversal stage when rsync finds the files
> to send.

Thus, your latter exclude prevents the first include from ever seeing any data that it could match. You'd need to use something more like this:

+ *.git/.git/
- *.git/*
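
The short-circuit that the man page describes can be sketched in a few lines of Python. This is only a toy model of the original two-rule attempt, assuming a made-up tree and a hypothetical `walk` helper: the include `+ *.git/.git/*` cannot match the bare name project.git, so the exclude `- *.git/` is the first rule that matches the directory, and nothing below it is ever tested.

```python
from fnmatch import fnmatchcase

# Hypothetical tree; None marks a file, a dict marks a directory.
TREE = {
    "README": None,
    "project.git": {".git": {"HEAD": None}, "main.c": None},
}

def walk(tree, excluded_dir_glob, prefix=""):
    """Return every path that gets tested against the rules, top-down."""
    tested = []
    for name, children in sorted(tree.items()):
        path = prefix + name
        tested.append(path)
        is_dir = children is not None
        if is_dir and fnmatchcase(name, excluded_dir_glob):
            continue  # excluded directory: its contents are never visited
        if is_dir:
            tested += walk(children, excluded_dir_glob, path + "/")
    return tested

tested = walk(TREE, "*.git")  # models only the rule "- *.git/"
print(tested)
print("project.git/.git/HEAD" in tested)
```

Since project.git/.git/HEAD never shows up in the list of tested paths, no include rule for it can take effect, whatever its position in the filter file.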


..wayne..


Re: Yet another filter question

Kevin Korb
I hate to say anything remotely negative to Wayne but...

That wording from the man page makes almost no sense without the
examples directly after it (and I have read it many times and know what
it is saying).

When I go all RTFM on this topic I usually tell them to 'man rsync',
search for file-will-not-be-found and start reading from that line.
Once you understand the broken and correct examples you can decipher
what the explanation above means.




Re: Yet another filter question

Karl O. Pinc
On Wed, 4 May 2016 21:09:44 -0400
Kevin Korb <[hidden email]> wrote:

> That wording from the man page makes almost no sense without the
> examples directly after it (and I have read it many times and know
> what it is saying).

Makes sense to me.  The only thing I'd change is to use "in a depth
first fashion" instead of "from the top down", "depth first" (vs.
"breadth first") being the standard idiom when talking about
this tree traversal strategy.  It also matches up with some
of the vocabulary (e.g. "deeper") used later in the paragraph.

Er, ok.  It could also say "component" instead of "subcomponent",
in the first instance.  And "to the full name of each node in the
filesystem's tree" instead of "to each subcomponent's full name",
in the second instance.  And eliminate "the subcomponents" in the
third instance.

Patch appended. (Which does not perfectly match my comments
above.)

> On 05/04/2016 09:03 PM, Wayne Davison wrote:

> > From the man page near the start of the "INCLUDE/EXCLUDE PATTERN
> > RULES" section:
> >
> >     /Note that, when using the --recursive (-r) option (which is
> > implied by -a), every subcomponent of every path is visited from
> > the top down, so include/exclude patterns get applied recursively
> > to each subcomponent’s full name (e.g. to include "/foo/bar/baz" the
> >     subcomponents "/foo" and "/foo/bar" must not be excluded). The
> >     exclude patterns actually short-circuit the directory traversal
> >     stage when rsync finds the files to send./



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


diff --git a/rsync.yo b/rsync.yo
index 0ec5e55..3f742ab 100644
--- a/rsync.yo
+++ b/rsync.yo
@@ -2830,14 +2830,16 @@ itemization(
 )
 
 Note that, when using the bf(--recursive) (bf(-r)) option (which is implied by
-bf(-a)), every subcomponent of every path is visited from the top down, so
-include/exclude patterns get applied recursively to each subcomponent's
-full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and
+bf(-a)), every component of every pathname is visited
+in a depth-first fashion.  In this way
+include/exclude patterns are applied recursively to the full name of each
+node in the filesystem's tree.
+(E.g. to include "/foo/bar/baz", "/foo" and
 "/foo/bar" must not be excluded).
-The exclude patterns actually short-circuit the directory traversal stage
+The exclude patterns short-circuit the directory traversal stage
 when rsync finds the files to send.  If a pattern excludes a particular
-parent directory, it can render a deeper include pattern ineffectual
-because rsync did not descend through that excluded section of the
+parent directory this can render a deeper include pattern ineffectual
+because rsync does not descend through the excluded section of the
 hierarchy.  This is particularly important when using a trailing '*' rule.
 For instance, this won't work:
 




Re: Yet another filter question

Karl O. Pinc
In reply to this post by Karl O. Pinc
On Wed, 4 May 2016 21:26:26 -0500
"Karl O. Pinc" <[hidden email]> wrote:

> On Wed, 4 May 2016 21:09:44 -0400
> Kevin Korb <[hidden email]> wrote:
>
> > That wording from the man page makes almost no sense without the
> > examples directly after it (and I have read it many times and know
> > what it is saying).  

> Patch appended. (Which does not perfectly match my comments
> above.)

2nd version of the patch appended. (I also line-wrapped and filled.
This makes it harder to see the exact details of the difference,
but there are enough changes now that it does not really matter.
Use wdiff for an exact comparison.)

The wording now contains some redundancy.  Or maybe
the redundancy that was there is now more apparent.  I don't think
this is necessarily bad when explaining something that people
find confusing.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/rsync.yo b/rsync.yo
index 0ec5e55..166b58e 100644
--- a/rsync.yo
+++ b/rsync.yo
@@ -2829,17 +2829,19 @@ itemization(
   version 2.6.7.
 )
 
-Note that, when using the bf(--recursive) (bf(-r)) option (which is implied by
-bf(-a)), every subcomponent of every path is visited from the top down, so
-include/exclude patterns get applied recursively to each subcomponent's
-full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and
-"/foo/bar" must not be excluded).
-The exclude patterns actually short-circuit the directory traversal stage
-when rsync finds the files to send.  If a pattern excludes a particular
-parent directory, it can render a deeper include pattern ineffectual
-because rsync did not descend through that excluded section of the
-hierarchy.  This is particularly important when using a trailing '*' rule.
-For instance, this won't work:
+Note that, when using the bf(--recursive) (bf(-r)) option (which is
+implied by bf(-a)), every component of every pathname is visited left
+to right; directories are examined before their content.  In this way
+include/exclude patterns are applied recursively to the full pathname
+of each node in the filesystem's tree.  The exclude patterns
+short-circuit the directory traversal stage as rsync finds the files
+to send.  (E.g. to include "/foo/bar/baz", "/foo" and "/foo/bar" must
+not be excluded.  Excluding either prevents examination of their
+content.)  If a pattern excludes a particular parent directory this
+will render a deeper include pattern ineffectual because rsync does
+not descend through the excluded section of the hierarchy.  This is
+particularly important when using a trailing '*' rule.  For instance,
+this won't work:
 
 quote(
 tt(+ /some/path/this-file-will-not-be-found)nl()

Re: Yet another filter question

Christoph Biedl
In reply to this post by Wayne Davison-2
Wayne Davison wrote...

> On Tue, May 3, 2016 at 1:07 PM, Christoph Biedl <[hidden email]> wrote:
>
> >   + *.git/.git/*
> >   - *.git/
> >
>
> From the man page near the start of the "INCLUDE/EXCLUDE PATTERN RULES"
> section:
>
> > Note that, when using the --recursive (-r) option (which is implied by
> > -a), every subcomponent of every path is visited from the top down, so
> > include/exclude patterns get applied recursively to each subcomponent’s
> > full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and
> > "/foo/bar" must not be excluded). The exclude patterns actually
> > short-circuit the directory traversal stage when rsync finds the files to
> > send.

Yeah, things like this are the reason why I got the impression rsync's
filter behaviour is acting strange. It certainly makes perfect sense
once you've understood in great detail how it works - until then you're
just confused.

> Thus, your latter exclude prevents the first include from ever seeing any
> data that it could match. You'd need to use something more like this:
>
> + *.git/.git/
> - *.git/*

No, that one kills files inside *.git/.git/, too.

Kevin's suggestion however does the trick. Thanks!

Another working solution:

+ /*.git/.git/
- /*.git/*
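
A rough Python sketch of why the leading slash seems to matter here. It assumes, as I read the man page, that an unanchored pattern is matched against the end of the pathname, so `*.git/*` can also match the tail `.git/HEAD` (with `*` matching an empty string). The helpers `glob_to_regex`, `matches` and `verdict` and the sample paths are made up for illustration and are not rsync's code.

```python
import re

def glob_to_regex(glob):
    # Only '*' is modelled; it matches any run of characters except '/'.
    return "[^/]*".join(re.escape(piece) for piece in glob.split("*"))

def matches(pattern, path, is_dir):
    if pattern.endswith("/"):          # trailing slash: directories only
        if not is_dir:
            return False
        pattern = pattern[:-1]
    if pattern.startswith("/"):        # anchored: compare the whole path
        pattern, text = pattern[1:], path
    else:                              # unanchored: compare the last N components
        n = pattern.count("/") + 1
        text = "/".join(path.split("/")[-n:])
    return re.fullmatch(glob_to_regex(pattern), text) is not None

def verdict(rules, path, is_dir=False):
    for sign, pattern in rules:        # the first matching rule decides
        if matches(pattern, path, is_dir):
            return sign
    return "+"                         # unmatched paths are included

UNANCHORED = [("+", "*.git/.git/"), ("-", "*.git/*")]
ANCHORED = [("+", "/*.git/.git/"), ("-", "/*.git/*")]

for path in ("project.git/main.c", "project.git/.git/HEAD"):
    print(path, verdict(UNANCHORED, path), verdict(ANCHORED, path))
```

In this model both rule sets drop project.git/main.c, but only the unanchored set also drops project.git/.git/HEAD. The anchored variant only covers *.git directories directly below the directory holding .rsync-filter.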

    Christoph
