Can rsync assume that the destination directory is empty ?

Arnaud Aujon Chevallier
Hello,

I'm currently using rsync to back up up to 1 TB of relatively small
files (mostly hundreds of kilobytes each).

My backup strategy is to take a full backup and then back up the
differences every day, hard-linking unchanged files to the previous
backup. This means that each time I run rsync, the destination
directory is empty.

Using strace, I can see that rsync calls lstat to check whether each
file already exists in my destination directory. Is there an option to
tell rsync that the destination directory is empty?
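
To make the check concrete: against an empty destination, every such lookup fails because the path does not exist. The following is only a minimal Python sketch of that pattern, not rsync's own code; the helper name and the file names are made up for illustration.

```python
import errno
import os
import tempfile


def count_missing(dest_dir, names):
    """Count how many of `names` do not exist in dest_dir, using lstat()."""
    missing = 0
    for name in names:
        try:
            os.lstat(os.path.join(dest_dir, name))
        except FileNotFoundError as exc:
            # lstat() fails with ENOENT when the path does not exist.
            assert exc.errno == errno.ENOENT
            missing += 1
    return missing


# Hypothetical file list; the destination is a fresh, empty directory.
names = ["a.txt", "b.txt", "c.txt"]
with tempfile.TemporaryDirectory() as empty_dest:
    missing_in_empty = count_missing(empty_dest, names)

print(missing_in_empty)
```

Each lookup here costs one failed system call, which is the kind of call strace reports in its error count.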

Do you think that avoiding this call could improve rsync's performance
in this specific case?

I tried reading the source code, but I'm not exactly sure where this
lstat call happens.

Thanks a lot,

Arnaud Aujon Chevallier


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Can rsync assume that the destination directory is empty ?

Kevin Korb
There isn't an option for that and it isn't actually required that the
target directory be empty (just a good idea).  Plus it has to do the
stat calls on the other end anyway so I doubt there would be much
performance benefit.

Maybe --ignore-times would cause it to not look but I kinda doubt it and
I am too tired to do an strace right now ;)

On 06/09/2016 05:32 AM, Arnaud Aujon Chevallier wrote:

> [...]
> Is there an option to tell rsync that the destination directory is empty ?
> [...]
--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
        Kevin Korb Phone:    (407) 252-6853
        Systems Administrator Internet:
        FutureQuest, Inc. [hidden email]  (work)
        Orlando, Florida [hidden email] (personal)
        Web page: http://www.sanitarium.net/
        PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,



Re: Can rsync assume that the destination directory is empty ?

Kevin Korb
Actually, don't do --ignore-times.  Even if it did prevent the stat
calls it would also tell rsync to not care about matching files in the
--link-dest dir which would be very bad.
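
For reference, a daily snapshot of the kind discussed here might be invoked roughly as sketched below. This is an assumption about the setup, since the actual command line was not posted: the paths and the helper name are hypothetical, and only the `-a` and `--link-dest` options are taken as given. The command is built but not executed, and it deliberately leaves out `--ignore-times`.

```python
import shlex


def build_backup_command(source, previous_backup, new_backup):
    """Build an rsync argv for a hard-linked daily snapshot (not executed here)."""
    return [
        "rsync",
        "-a",                              # archive mode: recurse, keep metadata
        f"--link-dest={previous_backup}",  # hard-link files unchanged since that backup
        source.rstrip("/") + "/",          # trailing slash: copy the directory's contents
        new_backup,                        # fresh (empty) destination for today's run
    ]


# Hypothetical paths, for illustration only.
cmd = build_backup_command("/data", "/backups/2016-06-08", "/backups/2016-06-09")
print(shlex.join(cmd))
```

An absolute path is used for `--link-dest` because a relative one is interpreted relative to the destination directory.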

On 06/09/2016 06:27 AM, Kevin Korb wrote:

> Maybe --ignore-times would cause it to not look but I kinda doubt it and
> I am too tired to do an strace right now ;)

Re: Can rsync assume that the destination directory is empty ?

Arnaud Aujon Chevallier

Thanks for your answer,

I ran some more tests, and they show that the lstat calls account for only 3.7% of the total time.

So we could avoid about a third of them (the calls counted in the errors column), which would save about 1%; not very interesting :)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  3.74    1.792339           1   2088744    693051 lstat
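
The "about a third" and "about 1%" figures can be checked from the strace summary. The short Python calculation below uses only the numbers in the table; the variable names are illustrative, and it assumes a failed lstat costs about as much as a successful one.

```python
# Figures copied from the strace summary above.
lstat_share_percent = 3.74   # "% time" column for lstat
lstat_calls = 2_088_744      # "calls" column
lstat_errors = 693_051       # "errors" column: lstat calls that failed

failed_fraction = lstat_errors / lstat_calls
# Assumption: a failed lstat costs about the same as a successful one.
avoidable_percent = lstat_share_percent * failed_fraction

print(f"failed lstat calls: {failed_fraction:.1%}")
print(f"avoidable share of time: {avoidable_percent:.2f}%")
```

Under that assumption, roughly 33% of the lstat calls fail, and skipping them would remove a little over 1% of the measured time.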


On 09/06/2016 at 12:29, Kevin Korb wrote:
> Actually, don't do --ignore-times.  Even if it did prevent the stat
> calls it would also tell rsync to not care about matching files in the
> --link-dest dir which would be very bad.


Re: Can rsync assume that the destination directory is empty ?

Simon Hobson-2

On 9 Jun 2016, at 11:35, Arnaud Aujon Chevallier <[hidden email]> wrote:

> I ran some more tests, and they show that the lstat calls account for only 3.7% of the total time.
>
> So we could avoid about a third of them (the calls counted in the errors column), which would save about 1%; not very interesting :)
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   3.74    1.792339           1   2088744    693051 lstat

Is that wall-clock time or CPU time?
AIUI, rsync is optimised to work over slow or high-latency links and performs operations in parallel.
Thus, while it may be doing checks it doesn't need to, these don't necessarily add to the total time taken, which will be dominated by data transfer. Obviously this depends on various factors, particularly the link speed and the performance of the two systems. If the stat operations happen in parallel with the data transfer, they may not affect the overall time at all.
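
The overlap argument can be illustrated with a toy Python sketch. It says nothing about rsync's internals: the two sleeping threads merely stand in for a data transfer and a batch of checks, and the durations are made up.

```python
import threading
import time


def simulated_work(seconds):
    """Stand-in for I/O-bound work (e.g. a transfer or a batch of stat calls)."""
    time.sleep(seconds)


transfer_s, checks_s = 0.3, 0.2   # made-up durations

start = time.perf_counter()       # wall-clock timer
cpu_start = time.process_time()   # CPU time used by this process

threads = [
    threading.Thread(target=simulated_work, args=(duration,))
    for duration in (transfer_s, checks_s)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

wall_elapsed = time.perf_counter() - start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.2f}s, CPU: {cpu_elapsed:.2f}s")
```

When the two tasks overlap, the wall-clock time is close to the longer task rather than the sum of both, and the CPU time is far smaller than either, which is why it matters which kind of time a profile reports.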