rsync --link-dest won't link even if existing file is out of date

rsync --link-dest won't link even if existing file is out of date

Ken Chase
Feature request: allow --link-dest dir to be linked to even if file exists
in target.

This statement from the man page is adhered to too strongly IMHO:

"This option works best when copying into an empty destination hierarchy, as
rsync treats existing files as definitive (so it never looks in the link-dest
dirs when a destination file already exists)".

I was surprised by this behaviour, as the general aim with rsync is to be
efficient and save space.

When the file is out of date but exists in the --link-dest target, it would
be great if it could be removed and linked. If an option were supplied to
request this behaviour, I'd actually throw some money at making it happen.
(And a further option to retain a copy if inode permissions/ownership would
otherwise be changed.)

Reasoning:

I back up many servers with --link-dest; their filesystems hold 10M+ files
each. I do not delete old backup trees just so rsync can recreate them all
in an empty target dir - deleting a tree takes 60 min or more, and a full
recreation takes 3-5 hrs per backup even though <1% of files change per day.

Instead, I cycle them in with mv $olddate $today and then rsync --del
--link-dest over them - that takes 30-60 min, depending. (Yes, there is some
risk of permission malleability there; I'm mostly interested in contents,
though.) The problem is that if a file exists AT ALL, even out of date, a
new copy is put overtop of it, per the man page decree above.
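
For concreteness, the nightly cycle looks roughly like this (a sketch only -
paths, dates and hostnames are made up):

    # recycle the oldest tree, then update it in place against yesterday's
    olddate=2015-03-01                  # oldest backup, about to be recycled
    yest=$(date -d yesterday +%F)       # GNU date
    today=$(date +%F)

    mv /backups/$olddate /backups/$today
    rsync -a --del --link-dest=/backups/$yest \
        server:/ /backups/$today/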

Thus much more disk space is used. Running this scheme - moving old backups
into place to be written overtop of - accumulates many copies of the exact
same file over time. Running pax -rwl over the copies before rsyncing to
them works (and saves much space!), but takes a very long time, as it
traverses and compares two large backup trees while thrashing the same
device - on the order of 3-5x the rsync's time, i.e. 3-5 hrs for pax.
hardlink(1) is far worse - it ran another 3-5x slower than pax; I suspect a
non-linear algorithm therein.
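
(For reference, a hardlink copy of one tree into another with pax looks
roughly like this - in copy mode, -l makes hard links instead of duplicating
file data; directory names are illustrative:)

    cd /backups/$yest && pax -rwl . /backups/$today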

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
--
Ken Chase - [hidden email] skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

Re: rsync --link-dest won't link even if existing file is out of date

Kevin Korb

Since you are in an environment with millions of files, I highly recommend
that you move to ZFS storage and use ZFS snapshots instead of --link-dest.
It is much more space efficient, much faster at rsync run time, and old
backups can be deleted in seconds.  Rsync doesn't have to understand
anything about ZFS.  You just rsync to the same directory every time and
have ZFS take a snapshot of that directory between runs.

--
Kevin Korb, Systems Administrator
FutureQuest, Inc., Orlando, Florida
Phone: (407) 252-6853
Web page: http://www.sanitarium.net/ (PGP public key available on web site)

Re: rsync --link-dest won't link even if existing file is out of date

Ken Chase
This has been a consideration. But it pains me that a tiny change/addition
to the rsync option set would save so much time and space for other
legitimate use cases.

We know rsync very well; we don't know ZFS very well (licensing kept the
tech out of our Linux-centric operations). We've been using it, but we're
not experts yet.

Thanks for the suggestion.

/kc


Re: rsync --link-dest won't link even if existing file is out of date

Clint Olsen
Not to mention the fact that ZFS requires considerable hardware resources
(CPU and memory) to perform well. It also requires you to learn a whole new
terminology to wrap your head around it.

It's certainly not a trivial swap, to say the least...

Thanks,

-Clint


Re: rsync --link-dest won't link even if existing file is out of date

Kevin Korb
In reply to this post by Ken Chase

It is actually pretty simple...
Instead of mkdir you run: zfs create -o mountpoint=/path/to/directory zfspath
When the rsync run finishes you do: zfs snapshot zfspath@date
When you want to delete an old backup you do: zfs destroy zfspath@date

To list the datasets and their snapshots: zfs list [-t snapshot]
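
A rough sketch of the whole cycle (pool and dataset names are made up):

    # one-time setup: a dataset per production server
    zfs create -o mountpoint=/backups/server1 tank/backups/server1

    # nightly: rsync into the same directory, then snapshot it
    rsync -a --del server1:/ /backups/server1/
    zfs snapshot tank/backups/server1@$(date +%F)

    # expiring an old backup is near-instant
    zfs destroy tank/backups/server1@2015-03-01

    # list what you have
    zfs list -t snapshot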


Re: rsync --link-dest won't link even if existing file is out of date

Kevin Korb
In reply to this post by Clint Olsen

ZFS does have big RAM requirements; 8GB is pretty much the minimum.  As for
CPU, anything new enough to sit on a motherboard with 8GB of RAM should be
fine.


Re: rsync --link-dest won't link even if existing file is out of date

Wayne Davison-2
In reply to this post by Ken Chase

On Sun, Apr 5, 2015 at 10:51 PM, Ken Chase <[hidden email]> wrote:
> Feature request: allow --link-dest dir to be linked to even if file exists
> in target.

From the release notes for 3.1.0:
  • Improved the use of alt-dest options into an existing hierarchy of files:  If a match is found in an alt-dir, it takes precedence over an existing file.  (We'll need to wait for a future version before attribute-changes on otherwise unchanged files are safe when using an existing hierarchy.)
So, storage savings are realized, but note that things like mode changes
affect all the hard-linked files.
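
(A quick way to confirm the linking after upgrading: unchanged files in
consecutive backup trees should share an inode number and show a link count
above 1 - paths here are illustrative:)

    # same inode and a link count > 1 means the trees share the file
    stat -c '%i %h %n' /backups/2015-04-05/etc/passwd /backups/2015-04-06/etc/passwd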

..wayne..


Re: rsync --link-dest won't link even if existing file is out of date

Henri Shustak
In reply to this post by Ken Chase
Hi Ken,

You may wish to take a quick look at LBackup (disclaimer: I am a developer on this project), a wrapper around rsync designed for reliable user-data backups.

LBackup always starts a new backup snapshot with an empty directory. I have been looking at extending the --link-dest options to scan beyond just the previous successful backup, to failed and older backups. However, there are all kinds of edge cases worth considering with such a change. At present LBackup is focused on reliability; as such, this R&D is quite slow given limited resources. The current version of LBackup offers IMHO reliable backups of user data, and the scripting sub-system offers a high degree of flexibility.

Yes, every time you start a backup snapshot, a directory is re-populated from scratch, and this takes time with LBackup. However, if you are seeking reliability then you may wish to check out the following URL: http://www.lbackup.org

If you are looking to speed up performance, then investing in faster hardware, additional file-system caching, or trying various file systems is well worthwhile.

Ideas and patches are welcome to improve the LBackup project.

--------------------------------------------------------------------
This email is protected by LBackup, an open source backup solution
http://www.lbackup.org


Re: rsync --link-dest won't link even if existing file is out of date

Henri Shustak
> I'll take a look but I imagine I can't back up the 80 million files I need
> to in under the 5 hours I have for nightly maintenance/backups. Currently
> it's possible by recycling directories...

To cover that many files in that much time you will require a high-speed
system. Just another thought: perhaps splitting the backup onto multiple
backup servers / storage systems would reduce the backup time so that it
fits into your window?

Also, I strongly agree with the previous posts relating to file-system snapshots. ZFS is just one file system that supports this kind of snapshotting.


Re: rsync --link-dest won't link even if existing file is out of date

Kevin Korb

On 04/14/2015 11:35 PM, Henri Shustak wrote:
>> I'll take a look but I imagine I can't back up the 80 million files
>> I need to in under the 5 hours I have for nightly
>> maintenance/backups. Currently it's possible by recycling
>> directories...

I would expect that recycling directories actually makes this worse.
With an empty target directory you don't even need the overhead of
--delete (not as bad as it used to be thanks to --delete-during, but it
is still overhead).  If your backup window is only 5 hours, that leaves
you with 19 hours a day to do other things on your backup server(s),
such as deleting old backups.  Get all those unlink() calls out of your
backup window.  Bad enough you need to do 80 million calls to stat().

>
> To cover that many files in that much time you will require a high-speed
> system. Just another thought: perhaps splitting the backup onto multiple
> backup servers / storage systems would reduce the backup time so that it
> fits into your window?

Agreed completely here.  It is much easier to make more backup servers
than it is to make one big one that can handle the entire load.  We
divide our backup load by server.  IOW, each backup server has a list
of production servers that it backs up.

> Also, I strongly agree with the previous posts relating to file-system
> snapshots. ZFS is just one file system that supports this kind of
> snapshotting.

I have also attempted to use btrfs in Linux for this.  I even wrote up
a presentation for my local LUG about it:
https://sanitarium.net/golug/rsync+btrfs_backups_2011.html
Unfortunately there was nothing but grief: btrfs just wasn't stable
enough, and the btrfs-cleaner kernel thread drove performance into the
ground.  We eventually had to abandon it in favor of ZFS on TrueOS.

As far as a "fast box" goes, we decided on 8GB of RAM for most of the
backup servers and essentially whatever CPU can handle that much RAM.
Most of them are older AMD Athlon 64 X2 desktops.  We do have one with
a quad-core CPU and 16GB of RAM; that is the only one running ZFS
de-duplication, as that is the big RAM hog.


Re: rsync --link-dest won't link even if existing file is out of date

Ken Chase
80 million calls isn't 'that bad' since it completes in 5 hours, yes? I
suppose I don't mind. I should throw more RAM in the box and figure out how
to tune metadata caching so it's preferred over file data. Then it'd be
quicker.
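
(On Linux, one knob worth trying is vm.vfs_cache_pressure - values below the
default of 100 make the kernel prefer keeping dentry/inode caches over file
data; the value here is only a starting point:)

    sysctl vm.vfs_cache_pressure=50
    # persist across reboots:
    echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf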

Either way, it's working for me now, and in fact, if the backup server is
'slow', then thrashing the production servers isn't as bad. I want them
thrashed just the right amount so things complete in a total of 5 hours,
about 1 hr per server. (--bwlimit doesn't do it, because merely walking 15M+
files per server generates enough metadata-cache disk IO to thrash things.)

More servers? More U and more power in my $expensive facility? Things are
working for me now because I am not recreating 80 million links, which seems
to cost much, much more than 80 million stats (reads vs
reads+writes+head-thrash-between-them?).

My secret is doing the servers sequentially, so I don't get extra head-thrash
on the backup server; run in parallel, things really start slowing down.

Though I'm interested in ZFS dedupe, this is the wrong list for that :)

I'm more curious about how my system is actually NOT working properly for
me, other than using up more disk than I wanted - except that's fixed now in
3.1, apparently (it hasn't made it into Debian stable yet, among other
distros...).

I'm just pax -rwl'ing my old backups manually until they're all using the
same inodes; then I can continue with mv $olddate $today; rsync
--link-dest=$yest host:/dir $today and things will be great. I've already
saved 1TB of 12 over the last 2 weeks by paxing (and in the last week with
the 3.1 rsync), and I expect that to drop by another 2-3TB over the next
month.

/kc

