|
I want to use rsync with a cloud-based rsync provider to back up a large (1TB) dataset off-site. The dataset consists of 32 million+ files spread across 300 directories, so the number of files in any one directory can be quite large (upwards of 2 million).
Rsync doesn't seem to cope with this well: even a local copy of a directory with several thousand files takes a long time to start transferring anything. I thought that with version 3, rsync was supposed to start transferring before fully testing all of the files in a directory?
I am using version 3.0.9 under Cygwin. Is there a command line switch I am supposed to use to force rsync to start transferring more quickly? Any insight / suggestions would be most appreciated.
Thank you.
-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html |
|
On Thu 19 Jul 2012, Cary Lewis wrote:
> Rsync doesn't seem to cope with this well - even doing local copies in a
> directory with several thousands of files takes a long time to initiate any
> transferring.
>
> I thought that with version 3, rsync was supposed to start transferring
> before fully testing all of the files in a directory?

Maybe it's busy testing a lot of files before finding any that need transferring, so you don't see anything happen in the meantime.

Paul
|
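Paul's suggestion can be illustrated with a small sketch. It assumes a size-and-modification-time comparison in the spirit of rsync's default quick check; the function names and the demo file layout are made up for illustration and are not rsync's own code. The point is only that every unchanged file must be examined before the first changed one turns up, so nothing appears to happen for a while.

```python
import os
import tempfile


def needs_transfer(src_path, dst_path):
    """Compare size and mtime only (an assumed stand-in for a quick check)."""
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # missing on the destination, so it must be copied
    src = os.stat(src_path)
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)


def first_changed(src_dir, dst_dir):
    """Return (name, files_checked) for the first file that needs copying."""
    checked = 0
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue
        checked += 1
        if needs_transfer(src_path, os.path.join(dst_dir, name)):
            return name, checked
    return None, checked


def demo(unchanged=200):
    """Build two hypothetical directories that differ by one late-sorting file."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(unchanged):
            for base in (src, dst):
                path = os.path.join(base, f"file{i:03d}.dat")
                with open(path, "w") as handle:
                    handle.write("x")
                # Give both copies the same timestamp so they compare equal.
                os.utime(path, (1_000_000_000, 1_000_000_000))
        with open(os.path.join(src, "zzz-new.dat"), "w") as handle:
            handle.write("new")
        return first_changed(src, dst)


if __name__ == "__main__":
    name, checked = demo()
    print(f"first file to transfer: {name} (after checking {checked} files)")
```

With 2 million files per directory instead of 200, the same silent stretch would be correspondingly longer, and each check costs at least one stat() call on each side.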
|
On Thu, Jul 19, 2012 at 01:51:43PM -0400, Cary Lewis wrote:
> I want to use rsync with a cloud based rsync provider to do off-site
> backing up of a large (1TB) dataset which consists of 32 million+ files
> spread out in 300 directories. So the amount of files in any one directory
> can be quite large (upwards of 2 million).

You realize that stat() is a costly operation, especially if the inodes are cache-cold, and even more so if something else stresses the IO and VM subsystems on the box.

On a moderately loaded box, recursively stat()ing 3 million files occasionally took 90 minutes or more. Doing the same once the inodes are cache-hot takes the same box, under the same overall stress, 30 to 90 *seconds*.

Holding 3 million dentries and inodes cache-hot requires (on that box, anyway) ~5 gigabytes of slab memory (of 128 GB available...).

So if you want to regularly stat 32 million files recursively (and that's what rsync needs to do), you had better add more RAM, much more RAM, to your box.

Also, you mention Cygwin. IIRC, by default it will still treat file names as case-*in*sensitive, so you get really bad (maybe O(N^2)?) behaviour when walking large directories. There was some setting, which I do not remember right now, to tell rsync and/or Cygwin to treat file names as case-sensitive, which can seriously improve behaviour with large directories.

> Rsync doesn't seem to cope with this well - even doing local copies in a
> directory with several thousands of files takes a long time to initiate any
> transferring.

I'm speculating here, but I thought the file list generation is still per subdirectory, so rsync would need to scan the current subdirectory fully before starting to work on the resulting partial file list.

> I thought that with version 3, rsync was supposed to start transferring
> before fully testing all of the files in a directory?
>
> I am using version 3.0.9 under Cygwin.
>
> Is there a command line switch I am supposed to use to force rsync to start
> transferring more quickly?
>
> Any insight / suggestions would be most appreciated.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
|
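To get a feel for the cache-cold versus cache-hot difference Lars describes, one rough approach is to time a recursive stat pass over the tree twice in a row. The sketch below is only a measuring aid under stated assumptions: `stat_tree` and `timed_pass` are hypothetical helper names, the path in the usage example is a placeholder, and the numbers will depend entirely on the machine, the filesystem, and how much memory is free for caching.

```python
import os
import time


def stat_tree(root):
    """Stat every entry under root without following symlinks; return the count."""
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # On POSIX systems this issues an lstat() per entry; on Windows
                # the result may come from data cached by the directory scan.
                entry.stat(follow_symlinks=False)
                count += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count


def timed_pass(root):
    """Return (entries_visited, seconds_elapsed) for one full pass."""
    start = time.perf_counter()
    count = stat_tree(root)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    root = "."  # placeholder: point this at the dataset being backed up
    for label in ("first pass", "second pass"):
        count, seconds = timed_pass(root)
        print(f"{label}: {count} entries in {seconds:.2f}s")
```

If the second pass is not much faster than the first, that would suggest the inode and dentry caches cannot hold the whole tree, which is the situation where adding RAM is expected to help.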
|
Thanks so much for the info. It does appear that rsync scans the entire subdirectory before doing anything, which seems pretty inefficient; perhaps this will be improved in a future release. Then again, maybe it has to work this way so that the --delete options can work?
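The reasoning behind the --delete question can be sketched as a set difference. This is not rsync's implementation, only an illustration with a made-up helper name and made-up file names: deciding which names on the receiving side are extraneous requires the sender's complete listing for that directory, because a partial listing makes files that simply have not been scanned yet look like deletion candidates.

```python
def extraneous_names(sender_names, receiver_names):
    """Names present on the receiver but absent from the sender's listing."""
    return sorted(set(receiver_names) - set(sender_names))


if __name__ == "__main__":
    sender = ["a", "b", "c", "d"]  # hypothetical full listing of one directory
    receiver = ["a", "c", "x"]     # hypothetical contents on the backup side

    # With the complete listing, only "x" is extraneous.
    print(extraneous_names(sender, receiver))

    # With only the first half of the listing, "c" is wrongly flagged as well.
    print(extraneous_names(sender[:2], receiver))
```

Under that assumption, a full scan per directory is the minimum needed for safe deletion, though it says nothing about whether transfers of already-listed files could begin earlier.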
On Thu, Jul 26, 2012 at 4:42 PM, Lars Ellenberg <[hidden email]> wrote:
|
