[Bug 12570] New: Problems with --checksum --existing

[Bug 12570] New: Problems with --checksum --existing

samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=12570

            Bug ID: 12570
           Summary: Problems with --checksum --existing
           Product: rsync
           Version: 3.1.1
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

Problem:

I've got an sd-card with some movies, a few of which are corrupted files.

I want to overwrite only the files on the card that do not match the
known-good copies in "/movies/".

command:
 rsync --checksum --existing -vhriP /movies/ /media/128-SD/Movies/

The problem here is that *all* files in "/movies/" are hashed before anything
else happens. This can be verified with lsof: "lsof +D /movies".

I've got <100GB in "/media/128-SD/Movies/".

I've got >1.5TB in "/movies/", and hashing all of those files is just a huge
waste of time and system resources.

When "--existing" and "--checksum" are both used, the algorithm should first
make a list of candidate files, then start hashing. It should *not* start
hashing everything on the send-side and then figure out which files might be
needed.
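
The ordering proposed above can be sketched outside of rsync as follows. This
is only an illustration of "build the candidate list first, then hash", not
rsync's actual code; the function names are hypothetical, and SHA-256 is an
arbitrary choice of hash for the sketch.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def candidates(src_root, dst_root):
    """Yield relative paths of regular files present under both roots.

    Walking the (small) destination first mimics --existing: a file that
    is missing on the destination is never a candidate, so it is never
    opened or hashed on the source side.
    """
    for dirpath, _dirnames, filenames in os.walk(dst_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), dst_root)
            if os.path.isfile(os.path.join(src_root, rel)):
                yield rel


def files_needing_copy(src_root, dst_root):
    """Hash only the candidates and return those whose content differs."""
    changed = []
    for rel in candidates(src_root, dst_root):
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        # A size mismatch already proves a difference, so hashing is skipped.
        if os.path.getsize(src) != os.path.getsize(dst):
            changed.append(rel)
        elif file_digest(src) != file_digest(dst):
            changed.append(rel)
    return sorted(changed)
```

With the paths from this report, files_needing_copy("/movies/",
"/media/128-SD/Movies/") would read at most the files that also exist on the
card, instead of everything under "/movies/".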

Workaround for me:
 diff -r /movies/ /media/128-SD/Movies/ | grep differ | awk '{print "pv " $3" > "$5}' | sh

nb, that workaround requires "pv" and only works with file-names that do not
contain spaces, but for me it's a quick and easy way to see progress while
files are being copied. "cp" would work fine in place of "pv".
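
For file names that do contain spaces, the same idea can be sketched in Python
rather than through a shell pipeline. This is a rough equivalent under stated
assumptions, not a tested replacement: copy_differing is a hypothetical helper
name, it shows no progress the way "pv" does, and it only looks at files that
already exist on the destination.

```python
import filecmp
import os
import shutil


def copy_differing(src_root, dst_root):
    """Overwrite destination files whose bytes differ from the source copy.

    Returns the sorted list of destination paths that were rewritten.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(dst_root):
        for name in filenames:
            dst = os.path.join(dirpath, name)
            src = os.path.join(src_root, os.path.relpath(dst, dst_root))
            if not os.path.isfile(src):
                # Only on the destination: nothing to compare against.
                continue
            # shallow=False forces a content comparison, like "diff -r".
            if not filecmp.cmp(src, dst, shallow=False):
                shutil.copyfile(src, dst)
                copied.append(dst)
    return sorted(copied)
```

Called as copy_differing("/movies/", "/media/128-SD/Movies/"), it would read
each file on the card and its counterpart in "/movies/", and copy only where
the contents differ.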

On my system, that workaround saved me about 1-2 days of hashing, and completed
in less than an hour.

--
You are receiving this mail because:
You are the QA Contact for the bug.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 12570] Problems with --checksum --existing

samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=12570

--- Comment #1 from Kevin Korb <[hidden email]> ---
Unfortunately, rsync's --checksum is just that dumb.  It checksums *EVERYTHING*
on the source and the target before it does anything else.  Since --checksum is
almost always the wrong thing to do, nobody seems to be willing to add basic
intelligence to it.  Unfortunately, what you are trying to do is one of those
few instances when --checksum is the right thing to use.  So, that is just the
way it works.
