rsync: "-c" option clarification

rsync: "-c" option clarification

Samba - rsync mailing list

Hi


I am using "rsync" to send files from a source machine to a remote machine as one typically does.  I would like to clarify that the "-c" option will cause the checksum on the receiving end to be created by reading the already written file and NOT the data stream on the receiving end.  This would help in catching disk I/O errors if the checksum is done on the file on disk.

I understand that if the size (or date?) doesn't match, the checksum is not needed on the receiving end.

I may be missing something but it wasn't entirely clear to me that the checksum is done based on the file on disk.

Thanks,
-Steve


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync: "-c" option clarification

Samba - rsync mailing list
The -c option causes rsync to checksum EVERY file on both ends BEFORE
rsync does anything else.  It checksums files that are on only 1 end.
It checksums files that are different sizes.  It will not catch a
hardware problem preventing rsync from writing a file correctly.
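
One way to picture the difference between rsync's default quick check and -c is the minimal Python sketch below. It is a simplified model under stated assumptions, not rsync's actual code: the names file_digest and needs_transfer are hypothetical, SHA-256 stands in for whichever whole-file checksum the rsync build in use picks, and both files are assumed to be local paths. The rsync man page describes the receiving side as checksumming files whose size matches the sender's, so the model treats a size mismatch as settling the question without hashing.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash the file's current contents as read back from the filesystem."""
    h = hashlib.sha256()  # assumption: stand-in for rsync's whole-file checksum
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src, dst, checksum=False):
    """Simplified model of the 'has this file changed?' decision."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # a size mismatch already selects the file
    if checksum:
        # -c: ignore mtime and compare the contents of same-sized files
        return file_digest(src) != file_digest(dst)
    # default quick check: sizes match, so compare modification times
    return int(s.st_mtime) != int(d.st_mtime)
```

In this model, two files with equal size and mtime but different bytes are skipped by the quick check and selected only when checksum=True. Note that the comparison reads files that already exist before the transfer; it says nothing about whether the bytes written during the current run landed correctly.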

On 03/23/2017 03:12 PM, steven banville via rsync wrote:

>
> Hi
>
>
> I am using "rsync" to send files from a source machine to a remote
> machine as one typically does.  I would like to clarify that the "-c"
> option will cause the checksum on the receiving end to be created by
> reading the already written file and NOT the data stream on the
> receiving end.  This would help in catching disk I/O errors if the
> checksum is done on the file on disk.
>
> I understand that if the size (or date?) doesn't match, the checksum is
> not needed on the receiving end.
>
> I may be missing something but it wasn't entirely clear to me that the
> checksum is done based on the file on disk.
>
> Thanks,
> -Steve
>
>
>
--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
        Kevin Korb Phone:    (407) 252-6853
        Systems Administrator Internet:
        FutureQuest, Inc. [hidden email]  (work)
        Orlando, Florida [hidden email] (personal)
        Web page: http://www.sanitarium.net/
        PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,



Re: rsync: "-c" option clarification

Samba - rsync mailing list
Before anyone yells at me, yes, you can use rsync's --checksum to detect
(and fix) files that are incorrect despite having correct timestamps and
sizes.  This would mean that a previous rsync run had been corrupted, not
the current one.  But it is important to note that this would only be
reported to you if you also use --itemize-changes and know what to look
for (a file with a c but not an s or a t).
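
The "c but not an s or a t" pattern can be picked out of --itemize-changes output mechanically. The Python sketch below assumes the flag layout the rsync man page documents as YXcstpoguax (update type, file type, then checksum, size, and time flags in the third, fourth, and fifth positions). The function name suspicious_updates and the sample lines are hypothetical illustrations, not captured output.

```python
def suspicious_updates(itemized_lines):
    """Return paths whose checksum differed while size and mtime matched."""
    hits = []
    for line in itemized_lines:
        # The flag string and the path are separated by the first space.
        flags, _, path = line.partition(" ")
        if len(flags) < 5:
            continue
        is_file_transfer = flags[0] in "<>" and flags[1] == "f"
        if is_file_transfer and flags[2] == "c" and flags[3] == "." and flags[4] == ".":
            hits.append(path)
    return hits


sample = [
    ">f+++++++++ new-file.bin",              # hypothetical: newly created file
    ">f.st...... changed-normally.txt",      # size and mtime differ
    ">fc........ same-size-same-mtime.dat",  # only the checksum differs
    ".d..t...... some-dir/",                 # directory timestamp change
]
print(suspicious_updates(sample))  # ['same-size-same-mtime.dat']
```

A hit from this filter is only a candidate for corruption; a file rewritten with the same size and a restored mtime would produce the same flags.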

It is also worth noting that single-file compression tools (like gzip)
automatically carry over the original mtime when compressing or
decompressing.  If you decompress and then recompress such a file, you can
end up with a file whose mtime and size match but whose checksum does not,
because of gzip's metadata, even though the uncompressed result is
identical.  I would not consider this a case worth updating the remote
copy for, but I am sure someone will disagree.
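
The gzip case can be reproduced in a few lines with Python's standard gzip module. This is only an illustration: the header timestamp values 1 and 2 are an assumed stand-in for whatever metadata actually differs between two compression runs, and the payload is made up.

```python
import gzip

payload = b"raw instrument data\n" * 1000

# Assumption: two runs that differ only in the timestamp stored in the header.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

same_size = len(first) == len(second)
same_bytes = first == second
same_content = gzip.decompress(first) == gzip.decompress(second)

print(same_size, same_bytes, same_content)  # True False True
```

Two such files placed on either end with identical mtimes would pass a size-and-time comparison and still be flagged by a content checksum, which is the false positive described above.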

On 03/23/2017 03:49 PM, Kevin Korb via rsync wrote:

> The -c option causes rsync to checksum EVERY file on both ends BEFORE
> rsync does anything else.  It checksums files that are on only 1 end.
> It checksums files that are different sizes.  It will not catch a
> hardware problem preventing rsync from writing a file correctly.

RE: rsync: "-c" option clarification

Samba - rsync mailing list

Hi

This is a very delayed response, but thanks very much for your answer; it is appreciated.

It seems that if you run rsync a second or subsequent time with "-c" (--checksum), say for data that has not changed, it would have to generate checksums from the files on disk at both ends, even if the size and timestamps are the same.  Is this not the case?  If it is, then it would seem we would be catching a disk write error.
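
A second, checksum-only pass of the kind described here could be assembled as in the Python sketch below. It only builds the command line and does not run it. The flags are rsync's documented long options; adding --dry-run so the pass reports without changing anything is an assumption about what is wanted, and the function name and both paths are hypothetical.

```python
import shlex


def verification_command(src, dst):
    """Build (not run) an rsync command for a checksum-only comparison pass."""
    return [
        "rsync",
        "--archive",
        "--checksum",         # compare file contents, not just size and mtime
        "--itemize-changes",  # print one flag string per differing item
        "--dry-run",          # assumption: report only, change nothing
        src,
        dst,
    ]


# Hypothetical paths, for illustration only.
cmd = verification_command("/data/raw/", "backup-host:/srv/backup/raw/")
print(shlex.join(cmd))
```

Passing the list to subprocess.run would execute it; any item reported for data that should be unchanged is then worth a closer look.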

In the past I have experienced issues with hardware writes failing (network or disk).  Although rare, this is a concern for some critical data, and it is what prompted this question.  I don't need this high level of fidelity for most data, just a small subset.

The use case is:
  * Create raw data
  * Move to backup location very reliably.
  * Delete original data set.
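
For a single file on locally mounted storage, the three steps above can be sketched in Python as follows. This is an assumption-laden illustration, not an rsync feature: the names sha256_of and verified_move are hypothetical, SHA-256 is an arbitrary choice of digest, a POSIX system is assumed, and the re-read of the copy may be satisfied from the operating system's cache rather than from the disk itself.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_move(src, dst):
    """Copy src to dst, re-read the copy, and delete src only if it matches."""
    expected = sha256_of(src)
    shutil.copy2(src, dst)
    with open(dst, "rb") as f:
        os.fsync(f.fileno())  # ask the OS to push the copy to stable storage
    if sha256_of(dst) != expected:
        raise OSError(f"verification failed for {dst}; keeping {src}")
    os.remove(src)  # the original goes away only after the copy checks out
    return expected
```

The ordering is the point of the design: the source is removed last, so a failed comparison leaves the original data in place.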

Thanks again.

Steven Banville
Cirina
201 Gateway Boulevard, Floor 1
South San Francisco, CA 94080-7019
http://cirina.com/

This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

________________________________________
From: rsync [[hidden email]] on behalf of Kevin Korb via rsync [[hidden email]]
Sent: Thursday, March 23, 2017 1:10 PM
To: [hidden email]
Subject: Re: rsync: "-c" option clarification

Before anyone yells at me, yes, you can use rsync's --checksum to detect
(and fix) files that are incorrect despite having correct timestamps and
sizes.  This would mean that a previous rsync run had been corrupted, not
the current one.  But it is important to note that this would only be
reported to you if you also use --itemize-changes and know what to look
for (a file with a c but not an s or a t).

It is also worth noting that single-file compression tools (like gzip)
automatically carry over the original mtime when compressing or
decompressing.  If you decompress and then recompress such a file, you can
end up with a file whose mtime and size match but whose checksum does not,
because of gzip's metadata, even though the uncompressed result is
identical.  I would not consider this a case worth updating the remote
copy for, but I am sure someone will disagree.



Re: rsync: "-c" option clarification

Samba - rsync mailing list
inline...

On 05/19/2017 06:07 PM, steven banville wrote:
>
> Hi
>
> This is a very delayed response, but thanks very much for your answer; it is appreciated.
>
> It seems that if you run rsync a second or subsequent time with "-c" (--checksum), say for data that has not changed, it would have to generate checksums from the files on disk at both ends, even if the size and timestamps are the same.  Is this not the case?  If it is, then it would seem we would be catching a disk write error.

Yes, it checks every file even if the timestamps match.  It even
checksums the files that only exist on one end!

This does not necessarily detect disk errors unless you flush your cache
between runs.  It also wouldn't report catching corruption without
--itemize-changes and your interpretation of that output.  Even then
there can be false positives (gzip and similar will backdate a file when
you compress/decompress even though the compressed version can be
different).
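
On the cache point: one way to make it more likely that a verification read comes from the disk is to flush the file and ask the kernel to drop its cached pages before hashing. The Python sketch below does that with os.posix_fadvise, which is available on Linux and some other Unix systems. The function name is hypothetical and the request is advisory only, so this is a best-effort illustration rather than a guarantee that the data is read from the physical medium.

```python
import hashlib
import os


def digest_after_cache_hint(path, chunk_size=1 << 20):
    """Hash a file after asking the kernel to drop its cached pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be dropped, so flush them first
        if hasattr(os, "posix_fadvise"):
            # Advisory only: the kernel is free to ignore the request.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        h = hashlib.sha256()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            h.update(chunk)
        return h.hexdigest()
    finally:
        os.close(fd)
```

For a remote destination the same idea would have to run on the receiving machine, since only that kernel holds the cache in question.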

> In the past I have experienced issues with hardware writes failing (network or disk).  Although rare, this is a concern for some critical data, and it is what prompted this question.  I don't need this high level of fidelity for most data, just a small subset.
>
> The use case is:
>   * Create raw data
>   * Move to backup location very reliably.
>   * Delete original data set.

The only time I have seen this kind of problem was when there was bad
RAM being used as disk cache.  The solution there is ECC RAM.
