[Bug 12819] New: [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

            Bug ID: 12819
           Summary: [PATCH] sync() on receiving side for data consistency
           Product: rsync
           Version: 3.1.2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

Created attachment 13253
  --> https://bugzilla.samba.org/attachment.cgi?id=13253&action=edit
rsync_sync

Hello,

Here is a patch which calls sync() once the files have been received, for data
consistency.

Thank you !

Ben

--
You are receiving this mail because:
You are the QA Contact for the bug.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #1 from Brian K. White <[hidden email]> ---
This seems wrong to me. If the OS is failing to manage write buffers and file
access between processes, you would have a lot bigger problems in every process
all through the system, and this wouldn't fix it.

Similarly, if rsync were corrupting data, a lot of people would already know
about it. It gets used way too much and too heavily for anything like this to
go unnoticed for more than a day, let alone 15 or more years.

It's almost axiomatic: No matter what problem you think you have, no matter
what language or OS or platform, if you think it's fixed by either sleep() or
sync(), it's not.


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #2 from Ben RUBSON <[hidden email]> ---
Thank you for your feedback, Brian.
I don't have any problem.
I just want to be sure that when the client (sender) has finished its transfer,
its data is on the server's (receiver's) disks before it disconnects, so that a
successful disconnect means the data is definitely on disk.

On disks means on platters, so that if there is a failure (hardware, power...),
data is safe, not lost.

Of course, disks which do not lie about the sync() command must be used (data
must be on platters, not only in the disks' cache), as well as a robust
filesystem, some redundancy... (but that's off-topic here).

Perhaps we could make it an option, so that those whose OS fails to manage
write buffers would not be degraded even more... But they should certainly
look into their performance issue first.
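
For comparison with the system-wide sync() discussed in this thread, a per-file flush can be sketched with fsync(). This is not what the attached patch does; the function name fsync_path is hypothetical, and the sketch assumes a platform (such as Linux) where fsync() is accepted on a descriptor opened read-only.

```c
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to flush one already-written file to stable storage.
 * Hypothetical helper, not part of rsync.
 * Returns 0 on success, -1 if the file cannot be opened or flushed. */
int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);   /* flushes this file only, not the whole system */
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Whether the data then really is "on platters" still depends on the drive honouring the flush, as noted above. A newly created file may also need an fsync() on its parent directory before its directory entry is durable.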


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #3 from Paul Slootman <[hidden email]> ---
How about just using a post-xfer command on the server side that does 'sync'?


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #4 from Ben RUBSON <[hidden email]> ---
Yes Paul, I thought about it, but the sync command may not be available if the
server (receiver) is chrooted (for example using the patch proposed in #12817).


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #5 from Brian K. White <[hidden email]> ---
Any program could make this same "just to be safe" argument practically every
time they ever close-on-write for any reason. If they wrote anything, it was
always for some reason, and they want to know for sure that it really got
safely written. There is nothing special about rsync in that regard. cp might
as well have it. The ">" operator in bash might as well have it.

The kernel and vfs and hardware drivers all already do whatever is necessary in
that regard, and it's generally wrong for any application to try to do it
itself. Otherwise the disk would be in a constant state of sync()'ing and never
actually manage to get any other work done. Consider a multiuser host with 500
rsync receivers. Each individual sync() is incredibly disruptive to all other
processes. "Everyone hold up while we flush the disk buffer...". The entire
system waits while that happens.

That way just leads to things like the example you just used: lower layers that
just start lying about sync() to upper layers because too many apps use it when
they shouldn't. "Fine, if apps are going to sync all the time, that ends up
being 86 times a second between all procs running at any given moment, which is
unsupportable, so we'll just make sync() a no-op stub and we'll do it when
it's actually required, and apps can sync()-away to their hearts' content".

I think the only reason rsync might have to sync is if you built rsync as a
self-contained bootable executable like memtest86, or possibly as an MS-DOS
executable.


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #6 from Brian K. White <[hidden email]> ---
Think of it this way: write() already makes a certain promise that it will not
return until it's done its job, and it will not assert success when it can't.
Essentially the man page for any syscall is a contract. In fact all APIs are
contracts.

write() in turn relies on various other calls to even lower layers to keep
their promises too, to manage the in-kernel buffer or the cache on a raid card
etc.

All of these things MUST be relied on rather than second-guessed. It would be
insane, for example, for write() to say "I can't really be sure this disk driver
has really done its thing. I better force it to sync before I return to the
application." or "I can't really be sure malloc() really allocated the memory,
I better malloc 3 or 4 copies and compare them and use whichever copies agree
with each other..." It's insane.

You write(), you check the return value, and you're done. The low level
hardware is someone else's job, and you won't be doing a better job than they
already did.
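
The "write(), check the return value" step above can be sketched in C as follows. In practice the check usually also has to cope with short writes and with EINTR, so the sketch loops until everything has been accepted. The name write_all is hypothetical and is not taken from rsync's source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying on short writes and EINTR.
 * Hypothetical helper for illustration.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            return -1;          /* real error: report it to the caller */
        }
        p += n;                 /* short write: advance past what was taken */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would treat any -1 as a failed transfer of that file rather than carrying on.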


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #7 from Ben RUBSON <[hidden email]> ---
And what about a power failure between 2 ZFS transaction groups ?

Note that my patch simply adds a sync() just after recv_files(), so one sync()
per connection, not per write operation.
Quite low workload actually :)

But we could make this a rsync option, so that one can enable / disable it on
its own.
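
A minimal sketch of the behaviour described above, one sync() per connection and only when enabled, might look like this. It is not the attached patch: finish_transfer and its parameters are hypothetical, with transfer_rc standing in for the result of the receive step (recv_files() in rsync, whose real signature is not reproduced here) and want_sync standing in for the proposed option.

```c
#include <unistd.h>

/* Called once, after the whole transfer has been received.
 * transfer_rc: hypothetical result of the receive step, 0 meaning success.
 * want_sync:   hypothetical flag for the proposed enable/disable option.
 * Returns transfer_rc unchanged. */
int finish_transfer(int transfer_rc, int want_sync)
{
    if (transfer_rc == 0 && want_sync) {
        /* One system-wide flush per connection. POSIX allows sync() to
         * return before the writes have completed; Linux waits for them. */
        sync();
    }
    return transfer_rc;
}
```

Because sync() returns no status, a caller cannot tell from this function alone whether the flush succeeded.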


Re: [Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
On Thu, 15 Jun 2017 13:23:44 +0000
just subscribed for rsync-qa from bugzilla via rsync
<[hidden email]> wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=12819
>
> --- Comment #7 from Ben RUBSON <[hidden email]> ---

> Note that my patch simply adds a sync() just after recv_files(), so
> one sync() per connection, not per write operation.

> But we could make this a rsync option, so that one can enable /
> disable it on its own.

I think the "right" rsync option to add (because rsync does
not have enough options already ;-) is a --hook-post option.
It would run something (a `sync` in your case) on the
remote end after finishing.  There are clear security issues
here.

Rather than having --hook-post and having to do something
(a server side config option that says what --hook-post
can do?) to address the security concerns it seems much
simpler to improve the rsync documentation regarding running
the rsync server side.

I'm still using command="rsync --server --daemon ." in my
~/.ssh/authorized_keys file on the remote end.  It'd be simple
enough to add, say, a "sync" to the end of this to force a sync
when rsync finishes.  The problem is that the --server (and, especially,
--daemon) documentation has gone away.  Or at least
left the man page. (v3.1.1, Debian 8, Jessie)  Except
for a hint that --server exists at the bottom.
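
The idea of adding a sync to the end of the forced command can be sketched as a small wrapper, here in C. The name run_then_sync is hypothetical; a real setup would build a program around it, pass it the rsync arguments quoted above, and point command= at that program. As written it flushes after every session, whatever the session did.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with the given arguments, wait for it to
 * finish, then call sync(). Hypothetical helper for illustration.
 * Returns the child's exit status (127 if it could not be executed),
 * or -1 if fork/wait failed or the child was killed by a signal. */
int run_then_sync(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: inherits stdin/stdout, so the rsync protocol passes through. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    sync();                     /* flush after the command has finished */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```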

If the server side of rsync was better documented then
perhaps a simple inetd rsync service (or --rsync-path
or -e value, etc.) would be easy for the end-user to
cobble together to meet needs such as this.

Can somebody please explain --server?  (And --sender, I guess.)
I might (possibly) be motivated to send in a man page patch.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


[Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list
https://bugzilla.samba.org/show_bug.cgi?id=12819

--- Comment #8 from Brian K. White <[hidden email]> ---
You tell me, what ABOUT a power failure between 2 zfs, or any other fs
operations?

This does not improve or solve any problem that the fs and all the other layers
aren't already handling. This is simply a misguided idea, however sensible and
attractive it seems.


Re: [Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
In reply to this post by Samba - rsync mailing list

> On 15 Jun 2017, at 19:29, Karl O. Pinc via rsync <[hidden email]> wrote:
>
> On Thu, 15 Jun 2017 13:23:44 +0000
> just subscribed for rsync-qa from bugzilla via rsync
> <[hidden email]> wrote:
>
>> https://bugzilla.samba.org/show_bug.cgi?id=12819
>>
>> --- Comment #7 from Ben RUBSON <[hidden email]> ---
>
>> Note that my patch simply adds a sync() just after recv_files(), so
>> one sync() per connection, not per write operation.
>
>> But we could make this a rsync option, so that one can enable /
>> disable it on its own.
>
> I think the "right" rsync option to add (because rsync does
> not have enough options already ;-) is a --hook-post option.
> It would run something (a `sync` in your case) on the
> remote end after finishing.  There are clear security issues
> here.
>
> Rather than having --hook-post and having to do something
> (a server side config option that says what --hook-post
> can do?) to address the security concerns it seems much
> simpler to improve the rsync documentation regarding running
> the rsync server side.

--daemon (if used) already has a post-xfer option, but as explained in
the bug report, it could be hard to use when the daemon is chrooted.

> I'm still using command="rsync --server --daemon ." in my
> ~/.ssh/authorized_keys file on the remote end.  It'd be simple
> enough to add, say, a "sync" to the end of this to force a sync
> when rsync finishes.

It would however sync() even if the client only read files.

> The problem is that the --server (and, especially,
> --daemon) documentation has gone away.  Or at least
> left the man page. (v3.1.1, Debian 8, Jessie)  Except
> for a hint that --server exists at the bottom.

Are you looking for `man rsyncd.conf` ?

> If the server side of rsync was better documented then
> perhaps a simple inetd rsync service (or --rsync-path
> or -e value, etc.) would be easy for the end-user to
> cobble together to meet needs such as this.
>
> Can somebody please explain --server?  (And --sender, I guess.)
> I might (possibly) be motivated to send in a man page patch.
>
> Regards,
>
> Karl <[hidden email]>

Thank you for your feedback Karl !

Ben



Re: [Bug 12819] [PATCH] sync() on receiving side for data consistency

Samba - rsync mailing list
On Fri, 16 Jun 2017 12:34:40 +0200
Ben RUBSON via rsync <[hidden email]> wrote:

> > On 15 Jun 2017, at 19:29, Karl O. Pinc via rsync
> > <[hidden email]> wrote:

> > The problem is that the --server (and, especially,
> > --daemon) documentation has gone away.  Or at least
> > left the man page. (v3.1.1, Debian 8, Jessie)  Except
> > for a hint that --server exists at the bottom.  
>
> Are you looking for `man rsyncd.conf` ?

No, that tells me what --daemon does; how to run rsync
as a server.  It does not tell me how to invoke rsync at the
remote end manually without doing server-side things
such as the reading of rsyncd.conf.

What I want documented is how to use a customized
transport that does not allow the client side to
send arbitrary commands to the remote end.
The sort of thing done when using
ssh with keys and the command= option within an
authorized_keys file.

As mentioned, now I use command="rsync --server --daemon ."
in my authorized_keys file.
I once figured this out from old rsync man pages, but don't
see how to glean this command sequence from a more recent
man page.

Again, I might (eventually) get around to sending
in a man page patch if somebody explains how it's done.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
