Patch to add support for advertising FULLSYNC to Mac OSX Clients

classic Classic list List threaded Threaded
39 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson
Good afternoon,

Attached is a patch to add support for advertising FULLSYNC to
Mac OSX clients that request it. This is necessary to support
Time Machine backups over SMB. I opened Bugzilla 12380 for this
and Github pull request #64.

Thanks,
Kevin

0001-vfs_fruit-Add-capability-to-advertise-FULLSYNC.patch (4K) Download Attachment
0002-vfs_fruit-Add-Fix-config-option-to-enable-disable-fu.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson
Good morning,

Just politely requesting if anyone has had a chance to check this patch
out? I am curious if it needs more work or if it is fine as is and can be
merged?

Thanks,
Kevin

On Tue, Oct 25, 2016 at 5:41 PM, Kevin Anderson <[hidden email]>
wrote:

> Good afternoon,
>
> Attached is a patch to add support for advertising FULLSYNC to
> Mac OSX clients that request it. This is necessary to support
> Time Machine backups over SMB. I opened Bugzilla 12380 for this
> and Github pull request #64.
>
> Thanks,
> Kevin
>
Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme-2
Kevin,

On Mon, Oct 31, 2016 at 11:51:49AM -0400, Kevin Anderson wrote:
> Just politely requesting if anyone has had a chance to check this patch
> out? I am curious if it needs more work or if it is fine as is and can be
> merged?

Thanks for the patches and please keep pushing me to get them
integrated. I need to discuss a few technical things with Jeremy
related to this before I can start to do a review. So please stay
tuned... :)

-slow

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme-2
In reply to this post by Kevin Anderson
Hi Kevin,

sorry for taking so long to get to this.

First the boring part, review:

On Mon, Oct 31, 2016 at 11:51:49AM -0400, Kevin Anderson wrote:
> --- a/docs-xml/manpages/vfs_fruit.8.xml
> +++ b/docs-xml/manpages/vfs_fruit.8.xml
> @@ -108,6 +108,27 @@
>    </varlistentry>
>  
>    <varlistentry>
> +    <term>fruit:advertise_fullsync = [ true | false ]</term>

maybe we may want to call a spade a space and name this option
"fruit:time machine". Thoughts?

> +    <listitem>
> +      <para>Controls if the FULLSYNC volume capability is advertised to clients</para>
> +
> +      <itemizedlist>
> + <listitem><para><command>false (default)</command>
> + Disables advertising the FULLSYNC volume capability to clients.
> + </para></listitem>
> +
> + <listitem><para><command>true</command> - Enables advertising
> + the FULLSYNC volume capability to clients. This is necessary for supporting
> + Time Machine backups from Mac OSX clients. This value only advertises the
> + capability and does nothing else with the sync requests from clients.
> + </para></listitem>
> +
> +      </itemizedlist>
> +
> +    </listitem>
> +  </varlistentry>
> +
> +  <varlistentry>
>      <term>fruit:metadata = [ stream | netatalk ]</term>
>      <listitem>
>        <para>Controls where the OS X metadata stream is stored:</para>
> diff --git a/libcli/smb/smb2_create_ctx.h b/libcli/smb/smb2_create_ctx.h
> index cb194f5..1c65e6c 100644
> --- a/libcli/smb/smb2_create_ctx.h
> +++ b/libcli/smb/smb2_create_ctx.h
> @@ -30,7 +30,7 @@
>  
>  /* "AAPL" Server Query request/response bitmap */
>  #define SMB2_CRTCTX_AAPL_SERVER_CAPS 1
> -#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 2
> +#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 6
>  #define SMB2_CRTCTX_AAPL_MODEL_INFO  4
This definitely looks wrong, these are the defines for individual bits
in a bitfield. Why are you changing SMB2_CRTCTX_AAPL_VOLUME_CAPS to 6 ?

>  
> /* "AAPL" Client/Server Capabilities bitmap */
> @@ -42,5 +42,6 @@
>  /* "AAPL" Volume Capabilities bitmap */
>  #define SMB2_CRTCTX_AAPL_SUPPORT_RESOLVE_ID 1
>  #define SMB2_CRTCTX_AAPL_CASE_SENSITIVE     2
> +#define SMB2_CRTCTX_AAPL_FULL_SYNC          4
>  
>  #endif
> diff --git a/source3/modules/vfs_fruit.c b/source3/modules/vfs_fruit.c
> index 6e7899aa..c0101c1 100644
> --- a/source3/modules/vfs_fruit.c
> +++ b/source3/modules/vfs_fruit.c
> @@ -133,6 +133,7 @@ struct fruit_config_data {
>   bool copyfile_enabled;
>   bool veto_appledouble;
>   bool posix_rename;
> + bool advertise_fullsync;
>  
>   /*
>   * Additional options, all enabled by default,
> @@ -1352,6 +1353,9 @@ static int init_fruit_config(vfs_handle_struct *handle)
>   config->use_aapl = lp_parm_bool(
>   -1, FRUIT_PARAM_TYPE_NAME, "aapl", true);
>  
> + config->advertise_fullsync = lp_parm_bool(
> + -1, FRUIT_PARAM_TYPE_NAME, "advertise_fullsync", false);
        config->advertise_fullsync = lp_parm_bool(
                -1, FRUIT_PARAM_TYPE_NAME, "time machine", false);

>   config->unix_info_enabled = lp_parm_bool(
>   -1, FRUIT_PARAM_TYPE_NAME, "nfs_aces", true);
>  
> @@ -1827,6 +1831,7 @@ static NTSTATUS check_aapl(vfs_handle_struct *handle,
>   DATA_BLOB blob = data_blob_talloc(req, NULL, 0);
>   uint64_t req_bitmap, client_caps;
>   uint64_t server_caps = SMB2_CRTCTX_AAPL_UNIX_BASED;
> + uint64_t volume_caps;
>   smb_ucs2_t *model;
>   size_t modellen;
>  
> @@ -1898,7 +1903,11 @@ static NTSTATUS check_aapl(vfs_handle_struct *handle,
>   if (req_bitmap & SMB2_CRTCTX_AAPL_VOLUME_CAPS) {
>   SBVAL(p, 0,
>        lp_case_sensitive(SNUM(handle->conn->tcon->compat)) ?
> -      SMB2_CRTCTX_AAPL_CASE_SENSITIVE : 0);
> +      volume_caps |= SMB2_CRTCTX_AAPL_CASE_SENSITIVE : 0);
> +
> + if (config->advertise_fullsync) {
> + SBVAL(p, 0, volume_caps |= SMB2_CRTCTX_AAPL_FULL_SYNC);
> + }
>   ok = data_blob_append(req, &blob, p, 8);
>   if (!ok) {
>   return NT_STATUS_UNSUCCESSFUL;
>
> From bdd62adfc29f768cca5ecb6fb2d4bad2c6b0cf35 Mon Sep 17 00:00:00 2001
> From: Kevin Anderson <[hidden email]>
> Date: Tue, 25 Oct 2016 16:21:19 -0400
> Subject: [PATCH 2/2] vfs_fruit: Add/Fix config option to enable/disable
>  fullsync
>
> Add a configuration option to disable/enable fullsync
> and to fix mistakenly overwriting values incorrectly.
Cleaned up in the attached patchset.

I've also added code that ensures all prerequisite Samba options are
set on the fly when a Time Machine enabled share is connected.

Now, secondly, the interesting part: have you ever tested if the TM
disk image filesystem survives network disconnects and/or hard server
power offs ?

I've been told that there seems to be an issue in the Linux kernel not
properly flushing buffers to disk in an fsync() resulting in damaged
TM disk image filesytems. This was discovered by folks running tests
with a similar patch.

From hearsay, some storage devices cheet when they get a flush
write-buffer command and ignore it, but the testing was done with a
storage device that was known not to cheet. But still, after power
cycling the server while a TM backup was in progress the TM disk image
filesystem was frequently reported as damaged by the client.

Do we want to put our users at risk of loosing their backups in
situations like this ? Do we want to pretend being a suitable backup
target for something that breaks easily for unknown reasons ?

From past experience of adding Time Machine support to another
filesharing protocol (for AFP/Netatalk), I'm worried that our
mailing lists will be flooded with complaints like this:

<https://www.google.de/search?q=time+machine+must+create+a+new+backup+for+you>

It seems just putting your laptop to sleep or disconnecting from
network while TM is running seems to be the primary cause for this. To
me it's entirely unclear how this relates to fsync implementation bugs,
it might be unrelated.

I'm not saying that I'm really opposed to adding the patch, sooner
then later vendors will stick it in their boxes anyway, but I'd like
to stir up some discussion and raise awareness.

Thoughts?

Cheerio!
-slow

time-machine.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson
On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
Hi Ralph,
   No worries. I understand that people get busy with other things.

>
> maybe we may want to call a spade a space and name this option
> "fruit:time machine". Thoughts?
>

I am fine with this. I contemplated that name as well but wasn't sure
if advertising the FULLSYNC capability had a use case outside of supporting
Time Machine. Either name works for me though.

> > diff --git a/libcli/smb/smb2_create_ctx.h b/libcli/smb/smb2_create_ctx.h
> > index cb194f5..1c65e6c 100644
> > --- a/libcli/smb/smb2_create_ctx.h
> > +++ b/libcli/smb/smb2_create_ctx.h
> > @@ -30,7 +30,7 @@
> >  
> >  /* "AAPL" Server Query request/response bitmap */
> >  #define SMB2_CRTCTX_AAPL_SERVER_CAPS 1
> > -#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 2
> > +#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 6
> >  #define SMB2_CRTCTX_AAPL_MODEL_INFO  4
>
> This definitely looks wrong, these are the defines for individual bits
> in a bitfield. Why are you changing SMB2_CRTCTX_AAPL_VOLUME_CAPS to 6 ?
>

You are correct, that shouldn't be there. I mistakenly thought that needed to
be bumped for the correct response to be sent to clients.

> I've also added code that ensures all prerequisite Samba options are
> set on the fly when a Time Machine enabled share is connected.
>
> Now, secondly, the interesting part: have you ever tested if the TM
> disk image filesystem survives network disconnects and/or hard server
> power offs ?
>

I have been running the provided patch set for the past month and have not
noticed any issues. In that time I have restarted the networking interfaces
on the server I am using while backups are running without any issues being
reported as well as being able to restore from the same backup. With that
being said I have not tested a hard poweroff of the server as it is backed
by an UPS. I will try to test this case and report back.


> I've been told that there seems to be an issue in the Linux kernel not
> properly flushing buffers to disk in an fsync() resulting in damaged
> TM disk image filesytems. This was discovered by folks running tests
> with a similar patch.
>

I am by no means an expert here but I think the success of fsync() may
depend on write barrier support in the underlying file system. I think
in kernels after 2.6.30 and at least ext4, this should be improved
according to these:

https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
https://lwn.net/Articles/283161/

> From hearsay, some storage devices cheet when they get a flush
> write-buffer command and ignore it, but the testing was done with a
> storage device that was known not to cheet. But still, after power
> cycling the server while a TM backup was in progress the TM disk image
> filesystem was frequently reported as damaged by the client.

> Do we want to put our users at risk of loosing their backups in
> situations like this ? Do we want to pretend being a suitable backup
> target for something that breaks easily for unknown reasons ?

I can certainly understand the concern and I think it is valid. Re-reading
the Time Machine spec, the FULLSYNC capability is embedded in a SMB FLUSH
request. Also based one this email thread, Samba FLUSH operations are
asynchronous by default:

https://lists.samba.org/archive/samba/2008-September/143627.html

The asynchronous writes make me curious if this might be leading to
some of the corruption edge cases as well as the case above. Is it possible to
force a fsync() from the VFS layer? Could we add a handler for SMB2 FLUSH
commands that check for a Reserved1 Field set to 0xFFFF and force an fsync()?

> It seems just putting your laptop to sleep or disconnecting from
> network while TM is running seems to be the primary cause for this. To
> me it's entirely unclear how this relates to fsync implementation bugs,
> it might be unrelated.

I'm unclear how or if they are related as well. I could definitely see
potential corruption issues occuring from hard power off's but not necessarily
client disconnects or the client entering sleep mode. In the case of the
client disconnecting, the server should still be able to sync data in the cache
successfully. The client disconnect case I will try to test some more
as well but so far I haven't noticed any issues and I have pretty regularly
put my laptop to sleep while taking backups.

Thanks,
Kevin

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme-2
Hi Kevin,

On Sat, Nov 26, 2016 at 09:40:29PM -0500, Kevin Anderson wrote:

> On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
> Hi Ralph,
>    No worries. I understand that people get busy with other things.
>
> >
> > maybe we may want to call a spade a space and name this option
> > "fruit:time machine". Thoughts?
> >
>
> I am fine with this. I contemplated that name as well but wasn't sure
> if advertising the FULLSYNC capability had a use case outside of supporting
> Time Machine. Either name works for me though.

ok.

> > I've also added code that ensures all prerequisite Samba options are
> > set on the fly when a Time Machine enabled share is connected.
> >
> > Now, secondly, the interesting part: have you ever tested if the TM
> > disk image filesystem survives network disconnects and/or hard server
> > power offs ?
> >
>
> I have been running the provided patch set for the past month and have not
> noticed any issues. In that time I have restarted the networking interfaces
> on the server I am using while backups are running without any issues being
> reported as well as being able to restore from the same backup. With that
> being said I have not tested a hard poweroff of the server as it is backed
> by an UPS. I will try to test this case and report back.

did you run into any issues?

> > I've been told that there seems to be an issue in the Linux kernel not
> > properly flushing buffers to disk in an fsync() resulting in damaged
> > TM disk image filesytems. This was discovered by folks running tests
> > with a similar patch.
> >
>
> I am by no means an expert here but I think the success of fsync() may
> depend on write barrier support in the underlying file system. I think
> in kernels after 2.6.30 and at least ext4, this should be improved
> according to these:
>
> https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
> https://lwn.net/Articles/283161/

That were my findings as well.

> > From hearsay, some storage devices cheet when they get a flush
> > write-buffer command and ignore it, but the testing was done with a
> > storage device that was known not to cheet. But still, after power
> > cycling the server while a TM backup was in progress the TM disk image
> > filesystem was frequently reported as damaged by the client.
>
> > Do we want to put our users at risk of loosing their backups in
> > situations like this ? Do we want to pretend being a suitable backup
> > target for something that breaks easily for unknown reasons ?
>
> I can certainly understand the concern and I think it is valid. Re-reading
> the Time Machine spec, the FULLSYNC capability is embedded in a SMB FLUSH
> request.

yes.

> Also based one this email thread, Samba FLUSH operations are
> asynchronous by default:
>
> https://lists.samba.org/archive/samba/2008-September/143627.html

yes, they are asynchronous *and* they're disabled by default (strict sync =
no), that's why we'ge going to enable it at runtime if fruit:time machine=yes.

> The asynchronous writes make me curious if this might be leading to
> some of the corruption edge cases as well as the case above.

Hm, I guess the time window is small where we responsed to the flush request
while the fsync is still being done in a worker thread, but it's there, so yes,
this could be possible.

> Is it possible to force a fsync() from the VFS layer? Could we add a handler
> for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
> force an fsync()?

Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
it down to the SMB2 flush request handler. Depending on the setting we could the
switch between callinc async flush() or sync.

>
> > It seems just putting your laptop to sleep or disconnecting from
> > network while TM is running seems to be the primary cause for this. To
> > me it's entirely unclear how this relates to fsync implementation bugs,
> > it might be unrelated.
>
> I'm unclear how or if they are related as well. I could definitely see
> potential corruption issues occuring from hard power off's but not necessarily
> client disconnects or the client entering sleep mode. In the case of the
> client disconnecting, the server should still be able to sync data in the cache
> successfully. The client disconnect case I will try to test some more
> as well but so far I haven't noticed any issues and I have pretty regularly
> put my laptop to sleep while taking backups.

ok.

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson
Hi Ralph,
   Sorry for the delay.

On Sat, Jan 21, 2017 at 3:13 AM, Ralph Böhme <[hidden email]> wrote:

>
> Hi Kevin,
>
> On Sat, Nov 26, 2016 at 09:40:29PM -0500, Kevin Anderson wrote:
> > On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
> > Hi Ralph,
> >    No worries. I understand that people get busy with other things.
> >
> > >
> > > maybe we may want to call a spade a space and name this option
> > > "fruit:time machine". Thoughts?
> > >
> >
> > I am fine with this. I contemplated that name as well but wasn't sure
> > if advertising the FULLSYNC capability had a use case outside of supporting
> > Time Machine. Either name works for me though.
>
> ok.
>
> > > I've also added code that ensures all prerequisite Samba options are
> > > set on the fly when a Time Machine enabled share is connected.
> > >
> > > Now, secondly, the interesting part: have you ever tested if the TM
> > > disk image filesystem survives network disconnects and/or hard server
> > > power offs ?
> > >
> >
> > I have been running the provided patch set for the past month and have not
> > noticed any issues. In that time I have restarted the networking interfaces
> > on the server I am using while backups are running without any issues being
> > reported as well as being able to restore from the same backup. With that
> > being said I have not tested a hard poweroff of the server as it is backed
> > by an UPS. I will try to test this case and report back.
>
> did you run into any issues?
>

So far I have not run in to any issues even doing a hard power off a
couple of times.

> > > I've been told that there seems to be an issue in the Linux kernel not
> > > properly flushing buffers to disk in an fsync() resulting in damaged
> > > TM disk image filesytems. This was discovered by folks running tests
> > > with a similar patch.
> > >
> >
> > I am by no means an expert here but I think the success of fsync() may
> > depend on write barrier support in the underlying file system. I think
> > in kernels after 2.6.30 and at least ext4, this should be improved
> > according to these:
> >
> > https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
> > https://lwn.net/Articles/283161/
>
> That were my findings as well.
>
> > > From hearsay, some storage devices cheet when they get a flush
> > > write-buffer command and ignore it, but the testing was done with a
> > > storage device that was known not to cheet. But still, after power
> > > cycling the server while a TM backup was in progress the TM disk image
> > > filesystem was frequently reported as damaged by the client.
> >
> > > Do we want to put our users at risk of loosing their backups in
> > > situations like this ? Do we want to pretend being a suitable backup
> > > target for something that breaks easily for unknown reasons ?
> >
> > I can certainly understand the concern and I think it is valid. Re-reading
> > the Time Machine spec, the FULLSYNC capability is embedded in a SMB FLUSH
> > request.
>
> yes.
>
> > Also based one this email thread, Samba FLUSH operations are
> > asynchronous by default:
> >
> > https://lists.samba.org/archive/samba/2008-September/143627.html
>
> yes, they are asynchronous *and* they're disabled by default (strict sync =
> no), that's why we'ge going to enable it at runtime if fruit:time machine=yes.

OK.

>
> > The asynchronous writes make me curious if this might be leading to
> > some of the corruption edge cases as well as the case above.
>
> Hm, I guess the time window is small where we responsed to the flush request
> while the fsync is still being done in a worker thread, but it's there, so yes,
> this could be possible.
>
> > Is it possible to force a fsync() from the VFS layer? Could we add a handler
> > for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
> > force an fsync()?
>
> Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
> it down to the SMB2 flush request handler. Depending on the setting we could the
> switch between callinc async flush() or sync.

OK. I will look to adding that to the provided patch but it may take
some time as I understand some other parts of the code base. I think
that would provide the necessary balance between data consistency and
performance.

>
> >
> > > It seems just putting your laptop to sleep or disconnecting from
> > > network while TM is running seems to be the primary cause for this. To
> > > me it's entirely unclear how this relates to fsync implementation bugs,
> > > it might be unrelated.
> >
> > I'm unclear how or if they are related as well. I could definitely see
> > potential corruption issues occuring from hard power off's but not necessarily
> > client disconnects or the client entering sleep mode. In the case of the
> > client disconnecting, the server should still be able to sync data in the cache
> > successfully. The client disconnect case I will try to test some more
> > as well but so far I haven't noticed any issues and I have pretty regularly
> > put my laptop to sleep while taking backups.
>
> ok.
>
> Cheerio!
> -slow

-Kevin

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Christoph Hellwig
Coming in late here, but I would suggest that the Samba team
reconsiders implementing the flush commands properly, as in not
being a no-op.  Except for the Apple world with it's odd FULLSYNC
fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
and a major file sever that simply ignores it is a major disaster for
the full storage stack.

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Alexander Bokovoy
On su, 19 helmi 2017, Christoph Hellwig wrote:
> Coming in late here, but I would suggest that the Samba team
> reconsiders implementing the flush commands properly, as in not
> being a no-op.  Except for the Apple world with it's odd FULLSYNC
> fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
> and a major file sever that simply ignores it is a major disaster for
> the full storage stack.
Samba already has vfs_commit module which implements fsync/fdatasync
use. It can be stacked with other modules to achieve what you just
described.

--
/ Alexander Bokovoy

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Christoph Hellwig
On Sun, Feb 19, 2017 at 08:44:47PM +0200, Alexander Bokovoy wrote:

> On su, 19 helmi 2017, Christoph Hellwig wrote:
> > Coming in late here, but I would suggest that the Samba team
> > reconsiders implementing the flush commands properly, as in not
> > being a no-op.  Except for the Apple world with it's odd FULLSYNC
> > fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
> > and a major file sever that simply ignores it is a major disaster for
> > the full storage stack.
> Samba already has vfs_commit module which implements fsync/fdatasync
> use. It can be stacked with other modules to achieve what you just
> described.

I'm worried about doing the right thing out of the box and by default.

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Jeremy Allison
In reply to this post by Christoph Hellwig
On Sun, Feb 19, 2017 at 09:23:54AM -0800, Christoph Hellwig wrote:
> Coming in late here, but I would suggest that the Samba team
> reconsiders implementing the flush commands properly, as in not
> being a no-op.  Except for the Apple world with it's odd FULLSYNC
> fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
> and a major file sever that simply ignores it is a major disaster for
> the full storage stack.

We already implement the flush commands properly.

man smb.conf

/strict sync

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Jeremy Allison
In reply to this post by Christoph Hellwig
On Sun, Feb 19, 2017 at 11:41:13AM -0800, Christoph Hellwig wrote:

> On Sun, Feb 19, 2017 at 08:44:47PM +0200, Alexander Bokovoy wrote:
> > On su, 19 helmi 2017, Christoph Hellwig wrote:
> > > Coming in late here, but I would suggest that the Samba team
> > > reconsiders implementing the flush commands properly, as in not
> > > being a no-op.  Except for the Apple world with it's odd FULLSYNC
> > > fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
> > > and a major file sever that simply ignores it is a major disaster for
> > > the full storage stack.
> > Samba already has vfs_commit module which implements fsync/fdatasync
> > use. It can be stacked with other modules to achieve what you just
> > described.
>
> I'm worried about doing the right thing out of the box and by default.

So really all you're asking for is 'strict sync = yes'
by default. That's doable.

FYI, for paranoid use cases we also have 'sync always'
as well, which makes every write followed by an fsync().

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme-2
On Tue, Feb 21, 2017 at 08:55:15AM -0800, Jeremy Allison wrote:

> On Sun, Feb 19, 2017 at 11:41:13AM -0800, Christoph Hellwig wrote:
> > On Sun, Feb 19, 2017 at 08:44:47PM +0200, Alexander Bokovoy wrote:
> > > On su, 19 helmi 2017, Christoph Hellwig wrote:
> > > > Coming in late here, but I would suggest that the Samba team
> > > > reconsiders implementing the flush commands properly, as in not
> > > > being a no-op.  Except for the Apple world with it's odd FULLSYNC
> > > > fcntl fsync/fdatasync is _the_ primitive to guarantee data integrity,
> > > > and a major file sever that simply ignores it is a major disaster for
> > > > the full storage stack.
> > > Samba already has vfs_commit module which implements fsync/fdatasync
> > > use. It can be stacked with other modules to achieve what you just
> > > described.
> >
> > I'm worried about doing the right thing out of the box and by default.
>
> So really all you're asking for is 'strict sync = yes'
> by default. That's doable.

maybe at least for SMB2.

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme-2
In reply to this post by Kevin Anderson
Hi Kevin,

On Sat, Feb 18, 2017 at 06:32:00PM -0500, Kevin Anderson wrote:

> > > > I've also added code that ensures all prerequisite Samba options are
> > > > set on the fly when a Time Machine enabled share is connected.
> > > >
> > > > Now, secondly, the interesting part: have you ever tested if the TM
> > > > disk image filesystem survives network disconnects and/or hard server
> > > > power offs ?
> > > >
> > >
> > > I have been running the provided patch set for the past month and have not
> > > noticed any issues. In that time I have restarted the networking interfaces
> > > on the server I am using while backups are running without any issues being
> > > reported as well as being able to restore from the same backup. With that
> > > being said I have not tested a hard poweroff of the server as it is backed
> > > by an UPS. I will try to test this case and report back.
> >
> > did you run into any issues?
> >
>
> So far I have not run in to any issues even doing a hard power off a
> couple of times.

ok.

> > > Also based one this email thread, Samba FLUSH operations are
> > > asynchronous by default:
> > >
> > > https://lists.samba.org/archive/samba/2008-September/143627.html
> >
> > yes, they are asynchronous *and* they're disabled by default (strict sync =
> > no), that's why we'ge going to enable it at runtime if fruit:time machine=yes.
>
> OK.
>
> >
> > > The asynchronous writes make me curious if this might be leading to
> > > some of the corruption edge cases as well as the case above.
> >
> > Hm, I guess the time window is small where we responsed to the flush request
> > while the fsync is still being done in a worker thread, but it's there, so yes,
> > this could be possible.
> >
> > > Is it possible to force a fsync() from the VFS layer? Could we add a handler
> > > for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
> > > force an fsync()?
> >
> > Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
> > it down to the SMB2 flush request handler. Depending on the setting we could the
> > switch between callinc async flush() or sync.
>
> OK. I will look to adding that to the provided patch but it may take
> some time as I understand some other parts of the code base. I think
> that would provide the necessary balance between data consistency and
> performance.

I can do this part, you can then do the testing. :)

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson
Hi Ralph,

That works for me. :) Let me know if there is anything else I can help with.

-Kevin

On Tue, Feb 21, 2017 at 12:31 PM, Ralph Böhme <[hidden email]> wrote:

> Hi Kevin,
>
> On Sat, Feb 18, 2017 at 06:32:00PM -0500, Kevin Anderson wrote:
>> > > > I've also added code that ensures all prerequisite Samba options are
>> > > > set on the fly when a Time Machine enabled share is connected.
>> > > >
>> > > > Now, secondly, the interesting part: have you ever tested if the TM
>> > > > disk image filesystem survives network disconnects and/or hard server
>> > > > power offs ?
>> > > >
>> > >
>> > > I have been running the provided patch set for the past month and have not
>> > > noticed any issues. In that time I have restarted the networking interfaces
>> > > on the server I am using while backups are running without any issues being
>> > > reported as well as being able to restore from the same backup. With that
>> > > being said I have not tested a hard poweroff of the server as it is backed
>> > > by an UPS. I will try to test this case and report back.
>> >
>> > did you run into any issues?
>> >
>>
>> So far I have not run in to any issues even doing a hard power off a
>> couple of times.
>
> ok.
>
>> > > Also based one this email thread, Samba FLUSH operations are
>> > > asynchronous by default:
>> > >
>> > > https://lists.samba.org/archive/samba/2008-September/143627.html
>> >
>> > yes, they are asynchronous *and* they're disabled by default (strict sync =
>> > no), that's why we'ge going to enable it at runtime if fruit:time machine=yes.
>>
>> OK.
>>
>> >
>> > > The asynchronous writes make me curious if this might be leading to
>> > > some of the corruption edge cases as well as the case above.
>> >
>> > Hm, I guess the time window is small where we responsed to the flush request
>> > while the fsync is still being done in a worker thread, but it's there, so yes,
>> > this could be possible.
>> >
>> > > Is it possible to force a fsync() from the VFS layer? Could we add a handler
>> > > for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
>> > > force an fsync()?
>> >
>> > Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
>> > it down to the SMB2 flush request handler. Depending on the setting we could the
>> > switch between callinc async flush() or sync.
>>
>> OK. I will look to adding that to the provided patch but it may take
>> some time as I understand some other parts of the code base. I think
>> that would provide the necessary balance between data consistency and
>> performance.
>
> I can do this part, you can then do the testing. :)
>
> Cheerio!
> -slow

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Samba - samba-technical mailing list
Hi Ralph,
   I started an attempt to implement the remaining piece of this patch
and I have parsed out the reserved1 field in the
smbd_smb2_request_process_flush function. Does that seem appropriate
to you? If so, I only have to determine the best way to pass the fsync
to the actual file. Can/should the fsync be handled in the fruit
module? I would be open to any advice that you have as I have tried a
number of things already but they were unsuccessful.

I have attached a patch (with some extra print statements for my own
use. They will be removed before the final patch is submitted) that
parses the field.

Thanks,
Kevin

On Tue, Feb 21, 2017 at 4:06 PM, Kevin Anderson <[hidden email]> wrote:

> Hi Ralph,
>
> That works for me. :) Let me know if there is anything else I can help with.
>
> -Kevin
>
> On Tue, Feb 21, 2017 at 12:31 PM, Ralph Böhme <[hidden email]> wrote:
>> Hi Kevin,
>>
>> On Sat, Feb 18, 2017 at 06:32:00PM -0500, Kevin Anderson wrote:
>>> > > > I've also added code that ensures all prerequisite Samba options are
>>> > > > set on the fly when a Time Machine enabled share is connected.
>>> > > >
>>> > > > Now, secondly, the interesting part: have you ever tested if the TM
>>> > > > disk image filesystem survives network disconnects and/or hard server
>>> > > > power offs ?
>>> > > >
>>> > >
>>> > > I have been running the provided patch set for the past month and have not
>>> > > noticed any issues. In that time I have restarted the networking interfaces
>>> > > on the server I am using while backups are running without any issues being
>>> > > reported as well as being able to restore from the same backup. With that
>>> > > being said I have not tested a hard poweroff of the server as it is backed
>>> > > by an UPS. I will try to test this case and report back.
>>> >
>>> > did you run into any issues?
>>> >
>>>
>>> So far I have not run in to any issues even doing a hard power off a
>>> couple of times.
>>
>> ok.
>>
>>> > > Also based one this email thread, Samba FLUSH operations are
>>> > > asynchronous by default:
>>> > >
>>> > > https://lists.samba.org/archive/samba/2008-September/143627.html
>>> >
>>> > yes, they are asynchronous *and* they're disabled by default (strict sync =
>>> > no), that's why we'ge going to enable it at runtime if fruit:time machine=yes.
>>>
>>> OK.
>>>
>>> >
>>> > > The asynchronous writes make me curious if this might be leading to
>>> > > some of the corruption edge cases as well as the case above.
>>> >
>>> > Hm, I guess the time window is small where we responsed to the flush request
>>> > while the fsync is still being done in a worker thread, but it's there, so yes,
>>> > this could be possible.
>>> >
>>> > > Is it possible to force a fsync() from the VFS layer? Could we add a handler
>>> > > for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
>>> > > force an fsync()?
>>> >
>>> > Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
>>> > it down to the SMB2 flush request handler. Depending on the setting we could the
>>> > switch between callinc async flush() or sync.
>>>
>>> OK. I will look to adding that to the provided patch but it may take
>>> some time as I understand some other parts of the code base. I think
>>> that would provide the necessary balance between data consistency and
>>> performance.
>>
>> I can do this part, you can then do the testing. :)
>>
>> Cheerio!
>> -slow

parse-reserved1-flush.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Samba - samba-technical mailing list
Hi Kevin,

On Sun, Apr 02, 2017 at 10:40:06PM -0400, Kevin Anderson wrote:
> Hi Ralph,
>    I started an attempt to implement the remaining piece of this patch
> and I have parsed out the reserved1 field in the
> smbd_smb2_request_process_flush function. Does that seem appropriate
> to you? If so, I only have to determine the best way to pass the fsync
> to the actual file. Can/should the fsync be handled in the fruit
> module? I would be open to any advice that you have as I have tried a
> number of things already but they were unsuccessful.

maybe something like the attached. Just cobbled together, completely untested.

Jeremy, can you take a look as well please?

Cheerio!
-slow

fullsync.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Samba - samba-technical mailing list
On Tue, Apr 04, 2017 at 01:21:06PM +0200, Ralph Böhme wrote:

> Hi Kevin,
>
> On Sun, Apr 02, 2017 at 10:40:06PM -0400, Kevin Anderson wrote:
> > Hi Ralph,
> >    I started an attempt to implement the remaining piece of this patch
> > and I have parsed out the reserved1 field in the
> > smbd_smb2_request_process_flush function. Does that seem appropriate
> > to you? If so, I only have to determine the best way to pass the fsync
> > to the actual file. Can/should the fsync be handled in the fruit
> > module? I would be open to any advice that you have as I have tried a
> > number of things already but they were unsuccessful.
>
> maybe something like the attached. Just cobbled together, completely untested.
>
> Jeremy, can you take a look as well please?
>
> Cheerio!
> -slow

> From 2c3ec983cfec3dceda77e979784e2ea34a0ef03e Mon Sep 17 00:00:00 2001
> From: Ralph Boehme <[hidden email]>
> Date: Tue, 4 Apr 2017 13:13:30 +0200
> Subject: [PATCH 1/2] s3/smbd: add aapl_enabled flag to fsp
>
> This will be needed in the next commit to implement macOS fullsync extension.
> ---
>  source3/include/vfs.h       | 1 +
>  source3/modules/vfs_fruit.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/source3/include/vfs.h b/source3/include/vfs.h
> index 0810fc2..fa964c0 100644
> --- a/source3/include/vfs.h
> +++ b/source3/include/vfs.h
> @@ -289,6 +289,7 @@ typedef struct files_struct {
>   bool is_sparse;
>   bool backup_intent; /* Handle was successfully opened with backup intent
>   and opener has privilege to do so. */
> + bool aapl_enabled; /* macOS client with extensions enabled */

I *hate* adding these aapl_XXX flags into 'struct files_struct'.

I know aapl_copyfile_supported is used in fsctl_srv_copychunk_loop(),
but that's a layering violation we haven't yet solved IMHO. I don't
want to make this worse for the 4.7.0 VFS.

Can't we change this to a generic:

bool VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);

or

void *VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);

that can sweep aapl-specific flags/structs out of the main interfaces
and back onto the module-specific private data struct where they
belong ?

That way the upper-level code has a module-agnostic way of asking
for module features that change our protocol-level behavior.

>   bool aapl_copyfile_supported;
>   bool use_ofd_locks; /* Are we using open file description locks ? */
>   struct smb_filename *fsp_name;
> diff --git a/source3/modules/vfs_fruit.c b/source3/modules/vfs_fruit.c
> index 0b5558a..bf149ab 100644
> --- a/source3/modules/vfs_fruit.c
> +++ b/source3/modules/vfs_fruit.c
> @@ -4945,6 +4945,8 @@ static NTSTATUS fruit_create_file(vfs_handle_struct *handle,
>   fsp = *result;
>  
>   if (global_fruit_config.nego_aapl) {
> + fsp->aapl_enabled = true;
> +
>   if (config->copyfile_enabled) {
>   /*
>   * Set a flag in the fsp. Gets used in
> --
> 2.9.3
>
>
> From 5c49e46489fc3496acc4c6653f036a8ec11d9bd9 Mon Sep 17 00:00:00 2001
> From: Ralph Boehme <[hidden email]>
> Date: Tue, 4 Apr 2017 13:10:00 +0200
> Subject: [PATCH 2/2] s3/smbd: macOS fullsync
>
> ---
>  source3/smbd/smb2_flush.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/source3/smbd/smb2_flush.c b/source3/smbd/smb2_flush.c
> index d077c62..e4ce0a0 100644
> --- a/source3/smbd/smb2_flush.c
> +++ b/source3/smbd/smb2_flush.c
> @@ -27,7 +27,8 @@
>  static struct tevent_req *smbd_smb2_flush_send(TALLOC_CTX *mem_ctx,
>         struct tevent_context *ev,
>         struct smbd_smb2_request *smb2req,
> -       struct files_struct *fsp);
> +       struct files_struct *fsp,
> +       uint16_t flush_reserved1);
>  static NTSTATUS smbd_smb2_flush_recv(struct tevent_req *req);
>  
>  static void smbd_smb2_request_flush_done(struct tevent_req *subreq);
> @@ -37,6 +38,7 @@ NTSTATUS smbd_smb2_request_process_flush(struct smbd_smb2_request *req)
>   const uint8_t *inbody;
>   uint64_t in_file_id_persistent;
>   uint64_t in_file_id_volatile;
> + uint16_t in_flush_reserved1;
>   struct files_struct *in_fsp;
>   struct tevent_req *subreq;
>  
> @@ -46,6 +48,7 @@ NTSTATUS smbd_smb2_request_process_flush(struct smbd_smb2_request *req)
>   }
>   inbody = SMBD_SMB2_IN_BODY_PTR(req);
>  
> + in_flush_reserved1 = SVAL(inbody, 0x2);
>   in_file_id_persistent = BVAL(inbody, 0x08);
>   in_file_id_volatile = BVAL(inbody, 0x10);
>  
> @@ -55,7 +58,7 @@ NTSTATUS smbd_smb2_request_process_flush(struct smbd_smb2_request *req)
>   }
>  
>   subreq = smbd_smb2_flush_send(req, req->sconn->ev_ctx,
> -      req, in_fsp);
> +      req, in_fsp, in_flush_reserved1);
>   if (subreq == NULL) {
>   return smbd_smb2_request_error(req, NT_STATUS_NO_MEMORY);
>   }
> @@ -115,7 +118,8 @@ static void smbd_smb2_flush_done(struct tevent_req *subreq);
>  static struct tevent_req *smbd_smb2_flush_send(TALLOC_CTX *mem_ctx,
>         struct tevent_context *ev,
>         struct smbd_smb2_request *smb2req,
> -       struct files_struct *fsp)
> +       struct files_struct *fsp,
> +       uint16_t flush_reserved1)
>  {
>   struct tevent_req *req;
>   struct tevent_req *subreq;
> @@ -153,6 +157,22 @@ static struct tevent_req *smbd_smb2_flush_send(TALLOC_CTX *mem_ctx,
>   return tevent_req_post(req, ev);
>   }
>  
> + if (fsp->aapl_enabled && (flush_reserved1 == 0xffff)) {
> + /*
> + * macOS specific fullsync, cf https://goo.gl/Lw5bdz
> + */
> + ret = SMB_VFS_FSYNC(fsp);

Doesn't this need to be SMB_VFS_FSYNC_SEND() and
the whole thing use the standard async callback
mechanisms ?

Looking at https://goo.gl/Lw5bdz it doesn't say we can't
return an intermediate response followed by a completed
response. I don't want to add in another synchronous-sync
call (if you'll pardon the pun :-).

Jeremy.

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Samba - samba-technical mailing list
On 04/07/2017 09:04 PM, Jeremy Allison via samba-technical wrote:

> Can't we change this to a generic:
>
> bool VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);
>
> or
>
> void *VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);
>
> that can sweep aapl-specific flags/structs out of the main interfaces
> and back onto the module-specific private data struct where they
> belong ?
>
+1. My thinking of this has been:

a. It's Apple-specific, let's hide it
b. ...But Macs are important too, let's not hide it (that's where I
submitted the file-id patch which adds an Apple-specific flag to the SMB
server).
c. ...But their version of the protocol is not documented and subject to
change without notice, and Microsoft's version currently is documented
and supported, so let's keep the main SMB code adhering to the
documented protocol version and (try to) hide undocumented dialects.

Just my 2c...
Uri

Reply | Threaded
Open this post in threaded view
|

Re: Patch to add support for advertising FULLSYNC to Mac OSX Clients

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 09:37:02PM +0300, Uri Simchoni wrote:

> On 04/07/2017 09:04 PM, Jeremy Allison via samba-technical wrote:
> > Can't we change this to a generic:
> >
> > bool VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);
> >
> > or
> >
> > void *VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);
> >
> > that can sweep aapl-specific flags/structs out of the main interfaces
> > and back onto the module-specific private data struct where they
> > belong ?
> >
> +1. My thinking of this has been:
>
> a. It's Apple-specific, let's hide it
> b. ...But Macs are important too, let's not hide it (that's where I
> submitted the file-id patch which adds an Apple-specific flag to the SMB
> server).
> c. ...But their version of the protocol is not documented and subject to
> change without notice, and Microsoft's version currently is documented
> and supported, so let's keep the main SMB code adhering to the
> documented protocol version and (try to) hide undocumented dialects.
>
> Just my 2c...

Yeah, each individual decision made sense at the time,
but I don't like where we ended up if you see what I
mean.

Wouldn't a:

void *VFS_QUERY_FEATURE(fsp, const char *vfs_module_name, const char *feature_name);

interface allow us to at least clean things up a
little, and maybe allow us to isolate some of the
state back where it belongs ?

12