"killtime" parameter patch ?

"killtime" parameter patch ?

Jeremy Allison
Interesting situation just came up (original guilty party who
raised the issue deleted :-).

Testing a Samba server for reliability. Connected Windows client
with open files.

Ethernet cable gets unplugged for a few minutes, client IO fails,
then it gets reconnected.

Client reconnects to Samba, gets a new smbd but finds its open files
still locked.

"deadtime" doesn't fire because the smbd still has resources open.

Original smbd is still hanging around so sharemode detection finds
an open process.

If the files are oplocked, sending the oplock break will get a TCP
reset and cause the original smbd to die, but what if the oplock was
already broken?

In that case a "FILE_SHARE_NONE" blocks everything for as long as
the original smbd is still around.

Client has no way to signal original server smbd process that it
should die.

TCP timeouts will kill after 2 hours but this is considered *way*
too long for client to wait. "reset on zero vc" isn't set due to
potential NAT issues.

So here is an (untested, but it compiles and I think it'll do the job)
solution: a parameter "killtime" (set in minutes). If nothing is received
on a TCP connection (including no SMBecho calls) for "killtime"
minutes, the smbd commits suicide (even with open
resources). Cleanly of course :-).
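
A minimal sketch of how the proposed "killtime" check differs from
"deadtime", assuming a toy model of one smbd's connection state. The
names ConnState, deadtime_fires and killtime_fires are hypothetical and
are not taken from the patch; treating a value of 0 as "disabled" for
"killtime" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    """Toy stand-in for one smbd's view of its client connection."""
    idle_seconds: float  # time since anything was last received
    open_files: int      # resources the client still holds open


def deadtime_fires(state: ConnState, deadtime_min: int) -> bool:
    # As described in the post: "deadtime" does not fire while the
    # smbd still has resources open.
    if deadtime_min <= 0:
        return False
    return state.open_files == 0 and state.idle_seconds >= deadtime_min * 60


def killtime_fires(state: ConnState, killtime_min: int) -> bool:
    # Proposed behaviour: exit after the idle period even with open
    # resources. Zero meaning "disabled" is an assumption of this sketch.
    if killtime_min <= 0:
        return False
    return state.idle_seconds >= killtime_min * 60


# A connection that has been silent for six minutes with two files open.
stale = ConnState(idle_seconds=6 * 60, open_files=2)
print(deadtime_fires(stale, deadtime_min=5))
print(killtime_fires(stale, killtime_min=5))
```

With the stale connection above, only the "killtime" check reports that
the process should exit, which is the gap the proposal is meant to close.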

Thoughts on whether this is a good idea ? If so I'll write up the
man page :-).

Jeremy.

Re: "killtime" parameter patch ?

Volker Lendecke
On Tue, May 15, 2012 at 05:23:56PM -0700, Jeremy Allison wrote:

> Interesting situation just came up (original guilty party who
> raised the issue deleted :-).
>
> Testing a Samba server for reliability. Connected Windows client
> with open files.
>
> Ethernet cable gets unplugged for a few minutes, client IO fails,
> then it gets reconnected.
>
> Client reconnects to Samba, gets a new smbd but finds its open files
> still locked.
>
> "deadtime" doesn't fire because the smbd still has resources open.
>
> Original smbd is still hanging around so sharemode detection finds
> an open process.
>
> If the files are oplocked oplock break send will get a TCP reset
> and cause the original smbd to die, but what if the oplock was already
> broken ?
>
> In that case a "FILE_SHARE_NONE" blocks everything for as long as
> the original smbd is still around.
>
> Client has no way to signal original server smbd process that it
> should die.
>
> TCP timeouts will kill after 2 hours but this is considered *way*
> too long for client to wait. "reset on zero vc" isn't set due to
> potential NAT issues.
>
> So here is an (untested, but compiles and I think it'll do the job)
> solution. Parameter "killtime" (set in minutes). If nothing received
> on a TCP connection (including no SMBecho calls) for "killtime"
> minutes, it causes the smbd to commit suicide (even with open
> resources). Cleanly of course :-).
>
> Thoughts on whether this is a good idea ? If so I'll write up the
> man page :-).

No, I don't think this is a good idea. We have the keepalive
and deadtime parameters together with reset on zero vc. You
can also play with socket options, at least on Linux, to
shorten the dead-connection detection time:

socket options = SO_KEEPALIVE TCP_KEEPIDLE=120 TCP_KEEPINTVL=10 TCP_KEEPCNT=5

might be a starting point.
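
As a rough illustration of what those values mean, the sketch below
applies the same options to an ordinary TCP socket and estimates how
long a silent peer would go unnoticed. The helper names are
hypothetical, the TCP_KEEP* constants are platform specific (they exist
on Linux), and the estimate assumes the usual scheme of one idle period
followed by a fixed number of unanswered probes.

```python
import socket

# Values from the smb.conf line above.
KEEPIDLE, KEEPINTVL, KEEPCNT = 120, 10, 5


def worst_case_detection_seconds(idle: int, interval: int, count: int) -> int:
    # Idle period before the first probe, then `count` unanswered
    # probes spaced `interval` seconds apart.
    return idle + interval * count


def enable_keepalive(sock: socket.socket) -> None:
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Skip any TCP_KEEP* option this platform does not expose.
    for name, value in (("TCP_KEEPIDLE", KEEPIDLE),
                        ("TCP_KEEPINTVL", KEEPINTVL),
                        ("TCP_KEEPCNT", KEEPCNT)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

print(worst_case_detection_seconds(KEEPIDLE, KEEPINTVL, KEEPCNT))  # 170
```

Under these assumptions a dead connection is noticed after roughly three
minutes instead of the two hours mentioned in the original post.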

The big question is -- how does Windows behave? AFAIK,
Windows always has the equivalent of "reset on zero vc" set.

One idea for this particular case: What happens if we send a
netbios keepalive message from server to client when a
sharing violation is about to happen?

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Re: "killtime" parameter patch ?

Stefan (metze) Metzmacher
In reply to this post by Jeremy Allison
On 16.05.2012 02:23, Jeremy Allison wrote:

> Interesting situation just came up (original guilty party who
> raised the issue deleted :-).
>
> Testing a Samba server for reliability. Connected Windows client
> with open files.
>
> Ethernet cable gets unplugged for a few minutes, client IO fails,
> then it gets reconnected.
>
> Client reconnects to Samba, gets a new smbd but finds its open files
> still locked.
>
> "deadtime" doesn't fire because the smbd still has resources open.
>
> Original smbd is still hanging around so sharemode detection finds
> an open process.
>
> If the files are oplocked oplock break send will get a TCP reset
> and cause the original smbd to die, but what if the oplock was already
> broken ?
>
> In that case a "FILE_SHARE_NONE" blocks everything for as long as
> the original smbd is still around.
>
> Client has no way to signal original server smbd process that it
> should die.
>
> TCP timeouts will kill after 2 hours but this is considered *way*
> too long for client to wait. "reset on zero vc" isn't set due to
> potential NAT issues.
>
> So here is an (untested, but compiles and I think it'll do the job)
> solution. Parameter "killtime" (set in minutes). If nothing received
> on a TCP connection (including no SMBecho calls) for "killtime"
> minutes, it causes the smbd to commit suicide (even with open
> resources). Cleanly of course :-).
>
> Thoughts on whether this is a good idea ? If so I'll write up the
> man page :-).

This is the wrong fix for SMB2.

What we need for SMB2 is to use the previous_session_id in
the session setup. I have the code mostly working in my WIP branch:
https://gitweb.samba.org/?p=metze/samba/wip.git;a=shortlog;h=refs/heads/master3-reauth

The commit is:
https://gitweb.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=f91aec92ec9418dc2521bb68e5c3635c4fc9a304
(Note: I need to make this fully async before pushing it to master.)
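
To illustrate the idea only, here is a toy model of a server-side
session table in which a reconnecting client names its old session at
session setup and the server discards that session together with its
opens. ToySessionTable and its methods are hypothetical and do not
reflect the code in the branch above; a real server would also verify
that both sessions belong to the same user before discarding anything.

```python
from typing import Dict, Optional, Set


class ToySessionTable:
    """Hypothetical model: session id -> set of paths held open."""

    def __init__(self) -> None:
        self._next_id = 1
        self.opens: Dict[int, Set[str]] = {}

    def session_setup(self, previous_session_id: Optional[int] = None) -> int:
        # If the client names its old session, drop it (and its opens)
        # before handing out the new one.
        if previous_session_id is not None:
            self.opens.pop(previous_session_id, None)
        sid = self._next_id
        self._next_id += 1
        self.opens[sid] = set()
        return sid

    def open_exclusive(self, sid: int, path: str) -> bool:
        # Sharing violation if any other session still holds the path.
        for other, paths in self.opens.items():
            if other != sid and path in paths:
                return False
        self.opens[sid].add(path)
        return True


table = ToySessionTable()
old = table.session_setup()
table.open_exclusive(old, "report.doc")

# Reconnect without naming the old session: the stale open still blocks.
blind = table.session_setup()
print(table.open_exclusive(blind, "report.doc"))

# Reconnect naming the old session: the stale open is gone.
new = table.session_setup(previous_session_id=old)
print(table.open_exclusive(new, "report.doc"))
```

In this model the client itself tells the server which stale state to
drop, so no idle timer is needed for the reconnect case.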

For SMB1 it might be better to use TCP keepalives:

"socket options" supports "SO_KEEPALIVE", "TCP_KEEPCNT",
"TCP_KEEPIDLE" and "TCP_KEEPINTVL".

metze



Re: "killtime" parameter patch ?

Jeremy Allison
In reply to this post by Volker Lendecke
On Wed, May 16, 2012 at 08:47:34AM +0200, Volker Lendecke wrote:

> No, I don't think this is a good idea. We have the keepalive
> and deadtime parameters together with reset on zero vc. You
> can also play with socket options, at least on Linux to
> lower the dead connection detection:
>
> socket options = SO_KEEPALIVE TCP_KEEPIDLE=120 TCP_KEEPINTVL=10 TCP_KEEPCNT=5
>
> might be a starting point.
>
> The big question is -- how does Windows behave? afaik
> Windows has the equivalent of "reset on zero vc" set always.
>
> One idea for this particular case: What happens if we send a
> netbios keepalive message from server to client when a
> sharing violation is about to happen?

What is really interesting is that we ignore errors from
our own send_keepalive() code.

All that happens if sending a keepalive fails from server
to client is that we log it and stop sending them.

Is that correct ? If we fail in sending a keepalive shouldn't
we terminate in the same way as in the deadtime code ?

That's what the comments suggest, but not what the code
does.

Jeremy.

Re: "killtime" parameter patch ?

Volker Lendecke
On Wed, May 16, 2012 at 09:44:06AM -0700, Jeremy Allison wrote:
> What is really interesting is that we ignore errors from
> our own send_keepalive() code.
>
> All that happens if sending a keepalive fails from server
> to client is that we log it and stop sending them.
>
> Is that correct ? If we fail in sending a keepalive shouldn't
> we terminate in the same way as in the deadtime code ?

Yes, it is correct I think. I don't think a TCP socket can
reliably detect at writev(2) time that it will not be able to
ship the data for whatever reason. So I cannot imagine it
returning a reliable error. What will happen though
is that the keepalive packet will trigger a much faster
timeout or a RST from the client, which in turn will make the
next poll/recv fail.
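
The point about write time can be seen in a small loopback experiment,
sketched here under a loud assumption: the peer closes its socket in an
orderly way, which is not the same as an unplugged cable (where the
server gets no notice at all). The variable names are hypothetical. The
write to the departed peer still reports success, and it is the
following read that shows the connection has ended, either as
end-of-file or as an error depending on timing and platform.

```python
import socket
import time

# Loopback listener standing in for the server side.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()
server_side.settimeout(2.0)  # never block forever in this demo

# The peer goes away (assumption: orderly close, not a pulled cable).
client.close()
time.sleep(0.2)

# The write is accepted by the local kernel and reports success.
payload = b"keepalive"
sent = server_side.send(payload)

# Only the following read shows that the peer is gone.
time.sleep(0.2)
try:
    data = server_side.recv(16)
    outcome = "eof" if data == b"" else "data"
except OSError:
    outcome = "error"

server_side.close()
listener.close()
print(sent, outcome)
```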

Please correct me if I'm wrong.

Volker
