[PATCH] Fix bug #13058 - Kernel oplock can remain not broken if oplock holder closes the file and another process catches the lease.

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Fix bug #13058 - Kernel oplock can remain not broken if oplock holder closes the file and another process catches the lease.

Samba - samba-technical mailing list
Hi Ralph,

As discussed on the phone, the fix is to revert
commit b35a296a27a0807c780f2a9e7af2f2e93feefaa8.

The proof is the test case added after. It shows
a race condition, which can be forced one way
or the other by adding (or not adding):

diff --git a/source3/smbd/open.c b/source3/smbd/open.c
index 89a267b0634..a65e75370ad 100644
--- a/source3/smbd/open.c
+++ b/source3/smbd/open.c
@@ -3269,6 +3269,9 @@ static NTSTATUS open_file_ntcreate(connection_struct *conn,
  schedule_defer_open(lck, fsp->file_id, request_time,
     req);
  TALLOC_FREE(lck);
+ if (oplock_request == 0) {
+ sleep(5);
+ }
  DEBUG(10, ("Sent oplock break request to kernel "
    "oplock holder\n"));
  return NT_STATUS_SHARING_VIOLATION;

into smbd to force it to process either the
create request causing the break first, or
the create request sent as a result of the
kernel oplock break.

Please review and push if happy.

I'm planning to follow up with the fix for
kernel oplock use by a non-smbd process, but
this needs more thought on how to add a test
case for it.

Cheers,

        Jeremy.

bug-13058.master (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
After discussion with Ralph and Volker, here is a fix
for the case where smbd tries to open a file that has
a linux kernel lease on it by another (non-smbd) process.

Changes retry_open() to setup_kernel_oplock_poll_open()
to make things clear.

Includes a regression test case that demonstrates the
problem.

NB. This patch depends on the previously posted patch
for bug #13058 being applied first, so I'm attaching
both fixes here (in case you want to be a good citizen
and review both together :-).

Cheers,

        Jeremy.

bug-13058.master (13K) Download Attachment
bug-13121-master.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
I'm now seeing significant numbers of builds hanging in:

[558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)

On 10/11/17 10:39, Jeremy Allison via samba-technical wrote:

> After discussion with Ralph and Volker, here is a fix
> for the case where smbd tries to open a file that has
> a linux kernel lease on it by another (non-smbd) process.
>
> Changes retry_open() to setup_kernel_oplock_poll_open()
> to make things clear.
>
> Includes a regression test case that demonstrates the
> problem.
>
> NB. This patch depends on the previously posted patch
> for bug #13058 being applied first, so I'm attaching
> both fixes here (in case you want to be a good citizen
> and review both together :-).
>
> Cheers,
>
> Jeremy.
>


signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
> I'm now seeing significant numbers of builds hanging in:
>
> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)

Can you track down what part of the test it's hanging in ?
A strace on the smbtorture client process would help here.

The test does depend on working kernel oplocks as the
test child holds the lease for 3 seconds. Also, the
child does a pause() waiting for a signal which is
expected to come when the smbd tries to open the leased
file. I could add a 5 second alarm() timeout to ensure
it can't ever hang there but under normal circumstances
it shouldn't hang.

What platform are you testing on - if it's non-Linux
this test should not be running.

I do want to keep this test if I can as it's a guarantee that
kernel oplocks are working correctly in smbd.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
On Wed, Nov 15, 2017 at 10:06:01AM -0800, Jeremy Allison via samba-technical wrote:

> On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
> > I'm now seeing significant numbers of builds hanging in:
> >
> > [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
>
> Can you track down what part of the test it's hanging in ?
> A strace on the smbtorture client process would help here.
>
> The test does depend on working kernel oplocks as the
> test child holds the lease for 3 seconds. Also, the
> child does a pause() waiting for a signal which is
> expected to come when the smbd tries to open the leased
> file. I could add a 5 second alarm() timeout to ensure
> it can't ever hang there but under normal circumstances
> it shouldn't hang.
>
> What platform are you testing on - if it's non-Linux
> this test should not be running.
>
> I do want to keep this test if I can as it's a guarantee that
> kernel oplocks are working correctly in smbd.
Can you try running your builds with this additional
patch ?

It should prevent hangs from the smbtorture child using
the alarm mechanism I mentioned above, and also if the
test then fails it should print out a meaningful error
code so we can discover where the child had problems.

Cheers,

        Jeremy.

0001-s4-torture-Ensure-kernel-oplock-test-can-t-hang-in-p.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list


On 16/11/17 07:06, Jeremy Allison via samba-technical wrote:
> On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
>> I'm now seeing significant numbers of builds hanging in:
>>
>> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
>
> Can you track down what part of the test it's hanging in ?
> A strace on the smbtorture client process would help here.
I'll kick off some builds now, and see what I can find.

>
> The test does depend on working kernel oplocks as the
> test child holds the lease for 3 seconds. Also, the
> child does a pause() waiting for a signal which is
> expected to come when the smbd tries to open the leased
> file. I could add a 5 second alarm() timeout to ensure
> it can't ever hang there but under normal circumstances
> it shouldn't hang.
>
> What platform are you testing on - if it's non-Linux
> this test should not be running.
Ubuntu 14.04
>
> I do want to keep this test if I can as it's a guarantee that
> kernel oplocks are working correctly in smbd.
>



signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 07:52:29AM +1300, Gary Lockyer via samba-technical wrote:

>
>
> On 16/11/17 07:06, Jeremy Allison via samba-technical wrote:
> > On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
> >> I'm now seeing significant numbers of builds hanging in:
> >>
> >> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
> >
> > Can you track down what part of the test it's hanging in ?
> > A strace on the smbtorture client process would help here.
> I'll kick off some builds now, and see what I can find.

Please try with the additional patch I just sent. Should
make things easier to find.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list


On 16/11/17 07:52, Jeremy Allison wrote:

> On Wed, Nov 15, 2017 at 10:06:01AM -0800, Jeremy Allison via samba-technical wrote:
>> On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
>>> I'm now seeing significant numbers of builds hanging in:
>>>
>>> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
>>
>> Can you track down what part of the test it's hanging in ?
>> A strace on the smbtorture client process would help here.
>>
>> The test does depend on working kernel oplocks as the
>> test child holds the lease for 3 seconds. Also, the
>> child does a pause() waiting for a signal which is
>> expected to come when the smbd tries to open the leased
>> file. I could add a 5 second alarm() timeout to ensure
>> it can't ever hang there but under normal circumstances
>> it shouldn't hang.
>>
>> What platform are you testing on - if it's non-Linux
>> this test should not be running.
>>
>> I do want to keep this test if I can as it's a guarantee that
>> kernel oplocks are working correctly in smbd.
>
> Can you try running your builds with this additional
> patch ?
Will kick that off now.

>
> It should prevent hangs from the smbtorture child using
> the alarm mechanism I mentioned above, and also if the
> test then fails it should print out a meaningful error
> code so we can discover where the child had problems.
>
> Cheers,
>
> Jeremy.
>


signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 07:52:29AM +1300, Gary Lockyer wrote:

>
>
> On 16/11/17 07:06, Jeremy Allison via samba-technical wrote:
> > On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
> >> I'm now seeing significant numbers of builds hanging in:
> >>
> >> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
> >
> > Can you track down what part of the test it's hanging in ?
> > A strace on the smbtorture client process would help here.
> I'll kick off some builds now, and see what I can find.
> >
> > The test does depend on working kernel oplocks as the
> > test child holds the lease for 3 seconds. Also, the
> > child does a pause() waiting for a signal which is
> > expected to come when the smbd tries to open the leased
> > file. I could add a 5 second alarm() timeout to ensure
> > it can't ever hang there but under normal circumstances
> > it shouldn't hang.
> >
> > What platform are you testing on - if it's non-Linux
> > this test should not be running.
> Ubuntu 14.04

I'm also developing and testing on a variant of 14.04,
so the kernel should work.

I'm running with kernel 4.4.0-93-generic. What kernel
version is on yours ?

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list


On 16/11/17 08:19, Jeremy Allison via samba-technical wrote:

> On Thu, Nov 16, 2017 at 07:52:29AM +1300, Gary Lockyer wrote:
>>
>>
>> On 16/11/17 07:06, Jeremy Allison via samba-technical wrote:
>>> On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
>>>> I'm now seeing significant numbers of builds hanging in:
>>>>
>>>> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
>>>
>>> Can you track down what part of the test it's hanging in ?
>>> A strace on the smbtorture client process would help here.
>> I'll kick off some builds now, and see what I can find.
>>>
>>> The test does depend on working kernel oplocks as the
>>> test child holds the lease for 3 seconds. Also, the
>>> child does a pause() waiting for a signal which is
>>> expected to come when the smbd tries to open the leased
>>> file. I could add a 5 second alarm() timeout to ensure
>>> it can't ever hang there but under normal circumstances
>>> it shouldn't hang.
>>>
>>> What platform are you testing on - if it's non-Linux
>>> this test should not be running.
>> Ubuntu 14.04
>
> I'm also developing and testing on a variant of 14.04,
> so the kernel should work.
>
> I'm running with kernel 4.4.0-93-generic. What kernel
> version is on yours ?
>
It's running 3.13.0-92-generic



signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 10:18:26AM +1300, Gary Lockyer wrote:

> Ok just failed logs attached.

> [558(3552)/2212 at 1h9m32s] samba3.smb2.kernel-oplocks(nt4_dc)
> Bad child exit code 10

Right - the key is the above. It comes from here in my new
patch:

+       if (got_alarm == 1) {
+               return 10;
+       }

which means that the alarm(5) is firing in the client.

This means your kernel is not sending a RT_SIGNAL_LEASE
signal correctly to the client process holding the lease.

AKA - you're running with a broken kernel.

This code works perfectly on my local systems and
also on sn-devel, so I think it's the kernel you
have that's a fault here.

If you can RB+ my patch, I'll push it to master
as at least it means that the tests don't hang
anymore and you can track down what's wrong in
your test env without carrying an extra patch.

(This patch would be good to have in master anyway,
as it prevents any hanging if the tests are being
run on a broken Linux kernel).

Cheers,

        Jeremy.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 09:19:41AM +1300, Gary Lockyer wrote:

>
>
> On 16/11/17 08:19, Jeremy Allison via samba-technical wrote:
> > On Thu, Nov 16, 2017 at 07:52:29AM +1300, Gary Lockyer wrote:
> >>
> >>
> >> On 16/11/17 07:06, Jeremy Allison via samba-technical wrote:
> >>> On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical wrote:
> >>>> I'm now seeing significant numbers of builds hanging in:
> >>>>
> >>>> [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
> >>>
> >>> Can you track down what part of the test it's hanging in ?
> >>> A strace on the smbtorture client process would help here.
> >> I'll kick off some builds now, and see what I can find.
> >>>
> >>> The test does depend on working kernel oplocks as the
> >>> test child holds the lease for 3 seconds. Also, the
> >>> child does a pause() waiting for a signal which is
> >>> expected to come when the smbd tries to open the leased
> >>> file. I could add a 5 second alarm() timeout to ensure
> >>> it can't ever hang there but under normal circumstances
> >>> it shouldn't hang.
> >>>
> >>> What platform are you testing on - if it's non-Linux
> >>> this test should not be running.
> >> Ubuntu 14.04
> >
> > I'm also developing and testing on a variant of 14.04,
> > so the kernel should work.
> >
> > I'm running with kernel 4.4.0-93-generic. What kernel
> > version is on yours ?
> >
> It's running 3.13.0-92-generic

That's it. That kernel almost certainly has broken
lease support.

We didn't notice because we never had any tests before
for smbd interacting with non-smbd processes with
kernel leases.

Now we do :-). So it's detecting a broken kernel,
which is what it was designed to do.

I suggest we add my additional patch to master so
it doesn't hang anymore, and then you'll have to
carry an extra patch adding samba3.smb2.kernel-oplocks(nt4_dc)
to knownfail until you can run on a working kernel.

But at least we have tests for this now :-).

Cheers,

        Jeremy.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Wed, 2017-11-15 at 13:29 -0800, Jeremy Allison via samba-technical
wrote:

> On Thu, Nov 16, 2017 at 10:18:26AM +1300, Gary Lockyer wrote:
>
> > Ok just failed logs attached.
> > [558(3552)/2212 at 1h9m32s] samba3.smb2.kernel-oplocks(nt4_dc)
> > Bad child exit code 10
>
> Right - the key is the above. It comes from here in my new
> patch:
>
> +       if (got_alarm == 1) {
> +               return 10;
> +       }
>
> which means that the alarm(5) is firing in the client.
>
> This means your kernel is not sending a RT_SIGNAL_LEASE
> signal correctly to the client process holding the lease.
>
> AKA - you're running with a broken kernel.

I've just worked with Gary to use a more recent 14.04 image for the
cloud builds, that should clear this up.

Andrew Bartlett

--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT  
https://catalyst.net.nz/services/samba





Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 10:38:26AM +1300, Andrew Bartlett wrote:

> On Wed, 2017-11-15 at 13:29 -0800, Jeremy Allison via samba-technical
> wrote:
> > On Thu, Nov 16, 2017 at 10:18:26AM +1300, Gary Lockyer wrote:
> >
> > > Ok just failed logs attached.
> > > [558(3552)/2212 at 1h9m32s] samba3.smb2.kernel-oplocks(nt4_dc)
> > > Bad child exit code 10
> >
> > Right - the key is the above. It comes from here in my new
> > patch:
> >
> > +       if (got_alarm == 1) {
> > +               return 10;
> > +       }
> >
> > which means that the alarm(5) is firing in the client.
> >
> > This means your kernel is not sending a RT_SIGNAL_LEASE
> > signal correctly to the client process holding the lease.
> >
> > AKA - you're running with a broken kernel.
>
> I've just worked with Gary to use a more recent 14.04 image for the
> cloud builds, that should clear this up.

Great, thanks ! If one of your Team could also +1 the
additional patch I'll push it to master so at least
we won't hang anymore if running on a broken kernel
and give some feedback as to what is wrong.

Cheers,

        Jeremy.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list


On 16/11/17 10:42, Jeremy Allison via samba-technical wrote:

> On Thu, Nov 16, 2017 at 10:38:26AM +1300, Andrew Bartlett wrote:
>> On Wed, 2017-11-15 at 13:29 -0800, Jeremy Allison via samba-technical
>> wrote:
>>> On Thu, Nov 16, 2017 at 10:18:26AM +1300, Gary Lockyer wrote:
>>>
>>>> Ok just failed logs attached.
>>>> [558(3552)/2212 at 1h9m32s] samba3.smb2.kernel-oplocks(nt4_dc)
>>>> Bad child exit code 10
>>>
>>> Right - the key is the above. It comes from here in my new
>>> patch:
>>>
>>> +       if (got_alarm == 1) {
>>> +               return 10;
>>> +       }
>>>
>>> which means that the alarm(5) is firing in the client.
>>>
>>> This means your kernel is not sending a RT_SIGNAL_LEASE
>>> signal correctly to the client process holding the lease.
>>>
>>> AKA - you're running with a broken kernel.
>>
>> I've just worked with Gary to use a more recent 14.04 image for the
>> cloud builds, that should clear this up.
>
> Great, thanks ! If one of your Team could also +1 the
> additional patch I'll push it to master so at least
> we won't hang anymore if running on a broken kernel
> and give some feedback as to what is wrong.
>
> Cheers,
>
> Jeremy.
>
Passes our auto builds on a Ubuntu 14.04 with a more current kernel



signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Wednesday, 15 November 2017 19:52:18 CET Jeremy Allison via samba-technical
wrote:
> On Wed, Nov 15, 2017 at 10:06:01AM -0800, Jeremy Allison via samba-technical
wrote:
> > On Thu, Nov 16, 2017 at 06:55:13AM +1300, Gary Lockyer via samba-technical
wrote:

> > > I'm now seeing significant numbers of builds hanging in:
> > >
> > > [558(3538)/2212 at 1h13m33s] samba3.smb2.kernel-oplocks(nt4_dc)
> >
> > Can you track down what part of the test it's hanging in ?
> > A strace on the smbtorture client process would help here.
> >
> > The test does depend on working kernel oplocks as the
> > test child holds the lease for 3 seconds. Also, the
> > child does a pause() waiting for a signal which is
> > expected to come when the smbd tries to open the leased
> > file. I could add a 5 second alarm() timeout to ensure
> > it can't ever hang there but under normal circumstances
> > it shouldn't hang.
> >
> > What platform are you testing on - if it's non-Linux
> > this test should not be running.
> >
> > I do want to keep this test if I can as it's a guarantee that
> > kernel oplocks are working correctly in smbd.
>
> Can you try running your builds with this additional
> patch ?
>
> It should prevent hangs from the smbtorture child using
> the alarm mechanism I mentioned above, and also if the
> test then fails it should print out a meaningful error
> code so we can discover where the child had problems.
>
> Cheers,
>
> Jeremy.

RB+

--
Andreas Schneider                   GPG-ID: CC014E3D
Samba Team                             [hidden email]
www.samba.org

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Thu, Nov 16, 2017 at 04:16:11PM +1300, Gary Lockyer wrote:

>
>
> On 16/11/17 10:42, Jeremy Allison via samba-technical wrote:
> > On Thu, Nov 16, 2017 at 10:38:26AM +1300, Andrew Bartlett wrote:
> >> On Wed, 2017-11-15 at 13:29 -0800, Jeremy Allison via samba-technical
> >> wrote:
> >>> On Thu, Nov 16, 2017 at 10:18:26AM +1300, Gary Lockyer wrote:
> >>>
> >>>> Ok just failed logs attached.
> >>>> [558(3552)/2212 at 1h9m32s] samba3.smb2.kernel-oplocks(nt4_dc)
> >>>> Bad child exit code 10
> >>>
> >>> Right - the key is the above. It comes from here in my new
> >>> patch:
> >>>
> >>> +       if (got_alarm == 1) {
> >>> +               return 10;
> >>> +       }
> >>>
> >>> which means that the alarm(5) is firing in the client.
> >>>
> >>> This means your kernel is not sending a RT_SIGNAL_LEASE
> >>> signal correctly to the client process holding the lease.
> >>>
> >>> AKA - you're running with a broken kernel.
> >>
> >> I've just worked with Gary to use a more recent 14.04 image for the
> >> cloud builds, that should clear this up.
> >
> > Great, thanks ! If one of your Team could also +1 the
> > additional patch I'll push it to master so at least
> > we won't hang anymore if running on a broken kernel
> > and give some feedback as to what is wrong.
> >
> > Cheers,
> >
> > Jeremy.
> >
> Passes our auto builds on a Ubuntu 14.04 with a more current kernel

Thanks for confirming ! Now at least we can know when a specific Linux
kernel version is broken :-).