[WIP] TDB traverse lock changes for massive AD DC perf improvement

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Wed, 2017-04-05 at 15:40 +0200, Stefan Metzmacher via samba-
technical wrote:

> Hi Andrew,
>
> > > > Please review.  If reviewed, I'll push with a patch that adds
> > > > new
> > > > performance tests that I'm keen to get in. 
> > >
> > > I'm wondering about all the readonly checks in
> > > _tdb_transaction_prepare_commit(),
> > > we already handle that in _tdb_transaction_start().
> > >
> > > I'm a bit nervous about the solaris10 problem.
> >
> > I am to.  I only got game to formally propose it when Jeremy
> > essentially proclaimed it dead :-)
>
> I discussed this with Volker and we think we have an understanding
> of what the solaris problem might be.
>
> The pattern with the traverse_read and prepare_commit interaction is
> the following:
>
> 1. transaction_start got the allrecord lock with F_RDLCK.
>
> 2. the traverse_read code walks the database in a sequence like this
> (per chain):
>    2.1  chainlock(chainX, F_RDLCK)
>    2.2  recordlock(chainX.record1, F_RDLCK)
>    2.3  chainunlock(chainX, F_RDLCK)
>    2.4  callback(chainX.record1)
>    2.5  chainlock(chainX, F_RDLCK)
>    2.6  recordunlock(chainX.record1, F_RDLCK)
>    2.7  recordlock(chainX.record2, F_RDLCK)
>    2.8  chainunlock(chainX, F_RDLCK)
>    2.9  callback(chainX.record2)
>    2.10 chainlock(chainX, F_RDLCK)
>    2.11 recordunlock(chainX.record2, F_RDLCK)
>    2.12 chainunlock(chainX, F_RDLCK)
>    2.13 goto next chain
>
>    So it has always one record locked in F_RDLCK mode and tries to
>    get the 2nd one before it releases the first one.
>
> 3. prepare_commit tries to upgrade the allrecord lock to F_RWLCK
>    If that happens at the time of 2.4, the operation of
>    2.5 may deadlock with the allrecord lock upgrade.
>    On Linux step 2.5 works in order to make some progress with the
>    locking, but on solaris it might fail because the kernel
>    wants to satisfy the 1st lock requester before the 2nd one.
>
> I think the first step is a standalone test that does this:
>
> process1: F_RDLCK for ofs=0 len=2
> process2: F_RDLCK for ofs=0 len=1
> process1: upgrade ofs=0 len=2 to F_RWLCK (in blocking mode)
> process2: F_RDLCK for ofs=1 len=1
> process2: unlock ofs=0 len=2
> process1: should continue at that point
>
> Once we have such a test we can run it on several solaris, freebsd,
> linux or whatever.
>
> Then we can decide if we want a configure and/or runtime check for
> this.
> And only avoid the transaction F_RDLCK lock in traverse_read if the
> kernel
> behaves as expected.
>
> Can you write such a standalone test?

I'll give it a shot today.  Can I do it in python?

Thanks!

Andrew Bartlett
--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
Am 05.04.2017 um 20:56 schrieb Andrew Bartlett via samba-technical:

> On Wed, 2017-04-05 at 15:40 +0200, Stefan Metzmacher via samba-
> technical wrote:
>> Hi Andrew,
>>
>>>>> Please review.  If reviewed, I'll push with a patch that adds
>>>>> new
>>>>> performance tests that I'm keen to get in.
>>>>
>>>> I'm wondering about all the readonly checks in
>>>> _tdb_transaction_prepare_commit(),
>>>> we already handle that in _tdb_transaction_start().
>>>>
>>>> I'm a bit nervous about the solaris10 problem.
>>>
>>> I am to.  I only got game to formally propose it when Jeremy
>>> essentially proclaimed it dead :-)
>>
>> I discussed this with Volker and we think we have an understanding
>> of what the solaris problem might be.
>>
>> The pattern with the traverse_read and prepare_commit interaction is
>> the following:
>>
>> 1. transaction_start got the allrecord lock with F_RDLCK.
>>
>> 2. the traverse_read code walks the database in a sequence like this
>> (per chain):
>>    2.1  chainlock(chainX, F_RDLCK)
>>    2.2  recordlock(chainX.record1, F_RDLCK)
>>    2.3  chainunlock(chainX, F_RDLCK)
>>    2.4  callback(chainX.record1)
>>    2.5  chainlock(chainX, F_RDLCK)
>>    2.6  recordunlock(chainX.record1, F_RDLCK)
>>    2.7  recordlock(chainX.record2, F_RDLCK)
>>    2.8  chainunlock(chainX, F_RDLCK)
>>    2.9  callback(chainX.record2)
>>    2.10 chainlock(chainX, F_RDLCK)
>>    2.11 recordunlock(chainX.record2, F_RDLCK)
>>    2.12 chainunlock(chainX, F_RDLCK)
>>    2.13 goto next chain
>>
>>    So it has always one record locked in F_RDLCK mode and tries to
>>    get the 2nd one before it releases the first one.
>>
>> 3. prepare_commit tries to upgrade the allrecord lock to F_RWLCK
>>    If that happens at the time of 2.4, the operation of
>>    2.5 may deadlock with the allrecord lock upgrade.
>>    On Linux step 2.5 works in order to make some progress with the
>>    locking, but on solaris it might fail because the kernel
>>    wants to satisfy the 1st lock requester before the 2nd one.
>>
>> I think the first step is a standalone test that does this:
>>
>> process1: F_RDLCK for ofs=0 len=2
>> process2: F_RDLCK for ofs=0 len=1
>> process1: upgrade ofs=0 len=2 to F_RWLCK (in blocking mode)
>> process2: F_RDLCK for ofs=1 len=1
>> process2: unlock ofs=0 len=2
>> process1: should continue at that point
>>
>> Once we have such a test we can run it on several solaris, freebsd,
>> linux or whatever.
>>
>> Then we can decide if we want a configure and/or runtime check for
>> this.
>> And only avoid the transaction F_RDLCK lock in traverse_read if the
>> kernel
>> behaves as expected.
>>
>> Can you write such a standalone test?
>
> I'll give it a shot today.  Can I do it in python?
No, I think we need this in C.

metze


signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On 06/04/17 19:47, Stefan Metzmacher via samba-technical wrote:

>>> I think the first step is a standalone test that does this:
>>>
>>> process1: F_RDLCK for ofs=0 len=2
>>> process2: F_RDLCK for ofs=0 len=1
>>> process1: upgrade ofs=0 len=2 to F_RWLCK (in blocking mode)
>>> process2: F_RDLCK for ofs=1 len=1
>>> process2: unlock ofs=0 len=2
>>> process1: should continue at that point
>>>
>>> Once we have such a test we can run it on several solaris, freebsd,
>>> linux or whatever.
>>>
>>> Then we can decide if we want a configure and/or runtime check for
>>> this.
>>> And only avoid the transaction F_RDLCK lock in traverse_read if the
>>> kernel
>>> behaves as expected.
>>>
>>> Can you write such a standalone test?
>>
>> I'll give it a shot today.  Can I do it in python?
>
> No, I think we need this in C.
I had a go at it. Attached.

cheers,

Douglas


fcntl_deadlock.c (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On 07/04/17 17:28, Douglas Bagnall via samba-technical wrote:

> I had a go at it. Attached.

One thing I missed (I *was* drinking at the time) is process 1 waitpid()ing
at the end to collect the failure of process 2. That is, it needs something
like:

--- a/fcntl_deadlock.c 2017-04-07 18:08:38.916348365 +1200
+++ b/fcntl_deadlock.c 2017-04-07 18:08:12.696332062 +1200

@@ -1,3 +1,4 @@
+#include <sys/wait.h>
 #include <inttypes.h>
 #include <stdio.h>
 #include <time.h>
@@ -60,7 +61,7 @@
  int fd;
  char *filename = argv[1];
  char *msg;
- int pid;
+ int pid, status;
  int pipes_1_2[2];
  int pipes_2_1[2];
 
@@ -128,6 +129,11 @@
 
  if (process == 1) {
  expect_char('d');
+ pid = waitpid(pid, &status, 0);
+ if (status) {
+ msg = "process 2 failed";
+ goto err;
+ }
  }
 
  printf("process %d has got to the end\n", process);

but I'll leave it to people who actually have an OS where it fails
to decide the details.

Douglas


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 05:28:02PM +1200, Douglas Bagnall via samba-technical wrote:
> I had a go at it. Attached.
>
> cheers,
>
> Douglas

Thanks, looks interesting! Attached find a version with a bit of
logging beyond strace. Unsurprisingly works fine on debian jessie. Now
lets find someone with a Solaris box :-)

Volker

fcntl_deadlock.c (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 08:18:14AM +0200, Volker Lendecke via samba-technical wrote:

> On Fri, Apr 07, 2017 at 05:28:02PM +1200, Douglas Bagnall via samba-technical wrote:
> > I had a go at it. Attached.
> >
> > cheers,
> >
> > Douglas
>
> Thanks, looks interesting! Attached find a version with a bit of
> logging beyond strace. Unsurprisingly works fine on debian jessie. Now
> lets find someone with a Solaris box :-)

Is this the expected output?

---8<---
slow@login [login]:~ > ssh unstable10x
Last login: Fri Apr  7 08:22:49 2017 from login.bo.opencs
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
slow@unstable10x [global]:~ > gcc -o fcntl_deadlock fcntl_deadlock.c
slow@unstable10x [global]:~ > ./fcntl_deadlock
usage: ./fcntl_deadlock <filename>
slow@unstable10x [global]:~ > ./fcntl_deadlock fcntl_deadlock
process 1 lock ofs=0 len=2: Resource temporarily unavailable
slow@unstable10x [global]:~ >
---8<---

That's on Solaris 10 x86.

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 08:23:59AM +0200, Ralph Böhme wrote:

> On Fri, Apr 07, 2017 at 08:18:14AM +0200, Volker Lendecke via samba-technical wrote:
> > On Fri, Apr 07, 2017 at 05:28:02PM +1200, Douglas Bagnall via samba-technical wrote:
> > > I had a go at it. Attached.
> > >
> > > cheers,
> > >
> > > Douglas
> >
> > Thanks, looks interesting! Attached find a version with a bit of
> > logging beyond strace. Unsurprisingly works fine on debian jessie. Now
> > lets find someone with a Solaris box :-)
>
> Is this the expected output?
>
> ---8<---
> slow@login [login]:~ > ssh unstable10x
> Last login: Fri Apr  7 08:22:49 2017 from login.bo.opencs
> Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
> slow@unstable10x [global]:~ > gcc -o fcntl_deadlock fcntl_deadlock.c
> slow@unstable10x [global]:~ > ./fcntl_deadlock
> usage: ./fcntl_deadlock <filename>
> slow@unstable10x [global]:~ > ./fcntl_deadlock fcntl_deadlock
> process 1 lock ofs=0 len=2: Resource temporarily unavailable

I guess you need a separate file, not the executable itself :-)

Volker

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 08:33:44AM +0200, Volker Lendecke wrote:

> On Fri, Apr 07, 2017 at 08:23:59AM +0200, Ralph Böhme wrote:
> > On Fri, Apr 07, 2017 at 08:18:14AM +0200, Volker Lendecke via samba-technical wrote:
> > > On Fri, Apr 07, 2017 at 05:28:02PM +1200, Douglas Bagnall via samba-technical wrote:
> > > > I had a go at it. Attached.
> > > >
> > > > cheers,
> > > >
> > > > Douglas
> > >
> > > Thanks, looks interesting! Attached find a version with a bit of
> > > logging beyond strace. Unsurprisingly works fine on debian jessie. Now
> > > lets find someone with a Solaris box :-)
> >
> > Is this the expected output?
> >
> > ---8<---
> > slow@login [login]:~ > ssh unstable10x
> > Last login: Fri Apr  7 08:22:49 2017 from login.bo.opencs
> > Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
> > slow@unstable10x [global]:~ > gcc -o fcntl_deadlock fcntl_deadlock.c
> > slow@unstable10x [global]:~ > ./fcntl_deadlock
> > usage: ./fcntl_deadlock <filename>
> > slow@unstable10x [global]:~ > ./fcntl_deadlock fcntl_deadlock
> > process 1 lock ofs=0 len=2: Resource temporarily unavailable
>
> I guess you need a separate file, not the executable itself :-)

d'oh, sorry!

So:

---8<---
slow@unstable10x [global]:~ > touch foo
slow@unstable10x [global]:~ > ./fcntl_deadlock foo
process 1 took read lock on range 0,2
process 2 took read lock on range 0,1
process 1 starts upgrade on range 0,2
process 2 got read lock on 1,1
process 2 released read lock on 0,2
process 1 got read upgrade done
process 2 has got to the end
process 1 has got to the end
---8<---

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 09:48:03AM +0200, Ralph Böhme wrote:

> On Fri, Apr 07, 2017 at 08:33:44AM +0200, Volker Lendecke wrote:
> > On Fri, Apr 07, 2017 at 08:23:59AM +0200, Ralph Böhme wrote:
> > > On Fri, Apr 07, 2017 at 08:18:14AM +0200, Volker Lendecke via samba-technical wrote:
> > > > On Fri, Apr 07, 2017 at 05:28:02PM +1200, Douglas Bagnall via samba-technical wrote:
> > > > > I had a go at it. Attached.
> > > > >
> > > > > cheers,
> > > > >
> > > > > Douglas
> > > >
> > > > Thanks, looks interesting! Attached find a version with a bit of
> > > > logging beyond strace. Unsurprisingly works fine on debian jessie. Now
> > > > lets find someone with a Solaris box :-)
> > >
> > > Is this the expected output?
> > >
> > > ---8<---
> > > slow@login [login]:~ > ssh unstable10x
> > > Last login: Fri Apr  7 08:22:49 2017 from login.bo.opencs
> > > Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
> > > slow@unstable10x [global]:~ > gcc -o fcntl_deadlock fcntl_deadlock.c
> > > slow@unstable10x [global]:~ > ./fcntl_deadlock
> > > usage: ./fcntl_deadlock <filename>
> > > slow@unstable10x [global]:~ > ./fcntl_deadlock fcntl_deadlock
> > > process 1 lock ofs=0 len=2: Resource temporarily unavailable
> >
> > I guess you need a separate file, not the executable itself :-)
>
> d'oh, sorry!
>
> So:
>
> ---8<---
> slow@unstable10x [global]:~ > touch foo
> slow@unstable10x [global]:~ > ./fcntl_deadlock foo
> process 1 took read lock on range 0,2
> process 2 took read lock on range 0,1
> process 1 starts upgrade on range 0,2
> process 2 got read lock on 1,1
> process 2 released read lock on 0,2
> process 1 got read upgrade done
> process 2 has got to the end
> process 1 has got to the end
> ---8<---
as requested by metze, here's the truss output.

Cheerio!
-slow

truss.txt (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, 2017-04-07 at 13:11 +0200, Ralph Böhme via samba-technical
wrote:
>
> as requested by metze, here's the truss output.

Just checking:  Has anybody reached any conclusions yet on a way
forward here?

Thanks,

Andrew Bartlett

--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT  
https://catalyst.net.nz/services/samba





Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Tue, 2017-04-11 at 17:11 +1200, Andrew Bartlett via samba-technical
wrote:
> On Fri, 2017-04-07 at 13:11 +0200, Ralph Böhme via samba-technical
> wrote:
> >
> > as requested by metze, here's the truss output.
>
> Just checking:  Has anybody reached any conclusions yet on a way
> forward here?

It would be good to be able to move forward on this between now and the
end of SambaXP.  Can I have any suggestions on what the next move needs
to be?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


12