Quantcast

[WIP] TDB traverse lock changes for massive AD DC perf improvement

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
Garming recently discovered a bug in ldb_tdb, causing us not to hold
the allrecord lock in TDB after a lock/unlock cycle.  

The upshot of that is that a tdb traverse will do many more locks, but
still be protected against modification during the traverse by the
transaction lock.

However, testing this patch showed a lock ordering issue, triggered
EDEADLK, because if one process holds the allrecord lock, and the the
transaction lock (for the traverse), while the other process holds the
transaction lock (from a prepare commit), and tries to get the
allrecord lock (the finish the prepare commit), it will report a
deadlock in the prepare commit.

We tried to change the ltdb code to simply obtain a transaction lock
first, but the tdb transaction lock is not exposed in an API.  

However, by swapping the traverse to the allrecord lock, we remove the
second lock that is the heart of the ordering issue.

I realise that while unimportant to the AD DC case (all modifies in a
transaction), I have traded off concurrency for performance.  Can
someone explain why tdb_traverse_read() needs a lock at all?

Attached is the WIP patch and test modifications, as well as a PDF of
the performance improvement.

The branch is ldap-multi-process in the the Catalyst GIT server.

Thoughts most welcome.

Andrew Bartlett
--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

tdb-perf-wip.patch (23K) Download Attachment
samba-tdb-locking-perf.pdf (76K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, 2017-03-31 at 23:51 +1300, Andrew Bartlett via samba-technical
wrote:

> Garming recently discovered a bug in ldb_tdb, causing us not to hold
> the allrecord lock in TDB after a lock/unlock cycle.  
>
> The upshot of that is that a tdb traverse will do many more locks,
> but
> still be protected against modification during the traverse by the
> transaction lock.
>
> However, testing this patch showed a lock ordering issue, triggered
> EDEADLK, because if one process holds the allrecord lock, and the the
> transaction lock (for the traverse), while the other process holds
> the
> transaction lock (from a prepare commit), and tries to get the
> allrecord lock (the finish the prepare commit), it will report a
> deadlock in the prepare commit. 
>
> We tried to change the ltdb code to simply obtain a transaction lock
> first, but the tdb transaction lock is not exposed in an API.  
>
> However, by swapping the traverse to the allrecord lock, we remove
> the
> second lock that is the heart of the ordering issue.
>
> I realise that while unimportant to the AD DC case (all modifies in a
> transaction), I have traded off concurrency for performance.  Can
> someone explain why tdb_traverse_read() needs a lock at all?

Here is the commit, but I still don't understand the reasoning:

commit 251aaafe3a9213118ac3a92def9ab2104c40d12a
Author: Andrew Tridgell <[hidden email]>
Date:   Tue Sep 27 01:26:34 2005 +0000

    r10522: finally got the locking working on solaris10. This adds a
read lock on
    the transaction lock in tdb_traverse_read(). This prevents a
pattern
    of locks which triggers the deadlock detection code in solaris10. I
    suspect solaris10 is trying to prevent lock starvation by granting
    locks in the order they were requested, which makes it much easier
to
    produce deadlocks.
    (This used to be commit 54203aacd138c30826d54c5d9b6cc8d6e9e270f8)

diff --git a/source4/lib/tdb/common/traverse.c
b/source4/lib/tdb/common/traverse.c
index 9271b75..00fe5be 100644
--- a/source4/lib/tdb/common/traverse.c
+++ b/source4/lib/tdb/common/traverse.c
@@ -205,9 +205,21 @@ int tdb_traverse_read(struct tdb_context *tdb,
 {
        struct tdb_traverse_lock tl = { NULL, 0, 0, F_RDLCK };
        int ret;
+       
+       /* we need to get a read lock on the transaction lock here to
+          cope with the lock ordering semantics of solaris10 */
+       if (tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_RDLCK,
F_SETLKW, 0) == -1) {
+               TDB_LOG((tdb, 0, "tdb_traverse_read: failed to get
transaction lock\n"));
+               tdb->ecode = TDB_ERR_LOCK;
+               return -1;
+       }
+
        tdb->traverse_read++;
        ret = tdb_traverse_internal(tdb, fn, private, &tl);
        tdb->traverse_read--;
+
+       tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_UNLCK,
F_SETLKW, 0);
+
        return ret;
 }
 

> Attached is the WIP patch and test modifications, as well as a PDF of
> the performance improvement. 

If we could understand better the solaris10 issue, we could potentially
remove this lock in tdb_traverse_read() as a better approach, and use
the allrecord lock for a tdb_traverse().  That would provide the
original concurrency for tdb_traverse_read().

> The branch is ldap-multi-process in the the Catalyst GIT server. 
>
> Thoughts most welcome.

My final thought is that I can't rationalise how the tdb locking model
and the async ldb stack can operate safely together.  

My instinct is that we need to create a new event context for reads,
just as we do for all transactions.  That is, starting to (as intended)
actually hold the allrecord lock while potentially calling
tevent_loop() via ldb_wait() feels totally unsafe.

Thanks,

Andrew Bartlett
--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Sat, 2017-04-01 at 08:06 +1300, Andrew Bartlett via samba-technical
wrote:

> On Fri, 2017-03-31 at 23:51 +1300, Andrew Bartlett via samba-
> technical
> wrote:
> > Garming recently discovered a bug in ldb_tdb, causing us not to
> > hold
> > the allrecord lock in TDB after a lock/unlock cycle.  
> >
> > The upshot of that is that a tdb traverse will do many more locks,
> > but
> > still be protected against modification during the traverse by the
> > transaction lock.
> >
> > However, testing this patch showed a lock ordering issue, triggered
> > EDEADLK, because if one process holds the allrecord lock, and the
> > the
> > transaction lock (for the traverse), while the other process holds
> > the
> > transaction lock (from a prepare commit), and tries to get the
> > allrecord lock (the finish the prepare commit), it will report a
> > deadlock in the prepare commit. 
> >
> > We tried to change the ltdb code to simply obtain a transaction
> > lock
> > first, but the tdb transaction lock is not exposed in an API.  
> >
> > However, by swapping the traverse to the allrecord lock, we remove
> > the
> > second lock that is the heart of the ordering issue.
I've found the original intent for the transaction and traverse
behaviour documented in:

commit 7dd31288a701d772e45b1960ac4ce4cc1be782ed
Author: Andrew Tridgell <[hidden email]>
Date:   Thu Sep 22 13:12:46 2005 +0000

    r10421: following on discussions with simo, I have worked out a way
of
    allowing searches to proceed while another process is in a
    transaction, then only upgrading the transaction lock to a write
lock
    on commit.
    
    The solution is:
    
     - split tdb_traverse() into two calls, called tdb_traverse() and
       tdb_traverse_read(). The _read() version only gets read locks,
and
       will fail any write operations made in the callback from the
       traverse.
    
     - the normal tdb_traverse() call allows for read or write
operations
       in the callback, but gets the transaction lock, preventing
       transastions from starting inside the traverse
    
    In addition we enforce the following rule that you may not start a
    transaction within a traverse callback, although you can start a
    traverse within a transaction
    
    With these rules in place I believe all the deadlock possibilities
are
    removed, and we can now allow for searches to happen in parallel
with
    transactions

Following that, I've essentially reverted this:

> > I realise that while unimportant to the AD DC case (all modifies in
> > a
> > transaction), I have traded off concurrency for performance.  Can
> > someone explain why tdb_traverse_read() needs a lock at all?
>
> Here is the commit, but I still don't understand the reasoning:
>
> commit 251aaafe3a9213118ac3a92def9ab2104c40d12a
> Author: Andrew Tridgell <[hidden email]>
> Date:   Tue Sep 27 01:26:34 2005 +0000
>
>     r10522: finally got the locking working on solaris10. This adds a
> read lock on
>     the transaction lock in tdb_traverse_read(). This prevents a
> pattern
>     of locks which triggers the deadlock detection code in solaris10.
> I
>     suspect solaris10 is trying to prevent lock starvation by
> granting
>     locks in the order they were requested, which makes it much
> easier
> to
>     produce deadlocks.
>     (This used to be commit 54203aacd138c30826d54c5d9b6cc8d6e9e270f8)
>
> diff --git a/source4/lib/tdb/common/traverse.c
> b/source4/lib/tdb/common/traverse.c
> index 9271b75..00fe5be 100644
> --- a/source4/lib/tdb/common/traverse.c
> +++ b/source4/lib/tdb/common/traverse.c
> @@ -205,9 +205,21 @@ int tdb_traverse_read(struct tdb_context *tdb,
>  {
>         struct tdb_traverse_lock tl = { NULL, 0, 0, F_RDLCK };
>         int ret;
> +       
> +       /* we need to get a read lock on the transaction lock here to
> +          cope with the lock ordering semantics of solaris10 */
> +       if (tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_RDLCK,
> F_SETLKW, 0) == -1) {
> +               TDB_LOG((tdb, 0, "tdb_traverse_read: failed to get
> transaction lock\n"));
> +               tdb->ecode = TDB_ERR_LOCK;
> +               return -1;
> +       }
> +
>         tdb->traverse_read++;
>         ret = tdb_traverse_internal(tdb, fn, private, &tl);
>         tdb->traverse_read--;
> +
> +       tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_UNLCK,
> F_SETLKW, 0);
> +
>         return ret;
>  }
>  
>
> > Attached is the WIP patch and test modifications, as well as a PDF
> > of
> > the performance improvement. 
>
> If we could understand better the solaris10 issue, we could
> potentially
> remove this lock in tdb_traverse_read() as a better approach, and use
> the allrecord lock for a tdb_traverse().  That would provide the
> original concurrency for tdb_traverse_read(). 
I've now done that.  This feels closer to the correct solution, but
throws solaris10 under the bus for now.  See the attached patch.

> > The branch is ldap-multi-process in the the Catalyst GIT server. 

And this set with just the ldb and tdb locking is in ldb-tdb-locking.
I'm running the perf graphs again to see how just this isolated set
performs.

> > Thoughts most welcome.
>
> My final thought is that I can't rationalise how the tdb locking
> model
> and the async ldb stack can operate safely together.  
>
> My instinct is that we need to create a new event context for reads,
> just as we do for all transactions.  That is, starting to (as
> intended)
> actually hold the allrecord lock while potentially calling
> tevent_loop() via ldb_wait() feels totally unsafe. 
Thanks,

Andrew Bartlett
--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

ldb-tdb-locking.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-technical wrote:
>
> I've now done that.  This feels closer to the correct solution, but
> throws solaris10 under the bus for now.  See the attached patch.

Solaris10 is already under the bus wheels. They incorrectly
prevent an existing open UNIX domain socket from being accessed
from a non-root process (i.e. their permission check is being
done on time of use, not time of open). This breaks asynchronous
messaging in Samba.

I've logged this as a bug with Oracle, but I don't expect
this will ever get fixed.

It does mean that we should probably stop caring about
Solaris long term and put it in the same bucket as HPUX/IRIX/
AIX/True64 and the like. Sad, but practical.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Fri, Mar 31, 2017 at 01:23:32PM -0700, Jeremy Allison via samba-technical wrote:

> On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-technical wrote:
> >
> > I've now done that.  This feels closer to the correct solution, but
> > throws solaris10 under the bus for now.  See the attached patch.
>
> Solaris10 is already under the bus wheels. They incorrectly
> prevent an existing open UNIX domain socket from being accessed
> from a non-root process (i.e. their permission check is being
> done on time of use, not time of open). This breaks asynchronous
> messaging in Samba.
>
> I've logged this as a bug with Oracle, but I don't expect
> this will ever get fixed.
>
> It does mean that we should probably stop caring about
> Solaris long term and put it in the same bucket as HPUX/IRIX/
> AIX/True64 and the like. Sad, but practical.

In case anyone is interested, here's the bug report:

https://bugzilla.samba.org/show_bug.cgi?id=12429

(complete with nifty test program).

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-technical wrote:
> + } else {
> + TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_start: failed to get transaction lock\n"));

I know this is nothing to do with the patch, but I've finally
blown a gasket on this...

WILL YOU PLEASE FIX YOUR EDITOR TO FLAG LINES > 80 COLUMNS !!!!!!!!!!

*Everyone* else does this. Please do the same.

It's getting to the point where your patches are making my
eyes bleed :-) :-) :-).

On another less shouty note - well done on the fix itself.
It seems amazingly useful and good work.

Cheers,

        Jeremy.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Fri, 2017-03-31 at 13:23 -0700, Jeremy Allison wrote:

> On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-
> technical wrote:
> >
> > I've now done that.  This feels closer to the correct solution, but
> > throws solaris10 under the bus for now.  See the attached patch.
>
> Solaris10 is already under the bus wheels. They incorrectly
> prevent an existing open UNIX domain socket from being accessed
> from a non-root process (i.e. their permission check is being
> done on time of use, not time of open). This breaks asynchronous
> messaging in Samba.
>
> I've logged this as a bug with Oracle, but I don't expect
> this will ever get fixed.
>
> It does mean that we should probably stop caring about
> Solaris long term and put it in the same bucket as HPUX/IRIX/
> AIX/True64 and the like. Sad, but practical.
Thanks.  I wasn't at all sure how to translate the few details in the
commit message back into a test case and then workaround.

This set of patches passes a full make test for me 3 times over (it
wouldn't shock me if this fixes some flapping tests).

Graphs of the perf tests with these patches attached.  The performance
improvement for un-indexed searches on the AD DC is really impressive!

Aside from a good delay so others can comment, what would you recommend
I do next towards progressing this patch?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

ldb-tdb-locking.pdf (71K) Download Attachment
ldb-tdb-locking-abs.pdf (72K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Fri, 2017-03-31 at 15:05 -0700, Jeremy Allison wrote:
> On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-
> technical wrote:
> > + } else {
> > + TDB_LOG((tdb, TDB_DEBUG_ERROR,
> > "tdb_transaction_start: failed to get transaction lock\n"));
>
> I know this is nothing to do with the patch, but I've finally
> blown a gasket on this...

Can I get you a new one?  Shouldn't be very expensive down the hardware
store.

Thanks,

Andrew Bartlett

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
I didn't expect that the performance improvement would be so large. It
goes to show that there was never any reason for unindexed searches to
be orders of magnitude slower. The fact that unindexed and indexed
searches are now reasonably comparable shows that we probably do need to
be working on our indexing schemes.


On 01/04/17 18:44, Andrew Bartlett via samba-technical wrote:

> On Fri, 2017-03-31 at 13:23 -0700, Jeremy Allison wrote:
>> On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-
>> technical wrote:
>>> I've now done that.  This feels closer to the correct solution, but
>>> throws solaris10 under the bus for now.  See the attached patch.
>> Solaris10 is already under the bus wheels. They incorrectly
>> prevent an existing open UNIX domain socket from being accessed
>> from a non-root process (i.e. their permission check is being
>> done on time of use, not time of open). This breaks asynchronous
>> messaging in Samba.
>>
>> I've logged this as a bug with Oracle, but I don't expect
>> this will ever get fixed.
>>
>> It does mean that we should probably stop caring about
>> Solaris long term and put it in the same bucket as HPUX/IRIX/
>> AIX/True64 and the like. Sad, but practical.
> Thanks.  I wasn't at all sure how to translate the few details in the
> commit message back into a test case and then workaround.
>
> This set of patches passes a full make test for me 3 times over (it
> wouldn't shock me if this fixes some flapping tests).
>
> Graphs of the perf tests with these patches attached.  The performance
> improvement for un-indexed searches on the AD DC is really impressive!
>
> Aside from a good delay so others can comment, what would you recommend
> I do next towards progressing this patch?
>
> Thanks,
>
> Andrew Bartlett
>


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Fri, 2017-03-31 at 15:05 -0700, Jeremy Allison via samba-technical
wrote:
> On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-
> technical wrote:
> > + } else {
> > + TDB_LOG((tdb, TDB_DEBUG_ERROR,
> > "tdb_transaction_start: failed to get transaction lock\n"));
>
> I know this is nothing to do with the patch, but I've finally
> blown a gasket on this...

I'm sorry you feel this way.

In terms of my code formatting, I work strongly for clarity and
consistency.  If look at this file, you will see all the debug messages
are formatted that way.  

Naturally I've adapted that line (only) in the updated patches I sent
metze today.

I totally agree that having a globally consistent style in Samba would
be a good thing, and we actually do pretty well, which is why I find
outbursts like this quite as frustrating as you find what you feel is
deliberate non-conformance.  

It isn't that I'm out to 'break the rules', it is actually that I just
code primarily to match what is already present, secondly to whatever
is the clearest expression for the task.  Coding to our standard style
comes naturally when the rest of the file already conforms, and the
problem space permits, but when it doesn't, or was written by others
who didn't take them as strictly as you do, we end up in a pickle like
this.

In short, all things in moderation, even 80 column limits. :-)

Thanks,

Andrew Bartlett

--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT  
https://catalyst.net.nz/services/samba





Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Mon, Apr 03, 2017 at 04:48:56PM +1200, Andrew Bartlett via samba-technical wrote:
> In short, all things in moderation, even 80 column limits. :-)

Here I respectfully disagree. 80 columns are not negotiatable for new
code. In particular old code that has deeply nested control structures
needs refactoring before we can do real modifications.

If you want to know why I am so dogmatic about this, I can only
highly recommend reading the chapter "Minimizing Control Structures"
from the classic "Thinking Forth", available for free online under
http://thinking-forth.sourceforge.net/. Even if you are not literate in
Forth (I am no longer by any means), this is a really enlightening book
on code style.

The initial section of that chapter begins with:

> The use of conditional structures adds complexity to your code. The
> more complex your code is, the harder it will be for you to read and
> to maintain. The more parts a machine has, the greater are its
> chances of breaking down. And the harder it is for someone to fix.

This is also the reason why I have my regular little battle with
Jeremy about passing booleans down, but that is another story :-)

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Samba eXPerience 2017, Hotel Freizeit In
2nd-4th of May 2017, http://sambaXP.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Mon, Apr 03, 2017 at 04:48:56PM +1200, Andrew Bartlett wrote:

> On Fri, 2017-03-31 at 15:05 -0700, Jeremy Allison via samba-technical
> wrote:
> > On Sat, Apr 01, 2017 at 09:05:48AM +1300, Andrew Bartlett via samba-
> > technical wrote:
> > > + } else {
> > > + TDB_LOG((tdb, TDB_DEBUG_ERROR,
> > > "tdb_transaction_start: failed to get transaction lock\n"));
> >
> > I know this is nothing to do with the patch, but I've finally
> > blown a gasket on this...
>
> I'm sorry you feel this way.
>
> In terms of my code formatting, I work strongly for clarity and
> consistency.  If look at this file, you will see all the debug messages
> are formatted that way.  
>
> Naturally I've adapted that line (only) in the updated patches I sent
> metze today.
>
> I totally agree that having a globally consistent style in Samba would
> be a good thing, and we actually do pretty well, which is why I find
> outbursts like this quite as frustrating as you find what you feel is
> deliberate non-conformance.  
>
> It isn't that I'm out to 'break the rules', it is actually that I just
> code primarily to match what is already present, secondly to whatever
> is the clearest expression for the task.  Coding to our standard style
> comes naturally when the rest of the file already conforms, and the
> problem space permits, but when it doesn't, or was written by others
> who didn't take them as strictly as you do, we end up in a pickle like
> this.
>
> In short, all things in moderation, even 80 column limits. :-)

That's where we disagree. I don't want to keep existing
style in existing files when updates are done. One of the
good reasons for changing to >80 columns is it become a
visual signature in a file if there are 80+ column lines,
and <80+ column lines is that the code here has been
updated. No it isn't good enough to track what was done
(git blame rules :-) but it's enough to raise awareness.

I know you think this is a silly rule, but details
*matter*, and enough of the rest of the Team members
adhere to it that when someone doesn't it really stands
out. We can always vote on it being a silly rule and
agree to drop it if that is the majority opinion.

But right now we should *always* code to our standard
style in new code. That way, eventually the whole project
will get upgraded piece by piece :-).
l

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

On code formatting (and testing)

Samba - samba-technical mailing list
On Mon, 2017-04-03 at 09:18 -0700, Jeremy Allison via samba-technical
wrote:

> On Mon, Apr 03, 2017 at 04:48:56PM +1200, Andrew Bartlett wrote:
> > On Fri, 2017-03-31 at 15:05 -0700, Jeremy Allison via samba-
> > technical
> > wrote:
> >
> > I'm sorry you feel this way.
> >
> > In terms of my code formatting, I work strongly for clarity and
> > consistency.  If look at this file, you will see all the debug
> > messages
> > are formatted that way.  
> >
> > Naturally I've adapted that line (only) in the updated patches I
> > sent
> > metze today. 
> >
> > I totally agree that having a globally consistent style in Samba
> > would
> > be a good thing, and we actually do pretty well, which is why I
> > find
> > outbursts like this quite as frustrating as you find what you feel
> > is
> > deliberate non-conformance.  
> >
> > It isn't that I'm out to 'break the rules', it is actually that I
> > just
> > code primarily to match what is already present, secondly to
> > whatever
> > is the clearest expression for the task.  Coding to our standard
> > style
> > comes naturally when the rest of the file already conforms, and the
> > problem space permits, but when it doesn't, or was written by
> > others
> > who didn't take them as strictly as you do, we end up in a pickle
> > like
> > this.
> >
> > In short, all things in moderation, even 80 column limits. :-)
>
> That's where we disagree. I don't want to keep existing
> style in existing files when updates are done. One of the
> good reasons for changing to >80 columns is it become a
> visual signature in a file if there are 80+ column lines,
> and <80+ column lines is that the code here has been
> updated. No it isn't good enough to track what was done
> (git blame rules :-) but it's enough to raise awareness.
>
> I know you think this is a silly rule, but details
> *matter*, and enough of the rest of the Team members
> adhere to it that when someone doesn't it really stands
> out. We can always vote on it being a silly rule and
> agree to drop it if that is the majority opinion.
>
> But right now we should *always* code to our standard
> style in new code. That way, eventually the whole project
> will get upgraded piece by piece :-).

No, that way we end up with a dog's breakfast.

New files, substantial rewrites, fine!  Improving totally unreadable
code prior to starting, reasonable!  Mixing old and new style line-by-
line, really?

That is why I've always opposed rules 'for new code' that don't come
with patches making at least a majority of our existing code comply.

We have seen how to do that well with the work Garming did early in his
time to expand some of the hidden return macros.

Now, there is something I do feel as passionate about, and that is the
amount of 'obviously correct' code that lands without tests.  We have a
great record in Samba for automated testing, but I do wish we had as
much zeal for comprehensive unit and integration tests as we do for
patch-perfecting.  

All things in moderation?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: On code formatting (and testing)

Samba - samba-technical mailing list
On Tue, Apr 04, 2017 at 07:20:28AM +1200, Andrew Bartlett wrote:
>
> No, that way we end up with a dog's breakfast.
>
> New files, substantial rewrites, fine!  Improving totally unreadable
> code prior to starting, reasonable!  Mixing old and new style line-by-
> line, really?

Yep. We're going to have to agree to disagree on that one then :-).

> That is why I've always opposed rules 'for new code' that don't come
> with patches making at least a majority of our existing code comply.
>
> We have seen how to do that well with the work Garming did early in his
> time to expand some of the hidden return macros.
>
> Now, there is something I do feel as passionate about, and that is the
> amount of 'obviously correct' code that lands without tests.  We have a
> great record in Samba for automated testing, but I do wish we had as
> much zeal for comprehensive unit and integration tests as we do for
> patch-perfecting.  

Now *that's* something we can both agree on :-).

> All things in moderation?

Even moderation itself ? :-) :-).

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [WIP] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Sat, 2017-04-01 at 18:44 +1300, Andrew Bartlett via samba-technical
wrote:
>
> Aside from a good delay so others can comment, what would you
> recommend
> I do next towards progressing this patch?

Can I please get a good eye over and a team review for this patch?

Thanks,

Andrew Bartlett

--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT  
https://catalyst.net.nz/services/samba




0001-tdb-Improve-debugging-when-the-allrecord-lock-fails-.patch (2K) Download Attachment
0002-tdb-Remove-locking-from-tdb_traverse_read.patch (5K) Download Attachment
0003-tdb-Improve-debugging-in-_tdb_transaction_start.patch (1K) Download Attachment
0004-tdb-Release-version-1.3.13.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Wed, 2017-04-05 at 10:42 +1200, Andrew Bartlett wrote:

> On Sat, 2017-04-01 at 18:44 +1300, Andrew Bartlett via samba-
> technical
> wrote:
> >
> > Aside from a good delay so others can comment, what would you
> > recommend
> > I do next towards progressing this patch?
>
> Can I please get a good eye over and a team review for this patch?
>
> Thanks,
>
> Andrew Bartlett
The attached patch takes on board the 80 column feedback (I've also
found an emacs mode that is helpful but unobtrusive).

I hope this meets with your approval, as it makes a massive difference
to our Samba AD DC performance.

Please review.  If reviewed, I'll push with a patch that adds new
performance tests that I'm keen to get in.

Thanks,

Andrew Bartlett

--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT  
https://catalyst.net.nz/services/samba




0001-tdb-Improve-debugging-when-the-allrecord-lock-fails-.patch (2K) Download Attachment
0002-tdb-Remove-locking-from-tdb_traverse_read.patch (5K) Download Attachment
0003-tdb-Improve-debugging-in-_tdb_transaction_start.patch (1K) Download Attachment
0004-tdb-Release-version-1.3.13.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Wed, Apr 05, 2017 at 11:50:16AM +1200, Andrew Bartlett via samba-technical wrote:
> The attached patch takes on board the 80 column feedback (I've also
> found an emacs mode that is helpful but unobtrusive).
>
> I hope this meets with your approval, as it makes a massive difference
> to our Samba AD DC performance.
>
> Please review.  If reviewed, I'll push with a patch that adds new
> performance tests that I'm keen to get in.

Looking.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Samba eXPerience 2017, Hotel Freizeit In
2nd-4th of May 2017, http://sambaXP.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
Am 05.04.2017 um 01:50 schrieb Andrew Bartlett via samba-technical:

> On Wed, 2017-04-05 at 10:42 +1200, Andrew Bartlett wrote:
>> On Sat, 2017-04-01 at 18:44 +1300, Andrew Bartlett via samba-
>> technical
>> wrote:
>>>
>>> Aside from a good delay so others can comment, what would you
>>> recommend
>>> I do next towards progressing this patch?
>>
>> Can I please get a good eye over and a team review for this patch?
>>
>> Thanks,
>>
>> Andrew Bartlett
>
> The attached patch takes on board the 80 column feedback (I've also
> found an emacs mode that is helpful but unobtrusive).
>
> I hope this meets with your approval, as it makes a massive difference
> to our Samba AD DC performance.
>
> Please review.  If reviewed, I'll push with a patch that adds new
> performance tests that I'm keen to get in.
I'm wondering about all the readonly checks in
_tdb_transaction_prepare_commit(),
we already handle that in _tdb_transaction_start().

I'm a bit nervous about the solaris10 problem.
Ralph gave me access to a solaris 10 box, I'll try to reproduce the original
problem there and check if we deadlock with the current code.

metze


signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
On Wed, 2017-04-05 at 11:51 +0200, Stefan Metzmacher via samba-
technical wrote:

> Am 05.04.2017 um 01:50 schrieb Andrew Bartlett via samba-technical:
> > On Wed, 2017-04-05 at 10:42 +1200, Andrew Bartlett wrote:
> > > On Sat, 2017-04-01 at 18:44 +1300, Andrew Bartlett via samba-
> > > technical
> > > wrote:
> > > >
> > > > Aside from a good delay so others can comment, what would you
> > > > recommend
> > > > I do next towards progressing this patch?
> > >
> > > Can I please get a good eye over and a team review for this
> > > patch?
> > >
> > > Thanks,
> > >
> > > Andrew Bartlett
> >
> > The attached patch takes on board the 80 column feedback (I've also
> > found an emacs mode that is helpful but unobtrusive). 
> >
> > I hope this meets with your approval, as it makes a massive
> > difference
> > to our Samba AD DC performance.
> >
> > Please review.  If reviewed, I'll push with a patch that adds new
> > performance tests that I'm keen to get in. 
>
> I'm wondering about all the readonly checks in
> _tdb_transaction_prepare_commit(),
> we already handle that in _tdb_transaction_start().
>
> I'm a bit nervous about the solaris10 problem.

I am to.  I only got game to formally propose it when Jeremy
essentially proclaimed it dead :-)

> Ralph gave me access to a solaris 10 box, I'll try to reproduce the
> original
> problem there and check if we deadlock with the current code.

For the record (I included the git hashes of the history because it was
non-trivial to dig up), do you know or remember what the original
problem was?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] TDB traverse lock changes for massive AD DC perf improvement

Samba - samba-technical mailing list
Hi Andrew,

>>> Please review.  If reviewed, I'll push with a patch that adds new
>>> performance tests that I'm keen to get in.
>>
>> I'm wondering about all the readonly checks in
>> _tdb_transaction_prepare_commit(),
>> we already handle that in _tdb_transaction_start().
>>
>> I'm a bit nervous about the solaris10 problem.
>
> I am to.  I only got game to formally propose it when Jeremy
> essentially proclaimed it dead :-)
I discussed this with Volker and we think we have an understanding
of what the solaris problem might be.

The pattern with the traverse_read and prepare_commit interaction is
the following:

1. transaction_start got the allrecord lock with F_RDLCK.

2. the traverse_read code walks the database in a sequence like this
(per chain):
   2.1  chainlock(chainX, F_RDLCK)
   2.2  recordlock(chainX.record1, F_RDLCK)
   2.3  chainunlock(chainX, F_RDLCK)
   2.4  callback(chainX.record1)
   2.5  chainlock(chainX, F_RDLCK)
   2.6  recordunlock(chainX.record1, F_RDLCK)
   2.7  recordlock(chainX.record2, F_RDLCK)
   2.8  chainunlock(chainX, F_RDLCK)
   2.9  callback(chainX.record2)
   2.10 chainlock(chainX, F_RDLCK)
   2.11 recordunlock(chainX.record2, F_RDLCK)
   2.12 chainunlock(chainX, F_RDLCK)
   2.13 goto next chain

   So it has always one record locked in F_RDLCK mode and tries to
   get the 2nd one before it releases the first one.

3. prepare_commit tries to upgrade the allrecord lock to F_RWLCK
   If that happens at the time of 2.4, the operation of
   2.5 may deadlock with the allrecord lock upgrade.
   On Linux step 2.5 works in order to make some progress with the
   locking, but on solaris it might fail because the kernel
   wants to satisfy the 1st lock requester before the 2nd one.

I think the first step is a standalone test that does this:

process1: F_RDLCK for ofs=0 len=2
process2: F_RDLCK for ofs=0 len=1
process1: upgrade ofs=0 len=2 to F_RWLCK (in blocking mode)
process2: F_RDLCK for ofs=1 len=1
process2: unlock ofs=0 len=2
process1: should continue at that point

Once we have such a test we can run it on several solaris, freebsd,
linux or whatever.

Then we can decide if we want a configure and/or runtime check for this.
And only avoid the transaction F_RDLCK lock in traverse_read if the kernel
behaves as expected.

Can you write such a standalone test?

metze


signature.asc (853 bytes) Download Attachment
12
Loading...