[PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb

Samba - samba-technical mailing list
One problem in cluster environments can be contended access to the same
file, in which case ctdb has to transfer the locking.tdb record across
nodes for every open and close of that file.

ctdb already detects this as a "hot record". The attached patches provide
a new command, 'net tdb locking', that makes it possible to identify the
affected file by querying the record.

Christof

Attachment: patches (15K)

Re: [PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb

Hi, Christof!

On Thu, Apr 13, 2017 at 01:00:19PM -0700, Christof Schmitt via samba-technical wrote:
> One problem in cluster environments can be contended access to the same
> file, in which case ctdb has to transfer the locking.tdb record across
> nodes for every open and close of that file.

Attached find two patches that you might want to take into
consideration. I haven't run a full autobuild on the patches, but if
you find my two to-SQUASH patches acceptable, Reviewed-By: me.

Volker

Attachment: patch.txt (1K)

Re: [PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb

On Tue, Apr 25, 2017 at 01:09:14PM +0200, [hidden email] wrote:

> Hi, Christof!
>
> On Thu, Apr 13, 2017 at 01:00:19PM -0700, Christof Schmitt via samba-technical wrote:
> > One problem in cluster environments can be contended access to the same
> > file, in which case ctdb has to transfer the locking.tdb record across
> > nodes for every open and close of that file.
>
> Attached find two patches that you might want to take into
> consideration. I haven't run a full autobuild on the patches, but if
> you find my two to-SQUASH patches acceptable, Reviewed-By: me.

Thank you. I added the two changes and will push the patches to
autobuild.

Christof

Re: [PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb

On Wed, Apr 26, 2017 at 11:54:12AM -0700, Christof Schmitt via samba-technical wrote:

> On Tue, Apr 25, 2017 at 01:09:14PM +0200, [hidden email] wrote:
> > Hi, Christof!
> >
> > On Thu, Apr 13, 2017 at 01:00:19PM -0700, Christof Schmitt via samba-technical wrote:
> > > One problem in cluster environments can be contended access to the same
> > > file, in which case ctdb has to transfer the locking.tdb record across
> > > nodes for every open and close of that file.
> >
> > Attached find two patches that you might want to take into
> > consideration. I haven't run a full autobuild on the patches, but if
> > you find my two to-SQUASH patches acceptable, Reviewed-By: me.
>
> Thank you. I added the two changes and will push the patches to
> autobuild.

I tried to push these to autobuild, but get a consistent failure at:

[1689(10768)/2100 at 1h51m38s] samba4.blackbox.trust_ntlm(fl2008r2dc:local)
[1690(10790)/2100 at 1h51m39s] samba4.blackbox.trust_ntlm(fl2003dc:local)
[1691(10812)/2100 at 1h51m40s] samba4.blackbox.trust_ntlm(ad_member:local)
UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.225336Z(ad_member:local)
REASON: Exception: Exception: Test was never started
UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.223650Z(ad_member:local) (samba.subunit.RemotedTestCase)
REASON: was started but never finished!

FAILED (0 failures, 2 errors and 0 unexpected successes in 0 testsuites)


I am still debugging, something seems to be tripping up the tracking in
selftest/subunithelper.py. Running a 'make test' for the affected testcase and
the new net_tdb testcase works.

FYI, the attached patches are what I am trying to push.

Christof

Attachment: patches (16K)

Re: [PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb

On Fri, Apr 28, 2017 at 03:04:59PM -0700, Christof Schmitt via samba-technical wrote:

> On Wed, Apr 26, 2017 at 11:54:12AM -0700, Christof Schmitt via samba-technical wrote:
> > On Tue, Apr 25, 2017 at 01:09:14PM +0200, [hidden email] wrote:
> > > Hi, Christof!
> > >
> > > On Thu, Apr 13, 2017 at 01:00:19PM -0700, Christof Schmitt via samba-technical wrote:
> > > > One problem in cluster environments can be contended access to the same
> > > > file, in which case ctdb has to transfer the locking.tdb record across
> > > > nodes for every open and close of that file.
> > >
> > > Attached find two patches that you might want to take into
> > > consideration. I haven't run a full autobuild on the patches, but if
> > > you find my two to-SQUASH patches acceptable, Reviewed-By: me.
> >
> > Thank you. I added the two changes and will push the patches to
> > autobuild.
>
> I tried to push these to autobuild, but get a consistent failure at:
>
> [1689(10768)/2100 at 1h51m38s] samba4.blackbox.trust_ntlm(fl2008r2dc:local)
> [1690(10790)/2100 at 1h51m39s] samba4.blackbox.trust_ntlm(fl2003dc:local)
> [1691(10812)/2100 at 1h51m40s] samba4.blackbox.trust_ntlm(ad_member:local)
> UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.225336Z(ad_member:local)
> REASON: Exception: Exception: Test was never started
> UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.223650Z(ad_member:local) (samba.subunit.RemotedTestCase)
> REASON: was started but never finished!
>
> FAILED (0 failures, 2 errors and 0 unexpected successes in 0 testsuites)
>
>
> I am still debugging, something seems to be tripping up the tracking in
> selftest/subunithelper.py. Running a 'make test' for the affected testcase and
> the new net_tdb testcase works.

FYI,

the same problem also hits when trying to push a different patchset (the
idmap_rfc2307 fixes). The exact same problem also hits on a private
autobuild build with just some debugging enabled. The output after
"rpcclient getusername with" is suspicious; it is supposed to be the
user name, but here it shows the test environment and then also adds the
timestamp. Having the timestamp in the name is likely the reason
subunithelper.py cannot identify the running test. I am still
trying to find out why this is messed up in the first place.
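
To illustrate why a timestamp fused into the test name would produce
exactly this pair of errors, here is a minimal sketch of a line-oriented
tracker. It is not the code in selftest/subunithelper.py; the function
name track() and the use of a "success:" result line are assumptions
made for illustration only.

```python
def track(lines):
    """Toy tracker (hypothetical): pair 'test:' lines with result lines by name."""
    open_tests = set()
    problems = []
    for line in lines:
        keyword, _, name = line.partition(": ")
        if keyword == "test":
            open_tests.add(name)
        elif keyword in ("success", "failure"):
            if name in open_tests:
                open_tests.remove(name)
            else:
                problems.append((name, "Test was never started"))
        # any other keyword, such as "time", is ignored here
    for name in sorted(open_tests):
        problems.append((name, "was started but never finished!"))
    return problems


# Well-formed stream: the name on the start line matches the result line.
clean = [
    "test: Test01 rpcclient getusername with DOMAIN\\user",
    "time: 2017-04-28 01:12:50.223650Z",
    "success: Test01 rpcclient getusername with DOMAIN\\user",
    "time: 2017-04-28 01:12:50.225336Z",
]

# Stream as seen in the failure: each name line has lost its tail and its
# newline, so the following "time:" line became part of the name.
fused = [
    "test: Test01 rpcclient getusername with LOCALADMEMBER"
    "time: 2017-04-28 01:12:50.223650Z",
    "success: Test01 rpcclient getusername with LOCALADMEMBER"
    "time: 2017-04-28 01:12:50.225336Z",
]

for name, reason in track(fused):
    print(reason, "<-", name)
```

Because the two fused names carry different timestamps, they never
match each other, so one test is reported as never started and the
other as never finished.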

Christof

[PATCH] testprogs: Ignore escape characters when printing test name (was Re: [PATCHES] Add 'net tdb' command to allow debugging of contended records in locking.tdb)

On Tue, May 02, 2017 at 04:07:11PM -0700, Christof Schmitt via samba-technical wrote:

> On Fri, Apr 28, 2017 at 03:04:59PM -0700, Christof Schmitt via samba-technical wrote:
> > [1689(10768)/2100 at 1h51m38s] samba4.blackbox.trust_ntlm(fl2008r2dc:local)
> > [1690(10790)/2100 at 1h51m39s] samba4.blackbox.trust_ntlm(fl2003dc:local)
> > [1691(10812)/2100 at 1h51m40s] samba4.blackbox.trust_ntlm(ad_member:local)
> > UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.225336Z(ad_member:local)
> > REASON: Exception: Exception: Test was never started
> > UNEXPECTED(error): samba4.blackbox.trust_ntlm.Test01 rpcclient getusername with LOCALADMEMBERtime: 2017-04-28 01:12:50.223650Z(ad_member:local) (samba.subunit.RemotedTestCase)
> > REASON: was started but never finished!
> >
> > FAILED (0 failures, 2 errors and 0 unexpected successes in 0 testsuites)
> >
> >
> > I am still debugging, something seems to be tripping up the tracking in
> > selftest/subunithelper.py. Running a 'make test' for the affected testcase and
> > the new net_tdb testcase works.
>
> FYI,
>
> the same problem also hits when trying to push a different patchset (the
> idmap_rfc2307 fixes). The exact same problem also hits on a private
> autobuild build with just some debugging enabled. The output after
> "rpcclient getusername with" is suspicious; it is supposed to be the
> user name, but here it shows the test environment and then also adds the
> timestamp. Having the timestamp in the name is likely the reason
> subunithelper.py cannot identify the running test. I am still
> trying to find out why this is messed up in the first place.

Found the issue. It only triggers for user names starting with the
letter c, and on sn-devel that would only be my user. See the attached
patch.
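
The patch itself is not reproduced in this thread. As a hedged
illustration of the mechanism the subject line points at, the sketch
below assumes the test name contains DOMAIN\username and is printed by
a shell echo that interprets backslash escapes. On such shells, "\c"
ends the output and suppresses the trailing newline. The helpers
xsi_echo() and printf_s() and the user name "cuser" are hypothetical,
and only the "\c" escape is modelled.

```python
def xsi_echo(arg):
    """Emulate an echo that honors backslash-c: stop output, no newline."""
    head, sep, _ = arg.partition("\\c")
    return head if sep else arg + "\n"


def printf_s(arg):
    """Emulate printf '%s\\n' "$arg": the argument is printed verbatim."""
    return arg + "\n"


# Hypothetical test name: domain, backslash, user name starting with "c".
name = "Test01 rpcclient getusername with LOCALADMEMBER\\cuser"
stamp = "time: 2017-04-28 01:12:50.223650Z\n"

broken = xsi_echo("test: " + name) + stamp
fixed = printf_s("test: " + name) + stamp

print(broken, end="")  # the name is cut at "\c" and the time line is fused on
print(fixed, end="")   # the name and the time line stay on separate lines
```

Under these assumptions, printing the name without escape
interpretation keeps the "test:" line intact for any user name.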

Christof