[PATCH] tdb: runtime check for robust mutexes may hang

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
Hi!

Attached is a fix for bug:

<https://bugzilla.samba.org/show_bug.cgi?id=12593>

Calling tdb_runtime_check_for_robust_mutexes() in multithreaded programs can
hang in sigsuspend() due to lost SIGCHLD.

Example stack back-trace from the bugreport:

  Thread 5 (Thread 0x7f2ea5037700 (LWP 1054)):
  #0  0x00007f2eb6325506 in sigsuspend () at /usr/lib/libc.so.6
  #1  0x00007f2ead463802 in tdb_runtime_check_for_robust_mutexes () at /usr/lib/libtdb.so.1
  #2  0x00007f2eade9b05d in tdb_wrap_open () at /usr/lib/samba/libtdb-wrap-samba4.so
  #3  0x00007f2eb4e49b21 in  () at /usr/lib/libsmbconf.so.0
  #4  0x00007f2eb4e4a305 in gencache_parse () at /usr/lib/libsmbconf.so.0
  #5  0x00007f2eb4e4a862 in gencache_get_data_blob () at /usr/lib/libsmbconf.so.0
  #6  0x00007f2eb4e4a90b in gencache_get () at /usr/lib/libsmbconf.so.0
  #7  0x00007f2eb422f74a in sitename_fetch () at /usr/lib/samba/libgse-samba4.so
  #8  0x00007f2eb422d7da in resolve_name () at /usr/lib/samba/libgse-samba4.so
  #9  0x00007f2eb762a5e2 in  () at /usr/lib/libsmbclient.so.0
  #10 0x0000000000406ebd in do_mount (...)
  #11 0x00007f2eb7407f4a in g_vfs_job_run (job=0x12b72b0 [GVfsJobMount]) ...
  #12 0x00007f2eb6920c9e in g_thread_pool_thread_proxy (data=<optimized out>) ...
  #13 0x00007f2eb69202a5 in g_thread_proxy (data=0x7f2e98004720) at gthread.c:784
  #14 0x00007f2eb6697444 in start_thread () at /usr/lib/libpthread.so.0
  #15 0x00007f2eb63d9cff in clone () at /usr/lib/libc.so.6

As there's no POSIX function to change the signal mask of all threads in a
process, this problem can't be solved by a change to
tdb_runtime_check_for_robust_mutexes().

Attached patch *moves* the runtime check to a library initializer which
guarantees that only one thread exists in the process when the function is run.

tdb_runtime_check_for_robust_mutexes() is marked as deprecated but not removed
from the API in order to prevent a incompatible ABI change that requires a major
version number bump (which could be done of course).

I'm adding a new function to the API that merely fetches the stored result of
the runtime check done at library initialisation. All existing callers of
tdb_runtime_check_for_robust_mutexes() are changed to call the new function.

This means we're essentially tying the support for robust mutexes to compiler
support for library initializers via a contructor attribute.

Again:

this means we're essentially tying the support for robust mutexes to compiler
support for library initializers via a contructor attribute.

sleep(60);

To the best of my knowledge, only Linux, FreeBSD and Solaris have support for
robust mutexes and the following OS/compiler combinations have been successfully
tested with this change:

o Linux with gcc and clang

o FreeBSD with gcc and clang

o Solaris with gcc

Having code still directly or indirectly calling
tdb_runtime_check_for_robust_mutexes() just doesn't feel right as we know it's
broken by design for multi-threaded programs and we're already using threads
under the hoods in some places.

Thoughts? Please review and comment, but don't push yet, I believe this needs
some time to sink into the minds of everyone interested.

Cheerio!
-slow

patch (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
Hi Ralph,

> Attached is a fix for bug:
>
> <https://bugzilla.samba.org/show_bug.cgi?id=12593>
>
> Calling tdb_runtime_check_for_robust_mutexes() in multithreaded programs can
> hang in sigsuspend() due to lost SIGCHLD.
>
> Example stack back-trace from the bugreport:
>
>   Thread 5 (Thread 0x7f2ea5037700 (LWP 1054)):
>   #0  0x00007f2eb6325506 in sigsuspend () at /usr/lib/libc.so.6
>   #1  0x00007f2ead463802 in tdb_runtime_check_for_robust_mutexes () at /usr/lib/libtdb.so.1
>   #2  0x00007f2eade9b05d in tdb_wrap_open () at /usr/lib/samba/libtdb-wrap-samba4.so
>   #3  0x00007f2eb4e49b21 in  () at /usr/lib/libsmbconf.so.0
>   #4  0x00007f2eb4e4a305 in gencache_parse () at /usr/lib/libsmbconf.so.0
>   #5  0x00007f2eb4e4a862 in gencache_get_data_blob () at /usr/lib/libsmbconf.so.0
>   #6  0x00007f2eb4e4a90b in gencache_get () at /usr/lib/libsmbconf.so.0
>   #7  0x00007f2eb422f74a in sitename_fetch () at /usr/lib/samba/libgse-samba4.so
>   #8  0x00007f2eb422d7da in resolve_name () at /usr/lib/samba/libgse-samba4.so
>   #9  0x00007f2eb762a5e2 in  () at /usr/lib/libsmbclient.so.0
>   #10 0x0000000000406ebd in do_mount (...)
>   #11 0x00007f2eb7407f4a in g_vfs_job_run (job=0x12b72b0 [GVfsJobMount]) ...
>   #12 0x00007f2eb6920c9e in g_thread_pool_thread_proxy (data=<optimized out>) ...
>   #13 0x00007f2eb69202a5 in g_thread_proxy (data=0x7f2e98004720) at gthread.c:784
>   #14 0x00007f2eb6697444 in start_thread () at /usr/lib/libpthread.so.0
>   #15 0x00007f2eb63d9cff in clone () at /usr/lib/libc.so.6
>
> As there's no POSIX function to change the signal mask of all threads in a
> process, this problem can't be solved by a change to
> tdb_runtime_check_for_robust_mutexes().
>
> Attached patch *moves* the runtime check to a library initializer which
> guarantees that only one thread exists in the process when the function is run.
>
> tdb_runtime_check_for_robust_mutexes() is marked as deprecated but not removed
> from the API in order to prevent a incompatible ABI change that requires a major
> version number bump (which could be done of course).
>
> I'm adding a new function to the API that merely fetches the stored result of
> the runtime check done at library initialisation. All existing callers of
> tdb_runtime_check_for_robust_mutexes() are changed to call the new function.
>
> This means we're essentially tying the support for robust mutexes to compiler
> support for library initializers via a contructor attribute.
>
> Again:
>
> this means we're essentially tying the support for robust mutexes to compiler
> support for library initializers via a contructor attribute.
>
> sleep(60);
>
> To the best of my knowledge, only Linux, FreeBSD and Solaris have support for
> robust mutexes and the following OS/compiler combinations have been successfully
> tested with this change:
>
> o Linux with gcc and clang
>
> o FreeBSD with gcc and clang
>
> o Solaris with gcc
>
> Having code still directly or indirectly calling
> tdb_runtime_check_for_robust_mutexes() just doesn't feel right as we know it's
> broken by design for multi-threaded programs and we're already using threads
> under the hoods in some places.
>
> Thoughts? Please review and comment, but don't push yet, I believe this needs
> some time to sink into the minds of everyone interested.

I'd prefer to keep the patchset much simpler, I think we could
just call tdb_runtime_check_for_robust_mutexes() from
a library initializer. That's really all we need
tdb_runtime_check_for_robust_mutexes() is already a
run once function, which caches the result of it's first
run. The library initializers would most likely runs before
any thread is started.

If the platform supports robust mutexes it would most likely also
support library initializers. If library initializers are not supported,
we most likely also miss support for either threads or robust mutexes.

So just the simple fix to call tdb_runtime_check_for_robust_mutexes
from tdb_lib_init() should effectively catch all use cases, without
the need for any changes in existing callers.

Is there any real reason why we would need something more complex?

metze

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
On Sun, Mar 12, 2017 at 03:18:55PM +0100, Stefan Metzmacher wrote:

> Hi Ralph,
>
> > Attached is a fix for bug:
> >
> > <https://bugzilla.samba.org/show_bug.cgi?id=12593>
> >
> > Calling tdb_runtime_check_for_robust_mutexes() in multithreaded programs can
> > hang in sigsuspend() due to lost SIGCHLD.
> >
> > Example stack back-trace from the bugreport:
> >
> >   Thread 5 (Thread 0x7f2ea5037700 (LWP 1054)):
> >   #0  0x00007f2eb6325506 in sigsuspend () at /usr/lib/libc.so.6
> >   #1  0x00007f2ead463802 in tdb_runtime_check_for_robust_mutexes () at /usr/lib/libtdb.so.1
> >   #2  0x00007f2eade9b05d in tdb_wrap_open () at /usr/lib/samba/libtdb-wrap-samba4.so
> >   #3  0x00007f2eb4e49b21 in  () at /usr/lib/libsmbconf.so.0
> >   #4  0x00007f2eb4e4a305 in gencache_parse () at /usr/lib/libsmbconf.so.0
> >   #5  0x00007f2eb4e4a862 in gencache_get_data_blob () at /usr/lib/libsmbconf.so.0
> >   #6  0x00007f2eb4e4a90b in gencache_get () at /usr/lib/libsmbconf.so.0
> >   #7  0x00007f2eb422f74a in sitename_fetch () at /usr/lib/samba/libgse-samba4.so
> >   #8  0x00007f2eb422d7da in resolve_name () at /usr/lib/samba/libgse-samba4.so
> >   #9  0x00007f2eb762a5e2 in  () at /usr/lib/libsmbclient.so.0
> >   #10 0x0000000000406ebd in do_mount (...)
> >   #11 0x00007f2eb7407f4a in g_vfs_job_run (job=0x12b72b0 [GVfsJobMount]) ...
> >   #12 0x00007f2eb6920c9e in g_thread_pool_thread_proxy (data=<optimized out>) ...
> >   #13 0x00007f2eb69202a5 in g_thread_proxy (data=0x7f2e98004720) at gthread.c:784
> >   #14 0x00007f2eb6697444 in start_thread () at /usr/lib/libpthread.so.0
> >   #15 0x00007f2eb63d9cff in clone () at /usr/lib/libc.so.6
> >
> > As there's no POSIX function to change the signal mask of all threads in a
> > process, this problem can't be solved by a change to
> > tdb_runtime_check_for_robust_mutexes().
> >
> > Attached patch *moves* the runtime check to a library initializer which
> > guarantees that only one thread exists in the process when the function is run.
> >
> > tdb_runtime_check_for_robust_mutexes() is marked as deprecated but not removed
> > from the API in order to prevent a incompatible ABI change that requires a major
> > version number bump (which could be done of course).
> >
> > I'm adding a new function to the API that merely fetches the stored result of
> > the runtime check done at library initialisation. All existing callers of
> > tdb_runtime_check_for_robust_mutexes() are changed to call the new function.
> >
> > This means we're essentially tying the support for robust mutexes to compiler
> > support for library initializers via a contructor attribute.
> >
> > Again:
> >
> > this means we're essentially tying the support for robust mutexes to compiler
> > support for library initializers via a contructor attribute.
> >
> > sleep(60);
> >
> > To the best of my knowledge, only Linux, FreeBSD and Solaris have support for
> > robust mutexes and the following OS/compiler combinations have been successfully
> > tested with this change:
> >
> > o Linux with gcc and clang
> >
> > o FreeBSD with gcc and clang
> >
> > o Solaris with gcc
> >
> > Having code still directly or indirectly calling
> > tdb_runtime_check_for_robust_mutexes() just doesn't feel right as we know it's
> > broken by design for multi-threaded programs and we're already using threads
> > under the hoods in some places.
> >
> > Thoughts? Please review and comment, but don't push yet, I believe this needs
> > some time to sink into the minds of everyone interested.
>
> I'd prefer to keep the patchset much simpler, I think we could
> just call tdb_runtime_check_for_robust_mutexes() from
> a library initializer. That's really all we need
> tdb_runtime_check_for_robust_mutexes() is already a
> run once function, which caches the result of it's first
> run. The library initializers would most likely runs before
> any thread is started.
>
> If the platform supports robust mutexes it would most likely also
> support library initializers. If library initializers are not supported,
> we most likely also miss support for either threads or robust mutexes.
>
> So just the simple fix to call tdb_runtime_check_for_robust_mutexes
> from tdb_lib_init() should effectively catch all use cases, without
> the need for any changes in existing callers.
>
> Is there any real reason why we would need something more complex?

yes: it leaves a mess behind.

Plus its still broken for the hypothetical case of a platform that supports
robust mutexes but doesn't support library initializers or uses some other
construct to set them up. My patchset would catch that and error out at compile
time.

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Sun, 2017-03-12 at 14:14 +0100, Ralph Böhme via samba-technical
wrote:

>
> As there's no POSIX function to change the signal mask of all threads
> in a
> process, this problem can't be solved by a change to
> tdb_runtime_check_for_robust_mutexes().
>
> Attached patch *moves* the runtime check to a library initializer
> which
> guarantees that only one thread exists in the process when the
> function is run.

Why can we assert this is true?

Surely if tdb is loaded into a program by being linked to a plugin, eg
dlopen(), the library initializer will run then, with threads already
running?

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
On Mon, Mar 13, 2017 at 06:49:00AM +1300, Andrew Bartlett wrote:

> On Sun, 2017-03-12 at 14:14 +0100, Ralph Böhme via samba-technical
> wrote:
> >
> > As there's no POSIX function to change the signal mask of all threads
> > in a
> > process, this problem can't be solved by a change to
> > tdb_runtime_check_for_robust_mutexes().
> >
> > Attached patch *moves* the runtime check to a library initializer
> > which
> > guarantees that only one thread exists in the process when the
> > function is run.
>
> Why can we assert this is true?
>
> Surely if tdb is loaded into a program by being linked to a plugin, eg
> dlopen(), the library initializer will run then, with threads already
> running?

golly, you're right. I missed this case.

Back to the drawing board I guess, unless we decide we don't care about this use
case. We probably should care, might actually be used by the Gnome VFS, who
knows.

Cheerio!
-slow

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] tdb: runtime check for robust mutexes may hang

Samba - samba-technical mailing list
On Mon, Mar 13, 2017 at 10:57:50AM +0100, Ralph Böhme wrote:

> On Mon, Mar 13, 2017 at 06:49:00AM +1300, Andrew Bartlett wrote:
> > On Sun, 2017-03-12 at 14:14 +0100, Ralph Böhme via samba-technical
> > wrote:
> > >
> > > As there's no POSIX function to change the signal mask of all threads
> > > in a
> > > process, this problem can't be solved by a change to
> > > tdb_runtime_check_for_robust_mutexes().
> > >
> > > Attached patch *moves* the runtime check to a library initializer
> > > which
> > > guarantees that only one thread exists in the process when the
> > > function is run.
> >
> > Why can we assert this is true?
> >
> > Surely if tdb is loaded into a program by being linked to a plugin, eg
> > dlopen(), the library initializer will run then, with threads already
> > running?
>
> golly, you're right. I missed this case.
>
> Back to the drawing board I guess, unless we decide we don't care about this use
> case. We probably should care, might actually be used by the Gnome VFS, who
> knows.
attached patch uses a different approach, essentially going back to the previous
version of using waitpid() to rendezvous with the child, but fixing the toctou
race(s).

Thoughts? Uri (or whoever's interested), can you please take a look? I'd like as
many eyes as possible on this one. Thanks!

Cheerio!
-slow

bug12593-master.patch (6K) Download Attachment