[PATCH] Avoid CTDB daemon deadlock while reading db sequence number (bug 13021)

Samba - samba-technical mailing list
Hi,

Once recovery starts and the databases are frozen, all record access is
postponed until recovery completes, except for reading the database
sequence number.  The sequence number is read via a control that does
not check whether the databases are frozen.

If the database is frozen and the freeze transaction has not started
(this can happen when a node is inactive, or during recovery when the
database is frozen but the transaction has not yet started), then trying
to read the sequence number will cause the ctdb daemon to deadlock.

Before reading the sequence number, check whether database access is
allowed.
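
A minimal sketch of the kind of guard described above, assuming a
simplified model of the database state. The struct, its fields and the
demo_* function names are hypothetical and chosen for illustration only;
they are not taken from the attached patch or from the ctdb sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-database context. */
struct demo_db {
	bool frozen;                     /* database is frozen */
	bool freeze_transaction_started; /* freeze transaction is running */
	uint64_t seqnum;                 /* stored sequence number */
};

/*
 * Refuse access while the database is frozen and the freeze
 * transaction has not started; this is the state in which reading
 * the sequence number is described as deadlocking.
 */
static bool demo_db_allow_access(const struct demo_db *db)
{
	if (db->frozen && !db->freeze_transaction_started) {
		return false;
	}
	return true;
}

/*
 * Return 0 and store the sequence number in *seqnum, or return -1
 * without touching *seqnum when access is not allowed.
 */
int demo_get_db_seqnum(const struct demo_db *db, uint64_t *seqnum)
{
	assert(db != NULL && seqnum != NULL);

	if (!demo_db_allow_access(db)) {
		return -1;
	}
	*seqnum = db->seqnum;
	return 0;
}
```

In this model the caller gets an error back instead of blocking, and
can retry once the freeze transaction has started or the database has
been thawed.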

Please review and push.

Amitay.

Attachment: ctdb.patch (4K)

Re: [PATCH] Avoid CTDB daemon deadlock while reading db sequence number (bug 13021)

On Mon, Sep 11, 2017 at 01:59:02PM +1000, Amitay Isaacs via samba-technical wrote:
> Once recovery starts and the databases are frozen, all record access is
> postponed until recovery completes, except for reading the database
> sequence number.  The sequence number is read via a control that does
> not check whether the databases are frozen.

Doesn't this depend on the lock helper process going away in time
when asked to? Shouldn't we also do a tdb_chainlock_nonblock in
the parent to avoid any problems with races?

Thanks,

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]


Re: [PATCH] Avoid CTDB daemon deadlock while reading db sequence number (bug 13021)

On Mon, Sep 11, 2017 at 8:13 PM, Volker Lendecke <[hidden email]> wrote:

> On Mon, Sep 11, 2017 at 01:59:02PM +1000, Amitay Isaacs via
> samba-technical wrote:
> > Once recovery starts and the databases are frozen, all record access is
> > postponed until recovery completes, except for reading the database
> > sequence number.  The sequence number is read via a control that does
> > not check whether the databases are frozen.
>
> Doesn't this depend on the lock helper process going away in time
> when asked to? Shouldn't we also do a tdb_chainlock_nonblock in
> the parent to avoid any problems with races?
>
The CTDB daemon uses tdb_chainlock_nonblock() when it is trying to
migrate records while processing ctdb_req_call.  However, there are a
few places where ctdb fetches a record and expects to be able to get
the record lock.  One such place is reading the sequence number.

In principle I do agree that any record locks in the ctdb daemon should
use the non-blocking version.  However, I would not like to make a
sweeping change for all record locks, since we don't have sufficient
tests for the ctdb daemon.
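
To illustrate the non-blocking style discussed here, the sketch below
uses flock() with LOCK_NB as a stand-in for tdb_chainlock_nonblock().
This is an analogy only: tdb takes fcntl byte-range locks on chains,
and flock() is used here because two descriptors opened in the same
process can contend with each other, which keeps the example
self-contained. The demo_* names and the temporary file path are
hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* mkstemp() and flock() under strict -std modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to take an exclusive lock without blocking.
 * Returns 0 if the lock was taken, 1 if someone else holds it,
 * and -1 on any other error.
 */
int demo_try_lock(int fd)
{
	if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
		return 0;
	}
	return (errno == EWOULDBLOCK) ? 1 : -1;
}

/*
 * Simulate a "helper" descriptor holding the lock while a "parent"
 * descriptor tries to take it.  Returns 0 if the parent was refused
 * while the lock was held and succeeded after it was released.
 */
int demo_contention(void)
{
	char path[] = "/tmp/demo_lock_XXXXXX";
	int helper_fd, parent_fd, while_held, after_release;

	helper_fd = mkstemp(path);
	if (helper_fd == -1) {
		return -1;
	}
	/* A second open() gives an independent open file description. */
	parent_fd = open(path, O_RDWR);
	if (parent_fd == -1) {
		close(helper_fd);
		unlink(path);
		return -1;
	}

	flock(helper_fd, LOCK_EX);                /* helper takes the lock */
	while_held = demo_try_lock(parent_fd);    /* parent must not block */
	flock(helper_fd, LOCK_UN);                /* helper goes away */
	after_release = demo_try_lock(parent_fd); /* parent retries */

	close(parent_fd);
	close(helper_fd);
	unlink(path);

	return (while_held == 1 && after_release == 0) ? 0 : -1;
}
```

The point of the pattern is that a caller which finds the lock busy
gets control back and can defer or retry the request, rather than
stalling inside the lock call.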

I will keep this in mind when splitting the database daemon.

Amitay.

Re: [PATCH] Avoid CTDB daemon deadlock while reading db sequence number (bug 13021)

On Mon, 11 Sep 2017 13:59:02 +1000, Amitay Isaacs via samba-technical
<[hidden email]> wrote:

> Once recovery starts and the databases are frozen, all record access is
> postponed until recovery completes, except for reading the database
> sequence number.  The sequence number is read via a control that does
> not check whether the databases are frozen.
>
> If the database is frozen and the freeze transaction has not started
> (this can happen when a node is inactive, or during recovery when the
> database is frozen but the transaction has not yet started), then trying
> to read the sequence number will cause the ctdb daemon to deadlock.
>
> Before reading the sequence number, check whether database access is
> allowed.
>
> Please review and push.

Reviewed-by: Martin Schwenke <[hidden email]>

Mega-push coming...  :-)

peace & happiness,
martin