[PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
Hi,

CTDB in some places uses CTDB_BROADCAST_ALL to
distribute packets to all the configured nodes.  If there is a
dead node (or a node that is not running) in the cluster,
then ctdb daemons will queue up packets for such nodes.
Over long periods of time, ctdb daemon can consume lots
of memory which will not be freed till the dead node is
started or ctdb daemon is restarted.

Use CTDB_BROADCAST_CONNECTED instead to send
packets to the nodes that are running.

Please review and push.

Amitay.

ctdb.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Thu, Sep 28, 2017 at 03:19:28PM +1000, Amitay Isaacs via samba-technical wrote:

> Hi,
>
> CTDB in some places uses CTDB_BROADCAST_ALL to
> distribute packets to all the configured nodes.  If there is a
> dead node (or a node that is not running) in the cluster,
> then ctdb daemons will queue up packets for such nodes.
> Over long periods of time, ctdb daemon can consume lots
> of memory which will not be freed till the dead node is
> started or ctdb daemon is restarted.
>
> Use CTDB_BROADCAST_CONNECTED instead to send
> packets to the nodes that are running.

Pushed, thanks!

Volker

>
> Please review and push.
>
> Amitay.

> From e424f81a39c3e289f2d6456724edaa252e3968c9 Mon Sep 17 00:00:00 2001
> From: Amitay Isaacs <[hidden email]>
> Date: Thu, 28 Sep 2017 11:47:00 +1000
> Subject: [PATCH 1/2] ctdb-daemon: Send broadcast to connected nodes, not
>  configured nodes
>
> https://bugzilla.samba.org/show_bug.cgi?id=13056
>
> This avoids queueing packets for dead (or not running) nodes in the
> cluster.
>
> Signed-off-by: Amitay Isaacs <[hidden email]>
> ---
>  ctdb/server/ctdb_daemon.c      | 2 +-
>  ctdb/server/ctdb_ltdb_server.c | 5 +++--
>  ctdb/server/ctdb_recoverd.c    | 3 ++-
>  ctdb/server/ctdb_takeover.c    | 2 +-
>  4 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/ctdb/server/ctdb_daemon.c b/ctdb/server/ctdb_daemon.c
> index 185982e65bd..2929cd55339 100644
> --- a/ctdb/server/ctdb_daemon.c
> +++ b/ctdb/server/ctdb_daemon.c
> @@ -1052,7 +1052,7 @@ static void ctdb_setup_event_callback(struct ctdb_context *ctdb, int status,
>   ctdb_run_notification_script(ctdb, "setup");
>  
>   /* tell all other nodes we've just started up */
> - ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_ALL,
> + ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_CONNECTED,
>   0, CTDB_CONTROL_STARTUP, 0,
>   CTDB_CTRL_FLAG_NOREPLY,
>   tdb_null, NULL, NULL);
> diff --git a/ctdb/server/ctdb_ltdb_server.c b/ctdb/server/ctdb_ltdb_server.c
> index d83783854a0..c199aac2d1d 100644
> --- a/ctdb/server/ctdb_ltdb_server.c
> +++ b/ctdb/server/ctdb_ltdb_server.c
> @@ -1206,7 +1206,7 @@ int32_t ctdb_control_db_attach(struct ctdb_context *ctdb, TDB_DATA indata,
>   }
>  
>   /* tell all the other nodes about this database */
> - ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_ALL, 0, opcode,
> + ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_CONNECTED, 0, opcode,
>   0, CTDB_CTRL_FLAG_NOREPLY,
>   indata, NULL, NULL);
>  
> @@ -1260,7 +1260,8 @@ int32_t ctdb_control_db_detach(struct ctdb_context *ctdb, TDB_DATA indata,
>   client = reqid_find(ctdb->idr, client_id, struct ctdb_client);
>   if (client != NULL) {
>   /* forward the control to all the nodes */
> - ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_ALL, 0,
> + ctdb_daemon_send_control(ctdb,
> + CTDB_BROADCAST_CONNECTED, 0,
>   CTDB_CONTROL_DB_DETACH, 0,
>   CTDB_CTRL_FLAG_NOREPLY,
>   indata, NULL, NULL);
> diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
> index 2b94fed7478..4ad88419ed6 100644
> --- a/ctdb/server/ctdb_recoverd.c
> +++ b/ctdb/server/ctdb_recoverd.c
> @@ -1573,7 +1573,8 @@ static int send_election_request(struct ctdb_recoverd *rec, uint32_t pnn)
>  
>   /* send an election message to all active nodes */
>   DEBUG(DEBUG_INFO,(__location__ " Send election request to all active nodes\n"));
> - return ctdb_client_send_message(ctdb, CTDB_BROADCAST_ALL, srvid, election_data);
> + return ctdb_client_send_message(ctdb, CTDB_BROADCAST_CONNECTED,
> + srvid, election_data);
>  }
>  
>  /*
> diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
> index 5e8aabfe80e..dcc7c41154e 100644
> --- a/ctdb/server/ctdb_takeover.c
> +++ b/ctdb/server/ctdb_takeover.c
> @@ -1970,7 +1970,7 @@ static int ctdb_send_set_tcp_tickles_for_ip(struct ctdb_context *ctdb,
>   memcpy(&list->connections[0], tcparray->connections, sizeof(struct ctdb_connection) * num);
>   }
>  
> - ret = ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_ALL, 0,
> + ret = ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_CONNECTED, 0,
>         CTDB_CONTROL_SET_TCP_TICKLE_LIST,
>         0, CTDB_CTRL_FLAG_NOREPLY, data, NULL, NULL);
>   if (ret != 0) {
> --
> 2.13.5
>
>
> From 41205a6656589b874b7c3d148d5d4506ec51dd19 Mon Sep 17 00:00:00 2001
> From: Amitay Isaacs <[hidden email]>
> Date: Thu, 28 Sep 2017 11:47:24 +1000
> Subject: [PATCH 2/2] ctdb-tests: Send broadcast to connected nodes, not
>  configured nodes
>
> https://bugzilla.samba.org/show_bug.cgi?id=13056
>
> Signed-off-by: Amitay Isaacs <[hidden email]>
> ---
>  ctdb/tests/src/cluster_wait.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ctdb/tests/src/cluster_wait.c b/ctdb/tests/src/cluster_wait.c
> index 1405738ac47..ecd2efdf314 100644
> --- a/ctdb/tests/src/cluster_wait.c
> +++ b/ctdb/tests/src/cluster_wait.c
> @@ -264,7 +264,7 @@ static void cluster_wait_join_unregistered(struct tevent_req *subreq)
>   msg.data.data = tdb_null;
>  
>   subreq = ctdb_client_message_send(state, state->ev, state->client,
> -  CTDB_BROADCAST_ALL, &msg);
> +  CTDB_BROADCAST_CONNECTED, &msg);
>   if (tevent_req_nomem(subreq, req)) {
>   return;
>   }
> --
> 2.13.5
>


--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Thu, Sep 28, 2017 at 9:17 PM, Volker Lendecke <[hidden email]>
wrote:

> On Thu, Sep 28, 2017 at 03:19:28PM +1000, Amitay Isaacs via
> samba-technical wrote:
> > Hi,
> >
> > CTDB in some places uses CTDB_BROADCAST_ALL to
> > distribute packets to all the configured nodes.  If there is a
> > dead node (or a node that is not running) in the cluster,
> > then ctdb daemons will queue up packets for such nodes.
> > Over long periods of time, ctdb daemon can consume lots
> > of memory which will not be freed till the dead node is
> > started or ctdb daemon is restarted.
> >
> > Use CTDB_BROADCAST_CONNECTED instead to send
> > packets to the nodes that are running.
>
> Pushed, thanks!
>
> Volker
>

Turns out that the fix is not as trivial as I thought earlier.
Sorry for the trouble.

Just replacing all BROADCAST_ALL with BROADCAST_CONNECTED
introduces all sorts of race conditions during ctdb startup.  Some of the
messages (e.g. election) have to be sent using BROADCAST_ALL.
I need to fix the logic for handling BROADCAST_ALL to not queue the
packets for dead nodes.

Amitay.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Fri, Sep 29, 2017 at 12:45 PM, Amitay Isaacs <[hidden email]> wrote:

> On Thu, Sep 28, 2017 at 9:17 PM, Volker Lendecke <
> [hidden email]> wrote:
>
>> On Thu, Sep 28, 2017 at 03:19:28PM +1000, Amitay Isaacs via
>> samba-technical wrote:
>> > Hi,
>> >
>> > CTDB in some places uses CTDB_BROADCAST_ALL to
>> > distribute packets to all the configured nodes.  If there is a
>> > dead node (or a node that is not running) in the cluster,
>> > then ctdb daemons will queue up packets for such nodes.
>> > Over long periods of time, ctdb daemon can consume lots
>> > of memory which will not be freed till the dead node is
>> > started or ctdb daemon is restarted.
>> >
>> > Use CTDB_BROADCAST_CONNECTED instead to send
>> > packets to the nodes that are running.
>>
>> Pushed, thanks!uu
>>
>> Volker
>>
>
> Turns out that the fix is not as trivial as I thought earlier.
> Sorry for the trouble.
>
> Just replacing all BROADCAST_ALL with BROADCAST_CONNECTED
> introduces all sorts of race conditions during ctdb startup.  Some of the
> messages (e.g. election) have to be sent using BROADCAST_ALL.
> I need to fix the logic for handling BROADCAST_ALL to not queue the
> packets for dead nodes.
>
>
Here are the updated patches.

- Replace BROADCAST_ALL with BROADCAST_CONNECTED for
  database controls
- Do not queue packet if the queue fd is invalid

BROADCAST_ALL is still used in following two cases to avoid
the race conditions during ctdb startup.
- Election
- Distributing tickles

Please review and push.

Amitay.

ctdb.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Tue, 3 Oct 2017 16:52:40 +1100, Amitay Isaacs via samba-technical
<[hidden email]> wrote:

> On Fri, Sep 29, 2017 at 12:45 PM, Amitay Isaacs <[hidden email]> wrote:
>
> > On Thu, Sep 28, 2017 at 9:17 PM, Volker Lendecke <  
> > [hidden email]> wrote:  
> >  
> >> On Thu, Sep 28, 2017 at 03:19:28PM +1000, Amitay Isaacs via
> >> samba-technical wrote:  
>  [...]  
> >>
> >> Pushed, thanks!uu
> >>
> >> Volker
> >>  
> >
> > Turns out that the fix is not as trivial as I thought earlier.
> > Sorry for the trouble.
> >
> > Just replacing all BROADCAST_ALL with BROADCAST_CONNECTED
> > introduces all sorts of race conditions during ctdb startup.  Some of the
> > messages (e.g. election) have to be sent using BROADCAST_ALL.
> > I need to fix the logic for handling BROADCAST_ALL to not queue the
> > packets for dead nodes.
> >
> >  
> Here are the updated patches.
>
> - Replace BROADCAST_ALL with BROADCAST_CONNECTED for
>   database controls
> - Do not queue packet if the queue fd is invalid
>
> BROADCAST_ALL is still used in following two cases to avoid
> the race conditions during ctdb startup.
> - Election
> - Distributing tickles
>
> Please review and push.

Reviewed-by: Martin Schwenke <[hidden email]>

Pushed...

peace & happiness,
martin

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Thu, Oct 05, 2017 at 02:19:48PM +1100, Martin Schwenke via samba-technical wrote:
> > Please review and push.
>
> Reviewed-by: Martin Schwenke <[hidden email]>
>
> Pushed...

Sorry, had a public holiday and was busy with other stuff :-(

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Avoid using CTDB_BROADCAST_ALL (bug 13056)

Samba - samba-technical mailing list
On Thu, 5 Oct 2017 06:50:54 +0200, Volker Lendecke
<[hidden email]> wrote:

> On Thu, Oct 05, 2017 at 02:19:48PM +1100, Martin Schwenke via samba-technical wrote:
> > > Please review and push.  
> >
> > Reviewed-by: Martin Schwenke <[hidden email]>
> >
> > Pushed...  
>
> Sorry, had a public holiday and was busy with other stuff :-(

It's cool.  I wondered about putting your review on (some of) it but we
decided there was a new patch and another had changed, so not the right
thing to do...  ;-)

peace & happiness,
martin