CTDB_MANAGES_SAMBA and CTDB_MANAGES_WINBIND - handles restart after crash

Samba - samba-technical mailing list
If winbind (or Samba) crashes (something the ctdb event scripts can
detect by polling), I noticed that, unlike the systemd service
configuration, our ctdb event scripts are not set up to auto-restart
services after a crash.

For example, if CTDB_MANAGES_WINBIND=yes (which
https://wiki.samba.org/index.php/Configuring_clustered_Samba recommends)
and winbind ever crashes, it won't be restarted. With a 'normal' systemd
configuration, on the other hand, the winbind.service file would have
something like

[Service]
Restart=on-failure
RestartSec=4
...
[Install]
WantedBy=multi-user.target

in it so systemd would automatically restart winbind 4 seconds after failure.

Should the ctdb event scripts for winbind (and similarly Samba) be set
up to do a service restart of winbind if the monitor check
("wbinfo -p") fails?

In addition, the "ctdb scriptstatus" output is strange when there is an
error (for example, winbind has crashed, so "wbinfo -p" fails in ctdb's
winbind monitoring script): if the winbind event script (or any script)
fails, the following ones in the list are not executed, rather than
ctdb reporting the error and continuing to report the status of the
other services.


--
Thanks,

Steve

Re: CTDB_MANAGES_SAMBA and CTDB_MANAGES_WINBIND - handles restart after crash

On Mon, 18 Dec 2017 00:08:37 -0600, Steve French via samba-technical
<[hidden email]> wrote:

> If winbind (or Samba) crashes (something the ctdb event scripts can
> detect by polling), I noticed that, unlike the systemd service
> configuration, our ctdb event scripts are not set up to auto-restart
> services after a crash.
>
> For example, if CTDB_MANAGES_WINBIND=yes (which
> https://wiki.samba.org/index.php/Configuring_clustered_Samba recommends)
> and winbind ever crashes, it won't be restarted. With a 'normal' systemd
> configuration, on the other hand, the winbind.service file would have
> something like
>
> [Service]
> Restart=on-failure
> RestartSec=4
> ...
> [Install]
> WantedBy=multi-user.target
>
> in it so systemd would automatically restart winbind 4 seconds after failure.
>
> Should the ctdb event scripts for winbind (and similarly Samba) be set
> up to do a service restart of winbind if the monitor check
> ("wbinfo -p") fails?

The historical answer here is "no".  Rather than just making the
service unavailable, as in the un-clustered case, an automatic restart
in a cluster might cause failovers back and forth if the service state
keeps flapping.  This would encourage clients to keep reconnecting to
different nodes, and I suppose this might result in data corruption if
a client keeps taking locks and doing partial file updates.  I'd be
interested in seeing if others can confirm this idea.
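
To make the trade-off concrete, here is a minimal sketch of a monitor
check that attempts a restart but gives up after a fixed number of
attempts, so that a flapping service eventually leaves the node
unhealthy instead of bouncing clients around.  This is not something
CTDB ships: the function name, the counter file and the limit are all
hypothetical, and the sketch assumes something else (for example a
startup hook) clears the counter file.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of CTDB's event scripts.
#
# monitor_with_restart CHECK_CMD RESTART_CMD COUNTER_FILE MAX_RESTARTS
#
# Returns 0 if the check passes, either immediately or after one restart.
# Returns 1 if the service is still unhealthy or the restart budget is spent.
monitor_with_restart() {
    check_cmd=$1
    restart_cmd=$2
    counter_file=$3
    max_restarts=$4

    # Commands are deliberately left unquoted so that a string such as
    # "wbinfo -p" is split into a command and its arguments.
    if $check_cmd >/dev/null 2>&1; then
        return 0
    fi

    # Number of restarts attempted since the counter file was last cleared.
    count=0
    if [ -f "$counter_file" ]; then
        count=$(cat "$counter_file")
    fi

    if [ "$count" -ge "$max_restarts" ]; then
        echo "check failed and restart limit ($max_restarts) reached" >&2
        return 1
    fi

    # Record the attempt before restarting, so a crash loop is still counted.
    echo $((count + 1)) > "$counter_file"
    $restart_cmd >/dev/null 2>&1 || return 1

    # Re-check once; the exit status of this check is the function's result.
    $check_cmd >/dev/null 2>&1
}
```

A call might look like
monitor_with_restart "wbinfo -p" "service winbind restart" /tmp/winbind-restarts 3
where the restart command and the counter path are assumptions about
the local system.  Because the counter is never reset here, the limit
bounds the total number of restarts, which is the flapping case
described above.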

> In addition, the "ctdb scriptstatus" output is strange when there is an
> error (for example, winbind has crashed, so "wbinfo -p" fails in ctdb's
> winbind monitoring script): if the winbind event script (or any script)
> fails, the following ones in the list are not executed, rather than
> ctdb reporting the error and continuing to report the status of the
> other services.

CTDB only has a single binary state for healthy/unhealthy.  If a monitor
event fails in a particular script then there's no point continuing to
try to monitor other services because monitoring has already failed and
the node will be marked as unhealthy.  This also allows scripts to
implicitly depend on each other - if an early script fails then it
might not make sense to run the rest of the scripts.
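
As a rough illustration of that behaviour (a sketch only, not CTDB's
actual event runner), the loop below runs each script in order and
stops at the first failure, so later scripts never report a status.
The function name and the output format are made up for the example,
and it assumes each script takes the event name as its first argument.

```shell
#!/bin/sh
# Illustrative only: NOT CTDB's real implementation.
#
# run_monitor_event SCRIPT...
#
# Runs each script with the argument "monitor", in the order given.
# Stops at the first failing script and returns 1, which corresponds to
# the node being marked unhealthy; returns 0 if every script succeeds.
run_monitor_event() {
    for script in "$@"; do
        if "$script" monitor; then
            echo "$script OK"
        else
            echo "$script ERROR"
            # A single failure decides the outcome, so the remaining
            # scripts are skipped.
            return 1
        fi
    done
    return 0
}
```

With three scripts where the second one fails, only the first two
produce a status line, which matches the truncated list Steve
describes.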

peace & happiness,
martin