CTDB IP takeover/failover tunables - do you use them?

CTDB IP takeover/failover tunables - do you use them?

Martin Schwenke
I'm currently hacking on CTDB's IP takeover/failover code.  For Samba
4.6, I would like to rationalise the IP takeover-related tunable
parameters.

I would like to know if there are any users who set the values of these
tunables to non-default values.  The tunables in question are:

   DisableIPFailover
       Default: 0

       When set to non-zero, ctdb will not perform failover or failback. Even
       if a node fails while holding public IPs, ctdb will not recover the IPs
       or assign them to another node.

       When this tunable is enabled, ctdb will no longer attempt to recover
       the cluster by failing IP addresses over to other nodes. This leads to
       a service outage until the administrator has manually performed IP
       failover to replacement nodes using the 'ctdb moveip' command.

   NoIPFailback
       Default: 0

       When set to 1, ctdb will not perform failback of IP addresses when a
       node becomes healthy. When a node becomes UNHEALTHY, ctdb WILL perform
       failover of public IP addresses, but when the node becomes HEALTHY
       again, ctdb will not fail the addresses back.

       Use with caution! Normally when a node becomes available to the cluster
       ctdb will try to reassign public IP addresses onto the new node as a
       way to distribute the workload evenly across the cluster nodes. CTDB
       tries to make sure that all running nodes host approximately the same
       number of public addresses.

       When you enable this tunable, ctdb will no longer attempt to rebalance
       the cluster by failing IP addresses back to the new nodes. An
       unbalanced cluster will therefore remain unbalanced until there is
       manual intervention from the administrator. When this parameter is set,
       you can manually fail public IP addresses over to the new node(s) using
       the 'ctdb moveip' command.

   NoIPHostOnAllDisabled
       Default: 0

       If no nodes are HEALTHY then by default ctdb will happily host public
       IPs on disabled (unhealthy or administratively disabled) nodes. This
       can cause problems, for example if the underlying cluster filesystem is
       not mounted. When set to 1 on a node and that node is disabled, any IPs
       hosted by this node will be released and the node will not take over
       any IPs until it is no longer disabled.

   NoIPTakeover
       Default: 0

       When set to 1, ctdb will not allow IP addresses to be failed over onto
       this node. Any IP addresses that the node currently hosts will remain
       on the node but no new IP addresses can be failed over to the node.

In particular, I would like to know if anyone has a use case where they
set any of these variables to different values on different nodes.  This
only really matters for the last 2 (NoIPHostOnAllDisabled,
NoIPTakeover), since the value on the recovery master is just used for
the other 2.  If you do this, can you please explain why?  :-)

I would like to make all of the above tunables global but I will
not do that if it breaks an existing use case and I can't find a
different way.
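
For reference, these are ordinary run-time tunables, so per-node values are
usually inspected and set with the ctdb tool.  A minimal sketch (assuming
ctdbd is already running on the node; the persistent, config-file spelling
varies between CTDB versions):

   ctdb getvar NoIPFailback        # show the current value on this node
   ctdb setvar NoIPFailback 1      # change it on this node only (run-time)

   # persistent variant in older sysconfig-style setups (exact file varies):
   CTDB_SET_NoIPFailback=1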

There are also 2 tunables to choose the algorithm used to calculate the
IP address layout:

   DeterministicIPs
       Default: 0

       When set to 1, ctdb will try to keep public IP addresses locked to
       specific nodes as far as possible. This makes it easier for debugging
       since you can know that as long as all nodes are healthy public IP X
       will always be hosted by node Y.

       The cost of using deterministic IP address assignment is that it
       disables part of the logic where ctdb tries to reduce the number of
       public IP assignment changes in the cluster. This tunable may increase
       the number of IP failover/failbacks that are performed on the cluster
       by a small margin.

   LCP2PublicIPs
       Default: 1

       When set to 1, ctdb uses the LCP2 IP allocation algorithm.

I plan to replace these with a single tunable to select the algorithm
(0 = deterministic, 1 = non-deterministic, 2 = LCP2 (default)).
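
For illustration, the mapping would look something like this (the single
tunable is called IPAllocAlgorithm in Samba >= 4.6, as noted later in the
thread):

   IPAllocAlgorithm=0   deterministic       (roughly the old DeterministicIPs=1)
   IPAllocAlgorithm=1   non-deterministic
   IPAllocAlgorithm=2   LCP2, the default   (roughly the old LCP2PublicIPs=1)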

Thanks for any feedback...

peace & happiness,
martin

Re: CTDB IP takeover/failover tunables - do you use them?

Hvisage
Martin Schwenke wrote
I'm currently hacking on CTDB's IP takeover/failover code.  For Samba
4.6, I would like to rationalise the IP takeover-related tunable
parameters.

I would like to know if there are any users who set the values of these
tunables to non-default values.  The tunables in question are:

..snip..

   NoIPTakeover
       Default: 0

       When set to 1, ctdb will not allow IP addresses to be failed over onto
       this node. Any IP addresses that the node currently hosts will remain
       on the node but no new IP addresses can be failed over to the node.

In particular, I would like to know if anyone has a use case where they
set any of these variables to different values on different nodes.  This
only really matters for the last 2 (NoIPHostOnAllDisabled,
NoIPTakeover), since the value on the recovery master is just used for
the other 2.  If you do this, can you please explain why?  :-)
Sorry for the late reply; I only saw this as I was implementing GlusterFS and first had to battle SystemD..

My case:
glusterfs volume replica 3 arbiter1 NodeA NodeB NodeC

NodeA - volume VM on HyperVisorA in DC1 - prefer to have 10.0.1.1 on NodeA
NodeB - volume VM on HyperVisorB in DC2 - prefer to have 10.0.2.2 on NodeB
NodeC - Arbiter VM in the "cloud".
ClientA1, ClientA2 - VMs on HyperVisorA
ClientB1, ClientB2 - VMs on HypervisorB

I don't want NodeC to have the public IP, as it means a performance issue, and when it's the only one available, ctdb would in any case be "down".

There are also 2 tunables to choose the algorithm used to calculate the
IP address layout:

   DeterministicIPs
       Default: 0

       When set to 1, ctdb will try to keep public IP addresses locked to
       specific nodes as far as possible. This makes it easier for debugging
       since you can know that as long as all nodes are healthy public IP X
       will always be hosted by node Y.
Here my "use case" is that I want ClientA1 & ClientA2 to talk to NodeA, and ClientB1 & ClientB2 to talk to NodeB. The data is mostly reads; we just need/want the uploads to be "stable", so I could point ClientB1 & ClientB2 to 10.0.2.2 and ClientA1 & ClientA2 to 10.0.1.1 and have the best network performance.


I'm still looking/reading docs as to how to "deterministically" know that 10.0.2.2 should be on NodeB & 10.0.1.1 should be on NodeA in the normal/stable case?

Hendrik Visage

Re: CTDB IP takeover/failover tunables - do you use them?

Martin Schwenke (via the samba-technical mailing list)
On Wed, 19 Apr 2017 03:18:10 -0700 (PDT), Hvisage via samba-technical
<[hidden email]> wrote:

> Martin Schwenke wrote
> > I'm currently hacking on CTDB's IP takeover/failover code.  For Samba
> > 4.6, I would like to rationalise the IP takeover-related tunable
> > parameters.
> >
> > I would like to know if there are any users who set the values of these
> > tunables to non-default values.  The tunables in question are:
> >
> > ..snip..
> >
> >    NoIPTakeover
> >        Default: 0
> >
> >        When set to 1, ctdb will not allow IP addresses to be failed over
> > onto
> >        this node. Any IP addresses that the node currently hosts will
> > remain
> >        on the node but no new IP addresses can be failed over to the node.
> >
> > In particular, I would like to know if anyone has a use case where they
> > set any of these variables to different values on different nodes.  This
> > only really matters for the last 2 (NoIPHostOnAllDisabled,
> > NoIPTakeover), since the value on the recovery master is just used for
> > the other 2.  If you do this, can you please explain why?  :-)  
>
> Sorry late reply, only saw this as I was implementing GlusterFS and first
> had to battle SystemD..
>
> My case:
> glusterfs volume replica 3 arbiter1 NodeA NodeB NodeC
>
> NodeA - volume VM on HyperVisorA in DC1 - prefer to have 10.0.1.1 on NodeA
> NodeB - volume VM on HyperVisorB in DC2 - prefer to have 10.0.2.2 on NodeB
> NodeC - Arbiter VM in the "cloud".
> ClientA1, ClientA2 -VMs on HyperVisorA
> ClientB1, ClientB2 - VMs on HypervisorB
>
> I don't want NodeC to have the public IP, as it means a performance issue,
> and when it's the only one available, ctdb would in any case be "down".

Right. So, in that case, you just set the public addresses list on
NodeC to be empty.  You can even set
CTDB_PUBLIC_ADDRESSES=/dev/null on NodeC - that's what our test suite
does when it is running against "local daemons".
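
A concrete sketch of that for the layout above (the file path, interface
names and netmasks are illustrative placeholders):

   # NodeA: /etc/ctdb/public_addresses
   10.0.1.1/24 eth1

   # NodeB: /etc/ctdb/public_addresses
   10.0.2.2/24 eth1

   # NodeC: no public_addresses file, or point CTDB at an empty one, e.g.
   CTDB_PUBLIC_ADDRESSES=/dev/null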

There are some corner cases where nodes with no public addresses will
still get some unexpected messages sent to them and will log complaints
at higher debug levels.  We will eventually get that sorted out.

> > There are also 2 tunables to choose the algorithm used to calculate the
> > IP address layout:
> >
> >    DeterministicIPs
> >        Default: 0
> >
> >        When set to 1, ctdb will try to keep public IP addresses locked to
> >        specific nodes as far as possible. This makes it easier for
> > debugging
> >        since you can know that as long as all nodes are healthy public IP
> > X
> >        will always be hosted by node Y.  
>
> Here my "use case" is that I want that to have ClientA1 & ClientA2 talk to
> NodeA, and ClientB1 & ClientB2 talk to NodeB. The data is mostly reads, just
> we need/want the uploads to be "stable", so I could point ClientB1 &
> ClientB2 to 10.0.2.2 and ClientA1 & clientA2 to 10.0.1.1 and have the best
> network performances.
>
> I'm still looking/reading-docs as to how "deterministically" know that
> 10.0.2.2 should be on NodeB & 10.0.1.1. should be on NodeA in the
> normal/stable case?

You want to test deterministic IPs, which is now IPAllocAlgorithm=0 from
Samba >= 4.6.0.
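
For example (assuming the standard ctdb CLI on a node where ctdbd is
running):

   ctdb getvar IPAllocAlgorithm    # check the current value (default 2 = LCP2)
   ctdb setvar IPAllocAlgorithm 0  # 0 = deterministic IPs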

However, deterministic IPs do not (necessarily) behave well when some
nodes do not define public addresses.  The algorithm:

1. Assigns addresses across all nodes in a modulo manner

2. Drops addresses from unhealthy nodes

3. Assigns unassigned addresses in a balanced manner

If you only have 2 public addresses and NodeA is node 0, NodeB is node
1 then it should work fine.  :-)

peace & happiness,
martin

Re: CTDB IP takeover/failover tunables - do you use them?

Hendrik Visage (via the samba-technical mailing list)

> On 20 Apr 2017, at 07:16 , Martin Schwenke <[hidden email]> wrote:
>
> On Wed, 19 Apr 2017 03:18:10 -0700 (PDT), Hvisage via samba-technical
> <[hidden email]> wrote:
>
>> Martin Schwenke wrote
>>> I'm currently hacking on CTDB's IP takeover/failover code.  For Samba
>>> 4.6, I would like to rationalise the IP takeover-related tunable
>>> parameters.
>>>
>>> I would like to know if there are any users who set the values of these
>>> tunables to non-default values.  The tunables in question are:
>>>
>>> ..snip..
>>>
>>>   NoIPTakeover
>>>       Default: 0
>>>
>>>       When set to 1, ctdb will not allow IP addresses to be failed over
>>> onto
>>>       this node. Any IP addresses that the node currently hosts will
>>> remain
>>>       on the node but no new IP addresses can be failed over to the node.
>>>
>>> In particular, I would like to know if anyone has a use case where they
>>> set any of these variables to different values on different nodes.  This
>>> only really matters for the last 2 (NoIPHostOnAllDisabled,
>>> NoIPTakeover), since the value on the recovery master is just used for
>>> the other 2.  If you do this, can you please explain why?  :-)  
>>
>> Sorry late reply, only saw this as I was implementing GlusterFS and first
>> had to battle SystemD..
>>
>> My case:
>> glusterfs volume replica 3 arbiter1 NodeA NodeB NodeC
>>
>> NodeA - volume VM on HyperVisorA in DC1 - prefer to have 10.0.1.1 on NodeA
>> NodeB - volume VM on HyperVisorB in DC2 - prefer to have 10.0.2.2 on NodeB
>> NodeC - Arbiter VM in the "cloud".
>> ClientA1, ClientA2 -VMs on HyperVisorA
>> ClientB1, ClientB2 - VMs on HypervisorB
>>
>> I don't want NodeC to have the public IP, as it means a performance issue,
>> and when it's the only one available, ctdb would in any case be "down".
>
> Right. So, in that case, you just set the public addresses list on
> NodeC to be empty.  You can even set
> CTDB_PUBLIC_ADDRESSES=/dev/null on NodeC - that's what our test suite
> does when it is running against "local daemons”.

Okay… the documentation I’ve read up till now, states something like:
 “the CTDB_PUBLIC_ADDRESSES file should be the exact same on all the nodes”,
 at least that’s how I interpreted it, so this just needs to be documented more explicitly ;)

> There are some corner cases where nodes with no public addresses will
> still get some unexpected messages sent to them and will log complaints
> at higher debug levels.  We will eventually get that sorted out.
>
>>> There are also 2 tunables to choose the algorithm used to calculate the
>>> IP address layout:
>>>
>>>   DeterministicIPs
>>>       Default: 0
>>>
>>>       When set to 1, ctdb will try to keep public IP addresses locked to
>>>       specific nodes as far as possible. This makes it easier for
>>> debugging
>>>       since you can know that as long as all nodes are healthy public IP
>>> X
>>>       will always be hosted by node Y.  
>>
>> Here my "use case" is that I want that to have ClientA1 & ClientA2 talk to
>> NodeA, and ClientB1 & ClientB2 talk to NodeB. The data is mostly reads, just
>> we need/want the uploads to be "stable", so I could point ClientB1 &
>> ClientB2 to 10.0.2.2 and ClientA1 & clientA2 to 10.0.1.1 and have the best
>> network performances.
>>
>> I'm still looking/reading-docs as to how "deterministically" know that
>> 10.0.2.2 should be on NodeB & 10.0.1.1. should be on NodeA in the
>> normal/stable case?
>
> You want to test deterministic IPs, which is now IPAllocAlgorithm=0 from
> Samba >= 4.6.0.
>
> However, deterministics IPs do not (necessary) behave well when some
> nodes do not define public addresses.  The algorithm:
>
> 1. Assigns addresses across all nodes in a modulo manner
>
> 2. Drops addresses from unhealthy node
>
> 3. Assigns unassigned addresses in a balanced manner
>
> If you only have 2 public addresses and NodeA is node 0, NodeB is node
> 1 then it should work fine.  :-)

So, will the first public IP in the list be used, or will the list be sorted and the first IP of the sorted list deployed?
What will happen if the order of the public IPs differs between the nodes?

>
> peace & happiness,
> martin


Re: CTDB IP takeover/failover tunables - do you use them?

Martin Schwenke (via the samba-technical mailing list)
On Thu, 20 Apr 2017 09:27:25 +0200, hvjunk <[hidden email]> wrote:

> > On 20 Apr 2017, at 07:16 , Martin Schwenke <[hidden email]> wrote:

> > Right. So, in that case, you just set the public addresses list on
> > NodeC to be empty.  You can even set
> > CTDB_PUBLIC_ADDRESSES=/dev/null on NodeC - that's what our test suite
> > does when it is running against "local daemons”.  

> Okay… the documentation I’ve read up till now, states something like:
>  “the CTDB_PUBLIC_ADDRESSES file should be the exact same on all the
> nodes”, at least that’s how I interpreted it, so this just needs to
> be documented more explicitly ;)

In recent versions this has been clarified in the ctdb(7) man page.

I updated the wiki a couple of months ago and it is now clear there too:

  https://wiki.samba.org/index.php/Adding_public_IP_addresses

If you see it wrong anywhere else then please yell and we'll fix
it.  :-)

> >> Here my "use case" is that I want that to have ClientA1 & ClientA2 talk to
> >> NodeA, and ClientB1 & ClientB2 talk to NodeB. The data is mostly reads, just
> >> we need/want the uploads to be "stable", so I could point ClientB1 &
> >> ClientB2 to 10.0.2.2 and ClientA1 & clientA2 to 10.0.1.1 and have the best
> >> network performances.
> >>
> >> I'm still looking/reading-docs as to how "deterministically" know that
> >> 10.0.2.2 should be on NodeB & 10.0.1.1. should be on NodeA in the
> >> normal/stable case?  
> >
> > You want to test deterministic IPs, which is now IPAllocAlgorithm=0 from
> > Samba >= 4.6.0.
> >
> > However, deterministics IPs do not (necessary) behave well when some
> > nodes do not define public addresses.  The algorithm:
> >
> > 1. Assigns addresses across all nodes in a modulo manner
> >
> > 2. Drops addresses from unhealthy node
> >
> > 3. Assigns unassigned addresses in a balanced manner
> >
> > If you only have 2 public addresses and NodeA is node 0, NodeB is node
> > 1 then it should work fine.  :-)  

> So, will the first public IP in the list be used, or the sorted list
> and that first IP to be deployed? What will happen if the order of
> the public IPs differ on the nodes?

The order of the public IPs in the files doesn't matter.  The IP
takeover process gathers the IPs from all nodes and (effectively) sorts
them before allocating them.  However, a detail of the implementation
means that the list ends up *reversed* before the IPs are allocated. So,
if NodeA is node 0 and NodeB is node 1 with addresses 10.0.1.1 and
10.0.2.2 then the allocation should look like:

  10.0.1.1  1   # NodeB
  10.0.2.2  0   # NodeA

It is easy enough to test.  Hopefully you can work with that.
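
For reference, the current assignments can be listed from any node with the
ctdb tool, e.g.:

   ctdb ip        # public IPs and the node currently hosting each
   ctdb ip all    # the same, gathered from all nodes (if your version supports it)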

The IP failover part of CTDB will most likely undergo some large
changes before Samba 4.8.  I don't think we'll lose any
functionality... but we might un-reverse the sort order...  ;-)

peace & happiness,
martin

Re: CTDB IP takeover/failover tunables - do you use them?

Hendrik Visage (via the samba-technical mailing list)

> On 20 Apr 2017, at 11:50 , Martin Schwenke <[hidden email]> wrote:
>
> On Thu, 20 Apr 2017 09:27:25 +0200, hvjunk <[hidden email]> wrote:
>
>>> On 20 Apr 2017, at 07:16 , Martin Schwenke <[hidden email]> wrote:
>
>>> Right. So, in that case, you just set the public addresses list on
>>> NodeC to be empty.  You can even set
>>> CTDB_PUBLIC_ADDRESSES=/dev/null on NodeC - that's what our test suite
>>> does when it is running against "local daemons”.  
>
>> Okay… the documentation I’ve read up till now, states something like:
>
> If you see it wrong anywhere else then please yell and we'll fix
> it.  :-)

Ah, sorry, I confused the public_IPs list file with the NODES list file!
(I just have a headache after battling SystemD to get GlusterFS mounting *reliably*.)

> The order of the public IPs in the files doesn't matter.  The IP
> takeover process gathers the IPs from all nodes and (effectively) sorts
> them before allocating them.  However, a detail of the implementation
> means that the list ends up *reversed* before the IPs are allocated. So,
> if NodeA is node 0 and NodeB is node 1 with addresses 10.0.1.1 and
> 10.0.2.2 then the allocation should look like:
>
>  10.0.1.1  1   # NodeB
>  10.0.2.2  0   # NodeA
>
> It is easy enough to test.  Hopefully you can work with that.
>
> The IP failover part of CTDB will mostly like undergo some large
> changes before Samba 4.8.  I don't think we'll lose any
> functionality... but we might un-reverse the sort order...  ;-)

okay…

Thanks for the explanation!

However, would it be difficult/etc. to have an IP "preferred" to a specific node? I.e., in the CTDB_PUBLIC_ADDRESSES file for a node, have something like:

10.1.1.1/24 eth1 prefer
10.1.2.1/24 eth1

In a case where the cluster is "local" I know it'll not make a difference, but with my "distributed" nodes it would be nice to have the locality assigned rather than computed, especially since the algorithm might change during an upgrade.

In any case, thank you Martin for the help, it helped me a lot!

Hendrik Visage



Re: CTDB IP takeover/failover tunables - do you use them?

Martin Schwenke (via the samba-technical mailing list)
On Thu, 20 Apr 2017 12:24:48 +0200, hvjunk <[hidden email]> wrote:

> However, would it be difficult/etc. to have it “preferred” to a
> specific node? ie, in the  CTDB_PUBLIC_ADDRESSES for a node, have
> something like:

> 10.1.1.1/24 eth1 prefer
> 10.1.2.1/24 eth1

Not with the current code.

> In a case with the cluster “local” I know it’ll not make a
> difference, but with my “distributed” nodes, the locality would be
> nice to have it assigned, rather than computed, especially if the
> algorithm might change during an upgrade.

OK, I can see why you want the locality.

If everything goes according to plan then we will completely rewrite
the way IP failover is done in CTDB, while maintaining approximately
the same functionality.  It is likely that we will factor out the
program that takes the IP layout and the node states and produces a new
IP layout. If we do this, then it would be simple to make it
pluggable... and you could easily replace it with a script that handles
your locality requirement.  However, that's probably 10 or 12 months
away.

A bit of history...  The original algorithm was deterministic IPs,
which is good for simple configurations (e.g. all nodes host IPs and
they all have the same configuration). The next algorithm was
non-deterministic IPs, where things are constantly rebalanced and can
end up anywhere. However, this doesn't work well for multiple
networks/interfaces.  The current default LCP2 algorithm uses a
heuristic to be able to balance a lot of IPs across multiple
networks/interfaces per node, with different configurations on
different nodes.  So, we have worked to support more complex scenarios
rather than the simple ones... like the one you want.

> Anycase, thank you Martin for the help, it helped me a lot!

You're very welcome!  :-)

peace & happiness,
martin
