ctdb vacuum timeouts and record locks

ctdb vacuum timeouts and record locks

Samba - General mailing list
Hi List,

I set up a ctdb cluster a couple of months back.  Things seemed pretty
solid for the first 2-3 weeks, but then I started getting reports of
people not being able to access files, or sometimes directories.  It
has taken me a while to figure some things out, but the common
denominator seems to be vacuuming timeouts for locking.tdb in the ctdb
log, which can recur every 2 minutes and 10 seconds for anywhere from
an hour to more than a day, after which the log also starts reporting
failures to get a RECORD lock on the same tdb file.  Whenever I get a
report about inaccessible files I find this in the ctdb logs:

ctdbd[89]: Vacuuming child process timed out for db locking.tdb
ctdbd[89]: Vacuuming child process timed out for db locking.tdb
ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 10 seconds
ctdbd[89]: Set lock debugging helper to
"/usr/local/samba/etc/ctdb/debug_locks.sh"
/usr/local/samba/etc/ctdb/debug_locks.sh: 142:
/usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
nonexistent
sh: echo: I/O error
sh: echo: I/O error
sh: echo: I/O error
sh: echo: I/O error
cat: write error: Broken pipe
sh: echo: I/O error
ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20 seconds
/usr/local/samba/etc/ctdb/debug_locks.sh: 142:
/usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
nonexistent
sh: echo: I/O error
sh: echo: I/O error

From googling, it is okay for the vacuuming process to time out; it
should succeed the next time, and if it doesn't, the only harm is a
bloated file.  But it never does succeed after the first time I see
this message, and the locking.tdb file does not change size, bigger or
smaller.

I am not really clear on what the script cannot create, but I found no
evidence of the gstack program being available on Debian, so I changed
the script to run pstack instead, and then ran it manually with set -x
while the logs were recording the problem.  I think this is the trace
output it is trying to come up with, but sadly it isn't meaningful to
me (yet!):

cat /proc/30491/stack
[<ffffffff8197d00d>] inet_recvmsg+0x7d/0xb0
[<ffffffffc07c3856>] request_wait_answer+0x166/0x1f0 [fuse]
[<ffffffff814b8d50>] prepare_to_wait_event+0xf0/0xf0
[<ffffffffc07c3958>] __fuse_request_send+0x78/0x80 [fuse]
[<ffffffffc07c6bdd>] fuse_simple_request+0xbd/0x190 [fuse]
[<ffffffffc07ccc37>] fuse_setlk+0x177/0x190 [fuse]
[<ffffffff816592f7>] SyS_flock+0x117/0x190
[<ffffffff81403b1c>] do_syscall_64+0x7c/0xf0
[<ffffffff81a0632f>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

This might happen twice in a day or once in a week; it doesn't seem
consistent, and so far I haven't found any catalyst.

My setup is two servers; the OS is Debian and runs Samba AD on
dedicated SSDs, and each server has a RAID array of HDDs for storage,
with a mirrored GlusterFS running on top of them.  Each OS has an LXC
container running the clustered member server, with the GlusterFS
mounted into the containers.  The tdb files are in the containers, not
on the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
think what else is relevant about my setup as it pertains to this issue...

I can fix access to the files by stopping the ctdb process and just
letting the other cluster member run, but the only way I have found so
far to fix the locking.tdb file is to shut down the container.
Sometimes I have to forcefully kill it from the host.

The errors are not confined to one member of the cluster; I have seen
them happen on both.  Among the people reporting the problem, though,
it often seems to be the same files causing it.  Before I had figured
out about the ctdb logs, there were several times when people couldn't
access a specific folder, but removing a specific file from that
folder fixed it.

I have put many hours into Google on this and nothing I have found has
turned the light bulb on in my brain.  Maybe (hopefully, actually) I am
overlooking something obvious.  Can anyone point me at the next step in
troubleshooting this?


--
Bob Miller
Cell: 867-334-7117
Office: 867-633-3760
www.computerisms.ca


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
Hi Bob,

On Thu, 26 Oct 2017 22:44:30 -0700, Computerisms Corporation via samba
<[hidden email]> wrote:

> I set up a ctdb cluster a couple months back.  Things seemed pretty
> solid for the first 2-3 weeks, but then I started getting reports of
> people not being able to access files, or some times directories.  It
> has taken me a while to figure some stuff out, but it seems the common
> denominator to this happening is vacuuming timeouts for locking.tdb in
> the ctdb log, which might go on every 2 minutes and 10 seconds for
> anywhere from an hour to a day and some, and then it will also add to
> the logs failure to get a RECORD lock on the same tdb file.  Whenever I
> get a report about inaccessible files I find this in the ctdb logs:
>
> ctdbd[89]: Vacuuming child process timed out for db locking.tdb
> ctdbd[89]: Vacuuming child process timed out for db locking.tdb
> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 10 seconds
> ctdbd[89]: Set lock debugging helper to
> "/usr/local/samba/etc/ctdb/debug_locks.sh"
> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
> nonexistent
> sh: echo: I/O error
> sh: echo: I/O error
> sh: echo: I/O error
> sh: echo: I/O error
> cat: write error: Broken pipe
> sh: echo: I/O error
> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20 seconds
> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
> nonexistent
> sh: echo: I/O error
> sh: echo: I/O error

That's weird.  The only file really created by that script is the lock
file that is used to make sure we don't debug locks too many times.
That should be in:

  "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"

The other possibility is the use of the script_log() function to try to
get the output logged.  script_log() isn't my greatest moment.  When
debugging you could just replace it with the logger command to get the
output out to syslog.
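
Roughly something like the following, for debugging only (an untested
sketch; I am assuming script_log gets a tag as its first argument and
the message either as further arguments or piped on stdin, so check it
against the actual call sites in debug_locks.sh first):

  # debugging-only replacement: push everything to syslog via logger
  script_log ()
  {
      _tag="$1"
      shift
      if [ $# -gt 0 ] ; then
          logger -t "${_tag}" "$@"
      else
          logger -t "${_tag}"    # with no message args, logger reads stdin
      fi
  }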

>  From googling, the vacuuming process is okay to timeout, it should
> succeed next time, and if it doesn't the only harm is a bloated file.
> But it never does succeed after the first time I see this message, and
> the locking.tdb file does not change size, bigger or smaller.
>
> I am not really clear on what the script cannot create, but I did find
> no evidence of the gstack program being available on debian, so I
> changed the script to run pstack instead, and then ran it manually with
> set -x while the logs were recording the problem, and I think this is
> the trace output it is trying to come up with, but sadly this isn't
> meaningful to me (yet!):
>
> cat /proc/30491/stack
> [<ffffffff8197d00d>] inet_recvmsg+0x7d/0xb0
> [<ffffffffc07c3856>] request_wait_answer+0x166/0x1f0 [fuse]
> [<ffffffff814b8d50>] prepare_to_wait_event+0xf0/0xf0
> [<ffffffffc07c3958>] __fuse_request_send+0x78/0x80 [fuse]
> [<ffffffffc07c6bdd>] fuse_simple_request+0xbd/0x190 [fuse]
> [<ffffffffc07ccc37>] fuse_setlk+0x177/0x190 [fuse]
> [<ffffffff816592f7>] SyS_flock+0x117/0x190
> [<ffffffff81403b1c>] do_syscall_64+0x7c/0xf0
> [<ffffffff81a0632f>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff

I'm pretty sure gstack used to be shipped as an example in the gdb
package in Debian.  However, it isn't there and changelog.Debian.gz
doesn't mention it.  I had a quick try of pstack but couldn't get sense
out of it.  :-(

> This might happen twice in a day or once in a week, doesn't seem
> consistent, and so far I haven't found any catalyst.
>
> My setup is two servers, the OS is debian and is running samba AD on
> dedicated SSDs, and each server has a RAID array of HDDs for storage,
> with a mirrored GlusterFS running on top of them.  Each OS has an LXC
> container running the clustered member servers with the GlusterFS
> mounted to the containers.  The tdb files are in the containers, not on
> the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
> think what else is relevant about my setup as it pertains to this issue...

Are the TDB files really on a FUSE filesystem?  Is that an artifact of
the LXC containers?  If so, could it be that locking isn't reliable on
the FUSE filesystem?

> I can fix the access to the files by stopping the ctdb process and just
> letting the other cluster member run, but the only way I have found so
> far to fix the locking.tdb file is to shutdown the container.  sometimes
> I have to forcefully kill it from the host.
>
> The errors are not confined to one member of the cluster, I have seen
> them happen on both of them.  Though, of the people reporting the
> problem, it often seems to be the same files causing the problem.
> Before I had figured out about ctdb logs, several times there were
> people who couldn't access a specific folder, but removing a specific
> file from that folder fixed it.
>
> I have put lots of hours into google on this and nothing I have found
> has turned the light bulb in my brain on.  Maybe (hopefully, actually) I
> am overlooking something obvious.  Wondering if anyone can point me at
> the next step in troubleshooting this?

Is it possible to try this without the containers?  That would
certainly tell you if the problem is related to the container
infrastructure...

peace & happiness,
martin


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
Hi Martin,

Thanks for reading and taking the time to reply

>> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20 seconds
>> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
>> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
>> nonexistent
>> sh: echo: I/O error
>> sh: echo: I/O error
>
> That's weird.  The only file really created by that script is the lock
> file that is used to make sure we don't debug locks too many times.
> That should be in:
>
>    "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"

Next time it happens I will check this.

> The other possibility is the use of the script_log() function to try to
> get the output logged.  script_log() isn't my greatest moment.  When
> debugging you could just replace it with the logger command to get the
> output out to syslog.

Okay, that sounds useful, will see what I can do next time I see the
problem...

>> My setup is two servers, the OS is debian and is running samba AD on
>> dedicated SSDs, and each server has a RAID array of HDDs for storage,
>> with a mirrored GlusterFS running on top of them.  Each OS has an LXC
>> container running the clustered member servers with the GlusterFS
>> mounted to the containers.  The tdb files are in the containers, not on
>> the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
>> think what else is relevant about my setup as it pertains to this issue...
>
> Are the TDB files really on a FUSE filesystem?  Is that an artifact of
> the LXC containers?  If so, could it be that locking isn't reliable on
> the FUSE filesystem?

No.  The TDB files are in the container, and the container is on the SSD
with the OS.  Running mount from within the container shows:

/dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)

However, the gluster native client is FUSE-based, so the data is
stored on a FUSE filesystem which is mounted in the container:

masterchieflian:ctfngluster on /CTFN type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

Since this is where the files that become inaccessible live, perhaps
this is really where the problem is, and not with the locking.tdb
file?  I will investigate file locks on the gluster system...
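
If it helps to narrow things down, my rough plan next time is to dump
gluster's lock state while the problem is happening; something like
this, assuming the volume name from the mount output above (untested,
and the statedump path may differ on your build):

  # on a gluster server node: dump internal state, including locks
  gluster volume statedump ctfngluster
  # the dump files usually land under /var/run/gluster; look for the
  # posix lock entries (granted vs blocked) per inode
  grep -i -A2 "posixlk" /var/run/gluster/*.dump.*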

> Is it possible to try this without the containers?  That would
> certainly tell you if the problem is related to the container
> infrastructure...

I like to think everything is possible, but it's not really feasible in
this case.  Since there are only two physical servers, and they need to
be running AD, the only way to separate the containers now is with
additional machines to act as member servers.  And because everything
tested fine and actually was fine for at least two weeks, these servers
are in production now and have been for a few months.  If I have to go
this way, it will certainly be a last resort...

Thanks again for your reply, will get back to you with what I find...




>
> peace & happiness,
> martin
>


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
Hi,

This occurred again this morning.  When the user reported the problem,
I found in the ctdb logs that vacuuming had been going on since last
night.  The need to fix it was urgent (when isn't it?), so I didn't
have time to poke around for clues and immediately restarted the lxc
container.  But this time it wouldn't restart, which I had time to
trace to a hung smbd process, and between that and a run of the
debug_locks.sh script, I traced it to the user reporting the problem.
Given that the user was primarily having problems with files in a
given folder, I am thinking this is because of some kind of lock on a
file within that folder.

Ended up rebooting both physical machines; problem solved.  For now.

So, not sure how to determine if this is a gluster problem, an lxc
problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...

On 2017-10-27 10:09 AM, Computerisms Corporation via samba wrote:

> Hi Martin,
>
> Thanks for reading and taking the time to reply
>
>>> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20
>>> seconds
>>> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
>>> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
>>> nonexistent
>>> sh: echo: I/O error
>>> sh: echo: I/O error
>>
>> That's weird.  The only file really created by that script is the lock
>> file that is used to make sure we don't debug locks too many times.
>> That should be in:
>>
>>    "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"
>
> Next time it happens I will check this.
>
>> The other possibility is the use of the script_log() function to try to
>> get the output logged.  script_log() isn't my greatest moment.  When
>> debugging you could just replace it with the logger command to get the
>> output out to syslog.
>
> Okay, that sounds useful, will see what I can do next time I see the
> problem...
>
>>> My setup is two servers, the OS is debian and is running samba AD on
>>> dedicated SSDs, and each server has a RAID array of HDDs for storage,
>>> with a mirrored GlusterFS running on top of them.  Each OS has an LXC
>>> container running the clustered member servers with the GlusterFS
>>> mounted to the containers.  The tdb files are in the containers, not on
>>> the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
>>> think what else is relevant about my setup as it pertains to this
>>> issue...
>>
>> Are the TDB files really on a FUSE filesystem?  Is that an artifact of
>> the LXC containers?  If so, could it be that locking isn't reliable on
>> the FUSE filesystem?
>
> No.  The TDB files are in the container, and the container is on the SSD
> with the OS.  running mount from within the container shows:
>
> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
>
> However, the gluster native client is a fuse-based system, so the data
> is stored on a fuse system which is mounted in the container:
>
> masterchieflian:ctfngluster on /CTFN type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
>
> Since this is where the files that become inaccessible are, perhaps this
> is really where the problem is, and not with the locking.tdb file?  I
> will investigate about file locks on the gluster system...
>
>> Is it possible to try this without the containers?  That would
>> certainly tell you if the problem is related to the container
>> infrastructure...
>
> I like to think everything is possible, but it's not really feasible in
> this case.  Since there are only two physical servers, and they need to
> be running AD, the only way to separate the containers now is with
> additional machines to act as member servers.  And because everything
> tested fine and actually was fine for at least two weeks, these servers
> are in production now and have been for a few months.  If I have to go
> this way, it will certainly be a last resort...
>
> Thanks again for your reply, will get back to you with what I find...
>
>
>
>
>>
>> peace & happiness,
>> martin
>>
>


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
Hm, I stand corrected on the "problem solved" statement below.  IP
addresses are simply not cooperating on the 2nd node.

root@vault1:~# ctdb ip
Public IPs on node 0
192.168.120.90 0
192.168.120.91 0
192.168.120.92 0
192.168.120.93 0

root@vault2:/service/ctdb/log/main# ctdb ip
Public IPs on node 1
192.168.120.90 0
192.168.120.91 0
192.168.120.92 0
192.168.120.93 0

root@vault2:/service/ctdb/log/main# ctdb moveip 192.168.120.90 1
Control TAKEOVER_IP failed, ret=-1
Failed to takeover IP on node 1

root@vault1:~# ctdb moveip 192.168.120.90 0
Memory allocation error

root@vault2:/service/ctdb/log/main# ctdb ipinfo 192.168.120.90
Public IP[192.168.120.90] info on node 1
IP:192.168.120.90
CurrentNode:0
NumInterfaces:1
Interface[1]: Name:eth0 Link:up References:0

Logs on vault2 (stays banned because it can't obtain IP):
IP 192.168.120.90 still hosted during release IP callback, failing
IP 192.168.120.92 still hosted during release IP callback, failing

root@vault1:~# ctdb delip 192.168.120.90
root@vault1:~# ctdb delip 192.168.120.92
root@vault2:/service/ctdb/log/main# ctdb addip 192.168.120.90/22 eth0
Node already knows about IP 192.168.120.90
root@vault2:/service/ctdb/log/main# ctdb ip
Public IPs on node 1
192.168.120.90 -1
192.168.120.91 0
192.168.120.92 -1
192.168.120.93 0


I am using 10.external.  ip addr show shows the correct IP addresses
on eth0 in the lxc container.  I rebooted the physical machine, and
this node is buggered.  I shut it down, used ip addr add to put the
addresses on the other node, then used ctdb addip; the node took them,
and node1 is now functioning with all 4 IPs just fine.  Or so it
appears right now.

something is seriously schizophrenic here...




On 2017-11-02 11:17 AM, Computerisms Corporation via samba wrote:

> Hi,
>
> This occurred again this morning, when the user reported the problem, I
> found in the ctdb logs that vacuuming has been going on since last
> night.  The need to fix it was urgent (when isn't it?) so I didn't have
> time to poke around for clues, but immediately restarted the lxc
> container.  But this time it wouldn't restart, which I had time to trace
> to a hung smbd process, and between that and a run of the debug_locks.sh
> script, I traced it to the user reporting the problem.  Given that the
> user was primarily having problems with files in a given folder, I am
> thinking this is because of some kind of lock on a file within that folder.
>
> Ended up rebooting both physical machines, problem solved.  for now.
>
> So, not sure how to determine if this is a gluster problem, an lxc
> problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...
>
> On 2017-10-27 10:09 AM, Computerisms Corporation via samba wrote:
>> Hi Martin,
>>
>> Thanks for reading and taking the time to reply
>>
>>>> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20
>>>> seconds
>>>> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
>>>> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
>>>> nonexistent
>>>> sh: echo: I/O error
>>>> sh: echo: I/O error
>>>
>>> That's weird.  The only file really created by that script is the lock
>>> file that is used to make sure we don't debug locks too many times.
>>> That should be in:
>>>
>>>    "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"
>>
>> Next time it happens I will check this.
>>
>>> The other possibility is the use of the script_log() function to try to
>>> get the output logged.  script_log() isn't my greatest moment.  When
>>> debugging you could just replace it with the logger command to get the
>>> output out to syslog.
>>
>> Okay, that sounds useful, will see what I can do next time I see the
>> problem...
>>
>>>> My setup is two servers, the OS is debian and is running samba AD on
>>>> dedicated SSDs, and each server has a RAID array of HDDs for storage,
>>>> with a mirrored GlusterFS running on top of them.  Each OS has an LXC
>>>> container running the clustered member servers with the GlusterFS
>>>> mounted to the containers.  The tdb files are in the containers, not on
>>>> the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
>>>> think what else is relevant about my setup as it pertains to this
>>>> issue...
>>>
>>> Are the TDB files really on a FUSE filesystem?  Is that an artifact of
>>> the LXC containers?  If so, could it be that locking isn't reliable on
>>> the FUSE filesystem?
>>
>> No.  The TDB files are in the container, and the container is on the
>> SSD with the OS.  running mount from within the container shows:
>>
>> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
>>
>> However, the gluster native client is a fuse-based system, so the data
>> is stored on a fuse system which is mounted in the container:
>>
>> masterchieflian:ctfngluster on /CTFN type fuse.glusterfs
>> (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
>>
>> Since this is where the files that become inaccessible are, perhaps
>> this is really where the problem is, and not with the locking.tdb
>> file?  I will investigate about file locks on the gluster system...
>>
>>> Is it possible to try this without the containers?  That would
>>> certainly tell you if the problem is related to the container
>>> infrastructure...
>>
>> I like to think everything is possible, but it's not really feasible
>> in this case.  Since there are only two physical servers, and they
>> need to be running AD, the only way to separate the containers now is
>> with additional machines to act as member servers.  And because
>> everything tested fine and actually was fine for at least two weeks,
>> these servers are in production now and have been for a few months.  
>> If I have to go this way, it will certainly be a last resort...
>>
>> Thanks again for your reply, will get back to you with what I find...
>>
>>
>>
>>
>>>
>>> peace & happiness,
>>> martin
>>>
>>
>


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
In reply to this post by Samba - General mailing list
On Thu, 2 Nov 2017 11:17:27 -0700, Computerisms Corporation via samba
<[hidden email]> wrote:

> This occurred again this morning, when the user reported the problem, I
> found in the ctdb logs that vacuuming has been going on since last
> night.  The need to fix it was urgent (when isn't it?) so I didn't have
> time to poke around for clues, but immediately restarted the lxc
> container.  But this time it wouldn't restart, which I had time to trace
> to a hung smbd process, and between that and a run of the debug_locks.sh
> script, I traced it to the user reporting the problem.  Given that the
> user was primarily having problems with files in a given folder, I am
> thinking this is because of some kind of lock on a file within that
> folder.
>
> Ended up rebooting both physical machines, problem solved.  for now.
>
> So, not sure how to determine if this is a gluster problem, an lxc
> problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...

You need a stack trace of the stuck smbd process.  If it is wedged in a
system call on the cluster filesystem then you can blame the cluster
filesystem.  debug_locks.sh is meant to be able to get you the relevant
stack trace via gstack.  In fact, even before you get the stack trace
you could check a process listing to see if the process is stuck in D
state.
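
For example, something along these lines should list any smbd stuck in
D state together with the kernel function it is waiting in (just a
sketch; adjust the command-name filter if your smbd shows up
differently in the process list):

  # D-state smbd processes and their wait channel (wchan)
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/ && $4 == "smbd"'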

gstack basically does:

  gdb -batch -ex "thread apply all bt" -p <pid>

For a single-threaded process it leaves out "thread apply all".
However, in recent GDB I'm not sure it makes a difference... seems to
work for me on Linux.

Note that gstack/gdb will hang when run against a process in D state.

peace & happiness,
martin


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
In reply to this post by Samba - General mailing list
On Thu, 2 Nov 2017 12:17:56 -0700, Computerisms Corporation via samba
<[hidden email]> wrote:

> hm, I stand correct on the problem solved statement below.  Ip addresses
> are simply not cooperating on the 2nd node.
>
> root@vault1:~# ctdb ip
> Public IPs on node 0
> 192.168.120.90 0
> 192.168.120.91 0
> 192.168.120.92 0
> 192.168.120.93 0
>
> root@vault2:/service/ctdb/log/main# ctdb ip
> Public IPs on node 1
> 192.168.120.90 0
> 192.168.120.91 0
> 192.168.120.92 0
> 192.168.120.93 0
>
> root@vault2:/service/ctdb/log/main# ctdb moveip 192.168.120.90 1
> Control TAKEOVER_IP failed, ret=-1
> Failed to takeover IP on node 1
>
> root@vault1:~# ctdb moveip 192.168.120.90 0
> Memory allocation error
>
> root@vault2:/service/ctdb/log/main# ctdb ipinfo 192.168.120.90
> Public IP[192.168.120.90] info on node 1
> IP:192.168.120.90
> CurrentNode:0
> NumInterfaces:1
> Interface[1]: Name:eth0 Link:up References:0
>
> Logs on vault2 (stays banned because it can't obtain IP):
> IP 192.168.120.90 still hosted during release IP callback, failing
> IP 192.168.120.92 still hosted during release IP callback, failing
>
> root@vault1:~# ctdb delip 192.168.120.90
> root@vault1:~# ctdb delip 192.168.120.92
> root@vault2:/service/ctdb/log/main# ctdb addip 192.168.120.90/22 eth0
> Node already knows about IP 192.168.120.90
> root@vault2:/service/ctdb/log/main# ctdb ip
> Public IPs on node 1
> 192.168.120.90 -1
> 192.168.120.91 0
> 192.168.120.92 -1
> 192.168.120.93 0
>
>
> I am using the 10.external.  ip addr show shows the correct IP addresses
> on eth0 in the lxc container.  rebooted the physical machine, this node
> is buggered.  shut it down, used ip addr add to put the addresses on the
> other node, used ctdb addip and the node took it and node1 is now
> functioning with all 4 IPs just fine.  Or so it appears right now.
>
> something is seriously schizophrenic here...

I'm wondering why you're using 10.external.  Although we have tested
it, we haven't actually seen it used in production before!  10.external
is a hack to allow use of CTDB's connection tracking while managing the
public IP addresses externally.  That is, you tell CTDB about the
public IPs, use "ctdb moveip" to inform CTDB about moved public IPs and
it sends grat ARPs and tickle ACKs on the takeover node.  It doesn't
actually assign the public IP addresses to nodes.

The documentation might not be clear on this but if you're using
10.external then you need to have the DisableIPFailover tunable set to
1 on all nodes so that CTDB doesn't try to move the IPs itself.

Please let us know if the documentation could be improved...

peace & happiness,
martin


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
Hi Martin,

Thanks for your answer...

>> I am using the 10.external.  ip addr show shows the correct IP addresses
>> on eth0 in the lxc container.  rebooted the physical machine, this node
>> is buggered.  shut it down, used ip addr add to put the addresses on the
>> other node, used ctdb addip and the node took it and node1 is now
>> functioning with all 4 IPs just fine.  Or so it appears right now.
>>
>> something is seriously schizophrenic here...
>
> I'm wondering why you're using 10.external.  Although we have tested
> it, we haven't actually seen it used in production before!  10.external
> is a hack to allow use of CTDB's connection tracking while managing the
> public IP addresses externally.  That is, you tell CTDB about the
> public IPs, use "ctdb moveip" to inform CTDB about moved public IPs and
> it sends grat ARPs and tickle ACKs on the takeover node.  It doesn't
> actually assign the public IP addresses to nodes.

Hm, okay, I was clear that with 10.external it is a human's
responsibility to deal with assigning IPs to physical interfaces.  In
re-reading the docs, I see DeterministicIPs and NoIPFailback are
required for moveip, and I am not sure those are set.  I will check at
the next opportunity; if they aren't, that might explain the behaviour.
However, the IPs were correctly assigned using the ip command.

The reason I am using 10.external is that when I initially set up my
cluster test environment, none of ctdb's automatic networking
assignments worked.  ip addr show wouldn't display the addresses as
being assigned to the interface.  I never did get to the bottom of
that problem; I had thought perhaps the lxc container was the issue,
but I don't know why it would be, as the ip commands all seem to work
fine from the CLI.

While I was trying to find my way around that, I found 10.external.  I
found that by adjusting my start scripts to include the appropriate
ip addr add commands, it worked fine.  In my test environment I played
with the ctdb addip/delip/moveip commands, and with manually assigning
the addresses, and it all worked fine.  If I turned off a node, I could
uncomment a couple of lines in the start script on the other node,
restart, and everything moved to where it was supposed to be.

But not all things have worked in production as they did in my testing
environment, and things don't always work the same in production from
one time to the next, for that matter...
> The documentation might not be clear on this but if you're using
> 10.external then you need to have the DisableIPFailover tunable set to
> 1 on all nodes so that CTDB doesn't try to move the IPs itself.

I do have the DisableIPFailover set.

From the documentation, I am under the impression that if I do ctdb
delip on one node and ctdb addip on the other node, and make sure the
other node shows the correct additional IPs assigned to the physical
interface using the ip addr show command, that should move an IP from
one node to the other.  But when I do this, I will frequently still see
messages like <ip> still hosted during callback, or failed to release
<ip>, in the logs.  Sometimes on startup I will see log entries like
<ip> incorrectly on an interface, when ip addr show shows the address
is correctly on an interface, and ctdb ipinfo will show that the IP is
assigned to the node.

Does this mean these commands are not working, or could it be that the
10.external doesn't do the magic in these cases?

> Please let us know if the documentation could be improved...

Often documentation isn't straightforward until you have had some
experience and gained some of the context that those who wrote it have.
I am not sure about improving the documentation, but I can say I
learned significantly more about how to set things up, what to expect,
and what procedures to perform by reading mailing list posts than I did
by reading the manuals or the wiki...

>
> peace & happiness,
> martin
>


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
On Tue, 7 Nov 2017 17:05:27 -0800, Computerisms Corporation via samba
<[hidden email]> wrote:

> >> I am using the 10.external.  ip addr show shows the correct IP addresses
> >> on eth0 in the lxc container.  rebooted the physical machine, this node
> >> is buggered.  shut it down, used ip addr add to put the addresses on the
> >> other node, used ctdb addip and the node took it and node1 is now
> >> functioning with all 4 IPs just fine.  Or so it appears right now.
> >>
> >> something is seriously schizophrenic here...  
> >
> > I'm wondering why you're using 10.external.  Although we have tested
> > it, we haven't actually seen it used in production before!  10.external
> > is a hack to allow use of CTDB's connection tracking while managing the
> > public IP addresses externally.  That is, you tell CTDB about the
> > public IPs, use "ctdb moveip" to inform CTDB about moved public IPs and
> > it sends grat ARPs and tickle ACKs on the takeover node.  It doesn't
> > actually assign the public IP addresses to nodes.  
>
> Hm, okay, I was clear that using 10.external it is a human's
> responsibility to deal with assigning IPs to physical interfaces.  In
> re-reading the docs, I see DeterministicIPs and NoIPFailback are
> required for moveip, which I am not sure are set.  will check next
> opportunity, if they aren't that might explain the behaviour, however,
> the ips were correctly assigned using the ip command.

The documentation (CTDB >= 4.6) for moveip says:

       IPAllocAlgorithm != 0

so it will work for the other algorithms but not deterministic.

In 4.5, which is what I assume you're running, the documentation
recommends:

        DeterministicIPs = 0

so, this one needs to be off.

I don't think these options will explain the messages you're seeing.

> The reason I am using 10.external is because when I initially set up my
> cluster test environment, none of ctdb's automatic networking
> assignments worked.  ip addr show wouldn't display the addresses as
> being assigned to the interface.  I never did get down to the bottom of
> that problem, I had thought perhaps the lxc container was the issue, but
> don't know why it would be, the ip commands all seem to work fine from
> th cli.

OK.  CTDB just runs the "ip" command in the event scripts, so in most
cases it should be the same as running it from the cli.  I wonder if it
could be an SELinux issue or something?

> While I was trying to find my way around that, I found 10.external.  I
> found that by adjusting my start scripts to include the appropriate ip
> addr add commands, it worked fine.  in my test environment I played with
> the ctdb addip/delip/moveip commands, and manually assigning the
> addresses, and it all worked fine.  If I turned off a node, I could
> uncomment a couple lines in the start script in the other node and
> restart and everything moved to where it was supposed to be.

You shouldn't need to mess with addip and delip.  If the IP addresses
are configured in the public addresses file at startup then moveip
should be sufficient to let ctdbd know that the address has moved.

> > The documentation might not be clear on this but if you're using
> > 10.external then you need to have the DisableIPFailover tunable set to
> > 1 on all nodes so that CTDB doesn't try to move the IPs itself.  
>
> I do have the DisableIPFailover set.
>
> from the documentation, I am under the impression that if I do ctdb
> delip on one node, and ctdb addip on the other node, and make sure the
> other node shows the correct additional IPs assigned to the physical
> interface using the ip addr show command, that should move an ip from
> one node to the other.  But when I do this, I will frequently still see
> messages like <ip> still hosted during callback, or failed to release
> <ip> in the logs.  sometimes on startup, I will see log entries like
> <ip> incorrectly on an interface, when ip addr show shows the address is
> correctly on an interface, and ctdb ipinfo will show that the ip is
> assigned to the node.

This message:

  IP 192.168.120.90 still hosted during release IP callback, failing

comes from this block of code in
ctdb/server/ctdb_takeover.c:release_ip_callback():

        if (ctdb->tunable.disable_ip_failover == 0 && ctdb->do_checkpublicip) {
                if  (ctdb_sys_have_ip(state->addr)) {
                        DEBUG(DEBUG_ERR,
                              ("IP %s still hosted during release IP callback, failing\n",
                               ctdb_addr_to_str(state->addr)));
                        ctdb_request_control_reply(ctdb, state->c,
                                                   NULL, -1, NULL);
                        talloc_free(state);
                        return;
                }
        }

So, if DisableIPFailover is set to 1 then that message can't happen.
Remember that the tunables are not cluster-wide, so they need to be set
on all nodes.
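
A quick way to confirm would be something like this (a sketch; onnode
relies on the nodes file and password-less ssh between the nodes):

  # the running value must be 1 on every node
  onnode -p all ctdb getvar DisableIPFailover
  # set it on any node that still shows 0
  onnode -p all ctdb setvar DisableIPFailover 1

Note that setvar only changes the running daemon; to make it persistent
you would also set it in the CTDB configuration on each node (from
memory, CTDB_SET_DisableIPFailover=1 in ctdbd.conf for 4.5).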

> Does this mean these commands are not working, or could it be that the
> 10.external doesn't do the magic in these cases?

10.external doesn't do anything for the "releaseip" and "takeip"
events.  It really does depend on the IP address(es) being moved
manually and "moveip" being used...
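
So, for moving one address from node 0 to node 1, the sequence would be
roughly this (an untested sketch using the address and interface from
your earlier output; double-check it against your setup):

  # on node 0, which currently hosts the address: drop it
  ip addr del 192.168.120.90/22 dev eth0
  # on node 1, the takeover node: add it
  ip addr add 192.168.120.90/22 dev eth0
  # then tell CTDB the address has moved, so it sends the grat ARPs
  # and tickle ACKs
  ctdb moveip 192.168.120.90 1

rather than delip/addip, which change the configured set of public
addresses instead of recording a move.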

> > Please let us know if the documentation could be improved...  
>
> Often documentation isn't straightforward until you have had some
> experience and gained some of context that those who wrote it have.  I
> am not sure about improving documentation, but I can say I learned
> significantly more about how to set things up, what to expect, and what
> procedures to perform by reading mailing list posts than I did by
> reading the manuals or the wiki...

Hmmm... we've been trying to turn the wiki content for CTDB into a
very simple how-to... but it doesn't look like we're succeeding.  :-(

If you can point to particular things then we'll try to improve them...

peace & happiness,
martin


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
In reply to this post by Samba - General mailing list
Hi Martin,

Well, it has been over a week since my last hung process, but I got
another one today...
>> So, not sure how to determine if this is a gluster problem, an lxc
>> problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...
>
> You need a stack trace of the stuck smbd process.  If it is wedged in a
> system call on the cluster filesystem then you can blame the cluster
> filesystem.  debug_locks.sh is meant to be able to get you the relevant
> stack trace via gstack.  In fact, even before you get the stack trace
> you could check a process listing to see if the process is stuck in D
> state.

So, yes, I do have a process stuck in the D state.  It is an smbd
process.  Matching up the times in the logs, I see that the "Vacuuming
child process timed out for db locking.tdb" error in ctdb lines up with
the user who owns that smbd process accessing a file that has been
problematic before.  It is an xlsx file.

> gstack basically does:
>
>    gdb -batch -ex "thread apply all bt" -p <pid>
>
> For a single-threaded process it leaves out "thread apply all".
> However, in recent GDB I'm not sure it makes a difference... seems to
> work for me on Linux.
>
> Note that gstack/gdb will hang when run against a process in D state.

Indeed, gdb, pstack, and strace all either hang or output no information.

I have been trying to find a way to get the actual gdb output, but all I
can seem to find is the contents of /proc/<pid>/stack:

[<ffffffffc05ed856>] request_wait_answer+0x166/0x1f0 [fuse]
[<ffffffffa04b8d50>] prepare_to_wait_event+0xf0/0xf0
[<ffffffffc05ed958>] __fuse_request_send+0x78/0x80 [fuse]
[<ffffffffc05f0bdd>] fuse_simple_request+0xbd/0x190 [fuse]
[<ffffffffc05f6c37>] fuse_setlk+0x177/0x190 [fuse]
[<ffffffffa0659467>] SyS_flock+0x117/0x190
[<ffffffffa0403b1c>] do_syscall_64+0x7c/0xf0
[<ffffffffa0a0632f>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

I am still not too sure how to interpret this, but I think it is
pointing me to the gluster filesystem, so I will see what I can find
chasing that down...
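
One thing I may try next time before rebooting (more a guess than a
plan, since the process is stuck in D state) is to see which file the
hung flock() is actually against, from another shell:

  # locks held or waited on by the stuck smbd (lslocks is in util-linux)
  lslocks -p <pid>
  # or the raw kernel view; waiter lines are marked with "->"
  grep -w <pid> /proc/locks

though this might show nothing if gluster handles the lock entirely in
userspace.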


>
> peace & happiness,
> martin
>


Re: ctdb vacuum timeouts and record locks

Samba - General mailing list
On Tue, 14 Nov 2017 22:48:57 -0800, Computerisms Corporation via samba
<[hidden email]> wrote:

> well, it has been over a week since my last hung process, but got
> another one today...
> >> So, not sure how to determine if this is a gluster problem, an lxc
> >> problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...  
> >
> > You need a stack trace of the stuck smbd process.  If it is wedged in a
> > system call on the cluster filesystem then you can blame the cluster
> > filesystem.  debug_locks.sh is meant to be able to get you the relevant
> > stack trace via gstack.  In fact, even before you get the stack trace
> > you could check a process listing to see if the process is stuck in D
> > state.  
>
> So, yes, I do have a process stuck in the D state.  is in an smbd
> process.  matching up the times in the logs, I see that the the
> "Vacuuming child process timed out for db locking.tdb" error in ctdb
> lines up with the user who owns the the smbd process accessing a file
> that has been problematic before.  it is an xlsx file.
>
> > gstack basically does:
> >
> >    gdb -batch -ex "thread apply all bt" -p <pid>
> >
> > For a single-threaded process it leaves out "thread apply all".
> > However, in recent GDB I'm not sure it makes a difference... seems to
> > work for me on Linux.
> >
> > Note that gstack/gdb will hang when run against a process in D state.  
>
> Indeed, gdb, pstack, and strace all either hang or output no information.
>
> I have been trying to find a way to get the actual gdb output, but all I
> can seem to find is the contents of /proc/<pid>/stack:
>
> [<ffffffffc05ed856>] request_wait_answer+0x166/0x1f0 [fuse]
> [<ffffffffa04b8d50>] prepare_to_wait_event+0xf0/0xf0
> [<ffffffffc05ed958>] __fuse_request_send+0x78/0x80 [fuse]
> [<ffffffffc05f0bdd>] fuse_simple_request+0xbd/0x190 [fuse]
> [<ffffffffc05f6c37>] fuse_setlk+0x177/0x190 [fuse]
> [<ffffffffa0659467>] SyS_flock+0x117/0x190
> [<ffffffffa0403b1c>] do_syscall_64+0x7c/0xf0
> [<ffffffffa0a0632f>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> I am still not too sure how to interpret this, but I think this is
> pointing me to the gluster file system, so will see what I can find
> chasing that down...

Yes, it does look like it is in the gluster filesystem.

Are you only accessing the filesystem via Samba or do you also have
something like NFS exports?  If you are only exporting via Samba then
you could try setting "posix locking = no" in your Samba
configuration.  However, please read the documentation for that option
in smb.conf(5) and be sure of your use-case before trying this on a
production system...
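
For reference, that is just a per-share (or global) setting in
smb.conf; the share name below is made up for illustration, the path is
your Gluster mount.  It stops smbd mapping its byte-range locks onto
POSIX locks in the filesystem, so SMB clients still lock against each
other through Samba's own locking database, but anything accessing the
Gluster volume outside Samba will not see those locks:

  [ctfn]
      path = /CTFN
      posix locking = no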

peace & happiness,
martin
