Possible samba4 / winbind memory leak?

Andrew Walters

Hi all,

I've got five samba4 deployments across three sites, all now on release 4.0.4.

Two were set up when samba4 was in alpha stage before it got integrated DNS and smbd filesharing, and control one AD domain each. These sites are running samba4 and named on one IP, and samba3 listening on another IP (Franky-inspired) without issue - all happy.

Three sites are using samba4 with its own internal DNS server and samba3 fileserver both enabled, and with Samba4's libnss_winbind.so.2 copied to /lib64/. All three are on the same domain, one as a DC and two as RODCs (on different sites linked by VPN), and each directly serves only about 3-6 users for now. The DC (and to a lesser extent the RODCs) seems to have problems caused by a memory leak. I'll focus on the DC as it is suffering the worst.

Basically, over the course of several days one or two samba processes will grow and eat up most or all of the 4GB RAM and 4GB swap.

This week I've run it with --leak-report-full, but when I killall samba on the DC the biggest memory eating process doesn't die.

After running killall samba, this is the top entry for the process that won't die, taken about two or three minutes after the killall:

==============================================================================
top - 10:04:28 up 10 days, 25 min,  2 users,  load average: 0.00, 0.99, 2.29
Tasks: 118 total,   1 running, 117 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.7%sy,  0.0%ni, 92.3%id,  6.9%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3922892k total,  2575808k used,  1347084k free,     7488k buffers
Swap:  4194296k total,  2053216k used,  2141080k free,   419996k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3268 root      20   0 4122m 1.6g 2656 S  0.0 43.3   1087:51 samba
==============================================================================

The high 15min load average is from before the killall samba when things were swapping around a fair bit.
You can see from the amount of free mem vs swap used that it was heavily loaded before the killall (sorry, I forgot to get a snapshot of top before doing the killall). Interestingly, this process is completely idle. Because it won't respond to a SIGTERM, I don't think I'm going to get a leak report from it.
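
A snapshot like the one missed above could be recorded just before running killall. The following is only a minimal sketch, assuming a Linux /proc filesystem (the VmSwap field needs a reasonably recent kernel; it is reported as 0 if absent). The helper names mem_from_status and snapshot_samba and the default log path are hypothetical and do not come from this thread.

```shell
# Hypothetical helper: print resident and swapped memory (kB) from a
# /proc/<pid>/status style file given as $1.
mem_from_status() {
    awk '/^VmRSS:/  {rss = $2}
         /^VmSwap:/ {swap = $2}
         END {printf "rss_kb=%d swap_kb=%d\n", rss, swap}' "$1"
}

# Hypothetical helper: append one timestamped line per process whose
# name is exactly "samba" to the log file $1 (the default path is an assumption).
snapshot_samba() {
    out=${1:-/tmp/samba-mem.log}
    for pid in $(pgrep -x samba); do
        [ -r "/proc/$pid/status" ] || continue
        echo "$(date '+%F %T') pid=$pid $(mem_from_status "/proc/$pid/status")" >> "$out"
    done
}
```

Running snapshot_samba immediately before killall samba would leave a per-process record of resident and swapped memory to compare against what survives the kill.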

Once the offending samba process was the only one left running, this is the netstat -pln | grep samba output:
==============================================================================
unix  2      [ ACC ]     STREAM     LISTENING     14610435 3268/samba          /srv/adsrv/var/run/samba/winbindd/pipe
unix  2      [ ACC ]     STREAM     LISTENING     14610437 3268/samba          /srv/adsrv/var/lib/samba/winbindd_privileged/pipe
==============================================================================

This suggests to me that there's a memory leak somewhere in the samba winbind code. Note the offending process is not listening on TCP at all.
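
The same check can be repeated for any PID by filtering the netstat output. This is a rough sketch only; the function name listening_unix_paths is hypothetical, and it assumes the column layout shown in the output above (PID/program in one field, socket path last).

```shell
# Hypothetical helper: print the unix-socket paths a PID is listening on.
# Reads `netstat -pln` style lines on stdin; $1 is the PID to look for.
listening_unix_paths() {
    awk -v pid="$1" '$1 == "unix" && /LISTENING/ {
        # The PID/program column looks like "3268/samba"; the path is the last field.
        for (i = 1; i < NF; i++)
            if ($i ~ "^" pid "/") print $NF
    }'
}
```

Usage would be along the lines of: netstat -pln | listening_unix_paths 3268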

The leak reports I *did* get contain some sensitive information. The text file comes in at 37MB which is difficult to scrub. Please email me if you want to see them.

Any ideas what might be happening here?

Regards


Andrew W


Re: Possible samba4 / winbind memory leak?

Andrew Bartlett
On Tue, 2013-04-02 at 13:45 +1300, Andrew Walters wrote:
> Hi all,
>
> I've got five samba4 deployments across three sites, all now on release 4.0.4.

> You can see from the amount of free mem vs swap used that it was
> heavily loaded before the killall (sorry forgot to get a snapshot of
> top before doing the killall). Interestingly this process is
> completely idle. Because it won't respond to a SIGTERM, I don't think
> I'm going to get a leak report from it.

There are still ways to debug this, if you still have the process
around.  Ricky Nance (CC'ed) is the expert in this, so I've CC'ed him.

Essentially, what we need you to do is attach with gdb:

gdb -p <pid>

and then get me:

bt full

(because if it isn't responding to a SIGTERM, then it will be in an
infinite loop somewhere, and we want to know where that is).

Then you run:

p talloc_report_full(0, stderr)

You already have my GPG key, so encrypt the info to that key.
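
The gdb steps above can also be run non-interactively from a command file. What follows is a sketch under assumptions, not a tested procedure for this bug: the helper names write_gdb_cmds and gdb_dump and the /tmp paths are hypothetical, the report is written to a file rather than stderr as a convenience, and the casts on fopen and fclose are only there because newer gdb versions may refuse to call functions whose return type they do not know.

```shell
# Hypothetical helper: write a gdb command file.
# $1 = command file to create, $2 = path the talloc report is written to.
write_gdb_cmds() {
    {
        echo 'set pagination off'
        echo 'bt full'
        # Open the report file inside the attached process (assumption: libc fopen is callable).
        printf 'set $f = (void *) fopen("%s", "w")\n' "$2"
        echo 'call talloc_report_full(0, $f)'
        # Close the file so the report is flushed to disk.
        echo 'call (int) fclose($f)'
        echo 'detach'
    } > "$1"
}

# Hypothetical helper: attach to PID $1, run the commands, save gdb's own output.
gdb_dump() {
    write_gdb_cmds /tmp/samba-gdb.cmds /tmp/leak.txt
    gdb -p "$1" -batch -x /tmp/samba-gdb.cmds > "/tmp/samba-bt.$1.txt" 2>&1
}
```

Calling gdb_dump with the PID of the stuck process should leave the backtrace in /tmp/samba-bt.<pid>.txt and the talloc report in /tmp/leak.txt, provided the process is in a state where gdb can call functions in it.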

Thanks,

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org



Re: Possible samba4 / winbind memory leak?

Andrew Walters

----- Original Message -----
> From: "Andrew Bartlett" <[hidden email]>

> There are still ways to debug this, if you still have the process
> around.  Ricky Nance (CC'ed) is the expert in this, so I've CC'ed
> him.
>
> Essentially, what we need you to do is attach with gdb:
>
> gdb -p <pid>
>
> and then get me:
>
> bt full
>
> (because if it isn't responding to a SIGTERM, then it will be in an
> infinite loop somewhere, and we want to know where that is).
>
> Then you run:
>
> p talloc_report_full(0, stderr)
>

I killed it, sorry. It'll take a couple of days to get it back in that state. When it is, I'll run and send you as above.


Thanks,

Andrew W

Re: Possible samba4 / winbind memory leak?

Ricky Nance
Andrew W.,

When I was running into memory leaks I had a cron job (it emailed me) that
ran ps_mem.py ( http://www.pixelbeat.org/scripts/ps_mem.py ) every hour
and redirected it into a file. The bash script I ran looked like this...

#!/bin/bash
date >> /sambamem.txt && ps_mem.py | grep 'samba\|smbd' >> /sambamem.txt
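
If ps_mem.py is not available, a cruder log can be built from ps alone. This is a minimal sketch with hypothetical names (sum_samba_rss, log_samba_rss) and a hypothetical default log path; note that summing plain RSS counts shared pages once per process, so the totals will read higher than what ps_mem.py reports.

```shell
# Hypothetical helper: sum the RSS column (kB) for processes named samba or smbd.
# Reads "RSS COMMAND" pairs on stdin, as printed by: ps -eo rss=,comm=
sum_samba_rss() {
    awk '$2 == "samba" || $2 == "smbd" {total += $1} END {print total + 0}'
}

# Hypothetical helper: append a timestamped total to the log file $1.
log_samba_rss() {
    echo "$(date '+%F %T') $(ps -eo rss=,comm= | sum_samba_rss) kB" >> "${1:-/tmp/sambamem.txt}"
}

# Example crontab entry for an hourly run (the script path is an assumption):
# 0 * * * * /usr/local/bin/log-samba-rss.sh
```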


Also, it's probably easier to capture the output by running:

p talloc_report_full(0, fopen("/tmp/leak.txt", "w"))

instead of




Re: Possible samba4 / winbind memory leak?

Ricky Nance
(Sorry, Shift+Enter apparently sends emails in Gmail...)

instead of:

p talloc_report_full(0, stderr)

then run:

tar --xz -cf leak.tar.xz /tmp/leak.txt

if you are going to email it (it compresses very well).

Ricky

