Re: [Samba] The memory maybe leak in samba 4.3.11

Samba - samba-technical mailing list
Hi,
Thanks a lot.

Using valgrind, we found the malloc call stack below, so it may not be a memory leak.
    ==2796353== 36,334,440 bytes in 100,929 blocks are still reachable in loss record 774 of 774
    ==2796353==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2796353==    by 0x953B88F: ??? (in /usr/lib/x86_64-linux-gnu/samba/libmessages-dgm.so.0)
    ==2796353==    by 0x953BCA0: ??? (in /usr/lib/x86_64-linux-gnu/samba/libmessages-dgm.so.0)
    ==2796353==    by 0x953C342: unix_msg_send (in /usr/lib/x86_64-linux-gnu/samba/libmessages-dgm.so.0)
    ==2796353==    by 0x953E3B6: messaging_dgm_send (in /usr/lib/x86_64-linux-gnu/samba/libmessages-dgm.so.0)
    ==2796353==    by 0x71732FF: messaging_send_iov_from (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0x716E1BA: ??? (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0x716E869: ??? (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0x716EA91: ??? (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0x7171207: ctdbd_migrate (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0x716BD6E: ??? (in /usr/lib/x86_64-linux-gnu/libsmbconf.so.0)
    ==2796353==    by 0xAE6692F: ??? (in /usr/lib/x86_64-linux-gnu/samba/libdbwrap.so.0)

I found that sendmsg always fails with errno = EINTR, but smbd still has to malloc memory for each new message, so the RES of smbd grows quickly.

I added some code to unix_dgram_send_job that retries sendmsg up to 10 times when it fails with EINTR; with that change the RES no longer grows.
Alternatively, capping the queue length at 100 also works well.

I don't know whether this is a suitable fix for the process. I would also like to know why sendmsg returns EINTR.
Could you give us some suggestions about this case?

Br,
Zhang Xiaoxu.

-----Original Message-----
From: L.P.H. van Belle [mailto:[hidden email]]
Sent: May 4, 2017 20:53
To: [hidden email]
Subject: Re: [Samba] The memory maybe leak in samba 4.3.11

Oops, sorry, wrong one.

There were multiple fixes.

https://wiki.samba.org/index.php/Samba_4.4_Features_added/changed
BUG #12377: vfs_glusterfs: Fix a memory leak in connect path.

https://wiki.samba.org/index.php/Samba_4.5_Features_added/changed
BUG #12485: ctdbd_conn: Fix a resource leak.

https://wiki.samba.org/index.php/Samba_4.6_Features_added/changed
BUG #12624: lib/pthreadpool: Fix a memory leak.


Greetz,

Louis

> -----Original Message-----
> From: samba-technical [mailto:[hidden email]] On Behalf Of Zhangxiaoxu via samba-technical
> Sent: Thursday, May 4, 2017 14:42
> To: '[hidden email]'; '[hidden email]'
> Subject: The memory maybe leak in samba 4.3.11
>
> Hi,
>
> Scenario:
> Client A and Client B write data to a share.
> Client B opens the share in Windows Explorer.
>
> Problem:
> The RES of smbd grows quickly.
> $ top
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    CODE    DATA
> 2913029 nobody    20   0 1664536 1.204g  16608 D   0.0  1.3   5:09.21 smbd         64 1318580
>
> Process 2913029 handles the connection of client B.
> The DATA also grows quickly.
>
> Using smbcontrol, we found that the smbd process has not allocated much memory through talloc.
> $ smbcontrol 2913029 pool-usage
> full talloc report on 'null_context' (total  92353 bytes in 1274 blocks)
>
> From the logs, we found a lot of 0x310 (maybe MSG_PVFS_NOTIFY) messages from ctdb.
> Environment:
> Ubuntu 14.04
> Samba 4.3.11
> Windows 7 Client
>
> Configuration:
>    clustering = yes
>    ctdbd socket = /var/run/ctdb/ctdbd.socket
>    max protocol = SMB3
>    large readwrite = yes
>    idmap config *:range = 1000000-1999999
>    log level = 2
>    use sendfile = yes
>    store dos attributes = yes
>    acl_xattr:ignore system acls = yes
>    aio read size = 1024
>    deadtime = 10
>
> Is this a known problem?
> Could you give us some suggestions on how to find the root cause
> of the problem?
>
> Br,
> Zhang Xiaoxu.
>
> This e-mail and its attachments contain confidential information from
> H3C, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure,
> reproduction, or
> dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error,
> please notify the sender by phone or email immediately and delete it!
>




Re: Re: [Samba] The memory maybe leak in samba 4.3.11

> What exact version are you using and
Version 4.3.11.

> what code did you add ?
I added some code to the 'queue_msg' function: if the queue length exceeds 100, the message is dropped.
That way smbd does not malloc more and more memory for queued messages.
The messages are generated during 'ctdbd_migrate'; I don't know whether they can be dropped.
There may be too many messages, or the socket may be too busy.
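
For readers following along, the queue cap described above can be sketched roughly as follows. This is only an illustration, not the code that was added to Samba's queue_msg: the names msg_queue, bounded_queue_push, bounded_queue_pop and MAX_QUEUED_MSGS are hypothetical, and the cap of 100 is simply the value mentioned in this message. Once the cap is reached, new messages are counted as dropped and no memory is allocated for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap; 100 is the value tried in the thread. */
#define MAX_QUEUED_MSGS 100

struct queued_msg {
	struct queued_msg *next;
	size_t len;
	unsigned char data[];	/* copy of the message payload */
};

struct msg_queue {
	struct queued_msg *head;
	struct queued_msg *tail;
	size_t count;		/* messages currently queued */
	size_t dropped;		/* messages rejected because the queue was full */
};

/*
 * Append a copy of buf to the queue. Returns false without allocating
 * anything when the cap is reached, and false when malloc fails.
 */
bool bounded_queue_push(struct msg_queue *q, const void *buf, size_t len)
{
	struct queued_msg *m;

	assert(q != NULL && buf != NULL);
	if (q->count >= MAX_QUEUED_MSGS) {
		q->dropped++;
		return false;
	}
	m = malloc(sizeof(*m) + len);
	if (m == NULL) {
		return false;
	}
	m->next = NULL;
	m->len = len;
	memcpy(m->data, buf, len);
	if (q->tail != NULL) {
		q->tail->next = m;
	} else {
		q->head = m;
	}
	q->tail = m;
	q->count++;
	return true;
}

/* Remove and return the oldest message (caller frees it), or NULL if empty. */
struct queued_msg *bounded_queue_pop(struct msg_queue *q)
{
	struct queued_msg *m;

	assert(q != NULL);
	m = q->head;
	if (m == NULL) {
		return NULL;
	}
	q->head = m->next;
	if (q->head == NULL) {
		q->tail = NULL;
	}
	q->count--;
	return m;
}
```

A cap like this keeps the heap bounded, but it does not answer the open question in this message, namely whether the dropped messages are safe to lose.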

My test scenario is now:
There are 3 Samba server nodes and 3 clients, each connected to a different server; every client writes 64k to 30 files every 40ms.
If a client opens the current folder in Windows Explorer, the RES of smbd on that server (as shown by top) grows very quickly.

top -d 1 -p 2708752 -b | grep smbd

2708752 nobody    20   0  701928 299184  18616 R 107.8  0.3   1:34.64 smbd
2708752 nobody    20   0  701928 299184  18616 R 112.9  0.3   1:35.77 smbd
2708752 nobody    20   0  701928 299184  18616 R  92.9  0.3   1:36.70 smbd
2708752 nobody    20   0  701928 299184  18616 R 101.8  0.3   1:37.72 smbd
2708752 nobody    20   0  701928 299184  18616 D  18.0  0.3   1:37.90 smbd
2708752 nobody    20   0  701928 299184  18616 D   0.0  0.3   1:37.90 smbd
2708752 nobody    20   0  701928 299184  18616 D   0.0  0.3   1:37.90 smbd
2708752 root      20   0  703336 300528  18616 R  35.0  0.3   1:38.25 smbd
2708752 nobody    20   0  707052 304196  18684 D  20.0  0.3   1:38.45 smbd
2708752 nobody    20   0  707052 304196  18684 D   0.0  0.3   1:38.45 smbd
2708752 nobody    20   0  707052 304196  18684 D   0.0  0.3   1:38.45 smbd
2708752 nobody    20   0  707052 304196  18684 D   0.0  0.3   1:38.45 smbd
2708752 nobody    20   0  720364 317564  18684 R  46.9  0.3   1:38.92 smbd
2708752 nobody    20   0  723360 320564  18564 R  87.9  0.3   1:39.80 smbd
2708752 nobody    20   0  724716 321916  18564 R  96.9  0.3   1:40.77 smbd
2708752 nobody    20   0  724716 321916  18564 R 112.8  0.3   1:41.90 smbd
2708752 nobody    20   0  724716 321916  18564 R  93.9  0.3   1:42.84 smbd
2708752 nobody    20   0  724716 321916  18564 R 100.8  0.3   1:43.85 smbd
2708752 nobody    20   0  724716 321916  18564 D  26.0  0.3   1:44.11 smbd
2708752 nobody    20   0  724716 321916  18564 D   0.0  0.3   1:44.11 smbd
2708752 nobody    20   0  724716 321916  18564 D   0.0  0.3   1:44.11 smbd
2708752 root      20   0  725484 322624  18564 R  17.0  0.3   1:44.28 smbd
2708752 nobody    20   0  734464 331480  18568 D  71.9  0.3   1:45.00 smbd
2708752 nobody    20   0  735456 332660  18568 R  90.9  0.3   1:45.91 smbd
2708752 nobody    20   0  736932 334140  18568 R 106.8  0.3   1:46.98 smbd
2708752 nobody    20   0  738496 335624  18568 D  48.0  0.3   1:47.46 smbd
2708752 nobody    20   0  738496 335624  18568 D   0.0  0.3   1:47.46 smbd
2708752 nobody    20   0  745548 342752  18568 R  68.9  0.3   1:48.15 smbd
2708752 nobody    20   0  747048 344120  18568 R 100.9  0.3   1:49.16 smbd
2708752 nobody    20   0  749148 346184  18568 D  52.9  0.4   1:49.69 smbd
2708752 nobody    20   0  749148 346184  18568 D   0.0  0.4   1:49.69 smbd
2708752 nobody    20   0  755772 352980  18568 R  74.9  0.4   1:50.44 smbd
2708752 nobody    20   0  756632 353840  18568 R 108.8  0.4   1:51.53 smbd

-----Original Message-----
From: Jeremy Allison [mailto:[hidden email]]
Sent: May 10, 2017 2:14
To: zhangxiaoxu 13123 (RD)
Cc: 'L.P.H. van Belle'; [hidden email]; '[hidden email]'
Subject: Re: Re: [Samba] The memory maybe leak in samba 4.3.11

On Tue, May 09, 2017 at 02:59:51AM +0000, Zhangxiaoxu via samba-technical wrote:

> Hi,
> Thanks a lot.
>
> I found that sendmsg always fails with errno = EINTR, but smbd still has to malloc memory for each new message, so the RES of smbd grows quickly.
>
> I added some code to unix_dgram_send_job that retries sendmsg up to 10 times when it fails with EINTR; with that change the RES no longer grows.
> Alternatively, capping the queue length at 100 also works well.
>
> I don't know whether this is a suitable fix for the process. I would also like to know why sendmsg returns EINTR.

EINTR always means a signal was received and interrupted the send.

You're using 4.3.x yes ? In that branch:

static void unix_dgram_send_job(void *private_data) {
        struct unix_dgram_msg *dmsg = private_data;

        do {
                struct msghdr_buf *hdr = unix_dgram_msghdr(dmsg);
                struct msghdr *msg = msghdr_buf_msghdr(hdr);
                dmsg->sent = sendmsg(dmsg->sock, msg, 0);
        } while ((dmsg->sent == -1) && (errno == EINTR));

        if (dmsg->sent == -1) {
                dmsg->sys_errno = errno;
        }
}

we already loop on EINTR. What exact version are you using and what code did you add ?

Re: Re: [Samba] The memory maybe leak in samba 4.3.11

On Tue, May 09, 2017 at 02:59:51AM +0000, Zhangxiaoxu via samba-technical wrote:

> I found that sendmsg always fails with errno = EINTR, but smbd still has to malloc memory for each new message, so the RES of smbd grows quickly.
>
> I added some code to unix_dgram_send_job that retries sendmsg up to 10 times when it fails with EINTR; with that change the RES no longer grows.
> Alternatively, capping the queue length at 100 also works well.
>
> I don't know whether this is a suitable fix for the process. I would also like to know why sendmsg returns EINTR.
> Could you give us some suggestions about this case?

Do you have processes in D state or processes using 100% CPU
continuously?

It might be that some message receiver does not pick up its messages
at all.

Volker


Re: Re: Re: [Samba] The memory maybe leak in samba 4.3.11

Thanks a lot.

> Do you have processes in D state or processes using 100% CPU continuously?
Yes. If a client uses Windows Explorer to open a folder that it and another client are writing to, the process stays in D state and CPU usage is higher.
I noticed that the client sends a 'change notify request', and then smbd receives a lot of messages with code 0x310.
When the client receives the 'change notify response', it sends a 'query directory request' to the server.

> It might be that some message receiver does not pick up its messages at all.
Any idea about this case?

What do you think of this solution:
cap the message queue length at 100 or more, and drop any further messages in the 'queue_msg' function.



Re: Re: Re: [Samba] The memory maybe leak in samba 4.3.11

On Fri, May 12, 2017 at 01:35:46AM +0000, Zhangxiaoxu wrote:

> Thanks a lot.
>
> > Do you have processes in D state or processes using 100% CPU continuously?
> Yes. If a client uses Windows Explorer to open a folder that it and
> another client are writing to, the process stays in D state and CPU
> usage is higher.
> I noticed that the client sends a 'change notify request', and then
> smbd receives a lot of messages with code 0x310.
> When the client receives the 'change notify response', it sends a
> 'query directory request' to the server.

That works as designed. That is what file change notify is for. The
question is -- what is the process doing that can not receive the
messages? Is the sendmsg error code really EINTR, or is it
EAGAIN/EWOULDBLOCK? If you strace the sending process, you should find
the destination socket in the sendmsg calls, and you should
investigate why the receiving process can't pick up the messages it
should receive.
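
A minimal sketch of how the two error cases can be told apart, assuming Linux and a non-blocking AF_UNIX datagram socket pair. The helper names classify_send and fill_until_stuck are hypothetical and are not part of Samba. The second helper never reads from the receiving end, so sends eventually fail once the kernel has no buffer space left for the sender; on Linux that failure is expected to be EAGAIN rather than EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum send_outcome {
	SEND_OK,		/* datagram accepted by the kernel */
	SEND_INTERRUPTED,	/* EINTR: a signal arrived, retrying makes sense */
	SEND_WOULD_BLOCK,	/* EAGAIN/EWOULDBLOCK: receiver is not keeping up */
	SEND_FAILED		/* any other error */
};

/* Send one datagram and map the result to one of the cases above. */
enum send_outcome classify_send(int fd, const void *buf, size_t len)
{
	ssize_t n = send(fd, buf, len, 0);

	if (n >= 0) {
		return SEND_OK;
	}
	if (errno == EINTR) {
		return SEND_INTERRUPTED;
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK) {
		return SEND_WOULD_BLOCK;
	}
	return SEND_FAILED;
}

/*
 * Send to a peer that never reads until a send stops succeeding.
 * Stores the number of accepted datagrams in *sent_count and returns
 * the outcome of the first send that was not accepted.
 */
enum send_outcome fill_until_stuck(size_t *sent_count)
{
	int sv[2];
	int flags;
	char payload[1024];
	size_t sent = 0;
	enum send_outcome out = SEND_OK;

	assert(sent_count != NULL);
	memset(payload, 'x', sizeof(payload));
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) {
		return SEND_FAILED;
	}
	flags = fcntl(sv[0], F_GETFL, 0);
	if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(sv[0]);
		close(sv[1]);
		return SEND_FAILED;
	}
	while (sent < 1000000) {	/* safety bound for the sketch */
		out = classify_send(sv[0], payload, sizeof(payload));
		if (out == SEND_INTERRUPTED) {
			continue;	/* interrupted by a signal: just retry */
		}
		if (out != SEND_OK) {
			break;
		}
		sent++;
	}
	close(sv[0]);
	close(sv[1]);
	*sent_count = sent;
	return out;
}
```

If the outcome is SEND_WOULD_BLOCK, a retry loop on EINTR does not help; the receiving process has to be examined instead.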

> > It might be that some message receiver does not pick up its messages at all.
> Any idea about this case?
>
> What do you think of this solution:
> cap the message queue length at 100 or more, and drop any further
> messages in the 'queue_msg' function.

It will help with the memory leak, but in a well-working system we
should never pile up messages. Something is going wrong in your system
that should be investigated. See above.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]


Re: Re: Re: [Samba] The memory maybe leak in samba 4.3.11

> Is the sendmsg error code really EINTR, or is it EAGAIN/EWOULDBLOCK?
It is EAGAIN (Resource temporarily unavailable); see the strace below.
After that, a new connection to the socket is made.
Will the message queue leak after the reconnect?

> you should investigate why the receiving process can't pick up the messages it should receive
Judging from the message body passed to sendmsg, I think it is used for 'change notify'.
With a faster CPU, memory grows more slowly than before.
Could heavy load cause the problem?

The strace output:
sendmsg(6, {msg_name(110)={sa_family=AF_LOCAL, sun_path="/var/lib/samba/private/msg.sock/1525222"}, msg_iov(3)=[{"\0\0\0\0\0\0\0\0", 8}, {"\346E\27\0\0\0\0\0\0\0\0\0\3\0\0\0\230\324)\303\356\245\213\222\7S\r\0\0\0\0\0"..., 52}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200j\313?\205U\0\0\3\0\0\0win2"..., 53}], msg_controllen=0, msg_flags=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
socket(PF_LOCAL, SOCK_DGRAM, 0)         = 68
fcntl(68, F_GETFD)                      = 0
fcntl(68, F_SETFD, FD_CLOEXEC)          = 0
connect(68, {sa_family=AF_LOCAL, sun_path="/var/lib/samba/private/msg.sock/1525222"}, 110) = 0
pipe([69, 70])                          = 0
futex(0x7f9ae1190040, FUTEX_WAKE_PRIVATE, 2147483647) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [FPE USR2 PIPE], 8) = 0


Re: Re: Re: [Samba] The memory maybe leak in samba 4.3.11

On Mon, May 15, 2017 at 02:42:01AM +0000, Zhangxiaoxu wrote:
> > Is the sendmsg error code really EINTR, or is it EAGAIN/EWOULDBLOCK?
> It is EAGAIN (Resource temporarily unavailable); see the strace below.
> After that, a new connection to the socket is made.

Yep, that's the way it happens. We do a blocking connect when the
receiving process does not pick up its messages.

> Will the message queue leak after the reconnect?

It will pile up after the reconnect (which happens in a different
thread).

> > you should investigate why the receiving process can't pick up the messages it should receive
> Judging from the message body passed to sendmsg, I think it is used for 'change notify'.
> With a faster CPU, memory grows more slowly than before.
> Could heavy load cause the problem?
>
> The strace output:
> sendmsg(6, {msg_name(110)={sa_family=AF_LOCAL, sun_path="/var/lib/samba/private/msg.sock/1525222"}, msg_iov(3)=[{"\0\0\0\0\0\0\0\0", 8}, {"\346E\27\0\0\0\0\0\0\0\0\0\3\0\0\0\230\324)\303\356\245\213\222\7S\r\0\0\0\0\0"..., 52}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200j\313?\205U\0\0\3\0\0\0win2"..., 53}], msg_controllen=0, msg_flags=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)

The question is -- what process is 1525222? Can you strace that one?
Does that one for example chew 100% CPU, or is it stuck? Is 1525222
the notifyd process that is a bottleneck for example, or is the
sending process the notifyd? You might want to take a look at the
ascii image in source3/smbd/notifyd/notifyd.h to get an idea of our
notify architecture. Maybe in your situation we need to do blocking
sends instead of queueing.

Volker
