Memory leak in ntlm_auth ?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Memory leak in ntlm_auth ?

Samba - samba-technical mailing list
 

Hi,

I wish to tell you a tale of investigation and suspicion of a memory
leak in the ntlm_auth program (as compiled from
source3/utils/ntlm_auth.c).

So in our system, we use ntlm_auth to provide both Kerberos and NTLM
authentication for transparent (and not-so transparent) web proxy
connections. In this particular instance, Kerberos is the auth of
choice, but I believe this is relevant to NTLM too.

We spawn up a pool of ntlm_auth processes on demand. If a process is not
available when a connection comes in, we create a new one up to a
maximum.

The base software version is 4.2.10, as shipped with Debian 8 (Jessie).

Now that the particulars of the environment are out of the way.....

So we have a customer. And they phoned in a support call that their
system was falling over. We dive in remotely, and see each ntlm_auth
process is running at an impressive 1.7GB of memory in the "VIRT" column
of top. EEP!

Suspecting a memory leak, we compiled up version 4.2.10 with a call to
talloc_enable_leak_reporting_full() in source3/utils/ntlm_auth.c (why
isn't this a command line option at the least?), along with a mod to our
own processes to send CTRL+D to ntlm_auth prior to terminating the
process, and gathering the report on STDERR.

Left it running for a day and examined the report. Each process had on
average over 19,000 DATA_BLOBs of around 1.5K in size, reportedly
created at lib/util/base64.c line 35, the function
base64_decode_data_blob_talloc().

After digging through the code, this function is called on (pseudo-code)
&line_of_STDIN[3] to decode the base64 blob input to a binary blob.
There's then a load of logic surrounding the 2-letter command code and
processing gensec, etc, etc, etc.

Now... the crux of the matter. If I compare with an earlier (3.x.x)
branch, we see that data_blob_free() is called on this returned blob
whenever the command handler returns. In 4.x.x this code has been
refactored, and the function manage_gensec_request() doesn't always call
data_blob_free(). In fact, it rarely calls it at all.

There's also a couple of returns that omit talloc_free(mem_ctx) that
seems to precede every other return in this function.

So, on this hunch, I added in the missing data_blob_free() calls and the
two talloc_free(mem_ctx) calls, and rebuilt. Now the same customer
system is sitting happily at a low memory usage, not showing so much of
a hint of a memory leak.

I've attached the patch we used to "correct" ntlm_auth.c. I'd be
interested to here some technical commentary from knowledge Samba bods
:)

Regards

Rebecca Gellman (On behalf of SmoothWall)

 

more_ntlm_auth_memory_leaks.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Memory leak in ntlm_auth ?

Samba - samba-technical mailing list
Hi Rebecca,

thanks for the report,
can you please also file a bug at https://bugzilla.samba.org,
this is required to get the fix into the release branches.

While I think your patch fixes the problem, I'd propose the attached
patch for upstream.

Can you test if the attached patch also works for you?

Thanks!
metze

Am 04.04.2017 um 11:20 schrieb Rebecca Gellman via samba-technical:

>  
>
> Hi,
>
> I wish to tell you a tale of investigation and suspicion of a memory
> leak in the ntlm_auth program (as compiled from
> source3/utils/ntlm_auth.c).
>
> So in our system, we use ntlm_auth to provide both Kerberos and NTLM
> authentication for transparent (and not-so transparent) web proxy
> connections. In this particular instance, Kerberos is the auth of
> choice, but I believe this is relevant to NTLM too.
>
> We spawn up a pool of ntlm_auth processes on demand. If a process is not
> available when a connection comes in, we create a new one up to a
> maximum.
>
> The base software version is 4.2.10, as shipped with Debian 8 (Jessie).
>
> Now that the particulars of the environment are out of the way.....
>
> So we have a customer. And they phoned in a support call that their
> system was falling over. We dive in remotely, and see each ntlm_auth
> process is running at an impressive 1.7GB of memory in the "VIRT" column
> of top. EEP!
>
> Suspecting a memory leak, we compiled up version 4.2.10 with a call to
> talloc_enable_leak_reporting_full() in source3/utils/ntlm_auth.c (why
> isn't this a command line option at the least?), along with a mod to our
> own processes to send CTRL+D to ntlm_auth prior to terminating the
> process, and gathering the report on STDERR.
>
> Left it running for a day and examined the report. Each process had on
> average over 19,000 DATA_BLOBs of around 1.5K in size, reportedly
> created at lib/util/base64.c line 35, the function
> base64_decode_data_blob_talloc().
>
> After digging through the code, this function is called on (pseudo-code)
> &line_of_STDIN[3] to decode the base64 blob input to a binary blob.
> There's then a load of logic surrounding the 2-letter command code and
> processing gensec, etc, etc, etc.
>
> Now... the crux of the matter. If I compare with an earlier (3.x.x)
> branch, we see that data_blob_free() is called on this returned blob
> whenever the command handler returns. In 4.x.x this code has been
> refactored, and the function manage_gensec_request() doesn't always call
> data_blob_free(). In fact, it rarely calls it at all.
>
> There's also a couple of returns that omit talloc_free(mem_ctx) that
> seems to precede every other return in this function.
>
> So, on this hunch, I added in the missing data_blob_free() calls and the
> two talloc_free(mem_ctx) calls, and rebuilt. Now the same customer
> system is sitting happily at a low memory usage, not showing so much of
> a hint of a memory leak.
>
> I've attached the patch we used to "correct" ntlm_auth.c. I'd be
> interested to here some technical commentary from knowledge Samba bods
> :)
>
> Regards
>
> Rebecca Gellman (On behalf of SmoothWall)
>
>  
>

tmp.diff.txt (3K) Download Attachment
signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Memory leak in ntlm_auth ?

Samba - samba-technical mailing list
Hi Rebecca,

I've filed https://bugzilla.samba.org/show_bug.cgi?id=12736
for this...

metze

> thanks for the report,
> can you please also file a bug at https://bugzilla.samba.org,
> this is required to get the fix into the release branches.
>
> While I think your patch fixes the problem, I'd propose the attached
> patch for upstream.
>
> Can you test if the attached patch also works for you?
>
> Thanks!
> metze
>
> Am 04.04.2017 um 11:20 schrieb Rebecca Gellman via samba-technical:
>>  
>>
>> Hi,
>>
>> I wish to tell you a tale of investigation and suspicion of a memory
>> leak in the ntlm_auth program (as compiled from
>> source3/utils/ntlm_auth.c).
>>
>> So in our system, we use ntlm_auth to provide both Kerberos and NTLM
>> authentication for transparent (and not-so transparent) web proxy
>> connections. In this particular instance, Kerberos is the auth of
>> choice, but I believe this is relevant to NTLM too.
>>
>> We spawn up a pool of ntlm_auth processes on demand. If a process is not
>> available when a connection comes in, we create a new one up to a
>> maximum.
>>
>> The base software version is 4.2.10, as shipped with Debian 8 (Jessie).
>>
>> Now that the particulars of the environment are out of the way.....
>>
>> So we have a customer. And they phoned in a support call that their
>> system was falling over. We dive in remotely, and see each ntlm_auth
>> process is running at an impressive 1.7GB of memory in the "VIRT" column
>> of top. EEP!
>>
>> Suspecting a memory leak, we compiled up version 4.2.10 with a call to
>> talloc_enable_leak_reporting_full() in source3/utils/ntlm_auth.c (why
>> isn't this a command line option at the least?), along with a mod to our
>> own processes to send CTRL+D to ntlm_auth prior to terminating the
>> process, and gathering the report on STDERR.
>>
>> Left it running for a day and examined the report. Each process had on
>> average over 19,000 DATA_BLOBs of around 1.5K in size, reportedly
>> created at lib/util/base64.c line 35, the function
>> base64_decode_data_blob_talloc().
>>
>> After digging through the code, this function is called on (pseudo-code)
>> &line_of_STDIN[3] to decode the base64 blob input to a binary blob.
>> There's then a load of logic surrounding the 2-letter command code and
>> processing gensec, etc, etc, etc.
>>
>> Now... the crux of the matter. If I compare with an earlier (3.x.x)
>> branch, we see that data_blob_free() is called on this returned blob
>> whenever the command handler returns. In 4.x.x this code has been
>> refactored, and the function manage_gensec_request() doesn't always call
>> data_blob_free(). In fact, it rarely calls it at all.
>>
>> There's also a couple of returns that omit talloc_free(mem_ctx) that
>> seems to precede every other return in this function.
>>
>> So, on this hunch, I added in the missing data_blob_free() calls and the
>> two talloc_free(mem_ctx) calls, and rebuilt. Now the same customer
>> system is sitting happily at a low memory usage, not showing so much of
>> a hint of a memory leak.
>>
>> I've attached the patch we used to "correct" ntlm_auth.c. I'd be
>> interested to here some technical commentary from knowledge Samba bods
>> :)
>>
>> Regards
>>
>> Rebecca Gellman (On behalf of SmoothWall)
>>
>>  
>>


signature.asc (853 bytes) Download Attachment
Loading...