Quantcast

vfs_acl_xattr and Linux memory fragmentation

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
Hi,

I thought I'd share this, although I'm not sure we need to do anything
about it in upstream Samba.

We've come across incidents where we see on the console of our NAS
devices something like this:

smbd: page allocation failure: order:4, mode:0xc0d0
Pid: 6443, comm: smbd Tainted: P         C O 3.2.54 #1
Call Trace:
 [<ffffffff810b6345>] warn_alloc_failed+0xf5/0x140
 [<ffffffff810b7822>] ? drain_pages+0x32/0x90
 [<ffffffff810b78d0>] ? page_alloc_cpu_notify+0x50/0x50
 [<ffffffff810b78e1>] ? drain_local_pages+0x11/0x20
 [<ffffffff810b70c1>] __alloc_pages_nodemask+0x481/0x6d0
 [<ffffffff810e707f>] alloc_pages_current+0x7f/0xe0
 [<ffffffff810b5a99>] __get_free_pages+0x9/0x40
 [<ffffffff810ee3a8>] __kmalloc+0xe8/0xf0
 [<ffffffff811154e8>] getxattr+0x98/0x130
 [<ffffffff8110371d>] ? do_path_lookup+0x2d/0xc0
 [<ffffffff810eeb15>] ? kmem_cache_free+0x15/0x90
 [<ffffffff810ff92e>] ? putname+0x2e/0x40
 [<ffffffff8110453f>] ? user_path_at_empty+0x5f/0xa0
 [<ffffffff810a0ec0>] ? call_rcu_sched+0x10/0x20
 [<ffffffff8106d25a>] ? __put_cred+0x3a/0x50
 [<ffffffff81115654>] sys_getxattr+0x54/0x80
 [<ffffffff8105db54>] ? sys_setresgid+0x84/0x120
 [<ffffffff8169ced2>] system_call_fastpath+0x16/0x1b

...and a bunch of memory stats.

Our analysis is that:

1. Samba's vfs_acl_xattr() algorithm for getting an extended attribute
is "try with a 1K buffer and if that doesn't work try with 64K". This is
presumably an optimistic strategy to save system calls.

2. In older Linux (prior to 3.4), getxattr() used to require a
physically-contiguous (kmalloc'd) buffer whose size equals the max size
requested by the user. So even though no Linux in-tree file system
supports EA's larger than 4K, the getxattr() system call would try to
allocate 64K if Samba passes this number.

3. On busy servers, memory gets fragmented over time, and that's the
failure we've been seeing (notice the "order:4" - that means it tried
2^4 pages or 64K). That causes the getxattr to fail with ENOMEM.

4. In newer kernels (since commit 44c82498), it still tries the kmalloc,
but falls back to vmalloc. I'm not sure why kmalloc first, maybe that's
also some optimistic stragegy...

5. So in newer kernels, the 64K still incurs some extra-allocation
overhead, but at least it doesn't fail.

6. Since the initial kmalloc is done with GFP_KERNEL, I think there's a
middle case where the kernel would try to evict pages, and that might
cause a performance hit, all for memory that's not really required by
any in-tree file system (only ZFS supports 64K on Linux AFAIK).

Bottom line - Samba works well with recent Linux kernels, but we might
be taking a performance hit by asking for memory we probably don't need.
We might want to do the initial getxattr with 4K, or have a 4K step
between the 1K and the 64K.

Thoughts?

Uri.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 11:18:20AM +0300, Uri Simchoni via samba-technical wrote:

> 6. Since the initial kmalloc is done with GFP_KERNEL, I think there's a
> middle case where the kernel would try to evict pages, and that might
> cause a performance hit, all for memory that's not really required by
> any in-tree file system (only ZFS supports 64K on Linux AFAIK).
>
> Bottom line - Samba works well with recent Linux kernels, but we might
> be taking a performance hit by asking for memory we probably don't need.
> We might want to do the initial getxattr with 4K, or have a 4K step
> between the 1K and the 64K.
>
> Thoughts?

Start with one page worth of space and double on every failure? The
question here is -- if we ask for 4k, does this really map to one page
in kernel or will there be overhead that makes our user-space 4k
request allocate 2 pages?

Volker

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Fri, Apr 07, 2017 at 11:18:20AM +0300, Uri Simchoni via samba-technical wrote:
> 2. In older Linux (prior to 3.4), getxattr() used to require a
> physically-contiguous (kmalloc'd) buffer whose size equals the max size
> requested by the user. So even though no Linux in-tree file system
> supports EA's larger than 4K, the getxattr() system call would try to
> allocate 64K if Samba passes this number.

XFS supports larger than 4k xattr values.

That beeing said a 4k try after 1k seems inherently reasonable to me.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
Hi,

Here's a patch that adds a 4K buffer step between 1K and 64K.

Starting with 1K means we keep the same behavior for the majority of
security descriptors out there - no negative impact in user mode for
what aims to be an improvement in kernel mode.

I've been wrong stating that Linux file systems can't handle an EA
larger than 4K - both XFS and BTRFS (as well as others) can do that.

About efficiency - what I've been able to discern, without going into
too much detail, is that:
a. There's no per-object overhead, you kmalloc 4K you use 4K.
b. There should be an advantage to allocating 4K over 64, but not
because it happens to be page size, but rather because it has a backing
pool (see  kmalloc-4096 in /proc/slabinfo). If that pool needs to
allocate more 4K buffers, it does so in 32K units, which is also prone
to fragmentation issues, but more often than not, you have 4K objects in
the pool. 64K allocations don't have such a pool so every time you
either have contiguous 64K, or you don't (and once memory becomes
fragmented, you normally don't).

Thanks,
Uri.

On 04/10/2017 11:26 AM, Christoph Hellwig wrote:

> On Fri, Apr 07, 2017 at 11:18:20AM +0300, Uri Simchoni via samba-technical wrote:
>> 2. In older Linux (prior to 3.4), getxattr() used to require a
>> physically-contiguous (kmalloc'd) buffer whose size equals the max size
>> requested by the user. So even though no Linux in-tree file system
>> supports EA's larger than 4K, the getxattr() system call would try to
>> allocate 64K if Samba passes this number.
>
> XFS supports larger than 4k xattr values.
>
> That beeing said a 4k try after 1k seems inherently reasonable to me.
>


xattr-4k.patch.txt (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On Wed, Apr 12, 2017 at 08:09:50PM +0300, Uri Simchoni wrote:

> About efficiency - what I've been able to discern, without going into
> too much detail, is that:
> a. There's no per-object overhead, you kmalloc 4K you use 4K.
> b. There should be an advantage to allocating 4K over 64, but not
> because it happens to be page size, but rather because it has a backing
> pool (see  kmalloc-4096 in /proc/slabinfo). If that pool needs to
> allocate more 4K buffers, it does so in 32K units, which is also prone
> to fragmentation issues, but more often than not, you have 4K objects in
> the pool. 64K allocations don't have such a pool so every time you
> either have contiguous 64K, or you don't (and once memory becomes
> fragmented, you normally don't).

My /proc/slabinfo also has kmalloc-2048 and kmalloc-8192. This starts
to turn into bike-shedding, but is there any value in also trying 2k
and 8k?

Volker

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On 04/13/2017 10:33 AM, Volker Lendecke via samba-technical wrote:

> On Wed, Apr 12, 2017 at 08:09:50PM +0300, Uri Simchoni wrote:
>> About efficiency - what I've been able to discern, without going into
>> too much detail, is that:
>> a. There's no per-object overhead, you kmalloc 4K you use 4K.
>> b. There should be an advantage to allocating 4K over 64, but not
>> because it happens to be page size, but rather because it has a backing
>> pool (see  kmalloc-4096 in /proc/slabinfo). If that pool needs to
>> allocate more 4K buffers, it does so in 32K units, which is also prone
>> to fragmentation issues, but more often than not, you have 4K objects in
>> the pool. 64K allocations don't have such a pool so every time you
>> either have contiguous 64K, or you don't (and once memory becomes
>> fragmented, you normally don't).
>
> My /proc/slabinfo also has kmalloc-2048 and kmalloc-8192. This starts
> to turn into bike-shedding, but is there any value in also trying 2k
> and 8k?
>
> Volker
>

My thinking was that trying 2K and 8K could very well lead to
accomplishing the task with way more than 2 system calls. There's always
the option of passing 0 as size to get actual size, then call again with
the buffer - two system calls.

But I can add those steps too - I don't have any field-gathered
information to support this or that strategy, they all have their
strengths and weaknesses (for example, maybe there are less objects in
the 8K pool, so it's likelier to be depleted with many concurrent
users). Ultimately, it's a kernel issue and has been fixed in the kernel
(don't know specifics of distros).

What I think we should keep is that the initial allocation is 1K,
because that preserves the user-mode behavior for the majority of cases.

Thanks,
Uri.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On Thu, Apr 13, 2017 at 11:40:49AM +0300, Uri Simchoni wrote:

> My thinking was that trying 2K and 8K could very well lead to
> accomplishing the task with way more than 2 system calls. There's always
> the option of passing 0 as size to get actual size, then call again with
> the buffer - two system calls.
>
> But I can add those steps too - I don't have any field-gathered
> information to support this or that strategy, they all have their
> strengths and weaknesses (for example, maybe there are less objects in
> the 8K pool, so it's likelier to be depleted with many concurrent
> users). Ultimately, it's a kernel issue and has been fixed in the kernel
> (don't know specifics of distros).
>
> What I think we should keep is that the initial allocation is 1K,
> because that preserves the user-mode behavior for the majority of cases.

Then what about starting with 4k, covering 99.9%. If that fails we're
in the slow path. There we could easily take 2 more syscalls, one to
get the real size. This is racy, because between getting the size and
doing the real getxattr call the size could increase, but that should
be rare enough.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Samba eXPerience 2017, Hotel Freizeit In
2nd-4th of May 2017, http://sambaXP.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On 04/13/2017 11:53 AM, Volker Lendecke wrote:

> On Thu, Apr 13, 2017 at 11:40:49AM +0300, Uri Simchoni wrote:
>> My thinking was that trying 2K and 8K could very well lead to
>> accomplishing the task with way more than 2 system calls. There's always
>> the option of passing 0 as size to get actual size, then call again with
>> the buffer - two system calls.
>>
>> But I can add those steps too - I don't have any field-gathered
>> information to support this or that strategy, they all have their
>> strengths and weaknesses (for example, maybe there are less objects in
>> the 8K pool, so it's likelier to be depleted with many concurrent
>> users). Ultimately, it's a kernel issue and has been fixed in the kernel
>> (don't know specifics of distros).
>>
>> What I think we should keep is that the initial allocation is 1K,
>> because that preserves the user-mode behavior for the majority of cases.
>
> Then what about starting with 4k, covering 99.9%. If that fails we're
> in the slow path. There we could easily take 2 more syscalls, one to
> get the real size. This is racy, because between getting the size and
> doing the real getxattr call the size could increase, but that should
> be rare enough.
>
> Volker
>
OK, see attached.

My concern with this was that it allocates 4K instead of 1K in smbd, but
since this memory is freed by the caller, I suppose that should have
little impact on performance / heap fragmentation / etc.

Thanks,
Uri


xattr-4k-v2.patch.txt (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: vfs_acl_xattr and Linux memory fragmentation

Samba - samba-technical mailing list
On Thu, Apr 13, 2017 at 01:08:42PM +0300, Uri Simchoni via samba-technical wrote:

> On 04/13/2017 11:53 AM, Volker Lendecke wrote:
> > On Thu, Apr 13, 2017 at 11:40:49AM +0300, Uri Simchoni wrote:
> >> My thinking was that trying 2K and 8K could very well lead to
> >> accomplishing the task with way more than 2 system calls. There's always
> >> the option of passing 0 as size to get actual size, then call again with
> >> the buffer - two system calls.
> >>
> >> But I can add those steps too - I don't have any field-gathered
> >> information to support this or that strategy, they all have their
> >> strengths and weaknesses (for example, maybe there are less objects in
> >> the 8K pool, so it's likelier to be depleted with many concurrent
> >> users). Ultimately, it's a kernel issue and has been fixed in the kernel
> >> (don't know specifics of distros).
> >>
> >> What I think we should keep is that the initial allocation is 1K,
> >> because that preserves the user-mode behavior for the majority of cases.
> >
> > Then what about starting with 4k, covering 99.9%. If that fails we're
> > in the slow path. There we could easily take 2 more syscalls, one to
> > get the real size. This is racy, because between getting the size and
> > doing the real getxattr call the size could increase, but that should
> > be rare enough.
> >
> > Volker
> >
> OK, see attached.
>
> My concern with this was that it allocates 4K instead of 1K in smbd, but
> since this memory is freed by the caller, I suppose that should have
> little impact on performance / heap fragmentation / etc.


LGTM. I was going to ask for 4k as well ! :-).

Jeremy.

> From 1adf66fb1c4b8621388593edc393fee2f867f723 Mon Sep 17 00:00:00 2001
> From: Uri Simchoni <[hidden email]>
> Date: Sun, 9 Apr 2017 00:20:40 +0300
> Subject: [PATCH v2 1/4] selftest: test fetching a large ACL from vfs_acl_xattr
>
> Add a test that fetches an ACL whose size is larger than 4K.
>
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12737
>
> Signed-off-by: Uri Simchoni <[hidden email]>
> ---
>  source3/script/tests/test_large_acl.sh | 59 ++++++++++++++++++++++++++++++++++
>  source3/selftest/tests.py              |  1 +
>  2 files changed, 60 insertions(+)
>  create mode 100755 source3/script/tests/test_large_acl.sh
>
> diff --git a/source3/script/tests/test_large_acl.sh b/source3/script/tests/test_large_acl.sh
> new file mode 100755
> index 0000000..9b6901f
> --- /dev/null
> +++ b/source3/script/tests/test_large_acl.sh
> @@ -0,0 +1,59 @@
> +#!/bin/bash
> +#
> +# Blackbox test for fetching a large ACL
> +#
> +
> +if [ $# -lt 5 ]; then
> +cat <<EOF
> +Usage: $0 SERVER USERNAME PASSWORD SMBCLIENT SMBCACLS PARAMS
> +EOF
> +exit 1;
> +fi
> +
> +SERVER=${1}
> +USERNAME=${2}
> +PASSWORD=${3}
> +SMBCLIENT=${4}
> +SMBCACLS=${5}
> +shift 5
> +ADDARGS="$*"
> +SMBCLIENT="$VALGRIND ${SMBCLIENT} ${ADDARGS}"
> +SMBCACLS="$VALGRIND ${SMBCACLS} ${ADDARGS}"
> +
> +incdir=`dirname $0`/../../../testprogs/blackbox
> +. $incdir/subunit.sh
> +
> +# build a file to work with
> +build_files()
> +{
> +    touch large_acl
> +    $SMBCLIENT //$SERVER/acl_xattr_ign_sysacl_windows -U $USERNAME%$PASSWORD -c 'put large_acl' > /dev/null 2>&1
> +    rm -rf large_acl > /dev/null
> +}
> +
> +cleanup()
> +{
> +    $SMBCLIENT //$SERVER/acl_xattr_ign_sysacl_windows -U $USERNAME%$PASSWORD -c 'rm large_acl' > /dev/null 2>&1
> +}
> +
> +build_files
> +
> +test_large_acl()
> +{
> +    #An ACL with 200 entries, ~7K
> +    new_acl=$(seq 1001 1200 | sed -r -e '1 i\D:(A;;0x001f01ff;;;WD)' -e 's/(.*)/(A;;0x001f01ff;;;S-1-5-21-11111111-22222222-33333333-\1)/' | tr -d '\n')
> +    $SMBCACLS //$SERVER/acl_xattr_ign_sysacl_windows -U $USERNAME%$PASSWORD --sddl -S $new_acl large_acl
> +    actual_acl=$($SMBCACLS //$SERVER/acl_xattr_ign_sysacl_windows -U $USERNAME%$PASSWORD --sddl --numeric large_acl 2>/dev/null | sed -rn 's/.*(D:.*)/\1/p' | tr -d '\n')
> +    if [ ! "$new_acl" = "$actual_acl" ] ; then
> +        echo -e "expected:\n$new_acl\nactual:\n$actual_acl\n"
> +        return 1
> +    fi
> +}
> +
> +failed=0
> +
> +testit "able to retrieve a large ACL if VFS supports it" test_large_acl || failed=`expr $failed + 1`
> +
> +cleanup
> +
> +exit $failed
> diff --git a/source3/selftest/tests.py b/source3/selftest/tests.py
> index d0e2ae6..3959439 100755
> --- a/source3/selftest/tests.py
> +++ b/source3/selftest/tests.py
> @@ -228,6 +228,7 @@ for env in ["fileserver"]:
>      plantestsuite("samba3.blackbox.inherit_owner.default(%s)" % env, env, [os.path.join(samba3srcdir, "script/tests/test_inherit_owner.sh"), '$SERVER', '$USERNAME', '$PASSWORD', '$PREFIX', smbclient3, smbcacls, 'tmp', '0', '0', '-m', 'NT1'])
>      plantestsuite("samba3.blackbox.inherit_owner.full (%s)" % env, env, [os.path.join(samba3srcdir, "script/tests/test_inherit_owner.sh"), '$SERVER', '$USERNAME', '$PASSWORD', '$PREFIX', smbclient3, smbcacls, 'inherit_owner', '1', '1', '-m', 'NT1'])
>      plantestsuite("samba3.blackbox.inherit_owner.unix (%s)" % env, env, [os.path.join(samba3srcdir, "script/tests/test_inherit_owner.sh"), '$SERVER', '$USERNAME', '$PASSWORD', '$PREFIX', smbclient3, smbcacls, 'inherit_owner_u', '0', '1', '-m', 'NT1'])
> +    plantestsuite("samba3.blackbox.large_acl (%s)" % env, env, [os.path.join(samba3srcdir, "script/tests/test_large_acl.sh"), '$SERVER', '$USERNAME', '$PASSWORD', smbclient3, smbcacls])
>  
>      #
>      # tar command tests
> --
> 2.9.3
>
>
> From 38db4fd79210611a0e8a3ec75a8a142592733960 Mon Sep 17 00:00:00 2001
> From: Uri Simchoni <[hidden email]>
> Date: Thu, 13 Apr 2017 12:50:47 +0300
> Subject: [PATCH v2 2/4] vfs_xattr_tdb: handle case of zero size.
>
> With getxattr(), passing a zero buffer size is a
> way of obtaining actual xattr size.
>
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12737
>
> Signed-off-by: Uri Simchoni <[hidden email]>
> ---
>  source3/modules/vfs_xattr_tdb.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/source3/modules/vfs_xattr_tdb.c b/source3/modules/vfs_xattr_tdb.c
> index b32fbc1..58acf44 100644
> --- a/source3/modules/vfs_xattr_tdb.c
> +++ b/source3/modules/vfs_xattr_tdb.c
> @@ -85,6 +85,12 @@ static ssize_t xattr_tdb_getxattr(struct vfs_handle_struct *handle,
>   TALLOC_FREE(frame);
>   return -1;
>   }
> +
> + if (size == 0) {
> + TALLOC_FREE(frame);
> + return xattr_size;
> + }
> +
>   if (blob.length > size) {
>   TALLOC_FREE(frame);
>   errno = ERANGE;
> @@ -125,6 +131,12 @@ static ssize_t xattr_tdb_fgetxattr(struct vfs_handle_struct *handle,
>   TALLOC_FREE(frame);
>   return -1;
>   }
> +
> + if (size == 0) {
> + TALLOC_FREE(frame);
> + return xattr_size;
> + }
> +
>   if (blob.length > size) {
>   TALLOC_FREE(frame);
>   errno = ERANGE;
> --
> 2.9.3
>
>
> From c07e3f869588305ec83f1c74b382bcf775137760 Mon Sep 17 00:00:00 2001
> From: Uri Simchoni <[hidden email]>
> Date: Sun, 9 Apr 2017 00:40:44 +0300
> Subject: [PATCH v2 3/4] vfs_acl_xattr: factor out fetching of an extended
>  attribute
>
> Pure refactoring - add a function that fetches an extended attribute
> based on either the file descriptor or the file name.
>
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12737
>
> Signed-off-by: Uri Simchoni <[hidden email]>
> ---
>  source3/modules/vfs_acl_xattr.c | 44 +++++++++++++++++++++++++++++------------
>  1 file changed, 31 insertions(+), 13 deletions(-)
>
> diff --git a/source3/modules/vfs_acl_xattr.c b/source3/modules/vfs_acl_xattr.c
> index e1f90ff..85b9a0d 100644
> --- a/source3/modules/vfs_acl_xattr.c
> +++ b/source3/modules/vfs_acl_xattr.c
> @@ -37,6 +37,35 @@
>   Pull a security descriptor into a DATA_BLOB from a xattr.
>  *******************************************************************/
>  
> +static ssize_t getxattr_do(vfs_handle_struct *handle,
> +   files_struct *fsp,
> +   const struct smb_filename *smb_fname,
> +   const char *xattr_name,
> +   uint8_t *val,
> +   size_t size)
> +{
> + ssize_t sizeret;
> + int saved_errno = 0;
> +
> + become_root();
> + if (fsp && fsp->fh->fd != -1) {
> + sizeret = SMB_VFS_FGETXATTR(fsp, xattr_name, val, size);
> + } else {
> + sizeret = SMB_VFS_GETXATTR(handle->conn, smb_fname->base_name,
> +   XATTR_NTACL_NAME, val, size);
> + }
> + if (sizeret == -1) {
> + saved_errno = errno;
> + }
> + unbecome_root();
> +
> + if (saved_errno != 0) {
> + errno = saved_errno;
> + }
> +
> + return sizeret;
> +}
> +
>  static NTSTATUS get_acl_blob(TALLOC_CTX *ctx,
>   vfs_handle_struct *handle,
>   files_struct *fsp,
> @@ -47,7 +76,6 @@ static NTSTATUS get_acl_blob(TALLOC_CTX *ctx,
>   uint8_t *val = NULL;
>   uint8_t *tmp;
>   ssize_t sizeret;
> - int saved_errno = 0;
>  
>   ZERO_STRUCTP(pblob);
>  
> @@ -60,21 +88,11 @@ static NTSTATUS get_acl_blob(TALLOC_CTX *ctx,
>   }
>   val = tmp;
>  
> - become_root();
> - if (fsp && fsp->fh->fd != -1) {
> - sizeret = SMB_VFS_FGETXATTR(fsp, XATTR_NTACL_NAME, val, size);
> - } else {
> - sizeret = SMB_VFS_GETXATTR(handle->conn, smb_fname->base_name,
> - XATTR_NTACL_NAME, val, size);
> - }
> - if (sizeret == -1) {
> - saved_errno = errno;
> - }
> - unbecome_root();
> + sizeret =
> +    getxattr_do(handle, fsp, smb_fname, XATTR_NTACL_NAME, val, size);
>  
>   /* Max ACL size is 65536 bytes. */
>   if (sizeret == -1) {
> - errno = saved_errno;
>   if ((errno == ERANGE) && (size != 65536)) {
>   /* Too small, try again. */
>   size = 65536;
> --
> 2.9.3
>
>
> From a3fcf24c9cfe59b0cff4a6200f7fb1513289a092 Mon Sep 17 00:00:00 2001
> From: Uri Simchoni <[hidden email]>
> Date: Thu, 13 Apr 2017 12:44:58 +0300
> Subject: [PATCH v2 4/4] vfs_acl_xattr: avoid needlessly supplying a large
>  buffer to getxattr()
>
> When obtaining the security descriptor via getxattr(), first try
> optimistically to supply a buffer of 4K, and if that turns out
> to be too small, determine the correct buffer size.
>
> The previous behavior of falling back to a 64K buffer encountered
> problem with Linux prior to version 3.6, due to pyisical memory
> fragmentation. With those kernels, as long as the buffer is 8K or
> smaller, getting the xattr is much less prone to failure due to
> memory fragmentation.
>
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12737
>
> Signed-off-by: Uri Simchoni <[hidden email]>
> ---
>  source3/modules/vfs_acl_xattr.c | 44 ++++++++++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 14 deletions(-)
>
> diff --git a/source3/modules/vfs_acl_xattr.c b/source3/modules/vfs_acl_xattr.c
> index 85b9a0d..421860b 100644
> --- a/source3/modules/vfs_acl_xattr.c
> +++ b/source3/modules/vfs_acl_xattr.c
> @@ -72,7 +72,7 @@ static NTSTATUS get_acl_blob(TALLOC_CTX *ctx,
>   const struct smb_filename *smb_fname,
>   DATA_BLOB *pblob)
>  {
> - size_t size = 1024;
> + size_t size = 4096;
>   uint8_t *val = NULL;
>   uint8_t *tmp;
>   ssize_t sizeret;
> @@ -91,22 +91,38 @@ static NTSTATUS get_acl_blob(TALLOC_CTX *ctx,
>   sizeret =
>      getxattr_do(handle, fsp, smb_fname, XATTR_NTACL_NAME, val, size);
>  
> - /* Max ACL size is 65536 bytes. */
> - if (sizeret == -1) {
> - if ((errno == ERANGE) && (size != 65536)) {
> - /* Too small, try again. */
> - size = 65536;
> - goto again;
> - }
> + if (sizeret >= 0) {
> + pblob->data = val;
> + pblob->length = sizeret;
> + return NT_STATUS_OK;
> + }
>  
> - /* Real error - exit here. */
> - TALLOC_FREE(val);
> - return map_nt_error_from_unix(errno);
> + if (errno != ERANGE) {
> + goto err;
>   }
>  
> - pblob->data = val;
> - pblob->length = sizeret;
> - return NT_STATUS_OK;
> + /* Too small, try again. */
> + sizeret =
> +    getxattr_do(handle, fsp, smb_fname, XATTR_NTACL_NAME, NULL, 0);
> + if (sizeret < 0) {
> + goto err;
> + }
> +
> + if (size < sizeret) {
> + size = sizeret;
> + }
> +
> + if (size > 65536) {
> + /* Max ACL size is 65536 bytes. */
> + errno = ERANGE;
> + goto err;
> + }
> +
> + goto again;
> +  err:
> + /* Real error - exit here. */
> + TALLOC_FREE(val);
> + return map_nt_error_from_unix(errno);
>  }
>  
>  /*******************************************************************
> --
> 2.9.3
>


Loading...