|
The file lib/tdb/common/tdb.c has a comment before the function
tdb_parse_record:

* DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD TO SEGFAULTS.

Can someone explain exactly what this means? Exactly what calls are
not allowed?

I've run into a case where I get a segfault because tdb_parse_record
ends up calling tdb_lock_list, which then calls tdb_needs_recovery,
which tries to do a tdb_read, which then gets a segfault. So it looks
like I've hit this condition. |
|
On Tue, Jun 19, 2012 at 10:31:05AM -0700, Herb Lewis wrote:
> The file lib/tdb/common/tdb.c has a comment before the function
> tdb_parse_record
>
> * DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD TO SEGFAULTS.
>
> can someone explain exactly what this means? Exactly what calls are
> not allowed?
> I've run into a case where I get a segfault because tdb_parse_record
> ends up calling tdb_lock_list which then calls tdb_needs_recovery
> which tries to do a tdb_read which then gets a segfault. So it looks
> like I've hit this condition.

Ooooh. That sounds interesting ! Can you post the
stack backtrace ?

Cheers,

Jeremy. |
|
On Tue, Jun 19, 2012 at 10:33:19AM -0700, Jeremy Allison wrote:
> On Tue, Jun 19, 2012 at 10:31:05AM -0700, Herb Lewis wrote:
> > The file lib/tdb/common/tdb.c has a comment before the function
> > tdb_parse_record
> >
> > * DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD TO SEGFAULTS.
> >
> > can someone explain exactly what this means? Exactly what calls are
> > not allowed?
> > I've run into a case where I get a segfault because tdb_parse_record
> > ends up calling tdb_lock_list which then calls tdb_needs_recovery
> > which tries to do a tdb_read which then gets a segfault. So it looks
> > like I've hit this condition.
>
> Ooooh. That sounds interesting ! Can you post the
> stack backtrace ?

Yes, please!

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email] |
|
In reply to this post by Jeremy Allison
On 06/19/2012 10:59 AM, Herb Lewis wrote:
> On 06/19/2012 10:33 AM, Jeremy Allison wrote:
>> On Tue, Jun 19, 2012 at 10:31:05AM -0700, Herb Lewis wrote:
>>> The file lib/tdb/common/tdb.c has a comment before the function
>>> tdb_parse_record
>>>
>>> * DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD TO
>>> SEGFAULTS.
>>>
>>> can someone explain exactly what this means? Exactly what calls are
>>> not allowed?
>>> I've run into a case where I get a segfault because tdb_parse_record
>>> ends up calling tdb_lock_list which then calls tdb_needs_recovery
>>> which tries to do a tdb_read which then gets a segfault. So it looks
>>> like I've hit this condition.
>> Ooooh. That sounds interesting ! Can you post the
>> stack backtrace ?
>>
>> Cheers,
>>
>> Jeremy.
> Unfortunately I don't have the offending tdb file but here is a
> partial backtrace
>
> #0 0x000000080202599c in kill () at kill.S:2
> #1 0x0000000802024773 in abort ()
>     at /.automount/nfs.paneast.panasas.com/root/sb14/qa-build-2/sb/4.1.2.d/src/freebsd-c/lib/libc/stdlib/abort.c:65
> #2 0x00000000008dfbdd in dump_core ()
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/fault.c:391
> #3 0x00000000008f2ff4 in smb_panic (why=0x10d5d7e "internal error")
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/util.c:1137
> #4 0x00000000008df39b in fault_report (sig=11)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/fault.c:53
> #5 0x00000000008df3b3 in sig_fault (sig=11)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/fault.c:76
> #6 <signal handler called>
> #7 memcpy ()
>     at /.automount/nfs.paneast.panasas.com/root/sb14/qa-build-2/sb/4.1.2.d/src/freebsd-c/lib/libc/amd64/string/bcopy.S:69
> #8 0x0000000000d7e80d in tdb_read (tdb=0x80233b720, off=44, buf=0x7fffffffd69c, len=4, cv=0)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/io.c:142
> #9 0x0000000000d7f174 in tdb_ofs_read (tdb=0x80233b720, offset=44, d=0x7fffffffd69c)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/io.c:388
> #10 0x0000000000d7ccee in tdb_needs_recovery (tdb=0x80233b720)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/transaction.c:1261
> #11 0x0000000000d7fe69 in tdb_lock_list (tdb=0x80233b720, list=9909, ltype=1, waitflag=TDB_LOCK_WAIT)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/lock.c:357
> #12 0x0000000000d7fef7 in tdb_lock (tdb=0x80233b720, list=9909, ltype=1)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/lock.c:374
> #13 0x0000000000d78722 in tdb_find_lock_hash (tdb=0x80233b720, key={dptr = 0x7fffffffd8f0 "��ת", dsize = 24}, hash=3115729387, locktype=1, rec=0x7fffffffd7d0)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/tdb.c:118
> #14 0x0000000000d78ac1 in tdb_parse_record (tdb=0x80233b720, key={dptr = 0x7fffffffd8f0 "��ת", dsize = 24},
>     parser=0x8d6fa0 <db_tdb_fetch_parse at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/dbwrap_tdb.c:144>,
>     private_data=0x7fffffffd840)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/tdb.c:241
> #15 0x00000000008d70b5 in db_tdb_fetch (db=0x8023414d0, mem_ctx=0x8023708e0, key={dptr = 0x7fffffffd8f0 "��ת", dsize = 24}, pdata=0x7fffffffd8d0)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/lib/dbwrap_tdb.c:171
> #16 0x000000000086ac4c in fetch_share_mode_unlocked (mem_ctx=0x80236f050, id={devid = 2866260714, inode = 942439120, extid = 0})
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/locking/locking.c:1009
> #17 0x000000000086b329 in get_file_infos (id={devid = 2866260714, inode = 942439120, extid = 0}, name_hash=0, delete_on_close=0x0, write_time=0x7fffffffdac0)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/locking/locking.c:1146
> #18 0x000000000057e713 in smbd_smb2_create_send (mem_ctx=0x802320110, ev=0x80230d110, smb2req=0x802320110, in_oplock_level=0 '\0', in_impersonation_level=2,
>     in_desired_access=2032063, in_file_attributes=128, in_share_access=7, in_create_disposition=3, in_create_options=0, in_name=0x8023205d0 "File106935650",
>     in_context_blobs={num_blobs = 0, blobs = 0x0})
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/smbd/smb2_create.c:808
> #19 0x000000000057c2bb in smbd_smb2_request_process_create (smb2req=0x802320110)
>     at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/smbd/smb2_create.c:229
>

Looking at the core I don't see anything that should be wrong with the
call to memcpy.

(gdb) f 8
#8 0x0000000000d7e80d in tdb_read (tdb=0x80233b720, off=44, buf=0x7fffffffd69c, len=4, cv=0)
    at /.automount/nfs.panwest.panasas.com/root/sb4/hlewis/hlewis-sb4-trunk/src/samba/source3/../lib/tdb/common/io.c:142
142 memcpy(buf, off + (char *)tdb->map_ptr, len);
(gdb) p len
$8 = 4
(gdb) p buf
$9 = (void *) 0x7fffffffd69c
(gdb) x/4b buf
0x7fffffffd69c: 0x08 0x00 0x00 0x00
(gdb) p off + (char *)tdb->map_ptr
$10 = 0x8014d602c ""
(gdb) x/4b off + (char *)tdb->map_ptr
0x8014d602c: 0x00 0x00 0x00 0x00
(gdb) p/x *tdb
$12 = {name = 0x80231d6a0, map_ptr = 0x8014d6000, fd = 0xe, map_size = 0x13000,
  read_only = 0x0, traverse_read = 0x0, traverse_write = 0x0,
  allrecord_lock = {off = 0x0, count = 0x0, ltype = 0x0},
  num_lockrecs = 0x1, lockrecs = 0x8023093d0, ecode = 0x8,
  header = {magic_food = {0x0 <repeats 32 times>}, version = 0x2601196d,
    hash_size = 0x2717, rwlocks = 0xbad1a51, recovery_start = 0x0,
    sequence_number = 0x0, magic1_hash = 0xd7b694e5, magic2_hash = 0x76b9440e,
    reserved = {0x0 <repeats 27 times>}},
  flags = 0xb01, travlocks = {next = 0x0, off = 0x0, hash = 0x0, lock_rw = 0x0},
  next = 0x80233b5c0, device = 0x47, inode = 0x20e90d,
  log = {log_fn = 0x8d5f10, log_private = 0xd76435}, hash_fn = 0xd83ce0,
  open_flags = 0x202, methods = 0x13ffd20, transaction = 0x0,
  page_size = 0x1000, max_dead_records = 0x5, interrupt_sig_ptr = 0x0} |
|
On Tue, Jun 19, 2012 at 12:26:16PM -0700, Herb Lewis wrote:
> On 06/19/2012 10:59 AM, Herb Lewis wrote:
> >On 06/19/2012 10:33 AM, Jeremy Allison wrote:
> >>On Tue, Jun 19, 2012 at 10:31:05AM -0700, Herb Lewis wrote:
> >>>The file lib/tdb/common/tdb.c has a comment before the function
> >>>tdb_parse_record
> >>>
> >>>* DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD
> >>>TO SEGFAULTS.
> >>>
> >>>can someone explain exactly what this means? Exactly what calls are
> >>>not allowed?
> >>>I've run into a case where I get a segfault because tdb_parse_record
> >>>ends up calling tdb_lock_list which then calls tdb_needs_recovery
> >>>which tries to do a tdb_read which then gets a segfault. So it looks
> >>>like I've hit this condition.
> >>Ooooh. That sounds interesting ! Can you post the
> >>stack backtrace ?
> >>
> >>Cheers,
> >>
> >> Jeremy.
> >Unfortunately I don't have the offending tdb file but here is a
> >partial backtrace

In that backtrace I can't see anything suspicious from a
wrong use of tdb_parse_record, but I would not say I'm not
missing anything. In particular I don't see any recursive
call from db_tdb_fetch_parse back into tdb. This must be
something else as far as I can see.

Volker |
|
On Tue, Jun 19, 2012 at 4:21 PM, Volker Lendecke <[hidden email]> wrote:
> On Tue, Jun 19, 2012 at 12:26:16PM -0700, Herb Lewis wrote:
>> On 06/19/2012 10:59 AM, Herb Lewis wrote:
>> >On 06/19/2012 10:33 AM, Jeremy Allison wrote:
>> >>On Tue, Jun 19, 2012 at 10:31:05AM -0700, Herb Lewis wrote:
>> >>>The file lib/tdb/common/tdb.c has a comment before the function
>> >>>tdb_parse_record
>> >>>
>> >>>* DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD
>> >>>TO SEGFAULTS.
>> >>>
>> >>>can someone explain exactly what this means? Exactly what calls are
>> >>>not allowed?
>> >>>I've run into a case where I get a segfault because tdb_parse_record
>> >>>ends up calling tdb_lock_list which then calls tdb_needs_recovery
>> >>>which tries to do a tdb_read which then gets a segfault. So it looks
>> >>>like I've hit this condition.
>> >>Ooooh. That sounds interesting ! Can you post the
>> >>stack backtrace ?
>> >>
>> >>Cheers,
>> >>
>> >> Jeremy.
>> >Unfortunately I don't have the offending tdb file but here is a
>> >partial backtrace
>
> In that backtrace I can't see anything suspicious from a
> wrong use of tdb_parse_record, but I would not say I'm not
> missing anything. In particular I don't see any recursive
> call from db_tdb_fetch_parse back into tdb. This must be
> something else as far as I can see.

What if munmap failed? That would explain what we are seeing. I'm
not sure anything in that stack touches the mmapped area until then.

I checked the error handling around munmap and it looks like it could
cause this exact error. It leaves the map_ptr set. (Look at
tdb_unmap.)

Thoughts?

-Ira |
|
On Tue, Jun 19, 2012 at 05:12:42PM -0400, Ira Cooper wrote:
> What if munmap failed? That would explain what we are seeing. I'm
> not sure anything in that stack touches the mmapped area until then.
>
> I checked the error handling around munmap and it looks like it could
> cause this exact error. It leaves the map_ptr set. (Look at
> tdb_unmap.)

That's true, but AFAIK it's undefined what state the passed-in
pointer is left in when munmap() fails. The man pages imply it
would only report failure with EINVAL.

Jeremy. |
|
In reply to this post by Herb Lewis
On Tue, 19 Jun 2012 10:31:05 -0700, Herb Lewis <[hidden email]> wrote:
> The file lib/tdb/common/tdb.c has a comment before the function
> tdb_parse_record
>
> * DON'T CALL OTHER TDB CALLS FROM THE PARSER, THIS MIGHT LEAD TO SEGFAULTS.
>
> can someone explain exactly what this means? Exactly what calls are not
> allowed?

Hi Herb,

Thanks for the bug report.

Any call which touches the database can trigger a remap (and
hence a segfault). Obviously adding a new record could extend the
database, but a simple fetch could traverse a record outside our current
mmap, which will cause a remap.

We should fail any db access attempts from parse_record, to catch this
case, since it will *usually* work fine and thus the bug is quite
subtle.

> I've run into a case where I get a segfault because tdb_parse_record
> ends up calling tdb_lock_list which then calls tdb_needs_recovery
> which tries to do a tdb_read which then gets a segfault. So it looks
> like I've hit this condition.

Something is badly wrong here then! tdb_parse_record already holds a
lock, so tdb_lock_list won't check the database.

Please post the actual backtrace so we can see how this happened.

Thanks,
Rusty. |
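The remap hazard Rusty describes can be sketched with a toy model. The code below does not use the tdb API: struct toy_db, toy_parse_record, toy_expand and copy_parser are hypothetical stand-ins, with realloc() playing the role of a remap that may move the mapping. It shows the safe pattern, assuming the parser only needs the record's bytes: copy the data out inside the parser, make no database calls there, and do any further database access only after the parse call has returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a memory-mapped database. This is NOT the tdb API. */
struct toy_db {
    char *map;        /* plays the role of tdb->map_ptr */
    size_t map_size;  /* plays the role of tdb->map_size */
};

typedef int (*toy_parser)(const char *data, size_t len, void *private_data);

/* Hands the parser a pointer directly into the mapping (zero copy). */
static int toy_parse_record(struct toy_db *db, size_t off, size_t len,
                            toy_parser parser, void *private_data)
{
    assert(parser != NULL);
    if (off > db->map_size || len > db->map_size - off) {
        return -1;
    }
    return parser(db->map + off, len, private_data);
}

/* Growing the database may move the mapping, like a remap would. */
static int toy_expand(struct toy_db *db, size_t new_size)
{
    char *p;

    if (new_size <= db->map_size) {
        return 0;
    }
    p = realloc(db->map, new_size);
    if (p == NULL) {
        return -1;
    }
    memset(p + db->map_size, 0, new_size - db->map_size);
    db->map = p;
    db->map_size = new_size;
    return 0;
}

struct copy_state {
    char *copy;
    size_t len;
};

/* Safe parser: copy the bytes out and make no database calls. */
static int copy_parser(const char *data, size_t len, void *private_data)
{
    struct copy_state *st = private_data;

    st->copy = malloc(len + 1);
    if (st->copy == NULL) {
        return -1;
    }
    memcpy(st->copy, data, len);
    st->copy[len] = '\0';
    st->len = len;
    return 0;
}

/* Returns 0 if the copied record is still intact after a later "remap". */
static int toy_demo(void)
{
    struct toy_db db = { NULL, 0 };
    struct copy_state st = { NULL, 0 };
    int ok;

    if (toy_expand(&db, 16) != 0) {
        return -1;
    }
    memcpy(db.map, "hello", 5);

    if (toy_parse_record(&db, 0, 5, copy_parser, &st) != 0) {
        free(db.map);
        return -1;
    }
    /* Access that may move the mapping happens only after the parse. */
    if (toy_expand(&db, (size_t)1 << 20) != 0) {
        free(st.copy);
        free(db.map);
        return -1;
    }
    ok = (st.len == 5 && strcmp(st.copy, "hello") == 0);
    free(st.copy);
    free(db.map);
    return ok ? 0 : -1;
}
```

Had copy_parser called toy_expand itself and then kept reading through data, it would have been reading through a pointer into a buffer that may already have been moved and freed, which is the toy equivalent of the segfault the comment warns about.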
