|
This patch tries to reduce the pain around replicating DNS. We now do
it at join time.

However, at least during make test, it causes a segfault in the DRS
server, which I can't yet pin down (even with valgrind I don't get a
useful answer).

I'm posting the patch here in case someone else has a clue why it
crashes our DRS server, as I think it is an existing bug (I just change
how we join, not the DRS server).

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org |
|
Hi Andrew,
> This patch tries to reduce the pain around replicating DNS. We now do
> it at join time.
>
> However, at least during make test, it causes a segfault in the DRS
> server, which I can't yet pin down (even with valgrind I don't get a
> useful answer).
>
> I'm posting the patch here in case someone else has a clue why it
> crashes our DRS server, as I think it is an existing bug (I just change
> how we join, not the DRS server).

HasMasterNCs is only for the 3 main partitions, while msDS-HasMasterNCs
is for all of them...
Maybe your bug is related.

metze |
|
On 06/21/2012 04:20 PM, Stefan (metze) Metzmacher wrote:
>> I'm posting the patch here in case someone else has a clue why it
>> crashes our DRS server, as I think it is an existing bug (I just
>> change how we join, not the DRS server).
>
> HasMasterNCs is only for the 3 main partitions, while
> msDS-HasMasterNCs is for all of them... Maybe your bug is related.

This week I (mis-)configured it manually with hasMasterNCs. Nothing
crashed. But "samba-tool drs showrepl" didn't show me the repl status
of the DNS stuff anymore until I changed it to msDS-HasMasterNCs.

Björn

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email] |
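The distinction metze describes can be sketched in a few lines of Python. This is only an illustration, not code from join.py: the helper name `master_nc_attributes` is hypothetical, the domain DN is taken from the `samba.example.com` name that appears later in the thread, and a single-domain forest with the usual DomainDnsZones and ForestDnsZones application partitions is assumed.

```python
# Hypothetical helper (not from join.py): build the two attribute value
# lists for a DC that should hold the DNS application partitions.

def master_nc_attributes(domain_dn, extra_ncs=()):
    # Assumption: single-domain forest, so Configuration and Schema
    # hang directly off the domain DN.
    config_dn = "CN=Configuration," + domain_dn
    schema_dn = "CN=Schema," + config_dn
    main_ncs = [domain_dn, config_dn, schema_dn]
    # msDS-hasMasterNCs lists every partition, the main three included.
    all_ncs = main_ncs + [nc for nc in extra_ncs if nc not in main_ncs]
    return {
        "hasMasterNCs": main_ncs,        # only the 3 main partitions
        "msDS-hasMasterNCs": all_ncs,    # all partitions, DNS included
    }

domain_dn = "DC=samba,DC=example,DC=com"
dns_ncs = [
    "DC=DomainDnsZones," + domain_dn,
    "DC=ForestDnsZones," + domain_dn,
]
attrs = master_nc_attributes(domain_dn, dns_ncs)
for name, values in attrs.items():
    print(name, len(values))
```

With these assumptions the DNS partitions end up only in the msDS-hasMasterNCs list, which matches what Björn saw with "samba-tool drs showrepl".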
|
In reply to this post by Stefan (metze) Metzmacher
On Thu, 2012-06-21 at 16:20 +0200, Stefan (metze) Metzmacher wrote:
> Hi Andrew,
>
> > This patch tries to reduce the pain around replicating DNS. We now do
> > it at join time.
> >
> > However, at least during make test, it causes a segfault in the DRS
> > server, which I can't yet pin down (even with valgrind I don't get a
> > useful answer).
> >
> > I'm posting the patch here in case someone else has a clue why it
> > crashes our DRS server, as I think it is an existing bug (I just change
> > how we join, not the DRS server).
>
> HasMasterNCs is only for the 3 main partitions, while msDS-HasMasterNCs
> is for all of them...
> Maybe your bug is related.
>
> metze

The segfault is this:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffebf26cc0 in dreplsrv_run_pull_ops (s=0x19d6d80) at ../source4/dsdb/repl/drepl_out_pull.c:200
200 op->source_dsa->repsFrom1->last_attempt = now;
#0 0x00007fffebf26cc0 in dreplsrv_run_pull_ops (s=0x19d6d80) at ../source4/dsdb/repl/drepl_out_pull.c:200
#1 0x00007fffebf24179 in dreplsrv_run_pending_ops (s=0x19d6d80) at ../source4/dsdb/repl/drepl_periodic.c:131
#2 0x00007fffebf2a710 in dreplsrv_notify_run (service=0x19d6d80) at ../source4/dsdb/repl/drepl_notify.c:480
#3 0x00007fffebf2a47b in dreplsrv_notify_handler_te (ev=0x630870, te=0x1477280, t=..., ptr=0x19d6d80) at ../source4/dsdb/repl/drepl_notify.c:421
#4 0x00007ffff68a4593 in tevent_common_loop_timer_delay (ev=0x630870) at ../lib/tevent/tevent_timed.c:254
#5 0x00007ffff68a3385 in epoll_event_loop (std_ev=0x630950, tvalp=0x7fffffff98e0) at ../lib/tevent/tevent_standard.c:298
#6 0x00007ffff68a3c13 in std_event_loop_once (ev=0x630870, location=0x40fb9f "../source4/smbd/server.c:472") at ../lib/tevent/tevent_standard.c:567
#7 0x00007ffff689ecf5 in _tevent_loop_once (ev=0x630870, location=0x40fb9f "../source4/smbd/server.c:472") at ../lib/tevent/tevent.c:506
#8 0x00007ffff689ef1a in tevent_common_loop_wait (ev=0x630870, location=0x40fb9f "../source4/smbd/server.c:472") at ../lib/tevent/tevent.c:607
#9 0x00007ffff689efe5 in _tevent_loop_wait (ev=0x630870, location=0x40fb9f "../source4/smbd/server.c:472") at ../lib/tevent/tevent.c:626
#10 0x000000000040b5a3 in binary_smbd_main (binary_name=0x40f58b "samba", argc=6, argv=0x7fffffff9d28) at ../source4/smbd/server.c:472
#11 0x000000000040b5e9 in main (argc=6, argv=0x7fffffff9d28) at ../source4/smbd/server.c:483
Missing separate debuginfos, use: debuginfo-install glibc-2.14.90-24.fc16.7.x86_64 gnome-keyring-3.2.1-3.fc16.x86_64 gnutls-2.12.14-3.fc16.x86_64 krb5-libs-1.9.3-2.fc16.x86_64 libbsd-0.2.0-4.fc15.x86_64 libdb-5.2.36-1.fc16.x86_64 libgcrypt-1.5.0-2.fc16.x86_64 libgpg-error-1.10-1.fc16.x86_64 libtalloc-2.0.7-4.fc16.x86_64 libtasn1-2.12-1.fc16.x86_64 openssl-1.0.0j-1.fc16.x86_64 p11-kit-0.6-1.fc16.x86_64 python-libs-2.7.3-3.fc16.x86_64

This is because op->source (struct dreplsrv_partition_source_dsa) is I
think freed here:

source4/messaging/messaging.c:772

full talloc report on 'struct irpc_message' (total 4102 bytes in 23 blocks)
struct dreplsrv_partition_source_dsa contains 350 bytes in 3 blocks (ref 0) 0x1d3dae0
struct repsFromTo1OtherInfo contains 78 bytes in 2 blocks (ref 0) 0xe34500
178dcc89-2e73-4415-939b-0b3bb168ab09._msdcs.samba.example.com contains 62 bytes in 1 blocks (ref 0) 0x1d3dcd0
default/librpc/gen_ndr/ndr_drsuapi.c:14837 contains 416 bytes in 3 blocks (ref 0) 0xba93a0
default/librpc/gen_ndr/ndr_drsuapi.c:661 contains 376 bytes in 2 blocks (ref 0) 0x198e730
char contains 272 bytes in 1 blocks (ref 0) 0x1cd5100
default/librpc/gen_ndr/ndr_drsuapi.c:14829 contains 20 bytes in 1 blocks (ref 0) 0x1d3df30
DATA_BLOB: ../librpc/ndr/ndr_basic.c:1301 contains 4 bytes in 1 blocks (ref 0) 0x15d1de0
default/source4/librpc/gen_ndr/ndr_irpc.c:57 contains 508 bytes in 2 blocks (ref 0) 0x8980a0
default/librpc/gen_ndr/ndr_security.c:1001 contains 476 bytes in 1 blocks (ref 0) 0xbf1300
struct ndr_pull contains 2660 bytes in 12 blocks (ref 0) 0xff2520
struct ndr_push contains 2244 bytes in 5 blocks (ref 0) 0xec9d10
uint8_t contains 4 bytes in 1 blocks (ref 0) 0x13b80a0
struct ndr_push contains 1120 bytes in 2 blocks (ref 0) 0x1618a50
uint8_t contains 1024 bytes in 1 blocks (ref 0) 0xe4f050
uint8_t contains 1024 bytes in 1 blocks (ref 0) 0x16bca90
struct ndr_pull contains 96 bytes in 1 blocks (ref 0) 0x1f2a9a0
struct ndr_token_list contains 32 bytes in 1 blocks (ref 0) 0x19bcb80
struct ndr_token_list contains 32 bytes in 1 blocks (ref 0) 0xbf15c0
../source4/lib/messaging/messaging.c:802 contains 32 bytes in 1 blocks (ref 0) 0xbf1540
struct ndr_pull contains 128 bytes in 2 blocks (ref 0) 0x1f2abb0
struct ndr_token_list contains 32 bytes in 1 blocks (ref 0) 0x18347a0

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org |
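The talloc report lists struct dreplsrv_partition_source_dsa under struct irpc_message, which fits Andrew's suspicion that the source DSA goes away together with the message. The toy Python model below shows only that general pattern: an object owned by a short-lived parent is freed with it while a longer-lived list still refers to it. The `Ctx` class and all names in it are made up for illustration; this is not talloc, not the DRS server code, and not a description of the actual fix.

```python
class Ctx:
    """Toy stand-in for a hierarchical allocator context (hypothetical)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.freed = False
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def free(self):
        # Freeing a parent frees everything allocated underneath it.
        for child in self.children:
            child.free()
        self.freed = True


service = Ctx("service")            # long-lived owner
irpc_msg = Ctx("irpc_message")      # short-lived request context

# The source DSA is allocated under the message ...
source_dsa = Ctx("source_dsa", parent=irpc_msg)
# ... but a pending operation owned by the service keeps a reference.
pending_ops = [source_dsa]

irpc_msg.free()                     # request finished, message torn down

dangling = [op.name for op in pending_ops if op.freed]
print(dangling)

# Contrast: the same object allocated under the long-lived owner survives.
safe_dsa = Ctx("source_dsa", parent=service)
other_msg = Ctx("irpc_message")
other_msg.free()
print(safe_dsa.freed)
```

In C the dangling entry would be a pointer into freed memory, so touching `op->source_dsa->repsFrom1` at that point can crash, as in the backtrace.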
|
In reply to this post by Andrew Bartlett
On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> This patch tries to reduce the pain around replicating DNS. We now do
> it at join time.
>
> However, at least during make test, it causes a segfault in the DRS
> server, which I can't yet pin down (even with valgrind I don't get a
> useful answer).

I've found and fixed the segfault issue, so now I want testing of the
join.py modifications.

https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication

If those who are having pain getting DNS replication up and going can
try with these 2 patches, I hope this may solve some of the issues.

You still need to run samba_upgradedns after the join, but I'll include
that when I get a chance. This should at least mean that the partitions
are correctly replicated, which has been the biggest pain point.

We do really want this to work for folks, and I'm sorry it has taken so
long to investigate.

Thanks,

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org |
|
Hi Andrew,
> On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> This patch tries to reduce the pain around replicating DNS. We now do
>> it at join time.
>>
>> However, at least during make test, it causes a segfault in the DRS
>> server, which I can't yet pin down (even with valgrind I don't get a
>> useful answer).
>
> I've found and fixed the segfault issue, so now I want testing of the
> join.py modifications.
>
> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>

You're still setting HasMasterNCs to the full NC list, which is wrong.

metze |
|
In reply to this post by Andrew Bartlett
Hi Andrew,
On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:
> On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> This patch tries to reduce the pain around replicating DNS. We now do
>> it at join time.
>>
>> However, at least during make test, it causes a segfault in the DRS
>> server, which I can't yet pin down (even with valgrind I don't get a
>> useful answer).
>
> I've found and fixed the segfault issue, so now I want testing of the
> join.py modifications.
>
> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>
> If those who are having pain getting DNS replication up and going can
> try with these 2 patches, I hope this may solve some of the issues.

If the DNS role is not assigned to a (Windows) DC, it never replicates
the DNS partition and also does not have the DNS NCs listed in
msDS-hasMasterNCs. So, it appears that adding the DNS NCs to the
msDS-hasMasterNCs attribute is equivalent to adding the DNS role to the
second DC.

Maybe that'll fix the replication issue. I was under the assumption
that the msDS-hasMasterNCs attribute is set only after the replication
is complete. But that's not true. It has to be set if the DC is going
to hold a full replica of the NC.

> You still need to run samba_upgradedns after the join, but I'll include
> that when I get a chance. This should at least mean that the partitions
> are correctly replicated, which has been the biggest pain point.

Since you have added the dns_backend option to join, we can potentially
short-circuit running samba_upgradedns and run parts of the DNS
provision directly.

> We do really want this to work for folks, and I'm sorry it has taken so
> long to investigate.
>
> Thanks,
>
> Andrew Bartlett
> --
> Andrew Bartlett http://samba.org/~abartlet/
> Authentication Developer, Samba Team http://samba.org

Amitay. |
|
In reply to this post by Stefan (metze) Metzmacher
On Fri, 2012-06-22 at 08:29 +0200, Stefan (metze) Metzmacher wrote:
> Hi Andrew,
>
> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> >> This patch tries to reduce the pain around replicating DNS. We now do
> >> it at join time.
> >>
> >> However, at least during make test, it causes a segfault in the DRS
> >> server, which I can't yet pin down (even with valgrind I don't get a
> >> useful answer).
> >
> > I've found and fixed the segfault issue, so now I want testing of the
> > join.py modifications.
> >
> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
> >
>
> You're still setting HasMasterNCs to the full NC list, which is wrong.

Ahh, now I get what you mean. I was fixated on the fix for the
segfault, and didn't get time to look at join.py with un-tired eyes :-)

I'm sure I can fix that up, if that's the only issue.

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org |
|
In reply to this post by Amitay Isaacs
On Fri, 2012-06-22 at 17:17 +1000, Amitay Isaacs wrote:
> Hi Andrew,
>
> On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:
> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> >> This patch tries to reduce the pain around replicating DNS. We now do
> >> it at join time.
> >>
> >> However, at least during make test, it causes a segfault in the DRS
> >> server, which I can't yet pin down (even with valgrind I don't get a
> >> useful answer).
> >
> > I've found and fixed the segfault issue, so now I want testing of the
> > join.py modifications.
> >
> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
> >
> > If those who are having pain getting DNS replication up and going can
> > try with these 2 patches, I hope this may solve some of the issues.
>
> If the DNS role is not assigned to a (windows) DC, it never replicates
> the DNS partition and also does not have DNS NCs listed in
> msDS-hasMasterNCs. So, it appears that adding DNS NCs in
> msDS-hasMasterNCs attribute is equivalent to adding DNS role to the
> second DC.
>
> May be that'll fix the replication issue. I was under the assumption
> that msDS-hasMasterNCs attribute is set only after the replication is
> complete. But that's not true. It has to be set if the DC is going to
> hold a full replica of the NC.

OK. So, aside from fixing it to use the right attribute, we might be on
the way to a solution then.

> > You still need to run samba_upgradedns after the join, but I'll include
> > that when I get a chance. This should at least mean that the partitions
> > are correctly replicated, which has been the biggest pain point.
>
> Since you have added dns_backend option to join, we can potentially
> short-circuit running samba_upgradedns and run parts of dns provision
> directly.

That's essentially what I want to have happen.

The one query I have is: What happens if the DC we choose to replicate
the rest of the data from doesn't hold the DNS partitions?

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org |
|
On Fri, Jun 22, 2012 at 6:08 PM, Andrew Bartlett <[hidden email]> wrote:
> On Fri, 2012-06-22 at 17:17 +1000, Amitay Isaacs wrote:
>> Hi Andrew,
>>
>> On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:
>> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> >> This patch tries to reduce the pain around replicating DNS. We now do
>> >> it at join time.
>> >>
>> >> However, at least during make test, it causes a segfault in the DRS
>> >> server, which I can't yet pin down (even with valgrind I don't get a
>> >> useful answer).
>> >
>> > I've found and fixed the segfault issue, so now I want testing of the
>> > join.py modifications.
>> >
>> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>> >
>> > If those who are having pain getting DNS replication up and going can
>> > try with these 2 patches, I hope this may solve some of the issues.
>>
>> If the DNS role is not assigned to a (windows) DC, it never replicates
>> the DNS partition and also does not have DNS NCs listed in
>> msDS-hasMasterNCs. So, it appears that adding DNS NCs in
>> msDS-hasMasterNCs attribute is equivalent to adding DNS role to the
>> second DC.
>>
>> May be that'll fix the replication issue. I was under the assumption
>> that msDS-hasMasterNCs attribute is set only after the replication is
>> complete. But that's not true. It has to be set if the DC is going to
>> hold a full replica of the NC.
>
> OK. So, aside from fixing it to use the right attribute, we might be on
> the way to a solution then.
>
>> > You still need to run samba_upgradedns after the join, but I'll include
>> > that when I get a chance. This should at least mean that the partitions
>> > are correctly replicated, which has been the biggest pain point.
>>
>> Since you have added dns_backend option to join, we can potentially
>> short-circuit running samba_upgradedns and run parts of dns provision
>> directly.
>
> That's essentially what I want to have happen.
>
> The one query I have is: What happens if the DC we choose to replicate
> the rest of the data from doesn't hold the DNS partitions?

As I understand it, it should be the job of the KCC to figure out which
partitions should be replicated. The current implementation of the KCC
sets up replication between each pair of DCs for all partitions. So if
a second DC does not have the application partitions, the first DC
should not try to replicate those partitions to the second DC. Maybe we
need to switch to the Python KCC and make sure it does the correct
thing.

Amitay |
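The behaviour Amitay asks for, replicating a partition only between DCs that both hold it, can be sketched as follows. This is a deliberately simplified illustration and not the KCC algorithm: the function name `replication_links`, the DC names, and the idea of feeding it a plain mapping derived from each DC's msDS-hasMasterNCs values are all assumptions made for the example.

```python
from itertools import permutations


def replication_links(master_ncs):
    """Return (source, destination, nc) triples, one per partition that
    both DCs hold. master_ncs maps a DC name to the set of NCs it masters
    (hypothetical input, e.g. read from msDS-hasMasterNCs)."""
    links = []
    for src, dst in permutations(sorted(master_ncs), 2):
        # Only partitions present on both sides get a link.
        for nc in sorted(master_ncs[src] & master_ncs[dst]):
            links.append((src, dst, nc))
    return links


domain = "DC=samba,DC=example,DC=com"
dns = "DC=DomainDnsZones," + domain

master_ncs = {
    "dc1": {domain, dns},   # holds the DNS application partition
    "dc2": {domain},        # does not
}

links = replication_links(master_ncs)
for link in links:
    print(link)
```

In this sketch no link is created for the DNS partition, because dc2 does not hold it, while the domain partition is still replicated in both directions.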
