[PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Andrew Bartlett
This patch tries to reduce the pain around replicating DNS.  We now do
it at join time.

However, at least during make test, it causes a segfault in the DRS
server, which I can't yet pin down (even with valgrind I don't get a
useful answer).

I'm posting the patch here in case someone else has a clue why it
crashes our DRS server, as I think it is an existing bug (I just change
how we join, not the DRS server).

Andrew Bartlett


--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org

0001-s4-join-Import-DNS-zones-in-AD-DC-join.patch (11K) Download Attachment

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Stefan (metze) Metzmacher
Hi Andrew,

> This patch tries to reduce the pain around replicating DNS.  We now do
> it at join time.
>
> However, at least during make test, it causes a segfault in the DRS
> server, which I can't yet pin down (even with valgrind I don't get a
> useful answer).
>
> I'm posting the patch here in case someone else has a clue why it
> crashes our DRS server, as I think it is an existing bug (I just change
> how we join, not the DRS server).
HasMasterNCs is only for the 3 main partitions, while msDS-HasMasterNCs
is for all of them...
Maybe your bug is related.
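
A minimal sketch of that distinction, in plain Python rather than the actual join.py code. The function name split_master_ncs and the example DNs are illustrative assumptions: the three main partitions (schema, configuration, domain) appear in both attributes, while application partitions such as the DNS zones belong only in msDS-hasMasterNCs.

```python
def split_master_ncs(main_ncs, application_ncs):
    """Return the values for hasMasterNCs and msDS-hasMasterNCs.

    main_ncs: the schema, configuration and domain partitions.
    application_ncs: extra partitions such as the DNS zones.
    (Hypothetical helper, not part of Samba.)
    """
    return {
        # Only the three main partitions go here.
        "hasMasterNCs": list(main_ncs),
        # Every partition the DC holds a full replica of goes here.
        "msDS-hasMasterNCs": list(main_ncs) + list(application_ncs),
    }


base = "DC=samba,DC=example,DC=com"  # example domain, assumed
attrs = split_master_ncs(
    main_ncs=[
        "CN=Schema,CN=Configuration," + base,
        "CN=Configuration," + base,
        base,
    ],
    application_ncs=[
        "DC=DomainDnsZones," + base,
        "DC=ForestDnsZones," + base,
    ],
)
```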

metze


signature.asc (270 bytes) Download Attachment

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Bjoern Baumbach
On 06/21/2012 04:20 PM, Stefan (metze) Metzmacher wrote:
>> I'm posting the patch here in case someone else has a clue why
>> it crashes our DRS server, as I think it is an existing bug (I
>> just change how we join, not the DRS server).
> HasMasterNCs is only for the 3 main partitions, while
> msDS-HasMasterNCs is for all of them... Maybe your bug is related.

This week I (mis-)configured it manually with hasMasterNCs. Nothing
crashed, but "samba-tool drs showrepl" no longer showed me the
replication status of the DNS partitions until I changed it to
msDS-HasMasterNCs.

Björn

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[hidden email]

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Andrew Bartlett
In reply to this post by Stefan (metze) Metzmacher
On Thu, 2012-06-21 at 16:20 +0200, Stefan (metze) Metzmacher wrote:

> Hi Andrew,
>
> > This patch tries to reduce the pain around replicating DNS.  We now do
> > it at join time.
> >
> > However, at least during make test, it causes a segfault in the DRS
> > server, which I can't yet pin down (even with valgrind I don't get a
> > useful answer).
> >
> > I'm posting the patch here in case someone else has a clue why it
> > crashes our DRS server, as I think it is an existing bug (I just change
> > how we join, not the DRS server).
>
> HasMasterNCs is only for the 3 main partitions, while msDS-HasMasterNCs
> is for all of them...
> Maybe your bug is related.
>
> metze

The segfault is this:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffebf26cc0 in dreplsrv_run_pull_ops (s=0x19d6d80)
    at ../source4/dsdb/repl/drepl_out_pull.c:200
200             op->source_dsa->repsFrom1->last_attempt = now;
#0  0x00007fffebf26cc0 in dreplsrv_run_pull_ops (s=0x19d6d80)
    at ../source4/dsdb/repl/drepl_out_pull.c:200
#1  0x00007fffebf24179 in dreplsrv_run_pending_ops (s=0x19d6d80)
    at ../source4/dsdb/repl/drepl_periodic.c:131
#2  0x00007fffebf2a710 in dreplsrv_notify_run (service=0x19d6d80)
    at ../source4/dsdb/repl/drepl_notify.c:480
#3  0x00007fffebf2a47b in dreplsrv_notify_handler_te (ev=0x630870,
    te=0x1477280, t=..., ptr=0x19d6d80)
    at ../source4/dsdb/repl/drepl_notify.c:421
#4  0x00007ffff68a4593 in tevent_common_loop_timer_delay (ev=0x630870)
    at ../lib/tevent/tevent_timed.c:254
#5  0x00007ffff68a3385 in epoll_event_loop (std_ev=0x630950,
    tvalp=0x7fffffff98e0) at ../lib/tevent/tevent_standard.c:298
#6  0x00007ffff68a3c13 in std_event_loop_once (ev=0x630870,
    location=0x40fb9f "../source4/smbd/server.c:472")
    at ../lib/tevent/tevent_standard.c:567
#7  0x00007ffff689ecf5 in _tevent_loop_once (ev=0x630870,
    location=0x40fb9f "../source4/smbd/server.c:472")
    at ../lib/tevent/tevent.c:506
#8  0x00007ffff689ef1a in tevent_common_loop_wait (ev=0x630870,
    location=0x40fb9f "../source4/smbd/server.c:472")
    at ../lib/tevent/tevent.c:607
#9  0x00007ffff689efe5 in _tevent_loop_wait (ev=0x630870,
    location=0x40fb9f "../source4/smbd/server.c:472")
    at ../lib/tevent/tevent.c:626
#10 0x000000000040b5a3 in binary_smbd_main (binary_name=0x40f58b "samba",
    argc=6, argv=0x7fffffff9d28) at ../source4/smbd/server.c:472
#11 0x000000000040b5e9 in main (argc=6, argv=0x7fffffff9d28)
    at ../source4/smbd/server.c:483
Missing separate debuginfos, use: debuginfo-install
glibc-2.14.90-24.fc16.7.x86_64 gnome-keyring-3.2.1-3.fc16.x86_64
gnutls-2.12.14-3.fc16.x86_64 krb5-libs-1.9.3-2.fc16.x86_64
libbsd-0.2.0-4.fc15.x86_64 libdb-5.2.36-1.fc16.x86_64
libgcrypt-1.5.0-2.fc16.x86_64 libgpg-error-1.10-1.fc16.x86_64
libtalloc-2.0.7-4.fc16.x86_64 libtasn1-2.12-1.fc16.x86_64
openssl-1.0.0j-1.fc16.x86_64 p11-kit-0.6-1.fc16.x86_64
python-libs-2.7.3-3.fc16.x86_64

This is because op->source_dsa (struct dreplsrv_partition_source_dsa) is,
I think, freed here:

source4/messaging/messaging.c:772

full talloc report on 'struct irpc_message' (total   4102 bytes in  23 blocks)
    struct dreplsrv_partition_source_dsa contains    350 bytes in   3 blocks (ref 0) 0x1d3dae0
        struct repsFromTo1OtherInfo    contains     78 bytes in   2 blocks (ref 0) 0xe34500
            178dcc89-2e73-4415-939b-0b3bb168ab09._msdcs.samba.example.com contains     62 bytes in   1 blocks (ref 0) 0x1d3dcd0
    default/librpc/gen_ndr/ndr_drsuapi.c:14837 contains    416 bytes in   3 blocks (ref 0) 0xba93a0
        default/librpc/gen_ndr/ndr_drsuapi.c:661 contains    376 bytes in   2 blocks (ref 0) 0x198e730
            char                           contains    272 bytes in   1 blocks (ref 0) 0x1cd5100
    default/librpc/gen_ndr/ndr_drsuapi.c:14829 contains     20 bytes in   1 blocks (ref 0) 0x1d3df30
    DATA_BLOB: ../librpc/ndr/ndr_basic.c:1301 contains      4 bytes in   1 blocks (ref 0) 0x15d1de0
    default/source4/librpc/gen_ndr/ndr_irpc.c:57 contains    508 bytes in   2 blocks (ref 0) 0x8980a0
        default/librpc/gen_ndr/ndr_security.c:1001 contains    476 bytes in   1 blocks (ref 0) 0xbf1300
    struct ndr_pull                contains   2660 bytes in  12 blocks (ref 0) 0xff2520
        struct ndr_push                contains   2244 bytes in   5 blocks (ref 0) 0xec9d10
            uint8_t                        contains      4 bytes in   1 blocks (ref 0) 0x13b80a0
            struct ndr_push                contains   1120 bytes in   2 blocks (ref 0) 0x1618a50
                uint8_t                        contains   1024 bytes in   1 blocks (ref 0) 0xe4f050
            uint8_t                        contains   1024 bytes in   1 blocks (ref 0) 0x16bca90
        struct ndr_pull                contains     96 bytes in   1 blocks (ref 0) 0x1f2a9a0
        struct ndr_token_list          contains     32 bytes in   1 blocks (ref 0) 0x19bcb80
        struct ndr_token_list          contains     32 bytes in   1 blocks (ref 0) 0xbf15c0
        ../source4/lib/messaging/messaging.c:802 contains     32 bytes in   1 blocks (ref 0) 0xbf1540
        struct ndr_pull                contains    128 bytes in   2 blocks (ref 0) 0x1f2abb0
            struct ndr_token_list          contains     32 bytes in   1 blocks (ref 0) 0x18347a0



--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org


Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Andrew Bartlett
In reply to this post by Andrew Bartlett
On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> This patch tries to reduce the pain around replicating DNS.  We now do
> it at join time.
>
> However, at least during make test, it causes a segfault in the DRS
> server, which I can't yet pin down (even with valgrind I don't get a
> useful answer).

I've found and fixed the segfault issue, so now I want testing of the
join.py modifications.

https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication

If those who are having pain getting DNS replication up and going can
try with these 2 patches, I hope this may solve some of the issues.

You still need to run samba_upgradedns after the join, but I'll include
that when I get a chance.  This should at least mean that the partitions
are correctly replicated, which has been the biggest pain point.

We do really want this to work for folks, and I'm sorry it has taken so
long to investigate.

Thanks,

Andrew Bartlett
--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org

0001-s4-drepl-Ensure-that-the-op-source-does-not-get-deal.patch (3K) Download Attachment
0002-s4-join-Import-DNS-zones-in-AD-DC-join.patch (11K) Download Attachment

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Stefan (metze) Metzmacher
Hi Andrew,

> On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> This patch tries to reduce the pain around replicating DNS.  We now do
>> it at join time.
>>
>> However, at least during make test, it causes a segfault in the DRS
>> server, which I can't yet pin down (even with valgrind I don't get a
>> useful answer).
>
> I've found and fixed the segfault issue, so now I want testing of the
> join.py modifications.
>
> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>
You're still setting hasMasterNCs to the full NC list, which is wrong.

metze


signature.asc (270 bytes) Download Attachment

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Amitay Isaacs
In reply to this post by Andrew Bartlett
Hi Andrew,

On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:

> On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> This patch tries to reduce the pain around replicating DNS.  We now do
>> it at join time.
>>
>> However, at least during make test, it causes a segfault in the DRS
>> server, which I can't yet pin down (even with valgrind I don't get a
>> useful answer).
>
> I've found and fixed the segfault issue, so now I want testing of the
> join.py modifications.
>
> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>
> If those who are having pain getting DNS replication up and going can
> try with these 2 patches, I hope this may solve some of the issues.

If the DNS role is not assigned to a (windows) DC, it never replicates
the DNS partition and also does not have DNS NCs listed in
msDS-hasMasterNCs. So, it appears that adding DNS NCs in
msDS-hasMasterNCs attribute is equivalent to adding DNS role to the
second DC.

Maybe that'll fix the replication issue. I was under the assumption
that msDS-hasMasterNCs attribute is set only after the replication is
complete. But that's not true. It has to be set if the DC is going to
hold a full replica of the NC.
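
The ordering described above can be sketched as follows. This is an illustration in plain Python, not the join.py implementation: plan_join and the holds_dns flag are hypothetical names, and the steps are returned as data rather than executed. The point is only that the NC list is written before any replication starts, and that the DNS NCs are included only when the new DC is meant to hold them.

```python
def plan_join(main_ncs, dns_ncs, holds_dns):
    """Return the ordered join steps as (action, value) tuples.

    holds_dns: whether the joining DC should hold a full replica of
    the DNS partitions (a hypothetical flag for this sketch).
    """
    master_ncs = list(main_ncs)
    if holds_dns:
        master_ncs += list(dns_ncs)

    # Announce the full replicas first, then replicate each of them.
    steps = [("set msDS-hasMasterNCs", master_ncs)]
    steps += [("replicate", nc) for nc in master_ncs]
    return steps


base = "DC=samba,DC=example,DC=com"  # example domain, assumed
main = ["CN=Schema,CN=Configuration," + base, "CN=Configuration," + base, base]
dns = ["DC=DomainDnsZones," + base, "DC=ForestDnsZones," + base]

with_dns = plan_join(main, dns, holds_dns=True)
without_dns = plan_join(main, dns, holds_dns=False)
```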

> You still need to run samba_upgradedns after the join, but I'll include
> that when I get a chance.  This should at least mean that the partitions
> are correctly replicated, which has been the biggest pain point.

Since you have added dns_backend option to join, we can potentially
short-circuit running samba_upgradedns and run parts of dns provision
directly.

> We do really want this to work for folks, and I'm sorry it has taken so
> long to investigate.
>
> Thanks,
>
> Andrew Bartlett
> --
> Andrew Bartlett                                http://samba.org/~abartlet/
> Authentication Developer, Samba Team           http://samba.org

Amitay.

Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Andrew Bartlett
In reply to this post by Stefan (metze) Metzmacher
On Fri, 2012-06-22 at 08:29 +0200, Stefan (metze) Metzmacher wrote:

> Hi Andrew,
>
> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> >> This patch tries to reduce the pain around replicating DNS.  We now do
> >> it at join time.
> >>
> >> However, at least during make test, it causes a segfault in the DRS
> >> server, which I can't yet pin down (even with valgrind I don't get a
> >> useful answer).
> >
> > I've found and fixed the segfault issue, so now I want testing of the
> > join.py modifications.
> >
> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
> >
>
> You're still setting hasMasterNCs to the full NC list, which is wrong.

Ahh, now I get what you mean.  I was fixated on the fix for the
segfault, and didn't get time to look at join.py with un-tired eyes :-)

I'm sure I can fix that up, if that's the only issue.

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org



Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Andrew Bartlett
In reply to this post by Amitay Isaacs
On Fri, 2012-06-22 at 17:17 +1000, Amitay Isaacs wrote:

> Hi Andrew,
>
> On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:
> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
> >> This patch tries to reduce the pain around replicating DNS.  We now do
> >> it at join time.
> >>
> >> However, at least during make test, it causes a segfault in the DRS
> >> server, which I can't yet pin down (even with valgrind I don't get a
> >> useful answer).
> >
> > I've found and fixed the segfault issue, so now I want testing of the
> > join.py modifications.
> >
> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
> >
> > If those who are having pain getting DNS replication up and going can
> > try with these 2 patches, I hope this may solve some of the issues.
>
> If the DNS role is not assigned to a (windows) DC, it never replicates
> the DNS partition and also does not have DNS NCs listed in
> msDS-hasMasterNCs. So, it appears that adding DNS NCs in
> msDS-hasMasterNCs attribute is equivalent to adding DNS role to the
> second DC.
>
> Maybe that'll fix the replication issue. I was under the assumption
> that msDS-hasMasterNCs attribute is set only after the replication is
> complete. But that's not true. It has to be set if the DC is going to
> hold a full replica of the NC.

OK.  So, aside from fixing it to use the right attribute, we might be on
the way to a solution then.

> > You still need to run samba_upgradedns after the join, but I'll include
> > that when I get a chance.  This should at least mean that the partitions
> > are correctly replicated, which has been the biggest pain point.
>
> Since you have added dns_backend option to join, we can potentially
> short-circuit running samba_upgradedns and run parts of dns provision
> directly.

That's essentially what I want to have happen.  

The one query I have is:  What happens if the DC we choose to replicate
the rest of the data from doesn't hold the DNS partitions?
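
One conceivable way to detect that case, sketched in plain Python under stated assumptions: the join code would read the source DC's msDS-hasMasterNCs values and compare them with the partitions it wants. The helper name replicable_ncs is hypothetical, and comparing lower-cased strings is a simplification of real DN comparison.

```python
def replicable_ncs(wanted_ncs, source_master_ncs):
    """Split wanted_ncs into those the source DC holds and those it lacks.

    source_master_ncs: the source DC's msDS-hasMasterNCs values.
    DNs are compared case-insensitively as plain strings, which is a
    simplification for this sketch.
    """
    held = {nc.lower() for nc in source_master_ncs}
    available = [nc for nc in wanted_ncs if nc.lower() in held]
    missing = [nc for nc in wanted_ncs if nc.lower() not in held]
    return available, missing


base = "DC=samba,DC=example,DC=com"  # example domain, assumed
wanted = [base, "DC=DomainDnsZones," + base, "DC=ForestDnsZones," + base]
# A source DC without the DNS role: only the three main partitions.
source_holds = [
    "CN=Schema,CN=Configuration," + base,
    "CN=Configuration," + base,
    base,
]
available, missing = replicable_ncs(wanted, source_holds)
```

What to do with the missing partitions (skip them, or find another DC that holds them) is exactly the open question.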

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org



Re: [PATCH] Attempt to replicate DNS zones at domain join time (drepl server crash)

Amitay Isaacs
On Fri, Jun 22, 2012 at 6:08 PM, Andrew Bartlett <[hidden email]> wrote:

> On Fri, 2012-06-22 at 17:17 +1000, Amitay Isaacs wrote:
>> Hi Andrew,
>>
>> On Fri, Jun 22, 2012 at 9:48 AM, Andrew Bartlett <[hidden email]> wrote:
>> > On Thu, 2012-06-21 at 23:49 +1000, Andrew Bartlett wrote:
>> >> This patch tries to reduce the pain around replicating DNS.  We now do
>> >> it at join time.
>> >>
>> >> However, at least during make test, it causes a segfault in the DRS
>> >> server, which I can't yet pin down (even with valgrind I don't get a
>> >> useful answer).
>> >
>> > I've found and fixed the segfault issue, so now I want testing of the
>> > join.py modifications.
>> >
>> > https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/fix-dns-replication
>> >
>> > If those who are having pain getting DNS replication up and going can
>> > try with these 2 patches, I hope this may solve some of the issues.
>>
>> If the DNS role is not assigned to a (windows) DC, it never replicates
>> the DNS partition and also does not have DNS NCs listed in
>> msDS-hasMasterNCs. So, it appears that adding DNS NCs in
>> msDS-hasMasterNCs attribute is equivalent to adding DNS role to the
>> second DC.
>>
>> Maybe that'll fix the replication issue. I was under the assumption
>> that msDS-hasMasterNCs attribute is set only after the replication is
>> complete. But that's not true. It has to be set if the DC is going to
>> hold a full replica of the NC.
>
> OK.  So, aside from fixing it to use the right attribute, we might be on
> the way to a solution then.
>
>> > You still need to run samba_upgradedns after the join, but I'll include
>> > that when I get a chance.  This should at least mean that the partitions
>> > are correctly replicated, which has been the biggest pain point.
>>
>> Since you have added dns_backend option to join, we can potentially
>> short-circuit running samba_upgradedns and run parts of dns provision
>> directly.
>
> That's essentially what I want to have happen.
>
> The one query I have is:  What happens if the DC we choose to replicate
> the rest of the data from doesn't hold the DNS partitions?

As I understand it, it should be the job of the KCC to figure out which
partitions should be replicated. The current implementation of the KCC
sets up replication between every pair of DCs for all partitions. So if
a second DC does not have the application partitions, the first DC
should not try to replicate those partitions to the second DC. Maybe we
need to switch to the Python KCC and make sure it does the correct thing.
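
The behaviour suggested above can be sketched in plain Python. This is not the Samba KCC; the function replication_links and the short partition labels are hypothetical. The idea is that a link for a partition is created only between DCs that both hold that partition.

```python
def replication_links(dc_ncs):
    """Compute (source, destination, nc) links between DCs.

    dc_ncs maps a DC name to the set of partitions it holds.
    A link is created only for partitions held by both DCs.
    """
    links = []
    for src, src_ncs in dc_ncs.items():
        for dst, dst_ncs in dc_ncs.items():
            if src == dst:
                continue
            for nc in src_ncs & dst_ncs:
                links.append((src, dst, nc))
    return sorted(links)


# dc1 holds the DNS zones, dc2 does not (labels are placeholders).
links = replication_links({
    "dc1": {"domain", "DomainDnsZones"},
    "dc2": {"domain"},
})
```

With this input no link is created for DomainDnsZones, because dc2 does not hold it.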

Amitay