[TEST][PATCH] Replication errors with Samba4

[TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
Metze,

In my repl-devel branch I have a series of patches to better test our
replication and conflict resolution handling.

https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/repl-devel

Currently we have a number of issues in this area.  The test I added
there shows that we do not consistently handle the conflict resolution.
This is particularly the case with conflicting renames.

The attempts at modification of the replication code I've included try
to handle some of this, but it still doesn't work.  

However, this code remains dizzyingly complex, and I wondered if,
particularly as I now have a reasonable testsuite, you might be able to
assist me in making this more reliable?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org

Re: [TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
On Mon, 2012-07-30 at 23:47 +1000, Andrew Bartlett wrote:

> Metze,
>
> In my repl-devel branch I have a series of patches to better test our
> replication and conflict resolution handling.
>
> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/repl-devel
>
> Currently we have a number of issues in this area.  The test I added
> there shows that we do not consistently handle the conflict resolution.
> This is particularly the case with conflicting renames.
>
> The attempts at modification of the replication code I've included try
> to handle some of this, but it still doesn't work.  
>
> However, this code remains dizzyingly complex, and I wondered if,
> particularly as I now have a reasonable testsuite, you might be able to
> assist me in making this more reliable?

I've found some of the issues here, but I still can't make the conflict
handling reliable.  I've put in the test simply asserting that one or
other record becomes a conflict, until we can get back to this.  It
would be very helpful to me if you could look at this area, as this
should be deterministic :-(.

Still, at least it no longer stops or crashes.

Andrew Bartlett


Re: [TEST][PATCH] Replication errors with Samba4

Stefan (metze) Metzmacher
Hi Andrew,

> I've found some of the issues here, but I still can't make the conflict
> handling reliable.  I've put in the test simply asserting that one or
> other record becomes a conflict, until we can get back to this.  It
> would be very helpful to me if you could look at this area, as this
> should be deterministic :-(.
>
> Still, at least it no longer stops or crashes.

Does it randomly fail make test (if so what's the test name?)
or do you see the strange behavior in normal operation?

I was also debugging a replication problem with the servicePrincipalName
attribute on an RODC, maybe this is related.

metze


Re: [TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
On Tue, 2012-07-31 at 08:04 +0200, Stefan (metze) Metzmacher wrote:

> Does it randomly fail make test (if so what's the test name?)
> or do you see the strange behavior in normal operation?

What happens is that the additional tests I added in
samba4.drs.replica_sync.python fail randomly.  

To get the rest of the patch into master (and to ensure we have any
coverage of this codepath at all), I've modified the tests to accept
that one DN or the other is made into a conflict, but not to assert on
which one in particular is the conflict.   This is in autobuild now.
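
As a rough illustration of the relaxed assertion described above, next to the stricter one it stands in for: this is a sketch only, not the code in samba4.drs.replica_sync.python. The helper names and the example DNs are made up, and it assumes a conflict object can be recognised by the escaped-newline "CNF:" marker in its DN.

```python
# Hypothetical helpers; the real test inspects objects on live DCs.
CONFLICT_MARKER = "\\0ACNF:"  # escaped newline followed by "CNF:"


def is_conflict_dn(dn: str) -> bool:
    """Return True if the DN looks like a replication conflict rename."""
    return CONFLICT_MARKER in dn


def assert_one_conflict(dn_a: str, dn_b: str) -> None:
    """Relaxed check: exactly one of the two objects became a conflict."""
    assert is_conflict_dn(dn_a) != is_conflict_dn(dn_b), (dn_a, dn_b)


def assert_conflict_is(expected_loser: str, other: str) -> None:
    """Stricter check: one specific object must be the conflict."""
    assert is_conflict_dn(expected_loser), expected_loser
    assert not is_conflict_dn(other), other


# Made-up example DNs for illustration.
renamed = "OU=Test A\\0ACNF:11111111-2222-3333-4444-555555555555,DC=example,DC=com"
kept = "OU=Test A,DC=example,DC=com"

assert_one_conflict(renamed, kept)   # passes whichever side lost
assert_conflict_is(renamed, kept)    # passes only if this side lost
```

The relaxed form keeps the code path covered while the outcome is still non-deterministic; the stricter form is what the test should eventually assert.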

On that branch, it is clear that it's random because if you run it
twice, the line number (corresponding to unit tests) of the assertions
changes.

Once these are in master, I'll update that branch with just the stricter
test.

Andrew Bartlett


Re: [TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
On Tue, 2012-07-31 at 16:09 +1000, Andrew Bartlett wrote:

> Once these are in master, I'll update that branch with just the stricter
> test.

I've updated the branch.  To reproduce, just run:

make test TESTS=samba4.drs.replica_sync.python
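
Since the failure is intermittent, a single run may well pass. A small wrapper like the following can repeat the command and tally the results. It is only a sketch: the function name is made up, and it assumes a non-zero exit status means the run failed.

```python
import subprocess
from collections import Counter


def run_repeatedly(cmd, runs=5):
    """Run cmd several times and count how often each exit code occurs."""
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        codes[result.returncode] += 1
    return codes


# Example (run from the top of the source tree; each run is slow):
# print(run_repeatedly(
#     ["make", "test", "TESTS=samba4.drs.replica_sync.python"], runs=3))
```

A mix of zero and non-zero exit codes across identical runs is the symptom being described.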


Re: [TEST][PATCH] Replication errors with Samba4

Stefan (metze) Metzmacher
On 31.07.2012 08:37, Andrew Bartlett wrote:

> I've updated the branch.  To reproduce, just run:
>
> make test TESTS=samba4.drs.replica_sync.python

I guess it's related to the fact that the conflict resolution also depends
on the invocationId. The timestamps are in 1 sec intervals, in the protocol!
I think you should find out the invocationId and define the dc with the
lower invocationId as dc1 and the other as dc2.
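
To make the tie-break concrete, here is a minimal sketch. It assumes, beyond what is stated in this thread, that two competing changes are compared by version first, then by originating time in whole seconds, then by originating invocationId, with the higher value winning at each step. The `Stamp` type and the GUID values are made up, and Python's `uuid.UUID` ordering merely stands in for whatever GUID comparison the server really uses.

```python
import uuid
from typing import NamedTuple


class Stamp(NamedTuple):
    """Hypothetical per-change metadata; field order is comparison order."""
    version: int
    time_sec: int              # originating change time, whole seconds only
    invocation_id: uuid.UUID


def winner(a: Stamp, b: Stamp) -> Stamp:
    """Return the winning stamp; tuple comparison does the tie-breaking."""
    return max(a, b)


dc_a = uuid.UUID("11111111-1111-1111-1111-111111111111")
dc_b = uuid.UUID("22222222-2222-2222-2222-222222222222")

# Two renames made within the same second: version and time are equal,
# so the outcome depends only on which DC has the higher invocationId.
s1 = Stamp(version=2, time_sec=1343723820, invocation_id=dc_a)
s2 = Stamp(version=2, time_sec=1343723820, invocation_id=dc_b)
print(winner(s1, s2).invocation_id == dc_b)  # True
```

Under these assumptions a test that makes both changes within one second will see a different winner depending on the invocationIds the two DCs happened to get, which would look random from run to run.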

metze


Re: [TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
On Tue, 2012-07-31 at 10:37 +0200, Stefan (metze) Metzmacher wrote:

> I guess it's related to the fact that the conflict resolution also depends
> on the invocationId. The timestamps are in 1 sec intervals, in the protocol!

Ouch!  Does that mean I would cause damage with this patch:
https://git.samba.org/?p=abartlet/samba.git/.git;a=commitdiff;h=862b26518a0629f6112fb7e6270c0b98ef71a855

(or would the NDR layer just remove the partial seconds anyway?)

It seems better to always work with NTTIME - if it's not harmful I'll
just change the commit message to clarify.
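
For reference, NTTIME counts 100-nanosecond intervals since 1601-01-01 UTC, so dropping the partial seconds amounts to an integer division. The sketch below only illustrates that arithmetic; whether the NDR layer does exactly this for the replication metadata is the open question here, and the function names and sample values are made up.

```python
NTTIME_TICKS_PER_SEC = 10_000_000  # NTTIME is in 100 ns units


def nttime_whole_seconds(nttime: int) -> int:
    """Seconds since 1601-01-01, discarding the sub-second part."""
    return nttime // NTTIME_TICKS_PER_SEC


def truncate_nttime(nttime: int) -> int:
    """The same NTTIME rounded down to a whole second."""
    return nttime_whole_seconds(nttime) * NTTIME_TICKS_PER_SEC


t1 = 129_881_974_201_234_567   # two made-up values, 0.5 s apart
t2 = t1 + 5_000_000
print(t1 == t2, truncate_nttime(t1) == truncate_nttime(t2))  # False True
```

Two distinct NTTIME values inside the same second become indistinguishable once truncated, which is why the sub-second part cannot help decide a conflict if only seconds go over the wire.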

> I think you should find out the invocationId and define the dc with the
> lower invocationId as dc1 and the other as dc2.

I can just put some sleep into the tests to get times different if
that's what is going on.

(I've stopped my autobuild, which includes the next beta because it was
due today, pending resolving this)

Andrew Bartlett


Re: [TEST][PATCH] Replication errors with Samba4

Stefan (metze) Metzmacher
Hi Andrew,

>>> I've updated the branch.  To reproduce, just run:
>>>
>>> make test TESTS=samba4.drs.replica_sync.python
>>
>> I guess it's related to the fact that the conflict resolution also depends
>> on the invocationId. The timestamps are in 1 sec intervals, in the protocol!
>
> Ouch!  Does that mean I would cause damage with this patch:
> https://git.samba.org/?p=abartlet/samba.git/.git;a=commitdiff;h=862b26518a0629f6112fb7e6270c0b98ef71a855
>
> (or would the NDR layer just remove the partial seconds anyway?)
I guess so

> It seems better to always work with NTTIME - if it's not harmful I'll
> just change the commit message to clarify.

I'd prefer to just skip that patch.

>> I think you should find out the invocationId and define the dc with the
>> lower invocationId as dc1 and the other as dc2.
>
> I can just put some sleep into the tests to get times different if
> that's what is going on.

Maybe for some parts, but you should also test the resolution based on the
invocationId and assign the dc1 and dc2 variables based on the invocationId.
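
One way to read this suggestion, sketched with made-up names: `DC`, `order_by_invocation_id`, the host names and the GUIDs are all assumptions, and a real test would have to query each server for its invocationId rather than hard-code it.

```python
import uuid
from typing import NamedTuple, Tuple


class DC(NamedTuple):
    """Hypothetical record for a domain controller under test."""
    name: str
    invocation_id: uuid.UUID


def order_by_invocation_id(x: DC, y: DC) -> Tuple[DC, DC]:
    """Return (dc1, dc2) with dc1 holding the lower invocationId."""
    dc1, dc2 = sorted((x, y), key=lambda dc: dc.invocation_id)
    return dc1, dc2


first = DC("dc-a.example.com",
           uuid.UUID("9f000000-0000-0000-0000-000000000001"))
second = DC("dc-b.example.com",
            uuid.UUID("1a000000-0000-0000-0000-000000000002"))

dc1, dc2 = order_by_invocation_id(first, second)
print(dc1.name, dc2.name)  # dc-b.example.com dc-a.example.com
```

With the DCs ordered this way, a test can state in advance which side it expects to win a same-second conflict instead of accepting either outcome.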

> (I've stopped my autobuild, which includes the next beta because it was
> due today, pending resolving this)

Didn't it fail on a dbcheck test (something with lastKnownParent)?

metze


Re: [TEST][PATCH] Replication errors with Samba4

Andrew Bartlett
On Tue, 2012-07-31 at 10:50 +0200, Stefan (metze) Metzmacher wrote:

> Hi Andrew,
>
> >>> I've updated the branch.  To reproduce, just run:
> >>>
> >>> make test TESTS=samba4.drs.replica_sync.python
> >>
> >> I guess it's related to the fact that the conflict resolution also depends
> >> on the invocationId. The timestamps are in 1 sec intervals, in the protocol!
> >
> > Ouch!  Does that mean I would cause damage with this patch:
> > https://git.samba.org/?p=abartlet/samba.git/.git;a=commitdiff;h=862b26518a0629f6112fb7e6270c0b98ef71a855
> >
> > (or would the NDR layer just remove the partial seconds anyway?)
>
> I guess so
>
> > It seems better to always work with NTTIME - if it's not harmful I'll
> > just change the commit message to clarify.
>
> I'd prefer to just skip that patch.

I'll do that.  Thanks for the feedback.

> >> I think you should find out the invocationId and define the dc with the
> >> lower invocationId as dc1 and the other as dc2.
> >
> > I can just put some sleep into the tests to get times different if
> > that's what is going on.
>
> maybe for some parts, but you should also test the resolution based on the
> invocationId and assign the dc1 and dc2 variables based on the invocationId.

That certainly sounds like a reasonable extension.

> > (I've stopped my autobuild, which includes the next beta because it was
> > due today, pending resolving this)
>
> Didn't it fail on a dbcheck test (something with lastKnownParent)?

It did, and then I fixed that up, then had this discussion.  I'll
update the branch.

Andrew Bartlett

