samba getting stuck, highwatermark replication issue?

Samba - General mailing list
Hi all,

We would appreciate some input here. Not sure where to look...

We have three AD DCs, all running samba 4.5.10, and for the past few
days the samba DCs have been getting stuck regularly, at random times.
It happens to all three of them, and currently up to a few times per
day! There must be some common cause.

Otherwise the systems appear fine: enough disk space, nothing special
in syslog, etc.

We usually detect that a DC has become stuck because LDAP auth no
longer works on that DC. Checking with "service sernet-samba-ad status"
still reports "Running".

After shutting down samba ("service sernet-samba-ad stop") one process
usually is still running, and prevents a restart from succeeding, always
because:

> Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED

ps aux tells me that the process is: "samba -D"

Killing that process makes samba startup succeed, replication work
again, and samba function, until the next time this happens.
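
One way to identify the leftover process before killing it is to look up
which PID still has TCP port 135 open. A minimal sketch, assuming
net-tools netstat is installed and run as root; pids_listening_on is a
hypothetical helper name, not something that ships with samba:

```shell
# Print the PID of every process listening on the given TCP port.
# Reads `netstat -tlnp` output on stdin.
pids_listening_on() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, owner, "/")   # field 7 looks like "3155/samba"
            print owner[1]
        }'
}

# Usage (as root, so that netstat can show the owning process):
#   netstat -tlnp | pids_listening_on 135
```

Because the function only filters text, it can also be run against
netstat output that was saved earlier.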

But WHY is samba getting stuck in the first place?

We are getting the following unusual messages in the logs on all three DCs:
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
and the last line keeps repeating 2 - 3 times per second, completely
filling up the logs. The initial username differs per DC, but on each
DC it usually remains the same. (I have seen 5 or 6 different usernames
in total.)
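
To see which last_dn values are involved and how often each one is
logged, the messages can be tallied per DN. A rough sketch; count_last_dn
is a made-up helper name, and the log file name in the usage line is an
assumption that may differ per installation:

```shell
# Count "older highwatermark" messages per last_dn, most frequent first.
# Reads a samba log on stdin.
count_last_dn() {
    sed -n 's/.*older highwatermark (last_dn \(.*\)).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (log file name is an assumption, adjust as needed):
#   count_last_dn < /var/log/samba/log.samba
```

Each output line is a count followed by the DN, so a single DN that
dominates the list stands out immediately.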

samba-tool dbcheck --cross-ncs looks similar on all three DCs, with
*many* errors about unsorted attributes, which I believe I was told in
the past are harmless:

> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
> ERROR: unsorted attributeID values in replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>
> Not fixing replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>
> Please use --fix to fix these errors
> Checked 4948 objects (4193 errors)

All 4193 errors are about unsorted attributeID values, with the
following exception: there still appear to be some references to an old
DC (removed many YEARS ago):
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com

That's about all info I can gather.

The very basic smb.conf on the DCs:

> [global]
> workgroup = WRKGRP
> realm = samba.company.com
> netbios name = DC4
> server role = active directory domain controller
> log level = 3
> dns forwarder = 192.x.x.x
> server signing = mandatory
> ntlm auth = yes
> ldap server require strong auth = no
> idmap_ldb:use rfc2307 = yes
>
> [netlogon]
> path = /var/lib/samba/sysvol/samba.company.com/scripts
> read only = No
>
> [sysvol]
> path = /var/lib/samba/sysvol
> read only = No
> acl_xattr:ignore system acls = yes

We have been running 4.5.10 since May 2017, and this issue started this
week.

Anyone with an idea?

--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba

Re: samba getting stuck, highwatermark replication issue?

A bit more info:

We are currently getting those errors on DC2:
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)

and they are also causing very high CPU usage on that DC (85 - 90%).
On the other DCs, CPU usage is normal.

Replication is still going strong, so DC2 is busy, but functional.

The PID with high CPU usage is 3155; the process list:

> root@DC2:/var/log/samba# ps aux
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root         1  0.0  0.0  10656   800 ?        Ss   17:22   0:00 init [2]
....

> root      1732  0.0  0.0  25304   420 ?        Ss   17:22   0:00 /usr/sbin/rpc.idmapd
> root      3153  0.0  0.5 553028 45272 ?        Ss   17:49   0:00 /usr/sbin/samba -D
> root      3154  0.0  0.3 553028 32644 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3155 85.5  0.7 561376 60052 ?        R    17:49 134:29 /usr/sbin/samba -D
> root      3156  0.0  0.6 541756 49448 ?        Ss   17:49   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      3157  0.0  0.4 557180 35124 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3158  0.0  0.3 553028 32636 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3159  3.3  0.8 554536 70464 ?        S    17:49   5:16 /usr/sbin/samba -D
> root      3160  0.1  0.4 553028 34016 ?        S    17:49   0:10 /usr/sbin/samba -D
> root      3161  0.3  0.4 557180 36440 ?        S    17:49   0:31 /usr/sbin/samba -D
> root      3162  0.1  0.4 568024 37800 ?        S    17:49   0:09 /usr/sbin/samba -D
> root      3163  0.0  0.3 553028 32636 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3164  0.0  0.4 553028 33752 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3165  0.0  0.7 557180 60000 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3166  0.0  0.4 553028 33588 ?        S    17:49   0:00 /usr/sbin/samba -D
> root      3167  0.0  0.4 553548 35232 ?        S    17:49   0:08 /usr/sbin/samba -D
> root      3170  0.0  0.5 484364 46824 ?        Ss   17:49   0:00 /usr/sbin/winbindd -D --option=server role check:inhibit=yes --foreground
> root      3171  0.0  0.3 530724 32708 ?        S    17:49   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      3172  0.0  0.4 530740 32828 ?        S    17:49   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      3174  0.0  0.4 541748 34048 ?        S    17:49   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      3175  0.0  0.4 489260 35340 ?        S    17:49   0:00 /usr/sbin/winbindd -D --option=server role check:inhibit=yes --foreground
> root      3195  0.0  0.4 484364 34448 ?        S    17:49   0:00 /usr/sbin/winbindd -D --option=server role check:inhibit=yes --foreground
> root      3262  0.0  0.4 550092 38520 ?        S    17:52   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      3856  0.0  0.4 550092 38592 ?        S    18:38   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      4821  0.0  0.4 550116 38716 ?        S    20:06   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> 1464      4976  0.0  0.4 550116 38720 ?        S    20:20   0:00 /usr/sbin/smbd -D --option=server role check:inhibit=yes --foreground
> root      5033  0.0  0.0  25216  1336 pts/0    R+   20:26   0:00 ps aux
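
Rather than reading through the whole list, the busiest samba process
can be picked out of the ps output directly. A small sketch;
busiest_samba is a hypothetical helper name, and treating column 11 as
the command assumes the standard "ps aux" layout shown above:

```shell
# Print "PID %CPU" for the samba process using the most CPU.
# Reads `ps aux` output on stdin; only commands whose first word is
# /usr/sbin/samba are considered (smbd and winbindd are skipped).
busiest_samba() {
    awk 'BEGIN { max = -1 }
         $11 == "/usr/sbin/samba" && $3 + 0 > max { max = $3 + 0; pid = $2 }
         END { if (pid != "") print pid, max }'
}

# Usage:
#   ps aux | busiest_samba
```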

Suggestions?

Re: samba getting stuck, highwatermark replication issue?

On 10/9/2017 1:28 PM, mj via samba wrote:

> [...]

You should be able to fix the 'replPropertyMetaData' errors with:

samba-tool dbcheck --cross-ncs --fix --yes 'fix_replmetadata_unsorted_attid'

The highwatermark message doesn't necessarily reflect an issue. The
highwatermark is part of how the destination DC keeps track of changes
from the source DC. Can you verify that the time and date are correct
on all DCs?

The GUID errors seem related to your old DC having gone offline, with
its NTDS connections still lingering. Open Active Directory Sites and
Services and remove the ones no longer needed.



--
James



Re: samba getting stuck, highwatermark replication issue?

Hi James,

Thanks for the quick reply.

On 10/09/2017 08:52 PM, lingpanda101 via samba wrote:

> You should be able to fix the 'replPropertyMetaData' errors with;
>
> samba-tool dbcheck --cross-ncs --fix --yes
> 'fix_replmetadata_unsorted_attid'
Yep, worked great! Fixed all of those replPropertyMetaData errors! :-)

> The highwatermark message doesn't necessarily reflect an issue. The
> highwatermark is part of how the destination DC keeps track of changes
> from the source DC. Can you verify that the time and date are correct
> on all DCs?
Date and time match. But the fact that the identical message is logged
multiple times per second, without end, seems a bit strange, especially
combined with high CPU usage on the DC where this happens (yesterday
DC2, currently DC4).

> The GUID errors seem related to your old DC having gone offline, with
> its NTDS connections still lingering. Open Active Directory Sites and
> Services and remove the ones no longer needed.
There is no DC1 mentioned anywhere there. And the two errors remain:

> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=4605>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu
> Not removing dangling forward link
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=4579>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu
> Not removing dangling forward link

I was asked two questions during the samba-tool dbcheck run:

> Add yourself to the replica locations for DC=DomainDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
> Not fixing missing/incorrect attributes on DC=DomainDnsZones,DC=samba,DC=company,DC=com
>
> Add yourself to the replica locations for DC=ForestDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
> Not fixing missing/incorrect attributes on DC=ForestDnsZones,DC=samba,DC=company,DC=com

Should I answer Yes to those two questions?

MJ

Re: samba getting stuck, highwatermark replication issue?

On 10/10/2017 3:14 AM, mj via samba wrote:

> [...]

MJ,

I must have missed this snippet in your first email.

"Not removing dangling forward link"

These point to deleted NTDS Settings objects and are harmless. However,
you can clean them up with:

# samba-tool domain tombstones expunge

It should be safe to say yes to those questions. You could also run the
following command for those:

# samba-tool dbcheck --cross-ncs --fix --yes 'fix_replica_locations'

It may be best to run a manual full replication from a good DC to one
that is having the problems. See
https://wiki.samba.org/index.php/Manually_Replicating_Directory_Partitions


--
James

Re: samba getting stuck, highwatermark replication issue?

Hi all, James,

After following James' suggestions fixing the several dbcheck errors,
and having observed things for a few days, I'd like to update this
issue, and hope for some new input again. :-)

Summary: three DCs, all three running Version
4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
no errors, except for two (supposedly innocent) dangling forward links
that I'm ignoring for now. Time is synced. Very basic smb.conf, posted
earlier, can post again if needed.

samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in
sync, and also samba-tool drs showrepl shows that replication seems to
be stable.

The "getting stuck" from the subject line has not occurred for a few
days; perhaps the dbcheck fixes have solved that, or perhaps we've just
been lucky.

All in all this appears pretty healthy, but there is a remaining problem:

At ANY given time, ONE RANDOM single DC shows high CPU usage on one
samba process. And on that DC (can be any of the three DCs) the logs
fill up with this:

> [2017/10/12 08:38:57.956586,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956638,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956823,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956869,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956990,  3] ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>   auth_check_password_send: Checking password for unmapped user []\[]@[(null)]
>   auth_check_password_send: mapped user is: []\[]@[(null)]
> [2017/10/12 08:38:57.958675,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958728,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.958948,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958994,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.969111,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:57.969762,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.378265,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.379160,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.810202,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.810868,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.251863,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:59.252418,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.692247,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)

I've seen "last_dn" be various things: system groups like above, but
also regular users, computers, and groups that we created. We have even
had (very few) cases where it was:

> ./log.samba.3.gz:  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn DC=samba,DC=company,DC=com)

Can anyone explain what is happening here, or help me understand this?

I have read that highwatermark errors are not necessarily bad, but the
fact that they cause continuous high CPU usage on a DC (80 - 90%), up
to the point where this behaviour "transfers" to the next DC, makes me
feel that in this case it is not normal and indicates some kind of
problem.

Thanks for any input!

MJ

Re: samba getting stuck, highwatermark replication issue?

On 10/12/2017 3:17 AM, mj wrote:

> Hi all, James,
>
> After following James' suggestions fixing the several dbcheck errors,
> and having observed things for a few days, I'd like to update this
> issue, and hope for some new input again. :-)
>
> Summary: three DCs, all three running Version
> 4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
> no errors, except for two (supposedly innocent) dangling forward links
> that I'm ignoring for now. Time is synced. Very basic smb.conf, posted
> earlier, can post again if needed.
>
> samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in
> sync, and also samba-tool drs showrepl shows that replication seems to
> be stable.
>
> The "getting stuck" from the subject line has not occurred for a few
> days, perhaps the dbcheck fixes have solved that, or perhaps we've
> just been lucky.
>
> All in all this appears pretty healthy, but there is a remaining problem:
>
> At ANY given time, ONE RANDOM single DC shows high cpu usage on one
> samba process. And on that DC (can be any of the three DCs) the logs
> fill up with this:
>
>> [2017/10/12 08:38:57.956586,  3]
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_accept_tls_loop:
>> tstream_tls_accept_recv() - 104:Connection reset by peer'
>> [2017/10/12 08:38:57.956638,  3]
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_accept_tls_loop:
>> tstream_tls_accept_recv() - 104:Connection reset by peer]
>> [2017/10/12 08:38:57.956823,  3]
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_accept_tls_loop:
>> tstream_tls_accept_recv() - 104:Connection reset by peer'
>> [2017/10/12 08:38:57.956869,  3]
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_accept_tls_loop:
>> tstream_tls_accept_recv() - 104:Connection reset by peer]
>> [2017/10/12 08:38:57.956990,  3]
>> ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>>   auth_check_password_send: Checking password for unmapped user
>> []\[]@[(null)]
>>   auth_check_password_send: mapped user is: []\[]@[(null)]
>> [2017/10/12 08:38:57.958675,  3]
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_call_loop:
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
>> [2017/10/12 08:38:57.958728,  3]
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_call_loop:
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
>> [2017/10/12 08:38:57.958948,  3]
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_call_loop:
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
>> [2017/10/12 08:38:57.958994,  3]
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_call_loop:
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
>> [2017/10/12 08:38:57.969111,  0]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:57.969762,  2]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:58.378265,  0]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:58.379160,  2]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:58.810202,  0]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:58.810868,  2]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:59.251863,  0]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:59.252418,  2]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:59.692247,  0]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>
> I've seen "last_dn" be various things, system groups like above, but
> also regular users, computers, and groups that we created. We have
> even had (very few) cases where it was:
>
>> ./log.samba.3.gz: ../source4/rpc_server/drsuapi/getncchanges.c:1961:
>> DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older
>> highwatermark (last_dn DC=samba,DC=company,DC=com)
>
> Can anyone explain what is happening here, or help me understand this?
>
> I have read that highwatermark errors are not necessarily bad, but the
> fact that they cause continuous high cpu usage on a DC (80, 90%),
> until the point where this behaviour "transfers" to a next DC makes me
> feel that in this case, this is not normal, and indicates some kind of
> problem.
>
> Thanks for input!
>
> MJ

MJ,

     A dev or someone else may be able to assist, but your DCs aren't
replicating correctly with each other.  Those dangling links should have
been purged by now if they refer to a DC removed several years ago.

Did you do a full replication from a known good DC to the other two?
This doesn't always fix the issue but is a good start. You didn't by
chance restore a DC recently from backup, or have one that was offline
and recently powered on?

The highwatermark value tells the source DC what objects the destination
DC is requesting to update. The high CPU usage seems to be due to the DC
doing a full partition replication. The fact that you stated this issue
can happen on all 3 makes it even tougher to help. I would normally
advise just demoting the affected DC and joining it again.
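To put numbers on how often the message repeats and which objects it names, the "older highwatermark" lines can be tallied with a short script. This is only a sketch in Python: it assumes the usual log.samba layout (a bracketed timestamp header line followed by the indented message on its own, unwrapped line), and the names summarize_highwatermark and HWM_SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Header line of a Samba log entry, e.g. "[2017/10/12 08:38:57.969111,  0] ..."
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+,\s*\d+\]")
# Message line naming the object the replication cycle stopped at
LAST_DN = re.compile(r"older highwatermark \(last_dn (.+)\)\s*$")


def summarize_highwatermark(lines):
    """Count 'older highwatermark' messages per second and per last_dn."""
    per_second = Counter()
    per_dn = Counter()
    current_ts = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            # Remember the timestamp (to the second) for the message that follows
            current_ts = header.group(1)
            continue
        hit = LAST_DN.search(line)
        if hit:
            per_second[current_ts] += 1
            per_dn[hit.group(1)] += 1
    return per_second, per_dn


# Unwrapped sample entries taken from the log excerpt in this thread
HWM_SAMPLE = [
    "[2017/10/12 08:38:57.969111,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
    "  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
    "[2017/10/12 08:38:57.969762,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)",
    "  ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)",
    "[2017/10/12 08:38:58.378265,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
    "  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
    "[2017/10/12 08:38:58.810202,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
    "  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
]

if __name__ == "__main__":
    per_second, per_dn = summarize_highwatermark(HWM_SAMPLE)
    for ts, count in sorted(per_second.items()):
        print(ts, count)
    for dn, count in per_dn.most_common():
        print(count, dn)
```

Running the same function over the full log of each DC (for example by passing an open file object instead of the sample list) should show whether the repeats always name the same object or move between objects over time.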


--
James


--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba

Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
Hi James, list

We really appreciate your input on this, thanks!

On 10/12/2017 04:12 PM, lingpanda101 via samba wrote:
> MJ,
>
>      A dev or someone else may be able to assist, but your DCs aren't
> replicating correctly with each other.  Those dangling links should have
> been purged by now if they refer to a DC removed several years ago.

This is rather worrying :-|

Especially since I have all kinds of scripts in place that continuously
check replication: hourly using "samba-tool drs showrepl", plus
"samba-tool ldapcmp" every other hour.

So one can have problems even when all built-in checks succeed. :-(

Currently DC2 has high cpu usage, and grepping the log.samba for
"Succeeded" gives this kind of result:

>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com

All zero, with some exceptions...

I imagine this looks better, a sample from the non-high CPU DCs:

>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Some zeros, but many indications that it is actually replicating data.
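Comparing the two excerpts by eye gets tedious on a full log, so the "Replicated N objects" lines can be summarised per naming context. The following is a rough Python sketch, not a Samba tool: summarize_replication and REPL_SAMPLE are hypothetical names, and the pattern only assumes the message wording shown in the excerpts above.

```python
import re
from collections import defaultdict

# Matches e.g. "Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,..."
REPLICATED = re.compile(
    r"Replicated (\d+) objects \((\d+) linked attributes\) for (.+?)\s*$"
)


def summarize_replication(lines):
    """Per naming context: number of cycles, empty cycles, and objects received."""
    stats = defaultdict(lambda: {"cycles": 0, "empty": 0, "objects": 0})
    for line in lines:
        match = REPLICATED.search(line)
        if match is None:
            continue
        objects = int(match.group(1))
        entry = stats[match.group(3)]
        entry["cycles"] += 1
        entry["objects"] += objects
        if objects == 0:
            entry["empty"] += 1
    return dict(stats)


# A few lines copied from the high-CPU DC excerpt above
REPL_SAMPLE = [
    "  Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com",
    "  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com",
    "  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com",
]

if __name__ == "__main__":
    for nc, entry in summarize_replication(REPL_SAMPLE).items():
        print(nc, entry)
```

A DC whose cycles are almost all empty for every naming context would stand out against the others when the same summary is run on each DC's log.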

> Did you do a full replication from a known good DC to the other two?
Well at this point I have no idea which DC I can consider "a good dc".

> This doesn't always fix the issue but is a good start. You didn't by
> chance restore a DC recently from backup, or have one that was offline
> and recently powered on?
No. These three DCs have been online for many years, ever since DC1
was removed. (We never demoted it, since it had crashed, so we manually
removed DC1 from the database; that's perhaps why there are some
remnants.)

The fact that there are still two 'dangling forward links', identical on
all DCs, makes me think that we simply have missed those when we
manually removed all DC1 references. This happened back in the samba 4.1
days.

> The highwatermark value tells the source DC what objects the destination
> DC is requesting to update. The high CPU usage seems to be due to the DC
> doing a full partition replication. The fact that you stated this issue
> can happen on all 3 makes it even tougher to help. I would normally
> advise just demoting the affected DC and joining it again.

Perhaps I should see if I can find a combination of two DCs that works,
check replication, verify with ldapcmp, make sure there is no high cpu,
etc, and then trust those two and demote the third.

Any input here would be very welcome... Here's a bit of the logs, leading
up to the "Replicated 0 objects" on the current high-cpu DC, hopefully
that reveals something..?

>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.744615,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745393,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745731,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: AS-REQ authtime: 2017-10-12T06:00:16 starttime: unset endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.745830,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Client supported enctypes: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, des3-cbc-sha1, des3-cbc-md5, arcfour-hmac-md5, using arcfour-hmac-md5/arcfour-hmac-md5
> [2017/10/12 06:00:16.745975,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Requested flags: forwardable
> [2017/10/12 06:00:16.748679,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40725 for ldap/[hidden email] [canonicalize]
> [2017/10/12 06:00:16.754551,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.755962,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41634 for ldap/[hidden email] [canonicalize]
> [2017/10/12 06:00:16.762012,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762320,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.762967,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40726 for krbtgt/[hidden email] [forwarded, forwardable]
> [2017/10/12 06:00:16.765363,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.765585,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.765679,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.766324,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41635 for krbtgt/[hidden email] [forwarded, forwardable]
> [2017/10/12 06:00:16.768612,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.768836,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.768907,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.769475,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.769542,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.799101,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41637 for ldap/[hidden email] [canonicalize]
> [2017/10/12 06:00:16.808786,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.809681,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.809767,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.817237,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41638 for krbtgt/[hidden email] [forwarded, forwardable]
> [2017/10/12 06:00:16.819573,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.820289,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.820368,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.843259,  2] ../source4/dsdb/repl/replicated_objects.c:1016(dsdb_replicated_objects_commit)
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Lots of NT_STATUS_CONNECTION_DISCONNECTED. Ideas anyone..?
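To see whether the disconnects cluster on one service, the "Terminating connection" lines can be grouped by the loop that reported them and by the reason given. This is a minimal Python sketch under the assumption that each message sits on one unwrapped line as in the excerpt above; count_terminations and TERM_SAMPLE are illustrative names only.

```python
import re
from collections import Counter

# Matches e.g. "Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_...'"
TERMINATED = re.compile(
    r"Terminating connection - '(?P<loop>\w+):\s*(?P<call>\S+) - (?P<reason>[^']+)'"
)


def count_terminations(lines):
    """Count terminated connections per (service loop, reason) pair."""
    counts = Counter()
    for line in lines:
        match = TERMINATED.search(line)
        if match:
            counts[(match.group("loop"), match.group("reason"))] += 1
    return counts


# Lines copied from the log excerpts posted in this thread
TERM_SAMPLE = [
    "  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'",
    "  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'",
    "  Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'",
    "  Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'",
]

if __name__ == "__main__":
    for (loop, reason), count in count_terminations(TERM_SAMPLE).most_common():
        print(count, loop, reason)
```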

MJ


Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
On Thu, 2017-10-12 at 09:17 +0200, mj via samba wrote:

> Hi all, James,
>
> After following James' suggestions fixing the several dbcheck errors,
> and having observed things for a few days, I'd like to update this
> issue, and hope for some new input again. :-)
>
> Summary: three DCs, all three running Version
> 4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
> no errors, except for two (supposedly innocent) dangling forward links
> that I'm ignoring for now. Time is synced. Very basic smb.conf, posted
> earlier, can post again if needed.
>
> samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in
> sync, and also samba-tool drs showrepl shows that replication seems to
> be stable.
>
> The "getting stuck" from the subject line has not occurred for a few
> days, perhaps the dbcheck fixes have solved that, or perhaps we've just
> been lucky.
>
> All in all this appears pretty healthy, but there is a remaining problem:
>
> At ANY given time, ONE RANDOM single DC shows high cpu usage on one
> samba process. And on that DC (can be any of the three DCs) the logs
> fill up with this:

I would upgrade to Samba 4.7.  The work on locking in LDB and the
mention of replication issues was serious.  Likewise we fixed a number
of other issues in over-replication of linked attributes (group
memberships) for 4.6.

We are carefully following the reports here, but we do expect
replication should be much more stable with Samba 4.7.

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba



Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
Hi Andrew,

Thanks for chiming in!

On 10/14/2017 12:16 PM, Andrew Bartlett via samba wrote:
> We are carefully following the reports here, but we do expect
> replication should be much more stable with Samba 4.7.

OK, that's interesting, because I actually wanted to upgrade ASAP, but
(the few) 4.7-upgrade experiences that have been posted, are mostly
about replication issues after having upgraded:

See https://lists.samba.org/archive/samba/2017-October/thread.html

Have people here generally upgraded to 4.7 already? Without major
issues? (does that explain the lack of discussion on 4.7?)

Or are people mostly waiting until a version 4.7.1 or .2 has been released?

MJ


Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
On Sat, 2017-10-14 at 12:52 +0200, mj wrote:

> Hi Andrew,
>
> Thanks for chiming in!
>
> On 10/14/2017 12:16 PM, Andrew Bartlett via samba wrote:
> > We are carefully following the reports here, but we do expect
> > replication should be much more stable with Samba 4.7.
>
> OK, that's interesting, because I actually wanted to upgrade ASAP, but
> (the few) 4.7-upgrade experiences that have been posted, are mostly
> about replication issues after having upgraded:
>
> See https://lists.samba.org/archive/samba/2017-October/thread.html
>
> Have people here generally upgraded to 4.7 already? Without major
> issues? (does that explain the lack of discussion on 4.7?)
>
> Or are people mostly waiting until a version 4.7.1 or .2 has been released?

I remain ever thankful to those who are willing to test new Samba
versions, particularly Samba release candidates.

Regarding replication issues:
 - Newer Samba versions are becoming much more strict.  While this may
feel like a problem, it is about failing fast and in an obvious way,
rather than allowing latent issues to remain undetected.

 - dbcheck all the things

 - Joining a new DC rather than upgrading in place is a safer option if
you fear replication issues
 
 - You can test if Samba can join an existing domain by running:

samba-tool drs clone-dc-database

I do realise that some of the reported issues happen after the upgrade.
This is a worry, but I think it represents latent DB issues.

Finally, while taking care for user confidentiality (staff/student
names, unicodePwd and supplementalCredentials values etc) we do need to
see logs, typically less than the last 100 lines, for failures at a
higher log level (keep turning it up until meaningful info appears).
We can't really guess very much from errors like this from another
user in another thread:

   [2017/09/29 10:26:15.502219,  0] ../source4/dsdb/repl/drepl_out_helpers.c:959(dreplsrv_op_pull_source_apply_changes_trigger)
       Failed to commit objects: WERR_GEN_FAILURE/NT_STATUS_INVALID_NETWORK_RESPONSE
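
For anyone preparing such an excerpt, a small script can take the tail of the log and mask the most obvious identifying values before posting. This is a sketch in Python under stated assumptions, not a complete anonymiser: it only masks CN= values in DNs and IPv4 addresses, redact_tail and the 100-line default are illustrative choices, and things like principal names, domain components or attribute values still need a manual check.

```python
import re
from collections import deque

# A CN= component of a DN, up to the next comma or closing bracket
CN_VALUE = re.compile(r"\bCN=[^,)\]]+")
# A dotted-quad IPv4 address
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def redact_tail(lines, keep=100):
    """Return the last `keep` lines with CN= values and IPv4 addresses masked."""
    tail = deque(lines, maxlen=keep)  # only the last `keep` lines are retained
    redacted = []
    for line in tail:
        line = CN_VALUE.sub("CN=<redacted>", line)
        line = IPV4.sub("<ip>", line)
        redacted.append(line)
    return redacted


if __name__ == "__main__":
    sample = [
        "  ... older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)",
        "  Kerberos: TGS-REQ from ipv4:192.168.89.2:40725 for ldap",
    ]
    for line in redact_tail(sample):
        print(line)
```

Masking every CN= component is deliberately blunt: it also hides harmless containers such as CN=Users, which seemed a safer default than trying to guess which names are sensitive.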

Sorry,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba



Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
Dear Andrew,

Your explanation makes a lot of sense, thank you. Also for the
clone-dc-database tip.

Really no need to end your email with "Sorry"; your presence and advice
here are very much appreciated, and as I've said before: Thank you for
your hard work on making samba what it is today!

MJ

> samba-tool drs clone-dc-database
>
> I do realise that some of the reported issues happen after the upgrade.
>   This is a worry, but I think represents latent DB issues.
>
> Finally, while taking care for user confidentiality (staff/student
> names, unicodePwd and supplementalCredentials values etc) we do need to
> see logs, typically less than the last 100 lines, for failures at a
> higher log level (keep turning it up until meaningful info appears).
> We can't really guess very much from errors like this from another
> user in another thread:
>
>     [2017/09/29 10:26:15.502219,  0]
> ../source4/dsdb/repl/drepl_out_helpers.c:959(dreplsrv_op_pull_source_ap
> ply_changes_trigger)
>         Failed to commit objects:
> WERR_GEN_FAILURE/NT_STATUS_INVALID_NETWORK_RESPONSE
>
> Sorry,
>
> Andrew Bartlett
>


Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list

On Sat, October 14, 2017 14:33, Andrew Bartlett wrote:

>
> I remain ever thankful to those who are willing to test new Samba
> versions, particularly Samba release candidates.
>

We have a FreeBSD-11 BHyve virtual machine spun up and awaiting a
working SAMBA past v4.3+ to trial.  So whenever this port gets to the
point of requiring FreeBSD testers just drop me a note.


--
***          e-Mail is NOT a SECURE channel          ***
        Do NOT transmit sensitive data via e-Mail
 Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:[hidden email]
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3



Re: samba getting stuck, highwatermark replication issue?

Samba - General mailing list
On Mon, 2017-10-16 at 16:14 -0400, James B. Byrne wrote:

> On Sat, October 14, 2017 14:33, Andrew Bartlett wrote:
>
> >
> > I remain ever thankful to those who are willing to test new Samba
> > versions, particularly Samba release candidates.
> >
>
> We have a FreeBSD-11 BHyve virtual machine spun up and awaiting a
> working SAMBA past v4.3+ to trial.  So whenever this port gets to the
> point of requiring FreeBSD testers just drop me a note.

Thanks.  FreeBSD is a bit stuck due to issues around extended
attributes, and further around lack of support for ZFS.  Both are not
impossible to solve, but they will need time.

Thanks,

Andrew Bartlett

--
Andrew Bartlett                             https://samba.org/~abartlet/
Authentication Developer, Samba Team        https://samba.org
Samba Development and Support, Catalyst IT  https://catalyst.net.nz/services/samba




