Quantcast

Tracking down random server drops...

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Tracking down random server drops...

Josh Coombs
I've been fighting with a samba install on top of a amd64 Ubuntu 10.4 system
for a couple months now.  I have a small flotilla of Windows systems running
anything from Win 2000 up through Windows Server 2008.  All save for the
2008 systems are setup with nightly NTBackup jobs with my samba server as
the destination.  The 2008 systems are running MS SQL and use the samba
server as the destination for their scheduled DB dumps.

Samba is setup with a single share, and a single role login/password stored
locally.

99% of the time this system has worked well.  Randomly at night though
anywhere from one to all systems will fail their backup jobs at the same
time.  The failure window rolls around between 11pm and 2am.  There isn't a
pattern to the machines that fail, and more annoyingly, most of the time
only a few fail while others backing up at the same time show no issues.  I
can't find any cron jobs, system processes or network spikes to blame the
problem on.

So far I've done the following trying to stabilize this system:
- Upgraded to a mainline kernel due to a rather big bug in Ubuntu 10.4 that
would cause samba to flake out over the course of a week, requiring a reboot
to resolve.  (Launchpad bugid 474089)
- Turned up samba's logging (lvl 2) to see what it reports
- Locked samba to port 139, used iptables to drop all traffic to port 445
- Verified there are no limits being hit (nofiles was at 1024 for awhile,
took a bit to figure out upstart's syntax for raising that limit...)
- Upgraded from 3.4.2 to 3.5.4 using Maverick's packages
- Tuned kernel settings for gigabit ethernet
- Tuned samba's settings based on various online discussions

The problems still persist.  Looking at last night's failure I found the
following log messages:
log.smbd
[2010/08/25 00:37:15.551263,  1] smbd/server.c:267(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
[2010/08/25 00:37:35.551964,  1] smbd/server.c:240(cleanup_timeout_fn)
  Cleaning up brl and lock database after unclean shutdown

Those are the only messages logged for last night, and line up exactly with
the backup failures.  The smbd process was last restarted on 8/23 at 13:51.
 The night before I had a couple systems fail also and had the following log
entries:
[2010/08/24 00:49:59.265666,  1] smbd/server.c:267(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
[2010/08/24 00:50:19.281235,  1] smbd/server.c:240(cleanup_timeout_fn)
  Cleaning up brl and lock database after unclean shutdown
[2010/08/24 00:50:19.281454,  2] lib/messages_local.c:289(message_notify)
  message to process 1828 failed - No such process
[2010/08/24 00:50:19.281475,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1828 doesn't exist - deleting messages record
[2010/08/24 00:50:19.281488,  2] lib/messages.c:127(traverse_fn)
  pid 1828 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411331,  2] lib/messages_local.c:289(message_notify)
  message to process 1838 failed - No such process
[2010/08/24 00:50:19.411373,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1838 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411394,  2] lib/messages.c:127(traverse_fn)
  pid 1838 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411431,  2] lib/messages_local.c:289(message_notify)
  message to process 1712 failed - No such process
[2010/08/24 00:50:19.411443,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1712 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411467,  2] lib/messages.c:127(traverse_fn)
  pid 1712 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411507,  2] lib/messages_local.c:289(message_notify)
  message to process 999 failed - No such process
[2010/08/24 00:50:19.411519,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 999 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411531,  2] lib/messages.c:127(traverse_fn)
  pid 999 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411583,  2] lib/messages_local.c:289(message_notify)
  message to process 1848 failed - No such process
[2010/08/24 00:50:19.411598,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1848 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411626,  2] lib/messages.c:127(traverse_fn)
  pid 1848 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411707,  2] lib/messages_local.c:289(message_notify)
  message to process 1713 failed - No such process
[2010/08/24 00:50:19.411721,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1713 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411743,  2] lib/messages.c:127(traverse_fn)
  pid 1713 doesn't exist - deleting connections -1 []
[2010/08/24 00:50:19.411794,  2] lib/messages_local.c:289(message_notify)
  message to process 1892 failed - No such process
[2010/08/24 00:50:19.411823,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 1892 doesn't exist - deleting messages record
[2010/08/24 00:50:19.411837,  2] lib/messages.c:127(traverse_fn)
  pid 1892 doesn't exist - deleting connections -1 []

Looking at one of the machine specific logs for that window I saw:
[2010/08/24 00:49:35.090722,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old
resources.
[2010/08/24 00:49:35.098893,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old
resources.
[2010/08/24 00:49:35.273094,  2] auth/auth.c:304(check_ntlm_password)
  check_ntlm_password:  authentication for user [backup] -> [backup] ->
[backup]
 succeeded
[2010/08/24 00:49:35.294169,  1] smbd/service.c:1070(make_connection_snum)
  ravenclaw (192.168.1.8) connect to service backups initially as user
backup (u
id=34, gid=34) (pid 17741)
[2010/08/24 00:49:35.332337,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/24 00:49:35.333801,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:35.334524,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=1)
[2010/08/24 00:49:35.335791,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:35.336507,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/24 00:49:35.337803,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:35.338367,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=1)
[2010/08/24 00:49:35.339624,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:35.768280,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/24 00:49:35.769585,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:35.770204,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=1)
[2010/08/24 00:49:35.771495,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/24 00:49:59.104971,  0] lib/util_sock.c:675(write_data)
[2010/08/24 00:49:59.105025,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Connection
reset
 by peer
[2010/08/24 00:49:59.167351,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/24 00:49:59.167981,  0] lib/util_sock.c:675(write_data)
[2010/08/24 00:49:59.167995,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/24 00:49:59.168006,  0] smbd/process.c:79(srv_send_smb)
  Error writing 55 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/24 00:49:59.168172,  0] lib/util_sock.c:675(write_data)
[2010/08/24 00:49:59.168184,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/24 00:49:59.168195,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/24 00:49:59.257943,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0)
NT_STAT
US_OK
[2010/08/24 00:49:59.257981,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups
[2010/08/24 00:49:59.258013,  2] smbd/open.c:631(open_file)
  backup opened file Windows Servers/Ravenclaw/RAVENCLAW.bkf read=Yes
write=Yes
(numopen=1)
[2010/08/24 00:49:59.260011,  2] smbd/open.c:631(open_file)
  backup opened file Windows Servers/Ravenclaw/RAVENCLAW.bkf read=No
write=No (n
umopen=2)
[2010/08/24 00:49:59.261185,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=1)
NT_STAT
US_OK
[2010/08/24 00:49:59.261888,  2] smbd/open.c:631(open_file)
  backup opened file Windows Servers/Ravenclaw/RAVENCLAW.bkf read=No
write=No (n
umopen=2)
[2010/08/24 00:49:59.263221,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=1)
NT_STAT
US_OK
[2010/08/24 00:49:59.894074,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=2)
[2010/08/24 00:49:59.895333,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:49:59.895893,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=2)
[2010/08/24 00:49:59.897157,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:49:59.897748,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=2)
[2010/08/24 00:49:59.899061,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:49:59.899632,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=2)
[2010/08/24 00:49:59.900945,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:50:00.092755,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=2)
[2010/08/24 00:50:00.094545,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:50:00.095506,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=2)
[2010/08/24 00:50:00.097609,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:50:00.098641,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=2)
[2010/08/24 00:50:00.100695,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:50:00.101654,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=No write=No (numopen=2)
[2010/08/24 00:50:00.103585,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=1) NT_STATUS_OK
[2010/08/24 00:50:01.508152,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0)
NT_STAT
US_OK
[2010/08/24 00:50:03.032069,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups

That system started it's backup run at 8pm.  Based on the above I found a
few conversations online of others with the same symptom set and the
suggestion to use iptables to drop port 445 in addition to disabling it in
samba's conf.

I've got the same logs for last night's run so obviously the iptables tweak
didn't help.

I'm at my wits end trying to sort this, anyone have any suggestions?

My conf for good measure:
[global]
   domain master = no
   local master = no
   lock directory = /var/run/samba
   smb ports = 139
   workgroup = WORKGROUP
   server string = %h server (Samba, Ubuntu)
   wins support = no
   dns proxy = no
   log file = /var/log/samba/log.%m
   log level = 2
   max log size = 1000
   syslog = 0
   panic action = /usr/share/samba/panic-action %d
   security = user
   encrypt passwords = true
   passdb backend = tdbsam
   obey pam restrictions = yes
   unix password sync = yes
   passwd program = /usr/bin/passwd %u
   passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:*
%n\
n *password\supdated\ssuccessfully* .
   pam password change = yes
   map to guest = bad user
   socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536
SO_SNDBUF=65536
   strict sync = no
   sync always = no
   load printers = no
   use sendfile = yes
   write cache size = 262144
   max xmit = 65535
   fake oplocks = no
   oplocks = yes
#   max protocol = SMB2

[backups]
comment = Windows Backups
path = /mnt/Storage/Disk2/Biddeford/Tier-2
 browsable = yes
guest ok = no
read only = no
 create mask = 0755
valid users = backup
user = backup

Joshua Coombs
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

Josh Coombs
I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
not the cause the more I look at this.  Based on last night's failures I
have the following pattern:

At 9:24pm ntbackup on ravenclaw begins writing to the existing file
RAVECLAW.bkf, so far nothing logged at lvl 2 yet...

at 7:04am log.ravenclaw shows:
[2010/08/26 07:04:35.538218,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.576723,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.697538,  2] auth/auth.c:304(check_ntlm_password)
  check_ntlm_password:  authentication for user [backup] -> [backup] ->
[backup] succeeded
[2010/08/26 07:04:35.700101,  1] smbd/service.c:1070(make_connection_snum)
  ravenclaw (192.168.1.8) connect to service backups initially as user
backup (uid=34, gid=34) (pid 30350)
[2010/08/26 07:04:35.701039,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/26 07:04:35.702556,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK

The open and closed chatter for that file repeats until 7:06:47am...
[2010/08/26 07:06:36.254759,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/26 07:06:47.640491,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups
[2010/08/26 07:08:12.795026,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.795070,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Connection
reset by peer
[2010/08/26 07:08:12.814075,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.816315,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.816325,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.816333,  0] smbd/process.c:79(srv_send_smb)
  Error writing 55 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.817211,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817219,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817226,  2] smbd/process.c:2202(keepalive_fn)
  Keepalive failed - exiting.
[2010/08/26 07:08:12.817339,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817348,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817355,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.941965,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.941991,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942010,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942020,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942037,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942046,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942065,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942075,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942136,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0)
NT_STATUS_OK
[2010/08/26 07:08:12.942159,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups

This is followed by the cleanup messages in log.smbd:
[2010/08/26 07:08:13.139151,  1] smbd/server.c:267(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
[2010/08/26 07:08:33.152490,  1] smbd/server.c:240(cleanup_timeout_fn)
  Cleaning up brl and lock database after unclean shutdown

Ravenclaw reports the backup failed at 7:06am, error writing.  So something
happened between Samba and the 2k3 box at 7:04am that triggered the new
session, but I don't have anything to indicate a reason.  I have eventlog
entries on Ravenclaw from after the fact noting Delayed Write Failed, but
nothing leading up to the problem.

Josh C
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

Josh Coombs
On Thu, Aug 26, 2010 at 10:56 AM, Josh Coombs <[hidden email]> wrote:

> I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
> not the cause the more I look at this.  Based on last night's failures I
> have the following pattern:


So, log level 3 is... intense.  I have a log snap from a crash but can't
pick out anything, is there a preferred spot to post logs for review?

Also, I have a panic action script defined in my conf, gdb installed, and
have verified it can mail out but I'm not getting anything on these crashes?

Josh C
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

John Drescher
> So, log level 3 is... intense.  I have a log snap from a crash but can't
> pick out anything, is there a preferred spot to post logs for review?
>

I am not sure about this list but other open source lists prefer pastebin.

John
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

Josh Coombs
In reply to this post by Josh Coombs
On Fri, Aug 27, 2010 at 9:49 AM, Josh Coombs <[hidden email]> wrote:

> On Thu, Aug 26, 2010 at 10:56 AM, Josh Coombs <[hidden email]>wrote:
>
>> I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
>> not the cause the more I look at this.  Based on last night's failures I
>> have the following pattern:
>
>
> So, log level 3 is... intense.  I have a log snap from a crash but can't
> pick out anything, is there a preferred spot to post logs for review?
>
> Also, I have a panic action script defined in my conf, gdb installed, and
> have verified it can mail out but I'm not getting anything on these crashes?
>
> Josh C
>

Here is the portion of the log where the crash seems to occur, nothing jumps
out at me though.

http://pastebin.com/GspCcCdp

Josh C
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

grant little


On Aug 27, 2010, at 7:59 AM, Josh Coombs <[hidden email]> wrote:

> On Fri, Aug 27, 2010 at 9:49 AM, Josh Coombs <[hidden email]> wrote:
>
>> On Thu, Aug 26, 2010 at 10:56 AM, Josh Coombs <[hidden email]>wrote:
>>
>>> I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
>>> not the cause the more I look at this.  Based on last night's failures I
>>> have the following pattern:
>>
>>
>> So, log level 3 is... intense.  I have a log snap from a crash but can't
>> pick out anything, is there a preferred spot to post logs for review?
>>
>> Also, I have a panic action script defined in my conf, gdb installed, and
>> have verified it can mail out but I'm not getting anything on these crashes?
>>
>> Josh C
>>
>
> Here is the portion of the log where the crash seems to occur, nothing jumps
> out at me though.
>
> http://pastebin.com/GspCcCdp
>
> Josh C
> --
> To unsubscribe from this list go to the following URL and read the
> instructions:  https://lists.samba.org/mailman/options/samba

What jumps out at me is the 21GB file at the crash point...
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Tracking down random server drops...

Josh Coombs
On Fri, Aug 27, 2010 at 11:04 AM, Grant <[hidden email]> wrote:

>
>
> On Aug 27, 2010, at 7:59 AM, Josh Coombs <[hidden email]> wrote:
>
> > On Fri, Aug 27, 2010 at 9:49 AM, Josh Coombs <[hidden email]>
> wrote:
> >
> >> On Thu, Aug 26, 2010 at 10:56 AM, Josh Coombs <[hidden email]
> >wrote:
> >>
> >>> I'm turning my logging up to lvl 3, it looks like I'm seeing the
> symptom,
> >>> not the cause the more I look at this.  Based on last night's failures
> I
> >>> have the following pattern:
> >>
> >>
> >> So, log level 3 is... intense.  I have a log snap from a crash but can't
> >> pick out anything, is there a preferred spot to post logs for review?
> >>
> >> Also, I have a panic action script defined in my conf, gdb installed,
> and
> >> have verified it can mail out but I'm not getting anything on these
> crashes?
> >>
> >> Josh C
> >>
> >
> > Here is the portion of the log where the crash seems to occur, nothing
> jumps
> > out at me though.
> >
> > http://pastebin.com/GspCcCdp
> >
> > Josh C
> > --
> > To unsubscribe from this list go to the following URL and read the
> > instructions:  https://lists.samba.org/mailman/options/samba
>
> What jumps out at me is the 21GB file at the crash point...


That's normal, the nightly dumps of that box run a bit over 23GB.  At that
time of night it should be close to finishing.  The biggest dump we get is
240GB each night.
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba
Loading...