rsync hangs on select() system call

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

rsync hangs on select() system call

thomas veymont
hello,

trying to understand why a rsync client hangs during a transfer. Hopefully someone can advise.

client =  rsync 3.0.6, server = rsync 3.1.1 , transfer is done to a rsync module with such a command:
rsync -rtxvvv my_data [hidden email]

the client starts to transfer data (a few Mb) then hangs, then manually interrupted (CTRL+C)
a few minutes after. strace output on the client side is:

(...)
4251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 1 (out [3], left {59, 999998})
14251 17:14:30 write(3, "\374\17\0\7", 4) = 4
14251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 1 (out [3], left {59, 999998})
14251 17:14:30 write(3, "\337\364(\374\3567\t\361=\373\25@\0'+\00 (...)
14251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:15:30 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:16:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:17:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:18:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:19:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:20:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:21:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:22:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:23:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:24:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:25:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:26:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:27:31 select(4, NULL, [3], [3], {60, 0}) = ? ERESTARTNOHAND (To be restarted)
14251 17:28:10 --- SIGINT (Interrupt) @ 0 (0) ---
4251 17:28:10 select(0, NULL, NULL, NULL, {0, 400000}) = 0 (Timeout)
14251 17:28:10 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x371a432920}, NULL, 8) = 0
14251 17:28:10 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x371a432920}, NULL, 8) = 0
14251 17:28:10 write(2, "rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(544) [sender=3.0.6]", 89) = 89
14251 17:28:10 write(2, "\n", 1)        = 1
14251 17:28:10 write(1, "_exit_cleanup(code=20, file=rsync.c, line=544): about to call exit(20)\n", 71) = 71
14251 17:28:10 exit_group(20)           = ?

Send-Q ont the client side looks like:

Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp       41  62844 132.xxx.xxx.xx:41397        152.xx.xx.xx:873           ESTABLISHED 3387/rsync

On the server side, rsync daemon log says :

Feb 25 17:14:25 resif-vm13 rsyncd[2403]: [generator] receiving flist for dir 10
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_name(sds_extract/2015/RD/LOR/BHN.D/RD.LOR..BHN.D.2015.055)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: received 1 names
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_list done
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_generator(sds_extract/2015/RD/LOR/BHN.D,19)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: set modtime of sds_extract/2015/RD/LOR/BHN.D to (1424884220) Wed Feb 25 17:10:20 2015
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: sds_extract/2015/RD/LOR/BHN.D/
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_generator(sds_extract/2015/RD/LOR/BHN.D/RD.LOR..BHN.D.2015.055,20)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: [generator] receiving flist for dir 11
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_name(sds_extract/2015/RD/LOR/BHZ.D/RD.LOR..BHZ.D.2015.055)
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: rsync: connection unexpectedly closed (12005 bytes received so far) [generator]
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.1]
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: [generator] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(12)

please note : clocks on server and client are maybe not exactly the same. I don't have easy console access to client (remote customer's).
Does someone have an idea of why it hangs ? Seems like the client side is hanging on a select() call, which would mean
that one the file selected for transfer is locked for reading (true/false?) ?

any suggestion welcome
Tom

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Reply | Threaded
Open this post in threaded view
|

Re: rsync hangs on select() system call

thomas veymont
hello ,
any suggestion about this problem ?
thanks

2015-02-27 15:46 GMT+01:00 thomas veymont <[hidden email]>:
hello,

trying to understand why a rsync client hangs during a transfer. Hopefully someone can advise.

client =  rsync 3.0.6, server = rsync 3.1.1 , transfer is done to a rsync module with such a command:
rsync -rtxvvv my_data [hidden email]

the client starts to transfer data (a few Mb) then hangs, then manually interrupted (CTRL+C)
a few minutes after. strace output on the client side is:

(...)
4251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 1 (out [3], left {59, 999998})
14251 17:14:30 write(3, "\374\17\0\7", 4) = 4
14251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 1 (out [3], left {59, 999998})
14251 17:14:30 write(3, "\337\364(\374\3567\t\361=\373\25@\0'+\00 (...)
14251 17:14:30 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:15:30 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:16:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:17:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:18:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:19:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:20:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:21:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:22:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:23:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:24:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:25:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:26:31 select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)
14251 17:27:31 select(4, NULL, [3], [3], {60, 0}) = ? ERESTARTNOHAND (To be restarted)
14251 17:28:10 --- SIGINT (Interrupt) @ 0 (0) ---
4251 17:28:10 select(0, NULL, NULL, NULL, {0, 400000}) = 0 (Timeout)
14251 17:28:10 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x371a432920}, NULL, 8) = 0
14251 17:28:10 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x371a432920}, NULL, 8) = 0
14251 17:28:10 write(2, "rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(544) [sender=3.0.6]", 89) = 89
14251 17:28:10 write(2, "\n", 1)        = 1
14251 17:28:10 write(1, "_exit_cleanup(code=20, file=rsync.c, line=544): about to call exit(20)\n", 71) = 71
14251 17:28:10 exit_group(20)           = ?

Send-Q ont the client side looks like:

Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp       41  62844 132.xxx.xxx.xx:41397        152.xx.xx.xx:873           ESTABLISHED 3387/rsync

On the server side, rsync daemon log says :

Feb 25 17:14:25 resif-vm13 rsyncd[2403]: [generator] receiving flist for dir 10
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_name(sds_extract/2015/RD/LOR/BHN.D/RD.LOR..BHN.D.2015.055)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: received 1 names
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_list done
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_generator(sds_extract/2015/RD/LOR/BHN.D,19)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: set modtime of sds_extract/2015/RD/LOR/BHN.D to (1424884220) Wed Feb 25 17:10:20 2015
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: sds_extract/2015/RD/LOR/BHN.D/
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_generator(sds_extract/2015/RD/LOR/BHN.D/RD.LOR..BHN.D.2015.055,20)
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: [generator] receiving flist for dir 11
Feb 25 17:14:25 resif-vm13 rsyncd[2403]: recv_file_name(sds_extract/2015/RD/LOR/BHZ.D/RD.LOR..BHZ.D.2015.055)
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: rsync: connection unexpectedly closed (12005 bytes received so far) [generator]
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.1]
Feb 25 17:35:07 resif-vm13 rsyncd[2403]: [generator] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(12)

please note : clocks on server and client are maybe not exactly the same. I don't have easy console access to client (remote customer's).
Does someone have an idea of why it hangs ? Seems like the client side is hanging on a select() call, which would mean
that one the file selected for transfer is locked for reading (true/false?) ?

any suggestion welcome
Tom


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Reply | Threaded
Open this post in threaded view
|

Re: rsync hangs on select() system call

Chris-6
Hi Thomas (and the list members),

I have exactly the same problem. It's very very annoying as it breaks
backups. Did you resolve it ?

Regards,

Chris
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Reply | Threaded
Open this post in threaded view
|

Re: rsync hangs on select() system call

Chris-6
Le 02/05/2015 01:09, Chris a écrit :
> Hi Thomas (and the list members),
>
> I have exactly the same problem. It's very very annoying as it breaks
> backups. Did you resolve it ?

I updated on both sides to rsync 3.1.1 compiled manually, and now I get
one more line just before the select and the hang :

gettimeofday({1430522930, 618188}, NULL) = 0
select(4, [], [3], [], {60, 0}

I just ntpdated on both side just before the rsync and verified the time
is identical on both sides.

Any idea ?

Regards,

Chris
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html