Discussion:
Rsync connection times out on very large --files-from
Eli
2005-03-08 17:35:07 UTC
I've been trying to rsync a very large list of files specified by
--files-from to an rsync daemon and couldn't really figure out why it wasn't
working, but I think I finally understand the error.

Unfortunately I don't have the error message to paste here for you all to see,
since it takes several hours to reproduce (and I'm working on a workaround to
rsync the files, so I don't have time to try to capture it again) - but if
it's necessary to see it, I can reply with it on demand.

Anyway, the file specified in --files-from is about 500 MB in size and
contains several million files. It takes rsync a few hours to parse
through all the files in the list. When it comes time for rsync to start
transmitting file data to the rsync server, it gets an error that the
connection timed out, and it couldn't transmit any data.
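
For what it's worth, the invocation is along these lines - the host, module
name and paths here are stand-ins for the real ones:

  rsync -a --files-from=/tmp/filelist.txt / rsync://backuphost/backup/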

I'm assuming this is just a standard 60-second TCP connection timeout
problem, but I see no way to enable keepalives or anything else on the
connection to prevent it from dropping. Any ideas? Is this really
possible?

Eli.
Eli
2005-03-11 05:33:49 UTC
Post by Eli
I'm assuming this is just a standard 60-second TCP connection
timeout problem, but I see no way to enable keepalives or
anything else on the connection to prevent it from dropping. Any
ideas? Is this really possible?
I missed the ability to set socket options at the daemon level in
rsyncd.conf. Setting "socket options = SO_KEEPALIVE=1" in the daemon config
file fixes the timeouts (duh!) :)
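
In case it helps anyone else, the relevant part of my rsyncd.conf now looks
something like this (the module name and path are made up for the example):

  # global section: enable TCP keepalives on the daemon's sockets
  socket options = SO_KEEPALIVE=1

  [backup]
      path = /srv/backup
      read only = false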

Eli.
Wayne Davison
2005-03-11 16:35:30 UTC
Post by Eli
Anyway, the file specified in --files-from is about 500 MB in size and
contains several million files. It takes rsync a few hours to parse
through all the files in the list. When it comes time for rsync to
start transmitting file data to the rsync server, it gets an error
that the connection timed out, and it couldn't transmit any data.
Since rsync is keeping the socket busy while the file list is being built
(it is transmitting the list during that time), I guess the sorting/cleaning
of the list is taking long enough that the socket timed out on you.
I'm glad that setting SO_KEEPALIVE did something to help fix your
problem, but I'm having a hard time seeing why: the code should already
set that for all daemon socket connections (excluding only daemon-over-ssh).
Are you using the --timeout=N option to rsync (which is an internal-to-rsync
timeout value, unrelated to the TCP timeout)?
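
For clarity, that option goes on the client command line, e.g. (the value
and paths are just illustrative):

  # --timeout is rsync's own I/O timeout: if no data moves for N seconds,
  # rsync gives up.  It is separate from any TCP-level timeout.
  rsync -a --timeout=3600 --files-from=/tmp/filelist.txt \
      / rsync://backuphost/backup/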

..wayne..
Eli
2005-03-11 16:57:16 UTC
Post by Wayne Davison
Since rsync is keeping the socket busy while the file list
is being built (it is transmitting the list during that time), I
guess the sorting/cleaning of the list is taking long enough
that the socket timed out on you.
Interesting, I didn't know rsync would transmit the list to the rsync server
as it processes it. The list contains just over 1 million files and takes
find several hours to generate, but that's irrelevant since rsync isn't
called until after the list is generated. It does take rsync a while to
parse the list of files before it gets to sending file data.
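
(For context, the list is generated ahead of time with a find command
roughly like the one below - the directories are placeholders; the real run
covers far more:)

  # run from the source root so the paths come out relative to it,
  # which is how --files-from interprets them against the source argument
  cd / && find home srv -type f > /tmp/filelist.txt
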
Post by Wayne Davison
I'm glad that setting SO_KEEPALIVE did something to help fix your
problem, but I'm having a hard time seeing why: the code should
already set that for all daemon socket connections
(excluding only daemon-over-ssh).
Are you using the --timeout=N option to rsync (which is an
internal-to-rsync timeout value, unrelated to the TCP timeout)?
No, I'm not using any --timeout setting on the client or the server. There
could be another reason why the rsync worked this time... I needed to back
up the files regardless, so I wrote a PHP parser script to split the huge
list of files into groups based on parent folder (one list of files per
parent), which resulted in 8109 separate lists. I then used a for loop in
the shell to iterate through each of those 8109 lists and run rsync on them.
Doing this let the rsync processes avoid timing out their connections, and
they copied all their data (after MANY hours - it was close to 200 GB of
data, and I didn't use -z). Once this data was copied to the rsync server,
I *THEN* tested the SO_KEEPALIVE setting in rsyncd.conf for subsequent runs
of the original command (not splitting up the list of files this time). The
rsync process then did not error out and was able to update the files fine.
This could very well be because rsync took far less time to do its task,
since it just had to compare files on the rsync server and not transmit as
many files (an update compared to a first-time backup run).
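
The shell loop was essentially the following (the list directory, host and
module are placeholders - the real lists are named after their parent
folders):

  # one rsync run per per-parent file list, so no single transfer runs
  # long enough to hit the connection timeout
  for list in /tmp/filelists/*.txt; do
      rsync -a --files-from="$list" / rsync://backuphost/backup/
  done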

I don't have any debugger programs installed on the client system, so I
can't run rsync in a debugger (not to mention the output would probably run
my system out of space!), but let me know if there are any tests you want me
to do. Shall I delete all the files on the rsync server and retry the rsync
command to see if it still times out with SO_KEEPALIVE enabled?

Eli.
