rsync to a Samba/CIFS filesystem hangs
Robert Gasch
2005-07-14 07:48:55 UTC
Hi,

I'm using rsync to backup a Linux Mandrake 10.1 (kernel 2.6.10) ext3
filesystem (+- 5GB of content, lots of little files) to a CIFS
filesystem mounted with samba 3.0.10. The exact invocation of rsync
is:

/usr/local/bin/rsync -v -a --copy-links --delete /var/www /mnt/backup/backup_www

Using the system provided rsync 2.6.3 and a self-compiled 2.6.5 this
process runs for a while and then simply hangs. What's even worse,
when I try to kill the job, the process becomes owned by pid 1, can't
be killed anymore and thus the memory it holds doesn't get released
anymore (forcing me to reboot the machine about once a month or so).

Does anybody have any ideas as to what could be causing this?

Greetings/Thanks
---> R
Paul Slootman
2005-07-14 09:53:38 UTC
Post by Robert Gasch
I'm using rsync to backup a Linux Mandrake 10.1 (kernel 2.6.10) ext3
filesystem (+- 5GB of content, lots of little files) to a CIFS
filesystem mounted with samba 3.0.10. The exact invocation of rsync
The CIFS filesystem is mounted on the linux system? Then samba doesn't
really enter the picture. On what system is the CIFS filesystem located?
Could you show the output of 'mount'?
Post by Robert Gasch
/usr/local/bin/rsync -v -a --copy-links --delete /var/www /mnt/backup/backup_www
Using the system provided rsync 2.6.3 and a self-compiled 2.6.5 this
process runs for a while and then simply hangs. What's even worse,
Did you try using strace? Does lsof -p $pid show anything?
Post by Robert Gasch
when I try to kill the job, the process becomes owned by pid 1, can't
be killed anymore and thus the memory it holds doesn't get released
anymore (forcing me to reboot the machine about once a month or so).
It becomes a zombie process. It should be reaped by init (pid 1),
however for some reason init doesn't seem to be doing its work...
A zombie process only holds an entry in the process table so that it can
return its exit status to its parent (who hasn't waited for its child
yet). It shouldn't take up any memory or other resources...
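Whether the stuck rsync really is a zombie can be checked from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names process_state and describe are made up for illustration. A zombie shows state Z, while a process blocked inside a filesystem call (for example on a hung network mount) typically shows D, and in that state signals, including SIGKILL, stay pending until the call returns.

```python
# Meanings of the state letters most relevant here (see proc(5)).
STATE_NOTES = {
    "R": "running",
    "S": "interruptible sleep, signals are delivered",
    "D": "uninterruptible sleep, usually blocked in I/O",
    "Z": "zombie, already exited and waiting to be reaped",
    "T": "stopped",
}


def process_state(status_text):
    """Return the one-letter state from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split()[1]
    raise ValueError("no 'State:' line found")


def describe(pid):
    """Read /proc/<pid>/status and return a short human-readable summary."""
    with open(f"/proc/{pid}/status") as f:
        state = process_state(f.read())
    return f"{pid}: {state} ({STATE_NOTES.get(state, 'other')})"
```

For the process in this thread, print(describe(16327)) would be the call to try while rsync is hung.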

The output from strace and lsof would be helpful.
However, my impression is that the CIFS filesystem is deadlocking
somewhere...


Paul Slootman
Robert Gasch
2005-07-15 05:13:01 UTC
Hi Paul,

thanks for your reply. Please see below for more info ...
Post by Paul Slootman
Post by Robert Gasch
I'm using rsync to backup a Linux Mandrake 10.1 (kernel 2.6.10) ext3
filesystem (+- 5GB of content, lots of little files) to a CIFS
filesystem mounted with samba 3.0.10. The exact invocation of rsync
The CIFS filesystem is mounted on the linux system? Then samba doesn't
really enter the picture. On what system is the CIFS filesystem located?
Could you show the output of 'mount'?
[***@www root]# mount
/dev/hda1 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
/dev/hda9 on /home type ext3 (rw)
...
//sv4bdosbs1/linuxbck$ on /mnt/backup type cifs (rw,mand)

The CIFS file system is located on a Windows 2003 Server machine and
mounted on the server using Samba 3.0.10.
Post by Paul Slootman
The output from strace and lsof would be helpful.
However, my impression is that the CIFS filesystem is deadlocking
somewhere...
[***@www root]# lsof -p 16327
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rsync 16327 root cwd DIR 0,14 0 64419 /mnt/backup/backup_www
rsync 16327 root rtd DIR 3,1 4096 2 /
rsync 16327 root txt REG 3,8 558667 3482979 /usr/local/bin/rsync
rsync 16327 root mem REG 3,1 35648 44022 /lib/libnss_files-2.3.3.so
rsync 16327 root mem REG 3,6 178476 411662 /usr/share/locale/ISO-8859-1/LC_CTYPE
rsync 16327 root mem REG 3,1 1165108 44042 /lib/tls/libc-2.3.3.so
rsync 16327 root mem REG 3,1 60804 44034 /lib/libresolv-2.3.3.so
rsync 16327 root mem REG 3,1 529609 43989 /lib/ld-2.3.3.so
rsync 16327 root 0u unix 0xe16fb680 8778239 socket
rsync 16327 root 2u CHR 136,2 4 /dev/pts/2
rsync 16327 root 4u unix 0xe16fb380 8778248 socket

[***@www root]# strace -p 16327
Process 16327 attached - interrupt to quit
select(1, [0], [], NULL, {27, 137000}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0} <unfinished ...>
Process 16327 detached

I'm not sure what to make of this output, but it indeed looks like
it's waiting for something which never happens. Any other pointers
would be much appreciated.

Greetings/Thanks
Robert
Paul Slootman
2005-07-15 10:39:55 UTC
Post by Robert Gasch
Post by Paul Slootman
The output from strace and lsof would be helpful.
However, my impression is that the CIFS filesystem is deadlocking
somewhere...
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rsync 16327 root cwd DIR 0,14 0 64419 /mnt/backup/backup_www
rsync 16327 root rtd DIR 3,1 4096 2 /
rsync 16327 root txt REG 3,8 558667 3482979 /usr/local/bin/rsync
rsync 16327 root mem REG 3,1 35648 44022 /lib/libnss_files-2.3.3.so
rsync 16327 root mem REG 3,6 178476 411662 /usr/share/locale/ISO-8859-1/LC_CTYPE
rsync 16327 root mem REG 3,1 1165108 44042 /lib/tls/libc-2.3.3.so
rsync 16327 root mem REG 3,1 60804 44034 /lib/libresolv-2.3.3.so
rsync 16327 root mem REG 3,1 529609 43989 /lib/ld-2.3.3.so
rsync 16327 root 0u unix 0xe16fb680 8778239 socket
rsync 16327 root 2u CHR 136,2 4 /dev/pts/2
rsync 16327 root 4u unix 0xe16fb380 8778248 socket
OK, this rsync process doesn't have any files open...
However, I expect that there should be a second rsync process as well?
Post by Robert Gasch
Process 16327 attached - interrupt to quit
select(1, [0], [], NULL, {27, 137000}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0} <unfinished ...>
Process 16327 detached
This rsync process is waiting for data to come in on that socket, which
most probably should be supplied by the other rsync process. Hence,
could you repeat the exercise, but then for all rsync processes?
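
To repeat the exercise for every rsync process, the PIDs can be collected from /proc. This is only a sketch, assuming Linux; parse_stat and find_processes are hypothetical helper names, and the proc_root parameter exists just so the function can be pointed at a test directory. It also reports each parent PID, which shows whether a process has been reparented to init (pid 1).

```python
import os


def parse_stat(stat_text):
    """Parse the start of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so locate the outermost pair.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar])
    comm = stat_text[lpar + 1:rpar]
    fields = stat_text[rpar + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def find_processes(name, proc_root="/proc"):
    """Return sorted (pid, ppid) pairs for processes whose command is name."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, _state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the file
            # could not be read or parsed; skip it.
            continue
        if comm == name:
            found.append((pid, ppid))
    return sorted(found)
```

find_processes("rsync") returns the (pid, ppid) pairs; each pid can then be passed to lsof -p and strace -p as above.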


Paul Slootman
