Downloading a great number of files from different rsync servers for good load balancing and high efficiency.

Hongyi Zhao
Hi all,

I'm using Debian, and I want to make a local repository that lets me
install packages more conveniently.

Since rsync is the tool Debian officially recommends for syncing files
between its mirror sites, I use the rsync client to download the deb
packages from different rsync servers around the world, for good load
balancing and high efficiency.

The steps are as follows:

1- Build the list of packages to download from the Packages.gz files for
the chosen distribution and architecture.  For testing (code name jessie)
on the amd64 architecture, for example, the package list can be extracted
from the following files:

https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-all/Packages.gz

After downloading all of the above files, I use the following command to
extract the list of deb package filenames:

find /path/to/Packages.gz -type f -name Packages.gz -exec zcat \{\} + |
awk '/^Filename:/{ print $2  } ' > deb-file.list

At this point, the deb-file.list will contain a great number of lines
like the following:

----------
[snipped]
pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
pool/main/liba/libav/libswscale3_11.3-1_i386.deb
pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
[snipped]
----------
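
A possible refinement of step 1, offered only as a sketch: the per-architecture
and binary-all indices may name the same _all.deb more than once, so it can help
to de-duplicate the list before downloading.  The function name list_debs is
made up for illustration, and gzip -dc is used in place of zcat purely for
portability.

```shell
#!/usr/bin/env bash
# list_debs DIR: print the unique pool paths named by the "Filename:"
# fields of every Packages.gz found below DIR.
# (list_debs is a hypothetical helper name, not part of any Debian tool.)
list_debs() {
    local dir="$1"
    find "$dir" -type f -name Packages.gz -exec gzip -dc {} + |
        awk '/^Filename:/ { print $2 }' |
        sort -u    # drop paths that appear in more than one index
}

# Example (not run here):
#   list_debs /path/to/Packages.gz > deb-file.list
```

Sorting also makes the list stable between runs, which makes two versions of
deb-file.list easier to compare.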

2- Next, I obtain the list of all available rsync servers provided by
Debian and other open-source sites from here:

https://www.debian.org/CD/mirroring/rsync-mirrors

Note that although the above page says these rsync mirrors are for Debian
CD images, most of them in fact also carry the non-CD sections of the
Debian repository, so I can use them for my purpose without any problem.

I then build the mirror list for my purpose as follows:

curl https://www.debian.org/CD/mirroring/rsync-mirrors 2>/dev/null |
awk '/::debian-cd\//{gsub(/debian-cd/,"debian",$NF); split($NF,a,"<"); print a[1]}' > mirrors.list

The content of the mirrors.list looks like the following:

----------------
[snipped]
debian.mirror.digitalpacific.com.au::debian-cd/
mirror.as24220.net::debian-cd/
mirror.intrapower.net.au::debian-cd/
mirror.rackcentral.com.au::debian-cd/
debian.anexia.at::debian-cd/
debian.sil.at::debian-cd/
[snipped]
----------------

With the above method I currently obtain 94 available rsync servers,
which are exactly the contents of the file mirrors.list.
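
Since only most of the listed servers are said to carry the non-CD archive, the
list could be checked before use.  The following is a rough sketch: both
function names are invented here, and it assumes that a successful
"rsync --list-only" against the module is good enough evidence that a mirror
is usable.

```shell
#!/usr/bin/env bash
# probe_mirror MIRROR: succeed if the rsync daemon module can be listed.
# (Hypothetical helper; the timeouts are arbitrary illustrative values.)
probe_mirror() {
    rsync --list-only --contimeout=5 --timeout=10 "$1" >/dev/null 2>&1
}

# usable_mirrors [PROBE]: read mirror prefixes on stdin, one per line, and
# print those for which PROBE (default: probe_mirror) succeeds.
usable_mirrors() {
    local probe="${1:-probe_mirror}" mirror
    while IFS= read -r mirror; do
        [ -n "$mirror" ] || continue          # skip blank lines
        "$probe" "$mirror" </dev/null && printf '%s\n' "$mirror"
    done
}

# Example (not run here):
#   usable_mirrors < mirrors.list > mirrors.ok
```

The probes run one after another, so with a 5 second connection timeout a list
of 94 servers should finish within a few minutes even if many are unreachable.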

3- Finally, I use rsync to download all of the deb files listed in
deb-file.list, using all of the rsync servers stored in mirrors.list.
Most of these servers have bandwidth and maximum-connection limits
imposed by their administrators, so I want to download only one deb file
at a time from each rsync server.  When the download from a given server
finishes, that server should be given the next deb file from
deb-file.list, and so on, until all of the deb files have been downloaded
successfully by using all of these rsync servers in parallel.

For this I need a script.  I struggled for some time to come up with the
following one, but it doesn't meet all of the above requirements.  In
fact, it is a long way from achieving what I described in step 3:

-------------------
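 mirror=1
 mirror_count=$(wc -l < mirrors.list)

 while read -r -a line
 do
   # Pick the mirror on line number $mirror of mirrors.list.
   mirror_used=$(awk -v n="$mirror" 'NR == n' mirrors.list)
   # The source is the mirror prefix followed by the pool path.  Every
   # transfer is put in the background at once, so nothing here limits a
   # mirror to one transfer at a time.
   rsync -amH --progress --append-verify --timeout=10 --contimeout=5 \
     "${mirror_used}${line[0]}" debs/ &
   # Move on to the next mirror, wrapping around at the end of the list.
   mirror=$(( mirror % mirror_count + 1 ))
 done < deb-file.list

 wait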
-------------------
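
One way the scheduling from step 3 might be approached is sketched below; it
has not been run against real mirrors.  The idea is a pool of idle mirrors kept
in a FIFO: the main loop takes an idle mirror for each file, and each finished
transfer puts its mirror back.  The names fetch_one and dispatch are invented
for this sketch.  It assumes Linux (a FIFO opened read-write), a mirror list
small enough to fit in the pipe buffer, and at least one mirror in the list.

```shell
#!/usr/bin/env bash
# fetch_one MIRROR PATH: hypothetical hook that downloads one file.
fetch_one() {
    rsync -aH --append-verify --timeout=10 --contimeout=5 \
        "$1$2" debs/ </dev/null
}

# dispatch MIRRORS_FILE LIST_FILE: run at most one transfer per mirror at
# any time, handing each idle mirror the next path from LIST_FILE.
dispatch() {
    local mirrors_file="$1" list_file="$2" dir mirror path
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/idle" || return 1
    exec 3<>"$dir/idle"                 # read-write open never blocks
    rm -rf "$dir"                       # the open descriptor keeps it alive
    grep -v '^$' "$mirrors_file" >&3    # every mirror starts out idle
    while IFS= read -r path; do
        IFS= read -r mirror <&3         # wait until some mirror is idle
        {
            fetch_one "$mirror" "$path" || echo "failed: $mirror$path" >&2
            printf '%s\n' "$mirror" >&3 # mark the mirror idle again
        } &
    done < "$list_file"
    wait                                # let the last transfers finish
    exec 3<&-
}

# Example (not run here):
#   dispatch mirrors.list deb-file.list
```

A failed transfer is only reported on stderr, not retried, and a mirror that
keeps failing goes straight back into the pool; both would need handling in
real use.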

Any hints for this issue?

Regards
--
Hongyi Zhao <[hidden email]>
Xinjiang Technical Institute of Physics and Chemistry
Chinese Academy of Sciences
GnuPG DSA: 0xD108493

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Downloading a great number of files from different rsync servers for good load balancing and high efficiency.

Karl O. Pinc
On Sat, 4 Apr 2015 15:21:21 +0800
Hongyi Zhao <[hidden email]> wrote:

> I'm using Debian, and I want to make a local repository that lets me
> install packages more conveniently.

Your solution will not work for mirroring Debian, since it does
not do a 2-stage mirroring process.  This is
described in: https://www.debian.org/mirror/ftpmirrors
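
As a rough illustration of what a 2-stage run means (a sketch of the idea
only, not ftpsync itself): the first pass fetches package files while leaving
the index files alone, and the second pass brings in the indices and deletes
what the upstream mirror has dropped, so the indices never point at files that
are not there yet.  The function name, the exclude patterns, and the timeout
are illustrative assumptions; the real ftpsync script does considerably more.

```shell
#!/usr/bin/env bash
# two_stage_sync SRC DEST: mirror SRC into DEST in two passes.
# RSYNC may be overridden, e.g. RSYNC=echo to only print the commands.
# (Hypothetical helper; the exclude list is illustrative, not complete.)
two_stage_sync() {
    local src="$1" dest="$2" rsync_cmd="${RSYNC:-rsync}"
    # Stage 1: fetch package files, but leave the index files untouched.
    "$rsync_cmd" -rltH --timeout=600 \
        --exclude='Packages*' --exclude='Sources*' \
        --exclude='Release*' --exclude='InRelease' \
        "$src" "$dest" || return 1
    # Stage 2: fetch the indices and remove files the upstream has dropped.
    "$rsync_cmd" -rltH --timeout=600 --delete --delete-after \
        "$src" "$dest"
}

# Example (not run here; the host name is a placeholder):
#   two_stage_sync some.mirror.example::debian/ /srv/mirror/debian/
```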

Further, your solution is a bad idea for many reasons.  If
you want to know more about this I suggest asking on
the Debian mailing lists or on the #debian IRC
channel on irc.freenode.net.

Better would be to use the Debian recommended ftpsync
script.  This can be found at:
https://ftp-master.debian.org/ftpsync.tar.gz
The instructions are at:
https://www.debian.org/mirror/ftpmirrors

The Debian people know how to best mirror Debian.
Best to follow their guidance.  Depending on
your purposes you might not even want a mirror,
you might be better served with a cache.
Again, ask the Debian people for guidance.

Regards,


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein