apt-proxy is alive!
After a long quiet period, apt-proxy has woken up again. There have been many fixes and
improvements, so if you haven't tried it for a while, maybe you should have another look :)
http/ftp backend support is scheduled for 1.3.0, and is already in testing.
Re: Why use this?
I could see this being useful for large M$ Windows client network environments. Since Windows typically upgrades itself using Windows Update (http://windowsupdate.microsoft.com), this same system could be used to cache the packages from the M$ site.
squid for caching debs...
I've found that squid, at least the 2.2.STABLE5 version, is not that good at caching large files like debs. The problem is that large downloads often fail before completion, and squid doesn't seem to use "resume"-style requests on its retries; it just starts all over again.
The apt client is smart enough to issue a "resume" request when squid finally gives up, but depending on how you configure squid, it either starts over from the beginning, or it fetches only the missing part of the file and then doesn't cache it. For a 16M deb that can mean multiple almost-complete download attempts that aren't even cached when the transfer finally completes, if it ever does.
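(For reference, a resume-style request is just an HTTP Range request asking the server for the missing tail of the file. The package path and byte offsets below are made up, but the exchange looks like this:)

```
GET /debian/pool/main/a/apt/apt_0.5.4_i386.deb HTTP/1.1
Host: ftp.debian.org
Range: bytes=14680064-

HTTP/1.1 206 Partial Content
Content-Range: bytes 14680064-16777215/16777216
```

A cache that serves the tail but never stitches the pieces together is exactly how you end up with uncached, repeatedly re-fetched debs.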
Also, squid's expiry model seems to be tuned for many small, frequently accessed, frequently changing objects, not large, rarely accessed, never-changing ones. Its pseudo-LRU expiry keeps evicting the debs in my cache in favour of smaller objects with disturbing regularity.
apt-proxy is cool because it uses rsync to do the fetches, which in my experience is faster and more reliable than http for large downloads, and it can do resumed fetches and delta updates for objects that change only a little (e.g. Packages files). It also builds a mirror directory structure on demand that can be browsed or exported using other tools (giving ftp/http/whatever access to the same file repository). This makes it perfect for building and/or maintaining a Debian mirror site on demand.
Re: Why use this?
Mainly because squid will re-download the whole Packages.gz file every time it changes, whereas we only transfer the diffs (rsync). The auto-clean feature (only when a newer package exists), the fallback backends, and the fact that the cache layout maps 1:1 onto the backend(s) (so the cache can be re-exported via NFS/rsync/ftp) also help.
That said, if you've already got squid up and running, it might be easier.
Why use this?
I configured all the machines on my net to use the SQUID proxy on my DMZ for both FTP and HTTP. Since they all apt-get from the same Debian mirrors (give or take a handful of unofficial archives), SQUID handles all the caching with very little tweaking (maximum file size, more time until cached entries become stale, etc.). I found that method a more intuitive way to solve the resource mutualisation problem.
Several reasons make it especially efficient:
- packages in the cache have a limited lifespan, so a mirror built out of requests is only valid until the next package upgrade.
- a typical set-up selects only a fraction of the available packages, and even less for a small number of supported hosts.
- using a general-purpose caching program such as SQUID limits the additional complexity, and users do not have to change a line in their setup, provided they have configured the proxy environment variables correctly.
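The squid tweaks mentioned above might look something like this in squid.conf (the values are illustrative, not recommendations; check the defaults and syntax for your squid version):

```
# allow large objects like .deb packages into the cache at all
maximum_object_size 65536 KB

# keep fetching a download even if the client aborts, so the
# full object ends up cached (-1 = never abort)
quick_abort_min -1 KB

# published .deb files never change, so let them stay fresh for
# a long time (min / percent / max, times in minutes)
refresh_pattern \.deb$  10080 100% 43200
```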
As usual, there's more than one way to do it!
This package changed my life!
This code is truly beautiful. The authors are geniuses. My mummy said so.