apt-proxy shames the very word proxy
I spent a good year or so using apt-proxy. The first version, written in
Perl, had some issues with concurrency. For instance, if two computers were
updating the same file, only one appeared to actually be downloading
it. The other computer would get its chance at the file only after the first
download completed, at which time it would move at LAN speed.
They said it was ugly. They said it was broken. They abandoned it.
Enter apt-proxy-v2, written using the twisted framework. It's Python,
it's supposed to be better. Cleaner, leaner, faster. So, how come it turned
out to be a crappile?
Firstly, apt-get updates through apt-proxy are hellishly slow because it
"validates" everything, which in this case means it bunzip2s the Packages
files, launching some 6 bunzip2s in parallel. I don't have beefy hardware on my
gateway. It is a 450 MHz Celeron. The unnecessary bunzip2 takes seconds, and so
apt-get update takes a good 30 seconds instead of, say, 5.
What's weirder, copying the data from local disk to network is also rather
heavy. The Celeron was maxed out copying some 500 kB/s. That's some waste of
CPU right there. I see lots of mmap() and munmap() action going on, so I
guess this just proves that memory allocation is slow... While it moves the
data, the process grows larger, sometimes up to 8 MB or so, but eventually
it appears to free all that RAM.
But sometimes it goes on a memory-eating rampage. It's a curious leak, too. Normally, long-running daemons lose memory slowly over time, a kilobyte here, a megabyte there. But this one keeps it all well in shape until some critical moment comes, at which point it eats some 150 MB of RAM instantly and stops leaking. While leaking, it consumes 100% of CPU and does not respond to requests.
I think it says something about twisted that no-one has been able to debug
the memory leak in 2 years. In fact, the ChangeLog entries contain funny
statements like "a memory leak has been fixed" or "new version of twisted
corrects a memory leak" but here it is, two years later, still leaking
memory.
So here's my recommendation -- if you are one of those hapless people using
apt-proxy, switch to squid unless you actually need its spiffy
multiple-sources-as-one aggregation features. I didn't, I just wanted to spare
some bandwidth for updating the numerous machines in my LAN. Configure squid
for some 2-3 GB storage, change the cache replacement policy from lru to heap
LFUDA, and set maximum cacheable object size to 400 MB. That's your apt-proxy,
and it doubles as a great webcache to boot!
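
For the record, those three settings come out roughly like this in squid.conf. Treat it as a sketch, not a copy of my config: the ufs store type, the /var/spool/squid path, the 16/256 directory counts and the 3000 MB figure (picked from the 2-3 GB range) are all assumptions, and the heap policies only work if your squid build includes them. I put maximum_object_size ahead of cache_dir because some squid versions only apply it to cache_dir lines that come after it.

```
# squid.conf fragment (sketch; path, store type and sizes are assumptions)
cache_replacement_policy heap LFUDA
maximum_object_size 400 MB
# cache_dir <type> <directory> <size in MB> <L1 dirs> <L2 dirs>
cache_dir ufs /var/spool/squid 3000 16 256
```

Each client then needs to be told about the proxy, for example in a file under /etc/apt/apt.conf.d/. The hostname below is a placeholder, and 3128 is squid's default port:

```
Acquire::http::Proxy "http://gateway.example:3128/";
```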