APT Speed For Incremental Updates Gets a Massive Performance Boost
jones_supa writes: Developer Julian Andres Klode has this week made some improvements to significantly increase the speed of incremental updates with Debian GNU/Linux's APT update system. His optimizations have yielded the apt-get program to suddenly yield 10x performance when compared to the old code. These improvements also make APT with PDiff now faster than the default, non-incremental behavior. Beyond the improvements that landed this week, Julian is still exploring other areas for improving APT update performance. More details via his blog post.
Really looking forward to have apt-get speeds that can be compared with pacman. Julian Andres Klode, if you read this, please continue the great work!
If something is XML based and time criticial, I wouldn't bother to optimize the XML parsing, but rather exchange XML for a non braindead format to start with.
It's not using standard I/O function, but pure syscalls, which are obviously unbuffered. And the same code paths are also used for other stuff that maps files. Performance critical code implemented a buffer on top of that, and the ReadLine() function experiencing the main issue was only added as a convenience function and not used for anything critical until a few months ago (and we forgot that it was not optimized). Anyway, we implement buffering for ReadLine() now. I'll try to make it generic for both reads (all reads) and writes, but so far have not succeeded, probably because some code depends on unbuffered reads or writes.
Gosh it was slow code. Not so much bad code. There are 3 people working on APT in their free time. Instead of complaining, do something about it. You can join #debian-apt on OFTC, view the commits live, and check them. What matters more is that there are almost no functional regressions thanks to our test suite. Checking for performance regressions is an entirely different beast altogether.
Well, we do download in parallel if you use httpredir.debian.org and httpredir.debian.org returns different mirrors for different packages (which it does not do all the time, but reasonably often). I don't like installing in parallel, or downloading and installing at the same time, as they just make the error handling harder, for modest speedup.