"Evolved" Caches Could Speed the Net
SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."
Well, I have found one flaw in the methodology:
I would think this would breed out of the caching system any affinity to locality-of-reference.One of the things I did each morning when I was running a cybercafé was "prime" the Squid cache by running a little script that did a wget -p on all of the URLs in the portal page, and a few sites that were not. And it definitely did improve performance for most users.
One of my unrealized dreams would be some sort of speculative-fetch algorithm for Squid that would basically do a breadth-first fetch on a page while the user was busy reading it.
Of course in my not-so-humble opinion, the biggest problem with any caching system is the population of websites that, through either malice or incompetence develop content that is cache-hostile, and call it "experience enhanced".
How does the Slashdot Effect happen given that no slashdotters ever RTFA?
Humbug, the big difference is: in GC there's perfect knowledge about what you want to throw away (unreferenced or only circularly referenced objects), and the problem is how to find out efficiently.. In caching the whole problem is to figure out which objects to keep, and beyond that, efficiency is no problem.
Garbage collection is required to be correct, but is allowed to keep extra stuff around as memory permits. Everything that must be kept can be calculated deterministically without future knowledge, and the same holds true for everything that can be discarded. Approximation merely allows the what can be discarded calculation to progress faster. In garbage collection it is not correct to throw out something that will not be used in the future unless it is also unreachable.
For cacheing there is no requirement of correctness. Performance is improved when things that will be used in the future are kept. There is no correctness requirement for keeping things around, and indeed best performance often requires discarding things that will be used in the future. In addition, there is no way of determining what will be used in the future without knowledge of the future. The correctness requirements of a cache are related to security (don't cache sensitive information) and staleness (don't cache stuff that is too old), but even those may be relaxed rather than strict requirements.
Fundamentally the two problems are very different, and algorithms that deal are successful in the two different cases will likely be very different.