Slashdot Mirror


"Evolved" Caches Could Speed the Net

SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."

9 of 195 comments (clear)

  1. Caching by Zorilla · · Score: 5, Funny

    According to New Scientist, evolutionary algorithms could make many network caches twice as efficient.

    That's easy, just cache 4 boobs at a time instead of two

    --

    It would be cool if it didn't suck.
  2. What makes a good cache? by YankeeInExile · · Score: 5, Insightful

    Well, I have found one flaw in the methodology:

    The starting population of algorithms was tested on the simulator using randomly generated requests.
    I would think this would breed out of the caching system any affinity to locality-of-reference.

    One of the things I did each morning when I was running a cybercafé was "prime" the Squid cache by running a little script that did a wget -p on all of the URLs in the portal page, and a few sites that were not. And it definitely did improve performance for most users.

    One of my unrealized dreams would be some sort of speculative-fetch algorithm for Squid that would basically do a breadth-first fetch on a page while the user was busy reading it.

    Of course in my not-so-humble opinion, the biggest problem with any caching system is the population of websites that, through either malice or incompetence develop content that is cache-hostile, and call it "experience enhanced".

    --
    How does the Slashdot Effect happen given that no slashdotters ever RTFA?
  3. LRU Rules by Mirk · · Score: 5, Informative
    There's a good reason why LRU caching (least recently used) is so widespread, and that is that it's very very hard to come up with a sophisticated algorithm that outperforms this very naive one.

    For the uninitiated, elements are added to an LRU cache until it fills up; thereafter, whenever a new element is added, space is made for it by throwing away the least-recently used one. Note, least recently used, not the least recently added, i.e. the oldest, since an element that was cached long ago may be used all the time, and so be well worth its place in the cache. For example, consider the company-logo image that your browser caches when you visit a new site and that is embedded in every page on that site. However old it gets, it's likely to continue to be used while you're on the site. As soon as you move to another site, it gradually shuffles its way down the deck until it falls off the bottom - which is precisely what you want.

    --

    --
    What short sigs we have -
    One hundred and twenty chars!
    Too short for haiku.
  4. Re:Algorithms by bunyip · · Score: 5, Funny

    I wonder if they evolved logic to counter the slashdot effect. 1. Scan slashdot.org for new stories every five minute. 2. Scan new story for links. 3. Cache those pages.

    That would be the "suicide algorithm". As the server goes up in flames from the Slashdot-effect it brought upon itself, it would become the first cyber recipient of a Darwin Award.

    Alan.

  5. Next Gen Networking? by arieswind · · Score: 5, Interesting

    From the article:
    He[Pablo Funes] suggests networks might in the future be designed to work out who deserves the most help for themselves. "Sophisticated network behaviours might implement rules for reciprocity and trust," he says. "And conversely, for not cooperating with other others who try to abuse our resources."

    The future of network security? Imagine the next computer virus outbreak: Every network in the world could recognize the virus type activity and allocate them lesser or zero resources, maybe sending them a "Virus detected, please run antivirus software or contact your IT Department" notice, and detecting outside attacks from viruses and automatically flagging them as unsafe, and not give much(or any) attention to traffic from or to that site

  6. the details of their method by Dezer · · Score: 5, Interesting

    Actually, I do genetic algorithm / genetic programming research at the university of michigan. It's unlikely that these guys are using genetic algorithms to develop a new algorithm, but are rather using an existing algorithm and *tuning* the associated parameters using a GA. Given a list of parameters, GA's work by finding the best combination of parameters. As a result, the settings could be constantly tweaked (say on a daily/hourly basis) and different servers could still have different regional settings. My only problem with the concept is that it still depends on the tuning of pre-existing algorithms... but still - the results they share (2x improvement) is encouraging.

  7. A note on hill-climbing by Animats · · Score: 5, Informative
    Genetic algorithms are methods for optimization in bumpy spaces. The basic goal is "find x such that f(x) is maximized". As optimization algorithms, they should always be tested against the two simple optimization algorithms - basic hill climbing, and random search.

    If the search space is not dominated by local maxima, basic hill climbing (go for the best neighboring value) will work. And it will be fast. If the function is differentiable, it can be orders of magnitude faster than other methods, because you can use a variant of Newton's Method.

    If the search space is small, random search (just guessing) will work by exhaustively searching the space. This is obvious, but tends to be ignored in academic papers all too often.

    This discussion also applies to neural nets and simulated annealing.

    Now this article at least describes a problem for which a GA might actually be useful. Many such articles don't. But they haven't demonstrated that you need a bumpy hill-climbing algorithm.

    This is why, despite all the hype, GAs, neural nets, and such aren't used all that much. The search space has to have the right properties. Not too small, not too big, bumpy, but not too bumpy.

  8. Re:Algorithms by ezzzD55J · · Score: 5, Insightful

    Humbug, the big difference is: in GC there's perfect knowledge about what you want to throw away (unreferenced or only circularly referenced objects), and the problem is how to find out efficiently.. In caching the whole problem is to figure out which objects to keep, and beyond that, efficiency is no problem.

  9. not even close by Anonymous Coward · · Score: 5, Insightful

    Garbage collection is required to be correct, but is allowed to keep extra stuff around as memory permits. Everything that must be kept can be calculated deterministically without future knowledge, and the same holds true for everything that can be discarded. Approximation merely allows the what can be discarded calculation to progress faster. In garbage collection it is not correct to throw out something that will not be used in the future unless it is also unreachable.

    For cacheing there is no requirement of correctness. Performance is improved when things that will be used in the future are kept. There is no correctness requirement for keeping things around, and indeed best performance often requires discarding things that will be used in the future. In addition, there is no way of determining what will be used in the future without knowledge of the future. The correctness requirements of a cache are related to security (don't cache sensitive information) and staleness (don't cache stuff that is too old), but even those may be relaxed rather than strict requirements.

    Fundamentally the two problems are very different, and algorithms that deal are successful in the two different cases will likely be very different.