Slashdot Mirror


"Evolved" Caches Could Speed the Net

SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."

17 of 195 comments (clear)

  1. LRU Rules by Mirk · · Score: 5, Informative
    There's a good reason why LRU caching (least recently used) is so widespread, and that is that it's very very hard to come up with a sophisticated algorithm that outperforms this very naive one.

    For the uninitiated, elements are added to an LRU cache until it fills up; thereafter, whenever a new element is added, space is made for it by throwing away the least-recently used one. Note, least recently used, not the least recently added, i.e. the oldest, since an element that was cached long ago may be used all the time, and so be well worth its place in the cache. For example, consider the company-logo image that your browser caches when you visit a new site and that is embedded in every page on that site. However old it gets, it's likely to continue to be used while you're on the site. As soon as you move to another site, it gradually shuffles its way down the deck until it falls off the bottom - which is precisely what you want.

    --

    --
    What short sigs we have -
    One hundred and twenty chars!
    Too short for haiku.
    1. Re:LRU Rules by rossjudson · · Score: 3, Informative

      ARC caches fairly handily outperform LRU. When LRU is appropriate they adapt towards it. When frequency is important they adapt towards that.

    2. Re:LRU Rules by Fzz · · Score: 4, Informative
      ARC caches fairly handily outperform LRU.

      Thanks for the pointer. Here's a link to some background on ARC, and a paper describing the algorithm. Looks like an interesting algorithm.

  2. Re:Not DNS by Short+Circuit · · Score: 3, Informative

    If I'm not mistaken, the DNS RFC takes into account the fact that domain records will be cached. That's why the records have expiration and refresh times on them.

  3. A note on hill-climbing by Animats · · Score: 5, Informative
    Genetic algorithms are methods for optimization in bumpy spaces. The basic goal is "find x such that f(x) is maximized". As optimization algorithms, they should always be tested against the two simple optimization algorithms - basic hill climbing, and random search.

    If the search space is not dominated by local maxima, basic hill climbing (go for the best neighboring value) will work. And it will be fast. If the function is differentiable, it can be orders of magnitude faster than other methods, because you can use a variant of Newton's Method.

    If the search space is small, random search (just guessing) will work by exhaustively searching the space. This is obvious, but tends to be ignored in academic papers all too often.

    This discussion also applies to neural nets and simulated annealing.

    Now this article at least describes a problem for which a GA might actually be useful. Many such articles don't. But they haven't demonstrated that you need a bumpy hill-climbing algorithm.

    This is why, despite all the hype, GAs, neural nets, and such aren't used all that much. The search space has to have the right properties. Not too small, not too big, bumpy, but not too bumpy.

    1. Re:A note on hill-climbing by 55555 · · Score: 2, Informative

      Misses the point. GAs do not look at search spaces and ask 'is this function too bumpy'. GAs (as well as many other evolutionary algorithms, see GECCO) are more similar to stochastic greedy strategies. I'd suggest looking at Walsh coefficients instead of derivatives as a rule of thumb.

      A large part of the reason GAs aren't catching on more is that there aren't enough really good text books on them. All of the really useful information is still in the form of research papers, these things take time. Also, we're only starting (last 5 years) to know enough to be able to routinely apply them. Without that, they're too hard and too expensive to use.

      As for random search, good papers look at the scaling factor for their algorithms. The goal is to build GAs that are quadratic. Any paper which suggests an exponential time GA (like random search) is wasting your time. The idea is to have a range of algorithms, greedy search for logarithmic time (or gradient descent), GAs for quadratic, and enumerative search for NP-Hard problems.

      The parent is either ignorant, overly jaded, or being a jerk.

  4. Legos and Tron mentioned in his thesis by the+frizz · · Score: 3, Informative
    The article link is light on details of the evolution algorithm, but Pablo Funes's home page has the text of his thesis on Evolution of Complexity in Real-World Domains. It talks about his use of evolving algorithms on topics like designing the strongest lego brick structure and playing Tron. Very cool, but not its application to caching.

    The is even a link to an online Tron game where us humans can play versus his evolving algorithms. The win/loss stats for his algorithm is approaching even. Given that humans can also evolve in the Tron game play, I imagine that the algorithm will have a head start over the new influx of slashdot visitors and start to win more often than not over the next week. I never got to play though. The SQL db to mange the stats was already down then I tried.

  5. Re:*sigh* by NialScorva · · Score: 4, Informative

    Evolution is the change in allele frequencies over time. Pure and simple, that's all it is. Check any evolutionary biology book. The notion of giraffes sprouting gills is absurd and not even remotely what evolution is except in the creationist strawmen. What prevents a whole lot of small changes from adding up? No one has discovered a barrier or any reason that this wouldn't occur. It's like saying that water can move grains of sand, but there's no proof that a lot of water will eventually erode a beach.

    The evolutionary algorithm will have a range of all possible algorithms that can be developed, so in a sense it is limited to "test various algorithms", though it would be testing all possible algorithms. Similarly biological evolution is limitted to testing various imperfect self-replicators, meaning all possible imperfect self-replicators. It is further constrained by the current state, but then that's the problem with non-biological GAs as well, the King of the Hill effect.

  6. Re:*sigh* by Conspiracy_Of_Doves · · Score: 3, Informative

    ANY environmental adaptation of genetic code over multiple generations is evolution. It does not have to be a major change. BTW, creationists are the only ones that make a distinction between macro-evolution and micro-evolution. If fact, creationists are the only ones that I have ever heard use the terms.

  7. A new sorting algorithm also ... by ThomasCR · · Score: 2, Informative
  8. Re:*sigh* by Conspiracy_Of_Doves · · Score: 2, Informative

    Natural selection is one of the methods (the other being sexual selection) by which all evolution (macro and micro) proceeds.

  9. Skips too far forward by Lulu+of+the+Lotus-Ea · · Score: 2, Informative

    Evolution is *not* the "change in allele frequencies over time." Or at least that's a secondary issue, historically, conceptually, and causitively. Well, on the last, allele frequency is pretty important (but still not exclusive). But evolution as a concept is firstly a question of PHENOTYPIC change, even before it's about GENOTYPIC change.

    Evolution was a well known phenomenon (e.g. from the geological record) long before the work of Mendel was known. Specifically, the rather well known Darwin had no concept of an allele when he came up with the Theory of Evolution by Natural Selection. In fact, Darwin's approach was most certainly not the only explanation of evolution floated in 19th century biology. For example, Lamarckian mechanisms had a plausibility at the time... in fact, Lamarckian mechanisms still represent an important underexamined area of evolutionary mechanisms, for historical reasons (Lysenko etc.)--see, for example, the work of Ruth Hubbard and the role of viral insertions into the genome. So the whole allele thing is only part of one (important) mechanism, only understood relatively late in the scientific study of evolution.

    Moreover, even in modern terms, mutation still exists; admittedly, the high-school text book focus on point mutations of genes is vastly overplayed. But just looking at allele frequencies misses also both the role of regulatory genes, and also ignores structural changes in chromosomes. Genes and smaller sequences migrate within and between chromosomes--this has nothing to do with alleles. Stephen Gould's work on phentotypic growth regulation in _Ontogeny and Phylogeny_ is good here, as is the tome _Structure of Evolutionary Theory_.

  10. Re:What makes a good cache? by a_ghostwheel · · Score: 3, Informative

    There are other types of neural networks than just standard back-propagating ones. Linear vector quantization is much better at solving classification problems (ART-based algorithm should be good too). Problem is that in many "on-the-surface" problems have better specialized solution than just a generic NN or GA brute-force approach. This situation led to drastic decrease in funding for NN research.

  11. Re:What makes a good cache? by autophile · · Score: 2, Informative
    Have CS researchers given up on the neural net approach, or are the nets still far too unstable for real world use?

    Well, no, it's just that neural nets have to use a carefully calculated and parameterized error-propagating algorithm, as well as an appropriate architecture and size of network, and if you choose the wrong one, you're basically hosed. Genetic algorithms (and genetic programming) have learning algorithms that are much simpler to understand and implement -- even though there's a lot of argument in the field as to exactly why they work. Even the classic "One-Armed Bandit" explanation is under attack.

    There has been some work done where a genetic algorithm/program evolves a neural network, and there has been nice success with that, as long as the problem is conducive to solution by a network. But in general, GA/GP can use any kind of solution architecture, and you choose one that makes the most sense for your problem. Google for +genetic +evolve +"neural network" for more info.

    --Rob

    --
    Towards the Singularity.
  12. Re:*sigh* by NialScorva · · Score: 2, Informative

    No, it's not that simple. Evolution has been defined as changes adding up enough so that a single celled organism becomes a multi-celled organism, which develops RNA and DNA, which then grows in complexity to develop various "macro" level systems such as a cardio-pulminary system, a digestive tract, a nervous system, and a central control "brain" system. Each of these changes requires a macro change of which no predecessor exists in the current chain of organics.

    That is one of the facts that it explains, not the definition. You're doing the equivalent of saying that a ball falling is the definition of gravity. Evolution is the collection of mechanisms by which these occur, which under the modern synthesis is by various alleles changing over time in a population.

    Actually, my first example was the very real issue of a single celled ameba becoming multi-celled.
    And oddly, stages between single cell and multicell still exist. The Volvox are single cell creatures that only function in colonies. A colony is a sphere that can grow a daughter colony inside it, then give birth to a new colony. Yet all of the cells involved are generic, not specialized into tissues. Sponge and jellyfish are on the other end of the gap. They both show tissue level specialization without organs.


    A giraffe sprouting gills would be an evolutionary change, but is highly unlikely.

    A giraffe sprouting gills would be a challenge to modern evolutionary theory, not an example. Modern evolution relies on the existing material being gradually changed or coopted.

    A more realistic example is a fish developing lungs. According to the current theories, most life originated in the ocean. I don't remember if the current thought is that lungs and gills developed simultaneously or if they developed in unison. Either way, the effect is much the same. A low order life form developed features not currently in its genetic makeup.

    Such as a mud skippers and lung fish? They both occupy niches and use mechanism similar to the first air-breathers using modified gills and air bladders, respectively. Not only are the transition species possible, but they're descendants still exist.

    Which is exactly my point. No new algorithm will be generated through the "genetic breeding". Only existing algorithms will rise to the top.

    That's interesting. I can write a program that contains the Axioms and rules from ZF Set theory, then randomly applies them to create new theorems. But that would be pointless since only theorems that existed in the orginal data and algorithms "will rise to the top". The whole of mathematics is pointless since all theorems are tautologies from the axioms and rules of manipulation.

  13. Re:Sigh... by NoOneInParticular · · Score: 2, Informative
    Actually, I have the real article here (not the New Scientist rubbish), the one they published at a conference, and what they do is evolve a function that takes the input data and computes a priority for that page. Pages with low priority are flushed, pages with high priority stay cached. One of the interesting expressions they found was:

    priority = lastTimeAccessed * (distance - accessCount)

    \ This was on data where the distance the document had travelled was taken into account. So, given some available input data they evolved a priority function (using GP) that servers as the center for a caching strategy. This with apparent great success.

  14. Re:Algorithms by aiyo · · Score: 2, Informative

    GC and caching are done very differently. In java memory has a counter associated with it. When you have two pointers to that memory then it has the value 2 stored somewhere with the memory. When you change the reference the counter is decremented. Every once in a while the GC will come along and free memory with a counter of 0. If you dereference that memory so that you have absolutely no pointers anywhere using the memory then you will have lost that memory although it is not freed until later.