Slashdot Mirror


"Evolved" Caches Could Speed the Net

SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."

17 of 195 comments (clear)

  1. Algorithms by th1ckasabr1ck · · Score: 4, Interesting
    Pablo Funes of US company Icosystem and Jürgen Branke and Frederik Theil of the University of Karlsruhe in Germany used "genetic algorithms", which mimic Darwinian evolution, to develop strategies for internet servers to use when caching data.

    It would be interesting to see exactly which algorithms they are talking about here. I wouldn't be surprised if they drew some ideas from garbage collection algorithms also.

  2. what about worm traffic? by garcia · · Score: 3, Interesting

    From what I can tell a good majority of the traffic that my machine receives is worm traffic. Would these genetic routines be setup to disregard those as cache data? If that's the case would they be setup to just block that data?

    That alone would save me quite a bit of traffic as people on my subnet hit me constantly with their infected machines.

    66.41.161.120 hit my machine 57 different times (that isn't individual requests, that's total times).

  3. Not DNS by artlu · · Score: 3, Interesting

    It already takes forever for DNS changes to propagate through every network, which can be extremely frustrating when you have a high bandwidth domain. There definitely needs some optimization on the DNS front.

    GroupShares Inc.

    --
    -------
    artlu.net
  4. bittorrent tie in? by Chuck+Bucket · · Score: 4, Interesting

    I"ve always wondered if something along the lines of cache complimented with a Bittorrent type of scheme couldn't help speed up the internet. that way bits would be mirrored all over, and a server could pull them in faster since more servers did less work each.

    just something I'm thinking about today, well, that and the Kerry/Edwards pairing.

    PCBRwer342$#

  5. Next Gen Networking? by arieswind · · Score: 5, Interesting

    From the article:
    He[Pablo Funes] suggests networks might in the future be designed to work out who deserves the most help for themselves. "Sophisticated network behaviours might implement rules for reciprocity and trust," he says. "And conversely, for not cooperating with other others who try to abuse our resources."

    The future of network security? Imagine the next computer virus outbreak: Every network in the world could recognize the virus type activity and allocate them lesser or zero resources, maybe sending them a "Virus detected, please run antivirus software or contact your IT Department" notice, and detecting outside attacks from viruses and automatically flagging them as unsafe, and not give much(or any) attention to traffic from or to that site

  6. Re:What makes a good cache? by YankeeInExile · · Score: 2, Interesting

    So, do you posit that the purpose of the GA is just to give human programmers insight into avenues they might not have otherwise considered?

    --
    How does the Slashdot Effect happen given that no slashdotters ever RTFA?
  7. the details of their method by Dezer · · Score: 5, Interesting

    Actually, I do genetic algorithm / genetic programming research at the university of michigan. It's unlikely that these guys are using genetic algorithms to develop a new algorithm, but are rather using an existing algorithm and *tuning* the associated parameters using a GA. Given a list of parameters, GA's work by finding the best combination of parameters. As a result, the settings could be constantly tweaked (say on a daily/hourly basis) and different servers could still have different regional settings. My only problem with the concept is that it still depends on the tuning of pre-existing algorithms... but still - the results they share (2x improvement) is encouraging.

  8. Re:What makes a good cache? by AKAImBatman · · Score: 3, Interesting

    I'm actually a bit surprised that neural nets haven't yet entered common usage. In theory, such a net would be far more "intelligent" (sort of) in making decisions.

    e.g. This page looks a lot like a truly dynamic page, so let's not cache it. Over here, this is dynamic, but only slightly so. Let's cache it and index the most likely path the user will take.

    Have CS researchers given up on the neural net approach, or are the nets still far too unstable for real world use?

  9. Re:What makes a good cache? by Short+Circuit · · Score: 3, Interesting

    I actually wrote a paper on those in middle school. The problem with them is that the larger ones are very difficult to tune.

    It turns out that if you have more than three layers (in other words, if you have any layers between the input and putput), you run into proplems in training when the network doesn't work as well as you want it to perform, but furthur cycles of your training algorithm don't seem to make it any better.

    The comp.ai.neural-nets FAQ was my primary source for that paper. Read that if you're interested. (I felt like my brain went numb after a while back when writing the paper.)

  10. Re:*sigh* by Laxitive · · Score: 3, Interesting

    If you had studies GAs, you would know that GAs do "create" new information that doesn't exist in the original solution set (i.e. any given generation of potential solutions).

    GA systems both propogate "good" information, as well as generate and test new information incrementally across generations. Of course, this is done within whatever limitations you choose to impose on the solution set (i.e. if you're using a fixed N-bit string to represent entities in the solution space, then you'll search across all 2^N possible solutions in the N-bit string space).

    Fundamentally, GAs are just a way of searching a solution space for good solutions. They are slow, but can tackle a more diverse set of problems easier than the traditional backpropogation-based neural network. I would consider neural networks to be more in line with your description of an 'adapting' learning algorithm.

    The following analogy can only be taken so far, but you can compare backpropogation-based neural networks to be akin to hill-climbing algorithms - you start from one solution, move some limited distance towards what you think is a better solution. For a neural network with N vertices, you can think of neural network learning as hill-climbing in an N-dimensional space over whatever field used to specify edge weights.

    GAs, on the other hand, do a much more disparate search over any solution space. And it is surprising how well it finds near-optimal solutions even for problems where the relation between solutions is complex and non-intiuitive (e.g. minimums for seemingly chaotic functions).

    -Laxitive

  11. Re:This seems a little...selfish by droleary · · Score: 4, Interesting

    "Priming the cache" and "doing a breadth-first fetch on a page" are both things that create *more* network traffic on the off-chance that it might save some number of microseconds for the user.

    More traffic isn't necessarily bad. While what you say is true, you fail to note that user-initiated traffic is done in bursts. Just like your CPU is idle >95% of the time, so is your network connection. So all users benefit, both in real and perceived performance, when there is a steady 100% utilization.

    So everyone just grabs what they can get and everyone is worse off.

    Again true, but naive. What would be better is if there were a mechanism to prioritize the pre-fetch cache. Every page has one link that is pretty much the most likely next page. Then a secondary link, and so on. Ideally, a site owner should be able to put that priority list somewhere in the page such that a user agent can start getting it after the current page has loaded and is being read. Otherwise, maybe the user agent can favor new content (i.e., compare this load of Slashdot with the one done five minutes ago and grab the links in the diff). That's a far cry from a mad grab, and would probably benefit all parties involved.

  12. Re:LRU Rules by arth1 · · Score: 2, Interesting
    There's a good reason why LRU caching (least recently used) is so widespread, and that is that it's very very hard to come up with a sophisticated algorithm that outperforms this very naive one.

    For small cache sizes with one or a few fairly similar users, this is true. For larger cache sizes with a large number of users, it's almost always better to expire based on both LRU (Least Recently Used) and LFR (Least Frequently Requested). You may want to cache all those sucky Christmas links until next year even though no-one is going to access them from mid-January until October. The number of requests for those objects will, with a combined LRU/LRF scheme ensure that they have a lesser chance to be harvested than an item that was accessed once, two months ago.
    It also means that a site that has been slashdotted once has a less chance of being slashdotted again.

    Regards,
    --
    *Art
  13. Freenet by zoeblade · · Score: 2, Interesting

    Doesn't Freenet already do this within its own network?

  14. Re:This seems a little...selfish by droleary · · Score: 2, Interesting

    There's the link tag for html headers, which specifies the next page, previous page, and things like that.

    Ah, I knew it was too good an idea to be all my own! For those not aware of how to use the link element for that, I found this page as well as this one. I don't know how widely it is used by site designers, and I can't say I know of any web browsers that use it, but I would say it is definitely something a proxy server should be taking advantage of. It's use is even stated directly in the second document:

    User agents may choose to preload the "next" document, to reduce the perceived load time.
  15. Re:What makes a good cache? by Dachannien · · Score: 2, Interesting

    Not a flaw. When dealing w/ GAs, one needs to create the initial generation somehow, and using 'randomly generated' GAs provides a good start that is statistically unlikely to be biased or stuck in a local maxima/minima.

    Except that wasn't what the poster was getting at - he was noting that the trials provided to the GA were essentially random, and didn't necessarily reflect what the cache is likely to see IRL.

    I suspect, however, that this is an error in the NS article, and that the data sets *were* tuned to particular situations.

    On a side note, some of the research our group is doing suggests that random initial populations in GAs can hinder the search, and that the first portion of the search is relatively indistinguishable from random search, because the fitness in most problems is flat over large portions of the search space. If you do know something general about the solution to the problem, it might not be a bad idea to seed the initial population with that knowledge.

  16. Re:*sigh* by AKAImBatman · · Score: 2, Interesting

    I don't believe you understand the word "evolution". Evolution is not a huge jump that rapidly redefines something, it's an almost unnoticable change over time.

    No, the current theories state that it is a unnoticable change over a long period of time. On a higher level, evolution simply states that major changes occurred over a long period of time. It also states that those changes occurred in an "organic vacuum". e.g. Lungs developed where no lungs had ever existed before.

    Evolution can be understood as a search through a problem space. The caching algorithm sets up a problem space of "all possible caching algorithms". A P2P server would be completely outside of the problem space and thus irrelavent.

    If that was really the case, then higher organisms should never have developed. The problem space for the "simple" organism was to survive in its wet environment. I believe you named colonies of amebas as an example of "vestigal organisms" at one point. Without arguing that point, what forced these successful organisms to then develop different cell types? How did different cell types improve their success rate? From there, how did developing complex DNA improve their success rate? From there, how did specialized macro-organs improve their success rate?

    In evolutionary theory, the "simple" life form was striving to become a better adapted life form. It obviously exceeded its problem space by a wide margin, or you and I would not be having this discussion today. Thus if GAs are really indicitive of "real" evolution, then why can't they eventually evolve into "higher algorithms"? A P2P network could very well improve the GA's success rate for its problem domain. It can start by accidentally reusing its network connections, then accidentally making contact with other cache hosts with the same mutation, then it can evolve a protocol by accidentally learning each communication packet through a torrent of random spam packets, etc.

    While that sounds somewhat silly, that's not my point. My point is that evolution states exactly that sort of behavior over a long enough period of time. If a GA is a good emulation of micro-evolution, and macro-evolution is based on micro-evolution, then shouldn't we be able to compute a set of mutations that take the program out of its pre-programmed algorithms? Shouldn't it be able to leverage parameters that we didn't program it to leverage?

  17. Re:*sigh* by AKAImBatman · · Score: 2, Interesting

    Will you make up your mind, first you agree with the grandparent poster that one particular example doesn't prove the general case, and then you make the claim that the SCA example does suggest the general case is zero sum.

    I have made up my mind. The parent of my post made a statement that was factually correct: SCA does not "prove" that positive mutations can't happen. I then go on to show that everything from SCA to genetic manipulation has so far shown us that genetics is a zero sum game. It's perfectly possible that it's not, but we haven't yet proved it either way. All we have is a mountain of data that suggests genetics is a zero-sum game.

    Come on man, make up your mind! which side of the fence are you on?

    How is that scientific? If you always approach a problem from the angle that "X must be right", then you'll never make any new discoveries. If something can be observed then proven to be true, then we must adjust our world-view and move on.

    It is worth noting that there is a natural bias towards observing negative mutations, not only are they probably far more common but also they stand out against the background, i.e. when people start dieing, or are born with gills, or no legs then people are going to notice pretty damn quick and start saying "ohhhh, mutation==bad!"

    This is quite true.

    however a beneficial mutation (for example a resistance to the common cold) is much more likely to go unnoticed at first,

    The difficulty here is that until we can isolate and identify "positive" mutations, they are nothing but speculation. It's possible that it's just a matter of time before such mutations are discovered, but we can't say they exist until we can demonstrate them. It's a bit like Tachyons in physics. Theory said they should exist, therefore they existed, right? (If it's on Star Trek, it must be true!) Turns out that Tachyons are now considered to have been a "glitch" in string theory that has now been worked out. No time-traveling phantom particle, sorry.

    as it is a lot harder to notice or prove that something hasn't happened during a persons lifetime.

    It's very difficult to prove a lot of things in science. That's what we have laboratories for. By direct manipulation of creatures with short life spans, scientists can observe how genetics work and test theories on how a positive mutation might be induced. With more and more computing power at their disposal, it should only be a short time before we're able to begin very fast computer modeling of entire genomes.

    The first surprise has already come in the form of shorter gene sequences than predicted. It wouldn't surprise me if biologists find themselves completely rewriting the books on genetics in the near future. If there's anything that physics has taught me, it's that the Universe is not one to easily give up its secrets.

    I apologise if you took offence at being called a troll earlier, I meant it in the original sense of someone who deliberately posts inflammatory content in the hopes of sparking a debate. (anti-evolution posts certainly qualify on Slashdot)

    Apology accepted, but with one caveat. I never claimed that evolution was false. It's not very nice to stomp all over the theories of others just because you don't like it. My specific complaint was the outright confusion caused by the "micro-evolution" renaming of environmental adaption. I don't like it, because no link has been demonstrated. This gives people the mistaken impression that they can write software using genetic adaption type algorithms, and actually produce software that "evolves" in the macro-evolutionary sense. The truth is that the software doesn't go anywhere. It merely "adapts" to the situation at hand until the most powerful solution can be found.

    As for calling you a creationist.... well are you are aren't you, you claim to not believe in evolution, so how do you think we got here?

    Ac