"Evolved" Caches Could Speed the Net
SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."
According to New Scientist, evolutionary algorithms could make many network caches twice as efficient.
That's easy, just cache 4 boobs at a time instead of two
It would be cool if it didn't suck.
It would be interesting to see exactly which algorithms they are talking about here. I wouldn't be surprised if they drew some ideas from garbage collection algorithms also.
Well, I have found one flaw in the methodology:
I would think this would breed out of the caching system any affinity to locality-of-reference.One of the things I did each morning when I was running a cybercafé was "prime" the Squid cache by running a little script that did a wget -p on all of the URLs in the portal page, and a few sites that were not. And it definitely did improve performance for most users.
One of my unrealized dreams would be some sort of speculative-fetch algorithm for Squid that would basically do a breadth-first fetch on a page while the user was busy reading it.
Of course in my not-so-humble opinion, the biggest problem with any caching system is the population of websites that, through either malice or incompetence develop content that is cache-hostile, and call it "experience enhanced".
How does the Slashdot Effect happen given that no slashdotters ever RTFA?
I wonder if they evolved logic to counter the slashdot effect. 1. Scan slashdot.org for new stories every five minute. 2. Scan new story for links. 3. Cash those pages.
From what I can tell a good majority of the traffic that my machine receives is worm traffic. Would these genetic routines be setup to disregard those as cache data? If that's the case would they be setup to just block that data?
That alone would save me quite a bit of traffic as people on my subnet hit me constantly with their infected machines.
66.41.161.120 hit my machine 57 different times (that isn't individual requests, that's total times).
For the uninitiated, elements are added to an LRU cache until it fills up; thereafter, whenever a new element is added, space is made for it by throwing away the least-recently used one. Note, least recently used, not the least recently added, i.e. the oldest, since an element that was cached long ago may be used all the time, and so be well worth its place in the cache. For example, consider the company-logo image that your browser caches when you visit a new site and that is embedded in every page on that site. However old it gets, it's likely to continue to be used while you're on the site. As soon as you move to another site, it gradually shuffles its way down the deck until it falls off the bottom - which is precisely what you want.
--
What short sigs we have -
One hundred and twenty chars!
Too short for haiku.
It already takes forever for DNS changes to propagate through every network, which can be extremely frustrating when you have a high bandwidth domain. There definitely needs some optimization on the DNS front.
GroupShares Inc.
-------
artlu.net
I"ve always wondered if something along the lines of cache complimented with a Bittorrent type of scheme couldn't help speed up the internet. that way bits would be mirrored all over, and a server could pull them in faster since more servers did less work each.
just something I'm thinking about today, well, that and the Kerry/Edwards pairing.
PCBRwer342$#
free ipod and free gmail!
From the article:
He[Pablo Funes] suggests networks might in the future be designed to work out who deserves the most help for themselves. "Sophisticated network behaviours might implement rules for reciprocity and trust," he says. "And conversely, for not cooperating with other others who try to abuse our resources."
The future of network security? Imagine the next computer virus outbreak: Every network in the world could recognize the virus type activity and allocate them lesser or zero resources, maybe sending them a "Virus detected, please run antivirus software or contact your IT Department" notice, and detecting outside attacks from viruses and automatically flagging them as unsafe, and not give much(or any) attention to traffic from or to that site
Actually, I do genetic algorithm / genetic programming research at the university of michigan. It's unlikely that these guys are using genetic algorithms to develop a new algorithm, but are rather using an existing algorithm and *tuning* the associated parameters using a GA. Given a list of parameters, GA's work by finding the best combination of parameters. As a result, the settings could be constantly tweaked (say on a daily/hourly basis) and different servers could still have different regional settings. My only problem with the concept is that it still depends on the tuning of pre-existing algorithms... but still - the results they share (2x improvement) is encouraging.
Once again, Al Gore has his hand in the shaping of the internet.
I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to popular culture. Truly an American icon.
If the search space is not dominated by local maxima, basic hill climbing (go for the best neighboring value) will work. And it will be fast. If the function is differentiable, it can be orders of magnitude faster than other methods, because you can use a variant of Newton's Method.
If the search space is small, random search (just guessing) will work by exhaustively searching the space. This is obvious, but tends to be ignored in academic papers all too often.
This discussion also applies to neural nets and simulated annealing.
Now this article at least describes a problem for which a GA might actually be useful. Many such articles don't. But they haven't demonstrated that you need a bumpy hill-climbing algorithm.
This is why, despite all the hype, GAs, neural nets, and such aren't used all that much. The search space has to have the right properties. Not too small, not too big, bumpy, but not too bumpy.
The is even a link to an online Tron game where us humans can play versus his evolving algorithms. The win/loss stats for his algorithm is approaching even. Given that humans can also evolve in the Tron game play, I imagine that the algorithm will have a head start over the new influx of slashdot visitors and start to win more often than not over the next week. I never got to play though. The SQL db to mange the stats was already down then I tried.
Evolution is the change in allele frequencies over time. Pure and simple, that's all it is. Check any evolutionary biology book. The notion of giraffes sprouting gills is absurd and not even remotely what evolution is except in the creationist strawmen. What prevents a whole lot of small changes from adding up? No one has discovered a barrier or any reason that this wouldn't occur. It's like saying that water can move grains of sand, but there's no proof that a lot of water will eventually erode a beach.
The evolutionary algorithm will have a range of all possible algorithms that can be developed, so in a sense it is limited to "test various algorithms", though it would be testing all possible algorithms. Similarly biological evolution is limitted to testing various imperfect self-replicators, meaning all possible imperfect self-replicators. It is further constrained by the current state, but then that's the problem with non-biological GAs as well, the King of the Hill effect.
ANY environmental adaptation of genetic code over multiple generations is evolution. It does not have to be a major change. BTW, creationists are the only ones that make a distinction between macro-evolution and micro-evolution. If fact, creationists are the only ones that I have ever heard use the terms.
Technoli
Most computer scientists loosely use the term "evolutionary programming" to talk about algos that have inherently unpredictable ("emergent") results rather than modelling actual evolutionary processes observed in nature, although that's also a fair part of it all. (Incidentally, I also believe the "Science" topic assigned to this story is wrong for this reason.)
The meta-algo is evolutionary programming in that the algo fianlly "developed" by the meta-algo is apparently result that isn't immediately apparent, indeed, one that perhaps unpredictable by humans.
More than mere navel gazing.
If you had studies GAs, you would know that GAs do "create" new information that doesn't exist in the original solution set (i.e. any given generation of potential solutions).
GA systems both propogate "good" information, as well as generate and test new information incrementally across generations. Of course, this is done within whatever limitations you choose to impose on the solution set (i.e. if you're using a fixed N-bit string to represent entities in the solution space, then you'll search across all 2^N possible solutions in the N-bit string space).
Fundamentally, GAs are just a way of searching a solution space for good solutions. They are slow, but can tackle a more diverse set of problems easier than the traditional backpropogation-based neural network. I would consider neural networks to be more in line with your description of an 'adapting' learning algorithm.
The following analogy can only be taken so far, but you can compare backpropogation-based neural networks to be akin to hill-climbing algorithms - you start from one solution, move some limited distance towards what you think is a better solution. For a neural network with N vertices, you can think of neural network learning as hill-climbing in an N-dimensional space over whatever field used to specify edge weights.
GAs, on the other hand, do a much more disparate search over any solution space. And it is surprising how well it finds near-optimal solutions even for problems where the relation between solutions is complex and non-intiuitive (e.g. minimums for seemingly chaotic functions).
-Laxitive
"Priming the cache" and "doing a breadth-first fetch on a page" are both things that create *more* network traffic on the off-chance that it might save some number of microseconds for the user.
More traffic isn't necessarily bad. While what you say is true, you fail to note that user-initiated traffic is done in bursts. Just like your CPU is idle >95% of the time, so is your network connection. So all users benefit, both in real and perceived performance, when there is a steady 100% utilization.
So everyone just grabs what they can get and everyone is worse off.
Again true, but naive. What would be better is if there were a mechanism to prioritize the pre-fetch cache. Every page has one link that is pretty much the most likely next page. Then a secondary link, and so on. Ideally, a site owner should be able to put that priority list somewhere in the page such that a user agent can start getting it after the current page has loaded and is being read. Otherwise, maybe the user agent can favor new content (i.e., compare this load of Slashdot with the one done five minutes ago and grab the links in the diff). That's a far cry from a mad grab, and would probably benefit all parties involved.
Garbage collection is required to be correct, but is allowed to keep extra stuff around as memory permits. Everything that must be kept can be calculated deterministically without future knowledge, and the same holds true for everything that can be discarded. Approximation merely allows the what can be discarded calculation to progress faster. In garbage collection it is not correct to throw out something that will not be used in the future unless it is also unreachable.
For cacheing there is no requirement of correctness. Performance is improved when things that will be used in the future are kept. There is no correctness requirement for keeping things around, and indeed best performance often requires discarding things that will be used in the future. In addition, there is no way of determining what will be used in the future without knowledge of the future. The correctness requirements of a cache are related to security (don't cache sensitive information) and staleness (don't cache stuff that is too old), but even those may be relaxed rather than strict requirements.
Fundamentally the two problems are very different, and algorithms that deal are successful in the two different cases will likely be very different.
50% of all bits sent over the internet are 0s. Just cache that and we have a 50% cache hit rate. :)
Caching information can also be used as a backup of data in case the server crashes or has its data compromised by a virus , hacker, whatever you feel like ..
so what happens if you, for instance, have a security hole in one of your "smart" servers?? or even a breach in the protocol structure (DoS)?
you could get the server to "breed" algorithms that would stop all servers by either corrupting the data in all servers that are providing the service or just DoS-ing them.
I guess i'm not yet convinced that this solution is of any good for real world network which the whole structure is based on insecure protocols.
maybe after IPV6, IPV9, who knows?
What is best in life? To crush your enemies, to see them driven before you and to hear the lamentations of their women.