"Evolved" Caches Could Speed the Net
SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."
According to New Scientist, evolutionary algorithms could make many network caches twice as efficient.
That's easy, just cache 4 boobs at a time instead of two
It would be cool if it didn't suck.
It will be interesting to see how many companies buy into this and support whatever solution they end up developing.
fp?
slashdot is gay. /f0rth ps0t
...another article about breeding algorithms. What's so important about that?
Knowing how to program the breeder...
tasks(723) drafts(105) languages(484) examples(29106)
It would be interesting to see exactly which algorithms they are talking about here. I wouldn't be surprised if they drew some ideas from garbage collection algorithms also.
Well, I have found one flaw in the methodology:
I would think this would breed out of the caching system any affinity to locality-of-reference.One of the things I did each morning when I was running a cybercafé was "prime" the Squid cache by running a little script that did a wget -p on all of the URLs in the portal page, and a few sites that were not. And it definitely did improve performance for most users.
One of my unrealized dreams would be some sort of speculative-fetch algorithm for Squid that would basically do a breadth-first fetch on a page while the user was busy reading it.
Of course in my not-so-humble opinion, the biggest problem with any caching system is the population of websites that, through either malice or incompetence develop content that is cache-hostile, and call it "experience enhanced".
How does the Slashdot Effect happen given that no slashdotters ever RTFA?
"An important consideration is what incentives there are for caching information for other users."
:(
So the ISP's will have to upgrade their soft/hardware to make this work?
Will that be worth it ?
In my opinion the article was a bit light on details
This is the sig that says NI (again)
I wonder if they evolved logic to counter the slashdot effect. 1. Scan slashdot.org for new stories every five minute. 2. Scan new story for links. 3. Cash those pages.
From what I can tell a good majority of the traffic that my machine receives is worm traffic. Would these genetic routines be setup to disregard those as cache data? If that's the case would they be setup to just block that data?
That alone would save me quite a bit of traffic as people on my subnet hit me constantly with their infected machines.
66.41.161.120 hit my machine 57 different times (that isn't individual requests, that's total times).
For the uninitiated, elements are added to an LRU cache until it fills up; thereafter, whenever a new element is added, space is made for it by throwing away the least-recently used one. Note, least recently used, not the least recently added, i.e. the oldest, since an element that was cached long ago may be used all the time, and so be well worth its place in the cache. For example, consider the company-logo image that your browser caches when you visit a new site and that is embedded in every page on that site. However old it gets, it's likely to continue to be used while you're on the site. As soon as you move to another site, it gradually shuffles its way down the deck until it falls off the bottom - which is precisely what you want.
--
What short sigs we have -
One hundred and twenty chars!
Too short for haiku.
It already takes forever for DNS changes to propagate through every network, which can be extremely frustrating when you have a high bandwidth domain. There definitely needs some optimization on the DNS front.
GroupShares Inc.
-------
artlu.net
So now Slashdot is going to get cached for a long time and I'll never get first post again :(
Vonal Declosion
I"ve always wondered if something along the lines of cache complimented with a Bittorrent type of scheme couldn't help speed up the internet. that way bits would be mirrored all over, and a server could pull them in faster since more servers did less work each.
just something I'm thinking about today, well, that and the Kerry/Edwards pairing.
PCBRwer342$#
free ipod and free gmail!
From the article:
He[Pablo Funes] suggests networks might in the future be designed to work out who deserves the most help for themselves. "Sophisticated network behaviours might implement rules for reciprocity and trust," he says. "And conversely, for not cooperating with other others who try to abuse our resources."
The future of network security? Imagine the next computer virus outbreak: Every network in the world could recognize the virus type activity and allocate them lesser or zero resources, maybe sending them a "Virus detected, please run antivirus software or contact your IT Department" notice, and detecting outside attacks from viruses and automatically flagging them as unsafe, and not give much(or any) attention to traffic from or to that site
Stuff like this is what keeps computing interesting, I think. This technique can be used by almost every business in a situation in which optimization might be necessary. I assume that covers most tasks someone would want to accomplish. By utilizing these algorithms to explore search spaces previously thought to be too large I predict we'll experience some pretty explosive advances in the near future in areas from farming to software development. But that's kind of obvious.
Crystal Meth: Would you ingest somthing made from a poisonous gas and an explosive metal? You do it every day -- Salt!
Just have the algorithym keep all fleshtone images in the cache...
"One can even imagine each host evolving its own optimal rule."
call me old-fashioned, but I find the mindset of a host evolving a little too "new fangled" for me. is there going to be a big stir-up now on whether or not CS profs can teach this kind of thing in the classroom? maybe it's the creationist in me...
I like to hold on to my cache for as long as i can, but I do like to eat out, and dates can be expensive.
I know I'm going to get myself into trouble for saying this, but "breeding" is NOT evolution! It's environmental adaption! Evolution implies that a major change occurs in the organism for which no preceding genetic code existed. e.g. An ameba suddenly mutates into a multi-celled configuration. Or a giraffe suddenly develops gills. Breeding, OTOH, merely pushes various existing genetic codes to become dominate. e.g. A wolf might develop a thicker coat of fur in a colder climate, while his southern cousin continuously sheds. In both cases the code already exists, the most useful one for the environment merely bubbles to the surface.
:-)
This whole damn thing got so confused when some yahoo renamed "environmental adaption" to "micro-evolution". To distinguish "actual" evolution, they then referred to it as "Macro-evolution". Then they tied the whole thing together by claiming that "minor changes add up, and eventually produce a macro change". As much as I'd love to believe this, there's no evidence to support such a supposition. We can't yet treat it as fact, so the name change is very confusing.
Back on topic, this means that this "evolved" cache won't do anything it wasn't programmed to do. All it will do is automatically "test" various algorithm codes until it finds the one that works best for its current situation.
PS: Troll responses will be ignored. If you don't have something intelligent to say (and I do love intelligent conversation), don't say anything at all. Thank you.
Javascript + Nintendo DSi = DSiCade
Actually, I do genetic algorithm / genetic programming research at the university of michigan. It's unlikely that these guys are using genetic algorithms to develop a new algorithm, but are rather using an existing algorithm and *tuning* the associated parameters using a GA. Given a list of parameters, GA's work by finding the best combination of parameters. As a result, the settings could be constantly tweaked (say on a daily/hourly basis) and different servers could still have different regional settings. My only problem with the concept is that it still depends on the tuning of pre-existing algorithms... but still - the results they share (2x improvement) is encouraging.
Once again, Al Gore has his hand in the shaping of the internet.
I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to popular culture. Truly an American icon.
1. Scan slashdot.org for new stories every five minute.
2. Scan new story for links.
3. Cash[sic] those pages.
4. PROFIT!
... it looks like /. is cacheing ZDnet.
If the search space is not dominated by local maxima, basic hill climbing (go for the best neighboring value) will work. And it will be fast. If the function is differentiable, it can be orders of magnitude faster than other methods, because you can use a variant of Newton's Method.
If the search space is small, random search (just guessing) will work by exhaustively searching the space. This is obvious, but tends to be ignored in academic papers all too often.
This discussion also applies to neural nets and simulated annealing.
Now this article at least describes a problem for which a GA might actually be useful. Many such articles don't. But they haven't demonstrated that you need a bumpy hill-climbing algorithm.
This is why, despite all the hype, GAs, neural nets, and such aren't used all that much. The search space has to have the right properties. Not too small, not too big, bumpy, but not too bumpy.
fp?
No.
The is even a link to an online Tron game where us humans can play versus his evolving algorithms. The win/loss stats for his algorithm is approaching even. Given that humans can also evolve in the Tron game play, I imagine that the algorithm will have a head start over the new influx of slashdot visitors and start to win more often than not over the next week. I never got to play though. The SQL db to mange the stats was already down then I tried.
What happens if someone caches Metallica, or the new super hot J-Lo movie? Then how is the guy who brings the director a warm towel ever going to make his money?
You guys should really think before you go out and start making technology that blatently abuses copyrights
</sarcasm>
"Priming the cache" and "doing a breadth-first fetch on a page" are both things that create *more* network traffic on the off-chance that it might save some number of microseconds for the user. It's basically a Tragedy of the Commons situation. Everyone would be better off if no one pre-fetched links, but any given person is better off if they don't cooperate with that global model. So everyone just grabs what they can get and everyone is worse off.
Can we use this tech to fix DNS? It's a really obsolete system, IMHO.
why the hell is "evolved" in quotes???
just because algorithms aren't alive doesn't mean they can't be evolved.
(notice how nice this sentance still looks without the quotes)
music - http://www.subatomicglue.com
No info about the genetic algortihm used. I though New Scientist was a science paper...
What's in a sig?
with Critticall tool.
Original site.
Slightly OT. How does one determine that their cache is working properly? I have a cache and it doesn't appear to make that much of a difference with content, or DNS.
This should be interesting also for the Freenet Guys. Their Datastore currently uses something like LFU i think. Implementing a better algorithm could reduce Hop Times and Performance of the whole Network.
Garbage collection is required to be correct, but is allowed to keep extra stuff around as memory permits. Everything that must be kept can be calculated deterministically without future knowledge, and the same holds true for everything that can be discarded. Approximation merely allows the what can be discarded calculation to progress faster. In garbage collection it is not correct to throw out something that will not be used in the future unless it is also unreachable.
For cacheing there is no requirement of correctness. Performance is improved when things that will be used in the future are kept. There is no correctness requirement for keeping things around, and indeed best performance often requires discarding things that will be used in the future. In addition, there is no way of determining what will be used in the future without knowledge of the future. The correctness requirements of a cache are related to security (don't cache sensitive information) and staleness (don't cache stuff that is too old), but even those may be relaxed rather than strict requirements.
Fundamentally the two problems are very different, and algorithms that deal are successful in the two different cases will likely be very different.
Are we impressed? A mere doubling in efficiency should be achievable by even the drunkest engineer in all but the most well studied problems.
Let's talk an order of magnitude and then you will get my attention. Caching...Hah! I want fibre to my wetwire.
+1 Insightful
This paper has a bunch of examples.
50% of all bits sent over the internet are 0s. Just cache that and we have a 50% cache hit rate. :)
The problem with caches is whether you can trust them. Even assuming you can keep the caches properly up-to-date, how do you prevent cache poisoning from taking place?
Evolution is *not* the "change in allele frequencies over time." Or at least that's a secondary issue, historically, conceptually, and causitively. Well, on the last, allele frequency is pretty important (but still not exclusive). But evolution as a concept is firstly a question of PHENOTYPIC change, even before it's about GENOTYPIC change.
Evolution was a well known phenomenon (e.g. from the geological record) long before the work of Mendel was known. Specifically, the rather well known Darwin had no concept of an allele when he came up with the Theory of Evolution by Natural Selection. In fact, Darwin's approach was most certainly not the only explanation of evolution floated in 19th century biology. For example, Lamarckian mechanisms had a plausibility at the time... in fact, Lamarckian mechanisms still represent an important underexamined area of evolutionary mechanisms, for historical reasons (Lysenko etc.)--see, for example, the work of Ruth Hubbard and the role of viral insertions into the genome. So the whole allele thing is only part of one (important) mechanism, only understood relatively late in the scientific study of evolution.
Moreover, even in modern terms, mutation still exists; admittedly, the high-school text book focus on point mutations of genes is vastly overplayed. But just looking at allele frequencies misses also both the role of regulatory genes, and also ignores structural changes in chromosomes. Genes and smaller sequences migrate within and between chromosomes--this has nothing to do with alleles. Stephen Gould's work on phentotypic growth regulation in _Ontogeny and Phylogeny_ is good here, as is the tome _Structure of Evolutionary Theory_.
Buy Text Processing in Python
Caching information can also be used as a backup of data in case the server crashes or has its data compromised by a virus , hacker, whatever you feel like ..
so what happens if you, for instance, have a security hole in one of your "smart" servers?? or even a breach in the protocol structure (DoS)?
you could get the server to "breed" algorithms that would stop all servers by either corrupting the data in all servers that are providing the service or just DoS-ing them.
I guess i'm not yet convinced that this solution is of any good for real world network which the whole structure is based on insecure protocols.
maybe after IPV6, IPV9, who knows?
What is best in life? To crush your enemies, to see them driven before you and to hear the lamentations of their women.
The evolution that is described in this article is anything but unguided. Researchers took existing carefully developed code. They selectively combined parts of this code. They studied the resulting effects to produce better code. This looks much more like Intelligent Design [discovery.org] than anything Chuck Darwin ever dreamed up.
No, if it were truly following the "intelligent design" theory, there wouldn't be any trial and error to begin with. The code would be written once, from the ground up, and it would work just as well as any code possibly could, without ever needing to selectively combine anything.
Also, your textbooks had Evolution wrongly defined. It is not random, unguided change. The mutations themselves may be random, but the environment (including predators) is what guides the changes, by causing favorable mutations to progress through the gene pool while unfavorable mutations die out quickly.
-CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know
For a simple method to solving complex problems. There's also Fuzzy logic.
Why?....Why?....Why?....AHHHH...
"In fact, evolution can be precisely defined as any change in the frequency of alleles within a gene pool from one generation to the next."
Yes, you *can* define evolution that way (to be precise, one group of biologists, the population geneticists, do define evolution that way). However, that's an entirely different statement than saying that's the *only* valid definition of evolution, which seems to be what you're implying.
As a molecular evolutionist, I find alleles to be too high level -- I deal with the evolution of sequences themselves. And a zoologist would probably claim that alleles are too low level as zoologists mostly deal in phenotypes.
Natural selection is one of the methods (the other being sexual selection) by which all evolution (macro and micro) proceeds.
Mutation and genetic drift. Also, the term natural selection as Darwin used it included sexual selection (besides ecological selection). The term is used to distinguish these kinds of naturally occuring selection from intentional breeding, which is called artificial selection.
I love C++
Doesn't Freenet already do this within its own network?
Another related item is the ARC algorithm from IBM, which is an adaptive cache for block buffers.
I'm afraid this doesn't seem like a particularly appropriate use of genetic algorithims. In fact the entire piece seems to suggest a poor research project (someone who did genetic algorithms needed a thesis) which got picked up on a slow news day.
First of all I would point out that genetic algorithms are most appropriate for pure optimization problems with minimal mandatory constraints. While loading speed is important for internet caching it must be optimized inside certain constraints. A good cache system would have guarantees about how frequently web pages were refreshed/verified. For instance one might evolve a very efficent cache algorithm which only refreshes slashdot once per day.
Of course one could tack refresh guarantees on top of your evolved algorithm but the more constraints you place on top of the algorithm the less likely it is that this algorithm is more efficent than a more traditional solution. Most likely one would like to make the refresh minimum dependent on the frequency the website updates and placing a complex condition like this on top of a genetic algorithm evolved simply to maximize cache hits is likely to destroy any speed benefit it might provide.
To be fair one could amalgamate all these concerns into the evolution process. Instead of just measuring cache hits/download speed one could also check how frequently the cache page matches the current page and optimize some combination of these scores. However, this doesn't appear to be what was done in the article. Moreover, for web page cacheing some absolute gaurantee of freshness is desierable. The winning genetic algorithm might have very good performance on average but almost never update one person's report on the stock market.
In general it seems like a dangerous idea to use genetic algorithms in situations open to abuse. Handwritten code can be analyzed and it's behavior in exceptional situations relatively easily determined (as opposed to a genetic algorithm). With a genetic algorithm not only is it not clear how it will behave in exceptional circumstances but we aren't even sure what kind of situations are most likely to break the algorithm.
A web cache should robustly handle attempts to misdirect or corrupt it. If someone forges a great deal of web page requests we don't want the cache to dump truly popular pages to load the falsely requested pages. A flaw like this could quite possibly be used to mount a DNS attack using the large bandwidth of the web cache. More worrying is the possibility that a clever hacker might be able to trick the cache into replacing a valid website with a page of his choosing. Handwriting the code allows one to deal with all these potential problems and also ensures that the cache files/records are in a manageable form so sysadmins can manually flush some pages out of the cache or otherwise address errors.
Finally, it is not clear that a genetic caching algorithm would continue to work as the internet changes. If you sort through a wide swath of algorithms it is quite possible that certain features/tricks will end up being hardcoded, for instance the algorithm might refuse to cace anything starting with google.com which will then become inefficent if the internet changes. It would be good to remember the story about the army using a neural net to diffrentiate russian and american tanks and discovering it had just learned to distinguish the time of day as the american tanks had all been photographed at noon.
In short it seems that genetic algorithms have major drawbacks as cache algorithms. Since we have barely broken the surface of handwritten cache systems the effort hardly seems warranted. Before moving to genetic algorithms there are tons of good ideas which should be explored first. For instance making browsers cache aware (so the online cache can cooperate with the browsers cache like I suppose earthlink does), predictively requesting pages linked from downloaded pages, generating user profiles to better predict which pages wil
If you liked this thought maybe you would find my blog nice too:
In IE, when you hit 'back' or 'forward', it takes time to load the page!! Why?? If you tell it to work offline, then everything is instantaneous. There's no reason that it should check to see if a page has been updated after you click back if you were just there 5 seconds ago. Changing this would speed up my websurfing experience a lot more than this caching algorithm.
-grankk
Of course with systems like internet caches, where the cache is not a part of system it caches, it needs to have other measures as well. Not just obeying the expire headers on the things it caches but also trying to see when it has changed even if the expire info are missing or wrong.
And then perhaps trying to predict the update rate of the page and then prefetch pages high in the LRU tables to have fresh versions ready for the user.
To me its building in more capablity to track what and where you go in the internet. Most ISP's already have these servers. I don't see how they really have helped much. Plus being another layer of crap tends to make it also another layer that won't be managed securely and lend it self to the next compromised system.
BYte69
I just took the morning off to register at university for my masters in computer science. My research topic is a distributed caching proxy network. Obviously since i haven't started researching yet, i don't have all the details, but the basic idea, is to have a network of caching servers, each serving a group of users, but interconnected, so that if a user of a particular caching server requests a file that isn't in that server's cache, but is in the cache of another server on the network, then that server gets the file from its neighbouring cache instead of the origin server. Eventually, each of the caching servers end up with a copy of the file and the remote site is only hit once. There are obviously issues that I'll have to deal with, but thats the basic idea... kinda like squid crossed with bittorrent :)
Browsers don't always bother checking DNS expiration times and rechecking the address when they time out. This used to be less of an issue on IE because Windows used to bluescreen several times a day (:-) but these days it can be a problem.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks