Freecache
TonkaTown writes "Finally the solution for slashdotting, or just the poor man's Akamai? Freecache from the Internet Archive aims to bring easy-to-use distributed web caching to everyone. If you've a file that you think will be popular, but far too popular for your ISP's bandwidth limits, you can just serve it as http://freecache.org/http://your.site/yourfile instead of the traditional http://your.site/yourfile and Freecache will do all the heavy lifting for you. Plus your users get the advantage of swiftly pulling the file from a nearby cache rather than it creeping off your overloaded webserver."
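The URL scheme in the summary is just a prefix in front of the original address. A minimal sketch of that, assuming nothing about the service beyond the two example URLs quoted above (the helper name `freecache_url` is made up for illustration):

```python
FREECACHE_PREFIX = "http://freecache.org/"


def freecache_url(original_url: str) -> str:
    """Return the Freecache form of a URL by prepending the service prefix."""
    if original_url.startswith(FREECACHE_PREFIX):
        # Already wrapped once; do not prefix it a second time.
        return original_url
    return FREECACHE_PREFIX + original_url


print(freecache_url("http://your.site/yourfile"))
```

The guard against double-prefixing is a design choice of this sketch, not something the summary describes.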
Well, it won't be the solution to Slashdotting, as you can't cache a whole site.
You can cache an HTML page (index.html) but all the images will pull from the local machine. You could cache each image separately, but the change would have to be made in the site's HTML.
On the other hand, I don't imagine it would be hard to write some kind of proxy script that grabs the page and changes the HTML to point to freecache SRCs for each image/movie... you could then point to a freecache of that page...
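A rough sketch of what such a rewriting script might look like, in Python. It assumes double-quoted `src` attributes on `img` and `embed` tags only, and the function name `rewrite_media_links` is invented here; it says nothing about whether Freecache would actually accept the rewritten files.

```python
import re
from urllib.parse import urljoin

FREECACHE_PREFIX = "http://freecache.org/"

# Matches the src="..." attribute of <img> and <embed> tags (double quotes only).
SRC_ATTR = re.compile(r'(<(?:img|embed)\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def rewrite_media_links(html: str, page_url: str) -> str:
    """Point every matched src at the Freecache-prefixed absolute URL."""
    def wrap(match: re.Match) -> str:
        # Resolve relative paths against the page's own URL first.
        absolute = urljoin(page_url, match.group(2))
        return match.group(1) + FREECACHE_PREFIX + absolute + match.group(3)

    return SRC_ATTR.sub(wrap, html)


page = '<p><img src="pics/a.png" alt="x"></p>'
print(rewrite_media_links(page, "http://your.site/index.html"))
```

A real version would want a proper HTML parser rather than a regular expression, but the idea is the same.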
And of course, this all breaks the second somebody has a site that is heavily CGI based.
Still, it's a start. I'll be sure to use it if I ever submit any site of my own to Slashdot ;-) Many thanks to the guys at the Internet Archive for setting this up. You rock!
The facts have a liberal bias. --The Daily Show
In case of Slashdotting, here's a Freecache link.
http://www.archive.org/iathreads/post-view.php?id=8764
/.'d... and he's apologizing for the load.
He was apparently
As I understand the setup, the ideal would be for ISPs to install this system on their networks like AOL's infernal content caching, except that it would only cache what the site owner wants cached. It seems like anyone with a static IP could join in the fun, too.
But would they? I saw this on the new service's message forum:
I was perusing the content in my cache and checking the detailed status page and I noticed illegal content containing videos in one of the caches I run. What is freecache.org doing to stop people from mirroring illegal content. I currently run 2 fairly heavily used caches and it looks like only one of them had illegal content. I cleared the cache to purge the problem, but the user just abused the service again by uploading the content again. I know freecache.org cannot be responsible for uploaded content, but there has to be some sort of content management system to make sure freecache doesn't turn into just another way to hide illegal content.
Whether you believe this guy's story or not, it seems like this could subject small ISPs to the sort of problems that P2P has brought to regular users. It's not going to matter who's right -- just the idea of having to go to court over content physically residing on your server is a risk I don't see a marginal ISP being willing to take.
So we're left with the folks with static IP addresses. They're in even more trouble if John Ashcroft decides to send his boyz over to check for "enemy combatants" at your IP address.
With the current state of affairs in the US, and the personal risk involved, I'd have to pass on this cool concept.
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
http://freecache.org/http://your.site/yourfile
freecache.org
http://freecache.org/http://freecache.org/http://
seems to piss it off slightly. I wonder why...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
1. Buy massive amounts of bandwidth
2. Host extremely popular web sites
3. ???
4. PROFIT!!!
How are they supposed to be making money on this?
I am defenseless. Use your button. Mod me down with all of your hatred.
If the referrer is slashdot, return a link to the google cache of your page element, rather than the actual element.
I trust google to be faster than these guys.
How much you wanna bet this is going to become a haven for bit-torrent seeds? Put 'em up, get 'em to people, get it started, then take 'em down.
It's pretty good. Lots of the servers are swamped though; it needs more of them. Anyone can run a Freecache 'node'. It's almost like Freenet, except not anonymous.
Too bad the status page seems to be down; it's fun to see what clips/games/demos/patches are going around.
on slashdot - lots of times. It only caches files bigger than 5MB, so if someone is slashdotting your MP3 collection it's a boon. If you're just hosting a dynamic web page with dynamic images, your MySQL server is still going to feel the strain.
- Does that mean that Slashdot will now link to potentially low-bandwidth sites using Freecache?
- Will you update their FAQ on the whole subject of caching since Google and Freecache seem to feel that the legalities of site caching is small enough for it to be a non-issue?
- Or are we still going to be relying on people posting links and site content in the comments because the original site has been blown away under the load?
Inquiring minds would like to know.
Avantslash - View Slashdot cleanly on your mobile phone.
I should point out that Freecache is in beta mode. By coincidence, this posting on Slashdot here is an interesting way of working out bugs.
This sig no verb.
The demo seems to be down.
0 /movies/LuckyStr1948_2/LuckyStr1948_2.mpg
Oh crap that was the wrong link - try this:
http://freecache.org/http://movies03.archive.org/
Slashdot should have their own caching system that automatically creates a cache of whatever website is being posted.
As their status page explains...
I have a few questions though, which I guess may be answered on the website:
1. Can users submit/upload files to be hosted on their website.
2. Who's responsible for ensuring that it doesn't turn into a pr0n/warez stash?
3. Can users request removal of cached content (something not possible with the Google cache).
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Yes, but the thing that you are not considering is that probably 75% of the Slashdot effect is just people looking at the link for about 5 seconds, and then closing the page and moving on to the next story. This means no browsing, meaning that it is not important if the whole page is not up there. And as far as pictures go, I would guess that a lot of people click on the link, even though they are not too interested, see the text, and realize that they are _really_ not interested. So they close the page before they even need pictures.
In other words, the important stuff, like the rest of the site and the pictures, will be resources only used on those that really care, while those that don't get to see a flash of the text for a second to get a really general idea.
After all, that's what the Slashdot effect is: a whole bunch of people that don't really care that much, but want a quick, 5 second look at it.
I see dreaded pictures from goatse.cx in the future. This will break the nice convenient domain name clues that Slashdot gives us, so we don't accidentally do things like that.
I think they're looking more at serving big files, not HTML and inline images. The smallest file size is 5MB.
It's not about the editors, it's about the authors. You, as an author, can use the freecache service by using their style links in your pages. It doesn't cost you anything to do it, and it's pretty easy to do.
It's not perfect, it will certainly not be used by everyone. Still it's something you can do defensively, especially if you're serving mpegs of your latest case mod or bear attack or whatever.
-Zipwow
I don't know which is more depressing, that 2/3 didn't care enough to vote, or that 1/2 of those that did are crazy.
Apparently, it's pronounced "free-crash" right now...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
-- Nothing unusual happened today
Did anyone else misread that as "Freeache"?
I mean, I'm all for free stuff, but an ache...?
Create a file format that is basically just the web page plus dependent files tar'd and gzip'd - then release browser plugins that automatically take any file with the correct extension, and seamlessly ungzip/untar it to the local cache before displaying it like normal - I have yet to understand why nobody has combined this basic idea with BitTorrent. Seems like you could get a lot of mileage with it.
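For what it's worth, the packing and unpacking half of this idea is small. A sketch using Python's standard `tarfile` module, working in memory; the function names and the example file names are made up, and the browser plugin and BitTorrent parts are not attempted here.

```python
import io
import tarfile


def bundle_page(files: dict[str, bytes]) -> bytes:
    """Pack a page and its dependent files into one gzip'd tar blob."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle_page(blob: bytes) -> dict[str, bytes]:
    """Unpack the blob back into a name -> contents mapping."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files


site = {"index.html": b"<html>...</html>", "img/logo.png": b"\x89PNG..."}
assert unbundle_page(bundle_page(site)) == site
```

The single blob is what would be handed to a cache or a swarm; the receiving side only needs the second function.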
Let me guess.. you're posting this from soviet russia?
This use of Freecache is still subject to the actual problem that enables Slashdotting: inadequate scaling planning. Some sites are limited by the cost of effective scaling failover countermeasures, but most are limited by lack of any planning for even potential Slashdotting - this use of Freecache still falls prey to that primary problem. And who can remember to prepend "http://freecache.org/" to their entire domain URL, including their repetitive "http://"?
A better use of Freecache is "under the hood". Make your webserver redirect accesses to your "http://whatever.com/something" to "http://freecache.org/http://whatever.com/something". More sites will be able to plan for that single change to their webserver config than will be able to plan to distribute the freecache.org compound URL. And it won't depend on users correctly using the compound URL. More sites will get the benefit of the freecache.org service. And when freecache.org disappears, or ceases to be free, switching to a competitor will be as easy as changing the config, rather than redistributing a new URL.
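The redirect itself amounts to computing a `Location` header. A minimal sketch of that logic, independent of any particular webserver; the `from_cache` flag is hypothetical (how a server would recognise the cache's own fetches is not described anywhere in this thread), but some such exemption is needed or the cache would be redirected back to itself.

```python
from typing import Optional

# Swapping services later means changing only this one setting.
CACHE_PREFIX = "http://freecache.org/"


def redirect_location(host: str, path: str, from_cache: bool) -> Optional[str]:
    """Return the URL to redirect a visitor to, or None to serve the file directly."""
    if from_cache:
        # The cache is fetching the original copy: serve it, do not redirect.
        return None
    return CACHE_PREFIX + f"http://{host}{path}"


print(redirect_location("whatever.com", "/something", from_cache=False))
```

Visitors keep using the plain URL, and the prefix lives in one place in the config.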
--
make install -not war
Actually, index.html would only be cached if it is 5MB or greater in size.
Which is unlikely. So it won't be cached. Nor will the PNG/GIFs.
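To make the size rule concrete, here is a small sketch that filters a file listing by the 5MB minimum mentioned above. Treating "5MB" as 5 * 1024 * 1024 bytes is an assumption, and the file names and sizes are invented for illustration.

```python
# Assumed byte value of the "5MB" minimum; the thread does not give an exact figure.
MIN_CACHEABLE_BYTES = 5 * 1024 * 1024


def cacheable(files: dict[str, int]) -> list[str]:
    """Return the names of files large enough to be cached under the size rule."""
    return [name for name, size in files.items() if size >= MIN_CACHEABLE_BYTES]


site = {"index.html": 20_000, "logo.png": 8_000, "casemod.mpg": 40_000_000}
print(cacheable(site))
```

On a typical page, only the big media file makes the cut.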
Ratboy
Just another "Cubible(sic) Joe" 2 17 3061
Not really.. I can't access their servers now. All will tremble before the might of slashdotting!
That is another stumbling block that will prevent it from saving many websites. If I can't use the freecache link, I will be forced to go back to the original link...as will a good percentage of the other /. crowd.
KevG
Information: "I want to be anthropomorphized"
The story is only a few minutes old and the mecca of Internet caching has already been slashdotted. Maybe some kid with an old P5 266MHz under his desk can mirror the site for us.
-=-=-=-=- osjedi uses Debian GNU/Linux. -=-=-=-=-
Is a publicly available Squid server. If you put any link through the server, such as:
www.squidserver.com/http://www.doomedsite.com
The public Squid will cache a copy of it. On the first access (like when the approver looks at it), it should look at the request and see whether it has a recent cached copy. If it does, it feeds that; if not, it gets the newest copy and prompts the user for a refresh, or automatically refreshes after a set time (5 sec). It will update its cache as the site does, all without anyone having to upload anything. After a few days when nobody is using the cache, it can purge it and wait for the next doomed site.
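The behaviour described above can be sketched as a small in-memory cache. This is only an illustration under stated assumptions: the class name, the `max_age` freshness window and the injectable `clock` are all invented here, and the sketch fetches a fresh copy immediately instead of prompting the user to refresh. It is not how Squid or Freecache is implemented.

```python
import time
from typing import Callable


class PublicCache:
    """Serve recent copies, refetch stale ones, purge entries nobody uses."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = 300.0,
                 idle_purge: float = 3 * 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch            # function that retrieves a URL from the origin
        self.max_age = max_age        # assumed freshness window, in seconds
        self.idle_purge = idle_purge  # "a few days" without any use
        self.clock = clock
        self.entries = {}             # url -> (body, fetched_at, last_used)

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is None or now - entry[1] > self.max_age:
            body, fetched_at = self.fetch(url), now   # no recent copy: get the newest
        else:
            body, fetched_at = entry[0], entry[1]     # recent copy: feed that
        self.entries[url] = (body, fetched_at, now)
        return body

    def purge_idle(self) -> list:
        """Drop entries that have not been requested within idle_purge seconds."""
        now = self.clock()
        idle = [u for u, (_, _, used) in self.entries.items()
                if now - used > self.idle_purge]
        for url in idle:
            del self.entries[url]
        return idle
```

Passing the clock in as a parameter is only there so the timing logic can be exercised without waiting days.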
DISCLAIMER: This may be how Freecache works, but I can't get to it
1) because I am at work.
2) as the comments suggest it is slashdotted.
KevG
Which is why it's very important to have a simple, clean, and informative main web page with links to more details. Sites that overload their main page with crap actually drive away viewers.
You are being MICROattacked, from various angles, in a SOFT manner.
Just pad out your pages with lots of hidden text.
:)
<!--
<?php
for ( $i = 0 ; $i < 5000000 ; $i++ )
{
print "a";
}
?>
-->
Hey presto. All your pages are > 5MB!
Get your own free personal location tracker
How does this system guard against doctored content coming from the cache sites? Since they allow sites to sign up to become a cache server, wouldn't it be possible for a malicious user to sign up and use some locally-modified code to add a virus to all the .exe files that get sent out from their cache? They could even customize the output of their CGI depending on what domain you are in, making it easy to target specific sites and/or hide their munging from other sites.
..wayne..
Freecache is really just a half-baked ("precursor") version of P2P; not in any sense a long term solution, but interesting at least.
Correct use of P2P with network based caches (i.e., your ISP installs content caching throughout the network) and improved higher level protocols (i.e. web browsing actually runs across P2P protocols) would resolve slashdot effect type problems and usher in an age of transparent, ubiquitous, long-lived, replicated content.
For example,
Basically, your request (and thousands of other Slashdot readers' requests) would fetch "closer" copies of content rather than having to reach directly to the end server (because the content request [i.e. HTTP GET] actually splays itself out from your local node to find local and simultaneous sources, etc.). In theory, the end server would only deliver up one copy into the local ISP's content cache for transparent world-wide replication, and each end point would gradually drag replicated copies closer - meaning that subsequent co-located requests ride upon the back of prior ones. I'm just repeating the economics of P2P here.
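A toy sketch of that "drag copies closer" behaviour, with each cache modelled as a plain dictionary ordered nearest first. The function name and the dictionary model are assumptions for illustration; real P2P lookups would query several sources at once rather than walk a list.

```python
from typing import Callable


def fetch_via_caches(url: str, caches: list, origin: Callable[[str], str]) -> str:
    """Look in the nearest caches first, then fall back to the end server."""
    for depth, cache in enumerate(caches):
        if url in cache:
            body = cache[url]
            break
    else:
        # No cache had it: this is the one request that reaches the end server.
        depth = len(caches)
        body = origin(url)
    # Leave a copy in every cache nearer than the place the content was found.
    for nearer in caches[:depth]:
        nearer[url] = body
    return body


origin_hits = []
def end_server(url):
    origin_hits.append(url)
    return "content"

mine, isp, neighbour = {}, {}, {}
fetch_via_caches("http://your.site/yourfile", [mine, isp], end_server)
fetch_via_caches("http://your.site/yourfile", [neighbour, isp], end_server)
print(len(origin_hits))
```

The second reader is served from the shared ISP cache, so the end server only ever sees the first request.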
In addition to all of this, you'd still have places like the Internet Archive, because they would be "tremendously sized" content caches that do their best to suck up and permanently retain everything, just like it does now.
Physical locality would still be important: if I were a researcher doing mass data analysis / etc, then I'd be better off walking into the British Library and co-locating myself on high speed wi-fi or local gigabit (or whatever high speed standards we have in a couple of years time) to the archive rather than relying upon relatively slower broadband + WAN connections to my house or work place.
For example, say I'm doing some research on a type of flying bird and want to extract, process and analyse audiovisual data - this might be a lot of data to analyse.
Equally, places like the British Library will also have large clusters, so when I walk in there to do this data analysis, I can make use of large scale co-located computing to help me with the task.
Nothing here is new: if you think about it, these are logical extensions of existing concepts and facilities.
You must be new here, or you would know the news is old here.
An error occurred while loading http://www.archive.org/web/freecache.php: Timeout on server Connection was to www.archive.org at port 80
Somehow I don't think this solution will work.
Ban Reality TV!
Slashdot is a news site. If you post a link to a website that's 12 days old, chances are it's not going to have the information you expected it to have.
Bwahahahahaha (cough)(cough) bwahahahahahahhhahaaaaaaaa
Virus infects both Windows and Linux!
I wonder why this continues to be a problem. It should be obvious to any judge that a hosting provider cannot and should not check everything that is uploaded to their servers.
It may be reasonable to expect them to pull content that is illegal where they are located, but that should be a simple matter of notifying them, they pull the content, no harm done. They may even be required to disclose the identity of the uploader, after which this person can be prosecuted.
I don't think anything in this scenario is outrageous or unfeasible. What is outrageous and infeasible is holding the host responsible for what the user uploaded. Then why is this the way it happens all too often?
Please correct me if I got my facts wrong.
Can I say RTFFAQ now? :)
Yes, the site is down. Yes, it's ironic that this should happen to a site hosting information about a service that's being claimed as a solution to the slashdot effect.
But I don't think that it really is an indicator. I happen to have read the site yesterday after reading the Petabox article, so I think I have some of the basic concepts down. As I understand it, the idea works with cooperation from ISPs (and others) to provide more localized caches of large popular files. The motivation for the ISPs is that by providing the cache, they save on their upstream bandwidth and the associated costs.
So, while it's funny that we've slashdotted the archive.org server where the Freecache website is, Freecache itself is not dependent upon archive.org's bandwidth.
It's also worth noting that the concept is still in beta and pretty new - I don't think they've got a lot of ISPs on board yet. From what I can tell, it seems a very good concept - the only thing I can think of that I would want to make sure of if I were an ISP is that my cache is only available to users on my network (the whole saving on bandwidth usage argument falls apart if you suddenly become a cache for users on other ISPs) but I would think that would be pretty easy to do.
For those who haven't yet been able to read about it, here's Google's cache of the front page.
Information doesn't want to be anthropomorphized anymore.
Yes, but the thing that you are not considering is that probably 75% of the Slashdot effect is just people looking at the link for about 5 seconds, and then closing the page and moving on to the next story.
The other 25% is us looking at a page for 5 seconds and then replying because as everyone knows here, it's much more entertaining to reply without RTFA.
This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
I'm waiting for the introduction of the resource file. Sort of like a jar file...you can access content in it, but it transfers as a unit.
An entire site might be stored in a resource file. Or just the files a single page depends on. You could have a meta tag that points to the resource file for a site. Or a hyperlink on the front page to the resource file for an entire site.
And guess what...if it's over 5MB, Freecache will cache it.
There will be some conflict with per-MB bandwidth charges for hosts, though. But I'm sure someone will work out a decent solution. (like Freecache. ;) )
tasks(723) drafts(105) languages(484) examples(29106)
This would be great if my employer didn't restrict access to archive.org as allegedly being in the "sex" category.
Secession is the right of all sentient beings.
Yep
The MHT format is specified in RFC 2557, an open standard.... so you can implement your own MHT writer or reader if you like.
The trick with saving a page as an MHT in IE is that if the page includes any frames that are not visible (which are made visible by script that runs when the user clicks on buttons for example), IE appears to not automatically load that content, so the saved page doesn't include it. If you have a complex page, you might need to write code (or use chili kat if it's in your budget) to get an MHT created in the manner you would like.
Only on /. could you find someone optimizing code that would be used to bloat web pages.
Overrated / Underrated : Moderation
Definitely not an adequate solution, given its current condition: slashdotted to hell.
Idiots! They should've had it cache itself first before posting this to /.
The problem with non-compulsory caching systems is that there is little if any incentive for the end user to want to use them.
What ISPs should really do is sell you a 256K internet connection (or whatever speed you happen to get), but then make all local content available at maximum line speeds... In other words, if you use the caching system (which saves the ISP money on the price of bandwidth) you get your files 6x as fast, or better in some cases.
I don't see why ISPs don't do that. It seems like everyone would win then. It wouldn't just need to be huge files either, they could have a Squid cache too, and not force people to use it via transparent proxy (most people would actually want to use it, despite the problems with proxy caches).
Right now, users have an incentive not to use it. Mainly because it's another manual step for them, and to a lesser extent because caching systems usually have a few bugs to work out (stale files, incomplete files, etc).
I know that it would only require minor modifications to current DSL/Cable ISP's systems to accomplish the two zones with different bandwidth.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant