Startup Webaroo to put the 'Web on a Hard Drive'?
An anonymous reader writes "A new startup called Webaroo is launching Monday with an audacious proposition: You can search the Web without a net connection of any kind. Initial release consists of 'Web packs' on specific topics such as news, city guides or Wikipedia. Later this year they're promising a full-Web version that you can carry on a laptop -- provided you're willing to devote something in the neighborhood of 80 gig."
I'm sold. Does anyone have the .torrent for it?
------
beware he who would deny you access to information, for in his mind he dreams himself your master
A new startup called Webaroo is launching Monday with an audacious proposition: You can search the Web without a net connection of any kind.
If anyone doubted the next dotcom boom is upon us, this should put that doubt to rest.
After reading the article, it sounds like they are just selling their web cache. Nice idea, but unless they are selling it really cheap I just can't see it picking up, especially considering the difficulty of getting the data onto your drive. I mean, an 80G download!
Additionally what if I decide to follow site links that leave the cache?
Yeah I can't really see this picking up.
GeekServ Unix Consulting Services (http://www.geekserv.com)
when someone asked if the internet will fit on a floppy?
The war with islam is a war on the beast
The war on terror is a war for peace
How soon till the first lawsuit is filed?
Undetectable Steganography? Yep, there's an app fo
Is this really the right time to try this, when wi-fi connections are popping up all over the place and the internet's bigger than it has ever been before?
Wouldn't there be an issue here of selling another person's content? While everyone can view the content at will, copying that information to media and then reselling it, or even distributing it for free, would be an issue.
With hard drive sizes so much larger than they used to be, why limit the space to 80GB? I carry around a 250 with my laptop, and if you plan on having so much data, why not make it even larger?
Yeah yeah I did not RTFA, so if this is answered in the article, well...Eh.
Considering the fact that companies are suing google for putting the first paragraph of their news tidbits on google news, how long will it be before someone sues webaroo for copyright infringement? Whether the claim is valid or reasonable or not is a moot point - someone is gonna see this as infringement and call out their pack of rabid lawyers.
look at news without a net connection? Either this is going to be just the same as viewing pages offline after you've been on them (perhaps an automated web crawler which grabs pages whilst you have some up time) or you will be viewing very old news... It seems to be the former though, in which case you're not really doing it "without a connection"... so why bother? This seems like a waste of space and time (and bandwidth); just look at what you want to when you're plugged in rather than constantly getting information you may never need.
*''I can't believe it's not a hyperlink.''
The airport example highlights the major weakness of this software: what if I want to send and receive real-time messages and news in that 5 minutes before a flight?
Has its uses, though.
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
wireless broadband access.. why would I want to download the web on my harddrive, when I will have (if not already) access to it from virtually anywhere ?
I see potential educational uses, but not wide spread adoption.
Is it just me, or does this seem like a left-over post from last Saturday?
Bah, this is old news. We swedes have been buying "Internet on a cd-rom" from http://home.swipnet.se/snezzer/pi/ for a long time. You can even buy it on VHS for 489:- or DVD with surround sound!
Did somebody say "Weaboo?" Because I... oh, webaroo. Damn, nevermind.
For example, where do we get the porn diffs?
Did you know my dad's dog died?
Been around since the early 90's. Back then it was called "fan fiction."
80 gigabytes of Natalie Portman pictures - sweeet!! Where do I sign up..
e.g. searching? Having Wikipedia on your hdd is all well and good, but if you can't easily search it, what's the point?
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
They should be selling their compression technology!
Without a proper flamewar, Anonymous was undecided on what shell to run.
FTFA:
Webaroo will also be touting the potential cost savings...
"Every hotel I go to wants to charge me $10 to $15 a night for Internet. Every airport wants to charge me another $10 to get connected," Husick says. "If I've got five minutes before I have to board my flight, do I want to spend that five minutes connecting or do I want to spend five minutes getting my search answer?"
I would be more interested in checking my email than in assimilating search results.
This is the single dumbest thing I've seen on Slashdot recently. As someone has already posted, why carry the internet as hard copy when wifi is becoming ubiquitous? In any case, is it just my tin-foil-hat nature that sees this as a great way of hiding/censoring parts of the internet? I mean, if this were to actually take off we'd be trusting a single source of info, with little or no culpability to the public. Granted if this became popular we'd see other sources come in, but....oh to hell with it.
I'm not going to waste any more time on this. It's just an exercise in paranoia. Nothing to see here, move along.
Don't use the Troll mod just because you disagree with me.
I'm posting this message from my Webaroo offline internet connection.
That would cover about 0.0000000001% of the web, give or take a few dozen orders of magnitude.
Concealed Handgun License Courses in Plano, Texas
I doubt the implementation will work properly without CPIP.
that you will be able to download it from itself once you have it installed on your HD?
http://en.wikipedia.org/wiki/Freenet
At least if you have it on your harddrive you don't have to redownload it to get your dupes!
liqbase
it's just the president nixon stereotype version of the normal web.
i'm going to surf the webarooooo
I do not accept czechs.
When I can get to a PC, I can usually get to the net. Do they offer hardcopies instead?
Massive copyright infringement.
They'll crash and burn.
"The Internet Archive Wayback Machine contains approximately 1 petabyte of data and is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress. If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii."
Internet Archive Frequently Asked Questions
How big is Google's index of the Web, complete with URLs of results? I could search that, only a day out of date, without a Net connection, if it fit on a HD. Maybe using Usenet to distribute it...
--
make install -not war
I've got news for Husick. I'm a lawyer who has sets of Statutes, Court Rules and Local Rules behind his desk. I still look them up online to make sure I have the most recent version. I can't afford not to.
Search performance? Rarely, if ever a problem.
Siphon traffic away from "increasingly crowded broadband networks?" They make money from that traffic. They can't, if necessary, charge per data download? Tier the service by download bandwidth? Charge more? Build a better network?
The first cell phone or wireless device that expects me to pre-download some portion of the net, that portion being determined by somebody else, is the first one I can cross off my list.
Save $5 or $10 at the airport by not connecting? What if I want to send or receive e-mail? Get the latest news, business or stock information? I'm AT AN AIRPORT, which implies I have some money, and in his context that I'm on business. I'm going to forgo a net connection for $5 or $10? If my employer is that tight, I'm looking for another job anyway -- one that doesn't use Webaroo's services.
This reminds me of the software solutions to cramped hard drive space a while back: on-the-fly file compression and expansion, when data size was outstripping hard drive size for a short period of time. (Remember the file corruption.) Even though there was a market for those products, barely, everyone and his brother knew that market was going to go away Real Soon Now.
Only Women Bleed (Sex, Sharia remix)
yeah... I don't know, usually when i'm travelling and i'm trying to get online i'm trying to connect to my VPN, or check my email...etc..not trying to look up what the capital of Georgia is?.. plus how many people have a spare 80GB on their laptop?????? (NOT ME)
plus i don't know where they are staying but the hotels we use have free HS Internet in the rooms?
actually I am happy to see you, however that is in fact a banana in my pocket.
Only if one of the webpacks is porn. Or better yet, if several are porn, cross referenced by type and participants.
Though, my vaguely disturbing ramblings do raise an interesting point, maybe - what's their stance on the indecent materials that make up a good deal of teh webernet? When they say the "whole internet," do they MEAN goatse too?
I can already tell you which side of the line it falls on. In addition to 80g being a thimbleful of ocean, websurfing is not my main use of the internet. How, for instance, are they going to support reading blogs, or even /.? My main use of the internet is to send and receive mail. Followed by participating in several blogs and fora (like /.). My home page is Google/ig, set up to monitor several RSS feeds, email, and news. This idea is so bad it isn't even wrong. It's pathetic.
Concealed Handgun License Courses in Plano, Texas
Would the downloadable content include porn?
Er, I'm asking this in order to, er, protect my girlfriend's sensibilities. Can't have her unwittingly downloading such naughty stuff you know. =)
I see issues of copyright coming up. Just linking to sites these days can get people into trouble; what will be the repercussions of essentially taking all this data and stuffing it on someone's hard drive?
Only 'flamers' flame!
Does slashdot hate my posts?
Download.
On a more serious note, in a few years, won't there be wireless internet in the vast majority of places that you would be doing work? Why not work on getting internet everywhere, rather than a dumbed-down crippled version that uses up a big chunk of hard drive space? It seems like the opposite direction of where things are going. With the number of emerging internet based services that used to be only on the desktop (ie. office applications, image management, etc.) it seems like everything is moving to be -online-, not the other way around.
Oh my! The endless copyright battles that will ensue!
-Grey
Silver Clipboard: Time Management Tips
I missed that eBay auction deadline again! I'd better start using FedEx for the new versions.
Well I, for one, welcome our new "wget" overlords.
Webaroo's creator commented that his inspiration to create the service came from beholding the immense power of SQL on Rails
From the website: "Webaroo is a stealth-mode technology startup", which obviously means something very clever ... personally I use WinHTTrack on a small number of sites; now if someone offered pre-downloaded WinHTTrack sites ... maybe to order ...
Anyway, more importantly - Dr Who is due back on UK TV soon I think (slightly disappointing end to last series - shame to see Chris E leave) so here's a joke that Webaroo might like to 'cache':
"What do Daleks have for a snack? ... Dalek bread..." geddit? (thanks to a kids radio show for that one).
could give me Duke Nukem Forever or the next Amiga OS release.
80 gigs is not nearly enough space for all the porn. Which is what most people search for anyway. 'Web on a hard drive' indeed!
for the leather bound book inscribed by cyber-monks, with hand illustrated, gold leafed side-bars.
I lost my sig...
now that I can get internet access on my laptop pretty much anywhere I get cell reception, however, there isn't much of a point.
So what should the transatlantic and transpacific frequent fliers use? Wi-Fi and cellphones don't work on an airplane.
This actually isn't by any means a new idea.
If you've ever written or read html, you know that html doesn't care if links start file:// or if they start http://. HTML has always been quite neutral on whether it was linking to a local file system or getting something over the internet. Of course, most people don't use html extensively for local content. So in theory, this isn't a new idea at all.
In practice, I don't see much point to it. I can imagine that some people might want a map of a new city, with clickable pictures and information about various services there. Most features of a city map are going to stay the same for at least six months, so this is the type of thing that could be done statically. But even with this, internet access is so widespread that it seems like a solution for a minor problem. Also, if you want a handy city guide, it would make more sense to me to write it from scratch rather than use a kludge of cached web pages.
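To illustrate the point about file:// and http:// links: a minimal sketch, using Python's standard `urllib.parse`, of how the same href syntax covers both local and network targets. The paths and the example.com URL are made-up placeholders, and `describe_link` is a hypothetical helper, not anything from Webaroo.

```python
from urllib.parse import urlparse

def describe_link(href: str) -> str:
    """Classify a link by its URL scheme; the HTML markup is identical either way."""
    scheme = urlparse(href).scheme
    if scheme == "file":
        return "local"
    if scheme in ("http", "https"):
        return "network"
    # No scheme at all: a relative link, resolved against wherever the page lives.
    return "relative-or-other"

links = [
    "file:///home/user/cityguide/index.html",   # hypothetical local copy
    "http://example.com/cityguide/index.html",  # placeholder remote URL
    "maps/downtown.html",                       # relative link, works in both cases
]
for href in links:
    print(href, "->", describe_link(href))
```

Relative links are what make a saved set of pages browsable offline: they resolve against the page's own location, whether that is a web server or a folder on disk.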
Hopefully I didn't put any [] around my words.
Technically, they make a copy and the ISP doesn't.
Isn't the ephemeral copy in the RAM of a router still a copy? And don't operators of automated caches have a fairly broad exemption under United States copyright law, 17 USC 512(b)?
It's an offline, indexed database; interesting but hardly newsworthy. So unless they've broken the Shannon limit there's nothing more here than IPO fodder.
"I hate to advocate drugs, alcohol, violence or insanity but they've always worked for me" - HST
Imagine a version of Slashdot that you can refresh all day long, but no new articles appear.
You can post comments, but they never show up.
This service is a potential disaster that can drive millions of geeks on the edge of desperation or worse.
Think of the geeks, people...
Grandparent poster posits that without the database software running on your laptop, Wikipedia won't work.
How hard is it to set up a local WAMP (Windows-hosted Apache, MySQL, and PHP) server in a slick installer?
How are they to justify selling other people's websites? What about the sites' lost ad revenues?
Bye bye web 2.0
That's impossible! I have nowhere near all the pr0n on the interweb and I've used way more than 80GB.
Terrific, we'll have web packs that omit relevant information just the way the media suddenly and completely omitted any mention of Howard Dean as a candidate about halfway through the last election. That kind of obvious collusion won't be necessary now, we'll just be able to read fair and balanced news about one candidate.
All the news that's fit for you to read, Citizen!
But not in the way they think. TFA mentions two points, but doesn't explore them in depth. The first is the algorithms they use; let's face it, Google is starting to fall to the SEOs. If they have a new algorithm that was able to actually follow your web browsing all the way, they'd be able to provide much better results. Google claims to do this, but they can't follow you more than your first link. Second, they seem to have picked up that most people find all their information on the second or third link they visit.
Combine these together, and the program could offer you 80 gigs of data to just sit on your computer and be sifted through at your leisure. It would be able to follow you through, and find exactly how you get through your data. When it needs to, it can spider into areas that it thinks you might want to go (Been looking at a lot of Wikipedia? Next time you connect, it goes and picks up some wikibooks).
The best part is that all the "Big Brother" information is being stored on YOUR computer, not their servers. You want that info, Bush? You'll have to subpoena every user.
If they targeted this more towards a desktop-search type thing with better search algos than Google, this could just work.
Now I can say that I've finished downloading all the intrawebs!
Archives are good and this can be a useful service. Providing 80 select gigs on a hard drive to libraries and schools is useful until US networks get where they should be. Their software can keep those 80 GB up to snuff at night. When you leave the cache, you ... gasp ... get the new content. In the meantime, things are much faster when it matters. Mirrored content will always be a good idea. Look at the debian distribution system, for example.
Good luck to the people at Webaroo. So long as they don't apply for stupid patents that give them an exclusive franchise to distribution systems, they are AOK.
The road warrior thing will flop, though. People are going to stay where there's a network or pay the $10. It's the one piece of live information that requires the hook up. The speed of the rest is gravy for those people.
Friends don't help friends install M$ junk.
and, of course, one of the floppies will corrupt leaving you with the rest being useless.
How long have PAR files been around?
.. not that it isn't obvious but, the whole reason this service is about to happen, and the only reason we're reading about it on Slashdot is the wow factor of saying "get the internet on your disk".
Had they said it like "buy our temp files", it would suddenly be a lot less interesting.
You can't split the internet into topics and sell it. No one browses while restraining himself to one topic. When looking up information or researching a topic, we jump from site to site, totally unrelated to any specific subcategory of pages.
As for caching specific sites for offline viewing, well that's something IE, a free product, had for ages.
I'm surprised no one has mentioned the word 'aleph' yet.
You raise excellent points which warrant discussion.
As many have said, the "point" of the Internet (as I see it) is LIVE contact with (just about) everything.
As many of us understand, 99% of traditional media is owned by the major corps like Disney, Viacom, News Corp, etc. If this is conspiracy theory, then Jon Stewart is a tinfoil hat nut because this is all spelled out in the Daily Show's "America: the book."
Like many of you, I was attracted to the Internet because I assumed it escaped this sort of control paradigm. I figured, heck, who would even *try* to control this much info?
These days, when I browse the top sites on Alexa for example, I see the same sort of "media mafia" tactic has overrun the web in 2006.
So what? IMO: we are all wrong. My extreme views are just as stupid as yours, however, as my grand-pappy used to say: "somewhere in the middle lies the truth". I feel that the "wackos" on all sides are CRITICAL, and that this "societal average" is the closest we will ever come to "truth". I find anything which threatens this function of the Internet detrimental to me, my country, and my fellow man.
Someone around here has a great sig (sorry, but I am terrible with names), something like: "the problem with wikipedia is that it only works in practice, in theory, it can't possibly work." To whomever shared this with me: right on. This is exactly how I felt about the Internet circa 1996, and the reason I am so hurt to see where it is 10 years later.
Math is math. Regular expression is regular expression. The tools are there. The future is now.
Lemme guess, they're going to do that with SQL on Rails. (If you didn't see the screencast, that's part of their April 1 demo - they did a SQL query on "the internet", and claimed to have downloaded the whole internet into tables beforehand.)
And to make some sense out of this... they plan to split the internet in sections and preserve all graphics, bells and whistles, melodies and scripts.
Wouldn't it be a lot better if they would strip the HTML, graphics and leave plain unformatted text with hyperlinks, the words being compressed using a shared dictionary of the words in all pages?
You could fit a lot more information in a lot less space that way, also eliminating most of the useless noise such as having a pretty shadow on that rounded panel.
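A rough sketch of that idea, assuming nothing about what Webaroo actually does: strip each page down to its words and link targets with Python's standard `html.parser`, then store every page as a list of integer ids into one dictionary shared by all pages. The names `TextAndLinks`, `strip_page`, `encode_pages` and `decode` are made up for illustration, and this version is lossy (it lowercases, drops punctuation, and keeps links separately from the text).

```python
import re
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible words and href targets, skipping script and style content."""
    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(re.findall(r"\w+", data.lower()))

def strip_page(html):
    """Return (words, links) for one HTML page."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links

def encode_pages(pages):
    """Map {url: html} to (shared dictionary, {url: (word ids, links)})."""
    dictionary = {}  # word -> id, shared across every page
    encoded = {}
    for url, html in pages.items():
        words, links = strip_page(html)
        ids = [dictionary.setdefault(w, len(dictionary)) for w in words]
        encoded[url] = (ids, links)
    return dictionary, encoded

def decode(ids, dictionary):
    """Turn a list of word ids back into space-separated text."""
    reverse = {i: w for w, i in dictionary.items()}
    return " ".join(reverse[i] for i in ids)
```

A word that shows up on a million pages is stored once in the dictionary and as a small integer everywhere else. A general-purpose compressor run over the id lists would shrink things further, but that step is left out here.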
Posting as AC because of terrible karma and I don't want to waste posts: Don't click the link, YAGL (Yet Another Goatse Link). Wow, I just invented a slashcrynom, I'm so happy. (BTW, I hate the /. karma system)
Copyright infringement.
It's one thing to copy material in order to A) cache it on a squid server to serve it faster, B) cache it in a search engine so that people can find it, or C) copy it into your web browser cache to view it locally. It's a rather different thing to statically bundle the content up as a downloadable package and SELL it as the product of another business. I predict lawsuits. Lots of lawsuits.
Still, as long as they respect robots.txt (as Google does), I suppose it might be okay.
Then there's the problem that I can't see many people using such a product, and the expectation that as wireless spreads further, this product is solving a dwindling problem.
Is this just marketdroidal hype, or does it do more than this? http://www.gedanken.demon.co.uk/wwwoffle/
In other news, AOL, LLC announced plans to launch their own revamped "super search" engine, with data stored entirely on piles of those "FREE AOL" floppies.
Math is math. Regular expression is regular expression. The tools are there. The future is now.
This is a very good solution for specific sites. For example, a month ago I used a crawler to copy a net library to a folder on my hard drive.
Now when I'd like to read something, I just click on Index.htm and it loads in my browser.
And you don't need to update books.
Something like a copy of Wikipedia is still valuable, even without any updates.
If Slashdot released a torrent of all its archives, I'd download it too, just for entertainment. I got a new 250GB drive.
Even if this is doable and legal, it runs entirely counter to the spirit of the Internet. The Internet on a hard disk is no longer a network, it becomes a passive entity with no possibility of interaction.
At the moment, we are seeing a return to the interactive origins of the Internet, prime examples being blogging, Wikipedia, and even Slashdot! If this project takes off it will be harmful to interaction and will turn the Net into a glorified television.
However, I find it unlikely that Webaroo will gain currency, precisely because we have become dependent on an interactive and living Internet. When I use the Net, I want to be able to read and respond to my emails, to check my bank balance, shop online, and read the latest news. Why on earth would I want to have a static Internet on my laptop?
Phoenix, Boston, Little Rock, see a pattern?
This project was a highschool biology series of CD-ROMs, which used html/javascript on a CD (worked in all browsers, all platforms). It was a great project, except that moron gave away "samples" to so many schools the market dried up, as well as feature creep which prevented him from ever declaring the CDs gold. I suspect this project is led by this moron (or a cloned similar PHB model), and will never come to fruition.
Moral of the story is, don't let a project director hire one of his "soccer buddies" to lead a project just because his friend is unemployed. We all became that way (except for the stupid PHB who still works for the university but hasn't had a raise in 5 years... it is nearly impossible to be fired from a public university).
today is spelling optional day.
Web-a-roo! Web-a-roo!
Um.. doesn't this miss the whole point of the internet? We use things like online news (as opposed to physical media) because they're live and up-to-the-minute in ways that recorded/printed media cannot be. Surely by making static copies of it (and doesn't this violate some kind of Intellectual Property - those dreaded words - laws?!) and then removing them from their interactive state, they're just making the digital equivalent of yesterday's paper? Bad move guys.
The entire web searchable on 80GB of HD space....
Does it come with weekly downloadable updates to deal with sites, pages, text, owners, administrators, sites' purposes, changing?
And what insane alien compression technology are they using? I wasn't part of the whole Roswell thing, so I don't know about them, but I know there isn't much around today that can fit that much info on 80GB... The entire Web? Even if just the info that Google lists to people about sites, in just 80GB? That's ridiculous....
Or maybe I just don't understand how little space it takes up...
All the text Google shows me for one entry is, on average, 250-500 bytes.
So every 2-4 results is 1 KB. When I type in plastic, I get 288,000,000 results.
At 500 bytes each that's 144,000,000KB -> 144,000MB -> 144GB. Or if every entry was only 250 bytes, it would then be just under their 80GB, and that's just sites containing plastic..
Now I do a search for sites without plastic: 18,490,000,000 results.
That implies Google, in total, has entries for around 18,778,000,000 sites.
The information on each site would have to be about 4.5 bytes to fit in 80GB of space. That's not even enough for the URL.
Now, I'm sure they have special compression methods, like, of course, compression, and replacing commonalities like http://www./ with 2-3 sequential uncommon ASCII chars that can be converted when displayed... But using any trick in the book, the entire web on 80GB? Even the descriptions and URLs of sites?
They would make more with this magical compression technology they must plan to use to do it....
Not to mention, every day hundreds of thousands of new sites come into existence and old sites go out of existence, domains get taken over, companies do, and webpage content changes drastically, including what a search lists relating to the page.
With all of the above, they are going to have an impossible time even providing search listings. And there's mention of actually being able to access the sites? 80GB? Accurate?
Obviously a company founded and funded by people with minor awareness of the web and surrounding technologies, trying to catch onto another non-existent bubble, and of course the people they are paying to develop it, who might actually know the situation, aren't going to ruin their job position by telling them it won't work.
I feel bad for being so cynical. I mean, maybe they DO have a method for providing people constant updates to ensure accuracy (even though they won't need the internet????), and they have a way to make several TB (and if whole sites too, PB) worth of data fit on 80GB... But in my opinion, this isn't even a pipe dream....
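For anyone who wants to check the back-of-envelope numbers in this comment, here is a small sketch of the same arithmetic. The result counts and the 250-500 bytes per entry are the poster's own figures; the only assumption added is using decimal units (1GB = 10^9 bytes), with the binary variant shown alongside.

```python
GB = 10**9    # decimal gigabyte (assumption; the comment doesn't say which)
GIB = 2**30   # binary gigabyte, for comparison

RESULTS_WITH_PLASTIC = 288_000_000        # hits for "plastic", from the comment
RESULTS_WITHOUT_PLASTIC = 18_490_000_000  # hits for pages without "plastic"

def listing_size_gb(results: int, bytes_per_entry: int) -> float:
    """Size in decimal GB of a result listing at a given snippet size."""
    return results * bytes_per_entry / GB

high = listing_size_gb(RESULTS_WITH_PLASTIC, 500)  # 144.0 GB
low = listing_size_gb(RESULTS_WITH_PLASTIC, 250)   # 72.0 GB

# Total index entries implied by the two searches together.
total_entries = RESULTS_WITH_PLASTIC + RESULTS_WITHOUT_PLASTIC

# Byte budget per entry if the whole listing had to fit in 80 GB.
budget_decimal = 80 * GB / total_entries
budget_binary = 80 * GIB / total_entries

print(f"'plastic' listing: {low:.0f}-{high:.0f} GB")
print(f"total entries: {total_entries:,}")
print(f"bytes per entry in 80GB: {budget_decimal:.2f} (decimal), {budget_binary:.2f} (binary)")
```

Either way the budget comes out between 4 and 5 bytes per entry, which matches the rough 4.5 figure above.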
Sounds like a job for ZeroSync who seem to have disappeared. Perhaps their algorithms worked just a little too well and they compressed themselves into zerospace.
File under fractal recession.
Who says that this is a new idea? ;)
Maybe their plan on keeping the service up to date is to add an RSS feed for http://./
you know, so you can search your 'web' over the web... jeez...
sig goes here!
grab the cache and run.
what?! that's an entirely different world view from here.
;) unless they start with some sort of floating starbucks barges or something...hmm...
we have so many hotspots that it's not uncommon to accidentally use someone else's home wifi
and there are five starbucks within 5 miles of here -- and I'm only a few hundred yards from the West Coast, so that limits the available locations
what about Peet's Coffee shops? got any of those?
Frankly, I could see a market for this *maybe* 10-12 years ago. It just doesn't make any sense now. The internet is not solely about static content. Also, the thimble of data provided in each pack will be underwhelming and perpetually out of date.
I mean, if I know I won't be online for a week, what stops me from just CURLing or WGETing whatever I plan on reading for the next couple of weeks? And that goes only for static content like books and articles. Everything else cannot simply be cached.
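The do-it-yourself version of this is short. Below is a minimal sketch in Python rather than curl or wget, assuming a hand-picked list of static URLs; `local_path_for`, `fetch` and `save_for_offline` are hypothetical helper names, and there is no link rewriting, retry logic or robots.txt handling.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def local_path_for(url: str, dest: Path) -> Path:
    """Map a URL to a file under dest, mirroring host and path."""
    parts = urlparse(url)
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"  # directory-style URLs get a default file name
    return dest / parts.netloc / path

def fetch(url: str) -> bytes:
    """Download one URL over the network."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def save_for_offline(urls, dest: Path, fetcher=fetch):
    """Fetch each URL and write it under dest; returns the saved paths."""
    saved = []
    for url in urls:
        target = local_path_for(url, dest)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetcher(url))
        saved.append(target)
    return saved

# Example call (needs a connection; the URL is a placeholder):
# save_for_offline(["http://example.com/articles/one.html"], Path("offline"))
```

The `fetcher` parameter is there only so the download step can be swapped out, for example when trying the path mapping without a connection.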
That's quite obvious. Nobody in the Slashdot crowd has a girlfriend!
There is a certain fundamental flaw with this proposal. I'm not exactly sold on the idea that there was a clear and vivid understanding of this proposal by those who backed it. Here are my reasons. They have a product, and a product isn't really anything if they don't have a viable consumer base. Which begs the question, just who are we selling our products to? In short, I put forth the idea that there really is no consumer base. Those who would need ubiquitous access to the internet and its infinite resources would more or less need access to things that are only current, or some form of information database, thereby rendering this product as effective as an encyclopedia. So thank you, but I'll keep my Britannica volumes any day over a homebrew assemblage of information. This might also be perceived as a viable alternative to a monthly subscription to some form of internet access, but when one really sits down to think about this, the product in this specific instance becomes obsolete in a matter of seconds because of the delivery of new content. We are not far from an age that has a computer with internet access in every household. The notion to sell a static copy of the past is one whose appeal will shrink in proportion to the rate at which new computers with internet access are installed. The bottom line is this: the proposal simply does not make sense, for the reason that there is no palpable customer base for this product.
You can tell how late a company joined the dot-com revolution by how bad a name it had to take to find one whose domain wasn't registered. Examples include Webaroo and letsbuyit.com
What sound do people on rollercoasters make? Hint: it's not Xbox 360.
The talk about reducing some terabytes down to a few gigabytes seems to indicate that they would only put the search index on the harddrive alongside a suitable search engine. That would enable you to search a snapshot of the whole net, you would get links out of it, but you couldn't follow the links without a net connection.
This is certainly possible and might even work well. Whether it's of any use is another question, as you still need a net connection to actually retrieve what you now know to be out there.
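As a toy illustration of "only the index goes on the drive": a minimal inverted index that maps words to the URLs containing them. This is a guess at the general shape, not Webaroo's design; the page texts and URLs are invented, and a real index would add ranking, snippets and compression.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build {word: set of URLs} from {url: page text}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical snapshot: only the index ships, not the pages themselves.
pages = {
    "http://example.com/plastics": "Plastic recycling guide for cities",
    "http://example.com/cities": "City guides and maps",
    "http://example.com/news": "Plastic news and city news",
}
index = build_index(pages)
print(search(index, "plastic"))
print(search(index, "city guides"))
```

The search itself runs offline, but all it hands back is URLs, which is exactly the limitation described above: following them still needs a connection.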
Subject says it all! http://i.somethingawful.com/inserts/articlepics/photoshop/06-03-05-software/spacemountain1.jpg
"phantom"?
Heh heh.
I've got the spirit, lose the feeling.
Just drag the little blue "e" on your desktop to a floppy drive. Then you have the Internet wherever you go, and it only takes a few K!
Alexa pulls down 1TB (after compression) of data from the web a day, and that's information they've chosen to pull: i.e., stuff that isn't link farms. The metadata they collect is about 10% of the size of each page. Every two months they donate these 100TB crawls to the Archive. With that in mind, we should all have the good laugh that someone is paying good money to bring us.
Seems like the day has come where /. sells link placement.
At least Fark has the dignity to place a notice next to these kinds of things.
why did I bother?
Engineering is the art of compromise.
Look at it from an alternate perspective ...
...
For most of North America, where high speed is fairly common and unmetered, this is not a good idea.
For some other parts of the world, the internet is only available in dialup, and is metered. Spending hours surfing can be very cost prohibitive.
So, if large parts of the net are available offline, I can see a market for those geographical areas, provided the cost is not prohibitive.
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.
I'm getting only 37 from 75209 (Dallas)
My turnips listen for the soft cry of your love
Webaroo has gone far beyond being a cache; they are aggregating others' content into a downloadable product they sell for money.
What is the line between a cache and aggregation? And how is your ISP not "aggregating others' content into a downloadable product they sell for money"?
Dot bombs are not about technically feasible ideas. They are not even about technology. They are all about putting together something that will appeal to venture capitalists. What really drove dot.bomb was that the VCs got into a feeding frenzy and all rational business plan/idea vetting went out of the window. For that to happen again means that a whole lot of people that got badly burnt, or that know someone that got badly burnt, must forget their bad experiences and get stupid and greedy again.
The last dot.bomb had a fundamentally solid foundation: widescale adoption of internet. It was all the frilly bits that really were overhyped and caused the bomb. In the new wave, we seem to be seeing all the frilly bits and no solid core. Unless there's a solid core I expect the wave will implode long before things can get to the feeding frenzy stage.
Engineering is the art of compromise.
I always fly Lufthansa whenever travelling trans-atlantic, providing you're willing to pay the WiFi premium, you get WiFi internet access for the duration of the flight.
How many dollars, euros, etc. is this Wi-Fi premium? If Webaroo can undercut the Wi-Fi premium and the prerequisite business class premium, then it has a market.
-prolly have to find vast tracts of non-copyrighted stuff, or
-bulk license a lot of stuff
-need to configure/personalize your content as part of the setup, AND
-def. have to start sniffing and analyzing what the user gravitates toward, and grab another few megabytes of that sort of stuff whenever the machine gets back near a link.
If it's just 80 gigs of hard-coded "most popular pages on the net", it's going to tank.
My turnips listen for the soft cry of your love
Invalid comparison. Internet access, like electricity or water, is a utility. Providers put a large amount of resources in developing their infrastructure, and need a way to recoup those costs. Basic economics.
The development of the Internet would've been set back a couple decades if ISPs weren't allowed to charge for their services.
And I'm about to put an Elephant on a toddler's Tricycle!
Go ahead and call me unreliable; reliable is just a synonym for predictable.
Why not have a web spider working in the background, copying files from the browser's web cache, following links on these documents, etc? This way one is likely to have a great deal of information available for searches, and it would be an automatic cache built by the user, not distributed from a vendor.
LedgerSMB: Open source Accounting/ERP
Well, Wikipedia is licensed under the GFDL so there has never been any problem downloading the database for it. There are even many different versions for mobile platforms and XP (including search functionality). And the ipod of course.
OMG! My dream is finally come true! I can finally play WoW all by myself and all the world bosses will be miiine!!
The interesting part of the story is the claim that their algorithms can extract relevant content (web pages) from the Web. Offhand, I doubt it: if their algorithms really were intelligent enough to parse complex web content, they'd be able to do things much more exciting than what they are doing right now. Most likely it just grabs the first 20 or so hits from search results for a collection of selected keywords. If so, I wouldn't put my money into such a startup; there's not much creativity here.
MMmm.. Now I can get viruses WITHOUT being connected to the internet.
when someone asked if the internet will fit on a floppy?
Hell, I compressed it down to an sh one-liner:
yes 'Blah blah blah.'
How do you spell "copyright infringement"? Isn't this like taking a shelf full of books and making copies of them to save people the "trouble" (read: expense or ad impressions) of getting them themselves? I know this may have a good purpose, like providing web information to the developing world, but they've gotta see the copyright issues.
ttuttle is a rankmaniac
80GB, huh. What's that? Two, dual-layer BluRay discs. Might make a great case for the next DVD technology.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Yeah. That 80's President sucked.
It's a good thing you didn't complain about George W. Bush, Jr. I can't even imagine the levels of terrible you would need to go into to describe him!
Firefox 2.0 - Spell Rightly.
Please, if you _must_ write funnycode, write _solid_ funnycode.
Last time I passed thru customs in London, they asked about the laptop and "do I have the Internet on there". I told him "no" but now, thanks to these dweebs, I'll have to say "Yes, I have the Internet on my laptop."
Bastards.
-Charles
Learning HOW to think is more important than learning WHAT to think.
I can see the value in this, especially if you need to look something up while on the road, but...
You could do this by yourself *better*. You don't need a million sites that you would never visit. It would be better to write some frontend to wget that caches the top 10 sites that result from a series of google searches. If you parallelize that, you could get it done in reasonable time, for free, and use less HD space.
This kind of reminds me of when people used to sell encyclopedias on CD rom (with lots of nifty low res videos of the moon landing), but then suddenly the internet became better than any encyclopedia.
The one thing this company does that is cool is that it *reminds me* to make a local cache of wikipedia.
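A rough sketch of the "frontend to wget" idea above, for anyone who wants to try it. The list of URLs is assumed to come from wherever you get your search results (that part is left out on purpose), and the function names, the `offline-cache` directory and the default depth are all made up for illustration. Only standard GNU wget options are used, and by default it just builds the commands without running them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def wget_command(url: str, dest: str = "offline-cache", depth: int = 1) -> List[str]:
    """Build one wget invocation that saves `url` for offline reading.

    `dest` and `depth` are illustrative defaults, not anything Webaroo uses.
    """
    return [
        "wget",
        "--recursive",                  # follow links from the start page
        f"--level={depth}",             # ...but only this many levels deep
        "--page-requisites",            # also fetch images/CSS needed to render
        "--convert-links",              # rewrite links so they work offline
        "--no-parent",                  # never climb above the start directory
        f"--directory-prefix={dest}",   # where the local copy is stored
        url,
    ]


def cache_all(urls: List[str], workers: int = 4, dry_run: bool = True) -> List[List[str]]:
    """Return the wget commands for `urls`; run them in parallel unless dry_run."""
    commands = [wget_command(u) for u in urls]
    if not dry_run:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # check=False: one failed site should not abort the whole run
            list(pool.map(lambda cmd: subprocess.run(cmd, check=False), commands))
    return commands
```

Calling `cache_all(my_urls, dry_run=False)` would actually hit the network, so keep the list short and the depth low unless you want to annoy some webmasters.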
i don't think so
For some reason when I heard about this all I could think of was the inevitable product to follow... the Web hard copy!
If you've got the whole internet on your laptop, make sure you also get the current hash, to ensure your download didn't corrupt anything..
Repton.
They say that only an experienced wizard can do the tengu shuffle.
would they have a version for every content restriction?
after all one could unwittingly download something they never intended to get.
There's one thing that was not brought up: When I use the internet it's not only to download things and view pages, but it's also to communicate. Imagine you read something on /. and want to post a comment! Or you want to send an email to someone. Having the entire internet on your hard drive won't help with this... So yep, unless modifying the version you have on your hard drive magically updates the "real" internet, I don't see the point of this. Might as well just get an encyclopedia!
I've been able to carry cached web on my palm for
/years/ . . .
. . . AvantGo
http://www.avantgo.com/frontdoor/index.html
Nothing to see here, move along.
"If you have nothing to hide, you have nothing to fear." - Every fascist, ever
I had a passing idea to do something like this a bit back, but didn't do anything with it, because I really couldn't see it working.
If they're going ahead with plans to cache certain web content, they had better be lawyered up. Yasee, if you start selling packs of content created by other people for profit, without their permission, those people are going to be mighty pissed when they find out. Sure, this duplication happens every single day through web caches. The difference, though, is that the presence of such caches is well established, and you can always set up your content to not use them (to a degree). I'm not sure how they plan to get around the whole mass-copyright-infringement angle.
Anyway, I really hope they've thought this thing through...
The University of Iowa is currently doing a similar project: apache, a snapshot of wikipedia, and firefox on a 500GB hard drive. These hard drives are then shipped to Africa.
The 500GB hard drive was the best they could do - wikipedia (including all the media) is over 500GB.
Seems a little nuts. I'm sure there will be issues as stated with copyright violations + sites complaining about lost ad revenue. I'd imagine some sort of commercial "bot" software would be far better..
Add a couple websites you like to a list and tell the program to cache it. Yes yes, I realize most browsers can do this for you.. only manually I guess. It'd be nice to run my list every time I'll be away from an internet connection though to keep up with minor things.
*THIS* is why Google created "Google Desktop"
What about copyright???
How do movie pirates justify selling pirate copies of other peoples movies? In much the same way that movie theaters justify showing other peoples movies.
No. Not really. Your analogy is silly.
When ISPs sell access the content makers get paid for their content via ad revenue, sales, subscriptions etc. When these people sell web content on a hard disk the creators of that content get nothing. This is clearly a case of the worst kind of copyright infringement. Not only are they making unauthorized copies, they are also selling those copies for a profit.
but a lot of content providers won't be happy about getting their ad revenue stolen.
Absolutely. I know I'd be pissed. This is not like an ISP caching content so it can be delivered more rapidly. This sounds more akin to basically scraping the Web and putting it in your own product and calling it your own.
Purely from a copyright issue, discarding ad revenue, I'm sure there are plenty of companies and individuals that would have a problem with this. If Webaroo even gets past the "burning through capital like it's 1998" phase, I would be surprised as hell if they didn't get sued for what amounts to publishing without permission.
Read the EFF's Fair Use FAQ
One of the biggest problems I see is that people clearly won't be able to check their e-mail.
The internet is such a powerful tool partly because of the semi-real-time updates.
Besides reference, what sites do you visit that don't require actually being online? Taking a snapshot of your e-mail, a news site, or a humor site isn't terribly helpful.
YHBT YHL HAND
I wonder how the /. Effect will Affect webaroo-based systems.
...is slashdot? What is the size of all the articles, replies, journal entries and assorted foo-fa-ra that makes up slashdot?
I have already founded a company that does just this, and using clever malware have installed it on all Linux machines. If you'd like to test it, open a terminal and type
wget -m google.com/search?q=cache
I used AvantGo on my old Kyocera phone years ago and it worked fine. I subscribed to several "channels" which could be news, entertainment etc. Whenever I sync'd with my PC it would update the channels. (I could also sync remotely using wireless if I had to but is not really the recommended way of using it.) The great thing is I could be somewhere and instantly browse content. I think Webaroo wants to extend this to a larger scale.
Wouldn't it be better to devote more resources to a global wireless network? While there are security flaws in both, it seems to me that having WAPs virtually everywhere is a hell of a lot easier than keeping up with the growth and changes of the internet.
To understand recursion, one must first understand recursion...
See,
http://www.wrensoft.com/forum/viewtopic.php?t=871
"Through a new technique known as Compression by Recursive Annulus Primes, huge volumes of data from the web can now be compressed into tiny index files. Using this revolutionary technology it will now be possible, for the first time, to carry around your own personal copy of the internet on a device such as high capacity USB thumb drive."
Forget about the mobile platform and let's talk about the laptop/desktop version of the product: 1. I already have Google Desktop Search installed; the only difference between this and Google Desktop is the push vs. pull approach. Using this tool, I am able to pull the site info and store it on my hard drive. What's the value added by Webaroo? What if Google also gives me a feature in its desktop search product to pull any site to the local drive? Wouldn't that simply wipe away all that Webaroo has done? 2. Wi-Fi: You don't need the Webaroo product if you have Wi-Fi access. 3. Static data: This product would have made sense 10 years back, but now? Who cares about static data? I don't need it anyway. I am not sure how much business model validation has been done for the product. What do you guys think? Am I missing something?
When they get this thing under their control, maybe they can rename the company...
But, the rest of us who don't buy into it or who buy back out... well, we'll be the CharleTONS!
(ba-dum(b)oom!!!)
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
I already have Google Desktop Search installed; the only difference between Webaroo and Google Desktop is the push vs. pull approach. Using this tool, I am able to pull the site info and store it on my hard drive. What's the value added by Webaroo? What if Google also gives me a feature in its desktop search product to pull any site to the local drive? Wouldn't that simply wipe away all that Webaroo has done? Wi-Fi: You don't need the Webaroo product if you have Wi-Fi access. Static data: This product would have made sense 10 years back, but now? Who cares about static data? I don't need it anyway. I am not sure how much business model validation has been done for the product. What do you others think?
http://www.httrack.com/
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
I for one don't like my website to be copied and sold for profit without permission, and I do believe I own the copyrights to it.
They might be successful if they served useful, general-purpose content that does not easily lose its usefulness by going out of date (encyclopedias, dictionaries, maps, classical literature, newspaper archives, yearbooks, Bible, etc.), or in places with (very (rare|expensive)|no) Internet connections (I think including such packs in Nicholas Negroponte's $100 server might be a good idea).
The eGranary Digital Library provides millions of digital educational resources to institutions lacking adequate Internet access. Through a process of garnering permissions, copying Web sites, and delivering them to intranet Web servers INSIDE our partner institutions in developing countries, we deliver millions of multimedia documents that can be instantly accessed by patrons over their local area networks at no cost.
http://www.widernet.org/digitallibrary/
About 10 years ago, our campus had all diskless workstations. We offered boot floppies to our students in the dorms so they could access the network. Several times a week we would have students bringing down a floppy disk and say "Can you put the internet on this disk for me?"
With all of the data downloaded in background and a beautiful front-end, response time from screen-to-screen was sub-second and the web has been catching up ever since.
The product was called PointCast.com and it was an advertising-based medium at the very start of the dot-com boom. It also had the most beautiful stock portfolio display of any product I have used prior or since. (Don't bother with the current owner of the PointCast web site. It looks like some other completely unrelated web product startup took the name.)
PointCast tanked when companies started finally funding high-speed internet connections and investors lost their faith. After all, they reasoned, the web would become so fast that caching became completely old school.
Even so, assume that you have developed a personal page of favorite news, search, and research links. It doesn't have to be pretty, just personally relevant. (Mine is at http://www.roomberg.com/EveryDayPages.htm.) Now assume your PC goes out to the web in the background when it is otherwise not being used and caches zillions of pages you have linked to. For 90% of the news and blogs you then need, wait time falls to zero. Only when you click a level too deep would you finally need to return to the world wide-but-slow web. Of course Akamai (or somebody) might have to develop rules about how many levels deep such a program searched, or the web might actually finally be crushed when everyone's PC tries to repetitively "spider" their favorite site. You might set your PC to fetch PointCast updates every morning, every hour, every fifteen minutes, or for that matter, every minute.
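For what it's worth, the "how many levels deep" rule is easy to picture as a breadth-first walk with a depth cap. The sketch below works on a made-up in-memory link table (`LINKS`) instead of real pages, so the page names and the function name are purely illustrative; a real spider would fetch and parse each page and should honor robots.txt.

```python
from collections import deque

# Hypothetical link table: page name -> pages it links to.
LINKS = {
    "favorites": ["news", "blog"],
    "news": ["story1", "story2"],
    "blog": ["news", "post1"],
    "story1": ["deep-archive"],
}


def pages_to_cache(start, links, max_depth):
    """Return pages reachable from `start` within `max_depth` clicks, in BFS order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # at the cap: keep this page but do not follow its links
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

With a cap of 2, everything up to two clicks from the favorites page gets cached and "deep-archive" (three clicks away) is the "level too deep" that still needs the live web.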
OMG! PointCast is back.
Live Long and Prosper - Thanks Leonard. You are missed.
this is old tech. Companies such as webwacker, onfolio, enlighter... have been doing this for years... Why are you all thinking this is new?
From the article:
Webaroo does it, he says, through "a server farm that is of Web scale" and a set of proprietary search algorithms that whittle the million gigabytes down to more manageable chunks...
I hope this thing doesn't start to update itself recur[I hope this thing doesn't start to update itself recur[I hope this thing doesn't start to update itself recur[I hope this thing doesn't start to update itself recur[I hope this thing doesn't start to update itself recur...]...]...]...]...]...
when the FBI confiscates your computer and finds some "objectionable material" in the net backup?
Can't you just set your cache folder reeeaaaaallly big and get the same effect?
Those who believe the Internet is private,
find their privates are on the Internet.
This sounds great! Hang on a minute, let me blog about it and I'll send you the CD!
Defining Statistics and Social Research
There was a Dilbert strip where the PHB asked him to download the internet, and print off a couple hard copies. Who would have guessed he was so visionary?
Stop! Dremel time!
Hahah...responding to an automatically generated complaint-letter shows just how blindly you follow Bush. Heil Bush!!
Did you not see the link to the site that generates these right above them? Did you see the comment jokingly suggest using Bush's name to make one? No of course not, you didn't even read the thread before jumping on the opportunity to spew anti Democratic Party garbage.
The truth is all parties and politicians suck. They are the blood sucking leeches on society's wallet. We don't need politicians in office, we need citizens. It's time to think outside of the party system. It has been failing us for years and is just getting worse. For the last 20 years our government has been locked up by these two childish parties. And instead of putting our foot down and saying "Enough is enough, get to fucking work already!" we stand around and get caught up in the game they feed us. We let them walk all over us. It's time for change! It's time to get rid of this electoral voting system that ignores a large percentage of the voters' views and reinstate the popular vote so our voices can be heard again. This country will continue to deteriorate until we are once again a country for the people, by the people. Give us our country back or we will take it back!
(ahhh...that felt good!)
If you must!
But even worse is how the hell am I going to get even semi up-to-date news? The service doesn't know what I am going to want to read, so I have to download ALL the news before I go offline ONLY to then have only old news on my system.
Sure there are times when I wish I could visit a site without a net connection. Most notably a help site that tells me how to get my connection back.
But for the rest? You would be lugging around a shitload of data in the hope that one day you need to search something while offline and then hope that it is in the cache AND that it is still relevant.
It reminds me of those old programs we in Europe had in the days of modems. Since we had to pay per minute connected it often made sense to download all your favorite pages as fast as possible and then read them offline. I for instance had it set up to get the various webcomics and such in one go so I only paid for a few minutes what would have cost me at least half an hour online.
I can still see the same for modern times. When you arrive at a hotspot your computer quickly downloads the pages you regularly visit and gets the data so you can read them later when you might have left the hotspot.
But downloading ALL the web? No, too much data to be transferred that you will never use anyway.
I can see how they arrived at the idea but I think that they should have stopped before they arrived at the point they decided to cache the web. Develop a program that can easily download the pages you are very likely to want to read when the laptop happens to be connected for off line reading and leave it at that.
Then again, that might not get you VC money.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
A bunch of twenty-somethings in a coffee-house, with laptops.
"Wait.. Wait.. I have an idea!"
"I got to google this.."
"no connection no net on-line news access -political -blog -news"
"It IS a new idea! OK, we collect news on-line for people who are not going to be connected."
"Yeah! We can even do some kind of delayed feedback thing too."
"But if there's no internet, there may be no power. Hey, how about a hardcopy option?"
At the checkout: "Um.. it's called a "newspaper", and there's one right here.". "Oh, but this is with COMPUTERS, so we will get a bite of VC!".
Ratboy
Just another "Cubible(sic) Joe" 2 17 3061
In particular, the storage of the cache is not "temporary" or "intermediate", since the entire cache is made available wholesale to the end user in permanent form.
If the cache is implemented as a web proxy, then it could be encrypted with each URL as the key, preventing people from just browsing through the cache.
Webaroo does not qualify as a "service provider" when talking about this cache, because the cache is made available offline.
Mailing hard drives, tapes, Blu-ray discs, or other removable media is just a high-latency, high-packet-size data link, with routing handled by a parcel courier. Define "offline" in such a way as to exclude sneakernet.
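To make the "encrypted with each URL as the key" suggestion above concrete, here is a toy sketch. Everything in it is an assumption for illustration: the class and method names are invented, and the cipher is a home-made SHA-256 keystream XOR that is NOT real cryptography (a real product would use a vetted cipher). The point is only the shape of the scheme: the storage slot and the key are both derived from the URL, so without already knowing a URL you can neither find a page nor decrypt it.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key plus a counter, repeated. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class UrlKeyedCache:
    """Hypothetical offline cache where each page is locked by its own URL."""

    def __init__(self):
        self._store = {}  # slot id (hex digest) -> encrypted page bytes

    @staticmethod
    def _slot_and_key(url: str):
        raw = url.encode("utf-8")
        slot = hashlib.sha256(b"slot:" + raw).hexdigest()  # where the page is filed
        key = hashlib.sha256(b"key:" + raw).digest()       # what decrypts it
        return slot, key

    def put(self, url: str, page: bytes) -> None:
        slot, key = self._slot_and_key(url)
        self._store[slot] = _xor(page, _keystream(key, len(page)))

    def get(self, url: str):
        slot, key = self._slot_and_key(url)
        blob = self._store.get(slot)
        if blob is None:
            return None  # URL not in the cache
        return _xor(blob, _keystream(key, len(blob)))
```

Of course a search index that hands out URLs also hands out the keys, so this only stops casual browsing of the raw cache, not lookups.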
I downloaded it today. It seems pretty useless. Tons of dead links, bad formatting and no live data. Makes zero sense to me.
I carry a year-old download of Wikipedia on my Palm via TomeRaider, fits on a 1Gig SD card. Very convenient and searchable. For me this was more cost-efficient than to pay for a monthly palm-internet connection, but 80G is a very small percentage of the web no matter what the cost, and the main thing I would like "permanent" access to the internet for would be communication (email, etc).
Alex.
Sounds like a good application for a blu-ray disk.
The unfortunate part is that the company might make sales due to the dilbert pointy-haired boss factor. The boss reads about the internet on a disk and immediately wants it, regardless of common sense.
GG Brad Husick, considering that the flaws in his argument have already been split wide open by several people.
Rather late, as it took them ten days to reply to my email, but Webaroo's Removals page is here:
http://www.webaroo.com/rooRemovals.html
Since Webaroo obviously doesn't mind copyright theft (as that's what they're doing with our data, unless we find out about it and take action to stop them), here's the full specifics for removal:
-----
Removing the whole site:
If you want to remove your entire site from Webaroo, you can do so by specifying it in Robots.txt file. We support the Robots exclusion specification.
To exclude your entire site:
Add the following text to your Robots.txt file:
User-Agent: WebarooBot
Disallow: /
Removing a part of the site:
To exclude a specific directory (for example, subdir), add the following text to your Robots.txt file:
User-Agent: WebarooBot
Disallow: /subdir
Our bot will first obey the first record of User-agent starting with "WebarooBot". If no such entry exists, it will obey the first record of User-agent starting with "*".
Removing a specific page:
To remove a page from all search engines, insert the following meta-tag in the <head> section of your page:
To remove your page from Webaroo only, use the following meta-tag:
Removing successive pages using HTML meta tags:
Yes, if you want certain pages of your site not to be indexed, please use the following tag in the head of the html:
If you want us to index the page but not the outlinks from the page, use the following:
Removing a page urgently:
Remember, the above changes to robots.txt and html meta-tags will take time to be reflected in Webaroo. They will take effect during the next Webaroo crawl.
If you believe your request is urgent and cannot wait until the next time Webaroo crawls your site, please send us an email request with web page and Web Pack details for urgent removal at noarchive@webaroo.com. Before processing their request, webmasters would be requested to insert the "noarchive" meta-tag into the page's HTML code.
-----
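If you want to check your robots.txt before trusting their crawler, Python's standard `urllib.robotparser` can evaluate the records quoted above. This is only a local sanity check: the example.com URLs and the `allowed` helper are made up, and Python's matching rules may not be identical to whatever WebarooBot actually does (their page only says they support the Robots exclusion specification).

```python
from urllib.robotparser import RobotFileParser

# The "part of the site" record from the removal instructions,
# using the example directory name "subdir".
ROBOTS_TXT = """\
User-Agent: WebarooBot
Disallow: /subdir
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return parser.can_fetch(agent, url)
```

For example, `allowed("WebarooBot", "http://example.com/subdir/page.html")` comes back False while the same bot is still let into the rest of the site, and a bot with a different name is not affected by this record at all.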