Alexa Web Search Platform Released
Philipp Lenssen writes "Amazon's Alexa is releasing their search index (the same that powers the Wayback Machine) to developers via their new Alexa Web Search Platform. The Alexa framework is not for the weak of heart -- expect to learn how to use their C API, and expect to pay micro-amounts for requests and CPU cycles used -- but it also seems to be more powerful than the rival APIs from Yahoo and Google."
Nothing they try to hide deep down in some obscure EULA or anything. Sure, it's about collecting data, but there's a difference between collecting data and collecting data by spying. The former is about doing it visibly, the latter about trying to hide it.
Besides, technically speaking, I'm not sure one should call a business model or an online service "spyware" anyway, as it's usually a term used for client-side software often piggybacking on another tool, that secretly phones home by using an internet connection.
The differences are major. Google's API gives access to search results, letting you execute searches that can already be done through a browser. With Google's API you can build apps like Gizoogle and Google Rank Checker. Alexa's API goes beyond executing search queries by giving up the content within the index. This is big news for anyone interested in building their own index or accessing content for other sites.
Someone can download billions of pages for several thousand dollars, then use that to build their own search engine. Another use could be to mine the web for content such as email addresses (which would be bad). Alexa's announcement is a big shift and was bound to happen. Instead of the crumbs you get from Yahoo & Google, Alexa is giving up huge chunks of juicy data.
Ah, our old friends at Alexa ... the ones who brought us the wonderfully flawed page ranking system that is based on data fed back from their IE plugin, which records what pages you visit and builds rankings out of them. A quick review of their "top ranked sites" includes advertising providers like DoubleClick and spyware providers like Claria. A ranking that depends on what comes back from people's IE browsers is fatally flawed.
Why is it spyware any more than the Google Toolbar with PageRank on, or for that matter the Google bugs all over the internet? Alexa is not bundled with anything and is very easy to uninstall (use Add/Remove Programs).
It seems some people (especially the author of the cited article) missed some very important points:
1. You have access to more than just the index: you have access to the crawled data, which is about 300 terabytes. So if you want to do something with the pages, you don't have to download them or rely on them still being there; you can use the crawled data to do whatever you want.
2. The processing does not take place on your machine, but on the provided infrastructure. There is a web interface where you can administer your account, your jobs, etc. You do not download any software from Alexa. You get an account on their Linux cluster, where you can compile and run your own arbitrary applications. You can then provide the results in the form of Amazon Web Services.
So this is much more than Google, MSN or Yahoo offer; it's hard to even compare those services. Alexa is a completely different beast, and it's a huge beast.
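To make the two points above concrete, here is a rough sketch of the kind of job you might run next to the crawled data instead of fetching pages yourself. It is not Alexa's API (the summary says theirs is a C API, and none of its calls are shown here). The `records` input, the `TitleParser` class and the `extract_titles` function are all hypothetical names, and the pages are assumed to arrive as plain (url, html) pairs.

```python
from html.parser import HTMLParser
from typing import Iterable, Iterator, Tuple


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element of one page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_titles(
    records: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (url, title) for each already-crawled page.

    `records` is a hypothetical stand-in for whatever iterator the
    platform would give you over its stored pages.
    """
    for url, html in records:
        parser = TitleParser()
        parser.feed(html)
        parser.close()
        yield url, parser.title.strip()
```

The point of the shape is that the job only ever reads pages that are already stored; nothing in it opens a network connection, which is what separates this model from a search-results API.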