Alexa Web Search Platform Released
Philipp Lenssen writes "Amazon's Alexa is releasing their search index (the same that powers the Wayback Machine) to developers via their new Alexa Web Search Platform. The Alexa framework is not for the weak of heart -- expect to learn how to use their C API, and expect to pay micro-amounts for requests and CPU cycles used -- but it also seems to be more powerful than the rival APIs from Yahoo and Google."
How much is a micro-amount? And are the additional features worth it?
Alexa is notorious for spyware. Use Ad-Aware to remove Alexa if you have Alexa installed. Programmers: I will boycott all Alexa-sponsored products and label them as spyware in turn if you use this "API."
Google's APIs are better.
I'm not Seth Finkelstein. I still speak the truth.
There is something about that word that just bothers me... maybe it is all the porn sites out there who advertise their "micro" monthly cost.
From TFWS:
$1 per CPU hour ($.50 for unused hours)
$1 per GB/year
$1 per 50GB processed
$1 per GB downloaded
and $1 for every 4000 user requests.
This is just for search service, right?
And how do these prices relate to similar services?
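To get a feel for what those rates add up to, here is a rough calculator. It is only a sketch: the rates are the ones quoted above, but the field names and the reading of each line item (for example, that "unused hours" means reserved CPU hours that were not consumed, and that "$1 per GB/year" is storage) are assumptions, not anything confirmed by Alexa's documentation.

```python
from dataclasses import dataclass

# Rates as quoted in the price list above, in US dollars.
# What exactly each item meters is an assumption made for illustration.
CPU_HOUR_USED = 1.00          # $1 per CPU hour
CPU_HOUR_UNUSED = 0.50        # $.50 per unused (assumed: reserved) hour
STORAGE_GB_YEAR = 1.00        # $1 per GB/year (assumed: storage)
GB_PROCESSED_PER_DOLLAR = 50  # $1 per 50GB processed
GB_DOWNLOADED = 1.00          # $1 per GB downloaded
REQUESTS_PER_DOLLAR = 4000    # $1 for every 4000 user requests


@dataclass
class Usage:
    """Hypothetical usage record; the field names are made up for this sketch."""
    cpu_hours_used: float = 0.0
    cpu_hours_unused: float = 0.0
    storage_gb_years: float = 0.0
    gb_processed: float = 0.0
    gb_downloaded: float = 0.0
    user_requests: int = 0


def estimate_cost(u: Usage) -> float:
    """Return the estimated bill in dollars for the given usage."""
    return (
        u.cpu_hours_used * CPU_HOUR_USED
        + u.cpu_hours_unused * CPU_HOUR_UNUSED
        + u.storage_gb_years * STORAGE_GB_YEAR
        + u.gb_processed / GB_PROCESSED_PER_DOLLAR
        + u.gb_downloaded * GB_DOWNLOADED
        + u.user_requests / REQUESTS_PER_DOLLAR
    )


example = Usage(cpu_hours_used=10, cpu_hours_unused=2, storage_gb_years=5,
                gb_processed=500, gb_downloaded=20, user_requests=40000)
print(f"${estimate_cost(example):.2f}")
```

With those example numbers the total is $56: $10 CPU, $1 unused CPU, $5 storage, $10 processing, $20 download and $10 for requests.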
Is the Alexa toolbar that gathers a lot of their data still considered spyware? If so, do I really want to use an API that is supported by spyware?
Bradley Holt
For those who prefer "other" languages, they provide an app that (true to unix best practices) uses stdin/stdout for communicating with other programs:
The Data Retrieval API is written in C, so it may be natural for users to develop C applications against this API. However, the Platform features a utility named awsp_cat. This utility reads CIDs from stdin and writes the raw content to stdout. Users may develop applications in arbitrary programming languages to process the awsp_cat output.
Perl developers would be able to wrap this into their existing codebase in no time, assuming they want to pay the fees.
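The same stdin/stdout approach works from any language. A minimal Python sketch follows, assuming `awsp_cat` is on the PATH, needs no arguments, and accepts one CID per line; the quoted docs only say it reads CIDs from stdin and writes raw content to stdout, so the rest is guesswork. The function name `fetch_raw_content` is made up for this example.

```python
import subprocess
from typing import Iterable, Sequence


def fetch_raw_content(cids: Iterable[str],
                      command: Sequence[str] = ("awsp_cat",)) -> bytes:
    """Pipe CIDs to a helper program's stdin and return its raw stdout.

    The default command name comes from the quoted docs; running it with no
    arguments and one CID per line are assumptions.
    """
    # Build the stdin payload: one CID per line (assumed input format).
    payload = "".join(f"{cid}\n" for cid in cids).encode("ascii")
    # check=True raises CalledProcessError if the helper exits non-zero.
    result = subprocess.run(list(command), input=payload,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Taking the command as a parameter means the wrapper can be tried out with any program that copies stdin to stdout before spending money on the real thing. How the output is delimited between documents is not stated in the thread, so any parsing of the returned bytes is left out.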
If I were Google, I would think about partnering with Amazon or trying to buy them. I'm sure the idea would strike fear into the hearts of many.
Black Gold, Texas Tea. Second time in an hour I've seen this term as it relates to a technology company. The first was an optical networking company that is getting into Oil and Gas.
Before arguing the price for a search, I would question the value of the data itself.
What's your opinion about Alexa ranks? Reliable? IMHO, there are too few users of the Alexa toolbar. It is also quite biased (IE, Windows). So except maybe for the top 30,000 websites, I'm not sure about the reliability of the stats.
As part of the package, it appears the AWSP offers ssh access to the Alexa cluster where you can write arbitrary C code.
That seems a little dangerous, doesn't it?
'Alexa will not be held responsible for the loss or theft of information in the event of a security breach.' from: http://websearch.alexa.com/docs/faqs.html#security
Man, I would hate to see who or what is held responsible.
He who knows best knows how little he knows. - Thomas Jefferson
The differences are major. Google's API gives access to search results or allows you to execute searches that can already be done through a browser. With Google's API you can build apps like Gizoogle and Google Rank Checker. Alexa's API goes beyond allowing users to execute search queries by giving up the content within the index. This is big news for anyone interested in building their own index or accessing content for other sites.
Someone can download billions of pages for several thousand dollars, then use that to build their own search engine. Another use could be to mine the web for content such as email addresses (which would be bad). Alexa's announcement is a big shift and was bound to happen. Instead of getting crumbs from Yahoo & Google, they're giving up huge chunks of juicy data.
It's a script that analyzes your web surfing. It IS spyware, except that you install it on purpose. In that regard, it's not spyware at all. It's lookware. Disclaimer: I actually work at Amazon. I (have to) use a modified version of Alexa (and IE, ugh) every day for my job. Other than JavaScript conflicts that make some web pages slow, it works as advertised: it looks at and analyzes your browsing and reports it to Amazon, which in this case happens to be on the LAN.
--The universe will not be altered by forum threads, even those which are very wry. --Tycho Brahe (Penny Arcade)
Ah, our old friends at Alexa ... the ones who brought us the wonderfully flawed page ranking system that is based on data fed back from their IE plugin that records what pages you visit and builds rankings out of them. A quick review of their "top ranked sites" includes advertising providers like Doubleclick, and spyware providers like Claria. Depending upon the functionality of someone's IE browser is fatally flawed.
Why is it spyware any more than the Google Toolbar with PageRank on, or for that matter the fact that there are Google bugs all over the internet? Alexa is not bundled with anything and is very easy to uninstall (use Add/Remove Programs).
It seems some people (especially the author of the cited article) missed some very important points:
1. You have access to more than just the index - you have access to the crawled data, which is about 300 terabytes. So, if you want to do something with the pages, you don't have to download them, and you don't have to rely on them still being there - you can use the crawled data to do whatever you want.
2. The processing does not take place on your machine, but on the provided infrastructure. There is a web interface, so you can administer your account, your jobs etc. You do not download any software from Alexa. You get an account on their Linux cluster and there you can compile and run your own arbitrary applications. You are able to provide these results in the form of Amazon Web Services.
So, this is much more than Google, MSN or Yahoo offer; it's hard even to compare those services. Alexa is a completely different beast, and it's a huge beast.
The Internet Archive gets Alexa's old backup tapes of the web crawl and uses them to load up the Archive with page copies. The indexing systems are completely different. The Archive barely has an index.
"you have to do indexing and search on your own"
Not strictly true, you have to do *ranking* on your own. Reading the docs it does let you reduce the document set, just not rank the finished result set. So you can filter the result set down to the matching documents, but which is the most important? Your algo decides.
Google and Yahoo give you finished results, but only ones ranked by their own algorithms, and then only the first 1000 results. Even then it's only 5000 queries max for Yahoo and 1000 max for Google. Pretty feeble. I've used the Yahoo Image API on a play site and maxed it out just from random blog traffic.
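To make the filter-versus-rank split concrete, here is a toy sketch in plain Python. It does not use the Alexa API at all: `filter_docs` stands in for whatever the platform does to reduce the document set, and `rank_docs` is the part you would have to supply yourself. Both function names and the term-count scoring are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; deliberately naive."""
    return text.lower().split()


def filter_docs(docs: Dict[str, str], terms: Iterable[str]) -> Dict[str, str]:
    """Keep only documents that contain every query term (the 'reduce' step)."""
    wanted = {t.lower() for t in terms}
    return {doc_id: text for doc_id, text in docs.items()
            if wanted <= set(tokenize(text))}


def rank_docs(docs: Dict[str, str], terms: Iterable[str]) -> List[Tuple[str, int]]:
    """Order documents by how often the query terms occur (your own 'algo')."""
    wanted = {t.lower() for t in terms}
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        scored.append((doc_id, sum(counts[t] for t in wanted)))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


pages = {
    "a": "unix permissions unix chmod",
    "b": "unix permissions",
    "c": "windows permissions",
}
query = ["unix", "permissions"]
matches = filter_docs(pages, query)
print(rank_docs(matches, query))
```

Here page "c" is dropped by the filter, and "a" outranks "b" because the query terms occur three times in it versus two. A real ranking function would be where all the interesting work lives.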
http://www.google.com/search?client=opera&rls=en&q=unix+permissions&sourceid=opera&ie=utf-8&oe=utf-8
So I suppose you'll be running arbitrary binaries somebody gives you and hope chmod does the trick?