Slashdot Mirror


Alexa Web Search Platform Released

Philipp Lenssen writes "Amazon's Alexa is releasing their search index (the same that powers the Wayback Machine) to developers via their new Alexa Web Search Platform. The Alexa framework is not for the weak of heart -- expect to learn how to use their C API, and expect to pay micro-amounts for requests and CPU cycles used -- but it also seems to be more powerful than the rival APIs from Yahoo and Google."

63 comments

  1. Pay? by op12 · · Score: 3, Funny

    How much is a micro-amount? And are the additional features worth it?

    1. Re:Pay? by radical_dementia · · Score: 5, Informative

      One dollar per CPU hour consumed. $1 per gig of storage used. $1 per 50 gigs of data processed. $1 per gig of data uploaded (if you are putting your new service up on their platform).

    2. Re:Pay? by op12 · · Score: 1, Offtopic

      Whoops, should have RTFA. But I should blend right in here :)

    3. Re:Pay? by Tibor+the+Hun · · Score: 4, Funny

      hi,
      i'll give you a friendly piece of advice.
      You are under no circumstances to read TFA before making at least one post. It's fine to read it, but you must make at least one wildass guess, and pretend to know what it's talking about.

      Second, even if TFA did answer your question, you should again, under no circumstances be apologetic.

      Finally, welcome to /

      --
      If you don't know what AltaVista is (was), get off my lawn.
    4. Re:Pay? by glinden · · Score: 1

      The Price Guide has the full details.

    5. Re:Pay? by op12 · · Score: 1

      And to think all this time I was at / .

      How embarrassing.

    6. Re:Pay? by crazyjimmy · · Score: 1

      You're wrong! you're absolutely wrong! if you RTFA-- ...wait... nevermind

    7. Re:Pay? by Tibor+the+Hun · · Score: 1

      well, i won't point it out to anyone:)
      (it's like when you're good friends with a guy named Fleischer. it's OK to call him Fleisch)

      --
      If you don't know what AltaVista is (was), get off my lawn.
  2. Alexa? Nope. by Seth+Finklestein · · Score: 4, Informative

    Alexa is notorious for spyware. Use Ad-Aware to remove Alexa if you have Alexa installed. Programmers: I will boycott all Alexa-sponsored products and label them as spyware in turn if you use this "API."

    Google's APIs are better.

    --
    I'm not Seth Finkelstein. I still speak the truth.
    1. Re:Alexa? Nope. by Anonymous Coward · · Score: 0

      Google's API may or may not be better, but it's really only for personal use. Read the TOS. You can't use Google API to create your own niche search engine. You'd have to partner up with them, and it probably costs a lot.

    2. Re:Alexa? Nope. by Antiocheian · · Score: 1

      Programmers: I will boycott all Alexa-sponsored products

      Oh, don't be so harsh, we are shuddering.

    3. Re:Alexa? Nope. by Anonymous Coward · · Score: 0

      Oh no, Jewey Jewenstein is boycotting Alexa-sponsored products.

  3. "Micro-amount" by matr0x_x · · Score: 1

    There is something about that word that just bothers me... maybe it is all the porn sites out there who advertise their "micro" monthly cost.

    --
    LINUX ONLINE POKER: Linux Poker
    1. Re:"Micro-amount" by antifoidulus · · Score: 1

      Better than other things being "micro" on the site I would suppose...

    2. Re:"Micro-amount" by sd_diamond · · Score: 1

      There is something about that word that just bothers me... maybe it is all the porn sites out there who advertise their "micro" monthly cost.

      Yeah, I hate those sites.

      I mean, based on what I've heard.

  4. Price by 42Penguins · · Score: 4, Informative

    From TFWS:
    $1 per CPU hour ($.50 for unused hours)
    $1 per GB/year
    $1 per 50GB processed
    $1 per GB downloaded
    and $1 for every 4000 user requests.

    This is just for search service, right?
    And how do these prices relate to similar services?
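A rough sketch of how those rates would add up for one job. The function and parameter names are invented for illustration, the rates are simply the ones quoted in this comment (not an official price list), and "unused" is assumed to mean CPU hours reserved but not consumed.

```python
# Rates in USD as quoted in the comment above; treat them as the poster's
# reading of the price guide, not as authoritative figures.
RATE_CPU_HOUR = 1.00           # per CPU hour consumed
RATE_UNUSED_CPU_HOUR = 0.50    # per reserved-but-unused CPU hour (assumed meaning)
RATE_STORAGE_GB_YEAR = 1.00    # per GB stored per year
GB_PROCESSED_PER_DOLLAR = 50   # $1 per 50 GB processed
RATE_GB_DOWNLOADED = 1.00      # per GB downloaded
REQUESTS_PER_DOLLAR = 4000     # $1 per 4000 user requests


def estimate_cost(cpu_hours_used, cpu_hours_reserved, storage_gb_years,
                  gb_processed, gb_downloaded, requests):
    """Return an estimated bill in USD (hypothetical helper, not part of any Alexa API)."""
    unused_hours = max(cpu_hours_reserved - cpu_hours_used, 0)
    return (
        cpu_hours_used * RATE_CPU_HOUR
        + unused_hours * RATE_UNUSED_CPU_HOUR
        + storage_gb_years * RATE_STORAGE_GB_YEAR
        + gb_processed / GB_PROCESSED_PER_DOLLAR
        + gb_downloaded * RATE_GB_DOWNLOADED
        + requests / REQUESTS_PER_DOLLAR
    )


if __name__ == "__main__":
    bill = estimate_cost(cpu_hours_used=10, cpu_hours_reserved=12,
                         storage_gb_years=5, gb_processed=500,
                         gb_downloaded=2, requests=8000)
    print(f"Estimated cost: ${bill:.2f}")
```

With these made-up inputs the processing charge and the CPU charge come out equal, which suggests that for crawl-wide jobs the "data processed" line may matter as much as CPU time.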

    1. Re:Price by mparaz · · Score: 3, Informative

      Google, Yahoo, and MSN APIs are free but they do the searching for you and they have caps. Since you are paying Alexa, you can use it as much as you can pay. And, this new service does not offer search on it on its own. The Alexa Search Platform gives you only archive access - you have to do indexing and search on your own.

    2. Re:Price by Saxophonist · · Score: 1

      The most obvious difference between the price of the Alexa beta and the Google Web API beta is, well, Google's is free as in beer, to a certain point (1000 queries per day).

      Does anyone have any reasons that Alexa's API is better than, say, Google's?

    3. Re:Price by Savantissimo · · Score: 1
      "this new service does not offer search on it on its own"

      It does provide searching:
      Alexa Web Platform User Guide > Search > Criteria > Overview

      Search is an important part of the Platform. Every document in the Data Store is indexed and included in the Platform's search engine, and this engine forms the backbone of all SearchBased collections.


      Also, the crawl data includes pictures and movies, and the search engine metadata provides this useful field:
      CRITERIA,SEARCH FIELD = Adult content, Porn

      Could be popular.
      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    4. Re:Price by Urusai · · Score: 1

      Are you then actually paying Alexa to do their web crawling for them?

    5. Re:Price by samantha · · Score: 1

      I don't see any very good explanation of exactly how their system, and the apps running on it, work. What I do read worries me. It looks like you need to guess up front how much time your stuff will take against a partially opaque infrastructure and scheduling model. If you overguess you get billed half the normal rate for the overage. I saw no info on what happens if you underguess, but it looks like your app is subject to being yanked. This reminds me of ancient batch processing days. I won't be writing stuff against an opaque system that will end up charging me not very easily predictable amounts. And what is this nonsense about interactive vs. compute nodes? Do I need to reserve time on each if I deploy an interactive app?

    6. Re:Price by mparaz · · Score: 1

      Thanks Savantissimo and NigelJohnstone - the docs are quite dense and I didn't fully understand the difference.

    7. Re:Price by mparaz · · Score: 1

      Yes, that's a way of putting it... Crawling and indexing.

  5. Spyware? by mysqlrocks · · Score: 1

    Is the Alexa toolbar that gathers a lot of their data still considered spyware? If so, do I really want to use an API that is supported by spyware?

    1. Re:Spyware? by trollable · · Score: 3, Interesting

      That depends on your definition of what spyware is.
      If you mean collecting data, then yes Alexa does it.
      If you mean collecting personal data, I don't think the toolbar does it.
      Then what about Google? With AdSense running (almost) everywhere + your unique eternal Google ID, they surely collect a lot of data too. And with Google Analytics, they have also a lot of info.
      So the question becomes: Is Google AdSense spyware?

    2. Re:Spyware? by Jugalator · · Score: 4, Insightful
      I wouldn't call it spyware, as it doesn't feel to me like it's more spying than what a tourist does with a camera when visiting a country. It's all upfront, nothing hidden, much like Google's upfront privacy policy in clear text:
      Google collects personal information when you register for a Google service or otherwise voluntarily provide such information. We may combine personal information collected from you with information from other Google services or third parties to provide a better user experience, including customizing content for you.

      ...

      Google's servers automatically record information when you visit our website or use some of our products, including the URL, IP address, browser type and language, and the date and time of your request.


      Nothing they try to hide deep down in some obscure EULA or anything. Sure, it's about collecting data, but there's a difference between collecting data, and collecting data by spying. The former is about doing it visibly, the other trying to hide it.

      Besides, technically speaking, I'm not sure one should call a business model or an online service "spyware" anyway, as it's usually a term used for client-side software often piggybacking on another tool, that secretly phones home by using an internet connection.
      --
      Beware: In C++, your friends can see your privates!
    3. Re:Spyware? by m50d · · Score: 1
      So the question becomes: Is Google AdSense spyware?

      God yes. However, slashdot loves google, so you will hear people explaining why spyware's actually a good thing in this case.

      --
      I am trolling
    4. Re:Spyware? by grazzy · · Score: 1

      Alexa provides the only global public ranking for websites; to me, the data that Alexa gives me for free is invaluable. Sure, I don't use their toolbar (Firefox), and I don't find it very useful. But they're giving something back. That's not something you can say about the data Google collects.

    5. Re:Spyware? by zlogic · · Score: 1

      If it is spyware, it's a useful one. Kinda like using a keylogger on your own machine to avoid losing that 10 Kb unsaved Word document.
      Search history is great - I can see what I was searching for a month ago and vaguely remember what I was doing that day, what I was thinking about etc.

    6. Re:Spyware? by trollable · · Score: 1

      "We may combine personal information collected from you with information from other Google services or third parties to provide a better user experience, including customizing content for you."

      To summarize: they can do whatever they want. But you're right: it is not hidden.

      there's a difference between collecting data, and collecting data by spying. The former is about doing it visibly, the other trying to hide it.

      Is displaying an ad something visible? You know they record every click (since they will be audited). You know they count the impressions. You don't know what they record.

  6. C not required (kinda) by ProfaneBaby · · Score: 4, Informative

    For those who prefer "other" languages, they provide an app that (true to unix best practices) uses stdin/stdout for communicating with other programs:

    The Data Retrieval API is written in C, so it may be natural for users to develop C applications against this API. However, the Platform features a utility named awsp_cat. This utility reads CIDs from stdin and writes the raw content to stdout. Users may develop applications in arbitrary programming languages to process the awsp_cat output.


    Perl developers would be able to wrap this into their existing codebase in no time, assuming they want to pay the fees.
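The same wrapping idea, sketched in Python instead of Perl. Only the stdin/stdout behaviour comes from the quoted docs; the one-CID-per-line input format, the absence of command-line flags, and the function name are assumptions made for illustration, and the framing of the output is left as raw bytes because the quote does not describe it.

```python
import subprocess


def fetch_raw_content(cids, command=("awsp_cat",)):
    """Feed CIDs to a stdin/stdout filter and return whatever bytes it writes.

    Assumptions (not confirmed by the quoted docs): one CID per line on stdin,
    no command-line arguments needed. `command` is a parameter so a stand-in
    program can be substituted when awsp_cat is not available.
    """
    payload = "".join(f"{cid}\n" for cid in cids).encode("ascii")
    result = subprocess.run(
        list(command),
        input=payload,           # sent to the child's stdin
        stdout=subprocess.PIPE,  # capture raw content from stdout
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Taking the command as a parameter keeps the wrapper testable with any ordinary filter program, which is really the point of the stdin/stdout design in the first place.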

    --
    Video Phone Blogs send video messages straight to the web.
  7. Amazon as a serious concurrent to Google? by Anonymous Coward · · Score: 1, Funny

    If I were Google, I would think about partnering with Amazon, or trying to buy them, I'm sure the idea would strike fear in the heart of many.

    1. Re:Amazon as a serious concurrent to Google? by Saxophonist · · Score: 1
      If I were Google, I would think about partnering with Amazon, or trying to buy them, I'm sure the idea would strike fear in the heart of many.

      That certainly would give the Google Books project a different twist... It might go away entirely, given that Google would be in essence competing with itself then, giving users a reason not to buy books.

      Not that any of this would happen, of course.

    2. Re:Amazon as a serious concurrent to Google? by klept · · Score: 1

      Now why would Google want Amazon? The company probably has one of the largest retained deficits in the world. It is and was amazing how many suckers Bozo got to invest in his company. Supposedly they are making a profit now; yeah, 1c a share. I think Amazon is great to buy books from. After all, they are selling most of their items below cost. Great for the customer. But as an investment? Forget it.

  8. Re:Black Gold, Texas Tea. by xoip · · Score: 1

    Black Gold, Texas Tea. Second time in an hour I've seen this term as it relates to a technology company. The first was an optical networking company that is getting into Oil and Gas.

  9. Data Value by trollable · · Score: 3, Informative

    Before arguing the price for a search, I would question the value of the data itself.
    What's your opinion about Alexa ranks? Reliable? IMHO, there are too few users of the Alexa toolbar. It is also quite biased (IE, Windows). So except maybe for the top 30,000 websites, I'm not sure about the reliability of the stats.

    1. Re:Data Value by trollable · · Score: 1

      Edit: TFA is about the index, not the ranks...
      Anyway the question remains: how good is the evaluation function of the search engine?

    2. Re:Data Value by Anonymous Coward · · Score: 0
      Reliable? IMHO, there are too few users of the Alexa toolbar. It is also quite biased (IE, Windows). So except maybe for the top 30,000 websites, I'm not sure about the reliability of the stats.

      Um, the reasons you cite also apply to the top 30,000 websites. Do you really believe that MSN is more popular than Google?

    3. Re:Data Value by Anonymous Coward · · Score: 0

      I have used them to check sites ranking and traffic.

      I am not sure if it is totally accurate, but when one site's rank is 100,000 on Alexa
      and another's is 1,000,000, I think it is accurate enough to say that the first is much
      more popular than the second. Telling the difference between 1 million and
      2 million would be pretty weak.

    4. Re:Data Value by trollable · · Score: 1

      the reasons you cite also apply to the top 30,000 websites.

      No, only the bias applies. For the top 30,000 websites, I think the daily sample is big enough to have at least a bit of meaning.

      Do you really believe that MSN is more popular than Google?

      Among the people that use the Alexa toolbar? yes. But of course, Alexa users are not representative of the internet population.

  10. Shell access? Arbitrary C code? by glinden · · Score: 4, Interesting

    As part of the package, it appears the AWSP offers ssh access to the Alexa cluster where you can write arbitrary C code.

    That seems a little dangerous, doesn't it?

  11. Who is responsible for a security breach? by digitaldc · · Score: 4, Informative

    'Alexa will not be held responsible for the loss or theft of information in the event of a security breach.' from: http://websearch.alexa.com/docs/faqs.html#security

    Man, I would hate to see who or what is held responsible.

    --
    He who knows best knows how little he knows. - Thomas Jefferson
  12. Google API vs Alexa API by mcguyver · · Score: 5, Insightful

    The differences are major. Google's API gives access to search results or allows you to execute searches that can already be done through a browser. With G's API you can build apps like Gizoogle and Google Rank Checker. Alexa's API goes beyond allowing users to execute search queries by giving up the content within the index. This is big news for anyone interested in building their own index or accessing content for other sites.

    Someone can download billions of pages for several thousand dollars then use that to build their own search engine. Another use could be to mine the web for content such as email addresses (which would be bad). Alexa's announcement is a big shift and was bound to happen. Instead of getting crumbs from Yahoo & Google, they're giving up huge chunks of juicy data.

  13. It's lookware. by sycomonkey · · Score: 1

    It's a script that analyzes your web surfing. It IS spyware, except that you install it on purpose. In that regard, it's not spyware at all. It's lookware. Disclaimer: I actually work at Amazon. I (have to) use a modified version of Alexa (and IE, ugh) every day for my job. Other than javascript conflicts that make some web pages slow, it works as advertised. In that it looks at and analyzes your browsing and reports it to Amazon, which in this case happens to be on the LAN.

    --
    --The universe will not be altered by forum threads, even those which are very wry. --Tycho Brahe (Penny Arcade)
    1. Re:It's lookware. by Anonymous Coward · · Score: 0

      Get back to work! I know where you sit!

  14. Our favorite flawed rankings by IGnatius+T+Foobar · · Score: 3, Insightful

    Ah, our old friends at Alexa ... the ones who brought us the wonderfully flawed page ranking system that is based on data fed back from their IE plugin that records what pages you visit and builds rankings out of them. A quick review of their "top ranked sites" includes advertising providers like Doubleclick, and spyware providers like Claria. Depending upon the functionality of someone's IE browser is fatally flawed.

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
    1. Re:Our favorite flawed rankings by merreborn · · Score: 1

      Flawed? I promise you doubleclick receives more GETs every day than 99% of the net.

      Their rankings aren't flawed. They just don't represent what you want them to.

      It's a raw count of GETs/POSTs, which includes pop-up advertising and such. It's not a ranking based on 'popularity'.

  15. What is the definition of spyware by MushMouth · · Score: 3, Insightful

    Why is it spyware any more than the Google Toolbar with PageRank on, or for that matter the fact that there are Google bugs all over the internet? Alexa is not bundled with anything and is very easy to uninstall (use Add/Remove Programs).

    1. Re:What is the definition of spyware by Anonymous Coward · · Score: 0

      Google's a search engine. Alexa's a spyware company.

    2. Re:What is the definition of spyware by T.Hobbes · · Score: 2, Informative

      Alexa
      - Tricks users into installing its software, or installs itself without permission
      - Actively tries to stop users from uninstalling it, forcing people to use a third-party app to remove it (Ad-Aware, etc.)
      - Tracks users

      The first two make it scumware, the last makes it spyware. Google toolbar does track users, but warns them before doing so and only installs when users want it installed.

    3. Re:What is the definition of spyware by MushMouth · · Score: 2, Insightful

      I'm sorry, but how does Alexa "trick" people into installing it? In fact I was forced to close their privacy policy to install it, which made it quite clear that it would track my net usage. Since Alexa is uninstalled using "Add/Remove Programs", how is a third-party app needed? The Google toolbar with PageRank on tracks users in exactly the same way (also upfront about it).

    4. Re:What is the definition of spyware by T.Hobbes · · Score: 1

      Looks like I was wrong. I'm so used to seeing Alexa in AdAware logs that I lumped it in with all the other scumware out there. So far as I can tell, there have been two Alexa products: one that is bundled with IE as the 'Related Sites' feature (a Reg key that AdAware detects & removes, but which is reinstalled when the user repairs/reinstalls IE) and the toolbar. The toolbar seems well-behaved from all I've read.

  16. More than just an index by rca66 · · Score: 5, Insightful

    It seems some people (especially the author of the cited article) missed some very important points:

    1. You have access to more than just the index - you have access to the crawled data, which is about 300 terabytes. So, if you want to do something with the pages, you don't have to download them, and you don't have to rely on them still being there - you can use the crawled data to do whatever you want.

    2. The processing does not take place on your machine, but on the provided infrastructure. There is a web interface, so you can administer your account, your jobs etc. You do not download any software from Alexa. You get an account on their Linux cluster and there you can compile and run your own arbitrary applications. You are able to provide these results in the form of Amazon Web Services.

    So, this is much more than Google, MSN or Yahoo offer; it's hard even to compare those services. Alexa is a completely different beast, and it's a huge beast.
  17. Alexa's index does not drive the Internet Archive by Animats · · Score: 1

    The Internet Archive gets Alexa's old backup tapes of the web crawl and uses them to load up the Archive with page copies. The indexing systems are completely different. The Archive barely has an index.

  18. Google & Yahoo API's are Results not Data by NigelJohnstone · · Score: 1

    "you have to do indexing and search on your own"

    Not strictly true, you have to do *ranking* on your own. Reading the docs it does let you reduce the document set, just not rank the finished result set. So you can filter the result set down to the matching documents, but which is the most important? Your algo decides.
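To make the "your algo decides" part concrete, here is a toy ranking pass over an already-filtered document set. It is only a sketch: it uses plain term-frequency counting (not anything from the Alexa docs), and the function name and the dict-of-strings input are assumptions chosen for illustration.

```python
from collections import Counter


def rank_documents(query, documents):
    """Rank an already-filtered result set by naive term frequency.

    `documents` maps a document id to its text. Returns (doc_id, score)
    pairs, highest score first, with ties broken by doc_id for stable output.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        # Score = total number of occurrences of the query terms in the text.
        score = sum(counts[term] for term in terms)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A real ranking function would weigh far more than raw counts (links, term rarity, document length), but the division of labour is the same: the platform narrows the set, and your code orders it.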

    Google and Yahoo give you finished results, but only ones ranked by their own algorithms, and then only the first 1000 results. Even then it's only 5000 queries max for Yahoo and 1000 max for Google. Pretty feeble; I've used the Yahoo Image API on a play site and maxed it out just from random blog traffic.

  19. Re:Shell access? Arbitrary C code? by Anonymous Coward · · Score: 0
    I saw your UID and went Wa??? Since looking through your posting history, I can see you've a chubby for google. Let me guess - lots of inorganic drugs?

    http://www.google.com/search?client=opera&rls=en&q=unix+permissions&sourceid=opera&ie=utf-8&oe=utf-8

  20. Re:Shell access? Arbitrary C code? by Anonymous Coward · · Score: 0

    So I suppose you'll be running arbitrary binaries somebody gives you and hope chmod does the trick?