Google to Offer API
philipx writes "From the ruby-talk archives here's a little interesting snippet from a post you have to check out:
"Here at Google, we're about to start offering an API to our
search-engine, so that people can programmatically use Google through
a clean and clearly defined interface, rather than have to resort to
parsing HTML." It goes on talking about SOAP and I think this is utterly cool."
The only problem I can see with this is that there was a recent thread on here about Google blocking a chunk of IP addresses because someone in that range was automatically querying way too often and affecting their load.
With the exposed API I could see, by malice or sheer accident, floods of queries coming in...
Text ads... Open standards for content distribution... If only certain other sites would follow...
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
So how useful might that API be if you can't do anything with it...
If you look at that snippet of Ruby code there you can see that there is a field for a key of some sort. I'm assuming Google will sell you this service and provide you with a key to use with it. I know absolutely nothing about Ruby (other than its name), though this is the first thing that came to mind when I saw that code.
"Survival of the fittest Max, and we've got the fucking gun!" - Pi
The first page I visit every morning
---
The following is the preliminary code that a particular Google sysadmin (ian@) is trying out. He'd prefer to have a single WSDL file do all of the configuration (from Google's end to the client), but he first needs to get some advice from an experienced Ruby hacker.
Also, let's keep in mind that this API will actually be decreasing Google pageviews and hits, which will in turn make their AdWords, AdWordsSelect, and textads less effective. So, it's our duty to continue to support Google and show them that the free/open source software people are behind them 100%. We know that Teoma just doesn't deliver, and Google's already got 3 billion pages indexed and cached.
Support Google today, because they're the future of information indexing on the Web!
--- begin code ---
#!/usr/bin/ruby
require 'soap/driver'

endpoint = 'http://api-ab.google.com/search/beta2'
ns = 'urn:GoogleSearch'
key = 'xxxxxxxxxxxxxxx'             # placeholder authorization key
service = 'file:GoogleSearch.wsdl'  # unused below
query = ARGV.shift || 'foo'

soap = SOAP::Driver.new(nil, nil, ns, endpoint)

# uncomment the next line to dump the traffic on the wire
#
#soap.setWireDumpDev(STDERR)

# register the remote method and its parameter names
soap.addMethodWithSOAPAction('doGoogleSearch', ns, 'key', 'q', 'start',
                             'maxResults', 'filter', 'restrict',
                             'safeSearch', 'lr', 'ie', 'oe')

r = soap.doGoogleSearch(key, query, 0, 10, false, nil, false, nil,
                        'latin1', 'latin1')

printf "Estimated number of results is %d.\n", r.estimatedTotalResultsCount
printf "Your query took %6f seconds.\n", r.searchTime
--- end code ---
I haven't tried to get it to work yet, due to not having Ruby installed, but does this imply some sort of subscription service?
Possibly a new way for them to raise revenue? I'm assuming that the bold line means the author's key has been blanked out so other people can't abuse this service for free?
Lameness filter encountered. Post aborted! Reason: Too much repetition. :/
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
It's not MSDN and MSN.
I'm curious as to whether people would actually want such functionality from MSDN. It's one thing to be able to do a Google search from a function call and get the results back as XML but do people want API docs and technical articles retrieved via getArticle() and getAPI() webmethods?
One place where it might be useful, however, is KnowledgeBase articles. Perhaps a web service that retrieves a KB article given the Q number (e.g. Q123456) might be useful.
Disclaimer: This post is my opinion and does not reflect the thoughts, strategies, intentions or opinions of my employer.
Last year Google temporarily had an XML interface available using a query like: http://www.google.com/xml?q=slashdot
Of course, now it's just forbidden. I am surprised they would go back to such a service; it would seem to wind up losing revenue for them, depending upon whether or not people are good about passing along whatever AdWords Google returns. They could expect the traffic to be low enough to not matter compared to the continued word-of-mouth benefit. Or access to the SOAP interface could be offered as a subscription model (pure speculation on my part).
-Robert
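For anyone curious what a client of that old XML interface would have had to do, here is a minimal sketch. It only builds the query URL in the shape quoted above and never fetches it, since the endpoint is reported as forbidden now. The helper name xml_query_url is made up for illustration, and the assumption that other search terms would be passed the same way through the q parameter is mine, not something the post confirms.

```ruby
require 'uri'

# Hypothetical helper: builds a query URL in the shape quoted in the post.
# Nothing is sent over the network; this only constructs the string.
def xml_query_url(terms, base = 'http://www.google.com/xml')
  # encode_www_form escapes the terms (spaces become '+')
  "#{base}?#{URI.encode_www_form(q: terms)}"
end

puts xml_query_url('slashdot')
puts xml_query_url('ruby soap api')
```

The point of escaping through URI.encode_www_form rather than pasting the terms in by hand is that multi-word queries and punctuation still produce a valid URL.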
If you run the Ruby script, as is, the result is thus:
#: Exception from service object: Invalid authorization key: xxxxxxxxxxxxxxx (SOAP::FaultError)
If somebody starts abusing a particular key, it's a no-brainer for Google to shut the key off.
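To make the "shut the key off" idea concrete, here is a rough sketch of per-key bookkeeping. Nothing here is Google's actual mechanism, which none of us have seen. The KeyQuota class, its limit, and the revoke-on-first-overrun policy are all assumptions for illustration.

```ruby
# Hypothetical per-key quota tracker; not Google's real implementation.
class KeyQuota
  def initialize(limit)
    @limit = limit          # assumed maximum number of queries per key
    @counts = Hash.new(0)   # key => queries seen so far
    @revoked = {}           # key => true once the key is shut off
  end

  # Returns true if the query may proceed, false if it is refused.
  def allow?(key)
    return false if @revoked[key]
    @counts[key] += 1
    if @counts[key] > @limit
      @revoked[key] = true  # assumed policy: first overrun revokes the key
      return false
    end
    true
  end

  def revoked?(key)
    @revoked.key?(key)
  end
end

quota = KeyQuota.new(3)
results = 5.times.map { quota.allow?('xxxxxxxxxxxxxxx') }
p results
```

A real service would presumably reset counts over some time window and warn before revoking, but the lookup itself stays this cheap, which is the whole argument for keys over IP blocking.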
If you're not part of the solution, you're part of the precipitate.
CPAN already contains the WWW::Search API to many search engines (including Google until [I am told] they requested it be removed). Yes, internally, it works by parsing HTML, but it exports a (Perl) API.
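To illustrate the "parse HTML internally, export an API" approach, here is a small Ruby sketch. The markup in SAMPLE_HTML is invented for the example and is not what Google or any other engine actually serves. The Result struct and parse_results are likewise hypothetical names, and this is not how WWW::Search itself is written.

```ruby
# Hypothetical result markup; real pages differ and change without notice.
SAMPLE_HTML = <<~PAGE
  <p><a href="http://slashdot.org/">Slashdot</a></p>
  <p><a href="http://www.ruby-lang.org/">Ruby</a></p>
PAGE

Result = Struct.new(:url, :title)

# Pulls every simple anchor out of the page and wraps it in a Result.
# The regexp is tied to the exact markup above, which is the weakness
# of scraping: any layout change silently breaks it.
def parse_results(html)
  html.scan(/<a href="([^"]+)">([^<]+)<\/a>/).map do |url, title|
    Result.new(url, title)
  end
end

parse_results(SAMPLE_HTML).each { |r| puts "#{r.title}: #{r.url}" }
```

Callers only ever see Result objects, so the scraping can be swapped out for a proper SOAP call later without touching them.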
Rather than making the API something ya gotta pay for, couldn't they simply put it into the terms of service that the ads have to be shown in any software that uses the API? They could possibly offer different types of ads (text, pictures, etc.) so that you could even develop a text-based app to use it and still stay within the terms of service. Have a nice little "Report a program not following the terms of service" link on the main page, and have all those people who love Google help them out by reporting any programs they find that don't show the ads. Oh, and then also offer a pay-for service if they want, so that the program doesn't have to show the ads.
This is not a complaint. I just want to point out that the whole Google concept is built on a subtle contradiction or, dare I say, hypocrisy. From the User Agreement:
The search results that appear from Google's indices are indexed by Google's automated machinery and computers
The User Agreement precludes you from automatically querying their site:
You may not send automated queries of any sort to Google's system without express permission in advance from Google.
The Google agreement demands express permission to automatically scan its site, while Google assumes the permission to index all sites on the net.
Google also pretty much makes the claim that if it is on the Internet they will index it. Their terms of service state that the only way not to have a site indexed is to remove it from the net:
For each web site reflected in Google's indices, if either (i) a site owner restricts access to his or her web site or (ii) a site is taken down from the web, then, upon receipt of a request by the site owner or a third party in the second instance, Google would consider on a case-by-case basis requests to remove the link to that site from its indices.
I think Google is providing a great service, but I hope you can see the subtle contradictions in their product. They basically are saying that anything on the web is fair game for Google. Yet although Google is on the web, it is not fair game for other organizations. This is a very blatant double standard.
Google is a derivative work. The product model of Google is to determine expert sites by aggregating the link lists on other expert sites. In other words, they are taking other people's work, aggregating it and providing the results. Google's aggregation program is a derivative work. Not only that, they fail to give any compensation to the expert sites.
As for the issue of intergalactic karma, they actually expect the expert site to pay for the bandwidth needed by Google to aggregate the site. They then use this information to draw human traffic from the expert site.
Again, to the Google worshippers, I am not complaining or flaming Google, but simply pointing out a logical contradiction. Jack's Expert Site is harmed by Google in two ways: The googlebots take up a great deal of bandwidth that Jack pays for. Google then uses this information to draw actual human traffic from Jack's Expert Site. From this vantage Google is a big guy stomping on the small guy. When Microsoft does this type of stuff, we call it evil.