Google Releases Web APIs
skunkeh writes "Google have released the first beta of their Web APIs package. Used in conjunction with a free license key this SOAP based web service allows developers to execute up to 1000 automated queries a day, but is currently available for non-commercial use only. The download comes with Java and .NET code examples and includes a WSDL description for use with other SOAP supporting languages." There's also a write up about uses on Userland.
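The 1000-queries-a-day limit means a client has to budget its calls. A minimal sketch of a client-side counter follows; the `DailyQuota` class is a hypothetical helper, not part of the Google download, and the way it defines a "day" (local calendar date) is an assumption, since the summary does not say how Google counts one.

```python
import datetime


class DailyQuota:
    """Client-side counter for a per-day query budget (hypothetical helper)."""

    def __init__(self, limit=1000):
        self.limit = limit      # 1000 per day is the figure quoted in the story
        self.day = None         # date the current count belongs to
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count one query if budget remains for `today`."""
        today = today or datetime.date.today()
        if today != self.day:   # first query of a new day resets the count
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


# Small limit so the cut-off is easy to see.
quota = DailyQuota(limit=3)
day = datetime.date(2002, 4, 11)
results = [quota.try_consume(day) for _ in range(4)]
print(results)
```

The service enforces the real limit on its side; a counter like this only saves a script from sending requests that would be refused anyway.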
http://www.soapware.org/directory/4/services/googleApi/implementations
At the time of posting, the languages catered for were AppleScript, Frontier/Radio, Perl, Python and Visual Basic. I've written a basic implementation in PHP which has yet to be added to the list; you can find it here:
http://toys.incutio.com/php/php-google-web-api.html
This is a very cool toy.
Other than being a really cool idea, this is a great tactical move from Google. By restricting the number of queries made to Google, they ensure that their APIs aren't misused or compromised. The limit also gives companies an incentive to purchase Google products and deploy this API (probably an unrestricted-query version) on their own network. Furthermore, an API such as this will easily muscle out any sniff of competition from other search engine wannabes. Google has managed to do all this and yet be as compliant with an Open Source initiative as possible. Remarkable.
DOH. And I hit submit before the good part:
Your program must include your license key with each query you submit to the Google Web APIs service.
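As a rough illustration of what "include your license key with each query" looks like on the wire, here is a sketch that builds a SOAP 1.1 envelope with the key as one of the call parameters. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `doGoogleSearch` and the parameter names `key` and `q` are assumptions for illustration and should be checked against the WSDL in the download. A real call would likely need further parameters and type annotations.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:GoogleSearch"  # assumption: take the real value from the WSDL


def build_search_envelope(license_key, query):
    """Build a SOAP 1.1 envelope carrying the license key and the query."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Operation and parameter names below are assumptions for illustration.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}doGoogleSearch")
    ET.SubElement(call, "key").text = license_key
    ET.SubElement(call, "q").text = query
    return ET.tostring(envelope, encoding="unicode")


xml_text = build_search_envelope("XXmykeyXX", "british empire")
print(xml_text)
```

The point is only that the key travels inside every request body, so a client has to keep it available for each call rather than logging in once.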
I just had a go with this and some example output is displayed below. Basically you can do a search of their main web pages, request a cached page or use their spellchecker.
$ java -cp googleapi.jar com.google.soap.search.GoogleAPIDemo XXmykeyXX search "british empire"
Parameters:
Client key = XXmykeyXX
Directive = search
Args = british empire
Google Search Results:
======================
{
TM = 0.117071
Q = "british empire"
CT = ""
TT = ""
CATs =
{
{SE="", FVN="Top/Regional/Europe/United_Kingdom/Society_and_Culture/History"}
}
Start Index = 1
End Index = 10
Estimated Total Results Number = 688000
Document Filtering = true
Estimate Correct = false
Rs =
{
[
URL = "http://www.btinternet.com/~britishempire/empire/empire.htm"
Title = "The British Empire"
Snippet = "| Introduction | Articles | Biographies | Timelines | Discussion | Map Room | Armed Forces | Art ... "
Directory Category = {SE="", FVN=""}
Directory Title = ""
Summary = ""
Cached Size = "5k"
Related information present = true
Host Name = ""
],
...
Dave
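If you only have the demo's text output to work with, the simple "Name = value" lines in it are easy to pick apart. A minimal sketch, assuming the flat layout shown in the sample output; `parse_fields` is a hypothetical helper, and the nested category and result blocks are deliberately not handled.

```python
import re

# A few flat lines copied from the demo output.
SAMPLE = """\
Start Index = 1
End Index = 10
Estimated Total Results Number = 688000
Document Filtering = true
Estimate Correct = false
Title = "The British Empire"
"""

LINE = re.compile(r'^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>.*?)\s*$')


def parse_fields(text):
    """Turn 'Name = value' lines into a dict, converting obvious types."""
    fields = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue                    # braces, brackets, rulers, blank lines
        value = match.group("value")
        if value in ("true", "false"):
            value = value == "true"
        elif value.isdigit():
            value = int(value)
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]         # strip surrounding quotes
        fields[match.group("name")] = value
    return fields


fields = parse_fields(SAMPLE)
print(fields["Estimated Total Results Number"])
```

In practice the Java classes in the download presumably expose these values directly, so parsing printed text like this is only a stopgap.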
O'Reilly has a good article here with some code as well in both Java and Perl.
http://www.oreillynet.com/cs/weblog/view/wlg/1283
No Automated Querying
You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things:
- using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
- "meta-searching" Google; and
- performing "offline" searches on Google.
Now, how can I use the web API?! Note that this is not in the Google API TOS, which you must agree to before downloading the API, but in the Google Terms of Service, which you must agree to before creating the Google account needed to use the API.
Still, it's fun and I'll play with it!
Well this is what it told me:
In the future, your Google account will enable login access to all Google services, including Google Groups posting, Google AdWords, the Google Store, the Google in Your Language program, and more.
(My emphasis)
Notice the difference?
Whilst the potential of a regular Google search is large enough, when you consider the Google search modifiers, the potential becomes staggering. Imagine using the following features:
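As a hedged illustration of combining modifiers with an automated query: the sketch below composes a search string from plain terms plus a few operators. The operators used (`site:`, `intitle:` and `-word` exclusion) are examples chosen here, not necessarily the ones the poster had in mind, and `build_query` is a hypothetical helper.

```python
def build_query(terms, site=None, intitle=None, exclude=()):
    """Compose a search string from plain terms plus a few modifiers."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")        # restrict results to one domain
    if intitle:
        parts.append(f"intitle:{intitle}")  # require a word in the page title
    parts.extend(f"-{word}" for word in exclude)  # drop pages with these words
    return " ".join(parts)


query = build_query('"british empire"', site="ac.uk", exclude=["shop"])
print(query)
```

The resulting string would simply be passed as the query argument of whatever search call the API provides.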
Does anyone happen to know if you can use the other sections of Google (e.g. news, images etc.)?
Is Google the best company ever or what?!
I think I speak for most when I ask if you can have your results back in the "interesting" language sets:
And the URL was broken. Here is the right one:
http://radio.weblogs.com/0100012/stories/2002/04/11/applescriptForGoogleApi.html
Ummm... yeah... that's partly how they do their whole weighting thing to determine hot websites for search criteria. Without that in place, you'd have to search through tons of crap to find what you want. I regularly use the "I'm Feeling Lucky" button on Google, because their algorithms manage to pull up what I'm looking for on the first hit.
This is where things get interesting.
Companies have become happy blocking ports to restrict no-nos: messaging, newsgroups, etc.
I'm wondering how long it will be until we start seeing firewalls that can filter/block SOAP calls for the very reasons you mention. SOAP just forces network admins to move up from ports and protocols to sniffing HTTP requests to keep people from having too much fun.
Enjoy it while it lasts.
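For what it's worth, spotting a SOAP call inside ordinary HTTP traffic is not hard, which is roughly what such a filter would do. A minimal sketch of the heuristic, assuming the filter already has the request headers and body in hand: SOAP 1.1 over HTTP carries a `SOAPAction` header, and the body references the SOAP envelope namespace. The function name and the sample header values are made up for illustration.

```python
def looks_like_soap(headers, body=b""):
    """Heuristic: does this HTTP request look like a SOAP 1.1 call?"""
    names = {name.lower(): value for name, value in headers.items()}
    if "soapaction" in names:               # header used by the SOAP 1.1 HTTP binding
        return True
    content_type = names.get("content-type", "").lower()
    is_xml = "xml" in content_type
    # Fallback: an XML body that mentions the SOAP 1.1 envelope namespace.
    return is_xml and b"http://schemas.xmlsoap.org/soap/envelope/" in body


# Hypothetical header sets for illustration.
soap_request = {
    "Content-Type": "text/xml; charset=utf-8",
    "SOAPAction": '"urn:GoogleSearchAction"',
}
plain_request = {"Content-Type": "application/x-www-form-urlencoded"}

print(looks_like_soap(soap_request), looks_like_soap(plain_request))
```

A client that wants to dodge this can wrap the traffic in SSL or a different encoding, so this is a speed bump rather than a wall.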
A speech...
Ummmmm. Ok, check this out.
This morning on /. we have an article about Google releasing their SOAP 1.1 API followed immediately by an article from a guy that set up a spambot trap on his web site, and in the margin a poll about giving spammers what they deserve. Putting 2 and 2 and 2 together, I got 4, popped open a google box and started playing.
All I did was ask Google to search for "mailto" and "@msn.com" and, lo and behold, she spit back 111,000 hits: hits that contain what look like legit email addresses IN THE THREE LINE SUMMARIES.
The point is, now that Google can be automated, what's to stop spammers from SOAPing their way into Google to do their harvesting? Would there be any point over what they're doing now? It might be cheaper, because you only have to run over the Google results, not the whole sites, and since Google caches pages, you can even grab addresses from the past, somewhat.
IT ALSO DEFEATS SPAMBOT TRAPS.
Doesn't this give spammers whole new avenues to exploit?
Worse, are webmasters going to have to put a halt to Google crawls?
"Lawyers are for sucks."
- Doug McKenzie
So, I guess
public void Pigeon()
is what makes them crap on your shoulder?
www.lucernesys.com - Horizon: Calendar-based personal finance
OK, your script parses Google's HTML output today, but what about a year from now, when Google changes its output to, say, XHTML or plain text or something? How well will your script work then? Although the Google API could change tomorrow, as some companies' APIs do, in general APIs are more stable. I haven't looked at their API, but I'm guessing it's also easier to develop against, and it should be less processor- and network-intensive.
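To make the fragility argument concrete, here is a small sketch. Both HTML strings are hypothetical renderings of the same result (neither is Google's real markup), and the dictionary stands in for a structured API response with named fields.

```python
import re

# Two hypothetical renderings of the same result; neither is Google's real markup.
HTML_TODAY = '<p><a href="http://example.com/">Example</a></p>'
HTML_LATER = "<li><a class='r' href='http://example.com/'>Example</a></li>"

# A scraper written against today's markup: double quotes, href as first attribute.
SCRAPER = re.compile(r'<a href="([^"]+)">')


def scrape_url(html):
    """Return the first result URL, or None if the markup no longer matches."""
    match = SCRAPER.search(html)
    return match.group(1) if match else None


# A structured response names its fields, so presentation changes don't matter.
api_result = {"URL": "http://example.com/", "Title": "Example"}

print(scrape_url(HTML_TODAY))   # the pattern matches today's markup
print(scrape_url(HTML_LATER))   # same data, different markup: no match
print(api_result["URL"])
```

The scraper silently returns nothing once the markup shifts, while the field lookup keeps working as long as the field name stays the same.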