Re:Google, Books and the internet.. on Reining in Google · Score: 1
All of these are excellent points.
If Google is convinced this is in the best interests of authors, they should provide them with an opt-in. This could be good or bad for books in different sectors.
Even better, the search engines and publishing houses should get together and decide on some standard metadata format (akin to robots.txt) by which the copyright owners determine whether, and how much, their book can be indexed. Google today - who knows tomorrow? Does Google think it is fair that authors have to individually contact each company that does this, to request their book is excluded?
Is it OK if I stick a notice up in the town hall saying I am going to burgle everyone in town, but people who contact me will be excluded?
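To make the robots.txt comparison a bit more concrete, here's a rough sketch of what reading such a file might look like. No such standard exists as far as I know; the field names (Crawler, Index, Max-Excerpt-Percent) and the crawler name "examplebot" are all made up for illustration. The default is opt-in: nothing gets indexed unless the copyright owner says so.

```python
from dataclasses import dataclass


@dataclass
class IndexPolicy:
    allow: bool = False        # opt-in default: not indexed unless stated
    max_excerpt_pct: int = 0   # how much of the book may be shown, in percent


def parse_policy(text: str, crawler: str) -> IndexPolicy:
    """Return the policy for `crawler`, falling back to the '*' section.

    The file format here is hypothetical, loosely modelled on robots.txt.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "crawler":
            current = sections.setdefault(value.lower(), IndexPolicy())
        elif current is not None and key == "index":
            current.allow = value.lower() == "yes"
        elif current is not None and key == "max-excerpt-percent":
            current.max_excerpt_pct = max(0, min(100, int(value)))
    for name in (crawler.lower(), "*"):
        if name in sections:
            return sections[name]
    return IndexPolicy()


# Hypothetical policy file published once by a copyright owner.
EXAMPLE = """
Crawler: *
Index: no

Crawler: examplebot
Index: yes
Max-Excerpt-Percent: 5
"""

policy = parse_policy(EXAMPLE, "ExampleBot")
print(policy.allow, policy.max_excerpt_pct)
```

The point is that the author states their wishes once, in one place, and every indexer reads the same file, instead of the author chasing each company separately.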
This problem has been around a while, and is part of the larger problem of search engines filtering out duplicate content. Great for the users, but it can be a real problem for site owners suffering from plagiarism or content theft. There's some information at the Copyscape plagiarism search service about what you can do about it.
Yahoo's biggest step to getting their mojo back was their release of the Yahoo Web Services, inviting the hacker community to build applications around their search technologies. Yahoo have gone further than the Google Web APIs, providing access to image, news, video and local search as well as the web search that Google offers.
Then again, we're yet to see the sort of buzz around these APIs that Google was able to muster. Where are the Yahoo equivalents of GoogleBrowser, Googlism and GoogleAlert? Guess there's still something more emotionally exciting about Google, at least for now...
"Sometimes the target page will win, sometimes the redirect script will win. Specifically, if the PageRank of the target page is lower that the PageRank of the hijacking page, it's most likely that the target page will drop out of the SERPs"
This means that you can't reliably hijack the page unless you have a higher PR than it. But if you have a higher PR than that page then you could just as well copy its content, wait till you're spidered, then substitute whatever you want.
In other words, this is nothing more than another way to exploit two existing problems: (a) that you can steal anyone's content on the web (though see this for a way to detect it) and (b) you can cloak your site for the search engines (though I'm sure they notice that too).
In summary, there is nothing new in this whatsoever.
Google is also the only search engine with an API, giving 3rd party developers the chance to add value to their service without violating any terms. I think they deserve serious kudos for that, and it's also a smart move: they get to pick up some great ideas from third parties like Google Alert for tracking the web, CapeMail to get results by email, GARBO for browsing related pages and Copyscape for finding plagiarism.
Until the other search engines release competing APIs (hopefully with a higher than 1000 query limit), Google will remain top dog from the POV of /. types.
One reason why this is in the interest of big old universities like Harvard is that it will make it much easier to detect plagiarism in students' essays. If published books were included in Google's index, a plagiarism detection service like Copyscape would also be able to check whether content was lifted from printed material, as well as from the web.
Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web but now it seems to cover the whole same amount again. My Google alerts also seem to have started producing a lot more results, which suggests that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?
Seems like Google are moving away from static browse-only-when-you-want-to information provision to dynamic, in-your-face services. Just some examples: email, alerts (like this third party) and SMS. In all cases, Google are getting a more dynamic relationship with their customers - giving more and (as they no doubt hope) advertising more in return.
Seems like an obvious crossover to me... Google could combine this kind of SMS service with the search alerting concept to provide regular alerts of information that would be useful on a cellphone - price reductions, new shops opening - and I'm sure later on there will be traffic, weather, etc...
I reckon good applications of the Google Web APIs should also get a chance at the winnings - some crackers that spring to mind are Google Cookin recipe search, Copyscape web plagiarism search, the TouchGraph Google browser, and Google Alert for tracking topics. Isn't the point of both the Jam and the APIs to invite external developers to weave their magic around Google's platform?
Sounds like yet-another-data-visualization-startup - what we really need is a product which turns a database query into an RSS feed, so it's easy to keep track of new matches. If it can be done for Google, and these people are meant to be the next Google, why are they doing it for databases?
Pointless story if you ask me.
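For what it's worth, the query-to-RSS idea isn't much code. Here's a minimal sketch using only the Python standard library. The table, the query and the example.com URLs are invented for the demo, and it assumes the query returns exactly three columns (title, link, description). A real product would also need to remember which rows it had already published.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_rss(conn, sql, title, link):
    """Run `sql` and return an RSS 2.0 document with one <item> per row.

    Assumes each row is a (title, link, description) triple.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Rows matching: {sql}"
    for row_title, row_link, row_desc in conn.execute(sql):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row_title
        ET.SubElement(item, "link").text = row_link
        ET.SubElement(item, "description").text = row_desc
    return ET.tostring(rss, encoding="unicode")


# Demo data: an in-memory table of made-up products.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, url TEXT, note TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Widget", "http://example.com/widget", "Now cheaper", 9.5),
        ("Gadget", "http://example.com/gadget", "Full price", 40.0),
    ],
)

feed = query_to_rss(
    conn,
    "SELECT name, url, note FROM products WHERE price < 10 ORDER BY name",
    "Cheap products",
    "http://example.com/",
)
print(feed)
```

Point a feed reader at the output of something like this on a schedule and you have alerts for new matches.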
With an already profitable business, and lots of extra money in its pocket, can we expect Google to start a buyout spree? Some targets might include Vivisimo with their clustering technology, Girafa for visualizing search, or even some of the better Web APIs applications like Google Alert or the GoogleBrowser, as this Wired story suggests.
Who knows whether the new (or old) price is a good one? It's practically impossible to put a number on Google's future profitability. There are simply far too many unknowns:
Future successes in any of these businesses could make Google's current price seriously undervalued. And if some key ones fall through, it will have been far too high.
Time and time and time again, Microsoft has crushed anybody who's tried to get a significant presence on the desktop, by incorporating a competing Microsoft technology into Windows, which controls all the desktops. No matter how much better Google's technology is, this time will be no different.
Google's main hope is to control the market for supplying results to other places. They can use RSS for website integration, SMS for mobile phones, voice for telephones. This won't help them this year or the next, but it will save them over the long term.
I think there's definitely a need for a Google manual. Apart from the importance of explaining how to search effectively, there are so many extra Google features that normal users simply won't know about, such as Google Local, the Google Toolbar and Personalized Search. That's not even starting to mention some extremely useful third-party add-ons that use the Google Web APIs, such as GoogleBrowser, GoogleAlert and CapeMail. Since they're not home grown, Google ain't going to be publicizing these on its own site any time soon.
Interesting that Google don't seem to mind about API applications keeping the whole word Google in their names, from Google Fight to Googlism to Google Rankings. The Google Alert tool states explicitly on its FAQs that Google "agreed to the use of the Google Alert name and googlealert.com domain". I guess it's all about the distinction between sites that feed into Google's brand value, and those that take away from it.
I think it's a fair price. It reflects the money Google will make in future from selling access to their web index and associated technology - a market that they haven't even begun to seriously develop. The Internet is going to be around for ever, and its content is going to keep growing exponentially until this scary vision is fulfilled. Google's search results represent (to date) the best attempt to organize this information in an intuitive user-centric way.
In fact, they already provide programmatic access to their results via the Web APIs, spawning services ranging from a recipe generator to a site for detecting online plagiarism. According to this story, the developers of Google Alert, one well-known APIs application, have recently been granted permission to commercialize their service. My guess is that it won't be long before there are many more 3rd party Google applications, bringing in a lot of new money to Google's coffers. Anyone for a BUY rating?
It's true that Google's API Terms are restrictive:
"The Google Web APIs service is made available to you for your personal, non-commercial use only"
But it looks like they're open to successful applications being commercialized. For example, according to their FAQs and this article, Google Alert have been granted permission to sell a service based on the APIs:
"Google has agreed to our release of premium paid Google Alert services."
So it looks like Google were just waiting for someone to come along with a commercially viable application - see also these two interviews for more background. I've been following this story for a while...
Google's usefulness is also being expanded by third party developers using their APIs to develop kitschy hits such as Google Fight and Googlism. But there are useful apps too... A recent release is Copyscape which uses Google to find people who have plagiarized your web content. It's from the same guys as Google Alert and works like magic. I reckon it won't be long (after the IPO?) before Google expand their APIs a lot further, to make image, news and group searching available to third party apps. Then things will get really interesting.
It seems that the web needs a serious search engine for detecting copycat sites, whether it's images, layout, Javascript or text that has been stolen. So far the only thing I know of is Copyscape which seems to work well for text (it uses the Google Web APIs) but can't handle images or code. Maybe there should be a publicly-funded project for this?
This discussion of Google using RSS for Blogger is all well and good, but what about the broader question of integrating RSS into their mainstream search services? By comparison, Feedster searches RSS, and provides its results in RSS. But to get an RSS feed for a Google search you need to use the 3rd party GoogleAlert. Not to mention that Google recently shut down a third party news-to-RSS service. Aren't the guys from the Googleplex supposed to have technological vision or something?
It's not only the internal PhDs at the company who help along their R&D efforts. Thousands of developers outside of Google are using the Google APIs to create new Google applications. Some notable hits are BananaSlug and GoogleAlert (the latter of which is indeed the product of a PhD, according to this article). The fact that Google is able to tempt so many to build on their platform is another sign of their popularity with the academic nerdy elite.
A recent Wired story hints that Google will be using some of their IPO proceeds to acquire some of the best API developments. Interesting... are the Google Web APIs nothing but another channel in Google's search for the world's best computer scientists?
But who will be the first to throw open the floodgates and actually provide unlimited API querying at a price? Businesses (in plagiarism detection, rank tracking and advanced alerting, for example) are starting to be built out of this stuff, so there's obviously a genuine economy out there for the taking.
What's this? A slashdot story not about Google? Quick... here's some links on Google and RSS. Phew!