Google Looks To Convert Print Pubs Into E-Articles

Posted by timothy on Thursday February 25, 2010 @07:22AM from the one-at-a-time dept.

bizwriter writes "A patent application by Google (GOOG), filed in August 2008 and made public last week, shows that the company is trying to automate the process of splitting printed magazines and newspapers into individual articles that it could then deliver separately. Although this could allow Google to convert stacks of periodicals into electronic archives, it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law."

42 comments

"Reprinted by permission" by LostCluster · 2010-02-25 07:25 · Score: 3, Interesting

Most magazines are glad to sell their content from back issues for money. So, if Google gets permission from the publisher, and then charges for back magazine items in the same way they have a paid-for newspaper archive search... is that really headed for the Supreme Court?
1. Re:"Reprinted by permission" by perlchild · 2010-02-25 07:27 · Score: 2, Interesting
  
  Most magazines wouldn't be ok with an automated process because it wouldn't let them charge extra for some issues.
  I'm not saying google intends to do this, but I doubt sports illustrated would let their swimsuit issue go for the same price as the rest.
2. Re:"Reprinted by permission" by Overzeetop · 2010-02-25 07:29 · Score: 3, Funny
  
  I'm not saying google intends to do this, but I doubt sports illustrated would let their swimsuit issue go for the same price as the rest.
  Yeah, but you don't really read the swimsuit edition for "the articles."
  
  --
  Is it just my observation, or are there way too many stupid people in the world?
3. Re:"Reprinted by permission" by perlchild · 2010-02-25 07:32 · Score: 2, Insightful
  
  I was just using an example that really stood out. Most magazines have one issue a year that really sells, because of just one article that outdoes their competitors. The SI example is recurrent every year, most other magazines aren't so regular.
4. Re:"Reprinted by permission" by LostCluster · 2010-02-25 07:35 · Score: 2, Informative
  
  SI's Swimsuit Issue is not a run-of-the-mill issue of the magazine... and sometimes when sports issues warrant they'll even publish a normal SI on the same day. But, like special issues of Time and Consumer Reports... those don't have to even go to subscribers if they don't want them to. Easy to exclude such things, or include them if Google really wants them, in the eventual contract.
5. Re:"Reprinted by permission" by DragonWriter · 2010-02-25 07:37 · Score: 3, Insightful
  
  Most magazines wouldn't be ok with an automated process because it wouldn't let them charge extra for some issues.
  An automated conversion process has no effect on what can be charged for individual portions of the results, it just streamlines the process of getting material into a form where it can be distributed online, separated by (and, potentially, priced differently by) article, which is even more specific than particular issue.
  Now, certainly, Google would probably like to get everything from everyone and pay and charge nothing for it, making money by serving targeted ads alongside the content. But that's not the only thing they could do with the technology, and patenting the technology (even if one assumes that they intend to deploy it at all) doesn't tell you anything about how they plan to deploy it.
6. Re:"Reprinted by permission" by LostCluster · 2010-02-25 07:38 · Score: 2, Informative
  
  Did you read my original post? Google has a paywall for old newspaper content, they could easily erect one for old magazine content if needed.
7. Re:"Reprinted by permission" by MobyDisk · 2010-02-25 08:18 · Score: 1
  
  If the magazines want variable pricing, then I see no reason they couldn't negotiate that with Google.
8. Re:"Reprinted by permission" by belmolis · 2010-02-25 08:23 · Score: 1
  
  As the article says, the problem is not so much with the publishers as with the copyrights of the authors.
9. Re:"Reprinted by permission" by edumacator · 2010-02-25 12:50 · Score: 3, Funny
  
  No kidding...those pictures are ... degrading to women. If it wasn't for the articles, I wouldn't pick up the Sports Illustrated Swimsuit Issue. Are you one of those perverts that just buys it for the pictures? You and your ilk disgust me.
  .
  .
  Whew...that was close. She's gone now, but my wife was standing over my shoulder. Those girls are hot!
  
  --
  An important change for education.
10. Re:"Reprinted by permission" by Modern Primate · 2010-03-01 12:14 · Score: 1
  
  Most magazines wouldn't be ok with an automated process because it wouldn't let them charge extra for some issues.
  I'm not saying google intends to do this, but I doubt sports illustrated would let their swimsuit issue go for the same price as the rest.
  Why would an automated process necessitate uniform pricing for everything? They could easily set it up so that if the OCR reads "Swimsuit Issue" on the cover, the "articles" are tagged differently and a different price is charged.
11. Re:"Reprinted by permission" by perlchild · 2010-03-01 12:20 · Score: 1
  
  As I've said, the swimsuit issue is a rarity.
  In fact, the behaviour of most media executives is that they want to set the price retroactively, based on popularity.
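
The tagging idea from reply 10 above (price a whole issue differently when the OCR'd cover contains a phrase such as "Swimsuit Issue") can be sketched roughly as follows. This is only an illustration of the commenter's suggestion, not anything described in the patent application; the names `SPECIAL_ISSUE_PRICES`, `DEFAULT_PRICE`, and `price_articles`, and the prices themselves, are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical price table: cover phrase -> per-article price.
# Both the phrase and the prices are invented for illustration.
SPECIAL_ISSUE_PRICES = {"swimsuit issue": 4.99}
DEFAULT_PRICE = 0.99


@dataclass
class PricedArticle:
    title: str
    tag: str
    price: float


def price_articles(cover_text, titles):
    """Tag and price every article in an issue based on its OCR'd cover text."""
    cover = cover_text.lower()
    for phrase, price in SPECIAL_ISSUE_PRICES.items():
        if phrase in cover:
            # The cover matched a special-issue phrase: tag all articles with it.
            return [PricedArticle(t, phrase, price) for t in titles]
    # No special phrase found: fall back to the regular tag and price.
    return [PricedArticle(t, "regular", DEFAULT_PRICE) for t in titles]
```

Nothing here prevents per-article or retroactive pricing either; the table could just as well be keyed by article and updated later.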
capability does not imply intention by AliasMarlowe · 2010-02-25 07:26 · Score: 2, Interesting

The patent application merely shows they know how to do such a thing. It does not mean that they plan to do so. Google has many unimplemented patents.
Maybe they will, and maybe they won't. But anyone who does will have to factor Google's patent application into their economic reckoning.

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
1. Re:capability does not imply intention by Aeros · 2010-02-25 09:03 · Score: 1
  
  No doubt. It makes me laugh at how people jump to conclusions on here so quickly. "The sky is falling"!!! It's nice they have this cool feature but after the book lawsuit they have against them I am sure they are going to tread into this area (if they decide to) a little more carefully.
Re:Google has quite a history by sopssa · 2010-02-25 07:34 · Score: 1

it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law.
They've already proved with the blatently illegal settlement on the book scanning deal that the law doesn't apply to them.
What is that famous ruling anyway? That sentence just calls for a link.
Which ruling? by mcgrew · 2010-02-25 07:35 · Score: 0

it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law."
Can someone link please? I'm not a legal scholar. Which law, and how did they rule?

--
Free Martian Whores!
1. Re:Which ruling? by eldavojohn · 2010-02-25 07:39 · Score: 4, Informative
  
  Seriously, folks, it's in the article:
  
  There’s just one legal problem: New York Times Co. , et. al. v. Jonathan Tasini et. al. Usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles. One of the main turning points was that the publishers had explicit permission only to include the articles in the print publication. However, copyright law did not allow the publishers to break their publications up and make the articles accessible to readers out of the original context.
  Obligatory Wikipedia link.
  
  --
  My work here is dung.
2. Re:Which ruling? by DingoGroton · 2010-02-25 07:42 · Score: 1
  
  From the article:
  
  New York Times Co. , et. al. v. Jonathan Tasini et. al. Usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles. One of the main turning points was that the publishers had explicit permission only to include the articles in the print publication. However, copyright law did not allow the publishers to break their publications up and make the articles accessible to readers out of the original context.
3. Re:Which ruling? by FlyingBishop · 2010-02-25 07:59 · Score: 2, Funny
  
  Wait, what's this article thing you're talking about? I thought this was Slashdot.
4. Re:Which ruling? by mcgrew · 2010-02-25 08:05 · Score: 1
  
  Thank you. Now, will someone please mod the parent "informative" and my GP comment "overrated?" Thanking the mods in advance.
  
  --
  Free Martian Whores!
5. Re:Which ruling? by eldavojohn · 2010-02-25 08:11 · Score: 4, Funny
  
  Thank you. Now, will someone please mod the parent "informative" and my GP comment "overrated?" Thanking the mods in advance.
  Understanding, thanks, salutations ... delivered on Slashdot? With cordiality? Scanning for sarcasm, hatred, malice, discontent ... clean?! Taking full claim of responsibility? Strange new feelings welling up inside me. Double checking URL ... still Slashdot! No memes? Bizarre. How to appropriately respond?
  
  "Uh ... it's been a pleasure doing business with you?"
  
  --
  My work here is dung.
6. Re:Which ruling? by Hurricane78 · 2010-02-25 16:38 · Score: 1
  
  He just hates himself. ;)
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Google has quite a history by eldavojohn · 2010-02-25 07:37 · Score: 2, Informative

it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law.
They've already proved with the blatently illegal settlement on the book scanning deal that the law doesn't apply to them.
What is that famous ruling anyway? That sentence just calls for a link.
It's right there in the article:

There’s just one legal problem: New York Times Co. , et. al. v. Jonathan Tasini et. al. Usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles. One of the main turning points was that the publishers had explicit permission only to include the articles in the print publication. However, copyright law did not allow the publishers to break their publications up and make the articles accessible to readers out of the original context.

--
My work here is dung.
Re:Google has quite a history by Whalou · 2010-02-25 07:37 · Score: 1

From TFA:
http://www.law.cornell.edu/supct/pdf/00-201P.ZO

Usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles.

--
English is not this .sig mother tongue...
Re:Google has quite a history by LostCluster · 2010-02-25 07:41 · Score: 2, Insightful

There aren't as many orphan magazines as there are orphan books.
I'll have half a pint by Anonymous Coward · 2010-02-25 07:45 · Score: 1, Informative

In the UK, Australia and NZ, "pubs" are what americans call bars.
1. Re:I'll have half a pint by Monkeedude1212 · 2010-02-25 08:34 · Score: 1
  
  We have pubs in Canada too. We also have bars. And clubs.
  They are different though. A pub is one of those places you go down to drink and have a good time with your friends. You usually end up buying a big platter of Appetizers, sitting chatting and getting drunk together.
  A club is the opposite of a pub, in that you expect to do no sitting whatsoever. You pay a ridiculously high cover charge, have to be dressed nice, and pretty much go there to dance while drinking. There will at most be 5 tables in separate corners. It really is just a place to dance and have chicks grind up against you.
  A Bar is kind of a mix between the two. There will be a venue for a live music group usually - and a small area for dancing should the live band encourage you to do so. Otherwise, there is a seating area for you to drink and listen to music.
  The more you know!
2. Re:I'll have half a pint by Anonymous Coward · 2010-02-26 12:19 · Score: 0
  
  Nearly the same. Ours (in the UK) serve actual beer.
Leaping to conclusions by icebike · 2010-02-25 07:46 · Score: 4, Insightful

Both TFA and the summary leap to the conclusion that Google would run afoul of a law relating to current publications without even hinting at the utterly vast archives of newspapers molding in public libraries or on microfilm that can't be accessed conveniently if at all.
Many worry about the loss of historical content because so much of our modern media is released only in digital form.
Yet there is a huge wealth of old newspapers, scientific journals, and popular press magazines that could be salvaged with this technology.
It's odd that when envisioning futuristic civilizations we almost always expect all of their literary history to be contained in computers accessible from everywhere. Yet when someone develops the tools to do just that there is a huge outcry from those that posture as defenders of IP rights.

--
Sig Battery depleted. Reverting to safe mode.
1. Re:Leaping to conclusions by Anonymous Coward · 2010-02-25 08:12 · Score: 2, Interesting
  
  Both TFA and the summary leap to the conclusion that Google would run afoul of a law relating to current publications without even hinting at the utterly vast archives of newspapers molding in public libraries or on microfilm that can't be accessed conveniently if at all.
  That was pretty much exactly what I was going to say. There's a huge leap to nefarious conclusions here. This kind of technology would be awesome for getting old magazines and newspapers, a huge amount of which are out of copyright altogether, preserved.
  Google's "don't be evil" motto may be laughable, but the leaping to conclusions about their nefarious attempts to preserve history that is rotting away as we speak is even more hilarious.
2. Re:Leaping to conclusions by TubeSteak · 2010-02-25 12:44 · Score: 1
  
  It's odd that when envisioning futuristic civilizations we almost always expect all of their literary history to be contained in computers accessible from everywhere. Yet when someone develops the tools to do just that there is a huge outcry from those that posture as defenders of IP rights.
  There is an outcry because current IP rights don't allow for content to be "salvaged with this technology".
  The solution is to go to Congress and have the law changed, not to run roughshod over the rights of others and then present a fait accompli.
  I know it's easier to ask forgiveness than permission, but that isn't how our legal system works.
  
  --
  [Fuck Beta]
  o0t!
3. Re:Leaping to conclusions by icebike · 2010-02-25 12:49 · Score: 1
  
  Wouldn't it be prudent to actually wait till there was an actual violation of someone's IP rights before starting with the crocodile tears?
  
  --
  Sig Battery depleted. Reverting to safe mode.
Re:Google has quite a history by icebike · 2010-02-25 07:49 · Score: 1

"Blatantly Illegal" (you are welcome for the spelling correction) is a matter for the court to decide. Courts have approved the settlement.
So what was your problem? Did they fail to ask for YOUR approval?

--
Sig Battery depleted. Reverting to safe mode.
Re:Google has quite a history by Anonymous Coward · 2010-02-25 07:54 · Score: 0

It's right there in the article
It almost sounds like you expected anyone to RTFA...
Googling getting questionable by dave562 · 2010-02-25 08:16 · Score: 2, Informative

The summary makes it sound like Google is trying to do yet another end run around actually paying publishers to access their content. Every single major publisher out there already has their article content in an advertisement free format. They have templates that they copy the content (and advertisements) into when it comes time to print. If Google wants the content, they can pay the publishers for it. They don't need to reverse engineer the final printing. They need to stop being cheap and pay content creators.
1. Re:Googling getting questionable by belmolis · 2010-02-25 08:28 · Score: 3, Insightful
  
  It's only recent material for which this is true. Google appears to be interested in older material, for which the publishers generally do not have split out versions, or, for that matter, in many cases, any electronic version at all.
2. Re:Googling getting questionable by Anonymous Coward · 2010-02-25 08:45 · Score: 0
  
  Indeed, Google is trying to do yet another end run around actually paying writers for the access to the content.
3. Re:Googling getting questionable by Anonymous Coward · 2010-02-25 15:26 · Score: 0
  
  Yep, and this is old technology. HP did Time Magazine and the MIT Press collections, Olive Software does this as a matter of course, the technology is prior art going back 20 years, DFKI has had technology to do this for a decade. If this patent gets awarded it will either be yet another mis-granted patent or irrelevant (i.e. easily worked around). Let's hope that the examiner is awake this time.
you're doin it wrong by commodoresloat · 2010-02-25 08:46 · Score: 1

How to appropriately respond?
"Uh ... it's been a pleasure doing business with you?"
No, no; it's like this:
A+++++ comment, would read again!
How will they do it? by ZERO1ZERO · 2010-02-25 09:26 · Score: 1

Complex printed media material, such as a newspaper, often involve columns of body text, headlines, graphic images, multiple font sizes, comprising multiple articles and logical elements in close proximity to each other, on a single page. Attempts to utilize optical character recognition in such situations are typically inadequate resulting in a wide range of multiple errors, including, for example, the inability to properly associate text from multiple columns as being from the same article, mis-associating text areas without an associated headline or those articles which cross page boundaries, and classifying large headline fonts as a graphic image.
The link in the article points to "Parallel, Side-Effect Based DNS Pre-Caching" and not "Segmenting Printed Media Pages Into Articles"?
Anyway, take a scanned page of a magazine or newspaper: what kind of algorithms and checks would need to be done to split out these articles as they mention? How would you go about it?
Can anyone see any way they could apply their 'google magic' to do this efficiently, other than the obvious methods of font size, type, justification, upper/lower case, bold, etc.?
1. Re:How will they do it? by davecb · 2010-02-27 03:35 · Score: 1
  
  Yup: long since done by Exegenix, who even did the magazine analysis, and now available as a web service from Tata Consulting in India.
  
  ... an intelligent document conversion solution that helps you to quickly and easily convert Word, PostScript or PDF files into XML. Exegenix® employs human-like intelligence to interpret each page enabling automatic and accurate conversion of structures within the document, with no scripting required.
  One of my customers used the for-pay service to convert a massive government budget to text the day it was released.
  --dave
  
  --
  davecb@spamcop.net
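
For the "obvious methods" mentioned in the question above, a minimal sketch of the font-size heuristic might look like this. It assumes a hypothetical OCR step has already produced text blocks in reading order with a font size attached; `Block`, `Article`, `segment`, and the `headline_ratio` threshold are names invented for the example, and none of this is taken from Google's application or from Exegenix. It deliberately ignores the hard cases the patent text lists (multiple columns, articles that cross page boundaries, headlines misread as images).

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Block:
    text: str
    font_size: float  # point size as reported by a hypothetical OCR step


@dataclass
class Article:
    headline: str
    body: list = field(default_factory=list)


def segment(blocks, headline_ratio=1.5):
    """Group blocks (already in reading order) into articles by font size."""
    if not blocks:
        return []
    # Treat the median font size on the page as the body-text size.
    body_size = median(b.font_size for b in blocks)
    articles = []
    for b in blocks:
        if b.font_size >= headline_ratio * body_size:
            # Noticeably larger text is assumed to be a headline: new article.
            articles.append(Article(headline=b.text))
        elif articles:
            # Body text attaches to the most recent headline.
            articles[-1].body.append(b.text)
        else:
            # Body text before any headline is kept under a placeholder.
            articles.append(Article(headline="(untitled)", body=[b.text]))
    return articles
```

A real system would at least need column detection and a way to join continuations ("continued on page 12") before a rule this simple becomes useful.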