Searching for The New York Times
r.jimenezz writes "Adam L. Penenberg, an assistant professor at New York University, has written an interesting piece over at Wired about the contrast between the New York Times' relevance in the real world and the dismal rankings it gets in modern search engines' results. Penenberg discusses some very interesting ideas about opening up the Times digital archive and the impact this would have on its cyber presence."
Of course, like many things about the business operations of a traditional publisher that has ventured online, the reasons are simple but the solutions complicated. The New York Times requires that its users register, which makes it difficult for search engines to spider its content.
As a rule I do not read any newspaper online that I have to register for. In fact, I refuse to purchase the Star Tribune or Pioneer Press here in Minnesota because of their policy requiring user registration. Fake accounts be damned: if you want me to read your paper and look through your ads, you will let me do so without a cookie linked to information, fake or otherwise.
An even more impenetrable barrier is the Times' paid archive. Because it stows material more than a week old behind an archive wall, you have to cough up $3 per article. Since few are willing to pay for content they can get free elsewhere, search engines, which often base results on relevancy (read: popularity), will continue to dis the Times -- as well as other media sites that make you register or pay for old news (The Washington Post, The Wall Street Journal).
This is a horrible problem that I have run into recently while trying to do simple research on the web. I was looking for articles pertaining to a friend who currently resides in Perrysburg, OH. I did a simple search on the Toledo Blade's website, only to find a link to a third-party archive company that required me to pay a fee to access more than a short blurb about the story. Unwilling to drive the 665 miles to Toledo from where I currently live just to read a hardcopy, I gave up on my search for these articles. But while doing research about NEPA, I find that The Scranton Times has a much better free searchable archive than The Times Leader, which requires you to pay to visit their archive. Wonder who gets my visits?
I really think that these policies could lead to the downfall of traditional news outlets. I have absolutely no desire to pay money for information that should be easily available. Hell, if you are going to charge I can't see a $3 fee! A couple hundred words are worth $3 in storage? No way. Perhaps if I asked them to mail me the copy of the article then $3 would be reasonable.
"There isn't a compelling business argument today that would suggest that giving away our content is a good idea," Nisenholtz said. Even though the Lexis-Nexis deal is an all-you-can-eat model -- not based on usage -- the Times can ill afford to undermine its relationship with such an important customer. It simply can't charge Lexis-Nexis tens of millions of dollars while giving away the same content free over the Web.
The argument that makes sense is that people aren't going to be willing to pay you $3 for a computer copy of an article that is only a couple hundred words. Make the fee something reasonable, or watch as you waste a lot of money paying the third-party archive to host your data while no one retrieves it. Perhaps a rival newspaper would open their database up and people would start going to them instead. We can always hope.
I assume that the Googlebot can't be bothered to register ;-)
Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
What a bunch of bastards. Great paper though.
Buy the President
Relevance is a highly subjective term. If you're a typical outspoken, liberal New Yorker, then it's your Bible. If you live in a cabin in Montana, you probably don't give a shit. Calling something 'relevant' indicates much about the person doing the calling, as much or more than it tells anything about the item being discussed.
Personally, I think it's a rag. It's old, it's big, it's supposedly a "standard", but no more relevant than my local paper. And probably LESS relevant than the sum total of what's available online - BBC, London Times, Die Zeit, Drudge, CNN.com, english.aljazeera.net, etc. etc.
I want to delete my account but Slashdot doesn't allow it.
Who needs the NYT! Let the New York POST open up its vast archives! Imagine searching through decades of mindless celebrity gossip and subtle right-wing propaganda!
I think you're painting with too broad of a brush, but I don't think that the New York Times has been the 'paper of record' since Watergate.
The entire idea of there *being* such a thing seems a little outdated to me.
The article assumes that the fault lies with the NYT and whether their archives are open. Perhaps the real fault lies with Google. Shouldn't there be something in Google that identifies certain sites as more reliable than others, rather than basing rank solely on links? How many people link to online news articles? You're more likely to link to your friend's beer-and-computer-mods page than a NYT article about Ashcroft's boot fetish.
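For anyone wondering what "basing rank solely on links" looks like in practice, here is a minimal sketch of a link-based ranking loop in the spirit of PageRank. It is not Google's actual algorithm, and the toy web, the page names, and the `link_rank` function are all made up for illustration. The point is only that a site few people link to (say, because it sits behind registration) ends up with a low score no matter how good its content is.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages purely from who links to whom (simplified PageRank).

    links: dict mapping each page to the list of pages it links to.
    All names here are hypothetical; this is an illustration, not
    any search engine's real implementation.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score to everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy web (an assumption): three blogs link to the open news site,
# only one bothers to link to the gated one.
toy_web = {
    "blog_a": ["open_news", "blog_b"],
    "blog_b": ["open_news"],
    "blog_c": ["open_news", "gated_news"],
    "open_news": [],
    "gated_news": [],
}

ranks = link_rank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy setup the open site comes out ahead of the gated one simply because more pages point at it; nothing in the loop knows or cares which paper has the better reporting.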
*** *** You're just jealous 'cause the voices talk to me... ***
While the NYT may fare dismally in search rankings, I suspect their online influence is still strong. Many of the top hits on a given subject may not link to nyt.com, but I'll wager that a number of them are blogs that reference Times material. Just a thought.
harmonious design
I have no problem with registering. If all I have to do is register an email address (heck, even a free hotmail address that I reserve only for spam) and my name, and maybe even my address, and I can get top quality news reporting without having to pay for the newspaper, then by all means I'm for it.
The reason why the NY Times is one of the best papers in the world is because they can afford to pay their employees what they deserve. If my registration helps up the amount of money they can get from their advertisers, then I'm all for it. People deserve to be paid for their hard work.
That said, I do believe they need to have better results on google, and don't agree with paying $3 for their archives that I can get at my local library for free.
Think of the children, people.
I think this touches upon a much larger problem.
Traditionally, libraries were the ultimate source of information. They were organised and well indexed - to help one find what they are looking for.
The internet has become an "instant library" to a lot of us. In ways, the internet is better than a library. Searching is trivial and the amount of information staggering. However, a lot of information is getting lost. I'm aware that there are Archiving sites, but often, these sites cannot index or record the information that sites present from their own MySQL/Oracle databases.
Search engines are really only good for searching a static site, and don't scale particularly well to sites whose content changes frequently.
It all boils down to this: HTML+Search Engine is not a good combination for giving people access to information over a long period of time. Web sites come and go (depending on the interest of their maintainers) and when they go, they're gone for good.
We need to start distributing the content on a global scale - the same way books distribute content among many people.
[ Monday is a terrible way to spend one seventh of your life. ]
B) Not indexed by search engines
C) Not electronically archived
Yeah, looks like they're really relevant in the 21st century. (And this is a good indication that land-grab IP attitudes have no long term positive benefit in an information society.)
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
This shouldn't be a surprise. Look at the headlines they give in 50 point type, and then when it turns out to be wrong it doesn't even make front page news.
Take yellowcake in Niger, for example: they hail him as nearly a god when he says there was no such thing, and then that turns out to be wrong... see here, here, here, here, here and here.
They've finally run a story about it, but wouldn't it have been a lot better for them to have investigated those Wilson allegations themselves, when they first happened?
That's only one of the latest...
Tons of websites require you to register, not to mention discussion boards of every flava.
/. has the AC option... I wish more websites would offer a similar thing to people, and a few more benefits to registered users, and a few more benefits to paying customers.
I have to admit I have registration-fatigue.
At least
More people would be happy this way.
"a specific name in that article" site:nytimes.com
in Google News and it returned that specific article. But then I pressed "Web" search for the same phrase and it didn't return that article, only a couple of older articles with the same name (I guess those were from the time before Google News started).
In an interesting coincidence, just an hour or so ago, I was looking for an article I read online in the NYT. Specifically, I was looking for an interesting image which was in the article. (Not for any specific use, I just wanted to show a friend.)
Besides the fact that the article is in the archive now (yet less than a month old!) and costs money, the page also informs you that:
Please Note: Archive articles do not include photos, charts or graphics. Our photos are available for purchase, please click here for more information.
Clicking the link reveals that you can order a photographic print for $95, and that's if they have it.
I don't even want a photographic print! A 200x200 pixel bitmap would be fine! (and hardly damaging to their photo sales)
As the article points out, why would anyone casually link to a NYT story? There is simply no point in linking to something most can't access without paying.
They certainly deserve that Google ranking.
I'm not so sure the NY Times is outlandish in their pricing for archived articles. Articles from the past are a niche offering, and thus come with niche prices. If you really need an article from 1964, most likely a few bucks won't be too much trouble. The idea that you'll pay a price directly reflective of the cost of goods is ludicrous. If it weren't, we'd be paying 4 cents for a coke, 2 dollars for a movie, and 5 bucks a month for internet service. Take a trip down to the library and spend a few hours finding the article on microfiche, if you can, or pay a few dollars and get it immediately at home.
Dude, 99.99% of Drudge's big "scoops" are just a sentence leaked from the NY Times newsroom about some big story they're going to publish the next day. Drudge is good at collecting information, but don't kid yourself: his investigative skills are nil.
First we had that scandal with Jayson Blair, who made up stories -- okay, even top notch organizations make mistakes.
But then they came out and admitted they didn't do their job in the run up to the war (i.e., underreporting the suspect issues with the war and putting it in back pages).
OOOPS.
After such big mistakes I don't really consider them the best anymore. And like other reputations in this world, it seems to be more based on momentum than anything else.
I'm not saying they're a bad paper, just that we should demand more from the US's supposed #1 paper.
As a rule I do not read any newspaper online that I have to register for. In fact, I refuse to purchase the Star Tribune or Pioneer Press here in Minnesota because of their policy requiring user registration. Fake accounts be damned: if you want me to read your paper and look through your ads, you will let me do so without a cookie linked to information, fake or otherwise.
So they are supposed to provide world-class journalism and post it on a world-class website and you can't be bothered to host a cookie and look at some ads (which can be easily blocked anyway) in return?
What a massive sense of entitlement you have. Either that or a severe cookie-phobia...
Stop by my site where I write about ERP systems & more
From a newspaper's perspective, open archives aren't always a possibility. I work for a newspaper in a moderately sized (~100,000 people) midwestern city. We currently have about 135 years of paper archives dating back to the late 1800's. While we do have a decent internet presence, we don't have the resources to provide this content online for free.
A recent estimate by me showed that we would need about $20,000 to get that project started in a very barebones manner. That isn't a small amount of money to throw at a project that you want to give away for free. On the other hand, there is another newspaper in town that charges $90/year for access to their sports archives, and at last estimate they had close to 1000 subscribers. For a medium sized paper that amount of money is hard to pass up.
Now for a company like the New York Times that is a different story. They certainly have the resources to get their content online. They have other reasons, though, to keep their content available on a pay basis. They maintain strict controls over all their copyrighted material. It's hard to blame them for this, since that content is their lifeblood.
In my opinion they keep their content under too tight of a lock. It's like having a great idea but never letting anyone hear about it because you are afraid they might steal it. Papers must decide between keeping their copyrighted material secure and providing it to readers in a new medium. That is the delicate balance traditional print publications now face while moving into the digital era.
I don't understand the logic behind charging to read news articles online, and frankly I don't care about the NYT. I'm of the opinion that every newspaper and news website seems to copy and paste the same articles, with the exception of a few choice words thrown in that I just choose to ignore - for example:
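To put those two figures side by side, here is the back-of-the-envelope arithmetic as a quick sketch. It uses only the numbers quoted above, and it makes one loud assumption that the comment does not: that a paid archive at the first paper would pull in roughly what the rival's sports archive does.

```python
# Figures quoted in the comment above.
startup_cost = 20_000       # dollars, barebones estimate to start digitizing
subscription_price = 90     # dollars per year, the rival paper's sports archive
subscribers = 1_000         # "close to 1000" subscribers, rounded up

# Rival paper's yearly archive revenue.
annual_revenue = subscription_price * subscribers

# ASSUMPTION (not in the comment): a similar paid archive would earn
# about the same, so the startup cost is recovered in this many months.
months_to_recoup = 12 * startup_cost / annual_revenue

print(f"Annual archive revenue: ${annual_revenue:,}")
print(f"Months to recoup startup cost: {months_to_recoup:.1f}")
```

Seen that way, roughly $90,000 a year against a $20,000 startup cost makes it easy to see why a medium sized paper would be reluctant to give the archive away.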
Reuters
"Man commits suicide"
BBC
"Man commits suicide after learning his wife was having an affair"
CNN
"An average Joe Worker committed suicide today after having his heart broken when he found out about his wife having an affair with another man"
FOX
"It was a tragic day for the family of Joe Worker who committed suicide shortly after learning that his wife was having an affair with another man."
NYT
"It was a day like any other, except this time Joe Worker came home early from work to surprise his wife. Unfortunately he surprised not only her, but his wife's lover as well. After becoming enraged (wouldn't we all?) he proceeded to the basement where Joe Worker took his father's P-Shooter and blew his head off. His wife later called authorities."
Now why do I need to PAY to be able to read a NEWS story that reads like an editorial on some guy's pathetic life when all I really care about is "Just the facts" and getting to the Dilbert Comics?
Ave Molech Setting
I know it will sound abhorringly naive, but shouldn't The New York Times have as a prime interest independent and objective journalism instead of profit driven opinion-articles passed off as objective journalism? Didn't they have to apologize for participating in the national hype (that means acting as a propaganda instrument) for the war against Iraq?
A newspaper acting as a propaganda instrument is something very alarming to happen in a democratic country. That's what happens in fascist, communist and oppressive regimes in general. No wonder Michael Moore's movie/documentary is so wildly accepted. The people want the truth, but the number of them that trust US corporate media decreases by the day.
Yam, yam, uga booga, yam, yam, yade, yade, uga booga, yam, yam, yade, yade
Aren't many rankings dependent on how many people link to the site? Not many folks will cite the link if it requires registration or a "Pay to Retrieve".
Moreso - People will just cut and paste the article and post that instead.
I don't know why they still bother with the registration - who actually puts in relevant information anyway?
_ _ _ Go for the eyes Boo! GO FOR THE EYES!
A pint of high-quality water can be obtained from many municipal water systems for a fraction of a penny.
Yet people are happy to pay $2 for a bottle of the same water.
Things are worth whatever you are willing to pay.
Conformity is the jailer of freedom and enemy of growth. -JFK
The Times attracts 9 million unique visitors a month, while only about 1 million read the daily paper.
I find the extensive dead-tree version convenient and end up reading more from it than the on-line version that's free.
But, not having a lot of time during the week, I end up buying the print version maybe every 3 days, and quickly scanning the on-line headlines on the off-print days.
The Times really ought to open up its archive and let everyone, including Lexis-Nexis, have free access.
Many years ago at a university library they had an entire special catalog devoted to indexing old NY Times articles that one could read from microfiche. Without the individual paying, either.
There is still a fundamental chasm between archived high-quality material (especially true for scientific journals) and what is freely available and searchable on the web.
Think about how useful it would be for the general public to have access to old, high-quality archives like the NY Times and scientific periodicals; the pursuit of science and other research would be considerably advanced over where it is today. Then there is the reality: copyright protections, and the copyright owners' hope for a few dollars more by charging for access (that only the very wealthy or institutions can afford), still persist.
It's almost enough that I think the government ought to exercise eminent domain, just as it does for land when a freeway needs to go through Aunt Tilly's backyard (link to counterpoint about possible abuse of eminent domain): provide some reasonable compensation to the current copyright owners, appropriate sufficiently old works, and make them available publicly.
"Provided by the management for your protection."
All news organizations are the same. Even Fox which isn't really a news organization but more a tabloid show.
I can easily recall numerous occasions where Fox puts out a story and either the newsheads or the 'experts', or both, conveniently leave out facts or skew things.
Don't bother trying to claim it's the 'liberal' media which lies or spews propaganda.
We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
"The Gray Lady is a beautiful clipper ship, but it's losing steam..."
--media consultant Vin Crosbie, from TFA
The Lexis-Nexis agreement is the key bit. NYT Digital profited $25M and they have a $20M agreement with Lexis-Nexis that they wouldn't have if the archive were available free. The archive therefore clearly won't be free as long as Lexis-Nexis "owns" it.
I don't know what else is in Lexis-Nexis, but I imagine they have similar agreements with their other main sources of info. But it seems like they're the ones who are more threatened by Google, since they are so clearly in direct competition. When their first customers start making their content too free on the web, there's going to be a momentum that leads to the decline of Lexis-Nexis's current model--at which point NYT Digital will figure out some other way to make money.
Newspapers rarely make enough in issue sales to pay the cost of printing the issue. They make the money in advertising, plain and simple.
To have a paper like the New York Times, who can command advertising rates as high as any paper in the world, bitching and moaning about their web presence and hoarding their articles like some stupid info-miser shows nothing more than a complete lack of understanding somewhere in the company. There is no excuse for it.
If any website could sell enough ads to keep itself profitable, it would be the website for the New York Times. They could add to their revenue and readership in one fell swoop. But no.
It's dumbass media outlets like this that had better wake up and get with the program. Doing it the way you've always done it will do YOU in the end, and it won't be pretty.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
He's from Minnesota. Everyone here has a massive sense of entitlement. We have taxes that tax your taxes.
Besides, both the Star Tribune and the Pioneer Press are such left wing rags that they aren't worth much.
I only get the Sunday edition because of the coupons.
There is no correlation between size in the real world vs. the virtual world. The New York Times is a gated community. It should be _no_surprise_ that search engines rank the NYT low *and* that its popularity is low. If Google starts ranking NYT links high, it won't be because they are popular or more useful than other news sources, and it will be a great disservice to Google users.
how long has it been since the Times was really a relevant source of information in the real world?
Since computerized communication provided open sources of news that made it painfully obvious the Times had let ideology lead them into draconian self-censorship, bias, and occasional (but systematic) outright lies, rather than news coverage, to spread a political agenda.
It's tempting to say since they started that policy. But that still left them "relevant" - like the propaganda machine of ANY ideology with major political power is relevant. What killed their relevance is the availability of sources they and their ilk couldn't suppress or ridicule into irrelevance.
This was starting to happen in the early days of netnews and bulletin-board systems. But the explosion of home-computer connectivity and web-based interfaces brought it to the general public with a vengeance.
I'd say the watershed event was the Drudge Report's breaking of the Lewinsky scandal. People had been switching off mainstream media news for some time. But this made it clear to the broad public that the internet was not just a good source of news, but a BETTER and MORE RELIABLE one, than the broadcaster/newspaper/magazine axis. In particular, it brought the latter's self-censorship and bias into the public eye.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Your local library. Unless you're really in the middle of nowhere and your library has no budget at all, go to the library. Heck, you might not even have to go to the library, many libraries now do chat reference, ask-a-librarian, and all libraries have a phone.
There's more, MUCH more, to doing research than using google. Paid databases have it all over google for finding current and historical news information.
If you can't find something local, try the Library Of Congress, they do online chat reference.
Not sure if anyone noticed but in my opinion:
http://nytimes.blogspace.com/genlink
was the only thing of any relative importance, as it's another nice way to get around the NY Times registration barrier....
Libraries are generally wonderful, amazing places: well organised, friendly and incredibly expert staff who do their best to get what you need for little or no cost.
But there is a cost - and people forget about it, because it's in our taxes. (Whether or not we should pay for public libraries out of our taxes, and whether the money is well spent is another argument). But the bottom line is that we've had 100 years or so of great services because there has been a general philosophical acceptance that it's a Good Thing for everybody to throw in a few cents for a building in every town, full of good books, staffed by experts, and with an infrastructure to enable gaps in individual library stocks to be covered at a national *and* international level by an interlibrary loan service. Most developed countries now have a superbly developed system for getting paper-based information to their citizens for little cost.
My question is: would we accept paying taxes to do the same via the internet?
I think it's mainly a philosophical, rather than technical question. If we all agreed to pay additional 'library taxes' then there's no reason why existing sources couldn't be made available to all citizens (e.g. your National Insurance number is your password, now you can get the NYT online for free, NYT gets paid by the treasury for its national-to-all-citizens licence each year) and also in the same way that many library indexing systems were evolved by librarians working under public funding, why not use public funding to develop internet archiving / retrieval systems of comparable value? I think it's a philosophical issue, it depends on how you see these technical solutions being funded.