Searching for The New York Times
r.jimenezz writes "Adam L. Penenberg, an assistant professor at New York University, has written an interesting piece over at Wired about the contrast between the New York Times' relevance in the real world and the dismal rankings it gets in modern search engines' results. Penenberg discusses some very interesting ideas about opening up the Times digital archive and the impact this would have on its cyber presence."
Of course, like many things about the business operations of a traditional publisher that has ventured online, the reasons are simple but the solutions complicated. The New York Times requires that its users register, which makes it difficult for search engines to spider its content.
As a rule I do not read any newspaper online that I have to register for. In fact, I refuse to purchase the Star Tribune or Pioneer Press here in Minnesota because of their policy requiring user registration. Fake accounts be damned: if you want me to read your paper and look through your ads, you will let me do so without a cookie linked to information, fake or otherwise.
An even more impenetrable barrier is the Times' paid archive. Because it stows material more than a week old behind an archive wall, you have to cough up $3 per article. Since few are willing to pay for content they can get free elsewhere, search engines, which often base results on relevancy (read: popularity), will continue to dis the Times -- as well as other media sites that make you register or pay for old news (The Washington Post, The Wall Street Journal).
This is a horrible problem that I have run into in recent times trying to do simple research on the web. I was trying to look for articles pertaining to a friend who currently resides in Perrysburg, OH. I did a simple search on the Toledo Blade's website only to find a link to a third-party archive company that required me to pay a fee to access more than a short blurb about the story. Unwilling to drive the 665 miles to Toledo from where I currently live just to read a hardcopy, I gave up on my search for these articles due to this barrier. But while doing research about NEPA I find that The Scranton Times has a much better free searchable archive of information than does The Times Leader, which requires you to pay to visit their archive. Wonder who gets my visits?
I really think that these policies could lead to the downfall of traditional news outlets. I have absolutely no desire to pay money for information that should be easily available. Hell, if you are going to charge I can't see a $3 fee! A couple hundred words are worth $3 in storage? No way. Perhaps if I asked them to mail me the copy of the article then $3 would be reasonable.
"There isn't a compelling business argument today that would suggest that giving away our content is a good idea," Nisenholtz said. Even though the Lexis-Nexis deal is an all-you-can-eat model -- not based on usage -- the Times can ill afford to undermine its relationship with such an important customer. It simply can't charge Lexis-Nexis tens of millions of dollars while giving away the same content free over the Web.
The argument that makes sense is that people aren't going to be willing to pay you $3 for a computer copy of an article that is only a couple hundred words. Make the fee something reasonable or watch as you begin to waste a lot of money paying the third-party archive to host your data while no one retrieves it. Perhaps a rival newspaper would open their database up and people would start going to them instead. We can always hope.
What a bunch of bastards. Great paper though.
Buy the President
While the NYT may fare dismally in search rankings, I suspect their online influence is still strong. Many of the top hits on a given subject may not link to nyt.com, but I'll wager that a number of them are blogs that reference Times material. Just a thought.
harmonious design
I think this touches upon a much larger problem.
Traditionally, libraries were the ultimate source of information. They were organised and well indexed - to help one find what they are looking for.
The internet has become an "instant library" to a lot of us. In ways, the internet is better than a library. Searching is trivial and the amount of information staggering. However, a lot of information is getting lost. I'm aware that there are Archiving sites, but often, these sites cannot index or record the information that sites present from their own MySQL/Oracle databases.
Search engines are really only good for searching a static site, and don't scale particularly well to sites whose content changes frequently.
It all boils down to this: HTML+Search Engine is not a good combination for giving people access to information over a long period of time. Web sites come and go (depending on the interest of their maintainers) and when they go, they're gone for good.
We need to start distributing the content on a global scale - the same way books distribute content among many people.
[ Monday is a terrible way to spend one seventh of your life. ]
Tons of websites require you to register, not to mention discussion boards of every flava.
/. has the AC option... I wish more websites would offer a similar thing to people, and a few more benefits to registered users, and a few more benefits to paying customers.
I have to admit I have registration-fatigue.
At least
More people would be happy this way.
"a specific name in that article" site:nytimes.com
in Google News and it returned that specific article. But then I pressed "Web" search for the same phrase, and it didn't return that article but a couple of older articles with the same name (I guess those were from the time before Google News started).
In an interesting coincidence, just an hour or so ago, I was looking for an article I read online in the NYT. Specifically, I was looking for an interesting image which was in the article. (Not for any specific use, I just wanted to show a friend.)
Besides the fact that the article is in the archive now (yet less than a month old!) and costs money, the page also informs you that:
Please Note: Archive articles do not include photos, charts or graphics. Our photos are available for purchase, please click here for more information.
Clicking the link reveals that you can order a photographic print for $95, and that's if they have it.
I don't even want a photographic print! A 200x200 pixel bitmap would be fine! (and hardly damaging to their photo sales)
As the article points out, why would anyone casually link to a NYT story? There is simply no point in linking to something most can't access without paying.
They certainly deserve that Google ranking.
From a newspaper's perspective, open archives aren't always a possibility. I work for a newspaper in a moderately sized (~100,000 people) midwestern city. We currently have about 135 years of paper archives dating back to the late 1800s. While we do have a decent internet presence, we don't have the resources to provide this content online for free.
By my recent estimate, we would need about $20,000 to get that project started in a very barebones manner. That isn't a small amount of money to throw at a project that you want to give away for free. On the other hand, there is another newspaper in town that charges $90/year for access to their sports archives, and at last estimate they had close to 1000 subscribers. For a medium-sized paper that amount of money is hard to pass up.
Now for a company like the New York Times, that is a different story. They certainly have the resources to get their content online. They have other reasons, though, to keep their content available on a pay basis. They maintain strict controls over all their copyrighted material. It's hard to blame them for this, though, since that content is their lifeblood.
In my opinion, they keep their content under too tight a lock. It's like having a great idea but never letting anyone hear about it because you are afraid they might steal it. Papers must decide between keeping their copyrighted material secure and providing it to readers in a new medium. It is that delicate balance that traditional print publications now face while moving into the digital era.
To have a paper like the New York Times, who can command advertising rates as high as any paper in the world, bitching and moaning about their web presence and hoarding their articles like some stupid info-miser shows nothing more than a complete lack of understanding somewhere in the company. There is no excuse for it.
Uh, I don't know if you realized this, but newspapers ALSO make a lot -- a LOT -- of money on their archives. In fact, in some areas the only reason the local paper survives is as an archival entity, selling their content digitally and on microfilm/fiche to universities and to services like Lexis-Nexis.
There is a big fear in the newspaper industry that opening their archives online will destroy this revenue stream without introducing comparable new revenue. It is a very realistic fear... I used to work for an online newspaper company, and it was quite common to have customers putting up less than half of their print content after seeing massive drop-offs in print sales. Many clients would ask us to clear their archives, so you could only search a month back.
I mean, the Times is a respected paper. Their articles are linked to all over the net despite the required registration, and they can expect every self-respecting university to buy the year's microfilm roll. Offering the content for free could ONLY hurt them, so they'd be stupid to do so.
Hey freaks: now you're ju
Libraries are generally wonderful, amazing places: well organised, friendly and incredibly expert staff who do their best to get what you need for little or no cost.
But there is a cost - and people forget about it, because it's in our taxes. (Whether or not we should pay for public libraries out of our taxes, and whether the money is well spent, is another argument.) But the bottom line is that we've had 100 years or so of great services because there has been a general philosophical acceptance that it's a Good Thing for everybody to throw in a few cents for a building in every town, full of good books, staffed by experts, and with an infrastructure to enable gaps in individual library stocks to be covered at a national *and* international level by an interlibrary loan service. Most developed countries now have a superbly developed system for getting paper-based information to their citizens for little cost.
My question is: would we accept paying taxes to do the same via the internet?
I think it's mainly a philosophical, rather than technical, question. If we all agreed to pay additional 'library taxes' then there's no reason why existing sources couldn't be made available to all citizens (e.g. your National Insurance number is your password, now you can get the NYT online for free, NYT gets paid by the treasury for its national-to-all-citizens licence each year). And in the same way that many library indexing systems were evolved by librarians working under public funding, why not use public funding to develop internet archiving / retrieval systems of comparable value? I think it's a philosophical issue; it depends on how you see these technical solutions being funded.