New York Times Wipes Journalist's Online Corpus
thefickler writes "Reading about Peter Wayner and his problems with book piracy reminded me of another writer, Thomas Crampton, who has the opposite problem — a lot of his work has been wiped from the Internet. Thomas Crampton has worked for the New York Times (NYT) and the International Herald Tribune (IHT) for about a decade, but when the websites of the two newspapers were merged two months ago, a lot of Crampton's work disappeared into the ether. Links to the old stories are simply hitting generic pages. Crampton wrote a letter to Arthur Sulzberger, the publisher of the NYT, pleading for his work to be put back online. The hilarious part: according to one analysis, the NYT is throwing away at least $100,000 for every month that the links remain broken."
"...the NYT is throwing away at least $100,000 for every month that the links remain broken."
now how much would it cost to fix all those links...
no wonder newspapers are not doing well
It's not my fault, someone put a wall in my way.
I wish I had someone to wipe my corpus for me. I always soil my fingers.
Groovy baby.
CNN's website doesn't have as many broken links.
Articles over a decade old still work!
Whoever designed theirs deserves a lot of credit.
We've come to rely on being able to find things on the internet, and it is sad to think that information might go away and cease to exist. That said, I guess it depends on the writer's contract whether he has a right to have his body of work preserved or not. I mean, if a company pays for your work it is theirs and not yours, unless your contract entitles you to it. Once you've sold your work to somebody, they can keep anyone from ever reading it and use it to line hamster cages for all they care.
This is so unfortunate. IHT was great before the merge, which was touted as a "new" version of IHT. Instead, they just canned it and attempted to transfer its content to the existing NYT site. And did a dreadful job, it seems.
I understand the logic - newspapers need to cut costs because they can't figure out the internet and it is killing them. But they lost a dedicated reader in me with this move.
Light the blue touch-paper and retire immediately.
And it's got unlimited space. Strangely enough, some people are adamant about keeping their works out of this library. And I say they have the right to ensure the internet forgets about them when they die. This poor soul seems to understand what's going on.
My work here is dung.
The problem IMHO is not so much the broken links, but instead the desire (or lack of...) from the corporate overlord to retain "obsolete" content. Priority was given to the merger of both titles, without considering what makes a newspaper what it is: content.
I was interested in reading the analysis that led to the figure of $100,000 lost for every month the guy's work was offline. So, doing what you do, I clicked on the link and found it grandly hilarious to receive a 500 error stating: "Error establishing a database connection". Oh, the irony.
Am I the only one who finds this funny? They've managed to keep archives older than, oh my god brace yourselves, 10 years!!!
Seriously though, don't give them standing ovations simply because everybody else fails. Tell me this in 50 years and I'll honestly clap my hands.
I am the lawn!
Whenever I redesign my site, I try hard to avoid changing any URLs. But if I do have to change a URL, I always make sure that there is a redirect (preferably an HTTP 301 permanent redirect) that points from the old URL to the new URL. Updating links is not enough, because you will always have links that come from external sites that you don't control, user bookmarks, links found in "Hey, check this article out" e-mails, etc.
This is one of those basic principles of the web that the W3C (and for those who don't pay attention to them, you can substitute that with "plain old common sense" here) strongly recommends.
It means that users can always find and view content. It means that you still retain your ad revenue. It means that you still keep your PageRank for external sites that link. It means less bitrot and a more useful web...
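For anyone who hasn't set one up, here is a minimal sketch of the old-URL-to-new-URL redirect described above, using only Python's standard library. The paths, the example.com target, and the names REDIRECTS, lookup_redirect, RedirectHandler and serve are all made up for illustration; a real site would more likely do this in its web server or CMS configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of retired paths and where their content lives now.
REDIRECTS = {
    "/articles/old-story.php": "https://www.example.com/2009/old-story.html",
}


def lookup_redirect(path):
    """Return the new URL for a retired path, or None if no mapping exists."""
    # Drop any query string so bookmarks with extra parameters still match.
    return REDIRECTS.get(path.split("?", 1)[0])


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = lookup_redirect(self.path)
        if target is None:
            self.send_error(404, "No such article")
            return
        # 301 marks the move as permanent for browsers and crawlers.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8000):
    """Run the redirect server on localhost until interrupted."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting /articles/old-story.php should answer with a 301 and a Location header pointing at the new address, so external links and bookmarks keep working without anyone having to update them.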
I feel for the guy and his lost articles, but I am wondering why he did not keep backups of everything. The stories seem to be gone forever, or else his letter would be about re-publishing his stories on his own website. That is a rather bad case of negligence on the publisher's side, but more so on the part of Mr. Crampton. For comparison: I work with a professional photojournalist who has been working for 50 years now and has archived everything (more than 1.5 million pictures) like a mad squirrel. If you ask him about an article he wrote in 1961, it takes him about five minutes to find a copy of the article and the raw materials. Everything is analog, but nonetheless... That makes you wonder whether, while embracing digital media and the blogosphere, many journalists have failed to bring with them the necessary tools to manage and archive their digital assets.
Pay me $100,000 per month and I'll dishonestly clap my hands right now.
Help fight poverty: Punch a poor person.
In the digital age, wiping out thousands of volumes of material takes mere seconds. Permanently. Gone. Poof.
We have books, printed books, which go back hundreds and hundreds of years (well, written material; the printing press is a fairly recent invention).
We don't even have a record of some newspaper articles that came out 5 years ago. We're LOSING our history, not retaining it, because we lack sufficient "printing" to always keep a copy in circulation. Witness the Avism.com debacle and hundreds of other cases where this has happened.
Until we can have a hard-copy of digital media which can NOT be changed, edited, altered or redacted... we're lost.
When we all have "Kindle DX2" devices in the classroom for digital copies of our textbooks... what is stopping them from "gently changing" some of the wording over time, over a few years, to permanently alter the way our youth views the history of times they never lived through?
How can you compare one version of a website today, with the one that was there last week? Was anything changed? Was article content "censored" in any subtle way?
We're heading down a very slippery slope, when digital information can't remain static enough to hold through the years, and be validated and verified to be unchanged, with sufficient copies in enough hands, to ensure survivability. The Internet is not the place to "store" things you want to keep for years and decades.
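On the narrower question of telling whether a page changed between last week and today, one common approach is to keep a cryptographic fingerprint of the text you saved. A rough sketch, assuming you already have the article text as a string (fetching it is left out, and fingerprint and has_changed are illustrative names, not anything the NYT or anyone else provides):

```python
import hashlib


def fingerprint(text):
    """Return the SHA-256 hex digest of the text, ignoring whitespace layout."""
    # Collapse runs of whitespace so a reflowed page does not count as an edit.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(saved_digest, current_text):
    """True if the current text no longer matches the digest saved earlier."""
    return fingerprint(current_text) != saved_digest
```

This only tells you that something changed, not what, and it only helps if enough people kept their own copy or digest back when the article was first published.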
Much of what we know about past days is from written material. With the move towards net everything and the decline in print, and as the internet itself changes (and I do not mean just the web; email, gopher, irc, usenet, ftp archives, et al. are all prone to this problem), much of our history will be lost to generations to come purely through attrition.
Then we have the problem of changing file formats, media which decays rapidly when compared to paper and decent inks, obsolescence of technology (try finding a laptop with a built in 3.5" floppy drive)...
In years to come this period will become the 2nd dark age.
--- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
My company links to articles on a lot of magazine websites, and I'm just amazed at how often the links become broken. Sites get redesigned and they don't bother redirecting the old URLs to the corresponding new locations. Or, even worse, they just discard all of the old articles, or random articles disappear or come up blank or mangled. Does it not occur to them that websites, search engines, and blogs are left with broken links? Do they not realize that people bookmark the articles?
It's hard to tell from the linked article (yeah, I read it), but it doesn't seem that Crampton has no copies of the articles (surely he would keep copies of his own stuff), but rather that they're just not accessible on the Internet. All the links that should point to them from the NYT and the IHT went kablammo when the two sites merged.
There's no way a backup on his end could fix this problem.
"There is no time, sir, at which ties do not matter," Jeeves, (Jeeves and the Impending Doom)
Hell, the way things are right now you could pay me $10,000 a month and I'll gladly clap my hands 40 hours a week in whatever venue you deem most appropriate.
This is my sig. There are many like it but this one is mine.
I clicked on the two links listed at the bottom of the open letter to Arthur Sulzberger (both are IHT links), and both are now redirected to the correct articles on the www.nytimes.com domain. Has the NYT fixed the problem and no one has bothered to mention that?
This just illustrates how little the print media understand the web. If they wanted to increase traffic to their site to increase ad revenue, they'd make sure they were capturing all the views they can and building a content base that will be an asset for years to come.
I see over 1000 articles (with photos) by this guy on the Times website. And I can access all of them.
This is the same bunch that complained how (paraphrasing) "George Bush is such an asshole; he's having them DRIVE yellowcake uranium through the streets of Baghdad without concern for the safety of the inhabitants".
Where do I start?
- They'd been in contact with the uranium for some time.
- Maybe they'd teleport, Scotty?
- Isn't this the "Weapon of Mass Destruction" they were hoping not to find?
Of course it was; there was 8 TONS of the stuff, and nearly a ton was close to weapons-grade. But the article was quickly whisked off to the archive so you'd have to PAY to get a copy just days after printing.
The only reason you're learning about it, is that I read the original and posted it here.
THE POINT:
Newspapers, and for that matter news agencies, have a template to fulfill. It's no longer _news_, it's part of a story they tell. They spend all their time printing a paper no one wants, then complain how their customer base is 'un-hip' and "Joe Sixpack" and doesn't get the literary genius packed within. It's nuts!
Let's let them collapse.
--- For a good time mail uce@ftc.gov
I feel for the guy and his lost articles, [...]
I feel for him too. Of course the articles aren't his, they are his employer's (unless he has a contract that says otherwise) - which is probably why he's bothered. If they were _his_ articles then he could wholesale upload them to his own site and reap the rewards (whatsoever they may be).
any good /. er could go on and on about the problems of the times website. I actually had to tell them that they needed a button so people could go back or forward one day at a time (any std site for a journal has this feature - look at say amer chem soc journals, there is a button that goes forward or back one issue)
I have repeatedly told them their comments suck and they should have slashcode and wikipedia - can you imagine how much traffic the times website would generate if each of their great articles formed the basis of a community wiki article
and I'm not that good at this - i'm sure any competent experienced person could find 100s of things they are doing wrong (if you want to GIVE the times money, by being a customer, and purchasing an ad, try and figure out how to do it from the website - it's embarrassing)
I guess the website is driven by old guys whose attitude is: it was good enough for hot lead set by hand, it's good enough for the web....nyt, rip
Moving websites is a good time for purging embarrassing stuff, especially the comments section. One wonders what else is missing especially from the archive. Ah, I just read this bit; the archives were erased in the move. It takes willful action to lose your own archive. At least they didn't go back into the archive and replace the negative bits with adverts, like some other online newspapers do. Job well done I guess :)
davecb5620@gmail.com
Peter Wayner - author of a famous and well-known book on compression algorithms, which managed to survive the Big Howl of Internet due to its relative popularity at the time it was written. It was recovered thanks to thousands of fragments found in hundreds of hard drives all over the world.
Thomas Crampton - Supposedly a journalist for the once famous New York Times. His personality is quite obscure and nothing is known about him, except for a short reference in the once famous Slashdot forum on Internet. He is known for the statement "You erased my career", supposedly written by him in a letter (lost) after supposedly finding that all his work was wiped by the New York Times (reference lost), supposedly a chronic problem of the newspaper in its electronic era (all references lost). Nearly all data on the New York Times has been lost, with the exception of a few million fragments of articles that may be found now and then, so it is nearly impossible to know who Thomas Crampton was, and we probably never will. The statement attributed to him is considered, today, a markup symbol of the Big Howl of Internet.
When you read the article, you find one of the main reasons he wants the articles back up is because he himself doesn't have copies of the articles. TFA and Slashdot are full of angst towards the megacorp, but nobody seems to have noted this point.
Interesting. I got quite upset with the IHT-NYT change a while ago for exactly this reason: many bookmarks and links to news articles that I had made throughout the years evaporated overnight, making me regret not printing or saving the text of those articles when I had the chance. But apparently the NYT has fixed it now. Crampton links to two articles of a scoop he had a few years ago, and they resolve to a new page. And a bookmark that I have on the computer I'm working on now has the same thing, suggesting that they must have transferred their news archive to the new site.
The original bookmark: http://www.iht.com/articles/2009/02/24/opinion/edcardenas.php now resolves to http://www.nytimes.com/2009/02/24/opinion/24iht-edcardenas.1.20395821.html
I'll try it later with my other bookmarks, but it seems like they have responded to the criticism well.
Analyses are a dime-a-dozen, and as we know from past experience, analysts are often biased, stupid, or insane.
So does it really matter that one analyst came up with a number that, if true, would make NYT look foolish?
I don't know anything about this gentleman, but, maybe, his writings simply go against the current Illiberal pro-Democrat bias of the paper? They weren't always this way — most famously, NYT used to be against government-mandated minimum wage until 1999.
Perhaps, they are trying to score some favors from the current government in the hopes of getting substantial financial help (a bailout, that was, no doubt, already promised to them) and certain writers are no longer welcome?
One does not need to be a "rabid partisan" to fall into disfavor — until recently NYT weren't hiring such partisans anyway. Just not participating in the adoration fest could've been enough. When the company survival is at stake, one can't afford taking chances...
In Soviet Washington the swamp drains you.
"They took my work and erased it! Please mommy help me!" - That's one solution. The other solution is for this journalist to get off his fat ass, buy a personal website, and publish all his back work for everyone to see.
You know, when I left Lockheed ten years ago most of my work ended up in the dumpster too. That's life. If I felt it was important enough to publish, I'd simply copy it to my c: drive and later my personal website. It's a much simpler solution than whining to my ex-boss. It's MY job to preserve my work, not his.
This journalist thinks his work is so all-important. Well then he should be willing to put up the money to publish it.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
One of the greatest delusions that people have about the Web is that almost all information can be found on it somewhere. What total nonsense.
Stories rot from the Web faster than newspaper print ever has or ever will. All that we're left with is the most recent version or revision, which may have *nothing* to do with what was first written.
If you don't keep copies of your work that appears on the Web, you might as well have thrown them into a fireplace. And, as for everyone else, if you assume for even a moment that what you read on the Web about what happened, even in technology news, even five years ago reflects what people really wrote and thought at the time, you're a fool.
It's thanks to delusions like this that, for example, people can argue sincerely that Windows is popular because it's good; and not because Microsoft forced a monopoly on hardware vendors. Almost all the reports of DoJ vs. Microsoft from the time are long gone now. The proof that Microsoft's products are only popular because Microsoft made damn sure that no one else would have a chance to compete against them has vaporized.
The only thing newsworthy about what's happened here is that people think that stories disappearing like this is in any way whatsoever noteworthy. It happens every day.
Steven
I'm sure you are not a troll (YANAT), but the other side of slashdot is discussing how cheap data is, woe to the pay providers.
I think you mean that keeping data in a sophisticated manner is what grinds out IT Admin time, which eventually means a salary to pay. But the data itself is cheap, and 50,000 fellas on here can whack up something simple as a makeshift in a week for $5,000 and a month's supply of pizza&caffeine.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
You complain about how all of your AOL-hosted links ceased to work and how you're unable to update all the places they were used to point to your (currently) Verizon-hosted content. Do you see the problem with this?
The solution to this is to get your own domain, so you retain the ability to move it at will. I started out with my primary domain (http://www.fencepost.net/) because I wanted a reliable email address after two successive ISPs were bought out. I would never use a carrier-provided email address as my primary, though I probably do have an @sbcglobal.net address that will continue to exist until AT&T decides to kill off the last of that Baby Bell.
As I see it, if you want a "permanent" online presence then you have two options: 1) control it yourself with a domain of your own, or 2) find an entity that you are positive will not cease to exist or restructure your presence out of existence.
Your best bet for #2 is probably an email address through your college (assuming you're a college grad) if your college's Alumni Relations office has set something like that up. Generally these are "forwarder" addresses (@alumni.mycollege.edu) that simply pass mail along to another address that you've provided them with, and sending email with that as your return address may be problematic depending on who your actual mailbox is hosted with. It's also not unheard of for colleges (particularly small/poorly funded ones) to go under. GMail does not qualify for #2. Some associations could be considered to qualify for #2 (e.g. ACM, IEEE) but if you're not using their other services then you're paying several hundred dollars a year just for an email address - a domain is cheaper.
For #1, sure it's going to cost you a few dollars and a little time each year, but anyone who's reading Slashdot should be able to register a domain and set up hosting. Simple registration is under $10/year, and depending on your needs hosting might even be available "free" from your registrar. You can also look at services such as NearlyFreeSpeech.net, with hosting prices dependent on your traffic and a minimum deposit of $0.25. If all you're doing is email and a small static website that nobody ever goes to, throw a $10 deposit at them and you're probably set for years. (Disclaimer: I've never used these folks, but they're an example of how little it can take to get things started).
fencepost
just a little off
...I'll gladly clap my hands 40 hours a week in whatever venue you deem most appropriate.
Well now, that depends on what you're willing to have in between your hands while clapping, and how soft your hands are...
Sigs are for losers
But the data itself is cheap, and 50,000 fellas on here can whack up something simple as a makeshift in a week for $5,000 and a month's supply of pizza&caffeine.
I am making a 3TB RAID 10 array (6 1TB drives) rackmount server for around $850
I'll grant you $150 in misc. expense.
The other $4000 is 2.5 hours of link maintenance a week for two years.
But even so, we agree that $100K is ludicrous. That's the price attitude that killed wall street.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
I'll do it for $50k, and I'll pretend to be genuinely impressed too!
https://www.eff.org/https-everywhere
So where did the value of $100,000 come from?
"To buy that traffic from Google at $.20/click, you'd have to pay $100,000 a month"
So Google says it's worth 20 cents a click. What if I say it's only worth a cent a click? Then it's worth $5000, or perhaps at 0.1 cents a click it's worth $500.
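To spell out the arithmetic behind those figures: $100,000 a month at $0.20 per click implies 500,000 clicks a month, and the other numbers are just that same traffic repriced. A quick sketch, working in whole tenths of a cent to avoid floating-point fuzz; the helper names are made up for illustration and the only inputs are the figures quoted above:

```python
MILLS_PER_DOLLAR = 1000  # one mill = a tenth of a cent


def implied_clicks(monthly_dollars, mills_per_click):
    """Monthly clicks implied by a dollar valuation at a given per-click price."""
    return monthly_dollars * MILLS_PER_DOLLAR // mills_per_click


def monthly_value(clicks, mills_per_click):
    """Dollar value of the same monthly traffic at a different per-click price."""
    return clicks * mills_per_click / MILLS_PER_DOLLAR


clicks = implied_clicks(100_000, 200)   # $0.20 per click is 200 mills
at_one_cent = monthly_value(clicks, 10)  # $0.01 per click
at_tenth_cent = monthly_value(clicks, 1)  # $0.001 per click
```

So the whole valuation scales linearly with whatever per-click price you choose to believe; the traffic estimate is the only part that doesn't move.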
All make believe. Don't tell me "an expert told you so" because I think a bunch of "experts" called "bankers" just got discredited a few months ago for overvaluing other virtual sales... ;-)
Except I guess this is America, so the writer is probably getting ready to sue the paper for $100,000 in lost earnings per month, on the grounds that he read on the internet that he's lost $100K. Not bad for a bloke who probably earns $2000 a month but will keep really quiet about the actual figure :-)
my old national geographic magazines from 50 years ago still work. even the old ads are still there... NEAT!
seriously, in a hundred years there is going to be a huge history gap. it's great to read old magazines and books and newspapers. what is anyone in the year 2100 going to read from 2009? nothing will be printed out or compatible with whatever brain-link stuff they use in the future...
all you will have is old shaky-cam JJ Abrams videos as a record of the early 21st century... sad.
Ask Me About... The 80's!