Bringing the Library of Congress Newspapers Online
smooth wombat writes "If you want to read a newspaper article from sometime in the past (say 1920 for example) your only options right now are to go to your local library and hope they have a microfiche file of that paper or take a visit to Washington, DC and the Library of Congress. That may soon change. CNN is reporting that by 2006 the government will have the first of 30 million digitized pages from papers published from 1836 through 1922 which will be available to anyone who has a connection to the net. The project is a joint cooperation between the National Endowment for the Humanities and the Library of Congress.
The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read, and copyright restrictions are in force on papers published after 1923."
Yet another good reason for copyrights to expire after a reasonable number of years.
I'm surprised they haven't done this sooner... But supposedly, MIT is working on a thing to scan in every document ever in the LOC, for internet access. A monumental task.
If the Library of Congress is entirely digitized, that's going to totally screw up the "burning Libraries of Congress" measurement of energy output.
What is the law regarding an online library? I guess not even the government can do it.
The local library has every edition of the local papers on microfilm, and I suppose they could put it all on DVD too.. When does it become a copyright issue?
I don't need no instructions to know how to rock!!!!
Many newspapers charge obscene fees to access articles more than a week old, yet provide free of charge to library patrons access to their entire archive electronically.
The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read
Surely the OCR process could be recalibrated to identify a different typeface?
Now, where is the open source OCR software that they can use to read the old wonky typefaces?
DAMN YOU OCTODOG! DAMN YOU TO HELL!
I'm not so sure about the significance of the content, what did they write/read in 19th Century?
Obituaries and marriage announcements, for one thing. This stuff will be a gold mine for genealogists.
Lost: Sig, white with black letters. No collar. Reward if found!
The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read
That excerpt strongly implies the use of OCR, in which case the search engines probably won't require a substantial amount of time to index the archive.
On a related note, many historically memorable events occurred during the timeframe mentioned. These include the American Civil War, the Titanic disaster, and many others.
Do you like German cars?
We already have 20 million.
The fact that you have no idea what people wrote or read about shows the importance of making the materials more accessible.
(From the digitized 1844 paper...)
Howdy, pardner! To read about that scalliwag Black Bart's shootout with Arizona Jack last week, you'll need to pay two bits per article or buy a subscription for a gold dollar or its equivalent in salt pork or live chickens.
I watched C-beams glitter in the dark near the Tannhauser gate.
This is not entirely accurate. The Washington Post's archive is available from 1877 to the present day if you're willing to pay.
From 1877-1986, the Post offers full-page scans of the articles as they appeared in the newspaper. Beginning in 1987, the full-text versions of articles (without photos) are available.
So close and yet so far from the world's perfect ID number
Oxford University did a trial project to see how difficult it would be to place some 18th and 19th Century journals online. Here is the final report giving some of the difficulties they had. The journals are available here and make for some very interesting browsing.
These, to me, were always half the fun whenever I perused old microfiche in the library.
There is a bar in NYC called McSorley's, which has been in continuous existence since 1846 or so. They have framed newspaper articles on the wall from over a hundred years ago, 130-year-old pictures, political campaign buttons from McKinley's run. Talk about a neat experience.
Actually seeing the old print would mean more to me. I rather hope that they serve images of the old papers, not just the computer-read text. But hey, that's just me.
Newspapers need to waive their copyright restrictions for this particular project. They have a right to control their content, but copyright should not be an impediment to archiving this information. Maybe there's a way to apply the copyright to the end user (i.e. whoever is viewing this content online) without completely excluding the stuff from being indexed. An 80-year blind spot practically ensures irrelevance.
perl -e 'foreach(values %SIG){$_="IGNORE";}while(){}'
...I'm not so sure about the significance of the content, what did they write/read in 19th Century?...
What they named news at their time is what we call history right now.
Your head a splode
and copyright restrictions are in force on papers published after 1923
in case anyone was still left who thought copyright laws were reasonable....
... and for once, it's interesting.
To most Americans, the period from 1790 to 1915 is kind of a mystery except for Gettysburg and the Ford Theater.
There was tremendous growth in the number of newspapers during that period, starting at a handful in 1790 to thousands in the 1920's. They fell on hard times with the advent of radio.
During that time, everyone with a spare nickel and a desire to publish something put out their own rag. They would trade stories, publish letters to each other, have flame wars, etc. I think it must have looked a lot like the blogosphere, with a bit more latency.
The more things change, the more they stay the same. Sometimes, we need to see the old news to recall that.
sigs, as if you care.
And if not, couldn't they still post a picturized version? Even if it's essentially digitized microfilm, there's still a lot more you can do with a digital copy than with a microfilm (such as save where you left off, bookmark, backup in case of fire etc.)
I don't understand why the text HAS to be selectable... That's cooler, but it shouldn't need to be a requirement.
Too bad they aren't scanning newspapers from, say, the Revolutionary War period. I think it would have been really interesting to read about the war and the general thoughts about it at the time.
I'm sure OCR technology will advance quickly enough to allow the scanning of these newspapers.
With all the OCR problems, I'm sure the folks down at Project Gutenberg wouldn't mind taking this on.
I'm amazed they still have these archives. One of my favourite people, Nicholson Baker, has made a personal crusade, written books on the subject, and put enormous amounts of his own cash into preserving newspapers that government archives are hellbent on destroying. In particular he attacks two fallacies of document archiving:
Paper does not self-destruct in a short space of time, which was among the flawed rationales for the misguided conversion to microfiche.
Microfiche is actually far more vulnerable to destruction than the originals. Decades of archives have been lost because they were microfiched and the originals pulped.
I fully expect digital archives to be even more fragile (as various /. articles over the years, not to mention much research into digital curatorship, attest)
you had me at #!
My library had the NY Times on microfilm, so I decided it would be interesting to look up famous dates. I checked Dec. 7, 1941, but there was no article on Pearl Harbor. Figuring that with the time difference and printing times it didn't make it, I checked Dec. 8th. Still nothing. Gradually over the next few days the story began to trickle out that "yes, something happened", "a few ships were damaged", "quite a few ships were damaged". It was a week later before the story was consistent with what we now believe happened. Very different from the "Live from the field" news of today. I have been present at two events that made it into the newspaper. In neither case could I even recognize the article as describing the same event.
Actually, (having done a little historical research myself) those kinds of things are relatively easy to find. (church and public records)
In general, the most interesting stuff is often the stuff which was the least interesting when the newspaper was published, such as advertisements, expressions and figures of speech in the articles, opinion pieces, the style of reporting, the biases.
All these little things that generally convey the atmosphere and mindset of an age. It's easy to find out facts, like the construction date of a factory. It's more difficult to find out what people were thinking about the new factory.
It'll be a big help to me personally!
I work as a research assistant, which involves a great deal of time going through libraries and copying old journal articles (and I get paid, too, can you believe that?)
Eight or nine months ago I was looking stuff up for my professor's book on the history of the death penalty in the United States, and she had me track down an article from the Hattiesburg (Miss.) American on an outlaw named John Long, who was hanged in Mississippi in 1870. No library in New England archives the Hattiesburg American--not even Harvard or the Athenaeum--so in the end I had to call the Hattiesburg Public Library and ask the librarian to make me a photocopy of that article.
(We had a hard time understanding each other--I had to spell out the name "John Long" because my Boston accent confused her. I had the same problem in South Carolina when I asked the gas station attendant what town I was in. It was Summerton, which she pronounced something like "Suhhhn't'n"--eventually she had to point to it on a map.)
Believe me, this project could save me a lot of backache and eyestrain. Looking through six months of the New York Times from 1899 on microfilm because some footnoter wasn't more specific than "late 1899" is no joke.
This has already been done for journals by the Making of America Project, so wouldn't the process be similar for newspapers? Then again, newspapers are printed on lower quality paper, possibly with lower quality printing technology.
Making of America (MOA)
http://cdl.library.cornell.edu/moa/ (Cornell U)
http://www.hti.umich.edu/m/moagrp/ (U Michigan)
Presumably papers after 1923 will be added one year at a time as the copyright expires?
The Mickey Mouse Protection Act (aka the Sonny Bono Copyright Term Extension Act) tacked on an immediate and retroactive 20 years to copyright length. So, don't look for anything to be entering the public domain until 1/1/2019. And that's not even considering the likelihood of Congress extending the length of copyrights again.
--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
Ads today are complete rubbish. Even looking back at ads from the '80s in PC Magazine, they were a lot better then. Back then they would tell you the actual benefits and features of a product. Now you get a picture of the sky, with a window and a question, "Where do you want to go today?". I want to know what I'm buying, and I don't think it's an artist's rendition of utopia; it's a computer program.
Mickey Mouse is keeping us from reading newspapers from the great depression? How powerful should one rat be?
is competition good, or is duplication of effort bad?
In 7 years we'll be able to read about black Monday.
Not if someone patents the act of reading historical articles about black Monday!
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
WWI!
Isn't it amazing that reporting on WWII is still under copyright?
One man's -1 Flamebait is another man's +5 Funny.
In 7 years we'll be able to read about black Monday.
Nope; everything from 1909 to 1922 is only in the public domain because it was grandfathered in by the Sonny Bono Copyright Term Extension Act. Newspapers that were published in 1929 will stay under copyright for 1929+95 years, that is, through the end of 2024. So in 2025 you'll be able to read about Black Monday.
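For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a flat 95-year term counted from the publication year, with protection lasting through December 31 of the final year; the names `TERM_YEARS` and `public_domain_year` are made up for illustration, and real copyright status depends on details (notice, renewal) that this ignores.

```python
# Assumption: a flat 95-year term counted from the publication year, with
# protection running through December 31 of the final year. Real-world
# status also depends on notice and renewal, which this sketch ignores.
TERM_YEARS = 95


def public_domain_year(publication_year: int, term_years: int = TERM_YEARS) -> int:
    """Return the first calendar year in which the work is free to use."""
    last_protected_year = publication_year + term_years
    return last_protected_year + 1


for year in (1923, 1929):
    print(year, "->", public_domain_year(year))
```

Under that assumption, papers from 1923 come free on 1/1/2019 and papers from 1929 on 1/1/2025, which matches the dates quoted in this thread.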
All digitally enhanced and edited to give you a better happier feeling of your government
The LoC would have their reputation destroyed among the librarian and researcher communities if they were caught doing that; and they would be, because hard-core researchers would notice any significant changes in the text and go back to the microfilm and original text copies.
Librarians tend to be among the strongest anti-censorship groups in America. There's never been any insinuation that the Library of Congress was having its strings pulled by the forces in power. I trust the Library of Congress to be a neutral provider of information much more than, say, the Washington Post or the Encyclopedia Britannica.
I can see a lot of places (libraries primary example) that will no longer carry or supply this type of information, because the government will supply it to us.
Most libraries are part of government. Why should you trust your home-town library more than the Library of Congress?
they put effort into changing the format from paper to Jpeg or whatever.
Feist says that just effort doesn't a copyright make; it requires creative input.
Project Gutenberg has their small print because of editing
Reread the small print. It's not a copyright license, it's a trademark license. If you remove the Project Gutenberg trademark from the etext, you can do whatever you want with it. (Assuming it's not one of the rare ones that's still under copyright, where the author gave the right to post it.)
Each state has an archives and history department (or something similar to archive all state history). You can go to your state's archives and history department and pull just about any state newspaper from any time period that you want. We go from the present (well, a couple of weeks before the present; it takes us a few days to convert the newspaper to microfilm). Our oldest newspaper on microfilm is from 1736.
Yes, it's not online. We don't have the staff or money to put it online at present, but we are trying to put as much of our records online as we can right now.
Anyway, you can check out the one I work for, and if you live in Mississippi, please come by and check us out. We are open 6 days a week and are totally free.
http://www.mdah.state.ms.us/
Does the name Pavlov ring a bell?
>>"...I'm not so sure about the significance of the content, what did they write/read in 19th Century?"
Presumably, everything you missed by not taking history.
In that timespan, the U.S. expanded to the Pacific; fought wars with Mexico and Spain; participated in World War One; prompted the formation of the League of Nations; built the world's largest railway network; invented the telegraph, telephone, electric light, and the airplane; developed mass production and the auto industry; produced innumerable works of literature (start with Sam Clemens); fought the Civil War and abolished slavery; spawned the movie, recording, radio and popular music industries.
For a start.
-- Slashdot: When Public Access TV Says "No"
A complete resource for all those Call of Cthulhu campaigns.
If we don't make light of everything, we are just stumbling in the dark - Blank
I believe a lot of old films have already been lost, because tracking the current copyright holder is too expensive or simply cannot be done, but without their permission it is illegal to copy the old & decaying prints onto new media.