Bringing the Library of Congress Newspapers Online
smooth wombat writes "If you want to read a newspaper article from sometime in the past (say 1920 for example) your only options right now are to go to your local library and hope they have a microfiche file of that paper or take a visit to Washington, DC and the Library of Congress. That may soon change. CNN is reporting that by 2006 the government will have the first of 30 million digitized pages from papers published from 1836 through 1922 which will be available to anyone who has a connection to the net. The project is a joint cooperation between the National Endowment for the Humanities and the Library of Congress.
The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read, and copyright restrictions are in force on papers published after 1923."
Seeing that Google has been searching on 8 billion pages, these 30 million seem pretty insignificant in terms of volume, but I'm not so sure about the significance of the content. What did they write and read in the 19th century?
I wonder how many people will actually wait for Google/MSN to index them and search from there.
Rock that crushes, Paper & Scissors that don't matter.
Yet another good reason for copyrights to expire after a reasonable number of years.
I'm surprised they haven't done this sooner... But supposedly, MIT is working on a thing to scan in every document ever in the LOC, for internet access. A monumental task.
If the Library of Congress is entirely digitized, that's going to totally screw up the "burning Libraries of Congress" measurement of energy output.
In 7 years we'll be able to read about black Monday.
What is the law regarding an online library? I guess not even the government can do it.
The local library has every edition of the local papers on microfilm, and I suppose they could put it all on DVD too.. When does it become a copyright issue?
I don't need no instructions to know how to rock!!!!
Many newspapers charge obscene fees to access articles more than a week old, yet provide free of charge to library patrons access to their entire archive electronically.
The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read
Surely the OCR process could be recalibrated to identify a different typeface?
The first 30 million is free. After that they get you hooked...
Now, where is the open source OCR software that they can use to read the old wonky typefaces?
DAMN YOU OCTODOG! DAMN YOU TO HELL!
So, how many Libraries of Congress is tha.... Oh.
Sometimes seventeen/Syllables aren't enough to/Express a complete
We already have 20 million.
(From the digitized 1844 paper...)
Howdy, pardner! To read about that scalliwag Black Bart's shootout with Arizona Jack last week, you'll need to pay two bits per article or buy a subscription for a gold dollar or its equivalent in salt pork or live chickens.
I watched C-beams glitter in the dark near the Tannhauser gate.
This is not entirely accurate. The Washington Post's archive is available from 1877 to the present day if you're willing to pay.
From 1877-1986, the Post offers the full page scans of the articles as they appeared in the newspaper. Beginning in 1987, the full text versions of articles (without photos) are available.
So close and yet so far from the world's perfect ID number
Oxford University did a trial project to see how difficult it would be to place some 18th and 19th Century journals online. Here is the final report giving some of the difficulties they had. The journals are available here and make for some very interesting browsing.
These, to me, were always half the fun whenever I perused old microfiche in the library.
There is a bar in NYC called McSorley's, which has been in continuous existence since 1846 or so. They have framed newspaper articles on the wall from over a hundred years ago, 130-year-old pictures, and political campaign buttons from McKinley's run. Talk about a neat experience.
Actually seeing the old print would mean more to me. I rather hope that they serve images of the old papers, not just the computer-read text. But hey, that's just me.
Newspapers need to waive their copyright restrictions for this particular project. They have a right to control their content, but copyright should not be an impediment to archiving this information. Maybe there's a way to apply the copyright to the end user (i.e. whoever is viewing this content online) without completely excluding the stuff from being indexed. An 80-year blind spot practically ensures irrelevance.
perl -e 'foreach(values %SIG){$_="IGNORE";}while(){}'
How useful will this be? If you do not know the date and the page of the article, you will not be able to find it easily.
Yeah, you can page through the archives, but I am sure that these are not small images to look at, and bandwidth is valuable.
And once they scan it into digital format, THEY will have copyright over the scanned image. What kind of restrictions will they place on their new property?
I wouldn't doubt that there are strings attached.
and copyright restrictions are in force on papers published after 1923
in case anyone was still left who thought copyright laws were reasonable....
... and for once, it's interesting.
To most Americans, the period from 1790 to 1915 is kind of a mystery except for Gettysburg and Ford's Theatre.
There was tremendous growth in the number of newspapers during that period, from a handful in 1790 to thousands in the 1920s. They fell on hard times with the advent of radio.
During that time, everyone with a spare nickel and a desire to publish something put out their own rag. They would trade stories, publish letters to each other, have flame wars, etc. I think it must have looked a lot like the blogosphere, with a bit more latency.
The more things change, the more they stay the same. Sometimes, we need to see the old news to recall that.
sigs, as if you care.
And if not, couldn't they still post a picturized version? Even if it's essentially digitized microfilm, there's still a lot more you can do with a digital copy than with a microfilm (such as save where you left off, bookmark, backup in case of fire etc.)
I don't understand why the text HAS to be selectable... That's cooler, but it shouldn't need to be a requirement.
Pay college students $0.03 an hour to type them in. Monkey see monkey do monkey buy coffee with proceeds.
Presumably papers after 1923 will be added one year at a time as the copyright expires? Or will the mouse protection league keep them locked away for ever?*
*On a related note, a BBC radio broadcast about a hitchhiking trip had a comment from a Fat Woman in her slightly deranged middle age who was on her way to Disney World in Florida. She said that America would be a much better place if Disney ran it; just look at how nice and clean and safe Disney World is. Perhaps she should take a closer look at opensecrets.org and where the money tree grows.
Beep beep.
Too bad they aren't scanning newspapers from, say, the Revolutionary War period. I think it would have been really interesting to read about the war and the general thoughts about it at the time.
I'm sure OCR technology will advance quickly enough to allow the scanning of these newspapers.
This is really cool. IMHO this was a major medium that was lacking a web interface. There were a lot of times when I would search for a piece of information that I knew was in a paper, but wasn't archived on the net anywhere.
My first time in the paper: Front page of Times Union on February 19th/20th, 1989.
How many Volkswagen Beetles would be needed to contain this?
With all the OCR problems, I'm sure the folks down at Project Gutenberg wouldn't mind taking this on.
You would think Congress would be able to give their own library an exception to the rule. Just a thought.
W00T! Time travel in my own lifetime! I think it's going to be a lot of fun reading those old papers and getting a flavor for life during those times. This is also going to be a great resource for historians and genealogists.
To the making of books there is no end, so let's get started
I'm amazed they still have these archives. One of my favourite people, Nicholson Baker, has made a personal crusade, written books on the subject, and put enormous amounts of his own cash into preserving newspapers that government archives are hellbent on destroying. In particular he attacks two fallacies of document archiving:
Paper does not self-destruct in a short space of time; the claim that it does was among the flawed rationales for the misguided conversion to microfiche.
Microfiche is actually far more vulnerable to destruction than the originals. Decades of archives have been lost because they were microfiched and the originals pulped.
I fully expect digital archives to be even more fragile (as various /. articles over the years, not to mention much research into digital curatorship, attest)
you had me at #!
My library had the NY Times on microfilm, so I decided it would be interesting to look up famous dates. I checked Dec 7 1941 but there was no article on Pearl Harbor. Figuring that with the time difference and printing times it didn't make it, I checked Dec. 8th. Still nothing. Gradually over the next few days the story began to trickle out that "yes, something happened", "a few ships were damaged", "quite a few ships were damaged". It was a week later before the story was consistent with what we now believe happened. Very different from the "Live from the field" news of today. I have been present at two events that made it into the newspaper. In neither case could I even recognize the article as describing the same event.
Odd, I only have one. Is this normal? Should I be worried?
find / -name "*.sig" | xargs rm
Government owned, not copyright.
nice troll, dude.
This has already been done for journals by the Making of America Project. So wouldn't the process be similar for newspapers? But newspapers are printed on lower quality paper, and possibly with lower quality printing technology.
Making of America (MOA)
http://cdl.library.cornell.edu/moa/ (Cornell U)
http://www.hti.umich.edu/m/moagrp/ (U Michigan)
I was thinking about digging up old National Geographics, scanning the text and photos, and posting that online. It would make for a great distributed project. However, Nats from before 1923 are rare and expensive. I wonder if I can find them at libraries...
Computers are useless. They can only give you answers.
-- Pablo Picasso
"CNN is reporting that by 2006 the government will have the first of 30 million digitized pages from papers published from 1836 through 1922 which will be available to anyone who has a connection to the net." Newspapers from 1923 onwards will be available after "rectification."
I was so ticked about the extension of copyrights that I went as Sonny Bono this year for Halloween. I dripped fake blood (candy apple sauce) across my face, and taped branches to my ski jacket. I went around saying "I got you babe". Check my blog post about it: http://yorkpaddy.blogspot.com/2004/11/halloween-04.html
All digitally enhanced and edited to give you a better, happier feeling about your government, and America as seen through the eyes of censorship. After what has happened over the last few years, the last thing I want to depend on is the government telling me what has happened in history, or telling me WHAT parts of American history (aka news) I can have access to. Yes, this is something they are going to be offering, and there will be other places where you could get this information, but I can see a lot of places (libraries being the primary example) that will no longer carry or supply this type of information, because the government will supply it to us.
TruePunk | Games
Ads today are complete rubbish. Even looking back at ads from the '80s in PC Magazine, they were a lot better then. Back then they would tell you the actual benefits and features of a product. Now you get a picture of the sky, with a window and a question, "where do you want to go today?". I want to know what I'm buying, and I don't think it's an artist's rendition of utopia; it's a computer program.
Mickey Mouse is keeping us from reading newspapers from the great depression? How powerful should one rat be?
is competition good, or is duplication of effort bad?
There's an old saying that goes:
"If you don't have anything constructive or nice to say, then don't say anything."
If anyone ever wonders why Liberalism is so hated and considered by many to be a mental disorder, the parent above has provided an excellent example. Thanks for that...
It'll be "Burning Library of Congress servers" now?
There's an old saying that goes:
"If you don't have anything constructive or nice to say, then don't say anything."
If anyone ever wonders why Conservatism is so hated and considered by many to be a mental disorder, the parent above has provided an excellent example. Thanks for that...
will make hay with this archive.
What were the jury members saying after the trial? Who were the witnesses and what was their standing in the community? How did the decedent's estate fare where the bastards claimed that they were not bastards?
Aside from the births and deaths, the property records will be very valuable.
Many of these documents are available in microform, but the actual value of the documents will be increased exponentially where the full text is searchable. At present the vast majority are available as images.
ProQuest (a database vendor) has something called Historical New York Times, Washington Post, and a few other historical newspapers. Some other vendors have similar databases. So if you want to do this now, go to your local library (or not... some states let you access databases from home) and use one of those databases.
Libraries of Congress is that, you ask? D.
They have archives for the NY Times, Hartford Courant, LA Times, Wall Street Journal, Washington Post, etc. While the archives don't go back too far (twenty years for some papers, six for the NY Times), it is nice to see governments offering citizens access to this information free of charge. I use it quite frequently, and I hope they can get funding for the historical New York Times service (which is absolutely incredible).
www.lonseidman.com
Each state has an archives and history department (or something similar to archive all state history). You can go to your state's archives and history department and pull just about any state newspaper from any time period that you want. We go from the present (well, a couple of weeks before the present; it takes us a few days to convert the newspaper to microfilm). Our oldest newspaper on microfilm is from 1736.
Yes, it's not online. We don't have the staff or money to put it online at present, but we are trying to put as many of our records online as we can right now.
Anyway, you can check out the one I work for, and if you live in Mississippi, please come by and check us out. We are open 6 days a week and are totally free.
http://www.mdah.state.ms.us/
Does the name Pavlov ring a bell?
I work for MDAH. One of the things we do is archive all State of Mississippi newspapers.
http://www.mdah.state.ms.us/
Does the name Pavlov ring a bell?
How much would it cost to start up a small newspaper with a real press and newspaper paper? Like if I wanted to make 1000 copies a week or day?
It seems it should be pretty cheap, like cheaper than a computer and a laser printer.
Who makes real printing presses any more?
obviously no deficiencies vs. no obvious deficiencies
>>"...I'm not so sure about the significance of the content, what did they write/read in 19th Century?"
Presumably, everything you missed by not taking history.
In that timespan, the U.S. expanded to the Pacific; fought wars with Mexico and Spain; participated in World War One; prompted the formation of the League of Nations; built the world's largest railway network; invented the telegraph, telephone, electric light, and the airplane; developed mass production and the auto industry; produced innumerable works of literature (start with Sam Clemens); fought the Civil War and abolished slavery; spawned the movie, recording, radio and popular music industries.
For a start.
-- Slashdot: When Public Access TV Says "No"
A complete resource for all those Call of Cthulhu campaigns.
If we don't make light of everything, we are just stumbling in the dark - Blank
I wonder if they'll join the crowd and sue to protect their failing business model?
Buy Steampunk Clothing Online!
However, it looks like it might (somehow) eventually be limited to education users only: http://www.bl.uk/collections/britishnewspapers1800to1900.html/
which would be crazy.
http://www.libs.uga.edu/hargrett/rarebooksonline.html
Many of these books have long-s. In fact this series of Botany Magazines is a beautiful representation of a lost book format.
Curtis's Botanical Magazine
http://fax.libs.uga.edu/QK1xC981/cbmmenu.html
I guess they aren't interested in working with LizardTech and DjvuLibre.
This project is EXACTLY the kind of thing these software technologies were developed to preserve.
http://www.djvuzone.org/links/ The site is littered with information, and thanks to AT&T Bell Labs we have a wonderful product freely available.
It's a matter of desire to make the OCR capable of encoding the world's typefaces. If they can scan Sanskrit, Hebrew and Egyptian Hieroglyphics they sure as hell can scan Beowulf in its original print form.
On a bright sunny day in June of 1991, I spent the entire day in the reading room of the LOC. I picked my grandmother's birthday in 1914 and one by one went through as many newspapers as I could. The attendant would come by and collect paper request forms, then after a scant 20 minutes she would make another round and deliver a neat little roll of microfiche. Minus breaks to relieve myself, I spent the entire day, from the minute the place opened until they kicked me out, staring at reel after reel of the newspapers. I remember that at the time I couldn't get enough. I did the same thing with my mother's birthday in 1935. I hope they can come to terms with the insane copyright laws so that someday I can do this from the comfort of my damp basement.......
The sensationalism. The trivial stories dominating the front pages. Suspense and drama instead of fact-based reporting. The Ricki Lake of news, if you will. Wait... we're still doing that? Oh... well uh, never mind. ;-)
With no "soul-sucking" registration required?
Wrong. That section of the Constitution does not require copyright laws to be enacted. It only gives Congress the power to do so if and when it decides to.
The aristocratic party in the Roman state maintained power by laying claim to the control of religion, and by keeping secret the essential data of the State - for instance, only the aristocracy had access to which days were, and were not, legal for public business. Plebeians could not easily pursue litigation because they were denied access to necessary information. Copyrights, patents and secrecy are being used by the new aristocracy - corporates - to prevent the rest of us from mounting any kind of challenge to their monopolies.
The aristocrats also controlled access to knowledge of world events, and the army. As a result, the average Roman citizen could be kept ignorant of any events that the patricians did not want them to find out about.
Substitute State and Party for Patricians and you have a major feature of the Soviet Empire. Add restrictions in travel, especially the prevention of movement and free speech for suspected dissidents - look at the no-fly list - and the current US administration is borrowing from the Soviet Empire too. What with the claim to be the custodians of morality (God is angry with you if you have the wrong kind of consensual sex but supports you if you screw people over in the interests of power or profit) it's a dirty picture.
Against this is the "old" America - the ideas that built the place in the first place. Copyrights? Ignore everybody else's until well into the 20th Century. Freedom of thought. Separation of Church and State. Unwillingness to get involved in the politics of foreigners. Self-sufficiency.
It's good that the Library of Congress is still part of Old America, and it's good that the Internet still provides an equivalent of the Soviet samizdat - but over the next few years these things will need defending as never before.
And history provides a precedent. During WW1 the British government behaved like the US government to-day, creating draconian secrecy laws in the face of a supposed threat from German spies. Even to-day, Civil Servants are still trying to preserve that secrecy, the result of a wartime overreaction. That's nearly 90 years of weakened democracy, resulting in a country where, for instance, government IT foulups are never investigated publicly because it might embarrass the Civil Servants responsible, and as a result billions are still wasted in systems that don't work. I could go on, but you get my drift. We should support these projects for access to knowledge not just because they are interesting, but because they are part of a view of society that created the modern world, not a view that is trying to drag us back to the past (and the past was pretty horrible, if you had to live there.)
Panurge has posted for the last time. Thanks for the positive moderations.
No, in 2025 copyrights will have been re-extended to 150+ years. Thanks to Sonny Bono and friends, the public domain stopped in 1922.