Bringing the Library of Congress Newspapers Online

Posted by CmdrTaco on Thursday November 18, 2004 @08:30AM from the thats-a-whole-lotta-bits-and-bites dept.

smooth wombat writes "If you want to read a newspaper article from sometime in the past (say 1920 for example) your only options right now are to go to your local library and hope they have a microfiche file of that paper or take a visit to Washington, DC and the Library of Congress. That may soon change. CNN is reporting that by 2006 the government will have the first of 30 million digitized pages from papers published from 1836 through 1922 which will be available to anyone who has a connection to the net. The project is a joint cooperation between the National Endowment for the Humanities and the Library of Congress. The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read, and copyright restrictions are in force on papers published after 1923."

Copyright limits by hey · 2004-11-18 08:31 · Score: 4, Insightful

Yet another good reason for copyrights to expire after a reasonable number of years.
1. Re:Copyright limits by jc42 · 2004-11-18 11:11 · Score: 2, Insightful
  
  This might be a good place to bring up the old suggestion that anything out of print for a year become public domain. A newspaper publisher could then maintain their copyright by setting up a method of reprinting old issues. But most of them wouldn't find this lucrative enough, and would just let the copyright expire. Then the LoC could include most newspapers after a year.
  
  One of the very real problems with copyright law is that it allows publishers to "capture" our history and prevent access to some of the most important primary documents. This really should be fixed, if you think that there's anything to learn from history.
  
  Of course, one of the things that history shows is that we rarely learn anything from history.
  
  --
  Those who do study history are doomed to stand helplessly by while everyone else repeats it.
2. Re:Copyright limits by Selanit · 2004-11-18 11:54 · Score: 2, Insightful
  
  This might be a good place to bring up the old suggestion that anything out of print for a year become public domain. A newspaper publisher could then maintain their copyright by setting up a method of reprinting old issues. But most of them wouldn't find this lucrative enough, and would just let the copyright expire. Then the LoC could include most newspapers after a year.
  
  While I approve the impulse, I think this would be a nightmare to maintain, particularly if the "expire after a year" idea was applied to all copyrighted material. If applied to books, that means that anything that is not successful enough to need reprinting every year would soon go out of copyright, thereby making it much more difficult for anyone to even make a pretense at supporting themselves through writing. Our very first copyright legislation, the Copyright Act of 1790, provided a term of 14 years, extensible to 28 on application; that's how long the founding fathers (many of whom were in Congress at that point) felt was necessary to adequately fulfill the Constitution's requirement to "promote the Progress of Science and useful Arts" in Article I, Section 8, Clause 8. You could argue that they were just copying the Statute of Anne (1710), which set forth the same period of protection for copyrighted works; but I would argue that if they had felt that a longer or shorter period of coverage was required, they'd have changed it. Anyway, having the copyright expire after one year out of print would drastically reduce the coverage period of any work that failed to stimulate instant and ongoing demand.
  
  Furthermore, how would we apply "out of print" to works that are copyrighted but never printed? Take software. You can download programs that are years old -- shareware, old open source stuff, etc -- long after the original copyright holder has lost all interest in the program. Is that "in print" or not?
  
  All that said, I can see your reason for wanting to apply such a rule specifically to newspapers, and perhaps to other current-events publications whose value declines rapidly with age (news magazines, etc). If the rule was limited to those sorts of publications, I guess I'd support it. Though I should point out that leaving those sorts of things under copyright for a somewhat longer period of time has two benefits. First, recycling material from an old article that is copyrighted is plagiarism, but doing so from a public domain article is perfectly kosher. Letting the news into the wild too soon might serve to decrease the originality of the news, particularly in opinion pieces. Second, some article writers sell their articles to multiple publications over time. It is sometimes easier to re-sell an article once it's been out of the public view for a while. Therefore, in order to protect the livelihoods of struggling writers, it would be better to give them a longer grace period before the work goes public domain.
  
  One of the very real problems with copyright law is that it allows publishers to "capture" our history and prevent access to some of the most important primary documents.
  
  I agree. I just think that your solution has a lot of side-effects that would need to be carefully weighed and balanced before it was put into effect.
Re:Google by bsartist · 2004-11-18 08:38 · Score: 5, Insightful

I'm not so sure about the significance of the content, what did they write/read in 19th Century?

Obituaries and marriage announcements, for one thing. This stuff will be a gold mine for genealogists.

--
Lost: Sig, white with black letters. No collar. Reward if found!
Re:Google by Scoria · 2004-11-18 08:39 · Score: 4, Insightful

The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read

That excerpt strongly implies the use of OCR, in which case the search engines probably won't require a substantial amount of time to index the archive.

On a related note, many historically memorable events occurred during the timeframe mentioned. These include the American Civil War, the Titanic disaster, and many others.

--
Do you like German cars?
Re:Google by 44BSD · 2004-11-18 08:40 · Score: 5, Insightful

The fact that you have no idea what people wrote or read about shows the importance of making the materials more accessible.
Half the fun of old papers is... by MarkEst1973 · 2004-11-18 08:41 · Score: 5, Insightful

seeing the old typesets, how they laid the papers out, the ancient advertisements.
These, to me, were always half the fun whenever I perused old microfiche in the library.
There is a bar in NYC called McSorley's, which has been in continuous existence since 1846 or so. They have framed newspaper articles on the wall from over a hundred years ago, 130-year-old pictures, political campaign buttons from McKinley's run. Talk about a neat experience.
Actually seeing the old print would mean more to me. I rather hope that they serve images of the old papers, not just the computer-read text. But hey, that's just me.
The newspapers need to step up by mogrify · 2004-11-18 08:42 · Score: 3, Insightful

Newspapers need to waive their copyright restrictions for this particular project. They have a right to control their content, but copyright should not be an impediment to archiving this information. Maybe there's a way to apply the copyright to the end user (i.e. whoever is viewing this content online) without completely excluding the stuff from being indexed. An 80-year blind spot practically ensures irrelevance.

--
perl -e 'foreach(values %SIG){$_="IGNORE";}while(){}'
Re:Google by c0p0n · 2004-11-18 08:43 · Score: 4, Insightful

...I'm not so sure about the significance of the content, what did they write/read in 19th Century?...

What they named news at their time is what we call history right now.

--

Your head a splode
copyright insanity by drDugan · 2004-11-18 08:45 · Score: 4, Insightful

and copyright restrictions are in force on papers published after 1923

in case anyone was still left who thought copyright laws were reasonable....
1. Re:copyright insanity by rewt66 · 2004-11-18 09:12 · Score: 2, Insightful
  
  Um, why exactly is this moderated "troll"? I'd call it "insightful", myself, but I've burned all my moderation points for the day...
  
  The point is that this is a perfect illustration of why the current copyright length is insane. It's something you can use to explain it to your neighbors, and they might get it. It's even something you might be able to use to explain to your legislator in terms they can understand ("hey, look, long copyrights even get in the way of this perfectly reasonable government project!")
And even so... by BobPaul · 2004-11-18 08:45 · Score: 2, Insightful

And if not, couldn't they still post a picturized version? Even if it's essentially digitized microfilm, there's still a lot more you can do with a digital copy than with a microfilm (such as save where you left off, bookmark, backup in case of fire etc.)

I don't understand why the text HAS to be selectable... That's cooler, but it shouldn't need to be a requirement.
1. Re:And even so... by UWC · 2004-11-18 08:47 · Score: 2, Insightful
  
  I'd imagine the text of a newspaper would take up much less storage space and bandwidth than would a picture of the newspaper. Plus the ability to be searched.
  
  --
  Honor Among Slackers. A veri
Why not pass it through Project Gutenberg? by t0qer · 2004-11-18 08:49 · Score: 2, Insightful

With all the OCR problems, I'm sure the folks down at Project Gutenberg wouldn't mind taking this on.
can't scan? by toby · 2004-11-18 08:53 · Score: 2, Insightful

type faces of printers used before 1836 are too difficult for optical scanners to read
Bollocks. Even if they are trying to OCR this stuff, it's critical that the original page bitmaps remain available, anyway.
I'm amazed they still have these archives. One of my favourite people, Nicholson Baker, has made a personal crusade, written books on the subject, and put enormous amounts of his own cash into preserving newspapers that government archives are hellbent on destroying. In particular he attacks two fallacies of document archiving:
Paper does not self-destruct in a short space of time, which was among the flawed rationales for misguided conversion to microfiche.
Microfiche is actually far more vulnerable to destruction than the originals. Decades of archives have been lost because they were microfiched and the originals pulped.
I fully expect digital archives to be even more fragile (as various /. articles over the years, not to mention much research into digital curatorship, attest).

--
you had me at #!
Re:Google by k98sven · 2004-11-18 08:57 · Score: 4, Insightful

Actually, (having done a little historical research myself) those kinds of things are relatively easy to find. (church and public records)

In general, the most interesting stuff is often the stuff which was the least interesting when the newspaper was published, such as advertisements, expressions and figures of speech in the articles, opinion pieces, the style of reporting, the biases.

All these little things that generally convey the atmosphere and mindset of an age. It's easy to find out facts, like the construction date of a factory. It's more difficult to find out what people were thinking about the new factory.
Re:Copyright restrictions by DunbarTheInept · 2004-11-18 09:06 · Score: 4, Insightful

Why would it be any different just because it's online?

In the online world, it is completely impossible to show somebody something without simultaneously giving them a copy of that same something. If the library shows you an HTML version of the copyrighted work, then it had to do so by sending you the contents of that work as a second digital copy, independent of the copy that's on their hard drive. If the library shows you a GIF image of the copyrighted work, then it had to do so by sending you the contents of that work. No matter what scheme is used, no matter what technique for encryption is used, the fact of the matter is that at some point, even if just temporarily, your computer has to have its own copy in one way or another.

On the other hand, if I show you a physical book, this doesn't cause two separate copies of the book to appear.

Unless the online library is willing to delete their copy (even from backups and from the hard drive) while you have your copy (and then trust you to send it back to them when you are done or pay them for it if you lose it), then there cannot be a working analogy between online and physical libraries as far as copyright law goes. Even someone not intending to make use of their copy is still technically breaking copyright law every time they look at a copyrighted work. Your browser's cache is filled with copyright violations if you've ever visited any website with any copyrighted content recently (which is most people who surf the web, probably).

The problem is that the original law was not written with this technology in mind, and the attempts to update it are written by people who just don't understand what they're doing, don't understand how the technology works, and aren't listening to those who do, and instead are listening to those with a vested interest in lying to them about the issue. Hence we get laws that if interpreted literally would outlaw the entire world wide web, but then get enforced selectively. (ALWAYS a bad situation to be in, where it is nearly impossible to avoid violating a law - then the law becomes a means to randomly smack-down on people for whatever you wish to discriminate against them for.)

--
Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.
Re:Google by OECD · 2004-11-18 09:34 · Score: 2, Insightful

WWI!
Isn't it amazing that reporting on WWII is still under copyright?

--
One man's -1 Flamebait is another man's +5 Funny.
Re:Typeface ? by dvdeug · 2004-11-18 09:43 · Score: 3, Insightful

Yef, we could recalibrate the OCR for the early fontf, but the text ftartf to look ftrange.

That's not hard. It would be easy to get the OCR to recognize the long-s (which does in fact look different from the f); even if you don't, post-processing (dictionary lookups to see if f or s is valid at a point) can clear up many cases, and for those it doesn't, well, you're going to have to check and fix the OCR anyway.

(This is not theory; Distributed Proofreaders (http://www.pgdp.net/) has and uses such a post-processor.)
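
The dictionary post-processing described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Distributed Proofreaders tool: the function name `fix_long_s` and the tiny `WORDS` list are hypothetical, and a real run would load a full dictionary. For each OCR token that is not already a known word, it tries every way of turning an "f" back into an "s" and accepts the result only when exactly one dictionary word comes out.

```python
from itertools import product

# Hypothetical stand-in word list; a real run would load a full dictionary.
WORDS = {"yes", "the", "fonts", "text", "starts", "strange", "for", "first", "of"}


def fix_long_s(token, words=WORDS):
    """Repair a token where OCR may have read a long-s as 'f'.

    Returns the token unchanged if it is already a known word, contains
    no 'f', or cannot be resolved to exactly one dictionary word.
    """
    lower = token.lower()
    if lower in words or "f" not in lower:
        return token

    # Positions where an 'f' might really be a long-s.
    positions = [i for i, ch in enumerate(lower) if ch == "f"]

    candidates = set()
    # Try every combination of keeping 'f' or substituting 's'.
    for choice in product("fs", repeat=len(positions)):
        chars = list(lower)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        word = "".join(chars)
        if word in words:
            candidates.add(word)

    if len(candidates) == 1:
        fixed = candidates.pop()
        # Preserve a leading capital from the original token.
        return fixed.capitalize() if token[0].isupper() else fixed
    return token  # ambiguous or unknown: leave for a human proofreader


if __name__ == "__main__":
    line = "Yef the text ftartf to look ftrange"
    print(" ".join(fix_long_s(t) for t in line.split()))
```

Note the limit the comment already points out: a token that is itself a valid word (say "fat" where the page read "sat") passes through untouched, so a human check is still needed.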
Re:Goverment and the american history. by dvdeug · 2004-11-18 10:08 · Score: 4, Insightful

All digitally enhanced and edited to give you a better happier feeling of your government

The LoC would have their reputation destroyed among the librarian and researcher communities if they were caught doing that; and they would be, because hard-core researchers would notice any significant changes in the text and go back to the microfilm and original text copies.

Librarians tend to be among the strongest anti-censorship groups in America. There's never been any insinuation that the Library of Congress was having its strings pulled by the forces in power. I trust the Library of Congress to be a neutral provider of information much more than, say, the Washington Post or the Encyclopedia Britannica.

I can see a lot of places (libraries primary example) that will no longer carry or supply this type of information, because the government will supply it to us.

Most libraries are part of government. Why should you trust your home-town library more than the Library of Congress?
Re:Copyright restrictions by odin53 · 2004-11-18 11:26 · Score: 2, Insightful

The problem is that the original law was not written with this technology in mind, and the attempts to update it are written by people who just don't understand what they're doing....

Just because you're not aware of the legal history of copyright law doesn't mean the issues you raise haven't been considered.

We can analogize, for example, to the issue you mention above with copyright law-making from almost 30 years ago. It's been long realized that using a computer program almost always requires making a copy of the program (or non-trivial parts of it) in RAM. That's simply just how computers work. Section 117 of the copyright act was amended in *1980* to make an exception for this kind of copying. And that was as a result of people considering this issue in the *1970s.* Your "insight" about the necessity of making local copies of online material is an obvious extension of this idea. It's not like lawmakers/judges just missed the boat on the analogy wrt works posted for public free consumption in the online world -- I think it's just that everyone assumes that making a copy of the work is necessary to using the internet and a license for that particular act is implied.

Hence we get laws that if interpreted literally would outlaw the entire world wide web, but then get enforced selectively.

People who say this usually are not aware of all of the applicable law and how it interacts with the facts. Though of course I'm not saying that this sort of thing doesn't happen sometimes...
Re:Typeface ? by otherniceman · 2004-11-19 00:08 · Score: 2, Insightful

You don't even need to recalibrate, just modify the search engine to use fuzzy techniques.

See www.historicaldirectories.org for an example.

Historical Directories is a digital library of local and trade directories for England and Wales, from 1750 to 1919. It contains high quality reproductions of comparatively rare books, essential tools for research into local and genealogical history.
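
As a rough illustration of the fuzzy-search idea in this comment, the sketch below matches a clean query against uncorrected OCR tokens using the similarity ratio from Python's standard `difflib` module. It is an assumption-laden example, not how historicaldirectories.org works: the function name `fuzzy_find`, the sample tokens, and the 0.75 cutoff are all hypothetical choices.

```python
import difflib


def fuzzy_find(query, ocr_tokens, cutoff=0.75):
    """Return OCR tokens that are close to the query despite recognition errors.

    cutoff is the minimum difflib similarity ratio (0 to 1) for a match;
    0.75 is an arbitrary starting point, not a tuned value.
    """
    lowered = [t.lower() for t in ocr_tokens]
    return difflib.get_close_matches(query.lower(), lowered, n=10, cutoff=cutoff)


if __name__ == "__main__":
    # Hypothetical OCR output with a long-s error and a split-letter error.
    tokens = ["blackfmith", "baker", "blaclcsmith", "grocer"]
    print(fuzzy_find("blacksmith", tokens))
```

The trade-off is that the OCR text stays wrong on disk; the search simply tolerates it, so a lower cutoff finds more damaged words at the cost of more false hits.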