Library of Congress Offers Update On Huge Twitter Archive Project

Posted by samzenpus on Monday January 7, 2013 @10:38AM from the 140-little-problems dept.

Nerval's Lobster writes "Back in April 2010, the Library of Congress agreed to archive four years' worth of public Tweets. Even by the standards of the nation's most famous research library, the goal was an ambitious one. The librarians needed to build a sustainable system for receiving and preserving an enormous number of Tweets, then organize that dataset by date. At the time, Twitter also agreed to provide future public Tweets to the Library under the same terms, meaning any system would need the ability to scale up to epic size. The resulting archive is around 300 TB in size. But there's still a huge challenge: the Library needs to make that huge dataset accessible to researchers in a way they can actually use. Right now, even a single query of the 2006-2010 archive takes as many as 24 hours to execute, which limits researchers' ability to do work in a timely way."

Why? by Anonymous Coward · 2013-01-07 10:40 · Score: 5, Insightful

Why does the federal government need to archive the useless information Twitter calls tweets? Yet another huge waste of my money (being a taxpayer and all).
1. Re:Why? by griffjon · 2013-01-07 10:54 · Score: 4, Insightful
  
  To paraphrase a quote by the Internet Archive chairman from some years back, "The average lifespan of a Web page today is 100 days. This is no way to run a culture."
  
  --
  Returned Peace Corps IT Volunteer
2. Re:Why? by Anonymous Coward · 2013-01-07 11:10 · Score: 5, Insightful
  
  To paraphrase a quote by the Internet Archive chairman from some years back, "The average lifespan of a Web page today is 100 days. This is no way to run a culture."
  The average life of an inane conversation used to be maybe 15 minutes. I'm not sure the world is a better place for having extended that.
3. Re:Why? by fsterman · 2013-01-07 11:28 · Score: 4, Interesting
  
  Because academia is starved for data. Companies hoarding information limits what we can do with it. The Library of Congress is acting as an aggregate buyer for thousands of individual researchers, which is a huge cost savings.
  
  --
  Is there anything better than clicking through Microsoft ads on Slashdot?
4. Re:Why? by Hatta · 2013-01-07 11:49 · Score: 4, Insightful
  
  Because Twitter is a great model for the spread of ideas. If you study the spread of ideas, you can begin to understand it and use that understanding to affect it. That has enormous value.
  
  --
  Give me Classic Slashdot or give me death!
Re:He who archives my tweets by Reilaos · 2013-01-07 11:00 · Score: 4, Interesting

Some of the most important historical knowledge comes from things that people at the time wouldn't consider important. Things like grocery lists can help determine the diets and agricultural abilities of a culture at the time.
For an example I just made up: In the future, the presence or lack of traffic reports could, alongside legal/budget records, help a historian verify the spread/development of roadways.
Twitter could be a huge source of topics and a wealth of information for historians in the future.
They may conclude that we were all idiots. This, too, counts as useful information.