The Library of Congress Will Stop Archiving Every Public Tweet On January 1st (gizmodo.com)

Posted by BeauHD on Tuesday December 26, 2017 @08:45AM from the end-of-an-era dept.

An anonymous reader quotes a report from Gizmodo: In 2010, the Library of Congress started archiving every single public tweet that was published on Twitter. It even retroactively acquired all tweets dating back to 2006. But the Library of Congress will stop archiving every tweet on December 31, 2017. The Library of Congress issued a white paper this month saying that it was proud of its comprehensive collection of tweets from the first 12 years of Twitter, but that it's completely unnecessary for it to continue. Instead, the organization will only collect tweets that it deems historically significant. For instance, President Trump's tweets are almost certainly still going to be saved for future generations. One reason that the Library is stopping the comprehensive archive? The social media company's controversial change to allow 280 character tweets. The Library's halt on collection of all tweets puts Twitter more in line with the way that other digital collections are archived, including websites. The Library of Congress only archives websites on a selective basis, unlike the nonprofit, non-governmental organization the Internet Archive, which has a much broader goal of archiving everything online with its Wayback Machine. The Library of Congress also noted that many tweets include photos and video and that it has only been collecting text, making some of its collection worthless.

12 of 79 comments

Re:How much data is that per year? by Rockoon · 2017-12-26 08:54 · Score: 4, Informative

Assuming they are only archiving text, I wonder how much storage that requires. Of course it would compress VERY well.
On a good day about 1 bit per character.

--
"His name was James Damore."
Who knew? by rmdingler · 2017-12-26 08:57 · Score: 3

I'm actually more surprised this data collection has gone on at the Library of Congress since 2010, than the news that it's ending.
Now if the story had started with "The NSA...", I would've been quite shocked at its termination.

--
Happiness in intelligent people is the rarest thing I know.
Ernest Hemingway
Posterity by Tablizer · 2017-12-26 08:58 · Score: 5, Funny

Instead, the organization will only collect tweets that it deems historically significant. For instance, President Trump's tweets are almost certainly still going to be saved for future generations.

Archaeologist 1: "Hey, I just discovered a message broadcast by the leader of the once great empire, United States!"
Archaeologist 2: "Marvelous! What's it say?"
Archaeologist 1: 'Let's see..."Rosie O. looks like a horse farted out a prune. Disgusting loser, so sad!"'
Archaeologist 2: "On second thought, let's pretend we never found it."

--
Table-ized A.I.
Re:Why did they do this to begin with? by AlanObject · 2017-12-26 08:59 · Score: 2

Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom.
Along the line of a-holes on Twitter..
Wasn't it established in a federal court that Trump's tweets amount to official statements and can be cited as effective policy statements? Further, I recall they must be preserved by the official records act.
I don't have the citation handy and don't remember what venue it was but perhaps someone else here can post it.
Re:Why did they do this to begin with? by Tablizer · 2017-12-26 09:02 · Score: 2

Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom. Mostly a collection of advertisements and banal BS. It's not like we have someone writing profound treatises on the human condition there.
Some things never change.

--
Table-ized A.I.
Re:How much data is that per year? by ShanghaiBill · 2017-12-26 09:14 · Score: 4, Insightful

1 byte. you mean 1 byte.
No. He means one bit. One byte (8 bits) is completely uncompressed. But English text will compress down by nearly 90%, which leaves about 1 bit per character.
The best compression ratios are for large texts using a consistent writing style and vocabulary, so tweets would yield less than 90% compression, but would likely still be better than 85%.
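To make the arithmetic above concrete, here is a minimal sketch. It assumes 8 bits per uncompressed character (plain ASCII), and the function names are made up for illustration. The second helper measures what zlib actually achieves on a given sample; do not expect a general-purpose compressor on a short sample to reach the figures quoted above.

```python
import zlib

def bits_per_char_after(savings: float, raw_bits: float = 8.0) -> float:
    """Bits per character left after removing `savings` (0..1) of the raw size."""
    return raw_bits * (1.0 - savings)

def measured_bits_per_char(text: str) -> float:
    """Compress ASCII text with zlib and report compressed bits per input character."""
    raw = text.encode("ascii")
    packed = zlib.compress(raw, 9)
    return 8.0 * len(packed) / len(raw)

# Savings of 85%, 87.5% and 90% on 8-bit characters.
for savings in (0.85, 0.875, 0.90):
    print(f"{savings:.1%} savings -> {bits_per_char_after(savings):.2f} bits/char")
```

Exactly 1 bit per character corresponds to 87.5% savings, which sits between the 85% and 90% figures mentioned in the comment.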
Re:Why did they do this to begin with? by nospam007 · 2017-12-26 09:19 · Score: 5, Insightful

"Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom."
Nonetheless historians are studying the graffiti on the walls of Pompeii and Herculaneum.
https://www.smithsonianmag.com...
Re:How much data is that per year? by jellomizer · 2017-12-26 09:21 · Score: 3, Insightful

I say it will average 1 Library of Congress to store a Library of Congress worth of data.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:Why did they do this to begin with? by ShanghaiBill · 2017-12-26 09:23 · Score: 5, Insightful

Mostly a collection of advertisements and banal BS.
In hindsight, it is often the banalities that are the most interesting. Archaeologists often learn more from looking at ancient garbage dumps than from excavating palaces.
English is about 1 bit per character by Okian+Warrior · 2017-12-26 09:26 · Score: 3, Informative

Shannon's paper "Prediction and Entropy of Printed English" tries to measure the amount of information per character in English.
He found that English is about 1 bit per character, and so compressed text can be expected to take up about that much room.
The paper is a pretty interesting read if you have read his 1948 paper that defines entropy (also a good read).
He came up with some interesting experimental methods to measure entropy in English.
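For anyone who wants to play with the entropy definition, a rough sketch follows. Note the assumption: this is only an order-0 estimate (each character treated as independent), not Shannon's experimental method, which exploits context by having people predict the next letter. An order-0 estimate on English text will come out well above 1 bit per character, and the gap is the redundancy that context removes. The function name is made up for illustration.

```python
from collections import Counter
from math import log2

def order0_entropy(text: str) -> float:
    """Entropy in bits per character, using single-character frequencies only."""
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed characters.
    return -sum((n / total) * log2(n / total) for n in counts.values())

sample = "the library of congress will stop archiving every public tweet"
print(f"{order0_entropy(sample):.2f} bits/char (order-0 estimate)")
```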
Re:The Library of Congress won't save Trump's lies by Tablizer · 2017-12-26 11:03 · Score: 2

Those are not lies, they are "colorful and whimsical interpretations of events and alternative realities".

--
Table-ized A.I.
Re:How much data is that per year? by Ecuador · 2017-12-26 11:11 · Score: 3, Informative

I'll throw in one more data point. I developed a predictive text entry database for my previous employer, similar to the old T9 (better, obviously, since I was involved ;-) ), and for English (and similar languages) it took about 4 bits per dictionary word trained, which is less than 1 bit/char since the average word length is a bit over 5. That is a worst case, since a dictionary has no repeated words or other redundancy that compresses well. However, the length of each word is not included in those 4 bits, so you save there: think of it as the user supplying the word length and the linguistic db supplying the rest.
But the idea is that English is pretty compressible...
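The bits-per-character claim is a one-line division. A minimal sketch, assuming an average word length of exactly 5 characters (the comment says "a bit over 5", so this gives a slight overestimate); the function name is hypothetical:

```python
def bits_per_char(bits_per_word: float, avg_word_len: float) -> float:
    """Average bits spent per character, given a per-word cost and mean word length."""
    return bits_per_word / avg_word_len

# 4 bits per dictionary word, assumed average length of 5 characters.
print(f"{bits_per_char(4, 5):.2f} bits/char")
```

Any average length above 5 pushes the figure further below 1 bit per character.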

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS