Slashdot Mirror


The Library of Congress Will Stop Archiving Every Public Tweet On January 1st (gizmodo.com)

An anonymous reader quotes a report from Gizmodo: In 2010, the Library of Congress started archiving every single public tweet that was published on Twitter. It even retroactively acquired all tweets dating back to 2006. But the Library of Congress will stop archiving every tweet on December 31, 2017. The Library of Congress issued a white paper this month saying that it was proud of its comprehensive collection of tweets from the first 12 years of Twitter, but that it's completely unnecessary for it to continue. Instead, the organization will only collect tweets that it deems historically significant. For instance, President Trump's tweets are almost certainly still going to be saved for future generations. One reason that the Library is stopping the comprehensive archive? The social media company's controversial change to allow 280 character tweets. The Library's halt on collection of all tweets puts Twitter more in line with the way that other digital collections are archived, including websites. The Library of Congress only archives websites on a selective basis, unlike the nonprofit, non-governmental organization the Internet Archive, which has a much broader goal of archiving everything online with its Wayback Machine. The Library of Congress also noted that many tweets include photos and video and that it has only been collecting text, making some of its collection worthless.

79 comments

  1. How much data is that per year? by Danathar · · Score: 1

    Assuming they are only archiving text, I wonder how much storage that requires. Of course it would compress VERY well.

    1. Re:How much data is that per year? by Rockoon · · Score: 4, Informative

      Assuming they are only archiving text, I wonder how much storage that requires. Of course it would compress VERY well.

      On a good day about 1 bit per character.

      --
      "His name was James Damore."
    2. Re:How much data is that per year? by Anonymous Coward · · Score: 0

      1 byte. you mean 1 byte.

    3. Re:How much data is that per year? by Memnos · · Score: 1

      Not sure of the exact amount, but it's less than one Library of Congress' worth.

      --
      I don't trust atoms -- they make up stuff.
    4. Re:How much data is that per year? by Anonymous Coward · · Score: 0

      Post-compression could be 1 bit per character on average, I suppose.

      Still, I'm curious to see some real numbers...

    5. Re:How much data is that per year? by ShanghaiBill · · Score: 4, Insightful

      1 byte. you mean 1 byte.

      No. He means one bit. One byte (8 bits) is completely uncompressed. But English text will compress down by nearly 90%, which leaves about 1 bit per character.

      The best compression ratios are for large texts using a consistent writing style and vocabulary, so tweets would yield less than 90% compression, but would likely still be better than 85%.
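A rough way to check the bits-per-character claim above is to compress a text sample and divide the compressed size in bits by the character count. The sketch below uses Python's standard zlib, bz2 and lzma modules. The helper name `bits_per_char` and the sample string are made up for illustration, and the sample is one sentence repeated, so it compresses far better than real tweets would.

```python
import bz2
import lzma
import zlib


def bits_per_char(text: str, compress) -> float:
    """Compressed size in bits divided by the number of characters."""
    raw = text.encode("utf-8")
    return 8 * len(compress(raw)) / len(text)


# Hypothetical sample: one sentence repeated, so the result is optimistic.
sample = "The quick brown fox jumps over the lazy dog. " * 200

for name, fn in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    print(f"{name}: {bits_per_char(sample, fn):.3f} bits per character")
```

Running the same helper over a real corpus of short messages, compressed in bulk rather than one message at a time, would give a more honest number.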

    6. Re:How much data is that per year? by jellomizer · · Score: 3, Insightful

      I say it will average 1 Library of Congress to store a Library of Congress worth of data.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    7. Re:How much data is that per year? by Rockoon · · Score: 1

      1 byte. you mean 1 byte.

      1 bit per character is what I mean you ignorant twat.

      --
      "His name was James Damore."
    8. Re:How much data is that per year? by Rockoon · · Score: 1

      Still, I'm curious to see some real numbers...

      Then do something about it. Is your curiosity only ever scratched by asking other people to teach you? For fuck's sake, it will be at the top of pretty much any Google search you can manage to construct on the subject.

      --
      "His name was James Damore."
    9. Re:How much data is that per year? by NicknameUnavailable · · Score: 1

      I say it will average 1 Library of Congress to store a Library of Congress worth of data.

      That depends on your temporal frame of reference.

    10. Re:How much data is that per year? by will_die · · Score: 1

      Going by 2013 Twitter figures, they say that they have the equivalent of 20 million pages of text each day. LexisNexis estimates around 678,000 pages in text format per GB.
      So by those you are at around 30 GB per day. With the increase in tweet size and its increased usage in the four years since, let's double that, so probably 60 GB per day, ignoring indexes, metadata, linking to users, etc.
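The arithmetic in this estimate can be sketched as follows. The two input figures are the ones quoted in the comment, the doubling is the comment's own guess, and the yearly total assumes 365 days and decimal units, which is my assumption rather than anything stated in the thread.

```python
PAGES_PER_DAY = 20_000_000   # 2013 Twitter figure quoted above
PAGES_PER_GB = 678_000       # pages-per-GB estimate quoted above
GROWTH_FACTOR = 2            # the comment's guess for growth since 2013

gb_per_day_2013 = PAGES_PER_DAY / PAGES_PER_GB
gb_per_day_now = gb_per_day_2013 * GROWTH_FACTOR
tb_per_year = gb_per_day_now * 365 / 1000  # decimal terabytes, uncompressed

print(f"2013: {gb_per_day_2013:.1f} GB/day")
print(f"doubled: {gb_per_day_now:.1f} GB/day")
print(f"per year: {tb_per_year:.1f} TB")
```

On these assumptions the uncompressed text comes to a little over 20 TB per year, before indexes and metadata.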

    11. Re:How much data is that per year? by Joce640k · · Score: 1

      Post-compression could be 1 bit per character on average, I suppose.

      The information content of the average tweet is less than 1 bit so compression ratios should be FAR higher than that.

      (I wouldn't be surprised if the whole of this year's tweets could be compressed onto a single floppy if we use the previous eleven years as a dictionary).

      --
      No sig today...
    12. Re:How much data is that per year? by Anonymous Coward · · Score: 0

      Sometimes I got 0.33 bits per character, thanks to Shannon's theory (used to study the entropy of information).

    13. Re:How much data is that per year? by Ecuador · · Score: 3, Informative

      I'll throw in one more data point. I developed a predictive text entry database for my previous employer, similar to the old T9 (better, obviously, since I was involved ;-) ), and for English (and similar languages) it would take about 4 bits per dictionary word you trained (which is less than 1 bit/char, since the average word length is a bit over 5). It is a worst case, as we are talking about a dictionary, so there are no repeated words etc. that compress a lot. However, the information about how long a word is is not included in those 4 bits, so you save there (the way to think about it is that the user provides the word length, the linguistic db the rest).
      But the idea is that English is pretty compressible...

      --
      Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    14. Re:How much data is that per year? by guruevi · · Score: 1

      Typically you'll see 3:1 to 4:1 for text (obviously you trade off speed for compression), so about 2 bits per character. And that is if you don't use a streaming algorithm; with one, you'll see closer to 2:1.
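The thread mixes compression ratios (3:1, 4:1), percentage savings (85%, 90%) and bits per character, so a small conversion sketch may help compare them. It assumes 8 bits per uncompressed character, which only holds for single-byte encodings, and both function names are hypothetical.

```python
def bits_per_char_from_ratio(ratio: float, raw_bits: float = 8.0) -> float:
    """Bits per character left after compressing at ratio:1."""
    return raw_bits / ratio


def ratio_from_savings(savings: float) -> float:
    """Turn a fractional size reduction (0.90 for 90%) into a ratio:1 figure."""
    return 1.0 / (1.0 - savings)


for ratio in (2, 3, 4):
    print(f"{ratio}:1 -> {bits_per_char_from_ratio(ratio):.2f} bits per character")

for savings in (0.85, 0.90):
    ratio = ratio_from_savings(savings)
    bits = bits_per_char_from_ratio(ratio)
    print(f"{savings:.0%} smaller -> {ratio:.1f}:1 -> {bits:.2f} bits per character")
```

Under that assumption, 4:1 corresponds to 2 bits per character, while a 90% reduction corresponds to 10:1, or 0.8 bits per character.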

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    15. Re:How much data is that per year? by jrumney · · Score: 1

      If you're saying the entire useful content of Twitter can be compressed to about 140 bits, then I agree with you.

    16. Re: How much data is that per year? by TheOuterLinux · · Score: 1

      You can get all of Wikipedia for ~60GB, I think. https://meta.wikimedia.org/wik.... So, a bunch of Twitter text files might not be as large as you think. It's the JS that does the carding part, and I doubt they're screenshotting them all.

    17. Re: How much data is that per year? by houghi · · Score: 1

      How does it compare to other languages? Both those using the same and different alphabets? E.g. German, Spanish, Arabic, Hebrew, Chinese, ....

      --
      Don't fight for your country, if your country does not fight for you.
    18. Re: How much data is that per year? by Ecuador · · Score: 1

      Well, English belongs to the second most compressible group for our technology, along with languages like Spanish and (maybe) German. There was one group that was even more compressible (e.g. Finnish, Italian) - by about 10-20%. Arabic was bad, took twice the space per word, although with only one client asking for it I never tried to see if I could optimize for it. I don't remember Hebrew and I don't see a built db on my disk to extrapolate. Chinese is a whole different story since you store pronunciations (pinyin, zhuyin etc) for hanzi characters and then hanzi sequences as well, so it is not comparable to anything else.
      Klingon didn't have a big enough dictionary, so while it seems simple enough I'd better not draw a conclusion.

      --
      Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  2. Why did they do this to begin with? by Noishkel · · Score: 1

    Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom. Mostly a collection of advertisements and banal BS. It's not like we have someone writing profound tresses on the human condition there.

    Hell.. personally I really believe that the entire act of doing this was nothing more than a giant advertising campaign for Twitter using former President Obama's connection to the media.

    1. Re:Why did they do this to begin with? by AlanObject · · Score: 2

      Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom.

      Along the line of a-holes on Twitter..

      Wasn't it established in a federal court that Trump's tweets amount to official statements and can be cited as effective policy statements? Further, I recall they must be preserved by the official records act.

      I don't have the citation handy and don't remember what venue it was but perhaps someone else here can post it.

    2. Re:Why did they do this to begin with? by Tablizer · · Score: 2

      Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom. Mostly a collection of advertisements and banal BS. It's not like we have someone writing profound tresses on the human condition there.

      Some things never change.

    3. Re:Why did they do this to begin with? by nospam007 · · Score: 5, Insightful

      "Twitter is little more than a digital version of some a-hole writing something on the wall of a public restroom."

      Nonetheless historians are studying the graffiti on the walls of Pompeii and Herculaneum.

      https://www.smithsonianmag.com...

    4. Re:Why did they do this to begin with? by ShanghaiBill · · Score: 5, Insightful

      Mostly a collection of advertisements and banal BS.

      In hindsight, it is often the banalities that are the most interesting. Archaeologists often learn more from looking at ancient garbage dumps than from excavating palaces.

    5. Re:Why did they do this to begin with? by jdschulteis · · Score: 1

      It's not like we have someone writing profound tresses on the human condition there.

      For profound treatises, look hair on Slashdot.

    6. Re:Why did they do this to begin with? by ClickOnThis · · Score: 1

      Wasn't it established in a federal court that Trump's tweets amount to official statements and can be cited as effective policy statements?

      Established that they are official statements? No, but that was claimed earlier this year by Sean Spicer, the White House press secretary at the time.

      --
      If it weren't for deadlines, nothing would be late.
    7. Re:Why did they do this to begin with? by AlanObject · · Score: 1

      Established that they are official statements? No, but that was claimed earlier this year by Sean Spicer, the White House press secretary at the time.

      More than just Spicer. The 9th circuit said, basically, that they were pretty much the same stature as executive orders:

      Buried in a footnote in the 9th U.S. Circuit Court of Appeals’ unanimous opinion upholding the bulk of the injunction blocking Donald Trump’s travel ban, there is a moment of reckoning in which the panel addresses whether the president’s tweets constitute binding statements of executive intent.

    8. Re:Why did they do this to begin with? by ClickOnThis · · Score: 1

      The 9th circuit said, basically, that they were pretty much the same stature as executive orders

      If you read the actual article, instead of just the first sentence which you cited, you'll see that the 9th circuit claimed nothing of the kind. They merely cited a tweet in the context of their ruling that Trump exceeded his statutory authority, and that he had no rationale for his decision:

      Indeed, the President recently [tweeted] his assessment that it is the “countries” that are inherently dangerous, rather than the 180 million individual nationals of those countries who are barred from entry under the President’s “travel ban.”

      So, the court merely made note of one of his tweets, even though they mocked it. But the court paying attention to them does not elevate them to the status of executive orders. If that were true, then God help us. What are we to make of his re-tweets of phony news stories?

      --
      If it weren't for deadlines, nothing would be late.
    9. Re:Why did they do this to begin with? by Anonymous Coward · · Score: 0

      If your a-hole can write on a restroom wall that is an acrobatic feat that merits archiving.

    10. Re:Why did they do this to begin with? by Anonymous Coward · · Score: 0

      Did you just try to justify Twitter's existence?

  3. Who knew? by rmdingler · · Score: 3

    I'm actually more surprised this data collection has gone on at the Library of Congress since 2010, than the news that it's ending.

    Now if the story had started with "The NSA...", I would've been quite shocked at its termination.

    --
    Happiness in intelligent people is the rarest thing I know.

    Ernest Hemingway

  4. Posterity by Tablizer · · Score: 5, Funny

    Instead, the organization will only collect tweets that it deems historically significant. For instance, President Trump's tweets are almost certainly still going to be saved for future generations.

    Archaeologist 1: "Hey, I just discovered a message broadcast by the leader of the once great empire, United States!"

    Archaeologist 2: "Marvelous! What's it say?"

    Archaeologist 1: 'Let's see..."Rosie O. looks like a horse farted out a prune. Disgusting loser, so sad!"'

    Archaeologist 2: "On second thought, let's pretend we never found it."

    1. Re:Posterity by Anonymous Coward · · Score: 0

      The US-North Korea Twitter battles have the same historical value for prelude to nuclear war as the assassination of Archduke Ferdinand ;)

    2. Re:Posterity by jellomizer · · Score: 1

      I think the article writer was trying to be PC, and didn't add "saved for future generations", as a warning to society.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  5. Redundancy by Anonymous Coward · · Score: 1

    The government realized it was useless hosting this data at both the Library of Congress AND the NSA datacenter.

  6. 7 years of wasted taxpayer money by Anonymous Coward · · Score: 0

    There is zero usefulness in all that crap text.
    I assume they started collecting thinking Twitter would be closed much sooner...
    I mean come on, that should even have been illegal!
    It's just like if someday they decided every letter and postcard sent by mail should be opened and scanned to get archived. Some people need to be fired for so much money wasted on stupid projects!!!

    1. Re:7 years of wasted taxpayer money by Anonymous Coward · · Score: 0

      Your tax dollars at work!

  7. Historically Significant by Anonymous Coward · · Score: 0

    Who decides what is historically significant? Is William Shatner's opinion on the best Love Live girl historically significant?

    https://twitter.com/WilliamShatner/status/942966660356435968

    I would say yes, but the Library of Congress may not appreciate the significance of this post.

    1. Re:Historically Significant by jellomizer · · Score: 1

      I would expect the ramblings of famous people such as William Shatner to be saved, even if some of it is rather odd. Also, I would say a random sample from everyday people should be saved, just as a representation of the times. However, everyone's cat videos and personal ramblings about their political beliefs probably aren't worth bothering with, as it would be a waste of space.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  8. Must be ashamed of all Downie Donald's "winning" by Anonymous Coward · · Score: 0

    Why waste money archiving that vatnik retard's toilet tweets?

  9. All this means is... by Richard_at_work · · Score: 0

    All this means is that, in 5000 years, historians will get a dangerously stilted view of what was posted on twitter - if Trumps tweets are archived, but not the tweets that debunks his claims, then his claims will stand unopposed for future historians to debate about.

    1. Re:All this means is... by Dragonslicer · · Score: 1

      All this means is that, in 5000 years, historians will get a dangerously stilted view of what was posted on twitter - if Trumps tweets are archived, but not the tweets that debunks his claims, then his claims will stand unopposed for future historians to debate about.

      Are you saying that Twitter is the only place where people debunk false claims made by politicians?

    2. Re:All this means is... by Richard_at_work · · Score: 0

      Look around at what has survived the ages so far, and tell me that there isn't a decent chance that quite possibly the twitter archive might be the only thing on certain topics which survives the next few ages - it might not even survive intact.

      Why take the chance? Archive everything, or nothing. Archiving the tweets of someone known to be toxic while relying on other external sources to debunk that toxicity shouldn't be a strategy to rely on.

      Do you really want Trump to be seen as the voice of reason by default in 5000 years?

    3. Re:All this means is... by Dragonslicer · · Score: 1

      I'm still not convinced that archives of Twitter are more likely to survive any period of time than archives of major publications like the New York Times or Washington Post.

    4. Re:All this means is... by Richard_at_work · · Score: 1

      Trump is still in power and waging a war against several mainstream news outlets.

      Still feel confident?

      Also, I'm sure no one thought that the best record of several dead languages would turn out to be a stone establishing a religious cult and granting tax exemption status to its priests (some things never change). I'm sure the creator of the decree on that stone would have thought his religion would have lasted longer than the stone itself, and yet here we are...

    5. Re:All this means is... by Richard_at_work · · Score: 1

      Looks like the Trump cock suckers are out in force, downvoting things they don't like :D

    6. Re:All this means is... by Anonymous Coward · · Score: 0

      The peanut in this pile of shit is rather peculiar.
      Ah but we haven't sifted through the rest of the pile.
      You're right, good chap! I see some carrots and corn too.

  10. English is about 1 bit per character by Okian+Warrior · · Score: 3, Informative

    Shannon's paper "Prediction and Entropy of Printed English" tries to measure the amount of information per character in English.

    He found that English is about 1 bit per character, and so compressed text can be expected to take up about that much room.

    The paper is a pretty interesting read if you have read his 1948 paper that defines entropy (also a good read).

    He came up with some interesting experimental methods to measure entropy in English.
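For a feel of how entropy per character is computed, here is a minimal sketch of the simplest estimate, based only on single-character frequencies. This is not Shannon's experimental method: it ignores context entirely, so it typically lands at around 4 bits per character for English, well above the roughly 1 bit obtained when each character is predicted from the text before it. The function name and sample string are illustrative.

```python
import math
from collections import Counter


def unigram_entropy(text: str) -> float:
    """Shannon entropy in bits per character from single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample; any longer English text can be substituted.
sample = "the library of congress will stop archiving every public tweet"
print(f"{unigram_entropy(sample):.2f} bits per character")
```

The gap between this figure and 1 bit per character is the redundancy that a context-aware model or compressor can exploit.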

    1. Re:English is about 1 bit per character by gtall · · Score: 1

      This is TWITTER!! The only way to measure its information content is to use negative entropy.

    2. Re:English is about 1 bit per character by guruevi · · Score: 1

      That is obviously completely optimal compression which is practically impossible to obtain. Also, Twitter is mostly non-English (spelling mistakes, emoji characters etc).

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    3. Re:English is about 1 bit per character by jrumney · · Score: 1

      O rly? #precompressed

  11. Proof the U.S. has time-travel technology by jtara · · Score: 1

    This is the best proof yet that the U.S. possesses time-travel technology!

    How else does the Library of Congress know which tweets will be of historic significance?

    1. Re: Proof the U.S. has time-travel technology by Anonymous Coward · · Score: 0

      Don't worry about it. They'll tell you what's history just like they already did.

      Go on and give us all a laugh; tell us what "leftwing" and "rightwing" mean in your country's political theatre..

  12. How do you mean, exactly? by Okian+Warrior · · Score: 1

    All this means is that, in 5000 years, historians will get a dangerously stilted view of what was posted on twitter - if Trumps tweets are archived, but not the tweets that debunks his claims, then his claims will stand unopposed for future historians to debate about.

    How do you mean, exactly?

    I'm wondering what danger there could be (5000 years from now), and if we should take steps to avoid it.

    1. Re:How do you mean, exactly? by Anonymous Coward · · Score: 0

      All this means is that, in 5000 years, historians will get a dangerously stilted view of what was posted on twitter -

      How do you mean, exactly?

      if Trumps tweets are archived, but not the tweets that debunks his claims, then his claims will stand unopposed for future historians to debate about.

      I'm wondering what danger there could be (5000 years from now), and if we should take steps to avoid it.

      Yes, we should take away Trump's Twitter, it would be appropriate. Then posterity will know that we did the right thing.

    2. Re:How do you mean, exactly? by Richard_at_work · · Score: 0

      Laws and movements are often wildly misconstrued from their original meanings, because those meanings are not properly explained and laid out to start off with.

      Look at how much legal effort has gone into interpreting the US Constitution, with significant legal arguments hinging on commas etc. And that document is written in a language which is still spoken.

      I take it you've heard the story about how Nero played his fiddle as Rome burned? Yup, that's just one of the versions of how it went down - but it's the popular version, as Nero was an unpopular figure. Reliable contemporary accounts put him in another city altogether, returning to fight the devastation and famine that followed. Which of these accounts are true? And this is only 2000 years ago.

    3. Re:How do you mean, exactly? by nagora · · Score: 1

      "Yes, we should take away Trump's Twitter, it would be appropriate."

      I think it would be a better idea to just fire him and put in someone who isn't a completely worthless fucking pile of despicable immoral dishonest perverted shit.

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  13. tweets for twits by ohgary · · Score: 1

    What a waste of my money saving all that crap in the first place.

  14. the real reason by Anonymous Coward · · Score: 0

    twittertwats tweet too much.

  15. Wasted US FED money by kencurry · · Score: 1

    Okay, this is a rant I admit it.

    But the US Gov. is a financial train wreck. We just passed a "TAX CUT" with an expected 1 trillion added to the deficit, and we find out that wasteful stuff like this happens all around us.

    The Library of Congress is part of the legislative branch, with a budget of about 700M USD.

    --
    sigs are for losers (except to point out that sigs are for losers)
    1. Re:Wasted US FED money by gtall · · Score: 1

      And you somehow believe that saving money on the 1/3 of the budget that is discretionary spending is going to save us from the 2/3 that is non-discretionary? You might get a few mill out of that for killing tweets.

      While we're on the subject of tilting at windmills, the entire foreign aid budget is less than 50 billion. Saving that won't help and will probably cost money in the long run due to the programs being canceled in countries we'd really like to stand up and not fall over to the local nutjobs.

    2. Re:Wasted US FED money by kencurry · · Score: 1

      What was I thinking, trying to save a few measly million dollars? You on the other hand will make a great senator or congressman with your sound financial insight.

      --
      sigs are for losers (except to point out that sigs are for losers)
    3. Re:Wasted US FED money by Actually,+I+do+RTFA · · Score: 1

      The tax cut had a 1.8 - 2+ trillion dollar price tag. Don't worry though, they've already passed more legislation increasing the cost by at least another 200 billion. So, it's at least a 2 trillion bill (over the next decade alone). And that's if they let the middle class tax cuts expire in 2023 as planned.

      --
      Your ad here. Ask me how!
  16. Woot! by AndyKron · · Score: 1

    OMG I'm in the Library of Congress? I'm published!

  17. I know why LOC is doing this by Applehu+Akbar · · Score: 1

    Apparently a single public employee is monopolizing the service and sucking up all the storage space.

    1. Re:I know why LOC is doing this by Anonymous Coward · · Score: 0

      and the LOC apparently hasn't discovered gzip, either.

  18. Speaking of the archive team by Anonymous Coward · · Score: 0

    They just got a $1 million donation this week (still at least $900,000 if you factor in current Bitcoin fees). Hopefully that money will help them to step up and take over the job of archiving every single tweet, because every single tweet needs to be archived.

    None of this "I regret that drunken tweet from last night, I better delete it before anyone sees" shit.

  19. Re:The Library of Congress won't save Trump's lies by Tablizer · · Score: 2

    Those are not lies, they are "colorful and whimsical interpretations of events and alternative realities".

  20. I say to future generations... by Anonymous Coward · · Score: 0

    I'm sorry.

    "President Trump's tweets are almost certainly still going to be saved for future generations"

  21. But but but.. by Anonymous Coward · · Score: 0

    How will future generations know that Kim Kardashian got a pimple on her ass? Won't someone think of the children!

  22. Internet Archive DOESNT DO SHIT. by Anonymous Coward · · Score: 0

    Just buy one of the dead domains they "archive" and reactivate it.. BOOM! Watch all their "archives" disappear! What a fucking joke.

    1. Re: Internet Archive DOESNT DO SHIT. by Anonymous Coward · · Score: 0

      All those "archived" webpages will vanish the second the NSA decides it doesn't want history known.. and all they have to do is take over the domain (so hard!) and post a new robots.txt (oh noes!)

      #freedumbs
      https://archive.org/post/184024/robotstxt-policy-is-a-failure

  23. Re:The Library of Congress won't save Trump's lies by Anonymous Coward · · Score: 1

    so it's as if 1984's Big Brother died and left his brother, Big Bozo, in charge.

  24. Re: Must be ashamed of all Downie Donald's "winnin by Anonymous Coward · · Score: 0

    Says the illegal drug fucked up Mexican murderer who rapes goats... And is a dumb fuck who can't get a job at Walmart.

  25. I’ve seen those elsewhere... by Anonymous Coward · · Score: 0

    This looks strangely similar to Rants & Raves on Craigslist. Perhaps the Library of Congress should archive that instead of Twitter, then.

  26. Trump tweets? by Anonymous Coward · · Score: 0

    No archiving the next three years of Trump tweets? How will the apes get their sacred scrolls? I guess Trump won't get to be Lawgiver for when the apes take over. Mike could still be Dr. Zaius if creationists have their way.

  27. Re:The Library of Congress won't save Trump's lies by Anonymous Coward · · Score: 0

    Genuinely curious - how does that compare to previous presidents?