Slashdot Mirror


British Library To Archive One Billion UK Websites

An anonymous reader writes "The British Library is to begin archiving the entire UK web, including one billion pages from 4.8 million websites, blogs, forums and social media sites. The process will take five months, with the aim of presenting a more complete picture of news events for future generations to read and learn from."

2 of 89 comments

  1. Re:Data Storage by Anonymous Coward · · Score: 4, Informative

    The BL and other memory institutions, such as archives, apply a concept called "Digital Preservation" to the stored data. This concept, based on the OAIS model, covers all stages of storage, administration, maintenance and retrieval of these "remains".

    The hardest part of web archiving is not storing the data but rendering it in 200 years. They also need to store the browser, but nowadays browsers rely on so many different "subrenderers" (Flash, Java, JavaScript and CSS engines and whatnot) to render a page that all of those subrenderers need to be archived as well.

    The best-known strategy to date is to create and store emulator containers or VMs with the original software, so that it can be run under emulation in the far future.

    http://en.wikipedia.org/wiki/Open_Archival_Information_System

  2. Re:Presumably by Trpajzlix · · Score: 4, Informative

    Ehm, "everyone else"? In Czech, bilion = 10^12.
    The Brits use the same billion = 10^9 as everyone else speaking English.
    FTFY

    --
    A day will always be long, because 86400 won't fit into short.