British Library To Archive One Billion UK Web Pages
An anonymous reader writes "The British Library is to begin archiving the entire UK web, including one billion pages from 4.8 million websites, blogs, forums and social media sites. The process will take five months, with the aim of presenting a more complete picture of news events for future generations to read and learn from."
The BL and other memory institutions, such as archives, apply a concept called "digital preservation" to the stored data. This concept, based on the OAIS model, covers all stages of storage, administration, maintenance and retrieval of these "remains".
The hardest part of web archiving is not storing the data but rendering it in 200 years. They also need to store the browser, but browsers nowadays use so many different "subrenderers" (Flash, Java, JavaScript and CSS engines and whatnot) to render a page that all of those subrenderers need to be archived as well.
The best-known strategy to date is to create and store emulator containers or VMs with the original software, so they can be run under emulation in the far future.
http://en.wikipedia.org/wiki/Open_Archival_Information_System
Ehm, "everyone else". In Czech bilion = 10^12.
The Brits use the same billion = 10^9 as everyone else speaking English.
FTFY
A day will always be long, because 86400 won't fit into short.
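For anyone who wants the arithmetic behind the joke, here is a minimal sketch. It assumes "short" means a 16-bit signed integer, which is typical but not guaranteed everywhere (C only promises at least 16 bits). The helper names fits_in_short and wrap_to_short are made up for illustration.

```python
# Seconds in a day: 24 hours * 60 minutes * 60 seconds.
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: a "short" is 16 bits wide (common, not universal).
SHORT_MIN = -2**15       # -32768
SHORT_MAX = 2**15 - 1    # 32767
USHORT_MAX = 2**16 - 1   # 65535


def fits_in_short(n: int) -> bool:
    """Return True if n is representable as a 16-bit signed integer."""
    return SHORT_MIN <= n <= SHORT_MAX


def wrap_to_short(n: int) -> int:
    """Two's-complement wraparound of n into the 16-bit signed range."""
    return (n - SHORT_MIN) % 2**16 + SHORT_MIN


if __name__ == "__main__":
    print("seconds per day:", SECONDS_PER_DAY)
    print("fits in signed 16-bit short:", fits_in_short(SECONDS_PER_DAY))
    print("fits in unsigned 16-bit short:", SECONDS_PER_DAY <= USHORT_MAX)
    print("value after 16-bit wraparound:", wrap_to_short(SECONDS_PER_DAY))
```

So the count of seconds overflows even an unsigned 16-bit short, and a 32-bit type is needed to hold it.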