Slashdot Mirror


How To Build a TimesMachine (nytimes.com)

necro81 writes: The NY Times has an archive, the TimesMachine, that allows users to find any article from any issue from 1851 to the present day. Most of it is shown in its original typeset context, right where the article appeared on a given page, much like sifting through a microfiche archive. But when the original newspaper scans are 100-MB TIFF files, how can this information be delivered efficiently to the end user? These and other computational challenges are described in this blog post on how the TimesMachine was built.
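The summary doesn't say how the Times tackled the 100-MB-scan problem. One common technique for serving huge images (assumed here, not confirmed by the post) is a tile pyramid. The scan is cut into small fixed-size tiles at several zoom levels, each level half the size of the one before, and the viewer fetches only the tiles that overlap the visible window. The sketch below computes the levels and the visible tiles. The function names, the 256-pixel tile size, and the 8000x12500 page dimensions are hypothetical.

```python
import math


def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int]]:
    """Return (level_width, level_height, tile_count) from full resolution down
    to the first level that fits in a single tile, halving at each step."""
    if width < 1 or height < 1:
        raise ValueError("image dimensions must be positive")
    levels = []
    w, h = width, height
    while True:
        cols, rows = math.ceil(w / tile), math.ceil(h / tile)
        levels.append((w, h, cols * rows))
        if cols == 1 and rows == 1:
            return levels
        # Halve each dimension, rounding up so edge pixels are never dropped.
        w, h = math.ceil(w / 2), math.ceil(h / 2)


def visible_tiles(x: int, y: int, view_w: int, view_h: int,
                  level_w: int, level_h: int, tile: int = 256) -> list[tuple[int, int]]:
    """(col, row) indices of the tiles that a viewport at (x, y) overlaps on one level."""
    # Clamp the viewport to the image so no nonexistent tiles are requested.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(level_w, x + view_w), min(level_h, y + view_h)
    if x1 <= x0 or y1 <= y0:
        return []
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


if __name__ == "__main__":
    levels = pyramid_levels(8000, 12500)
    full_w, full_h, full_tiles = levels[0]
    shown = visible_tiles(0, 0, 1024, 768, full_w, full_h)
    print(f"{len(levels)} levels, {full_tiles} tiles at full size, {len(shown)} fetched for a 1024x768 view")
```

With these made-up numbers, a 1024x768 window at full resolution needs 12 of the 1568 full-size tiles, which is the whole point of the approach: the client never downloads the full TIFF.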

3 of 41 comments

  1. OCR? by The-Ixian · · Score: 2

    Seems like this is the obvious choice.

    Maybe just the headlines or the first paragraph and then link to a compressed version of the image file or PDF (not the TIFF itself for Jebussake).

    --
    My eyes reflect the stars and a smile lights up my face.
  2. nobody cares, google did it first by Thud457 · · Score: 5, Funny

    sticking it behind a paywall will cut down on bandwidth usage dramatically.

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  3. Re:TIFF? by Anonymous Coward · · Score: 2, Informative

    Isn't there another format that would be better than TIFF? How about TGA? :p

    TIFF can use Group IV compression on monochrome images, whereas TGA is limited to RLE.
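To see why plain RLE is the weaker option, here is a minimal run-length encoder for one monochrome scanline. It collapses runs of identical pixels into (count, value) pairs and splits runs at 128 pixels, borrowing TGA's per-packet limit, but it does not reproduce TGA's actual byte layout; the function names are illustrative. RLE only exploits repetition along a single row. Group IV (CCITT T.6) goes further by coding each scanline relative to the one above it, which this sketch does not attempt.

```python
def rle_encode(row: list[int], max_run: int = 128) -> list[tuple[int, int]]:
    """Collapse a scanline into (run_length, pixel_value) pairs, splitting runs at max_run."""
    runs: list[tuple[int, int]] = []
    for px in row:
        # Extend the current run if the pixel matches and the run isn't full yet.
        if runs and runs[-1][1] == px and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, px)
        else:
            runs.append((1, px))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (run_length, pixel_value) pairs back into a scanline."""
    return [px for count, px in runs for _ in range(count)]


if __name__ == "__main__":
    # A mostly blank line of newsprint (0 = white) crossed by one short black stroke.
    row = [0] * 300 + [1] * 5 + [0] * 200
    runs = rle_encode(row)
    print(runs)
    assert rle_decode(runs) == row
```

A mostly white line shrinks from 505 pixels to a handful of runs, but two nearly identical adjacent lines still cost the same to encode, which is the redundancy Group IV captures and RLE misses.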