Slashdot Mirror


Multi-page PDF To Multi-page TIFF and Archiving?

GeorgeMonroy writes "One of my clients has aperture cards that they have been scanning into multi-page PDF files — but now they want them in multi-page TIFFs instead. One of the reasons they gave for this is that TIFF files require less storage space. While that is true, I wonder if TIFF is the best format going into the future. Are TIFFs better than PDFs for future use? I wonder what format you think would last longer. Are there any other formats that you think would be better or more future-proof? To me, storage is not a good enough reason to go to TIFF, because storage prices are always dropping anyway. Also, since they already have many of these files in PDF format and they want to convert them into multipage TIFFs, are there any programs that you can recommend that will perform batch processing of files so that we do not have to convert each PDF one by one? If another file format is better than TIFF, then are there any programs for batch processing that you can recommend?"

6 of 125 comments

  1. Re:Are these things images or documents? by pclminion · · Score: 4, Informative

    If they're images, then you should use TIFF (or perhaps PNG). However, it doesn't make sense for them to be "multi-page." If they're documents, then PDF is appropriate.

    Multi-page TIFF is well supported in the industry. There is nothing "weird" about it. It even supports embedded, searchable text (a Microsoft addition, but something that actually adds value). PDF archival can be difficult to do correctly. At the very least you want to use a product which supports PDF/A, followed up with some serious validation to make sure the results are actually compliant. Otherwise you may get bitten decades down the road. Searchable TIFF, on the other hand, will be around for freaking ever.
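    As a first-pass filter before real validation, a minimal sketch: PDF/A files declare their conformance level in XMP metadata through the pdfaid:part property, so a plain grep can tell whether a file even claims to be PDF/A. The function name claims_pdfa is made up for illustration, and the check assumes the XMP packet is stored uncompressed in the file. It detects a claim only and proves nothing about compliance.

```shell
# Hypothetical helper: does this file *claim* PDF/A conformance?
# Looks only for the pdfaid:part marker in the XMP metadata.
# Assumption: the metadata stream is stored uncompressed.
claims_pdfa() {
    # LC_ALL=C keeps grep from choking on binary bytes; -q only sets the exit status.
    LC_ALL=C grep -q 'pdfaid:part' "$1"
}

# Example use (file name is a placeholder):
#   claims_pdfa scan0001.pdf && echo "claims PDF/A" || echo "no PDF/A marker found"
```

    A file that passes still needs a run through a dedicated PDF/A validator; a file that fails deserves a closer look before it goes into the archive.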

  2. Don't do this. But if you insist, here's how. by Cheesey · · Score: 5, Informative

    Are TIFFs better than PDFs for future use? I wonder what format you think would last longer. Are there any other formats that you think would be better or more future-proof? To me, storage is not a good enough reason to go to TIFF, because storage prices are always dropping anyway.

    Don't use TIFF. Stay with PDF. PDF is what all the big digital libraries are using. It's a proper standard, it's readable and writable by lots of free open source software, so even if Adobe disappears in a puff of intellectual property, you'll still be able to read your documents.

    TIFF, on the other hand, is a container format (like AVI, but worse). It isn't fully supported by every program - what sort of TIFF do you want, anyway? Compressed with LZW? With RLE? Not compressed at all? There's free software that will read and write the most common types of TIFF, so you can certainly do it, but why give up the convenience of using PDF?

    Also, since they already have many of these files in PDF format and they want to convert them into multipage TIFFs, are there any programs that you can recommend that will perform batch processing of files so that we do not have to convert each PDF one by one?

    Use ghostscript. Use something like the following command line:

    gs -dNOPAUSE -sDEVICE=tiffgray -sOutputFile=output%02d.tiff -dBATCH -r300 input.pdf
    This turns input.pdf into a series of 300 dpi tiff files, one for each page, called output01.tiff, output02.tiff, etc. Change the DEVICE to get a different sort of tiff file, and use gs --help to get a list of options. You can easily wrap this command in a script of almost any sort to make the process fully automatic.
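    For the multi-page output the submitter asked about, a minimal sketch: Ghostscript's TIFF devices should write all pages into a single file when the -sOutputFile name contains no %d page pattern. The function name pdf_to_multipage_tiff and the GS_DRY_RUN switch are made up for illustration; tiffgray and 300 dpi are simply carried over from the command above.

```shell
# Hypothetical helper: convert one PDF into a single multi-page TIFF.
# Usage: pdf_to_multipage_tiff input.pdf output.tiff [device] [dpi]
pdf_to_multipage_tiff() {
    input=$1
    output=$2
    device=${3:-tiffgray}
    dpi=${4:-300}
    # No %02d in the output name, so every page goes into the same file.
    set -- gs -q -dNOPAUSE -dBATCH "-sDEVICE=$device" "-r$dpi" \
        "-sOutputFile=$output" "$input"
    if [ -n "${GS_DRY_RUN:-}" ]; then
        # Assumption: a dry-run switch that only prints the command.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example use (file names are placeholders):
#   pdf_to_multipage_tiff input.pdf output.tiff
```

    Setting GS_DRY_RUN=1 shows the command that would run, which is handy for checking a batch before committing hours of conversion time to it.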
    --
    >north
    You're an immobile computer, remember?
  3. Re:Tiff is better by pclminion · · Score: 4, Interesting

    But size does not have anything to do with it. TIFF is far simpler in structure than PDF and has therefore better compatibility. TIFF is also well documented. Of course, they would have to use raw tiff to get the advantages. The storage-space argument is secondary and matters only insofar as larger data sets have a higher risk of corruption.

    I dispute the "well documented" claim. The TIFF standard is quite clear. Unfortunately, almost nobody adheres precisely to the standard. I work extensively with TIFF and PDF, and I have to say that the consistency I see in PDF is about 100 times more than what I see in TIFF. Your typical TIFF reader will contain thousands of hacks and workarounds for oddities that are produced by major players in the industry. While there is slightly non-compliant PDF, I have never seen things that even begin approaching the strangeness I see in TIFF on a daily basis. Having said that, I recommend TIFF plus search text metadata for archival, not PDF.

  4. pdf2tiff.sh by Anonymous Coward · · Score: 4, Informative

    let's not reinvent the wheel -- I did this about 9 months ago //wolfmann -- and this code is Public domain (done on federal gov't time):

    # cat pdf2tiff.sh
    #!/bin/bash

    device=tiffg3 # CCITT Group 3 fax encoding; tiffg4 selects Group 4 instead

    for file in */*.pdf #for each pdf
    do
                    filename="${file%.pdf}" # strip only the trailing .pdf, so dots elsewhere in the path survive
                    if [ ! -e "$filename".tiff ]
                    then
                                    echo "gs -q -dNOPAUSE -dBATCH -sDEVICE=$device -sOutputFile=$filename.tiff $file"
                                    gs -q -dNOPAUSE -dBATCH -sDEVICE="$device" -sOutputFile="$filename".tiff "$file" 2> /dev/null
                    else
                                    echo "$filename.tiff exists! skipping..."
                    fi
    done
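    The loop above only looks one directory level down. A minimal sketch of a variant that walks a whole tree follows; the function name convert_tree and the DRY_RUN switch are made up for illustration, and the default device tiffg4 is an assumption (pass another Ghostscript device name as the second argument). File names containing newlines are not handled.

```shell
# Hypothetical variant: convert every PDF below a directory, at any depth.
# Usage: convert_tree [directory] [device]
convert_tree() {
    dir=${1:-.}
    device=${2:-tiffg4}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        tiff="${pdf%.pdf}.tiff"    # strip only the trailing .pdf
        if [ -e "$tiff" ]; then
            echo "$tiff exists! skipping..."
            continue
        fi
        if [ -n "${DRY_RUN:-}" ]; then
            # Assumption: a dry-run switch that only prints the command.
            echo "gs -q -dNOPAUSE -dBATCH -sDEVICE=$device -sOutputFile=$tiff $pdf"
        else
            # stdin is redirected so gs cannot swallow the file list from find.
            gs -q -dNOPAUSE -dBATCH "-sDEVICE=$device" "-sOutputFile=$tiff" "$pdf" < /dev/null
        fi
    done
}

# Example use (directory name is a placeholder):
#   DRY_RUN=1 convert_tree /path/to/scans
```

    Because existing .tiff files are skipped, an interrupted run can simply be started again.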

  5. PDF/A by SpaghettiPattern · · Score: 4, Insightful

    Although the TIFF format is open and widely used in archiving systems, it is not particularly suited to an archive you are setting up from scratch. The main reason is that many applications that generate TIFF may throw in their own proprietary stuff and lock you into a specific viewer. Also, you cannot do a text search of content in TIFF.

    When you discuss archives you think about looong times. Typically 10 to 50 years of retention with the odd exception where eternity is desired.

    Hence "plain" PDF is probably even worse than TIFF. One problem is the resources (fonts) and references (http links) that a document depends on, which are mostly left out in order to save disk space. The other problem is that there are so many "plain" PDF versions to choose from and none of them will last 10 to 50 years.

    However, PDF is a good technology and therefore the PDF/A standard was developed. It is designed especially to deal with loooong term issues, is currently readable through almost any PDF reader and will be maintained by most sensible PDF readers for the years to come. There is NO vendor lock-in, and you can put text in a PDF/A document and run searches against it. But most importantly, NO proprietary stuff can be shoved in, as it would result in an invalid document (a PDF document maybe, but not a PDF/A document).

    With the price of current disk space you should NOT make file size a defining criterion in your archiving policy. Only on z/OS disk space comes at absurd and ridiculous prices. If you can, try aiming for an archiving solution on Unix, Linux or even Windows.

    I am in the archiving business. At the moment PDF/A is the only format suitable for archiving.

    --

    I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
  6. Re:Are these things images or documents? by MBGMorden · · Score: 4, Informative

    Multi-page TIFF is well supported in the industry. Better supported than PDF in some cases. Our records management (in addition to keeping electronic scanned copies) still insists on having a microfilm copy of all of our retained documents. We can send digital copies to a processing company to have them processed, but they don't accept PDF documents - only TIFFs (multi-page is acceptable). Given that our internal document management is all in PDF, I ended up having to find a program to convert all of that information about a year ago (though the name of the program we ended up using escapes me - I wouldn't recommend it anyway, since it crashed for me very frequently).
    --
    "People who think they know everything are very annoying to those of us who do."-Mark Twain