Slashdot Mirror


Aussie Government Gives PDF the Thumbs Down

littlekorea writes "The central IT office of the Australian Government has advised its agencies to offer alternatives to Adobe's Portable Document Format to ensure folks with impaired vision are able to consume information on the Web. A Government-funded study found that PDFs can present themselves as image-only files to screen readers, rendering the information contained within them unreadable for the vision impaired."

6 of 179 comments

  1. Re:A subset of PDF files? by sjames · · Score: 4, Insightful

    Given the number of times government officials around the world have failed to understand the difference between actually removing text from a PDF and merely covering the text over with a black box, they'd probably get it wrong about half the time even with the best intentions.

  2. Re:A subset of PDF files? by wiredlogic · · Score: 4, Informative

    ISO already has created the standardized PDF/X subsets used widely in the publishing industry. They lack support for extra features like scripting and other extensions.

    The main problem with PDF for document archives is that it is a presentation format and doesn't adequately preserve text structure since everything is broken down into lines of text or individually placed glyphs. Analysis of a page layout can only bring back so much. There are better ways to store data that offer more versatility.

    --
    I am becoming gerund, destroyer of verbs.
  3. What about Flash? Check out this site: by whoever57 · · Score: 5, Interesting

    Look at this page. It's for a local police department in a city that has lots of blind people because of the presence of the California School for the Blind. This is the first page that Google lists for the site. I can't imagine that a screen reader can make anything of the front page and there are no navigation buttons.

    --
    The real "Libtards" are the Libertarians!
  4. So the problem is fancy formatting. by robbak · · Score: 5, Insightful

    And the rest of us say "Get rid of it". We do not access government documents to be blown away by their totally rad page style. We access them for information, and extracting the information from the glumph that encases it is sometimes hard for the best of us.

    HTML all the way. Any formatting you cannot fit in a simple stylesheet can get left out.

    --
    Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
  5. Re:Throwing out the baby with the bath water by robbak · · Score: 4, Informative

    Not necessarily. PDF does not preserve text flow. It breaks up paragraphs into lines (or less if kerning has been altered), and places them accurately on the page. If you have a multi-column layout, then a pdf-to-text algorithm (first step in screen reading) is likely to put column-2-line-1 between column-1-lines-{1 and 2}. Best of luck sorting that out.
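
    The interleaving described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fragment list, the coordinates, and the function name `naive_reading_order` are hypothetical, and y is measured downward from the top of the page for simplicity. It is not how any particular pdf-to-text tool is implemented.

```python
# Hypothetical positioned text fragments, as a page might store them: (x, y, text).
# Assumption: y grows downward from the top of the page, and the second
# column sits 5 units lower than the first.
fragments = [
    (50, 100, "col1 line1"),
    (50, 120, "col1 line2"),
    (50, 140, "col1 line3"),
    (300, 105, "col2 line1"),
    (300, 125, "col2 line2"),
    (300, 145, "col2 line3"),
]


def naive_reading_order(frags):
    """Order fragments top to bottom, then left to right, ignoring columns."""
    return [text for x, y, text in sorted(frags, key=lambda f: (f[1], f[0]))]


order = naive_reading_order(fragments)
print("\n".join(order))
```

    With these made-up coordinates, the first line of column 2 lands between lines 1 and 2 of column 1, because the sort only looks at vertical position.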

    --
    Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
  6. Re:Throwing out the baby with the bath water by peppepz · · Score: 4, Informative

    Not necessarily. PDF does not preserve text flow. It breaks up paragraphs into lines (or less if kerning has been altered), and places them accurately on the page.

    This is not true. PDF is capable of preserving text flow if the document contains such information. See this as an example: if you open it in Acrobat Reader and move the text cursor using the down arrow, you'll see it travel correctly among columns and paragraphs.
    No page description format will help if the page has been generated in a broken way: for instance, try extracting text from the tables of an HTML page generated by JavaScript.

    If you have a multi-column layout, then a pdf-to-text algorithm (first step in screen reading) is likely to put column-2-line-1 between column-1-lines-{1 and 2}. Best of luck sorting that out.

    In this case it is the pdf-to-text algorithm that is broken, and it should be fixed.
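
    As a rough sketch of what a less naive extraction step might do, the example below groups positioned fragments into columns by their x coordinate before reading each column top to bottom. The fragment data, the `gap` threshold, and the function name `column_aware_order` are all hypothetical, and this heuristic is only one possible approach under those assumptions. It does not use any flow information stored in the document itself.

```python
# Hypothetical positioned text fragments: (x, y, text), y growing downward.
fragments = [
    (50, 100, "col1 line1"),
    (300, 105, "col2 line1"),
    (50, 120, "col1 line2"),
    (300, 125, "col2 line2"),
    (50, 140, "col1 line3"),
    (300, 145, "col2 line3"),
]


def column_aware_order(frags, gap=100):
    """Group fragments into columns by x position, then read each column top to bottom.

    `gap` is an assumed threshold: a jump in x larger than this starts a new column.
    """
    columns = []  # each inner list holds the fragments of one column
    for frag in sorted(frags, key=lambda f: f[0]):
        # Compare against the x of the last fragment placed in the current column.
        if columns and frag[0] - columns[-1][-1][0] <= gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])

    ordered = []
    for col in columns:
        ordered.extend(text for x, y, text in sorted(col, key=lambda f: f[1]))
    return ordered


result = column_aware_order(fragments)
print("\n".join(result))
```

    A fixed threshold like this would break on indented text, tables, or uneven column widths, which is why flow information carried in the document itself is the more reliable source when it is present.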