Aussie Government Gives PDF the Thumbs Down
littlekorea writes "The central IT office of the Australian Government has advised its agencies to offer alternatives to Adobe's Portable Document Format to ensure folks with impaired vision are able to consume information on the Web. A Government-funded study found that PDFs can present themselves as image-only files to screen readers, rendering the information contained within them unreadable for the vision impaired."
A thumbs down in the southern hemisphere is the same as a thumbs up in the northern hemisphere, as long as you name the file bruce.pdf. It saves confusion.
Given the number of times government officials around the world have failed to understand the difference between removing text in a PDF and replacing it with black and just covering the text over with black, they'd probably get it wrong about half the time even with best intentions.
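To make the difference concrete, here is a toy sketch. It assumes a page is just a list of drawing objects; the dictionary layout and the function names are made up for illustration and are not any real PDF library's API. Covering text adds a rectangle on top, while true redaction takes the text object out of the file.

```python
def extract_text(page):
    """Collect every text object on the page, ignoring shapes drawn over it."""
    return " ".join(obj["text"] for obj in page if obj["type"] == "text")


def black_box(obj):
    """Hypothetical helper: a filled black rectangle at the object's position."""
    return {"type": "rect", "x": obj["x"], "y": obj["y"], "fill": "black"}


def cover_with_black(page, target):
    """The common mistake: draw a box over the text but leave the text in place."""
    covered = list(page)
    for obj in page:
        if obj["type"] == "text" and obj["text"] == target:
            covered.append(black_box(obj))
    return covered


def remove_text(page, target):
    """Proper redaction: swap the text object itself for a black box."""
    return [
        black_box(obj) if obj["type"] == "text" and obj["text"] == target else obj
        for obj in page
    ]


page = [
    {"type": "text", "text": "Informant:", "x": 50, "y": 700},
    {"type": "text", "text": "Jane Doe", "x": 120, "y": 700},
]

print(extract_text(cover_with_black(page, "Jane Doe")))  # Informant: Jane Doe
print(extract_text(remove_text(page, "Jane Doe")))       # Informant:
```

Both versions look identical when rendered, but only the second survives a copy-and-paste.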
Other than plain text, are there really many alternatives that don't come with their own difficulties? The only other options I can see out there at the moment are ePub, simplified HTML or RTF, but of course they all fall short of the possibly desired 'fancy formatting'.
As someone will likely also mention, why not just mandate that the PDF contents are actually text, as opposed to images (which is annoying to anyone!).
That is the case with badly done PDFs where pages are rendered as images. PDFs made via the Office plugin, OpenOffice, or any other proper authoring package at the default settings have the text present and the fonts embedded, so they should work fine as far as accessibility goes.
How about enforcing some computer literacy on document publishers instead?
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Learn how to operate another program? Spend from the budget for another set of licenses? (the horror)... start to use Open Office or the like?
Questions raise, answers kill. Raise questions to stay alive.
ISO already has created the standardized PDF/X subsets used widely in the publishing industry. They lack support for extra features like scripting and other extensions.
The main problem with PDF for document archives is that it is a presentation format and doesn't adequately preserve text structure since everything is broken down into lines of text or individually placed glyphs. Analysis of a page layout can only bring back so much. There are better ways to store data that offer more versatility.
I am becoming gerund, destroyer of verbs.
Look at this page. It's for a local police department in a city that has lots of blind people because of the presence of the California School for the Blind. This is the first page that Google lists for the site. I can't imagine that a screen reader can make anything of the front page and there are no navigation buttons.
The real "Libtards" are the Libertarians!
Missing from the statement is what the preferred format is.
I would expect a Microsoft format from our illustrious leaders.
Reads like a fairly dumb statement, which is what I always expect from our government.
Sounds like a lead-up to them locking themselves (us) into using a proprietary, expensive, unusable system.
Who, me, negative? Yep.
Go well
No it doesn't sound like a bozo official since that style of pdf was specifically excluded from the user study they ran.
You could of course skim the report and know that, but I guess that would mean you couldn't launch into meaningless rants.
Of course, if you did that you'd know the report is available in PDF format, which I guess would just launch you on a different meaningless rant.
XML to the rescue!
Sleep your way to a whiter smile...date a dentist!
And the rest of us say "Get rid of it". We do not access government documents to be blown away by their totally rad page style. We access them for information, and extracting the information from the glumph that encases it is sometimes hard for the best of us.
HTML all the way. Any formatting you cannot fit in a simple stylesheet can get left out.
Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
Also consider pdfs with complex page layouts. Deciphering the text flow from them is often hard for eyeballs, let alone computers.
2 columns is enough to throw out many screen readers.
Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
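A rough sketch of why two columns trip up extraction. It assumes each line of text arrives as an (x, y, text) tuple in PDF-style coordinates where larger y means higher on the page. The function names and the `gutter_x` parameter are hypothetical; a real tool would have to infer where the gutter is.

```python
def naive_reading_order(lines):
    """Read top to bottom, then left to right, with no notion of columns."""
    ordered = sorted(lines, key=lambda line: (-line[1], line[0]))
    return [text for _, _, text in ordered]


def column_aware_order(lines, gutter_x):
    """Read the whole left column first, then the right one.

    gutter_x is assumed to be known in advance, which is exactly the
    information a presentation format does not record.
    """
    def top_down(line):
        return -line[1]

    left = sorted((l for l in lines if l[0] < gutter_x), key=top_down)
    right = sorted((l for l in lines if l[0] >= gutter_x), key=top_down)
    return [text for _, _, text in left + right]


page = [
    (50, 700, "The quick brown"),
    (50, 680, "fox jumps."),
    (320, 700, "A second story"),
    (320, 680, "starts here."),
]

print(naive_reading_order(page))
# ['The quick brown', 'A second story', 'fox jumps.', 'starts here.']
print(column_aware_order(page, gutter_x=300))
# ['The quick brown', 'fox jumps.', 'A second story', 'starts here.']
```

The naive order splices the two stories together line by line, which is roughly what a listener hears when the file carries no reading-order information.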
You do know that in Australia it is law that a company make its website accessible to the vision impaired if at all possible?
...
What does it matter that they can't read the text? PDFs aren't about content, they are about preserving the layout. At least that is what it seems like to me when I am foolish enough to try and read PDFs on a device with a different number of pixels than the person who made the PDF file.
If the content matters at all, someone should invent a technology that allows text to be tagged somehow with indicators of the MEANING of that portion of text, like 'this is a title', and let the display device render the text according to how the reader can best view it. It sounds crazy, and it may take a few decades to do, but think of the benefits.
They whose government reduces their essential liberties for temporary security, receive neither liberty nor security.
Yes it is, these shouldn't be features, it should be simple for a text-speech program to follow without having some tacked on standard that you now have to expect everyone to follow.
The layout should complement the data, not vice versa. If you have to think for one second "will my document be accessible to the vision impaired?" then that is one second more than it should be. If you type three columns of text in a continuous flow, a reader should be able to read it back as such without you having to go over it later and mark it up.
...
I expect they could require that all they wanted, and it still wouldn't happen.
If my usability manuals are to be believed, people have neglected the safeties of nuclear reactors because those things are a chore and do nothing anyway. If you don't want your users to do something, then you design your system so that they never get the option.
I don't understand the comparison between websites and PDFs. Graphical text banners, or images that contain text, are perfectly acceptable under WCAG, as long as alt text or long descriptions are used correctly. And if a PDF is correctly created, then its text can easily be read by a screen reader.
So basically they are saying that *because* it is possible to produce a shoddy PDF file which is basically an image dump, that this is reason enough not to use the format?
By this same reckoning, you could produce a really shoddy HTML page which also consists of images and no text... Virtually any format could be misused in this way.
So what's the alternative? That we all revert back to ASCII text since it's incapable of holding graphics?
Personally I hate seeing poorly designed websites or PDF files as I described here, where the text is actually an embedded image (or worse, a Flash file) and there is no clickable index etc.
We should probably start naming and shaming pdf creation software, and those who use (or misuse) such tools.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
No, that would be analogous to allowing PDF, but requiring the text portions actually be text.
And that would actually be reasonable.
Don't thank God, thank a doctor!
Are you talking about modifying existing pdf files, or simply creating new ones?
OpenOffice/LibreOffice has a PDF Import extension which does a pretty good job of editing. I also found, via a very quick Google search, a pdfedit program on SourceForge: http://sourceforge.net/projects/pdfedit/
As for creating PDF files, there are countless programs for doing that: OpenOffice, pdflatex, virtually anything that can print to PostScript combined with ps2pdf, etc.
Sure, HTML is preferable to PDF for web content, but PDF is a pretty good format when used appropriately.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Working as a web developer for the Canadian gov't, we had some similar rules for content. Mainly, you always had to provide it in the most accessible form possible. This usually meant HTML > PDF > Office Document. However, it was always on a best effort/convenience basis. So, if you posted PowerPoint slides, you also had to post the PDF versions, since making a PDF version was dead simple. However, we certainly weren't required to go all out and make a usable HTML version as well.
We also offered many things (e.g. transcription or translation) on an "as requested" basis, since technically we were supposed to offer them, but realistically we didn't have the budget to do it for everything. This worked well.
I think just flat out banning PDFs is stupid. Require accessibility (best-effort), but allow for wiggle room. Yeah, it would be great if all PDFs had real text in them, but if the choice for some gov't agency is to either post an inaccessible version of the document or post nothing at all (because the time/cost required to make it accessible is too high), then they should be able to post the inaccessible version.
Most times I follow a link and discover the content is PDF, I give it a pass. If you want to publish on the web, use HTML.
And if you *truly* want to ensure it *always* looks the same *everywhere*, you use PDF
antipaucity
To make the documents accessible, they will need to create them in such a way that the screen reader can read the text for the blind person. Believe it or not, extracting the text contents from a PDF file is actually a very non-trivial problem. Mostly the problems are caused by PDF authoring tools that render each glyph separately. The text extractor then has no idea which characters belong to each line and has to guess based on the baseline of each character. Another problem is non-ASCII characters and how the authoring tool decides to render them. The venerable free software tool pdflatex uses composite characters (basically it renders multiple glyphs on top of each other), which makes it impossible to accurately extract the text.
So no, it is not about stupidity or bad Microsoft software. PDF just is unsuitable for accessible documents.
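For anyone wondering what "guessing based on the baseline" looks like, here is a minimal sketch. It assumes each glyph arrives as a character plus an x position and a baseline y coordinate; the `Glyph` class, the function name, and the `tolerance` value are illustrative assumptions, not taken from any real extractor.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float         # horizontal position of the glyph
    baseline: float  # y coordinate of its baseline (larger = higher on the page)


def group_into_lines(glyphs, tolerance=2.0):
    """Guess text lines by clustering glyphs with nearly equal baselines."""
    lines = []  # each entry is [reference_baseline, [glyphs on that line]]
    # Walk down the page; a glyph joins the current line if its baseline
    # is within `tolerance` of that line's first glyph.
    for glyph in sorted(glyphs, key=lambda g: -g.baseline):
        if lines and abs(lines[-1][0] - glyph.baseline) <= tolerance:
            lines[-1][1].append(glyph)
        else:
            lines.append([glyph.baseline, [glyph]])
    # Within each guessed line, order glyphs left to right.
    return [
        "".join(g.char for g in sorted(members, key=lambda g: g.x))
        for _, members in lines
    ]


plain = [Glyph("H", 10, 700), Glyph("i", 16, 700.5),
         Glyph("o", 10, 680), Glyph("k", 16, 680)]
print(group_into_lines(plain))   # ['Hi', 'ok']

# A superscript sits on a raised baseline, so the guess breaks down.
squared = [Glyph("x", 10, 700), Glyph("2", 16, 705)]
print(group_into_lines(squared))  # ['2', 'x']
```

Raising the tolerance rescues the superscript but starts merging tightly spaced lines, which is why this kind of extraction stays a heuristic.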