Aussie Government Gives PDF the Thumbs Down
littlekorea writes "The central IT office of the Australian Government has advised its agencies to offer alternatives to Adobe's Portable Document Format to ensure folks with impaired vision are able to consume information on the Web. A Government-funded study found that PDFs can present themselves as image-only files to screen readers, rendering the information contained within them unreadable for the vision impaired."
Couldn't they have just required that the text portions of PDF files are actually text?
A thumbs down in the southern hemisphere is the same as a thumbs up in the northern hemisphere, as long as you name the file bruce.pdf. It saves confusion.
So can a webpage, or a Word document.
I suppose a pure text file cannot, but at the expense of other meta-data. Why not require PDFs to have word position OCR done (part of Acrobat Pro, so hardly a chore), and keep info like page number and position on page for scans. For non-scans it would take effort to destroy the text data.
Hell, even in ASCII I could use something like figlets to generate large letters (for easy reading), and destroy accessibility.
This sounds like some bozo official had a scanned hard copy in PDF, ran into trouble, and blamed the format (even though the format has a good way to handle the situation built in) rather than the other bozo who scanned it and didn't use the built-in OCR function. I'm pretty sure these people would do the same with HTML, OOXML or ODF; it's not the format's fault.
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Other than plain text, are there really many alternatives that don't come with their own difficulties? The only other options I can see at the moment are ePub, simplified HTML or RTF, but of course they all fall short of the possibly desired 'fancy formatting'.
As someone will likely also mention, why not just mandate that the PDF contents are actually text, as opposed to images (which is annoying to anyone!).
That is the case with badly done PDFs where pages are rendered as images. PDFs produced via the Office plugin, OpenOffice or any other proper authoring package at the default settings have the text present and the fonts embedded instead, so they should work fine as far as accessibility goes.
How about enforcing some computer literacy on document publishers instead?
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Possibly not a bad thing given the vast number of security flaws and exploits that PDF has been hit with, especially over the last few years.
I really like PDF's ability to retain the font and display of the document without worrying about fonts and the application.
Since I have to distribute documents that are read on a variety of systems, including Linux, OSX, iPhone/Pad and Windows, PDF really beats all other alternatives in compatibility.
Adobe should really work on creating a text/image-only version of PDF without their fancy password protecting features and what-not.
If they don't, perhaps an open source group can take on the challenge.
Look at this page. It's for a local police department in a city that has lots of blind people because of the presence of the California School for the Blind. This is the first page that Google lists for the site. I can't imagine that a screen reader can make anything of the front page and there are no navigation buttons.
The real "Libtards" are the Libertarians!
Missing from the statement is what the preferred format is.
I would expect a Microsoft format from our illustrious leaders.
Reads like a fairly dumb statement, which is what I always expect from our government.
Sounds like a lead-up to them locking themselves (us) into using a proprietary, expensive, unusable system.
Who, me, negative?
Yep.
Go well
Why not try looking at the study before jumping to your conclusion?
And the rest of us say "Get rid of it". We do not access government documents to be blown away by their totally rad page style. We access them for information, and extracting the information from the glumph that encases it is sometimes hard for the best of us.
HTML all the way. Any formatting you cannot fit in a simple stylesheet can be left out.
Prediction for end of Universe #42: Fencepost error in Quantum_bogosort.cpp
Also consider pdfs with complex page layouts. Deciphering the text flow from them is often hard for eyeballs, let alone computers.
2 columns is enough to throw out many screen readers.
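To make the two-column point concrete, here is a toy Python sketch of why naive text extraction scrambles columns. The `Fragment` type, the coordinates, and the single fixed gutter position are hypothetical simplifications; real PDF readers and screen readers rely on tagged structure or their own layout analysis, not on this code.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A positioned text fragment, roughly what an untagged PDF text layer offers (hypothetical)."""
    x: float  # left edge, in points
    y: float  # top edge, in points (larger y = further down the page)
    text: str


def naive_order(frags):
    # Read strictly top-to-bottom, then left-to-right across the whole page width.
    return [f.text for f in sorted(frags, key=lambda f: (f.y, f.x))]


def column_order(frags, gutter_x):
    # Assumes exactly one known gutter: finish the left column before starting the right one.
    left = [f for f in frags if f.x < gutter_x]
    right = [f for f in frags if f.x >= gutter_x]
    return naive_order(left) + naive_order(right)


page = [
    Fragment(50, 100, "The quick brown"),
    Fragment(50, 115, "fox jumps over"),
    Fragment(320, 100, "Meanwhile, in the"),
    Fragment(320, 115, "second column..."),
]

print(" ".join(naive_order(page)))                 # lines from both columns interleave
print(" ".join(column_order(page, gutter_x=300)))  # each column is read in full
```

The naive reading order jumps across the gutter on every line, which is the garbled flow a listener hears. The column-aware version only works because the gutter is known in advance; finding it automatically on arbitrary layouts is the hard part.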
Why are you assuming I didn't review the study? I did, and again, the conclusions are deeply flawed. The appropriate course of action would be to institute improved policies for the production of documents that appear in PDF format for general consumption. Once again, the file format itself is not the problem.
The key problem that the majority seem to be overlooking here is that the people affected by this are disabled (mostly the blind). 1) You're either blind or mostly blind; that is pretty bad, and life has already given you the short end of the stick. Screens are designed to be read; this is a fact. 2) You're blind and thus probably not the most computer-savvy person, so you're probably getting your friend's son to fix this up for you by installing software meant to deal with it. 3) The tools made to help these people are not very well made; most just provide magnification or text-to-speech. 4) The office worker who takes a written document, scans it on the office scanner and then puts the result on the web is not thinking about the poor blind barstard who can't see it; it's not in their job description.
I'm pretty sure that 90% of all documents on the internet need nothing more fancy than RTF encoding or even a very simple set of BBCode tags to be usable. I know PDFs are supposed to have tons of features but why not just be simple and stick with ASCII?
Hire me...
What does it matter that they can't read the text? PDFs aren't about content, they are about preserving the layout. At least that is what it seems like to me when I am foolish enough to try and read PDFs on a device with a different number of pixels than the person who made the PDF file.
If the content matters at all, someone should invent a technology that allows text to be tagged somehow with indicators of the MEANING of that portion of text, like 'this is a title', and let the display device render the text according to how the reader can best view it. It sounds crazy, and it may take a few decades to do, but think of the benefits.
They whose government reduces their essential liberties for temporary security, receive neither liberty nor security.
The Aussie government failed to recommend a standard that supplants PDF in a way that handles all the cases one would expect it to handle. So what's the point of this exercise by the Oz gov't, other than basically saying without words, 'we should publish everything in XML documents, since at least those can be parsed to some degree'?
You know, there should be an industry-standard sheet of paper (Letter/A4) that meets the JAWS difficulty test, much in the same way there are test HTML pages that test web browser compliance with HTML 1.1/5.0.
Needless to say, blind people already have solutions for reading printed text that is not braille. Print the PDF and then scan it back into OCR-to-speech software. I'm sure someone by now has invented the OCR-capable print driver that eliminates the need to print to paper to reach the step of reading scanned paper.
Create a PDF document that has radially-printed text, "The green fox slept and fellated the brown dog." printed in a straight line, then printed in a spiral, and then printed upside down.
Then for Hebrew and Arabic (RTL languages), the same type of sentence... printed in RTL in various configurations.
Then the newsprint column layout, etc. etc. etc.
Point JAWS at the PDF, or use the PDF reader's built-in speech interpretation, and let PDF vendors apply for certified compliance from the accessibility software industry.
Problem solved.
Yes it is. These shouldn't be features; it should be simple for a text-to-speech program to follow without some tacked-on standard that you now have to expect everyone to follow.
The layout should complement the data, not vice versa. If you have to think for one second "will my document be accessible to the vision impaired?" then that is one second more than it should be. If you type three columns of text in a continuous flow, the software should be able to read it back as such without you having to go over it later and mark it up.
...
Who writes these idiotic gloom-and-doom headlines? I truly hate misleading BS like this!!!!
Remember the Sydney Olympic Games website being non-readable?
Did they learn anything? Nooooo.
And many .gov.au sites still depend on IE6; they are frozen to a defunct standard, with applications standardized around a 17" LCD monitor's resolution.
The Australian AG's office password-protects and bitmaps nearly all its corro to its clients for the sole reason of making things harder. Brain dead.
This is forgetting all the very real and stark security holes associated with PDFs and Adobe.
Now some have gone a step further and sharepointed things.
The ANAO (Audit Office) should simply go around and give departments an 'F' for disability considerations and substandard policy setting.
PDF's main goal is to make sure that a document always *looks* the same(if you have eyes that can look). But what's the point of that? Who cares about the precise graphic layout? Most PDFs that we encounter could have served their purpose better by being HTML documents. For gov documents, it's highly unlikely that they contain complex math equations that require careful layout.
Well, now here's a rich story. A story about lack of accessibility...on Slashdot. Surely this site is highly qualified to criticize others.
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Portable Document Format... Format.
It's not the format that's wrong, it's the users. We in Poland have the same problem with government documents. Those morons write documents in MS Word, then print them, then scan the printed document and embed the scanned image in a PDF. PDF *can* contain and preserve the content as text, with format and layout. The user who chooses to misuse it is the problem.
I like ;)
The authors of the report say as much in their summary:
And while both the article summary and the report itself stress the need to provide alternate formats alongside (or in place of) PDF, the full report is scant on details or comparative tests of other formats. HTML and RTF seem decent options, as they permit some text formatting options (but are not wedded to them) and are platform-independent. But when you start adding graphics to the mix (as sometimes must happen) their portability tanks. They also cannot prevent the same problem that plagues PDFs: when some dipshit just scans a document and spits out an image-only file.
(PS - would it have killed the submitter and editors to link to the main report page, rather than only to a second-hand link from ITNews Australia?)
So basically they are saying that *because* it is possible to produce a shoddy PDF file which is basically an image dump, that this is reason enough not to use the format?
By this same reckoning, you could produce a really shoddy HTML page which also consists of images and no text... Virtually any format could be misused in this way.
So what's the alternative? That we all revert to ASCII text since it's incapable of holding graphics?
Personally, I hate seeing poorly designed websites or PDF files like the ones I described here, where the text is actually an embedded image (or worse, a Flash file) and there is no clickable index, etc.
We should probably start naming and shaming pdf creation software, and those who use (or misuse) such tools.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Yeah, it's not like regular browsers could display XHTML.
Because the case you stated was the one they explicitly excluded, so either you didn't review it or you are just trying to confuse things on purpose.
That's the one they chose? It wasn't the gaping security holes, the incessant patch requests (that are never even 6 steps behind the security holes) or the laborious installation/upgrade process? I'm sorry, I know blind people have it tough on the internet, but this is really the dumbest of the reasons I could imagine for switching away from a nearly universally accepted format.
Why not just have the great Google automagically OCR any images it finds in PDFs and generate a vision-impaired-friendly version of the PDF?
It can then append a footer to each page stating "the creator of this PDF is a Google-certified nimrod".
(I've always found it a bit galling that some paper catalog companies I've dealt with thought it reasonable to create a web presence by posting PDFs with scans of each page of their physical catalog. Good luck searching through that!)
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
Vanilla HTML is a much better answer. Let the reader control the format: separate the markup from the content, and let the reader control the fonts, how emphasis displays, even link colors. Or move one step forward and use (basic!) CSS. PDF is overweight, slow, seriously buggy, can lock content, and is not available for all platforms. HTML readers are ubiquitous, fast, highly compressible and wide open. Heck, I can display and edit a basic HTML file, formatted nicely according to the HTML, on my 1970s-era 64k 6809 machine using a text-based terminal. Now that is good compatibility! And it didn't take long to write, either. Try supporting PDF in a 64k environment. Good luck.
I have never allowed PDF to be used as any form of outgoing documentation for our products; and I've never regretted that decision.
I've fallen off your lawn, and I can't get up.
There isn't one good reason in the entire world to make sure a document "looks the same everywhere."
What we need is that the document is (1) readable, (2) orderly, and (3) conforms to the reader's needs.
When you have someone with poor vision, you don't want some tiny font used for anything, and zooming the page blows the context right out the window. The reader needs to be able to set the font, and the color(s), and the link colors, if any, and the document width, and quite a few other things.
PDF is unfriendly and the very idea that the author has to set the absolute look of the document reeks of elitism, misplaced "artistic" intent at the expense of readability and usability.
And then there is editing -- a document you can't edit and/or annotate is crippled -- and PDF encourages this unfriendly behavior.
The ideal solution at this point in time is, has been, and is likely to remain, HTML, which resolves every one of those critical problems.
It's hard to read when you're wasted, right Australia?
Isn't the web a visual medium and not suited to the blind? What is going to happen to audio files for the deaf? What about big words for those without education? This is a very precarious and slippery slope. Once you bend over backwards for one minority you have set a precedent. Don't think that this won't be used to help destroy the net as we know it today. It won't be long before your blog won't be allowed until it has been thoroughly scanned by software to determine its PC friendliness. Your birdwatching blog won't be allowed as it isn't available to the blind, or maybe your heavy metal appreciation blog either, as it isn't deaf friendly. If you haven't worked out that Net Neutrality isn't what you should be worried about, then you aren't paying attention. It's the PC police you need to worry about.
The new right fascists are bilingual. They speak English and Bullshit.