Adobe Pushing For Flash and PDF In Open Government Initiative
angryrice tips news that Adobe seems to be campaigning for the inclusion of Flash and PDF in the Obama administration's efforts at increasing government transparency and openness. A post from the Sunlight Labs blog is critical of Adobe's undertaking, in part since PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted." They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I have no problem with PDFs, there are a number of free and commercial applications out there that can work with them.
Flash on the other hand is absolutely an abomination that must be wiped from the net. They still haven't released a proper version for *BSD and they commonly don't bother with less popular OSes. If they want it to be used for this sort of purpose then they need to get their act together and make it available for all operating environments on an equal basis. Which I don't think they have the resources to do.
Nobody likes Flash, and they probably shouldn't use it for anything. But there's not much wrong with PDF, if it's done right. When publishing something, one could offer "source" (some sane, machine-readable format) and PDF (autogenerated from the source, and prettified for easier reading).
PDF shouldn't be used as a way to encapsulate scanned JPEGs and pretend they're a real electronic document.
I would also note that many of the complaints about PDF as a format in TFA are really complaints about Adobe's abysmal PDF reading software. For example, the concern about the visually impaired: KDE's Okular does speech synthesis and has a high-contrast mode.
# cat
Damn, my RAM is full of llamas.
but specially html5+js+canvas+svg+ogg vorbis/theora for rich web content.
Who has announced authoring tools for this stack that are anywhere near as capable as even Flash 3, let alone Flash CS4? Say I want to make an animated SVG like the Flash animations I see on Newgrounds. What package should I start with?
PDF remains difficult to manage. Like MS Word documents, an incredible amount of resources is wasted in display information rather than actual text or graphical content. Unlike MS Word, they're parseable: but unfortunately like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools. Oddly, the ghostscript created PDF remains very stable and legible, and tools like "PDFCreator" which uses ghostscript creates long-term viable PDF printouts of other document formats. I use it for complex MS Word documents that cannot be handled by other software, even different versions of MS Word.
Adobe can actually do better with this, and I hope that they will in the future. But it's not stable enough to be reliably indexed or viewable even 5 years in the future, much less 10 or 20 or 100 such as may be needed for legal or historical documents.
Flash, you're quite right. Unless they open up the source, it has no business as yet another document format.
The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
I am TheRaven on Soylent News
GP is right. Government should focus on doing what government is needed for success, such as determining standards for formats that everyone can use, with input from academia and industry. For example a human readable parsable format that one could embed in a web page for semantic metadata. Or funding open source software to make it easy (cross platform) to input such data (I am thinking of information about cited papers or books). Typeset information is nice but we already are drowning in information - how many pages of Google results do you usually look at? And we need help before generating 10 times as much.
Why PDF is bad:
- It is a potable typeset document package. Not a data sharing package that could be pulled apart easily with tools automatically.
- PDF is extremely hard to parse, and using current free software does not always give good results.
- You destroy useful document structure, or in the case of ASCII text parsability and small size, when you convert to PDF. You can't just convert back to the original.
- It takes significant processing power and commercial software to display well and reliability as far as I can see. Having just gotten the latest Mac I feel like I'm in a dauntless battleship, but I have had many trouble with different unix tools in the past.
- Scientists publish PDF too but then also use other formats for data. For example on arxiv, one scientists recently published animations inside a zip but it was hard to find the link
- It is difficult to manage bibliographic information automatically.
- It is proprietary
- It requires a huge amount of data, and arcane knowledge, just to build a parser that works most of the time (such as for Asian languages especially).
I am OK with PDF. I would RATHER see documents in plain HTML, but there are times when formatting is important. In those cases, if it is to be read/print-only, PDF is the way to go. Otherwise, the gov should use ODF.
But Flash? Are you kidding? The last thing on earth we need is more Flash.
* Does not work on all devices
* Slow and/or consumes tons of CPU
* Consumes tons of RAM
* Consumes more bandwidth
* Makes it difficult or impossible to cut and paste
* Impossible to "search/find"
* Violates the native UI look and feel
* Fonts and font sizes are uncontrollable by the end user
* Can't scroll correctly much of the time
* Almost completely proprietary
* Rarely adjusts to screen size
* Often introduces extremely irritating animation.
* Doesn't allow text to be "seen" by the browser (or OS), making other plugins (like a screen reader) 100% useless
At least that SilverDark stuff isn't even on the radar- thank God for little favors.
What are you talking about? The PDF specification has been available as a free download from Adobe with no royalties payable by implementors since PDF was first created. More recently, the PDF/X family of specifications was approved by ISO. These define subsets of the PDF 1.4 specification for different uses (see ISO 15930). There are at least three open source PDF readers that I know of as well as several commercial viewers (Adobe Reader, FoxIt, Apple's Preview, and so on) and numerous tools can generate PDFs.
I am TheRaven on Soylent News
This sort of authoring is easily handled in vi - or emacs - your choice.
If libertarians are so opposed to effective government, why don't they all move to Somalia?
Printing documents created in other language versions of Acrobat. In particular, the Adobe Acrobat for German created documents that were not only unviewable in a normal Acrobat viewer, but when used to "print PDF" for MS Word documents, created documents that actually crashed Windows computers. The Acrobat for Hebrew didn't crash Windows with the printed documents, but was filled with layout errors when rendered even by Acrobat Reader, errors that didn't show up in the Adobe Acrobat tool. Much of this may have been fixed with the latest release, but I'm not spending nor suggesting that my peers overseas spend all the money needed to upgrade.
Getting our colleagues to stop using Acrobat and use _anything else_ to generate their documents, and use PDFCreator to print them as PDF, stabilized the situation enough for us to generate the documents we needed. It didn't provide PDF forms for people to fill out, which was its only flaw.
Just recently I had to look at, and print a few pages from, a PDF document. Knowing where it came from, a corporation that is only very slowly dipping a toe in the water of software other than the big names, I'm sure it was done with Adobe.
Now I don't even have the Adobe Acrobat reader on my system, when I try to install it, the install crashes. But Fedora comes with several other PDF readers, and the default is set to "Evince" which works fine MOST of the time.
But I got this PDF, and one page was a picture of a tax form, and when I tried to print it, the tax form came out as a big black blob - man, does that waste ink! Obviously I killed the print job to try something else. (Just VIEWING this tax form was fine, only printing messed up.)
I remembered using "Xpdf" a while ago, so I tried that, and voila, the tax form printed perfectly. Since I knew there were more tax forms in there, I used Xpdf for the rest of the job.
So here is a case where two different PDF viewers reacted differently to the same PDF file. I think what we need is is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow and get the proper results - and ask the user what to do with files that don't follow it.
And tell Adobe they can either follow the open definition, or stuff it where the sun don't shine!
Teen Angel - a Ghost Story
There is such; Adobe publishes it and makes it freely available on its web site. It's possible your file didn't follow it, but it's more likely your reader wasn't 100% compliant; it's a very complicated specification.