Adobe Pushing For Flash and PDF In Open Government Initiative
angryrice tips news that Adobe seems to be campaigning for the inclusion of Flash and PDF in the Obama administration's efforts at increasing government transparency and openness. A post from the Sunlight Labs blog is critical of Adobe's undertaking, in part since PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted." They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I don't believe this is true - I find PDF documents in search results all the time. The consistency and reliability of PDF for forms creation has no real competition. If you hate Adobe, ok, but don't hate PDF 'cause it's beautiful...
Ask Me About... The 80's!
I am so sick of flash-based interfaces. Augh.
Nobody likes Flash, and they probably shouldn't use it for anything. But there's not much wrong with PDF, if it's done right. When publishing something, one could offer "source" (some sane, machine-readable format) and PDF (autogenerated from the source, and prettified for easier reading).
PDF shouldn't be used as a way to encapsulate scanned JPEGs and pretend they're a real electronic document.
I would also note that many of the complaints about PDF as a format in TFA are really complaints about Adobe's abysmal PDF reading software. For example, the concern about the visually impaired: KDE's Okular does speech synthesis and has a high-contrast mode.
# cat
Damn, my RAM is full of llamas.
The future is ODF (a real open XML) and of course PDF, but especially HTML5 + JS + canvas + SVG + Ogg Vorbis/Theora for rich web content.
With the kind of technology the new browsers bring to the arena, Adobe is getting scared!
Why am I not surprised?
Perhaps you know of a document format where the text in images IS searchable?
The point being about open government, freedom of information and accessibility of data.
PDF carries the information you're looking for in a less convenient form, which more to
the point is frequently derived from a source that could easily provide a more convenient
form of access. Sunlight Labs puts it this way:
We can turn XML into PDFs. We can't turn PDFs into XML.
We've had a couple of decades now since SGML, a bunch of progress on its derived and
related technologies, but 'downgrading' the publicly accessible format to PDF seems
like a step in the wrong direction simply to make things look nice and be a
no-brainer to publish.
Flash should only be considered if the government can mandate that Adobe provide and competently maintain a Flash player of comparable quality for all major desktop, mobile, and handheld OSes and platforms. The alpha-quality Flash player for 64-bit Linux sucks donkey balls while Windows gets star treatment. Open source would be another plus, but right now I'd settle for a 64-bit Linux binary that didn't crash my browsers constantly.
"Liberty may be endangered by the abuses of liberty as well as the abuses of power." -- James Madison
Way to go to convince government and its constituents that Flash and PDF will help them put together open websites and follow "ADA Guidelines for the Web" aimed at ensuring accessibility...
They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I felt a great disturbance in the Force, as if millions of IT workers suddenly cried out in terror, and were suddenly silenced.
I record my sleeptalking
Campaigning against PDF in any way might effectively equate to implicitly campaigning for Microsoft's XML Paper Specification (XPS)
GP is right. Government should focus on what government is actually needed for, such as determining standards for formats that everyone can use, with input from academia and industry. For example, a human-readable, parsable format that one could embed in a web page for semantic metadata. Or funding open source software to make it easy (cross-platform) to input such data (I am thinking of information about cited papers or books). Typeset information is nice, but we are already drowning in information: how many pages of Google results do you usually look at? And we need help before generating 10 times as much.
Why PDF is bad:
- It is a portable typeset document package, not a data sharing package that tools could pull apart easily and automatically.
- PDF is extremely hard to parse, and using current free software does not always give good results.
- You destroy useful document structure (or, in the case of ASCII text, parsability and small size) when you convert to PDF. You can't just convert back to the original.
- As far as I can see, it takes significant processing power and commercial software to display well and reliably. Having just gotten the latest Mac I feel like I'm in a dauntless battleship, but I have had a lot of trouble with different Unix tools in the past.
- Scientists publish PDF too, but then also use other formats for data. For example, on arXiv one scientist recently published animations inside a zip, but it was hard to find the link.
- It is difficult to manage bibliographic information automatically.
- It is proprietary
- It requires a huge amount of data, and arcane knowledge, just to build a parser that works most of the time (such as for Asian languages especially).
PDF has become a de facto standard like GIF, so I think it's an okay idea to embrace its usage, but only if PDF is open-licensed to all. Otherwise tell Adobe "no".
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Have these people not heard of Google? Just because YOU can't write software to parse PDF files doesn't mean that nobody else can and that it doesn't already exist.
If you are publishing a document that can be printed then PDF is a good format. If you expect people to extract data from the document then you should look for a different format. It depends on the purpose of posting the document on the web.
I am OK with PDF. I would RATHER see documents in plain HTML, but there are times when formatting is important. In those cases, if it is to be read/print-only, PDF is the way to go. Otherwise, the gov should use ODF.
But Flash? Are you kidding? The last thing on earth we need is more Flash.
* Does not work on all devices
* Slow and/or consumes tons of CPU
* Consumes tons of RAM
* Consumes more bandwidth
* Makes it difficult or impossible to cut and paste
* Impossible to "search/find"
* Violates the native UI look and feel
* Fonts and font sizes are uncontrollable by the end user
* Can't scroll correctly much of the time
* Almost completely proprietary
* Rarely adjusts to screen size
* Often introduces extremely irritating animation.
* Doesn't allow text to be "seen" by the browser (or OS), making other plugins (like a screen reader) 100% useless
At least that SilverDark stuff isn't even on the radar- thank God for little favors.
Microsofters are not IT workers. Political activists, maybe. Certainly they are not IT.
The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
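To make the "easy one way, hard the other" point concrete, here is a minimal sketch in Python. The agencies and amounts are invented for illustration, and the fixed-width table stands in for the typeset page a PDF would freeze. Going from the CSV to the pretty view takes a few lines; recovering the CSV from the rendered page means guessing column boundaries.

```python
import csv
import io

# Hypothetical budget figures; agency names and amounts are invented.
BUDGET_CSV = """agency,fiscal_year,amount_usd
Parks,2009,1200000
Roads,2009,3400000
Libraries,2009,560000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting amounts to integers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount_usd"] = int(row["amount_usd"])
    return rows

def render_text_table(rows):
    """Render rows as a fixed-width table, standing in for a typeset page."""
    lines = [f"{'Agency':<12}{'Year':<6}{'Amount (USD)':>14}"]
    for row in rows:
        lines.append(
            f"{row['agency']:<12}{row['fiscal_year']:<6}{row['amount_usd']:>14,}"
        )
    return "\n".join(lines)

rows = load_rows(BUDGET_CSV)
total = sum(row["amount_usd"] for row in rows)
print(render_text_table(rows))
print("Total:", total)
```

Anyone who wants a chart, a PDF, or a different total can start from the same three CSV rows; nobody has to start from the rendered table.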
If the goal is to make the data available, then even CSV would be a better option than PDF. PDF, while pretty, is a terminal format and is the digital equivalent of a mayfly. It's paper that hasn't happened yet and when it does it will exist for a few short hours before finding its way to the circular file.
Much of the government data consists of tables and tables of data. gzipped csv would be readable by anyone, so would ODF. Adobe appears to be looking for a handout at the expense of creating a useful and open data system.
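For what it's worth, gzipped CSV needs nothing beyond a standard library in most languages. A rough sketch in Python, with made-up field names and values, that writes rows to a gzipped CSV in memory and reads them back unchanged:

```python
import csv
import gzip
import io

# Hypothetical field names and rows, invented for illustration.
FIELDS = ["agency", "fiscal_year", "amount_usd"]
ROWS = [
    {"agency": "Parks", "fiscal_year": "2009", "amount_usd": "1200000"},
    {"agency": "Roads", "fiscal_year": "2009", "amount_usd": "3400000"},
]

def pack(rows, fields):
    """Write rows as CSV and gzip-compress the result into bytes."""
    buffer = io.BytesIO()
    with gzip.open(buffer, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()

def unpack(blob):
    """Decompress gzipped CSV bytes and return the rows as dicts."""
    with gzip.open(io.BytesIO(blob), "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))

blob = pack(ROWS, FIELDS)
restored = unpack(blob)
print(restored == ROWS)
```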
Put it in context: open government requires data formats that are independent of campaign donors.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Has anyone else noticed that Adobe software is crap, inefficient crap?
Further, the recent PDF specifications add DRM which shouldn't be allowed in government publications. If the govt agrees to use a PDF version that open source software can completely read, parse, and convert, then it is fine PROVIDED the raw data is available in open formats too.
PDF/A is already open. However, that doesn't mean that anyone knows how to produce it, especially some R.O.A.D. staffer or random hourly GS1.
Open or not, PDF/A is a display format and, in most cases, useless for information retrieval or automated data processing. PDF/A is a useful alternative to paper. However, the open government initiative is not talking about paper. It's about 'born digital', machine readable data.
I guess most of you do not realize that Adobe produces SpyWare by default in their own products?
Flash for example has iesnare built into itself. This all allows machine profiling that everyone agrees to when you install their bullshit software!
You can't trust a company that has already done something to make you distrust them.
http://www.adobe.com/accessibility/products/flashplayer/overview.html "With integrated support for Microsoft Active Accessibility (MSAA), Flash Player 10 makes content available via screen access technologies such as Window-Eyes from GW Micro and JAWS from Freedom Scientific."
But it is very bad that Adobe doesn't consider accessibility support to be a "must have" feature for the desktop version of their Flash Player.
Adobe ships Flash/PDF readers/plugins to: Windows, OS X, Symbian (in some form), Linux, *BSD and various, uncountable tiny platforms. iPhone/iPod does not count because of obvious reasons.
Let's see what MS Silverlight ships to: Windows/Intel Mac. The damn thing is so tied to Windows that they couldn't even convert/ship V2 for PPC Macs, or they simply abandoned them. (like we cared!)
The MS XPS format and viewer is the answer to PDF that people who don't use Windows have never, ever heard of. It is that Windows-centric. Despite all the rude attempts by MS (adding XPS printer without etc), it has never, ever taken off.
What we need is something that combines ODF and PDF. You can add a binary file to a PDF document, like a layer. ROM LogicWare, the lesser-known Office (Papyrus) developer, does it right now. The files are both PDF and their own edit format, transparent to PDF readers and NOT a hack.
Of course, people will spend time on "omg flash, pdf, Adobe is slow" flaming rather than finding a solution to a real problem. Asking government to use Flash is really absurd, but the real ones to blame here are MS and the large open-source-based companies. If they have no alternative, Adobe will suggest PDF of course. What else should they use? MS XPS?
If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of them.
It is the famous "Spotlight" technology. They don't even need to look at Google; some of them have the same kind of indexing technology (minus relations) running on their laptops.
One should check TFA's relations with MS. I am sure something will come up.
I work with PDFs a lot, especially on OS X. I am telling you this from an OS where you can have 60 KB 1080p screenshots in PDF in some circumstances: whoever did that "text as image" trick is a complete moron.
One of the reasons PDF took off is exactly that it embeds the fonts used in a document, so it will appear pixel-perfect on client machines.
As a last resort (and a good practice), you can embed the unformatted plain text of the entire PDF in your PDF file. PDF, like QuickTime MOV, is one of those formats where people don't use the features and then bitch about the size of the client etc.
A number of government forms don't work with the free PDF readers.
This is because Adobe broke its own published spec with its LiveCycle product, and by default it saves files that aren't compatible with anything else. It does a great job of forcing you to buy LiveCycle/Acrobat instead of using free tools. The Adobe people will tell you that it speeds up rendering of downloaded data, which I find hard to believe as the files are between 2x and 3x the size of a regular PDF.
The current use of Adobe products for government forms is a nightmare, it seems like a dumb idea to extend it.
I will ask one thing, as you seem to miss why HTML is not considered a print/distro format: "When did we have an embeddable font standard for HTML webpages?" And as for Flash: "Is there a way to have a single file and infrastructure to show embedded videos in HTML5 form?"
They actually suggested that people use the abandoned VP3 format, for God's sake, and the very same people chose TrueType (check why freetype exists) as the font embedding format.
If the data is out there in some easily consumable format then it really doesn't matter whether it's displayed in Flash, PDF or whatever. Choose your flavor of output/display. As the developer of a global app across the entire corporation the most popular output is Excel compatible using either CSV or XML format because people want to do their own thing. Of course, we don't support any of what they do with it. We produce the official results each month and if their numbers don't match ours it's their responsibility to prove otherwise.
Flash is evil for many reasons, but the most in-your-face reason if you use a Mac is that the Mac Flash plugin crashes all the time. It is the #1 (by far) reason for Safari crashes on the Mac.
I'm not wild about PDF, but at least I don't see PDF viewers crashing all over the place.
"...unfindable by search engines..."
That is absolutely not true. Anyone who uses Google knows that the search engine can read PDFs, identify if any of the keywords are located within, and then provide a link both directly to the PDF as well as to an HTML version.
The world moves for love. It kneels before it in awe.
This could be a Good Thing, if it means that the formats will be made and remain open. IIRC, PDF is already an open standard, and supported by various programs from multiple sources. I would applaud it if the same were to happen to Flash. And if both formats are open and widely supported, the government could do a lot worse than using them.
Please correct me if I got my facts wrong.
So there is a partial option for MS-Windows only. Great. Not exactly platform agnostic and open. I suppose it is better than nothing, though.
PDF and Flash are commonly used and extremely powerful. They can be and are used with great success. Your points are valid, but they are not Adobe's fault. Whether people make Flash intro pages or scan JPEGs into a PDF is outside of Adobe's control.
Anyone can create a PDF reader or flash player without paying Adobe a cent.
If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of them.
No it is not. It is indexing every PDF that has text in the metadata. Create a PDF by printing to PostScript and then converting to PDF (the easiest way of creating PDF on Windows or Linux machines) and watch Spotlight completely fail to index it. Spotlight does not index the text that you see when you browse the PDF, because that text is stored as a set of glyph indexes, not as streams of characters.
And if you want some real fun, open up a PDF containing a table in Preview and try to persuade it to copy the table in a way that preserves its structure.
I am TheRaven on Soylent News
So how then does their plan support open government?
Which, imho, would be okay if Microsoft released C (or even C++) code that explicitly included a license (even MS-PL) that extended to cross-platform ports, and ports to other languages.
Could Adobe please bless an AMD64 build for Linux? Manually having to install http://download.macromedia.com/pub/labs/flashplayer10/libflashplayer-10.0.32.18.linux-x86_64.so.tar.gz is a pain. Cross-platform means !EVERYWHERE! Do it or don't.
As for Flash, let's not even go there. Flash is passable as a streaming video container, for animated cartoons like Homestar Runner, or as a platform for small web games, but outside those use cases, you're using it wrong.
Not even movies anymore. Try Divx. The webplayer will buffer correctly, go into fullscreen on doubleclick, and behave like VLC and similar players, inside your webbrowser.
Try this site for free movies after downloading Divx (pretty cool stuff): http://www.freefullmovies.net/
After having tried Divx for movies, you will never want to use flash for that, except maybe youtube, but only because the quality is already horrible.
Pretty surprised I'm the 1st to suggest this combo. Most 'modern' browsers are close to SVG 1.1 now. Google has stated its interest in SVG, hosting this year's svgopen.org, with indexability being a strong draw. Sure, every time you mention an XML format the JSON guys cough up bits. Size is reducible by gzip; XSLT is not 'pretty' but the flexibility will exercise yer greystuff. MarkT ps: Inkscape.org will convert pdf pages to svg nicely.
I'd vote for that as a standard.
No sig today...
Useless is the wrong word. It took 15 lines of python wrapping xpdf for me to get a working system for dumping the transactions out of the last 6 years of my credit card statements.
It's ugly, but it works just fine.
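Something along these lines, as a rough sketch rather than the script described above: it shells out to xpdf's pdftotext with -layout and then screen-scrapes lines with a regex. The line shape (date, description, amount) and the sample text are assumptions made up for illustration; real statements will need their own pattern.

```python
import re
import subprocess

# Assumed line shape: "MM/DD/YYYY  DESCRIPTION  1,234.56". Real statements differ.
LINE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})\s*$")

def pdf_to_text(path):
    """Run pdftotext, keeping the page layout, and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def parse_transactions(text):
    """Pull (date, description, amount) tuples out of extracted text."""
    transactions = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            date, description, amount = match.groups()
            transactions.append((date, description, float(amount.replace(",", ""))))
    return transactions

# Invented sample standing in for pdftotext output.
SAMPLE = """\
Statement period 01/01/2009 - 01/31/2009
01/05/2009   COFFEE SHOP              4.50
01/17/2009   BOOKSTORE            1,023.99
"""
print(parse_transactions(SAMPLE))
```

With a real file you would call parse_transactions(pdf_to_text("statement.pdf")), where the filename is a placeholder. It only works when the PDF actually carries text rather than scanned images.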
That would be because that particular PDF happened to accidentally be wrapping ASCII or ISO-8859 or UTF-8 or UTF-16 instead of some image format. Even then, that was just screen-scraping like can be done with old terminal sessions. It can be done, sometimes.
Keep the data in machine readable formats, not a terminal format like PDF or paper.
When Flash had a few issues a couple months ago, I removed it from my browser. Suddenly, thousands of irritating advertisements and web banners and annoying intro pages of pointless information were blank with only a notice to install flash player.
Remove it from one browser and see if it doesn't make surfing better for you.
Just my 2 cents in regards to public records and data.
I'd like to say that the groups making decisions in this area really should consider an MVC architecture, which would avoid the concerns about display (aka View) technologies raised here on /. and by pundits for open data standards everywhere.
With a Model View Controller methodology and pattern in place it really is not a concern what technology is being used to display data at any given time. If public data is *stored* (Model) and *accessed* (Controller) via open standards then the *display* (View) itself is inconsequential and/or malleable to the extent needed for any purpose.
Flash is great at some things, PDFs are perfect for a variety of tasks. They are, like any other format, not the only useful format available and should never be thought of as the 'archive' or 'final' format. The Model is the archive.
All the government agencies need to do is show that the Model is able to be trans-coded to several other popular storage formats without loss and that should be good enough for anyone. They also need to provide an API for accessing the data regardless of the Model and an output format that is structured and well documented (XML, JSON, SOAP even).
At this point it is the data consumer who should choose what format they would like to visually see it in... PDF, interactive Flash/Flex charts, JSON, Word, HTML, SGML, RTF... does it matter? Not to me or anyone else. I will get to choose the format I'd like it in (XML, JSON or Actionscript Objects please).
If the format doesn't exist yet, there's an API I can use to transcode the data as I see fit.
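A minimal sketch of that "transcode without loss" test, in Python. The records and field names are invented, and JSON and XML are just two example output formats; the point is only that the Model survives a round trip through each of them.

```python
import json
import xml.etree.ElementTree as ET

# The "Model": hypothetical public records; field names are invented.
RECORDS = [
    {"id": "1", "agency": "Parks", "amount_usd": "1200000"},
    {"id": "2", "agency": "Roads", "amount_usd": "3400000"},
]

def to_json(records):
    """Serialize the records to JSON text."""
    return json.dumps(records, indent=2, sort_keys=True)

def from_json(text):
    """Rebuild the records from JSON text."""
    return json.loads(text)

def to_xml(records):
    """Serialize the records to XML, one element per field."""
    root = ET.Element("records")
    for record in records:
        node = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the records from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in node} for node in root]

# Lossless transcoding: each format round-trips back to the same Model.
print(from_json(to_json(RECORDS)) == RECORDS)
print(from_xml(to_xml(RECORDS)) == RECORDS)
```

Values are kept as strings here so that both formats round-trip exactly; typed fields would need a schema on the XML side.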
A fool throws a stone into a well and a thousand sages can not remove it.
Especially on Windows and with the English language, there is no excuse. Every scanner comes with an OCR program, at least for English. I did a 70-page manual translation back when Windows 3.1 was new, so I know.
Of course, there is truly free software too: http://jocr.sourceforge.net/ and, recently brought back to life by Google, http://code.google.com/p/tesseract-ocr/
Even if you are a home user, thanks to Spotlight and the various Windows/Linux local search engines, it is a really good idea to keep text in PDF files.