Adobe Pushing For Flash and PDF In Open Government Initiative
angryrice tips news that Adobe seems to be campaigning for the inclusion of Flash and PDF in the Obama administration's efforts at increasing government transparency and openness. A post from the Sunlight Labs blog is critical of Adobe's undertaking, in part since PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted." They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I don't believe this is true: I find PDF documents in search results all the time. The consistency and reliability of PDF for forms creation has no real competition. If you hate Adobe, OK, but don't hate PDF, 'cause it's beautiful...
Ask Me About... The 80's!
Nobody likes Flash, and they probably shouldn't use it for anything. But there's not much wrong with PDF, if it's done right. When publishing something, one could offer "source" (some sane, machine-readable format) and PDF (autogenerated from the source, and prettified for easier reading).
PDF shouldn't be used as a way to encapsulate scanned JPEGs and pretend they're a real electronic document.
I would also note that many of the complaints about PDF as a format in TFA are really complaints about Adobe's abysmal PDF reading software. For example, the concern about the visually impaired: KDE's Okular does speech synthesis and has a high-contrast mode.
# cat
Damn, my RAM is full of llamas.
The future is ODF (a truly open XML format) and of course PDF, but especially HTML5 + JS + canvas + SVG + Ogg Vorbis/Theora for rich web content.
With the kind of technology the new browsers bring to the arena, Adobe is getting scared!
Perhaps you know of a document format where the text in images IS searchable?
Flash should only be considered if the government can mandate that Adobe provide and competently maintain a Flash player of comparable quality for all major desktop, mobile, and handheld OSes and platforms. The alpha-quality Flash player for 64-bit Linux sucks donkey balls while Windows gets star treatment. Open source would be another plus, but right now I'd settle for a 64-bit Linux binary that didn't crash my browsers constantly.
"Liberty may be endangered by the abuses of liberty as well as the abuses of power." -- James Madison
Way to go to convince government and its constituents that Flash and PDF will help them put together open websites and follow "ADA Guidelines for the Web" aimed at ensuring accessibility...
They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I felt a great disturbance in the Force, as if millions of IT workers suddenly cried out in terror, and were suddenly silenced.
I record my sleeptalking
Campaigning against PDF in any way might effectively amount to implicitly campaigning for Microsoft's XML Paper Specification (XPS).
GP is right. Government should focus on what government is actually needed for, such as setting standards for formats that everyone can use, with input from academia and industry. For example, a human-readable, parsable format that one could embed in a web page as semantic metadata. Or funding open source software that makes it easy (and cross-platform) to input such data (I am thinking of information about cited papers or books). Typeset information is nice, but we are already drowning in information (how many pages of Google results do you usually look at?), and we need help before we generate 10 times as much.
Why PDF is bad:
- It is a portable typeset document package, not a data-sharing package that can be pulled apart automatically with tools.
- PDF is extremely hard to parse, and current free software does not always give good results.
- Converting to PDF destroys useful document structure (or, in the case of ASCII text, its parsability and small size). You can't just convert back to the original.
- As far as I can see, it takes significant processing power and commercial software to display it well and reliably. Having just gotten the latest Mac, I feel like I'm in a dauntless battleship, but I have had a lot of trouble with various Unix tools in the past.
- Scientists publish PDF too, but they also use other formats for data. For example, on arXiv one scientist recently published animations inside a zip, but the link was hard to find.
- It is difficult to manage bibliographic information automatically.
- It is proprietary.
- It requires a huge amount of data, and arcane knowledge, just to build a parser that works most of the time (especially for Asian languages).
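The idea of a human-readable, parsable format embedded in a web page for citation metadata (and the complaint that bibliographic information is hard to manage automatically) has real-world counterparts. One of them is JSON-LD using the schema.org vocabulary. The sketch below assumes that choice, which the thread does not name: it embeds a hypothetical citation record in an HTML page and pulls it back out with Python's standard-library HTML parser. The same page serves both human readers and software.

```python
import json
from html.parser import HTMLParser

# Hypothetical citation record. The schema.org "ScholarlyArticle" type is real;
# the values are made up for illustration.
citation = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "An Example Paper",
    "author": {"@type": "Person", "name": "A. Researcher"},
    "datePublished": "2009",
}


def embed_jsonld(record: dict, body_html: str) -> str:
    """Return an HTML page with the record embedded as a JSON-LD script block."""
    payload = json.dumps(record, indent=2)
    return (
        "<html><head>"
        f'<script type="application/ld+json">{payload}</script>'
        f"</head><body>{body_html}</body></html>"
    )


class JsonLdExtractor(HTMLParser):
    """Collect and decode every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buf)))
            self._in_jsonld = False


page = embed_jsonld(citation, "<p>Human-readable text goes here.</p>")
extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.records[0]["name"])
```

A reference manager or search engine could read the structured record from the page without touching the prose. Producing a PDF would throw that structure away.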
Have these people not heard of Google? Just because YOU can't write software to parse PDF files doesn't mean that nobody else can and that it doesn't already exist.
If you are publishing a document that can be printed then PDF is a good format. If you expect people to extract data from the document then you should look for a different format. It depends on the purpose of posting the document on the web.
I am OK with PDF. I would RATHER see documents in plain HTML, but there are times when formatting is important. In those cases, if it is to be read/print-only, PDF is the way to go. Otherwise, the gov should use ODF.
But Flash? Are you kidding? The last thing on earth we need is more Flash.
* Does not work on all devices
* Slow and/or consumes tons of CPU
* Consumes tons of RAM
* Consumes more bandwidth
* Makes it difficult or impossible to cut and paste
* Impossible to "search/find"
* Violates the native UI look and feel
* Fonts and font sizes are uncontrollable by the end user
* Can't scroll correctly much of the time
* Almost completely proprietary
* Rarely adjusts to screen size
* Often introduces extremely irritating animation
* Doesn't allow text to be "seen" by the browser (or OS), making other plugins (like a screen reader) 100% useless
At least that SilverDark stuff isn't even on the radar; thank God for small favors.
What are you talking about? The PDF specification has been available as a free download from Adobe with no royalties payable by implementors since PDF was first created. More recently, the PDF/X family of specifications was approved by ISO. These define subsets of the PDF 1.4 specification for different uses (see ISO 15930). There are at least three open source PDF readers that I know of as well as several commercial viewers (Adobe Reader, FoxIt, Apple's Preview, and so on) and numerous tools can generate PDFs.
I am TheRaven on Soylent News
The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
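Here is a small sketch of the "easy direction" described above: going from a well-defined CSV file to a human-readable table. The budget figures and column names are hypothetical. The point is that the pretty output is a few lines of formatting code once the data is structured. Recovering these numbers from a typeset PDF table would require layout analysis.

```python
import csv
import io

# Hypothetical budget data in a simple, well-defined CSV layout (values are made up).
BUDGET_CSV = """agency,fiscal_year,amount_usd
Agency A,2009,1250000
Agency B,2009,980000
Agency C,2009,4300000
"""


def load_budget(text: str) -> list[dict]:
    """Parse the CSV into a list of dicts, converting amounts to integers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount_usd"] = int(row["amount_usd"])
    return rows


def render_table(rows: list[dict]) -> str:
    """Produce a human-readable table with a total line, which is the 'easy direction'."""
    width = max(len(r["agency"]) for r in rows)
    lines = [f"{'Agency':<{width}}  {'Amount (USD)':>14}"]
    for r in rows:
        lines.append(f"{r['agency']:<{width}}  {r['amount_usd']:>14,}")
    total = sum(r["amount_usd"] for r in rows)
    lines.append(f"{'Total':<{width}}  {total:>14,}")
    return "\n".join(lines)


rows = load_budget(BUDGET_CSV)
print(render_table(rows))
```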
If the goal is to make the data available, then even CSV would be a better option than PDF. PDF, while pretty, is a terminal format and is the digital equivalent of a mayfly. It's paper that hasn't happened yet and when it does it will exist for a few short hours before finding its way to the circular file.
Much of the government data consists of table after table of figures. Gzipped CSV would be readable by anyone; so would ODF. Adobe appears to be looking for a handout at the expense of creating a useful and open data system.
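To back up the claim that gzipped CSV is readable by anyone, here is a short sketch using only Python's standard library. It writes and reads back a compressed CSV file. The file name and rows are hypothetical. Any language with a gzip and CSV reader (or `zcat` on the command line) could consume the same file.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical table; CSV stores everything as text, so values are strings here.
HEADER = ["agency", "fiscal_year", "amount_usd"]
ROWS = [["Agency A", "2009", "1250000"], ["Agency B", "2009", "980000"]]


def write_gzipped_csv(path, header, rows):
    """Write a CSV file compressed with gzip (text mode, UTF-8)."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_gzipped_csv(path):
    """Read it back: returns (header, list of rows)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, list(reader)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spending.csv.gz")
    write_gzipped_csv(path, HEADER, ROWS)
    header, rows = read_gzipped_csv(path)

print(header, rows)
```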
Put it in context: open government requires data formats that are independent of campaign donors.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
PDF/A is already open. However, that doesn't mean that anyone knows how to produce it, especially some R.O.A.D. staffer or random hourly GS1.
Open or not, PDF/A is a display format and, in most cases, useless for information retrieval or automated data processing. PDF/A is a useful alternative to paper. However, the open government initiative is not talking about paper. It's about 'born digital', machine readable data.
I guess most of you do not realize that Adobe builds spyware into its own products by default?
Flash, for example, has iesnare built in. This allows machine profiling that everyone agrees to when they install that bullshit software!
You can't trust a company that has already done something to make you distrust them.
Adobe ships Flash/PDF readers/plugins to: Windows, OS X, Symbian (in some form), Linux, *BSD and various, uncountable tiny platforms. iPhone/iPod does not count because of obvious reasons.
Let's see what MS Silverlight ships to: Windows and Intel Macs. The damn thing is so tied to Windows that they couldn't even port/ship V2 for PPC Macs, or they simply abandoned them (like we cared!).
Then there is MS's XPS format and viewer, Microsoft's answer to PDF, which some people who don't use Windows have never, ever heard of. It is that Windows-centric. Despite all of MS's rude attempts (adding an XPS printer without asking, etc.), it has never, ever taken off.
What we need is something that combines ODF and PDF. You can add a binary file to a PDF document as a kind of layer. ROM LogicWare, a lesser-known office suite (Papyrus) developer, does this right now. Their files are both PDF and their own editable format, transparent to PDF readers, and NOT a hack.
Of course, people will spend their time flaming ("omg Flash, PDF, Adobe is slow") rather than finding a solution to a real problem. Asking the government to use Flash really is absurd, but the real ones to blame here are MS and the large open-source-based companies. If they offer no alternative, of course Adobe will suggest PDF. What else should the government use? MS XPS?
If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of them.
It is the famous "Spotlight" technology. Users don't even need Google; many of them have the same kind of indexing technology (minus the relationship ranking) running on their laptops.
Someone should check TFA's relations with MS. I am sure something will come up.
I work with PDFs a lot, especially on OS X. I am telling you this from an OS where, in some circumstances, a 1080p screenshot saved as PDF can be just 60 KB: whoever pulled that "text as image" trick is a complete moron.
One of the reasons PDF took off is precisely that it embeds the fonts used in a document, so the document appears pixel-perfect on client machines.
As a last resort (and as good practice), you can embed the unformatted plain text of the entire document in your PDF file. PDF, like QuickTime MOV, is one of those formats whose features people don't use, and then they complain about the size of the client, etc.
A number of government forms don't work with the free PDF readers.
This is because Adobe broke its own published spec with its LiveCycle product, and by default it saves files that aren't compatible with anything else. It does a great job of forcing you to buy LiveCycle/Acrobat instead of using free tools. The Adobe people will tell you that it speeds up rendering of downloaded data, which I find hard to believe as the files are between 2x and 3x the size of a regular PDF.
The current use of Adobe products for government forms is a nightmare; it seems like a dumb idea to extend it.
I will ask one thing, since you seem to miss why HTML is not considered a print/distribution format: "When did we get an embeddable font standard for HTML web pages?" And likewise with Flash: "Is there a single file format and infrastructure for showing embedded video in HTML5?"
They actually suggested that people use the abandoned VP3 format, for God's sake, and the very same people chose TrueType (check why FreeType exists) as the font embedding format.
Flash is evil for many reasons, but the most in-your-face reason if you use a Mac is that the Mac Flash plugin crashes all the time. It is by far the #1 reason for Safari crashes on the Mac.
I'm not wild about PDF, but at least I don't see PDF viewers crashing all over the place.
"...unfindable by search engines..."
That is absolutely not true. Anyone who uses Google knows that the search engine can read PDFs, identify if any of the keywords are located within, and then provide a link both directly to the PDF as well as to an HTML version.
The world moves for love. It kneels before it in awe.
This could be a Good Thing, if it means that the formats will be made and remain open. IIRC, PDF is already an open standard, and supported by various programs from multiple sources. I would applaud it if the same were to happen to Flash. And if both formats are open and widely supported, the government could do a lot worse than using them.
Please correct me if I got my facts wrong.
So there is a partial option for MS-Windows only. Great. Not exactly platform agnostic and open. I suppose it is better than nothing, though.
There are a huge number of free programs that can create PDFs. Anything that uses Cairo for rendering can generate PDFs natively, although without some of the nice metadata. If you're using almost any modern operating system (Windows or anything that uses CUPS for printing, including Linux and OS X) then any application that can print can also generate PDFs. I use pdflatex very often and it produces beautiful PDFs with working hyperlinks and the table of contents in the bookmarks section, and it will happily import the PDFs that I've created with gnuplot or graphviz as well as commercial tools like OmniGraffle. My entire workflow involves creation of PDFs to send to my publisher. None of the tools that I use come from Adobe and most are Free Software.
Apple uses PDF as the basis of the OS X display engine. When they adopted the NeXT OS as the next-generation replacement for the "Classic" Mac OS, they switched away from NeXT's Display PostScript precisely because PDF was a free and open-source specification. An OS X user can create a PDF file from pretty much any document simply by beginning a print operation and then selecting "save as PDF" from the print dialog box.
This ain't rocket surgery.
"Further, the recent PDF specifications add DRM which shouldn't be allowed in government publications. If the govt agrees to use a PDF version that open source software can completely read, parse, and convert, then it is fine PROVIDED the raw data is available in open formats too."
No, it's not fine because, as others have pointed out, PDF is mainly used for formatting documents. It does a pretty adequate job of that, and you can use third-party software that displays it without the drawbacks of the *HORRIBLE* Adobe software. But that does not make it a good mechanism for storing information that can be indexed in any useful way (beyond simply parsing the text). Hell, you can't even /select/ text normally using most PDF readers.
How many free programs do you know of that create .pdf's?
Too lazy to count right now, but just the ones I use on a more or less daily basis come to about 20. Plus hundreds of others that I don't use.
AccountKiller
PostScript is also a free specification, but NeXT was using the Display PostScript implementation licensed from Adobe. They switched to something closer to PDF because, it turned out, no one actually cared about the nicer features in PS. With DPS, you could write view objects entirely in PostScript and have them run on the display server. This was quite slow and had all sorts of problems in that the PS programs could (potentially) run forever. Most people just used the drawing subset of PS, which is also available in PDF, and none of the flow control stuff.
Yes, and then they SUED Microsoft for putting PDF support in Office. It's only "open" as long as you're not big enough to compete with Acrobat. If you even get within a mile of stepping on Adobe's business, you're sued up the wazzoo.
"Free and open" my ass.
Comment of the year
Hell, you can't even /select/ text normally using most PDF readers.
People keep saying that. I've never had a problem with this. I use 3 or 4 different PDF readers, including the one from Adobe, and I've never had problems selecting and cutting text from a PDF document.
If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of them.
No it is not. It is indexing every PDF that has text in the metadata. Create a PDF by printing to PostScript and then converting to PDF (the easiest way of creating PDF on Windows or Linux machines) and watch Spotlight completely fail to index it. Spotlight does not index the text that you see when you browse the PDF, because that text is stored as a set of glyph indexes, not as streams of characters.
And if you want some real fun, open up a PDF containing a table in Preview and try to persuade it to copy the table in a way that preserves its structure.
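The glyph-index point can be illustrated with a deliberately simplified toy model. This is not real PDF parsing. Assume a subsetted font numbers glyphs in order of first use and the page stores only those numbers. An extractor that treats glyph IDs as character codes gets garbage. Recovering the text requires a mapping back to characters, which in real PDFs is what a font's ToUnicode CMap provides when the producer includes a correct one.

```python
# Toy model only: real PDF fonts, encodings and CMaps are far more involved.

def build_subset(text: str) -> dict[str, int]:
    """Map each distinct character to a glyph ID, numbered from 1 in order of first use."""
    cmap: dict[str, int] = {}
    for ch in text:
        cmap.setdefault(ch, len(cmap) + 1)
    return cmap


def draw(text: str, cmap: dict[str, int]) -> list[int]:
    """What the 'page' stores: glyph IDs, not characters."""
    return [cmap[ch] for ch in text]


def naive_extract(glyph_ids: list[int]) -> str:
    """Treat glyph IDs as if they were character codes (yields control characters)."""
    return "".join(chr(g) for g in glyph_ids)


def extract_with_tounicode(glyph_ids: list[int], cmap: dict[str, int]) -> str:
    """Invert the subset mapping, playing the role of a ToUnicode table."""
    to_unicode = {gid: ch for ch, gid in cmap.items()}
    return "".join(to_unicode[g] for g in glyph_ids)


cmap = build_subset("Budget 2009")
ids = draw("Budget 2009", cmap)
print(repr(naive_extract(ids)))
print(extract_with_tounicode(ids, cmap))
```

An indexer that only sees the stored IDs (or only the metadata) has nothing meaningful to index. That is consistent with the complaint above about PDFs produced via a PostScript round trip.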
Bullshit.
It's either an open standard, meaning anybody can use it-- ANY BODY-- or it's not. There's no such classification as "it's an open standard, except we don't let companies we don't like use it because they have a big marketshare, but other than that it's an open standard believe me!"
By your argument, Microsoft should also be prevented from parsing HTML files in IE because they're a monopoly. Does that make sense? No. Does your argument make sense? No.
"Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.
BTW why was I modded "flamebait" for expressing an opinion? Silly, silly, silly.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Pretty surprised I'm the first to suggest this combo. Most 'modern' browsers are close to SVG 1.1 now. Google has stated its interest in SVG, hosting this year's svgopen.org conference; indexability is a strong draw. Sure, every time you mention an XML format the JSON guys cough up bits. Size is reducible with gzip, and XSLT is not 'pretty', but the flexibility will exercise yer grey stuff.
MarkT
P.S.: Inkscape.org will convert PDF pages to SVG nicely.
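To illustrate the SVG-plus-gzip suggestion, here is a minimal sketch. The chart layout, sizes and data are assumptions for the example. It generates a bar chart as SVG with `<text>` labels, shows that the labels remain plain, machine-readable text that an ordinary XML parser (or a search engine) can pick out, and compresses the result the way `.svgz` files are compressed.

```python
import gzip
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart_svg(data: dict[str, float], bar_width: int = 40, height: int = 200) -> str:
    """Render a simple bar chart; assumes at least one positive value."""
    max_val = max(data.values())
    total_width = len(data) * (bar_width + 10) + 10
    parts = [f'<svg xmlns="{SVG_NS}" width="{total_width}" height="{height + 20}">']
    for i, (label, value) in enumerate(data.items()):
        h = value / max_val * height
        x = 10 + i * (bar_width + 10)
        parts.append(
            f'<rect x="{x}" y="{height - h:.1f}" width="{bar_width}" '
            f'height="{h:.1f}" fill="steelblue"/>'
        )
        # Labels stay as real text, not pixels, so they remain searchable.
        parts.append(f'<text x="{x}" y="{height + 15}" font-size="10">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


data = {"Agency A": 1.25, "Agency B": 0.98, "Agency C": 4.3}  # hypothetical figures
svg = bar_chart_svg(data)

# Any XML parser can recover the labels from the drawing itself.
labels = [el.text for el in ET.fromstring(svg).iter(f"{{{SVG_NS}}}text")]

# .svgz is simply gzip-compressed SVG.
svgz = gzip.compress(svg.encode("utf-8"))
print(labels, len(svg), len(svgz))
```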
"Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.
You're moderated flamebait for being wrong and, as you usually do, aggressively defending your incorrect position when ten seconds of fact checking would indicate that you are wrong.
You can, as I said in the original post, download the PDF specification and implement it without paying a royalty and you've been able to do this for every version of the PDF specification since version 1.0. That page is linked to from the top link that you get if you Google for 'PDF specification' and it has been for some years. No license is required for downloading or implementing them.
The ISO 32000 specification, which is now the official PDF specification, costs money to buy from ISO (as all ISO specs do, including the C language specification), but the format it describes is identical to the one described by the PDF 1.7 format, with various sub-formats (e.g. PDF/A) requiring only a subset of the features described in this document. Although it costs money to get the spec from ISO, there are no royalty requirements for implementors. Adobe now publish their versions as extensions to the ISO-controlled format, rather than as complete new specifications and any organisation wanting interoperability should mandate the ISO specification rather than the Adobe extensions.
Of course, you'd have known all of that if you'd spent a minute actually doing basic research on the topic at hand before posting. Fortunately, the fact that you still haven't learned how to use quote tags means that it's usually easy to spot your posts from a distance and ignore your ill-informed ramblings.
>>>you are wrong. You can, as I said in the original post, download the PDF specification and implement it without paying a royalty and you've been able to do this for every version of the PDF specification since version 1.0 [1993]
Guess what? You are wrong too. (Surprised? You shouldn't be; nobody's perfect, not me nor you.) PDF did not become an open standard until version 1.7 [2008], according to Wikipedia. That was only a year ago.
Which is why, as others pointed out, various companies had been sued for infringing upon Adobe's PDF patents. I had to *buy* Adobe Acrobat because at that time (2004) there was no other program available to create PDFs. It was patented and restricted.
the fact that you still haven't learned how to use quote tags
You mean like that? I know how to use them just fine, but I've always preferred the old Usenet methodology. Typing >>> is a heck of a lot faster than typing 14-letter tags.
>>>your ill-informed ramblings.
That's nice. You were still wrong when you said, "You've been able to do this for every version of the PDF specification since version 1.0." Adobe had the patents until 2008. That means it was closed. No one could legally publish a PDF Creator program prior to that year, as Microsoft and other companies discovered when they got sued.
That's weird, because any time I cut anything around a page border, a table, or more or less any other break in the page, everything gets screwed up. I won't even go into what happens when there is a watermark on the page. And by screwed up, I mean screwed up: missing parts of text, text in the wrong order, you name it. On top of that, it crashes every so often and doesn't survive my computer's power-saving state, to name just one more thing. I won't go into the way it handles tabs, form input, search, or pop-ups, because we could be discussing their crapware for hours on end.
I'd vote for that as a standard.
No sig today...
Useless is the wrong word. It took 15 lines of Python wrapping xpdf for me to get a working system for dumping the transactions out of the last 6 years of my credit card statements.
It's ugly, but it works just fine.
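This is a sketch of the kind of small xpdf wrapper described above, not the commenter's actual script. It shells out to `pdftotext -layout` (the `-` output name sends text to stdout) and pulls transaction lines out with a regular expression. The statement line format (`MM/DD  DESCRIPTION  AMOUNT`) is an assumption; real statements differ by issuer, so the pattern would need adjusting.

```python
import re
import subprocess
from decimal import Decimal

# Hypothetical statement line, e.g. "03/14  GROCERY STORE   42.17".
LINE_RE = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def pdf_to_text(path: str) -> str:
    """Extract text with xpdf/poppler's pdftotext, keeping the physical layout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_transactions(text: str) -> list[tuple[str, str, Decimal]]:
    """Keep only lines that look like transactions; amounts become Decimals."""
    txns = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            date, desc, amount = m.groups()
            txns.append((date, desc, Decimal(amount.replace(",", ""))))
    return txns


# Usage (requires pdftotext on PATH): parse_transactions(pdf_to_text("statement.pdf"))
sample = """ACME CARD STATEMENT
Statement period 03/01 - 03/31
03/14  GROCERY STORE            42.17
03/18  HARDWARE SHOP         1,204.50
03/20  REFUND                  -15.00
"""
print(parse_transactions(sample))
```

As the reply below notes, this only works when the PDF actually carries text rather than scanned images.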
That would be because that particular PDF happened to accidentally be wrapping ASCII or ISO-8859 or UTF-8 or UTF-16 instead of some image format. Even then, that was just screen-scraping like can be done with old terminal sessions. It can be done, sometimes.
Keep the data in machine readable formats, not a terminal format like PDF or paper.
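The "happened to be wrapping ASCII" case can be made concrete with a hand-built one-page PDF. This is a minimal sketch assuming plain ASCII text, one of the standard Type 1 fonts (Helvetica, not embedded) and an uncompressed content stream. In this best case the text sits in the content stream as a literal string, which is why screen-scraping works. Real-world PDFs often compress streams and subset fonts, and scans contain no text at all.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF that draws `text` (ASCII assumed) with Helvetica."""
    # Escape the characters that are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode()
    return bytes(out)


pdf = minimal_pdf("Hello, open data")
print(b"(Hello, open data) Tj" in pdf)  # the text is grep-able in this best case
```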
When Flash had a few issues a couple months ago, I removed it from my browser. Suddenly, thousands of irritating advertisements and web banners and annoying intro pages of pointless information were blank with only a notice to install flash player.
Remove it from one browser and see if it doesn't make surfing better for you.
Just my 2 cents in regards to public records and data.
I'd like to say that the groups making decisions in this area really should consider an MVC architecture, which would avoid the concerns raised here on /. and by open-data advocates everywhere regarding display (i.e. View) technologies.
With a Model View Controller methodology and pattern in place it really is not a concern what technology is being used to display data at any given time. If public data is *stored* (Model) and *accessed* (Controller) via open standards then the *display* (View) itself is inconsequential and/or malleable to the extent needed for any purpose.
Flash is great at some things, PDFs are perfect for a variety of tasks. They are, like any other format, not the only useful format available and should never be thought of as the 'archive' or 'final' format. The Model is the archive.
All the government agencies need to do is show that the Model is able to be trans-coded to several other popular storage formats without loss and that should be good enough for anyone. They also need to provide an API for accessing the data regardless of the Model and an output format that is structured and well documented (XML, JSON, SOAP even).
At this point it is the data consumer who should choose what format they would like to visually see it in... PDF, interactive Flash/Flex charts, JSON, Word, HTML, SGML, RTF... does it matter? Not to me or anyone else. I will get to choose the format I'd like it in (XML, JSON or Actionscript Objects please).
If the format doesn't exist yet, there's an API I can use to transcode the data as I see fit.
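Below is a minimal sketch of the Model/View/Controller split described above, under assumed names. The record type, its fields and the `export`/`round_trips` helpers are hypothetical. One Model is stored once. Each View is a separate serializer, and a round-trip check demonstrates that a format can be transcoded back to the Model without loss, which is the test the comment proposes.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class SpendingRecord:
    """Model: one hypothetical public-spending record (field names are illustrative)."""
    agency: str
    year: int
    amount_usd: int


FIELDNAMES = [f.name for f in fields(SpendingRecord)]


def to_json(records):
    return json.dumps([asdict(r) for r in records])


def from_json(text):
    return [SpendingRecord(**d) for d in json.loads(text)]


def to_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(asdict(r) for r in records)
    return buf.getvalue()


def from_csv(text):
    # CSV loses types, so the parser restores them from the Model's definition.
    return [
        SpendingRecord(row["agency"], int(row["year"]), int(row["amount_usd"]))
        for row in csv.DictReader(io.StringIO(text))
    ]


VIEWS = {"json": to_json, "csv": to_csv}
PARSERS = {"json": from_json, "csv": from_csv}


def export(records, fmt):
    """Controller: hand the same Model to whichever View the consumer asks for."""
    if fmt not in VIEWS:
        raise ValueError(f"unsupported format: {fmt}")
    return VIEWS[fmt](records)


def round_trips(records, fmt):
    """True if the format can be transcoded back to the Model without loss."""
    return PARSERS[fmt](export(records, fmt)) == records


data = [SpendingRecord("Agency A", 2009, 1250000), SpendingRecord("Agency B", 2009, 980000)]
print(export(data, "csv"))
```

Adding a PDF or chart View would be one more function in `VIEWS`. It would be presentation only, with the Model remaining the archive.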
A fool throws a stone into a well and a thousand sages can not remove it.
My experience is that it works fine for single-column text or for selecting individual words, but at least Acrobat Reader doesn't have any clue about what is and isn't part of the same block of text (e.g. it will select over into the second column of a two-column page before selecting stuff in the next row of the current column).
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
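The two-column selection problem described above comes down to reading order. A PDF page typically gives a viewer positioned text fragments, not paragraphs. The toy sketch below orders hypothetical fragments two ways: strictly by row, which interleaves the columns as described, and column first. It assumes the column boundary (`column_split`) is already known. Real extractors have to guess it heuristically, which is where they go wrong.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float   # left edge, in points
    y: float   # distance from the top of the page, in points
    text: str


def naive_order(frags: list[Fragment]) -> list[str]:
    """Sort strictly by row, then left to right: interleaves two columns."""
    return [f.text for f in sorted(frags, key=lambda f: (f.y, f.x))]


def column_order(frags: list[Fragment], column_split: float) -> list[str]:
    """Assign fragments to a column by x, then read each column top to bottom."""
    left = [f for f in frags if f.x < column_split]
    right = [f for f in frags if f.x >= column_split]
    return [f.text for col in (left, right) for f in sorted(col, key=lambda f: f.y)]


page = [
    Fragment(50, 100, "Left column, line 1"),
    Fragment(300, 100, "Right column, line 1"),
    Fragment(50, 112, "Left column, line 2"),
    Fragment(300, 112, "Right column, line 2"),
]
print(naive_order(page))
print(column_order(page, column_split=200))
```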
Hm, never happened to me. I have a few hundred PDF files here, many of them with 2 or 3 columns, tables, formulas, etc., and I don't seem to have any problems selecting text. What software were your PDF files generated by, anyway?
Especially on Windows and with English-language documents, there is no excuse. Every scanner comes with OCR software, at least for English. I did a 70-page manual translation back when Windows 3.1 was new, so I know.
And of course, here is some truly free software: http://jocr.sourceforge.net/ and the recently revived (by Google) http://code.google.com/p/tesseract-ocr/
Even if you are a home user, thanks to Spotlight and the various Windows/Linux local search engines, it is a really good idea to keep text in PDF files.
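The following sketch shows one way to script the free OCR route mentioned above. It assumes a Tesseract release recent enough to accept `stdout` as its output base (so text is printed rather than written to a file) and an installed English language pack (`-l eng`). The sidecar-file helper is a hypothetical convenience: it drops a `.txt` next to each scan so desktop indexers have real text to work with.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the Tesseract command line; 'stdout' as output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one scanned image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        ocr_command(image_path, lang), capture_output=True, text=True, check=True
    )
    return result.stdout


def save_sidecar(text: str, image_path: str) -> Path:
    """Hypothetical helper: write scan.png's text to scan.txt for local search tools."""
    out = Path(image_path).with_suffix(".txt")
    out.write_text(text, encoding="utf-8")
    return out


# Usage (requires Tesseract): save_sidecar(ocr_image("scan.png"), "scan.png")
print(ocr_command("scan.png"))
```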