Adobe Pushing For Flash and PDF In Open Government Initiative

don't hate PDF 'cause it's beautiful by vaporland · 2009-10-31 01:25 · Score: 1, Informative

"non-parsable by software, unfindable by search engines, and unreliable if text is extracted."

I don't believe this is true - I find PDF documents in search results all the time. The consistency and reliability of PDF for forms creation has no real competition. If you hate Adobe, ok, but don't hate PDF 'cause it's beautiful...

--
Ask Me About... The 80's!

Re:don't hate PDF 'cause it's beautiful by hedwards · 2009-10-31 01:29 · Score: 5, Insightful

I have no problem with PDFs, there are a number of free and commercial applications out there that can work with them.

Flash on the other hand is absolutely an abomination that must be wiped from the net. They still haven't released a proper version for *BSD and they commonly don't bother with less popular OSes. If they want it to be used for this sort of purpose then they need to get their act together and make it available for all operating environments on an equal basis. Which I don't think they have the resources to do.
Re:don't hate PDF 'cause it's beautiful by Bacon+Bits · 2009-10-31 01:35 · Score: 2, Interesting

PDFs are only searchable if the document contains text. Half the time PDFs contain text-as-image, which is about as useful to a search engine as a captcha image. Google doesn't run OCR on PDFs, AFAIK. Although, come to think of it, that sounds like something some random company would sue them over for "violating copyright proprietary information".

--
The road to tyranny has always been paved with claims of necessity.
Re:don't hate PDF 'cause it's beautiful by Antique+Geekmeister · 2009-10-31 01:39 · Score: 4, Informative

PDF remains difficult to manage. As with MS Word documents, an incredible amount of resources is wasted on display information rather than actual text or graphical content. Unlike MS Word, they're parseable; but unfortunately, like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools. Oddly, Ghostscript-created PDF remains very stable and legible, and tools like "PDFCreator", which use Ghostscript, create long-term viable PDF printouts of other document formats. I use it for complex MS Word documents that cannot be handled by other software, even different versions of MS Word.
Adobe can actually do better with this, and I hope that they will in the future. But it's not stable enough to be reliably indexed or viewable even 5 years in the future, much less 10 or 20 or 100 such as may be needed for legal or historical documents.
Flash, you're quite right. Unless they open up the source, it has no business as yet another document format.
Re:don't hate PDF 'cause it's beautiful by TheRaven64 · 2009-10-31 01:40 · Score: 5, Insightful

The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
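To make the "easy direction" concrete, here's a minimal sketch assuming a hypothetical budget CSV with `department` and `amount` columns (names and figures are made up). Going from well-defined data to a presentable table takes a few lines of standard-library Python; a plain-text table stands in for typeset output here, since real PDF generation would need a third-party library. The reverse trip, recovering rows and columns from positioned glyphs in a PDF, has no comparably simple equivalent.

```python
import csv
import io

# Hypothetical budget data in a well-defined CSV layout (header row + one record per line).
BUDGET_CSV = """department,amount
Parks,120.0
Roads,340.5
Schools,910.8
"""

def render_table(csv_text: str) -> str:
    """Render CSV rows as a fixed-width text table (a stand-in for typeset output)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Size the first column to its longest entry, header included.
    width = max([len("Department")] + [len(r["department"]) for r in rows])
    lines = [f"{'Department':<{width}}  {'Amount':>10}"]
    for r in rows:
        lines.append(f"{r['department']:<{width}}  {float(r['amount']):>10.1f}")
    return "\n".join(lines)

print(render_table(BUDGET_CSV))
```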

--
I am TheRaven on Soylent News
Re:don't hate PDF 'cause it's beautiful by Crudely_Indecent · 2009-10-31 01:57 · Score: 2, Informative

Many implementations of PDF converters merely print a document to images and then embed the images into a PDF. Those are non-searchable and no text can be extracted with the existing tools. I once created a documentation website which relied on these embedded image types of PDF documents. I had to implement an OCR solution in order to extract the text to make my client's documentation searchable. It was ugly and a real pain in the ass.
Certainly, PDF can be beautiful, but it is often not implemented that way. Personally, I'm a big fan of PDF. If not implemented properly, I try to avoid it.

--

"Lame" - Galaxar
Re:don't hate PDF 'cause it's beautiful by petermgreen · 2009-10-31 01:58 · Score: 1

but unfortunately like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools
Can you be more specific as to what problems you have had using files from acrobat in other tools?

--
note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register it
Re:don't hate PDF 'cause it's beautiful by cryfreedomlove · 2009-10-31 02:09 · Score: 1

I agree that open government docs should stay away from Flash. I don't agree that Flash is an abomination because Adobe does not bother with less popular OSes. Why should they implement Flash on less popular OSes? That costs Adobe real money and then only a handful of users would benefit. If you were in charge of the engineering budget at Adobe, would you spend $ on a feature for Mac and Windows that 100 million people would use or would you use that same $ to port Flash to a less popular OS with 10,000 users?
Re:don't hate PDF 'cause it's beautiful by dov_0 · 2009-10-31 02:17 · Score: 1

With the way CSS is developing, won't flash be redundant soon anyway? I certainly hope so!

--
sudo mount --milk --sugar /cup/tea /mouth /etc/init.d/relax start
Re:don't hate PDF 'cause it's beautiful by Antique+Geekmeister · 2009-10-31 02:19 · Score: 3, Interesting

Printing documents created in other-language versions of Acrobat. In particular, the German version of Adobe Acrobat created documents that were not only unviewable in a normal Acrobat viewer but, when used to "print PDF" for MS Word documents, created documents that actually crashed Windows computers. The Hebrew version of Acrobat didn't crash Windows with its printed documents, but its output was filled with layout errors when rendered even by Acrobat Reader, errors that didn't show up in the Adobe Acrobat tool. Much of this may have been fixed in the latest release, but I'm not spending, nor suggesting that my peers overseas spend, all the money needed to upgrade.
Getting our colleagues to stop using Acrobat and use _anything else_ to generate their documents, and use PDFCreator to print them as PDF, stabilized the situation enough for us to generate the documents we needed. It didn't provide PDF forms for people to fill out, which was its only flaw.
Re:don't hate PDF 'cause it's beautiful by Anonymous Coward · 2009-10-31 02:23 · Score: 2, Informative

Unlike micros~1 Word documents, there are freely available specifications and a reasonable number of quite reasonable third-party implementations that can display or generate PDF, or even both. That is to say, you can very well ``do PDF'' without ever using Adobe software. Part of its success is that it's a dumbed-down version of PostScript, which is also open and arguably the right way to talk to printers. That's a whole sight better than micros~1's OOXML abomination, which, once standardized, turned out to have not even one conformant working implementation. Agree on the Flash, but there's more.
PDF is pretty good for storing bound-for-paper documents (and when doing that, use metric paper, dammit), though for scans you're probably better off with DjVu. Flash is basically pure concentrated dancing rodents, and has very little to offer beyond gimmicks. Unless it opens, and opens soon, it will have no staying power and Flash data will be rendered useless in a decade or two. That's bad for archiving.
The core goal should be content: content, interop, accessibility for the disabled, accessibility for non-wintendo machines regardless of market share, archiving, being able to reuse it, and still being able to access it centuries down the road. PDF may qualify; Flash certainly does not.
Re:don't hate PDF 'cause it's beautiful by eugene2k · 2009-10-31 02:28 · Score: 1

Better JSON or XML than CSV

--
Apple has "Mac vs PC", Microsoft has "Laptop Hunters", Linux has recession
Re:don't hate PDF 'cause it's beautiful by Joce640k · 2009-10-31 02:58 · Score: 1

Can I hate all the multimedia/hyperlink/scripting/vulnerabilities they've added to PDF?
I'll back this so long as it's PDF light - text and graphics only (OK, maybe I'll allow hyperlinks...).

--
No sig today...
Re:don't hate PDF 'cause it's beautiful by xjimhb · 2009-10-31 03:01 · Score: 3, Interesting

Just recently I had to look at, and print a few pages from, a PDF document. Knowing where it came from, a corporation that is only very slowly dipping a toe in the water of software other than the big names, I'm sure it was done with Adobe.
Now, I don't even have Adobe Acrobat Reader on my system; when I try to install it, the install crashes. But Fedora comes with several other PDF readers, and the default is set to "Evince", which works fine MOST of the time.
But I got this PDF, and one page was a picture of a tax form, and when I tried to print it, the tax form came out as a big black blob - man, does that waste ink! Obviously I killed the print job to try something else. (Just VIEWING this tax form was fine, only printing messed up.)
I remembered using "Xpdf" a while ago, so I tried that, and voila, the tax form printed perfectly. Since I knew there were more tax forms in there, I used Xpdf for the rest of the job.
So here is a case where two different PDF viewers reacted differently to the same PDF file. I think what we need is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow to get the proper results, asking the user what to do with files that don't follow it.
And tell Adobe they can either follow the open definition, or stuff it where the sun don't shine!

--
Teen Angel - a Ghost Story
Re:don't hate PDF 'cause it's beautiful by Darkness404 · 2009-10-31 03:26 · Score: 2, Insightful

Because Flash is now a crucial part of the internet. Until HTML 5 comes out with video standards and the like, Flash is about the only way you can embed videos in sites without ruining the layout of the site with a third-party media player and without your users searching for codecs.

If Adobe would simply release the source to the Flash player, they could -save- money, have full platform compatibility, and perhaps make more money with the Flash creation products. Think of it this way: if there were a fast language (most apps in Flash seem to load, run and interact faster than Java) that you could truly write once and run anywhere, it would be a hit. Flash could be this language if Adobe just opened up the player. Until they open it up, I expect them to do a good job and port it to every single OS or platform where it is allowed, because it is good for business for them and helps that platform (in all honesty, Adobe should want to kill Windows as quickly as possible and move the world to OS X and Linux).

--
Taxation is legalized theft, no more, no less.
Re:don't hate PDF 'cause it's beautiful by russotto · 2009-10-31 03:50 · Score: 3, Informative

I think what we need is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow and get the proper results - and ask the user what to do with files that don't follow it.
There is such; Adobe publishes it and makes it freely available on its web site. It's possible your file didn't follow it, but it's more likely your reader wasn't 100% compliant; it's a very complicated specification.
Re:don't hate PDF 'cause it's beautiful by John+Whitley · 2009-10-31 04:24 · Score: 1

It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs?
More importantly, it's then easy to import that data for visualization and analysis purposes. Data presented as a PDF file is effectively so inaccessible that it will rarely be extracted for further analysis, meaning that some gov't functionary becomes responsible for the presentation and analysis instead of members of the public. With the data in a format like CSV, a panoply of tools becomes available for finding out things from that data that no one ever knew were there. Something like Tableau Desktop can slurp in CSV data (or data imported into a slew of OSS or commercial DBs) and allow very rapid exploration.
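As a small illustration of how low the bar is once the data is tabular, here's a sketch that sums a hypothetical `amount` column grouped by any other column, using only the Python standard library. The column names and figures are invented; with real data you'd read a downloaded file instead of a string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spending records; in practice this would be a downloaded CSV file.
SPENDING_CSV = """agency,quarter,amount
Parks,Q1,10.5
Roads,Q1,40.0
Parks,Q2,12.5
Roads,Q2,35.0
"""

def totals_by(csv_text: str, key: str, value: str) -> dict[str, float]:
    """Sum the numeric `value` column grouped by the `key` column."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += float(row[value])
    return dict(totals)

print(totals_by(SPENDING_CSV, "agency", "amount"))  # {'Parks': 23.0, 'Roads': 75.0}
```

Re-slicing the same data (by quarter instead of agency) is just a different `key` argument; getting the same flexibility out of a PDF of pre-made tables would mean re-keying the numbers.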
As an aside, I will point out that CSV is an _evil_ format. Did you know it can be generated in localized forms (without any distinguishing metadata) in which the comma is taken over for use as a thousands or decimal separator? Oops. Really, what idiot thought it was a good idea to have a localized data format... Much better to use a serialization format like Avro, which uses a compact serialization for tabular data (akin to Protocol Buffers or Thrift) and the schema data (i.e. the description of the table's structure: columns, types, etc.) as a sidecar file in JSON.
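A quick way to see the localization problem with Python's standard `csv` module. The sample line assumes a convention (common in much of Europe) where the comma is the decimal separator and the semicolon becomes the field separator; nothing in the file itself says which convention was used, so the reader has to be told.

```python
import csv
import io

# A line written under a comma-as-decimal-separator convention: fields split by ';'.
LOCALIZED = "1,5;2,25\n"

def parse(text: str, delimiter: str) -> list[list[str]]:
    """Parse CSV text with an explicitly chosen field delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Assuming the default comma delimiter silently produces the wrong fields...
print(parse(LOCALIZED, ","))  # [['1', '5;2', '25']]
# ...and the right delimiter still leaves '1,5' needing locale-aware number parsing.
print(parse(LOCALIZED, ";"))  # [['1,5', '2,25']]
```

Note that neither parse raises an error; the wrong guess just yields plausible-looking garbage, which is the real complaint.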
Re:don't hate PDF 'cause it's beautiful by John+Whitley · 2009-10-31 04:27 · Score: 2, Interesting

CSV is kinda evil (see my post above), but it's better for tabular data than JSON or XML. Again, a tabular serialization format such as Avro, Thrift, or Protocol Buffers might well be far better than CSV for tabular data. JSON has quite a bit of format bloat, and would need some standardized way to explain the data's schema for further analysis. XML is the king of format bloat, but at least has standard schema representations. XML is far better for semi-structured or unstructured data than tables.
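A rough way to check the "format bloat" point: serialize the same hypothetical three-column table as CSV and as a JSON array of objects, and compare sizes. JSON repeats every column name on every row, while CSV states them once in the header. Schema-plus-binary formats like the ones mentioned avoid repeating names altogether, but they need third-party libraries, so they're left out of this sketch.

```python
import csv
import io
import json

# Hypothetical tabular data: one dict per row.
ROWS = [{"id": i, "agency": "Parks", "amount": 100 + i} for i in range(100)]

def to_csv(rows: list[dict]) -> str:
    """Header once, then one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows: list[dict]) -> str:
    """Array of objects: every row carries its own field names."""
    return json.dumps(rows)

# All-ASCII output, so character counts equal byte counts here.
csv_size, json_size = len(to_csv(ROWS)), len(to_json(ROWS))
print(f"CSV: {csv_size} bytes, JSON: {json_size} bytes")
```

JSON does keep the numbers typed (ints come back as ints), which CSV doesn't; that's the schema trade-off mentioned above.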
Re:don't hate PDF 'cause it's beautiful by TheRaven64 · 2009-10-31 05:47 · Score: 1

PDF/A is the term you are looking for. It is the ISO-defined subset of PDF that prohibits encryption, JavaScript, sound and video.
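As a very rough illustration of what PDF/A rules out, here's a naive screen that looks for a few PDF name tokens associated with the prohibited features (`/Encrypt`, `/JavaScript`, `/JS`, `/Sound`, `/Movie`). This is a heuristic sketch only, not a conformance validator: the tokens can be hidden inside compressed object streams, and real PDF/A validation checks much more (fonts, color spaces, metadata).

```python
import re

# Name tokens hinting at features PDF/A prohibits (a deliberately incomplete list).
SUSPECT_NAMES = [b"/Encrypt", b"/JavaScript", b"/JS", b"/Sound", b"/Movie"]

def suspicious_features(pdf_bytes: bytes) -> list[str]:
    """Return suspect name tokens found verbatim in the raw (uncompressed) bytes."""
    found = []
    for name in SUSPECT_NAMES:
        # Require that the name is not followed by more name characters,
        # so /JS does not match longer names such as /JSFoo.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", pdf_bytes):
            found.append(name.decode("ascii"))
    return found
```

A clean result from this proves nothing; a hit is just a reason to run the file through a proper validator.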

Re:don't hate PDF 'cause it's beautiful by cryfreedomlove · 2009-10-31 06:09 · Score: 1

Why, as you say, is it good business for Adobe to port it to every single OS, even those with only a handful of users?
Re:don't hate PDF 'cause it's beautiful by Darkness404 · 2009-10-31 06:13 · Score: 1

Because it would allow for a language and framework that works on every single OS. Look at Java: even though it has numerous faults, the fact that it is now open source and ported to just about every single device means that it is used for lots of cross-platform programs. Flash could be the same way if they ported the player to every OS and device. By open-sourcing the Flash player they would A) save money in development, B) allow the porting of it to various platforms, and C) improve sales for their development program.

Re:don't hate PDF 'cause it's beautiful by Cochonou · 2009-10-31 19:57 · Score: 1

Which PDF converters do that? Because they must be really crap.
Most PDF converters I have used rely on Ghostscript in one way or another (after all, it's free!), and Ghostscript definitely doesn't work like this.
Most images-embedded-as-PDF files come from Xerox printers, which, of course, have no way of knowing what was typed in the document in the first place.
Re:don't hate PDF 'cause it's beautiful by wondershit · 2009-10-31 21:56 · Score: 1

an incredible amount of resources is wasted in display information rather than actual text or graphical content
Isn't this one of the key features of PDF? PDF documents should (a) look the same on every device -- preferably forever, i.e. independent of any environmental constraints like available fonts -- and (b) be capable of representing very complex typography. So yes, you have to specify every little detail. If you just want to convey some information, use RTF or HTML or something. Heck, pure information needs no formatting, let it be plain text. But if the document should look really good (like, printing-books-good) and be portable you need more.
A little example: imagine a justified paragraph. I don't know what it looks like in MS Word format, but I had a look at ODT. There the text is stored plainly, and obviously the program that processes the document has to figure out the spacing. In a PDF file it looks totally different. The producing application specifies the spacing so that there can never be any misinterpretation (provided the spec is followed and correct, of course). I take my example from the license text of a book. A simple "Permission is granted to copy and distribute" becomes Td[(Permission)-296(is)-296(gra)1(nted)-296(to)-296(cop)10(y)-296(and)-296(distrib)20(ute)]TJ
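In PDF content streams, an array of strings and numbers like the one quoted above is the operand of the `TJ` operator: each string is a run of glyphs, and each number shifts the next glyph horizontally in thousandths of a text-space unit (negative values move it right, positive values tighten it, as with kerning). That's also why extracting text is guesswork: there's no space character, only a gap. The sketch below rebuilds the words by treating any adjustment at or below a hypothetical threshold of -200 as a space; real extractors use more elaborate, font-size-aware heuristics, and this one ignores string escape sequences.

```python
import re

# The array from the example, without the surrounding Td/TJ operators.
TJ_ARRAY = "[(Permission)-296(is)-296(gra)1(nted)-296(to)-296(cop)10(y)-296(and)-296(distrib)20(ute)]"

# Hypothetical cut-off: adjustments at or below this value count as word gaps;
# anything smaller is treated as kerning inside a word.
SPACE_THRESHOLD = -200

# Either a parenthesized string (escapes kept as-is) or a number.
TOKEN = re.compile(r"\(((?:[^()\\]|\\.)*)\)|(-?\d+(?:\.\d+)?)")

def tj_to_text(tj: str, threshold: float = SPACE_THRESHOLD) -> str:
    """Rebuild text from a TJ array: strings are glyph runs, numbers are offsets."""
    out = []
    for string, number in TOKEN.findall(tj):
        if number:
            # A large negative offset opens a visible gap: guess a space.
            if float(number) <= threshold:
                out.append(" ")
        else:
            out.append(string)
    return "".join(out)

print(tj_to_text(TJ_ARRAY))  # Permission is granted to copy and distribute
```

Move the threshold too far in either direction and you get "Permissionisgranted..." or "gra nted", which is exactly the kind of fragility people complain about when pulling text out of PDFs.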
Yes, this takes some resources. Whether they are wasted is another question.
Re:don't hate PDF 'cause it's beautiful by John+Hasler · 2009-11-01 02:49 · Score: 1

> More importantly, it's then easy to import that data for visualization and
> analysis purposes. Data presented as a PDF file is effectively so
> inaccessible that it will rarely be extracted for further analysis, meaning
> that some gov't functionary becomes responsible for the presentation and
> analysis instead of members of the public.
Which is exactly why PDF is what you are going to get (or something even more inaccessible).

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:don't hate PDF 'cause it's beautiful by Antique+Geekmeister · 2009-11-01 02:57 · Score: 1

Yes, detailed display format is a critical feature of PDF. This is key to why it's not appropriate for indexing and stable long-term storage: the visual detail actively interferes with its stability and reliability.
Re:don't hate PDF 'cause it's beautiful by TheRaven64 · 2009-11-01 06:33 · Score: 1

That bug was also present in Flash 9 on Mac/PowerPC. I had a lot of Flash videos turn into slide shows (one frame every few seconds) with a 1.5GHz PowerPC G4. Restarting the browser fixed it, but for some videos it would reappear after about ten minutes of playback. Upgrading to Flash 10 fixed it for me, so I suspect that Adobe just chased it from one part of their code to another if it's still present for other people.

Re:don't hate PDF 'cause it's beautiful by Tynin · 2009-11-01 10:00 · Score: 1

With the way CSS is developing, won't flash be redundant soon anyway? I certainly hope so!
I haven't been paying attention to CSS and Flash development for a while, so please help fill me in. How do CSS and Flash relate? CSS is used for easy and consistent page formatting across a site, whereas Flash is used on a specific page to render something, be it an app/movie/effect. Please explain how Flash will be made redundant in favor of CSS for the uninformed among us. Thanks.
Re:don't hate PDF 'cause it's beautiful by dov_0 · 2009-11-01 14:06 · Score: 1

I really should have said HTML5 and CSS. CSS drop-down menus, transparency and the like, plus some great shadow and border features, and even the various developments in CSS animation will hopefully replace Flash and JavaScript. New HTML tags for embedding content or applications will also go a long way toward making Flash redundant. I hope.

CSS is used for easy and consistent page formatting across a site...
Well, yes that is ONE use for CSS, but please remember that it is Cascading Style Sheets. Style information can be embedded in the HTML itself, or in an external file. Style information in the page itself will be prioritized over CSS in an external document. This adds flexibility. HTML is about data. CSS is about presenting that data. Flash is doing what HTML and CSS should do and will soon.

Re:don't hate PDF 'cause it's beautiful by Tynin · 2009-11-02 13:15 · Score: 1

Excellent explanation. Many thanks :)
Re:don't hate PDF 'cause it's beautiful by TheSpoom · 2009-11-05 07:30 · Score: 1

The PDF reference is here, in case anyone was wondering.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Re:don't hate PDF 'cause it's beautiful by TheSpoom · 2009-11-05 07:37 · Score: 1

I don't really understand why you consider CSV to be evil; it's one of the most simple, well-known formats around. Yes, you can change the separator, delimiter, and record-end characters to be something else, but all you have to do is tell people which characters you're using (though generally, IMHO people should stick to commas, quotes, and newlines). In addition, practically every CSV import routine can accept alternatives for these characters.
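A minimal sketch of the "just tell people which characters you're using" approach with Python's `csv` module: the writer and reader share one set of hypothetical, published settings, and values containing the delimiter or quote character still round-trip because they get quoted and escaped.

```python
import csv
import io

# Hypothetical conventions, documented alongside the data.
DIALECT = {"delimiter": "|", "quotechar": "'", "lineterminator": "\n"}

rows = [["name", "note"], ["Smith, J.", "said 'hi'"], ["Doe", "a|b"]]

def write(rows: list[list[str]]) -> str:
    """Serialize rows using the agreed settings."""
    buf = io.StringIO()
    csv.writer(buf, **DIALECT).writerows(rows)
    return buf.getvalue()

def read(text: str) -> list[list[str]]:
    """Parse text using the same settings (the reader ignores lineterminator)."""
    return list(csv.reader(io.StringIO(text), **DIALECT))

encoded = write(rows)
assert read(encoded) == rows  # round-trips because both sides use the same settings
```

The catch, of course, is that the settings live outside the file; if they get lost, the consumer is back to guessing.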


Nobody likes flash by bcmm · 2009-10-31 01:31 · Score: 5, Insightful

Nobody likes Flash, and they probably shouldn't use it for anything. But there's not much wrong with PDF, if it's done right. When publishing something, one could offer "source" (some sane, machine-readable format) and PDF (autogenerated from the source, and prettified for easier reading).

PDF shouldn't be used as a way to encapsulate scanned JPEGs and pretend they're a real electronic document.

I would also note that many of the complaints about PDF as a format in TFA are really complaints about Adobe's abysmal PDF reading software. For example, the concern about the visually impaired: KDE's Okular does speech synthesis and has a high-contrast mode.

--
# cat /dev/mem | strings | grep -i llama
Damn, my RAM is full of llamas.

Re:Nobody likes flash by Anonymous Coward · 2009-10-31 04:52 · Score: 1, Informative

PDF is also not really an open standard. It's mostly open - but some very interesting features, like "Allow commenting in Reader" and "Allow Reader to save filled-in forms" cannot be implemented using published standards information.
I suppose it's ok if the website offers an option to return data in multiple forms (eg. here's a link to the original word file, or here's a PDF if you can't read Word), but it doesn't quite seem appropriate as _the_ way to present information.
Re:Nobody likes flash by lahvak · 2009-10-31 05:18 · Score: 1

If you are going to suggest presenting Word documents, then complaining about PDF not being an open standard is somewhat hypocritical. The problem actually has nothing to do with the openness of the PDF format, but rather with the fact that Adobe Reader is closed. Anybody is free to implement a reader that will allow users to save filled-in forms and comment on PDF files; there is nothing in the format that could prevent it.
Also, creating PDF files from Word documents is not the best way of doing it; in fact, I suspect that a lot of the bad rap that PDF gets is due to a huge amount of lousy PDF files created from Word documents flooding the internet.

--
AccountKiller
Re:Nobody likes flash by Blakey+Rat · 2009-10-31 05:49 · Score: 1

Anybody is free to implement a reader that will allow to save filled in forms
Unless they're Microsoft. In that case, Adobe takes them to court and forces them to remove any PDF-relating features. PDF is an "open format" my ass. Adobe talks the talk, but they sure don't walk the walk.

--
Comment of the year
Re:Nobody likes flash by Ghubi · 2009-10-31 06:49 · Score: 1

Nobody likes Flash
Most people DO like flash. Most people use Internet Explorer. Most people have never even heard of open standards, much less give a damn about them. For most people, Flash, Windows Media, and Real are the 3 types of video that exist.
Re:Nobody likes flash by NotBorg · 2009-10-31 07:25 · Score: 2, Insightful

But there's not much wrong with PDF, if it's done right.

I'm sure they won't fuck this up, after all it is the US government.

--
I want this account deleted.
Re:Nobody likes flash by robogun · 2009-10-31 10:29 · Score: 1

Additional reasons: they're closed, and they're malware vectors that need to be constantly updated. As if that weren't enough reason not to use them, there are even phishing scams to update your Flash or PDF installs... with the scammers' horrible malware.
Unless you play Flash games or view Youtube all day there is no need to run Flash, all it does is deliver ugly ads or someone's horribly botched schoolboy attempt at an edgy webpage. Flash, far from enabling web usage is often used to RESTRICT usage, go to Webshots for an example of that.
Re:Nobody likes flash by mrmeval · 2009-11-01 03:01 · Score: 1

Yes PDF is marvelous "No you can't print this document scumbag" "No you can't save a copy cretin" "No you can't extract the pretty pictures asshole". I've run into it and it's a show stopper.
KDE is not Windows, and Windows is here to stay for the foreseeable future. That is partly Microsoft at work, but it's also due to some serious usability and other problems with distributions using either a Linux or BSD kernel.

--
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
Re:Nobody likes flash by bill_mcgonigle · 2009-11-02 06:58 · Score: 1

KDE's Okular does speech synthesis and has a high-contrast mode.
yet can't output PS that my Brother BRScript can handle (Evince does OK).
All of the open source PDF readers have come a long way in recent years - my only point is that PDF appears to be *hard* to implement. I don't know why somebody would need to, but my imagination is limited. Should a file format essential to government be such a hurdle to potential users?

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:Nobody likes flash by TheSpoom · 2009-11-05 07:41 · Score: 1

You're missing the point.
Flash is great for animation.
Flash is horrible for tabulated data.
The whole point of having this data open to the public is to allow the public to read and process it. If they can't load it into alternative environments to analyze it, the data effectively becomes useless; sort of a "transparency theatre" where none really exists.


The future is ODF and html5 by quantic_oscillation7 · 2009-10-31 01:35 · Score: 1, Informative

The future is ODF (a real open XML) and of course PDF, but especially html5+js+canvas+svg+ogg vorbis/theora for rich web content.

With this kind of technology that the new browsers bring to the arena, adobe is getting scared!

Re:The future is ODF and html5 by tepples · 2009-10-31 01:38 · Score: 3, Insightful

but specially html5+js+canvas+svg+ogg vorbis/theora for rich web content.
Who has announced authoring tools for this stack that are anywhere near as capable as even Flash 3, let alone Flash CS4? Say I want to make an animated SVG like the Flash animations I see on Newgrounds. What package should I start with?
Re:The future is ODF and html5 by Cochonou · 2009-10-31 01:42 · Score: 2, Insightful

Right...

In order to read a document, what I really need to replace the heavyweight Adobe Reader, is a bloated modern browser ! :D
Re:The future is ODF and html5 by Anonymous Coward · 2009-10-31 01:54 · Score: 1, Insightful

The future is typically whoever gets there first; Adobe is shipping a great product (from a producer's perspective) right now, today. SVG has been around for how long now? And it's still just a minor player; same with ogg. HTML5 will eventually make inroads, but the spec doesn't mandate any specific codecs. On top of that, it requires the browser to implement basic navigation controls; producers are going to want to keep their own in-house player controls.
Re:The future is ODF and html5 by oldspewey · 2009-10-31 02:13 · Score: 4, Funny

This sort of authoring is easily handled in vi - or emacs - your choice.

--
If libertarians are so opposed to effective government, why don't they all move to Somalia?
Re:The future is ODF and html5 by tepples · 2009-10-31 02:33 · Score: 2, Interesting

Yeah, and you can hex edit an SWF file too. But change a letter, refresh, change a letter, refresh, is not the kind of editing that graphic designers prefer to do. If that's what SVG has to offer, the market will choose SWF. I can only hope your comment was sarcasm.
Re:The future is ODF and html5 by tepples · 2009-10-31 02:36 · Score: 1

On top of that, [HTML 5 video] requires the browser to implement basic navigation controls; producers are going to want to keep their own in-house player controls.
That's still doable. JavaScript running in an HTML 5 page can disable the browser's built-in controls in a <video> element and control the video itself.
Re:The future is ODF and html5 by agnosticnixie · 2009-10-31 08:29 · Score: 1

I don't know where and how Adobe Flash or the other cancer, MS-Novell Silverlight/Moonlight, could do that.
It can't; Adobe's accessibility recommendation is to keep a separate non-Flash version of the site, and that's it. It doesn't degrade well (so financial access is problematic), it doesn't work in screen readers (and I've even seen Flash shit coders who thought a site with audio and no text for its menus, assuming the person is of a particular ethnic group and can hear, was a smart idea), and it's barely searchable.
Re:The future is ODF and html5 by agnosticnixie · 2009-10-31 08:48 · Score: 1

Wow, they revised it recently. Let me review it and grade whether they did their homework, shall we?
Fail, barely. Fail entirely if you add the fact that the only 100% Flash-compatible screen-reading tech is on Windows.
Of the 5 categories supported, 4 have exceptions, 3 of them killing exceptions. And only a handful of users are smart enough to implement the solutions.

What do you want? by FranTaylor · 2009-10-31 01:40 · Score: 1

Perhaps you know of a document format where the text in images IS searchable?

Re:What do you want? by Bacon+Bits · 2009-10-31 01:46 · Score: 2, Interesting

A document format shouldn't store text as an image. That's why it's called text.

Re:What do you want? by petermgreen · 2009-10-31 02:00 · Score: 2, Insightful

That is not really a format issue though, in any format that supports images I can insert an image containing text.

Re:What do you want? by TheRaven64 · 2009-10-31 02:07 · Score: 2, Interesting

You're missing the point. PDFs do not store text. Text is a stream of characters. PDFs store glyphs and their locations. It is more or less possible to convert glyphs into characters, although things like ligatures and the fact that spaces are not really represented make this difficult. In the metadata, some PDFs also store the text of the document, allowing it to be extracted. Given that the PDF is created automatically from the text in most cases, the text is more useful. You can create the PDF from the text easily, but creating the text from the PDF is much harder.
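One small, concrete piece of the glyphs-versus-characters problem is ligatures. If a PDF maps a ligature glyph to its Unicode compatibility code point (e.g. U+FB01 "ﬁ"), the extracted string won't match a plain search for "financial". Unicode NFKC normalization folds those back into separate letters. This is a sketch under that assumption only; if the PDF carries no usable glyph-to-Unicode mapping at all, there's nothing to normalize.

```python
import unicodedata

# Text as an extractor might return it: the "fi" and "fl" ligature glyphs
# come through as single compatibility characters (U+FB01, U+FB02).
extracted = "\ufb01nancial \ufb02ow"

def normalize_ligatures(text: str) -> str:
    """Apply NFKC normalization, which decomposes compatibility ligatures."""
    return unicodedata.normalize("NFKC", text)

print("financial" in extracted)                       # False
print(normalize_ligatures(extracted))                 # financial flow
```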

Re:What do you want? by PhrstBrn · 2009-10-31 02:52 · Score: 1

Sure, whatever. It's not text, it's "something else". But that something else can easily be converted to text.
You can copy/paste text out of a PDF in almost all PDF reader software. You can't possibly argue that you can't extract text from the PDF.
Re:What do you want? by TheRaven64 · 2009-10-31 03:12 · Score: 1

It depends on how the PDF was created. If the PDF had the source text embedded in the metadata then it will work fine. Now try it with a PDF that's generated by printing to PostScript and then distilling to PDF (as a lot of PDFs are). It won't work.

Re:What do you want? by 99BottlesOfBeerInMyF · 2009-11-01 06:50 · Score: 1

A document format shouldn't store text as an image. That's why it's called text.
A document shouldn't store text as images. But a document format can be misused, and it should not be trying to interpret images and reject them if they contain text. Heck, I can misuse standard text files by storing the text as ASCII images output by "banner", making them difficult to copy and paste and nearly impossible to search. That's not the fault of the format, but mine for misusing it.
It's more of a problem with PDF than with my .txt example because of how the formats are used. Documents that have their text as images usually do so because they are scanned-in printed files, where no digital source is available and the people inputting them did not apply OCR.
If there were no PDF format, you'd just see big JPG files, or big JPG files embedded in ODF or .doc files and have the same problem.
That said, recognizing the usability issue means there is a client-side solution. Good OCR can be built into PDF readers so that they can scan the big images and render them into text despite the laziness of the document creator. Of course this will be an error-prone process, and it is just a hack to work around the main problem: lousy document creation procedures resulting from paper-centric workflows that have not yet been replaced by modern technology.
Re:What do you want? by 99BottlesOfBeerInMyF · 2009-11-01 07:02 · Score: 1

It depends on how the PDF was created. If the PDF had the source text embedded in the metadata then it will work fine. Now try it with a PDF that's generated by printing to PostScript and then distilling to PDF (as a lot of PDFs are). It won't work.
I use a lot of PDFs from a lot of sources. I can copy and paste text from pretty much all of them, with the rare exception of documents that are clearly scanned-in versions of printed documents, complete with artifacts left over by the scanner. There are issues with multi-column PDFs in some readers that aren't smart enough to recognize the columns when the copy/paste is performed, and different readers handle this with varying ease. But that does not mean you can't get text out of a PDF, which you can whenever the document was not created from a non-text source.
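The multi-column problem is mostly a reading-order problem. The rough sketch below assumes the reader already has text fragments with coordinates. It also assumes the gutter position between the two columns is known, which is a simplification, because real tools have to detect it and that is the hard part. The sketch reads each column from top to bottom instead of sweeping straight across the page.

```python
from typing import List, Tuple

Fragment = Tuple[float, float, str]  # (x, baseline y, text); y grows upward as in PDF

def column_order(fragments: List[Fragment], gutter_x: float) -> List[str]:
    """Read a two-column page column by column instead of straight across.

    gutter_x is an assumed, already-known column boundary (hypothetical input).
    """
    left = [f for f in fragments if f[0] < gutter_x]
    right = [f for f in fragments if f[0] >= gutter_x]
    ordered = []
    for column in (left, right):
        ordered.extend(sorted(column, key=lambda f: -f[1]))  # top to bottom
    return [text for _, _, text in ordered]

page = [(72, 700, "L1"), (320, 700, "R1"), (72, 686, "L2"), (320, 686, "R2")]
# A naive row-by-row sweep would give L1 R1 L2 R2 and interleave the columns.
print(column_order(page, gutter_x=300))
```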

One condition by PotatoFiend · 2009-10-31 01:45 · Score: 1

Flash should only be considered if the government can mandate that Adobe provide and competently maintain a Flash player of comparable quality for all major desktop, mobile, and handheld OSes and platforms. The alpha-quality Flash player for 64-bit Linux sucks donkey balls while Windows gets star treatment. Open source would be another plus, but right now I'd settle for a 64-bit Linux binary that didn't crash my browsers constantly.

--
"Liberty may be endangered by the abuses of liberty as well as the abuses of power." -- James Madison

Heads should roll at Adobe by Obispus · 2009-10-31 01:47 · Score: 1

From TFA

the entire site--designed in Flash--is practically inaccessible. After just a cursory browsing, here are some of the usability and data accessibility issues we observed. You can't select, copy, or paste any text. Your browser's font override features won't work, so you can't adjust the font or its size to be more readable. Your browser's built-in in-page search won't work, and you can't use the keyboard to scroll through the text. You can't parse or scrape the data in any way; the design is fixed-width, so it's not going to work well on different screen sizes; and browser plugins, like Greasemonkey, can't adjust anything. Basically when it comes to text at all, if you don't like the style or are visually impaired, you're screwed.

Way to go to convince government and its constituents that Flash and PDF will help them put together open websites and follow "ADA Guidelines for the Web" aimed at ensuring accessibility...

Re:Heads should roll at Adobe by agnosticnixie · 2009-10-31 08:31 · Score: 1

Even Adobe knows it: in their "let's pay lip service to the ADA and pretend we and our users can code our way out of a paper bag" page, their sole recommendation is to keep a non-Flash version linked to...
Re:Heads should roll at Adobe by andreyvul · 2009-11-01 08:58 · Score: 1

You can FizzBuzz a paper bag?

--
proud caffeine whore

Tremor by sleeponthemic · 2009-10-31 01:48 · Score: 2, Funny

They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.

I felt a great disturbance in the Force, as if millions of IT workers suddenly cried out in terror, and were suddenly silenced.

--
I record my sleeptalking

If not PDF, then Microsoft's XPS: XML Paper Spec. by optikos · 2009-10-31 01:49 · Score: 1

Campaigning against PDF in any way might effectively equate to implicitly campaigning for Microsoft's XML Paper Specification (XPS).

PDF bad. Work on microformats please. by mattr · 2009-10-31 01:51 · Score: 3, Interesting

GP is right. Government should focus on what government is actually needed for, such as determining standards for formats that everyone can use, with input from academia and industry. One example is a human-readable, parsable format that could be embedded in a web page for semantic metadata. Another is funding open source software that makes it easy, and cross-platform, to input such data (I am thinking of information about cited papers or books). Typeset information is nice, but we are already drowning in information: how many pages of Google results do you usually look at? And we need help before we generate 10 times as much.

Why PDF is bad:
- It is a potable typeset document package. Not a data sharing package that could be pulled apart easily with tools automatically.
- PDF is extremely hard to parse, and using current free software does not always give good results.
- When you convert to PDF you destroy useful document structure (or, in the case of ASCII text, parsability and small size). You can't just convert back to the original.
- It takes significant processing power and commercial software to display well and reliably, as far as I can see. Having just gotten the latest Mac I feel like I'm in a dauntless battleship, but I have had a lot of trouble with different Unix tools in the past.
- Scientists publish PDF too, but then also use other formats for data. For example, on arXiv one scientist recently published animations inside a zip, but it was hard to find the link.
- It is difficult to manage bibliographic information automatically.
- It is proprietary
- It requires a huge amount of data, and arcane knowledge, just to build a parser that works most of the time (especially for Asian languages).

Re:PDF bad. Work on microformats please. by Anonymous Coward · 2009-10-31 02:07 · Score: 5, Informative

- It is proprietary
FAIL.
PDF is an ISO standard. See: ISO 32000-1, Document management – Portable document format – Part 1: PDF 1.7
This doesn't change the fact that it is a portable typesetting document format, though. It's good for read-only documents from your word processor, but it shouldn't be (ab)used to store tables or graphs or whatever other crap people use it for.
---
As for Flash, let's not even go there. Flash is passable as a streaming video container, for making animated cartoons like Homestar Runner, or as a platform for small web games, but outside those use cases you're using it wrong.
Re:PDF bad. Work on microformats please. by bcmm · 2009-10-31 02:12 · Score: 1

The majority of those issues would be fixed by publishing LaTeX sources next to the PDFs generated from them.

--
# cat /dev/mem | strings | grep -i llama
Damn, my RAM is full of llamas.
Re:PDF bad. Work on microformats please. by iris-n · 2009-10-31 02:25 · Score: 1

- Scientists publish PDF too but then also use other formats for data. For example on arxiv, one scientist recently published animations inside a zip but it was hard to find the link
Err... also? I've never seen a scientist use PDF to publish data. We use PDF (and PS and DVI) to publish typeset papers. The actual data comes in a lot of formats, depending on the field and application. I've seen CSV, MATLAB's .mat, XML, JPEG, TIFF, proprietary crap, etc.

--
entropy happens
Re:PDF bad. Work on microformats please. by Stormwatch · 2009-10-31 02:47 · Score: 4, Funny

It is a potable typeset document package.
So you can drink a PDF?!

--
Circumcision is child abuse.
Re:PDF bad. Work on microformats please. by maxume · 2009-10-31 04:42 · Score: 1

I usually look at the first 3 results of the first page of Google results. But that is related to the types of searches I usually run. I sort of expect that I am 'like most people' in that regard.
Well structured data is nice, but it isn't a panacea (for example, it can still be false).

--
Nerd rage is the funniest rage.
Re:PDF bad. Work on microformats please. by beelsebob · 2009-10-31 04:57 · Score: 1

Not only is it a standard, it's also *really* easy to parse. It's specifically structured so that any printer manufacturer can parse it and end up with *exactly* the same document as software displays.
It still doesn't change the fact that it's not for data transfer, but for pristine document layout though.
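A PDF reader does not scan the file from the top. Instead, it jumps to the end of the file, where the `startxref` keyword gives the byte offset of the cross-reference data that indexes every object (ISO 32000-1, file trailer). The minimal sketch below covers only that first step. Note that PDF 1.5 and later files may point at a cross-reference stream instead of a plain `xref` table, and incrementally updated files chain several of them together.

```python
import re

def find_xref_offset(pdf: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    # startxref sits near the end of the file, so only the tail is searched.
    tail = pdf[-1024:]
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    if not offsets:
        raise ValueError("no startxref keyword found; not a well-formed PDF?")
    # With incremental updates, the last occurrence is the current one.
    return int(offsets[-1])
```

To use it on a real file, you might write `find_xref_offset(open("report.pdf", "rb").read())`, where the file name is hypothetical. The returned offset is where a parser would start reading the object index.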
Re:PDF bad. Work on microformats please. by NotBorg · 2009-10-31 07:35 · Score: 1

Yeh see matey, if yeh leave out the R no one be respecting.
Yarrrrrrr!

--
I want this account deleted.
Re:PDF bad. Work on microformats please. by rdnetto · 2009-11-01 19:53 · Score: 1

Sure you can. Why wouldn't you be able to drink a Properly Distilled Fluid?

--
Most human behaviour can be explained in terms of identity.

WTF? by dnaumov · 2009-10-31 02:02 · Score: 1

PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted."

Have these people not heard of Google? Just because YOU can't write software to parse PDF files doesn't mean that nobody else can and that it doesn't already exist.

Depends on the purpose by PineHall · 2009-10-31 02:04 · Score: 1

If you are publishing a document that can be printed then PDF is a good format. If you expect people to extract data from the document then you should look for a different format. It depends on the purpose of posting the document on the web.

Re:Depends on the purpose by lahvak · 2009-10-31 05:32 · Score: 1

Either you provide data, or you provide a document. Extracting textual data from pdf is not any harder than extracting them from a word file or an odf document.
If you want to provide data, provide data, in CSV format or something simple like that.
In fact, with PDF you can do both, since you can attach the CSV (or whatever format) data to the document.

--
AccountKiller

PDF Yes, Flash No by markdavis · 2009-10-31 02:11 · Score: 5, Insightful

I am OK with PDF. I would RATHER see documents in plain HTML, but there are times when formatting is important. In those cases, if it is to be read/print-only, PDF is the way to go. Otherwise, the gov should use ODF.

But Flash? Are you kidding? The last thing on earth we need is more Flash.

* Does not work on all devices
* Slow and/or consumes tons of CPU
* Consumes tons of RAM
* Consumes more bandwidth
* Makes it difficult or impossible to cut and paste
* Impossible to "search/find"
* Violates the native UI look and feel
* Fonts and font sizes are uncontrollable by the end user
* Can't scroll correctly much of the time
* Almost completely proprietary
* Rarely adjusts to screen size
* Often introduces extremely irritating animation.
* Doesn't allow text to be "seen" by the browser (or OS), making other plugins (like a screen reader) 100% useless

At least that SilverDark stuff isn't even on the radar- thank God for little favors.

Re:PDF Yes, Flash No by gaspyy · 2009-10-31 02:55 · Score: 2, Insightful

Most of what you say is implementation-related rather than format-related. It's like saying that C sucks because there are so many crappy programs. I know about feeding the trolls, but for all those who don't know better, here we go:
Nothing "just works" on all devices, and in this area Flash fares better than most other technologies. Agreed that it is slow; I don't really agree on RAM usage.
Flash uses less bandwidth than the alternatives; it's very well optimized. Sure, someone can stuff in some 10-minute MP3s encoded at 256kbps and a bunch of 2048x2048 bitmaps, but that's another story.
Cut/paste is more tedious for security reasons, but keyboard shortcuts work. Search works too, and static text is indexed by Google.
Agreed on native UI, but then so is Java. Font size is controllable by the user if the app is done properly; granted, the user can't override any settings.
Scrolling - never had an issue. Specs are open. Rarely adjust to screen size - are you kidding me? it's vector, by default it will adjust to anything and can be programmed a lot better than CSS/HTML.
Irritating animation - not a fault of the format itself.
Works with screen readers -- seriously, have you TRIED it?
What Adobe is pushing is most likely their "Flashpaper" format, something similar to PDF but lighter.
One more comment on the summary: "unfindable by search engines" - where does this come from? Google and others have been indexing PDF files for 10 years.
I know Slashdot crowd loves to hate flash, but at least hate it for the right reasons: its lack of speed and real 3d hardware acceleration.
Re:PDF Yes, Flash No by Vladimus · 2009-10-31 03:09 · Score: 1

HTML is a great middle ground. By following XHTML rules and combining it with CSS, you have a very parseable document and can typeset it virtually any way you want. I've loved the PDF format since it was PostScript, since it can literally do anything involving typesetting or vectors, but trying to get data out of it sucks. It'd be great if Adobe could somehow embed text data or XML into it, but I don't see that happening.
I wonder if SVG might work well.

--
A rolling stone is worth two in the bush!
Re:PDF Yes, Flash No by Vexorian · 2009-10-31 03:29 · Score: 2, Insightful

I know Slashdot crowd loves to hate flash, but at least hate it for the right reasons: its lack of speed and real 3d hardware acceleration.
Those are very lame reasons. We are talking about open government initiative here, not about "standard for web games" initiative. Flash is:
Not portable: many platforms lack proper support. Flash can't be legally redistributed, and the alternatives are poor. It is not an open format in any way.
Bad for accessibility.
Not a web standard or anything close to it.

Nothing "just works" on all devices
Then make the format 100% free to get, 100% easy to implement, and 100% redistributable without royalties, so that device and platform makers can actually make it work themselves instead of depending on Adobe's charity. Ever wonder how XHTML manages to more than just work on all devices? Without those things, Flash is terrible for the job in question, which is giving all citizens access to government information.

--

Copyright infringement is "piracy" in the same way DRM is "consumer rape"
Re:PDF Yes, Flash No by Vexorian · 2009-10-31 03:32 · Score: 1

Oh, and no half-assed "openish" attempts a la MS. The whole of it would have to be open, including the codecs and the tools to generate them. No proprietary extensions that make the standard only optionally open. Also, as a standard for an open government initiative, gibberish like DRM must be completely out of the question.

--

Copyright infringement is "piracy" in the same way DRM is "consumer rape"
Re:PDF Yes, Flash No by markdavis · 2009-10-31 04:16 · Score: 1

>Most of what you say is implementation-related rather than format-related. It's like saying that C sucks because there are so many crappy programs.
I will agree that there are better and worse ways to IMPLEMENT Flash, but even properly implemented, it doesn't address all (or most) of my issues.
>Nothing "just works" on all devices and in this area flash fares better than most other technologies; agree is slow; not really agree on RAM usage.
HTML works fine on all devices. 95% of the time I see Flash used, it is totally unnecessary.
>Flash uses less bandwidth than alternatives, it's quite very well optimized. Sure, someone can stuff some 10 min. mp3s encoded at 256kbps and and bunch of 2048x2048 bitmaps but that's another story.
Agreed- for when you really need video or audio, that is one of the few times Flash shines (although it is still a pig). But we are mostly talking about documents, not multimedia. And for general website use and documents, rarely does Flash really add anything useful to counteract the tremendous negatives.
>Cut/Paste is more tedious because of security reasons but keyboard shortcuts work.
I cannot highlight and middle paste somewhere, so it is at least 1/2 broken.
> Search works too and static text is indexed by Google.
Funny, when I do control-F in firefox and ask to find something, it never finds anything inside a Flash object.
>Agree on native UI, but then so it's Java.
And I agree on Java- it is usually unnecessary and an annoying pig too. But I encounter Java less than 1% of the time I encounter unnecessary Flash.
> Font size is controllable by user if the app is done properly -- granted, user can't override any settings.
And that is what I am talking about. I am rarely, if ever in control when viewing anything Flash. I can do ONLY what the developer decided I should be allowed to do, and only in their non-standard way.
> Rarely adjust to screen size - are you kidding me? it's vector, by default it will adjust to anything and can be programmed a lot better than CSS/HTML.
Try this on your small internet tablet or Flash supporting phone: http://blueswitch.com/ It is a perfect example of the typical Flash site that doesn't adjust to anything. And without Flash, there is essentially no content. If you have a large screen, it uses only a fraction of it. If you have a small screen, it is barely usable.
>Irritating animation - not a fault of the format itself.
True. And, yet, Flash developers can't seem to resist form over function.
>Works with screen readers -- seriously, have you TRIED it?
Under what platforms does it work? All? Can the browser see the text inside Flash? Can the OS?
>One more comment from the summary: "unfindable by search engines" - where does this come from? Google and all have been indexing PDF files since 10 years ago.
I assume they are referring to text within Flash objects, not PDF.
>I know Slashdot crowd loves to hate flash, but at least hate it for the right reasons: its lack of speed and real 3d hardware acceleration.
There are a lot more reasons to hate it than 3D and speed. I listed quite a few already. And you are correct that maybe half of my issues are with the typical IMPLEMENTATION of Flash and not Flash itself. But if 99% of the time I saw a steel pipe in someone's hand it was used to clobber and rob people, I might be upset when I see one.
Re:PDF Yes, Flash No by Lennie · 2009-10-31 06:16 · Score: 1

Not only that, but so much more is possible these days with a browser that supports proper standards.

Flash became popular in the web-development community back when you had to do a lot of programming to get things done in the browser and browser performance wasn't optimised for that kind of thing.

But that is ages (in internet time) ago.

--
New things are always on the horizon
Re:PDF Yes, Flash No by mini+me · 2009-10-31 07:42 · Score: 1

Microformats allow HTML to describe the data. But you are right, it is not the right tool for storing data.
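For example, the hCard microformat marks up contact details using ordinary class names (`vcard`, `fn`, `org`, `tel`), so a plain HTML parser can pull structured records back out. The simplified sketch below uses only Python's standard library and handles just those three properties. It ignores most of the finer hCard rules, such as nested cards and `value` classes. The sample markup is made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # no end tag to pop

class HCardParser(HTMLParser):
    """Collect fn/org/tel values from class="vcard" elements (simplified hCard)."""
    PROPS = ("fn", "org", "tel")

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._stack = []  # class sets of the currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if "vcard" in classes:
            self.cards.append({})
        self._stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not any("vcard" in c for c in self._stack):
            return  # ignore whitespace and anything outside a vcard
        # The innermost element carrying a property class owns this text.
        for classes in reversed(self._stack):
            for prop in self.PROPS:
                if prop in classes:
                    card = self.cards[-1]
                    card[prop] = (card.get(prop, "") + " " + text).strip()
                    return

html = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
        '<span class="org">Example Agency</span><br>'
        '<span class="tel">555-0100</span></div>'
        '<p class="fn">not in a card</p>')
parser = HCardParser()
parser.feed(html)
print(parser.cards)
```

The page still renders as ordinary HTML for people, while software gets named fields without any scraping heuristics.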
Re:PDF Yes, Flash No by agnosticnixie · 2009-10-31 08:50 · Score: 1

I have no mod points, but, seriously, that - mod up people.
Re:PDF Yes, Flash No by BenoitRen · 2009-10-31 10:57 · Score: 1

Most of what you say is implementation-related rather than format-related.

This is irrelevant, as Flash is a proprietary standard with only one implementation. Hence the two are pretty much equal.
Re:PDF Yes, Flash No by nahdude812 · 2009-11-01 01:59 · Score: 1

Maybe you haven't seen this: http://www.adobe.com/devnet/swf/
This is 278 pages of very straightforward and in-depth documentation on the SWF file format.
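Every SWF file starts with an 8-byte header with three parts:

- A three-byte signature: `FWS` for an uncompressed file, or `CWS` when everything after the header is zlib-compressed.
- A version byte.
- The total uncompressed file length, stored as a little-endian 32-bit integer.

The minimal Python sketch below reads that header. It deliberately leaves out the LZMA-compressed (`ZWS`) variant, which was added to the format later.

```python
import struct
import zlib

def read_swf_header(data: bytes) -> dict:
    """Parse the 8-byte SWF header and return it with the (decompressed) body."""
    if len(data) < 8:
        raise ValueError("too short to be an SWF file")
    signature = bytes(data[:3])
    if signature not in (b"FWS", b"CWS"):
        raise ValueError("unsupported signature %r (ZWS/LZMA not handled)" % signature)
    version = data[3]                                # UI8
    (file_length,) = struct.unpack("<I", data[4:8])  # UI32, little-endian
    body = bytes(data[8:])
    if signature == b"CWS":
        body = zlib.decompress(body)                 # everything after the header
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "file_length": file_length,  # length of the whole file once decompressed
        "body": body,
    }
```

For example, `read_swf_header(open("movie.swf", "rb").read())["version"]` (the file name is hypothetical) tells you which player version the file targets. Everything after the header is the tag stream that the rest of the document describes.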

--
Slay a dragon... over lunch!

Re:Tell Adobe to open-license PDF by TheRaven64 · 2009-10-31 02:13 · Score: 5, Informative

What are you talking about? The PDF specification has been available as a free download from Adobe with no royalties payable by implementors since PDF was first created. More recently, the PDF/X family of specifications was approved by ISO. These define subsets of the PDF 1.4 specification for different uses (see ISO 15930). There are at least three open source PDF readers that I know of as well as several commercial viewers (Adobe Reader, FoxIt, Apple's Preview, and so on) and numerous tools can generate PDFs.

--
I am TheRaven on Soylent News

data formats independent of campaign donors by SgtChaireBourne · 2009-10-31 02:30 · Score: 2, Informative

The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.

If the goal is to make the data available, then even CSV would be a better option than PDF. PDF, while pretty, is a terminal format and is the digital equivalent of a mayfly. It's paper that hasn't happened yet and when it does it will exist for a few short hours before finding its way to the circular file.

Much of the government data consists of tables and tables of data. gzipped csv would be readable by anyone, so would ODF. Adobe appears to be looking for a handout at the expense of creating a useful and open data system.
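The sketch below shows how low that bar is. Using only Python's standard library, it writes a gzipped CSV and then sums one column back out of it. The column names and figures are made up for illustration.

```python
import csv
import gzip
import io

def write_gzipped_csv(header, rows) -> bytes:
    """Serialize rows as UTF-8 CSV and gzip them in memory."""
    buf = io.BytesIO()
    with gzip.open(buf, "wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return buf.getvalue()

def column_total(blob: bytes, column: str) -> float:
    """Read a gzipped CSV back and sum one numeric column."""
    with gzip.open(io.BytesIO(blob), "rt", encoding="utf-8", newline="") as f:
        return sum(float(row[column]) for row in csv.DictReader(f))

# Made-up budget lines, purely for illustration.
blob = write_gzipped_csv(["department", "amount"],
                         [["Parks", "1200.50"], ["Roads", "3400"]])
print(column_total(blob, "amount"))
```

Going the other way, from this data to a typeset table or chart, is a formatting step. Recovering the numbers from a PDF of that chart is the hard direction.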

Put it in context: open government requires data formats that are independent of campaign donors.

--
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.

Digital Stewardship : PDF vs PDF/A by SgtChaireBourne · 2009-10-31 02:40 · Score: 2, Insightful

PDF/A is already open. However, that doesn't mean that anyone knows how to produce it, especially some R.O.A.D. staffer or random hourly GS1.

Open or not, PDF/A is a display format and, in most cases, useless for information retrieval or automated data processing. PDF/A is a useful alternative to paper. However, the open government initiative is not talking about paper. It's about 'born digital', machine readable data.

--
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.

Re:Digital Stewardship : PDF vs PDF/A by maxume · 2009-10-31 04:44 · Score: 1

Useless is the wrong word. It took 15 lines of python wrapping xpdf for me to get a working system for dumping the transactions out of the last 6 years of my credit card statements.
It's ugly, but it works just fine.
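The poster's script isn't shown, but a wrapper of that kind might look something like the sketch below. It shells out to `pdftotext`, which is part of xpdf and is also shipped by poppler. The `-layout` option keeps the columns aligned, and `-` sends the output to stdout. The script then pulls transactions out of the text with a regular expression. The line format `MM/DD  DESCRIPTION  AMOUNT` is a hypothetical assumption, since every card issuer's layout differs.

```python
import re
import subprocess
from typing import List, Tuple

# Hypothetical statement layout: "03/14  GROCERY STORE  52.17"
LINE_RE = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")

def pdf_to_text(path: str) -> str:
    """Run pdftotext (xpdf or poppler); '-layout' keeps columns, '-' means stdout."""
    result = subprocess.run(["pdftotext", "-layout", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def parse_transactions(text: str) -> List[Tuple[str, str, float]]:
    """Keep only lines that look like transactions; everything else is ignored."""
    txns = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            date, desc, amount = m.groups()
            txns.append((date, desc, float(amount.replace(",", ""))))
    return txns

sample = "STATEMENT\n03/14  GROCERY STORE  52.17\n03/16  RENT  1,200.00\nTotal due 1,252.17\n"
print(parse_transactions(sample))
# For a real file: parse_transactions(pdf_to_text("statement.pdf"))  # hypothetical path
```

It is ugly in exactly the way described. The regular expression encodes one issuer's layout and has to be adjusted whenever that layout changes.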

--
Nerd rage is the funniest rage.

NO to Adobe! by SirAstral · 2009-10-31 02:41 · Score: 1

I guess most of you do not realize that Adobe produces SpyWare by default in their own products?

Flash, for example, has iesnare built into it. This allows machine profiling that everyone agrees to when they install their bullshit software!

You can't trust a company that has already done something to make you distrust them.

Re:NO to Adobe! by agnosticnixie · 2009-10-31 08:39 · Score: 1

The Internet doesn't belong to Adobe, moron.

PDF and Flash are massively multiplatform by Ilgaz · 2009-10-31 03:04 · Score: 2, Interesting

Adobe ships Flash/PDF readers/plugins to: Windows, OS X, Symbian (in some form), Linux, *BSD and various, uncountable tiny platforms. iPhone/iPod does not count because of obvious reasons.

Lets see what MS Silverlight ships to: Windows/Intel Mac. Damn thing is so tied to Windows that they couldn't even convert/ship the V2 for PPC Macs or they simply abandoned them. (like we cared!)

The MS XPS format and viewer are Microsoft's answer to PDF, and some people who don't use Windows have never, ever heard of them. It is that Windows-centric. Despite all the rude attempts by MS (adding an XPS printer, etc.), it has never, ever taken off.

What we need is something that combines ODF and PDF. You can add a binary file to a PDF document as a kind of layer. ROM LogicWare, the lesser-known developer of the Papyrus office software, does exactly that right now. The files are both PDF and their own editable format, transparent to PDF readers and NOT a hack.

Of course, people will spend time flaming "omg Flash, PDF, Adobe is slow" rather than finding a solution to a real problem. Asking government to use Flash is really absurd, but the real ones to blame here are MS and the large open-source-based companies. If there is no alternative, Adobe will of course suggest PDF. What else should they use? MS XPS?

Re:PDF and Flash are massively multiplatform by tolan-b · 2009-10-31 04:12 · Score: 1

> What we need is, something combines ODF and PDF. You can add binary file to PDF document like some layer.
Already exists:
http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html
(scroll down a little)
Re:PDF and Flash are massively multiplatform by koiransuklaa · 2009-11-05 08:39 · Score: 1

Let's see how that works out. My guess is that Moonlight development will always be just behind Silverlight so a lot of content won't work. Currently Moonlight is version 1.99 while Silverlight is at version 3.
Microsoft contributes, sure. It's just that so far the contribution looks like the same old move they've made so many times.

Forget Google, every single Apple device does it by Ilgaz · 2009-10-31 03:11 · Score: 1

If you look around, every single Apple computer and device (iPod/iPhone) instantly indexes every PDF thrown at it and keeps a database of it.

It is the famous "Spotlight" technology. They don't even need to look at Google; some of them have the same kind of indexing technology (minus relation) running on their own laptops.

Someone should check TFA's relations with MS. I am sure something will come up.

Which idiot managed to do it? by Ilgaz · 2009-10-31 03:17 · Score: 2, Informative

I work with PDFs a lot, especially on OS X. I am telling you this from an OS on which, in some circumstances, you can have 60 KB 1080p screenshots in PDF: whoever did that "text as image" trick is a complete moron.

One of the reasons PDF took off is precisely that it embeds the fonts used in a document, so the document appears pixel-perfect on client machines.

As a last resort (and as good practice), you can embed the unformatted plain text of the entire document in your PDF file. PDF, like QuickTime MOV, is one of those formats where people don't use the features and then bitch about the size of the client, etc.
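For contrast with a scanned image, here is what real text looks like inside a PDF. The page's content stream draws a string with the `Tj` operator in a named font, and extraction and search tools work from strings like that. The minimal hand-rolled sketch below is written in Python. It uses the standard Helvetica font, which is one of the base 14 fonts, so nothing is embedded in this example, and it only handles Latin-1 text.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF whose content stream draws `text` as real text.

    Uses the standard (non-embedded) Helvetica font; Latin-1 text only.
    """
    # Escape the characters that delimit PDF literal strings.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # BT/ET = begin/end text; Tf selects font F1 at 12 pt; Td moves to (72, 720); Tj shows the string.
    stream = ("BT /F1 12 Tf 72 720 Td (%s) Tj ET" % escaped).encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

You could write the result to disk with `open("hello.pdf", "wb").write(minimal_pdf("Hello, PDF"))`, where the file name is hypothetical. Any reader should then let you select, copy and search the text.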

Re:Which idiot managed to do it? by 99BottlesOfBeerInMyF · 2009-11-01 06:56 · Score: 2, Informative

Whoever did that "text as image" trick, he is a complete moron.
Generally text as images in PDFs are the result of people who scan in paper documents but don't have access to or don't use OCR programs to convert the raw image coming in from the scanner into text.

Free programs only work with some govt PDFs by bigtrike · 2009-10-31 03:17 · Score: 1

A number of government forms don't work with the free PDF readers.

This is because Adobe broke its own published spec with its LiveCycle product, and by default it saves files that aren't compatible with anything else. It does a great job of forcing you to buy LiveCycle/Acrobat instead of using free tools. The Adobe people will tell you that it speeds up rendering of downloaded data, which I find hard to believe as the files are between 2x and 3x the size of a regular PDF.

The current use of Adobe products for government forms is a nightmare, it seems like a dumb idea to extend it.

Re:Free programs only work with some govt PDFs by jeremyp · 2009-10-31 10:37 · Score: 2, Informative

A PDF file produced by the LiveCycle suite is actually an XML document with a thin PDF wrapper. The XML conforms to the XFA standard which is owned by Adobe but is a published standard (http://partners.adobe.com/public/developer/en/xml/xfa_spec_2_4.pdf).

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:Free programs only work with some govt PDFs by bigtrike · 2009-10-31 11:56 · Score: 1

Acrobat Pro can't even edit XFA forms (beyond filling in values), why should 3rd party tools do so? I'm aware that you can save it as a hybrid "compatible" form, but it's not actually editable in Acrobat without stripping out the xfa data with a non-Adobe tool such as pdftk. The spec is subject to change at any time and has quite a few ambiguities, making it much more difficult to work with. How many more extra "open" specs and additions would we see if PDF was the official format of the government?
XFA/LiveCycle should be avoided for anything which requires interoperability, and Adobe never should have embedded it within a PDF file in the first place.
Perhaps if the government is going to use PDF for anything, it should set some guidelines to avoid these poorly supported extensions.
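Below is a rough Python sketch of the pdftk workaround mentioned above. It has two parts: a crude check for an `/XFA` entry, and the pdftk invocation that omits the XFA data using pdftk's `drop_xfa` output option. The byte search is only a heuristic, because it misses dictionaries stored inside compressed object streams. The wrapper function names are hypothetical and chosen for illustration.

```python
import subprocess
from typing import List

def looks_like_xfa(pdf_bytes: bytes) -> bool:
    """Crude check: XFA forms carry an /XFA entry in the AcroForm dictionary.

    Misses files whose dictionaries sit in compressed object streams.
    """
    return b"/XFA" in pdf_bytes

def drop_xfa_command(src: str, dst: str) -> List[str]:
    """pdftk's drop_xfa output option writes a copy without the XFA data."""
    return ["pdftk", src, "output", dst, "drop_xfa"]

def strip_xfa(src: str, dst: str) -> None:
    # Requires pdftk on PATH; raises if pdftk is missing or exits with an error.
    subprocess.run(drop_xfa_command(src, dst), check=True)
```

A call such as `strip_xfa("form.pdf", "form-noxfa.pdf")`, with hypothetical file names, would leave a plain AcroForm that other tools have a better chance of handling.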

Why blame Adobe when there are no alternatives by Ilgaz · 2009-10-31 03:23 · Score: 1

I will ask one thing as you seem to miss why HTML is not considered a print/distro format: "When did we have an embeddable font standard for HTML webpages?" as with Flash: "Is there a way to have a single file and infrastructure to show embedded videos in HTML5 form?"

They actually suggested that people use the abandoned VP3 format, for God's sake, and the very same people chose TrueType (check why FreeType exists) as the font embedding format.

Re:Why blame Adobe when there are no alternatives by Vladimus · 2009-10-31 04:00 · Score: 1

I know you can't embed fonts in HTML, that's why it gives you the option of choosing different fonts. If you can't make a layout work with standard fonts you're not much of a designer. And, um, why would you publish data in an HTML5 video?

--
A rolling stone is worth two in the bush!

Flash sucks by minkie · 2009-10-31 03:38 · Score: 1

Flash is evil for many reasons, but the most in-your-face one if you use a Mac is that the Mac Flash plugin crashes all the time. It is by far the #1 cause of Safari crashes on the Mac.

I'm not wild about PDF, but at least I don't see PDF viewers crashing all over the place.

Hate to Nitpick by sehryan · 2009-10-31 03:43 · Score: 1

"...unfindable by search engines..."
That is absolutely not true. Anyone who uses Google knows that the search engine can read PDFs, identify if any of the keywords are located within, and then provide a link both directly to the PDF as well as to an HTML version.

--
The world moves for love. It kneels before it in awe.

If They Open the Formats by RAMMS+EIN · 2009-10-31 03:53 · Score: 1

This could be a Good Thing, if it means that the formats will be made and remain open. IIRC, PDF is already an open standard, and supported by various programs from multiple sources. I would applaud it if the same were to happen to Flash. And if both formats are open and widely supported, the government could do a lot worse than using them.

--
Please correct me if I got my facts wrong.

Re:the flash web browser does enable screen reader by markdavis · 2009-10-31 03:56 · Score: 2, Interesting

So there is a partial option for MS-Windows only. Great. Not exactly platform agnostic and open. I suppose it is better than nothing, though.

Re:Tell Adobe to open-license PDF by TheRaven64 · 2009-10-31 04:47 · Score: 1

There are a huge number of free programs that can create PDFs. Anything that uses Cairo for rendering can generate PDFs natively, although without some of the nice metadata. If you're using almost any modern operating system (Windows or anything that uses CUPS for printing, including Linux and OS X) then any application that can print can also generate PDFs. I use pdflatex very often and it produces beautiful PDFs with working hyperlinks and the table of contents in the bookmarks section, and it will happily import the PDFs that I've created with gnuplot or graphviz as well as commercial tools like OmniGraffle. My entire workflow involves creation of PDFs to send to my publisher. None of the tools that I use come from Adobe and most are Free Software.

--
I am TheRaven on Soylent News

Re:Tell Adobe to open-license PDF by gyrogeerloose · 2009-10-31 04:48 · Score: 1

Apple uses PDF as the basis of the OS X display engine. When they adopted the NeXT OS as their next-generation OS to replace the "Classic" Mac OS, they switched from NeXT's Display PostScript precisely because PDF was a free and open-source specification. An OS X user can create a PDF file from pretty much any document simply by beginning a print operation and then selecting "save as PDF" from the print dialog box.

--
This ain't rocket surgery.

Re:Adobe SW = Wasted CPU by owlstead · 2009-10-31 05:08 · Score: 1

"Further, the recent PDF specifications add DRM which shouldn't be allowed in government publications. If the govt agrees to use a PDF version that open source software can completely read, parse, and convert, then it is fine PROVIDED the raw data is available in open formats too."

No, it's not fine because, as others have pointed out, PDF is mainly used for formatting documents. It does a pretty adequate job of that, and you can use third-party software that displays it without the drawbacks of the *HORRIBLE* Adobe software. But that does not make it a good mechanism for storing information that can be indexed in any useful way (beyond simply parsing the text). Hell, you can't even /select/ text normally using most PDF readers.

Re:Tell Adobe to open-license PDF by lahvak · 2009-10-31 05:24 · Score: 1

How many free programs do you know of that create .pdf's?

Too lazy to count right now, but just the ones I use on a more or less daily basis come to about 20. Plus hundreds of others that I don't use.

--
AccountKiller

Re:Tell Adobe to open-license PDF by TheRaven64 · 2009-10-31 05:54 · Score: 2, Interesting

PostScript is also a free specification, but NeXT was using the Display PostScript implementation licensed from Adobe. They switched to something closer to PDF because, it turned out, no one actually cared about the nicer features in PS. With DPS, you could write view objects entirely in PostScript and have them run on the display server. This was quite slow and had all sorts of problems in that the PS programs could (potentially) run forever. Most people just used the drawing subset of PS, which is also available in PDF, and none of the flow control stuff.

--
I am TheRaven on Soylent News

Re:Tell Adobe to open-license PDF by Blakey+Rat · 2009-10-31 05:56 · Score: 2, Insightful

Yes, and then they SUED Microsoft for putting PDF support in Office. It's only "open" as long as you're not big enough to compete with Acrobat. If you even get within a mile of stepping on Adobe's business, you're sued up the wazzoo.

"Free and open" my ass.

--
Comment of the year

Re:Adobe SW = Wasted CPU by lahvak · 2009-10-31 06:07 · Score: 1

Hell, you can't even /select/ text normally using most PDF readers.

People keep saying that. I never had a problem with this. I use 3 or 4 different pdf readers, including the one from Adobe, and I never had problems with selecting and cutting text from a pdf document.

--
AccountKiller

Re:Forget Google, every single Apple device does i by TheRaven64 · 2009-10-31 06:24 · Score: 1

If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of it.

No it is not. It is indexing every PDF that has text in the metadata. Create a PDF by printing to PostScript and then converting to PDF (the easiest way of creating PDF on Windows or Linux machines) and watch Spotlight completely fail to index it. Spotlight does not index the text that you see when you browse the PDF, because that text is stored as a set of glyph indexes, not as streams of characters.

And if you want some real fun, open up a PDF containing a table in Preview and try to persuade it to copy the table in a way that preserves its structure.
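The point about glyph indexes can be illustrated with a toy extractor. This is a sketch under simplifying assumptions: it only understands the `Tj` (show text) operator in an uncompressed content stream, treats literal strings as ASCII-compatible (true only for simple fonts with a standard encoding), assumes 2-byte glyph IDs inside hex strings, and uses a plain dict named `to_unicode` as a hypothetical stand-in for a font's /ToUnicode CMap. `extract_text` is invented for illustration and makes no claim about how Spotlight itself is implemented.

```python
import re
from typing import Dict, List, Optional

# Matches the operand of a `Tj` operator: either a literal string such as
# (Hello) or a hex string such as <002B0048>.
SHOW_TEXT = re.compile(rb"(\((?:[^()\\]|\\.)*\)|<[0-9A-Fa-f\s]*>)\s*Tj")


def extract_text(stream: bytes, to_unicode: Optional[Dict[int, str]] = None) -> List[str]:
    """Toy extractor for an uncompressed content stream (only `Tj` is handled)."""
    pieces = []
    for match in SHOW_TEXT.finditer(stream):
        operand = match.group(1)
        if operand.startswith(b"("):
            # Assumes a simple font whose codes line up with ASCII;
            # backslash escapes are left as-is for brevity.
            pieces.append(operand[1:-1].decode("latin-1"))
        else:
            digits = re.sub(rb"\s", b"", operand[1:-1]).decode("ascii")
            raw = bytes.fromhex(digits)  # assumes an even number of hex digits
            glyph_ids = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
            if to_unicode is None:
                # Only glyph numbers are stored: nothing an indexer can search for.
                pieces.append("")
            else:
                pieces.append("".join(to_unicode.get(g, "\ufffd") for g in glyph_ids))
    return pieces
```

For a stream such as `(Hello) Tj <002B 0048> Tj`, the literal string comes out as readable text, while the hex string yields nothing searchable unless a code-to-character mapping is supplied.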

--
I am TheRaven on Soylent News

Re:Tell Adobe to open-license PDF by Blakey+Rat · 2009-10-31 07:36 · Score: 2, Insightful

Bullshit.

It's either an open standard, meaning anybody can use it-- ANY BODY-- or it's not. There's no such classification as "it's an open standard, except we don't let companies we don't like use it because they have a big marketshare, but other than that it's an open standard believe me!"

By your argument, Microsoft should also be prevented from parsing HTML files in IE because they're a monopoly. Does that make sense? No. Does your argument make sense? No.

--
Comment of the year

Re:Tell Adobe to open-license PDF by commodore64_love · 2009-10-31 07:59 · Score: 1

"Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.

BTW why was I modded "flamebait" for expressing an opinion? Silly, silly, silly.

--
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall

HTML5 & SVG. by paradisaeidae · 2009-10-31 12:35 · Score: 1

Pretty surprised I'm the first to suggest this combo. Most 'modern' browsers are close to SVG 1.1 now. Google has stated its interest in SVG, hosting this year's svgopen.org conference, with indexability being a strong draw. Sure, every time you mention an XML format the JSON guys cough up bits. Size is reducible with gzip, and XSLT is not 'pretty', but the flexibility will exercise yer greystuff.

MarkT

P.S. Inkscape.org will convert PDF pages to SVG nicely.
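The size and indexability arguments are easy to check, because SVG is plain XML text and repetitive markup compresses well (gzip-compressed SVG is commonly stored as `.svgz`). A minimal sketch using only Python's standard library; `sample_svg` and its drawing are made-up test data, so the resulting ratio says nothing about real-world documents.

```python
import gzip
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def sample_svg(n_rects: int = 200) -> bytes:
    """Build a deliberately repetitive SVG drawing (hypothetical sample data)."""
    rects = "".join(
        f'<rect x="{i % 20 * 10}" y="{i // 20 * 10}" width="8" height="8" fill="#336699"/>'
        for i in range(n_rects)
    )
    return f'<svg xmlns="{SVG_NS}" width="200" height="100">{rects}</svg>'.encode("utf-8")


svg = sample_svg()
svgz = gzip.compress(svg)  # the same bytes you would store in a .svgz file
print(f"plain: {len(svg)} bytes, gzipped: {len(svgz)} bytes")

# Still plain XML underneath, so any XML tool (or search indexer) can read it.
shapes = ET.fromstring(gzip.decompress(svgz)).findall(f"{{{SVG_NS}}}rect")
print(f"{len(shapes)} <rect> elements recovered")
```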

Re:Tell Adobe to open-license PDF by TheRaven64 · 2009-10-31 13:19 · Score: 1

"Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.

You're moderated flamebait for being wrong and, as you usually do, aggressively defending your incorrect position when ten seconds of fact checking would indicate that you are wrong.

You can, as I said in the original post, download the PDF specification and implement it without paying a royalty and you've been able to do this for every version of the PDF specification since version 1.0. That page is linked to from the top link that you get if you Google for 'PDF specification' and it has been for some years. No license is required for downloading or implementing them.

The ISO 32000 specification, which is now the official PDF specification, costs money to buy from ISO (as all ISO specs do, including the C language specification), but the format it describes is identical to the one described by the PDF 1.7 specification, with various sub-formats (e.g. PDF/A) requiring only a subset of the features described in this document. Although it costs money to get the spec from ISO, there are no royalty requirements for implementors. Adobe now publish their versions as extensions to the ISO-controlled format, rather than as complete new specifications, and any organisation wanting interoperability should mandate the ISO specification rather than the Adobe extensions.

Of course, you'd have known all of that if you'd spent a minute actually doing basic research on the topic at hand before posting. Fortunately, the fact that you still haven't learned how to use quote tags means that it's usually easy to spot your posts from a distance and ignore your ill-informed ramblings.

--
I am TheRaven on Soylent News

Re:Tell Adobe to open-license PDF by commodore64_love · 2009-10-31 14:09 · Score: 1

>>>you are wrong. You can, as I said in the original post, download the PDF specification and implement it without paying a royalty and you've been able to do this for every version of the PDF specification since version 1.0 [1993]

Guess what? You are wrong too. (Surprised? You shouldn't be; nobody's perfect, not me nor you.) PDF did not become an open standard until version 1.7 [2008], according to Wikipedia. That was only a year ago.

Which is why, as others pointed out, various companies had been sued for infringing Adobe's PDF patents. I had to *buy* Adobe Acrobat because at that time (2004) there was no other program available to create PDFs. It was patented and restricted.

--
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall

Re:Tell Adobe to open-license PDF by commodore64_love · 2009-10-31 14:15 · Score: 1

the fact that you still haven't learned how to use quote tags

You mean like that? I know how to use them just fine, but I've always preferred the old Usenet methodology. Typing >>> is a heck of a lot faster than typing 14-letter tags.

>>>your ill-informed ramblings.

That's nice. You were still wrong when you said, "You've been able to do this for every version of the PDF specification since version 1.0." Adobe had the patents until 2008. That means it was closed. No one could legally publish a PDF Creator program prior to that year, as Microsoft and other companies discovered when they got sued.

--
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall

Re:Adobe SW = Wasted CPU by owlstead · 2009-10-31 14:36 · Score: 1

That's weird, because any time I cut anything around a page border, a table, or more or less any other break in the page, everything gets screwed up. I won't even go into what happens when there is a watermark on the page. And by screwed up, I mean screwed up: missing parts of text, text in the wrong order, you name it. On top of that, it crashes every so often and doesn't survive the power-saving state on my computer, to name just one thing. I won't go into the way it handles tabs, form input, search or pop-ups, because we could be discussing their crapware for hours on end.

Right you are, sir... by Joce640k · 2009-10-31 16:07 · Score: 1

I'd vote for that as a standard.

--
No sig today...

screen-scraping a PDF/A wrapper by SgtChaireBourne · 2009-11-01 02:13 · Score: 1

Useless is the wrong word. It took 15 lines of python wrapping xpdf for me to get a working system for dumping the transactions out of the last 6 years of my credit card statements.
It's ugly, but it works just fine.

That would be because that particular PDF happened to be wrapping ASCII, ISO-8859, UTF-8 or UTF-16 text instead of some image format. Even then, that was just screen-scraping, of the kind that can be done with old terminal sessions. It can be done, sometimes.

Keep the data in machine readable formats, not a terminal format like PDF or paper.
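For reference, the kind of short xpdf wrapper described in the quote might look roughly like this. It is a sketch, assuming Python 3.7+ and the `pdftotext` command-line tool that ships with xpdf (and Poppler), where `-layout` keeps the physical column layout and `-` as the output file sends text to stdout. The statement line format matched by the hypothetical `TRANSACTION` pattern is invented for illustration; real statements differ, which is exactly why this approach breaks so easily, and it only works when the PDF contains real text rather than scanned images.

```python
import re
import subprocess
from typing import List, Tuple

# Hypothetical statement line: "03/14  COFFEE SHOP #123      4.50"
TRANSACTION = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def pdf_to_text(path: str) -> str:
    """Run pdftotext; '-layout' keeps columns aligned, '-' writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_transactions(text: str) -> List[Tuple[str, str, float]]:
    """Screen-scrape (date, description, amount) rows; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = TRANSACTION.match(line.strip())
        if match:
            date, description, amount = match.groups()
            rows.append((date, description, float(amount.replace(",", ""))))
    return rows
```

Usage would be `parse_transactions(pdf_to_text("statement.pdf"))`; the parsing step is kept separate so it can be checked against sample text without the external tool.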

--
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.

Remove Flash and enjoy the surf by minstrelmike · 2009-11-01 03:15 · Score: 1

When Flash had a few issues a couple of months ago, I removed it from my browser. Suddenly, thousands of irritating advertisements, web banners and annoying intro pages of pointless information were blank, showing only a notice to install Flash Player.

Remove it from one browser and see if it doesn't make surfing better for you.

Model View Controller by foniksonik · 2009-11-01 17:27 · Score: 1

Just my 2 cents in regards to public records and data.

I'd like to say that the groups making decisions in this area really should consider an MVC architecture, which would avoid the concerns raised here on /. and by pundits for open data standards everywhere regarding display (aka View) technologies.

With a Model View Controller methodology and pattern in place it really is not a concern what technology is being used to display data at any given time. If public data is *stored* (Model) and *accessed* (Controller) via open standards then the *display* (View) itself is inconsequential and/or malleable to the extent needed for any purpose.

Flash is great at some things, PDFs are perfect for a variety of tasks. They are, like any other format, not the only useful format available and should never be thought of as the 'archive' or 'final' format. The Model is the archive.

All the government agencies need to do is show that the Model is able to be trans-coded to several other popular storage formats without loss and that should be good enough for anyone. They also need to provide an API for accessing the data regardless of the Model and an output format that is structured and well documented (XML, JSON, SOAP even).

At this point it is the data consumer who should choose what format they would like to see it in... PDF, interactive Flash/Flex charts, JSON, Word, HTML, SGML, RTF... does it matter? Not to me or anyone else. I will get to choose the format I'd like it in (XML, JSON or ActionScript objects, please).

If the format doesn't exist yet, there's an API I can use to transcode the data as I see fit.
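The separation described above can be sketched in a few lines. Assumptions: Python 3.7+ standard library only; the `Grant` record, its fields, the `VIEWS` registry and the `controller` function are all hypothetical, and a real system would sit behind an HTTP API rather than a function call. The point is that the Model is stored once and each output format is just another View registered with the Controller.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Grant:  # Model: a hypothetical public-record type
    agency: str
    recipient: str
    amount_usd: int


def to_json(records: List[Grant]) -> str:
    return json.dumps([asdict(r) for r in records])


def to_xml(records: List[Grant]) -> str:
    root = ET.Element("grants")
    for record in records:
        item = ET.SubElement(root, "grant")
        for key, value in asdict(record).items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Views: adding a format means adding an entry here, not touching the Model.
VIEWS: Dict[str, Callable[[List[Grant]], str]] = {"json": to_json, "xml": to_xml}


def controller(records: List[Grant], fmt: str) -> str:
    """Controller: serve the same Model in whatever format the consumer asks for."""
    view = VIEWS.get(fmt)
    if view is None:
        raise ValueError(f"unsupported format: {fmt}")
    return view(records)
```

Under this design a PDF or Flash rendering would simply be one more entry in `VIEWS`, generated from the stored data rather than serving as the archive itself.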

--
A fool throws a stone into a well and a thousand sages can not remove it.

Re:Adobe SW = Wasted CPU by petermgreen · 2009-11-02 08:13 · Score: 1

My experience is that it works fine for single-column text or for selecting individual words, but at least Acrobat Reader doesn't have any clue about what is and isn't part of the same block of text (e.g. it will select over into the second column of a two-column page before selecting stuff in the next row of the current column).
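The behaviour described here is what you get when text is ordered purely by position on the page. A minimal sketch, assuming each line of text is already available as a positioned box; the `TextBox` type and the fixed `gutter_x` split between exactly two columns are hypothetical simplifications, and nothing here claims to describe how Acrobat Reader actually implements selection.

```python
from typing import Iterable, List, NamedTuple


class TextBox(NamedTuple):
    x: float  # left edge, in points
    y: float  # top edge, measured downward from the top of the page
    text: str


def naive_order(boxes: Iterable[TextBox]) -> List[str]:
    """Row-major order: top to bottom, then left to right across the whole page."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def column_order(boxes: Iterable[TextBox], gutter_x: float) -> List[str]:
    """Two-column reading order: finish the left column before the right one."""
    boxes = list(boxes)
    left = sorted((b for b in boxes if b.x < gutter_x), key=lambda b: b.y)
    right = sorted((b for b in boxes if b.x >= gutter_x), key=lambda b: b.y)
    return [b.text for b in left + right]
```

With two lines per column, `naive_order` interleaves them (left line 1, right line 1, left line 2, ...), which is the "jumping into the second column" effect, while `column_order` reads each column in turn. A real viewer has to infer the columns itself, and how well it does so varies with the page layout.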

--
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register

Re:Adobe SW = Wasted CPU by lahvak · 2009-11-03 02:44 · Score: 1

Hm, never happened to me. I have a few hundred PDF files here, many of them with 2 or 3 columns, tables, formulas etc., and I don't seem to have any problems with selecting text. What software were your PDF files generated by, anyway?

--
AccountKiller

English OCR has become trivial really by Ilgaz · 2009-11-03 02:46 · Score: 1

Especially on Windows and with English text, it is no excuse. Every scanner comes with OCR programs, at least for English. I translated a 70-page manual back when Windows 3.1 was new, so I know.

Of course, there is also truly free software: http://jocr.sourceforge.net/ and, more recently, Google's Tesseract (brought back to life): http://code.google.com/p/tesseract-ocr/

Even if you are a home user, thanks to Spotlight and various local search engines on Windows and Linux, it is a really good idea to keep text in PDF files.
