Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?
First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is becoming more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious JavaScript embedded. It would be impossible to block PDFs, as they are required for business. We do disable JavaScript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out-of-the-box solution or a 'how-to' guide for automatically neutralizing PDFs by stripping out the JavaScript. The closest thing I could find is using PDF2PS and then reversing the process with PS2PDF. Does anyone know of a solution for this that is not too complex, preferably works at the SMTP relay, and can handle zipped PDFs as well? Or does anyone have some common-sense advice for dealing with this, so that once it's in place, no further action is required by me or by users?"
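For the zipped-PDF case, the relay first has to find the PDFs inside the archive. A minimal Python sketch using only the standard library: it checks each member's leading bytes for the `%PDF-` header instead of trusting the file extension, and refuses oversized members as a crude zip-bomb guard (the 50 MB limit is an assumption, and the size declared in a ZIP header can lie). Password-protected members raise an error on read and need a policy decision of their own; nested archives are not unpacked here.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"                   # PDF files carry this header near the start
MAX_MEMBER_SIZE = 50 * 1024 * 1024     # assumed limit; tune for your mail volume


def pdf_members(zip_bytes: bytes, max_member_size: int = MAX_MEMBER_SIZE) -> dict:
    """Return {member name: bytes} for every archive member that looks like a PDF."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Crude zip-bomb guard: the size comes from the archive header,
            # which the sender controls, so this is not a hard guarantee.
            if info.file_size > max_member_size:
                raise ValueError(f"{info.filename}: too large to inspect")
            data = zf.read(info)  # raises RuntimeError for password-protected members
            # Judge by content, not extension; Acrobat tolerates the header
            # anywhere in the first 1024 bytes, so search that whole window.
            if PDF_MAGIC in data[:1024]:
                found[info.filename] = data
    return found
```

A gateway would call `pdf_members()` on each ZIP attachment and hand every result to whatever check or conversion it already applies to bare PDF attachments.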
As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.
The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.
You can change the legal validity of a document, for example, by modifying it.
A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do JavaScript in the first place.
Check out SumatraPDF http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html. It's super fast and does not support JavaScript or ActionScript in PDFs. I use it exclusively now.
The best way to protect your computer from malicious Javascript embedded within a PDF is to not install Adobe Reader. If you cannot open the file, your computer cannot be infected.
Why in the world is JavaScript included in PDF documents? PDF is already a Forth-like programming language and environment.
Why don't you sanitize the reader? Use a reader that ignores JavaScript, or build one from whatever open source PDF reader you can find, if there isn't one already. Or run the PDF reader inside a sandbox without internet access or permanent disk write. If that breaks portability and the documents don't render correctly when JavaScript is disabled, tell the sender, and blacklist the sender too for good measure. If enough companies lock JavaScript out of PDF documents, eventually the authoring tools will stop using it.
A big limitation of Sumatra is that it doesn't support filling out interactive forms, which makes it a no-go in my organization
You don't need a solution that rewrites the PDF. At best it will work correctly "most of the time", and break PDFs the rest of the time. For example, pdf->ps->pdf, or the "print to pdf" solution mentioned earlier in the comments may work fine for scanned PDFs, but if there are annotations/comments then they'll get stripped. This will lead to massive user frustration ("but the comments are there, I sent it in the last email") and people having to find ways to work around your filter. Modifying people's attachments is a bad move. A more reasonable solution is to detect if the PDF contains any javascript code, and if it does, block the PDF entirely.
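A minimal sketch of the detect-and-block idea in Python, assuming the gateway only needs a yes/no signal. It scans for PDF name tokens that commonly mark active content (the `SUSPICIOUS_NAMES` set is an assumption, not an exhaustive list), decodes `#xx` escapes in names so that `/J#61vaScript` still counts, and also inflates Flate-compressed streams, because object streams can hide these keys from a plain byte search. It is similar in spirit to Didier Stevens' pdfid, but it is not that tool's code, and it will miss other filters (ASCIIHex, LZW, chained filters) and the contents of encrypted streams.

```python
import re
import zlib

# Keys that commonly indicate active content (assumed list, not exhaustive).
SUSPICIOUS_NAMES = {b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch"}

_NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]*")          # a PDF name token
_HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")      # #xx escapes inside names
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
_MAX_INFLATE = 10_000_000                               # assumed cap per stream


def _decode_name(raw: bytes) -> bytes:
    """Undo #xx escapes, e.g. /J#61vaScript -> /JavaScript."""
    return _HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def _candidate_buffers(data: bytes):
    """Yield the raw file, then every stream that inflates as Flate data."""
    yield data
    for m in _STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(m.group(1), _MAX_INFLATE)
        except zlib.error:
            continue  # not Flate-compressed, or damaged: skip it
        yield inflated


def suspicious_names(data: bytes) -> set:
    """Return the suspicious names found; an empty set means nothing was seen."""
    found = set()
    for buf in _candidate_buffers(data):
        for m in _NAME_RE.finditer(buf):
            name = _decode_name(m.group(0))
            if name in SUSPICIOUS_NAMES:
                found.add(name.decode("latin-1"))
    return found
```

A relay would block (or quarantine) any attachment for which `suspicious_names()` returns a non-empty set and pass the rest through untouched, which leaves comments and annotations intact.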
The problem is that PDF and the PostScript used in it is an executable language. This falls under "executable code in non-executable containers". If you need to be sure, convert the PDF to a series of JPG or GIF pictures and recreate a PDF from them. With any less harsh approach, you may retain malicious PostScript (and other) code.
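A sketch of that rasterize-and-rebuild pipeline, driving external tools from Python. Ghostscript's `jpeg` device renders one image per page (`-o` sets the output file and implies `-dBATCH -dNOPAUSE`); `img2pdf` is a third-party tool assumed here for the rebuild step, and the 150 dpi default is also an assumption. Expect much larger files, and note that the result has no text layer, so it is no longer searchable.

```python
import glob
import subprocess


def rasterize_command(src: str, out_dir: str, dpi: int = 150) -> list:
    """Ghostscript command that renders every page of src to a numbered JPEG."""
    return ["gs", "-dSAFER", "-sDEVICE=jpeg", f"-r{dpi}",
            "-o", f"{out_dir}/page-%03d.jpg", src]


def rebuild_command(pages: list, dst: str) -> list:
    """img2pdf (assumed to be installed) wraps each image as one PDF page."""
    return ["img2pdf", *pages, "-o", dst]


def flatten(src: str, out_dir: str, dst: str) -> None:
    """Run both steps; raises CalledProcessError if either tool fails."""
    subprocess.run(rasterize_command(src, out_dir), check=True, timeout=300)
    pages = sorted(glob.glob(f"{out_dir}/page-*.jpg"))  # %03d keeps pages in order
    subprocess.run(rebuild_command(pages, dst), check=True, timeout=300)
```

Only pixels survive the round trip, which is the point: scripts, forms, annotations, and any malformed objects are all left behind.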
And, yes, what you are trying to do is non-trivial. Expect anything "simple" will be insecure.
At some point you trust technology and also reinforce proper user behavior. I hate catch-phrases but your e-mail hygiene should have layers of protection (defense in depth). Assuming that the message got through IP reputation filters, SPAM analysis, malware scans, and was delivered to your user, you rely on desktop protection and cross your fingers that nobody opens it.
We have SMTP appliances from Axway, and we used to stop all executable attachments and deliver a notification telling the user to call the help desk and request a release. Times changed, and we don't do that any more. However, you could annotate the message to remind users not to open it if they don't know who it's from, don't know what it is, or weren't expecting it. Some will open it anyway. We also used to hold certain attachments for four hours, until the virus definitions (and the other defenses) had received a couple of updates, and then reprocess the message.
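The annotation idea can be sketched with Python's standard `email` package: if a message carries attachments, prepend a warning to its plain-text body. The banner wording and the function name are assumptions for illustration, not anything an Axway appliance provides; HTML-only messages would need their own handling.

```python
from email.message import EmailMessage

# Hypothetical banner text; adjust the wording to your own policy.
BANNER = ("CAUTION: This message has an attachment. If you don't know the sender, "
          "don't know what it is, or weren't expecting it, do not open it.\n\n")


def add_warning_banner(msg: EmailMessage) -> bool:
    """Prepend BANNER to the plain-text body when the message has attachments."""
    if not any(True for _ in msg.iter_attachments()):
        return False  # nothing attached, leave the message alone
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return False  # no text/plain part (e.g. HTML-only mail)
    body.set_content(BANNER + body.get_content())
    return True
```

Because only the text part is rewritten, the attachments themselves reach the user byte-for-byte unchanged.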
If you do try to roll your own, be aware that everyone and their dog creates PDF files with varying degrees of success and we had certain PDF files that caused services to fail on our gateway while they tried to scan and process them. You didn't mention the volume but make sure your solution scales well.
Evince is the PDF/document viewer for GNOME. It is also compiled for Windows.
In the Linux world, it's a heavyweight GNOME app (compared to e/x pdf), but it's far, far, far lighter than Adobe Acrobat Reader, and it doesn't do JavaScript at all. I've yet to come across issues with PDFs not working, as most legitimate PDFs don't use JavaScript.
It also comes from a long-standing, respected open source project, GNOME (read: quality comparable to commercial software), not a fly-by-night freeware operation of dubious origin.
https://wiki.gnome.org/Evince/Downloads
A big limitation of Sumatra is that it doesn't support filling out interactive forms, which makes it a no-go in my organization
If it fills in forms, it's a security risk. I seem to recall that there are a few that ignore forms and let you create companion files that do overprint forms on form-like fields though. Can't remember the names offhand.
Check out Sumatrapdf http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html. It's super fast and does not support javascript or actionscript in PDF's. I use it exclusively now.
Is it vulnerable to font description overloading and the other PDF exploits out there? A large portion of the malicious PDFs I've seen lately didn't use forms or javascript containers as the main attack vector (usually shellcode via some markup bug).
Great little app for just such issues.
They immediately address what few security vulnerabilities exist in the software; it's based on the MuPDF library.
Aren't signed PDFs usually just signed in Adobe, but read just fine in almost any other reader?
This is my signature. There are many like it, but this one is mine.
If you use anything but Adobe, it probably won't support javascript because it's fucking stupid to have javascript in a PDF. Just avoid Adobe, because they are allergic to security.
Signed PDFs can be read in any reader, but the signature will be still validated (if the reader is not defective.) Encrypted PDFs will not be even readable if they are not encrypted to you. Password-protected PDFs may require the password to be readable, let alone printable or changeable.
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
There are a couple of vendors (and many more playing catch-up) selling appliances that detonate attachments on sandboxed VMs running in fast virtual memory.
They execute/open attachments and watch to see what happens: registry changes, file drops, network activity, attempts to contact known C&C servers, etc.
Anything that exhibits non-legit behavior gets quarantined. FireEye has a box that does this and also crawls network shares, testing files.
Aside from whitelisting, I think it's the best defense against zero-day malware. It's a little too pricey for the company I work at right now, but as more vendors add this functionality, the price will come down.
I seem to recall a lot of the security mechanisms assuming you are using Adobe. I want to say that passworded files will often just ignore the password prompt and display normally, and if a PDF can be read, it can be printed.
I want to say that passworded files will often just ignore the password prompt and display normally, and if a PDF can be read, it can be printed.
It's because there are two passwords; one to open for reading, and another for other purposes. Let me open Acrobat and tell exactly...
The certificate security seems to support that too. It's a complicated cardhouse, and I wouldn't want to become responsible for hacking it. Not as a volunteer, at least (no "thank you" if it stops a virus, but all the blame if it breaks someone's workflow.) Generally, if a PDF is signed or certified or encrypted, it's off limits. I do sign PDFs now and then, and I have seen workflows where *every* PDF is signed (the government does that.) Those are not something you dare to hack - those are often multimillion contracts awarded to your company.
For a long time, I thought like you, that it was my duty to ward off and protect the "children". After a while, you realize 2 things.
First, it is most likely your duty to inform and educate. Do that. Do it well, do it loud, and do it as often as you can. When someone eventually opens one of those attachments, word will get around, and peer pressure will make everyone else gun-shy. After a user or two of mine got bitten by an attachment (and I had repeatedly warned my users about these things), I ended up with people at my desk occasionally asking, "Can you come look at this? It just looks funny." It was all about peer pressure and not wanting to be That Guy who clicked the stupid link.
Second, and I hate to say it, this is what we do, and this is job security. You can't save 'em all, Hasselhoff; if ya did, there would be nothing left to do.
Learn the file format and write a program to strip out any executable script elements.
http://www.adobe.com/devnet/pdf/pdf_reference.html.
And be sure to double-check that the next update doesn't revert those settings on you...
JavaScript should not be given the capability of doing damaging things. It should be confined to a narrow execution context, limited to doing only the things that enhance the experience of that ONE information resource. Dynamic layout is certainly a useful thing. Dynamically changing your system is not. It should not have access. I blame the developers. It doesn't matter if it is mail or web. It might do cute things inside a PDF, like giving you a calculator for a certain algorithm the PDF is written about. But it should not be able to access even /etc/hosts on your computer.
I'm hoping that somebody can reply to this with a _genuine_ reason why sending a PDF (Pretty Damn F'ked) attachment to an e-mail is either necessary or optimal
What else would you use to send an invoice, a contract, a drawing, a user's manual, or anything else that requires pixel-accurate placement of all elements as designed? It has to support digital signatures at a minimum, and preferably full public-key encryption. PDF does that.
'It's good looking' sounds like a weak reason.
The 'good looking' is a weak reason. "Correct" is a far better reason. Once you print into a PDF, it captures your document exactly as it is. You want your documents to represent what you put into them - neither more nor less. Perhaps there are better formats, but I'm not aware of any.
Good thoughts. I was thinking (not written well) about what PDF does better than other pixel-accurate formats (such as postscript). In other words, I was looking for something above-and-beyond the competition effectively justifying the sanitization effort that the OP will have to put forth (as many unfortunately don't).
Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business (CPA firm) we use lots of applications, and most of them are highly vertical with often just one realistic competitor that can function adequately for a firm our size. Many of our apps integrate directly with Acrobat (and Office) so not using Acrobat simply isn't a choice we can make.
So how do we deal with Adobe Acrobat? As some pointed out earlier, defense in depth. Spam filters, multiple virus scans, and our two most important measures: End users don't have admin on their computers and Adobe is one of our "High Priority" upgrade applications. Updates must be pushed out within one day of being released.
BTW, the other high-priority apps are Java and Flash, again, both required by our software. Together with Acrobat, they make up my "Axis of Evil" of insecure software.
And there is software out there that removes all limitations on a PDF, too.
Of course - mostly useful when you want to be able to enable the ability to copy text from a PDF or remove watermarks and similar stuff.
That software cracks the password(s) by brute force. As I understand, there are not too many better attacks against AES. This means that a password like 9~}~w\1[X\3{F968|05|\St3\Ya7Lh~~ is not going to be cracked in this millennium. Besides, it would be entirely illegal to use such software in a business. Cracking of a password may take a second, or it may take a year. How would you integrate that into your mail processing chain?
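A quick back-of-the-envelope check of that claim, as a Python sketch. The example password is 32 characters drawn from printable ASCII (95 symbols), so the keyspace is 95^32, roughly 1.9 × 10^63. The rate of 10^12 guesses per second is an assumption chosen to be generous to the attacker, and the estimate only holds for a truly random password; the high crack rates reported elsewhere in this thread reflect the weak passwords people actually choose.

```python
def brute_force_years(length: int,
                      alphabet_size: int = 95,           # printable ASCII characters
                      guesses_per_second: float = 1e12,  # assumed, generous attacker
                      ) -> float:
    """Expected years to find a random password by exhaustive search."""
    keyspace = alphabet_size ** length
    seconds = keyspace / 2 / guesses_per_second  # on average, half the space is searched
    return seconds / (365.25 * 24 * 3600)


for n in (4, 8, 12, 32):
    print(f"{n:2d} characters: {brute_force_years(n):.3g} years")
```

With these defaults, a four-character password falls in a fraction of a second, while the 32-character one comes out on the order of 10^43 years.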
PDF can be also encrypted with PKI, and with Adobe's own DRM. Those cannot be cracked, as far as I know. You either attack the symmetric cipher, which is usually AES256, or you find a new attack against RSA. If you can do either of those in reasonable time, you have better things to do - like becoming filthy rich and famous. (Or dead.)
just sayin... you could simply use a more secure pdf reader.
Use Foxit and keep JavaScript off by default. (Or don't even install the JavaScript plugin.) It's lightweight, fast, and has fewer quality issues than Adobe. Additionally, considering PDF is inherently an unsafe format, I'd say adding a sandbox like Sandboxie can help you. More technical people here might try porting a good PDF reader's key parsing and JS functionality to the NaCl sandboxing system. Put each component in a separate partition with inner sandbox protection at a minimum. That lets us keep fast, legacy native code while getting plenty of isolation almost for free.
Simple. In our organization, SumatraPDF is the only allowed PDF reader. Users could request Nitro or Foxit, but a sysadmin would disable JavaScript on install. We have never once had a malicious PDF infect our organization. It's a little more work not to give users admin rights to their machines, but time and time again, users prove they are too incompetent to safely manage their own machines.
Seriously, why do people still run Acrobat? PDF is a standard format, there are countless programs that support it, and the only reason such files are a target is that Adobe Reader is basically a monoculture and represents a very large and attractive target. We need diversity among PDF readers, just like diversity among web browsers. It was diversity among web browsers, more than anything else, that reduced browser attacks and caused hackers to concentrate on proprietary monoculture plugins instead.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
The submitter is looking for a code-based solution to a sociological/psychological problem, and it's just not going to be effective.
The real solution is to educate and train your users so they don't fall prey to these sorts of attacks. I know a lot of IT people aren't comfortable dealing with people, and I know it takes quite a bit of time and doesn't look as snazzy on your résumé - but, really, it's the best long-term approach.
The only option that is remotely useful is the one that encrypts the file with a password for opening. The other "features" are just stupid client-side security, and only appear to work if the client respects the options. All the user has to do is open the file with a different PDF reader that ignores the options. Options like this are actually worse than having no options at all, because they create a false sense of security and encourage users to use them.
If you can read the file, you can always copy data out of it, print it, edit it etc.
Cracking the password is entirely different from removing the "limitations"...
If you can open the file and read it, then you can always modify, print, copy etc the file too. If you can read the file then you have already got past the encryption because either there is no encryption or you have the key.
PostScript is a Turing-complete language; it has even more scope for including malicious code than PDF does.
Incidentally, there are also subset versions of the PDF format, such as PDF/A, that don't include stupid features like JavaScript.
So you make it inconvenient for your employees to do their jobs, which will make some potentially good employees walk and reduce the efficiency of those who remain. Technology is supposed to improve the efficiency of workers, otherwise why bother using it at all? It's very hard to include working exploit code on a piece of paper.
While I agree attachments are often misused, and I utterly detest companies that attach a bunch of images to every email they send out, all you can really do is avoid doing such stupid things yourself... Other people will still do it.
Print then delete the file.
In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.
A big limitation of Sumatra is that it doesn't support filling out interactive forms, which makes it a no-go in my organization
An organization relying on filling out interactive PDF forms sounds quite like a no-go to me. Can't you really come up with a better solution to get your shit together? Besides, the topic is how to get rid of this PDF pest because of obvious security concerns. I don't see how insisting on bad practices could be of any help.
Here are my 50c: forget proprietary formats, forget any interactive or multimedia content requiring anything but a vanilla browser to view (yes, this includes HTML crap in emails), embrace the simplicity of plain text, and mash up a secure webapp for anything beyond the capabilities of plain text.
If somebody is in charge of filtering policy on a mail server that's carrying such traffic the answer is nearly always yes.
I think all the people committing suicide at their Seattle office might be getting to them.
Their Seattle office is right under the Aurora bridge, popular with jumpers...
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
I believe the entire point of the original submission was likely to troll this fact; as soon as he/she said they wanted to do it while transiting a mail gateway, it was either a request for PDF encryption cracking or a troll against Adobe locking down documents in this fashion.
I've personally railed against government agencies being in violation of the Americans with Disabilities Act for putting up PDF forms that have to be filled in by loading them into Adobe products, but until someone who has been spear-phished for lack of a product capable of doing this without violating the DMCA takes it to court, nothing's going to change. With the ADA, there are clear, litigious interest groups with large, fat government agency targets. Not so when all you are talking about is companies like Barracuda being essentially frozen out of a market in which Adobe is free to compete on a software basis. But again, you have to be the wronged party.
The ordinary person doesn't give enough of a damn about this sort of thing for public pressure to work, and never will, since they have no idea what constitutes "enough" and would rather watch TV than be lectured to by nerds like us.
I did not know it was possible to detect JavaScript in a PDF, and I think this is possibly a better approach than a full rewrite (BTW, I found this Python script: http://blog.didierstevens.com/programs/pdf-tools/ ). So instead of rewriting every PDF, you just delete any PDF attachments that are detected to contain JavaScript. I assume this will then not break any legitimate PDFs that have comments or forms, etc.? It will need testing, I guess.
The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
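A sketch of the relay-side step using only Python's standard `email` package: walk the top-level parts of a message and drop PDF attachments that a detector flags, returning their filenames so the relay can add a notice. `looks_scripted()` here is a deliberately crude placeholder (a plain substring test) standing in for a real detector such as the pdfid script linked above; nested multiparts and zipped PDFs are not handled in this sketch.

```python
from email.message import EmailMessage


def looks_scripted(data: bytes) -> bool:
    """Placeholder detector (an assumption): swap in a real scanner in practice."""
    return b"/JavaScript" in data or b"/JS" in data


def strip_scripted_pdfs(msg: EmailMessage) -> list:
    """Remove flagged top-level PDF attachments; return the removed filenames."""
    if not msg.is_multipart():
        return []
    kept, removed = [], []
    for part in msg.get_payload():
        is_pdf = part.get_content_type() == "application/pdf"
        if is_pdf and part.is_attachment() and looks_scripted(part.get_content()):
            removed.append(part.get_filename())
        else:
            kept.append(part)  # body text and clean attachments pass through
    msg.set_payload(kept)
    return removed
```

Everything that is not flagged, including encrypted PDFs the detector cannot see into, is delivered unchanged, which matches the "allow everything else through" policy described above.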
http://www.clearswift.com/sites/default/files/documents/datasheets/Sanitization%20SECURE%20Email%20Gateway%20v2.pdf
Check out Sumatrapdf http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html. It's super fast and does not support javascript or actionscript in PDF's. I use it exclusively now.
Sumatra PDF used to be light, a mere 800 KB, I just installed the latest version, a whopping 3.6 MB. It's suffering the same form of featuritis as the other PDF readers I dumped because they became slow and unwieldy (Adobe and Foxit).
I use Ghostscript when attempting to compress a "bloated" PDF (such as one generated by XSane). The input is a PDF, and the output is a PDF:
# Use ghostscript to re-write the PDF
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf
Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:
# Combine PDFs
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf
# Copy pages 3 & 4 from an existing PDF
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf
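If you script the first command above at a mail gateway, it is worth bounding it, since malformed PDFs can hang or crash processing services (as mentioned earlier in the thread). A Python sketch that runs the same Ghostscript rewrite with a timeout and reports failure instead of raising; `-dSAFER` is added (it is already the default in recent Ghostscript releases), and the 120-second limit is an assumption. The sketch does not inspect the rewritten file, so do not assume the output is free of JavaScript without verifying it.

```python
import subprocess


def rewrite_command(src: str, dst: str) -> list:
    """Same pdfwrite rewrite as the first command above, plus -dSAFER."""
    return ["gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-q",
            "-sDEVICE=pdfwrite", f"-sOutputFile={dst}", src]


def rewrite(src: str, dst: str, timeout_s: int = 120) -> bool:
    """Return True on success; False if gs is missing, fails, or runs too long."""
    try:
        subprocess.run(rewrite_command(src, dst), check=True, timeout=timeout_s,
                       stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
```

On a `False` result, the gateway can fall back to quarantining the original attachment rather than delivering it.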
Signed PDFs can be read in any reader, but the signature will be still validated (if the reader is not defective.) Encrypted PDFs will not be even readable if they are not encrypted to you. Password-protected PDFs may require the password to be readable, let alone printable or changeable.
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
Encrypted PDFs can be broken, quite quickly - a quick search pulled up some tools - one of which I had to use a while back for work [1]. I decrypted about 40k documents in less than a day with GuaPDF, with only about 300 or so that couldn't be cracked - 99% success rate. Combine with the JS detection method noted in another comment [2], and you can still tell if there's a dangerous PDF most of the time.
If you need to protect your populace (i.e., at the mail server level), combining the two above and either blocking (with a "see IT" note) or warning users for uncrackable/JS-detected pdfs sounds like a good win. Especially since cracking is almost instantaneous.
[1] http://pcsupport.about.com/od/toolsofthetrade/tp/pdf-password-remover.htm
[2] http://it.slashdot.org/comments.pl?sid=3985927&cid=44314295
an organization relying on filling out interactive pdf forms sounds quite like a no-go to me. can't you really come up with a better solution to get your shit together?
Certainly! I could accept a fine of tens of thousands of pounds and possible imprisonment from HM Revenue & Customs for refusing to complete the interactive PDF Corporation Tax return form (CT600).
What an alternative!
Note: paper submissions are no longer accepted.
If you know a bit of Java you can do it in a few lines of code with the BFO PDF Library.
Only hire people who are smart enough to not open obviously fake attachments
Remember back when Internet worms were rampant? Yeah, back then we had all ports open by default in popular desktop operating systems. Finally they were convinced to release a service pack that closed all ports (turned on the firewall by default) and nearly all the worms went away.
That's what you call using a whitelist instead of a blacklist. A blacklist is stupid, but I guess it's the only option you have if everyone is too stupid to use the whitelist. This problem has been boiling for a while, and it's come down to authenticating endpoints and blocking all others not in the trust graph. If only there were some system that allowed you to authenticate emails and form decentralized trust graphs... Like PGP.
So, How do you automatically sanitize all the packets hitting your ports? You don't. You block all but the ones that are legitimate. How do you automatically sanitize PDF Email Attachments? You don't. You block all but the ones that are legitimate.
Instead of thinking me clueless, or unhelpful consider that I've already been down that road as far as it can be traveled and wound up in the exact situation I started. Absolute security is impossible, make the environment hostile to propagation. Call up the other IT guys in the businesses you do work with, it's a problem you can't solve on your own. If only Jedi Mind tricks actually worked, you could convince your managers to let you live. These scanners are not the solutions you are looking for. Whitelists are the way to go.
For dangerous attachment types, you can quarantine them using MimeDefang. Then you provide a link for download after X days (notifying the recipient of the mandatory quarantine time), and a procedure for the helpdesk to pre-release 'known good/expected' documents. While in the quarantine area, you can do whatever you like to it ... scan for viruses, convert to another format, etc.
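A minimal sketch of the release rule described above, in Python. `QuarantinedAttachment` is a hypothetical record, not part of MimeDefang (whose filters are written in Perl), and the three-day default stands in for the comment's unspecified "X days". A real setup would also rescan the file with fresh signatures before handing out the download link.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QuarantinedAttachment:
    """Hypothetical quarantine-store record for one held attachment."""
    filename: str
    received_at: datetime
    hold: timedelta = timedelta(days=3)   # the "X days"; value is an assumption
    helpdesk_released: bool = False       # set when the helpdesk pre-releases it

    def release_at(self) -> datetime:
        """When the download link becomes valid without helpdesk action."""
        return self.received_at + self.hold

    def is_releasable(self, now: datetime) -> bool:
        return self.helpdesk_released or now >= self.release_at()
```

The helpdesk override lets expected documents through immediately, while anything unknown waits out the hold period and benefits from the signature updates released in the meantime.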
There's an interesting talk from Chaos Communication Camp 2011 about making a verified PDF scanner in the Coq proof assistant: http://www.youtube.com/watch?v=CmPw7eo3nQI
Besides that, if you use a less feature-rich PDF reader that is safe, the virus scanner will STILL find the suspicious JS code, flag it, and set off all kinds of big alarms. And then they will be back at the mail guy.
Maybe you want to run mail in a virtualized, sanitized Citrix box, where you cannot infect a PC that has access to all internal systems.
Built-in feature using hardware-hardened (VTx and IOMMU) disposable virtual machines. Process is described here: http://theinvisiblethings.blogspot.com/2013/02/converting-untrusted-pdfs-into-trusted.html
I know it's a thought doomed from the start, but switching from pdf to epub as the document standard would be a big help.
Why not solve the problem properly and just sandbox the PDF reader so it can't access anything except the PDF that's passed into it?
We can within a company, but we can't control what reader software the recipient of a forwarded PDF uses.
Sumatra PDF used to be light, a mere 800 KB, I just installed the latest version, a whopping 3.6 MB.
Considering the size of Acrobat Reader, calling Sumatra "whopping" at 3.6MB sounds like a pretty good compliment.
"Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
I'm with you in spirit but stuck in a situation where our single most important software vendor has incorporated interactive forms using Adobe PDF reader. Replacing it is so far beyond our budget that just discussing what it would take turns the discussion into a five year plan.
Doing our own software to replace it is even worse in terms of budgeting. If you'd like to make yourself a couple million dollars and a career for the next twenty years, you can go for it, but the initial investment requirements are going to be in the same range and it'll probably take a minimum of five years for you to get a new system past all the regulatory hurdles and any adoption.
If you do all that, drop me a line. We'll be looking for somebody new about that time.
For people stuck in reality: Our solution is to try to be very suspicious of PDFs that come into the system, but trust the ones already in there. It's not a great situation to be in but I hope this forum will give me some ideas on how we can better protect ourselves from the potential dangerous PDFs coming in.
If you need PDF forms, then you just about have to live with the whole "dynamic content" package which is the security problem in the first place.
The actual javascript object looks something like this:
244 0 obj<</S/JavaScript/JS(all javascript code is between the two parentheses)>>
endobj
Just stripping it out appears to work ok. Though the first number on the line appears to be an object number... so you might need to renumber the remaining objects as well to avoid problems with some viewers (I'm just guessing here).
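One way around the renumbering worry is to avoid deleting bytes at all. The cross-reference table records each object's byte offset, so removing text shifts every later object; replacing key names with same-length variants keeps those offsets valid. A Python sketch that swaps the case of a few action-related names, so viewers should no longer recognize them and the action does not fire. It is similar in spirit to the disarm option in Didier Stevens' pdfid, though not its code, and the key list is an assumption. Like any byte-level approach, it misses `#xx`-escaped names and anything inside compressed streams.

```python
import re

# Assumed list of action-related keys to neutralize; extend as needed.
KEYS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch"]

# A name ends at whitespace, a PDF delimiter character, or the end of the data.
_END_OF_NAME = rb"(?![^\s/<>\[\]()%{}])"


def disarm(data: bytes) -> bytes:
    """Swap the case of each key (e.g. /JavaScript -> /jAVAsCRIPT).

    The replacement is exactly as long as the original, so the offsets in
    the cross-reference table still point at the right objects.
    """
    for key in KEYS:
        pattern = re.compile(re.escape(key) + _END_OF_NAME)
        data = pattern.sub(key.swapcase(), data)
    return data
```

Because `/OpenAction` is also swapped, legitimate "open at page N" behaviour is lost along with any script, which is usually an acceptable trade at a mail gateway.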
If you want to go through a PDF and scrub out such things as JavaScript, actions, annotations, etc., I would implement either Enfocus' PitStop Server or Callas' pdfToolbox Server. The pay tools are not some sort of conspiracy. They have been tested in a large number of production environments with a zillion PDFs produced by various tools and systems. The vendors (Adobe included) have libraries of tens of thousands of malformed PDFs that they use to regression-test their products.
Do not refry (PDF --> PS --> PDF) the PDF unless you know what you are doing. It's not the PS --> PDF step using Ghostscript that's the problem (version 9 of GS actually produces a pretty decent PDF). It's the creation of the PS from the PDF feedstock. It is not as easy as you may think. Did you sit down with a loupe to check that the resulting PDF looks EXACTLY like the input? Didn't think so. You can run into all sorts of weird issues with fonts, color spaces, transparency, alternate content layers, etc. by doing a blind refry. There are a lot of ways to create a PDF. There are relatively few ways to do it correctly. There are very few (read: only ONE!) PDF readers that actually do a good job on the not-so-well-formed PDFs. That being Adobe Reader.
Tools that decompose the PDF and recompose it will be hit or miss.
With regards to installation of Reader in a corporate environment:
1) Use the latest/current version. Starting with Reader X (ten) Adobe launches PDFs in a sandboxed mode (until disabled by the user), negating much of the JS and other exploits that have been rampant previously. Starting with Acrobat XI (Spinal-Tap version - it goes to Eleven!), even Acrobat is launched in a sandboxed mode, again until disabled by the user.
2) Use the enterprise deployment tools that Adobe provides http://www.adobe.com/products/acrobat/it-resources.html to make sure that a) Reader is locked down b) stays locked down according to your corporate needs. The tools provided can allow you to harden Reader quite a bit and keep the users from making changes.
3) If you are truly of the paranoid type - and there are some business areas that have a legitimate need to be hyper paranoid about this stuff - only allow the PDFs to be opened inside of a hardened virtual machine that you remote into. Sort of a glove box approach to the PDF. Others have mentioned various methods to do this which are perfectly acceptable.
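For point 2, part of that lock-down can also be expressed as machine-wide policy values, which also stops users from switching the sandbox off as described in point 1. The sketch below is a hypothetical Python helper that writes a .reg file. The key path and the value names `bProtectedMode` and `bDisableJavaScript` are assumptions based on Adobe's FeatureLockDown preferences; check them against Adobe's preference documentation for the Reader version you actually deploy, and prefer Adobe's own deployment tools or admin templates where they cover the same setting.

```python
# Hypothetical helper: builds a .reg file that locks two Reader settings
# machine-wide. The key path and value names below are ASSUMPTIONS based on
# Adobe's FeatureLockDown preferences; verify them for your Reader version.

LOCKDOWN_VALUES = {
    "bProtectedMode": 1,      # assumed: keep the Protected Mode sandbox on, locked
    "bDisableJavaScript": 1,  # assumed: keep JavaScript off, locked
}


def build_lockdown_reg(product: str = "Acrobat Reader", version: str = "11.0") -> str:
    """Return .reg file text (CRLF line endings) setting the lock-down DWORDs."""
    key = "\\".join([
        "HKEY_LOCAL_MACHINE", "SOFTWARE", "Policies", "Adobe",
        product, version, "FeatureLockDown",
    ])
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in LOCKDOWN_VALUES.items():
        # .reg syntax for a DWORD is "dword:" followed by 8 hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # regedit's own "Version 5.00" exports are UTF-16 with a BOM; match that.
    with open("reader-lockdown.reg", "w", encoding="utf-16", newline="") as fh:
        fh.write(build_lockdown_reg())
```

Values under the Policies hive are what greys out the matching checkbox in Reader's preferences, so this is about keeping settings locked rather than just setting defaults.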
Now, a large number of Slashdotters are not going to like this, but much (most?) of the FOSS PDF software, tools, and libraries produce less than optimal PDFs. The primary problem stems from: 1) Good page layout design is not the same as good web design. 2) Good PDF is built using the most expedient and direct method possible, not the most obscure methods you can find (such as how Apache FOP loves to f-around with the CTM rather than just performing a simple moveto). This is not RISC vs. CISC. Yes, f-ing around with the CTM allows you to slice, dice, julienne fry, as well as being both a dessert topping and a floor polish. However, it is almost impossible to debug. You would have been better off just coding moveto, rmoveto, translate, scale, rotate, etc. as individual function calls (note: I am using the PostScript equivalents of the internal PDF operators). Your code is easier to parse, understand, and debug, and, most importantly, it follows general industry conventions. 3) Do the minimal work to get the job done, not the maximal. Don't screw around with kerning, leading, etc. unless you really need to. Place strings of characters as strings, not as individual glyphs. 4) Learn the industry you are developing in instead of griping that the industry has no clue what it is doing. The typographic/layout industry has 10x the longevity of the web industry (500+ years vs. ~50). Most of the mistakes noobs make were worked out years ago. Learn from their mistakes first. Yes, there are some things that are holdovers from tim
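To make the CTM point concrete in PDF terms (the text counterpart of moveto is the `Td` operator, and `cm` concatenates a matrix onto the CTM), here is a minimal Python sketch that emits three content-stream fragments starting the same text at the same spot. The helper names, the `/F1` font resource, and the fixed per-glyph advance are hypothetical; a real generator would take glyph widths from the font.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_via_td(x, y, text, font="/F1", size=12):
    """Direct: set the font, move with Td (the text 'moveto'), show the string."""
    return f"BT {font} {size} Tf {x} {y} Td ({escape_pdf_string(text)}) Tj ET"


def text_via_cm(x, y, text, font="/F1", size=12):
    """Indirect: translate the CTM with cm, then show the string at the origin.

    cm is not allowed inside a text object, so it sits outside BT/ET and is
    bracketed by q/Q to save and restore the graphics state.
    """
    return (f"q 1 0 0 1 {x} {y} cm "
            f"BT {font} {size} Tf ({escape_pdf_string(text)}) Tj ET Q")


def text_per_glyph(x, y, text, advance, font="/F1", size=12):
    """Anti-pattern: one Tj per glyph, stepped by a guessed fixed advance."""
    parts = [f"BT {font} {size} Tf {x} {y} Td"]
    for i, ch in enumerate(text):
        if i:
            # Td offsets from the start of the current line, i.e. the last Td.
            parts.append(f"{advance} 0 Td")
        parts.append(f"({escape_pdf_string(ch)}) Tj")
    parts.append("ET")
    return " ".join(parts)


if __name__ == "__main__":
    for build in (text_via_td, text_via_cm):
        print(build(72, 720, "Hello"))
    print(text_per_glyph(72, 720, "Hello", advance=7))
```

The first fragment reads like what it does. The second gets the same result through the graphics state, which is harmless here but becomes painful once several nested matrices are in play. The third is the "individual glyphs" habit, and the guessed advance is exactly how spacing goes wrong.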
"- PDF is no longer searchable"
Use a PDF OCR product like Adobe Acrobat Professional or an ABBYY FineReader product.
Print to PDF using the open source PDFCreator, which would create another PDF file?
Invoice: Text file, HTML, spreadsheet, etc.
Pixel-accurate, in a single file, with embedded vector fonts and raster images? What kind of text file is that?
Contract: Rich Text file (signed with GPG/PGP), Text file (signed with GPG/PGP), HTML (with SHA hash stored elsewhere), markdown (signed with GPG/PGP), ODF, or even Doc or Docx.
Doc and Docx are the likeliest candidates, if only because most documents are prepared in them. However, these files are not pixel-accurate, they do not lock the content, and they contain hard-to-remove traces of past edits. Still, MS Word documents are a popular format in business, as long as both sides intend to edit them.
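For the "HTML with SHA hash stored elsewhere" option, the check itself is trivial, which is part of its appeal. A minimal Python sketch with hypothetical helper names follows. Note that a bare hash only shows the file is unchanged since the hash was recorded; it says nothing about who wrote it, which is what a GPG/PGP signature adds.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path: str, recorded_hex: str) -> bool:
    """True if the file still hashes to the value recorded 'elsewhere'."""
    return sha256_of(path) == recorded_hex.strip().lower()
```

The whole scheme rests on the recorded value living somewhere the other party cannot quietly change.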
Drawing: lossless: bitmap, PNG (Portable Network Graphics), GIF, WebP, TIFF, SVG (Scalable Vector Graphics); lossy: JPEG
Not even funny. Did you ever try to export a D size architectural drawing into a JPEG? An SVG may do well on vectors, but how will it handle small rasters that are often there? How will it deal with fonts?
User Manual: Windows: HTML files compiled to .lit format, HTML document, Doc or Docx, Rich text file, or text file
I see no reason to separate Windows and Linux here because user manuals must be platform-independent. But ebook formats are not very nice because they don't deal nicely with *all* of the text, raster and vector graphics. HTML comes very close, but it's usually not a single file (hard to distribute.) RTF is, of course, good - but it's very complex. User manuals are rarely published as .doc[x] because the end result is not pixel-accurate, and reflowing of the document can (and will) mess it up considerably.
Encryption and signing: GPG/PGP, TrueCrypt volume (where you can even hide the fact that the files exist, for plausible deniability), steganography
Businesses rarely need to hide data in images. Volume encryption does nothing to secure documents that you email. GPG/PGP is somewhat OK, but it is arcane and requires an extra step to verify.
As you can see, PDF combines all those desirable features in one convenient format, and there are many different readers and writers. A good number of them are free. What is there not to like? Alternatives may be just as good in one specific aspect, but there is no competition that does all of that pretty well.
Certainly! I could accept a fine of tens of thousands of UKP, and possible imprisonment, from HM Revenue & Customs for refusing to complete the interactive PDF Corporation Tax return form (CT600).
What an alternative!
Note: paper submissions are no longer accepted.
I had no idea, sorry to hear that. That's pretty deep shit. Have you considered migration as an alternative?
Beware of people changing the extension (to, say, ".foo") before sending it inside your network. The recipient just has to change the extension back to ".pdf" and voila: they've snuck in an unsanitized PDF. And then there's the problem of password-protected Zip files.
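One way to cope with renamed extensions is to identify attachments by content rather than by name. Below is a minimal Python sketch that assumes you already have each MIME part's bytes at the relay. It looks for the `%PDF-` header within the first 1024 bytes (Adobe's PDF reference notes that its readers accept the header anywhere in that window), recurses into ZIP archives up to a fixed depth, and flags encrypted entries (general-purpose flag bit 0), since those cannot be inspected at all. The function names and depth limit are hypothetical, and a production filter would also cap decompressed sizes to avoid zip bombs.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024  # readers accept the header anywhere in the first 1 KiB
MAX_DEPTH = 3         # hypothetical limit on nested archives


def looks_like_pdf(data: bytes) -> bool:
    """Identify a PDF by its header bytes, ignoring the file name."""
    return PDF_MAGIC in data[:HEADER_WINDOW]


def classify_attachment(name: str, data: bytes, depth: int = 0) -> list:
    """Return findings for one attachment, recursing into ZIP archives."""
    findings = []
    if looks_like_pdf(data):
        findings.append(f"{name}: PDF by content")
    if depth < MAX_DEPTH and zipfile.is_zipfile(io.BytesIO(data)):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for info in archive.infolist():
                    if info.is_dir():
                        continue
                    member = f"{name}/{info.filename}"
                    # General-purpose flag bit 0 set means the entry is encrypted.
                    if info.flag_bits & 0x1:
                        findings.append(f"{member}: encrypted, cannot inspect")
                        continue
                    findings.extend(
                        classify_attachment(member, archive.read(info), depth + 1))
        except zipfile.BadZipFile:
            findings.append(f"{name}: malformed ZIP")
    return findings
```

A relay policy built on this would typically quarantine or reject anything reported as "encrypted, cannot inspect" rather than let it through, which handles the password-protected Zip case by refusing it.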
I posted more details up above, but if you get the Adobe Enterprise Toolkit (free), you can rebuild a custom MSI installer that you deploy through Group Policy. It defines both installer options and default user preferences.
You can specify a file share on your LAN for it to check for updates, so updates only become available once you have rebuilt the new version's installer and vetted that everything works.
There are also group policy admin templates you can use to lock user preferences, and you can have reader check an internal file share for updates instead of from Adobe.
Oh, and you can turn off the Ask toolbar crap too.
Toolkit docs:
http://www.adobe.com/devnet-docs/acrobatetk/index.html
Adobe Reader "offline" installer (The toolkit won't work with the "online" installer)
ftp://ftp.adobe.com/pub/adobe/reader/win/11.x/11.0.03/en_US/
Then go up to the 11.0.0 dir to get the group policy templates.