Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?
First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious JavaScript embedded. It would be impossible to block PDFs as they are required for business. We do disable JavaScript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out-of-the-box solution or a 'how to' guide for automatically neutralizing PDFs by stripping out the JavaScript. The closest thing I could find is using PDF2PS and then reversing the process with PS2PDF. Does anyone know of a solution for this that is not too complex, works preferably at the SMTP relay, and can work with zipped PDFs as well, or have some common sense advice for dealing with this so that once it's in place, there is no further action required by myself or by users?"
As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.
The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.
Modifying a document can, for example, change its legal standing.
A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do Javascript in the first place.
Check out Sumatrapdf http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html. It's super fast and does not support javascript or actionscript in PDF's. I use it exclusively now.
Why in the world is javascript included in PDF documents? PDF is already a Forth-like programming language and environment.
Why don't you sanitize the reader? Use a reader with javascript ignored. Or build one from whatever open source pdf reader you can find, if there isn't one already. Or run the pdf reader inside a sandbox without internet access or permanent disk write. If that breaks the portability and the documents don't render correctly when javascript is disabled, tell the sender and blacklist the sender too for good measure. If enough companies lock javascript out of pdf documents, eventually the authoring tools will stop using it.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
A big limitation of Sumatra is that it doesn't support filling out interactive forms, which makes it a no-go in my organization
You don't need a solution that rewrites the PDF. At best it will work correctly "most of the time", and break PDFs the rest of the time. For example, pdf->ps->pdf, or the "print to pdf" solution mentioned earlier in the comments may work fine for scanned PDFs, but if there are annotations/comments then they'll get stripped. This will lead to massive user frustration ("but the comments are there, I sent it in the last email") and people having to find ways to work around your filter. Modifying people's attachments is a bad move. A more reasonable solution is to detect if the PDF contains any javascript code, and if it does, block the PDF entirely.
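A rough idea of what "detect if the PDF contains any javascript code" could look like, in the spirit of keyword scanners such as pdfid. This is only a sketch: the function names are made up for illustration, it only looks at the raw bytes, and it will not see names hidden inside compressed object streams, so treat a clean result as "nothing obvious" rather than "safe".

```python
import re

# Name objects that commonly indicate active content in a PDF.
SUSPICIOUS_NAMES = (b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch")

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def _decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx escapes so that /J#61vaScript is seen as /JavaScript."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def find_active_content(pdf_bytes: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw bytes."""
    data = _decode_name_escapes(pdf_bytes)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require a non-alphanumeric character after the name,
        # so /JS does not match an unrelated name such as /JSON.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


def should_block(pdf_bytes: bytes) -> bool:
    """Block-or-deliver decision: reject anything that declares JavaScript."""
    counts = find_active_content(pdf_bytes)
    return counts["/JS"] > 0 or counts["/JavaScript"] > 0
```

A filter built this way never touches the attachment: it either passes the original bytes through or rejects the message.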
At some point you trust technology and also reinforce proper user behavior. I hate catch-phrases but your e-mail hygiene should have layers of protection (defense in depth). Assuming that the message got through IP reputation filters, SPAM analysis, malware scans, and was delivered to your user, you rely on desktop protection and cross your fingers that nobody opens it.
We have SMTP appliances from Axway and we used to stop all executable attachments and deliver a notification to the user to call the help desk and request a release. Times changed and we don't do that any more. However, you could annotate the message to remind the user not to open the attachment if they don't know who it's from or what it is, or if they weren't expecting it. And some will anyway. We also used to hold certain attachments for four hours until the virus definitions (and the other defenses) received a couple of updates and then reprocess the message.
If you do try to roll your own, be aware that everyone and their dog creates PDF files with varying degrees of success and we had certain PDF files that caused services to fail on our gateway while they tried to scan and process them. You didn't mention the volume but make sure your solution scales well.
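For anyone rolling their own, the four-hour hold mentioned above boils down to a timestamp comparison before re-scanning. A minimal sketch, assuming the gateway can hand you (message id, received time) pairs; `split_held` and `HOLD_PERIOD` are illustrative names, not part of any appliance's API:

```python
from datetime import datetime, timedelta, timezone

# The four-hour window described above; tune to your update cadence.
HOLD_PERIOD = timedelta(hours=4)


def split_held(messages, now):
    """Split held messages into those ready for re-scan and those still waiting.

    messages: iterable of (message_id, received_at) with timezone-aware datetimes.
    Returns (ready_ids, waiting_ids), each in input order.
    """
    ready, waiting = [], []
    for message_id, received_at in messages:
        if now - received_at >= HOLD_PERIOD:
            ready.append(message_id)
        else:
            waiting.append(message_id)
    return ready, waiting
```

Messages in the "ready" list would go back through the scanners with the newer definitions before delivery.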
Check out Sumatrapdf http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html. It's super fast and does not support javascript or actionscript in PDF's. I use it exclusively now.
Is it vulnerable to font description overloading and the other PDF exploits out there? A large portion of the malicious PDFs I've seen lately didn't use forms or javascript containers as the main attack vector (usually shellcode via some markup bug).
If you rasterize and re-encapsulate your users' PDF attachments, your users will hate you, and work around your "stupid filter that breaks pdf attachments". You are better off blocking all PDF attachments by email. You'll save yourself a ton of work, and your users can skip the frustration of mangled attachments and go directly to working around your filter.
Your problem only applies if the PDFs have to be editable or if you rasterize with too low or too high a resolution. You can also run the images through OCR to get back some level of editability.
Otherwise you have to work with a possibly infected PDF. There are a few settings where that is not acceptable and users will not work around it (e.g. "you infect this system and then it turns out you were not following procedure, you go to prison for a few years" environments).
While I agree that security should not hinder users from doing their job exactly because they will otherwise start to work around it, that was not the question of the OP.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
If you use anything but Adobe, it probably won't support javascript because it's fucking stupid to have javascript in a PDF. Just avoid Adobe, because they are allergic to security.
This is my signature. There are many like it, but this one is mine.
Signed PDFs can be read in any reader, but the signature will be still validated (if the reader is not defective.) Encrypted PDFs will not be even readable if they are not encrypted to you. Password-protected PDFs may require the password to be readable, let alone printable or changeable.
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
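A gateway can at least notice which PDFs fall into these categories before it tries anything. The sketch below is a heuristic only, assuming the markers appear in the clear: `/Encrypt` is the trailer key used by encrypted documents and `/ByteRange` appears in signature dictionaries. `classify_pdf` is a made-up name.

```python
import re

# Lookaheads keep /Encrypt from matching longer names like /EncryptMetadata.
_ENCRYPT = re.compile(rb"/Encrypt(?![A-Za-z0-9])")
_BYTE_RANGE = re.compile(rb"/ByteRange(?![A-Za-z0-9])")


def classify_pdf(pdf_bytes: bytes) -> str:
    """Return 'encrypted', 'signed', or 'plain' based on raw byte markers."""
    if _ENCRYPT.search(pdf_bytes):
        return "encrypted"  # contents cannot be inspected without the key
    if _BYTE_RANGE.search(pdf_bytes):
        return "signed"     # any rewrite would invalidate the signature
    return "plain"
```

Anything other than "plain" is a document where rewriting either fails or destroys what made the document valuable.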
There's a couple of vendors (and many more playing catch-up) selling appliances that detonate attachments on sandboxed VMs running in fast virtual memory.
They execute or open attachments and watch to see what happens: registry changes, file drops, network activity, attempts to contact known C&C servers, etc.
Anything that exhibits non-legit behavior gets quarantined. FireEye have a box that does this and also crawls network shares, testing files.
Aside from whitelisting, I think it's the best defense against zero day malware. It's a little too pricy for the company I work at right now, but as more vendors add this functionality, the price will come down.
It gripped her hand gently. 'Regret is for humans,' it said.
For a long time, I thought like you, that it was my duty to ward off and protect the "children". After a while, you realize 2 things.
First, it is most likely your duty to inform and educate. Do that. Do it well, do it loud, and do it as often as you can. When someone eventually opens up one of those attachments, it will get around, and peer pressure will make everyone else gun-shy. After a user or two of mine got bit by an attachment, and I had repeatedly warned my users about these things.. I ended up with people at my desk occasionally asking..can you come look at this.. it just looks funny.. it was all about the peer pressure and not wanting to be That Guy who clicked the stupid link.
Second, and I hate to say it, this is what we do, and this is job security. You can't save em all Hasselhoff, if ya did, there would be nothing left to do..
And be sure to double-check that the next update doesn't revert those settings on you...
Javascript should not be given the capability of doing damaging things. It should be confined to a narrow execution context that is limited to being able to do only the things that enhance the experience of that ONE information resource. Dynamic layout is certainly a useful thing. Dynamically changing your system is not. It should not have access. I blame the developers. It doesn't matter if it is mail or web. It might do cute things inside a PDF like give you a calculator for a certain algorithm the PDF is written about. But it should not be able to access even /etc/hosts on your computer.
now we need to go OSS in diesel cars
I'm hoping that somebody can reply to this with a _genuine_ reason why sending a PDF (Pretty Damn F'ked) attachment to an e-mail is either necessary or optimal
What else would you use to send an invoice, or a contract, or a drawing, or a user's manual, or anything else that requires pixel-accurate placement of all elements as designed? It has to support digital signatures as a minimum, and preferably complete public key encryption. PDF does that.
'It's good looking' sounds like a weak reason.
The 'good looking' is a weak reason. "Correct" is a far better reason. Once you print into a PDF, it captures your document exactly as it is. You want your documents to represent what you put into them - neither more nor less. Perhaps there are better formats, but I'm not aware of any.
Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business (CPA firm) we use lots of applications, and most of them are highly vertical with often just one realistic competitor that can function adequately for a firm our size. Many of our apps integrate directly with Acrobat (and Office) so not using Acrobat simply isn't a choice we can make.
So how do we deal with Adobe Acrobat? As some pointed out earlier, defense in depth. Spam filters, multiple virus scans, and our two most important measures: End users don't have admin on their computers and Adobe is one of our "High Priority" upgrade applications. Updates must be pushed out within one day of being released.
BTW, the other High priority apps are Java and Flash, again, both required by our software. With Acrobat, they make up my "Axis of Evil" of insecure software.
Ignore and erase if possible, posted in wrong thread by accident
In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.
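If you want to push that rename out to many machines, something like the following would do it. It is a sketch under assumptions: the install root is whatever your Reader install path is (passed in, not guessed here), the `.disabled` suffix is arbitrary, and `disable_js_bytecode` is an illustrative name.

```python
from pathlib import Path

TARGET = "JSByteCodeWin.bin"  # file name taken from the comment above


def disable_js_bytecode(install_root: str, suffix: str = ".disabled") -> list:
    """Rename every TARGET under install_root; return the new paths."""
    renamed = []
    # Collect matches first so the tree is not modified while it is being walked.
    for path in list(Path(install_root).rglob(TARGET)):
        new_path = path.with_name(path.name + suffix)
        path.rename(new_path)
        renamed.append(new_path)
    return renamed
```

Run on a schedule, it also doubles as a check on updates: a non-empty return value after the first run means something put the file back.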
I am intrigued by your solution, and would like to subscribe to your magazine.
Sent from my ASR33 using ASCII
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
I believe the entire point of the original submission was likely to troll this fact; as soon as he/she said that they wanted to do it while transiting a mail gateway, it was either a request for PDF encryption cracking or a troll against Adobe locking down documents in this fashion.
I've personally railed against government agencies being in violation of the Americans with Disabilities Act for putting up PDF forms that have to be filled in by loading them into Adobe products, but until someone who has been spear-phished for lack of a product capable of doing this without violating the DMCA brings a case, nothing's going to change. With the ADA, there are clear, litigious interest groups, with large fat government agency targets. Not so when all you are talking about is companies like Barracuda being essentially frozen out of a market which Adobe is free to compete in on a software basis. But again, you have to be the wronged party.
The ordinary person doesn't give enough of a damn about this sort of thing for public pressure to work, and never will, since they have no idea what constitutes "enough" and would rather watch TV than be lectured to by nerds like us.
I did not know it was possible to detect javascript in a PDF, and I think this is possibly a better approach than a full rewrite (btw: I found this python script: http://blog.didierstevens.com/programs/pdf-tools/ ) So instead of rewriting every PDF, you just choose to delete any PDF attachments that are detected with JavaScript. I assume this will then not break any legitimate PDFs that have comments or forms, etc? It will need testing, I guess.
The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
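Since the original question also asked about zipped PDFs, here is a sketch of how the detect-and-delete policy might walk an attachment, including ZIP members, before deciding. Everything named here is hypothetical: `has_javascript` stands in for whatever detector you pick (the Didier Stevens tools, for instance), and the size and depth limits are arbitrary guesses.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"
MAX_MEMBER_BYTES = 50 * 1024 * 1024  # assumed cap to avoid unpacking zip bombs


def iter_pdf_payloads(name, payload, depth=0, max_depth=2):
    """Yield (name, bytes) for each PDF in an attachment, descending into ZIPs."""
    # Strict check; some readers accept a header later in the file,
    # so a stricter gateway may want to search further in.
    if payload.startswith(PDF_MAGIC):
        yield name, payload
    elif depth < max_depth and zipfile.is_zipfile(io.BytesIO(payload)):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for info in archive.infolist():
                if info.is_dir() or info.file_size > MAX_MEMBER_BYTES:
                    continue
                if info.flag_bits & 0x1:
                    continue  # password-protected member: cannot be inspected
                yield from iter_pdf_payloads(
                    f"{name}!{info.filename}", archive.read(info), depth + 1, max_depth
                )


def verdict(name, payload, has_javascript):
    """Return 'reject' if any contained PDF trips the detector, else 'deliver'."""
    for _pdf_name, pdf_bytes in iter_pdf_payloads(name, payload):
        if has_javascript(pdf_bytes):
            return "reject"
    return "deliver"
```

Note that this passes encrypted members through untouched, matching the policy described above; whether that is acceptable is a local decision.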
I use Ghostscript when attempting to compress a "bloated" PDF (such as generated by Xsane). The input is a PDF, output is a PDF:
# Use ghostscript to re-write the PDF
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf
Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:
# Combine PDFs
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf
# Copy pages 3 & 4 from an existing PDF
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf
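If you wanted to call the same rewrite from a mail filter script, a thin wrapper might look like this. It is a sketch: the flags are the ones shown above plus -dSAFER, which is my addition, and the timeout value is an arbitrary guess for coping with malformed files. Whether the rewritten file is free of every active element is something to verify on your own samples, not something this guarantees.

```python
import shutil
import subprocess


def build_gs_command(src, dst, first_page=None, last_page=None):
    """Build the Ghostscript argument list for a PDF-to-PDF rewrite."""
    cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-dSAFER", "-sDEVICE=pdfwrite"]
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd += [f"-sOutputFile={dst}", src]
    return cmd


def rewrite_pdf(src, dst, timeout=60):
    """Run Ghostscript; raises if gs is missing, fails, or exceeds the timeout."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    # Argument list form (no shell) so odd file names cannot inject commands.
    subprocess.run(build_gs_command(src, dst), check=True, timeout=timeout)
```

Usage would be `rewrite_pdf("old.pdf", "new.pdf")`, with the caller deciding what to do with the message when an exception is raised.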
There's an interesting talk from Chaos Communication Camp 2011 about making a verified PDF scanner in the Coq proof assistant: http://www.youtube.com/watch?v=CmPw7eo3nQI
Invoice: Text file, HTML, spreadsheet, etc.
Pixel-accurate, in a single file, with embedded vector fonts and raster images? What kind of text file is that?
Contract: Rich Text file (signed with GPG/PGP), Text file (signed with GPG/PGP), HTML (with SHA hash stored elsewhere), markdown (signed with GPG/PGP), ODF, or even Doc or Docx.
Doc and Docx are the likeliest candidates, at least because most documents are prepared in them. However these files are not pixel-accurate, and they do not lock the content, and they contain hard to remove traces of past edits. Still, MS Word documents are a popular format in business - as long as both sides intend to edit them.
Drawing: lossless: bitmap, Portable Network Graphics, GIF, WebP, TIFF, Scalable Vector Graphics; lossy: JPEG
Not even funny. Did you ever try to export a D size architectural drawing into a JPEG? An SVG may do well on vectors, but how will it handle small rasters that are often there? How will it deal with fonts?
User Manual: Windows: HTML files compiled to .lit format, HTML document, Doc or Docx, Rich text file, or text file
I see no reason to separate Windows and Linux here because user manuals must be platform-independent. But ebook formats are not very nice because they don't deal nicely with *all* of the text, raster and vector graphics. HTML comes very close, but it's usually not a single file (hard to distribute.) RTF is, of course, good - but it's very complex. User manuals are rarely published as .doc[x] because the end result is not pixel-accurate, and reflowing of the document can (and will) mess it up considerably.
Encryption and signing: GPG/PGP, TrueCrypt Volume (where you can even hide that the files exist, for plausible deniability), ste[GA]nographs
Businesses rarely need to hide data in images. Volume encryption does nothing to secure documents that you email. GPG/PGP is somewhat OK, but it is arcane and requires an extra step to verify.
As you can see, PDF combines all those desirable features in one convenient format, and there are many different readers and writers. A good number of them are free. What is there not to like? Alternatives may be just as good in one specific aspect, but there is no competition that does all of that pretty well.