Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?
First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is becoming more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious JavaScript embedded. It would be impossible to block PDFs as they are required for business. We do disable JavaScript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out of the box solution or a 'how to' guide for automatically neutralizing PDFs by stripping out the JavaScript. The closest thing I could find is using PDF2PS and then reversing the process with PS2PDF. Does anyone know of a solution for this that is not too complex, works preferably at the SMTP relay, and can work with zipped PDFs as well? Or does anyone have some common sense advice for dealing with this so that once it's in place, no further action is required by myself or by users?"
As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.
The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.
You can change the legality of a document for example by modifying it.
A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do Javascript in the first place.
Sumatra PDF:
http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html
It doesn't get nailed by viruses and security breaches like Adobe's PDF reader. And it doesn't have adware in the installer like Foxit. And he releases updates regularly.
I used to use Foxit a long time ago, but have since switched to Sumatra. Never looked back; it does the job.
Check out SumatraPDF: http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html It's super fast and does not support JavaScript or ActionScript in PDFs. I use it exclusively now.
The best way to protect your computer from malicious Javascript embedded within a PDF is to not install Adobe Reader. If you cannot open the file, your computer cannot be infected.
Why in the world is JavaScript included in PDF documents? PDF is already a Forth-like programming language and environment.
Why don't you sanitize the reader? Use a reader that ignores JavaScript. Or build one from whatever open source PDF reader you can find, if there isn't one already. Or run the PDF reader inside a sandbox without internet access or permanent disk write. If that breaks portability and the documents don't render correctly when JavaScript is disabled, tell the sender, and blacklist the sender too for good measure. If enough companies lock JavaScript out of PDF documents, eventually the authoring tools will stop using it.
You don't need a solution that rewrites the PDF. At best it will work correctly "most of the time", and break PDFs the rest of the time. For example, pdf->ps->pdf, or the "print to pdf" solution mentioned earlier in the comments may work fine for scanned PDFs, but if there are annotations/comments then they'll get stripped. This will lead to massive user frustration ("but the comments are there, I sent it in the last email") and people having to find ways to work around your filter. Modifying people's attachments is a bad move. A more reasonable solution is to detect if the PDF contains any javascript code, and if it does, block the PDF entirely.
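For what it's worth, the "detect and block" idea can be sketched in a few lines of Python. This is only a rough illustration, not a vetted scanner: the function names are made up, the list of suspicious keys is my own assumption, and a plain byte scan will miss anything tucked inside compressed object streams (and may occasionally trip on binary data).

```python
import re

# PDF names may hex-escape any character (e.g. /J#53 is /JS), so decode first.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Assumed list of name keys associated with scripts or automatic actions.
SUSPICIOUS_NAMES = (b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch")


def find_suspicious_names(data: bytes) -> set:
    """Return the suspicious PDF name keys that appear in the raw bytes."""
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    found = set()
    for name in SUSPICIOUS_NAMES:
        # Require a delimiter after the name so /JS does not match /JSomething.
        if re.search(re.escape(name) + rb"(?=[\s/<>\[\]()]|$)", decoded):
            found.add(name.decode("ascii"))
    return found


def should_block(data: bytes) -> bool:
    """Hypothetical policy: block the attachment if any suspicious key shows up."""
    return bool(find_suspicious_names(data))
```

Blocking on /OpenAction or /AA alone would probably be too aggressive for real mail, since legitimate documents use them too; trimming the list to the two script keys is the conservative starting point.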
"Your document has been completed Sent on behalf of *. All parties have completed the envelope 'Please DocuSign this document: To All Employees 2013.pdf'. To view or print the document download the attachment . (self-extracting archive, Adobe PDF) This document contains information confidential and proprietary to * LEARN MORE: New Features | Tips & Tricks | Video Tutorials DocuSign. The fastest way to get a signature. If you have questions regarding this notification or any enclosed documents requiring your signature, please contact the sender directly. For technical assistance with the signing process, you can email support. This message was sent to you by * who is using the DocuSign Electronic Signature Service. If you would rather not receive email from this sender you may contact the sender with your request." They are zero day exploits... The funny thing is, the more I try to piss off elite corps and gov't, the more of these I get. No law against testing the "system".
The problem is that PDF, and the PostScript used in it, is an executable language. This falls under "executable code in non-executable containers". If you need to be sure, convert the PDF to a series of JPG or GIF pictures and recreate a PDF from them. With any less harsh approach, you may retain malicious PostScript (and other) code.
And, yes, what you are trying to do is non-trivial. Expect anything "simple" will be insecure.
At some point you trust technology and also reinforce proper user behavior. I hate catch-phrases but your e-mail hygiene should have layers of protection (defense in depth). Assuming that the message got through IP reputation filters, SPAM analysis, malware scans, and was delivered to your user, you rely on desktop protection and cross your fingers that nobody opens it.
We have SMTP appliances from Axway and we used to stop all executable attachments and deliver a notification to the user to call the help desk and request a release. Times changed and we don't do that any more. However, you could annotate the message to remind the user that if they don't know who it's from or what it is or if they weren't expecting it to not open it. And some will anyway. We also used to hold certain attachments for four hours until the virus definitions (and the other defenses) received a couple of updates and then reprocess the message.
If you do try to roll your own, be aware that everyone and their dog creates PDF files with varying degrees of success and we had certain PDF files that caused services to fail on our gateway while they tried to scan and process them. You didn't mention the volume but make sure your solution scales well.
Evince is the PDF/document viewer for GNOME. It also gets compiled for Windows.
In the Linux world, it's a heavyweight GNOME app (compared to epdfview or xpdf), but it's far, far lighter than Adobe Acrobat Reader, and it doesn't do JavaScript at all. I've yet to come across issues with PDFs not working, as most legitimate PDFs don't use JavaScript.
It also comes from a long-standing, respected open source project, GNOME (read: comparable in quality to commercial software), not a fly-by-night freeware operation of dubious origins.
https://wiki.gnome.org/Evince/Downloads
Great little app for just such issues.
Adobe Reader (XI, other releases?) can be used in Protected Mode.
Edit>Preferences>Security(Enhanced), check Enable Protected Mode at Startup, All files and check Enable Enhanced Security. And disable Javascript.
I wouldn't say it's possible to definitely sanitize a malicious document, but enabling some of the security features is going to make exploitation more challenging.
You're going to be automatically opening PDFs from the internet, so whatever you do, do it in a sandbox.
Before you jump in and start messing with corporate documents, make sure you understand very well why you are doing it in the first place. Is it what you are specifically hired to do? Some PDFs are cryptographically signed, and there is nothing that you can do to alter them that won't invalidate the signature. Other PDFs are password-protected from copying. You cannot legally extract their content (even if technically there are ways.) Malicious content inside a PDF is, therefore, not blockable unless you block all PDFs - and then you will cause more harm to the business than all the PDF viruses taken together. The best solution is to enforce a safe reader.
Download and install Ubuntu or one of these distros ..
If you use anything but Adobe, it probably won't support javascript because it's fucking stupid to have javascript in a PDF. Just avoid Adobe, because they are allergic to security.
There's a couple of vendors (and many more playing catch-up) selling appliances that detonate attachments on sandboxed VMs running in fast virtual memory.
They execute/open attachments and watch to see what happens: registry changes, file drops, network activity, attempts to contact known C&C servers, etc.
Anything that exhibits non-legit behavior gets quarantined. FireEye have a box that does this and also crawls network shares, testing files.
Aside from whitelisting, I think it's the best defense against zero day malware. It's a little too pricy for the company I work at right now, but as more vendors add this functionality, the price will come down.
I was surprised that nobody mentioned it so far. Is it the case that nobody uses KDE these days? (KDE SC can be installed and run on a windows box as well)
And I believe there are programs that convert PDFs to PS files so that none of the executable stuff is kept. Whether those will survive legal challenges when one comes up, well, that's for another discussion.
And if you are more cautious than that, set up VMs for these so you have snapshots to fall back on when things go wrong.
For a long time, I thought like you, that it was my duty to ward off and protect the "children". After a while, you realize 2 things.
First, it is most likely your duty to inform and educate. Do that. Do it well, do it loud, and do it as often as you can. When someone eventually opens up one of those attachments, it will get around, and peer pressure will make everyone else gun-shy. After a user or two of mine got bit by an attachment, and I had repeatedly warned my users about these things.. I ended up with people at my desk occasionally asking..can you come look at this.. it just looks funny.. it was all about the peer pressure and not wanting to be That Guy who clicked the stupid link.
Second, and I hate to say it, this is what we do, and this is job security. You can't save em all Hasselhoff, if ya did, there would be nothing left to do..
Re-evaluate the use-case for the whole PDF attachment. I can't think of a single _good_ reason to use it, ever. If somebody tries to give a false reason why it's a necessary format, just explain to them in technical detail why it's bad. I'm hoping that somebody can reply to this with a _genuine_ reason why sending a PDF (Pretty Damn F'ked) attachment to an e-mail is either necessary or optimal. 'It's good looking' sounds like a weak reason.
Learn the file format and write a program to strip out any executable script elements.
http://www.adobe.com/devnet/pdf/pdf_reference.html
JavaScript should not be given the capability of doing damaging things. It should be confined to a narrow execution context that is limited to doing only the things that enhance the experience of that ONE information resource. Dynamic layout is certainly a useful thing. Dynamically changing your system is not. It should not have access. I blame the developers. It doesn't matter if it is mail or web. It might do cute things inside a PDF, like give you a calculator for a certain algorithm the PDF is written about. But it should not be able to access even /etc/hosts on your computer.
Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business (CPA firm) we use lots of applications, and most of them are highly vertical with often just one realistic competitor that can function adequately for a firm our size. Many of our apps integrate directly with Acrobat (and Office) so not using Acrobat simply isn't a choice we can make.
So how do we deal with Adobe Acrobat? As some pointed out earlier, defense in depth. Spam filters, multiple virus scans, and our two most important measures: End users don't have admin on their computers and Adobe is one of our "High Priority" upgrade applications. Updates must be pushed out within one day of being released.
BTW, the other High Priority apps are Java and Flash, again, both required by our software. With Acrobat, they make up my "Axis of Evil" of insecure software.
just sayin... you could simply use a more secure pdf reader.
...are you saying Earth was the victim of a planetary-scale golden shower at some point?
Use Foxit and keep JavaScript off by default. (Or don't even install the JavaScript plugin.) It's lightweight, fast and has fewer quality issues than Adobe. Additionally, considering PDF is inherently an unsafe format, I'd say adding a sandbox like Sandboxie can help you. More technical people here might try porting a good PDF reader's key parsing and JS functionality to the NaCl sandboxing system. Put each component in a separate partition with inner sandbox protection at a minimum. That lets us use the fast, legacy native code but get plenty of isolation almost for free. Nick P Security Engineer usually on schneier.com
Simple. In our organization, SumatraPDF is the only allowed PDF reader. Users could request Nitro or Foxit, but a sysadmin would disable JavaScript on install. Never once had a malicious PDF infect our organization. It's a little more work to not give users admin rights to their machines, but time and time again, users prove they are too incompetent to safely manage their own machines.
Any moron who allows attachments should be fired.
We allow authorized persons to upload files. 100% of attachments are trashed. Emails with pictures or questionable html, attachments or other tripe are stripped to raw text. There is a zero tolerance policy on personal devices connecting to the internal net and this includes USB or other devices. If you find you can't do your job with all the twiddly blather there's a line at the door.
Scrubbing JS from all PDF files is only one step below blocking PDF outright. Sysadmins have to understand that you can't combat ignorance and stupidity with technology. It's never going to work. We've spent the last two decades trying to block this exploit and that, but has it made us safer? No, it hasn't. You know why? Because people are gullible, that's why. You can't fix that. Just design your systems so that critical infrastructure isn't damaged or disrupted by stupid users.
Seriously, why do people still run acrobat? PDF is a standard format, there are countless programs which support it and the only reason such files are a target is because adobe reader is basically a monoculture and represents a very large and attractive target. We need diversity among PDF readers, just like diversity among web browsers. It was diversity among web browsers more than anything else that reduced browser attacks and caused hackers to concentrate on proprietary monoculture plugins instead.
The submitter is looking for a code-based solution to a sociological/psychological problem, and it's just not going to be effective.
The real solution is to educate and train your users so they don't fall prey to these sorts of attacks. I know a lot of IT people aren't comfortable dealing with people, and I know it takes quite a bit of time and doesn't look as snazzy on your résumé - but, really, it's the best long-term approach.
Print then delete the file.
In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.
I think all the people committing suicide at their Seattle office might be getting to them.
Their Seattle office is right under the Aurora bridge, popular with jumpers...
I did not know it was possible to detect javascript in a PDF, and I think this is possibly a better approach than a full rewrite (btw: I found this python script: http://blog.didierstevens.com/programs/pdf-tools/ ) So instead of rewriting every PDF, you just choose to delete any PDF attachments that are detected with JavaScript. I assume this will then not break any legitimate PDFs that have comments or forms, etc? It will need testing, I guess.
The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
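As a rough illustration of the relay-side "detect and delete" step, here is a sketch using only Python's standard `email` package. Everything about it is an assumption for demonstration purposes: `looks_scripted` is a crude placeholder for a real scanner, the function names are invented, and it only considers parts declared as application/pdf (so a PDF sent as application/octet-stream or inside a ZIP would sail through).

```python
from email import message_from_bytes, policy


def looks_scripted(pdf_bytes: bytes) -> bool:
    """Crude placeholder check; a real relay would call a proper PDF scanner."""
    return b"/JavaScript" in pdf_bytes or b"/JS" in pdf_bytes


def drop_scripted_pdfs(raw_message: bytes) -> tuple:
    """Return (filtered message bytes, names of the attachments removed)."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    removed = []
    # Copy the attachment list first, since we mutate the payload while looping.
    for part in list(msg.iter_attachments()):
        if part.get_content_type() != "application/pdf":
            continue
        data = part.get_content()  # decoded bytes for application/* parts
        if looks_scripted(data):
            msg.get_payload().remove(part)
            removed.append(part.get_filename() or "(unnamed)")
    return msg.as_bytes(), removed
```

In practice you would probably want to replace the removed part with a short text notice rather than drop it silently, so the recipient knows something was there.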
Ask the guys at the NSA to check the attachments for viruses etc. for you while they read through your mails anyway :)
http://www.clearswift.com/sites/default/files/documents/datasheets/Sanitization%20SECURE%20Email%20Gateway%20v2.pdf
I use Ghostscript when attempting to compress a "bloated" PDF (such as generated by Xsane). The input is a PDF, output is a PDF:
# Use ghostscript to re-write the PDF
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf
Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:
# Combine PDFs
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf
# Copy pages 3 & 4 from an existing PDF
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf
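If you wanted to call the first command above from a mail filter, a thin Python wrapper might look like the sketch below. The function names and the timeout are my own inventions, and `-dSAFER` is an extra Ghostscript flag not in the commands above (it restricts file access while processing untrusted input). Whether the rewritten file is actually free of scripts would still need checking; this only wraps the command.

```python
import shutil
import subprocess


def build_gs_rewrite_command(src: str, dst: str, gs_binary: str = "gs") -> list:
    """Build the argument list for rewriting src into dst with pdfwrite."""
    return [
        gs_binary,
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-dSAFER",  # addition: limit file access while handling untrusted input
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        src,
    ]


def rewrite_pdf(src: str, dst: str, timeout_s: int = 60) -> bool:
    """Run Ghostscript; return True if it exited cleanly within the timeout."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    try:
        result = subprocess.run(
            build_gs_rewrite_command(src, dst),
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Malformed PDFs can hang a converter, so treat a timeout as failure.
        return False
    return result.returncode == 0
```

Passing the arguments as a list (no shell) matters here, because attachment file names are attacker-controlled.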
> cheap out of the box solution
Out of the box probably means fast to implement? So you want something good and fast and for cheap? All three together are simply not possible.
If you have the (big) money, look at Clearswift's "Secure Mail Gateway / Secure Web Gateway appliances" or Proofpoint USA appliances (in Europe it is known as the "F-Secure Messaging Security Gateway").
Those have extremely refined, enterprise level message manipulation capabilities. Clearswift is actually capable of automatically censoring e-mails if you want (recognizes strong language, SSNs, credit card numbers, personal data etc. in mail body and attachments and crosses them out).
Bromium (company I work for) sells a product that wraps all web pages and popular attachment formats like PDF in lightweight VMs, without disturbing the user experience. In this way you can still interact with the documents, but they can't infect your host OS.
If you know a bit of Java you can do it in a few lines of code with the BFO PDF Library.
Remove Adobe-X from all systems. Flash, Reader, Acrobat, Photoshop, everything from Adobe.
As to Javascript in PDF files, xournal, evince don't seem to care about that. Use them instead of Adobe-pdf stuff.
The only people who should have Adobe software loaded on their computers are professionals who make a living from those tools. Everyone else should disable and remove adobe-whatever. It is a matter of computer security and safety.
We should also train our end-users to use HTML for most documents, unless page layout is critical. Page layout is probably only critical to marketing people. The rest of us are better served by the loose layout control that HTML provides.
I like Adobe products from a feature standpoint, but that isn't enough anymore. We need software that is safe to use too. Safety is more important than features to most users now. Until Adobe management makes the tough decisions to secure their tools by default, we are all screwed. We have little choice except to uninstall them all.
Only hire people who are smart enough to not open obviously fake attachments
Simply put, open the suspicious PDF in a sandbox environment. The best free solution comes with Comodo Internet Security. What the sandbox does is isolate the PDF virtually so no malicious code stands a chance.
Also, Foxit Reader beats Adobe Reader hands down, and with the above-mentioned CIS you can disable JS for readers via custom HIPS defense filters.
http://pdfjavascriptst.sourceforge.net/
Remember back when Internet worms were rampant? Yeah, back then we had all ports open by default in popular desktop operating systems. Finally they were convinced to release a service pack that closed all ports (turned on the firewall by default) and nearly all the worms went away.
That's what you call using a whitelist instead of a blacklist. A blacklist is stupid, but I guess it's the only option you have if everyone is too stupid to utilize the whitelist. This problem has been boiling for a while, and it's come down to authentication of endpoints and blocking all others not in the trust graph. If only there were some system that allowed you to authenticate emails and form decentralized trust graphs... Like PGP.
So, How do you automatically sanitize all the packets hitting your ports? You don't. You block all but the ones that are legitimate. How do you automatically sanitize PDF Email Attachments? You don't. You block all but the ones that are legitimate.
Instead of thinking me clueless, or unhelpful consider that I've already been down that road as far as it can be traveled and wound up in the exact situation I started. Absolute security is impossible, make the environment hostile to propagation. Call up the other IT guys in the businesses you do work with, it's a problem you can't solve on your own. If only Jedi Mind tricks actually worked, you could convince your managers to let you live. These scanners are not the solutions you are looking for. Whitelists are the way to go.
For dangerous attachment types, you can quarantine them using MimeDefang. Then you provide a link for download after X days (notifying the recipient of the mandatory quarantine time), and a procedure for the helpdesk to pre-release 'known good/expected' documents. While in the quarantine area, you can do whatever you like to it ... scan for viruses, convert to another format, etc.
There's an interesting talk from Chaos Communication Camp 2011 about making a verified PDF scanner in the Coq proof assistant: http://www.youtube.com/watch?v=CmPw7eo3nQI
Besides that, even if you use a less feature-rich PDF reader that is safe,
the virus scanner will STILL find the suspicious JS code, flag it, and set off all kinds of big alarms. And then they will be back at the mail guy.
Maybe you want to run mail in a virtualized, sanitized Citrix box, where you cannot infect a PC that has access to all internal systems.
There is nowhere to hide. It's the evolution of work as we know it. The future is mainframes.
Built-in feature using hardware-hardened (VTx and IOMMU) disposable virtual machines. Process is described here: http://theinvisiblethings.blogspot.com/2013/02/converting-untrusted-pdfs-into-trusted.html
I know it's a thought doomed from the start, but switching from pdf to epub as the document standard would be a big help.
why not solve the problem properly and just sandbox the PDF reader so it can't access anything except the PDF that's passed into it?
We can within a company, but we can't control what reader software the recipient of a forwarded PDF uses.
The actual javascript object looks something like this:
244 0 obj<</S/JavaScript/JS(all javascript code is between the two parentheses)>>
endobj
Just stripping it out appears to work ok. Though the first number on the line appears to be an object number... so you might need to renumber the remaining objects as well to avoid problems with some viewers (I'm just guessing here).
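On the renumbering worry: a PDF's cross-reference table records byte offsets, so deleting bytes shifts everything after them as well. One way around both problems is to leave the object in place and overwrite the /JS and /JavaScript keys with names of the same length, so the reader no longer recognizes them. A minimal sketch of that idea follows, assuming the keys appear in plain (uncompressed, non-hex-escaped) form; the function name is made up.

```python
import re

# Match the two script-related names only when followed by a PDF delimiter,
# so unrelated longer names are left alone.
_SCRIPT_NAMES = re.compile(rb"/(JavaScript|JS)(?=[\s/<>\[\]()])")


def neutralize_script_names(data: bytes) -> bytes:
    """Swap the case of /JS and /JavaScript so readers no longer recognize them.

    PDF names are case-sensitive, and the replacement has the same length,
    so object numbers and byte offsets elsewhere in the file are untouched.
    """
    return _SCRIPT_NAMES.sub(lambda m: b"/" + m.group(1).swapcase(), data)
```

Applied to the object above, `/S/JavaScript/JS(` becomes `/S/jAVAsCRIPT/js(`. The script text is still in the file, it just should not be treated as an action any more, and objects hidden inside compressed streams are not touched at all.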
If you want to go through a PDF and scrub out such things as JavaScript, actions, annotations, etc., I would implement either Enfocus' PitStop Server or Callas' pdfToolBox Server. The paid tools are not some sort of conspiracy. They have been tested in a large number of production environments with a zillion PDFs produced by various tools and systems. The vendors (Adobe included) have libraries (tens of thousands) of malformed PDFs that they use to regression test their products.
Do not refry (PDF --> PS --> PDF) the PDF unless you know what you are doing. It's not the PS --> PDF step using Ghostscript that's the problem (ver 9 of GS actually produces a pretty decent PDF). It's the creation of the PS from the PDF feedstock. It is not as easy as you may think. Did you sit down with a loupe to see if the resulting PDF looks EXACTLY like the input? Didn't think so. You can run into all sorts of weird issues with fonts, color spaces, transparency, alternate content layers etc. by doing a blind refry. There are a lot of ways to create a PDF. There are relatively few ways to do it correctly. There are very few PDF readers (read: only ONE!) that actually do a good job on the not so well-formed PDFs. That being Adobe Reader.
Tools that decompose the PDF and recompose it will be hit or miss.
With regards to installation of Reader in a corporate environment:
1) Use the latest/current version. Starting with Reader X (ten) Adobe launches PDFs in a sandboxed mode (until disabled by the user), negating much of the JS and other exploits that have been rampant previously. Starting with Acrobat XI (Spinal-Tap version - it goes to Eleven!), even Acrobat is launched in a sandboxed mode, again until disabled by the user.
2) Use the enterprise deployment tools that Adobe provides http://www.adobe.com/products/acrobat/it-resources.html to make sure that a) Reader is locked down b) stays locked down according to your corporate needs. The tools provided can allow you to harden Reader quite a bit and keep the users from making changes.
3) If you are truly of the paranoid type - and there are some business areas that have a legitimate need to be hyper paranoid about this stuff - only allow the PDFs to be opened inside of a hardened virtual machine that you remote into. Sort of a glove box approach to the PDF. Others have mentioned various methods to do this which are perfectly acceptable.
Now, a large number of Slashdotters are not going to like this - but much (most?) of the FOSS PDF software, tools, and libraries produce less than optimal PDFs. The primary problem stems from 1) Good page layout design is not the same as good web design. 2) Good PDF is built by using the most expedient and direct method possible. Not by using the most obscure methods you can find (such as how Apache FOP loves to f-around with the CTM rather than just performing a simple moveto). This is not RISC vs. CISC. Yes, f-ing around with the CTM allows you to slice, dice, julienne, fry, as well as being both a dessert topping and a floor polish. However, it is almost impossible to debug. You would have been better off just coding moveto, rmoveto, translate, scale, rotate, etc. as individual function calls (note, I am using the PostScript equivalents to the internal PDF commands). Your code is easier to parse, understand, debug, and, most importantly, follows general industry concepts. 3) Use the minimal work to get the job done, not the maximal. Don't screw around with kerning, leading, etc. unless you really need to. Place strings of characters as strings, not individual glyphs. 4) Learn the industry you are developing in and don't gripe that the industry has no clue as to what they are doing. The typographic/layout industry has 10x the longevity of the web industry (500+ years vs. ~50). Most of the mistakes noobs make were learnt years ago. Learn from their mistakes first. Yes, there are some things that are holdovers from tim
"- PDF is no longer searchable"
Use a PDF OCR product like Adobe Acrobat Professional or an ABBYY FineReader product.
Print to PDF using the open source PDFCreator, which would create another PDF file?
XPS
use lysoform and fire
Enfocus PitStop Server ( http://www.enfocus.com/en/products/pitstop-server ) can remove javascript from PDFs.
Any manipulation of a digitally signed document would really piss me off, and I would seriously consider going for a face-to-face talk with the power-hungry "mail guy" who took that decision!
Believe me... if you want to keep your job and your health, do *nothing*... unless you are prepared to guarantee that your sanitization does NOT change any content being perceived by the user or that it won't be your problem at all (blame mcafee,kaspersky,etc).
Beware of people changing the extension (to, say, ".foo") before sending it inside your network. The recipient just has to change the extension back to ".pdf" and voila: they've snuck in an unsanitized PDF. And then there's the problem of password-protected Zip files.
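One partial answer to the renamed-extension trick is to ignore the file name entirely and look at the content: check for the `%PDF-` header and walk into ZIP archives, flagging members that are encrypted. A rough Python sketch using only the standard library is below; the function names, verdict strings, and limits are all assumptions, and a determined sender can still get around it (other archive formats, for example).

```python
import io
import zipfile


def is_pdf(data: bytes) -> bool:
    """Judge by content, not extension: look for the PDF header at the start."""
    return data[:1024].lstrip().startswith(b"%PDF-")


def scan_attachment(name, data, depth=0, max_depth=2, max_member_bytes=50_000_000):
    """Yield (path, verdict) pairs for PDFs and unscannable ZIP members."""
    if is_pdf(data):
        yield name, "pdf"
    if zipfile.is_zipfile(io.BytesIO(data)):
        if depth >= max_depth:
            yield name, "too-deep"
            return
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                member = f"{name}/{info.filename}"
                if info.flag_bits & 0x1:  # bit 0 set means the member is encrypted
                    yield member, "encrypted"
                elif info.file_size > max_member_bytes:
                    yield member, "too-large"  # declared size; guards against zip bombs
                else:
                    yield from scan_attachment(
                        member, archive.read(info), depth + 1, max_depth, max_member_bytes
                    )
```

What to do with an "encrypted" verdict is a policy question, not a technical one: you either quarantine it or let it through unscanned.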
I use xpdf and FreeBSD. Works for me.