Slashdot Mirror


Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?

First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious JavaScript embedded. It would be impossible to block PDFs, as they are required for business. We do disable JavaScript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out-of-the-box solution or a 'how to' guide for automatically neutralizing PDFs by stripping out the JavaScript. The closest thing I could find is using pdf2ps and then reversing the process with ps2pdf. Does anyone know of a solution for this that is not too complex, preferably works at the SMTP relay, and can handle zipped PDFs as well? Or does anyone have some common-sense advice for dealing with this, so that once it's in place no further action is required by me or by users?"

32 of 238 comments

  1. Foxit Reader? by Anonymous Coward · · Score: 5, Informative

    As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.

    1. Re:Foxit Reader? by MoFoQ · · Score: 3, Informative

      dang...I was about to say the same...

      but yea...best way to sanitize is by not using Adobe Acrobat (or Acrobat Reader).

      OS X and many Linux distros have their own built-in viewers ("Preview" on OS X, and "Display" at least on Ubuntu).

      Also, you can probably use Google Apps to do the same as well.

    2. Re:Foxit Reader? by fuzzyfuzzyfungus · · Score: 5, Insightful

      That isn't really 'sanitizing', though: It's certainly good that you practice safe text on your computer; but if you are the mailserver guy, and may or may not have as much control as you'd like over the users and their filthy, weatherbug-encrusted, systems, you want to modify the file such that it no longer contains a potential payload, not merely use a reader that doesn't execute payloads.

    3. Re:Foxit Reader? by Mashdar · · Score: 4, Informative

      I run a ghostscript shell script to print a PDF as a new PDF:

      gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=NEW_FILE.pdf -dBATCH OLD_FILE_1.pdf OLD_FILE_2.pdf

      In this case OLD_FILE_1.pdf and OLD_FILE_2.pdf will be combined into NEW_FILE.pdf. AFAIK this strips javascript.
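For anyone who wants to script the Ghostscript re-write above, here is a minimal Python sketch. The helper names `build_gs_command` and `rewrite_pdf`, the 60-second timeout, and the added `-dSAFER` and `-q` switches are assumptions for illustration, and it uses the documented `-sOutputFile=` spelling. It does not prove the output is free of JavaScript; that still needs checking separately.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(inputs, output, gs_binary="gs"):
    """Return the argv list for re-writing one or more PDFs through pdfwrite."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [
        gs_binary,
        "-dBATCH",      # exit after the last file instead of prompting
        "-dNOPAUSE",    # do not pause between pages
        "-dSAFER",      # restrict file access from within the interpreter
        "-q",           # suppress the startup banner
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}",
        *[str(path) for path in inputs],
    ]


def rewrite_pdf(inputs, output, timeout_seconds=60):
    """Run Ghostscript; return True on success, False on any failure."""
    if shutil.which("gs") is None:
        return False  # Ghostscript is not installed or not on PATH
    try:
        result = subprocess.run(
            build_gs_command(inputs, output),
            capture_output=True,
            timeout=timeout_seconds,  # malformed PDFs can hang a renderer
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and Path(output).exists()


print(" ".join(build_gs_command(["OLD_FILE_1.pdf", "OLD_FILE_2.pdf"], "NEW_FILE.pdf")))
```

Keeping the command construction separate from the execution makes it possible to log or test the exact invocation without Ghostscript being present.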

  2. Print to PDF by digitalhermit · · Score: 4, Informative

    The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.

    1. Re:Print to PDF by Kludge · · Score: 4, Informative

      Like
      lpr -P Cups-PDF file.pdf

    2. Re:Print to PDF by DJ+Jones · · Score: 5, Interesting

      Sadly a lot of PDF printers will retain javascript code even if you print it and re-assemble it back into a PDF. The problem lies in the fact that Adobe allows javascript to be embedded inside image objects and compressed blocks of PDF binary. It's not as simple as opening the file and stripping out anything that starts with . Code can be fired on almost any user event and it can be attached to almost any high-level object. It's not impossible to create a scrubber but it's a lot more complicated than you might think.

      I spent the better part of a week attempting to create a PDF scrubber at my office for this same reason. We had become victim to highly targeted attacks from PDF sources. I wrote a scrubber in PHP using an open-source PDF parser and a series of regular expressions to strip out any JavaScript. At the end of the day, I came very close to a working solution but I ran into issues with encrypted PDFs.

      The project was shelved in favor of making users open all external PDFs on a virtual server that was hardened and re-imaged every evening to prevent any malicious code from running rampant. That's the simplest solution.
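The point about compressed blocks can be illustrated with a small Python sketch. It is not the PHP scrubber described above; the function names and the hand-made fragment are assumptions, and it only handles Flate (zlib) streams, ignoring object streams, other filters and encryption. It shows why a plain byte search misses a name that sits inside a compressed stream.

```python
import re
import zlib

# Captures the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the decompressed contents of every stream that zlib can inflate."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or damaged; skip it


def mentions_javascript(pdf_bytes):
    """Check the raw bytes and every inflatable stream for JavaScript names."""
    needles = (b"/JavaScript", b"/JS")
    blobs = [pdf_bytes, *inflate_streams(pdf_bytes)]
    return any(needle in blob for blob in blobs for needle in needles)


# Demonstration with a hand-made fragment, not a complete PDF.
hidden = zlib.compress(b"<< /S /JavaScript /JS (app.alert(1)) >>")
fragment = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + hidden
    + b"\nendstream\nendobj\n"
)

print(b"/JavaScript" in fragment)      # naive scan of the raw bytes
print(mentions_javascript(fragment))   # scan that also inflates streams
```

A real scrubber would need a proper parser on top of this, which is where the week of effort goes.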

    3. Re:Print to PDF by fuzzyfuzzyfungus · · Score: 3, Interesting

      Out of curiosity, were you dealing with enough fancy-forms-and-interactive-nonsense type PDFs that the 'just brutally rasterize it and let them eat .jpeg!' option wasn't an option, or were the attackers good enough that you didn't have a PDF renderer you could trust for the rasterizing duties?

    4. Re:Print to PDF by nicolastheadept · · Score: 4, Funny

      Converting to JPEG? You're a terrible human being.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

    5. Re:Print to PDF by TheRaven64 · · Score: 3, Informative

      Stripping JavaScript isn't enough. For example, a number of 'PDF' exploits have actually been due to vulnerabilities in libpng: if your PDF contains a PNG image (a lot do), then it may have a metadata payload that triggers a bug in libpng that allows arbitrary code execution. The same can happen for embedded fonts and for embedded JPEG images.

      --
      I am TheRaven on Soylent News

  3. Be careful modifying documents by Anonymous Coward · · Score: 5, Informative

    You can change the legality of a document for example by modifying it.

    A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do Javascript in the first place.

    1. Re:Be careful modifying documents by macbeth66 · · Score: 4, Informative

      I believe that for a PDF document to be a legal document, it needs to be in PDF/A format. This format prohibits the use of executable code, such as JavaScript.

    2. Re:Be careful modifying documents by godrik · · Score: 4, Insightful

      Where does this belief come from? Why would there be any format requirement on these things? The requirement would need to be in the law or in a court judgment. Is the law going to be that precise over electronic communications? (Not trying to bitch, just really wondering)

    3. Re:Be careful modifying documents by Aaron+B+Lingwood · · Score: 4, Informative

      I believe that for a PDF document to be a legal document, it needs to be in PDF/A format.

      Where does this belief come from?

      Many states have legislation regarding the font, margins and paper sizes used for some legal documents.

      US courts, archivists and many case management / COPS systems only accept documents in PDF/A.

      --
      [Rent This Space]

  4. Sumatra PDF by shellster_dude · · Score: 5, Insightful

    Check out SumatraPDF (http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html). It's super fast and does not support JavaScript or ActionScript in PDFs. I use it exclusively now.

  5. javascript? by sjames · · Score: 4, Insightful

    Why in the world is JavaScript included in PDF documents? PDF is already a Forth-like programming language and environment.

    1. Re:javascript? by fuzzyfuzzyfungus · · Score: 3, Interesting

      I think you are thinking of PostScript. PDF requires that all computations resolve to a well-defined value based on information contained within the document (i.e. it is not Turing complete). So then of course Adobe had to add a Turing-complete language back in.

      I don't know if any implementations are stupid enough to implement this (at least without some very careful sanitizing); but (in addition to ramming in JavaScript and the ability to embed basically anything at all, thanks for nothing 'rich media annotations'), they even added: Launch Actions!

      "12.6.4.5 Launch Actions
      A launch action launches an application or opens or prints a document. Table 203 shows the action dictionary
      entries specific to this type of action.
      The optional Win, Mac, and Unix entries allow the action dictionary to include platform-specific parameters for
      launching the designated application. If no such entry is present for the given platform, the F entry shall be
      used instead. Table 203 shows the platform-specific launch parameters for the Windows platform. Parameters
      for the Mac OS and UNIX platforms are not yet defined at the time of publication."

      Your Standards Compliant Solution for executing arbitrary binaries with arbitrary parameters. No need for messy, version-sensitive, exploit code! Combine with JavaScript and web-interaction support to build documents that search the target's hard drive for interesting things upon being opened... Or (miracle of miracles!) build a PDF that runs the Adobe update utility when you open it; you're sure to find something new every time!

  6. Re:Just block PDFs with javascript by Kardos · · Score: 3, Funny

    Looks like these guys made a tool to do the JS detection: http://www-rsec.cs.uni-tuebingen.de/laskov/papers/acsac2011.pdf

  7. You don't by PNutts · · Score: 5, Interesting

    At some point you trust technology and also reinforce proper user behavior. I hate catch-phrases but your e-mail hygiene should have layers of protection (defense in depth). Assuming that the message got through IP reputation filters, SPAM analysis, malware scans, and was delivered to your user, you rely on desktop protection and cross your fingers that nobody opens it.

    We have SMTP appliances from Axway and we used to stop all executable attachments and deliver a notification to the user to call the help desk and request a release. Times changed and we don't do that any more. However, you could annotate the message to remind users not to open an attachment if they don't know who it's from or what it is, or if they weren't expecting it. And some will anyway. We also used to hold certain attachments for four hours until the virus definitions (and the other defenses) received a couple of updates and then reprocess the message.

    If you do try to roll your own, be aware that everyone and their dog creates PDF files with varying degrees of success and we had certain PDF files that caused services to fail on our gateway while they tried to scan and process them. You didn't mention the volume but make sure your solution scales well.

  8. Re:Rasterize and reencapsulate by Kardos · · Score: 3, Informative

    If you rasterize and re-encapsulate your users' PDF attachments, your users will hate you, and work around your "stupid filter that breaks pdf attachments". You are better off blocking all PDF attachments by email. It'll save you a ton of work, and your users can skip the frustration of mangled attachments and go directly to working around your filter.

  9. Re:Why are you doing this? by tftp · · Score: 4, Informative

    Signed PDFs can be read in any reader, but the signature will still be validated (if the reader is not defective). Encrypted PDFs will not even be readable if they are not encrypted to you. Password-protected PDFs may require the password to be readable, let alone printable or changeable.

    In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.

  10. Re: Just block PDFs with javascript by Anonymous Coward · · Score: 5, Funny

    That link is to a PDF! How do I know it's not a trap? Oh, the dichotomy :-(

  11. Test the Attachments by Flere+Imsaho · · Score: 3, Interesting

    There's a couple of vendors (and many more playing catch-up) selling appliances that detonate attachments on sandboxed VMs running in fast virtual memory.
    They execute/open attachments and watch to see what happens - registry changes, file drops, network activity, attempts to contact known C&C servers, etc.
    Anything that exhibits non-legit behavior gets quarantined. FireEye have a box that does this and also crawls network shares, testing files.

    Aside from whitelisting, I think it's the best defense against zero day malware. It's a little too pricy for the company I work at right now, but as more vendors add this functionality, the price will come down.

    --
    It gripped her hand gently. 'Regret is for humans,' it said.

    1. Re:Test the Attachments by ZzzzSleep · · Score: 3, Funny

      Then just do all your work in a VM, and you'll be safe from malware!

  12. You Don't by SuperCharlie · · Score: 5, Interesting

    For a long time, I thought like you, that it was my duty to ward off and protect the "children". After a while, you realize 2 things.

    First, it is most likely your duty to inform and educate. Do that. Do it well, do it loud, and do it as often as you can. When someone eventually opens up one of those attachments, it will get around, and peer pressure will make everyone else gun-shy. After a user or two of mine got bit by an attachment, and I had repeatedly warned my users about these things.. I ended up with people at my desk occasionally asking..can you come look at this.. it just looks funny.. it was all about the peer pressure and not wanting to be That Guy who clicked the stupid link.

    Second, and I hate to say it, this is what we do, and this is job security. You can't save em all Hasselhoff, if ya did, there would be nothing left to do..

  13. Re:Change configuration by fuzzyfuzzyfungus · · Score: 3, Insightful

    And be sure to double-check that the next update doesn't revert those settings on you...

  14. Re:Take A Step Back by tftp · · Score: 3, Insightful

    I'm hoping that somebody can reply to this with a _genuine_ reason why sending a PDF (Pretty Damn F'ked) attachment to an e-mail is either necessary or optimal

    What else would you use to send an invoice, or a contract, or a drawing, or a user's manual, or anything else that requires pixel-accurate placement of all elements as designed? It has to support digital signatures as a minimum, and preferably complete public key encryption. PDF does that.

    'It's good looking' sounds like a weak reason.

    The 'good looking' is a weak reason. "Correct" is a far better reason. Once you print into a PDF, it captures your document exactly as it is. You want your documents to represent what you put into them - neither more nor less. Perhaps there are better formats, but I'm not aware of any.

  15. Other options not always an option by fermat1313 · · Score: 4, Insightful

    Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business (CPA firm) we use lots of applications, and most of them are highly vertical with often just one realistic competitor that can function adequately for a firm our size. Many of our apps integrate directly with Acrobat (and Office) so not using Acrobat simply isn't a choice we can make.

    So how do we deal with Adobe Acrobat? As some pointed out earlier, defense in depth. Spam filters, multiple virus scans, and our two most important measures: End users don't have admin on their computers and Adobe is one of our "High Priority" upgrade applications. Updates must be pushed out within one day of being released.

    BTW, the other High Priority apps are Java and Flash, again, both required by our software. Together with Acrobat, they make up my "Axis of Evil" of insecure software.

  16. acrobat reader sanitized 100% by jjohn_h · · Score: 5, Informative

    In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.

  17. Summary by supachupa · · Score: 4, Informative

    So the vast majority of people are recommending to ditch Adobe Acrobat, which is not where I was wanting to focus the discussion, but I appreciate your advice. I do agree that using something like Sumatra would be a good part of a defense-in-depth approach, but that approach does not protect your organisation from inadvertently sending out an infected PDF to another organisation.

    I did not know it was possible to detect JavaScript in a PDF, and I think this is possibly a better approach than a full rewrite (btw: I found this Python script: http://blog.didierstevens.com/programs/pdf-tools/ ). So instead of rewriting every PDF, you just choose to delete any PDF attachments that are detected with JavaScript. I assume this will then not break any legitimate PDFs that have comments or forms, etc? It will need testing, I guess.
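As a rough idea of what keyword-based detection could look like, here is a minimal Python sketch. It is not the linked script; the list of names and the function names are assumptions for illustration. It decodes `#xx` escapes in PDF names so that an obfuscated spelling such as `/J#61vaScript` is still counted, but it only looks at uncompressed bytes.

```python
import re
from collections import Counter

# Names worth flagging; this list is an illustrative assumption, not exhaustive.
RISKY_NAMES = (
    "/JS", "/JavaScript", "/OpenAction", "/AA",
    "/Launch", "/EmbeddedFile", "/Encrypt",
)

# A PDF name starts with "/" and runs until whitespace or a delimiter.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw_name):
    """Decode #xx escapes so that /J#61vaScript compares equal to /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw_name)
    return decoded.decode("latin-1")


def count_risky_names(pdf_bytes):
    """Count occurrences of each risky name in the uncompressed bytes."""
    counts = Counter()
    for match in NAME_RE.finditer(pdf_bytes):
        name = normalize_name(match.group(0))
        if name in RISKY_NAMES:
            counts[name] += 1
    return counts


# Hand-made fragment, not a complete PDF.
sample = (
    b"<< /Type /Action /S /J#61vaScript /JS (app.alert(1)) >> "
    b"<< /OpenAction 5 0 R >>"
)
print(dict(count_risky_names(sample)))
```

Counting rather than returning a single yes/no leaves room for a policy such as "reject on /JS or /Launch, pass but log on /Encrypt".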

    The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
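The relay-side step could be sketched in Python roughly as follows. This is an assumption-laden outline, not a drop-in filter: `iter_pdf_payloads`, `should_reject`, the `is_suspicious` callback and the 20 MB cap are all hypothetical names and values, password-protected ZIPs are simply skipped, and how the result is wired into a particular SMTP relay is left open.

```python
import io
import zipfile
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumed cap on the declared size of a ZIP member, to skip oversized entries.
MAX_MEMBER_BYTES = 20 * 1024 * 1024


def iter_pdf_payloads(raw_message):
    """Yield (name, bytes) for PDFs attached directly or inside ZIP attachments."""
    message = message_from_bytes(raw_message, policy=policy.default)
    for part in message.iter_attachments():
        name = (part.get_filename() or "").lower()
        payload = part.get_payload(decode=True) or b""
        if name.endswith(".pdf") or payload.startswith(b"%PDF-"):
            yield name, payload
        elif name.endswith(".zip"):
            try:
                with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                    for info in archive.infolist():
                        is_pdf = info.filename.lower().endswith(".pdf")
                        if is_pdf and info.file_size <= MAX_MEMBER_BYTES:
                            yield f"{name}:{info.filename}", archive.read(info)
            except (zipfile.BadZipFile, RuntimeError):
                continue  # corrupt or password-protected archive; skipped here


def should_reject(raw_message, is_suspicious):
    """Return True if the supplied detector flags any PDF payload."""
    return any(is_suspicious(data) for _, data in iter_pdf_payloads(raw_message))


# Demonstration with a made-up message: one bare PDF, one PDF inside a ZIP.
demo = EmailMessage()
demo["Subject"] = "invoice"
demo.set_content("see attached")
demo.add_attachment(b"%PDF-1.4 /JS", maintype="application",
                    subtype="pdf", filename="a.pdf")
zipped = io.BytesIO()
with zipfile.ZipFile(zipped, "w") as archive:
    archive.writestr("inner.pdf", b"%PDF-1.4 clean")
demo.add_attachment(zipped.getvalue(), maintype="application",
                    subtype="zip", filename="b.zip")

raw = demo.as_bytes()
print([name for name, _ in iter_pdf_payloads(raw)])
print(should_reject(raw, lambda data: b"/JS" in data))
```

Passing the detector in as a callback keeps the attachment walking separate from whatever detection policy ends up being chosen.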

  18. Ghostscript by nullchar · · Score: 4, Informative

    I use Ghostscript when attempting to compress a "bloated" PDF (such as generated by Xsane). The input is a PDF, output is a PDF:

    # Use ghostscript to re-write the PDF
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf

    Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:

    # Combine PDFs
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf

    # Copy pages 3 & 4 from an existing PDF
    gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf

  19. CCC by Warbothong · · Score: 3, Interesting

    There's an interesting talk from Chaos Communication Camp 2011 about making a verified PDF scanner in the Coq proof assistant: http://www.youtube.com/watch?v=CmPw7eo3nQI