Slashdot Mirror


DARPA: Reconstruct Shredded Docs, Win $50K USD

ematic writes with a link to an interesting competition from DARPA: "The ability to reconstruct shredded documents will potentially yield information that may save lives or offer critical information about an adversary's plans. Currently, this process is much too slow and too labor-intensive, particularly if the documents are handwritten. We are looking to the Shredder Challenge to generate some leap-ahead thinking in this area. The Shredder Challenge is composed of five separate problems. The overall prize awarded depends on the number and difficulty of problems solved."

32 of 209 comments

  1. Puny prize by afidel · · Score: 3, Insightful

    Someone with a unique way of reconstructing shredded documents can probably earn more than that in one afternoon of dumpster diving.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    1. Re:Puny prize by lgw · · Score: 4, Informative

      Well, the normal approach is to scan all the remains, calculate a checksum for the pattern along each edge, then match the checksums to reconstruct the document. Without crosscut shredding this is very fast and effective.
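
A minimal sketch of the edge-matching idea described above, for strip-cut (not crosscut) shreds. It assumes each strip is already scanned into a list of pixel rows, and it compares raw edge pixels with a sum of absolute differences rather than an exact checksum, since neighbouring edges on real paper are similar, not identical. The names `edge_cost`, `greedy_order`, `reassemble` and `cut_into_strips` are hypothetical, chosen for illustration only.

```python
def cut_into_strips(page, strip_width):
    """Cut a page (list of pixel rows) into vertical strips."""
    return [[row[i:i + strip_width] for row in page]
            for i in range(0, len(page[0]), strip_width)]

def edge_cost(left_strip, right_strip):
    """Sum of absolute differences between the right edge of left_strip
    and the left edge of right_strip (lower means a better fit)."""
    return sum(abs(row_a[-1] - row_b[0])
               for row_a, row_b in zip(left_strip, right_strip))

def greedy_order(strips, start):
    """Chain strips left to right, always taking the best-fitting neighbour."""
    order = [start]
    remaining = set(range(len(strips))) - {start}
    total = 0
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: edge_cost(strips[last], strips[j]))
        total += edge_cost(strips[last], strips[nxt])
        order.append(nxt)
        remaining.remove(nxt)
    return total, order

def reassemble(strips):
    """Try every strip as the leftmost one and keep the cheapest chain."""
    total, order = min(greedy_order(strips, s) for s in range(len(strips)))
    return order
```

Greedy chaining is only a heuristic: it can work when edges are distinctive, and it is exactly the part that degrades once the pieces get small and the edges stop being distinct.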

      As I understand it, the government now shreds anything important (paper, hard drive, etc) down to less than 1mm on a side, so it's not such an easy problem these days - very many distinct pieces, and not much distinctness along the edges.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re:Puny prize by jd · · Score: 2

      With this problem, you'd do basically the same thing but you'd want slightly more data to handle the smaller size. (For handwriting, pressure and gradient would be the most obvious extensions to use. For original printed documents, you MAY be able to use any random fluctuations in toner or ink, since any fluctuations will be highly localized. For photocopies, the contrast of the copy might not be enough to show up extremely small variations - I'm really not sure what you could use there.)

      However, you needn't get a unique solution.

      Option 1: (Assumes some text is on the document.) You can use the methods used in cryptology to determine if the result of using a given key has produced a possible plaintext. A random arrangement of remains will never produce a sane text when OCRed; you'll always get a line where characters should be but can't be identified.

      Option 2: An alternative second stage would be to treat every possible combination that meets the initial criteria as possible plaintexts for a cypher. The analysis at the end of the 2dem paper shows that basic encryption always exposes some information even when the encrypted document is not considered readable. So it ought to be possible to look for information leakage to reduce the number of "keys" (ie: orderings that meet the criteria of stage 1) you need to brute-force look at.

      If option 1 is sensitive enough and the document is of a type that permits simple analysis of the form rather than the content, then step 2 would never be needed. Step 2 is only needed if the form is not enough.

      The problem here is that the above solution isn't guaranteed to reduce the number of possibilities to anything sensible. Stage 2 is a heuristic that tries to make one segment correct in the hopes that this will make everything correct. Anyone who has solved one face of a Rubik's Cube knows this isn't a guaranteed approach. The best that can be said is that it will always reduce the problem space, but won't necessarily reduce it to the level that the problem can be considered essentially solved.
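
A rough sketch of the "does this arrangement read as plausible plaintext" test from Option 1. It makes several simplifying assumptions: the OCR step is skipped entirely (chunks are already character strings), "sane text" is approximated by the fraction of tokens found in a small word list, and every ordering is tried by brute force. All function names and the vocabulary are hypothetical.

```python
from itertools import permutations

def split_into_chunks(lines, width):
    """Cut equal-length text lines into vertical chunks `width` characters wide."""
    return [[line[i:i + width] for line in lines]
            for i in range(0, len(lines[0]), width)]

def join_chunks(chunks, order):
    """Rebuild the text lines by concatenating chunks in the given order."""
    return ["".join(chunks[i][r] for i in order)
            for r in range(len(chunks[0]))]

def plausibility(lines, vocabulary):
    """Fraction of whitespace-separated tokens that are known words."""
    tokens = " ".join(lines).lower().split()
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)

def best_ordering(chunks, vocabulary):
    """Brute force: score every ordering and keep the most plausible one."""
    return max(permutations(range(len(chunks))),
               key=lambda order: plausibility(join_chunks(chunks, order),
                                              vocabulary))
```

Trying all n! orderings is only feasible for a handful of chunks, which is why a first stage that prunes candidates (such as edge matching) matters before any scoring like this is attempted.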

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  2. Re:I think they know how to do this very well alre by Anonymous Coward · · Score: 5, Funny

    I don't know, I've been hitting the shredded documents with a wrench for the last 10 minutes, it doesn't seem to be working.

  3. Shred? by MarkGriz · · Score: 4, Insightful

    Any adversary that shreds rather than incinerates critical information they don't want recovered isn't much of an adversary.

    --
    Beauty is in the eye of the beerholder.
    1. Re:Shred? by Surt · · Score: 2

      Any adversary whose incineration chimney doesn't have a tight particle filter isn't much of an adversary.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    2. Re:Shred? by tmosley · · Score: 2

      Nuke them from orbit.

      It's the only way to be sure.

    3. Re:Shred? by TheRaven64 · · Score: 3, Informative

      I did some work for a company that stores legal documents a few years ago. When things are ready for disposal, they are shredded and loaded into a locked container. This container is then driven away and not unlocked until it arrives at its destination. Once there, it's emptied into a swimming pool filled with bleach. It is then removed from there and recycled. By the time it comes out of the bleach, it is small fragments of white fluff.

      --
      I am TheRaven on Soylent News
    4. Re:Shred? by halivar · · Score: 4, Funny

      Bundle the documents up in a hardcover, put a picture of Snooki on the front and give her writing credit. Distribute to any bookstore in the country that will take it. The secret will be safe forever.

    5. Re:Shred? by lgw · · Score: 2

      The recurrent problem seems to be that you contract for the one and get the other in practice. That sort of problem is why the popular shredding services will shred your documents on their truck while you watch, so at least you know what you're getting.

      --
      Socialism: a lie told by totalitarians and believed by fools.
  4. Cheapasses by RobinEggs · · Score: 5, Insightful

    You gotta love when someone offers a $50,000 prize for an improvement that would save them millions of dollars in labor, not to mention the value of files reconstructed that might have been ignored before it became so much easier to do.

    A million dollars for improving the movie recommendations on Netflix, and $50,000 for a massive intelligence breakthrough?

    Way to go, Pentagon. Way to prove that even with a defense budget of $649 billion you can still be a total cheapass.

    1. Re:Cheapasses by LostCluster · · Score: 2

      Yep. Whoever solves this puzzle might want to retain copyright on their work rather than sell it for only $50,000 and then go to work for whomever DoD is planning to use this against.

    2. Re:Cheapasses by Surt · · Score: 2

      Well, they are spending the taxpayer dollar. Technically they have an obligation to do it as cheaply as possible. Netflix can just raise its rates, or split its service and demand more money for each, etc.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    3. Re:Cheapasses by pclminion · · Score: 3

      Well, they are spending the taxpayer dollar. Technically they have an obligation to do it as cheaply as possible.

      In other words, the government is obligated to obtain the shittiest services possible? Speak for yourself. Me, as a taxpayer? Fuck that. If you can't afford to do things the right way with the taxes you currently collect, you either need to cancel a lot of spending or raise taxes. "Buy crappy stuff at a discount" is not an option I find acceptable.

    4. Re:Cheapasses by SuricouRaven · · Score: 2

      A matter of laziness. Imagine you come up with a new super-awesome idea worth millions. You have two options.

      1. Sell it to the highest bidder. You'll get a million or so. You're not going to be one of the mega-wealthy on that, but you can retire early and live a life of comfortable luxury.
      2. Found a business. Now you can be one of the mega-wealthy, but only if you have business savvy, and legal knowledge, and a bit of luck. You'll also spend the next few decades in meetings, running your new company. You do get to live in decadent luxury when you're not busy working, and you get enough money to do some serious dabbling in politics or philanthropy.

      So it comes down to a simple question of which is more important: Modest wealth but a life without work, or risking it to become much wealthier?

      In the case of this prize though, DARPA just isn't offering enough. Tech worth that much? Put the patent on eBay. I'm sure someone will pay a lot more.

  5. Re:I think they know how to do this very well alre by Verdatum · · Score: 4, Informative

    I get this all the time. You're probably using imperial; try switching to metric.

  6. FUN by kodiaktau · · Score: 2

    This actually looks like a ton of fun. After looking at the basic documents, they tried to put other indirection in the images, like color levels that really need to be sorted out before the actual shredding issue is resolved. There is a mix of up/down and useless data on the page, but the ligatures seem consistent on the images - brute force on the first page is probably the most cost-effective solution - the others seem to be order-of-magnitude problems. In reality, this "shredded document" puzzle is probably a real-life problem in disguise, like a transmitted scrambled-image problem or a connecting/stitching problem.

  7. Re:Talk to ze Zhermans by Teun · · Score: 2

    Yep, the Germans have software to tackle this problem and they are still working on improving it.

    So they might have a head start to winning the prize.

    --
    "The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
  8. That's a little... cheap by Leebert · · Score: 3, Insightful

    I think you'd be better off, if you were successful, to simply commercialize it. $50,000? That's like the first year's support contract on the software you'll sell them for $300,000 per seat. And since it's "enterprise" software, it doesn't even have to actually work particularly well. That's why you sell the support contracts.

    1. Re:That's a little... cheap by malakai · · Score: 2

      The 50k is prize money to reward you for trying and doing. It doesn't give them the rights to your technology. You can set whatever price you want on it. But they may now know that 20 other people came really close, and your 'super amazing proprietary' algorithm isn't all that super amazing. This gives them a better negotiating position. You may win the 50k and some other guy may end up with the contract for $10 million over 3 years.

  9. Outsource it to 30 years ago by xenocide2 · · Score: 2

    Off the top of my head, this seems very close to the techniques used for shotgun sequencing of genomic data. Lots of little strands you want to line up. Just in multiple dimensions.
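
For readers unfamiliar with the analogy, here is a minimal sketch of greedy overlap assembly, the textbook simplification of shotgun sequencing: repeatedly merge the two fragments with the longest suffix/prefix overlap. Note the assumption that fragments overlap, which holds for sequencing reads but not for paper shreds, where adjacent pieces only share an edge. The function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments):
    """Merge the best-overlapping pair until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; return the pieces as they are
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags
```

For shreds, the overlap score would have to be replaced by an edge-compatibility score in two dimensions, which is the "just in multiple dimensions" part of the comment.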

    --
    I Browse at +4 Flamebait

    Open Source Sysadmin

  10. Re:expensive OCR operation by malakai · · Score: 2

    This has nothing to do with scanning the fragments. They give you a TIFF with an alpha channel, with each scrap already pressed out and scanned into the image.

    The thought being, in the field, you can get the grunts to take back the bag of shreds, lay them out in blocks, scan them, and submit the blocks to some back-end program that will do some jigsaw algo to put together pieces within the block. You'll just have to make sure each shred is surrounded by a space.
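
A small sketch of why the "each shred is surrounded by a space" requirement helps: with a transparent gap around every piece, the shreds can be separated by a plain connected-component pass over the alpha channel. This assumes the alpha channel has already been loaded into a 2D list of integers (0 meaning transparent); reading the TIFF itself is out of scope, and `find_shreds` is a hypothetical name.

```python
from collections import deque

def find_shreds(alpha):
    """Return one bounding box (top, left, bottom, right) per connected
    region of non-transparent pixels, using 4-connectivity."""
    if not alpha:
        return []
    height, width = len(alpha), len(alpha[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if alpha[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited opaque pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and alpha[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

If two shreds touch in the scan, they come out as a single region, which is why the gap between pieces matters.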

    Honestly, I'm surprised some archaeological PhD hasn't already invented some system similar to this, for putting back together a broken Egyptian hieroglyph style wall writing or something.

  11. Documents From the U.S. Espionage Den by alanw · · Score: 3, Interesting

    shredderchallenge seems to be Slashdotted, so apologies if this is a dup.

    During the Iran Hostage Crisis teams of carpet weavers were recruited to piece together shredded documents. They were then published in 1982 in 54 volumes under the title "Documents From the U.S. Espionage Den".

  12. They've actually done that for years by sean.peters · · Score: 2

    Even when I first got into the Navy (which was like 25 years ago... damn I'm old), we were using cross-cut shredders to destroy classified paperwork. These things practically turned the paper to dust - the individual pieces were like maybe 3/8" long by, I don't know, 1/32" wide? There's no freaking way you could put these back together.

    And if that wasn't good enough, one ship I was on had a paper mulcher. You threw in the paper you wanted destroyed, and it ground it up with water into a sodden, pulpy gray mass. There was nothing TO put back together after this process.

  13. Some folks in Germany have done this already . . . by PolygamousRanchKid+ · · Score: 3, Informative

    For piecing together shredded East German Secret Police (Stasi) documents: http://www.time.com/time/business/article/0,8599,1983287,00.html

    Maybe DARPA needs to take a trip to Germany . . .

    --
    Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
  14. Re:scan, edge detect, match by JustinOpinion · · Score: 2

    Yeah well there's a difference between theory and practice.

    Actually many of the great successes of AI (and even then some would debate how great they've been) are simple-sounding in principle but tough to get right. Things like route planning (just start a directed random walk from the start and finish and explore the graph until they connect to each other), web search (just weight results by popularity/links), document search (just show anything with a partial match), OCR (just threshold the image and match pixels to a database of font characters), voice recognition (just break it up into phonemes and look it up in a pronunciation dictionary), voice synthesis (just pre-record some phonemes and stitch them together), image recognition (just tag a bunch of images and train a neural net), and so on.

    They all sound simple enough. But for an actual implementation to be successful, there are tons of pitfalls and gotchas and real-world ambiguities that need to be figured out. There's then a whole other layer of tweaking to get a reasonable idea to run in a reasonable amount of time: many problems can be brute-forced but people typically don't want to wait forever for the answer, so ingenious algorithms for pruning the search tree or efficiently exploring the parameter space have to be designed.

    Point being, don't assume this is as easy as it sounds. If it were, then we wouldn't even be discussing it (and no one would bother using shredders).

  15. Re:Dumbasses by tomhudson · · Score: 3, Funny

    Anonymous Coward doesn't need a low UID, because Anonymous Coward doesn't even have one.

    FYI: A.C's user id is 666

  16. Re:Couldn't be misued by fastest+fascist · · Score: 2

    One person's "nefarious" is another's "good".

  17. Re:Should we really be helping them with this? by fastest+fascist · · Score: 2

    Ask not what your country can do for you, ask how you can de-shred documents for your country.

  18. Re:Dumbasses by Spectre · · Score: 2

    I know you are a 7 digit...

    Would someone with a 5 digit UID please show up and tell this guy he's fucking stupid? By his own logic he'd have to agree.

    I'll have to see if I can find someone young enough to have a 5-digit ...

    but in any case, yeah, UID does not equate to much!

    --
    "Flame away, I wear asbestos underwear"
  19. Re:This is easy, I saw it on TV... by Smallpond · · Score: 2

    I think unshred is the next menu selection after uncrop.