Slashdot Mirror


OCR Software Dev Abbyy Exposes 200,000 Customer Documents (bleepingcomputer.com)

A misconfigured MongoDB server belonging to Abbyy, an optical character recognition software developer, allowed public access to customer files. From a report: Independent security researcher Bob Diachenko discovered the database on August 19 hosted on the Amazon Web Services (AWS) cloud platform. It was 142GB in size and it allowed access without the need to log in. The sizeable database included scanned documents of the sensitive kind: contracts, non-disclosure agreements, internal letters, and memos. Included were more than 200,000 files from Abbyy customers who scanned the data and kept it at the ready in the cloud. "Some collection names like 'documentRecognition,' or 'documentXML' hinted that database would be part of a data recognition company infrastructure," Diachenko writes in a blog post today.

14 of 25 comments

  1. All hail the cloud by nyet · · Score: 1

    Don't bother keeping anything onsite. Thanks, AWS.

    1. Re: All hail the cloud by datavirtue · · Score: 1

      I guess. It is usually just heads-down developers driving to get shit done. No time for configuring servers or firewalls. Much to blame are the server developers who ship misconfigured servers that work. These servers should force you through a wizard that requires prod-level configuration. But alas, a competing product would focus on ease of use and people would gravitate to it. Pretty much the case with Mongo and the like.

      --
      I object to power without constructive purpose. --Spock
  2. MongoDB is Web Scale!! by haruchai · · Score: 1
    --
    Pain is merely failure leaving the body
    1. Re:MongoDB is Web Scale!! by rsilvergun · · Score: 1
      --
      Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  3. No surprise here by imidan · · Score: 3, Insightful

    I just assume that any online (cloud based or not) OCR or fax bridge site is going to store a copy of my document in an insecure way. I assume that employees of the service will have access to view my document. I haven't thought too much about them exposing my documents to the public, but it's not a huge step from what I already assumed about them. Anyway, the result is that I don't send anything sensitive or with information I wouldn't want publicly known through online OCR or fax. Because it would be crazy to upload my private sensitive documents to randos on the Internet and assume that they'll never be seen.

    1. Re:No surprise here by grep+-v+'.*'+* · · Score: 1

      it would be crazy to upload my private sensitive documents to randos on the Internet and assume that they'll never be seen.

      ?? I thought if you uploaded it to the internet you WANTED it to be seen, that was the whole POINT. Otherwise what's it doing up there?

      Oh, you want security? Keep it directly under your control then and watch it. Better yet, encrypt it at rest and watch out for temp files and bad janitors and evil maids.

      (I thought that a bad janitor forgot to empty the trash and an evil maid put the horse head in the bed in Godfather. Live and learn.)

      --
      If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
  4. the DB itself was on the web? and not under proxy by Joe_Dragon · · Score: 1

    the DB itself was on the web? and not under some kind of proxy?

  5. Outsourcing to cloud service was successful by ffkom · · Score: 3, Interesting

    Notice how those who decided to have the scanning process outsourced to "somewhere in the cloud" will consider this a confirmation of their success. Now all the blame is assigned to Abbyy, and no blame assigned to them - exactly what they wanted to achieve, nothing less.

  6. Re: the DB itself was on the web? and not under pr by datavirtue · · Score: 1

    Yeah that's the thing...it is so easy to use and already comes with an API. The only reason your typical developer would put anything behind a proxy is to graft an API onto it.

    --
    I object to power without constructive purpose. --Spock
  7. A naive question by Snotnose · · Score: 1

    Couldn't a bunch of AWS customers band together to hire a security researcher to check their permissions? Or even Amazon itself on behalf of their clients?

    Granted, there are issues of what companies want public and what they want private. I'm guessing anything bigger than a gig might trigger a warning, as would anything with personal data.

    Then again, I've never used the cloud for anything more than transferring stuff from my phone to my PC, or vice versa, and have never used AWS. So I have no real experience with the issues, but am willing to bloviate on them, just like Trump.

    1. Re:A naive question by fluffernutter · · Score: 1

      Why doesn't Amazon have a message that says "Your data is not protected, are you sure you want to do this?"

      --
      Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
    2. Re:A naive question by coofercat · · Score: 1

      That's the crazy thing - AWS has the concept of a "VPC", and it has the concept of "public" and "private" subnets inside your VPC. If you put a VM in "private", it won't get an internet IP, and so instantly becomes inaccessible to the Internet. You don't need any fancy reviews or certifications for that - just a modicum of common sense. Hell, even if they'd used their app server as a jumpbox to get to their Mongo server, that would have been better than this.

      This wasn't an issue of an "incorrectly configured MongoDB" - this was an issue of utter incompetence at setting up the system in the first place, followed by the same incompetence setting up MongoDB.
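
      A minimal sketch of the public/private distinction described above, assuming a hypothetical inventory that maps host names to IP addresses. The host names, the addresses, and the helper names are illustrative only and do not come from the Abbyy incident or from any AWS API; the check uses nothing but Python's standard ipaddress module.

```python
import ipaddress
from typing import Dict, List


def is_internet_routable(address: str) -> bool:
    """Return True if the address is globally routable (not RFC 1918 etc.)."""
    return ipaddress.ip_address(address).is_global


def exposed_hosts(hosts: Dict[str, str]) -> List[str]:
    """List the names of hosts whose address is reachable from the Internet."""
    return sorted(name for name, addr in hosts.items() if is_internet_routable(addr))


# Hypothetical inventory: a database on a private subnet address and an
# app server on a public address (8.8.8.8 is only a stand-in public IP).
inventory = {"mongo": "10.0.1.5", "app": "8.8.8.8"}
print(exposed_hosts(inventory))
```

      A host with only a private address, like the hypothetical "mongo" entry, never shows up in the exposed list. That is the property a private subnet gives you before any database-level authentication is even considered.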

  8. Unsurprising by Fencepost · · Score: 1

    My basic assumption for anything being OCR'd effectively for free in the cloud through a software provider is that it's not safe. Could be sloppiness (as in this case), could be automated OCR+human verification.

    While I actually do have a couple of Abbyy programs installed (FineScanner Pro and Business Card Reader Pro), I've never actually made serious use of them. On the other hand, I do use Microsoft's Office Lens program which provides many of the same capabilities - but provided under the Office 365 bundling and with a lot more resources devoted to security.

    I used to have a couple more photo-to-scanned-document software packages as well, but I decided to dump anything like that from Russian companies some time back. Abbyy's presence is as much an oversight of "haven't cleaned in a while" as anything else.

    --
    fencepost
    just a little off
  9. Re:Why is it ALWAYS AWS by lastman71 · · Score: 1

    Or ... why is it always mongodb?