Slashdot Mirror


OCR Software Dev Abbyy Exposes 200,000 Customer Documents (bleepingcomputer.com)

A misconfigured MongoDB server belonging to Abbyy, an optical character recognition software developer, allowed public access to customer files. From a report: Independent security researcher Bob Diachenko discovered the database on August 19 hosted on the Amazon Web Services (AWS) cloud platform. It was 142GB in size and it allowed access without the need to log in. The sizeable database included scanned documents of the sensitive kind: contracts, non-disclosure agreements, internal letters, and memos. Included were more than 200,000 files from Abbyy customers who scanned the data and kept it at the ready in the cloud. "Some collection names like 'documentRecognition,' or 'documentXML' hinted that database would be part of a data recognition company infrastructure," Diachenko writes in a blog post today.

25 comments

  1. All hail the cloud by nyet · · Score: 1

    Don't bother keeping anything onsite. Thanks, AWS.

    1. Re: All hail the cloud by Anonymous Coward · · Score: 0

      Actually, thank the idiot millennials who are used to posting everything on the internet wide open.

    2. Re: All hail the cloud by datavirtue · · Score: 1

      I guess. It is usually just heads-down developers driving to get shit done. No time for configuring servers or firewalls. Much of the blame lies with the server developers who ship misconfigured servers that work. These servers should force you through a wizard that requires prod-level configuration. But alas, a competing product would focus on ease of use and people would gravitate to it. Pretty much the case with Mongo and the like.

      --
      I object to power without constructive purpose. --Spock
  2. MongoDB is Web Scale!! by haruchai · · Score: 1
    --
    Pain is merely failure leaving the body
    1. Re:MongoDB is Web Scale!! by rsilvergun · · Score: 1
      --
      Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  3. No surprise here by imidan · · Score: 3, Insightful

    I just assume that any online (cloud based or not) OCR or fax bridge site is going to store a copy of my document in an insecure way. I assume that employees of the service will have access to view my document. I haven't thought too much about them exposing my documents to the public, but it's not a huge step from what I already assumed about them. Anyway, the result is that I don't send anything sensitive or with information I wouldn't want publicly known through online OCR or fax. Because it would be crazy to upload my private sensitive documents to randos on the Internet and assume that they'll never be seen.

    1. Re:No surprise here by grep+-v+'.*'+* · · Score: 1

      it would be crazy to upload my private sensitive documents to randos on the Internet and assume that they'll never be seen.

      ?? I thought if you uploaded it to the internet you WANTED it to be seen, that was the whole POINT. Otherwise what's it doing up there?

      Oh, you want security? Keep it directly under your control then and watch it. Better yet, encrypt it at rest and watch out for temp files and bad janitors and evil maids.

      (I thought that a bad janitor forgot to empty the trash and an evil maid put the horse head in the bed in Godfather. Live and learn.)

      --
      If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
  4. I use OCRKit locally on the Mac by Anonymous Coward · · Score: 0

    I've used it for a long time, since years ago there were not many OCR tools on the Mac to choose from: https://ocrkti.com So far it just works for me, and it's blazing fast, too!

  5. Don't do it by Artem+S.+Tashkinov · · Score: 0

    scanned documents of the sensitive kind: contracts, non-disclosure agreements, internal letters, and memos

    I have absolutely zero pity for the companies/people who uploaded such data to Abbyy's servers. They knew perfectly well what they were doing. You don't store private data unencrypted in the cloud unless you want to share it with the entire world.

    1. Re:Don't do it by Anonymous Coward · · Score: 0

      Wow, the /. moderation system is impossible to grasp: basically the same comment as the one I'm replying to was modded insightful while this one was downvoted.

  6. the DB itself was on the web? and not under proxy by Joe_Dragon · · Score: 1

    the DB itself was on the web? and not under some kind of proxy?

  7. Re:the DB itself was on the web? and not under pro by Anonymous Coward · · Score: 0

    Maybe it was too hard to properly configure NAT for the poor folks, so they just assigned elastic IP to everything with traffic allowed from 0.0.0.0/0.
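
A minimal sketch of how one might flag rules like the one described above, where traffic is allowed from 0.0.0.0/0. The rule format (a dict with "port" and "cidr" keys) and the function names are hypothetical and are not an AWS API; only the Python standard library is used.

```python
import ipaddress


def is_world_open(cidr: str) -> bool:
    """Return True if the CIDR covers every address (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def risky_rules(rules):
    """Return the rules whose source range is the whole internet.

    Each rule is assumed to be a dict like {"port": 27017, "cidr": "0.0.0.0/0"}.
    """
    return [rule for rule in rules if is_world_open(rule["cidr"])]


# Hypothetical rule set: one database port open to everyone, one restricted.
example_rules = [
    {"port": 27017, "cidr": "0.0.0.0/0"},
    {"port": 22, "cidr": "10.0.0.0/8"},
]
flagged = risky_rules(example_rules)
```

Here `flagged` would contain only the rule for port 27017, MongoDB's default port.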

  8. Outsourcing to cloud service was successful by ffkom · · Score: 3, Interesting

    Notice how those who decided to have the scanning process outsourced to "somewhere in the cloud" will consider this a confirmation of their success. Now all the blame is assigned to Abbyy, and no blame assigned to them - exactly what they wanted to achieve, nothing less.

    1. Re:Outsourcing to cloud service was successful by Anonymous Coward · · Score: 0

      In fairness, is an incompetent company with terrible security a cloud issue? Or is it an incompetent-company-with-terrible-security issue?

      I mean, sure, you've put it outside of your firewall for everyone to see, but all that really means is you took something horribly insecure, moved it outside of your own network, and left it horribly insecure.

      At the end of the day, this company are the ones who set this shit up, who failed to secure it, and who are now having to explain to their customers why they're an incompetent company with terrible security.

      I think the cloud is utterly stupid, and while it can make things worse like in this case, it's still the responsibility of companies to not be complete fucking morons when it comes to security.

      Me, I would argue that the 'security team' at Abbyy were incompetent to do their jobs. And at the end of the day, they bear the entire responsibility.

      If they could disable access to this after they got told they had shit security, they sure as fuck could, and should, have been able to do this before it was breached.

      You're damned right all the blame goes to this company, because it was their own stupidity, laziness, and incompetence which was the cause of it.

      The magical cloud isn't a substitute for not being too fucking stupid to secure stuff.

  9. Why is it ALWAYS AWS by Anonymous Coward · · Score: 0

    Anyone notice that these types of DB exposure issues almost ALWAYS happen on AWS? I'm not saying AWS is insecure, because it's not, but why is it always someone hosting on AWS that has this issue?

    I know they should not have to, but I wonder how much information would still be private if Amazon pushed a nagging splash screen on the management portal reminding its customers to actually secure their databases? Or even better, ran a basic security scan on their clients' address space. Even a trivial scan will catch most of these unsecured databases.
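
As a rough illustration of the "trivial scan" idea, the sketch below checks whether a TCP port accepts connections. It is an assumption about what such a check might look like, not a description of anything Amazon runs; the function name is made up for illustration. An open port only shows the service is reachable, not that it allows access without a login, and it should only be pointed at hosts you are responsible for.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use (27017 is MongoDB's default port):
#   is_port_open("127.0.0.1", 27017)
```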

    1. Re:Why is it ALWAYS AWS by lastman71 · · Score: 1

      Or ... why is it always MongoDB?

  10. Re: the DB itself was on the web? and not under pr by datavirtue · · Score: 1

    Yeah that's the thing...it is so easy to use and already comes with an API. The only reason your typical developer would put anything behind a proxy is to graft an API onto it.

    --
    I object to power without constructive purpose. --Spock
  11. A naive question by Snotnose · · Score: 1

    Couldn't a bunch of AWS customers band together to hire a security researcher to check their permissions? Or even Amazon itself on behalf of their clients?

    Granted, there are issues of what companies want public and what they want private. I'm guessing anything bigger than a gig might trigger a warning, as would anything with personal data.

    Then again, I've never used the cloud for anything more than transferring stuff from my phone to my PC, or vice versa, and have never used AWS. So I have no real experience with the issues, but am willing to bloviate on them, just like Trump.

    1. Re:A naive question by fluffernutter · · Score: 1

      Why doesn't Amazon have a message that says "Your data is not protected, are you sure you want to do this?"

      --
      Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
    2. Re:A naive question by coofercat · · Score: 1

      That's the crazy thing - AWS has the concept of a "VPC", and it has the concept of "public" and "private" subnets inside your VPC. If you put a VM in "private", it won't get an internet IP, and so instantly becomes inaccessible to the Internet. You don't need any fancy reviews or certifications for that - just a modicum of common sense. Hell, even if they'd used their app server as a jumpbox to get to their Mongo server, that would have been better than this.

      This wasn't an issue of an "incorrectly configured MongoDB" - this was an issue of utter incompetence at setting up the system in the first place, followed by the same incompetence setting up MongoDB.

  12. Unsurprising by Fencepost · · Score: 1

    My basic assumption for anything being OCR'd effectively for free in the cloud through a software provider is that it's not safe. Could be sloppiness (as in this case), could be automated OCR+human verification.

    While I actually do have a couple of Abbyy programs installed (FineScanner Pro and Business Card Reader Pro), I've never actually made serious use of them. On the other hand, I do use Microsoft's Office Lens program, which provides many of the same capabilities - but provided under the Office 365 bundling and with a lot more resources devoted to security.

    I used to have a couple more photo-to-scanned-document software packages as well, but I decided to dump anything like that from Russian companies some time back. Abbyy's presence is as much an oversight of "haven't cleaned in a while" as anything else.

    --
    fencepost
    just a little off
  13. Don't give prod to devs by Anonymous Coward · · Score: 0

    Why would a dev have prod access?

    Same mistake made for decades but people still have not learned.

    Go fast, get to the cloud! It only seems fast because people are ignoring security and proven practices because it is "cloud".