Slashdot Mirror


Insecure Hadoop Servers Expose Over 5 Petabytes of Data (bleepingcomputer.com)

An anonymous reader quotes the security news editor at Bleeping Computer: Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices. The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.

According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent... The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.
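
The ratios in the summary can be checked directly from the quoted figures. The following is a minimal sketch, not part of the original article; the variable names are illustrative, and the unit conversion assumes 1 PB = 1024 TB and 1 TB = 1024 GB.

```python
# Figures quoted in the summary above.
hdfs_instances = 4_487
hdfs_tb = 5_120
mongo_instances = 47_820
mongo_tb = 25

# Assumption: binary units (1 PB = 1024 TB, 1 TB = 1024 GB).
hdfs_pb = hdfs_tb / 1024

data_ratio = hdfs_tb / mongo_tb                      # HDFS data vs. MongoDB data
prevalence_ratio = mongo_instances / hdfs_instances  # MongoDB count vs. HDFS count

hdfs_tb_per_instance = hdfs_tb / hdfs_instances
mongo_gb_per_instance = mongo_tb * 1024 / mongo_instances

print(f"HDFS total: {hdfs_pb:.1f} PB")
print(f"HDFS exposed about {data_ratio:.0f}x the data of MongoDB")
print(f"MongoDB instances were about {prevalence_ratio:.1f}x as numerous")
print(f"Average per HDFS instance: {hdfs_tb_per_instance:.2f} TB")
print(f"Average per MongoDB instance: {mongo_gb_per_instance:.2f} GB")
```

On these numbers the "200 times" and "ten times" claims are both slight round-downs, and the average exposed HDFS instance held roughly a terabyte while the average exposed MongoDB server held about half a gigabyte.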

51 comments

  1. dumbass millennials by Anonymous Coward · · Score: 3, Insightful

    And yet companies keep hiring younger people and getting rid of experienced pros that understand security

    also why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?

    1. Re:dumbass millennials by Anonymous Coward · · Score: 2, Interesting

      At my company, some idiot developer used a public facing URL to put PDFs of our customers' health insurance claims so that he didn't have to write an on-demand report generator to display that same information in an HTTPS session. Even though the file names were pseudo-random, Yahoo quickly crawled it and made the information searchable. It went on for years until a customer called in and asked why his information was found on a Yahoo search.

      That inexpensive off-shore developer cost the company millions....

    2. Re:dumbass millennials by Anonymous Coward · · Score: 0

      Man, if companies just scheduled regular vulnerability scans on their IP space (or even just ran nmap), this stuff would easily be caught. My guess is millennials are just as guilty as "seasoned pros" for these issues too.

      In fact the issue is probably more complicated, i.e. no firewalls set up, no DMZs for public-facing front-end servers, or no special private zones for "internal servers". Maybe sysadmins are in charge of the firewalls and get lazy, don't understand the importance of security at the firewall layer, or forget to configure iptables, etc.
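
The kind of routine check suggested in the comment above can be sketched in a few lines. This is a rough illustration, not a replacement for nmap or a real vulnerability scanner: it only tests whether a TCP connection succeeds. The function names are hypothetical, and the port list assumes the default NameNode web UI ports (50070 in Hadoop 2.x, 9870 in Hadoop 3.x), which a given deployment may have changed.

```python
import socket

# Assumption: default NameNode web UI ports (Hadoop 2.x and 3.x).
# Adjust for your own deployment.
HDFS_WEB_PORTS = (50070, 9870)


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_exposed(hosts, ports=HDFS_WEB_PORTS, timeout=2.0):
    """Return the (host, port) pairs that accepted a TCP connection."""
    return [
        (host, port)
        for host in hosts
        for port in ports
        if is_port_open(host, port, timeout)
    ]
```

Run from a machine outside the network, `find_exposed(["203.0.113.10"])` (a placeholder documentation address; substitute your own public IPs) would return any address and port pairs that answer. An open port alone does not prove the data is readable, but it is a cheap signal that a firewall rule is missing.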

    3. Re:dumbass millennials by Gravis Zero · · Score: 1

      why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?

      Baby Boomers - Destroying the ecosystem.
      Gen X - Destroying the global economic system.
      Millennials - Not giving any fucks because they are the worst paid generation.

      I'm glad you're focused on the right things here. ;)

      --
      Anons need not reply. Questions end with a question mark.
    4. Re:dumbass millennials by Nkwe · · Score: 1

      At my company, some idiot developer used a public facing URL to put PDFs of our customers' health insurance claims so that he didn't have to write an on-demand report generator to display that same information in an HTTPS session. Even though the file names were pseudo-random, Yahoo quickly crawled it and made the information searchable.

      So not only was private information made publicly available, the PDF files were in a directory that was marked browseable by the web server? That's extra nice.

    5. Re:dumbass millennials by Anonymous Coward · · Score: 0

      Damn straight! Back in the olden days with Perl or C for CGI, we KNEW about security! These kids got spoiled with these newfangled "languages" that are nothing but packaged real languages that included training wheels. They think that "programming" in these training wheel languages, it makes them "web developers". They even have cutesy child TV names like "Ruby on Rails".

      And while they are at their computers, sucking on their RedBull pacifiers, and patting themselves on the backs for being "lite" or whatever they call themselves when they think they are smart, we old farts got to clean up their mess. These kids, they shit in their diapers as well as everyone else's, and then claim they are God's gift and there's no room for old folks - and we clean up after them!!

      Geeze!

      It wouldn't be so bad if I weren't interrupted from my Matlock marathons. And when it's Banana pudding night, god help you!!

    6. Re:dumbass millennials by Narcocide · · Score: 1, Troll

      Because nobody competent would be using Hadoop in the first place.

    7. Re:dumbass millennials by Anonymous Coward · · Score: 0

      so where was the code review

    8. Re: dumbass millennials by Anonymous Coward · · Score: 0

      Code reviews are irrelevant when it's an incompetent third worlder reviewing the code of other incompetent third worlders. Code reviews require competence.

    9. Re: dumbass millennials by Anonymous Coward · · Score: 0

      Another problem is that the industry hasn't been using the Rust programming language as much as it should be. Rust makes it super easy to create secure software. All new software should be written in Rust. All existing software should be rewritten in Rust, or at the very least it should start using Rust for any new features. That is what's happening with Firefox, which is getting new components written in Rust. This is the only way we will fix insecure software.

    10. Re:dumbass millennials by Anonymous Coward · · Score: 0

      And nobody reviewed it, noticed it during testing or was alarmed by a large number of crawler user agent requests?
      Sounds like the company didn't give a shit about privacy.

    11. Re: dumbass millennials by Anonymous Coward · · Score: 0

      That's a company failure more than anything. Offshoring is fine if you have a team of competent senior engineers reviewing their work.

    12. Re: dumbass millennials by arglebargle_xiv · · Score: 1

      Thanks to H1Bs you don't need to offshore to get the incompetence, now you can bring it in-house.

      (That's sort of tongue in cheek, but based on numerous real-world experiences interacting with low-wage devs brought in from overseas. Some of these guys should have been paying their employers rather than being paid, to make up for the amount of damage they were causing).

    13. Re:dumbass millennials by Anonymous Coward · · Score: 0

      This is age-ism, the definition of it.

    14. Re: dumbass millennials by Anonymous Coward · · Score: 0

      Surprised the JDBC Port wasn't exposed on the Internet

    15. Re: dumbass millennials by Anonymous Coward · · Score: 0

      All I heard in the security review was that Goku was handling it

    16. Re: dumbass millennials by ceoyoyo · · Score: 1

      Nice. That's how I feel about C and Perl.

    17. Re:dumbass millennials by Anonymous Coward · · Score: 0

      I got fired from a job as a doctor at a large regional hospital because I pointed out to the morons that they had loads of confidential information on a publicly accessible Windows file share, and every ER record for the last seven years on another publicly accessible file share. Oh, and all their security cameras were wide open too, with default passwords. Instead of fixing it, they just cancelled my contract.

  2. how does it work? by Anonymous Coward · · Score: 0

    this Hadoop thing? I am far too removed from such things, please explain what's it for and can a lowly developer use it for personal projects?

    1. Re:how does it work? by guruevi · · Score: 1

      It's a distributed data storage/processing system. Whether it's useful depends on your project.

      A good programmer makes sure that their storage and database backend is replaceable and good backend projects make sure that they support at least somewhat standard methods and functions.

      The problem with most of these implementations is they're relatively expensive for small setups. You need 3 dedicated nodes at least to make it "work" well enough and it still has huge amounts of overhead compared to a classic system. They really become useful when you can afford and/or need hundreds of nodes spanning multiple data centers.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    2. Re:how does it work? by Narcocide · · Score: 1

      Imagine you wanted a database to search petabytes of terabyte-sized files. Now imagine you learned nothing about databases and only knew Java, so naturally started over from scratch, blissfully free of any external normalizing influences.

    3. Re:how does it work? by Hognoxious · · Score: 1

      this Hadoop thing?

      Very badly, don't you remember when you had to train him?

      Hang on, that was Hardeep.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    4. Re:how does it work? by Anonymous Coward · · Score: 0

      Which database can process the kind of queries done on Hadoop for lower operational cost? Let's say we are talking about 10PB of data.

      I know of some commercial solutions that either require more human attention or they are proprietary solutions which do not scale as cheaply.

      Tell me something new, and you might change the world.

    5. Re:how does it work? by Narcocide · · Score: 1

      It's nothing new. Read a book on how to optimize MySQL. I've worked with Hadoop myself. Any notion that it's better for ANYTHING other than creating a giant boondoggle is utter fiction.

  3. Maybe the data is supposed to be public? by Anonymous Coward · · Score: 0

    I don't suppose that occurred to John Matherly?

    1. Re: Maybe the data is supposed to be public? by Brockmire · · Score: 1

      He crawled the Yellow Pages servers.

  4. MongoDB is webscale by goombah99 · · Score: 1

    Why not use MongoDB? MongoDB is a webscale database that scales.
    https://www.youtube.com/watch?...

    --
    Some drink at the fountain of knowledge. Others just gargle.
  5. Not saying this isn't a problem, but by Anonymous Coward · · Score: 1

    Big Data is, by definition, huge volumes of mundane data, usually in unstructured or semi-structured format, which have a very low density of interesting or useful information. But, when aggregated over hundreds of TB, some useful patterns can sometimes be gleaned. Now, are the hackers going to ship the terabytes of data out of the datacenter and hope nobody notices what amounts to a DoS attack?

    Yes, there should be protection, but it's like heavy equipment and materials being left unattended at a construction site overnight, because it's far-fetched or uneconomical for crooks to steal them.

    1. Re: Not saying this isn't a problem, but by Anonymous Coward · · Score: 0

      You wouldn't be happy if it were your medical records that were inadvertently made public, revealing to the entire world that you suffer from severe micropenis syndrome. You would likely be mocked and ridiculed for the rest of your life.

    2. Re: Not saying this isn't a problem, but by Anonymous Coward · · Score: 0

      Yeah, meanwhile *your* severe micropenis syndrome related medical records are safe on my private server (as long as you keep up the monthly payment - which BTW increases from 0.03 bitcoin to 0.04 as from July).

  6. Maybe the data is supposed to be public? by Anonymous Coward · · Score: 0

    I don't suppose that occurred to John Matherly? Perhaps John is the dumbass millennial here?

  7. A hacker stealing a copy of that data by tgibson · · Score: 1

    will have to make a run to Best Buy for a few more thumb drives.

  8. Those peta bites -- by Anonymous Coward · · Score: 0

    were they filled with yummy falafel and tahini sauce?

  9. lets just get rid of privacy by Anonymous Coward · · Score: 0

    lets just get rid of privacy and post everything publicly.

  10. Hadoop = Insecure by Anonymous Coward · · Score: 1

    My experience is a couple of years old, but when I did a deep dive into Hadoop a serious flaw quickly came to light:

    Hadoop was NEVER designed for security.

    Want to own a Hadoop server? Create a hadoop account on your own box and connect to it. Bang, you are "root" on a Hadoop install.

    Hadoop installs should only be implemented in a secured environment and use restricted VPN connections into it. Anyone who allows the "Internet" to connect to a Hadoop install is an idiot.

    This security "flaw" in design is the main reason I gave up on Hadoop as a fad that would disappear in a few years.
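
The behavior described in the comment above corresponds to Hadoop's "simple" authentication mode, in which the cluster accepts the user name the client reports. A quick way to see which mode a cluster is configured for is to read `core-site.xml`. The sketch below is illustrative only: the function names are hypothetical, and it assumes the stock property names `hadoop.security.authentication` and `hadoop.security.authorization`, treating a missing entry as the shipped defaults (`simple` and `false`).

```python
import xml.etree.ElementTree as ET


def hadoop_auth_settings(core_site_xml):
    """Extract the authentication/authorization settings from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return {
        # Missing entries fall back to the assumed Hadoop defaults.
        "authentication": props.get("hadoop.security.authentication", "simple"),
        "authorization": props.get("hadoop.security.authorization", "false"),
    }


def trusts_client_identity(core_site_xml):
    """True when the cluster is not configured for Kerberos authentication."""
    settings = hadoop_auth_settings(core_site_xml)
    return settings["authentication"].lower() != "kerberos"
```

If `trusts_client_identity` returns True for a cluster, network-level isolation (firewall, VPN) is the only thing standing between that cluster and anyone who can reach its ports, which is the point the comment makes.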

    1. Re: Hadoop = Insecure by ceoyoyo · · Score: 1

      I don't see that as being a flaw at all. Most software should be written like that.

      The problem is with the people who use the software assuming that random special purpose projects like Hadoop have planned for security or are competent to do so. Just assume it's all insecure unless there's good reason to think otherwise, and access it via VPN or SSH.

  11. Who's going to steal or even sort through 5PB? by Anonymous Coward · · Score: 0

    Seriously.

    Aside from a nation state or similar actor, who already know everything.. more insecure data might be better for us all.

    5PB.

    Damn.

    1. Re: Who's going to steal or even sort through 5PB? by Anonymous Coward · · Score: 0

      Maybe someone will make a copy, then there'll be 10 PB to sort through.

    2. Re: Who's going to steal or even sort through 5PB? by Brockmire · · Score: 1

      The download link said "the fappening 5" .

  12. I really don't get this part by 93 Escort Wagon · · Score: 1

    "According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent..."

    Was this statement actually intended as a bragging point for MongoDB? I've looked at this statement several times, and I can't come up with any other spin. Seriously - if somebody threw this line out there trying to sell me on his preferred piece of software, I'd immediately leave and vow to never use either HDFS or MongoDB.

    --
    #DeleteChrome
    1. Re:I really don't get this part by Anonymous Coward · · Score: 0

      I think it was just a reference of how big of a problem it is with so few clusters involved. Some people may look at it and say "oh, only a few thousand clusters are visible". I hope the intent was to show that with about a tenth of the clusters (or do they really mean individual servers with these numbers? Because that inherently means fewer clusters are visible and accessible clusters is the useful number), would-be hackers have access to a lot more data and, what is unmentioned, is that data is probably less filtered than whatever exists in MongoDB (or whatever other data store you may compare it against) due to the nature of Hadoop jobs. This means that those same would-be hackers have the potential to get even scarier data.

  13. And this is what happens... by DirtyFly · · Score: 1

    When you replace knowledgeable workers with a "let's google it" mob. People use Hadoop just because "big data", but in reality they don't know how to implement it correctly.

  14. Code is like a leaky boat by BoRegardless · · Score: 1

    Fix it now or it costs you 2 orders of magnitude more when the (code) boat sinks.

  15. How many are supposed to be accessible? by drinkypoo · · Score: 1

    How many of those servers are actually supposed to be accessible, and how many of them are accessible only because they exist on a network with insufficient protection and oversight?

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:How many are supposed to be accessible? by Anonymous Coward · · Score: 0

      Yeah, that is what I said above:

      Maybe the data is supposed to be public? I don't suppose that occurred to John Matherly?

      But that information would not get clicks. So I can conclude this story is nothing more than clickbait for ad impressions.

    2. Re:How many are supposed to be accessible? by Anonymous Coward · · Score: 0

      It's possible that a good amount of the data is meant to be accessible, but there's no way that anyone is intentionally allowing their Hadoop clusters to be publicly accessible.

      The potential for abuse of an accessible data store is far too high for anyone to intentionally leave one public, excluding honeypot setups.

  16. Not really a problem by PPH · · Score: 4, Funny

    ... because, to date, nobody has figured out how to get data out of a Hadoop database.

    --
    Have gnu, will travel.
    1. Re:Not really a problem by Anonymous Coward · · Score: 0

      Hadoop database.

      Therein lies your problem. Hadoop is not a database. It's a filesystem and an execution framework, on which databases can run. So can anything else.
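
The "execution framework" mentioned above is classically MapReduce. As a rough illustration of that programming model only, here is a single-process word count split into map, shuffle, and reduce steps. It does not use Hadoop's API; the function names are made up for this sketch, and a real cluster would run the map and reduce steps in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big cluster"])` counts "big" twice and the other words once. Anything that can be phrased as these steps can run on the framework, which is why it is not limited to database-style queries.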

    2. Re:Not really a problem by Anonymous Coward · · Score: 0

      Don't get me started with their HQL crap

    3. Re:Not really a problem by Anonymous Coward · · Score: 0

      All this time, I thought my data is safe in HDFS ---

  17. Big Data As a Public Service - BDaaPS by Anonymous Coward · · Score: 0

    Never attribute to malice or stupidity that which may be attributed to philanthropy... those cycles weren't being used for anything better anyway...