Insecure Hadoop Servers Expose Over 5 Petabytes of Data (bleepingcomputer.com)
An anonymous reader quotes the security news editor at Bleeping Computer:
Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices. The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.
According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent... The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.
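A quick check of the arithmetic behind the summary's ratios, as a sketch that uses only the four figures quoted above. The per-instance averages are derived here and are not figures from Matherly; the TB-to-GB conversion assumes 1 TB = 1,024 GB, consistent with 5,120 TB being described as five petabytes.

```python
# Figures quoted in the summary above.
hdfs_instances = 4_487
hdfs_tb = 5_120
mongo_instances = 47_820
mongo_tb = 25

# "HDFS servers leak 200 times more data": ratio of total exposed volume.
data_ratio = hdfs_tb / mongo_tb

# "MongoDB servers ... are ten times more prevalent": ratio of instance counts.
prevalence_ratio = mongo_instances / hdfs_instances

# Derived (not from the article): average exposed data per instance.
hdfs_avg_tb = hdfs_tb / hdfs_instances
mongo_avg_gb = mongo_tb / mongo_instances * 1024  # assumes 1 TB = 1,024 GB

# Derived: how much more data the average HDFS instance exposes.
per_instance_ratio = data_ratio * prevalence_ratio

print(f"total data ratio:   {data_ratio:.1f}x")
print(f"prevalence ratio:   {prevalence_ratio:.1f}x")
print(f"avg per HDFS node:  {hdfs_avg_tb:.2f} TB")
print(f"avg per MongoDB:    {mongo_avg_gb:.2f} GB")
print(f"per-instance ratio: {per_instance_ratio:.0f}x")
```

The "200 times" and "ten times" claims hold as roundings (about 204.8 and 10.7). On these numbers the average exposed HDFS instance holds roughly 2,200 times more data than the average exposed MongoDB instance.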
And yet companies keep hiring younger people and getting rid of experienced pros who understand security.
Also, why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials who configured these so poorly?
this Hadoop thing? I am far too removed from such things. Please explain what it's for, and whether a lowly developer can use it for personal projects.
Why not use MongoDB? MongoDB is a webscale database that scales.
https://www.youtube.com/watch?...
Some drink at the fountain of knowledge. Others just gargle.
Big Data is, by definition, huge volumes of mundane data, usually in unstructured or semi-structured format, with a very low density of interesting or useful information. But when aggregated over hundreds of TB, some useful patterns can sometimes be gleaned. Now, are the hackers going to ship the terabytes of data out of the datacenter and hope nobody notices what amounts to a DoS attack?
Yes, there should be protection, but it's like heavy equipment and materials being left unattended at a construction site overnight, because it's farfetched or uneconomical for crooks to steal them.
I don't suppose that occurred to John Matherly? Perhaps John is the dumbass millennial here?
will have to make a run to Best Buy for a few more thumb drives.
were they filled with yummy falafel and tahini sauce?
Let's just get rid of privacy and post everything publicly.
My experience is a couple of years old, but when I did a deep dive into Hadoop a serious flaw quickly came to light:
Hadoop was NEVER designed for security.
Want to own a Hadoop server? Create a hadoop account on your own box and connect with it. Bang, you are "root" on a Hadoop install.
Hadoop installs should only be implemented in a secured environment and use restricted VPN connections into it. Anyone who allows the "Internet" to connect to a Hadoop install is an idiot.
This security "flaw" in design is the main reason I gave up on Hadoop as a fad that would disappear in a few years.
Seriously.
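For anyone who wants to check their own cluster, here is a minimal sketch, assuming the install reads the standard `hadoop.security.authentication` property from `core-site.xml`. When that property is unset it defaults to `simple`, meaning the cluster accepts whatever username the client presents, which matches the behavior described above. The function names `hadoop_auth_mode` and `trusts_client_identity` are hypothetical helpers for illustration, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

AUTH_PROPERTY = "hadoop.security.authentication"


def hadoop_auth_mode(core_site_xml: str) -> str:
    """Return the configured authentication mode from core-site.xml text.

    Falls back to "simple" when the property is missing or empty,
    since that is Hadoop's default.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == AUTH_PROPERTY:
            value = (prop.findtext("value") or "").strip().lower()
            return value or "simple"
    return "simple"


def trusts_client_identity(core_site_xml: str) -> bool:
    """True when the cluster accepts the username the client claims."""
    return hadoop_auth_mode(core_site_xml) == "simple"


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
      </property>
    </configuration>
    """
    print(hadoop_auth_mode(sample))
    print(trusts_client_identity(sample))
```

This only reads the config file. It says nothing about whether the ports are reachable from the Internet, so it complements the advice about keeping the cluster on a restricted network rather than replacing it.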
Aside from a nation state or similar actor, who already knows everything, more insecure data might be better for us all.
5PB.
Damn.
"According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent..."
Was this statement actually intended as a bragging point for MongoDB? I've looked at this statement several times, and I can't come up with any other spin. Seriously - if somebody threw this line out there trying to sell me on his preferred piece of software, I'd immediately leave and vow to never use either HDFS or MongoDB.
#DeleteChrome
This is what happens when you replace knowledgeable workers with a "let's google it" mob. People use Hadoop just because of big data, but in reality they don't know how to implement it correctly.
Fix it now or it costs you 2 orders of magnitude more when the (code) boat sinks.
How many of those servers are actually supposed to be accessible, and how many of them are accessible only because they exist on a network with insufficient protection and oversight?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Have gnu, will travel.
Never attribute to malice or stupidity that which may be attributed to philanthropy... those cycles weren't being used for anything better anyway...