Insecure Hadoop Servers Expose Over 5 Petabytes of Data (bleepingcomputer.com)
An anonymous reader quotes the security news editor at Bleeping Computer:
Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices. The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.
According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent... The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.
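For anyone who wants to check the arithmetic in the summary, here is a minimal sketch in Python that uses only the four figures quoted above. The function name and dictionary keys are made up for illustration; the per-server averages are a derived number, not something Matherly reported.

```python
# Figures quoted in the summary (attributed to John Matherly / Shodan).
HDFS_SERVERS = 4_487
HDFS_TB = 5_120
MONGO_SERVERS = 47_820
MONGO_TB = 25


def compare_exposure():
    """Return the ratios behind the '200 times' and 'ten times' claims."""
    hdfs_tb_per_server = HDFS_TB / HDFS_SERVERS
    mongo_tb_per_server = MONGO_TB / MONGO_SERVERS
    return {
        # Total exposed data, HDFS relative to MongoDB (the "200 times" claim).
        "data_ratio": HDFS_TB / MONGO_TB,
        # Exposed server count, MongoDB relative to HDFS (the "ten times" claim).
        "server_ratio": MONGO_SERVERS / HDFS_SERVERS,
        # Derived: average exposed data per server, in TB.
        "hdfs_tb_per_server": hdfs_tb_per_server,
        "mongo_tb_per_server": mongo_tb_per_server,
        # Derived: how much more data the average exposed HDFS server holds.
        "per_server_ratio": hdfs_tb_per_server / mongo_tb_per_server,
    }


if __name__ == "__main__":
    for name, value in compare_exposure().items():
        print(f"{name}: {value:,.4f}")
```

Both rounded claims hold up: the data ratio is a little over 200 and the server ratio a little over 10. Taken together, they imply the average exposed HDFS server holds roughly two thousand times more data than the average exposed MongoDB server.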
And yet companies keep hiring younger people and getting rid of experienced pros that understand security
also why is the article making it sound like a Hadoop issue when it's clearly the dumbass millennials that configured these so poorly?
It's a distributed data storage/processing system. Whether it's useful depends on your project.
A good programmer makes sure that their storage and database backend is replaceable and good backend projects make sure that they support at least somewhat standard methods and functions.
The problem with most of these implementations is they're relatively expensive for small setups. You need 3 dedicated nodes at least to make it "work" well enough and it still has huge amounts of overhead compared to a classic system. They really become useful when you can afford and/or need hundreds of nodes spanning multiple data centers.
Custom electronics and digital signage for your business: www.evcircuits.com
Why not use MongoDB? MongoDB is a webscale database that scales.
https://www.youtube.com/watch?...
Some drink at the fountain of knowledge. Others just gargle.
Big Data is, by definition, huge volumes of mundane data, usually in unstructured or semi-structured format, which have a very low density of interesting or useful information. But, when aggregated over hundreds of TB, some useful patterns can sometimes be gleaned. Now, are the hackers going to ship the terabytes of data out of the datacenter and hope nobody notices what amounts to a DoS attack?
Yes, there should be protection, but it's like heavy equipment and materials being left unattended at a construction site overnight, because it's far-fetched or uneconomical for crooks to steal them.
Imagine you wanted a database to search petabytes of terabyte-sized files. Now imagine you learned nothing about databases and only knew Java, so naturally started over from scratch, blissfully free of any external normalizing influences.
will have to make a run to Best Buy for a few more thumb drives.
My experience is a couple of years old, but when I did a deep dive into Hadoop a serious flaw quickly came to light:
Hadoop was NEVER designed for security.
Want to own a Hadoop server? Create a hadoop account on your own box and connect to it. Bang, you are "root" on a Hadoop install.
Hadoop installs should only be implemented in a secured environment and use restricted VPN connections into it. Anyone who allows the "Internet" to connect to a Hadoop install is an idiot.
This security "flaw" in design is the main reason I gave up on Hadoop as a fad that would disappear in a few years.
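To illustrate the point about the server trusting whatever username the client supplies, here is a rough Python sketch of a check you could run against your own cluster. It assumes WebHDFS is enabled and that the NameNode web port is 50070 (the usual Hadoop 2.x default; your install may differ). With security switched off, WebHDFS accepts the caller's identity as a plain `user.name` query parameter. The host name and all function names here are hypothetical.

```python
import json
from urllib.error import URLError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(host, port=50070, path="/", user=None):
    """Build a WebHDFS LISTSTATUS URL; port 50070 is an assumed default."""
    query = {"op": "LISTSTATUS"}
    if user is not None:
        # In non-secure mode the identity is simply asserted by the client.
        query["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def parse_liststatus(body):
    """Extract entry names from a LISTSTATUS JSON response body."""
    data = json.loads(body)
    return [entry["pathSuffix"] for entry in data["FileStatuses"]["FileStatus"]]


def answers_without_credentials(host, port=50070, timeout=5.0):
    """Return True if the host lists '/' with no credentials at all."""
    try:
        with urlopen(liststatus_url(host, port), timeout=timeout) as resp:
            parse_liststatus(resp.read().decode("utf-8"))
            return True
    except (URLError, OSError, ValueError, KeyError, TypeError):
        # Unreachable, refused, auth required, or not a WebHDFS response.
        return False


if __name__ == "__main__":
    # Hypothetical host name; only point this at machines you administer.
    print(liststatus_url("namenode.example", user="hadoop"))
```

If `answers_without_credentials` returns True from a machine outside your network, the install is reachable in the way the summary describes. Hadoop does ship Kerberos-based authentication, but it has to be turned on and configured.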
"According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent..."
Was this statement actually intended as a bragging point for MongoDB? I've looked at this statement several times, and I can't come up with any other spin. Seriously - if somebody threw this line out there trying to sell me on his preferred piece of software, I'd immediately leave and vow to never use either HDFS or MongoDB.
#DeleteChrome
This is what happens when you replace knowledgeable workers with a "let's Google it" mob. People use Hadoop just because of "big data", but in reality they don't know how to implement it correctly.
Fix it now or it costs you 2 orders of magnitude more when the (code) boat sinks.
How many of those servers are actually supposed to be accessible, and how many of them are accessible only because they exist on a network with insufficient protection and oversight?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Have gnu, will travel.
Very badly, don't you remember when you had to train him?
Hang on, that was Hardeep.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
He crawled the Yellow Pages servers.
The download link said "the fappening 5".
It's nothing new. Read a book on how to optimize MySQL. I've worked with Hadoop myself. Any notion that it's better for ANYTHING other than creating a giant boondoggle is utter fiction.