VMware's Serengeti Brings Hadoop To Virtual, Cloud Environments
Nerval's Lobster writes "VMware's Serengeti is a new open-source project for deploying Apache Hadoop in virtual and cloud environments. Serengeti 0.5 is available as a free download under the Apache 2.0 license. It has been designed as distro-neutral, with support for Apache Hadoop 1.0, CDH3, Hortonworks 1.0 and Greenplum HD 1.0. Of course, VMware isn't the only company seeking to leverage the increased interest in Hadoop. In June alone, midsize IT vendors such as Datameer, Karmasphere, and Hortonworks have all announced platforms that utilize the framework in some way. Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce will hit $812.8 million in 2016, up from $77 million in 2011."
So, if I've got only one server, then up to now, I would have to just run an application on that server.
But now, with only a little overhead, I can pretend to be running the same application in a distributed manner on a cluster, even though it's actually still running on the single server.
I have to admit this is pretty awesome.
Sadly, no mod points today.
Maybe I'm just getting old, but how many applications for distributed computing actually exist? Can't most jobs be done on a single computer, or are programmers just getting too lazy to optimize?
From TFS:
Notice that the revenue is directed toward the few companies supporting and extending Hadoop. If you're working for one of those companies, congratulations. If you're working for one of the companies that is spending its money on this new shiny thing, you're probably in for a ride (one way or another). The technology is definitely good, I'll grant you that. But it is not the solution (or, not a very good solution) for many of the problems IT/data shops have. It really seems that a lot of people are jumping on the Hadoop bandwagon because "everyone else is getting it" and not because it will solve particular, concrete, existing problems. Or, it will solve exactly one relatively small, concrete, existing problem while erecting a complex infrastructure that must be supported for several years, making it more of a PITA than a solution.
Anyway, back to my original point: I think this revenue citation is more of an indication of a technology bubble and successful marketing than anything else. The price IT will pay for that bubble will probably far exceed the original cost.
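For a sense of scale, here is a quick back-of-the-envelope check of what the IDC figures quoted in the summary imply as a yearly growth rate. This is only a sketch: it assumes smooth compound growth between 2011 and 2016, which the summary does not claim, and the function name is made up for illustration.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

# Figures quoted in the summary, in millions of US dollars.
revenue_2011 = 77.0
revenue_2016 = 812.8

rate = implied_annual_growth(revenue_2011, revenue_2016, 2016 - 2011)
print(f"Implied growth: {rate:.1%} per year")
```

That works out to roughly 60% a year, sustained for five years, which is the kind of number the bubble argument above is pointing at.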
I've been out of the virtualization world for a long time, so this may be a dumb question, but could someone walk me through the following:
Virtualization was created to share I/O and storage utilization of one hardware setup with multiple O/S installations thereby making sure that I/O and storage were being fully utilized with great management tools on top.
Hadoop was created to split the handling of gobs (yes gobs) of I/O and storage across multiple O/S / Apache installations thereby utilizing every available bit of bandwidth across multiple hardware setups (with bonuses for redundancy etc.).
How does virtualizing O/S's for Hadoop installations make any sense when virtualization just adds I/O overhead for each hardware setup that Hadoop on a single O/S would fully utilize anyway? Pardon my ignorance, but it doesn't make sense to me. Is it the speed at which you can drop in a functioning installation? One VMware instance per hardware setup?
I've not tried the tech I'm about to mention yet, but if your motherboard includes an IOMMU, you can use physical networking and storage controllers in a virtual machine. It works as long as the "passed-through" device is PCIe or PCI, be it onboard or on a card. So you can have racks of physical servers with, for instance, one VM on each used as a node for your distributed file system. Virtualization is still useful for putting the remainder of your physical server's capacity to other purposes. Or so I imagine it to be, as an armchair datacenter IT worker.
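If you want to see whether a Linux host has an active IOMMU before trying passthrough, a minimal sketch along these lines may help. It assumes the kernel exposes groups under /sys/kernel/iommu_groups (the usual sysfs location when the IOMMU is enabled); the function name is made up for illustration, and this only lists groups, it does not configure passthrough.

```python
import os

def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to the device addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        # IOMMU disabled, unsupported, or not a Linux host.
        return groups
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups

if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found")
    for group in sorted(found, key=int):
        print(f"group {group}: {', '.join(found[group])}")
```

An empty result usually means the IOMMU is switched off in firmware or on the kernel command line. Devices that share a group generally have to be handed to the same VM together.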
Agreed. I initially read that as saying they were losing money, until I re-read it.
http://www.hadooponazure.com/