Researchers Create Database-Hadoop Hybrid
ericatcw writes "'NoSQL' alternatives such as Hadoop and MapReduce may be uber-cheap and scalable, but they remain slower and clumsier to use than relational databases, say some. Now, researchers at Yale University have created a database-Hadoop hybrid that they say offers the best of both worlds: fast performance and the ability to scale out near-indefinitely. HadoopDB was built using PostGreSQL, though MySQL has also successfully been swapped in, according to Yale computer science professor Daniel Abadi, whose students built this prototype."
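The summary doesn't say how the hybrid is wired together. Below is a rough sketch of the general idea: push SQL down to a database on each node, then combine the partial results in a MapReduce-style merge step. Everything here is an assumption for illustration. Python's sqlite3 stands in for PostgreSQL, and the table, columns and function names are made up. This is not HadoopDB's code.

```python
import sqlite3
from collections import defaultdict

def make_node(rows):
    """Create one in-memory database standing in for a single node's local DBMS
    (hypothetical 'visits' table; sqlite3 used instead of PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (site TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn

def map_phase(conn):
    """Push the aggregation down to the node's SQL engine; return partial sums."""
    return conn.execute("SELECT site, SUM(hits) FROM visits GROUP BY site").fetchall()

def reduce_phase(partials):
    """Merge per-node partial sums into global totals."""
    totals = defaultdict(int)
    for rows in partials:
        for site, hits in rows:
            totals[site] += hits
    return dict(totals)

def hybrid_query(nodes):
    """Run the local SQL on every node, then reduce the results."""
    return reduce_phase(map_phase(node) for node in nodes)

if __name__ == "__main__":
    nodes = [
        make_node([("a.com", 3), ("b.com", 1)]),
        make_node([("a.com", 2), ("c.com", 5)]),
    ]
    print(sorted(hybrid_query(nodes).items()))
    # [('a.com', 5), ('b.com', 1), ('c.com', 5)]
```

The appeal of this split is that each node's SQL engine does the per-partition filtering and aggregation, so only small partial results have to travel to the merge step.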
Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap". Stop making up stupid shit.
It's PostgreSQL... but I sympathize with the mixed-case confusion and refer you to this Postgres vs PostgreSQL permathread.
If both the performance and the scalability are as good as described, I can safely say that this is the most important thing of the decade, and not only for DBMSs.
Handling large amounts of data would get cheaper by at least an order of magnitude, and scaling out would be way cheaper than it is now as well. I do hope it's true.
I thought Essbase was supposed to be one of the best databases for managing too much information. Is this supposed to be an alternative, or to act as a middle ground between Essbase and a MySQL server?
It won't deliver. In the meantime, for those of us living and working in the real world, hard drives will get bigger and faster, file systems will get better, and SSDs will start to shit all over spinning platters.
The grad students do all the work, and the professor takes all the credit. Anyone can come up with ideas; the real work is in actually getting things done. This is the reason I stopped grad school with my MS, even though I LOVE computer science more than anyone I've ever met.
Scalability is one thing, but what we appreciate in SQL-free databases is also that they don't require SQL.
When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.
The Tokyo Cabinet API is absolutely excellent in this regard. And there's no need to learn yet another domain-specific language like SQL; you just use the language you use for the rest of the app.
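To make the comparison concrete, here is a minimal sketch of both lookups. It does not use Tokyo Cabinet's real API. A plain dict wrapped in a hypothetical KeyValueStore class stands in for a key-value store, and Python's sqlite3 stands in for a relational database. The "users" table and the helper names are also made up for illustration. Note that the SQL side is only safe because it uses a `?` placeholder rather than pasting the id into the query string.

```python
import sqlite3

class KeyValueStore:
    """Hypothetical key-value store: records are looked up by id, no query language."""

    def __init__(self):
        self._data = {}

    def put(self, key, record):
        self._data[key] = record

    def get(self, key):
        return self._data.get(key)

def sql_get(conn, user_id):
    """Relational equivalent: a parameterized query. The ? placeholder keeps
    the value out of the SQL text, which is what prevents SQL injection."""
    row = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    return None if row is None else {"id": row[0], "name": row[1]}

if __name__ == "__main__":
    kv = KeyValueStore()
    kv.put(42, {"id": 42, "name": "alice"})

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", (42, "alice"))

    print(kv.get(42) == sql_get(conn, 42))  # True
```

The key-value call really is one line. The SQL version needs a schema, a query string and a row-to-record mapping, which is the overhead the comment is complaining about.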
Now, SQL-zealots would troll "but how would you do with ?".
And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.
But for the rest of us, SQL is cumbersome. Databases like MongoDB make you achieve similar results in a more natural way instead of forcing you to learn SQL and to rethink everything in a tabular way.
If it delivers, it will change a lot. Not for your average blogger with $10 hosting, WordPress and all 100 of his readers, but for all the folks whose sites are successful enough to outgrow what a single DB server can deliver. Right now you have to work really, really hard to make it all work with replication, as pretty much no free CMS offers data sharding. With this, you won't have to: just get a DB cluster (as a service) that works out of the box with little or no modification to the software you are using. The wall these sites currently hit, the point where they have to invest loads of money to keep growing, will be gone.
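For readers who haven't done it, here is a minimal sketch of the application-level sharding the comment says CMSes lack. The comment doesn't name a scheme. Simple hash-modulo routing is assumed here, and the ShardedStore class is hypothetical, with plain dicts standing in for separate DB servers.

```python
import hashlib

class ShardedStore:
    """Hypothetical application-level sharding: each key lives on exactly one
    of N backends (plain dicts here, standing in for separate DB servers)."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Use a stable hash: Python's built-in hash() of str is randomized per
        # process, so it would route the same key differently after a restart.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(100):
        store.put(f"post:{i}", {"title": f"Post {i}"})
    print(store.get("post:7"))
    print([len(shard) for shard in store.shards])  # rough spread across shards
```

Even this toy version shows where the work goes. Every read and write has to be routed, cross-shard queries have to be merged by hand, and with plain modulo routing, changing the number of shards remaps most keys (consistent hashing is the usual fix). A cluster that hides all this is the appeal being described.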
No offense to the creators (well, maybe some offense) but why the heck would you want to put MySQL in where PostgreSQL already was? That's like taking out your star quarterback and putting in, well, me!
"!"
I can't say I'm looking forward to bigger, faster, shit-covered platters...but hey. Who am I to stand in the way of progress?
Considering that you like to talk out your ass, your name would be "shit for brains".
Uber and super both mean "above", knucklehead. Same Proto-Indo-European root, in fact.
(In my best Special Ed impersonation)
Yaaaaaay, now we can scale out Hadoop! Yaaaaay! Yaaaay Hadoop! Yaaaaay!
Take a look at Intersystems Cache
Fastest object database in existence
Fastest SQL database in existence
Learn how it works and you will see how SQL is nothing more than a mighty kludge
We might create the software intending it to do and be used in one way, but how it will actually be used is determined by the users. Postgre and MySQL don't carry any intrinsic values, only the values which their users discover and, well, use. Without users they have no good or bad features.
So why is it that people feel the need to rally around or defend them? After all, only the developers who have done the work are capable of understanding the snipes and criticism leveled against them, and these are the people who have given their work away, to you and me.
MySQL excels at some things. Postgre also excels at some things. If users feel there is too much overlap then they can work to reproduce these features in a single tool, such as Postgre should they feel it has more utility. But to discount a tool many people find useful shows a core misunderstanding of what it is that determines the software's value.
Postgre cannot be better than MySQL; it can only provide varying degrees of value. And that value is determined by the user.
Quack, quack.
I firmly believe that any and all patents involving any naturally occurring human process or material should be abolished. Furthermore, some of those attorneys should be seeding our first body bank (nod to Niven and Pohl).
There are also two Hadoop subprojects that either support SQL or will shortly. They both translate SQL queries into map/reduce programs. They are:
http://hadoop.apache.org/pig/
http://hadoop.apache.org/hive/
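As a rough single-process illustration of what "translate SQL queries into map/reduce programs" means, here is a GROUP BY count expressed as map, shuffle and reduce. This is not Hive's or Pig's actual query plan, which runs as Java jobs on a Hadoop cluster. The "log" rows, the "page" column and the function names are hypothetical.

```python
from collections import defaultdict

# Single-process sketch of how
#   SELECT page, COUNT(*) FROM log GROUP BY page
# can be expressed as map -> shuffle -> reduce.

def map_fn(record):
    """Emit (group key, 1) for every input row."""
    yield record["page"], 1

def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """COUNT(*) for one group."""
    return key, sum(values)

def run_group_by_count(rows):
    intermediate = (pair for row in rows for pair in map_fn(row))
    return dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate).items())

if __name__ == "__main__":
    log = [{"page": "/"}, {"page": "/about"}, {"page": "/"}]
    print(run_group_by_count(log))  # {'/': 2, '/about': 1}
```

The GROUP BY column becomes the map output key, and the aggregate becomes the reduce function. A WHERE clause would become a filter inside map_fn.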
Taken from CloudBase's user doc (http://cloudbase.sourceforge.net/index.html#userDoc): CloudBase is a high-performance data warehouse system that scales horizontally on commodity hardware or a cloud computing network. It is developed by Business.com and released to the open source community under the GNU GPL license 2.0. Built on top of Hadoop's map-reduce architecture, CloudBase enables business analysts using ANSI SQL to directly query large-scale log files arising in web site, telecommunications, or IT operations. But, unlike other map-reduce approaches, it does not create or require the use of a programming language on top of map-reduce.
"Cheap 2.0".
For many reasons, in many Hadoop use cases schemas are not usable. Saying "but we need schemas for performance" is a fail. Also, this is not a fair comparison: Vertica uses compression, the others do not. This is biased beyond imagination.
But for the rest of us....
Sorry, but I couldn't help thinking of this line from "Life of Brian":
But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, and public health, what have the Romans ever done for us?
More seriously, if the main example of the trouble with SQL is that you want to be able to find a record by id with fewer keystrokes, I do not see how this can be so much of a problem.
Why can't
I think the summary was supposed to link to this: http://www.computerworld.com/s/article/9135726/Yale_researchers_create_database_Hadoop_hybrid?source=CTWNLE_nlt_dailyam_2009-07-21