Slashdot Mirror


Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed

ericatcw writes "The inaugural NoSQL meet-up in San Francisco during last month's Yahoo! Apache Hadoop Summit had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of burdensome, expensive relational databases in favor of more efficient and cheaper ways of managing data, reports Computerworld."

23 of 423 comments

  1. Quit Whining by KingPin27 · · Score: 5, Funny

    Just use flat text files --- no need for expensive DBs... think of the freedom!

    --
    "i lost my dignity on a slippery wiener"
    1. Re:Quit Whining by Anonymous Coward · · Score: 4, Insightful

      The horrible lag I get when using address completion in Firefox 3 makes me wish more people thought that way!

    2. Re:Quit Whining by Paradise+Pete · · Score: 4, Funny

      I've lost data in two filesystems thanks to the Slasher's shoddy work.

      Have you looked near Redwood Regional Park? On the side of a hill?

    3. Re:Quit Whining by jadavis · · Score: 4, Insightful

      One reason is that RDBMSs offer a lot of tools, like atomicity, durability, backup/restore, centralization, point-in-time recovery, etc. Many application developers need these things without actually needing the abstraction of a relational system.
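      Those guarantees are easy to see in miniature. The sketch below is an illustration, not anything the commenter wrote. It assumes Python's built-in sqlite3 module and a hypothetical accounts table, and it shows atomicity by letting a failed transfer roll back both of its updates.

```python
import sqlite3

def make_db():
    # Hypothetical schema; the CHECK constraint forbids negative balances.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts ("
                 "name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    # Using the connection as a context manager commits on success and
    # rolls back if an exception escapes: both UPDATEs happen or neither does.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = make_db()
transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 80}
try:
    transfer(conn, "alice", "bob", 500)  # would overdraw alice
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}: bob's credit was rolled back too
```

      The credit is deliberately applied before the debit, so the rollback visibly undoes a write that had already succeeded.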

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  2. This is what happens by Anonymous Coward · · Score: 4, Funny

    When you get a lot of morbidly obese nerds with no life to program for you.

    Meanwhile SQL users get laid.

    1. Re:This is what happens by Anonymous Coward · · Score: 5, Funny

      It's true. I do a lot of INNER JOINing. Often with multiple tables.

  3. Tilting at windmills by Anonymous Coward · · Score: 5, Insightful

    Seems to be a silly thing to be against. Relational databases and the Structured Query Language may not be perfect, but I bet these people could die in their 90s and people would still be using relational DBs and SQL.

    If they want to tout open or cheap DBs and more lightweight types of storage/DB servers, then they might have some points, but being against SQL is just plain dumb.

    1. Re:Tilting at windmills by Qzukk · · Score: 5, Insightful

      SQL isn't the only possible way to query relational databases. It's nice and does a really good job for even mildly complex queries, and I would not want to ditch it just yet, but seriously... who hasn't had a business need for multiple levels of aggregates (e.g., averages of sums across multiple groupings, say "average across all customers' total balances")? As it is, you end up splitting the logic between the database and the application, or creating a view of the first level of aggregation, then querying against that and hoping that the performance doesn't suck total ass.
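      For the two-level case described, standard SQL also allows a subquery in the FROM clause (a derived table) in place of a separate view. Here is a minimal sketch using Python's sqlite3 module and a hypothetical ledger table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO ledger VALUES (?, ?)", [
    ("ann", 10.0), ("ann", 30.0),  # ann's total balance: 40
    ("ben", 5.0),                  # ben's total: 5
    ("cy", 20.0), ("cy", 25.0),    # cy's total: 45
])

# Inner query: first level of aggregation (one total per customer).
# Outer query: second level (the average of those totals).
AVG_OF_TOTALS = """
    SELECT AVG(total)
    FROM (SELECT customer, SUM(amount) AS total
          FROM ledger
          GROUP BY customer) AS per_customer
"""
avg_total = conn.execute(AVG_OF_TOTALS).fetchone()[0]
print(avg_total)  # 30.0
```

      The same nesting can be written with a WITH clause (common table expression) in engines that support it. How well the optimizer handles nested aggregates is engine-specific, so this removes the view juggling but does not, by itself, settle the performance worry.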

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
  4. Yeah, so why are they better? by Anonymous Coward · · Score: 5, Insightful

    If I were to read the article, I bet somewhere someone would be wittering on about key-value datastores.

    They're the brainchild of a generation brought up on high-level collections: they learn one (in this case Map) and apply it to everything.

    Sadly SQL, and RDBMS, works for most people. It maps object data well (oh whaaaa, i have to do foreign keys - GROW SOME FUCKING BALLS YOU LAZY GRADUATE!) and it is well understood. And with abstractions like LINQ to query them, even the lazy dumb Windows .NET programmer doesn't have to strain their brain to learn SQL.

    And when you have terabytes of specific unique data, you clearly should go away and work out how best to store it. Even an RDBMS/SQL solution is too generic to be the right fit for every problem.

  5. Re:Flat Earth by MightyMartian · · Score: 5, Insightful

    And yet where are the other corporations: the oil companies, the banks, the large merchant conglomerates? In IT we seem to have this sort of myopic view that if it isn't an IT company of some kind, it doesn't exist. Google, compared to the huge companies that use tools like Oracle, is a bit player. I know that's hard to accept for those of us who have sucked at the teat of Silicon Valley for so long, but a significant amount of data handling that has nothing to do with social networking and finding pr0n goes on, and it does use tools like SQL.

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
  6. How about saying yes to the alternative by syousef · · Score: 4, Insightful

    Saying no to SQL and relational databases is just fine if you've got something better to replace it with. However I know of no such thing. The reason they're popular is that they are so powerful for data storage. If something better came along you wouldn't even need to say no to SQL. You'd just say yes to the newer better rival.

    --
    These posts express my own personal views, not those of my employer
  7. SQL is not a database by j.+andrew+rogers · · Score: 5, Insightful

    SQL is not a database, it is a standard interface to a feature set commonly associated with relational models. Before everyone standardized on SQL, there were other relational query languages. The "No" part of "NoSQL" refers to the fact that some basic elements of relational implementations cannot be usefully expressed using a much simpler distributed hash table model.
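    A toy sketch of the distributed-hash-table model, to show where the tradeoff bites. Everything here is hypothetical: Python dicts stand in for nodes, keys are placed by zlib.crc32 modulo the partition count, and the class is not modeled on any particular NoSQL product. Point reads and writes touch one partition; a range query has to visit every partition, because hashing scatters key order.

```python
import zlib

class HashPartitionedStore:
    """Toy key-value store: keys are spread over N partitions by hash."""

    def __init__(self, num_partitions=4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # crc32 is deterministic across runs, unlike Python's salted str hash().
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def put(self, key, value):
        # A point write touches exactly one partition.
        self._partition_for(key)[key] = value

    def get(self, key):
        # A point read also touches exactly one partition.
        return self._partition_for(key).get(key)

    def range_scan(self, lo, hi):
        # Hashing destroys key order, so a range query must visit
        # every partition and sort the matches afterwards.
        hits = []
        for part in self.partitions:
            hits.extend((k, v) for k, v in part.items() if lo <= k <= hi)
        return sorted(hits)

store = HashPartitionedStore()
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, len(user))
print(store.get("carol"))            # 5
print(store.range_scan("b", "d"))    # [('bob', 3), ('carol', 5)]
```

    A join is worse still: each matching key on one side may live on a different partition of the other, which is roughly why such systems leave joins to the application.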

    All "NoSQL" does is eliminate the parts of traditional relational databases that do not scale -- discarding the bottleneck rather than fixing it. These are things like joins and external indexing. Unfortunately, discarding those things means you discard a lot of very important functionality as a practical matter, notably the ability to do fast, complex analytics. Adopting the NoSQL architecture runs contrary to the trend toward more real-time, contextual analytical processing. There are a great many analytical applications that are not amenable to batch-mode pattern-matching, and the NoSQL model is a lot less applicable than I think some people want to acknowledge. In its domain, it is a great tool but it has many, many prohibitive limits. We are essentially trading power for scale.

    That said, do not take this as an endorsement of traditional SQL relational databases either, as they have a number of serious limitations themselves. As just mentioned, a number of the core analytical operations those models support are based on algorithms that scale poorly. The SQL language itself has mediocre support for many abstract data types (e.g. spatial) and data models (e.g. graph), which in part reflects the inadequacies of the assumed underlying database algorithms (e.g. B-trees) that are implicit in SQL. The inability to efficiently do event-driven/real-time applications is also more a reflection of the access methods used in databases than any intrinsic weakness in SQL; SQL may be clunky for that purpose, but that is not the real limiter.

    A truly revolutionary deviation from SQL would usefully implement a superset of the features SQL supports, not take them away. Of course, we would need access methods more capable than hash tables and B-trees to usefully implement those features, which is a lot more work than discarding features that scale poorly. NoSQL is a stopgap technical measure for that small subset of applications where the serious tradeoffs are acceptable.

  8. Pros & Cons of non-relational solutions by kpharmer · · Score: 5, Interesting

    Note that most of these solutions come from the interwebs, social networks, etc. And it isn't so much anti-sql as it is anti-relational database (sql != rdb).

    The basic premise is that we need different solutions that: can scale very high for very narrowly scoped reads & writes, don't need to perform ranged queries / reporting /etc, and don't need ACID compliance. And that may be the case. Sites like slashdot, facebook, reddit, digg, etc don't need the data quality that ebay needs.

    On the other hand, ebay achieves scalability AND data quality with relational databases. And when I've worked with architectures that scale massively and avoid the relational trap in favor of better solutions, the teams behind them inevitably come to regret the lack of data quality and the complete inability to actually get trends and analysis out of their data. It *always* goes like this:
        Me: So, is this thing (msg type, etc) increasing?
        Developer: No idea.
        Me: Ok, so let's find out.
        Developer: How?
        Me: I don't know - typical approach - let's query the database.
        Developer: It'll take four+ hours to write & test that query and then days to run. And when it's done we might find that we wrote the query wrong.
        Me: What?!?
        Developer: We had to do it this way, you can't report on 10TB databases anyhow
        Me: What?!? Are you on crack? There are dozens of *100TB* relational databases out there that people are reporting on
        Developer: well, we probably don't need to know what that trend is anyhow
        Me: I'm outta here

  9. Data out-lives applications by 4to6Offshore · · Score: 5, Insightful

    First, my mantra: data belongs to the organization, not the application. If the app fails and the data is accessible, we all go on; if the data fails or is locked away, what was the point of the app again?

    In a SQL database, the data is understood by the organisation, its DBAs, and its data architects. If it's left to app developers taking an app-centric approach to data... I get nervous quickly.

    As long as the data is just as definable and accessible as it is in current SQL databases, all good; give me an app with some odd-ball storage and it's bye-bye.

  10. Re:The problem is performance not SQL by oGMo · · Score: 4, Insightful

    It's just that now that we can assume local clusters and WANs worth of co-operating data stores, there are probably better, more performant ways of implementing persistence, replication, distribution of data than traditional RDBMS implementations.

    You can also assume magical fairy dust and free energy, but that doesn't make it so. You can ask if there are better ways, but you can't assume it, and in the end you will find there is no magic.

    Clusters and replication are NOT NEW. Not even remotely new. There is, in fact, nothing new architecturally at all that would indicate some new capability that hasn't already been repeatedly analyzed and tried. That doesn't mean you can't tweak something for a situation, or that you need a giant Oracle database for everything, but "the web" and "cheap hardware" change the equation by precisely nothing.

    What has changed the equation is cheap, unimportant data, which covers the majority of the web. "Real" applications, where data integrity is important (like say, your bank account), and immediate accuracy guaranteed, require the main thing you use a database for: data integrity. Your facebook page, your google search, that blog entry, or some video on youtube: these don't matter. If it's a little slow, or doesn't update immediately, or you get an error, no one is losing money. No one cares.

    In essence, if a reliable database isn't important for your app, your app isn't really handling important data. This may be fine; in the mainstream, there's a lot of noncritical stuff. But this doesn't make databases unimportant.

    --

    Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

  11. Re:A time and place for everything by E+IS+mC(Square) · · Score: 5, Interesting

    >> Trees are a well-known problem of SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one-dimensional ones.

    Sorry, that's not true. Have you tried analytical functions? You would be amazed how easily complex scenarios can be handled with them. They are part of the ANSI SQL standard, and DB vendors (Oracle etc.) have taken the concept and improved on it a lot.
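    For readers who haven't met them: analytical (window) functions compute a value for each row from a window of related rows, which is exactly the "relationship between rows" that plain GROUP BY can't express. Here is a small sketch. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions) and a hypothetical sales table. It computes a moving average over neighboring days and a sales rank.

```python
import sqlite3

# Window functions need SQLite 3.25 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0), sqlite3.sqlite_version

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day_number INTEGER PRIMARY KEY, sales REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)])

QUERY = """
    SELECT day_number,
           -- average of the previous, current and next rows in day order
           AVG(sales) OVER (ORDER BY day_number
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg,
           -- rank of each day by sales, highest first
           RANK() OVER (ORDER BY sales DESC) AS sales_rank
    FROM sales
    ORDER BY day_number
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# (1, 15.0, 5)
# (2, 20.0, 4)
# (3, 30.0, 3)
# (4, 40.0, 2)
# (5, 45.0, 1)
```

    At the edges the window simply shrinks to two rows, so the first and last days still get an average.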

    I think the anti-SQL 'movement' has more to do with new (internet-era) languages and their developers than with any so-called 'lack' of features. In my limited experience, people coming from a C (or similar) background have no problem with SQL, while Java developers (and this is probably true of most developers working on web-based applications) are the worst when it comes to understanding even the basics of SQL. All they want is their objects.

    I strongly believe that a competent programmer designing or developing a system that includes data and data storage should at least know normalization, indexes, and what 3NF means. The programming language is one thing, the database is another, and knowledge of both is required to build a decent system.

  12. RDBMS and application logic by gd2shoe · · Score: 4, Insightful

    That is one view. It's nice and all, but incomplete. The issue is performance.

    Any time you're dealing with a large quantity of data, it's always easiest to process or filter where it's located. Transmitting it, processing it, and transmitting back changes adds an unreasonable amount of overhead. Hence, SQL is a "Query" language. In other words, you have the RDBMS do reasonable data processing and filtering of records for you. Your application should only need to specify the operations performed, and should only process data if your computation is particularly unusual. This makes feasible computations that would otherwise be entirely unreasonable. (note that an application working on the same machine generally has the same issue as one working on a separate system. SQL servers present the application with a stream of data - pipe, socket, etc)
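    The row counts make this point concrete. Here is a sketch using Python's sqlite3 and a hypothetical orders table. Both approaches produce the same sum, but one ships every row to the application while the other ships a single row. SQLite runs in-process, so the "transfer" here is cheap; against a networked server those rows would cross the network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
conn.executemany("INSERT INTO orders (region, total) VALUES (?, ?)",
                 [("east" if i % 2 else "west", float(i)) for i in range(1, 1001)])

# Approach 1: ship every row to the application, then filter and sum there.
all_rows = conn.execute("SELECT region, total FROM orders").fetchall()
app_side = sum(total for region, total in all_rows if region == "east")

# Approach 2: let the database filter and aggregate; one row comes back.
db_rows = conn.execute(
    "SELECT SUM(total) FROM orders WHERE region = ?", ("east",)).fetchall()
db_side = db_rows[0][0]

print(len(all_rows), len(db_rows))  # 1000 1
print(app_side == db_side)          # True
```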

    My opinion: SQL is horrendous. It's a pain to use, and many basic data transforms cannot be described in that language (at least without some huge, awful, convoluted command == maintenance nightmare).

    --
    I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
  13. Re:A time and place for everything by julesh · · Score: 4, Informative

    SQL is great for financial data.

    Actually, this isn't true either. See this article for pointers to some of the failings of SQL in dealing with financial data, particularly time series (e.g. sales figures, share prices, etc.). Here's another take on the problem, which essentially is that SQL doesn't recognise that there can be relationships between the rows of a table (e.g., "this happened after this").

  14. Re:A time and place for everything by diamondmagic · · Score: 5, Informative

    Design an efficient table relating a tree structure.

    Huh? Tree structures are best handled by relational databases, as this is far faster than recursion. Give each row a unique ID and a parent ID, plus a left-hand and a right-hand number: the root node has a left-hand value of 1 and a right-hand value of (number of rows * 2); a first child has a left-hand value one more than its parent's; and a node's right-hand value is one less than the left-hand value of its next younger sibling (or, for the last child, one less than its parent's right-hand value).

    Then design queries to answer questions such as:
    * Find the nodes in the subtree under B.

    SELECT * FROM rows WHERE left > [left hand value of B] AND right < [right hand value of B]

    * Find all ancestors of G

    SELECT * FROM rows WHERE left < [left hand value of G] AND right > [right hand value of G]

    * Find the nearest common ancestor of D and H

    SELECT * FROM rows WHERE left < [lowest left hand value from D,H] AND right > [highest right hand value from D,H] ORDER BY right LIMIT 1
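    Those queries can be run end to end. Here is a sketch that assumes Python's sqlite3 and a small hypothetical tree (A has children B and C; B has D and E; E has G; C has H), numbered by a depth-first walk as described. The columns are named lft and rgt because LEFT and RIGHT are reserved words in SQL and would otherwise need quoting. As in the queries above, the strict inequalities exclude the node itself, so the "common ancestor" of a node and one of its own descendants comes back as that node's parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
# Depth-first numbering: 7 nodes, so the root spans 1..14.
NODES = [("A", 1, 14), ("B", 2, 9), ("D", 3, 4), ("E", 5, 8),
         ("G", 6, 7), ("C", 10, 13), ("H", 11, 12)]
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", NODES)

def bounds(name):
    return conn.execute("SELECT lft, rgt FROM tree WHERE name = ?", (name,)).fetchone()

def subtree(name):
    # Descendants lie strictly inside the node's (lft, rgt) interval.
    lft, rgt = bounds(name)
    return [r[0] for r in conn.execute(
        "SELECT name FROM tree WHERE lft > ? AND rgt < ? ORDER BY lft", (lft, rgt))]

def ancestors(name):
    # Ancestors' intervals strictly enclose the node's interval.
    lft, rgt = bounds(name)
    return [r[0] for r in conn.execute(
        "SELECT name FROM tree WHERE lft < ? AND rgt > ? ORDER BY lft", (lft, rgt))]

def nearest_common_ancestor(a, b):
    # The innermost enclosing interval is the one with the smallest rgt.
    (la, ra), (lb, rb) = bounds(a), bounds(b)
    row = conn.execute(
        "SELECT name FROM tree WHERE lft < ? AND rgt > ? ORDER BY rgt LIMIT 1",
        (min(la, lb), max(ra, rb))).fetchone()
    return row[0] if row else None

print(subtree("B"))                        # ['D', 'E', 'G']
print(ancestors("G"))                      # ['A', 'B', 'E']
print(nearest_common_ancestor("D", "H"))   # A
print(nearest_common_ancestor("D", "G"))   # B
```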

    Trees are a well-known problem of SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one-dimensional ones.

    Are you saying trees are easy or hard? And for more complex systems, that is what JOINs are for. SQL is by far the most powerful way, and often the fastest way, to manipulate data that I know of. The only time I can recall having to use a non-SQL solution that was faster than the SQL solution was a matrix operation.

  15. I don't understand by 93+Escort+Wagon · · Score: 4, Funny

    So a bunch of Excel users got together for dinner in San Francisco - why is this news?

    --
    #DeleteChrome
  16. The RDBMS responds to the troll by smittyoneeach · · Score: 5, Funny

    See, I don't think there is ever a good time or place for SQL.

    SELECT text FROM mild_introductory_statements WHERE id=random();

    Anyone who says so has never had to use it.

    SELECT text FROM statements_indicating_superior_experience WHERE id=random();

    I like to compare it with JavaScript.

    SELECT text FROM unrelated_tool WHERE id=random();

    It's a language that is difficult to refactor, maintain, and while it's a standard, the standard is so vague that it's useless.

    SELECT text FROM seemingly_valid_yet_unsubstantiated_objections WHERE id=random();

    Like JavaScript, people are trying to build other languages on top of it to hide its shortcomings -- for javascript you have tools like GWT, and for SQL you have HQL, Linq, etc.

    SELECT text FROM wrongheaded_causal_analysis WHERE id=same_one_as_two_queries_ago();

    Not to say that there is anything wrong with relational databases, we just lack a good tool to interface with them.

    SELECT text FROM reasonable_sounding_parthian_shot_to_obscure_trolling WHERE id=random();

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  17. Re:A time and place for everything by TheNarrator · · Score: 5, Insightful
    I think the main problem that the web 2.0 dynamic language crowd has with RDBMSs is that:
    1. Relational data is strongly typed. You cannot easily add new fields to a table or store arbitrary types in a column and expect acceptable performance.
    2. Migrating large amounts of relational data to a new structure takes a very, very long time. Constant refactoring of data models is to be avoided. You have to get it right the first time, or at least very early in the development cycle, to avoid major headaches.
    3. Databases are hard to mock in a testing context. Automated tests can be significantly slowed down even by a dedicated test database.
    4. Errors in database architecture are very difficult to correct due to 1 and 2, especially when used with a dynamically typed language.
    5. In highly scalable distributed systems, it's difficult to maintain the data integrity that RDBMSs take for granted while keeping acceptable performance.

    The only real show stopper, and the only real reason to replace RDBMSs, is #5. All the others can be worked around through deeper study of data modeling techniques. Data modeling is not something most developers can figure out intuitively. There is a lot of theory to learn to do it right, and it can very easily be done badly, leading to severe performance problems and an unmaintainable application.
    With regard to #5: I went to a presentation at JavaOne where some eBay engineers explained that they do not use transactions in any of their database operations. They just leave junk rows around in the DB if an operation half-completes, and as long as those rows aren't reachable they don't consider it a big deal. They have to very carefully organize the order in which they manipulate data to avoid data corruption, but that lets them get around #5.
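    Here is a rough sketch of that write-ordering idea, with some loud assumptions. This is not eBay's code, the tables and functions are hypothetical, and Python's sqlite3 in autocommit mode stands in for a store without multi-statement transactions. The row that nothing points to yet is written first, and the write that makes it reachable goes last, so a crash in between leaves only an unreachable junk row.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode:
# every statement commits on its own, with no enclosing transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE listing_details (listing_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE seller_listings (seller TEXT, listing_id INTEGER)")

def create_listing(seller, listing_id, title, crash_before_publish=False):
    # Step 1: write the row that nothing references yet.
    conn.execute("INSERT INTO listing_details VALUES (?, ?)", (listing_id, title))
    if crash_before_publish:
        raise RuntimeError("simulated crash between writes")
    # Step 2: the write that makes step 1 reachable goes last.
    conn.execute("INSERT INTO seller_listings VALUES (?, ?)", (seller, listing_id))

def visible_listings(seller):
    # Readers only reach details through seller_listings,
    # so orphaned detail rows never show up.
    return [r[0] for r in conn.execute(
        "SELECT d.title FROM seller_listings s "
        "JOIN listing_details d ON d.listing_id = s.listing_id "
        "WHERE s.seller = ? ORDER BY d.listing_id", (seller,))]

create_listing("sam", 1, "lamp")
try:
    create_listing("sam", 2, "desk", crash_before_publish=True)
except RuntimeError:
    pass  # the "desk" detail row stays behind as unreachable junk
print(visible_listings("sam"))  # ['lamp']
```

    The cost is the one described: orphaned rows accumulate and need some separate cleanup, and every multi-row change has to be ordered this carefully by hand.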

  18. Re:A time and place for everything by Kjella · · Score: 4, Informative

    1) 1996 called, they want their arguments back. For example, most RDBMS have ranking functions now.
    2) Even in 1996, he doesn't know SQL worth shit

    SELECT now.day_number,
           (prev.sales + now.sales + next.sales) / 3.0 AS three_day_average
    FROM sales prev,
         sales now,
         sales next
    WHERE prev.day_number = now.day_number - 1
      AND next.day_number = now.day_number + 1

    Easy as pie, and it covers most of the calculations he wants. Maybe he should ask someone knowledgeable in SQL?
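    Here is Kjella's self-join, run against a hypothetical five-day sales table using Python's sqlite3. It differs in two small ways from the query as posted. It divides by 3.0, since with integer sales SQLite (like many engines) would otherwise do integer division. It also selects the day, so each average is labeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day_number INTEGER PRIMARY KEY, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])

# Self-join: each "cur" row is paired with the rows for the day before and after.
THREE_DAY_AVG = """
    SELECT cur.day_number,
           (prev.sales + cur.sales + nxt.sales) / 3.0 AS three_day_average
    FROM sales prev, sales cur, sales nxt
    WHERE prev.day_number = cur.day_number - 1
      AND nxt.day_number = cur.day_number + 1
    ORDER BY cur.day_number
"""
result = conn.execute(THREE_DAY_AVG).fetchall()
print(result)  # [(2, 20.0), (3, 30.0), (4, 40.0)]
```

    The first and last days drop out because each lacks a neighbor on one side, and the join assumes day_number has no gaps.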

    --
    Live today, because you never know what tomorrow brings