Why Some Devs Can't Wait For NoSQL To Die

Posted by Soulskill on Sunday March 28, 2010 @03:29AM from the must-be-the-insurance-policy dept.

theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."

Article summary by BadAnalogyGuy · 2010-03-28 03:31 · Score: 5, Funny

People who don't like SQL should get their heads out of their asses and use MySQL, a robust and enterprise-ready database.
Interesting thesis...
1. Re:Article summary by digitalunity · 2010-03-28 03:37 · Score: 5, Insightful
  
  My experience has made me believe PostgreSQL is better in every respect. It's more stable, has more features and is easier to use. The article wasn't specifically pro-MySQL.
  The article is largely correct. The movement to ditch SQL databases is really naive. SQL scales just fine, if you know how to use it right. Look at Oracle solutions. All their fancy eBusiness software is still Oracle SQL DB backed and some of the biggest companies in the world are using it.
  SQL isn't the problem, it's a tool. Bad programmers are the problem.
  
  --
  You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
2. Re:Article summary by RedMage · 2010-03-28 03:50 · Score: 4, Interesting
  
  We're using both - about five days from our "go-live", and things look good. We just use what makes sense for each part of our application.
  For us, this means PostgreSQL for the parts that must be transactional ACID, and Amazon's S3 and SimpleDB for parts that don't. In practice, for the 1.0 release, this means things like notes, user accounting, and documents are in S3 and SDB. The rest is plain ole SQL.
  Not that there wasn't a learning curve with our developers - we're a bunch of old-time enterprise type developers, so "letting go" and moving out of the traditional SQL world took a little thought and proving time. We'll use the first few months to learn more about doing architecture this way.
  We've had the language wars - let's avoid the SQL/NOSQL wars please. I'm tired.
  
  --
  }#q NO CARRIER
3. Re:Article summary by amorsen · 2010-03-28 03:51 · Score: 1
  
  SQL isn't the problem, it's a tool. Bad programmers are the problem.
  Relational databases are quite useful. It's too bad they're hampered by such a lousy syntax though. It's like if we all decided to stick with COBOL but added closures and templates and whatnot...
  
  --
  Finally! A year of moderation! Ready for 2019?
4. Re:Article summary by slim · 2010-03-28 03:54 · Score: 4, Insightful
  
  Look at Oracle solutions. All their fancy eBusiness software is still Oracle SQL DB backed and some of the biggest companies in the world are using it.
  Yep, "nobody ever got fired for choosing Oracle".
  But to get performance and fault tolerance for Oracle, you need to throw a lot of money at it -- high end hardware, RAC licenses etc. Whereas some of the NoSQL DBs promise lots of scalability on clusters of cheap hardware -- situations where failing hardware is the norm.
  If your application suits it (i.e. your data fits the name/value system, and eventual consistency is adequate) why not use something fast and cheap?
5. Re:Article summary by Nerdfest · 2010-03-28 03:56 · Score: 1
  
  Sure you can scale SQL databases. The real point is that it takes a lot more work to do it than with a NoSQL database, and in some cases the advantages of SQL aren't worth the hassle. It depends on what problem you're trying to solve and what your other constraints are.
6. Re:Article summary by deniable · 2010-03-28 04:02 · Score: 4, Insightful
  
  Some of us are simply looking to not use the relational model for *every* bit of data in the system. Application global, put it in a table. Uploaded files, put them in a table. User data, get it from LDAP, nah, create our own table and get somebody to feed it manually. Given the number of apps I've seen that use SQL as a simple key/value store, it's no wonder that there are techniques to avoid the overhead completely.
7. Re:Article summary by c-reus · 2010-03-28 04:03 · Score: 5, Funny
  
  Oracle database license prices scale very well, too.
8. Re:Article summary by squiggleslash · 2010-03-28 04:20 · Score: 5, Interesting
  
  There's a fairly obvious reason for NoSQL vs Pro-SQL, and it's this: SQL is absolutely the worst database query language ever invented... apart from all the others.
  Virtually no-one who's spent any time analyzing and working with large amounts of data has a good word to say about SQL. It was designed from the start as a language that would be integrated into others, and yet simple real world realities make that impossible, with 99% of implementations being of the "Build a large string, and pass that string to "the SQL connector" to be parsed and interpreted" form. Its handling of null and the empty string is incomprehensible and useless, in part because nobody involved ever had the cojones to do what needed to be done with both. There is no standardized set of data types in the real world. Simple issues with unstandardized case dependencies can make an application that works with Oracle and only uses standard "select" statements not work under, say, PostgreSQL. And these are the surface level technical issues: talk to any relational database guru and they'll come up with numerous philosophical issues too.
  To this you add another component that's always an issue: the entirely haphazard way in which relational databases are implemented on most operating systems, whereby the DBMS is another application, that manages its own files, and needs to be coached with kind words and a happy smile in order to get anything done. Does your app use a database for something back-endy, like, for example, MythTV does for its settings and lists of channels and TV programs? Well, either forget it, or be prepared to put your users through hell as they have to ensure that the entirely separate DBMS is installed and that usernames and passwords are set up for your application's use.
  And so, naturally, people hate them. With a passion. To the point that anyone sane is going to put it low on the list for any application, even when it's entirely appropriate. Of course your multiuser databases in your enterprise environment should be stored using an enterprise grade RDBMS, and as nobody's come up with anything better, you should be talking to it using SQL.
  ...and you should be talking to it carefully. Ideally, those writing the application core should be handing over the database access to someone who can abstract each query properly. Because SQL sucks. It just sucks less than anything else designed to do the same thing.
  
  --
  You are not alone. This is not normal. None of this is normal.
9. Re:Article summary by nacturation · 2010-03-28 04:27 · Score: 2, Interesting
  
  I would also fire anyone who specifies MSSQL - with immediate effect, and no severance pay: On grounds of insubordination, incompetence and reckless endangerment.
  So it's a no-go on MSSQL for that Microsoft contract your company just got? Of course, you didn't specify the type of work your company does so this attitude comes across as being rather narrow-minded. And good luck on that no severance pay thing. "I'd fire anyone in my organization who suggested we callously disregard labor laws like that." :)
  
  --
  Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
10. Re:Article summary by timeOday · 2010-03-28 04:28 · Score: 1
  
  I agree, the syntax of PROLOG, for example, seems much simpler, more powerful, and makes more sense to me. There have been many attempts to fuse it with SQL, but nothing past the scale of a few researchers from different universities working together.
  But when all is said and done, you can get familiar with most of SQL in a couple weeks. No doubt mastering all the intricacies of Oracle takes years, but not, I think, due to the SQL syntax.
11. Re:Article summary by seanadams.com · 2010-03-28 04:37 · Score: 5, Insightful
  
  Does your app use a database for something back-endy, like, for example, MythTV does for its settings and lists of channels and TV programs? Well, either forget it, or be prepared to put your users through hell as they have to ensure that the entirely separate DBMS is installed and that usernames and passwords are set up for your application's use.
  
  sqlite is underrated and would be ideal for many such applications.
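
A minimal sketch of that idea, using Python's standard `sqlite3` module: the settings live in one file (or in memory, as here), so there is no separate server to install and no usernames or passwords to configure. The table layout and the `recording_dir` setting are made up for illustration and are not taken from MythTV.

```python
import sqlite3


def open_settings(path=":memory:"):
    """Open (or create) a single-file settings store; no server or login needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def set_setting(conn, key, value):
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_setting(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default


conn = open_settings()
set_setting(conn, "recording_dir", "/var/lib/recordings")
set_setting(conn, "recording_dir", "/mnt/media/recordings")  # update in place
print(get_setting(conn, "recording_dir"))
print(get_setting(conn, "missing", "n/a"))
```

Passing a file path instead of `:memory:` would persist the settings next to the application's other data.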
12. Re:Article summary by ducomputergeek · 2010-03-28 04:41 · Score: 4, Interesting
  
  I don't have mod points, but I've found the same thing. It's the perfect development database if you think that your program is ever going to need to support Enterprise class stuff. On the small scale, I've found that it's fast enough. Is MySQL faster? Yes, but where I've tested it's not been enough to really matter compared to the other advantages of PostgreSQL. Primarily that it's ACID compliant. What we've found is that it works well until you start getting into databases that are GB in size. But then you can easily port the datatables to DB2 or Oracle and go. Especially if you designed the rest of the software to do this from the get go.
  In production, we moved all but one of our databases from MySQL to PostgreSQL. We were having problems with InnoDB getting corrupted once every couple of months. When it was announced that Oracle was bidding on Sun, we ported over to PostgreSQL, spent a couple weeks rewriting code, and we've not touched the Postgres database since. It's not been corrupted and has not even hiccuped once since we deployed. We run regular vacuuming and maintenance and that's it. It's been humming for well over a year and is now getting 400x the use we ever had with MySQL.
  The only thing that PostgreSQL was lacking has been HA support. There are a number of 3rd party tools that run well, PGCluster, Slony, GridSQL, but it looks like PostgreSQL is going to support native replication, clustering, and HA with hot-standby...
  
  --
  "The problem with socialism is eventually you run out of other people's money" - Thatcher.
13. Re:Article summary by MooUK · 2010-03-28 04:41 · Score: 1
  
  Would it not have been less complex to use PostgreSQL for everything, or was there enough difference to be worth the complexity?
14. Re:Article summary by Vancorps · 2010-03-28 04:44 · Score: 4, Insightful
  
  What product has Oracle ever dropped support for? What is your objection to MSSQL? SQL 2005/2008 are damn fine products which perform extremely well. Sounds to me like you're the one that is ignorant with blanket policies against industry standard tools.
  Of course I run Oracle, MySQL, and MS SQL in my datacenter all without problems and some under nice and heavy loads. About the only sensible stance you have is with Postgresql which is far and away better than MySQL which in my opinion sucks pretty bad.
15. Re:Article summary by JamesP · 2010-03-28 04:51 · Score: 2, Interesting
  
  SQL isn't the problem
  Yes, it is
  Overhead caused by structuring your data the way relational dbs needs.
  Lack of flexibility
  Scalability capabilities (horizontal scaling is easier)
  Speed (see overhead)
  
  --
  how long until /. fixes commenting on Chrome?
16. Re:Article summary by SanityInAnarchy · 2010-03-28 04:51 · Score: 2, Interesting
  
  SQL isn't the problem, it's a tool. Bad programmers are the problem.
  You could say the same about assembly language. You could also say the same about threads, and dismiss things like functional programming and the actor model as fads.
  I'll give you a simple example: Given a big transactional SQL database, if you want it to scale to more than a few machines, you're going to want to shard it. That's going to be a ton of manual work, figuring out what you can shard, what keys to shard it on, adjusting it later on the fly to ensure that each DB server has exactly what it can handle in terms of data and load, and so on. You might be able to write software to do this for you, but that software is going to be fairly tightly coupled to your data model and your app.
  It's possible I'm missing something there, and it's possible there's an easier way to do it, but it seems like every way to scale SQL has similar tradeoffs. Put a proxy in front of your DB cluster, giving the impression of a single database out of those shards? Your app is now not talking directly to the database, and certain queries won't be supported, and certain other queries will be slow or unreliable.
  The database I'm working with now is Google AppEngine. It's pretty much natively sharded, and the tradeoffs are understood up front -- you can only transact over entities in the same group, but if your app is built up front to define entity groups appropriately, Google can physically shard them for you. It's a similar advantage to using Erlang for concurrency -- you probably won't be running your Erlang app on a machine with several thousand cores, but if you've got several thousand concurrent actors, it will trivially scale to anything in between.
  Like Erlang, it's also not a magic bullet. I still use SQL in things like SQLite, because it's the best tool for the job.
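
One way to picture the manual sharding work described above is the naive hash-modulo routing below. This is a sketch under stated assumptions, not how AppEngine or any particular proxy does it: the server names and the `customer:N` key format are hypothetical. It shows that keeping a key on one shard is easy, while "adjusting it later on the fly" is not, because adding a server changes where most keys live.

```python
import hashlib


def shard_for(key, shards):
    """Route a key to a shard; the same key always maps to the same shard."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


four = ["db0", "db1", "db2", "db3"]  # hypothetical server names
five = four + ["db4"]                # the same cluster after adding one server

keys = ["customer:%d" % n for n in range(1000)]

# Count the keys whose home shard changes once the fifth server is added.
moved = sum(1 for k in keys if shard_for(k, four) != shard_for(k, five))
print("keys that change shard:", moved, "of", len(keys))
```

With plain modulo routing, roughly four keys in five are expected to move when going from four to five servers, which is why real deployments tend to reach for lookup tables or consistent hashing instead.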
  
  --
  Don't thank God, thank a doctor!
17. Re:Article summary by Anonymous Coward · 2010-03-28 05:01 · Score: 2, Interesting
  
  ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
18. Re:Article summary by dimeglio · 2010-03-28 05:04 · Score: 3, Insightful
  
  I suppose it's the same argument when having to choose a development language. You got to pick 4GL languages, VB, Pascal, c++/Java/c#, assembly, and machine languages. The art of a great analyst is to know which to pick and when.
  
  --
  Views expressed do not necessarily reflect those of the author.
19. Re:Article summary by mjwalshe · 2010-03-28 05:21 · Score: 1
  
  Anne, good luck at an IT with that argument :-) If it was Access I could agree with you :-)
20. Re:Article summary by shmlco · 2010-03-28 05:23 · Score: 1, Funny
  
  "I would definitely fire anyone who specifies Oracle in my organisation."
  Since you can't even spell organization, I doubt you're in a position to fire anyone at all... (grin)
  
  --
  Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
21. Re:Article summary by Anne Thwacks · 2010-03-28 05:23 · Score: 1, Troll
  
  What product has Oracle ever dropped support for?
  1) Oracle Power Objects
  2) OS/2 support
  Both of these cost me hideous amounts of money. I do telecomms billing, and if you think you can do that with MSSQL (which can't handle timestamps properly), then think again. I certainly can't afford to have another clusterfuck involving MSSQL - if a client wants to use it, let him go elsewhere. I don't want him suing me cos his billing data is worthless. Yes, I know someone else will get the business - I have had bills from several telecomms companies who probably do use MS software! If my competition gets sued, it suits me fine.
  
  --
  Sent from my ASR33 using ASCII
22. Re:Article summary by TheLink · 2010-03-28 05:34 · Score: 2, Interesting
  
  Just curious - in what way can't MSSQL handle timestamps properly?
23. Re:Article summary by $RANDOMLUSER · 2010-03-28 05:36 · Score: 1
  
  "I would definitely fire anyone who specifies Oracle in my organisation."
  
  Since you can't even spell organization, I doubt you're in a position to fire anyone at all... (grin)
  
  Actually, it was Noah Webster who couldn't spell organisation.
  
  --
  No folly is more costly than the folly of intolerant idealism. - Winston Churchill
24. Re:Article summary by mkavanagh2 · 2010-03-28 05:40 · Score: 1, Funny
  
  i don't think you can read. pretty marvellous that you still managed to post, though!
25. Re:Article summary by sourcerror · 2010-03-28 05:40 · Score: 1
  
  Yeah, use just OWL for everything ;)
26. Re:Article summary by binkzz · 2010-03-28 05:50 · Score: 1
  
  Bad programmers are the problem..
  Gullible management is the problem IMHO.
  
  --
  'For we walk by faith, not by sight.' II Corinthians 5:7
27. Re:Article summary by Phroggy · 2010-03-28 05:50 · Score: 2, Informative
  
  ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
  But if MythTV takes twice as many milliseconds to read a channel listing, it really doesn't matter. Nobody's suggesting that SQLite can replace a real database server in all cases, but performance and scalability are completely unimportant in some applications.
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
28. Re:Article summary by spongman · 2010-03-28 05:52 · Score: 3, Insightful
  
  MSSQL's lock escalation isn't as efficient as Oracle's, but that doesn't make it a toy.
29. Re:Article summary by binkzz · 2010-03-28 05:53 · Score: 1
  
  "I would definitely fire anyone who specifies Oracle in my organisation."
  Since you can't even spell organization, I doubt you're in a position to fire anyone at all... (grin)
  It's correct spelling in British English.
  
  --
  'For we walk by faith, not by sight.' II Corinthians 5:7
30. Re:Article summary by TheLink · 2010-03-28 05:53 · Score: 4, Interesting
  
  The syntax might be crap, but it's far easier to get everyone to standardize on SQL to talk to DBs.
  
  "NoSQL" stuff is fine if your company is simple in structure - very few products/services, and it has to write most of that stuff itself anyway.
  
  When you have many different departments with their own different apps (in house and 3rd party), and they all want to access the same bunch of databases, SQL just becomes the "standard API or language" you use to talk to them. In contrast say you have some custom "NoSQL" DB, it's going to be harder to find stuff that talks to it (you might have to write your own connectors).
  
  It's just like "English", the syntax might be crap, but it's far easier to get 3rd parties and other departments to use it. In contrast if you use Lojban, despite its supposed advantages you're probably going to have to get translators (or worse - train your own translators) whenever you need to deal with outsiders who don't speak it.
31. Re:Article summary by mcrbids · 2010-03-28 05:56 · Score: 5, Insightful
  
  Virtually no-one who's spent any time analyzing and working with large amounts of data has a good word to say about SQL.
  
  I've spent 10 years developing intensively relational applications with SQL. I love it!
  It was designed from the start as a language that would be integrated into others, and yet simple real world realities make that impossible, with 99% of implementations being of the "Build a large string, and pass that string to "the SQL connector" to be parsed and interpreted" form.
  So... because people don't bother to learn about things like prepared statements, the tool is bad? It's like saying that cars suck because they don't have cruise control!
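
To make the contrast concrete, here is a small sketch using Python's standard `sqlite3` module, whose `?` placeholders bind values separately from the SQL text. The table and the hostile input are invented for the example; other drivers use different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"  # made-up input that tries to rewrite the query

# "Build a large string": the input becomes part of the SQL text itself.
built = "SELECT name FROM users WHERE name = '" + hostile + "'"
leaked = conn.execute(built).fetchall()

# Placeholder: the value is bound as data and never parsed as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(leaked), len(safe))  # prints "2 0"
```

The string-built query returns every row because the quote in the input closes the literal early; the bound version simply looks for a user with that odd name and finds none.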
  Its handling of null and the empty string is incomprehensible and useless, in part because nobody involved ever had the cojones to do what needed to be done with both.
  OK, so enlighten us with your brilliance! Share with us the ultimate answer of what should be done to differentiate a null (logically, "I don't know") with a blank string (logically, "We know there's nothing there") and what should be done differently?
  IMHO, the concept of "null" is a very useful one which allows a developer to differentiate between a blank answer and a no answer.
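
A short sketch of that distinction, again with Python's `sqlite3` module and an invented `people` table: `None` is stored as NULL ("I don't know"), while `""` is stored as an empty string ("we know there's nothing there"), and the two are queried differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, middle_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", None), ("bob", ""), ("cat", "Lee")],  # None is stored as NULL
)

# NULL rows are found with IS NULL.
unknown = conn.execute(
    "SELECT name FROM people WHERE middle_name IS NULL"
).fetchall()

# Empty strings are ordinary values and match with =.
blank = conn.execute(
    "SELECT name FROM people WHERE middle_name = ''"
).fetchall()

# Comparing with = NULL is never true, so this matches no rows at all.
eq_null = conn.execute(
    "SELECT name FROM people WHERE middle_name = NULL"
).fetchall()

print(unknown, blank, eq_null)  # [('ann',)] [('bob',)] []
```

Not every engine behaves this way: Oracle treats an empty `VARCHAR2` string as NULL, which is part of what the complaint being quoted is about.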
  
  There is no standardized set of data types in the real world. Simple issues with unstandardized case dependencies can make an application that works with Oracle and only uses standard "select" statements not work under, say, PostgreSQL.
  
  Woah, hold on there boy! You mean to say that features specific to one database engine won't work with another? Well spank my uncle and grease my kittens - this is amazing! Unless, of course, you stick to ANSI 92 syntax, which is pretty much 100% compatible. Yes, there's some regression testing you'll have to do against the different databases. Just like you have to do with HTML, XML, or any other standards-based language.
  (yawn)
  
  And these are the surface level technical issues: talk to any relational database guru and they'll come up with numerous philosophical issues too.
  Strange how you didn't manage to name even one?
  But here's the part of this whole "NoSQL vs SQL" debate - SQL is an interface API to a DBMS, it's not the database itself! You can use any number of technologies "under the hood" including those types of technologies commonly referred to as "NoSQL" and put an SQL interface in front! The whole idea that SQL is somehow the problem is just.... idiotic and betrays an astonishing lack of understanding by the programmer(s) involved.
  It's like saying that you should have a stick-shift car because automatic transmissions don't go as fast. It's just moronic. Arguing about NoSQL is like arguing with a tea party dolt about the "socialist" health care plan that just passed! (that was first drafted by the "right wingers" 15 years ago)
  It's argument from stupidity.
  
  --
  I have no problem with your religion until you decide it's reason to deprive others of the truth.
32. Re:Article summary by spongman · 2010-03-28 05:56 · Score: 2, Informative
  
  MSSQL's TIMESTAMP is non-standard. so if you're trying to port 'standard' SQL code from the mythical standard DBMS in the sky, then you've got some work cut out for you.
33. Re:Article summary by RightSaidFred99 · 2010-03-28 05:57 · Score: 3, Funny
  
  Do you like to come on the Internets and prove you don't know what you're talking about, or do you sometimes get drunk and can't help yourself?
34. Re:Article summary by RedMage · 2010-03-28 05:59 · Score: 2, Interesting
  
  Would it not have been less complex to use PostgreSQL for everything, or was there enough difference to be worth the complexity?
  Turns out, yes and no. We're distributed already, so it would have entailed setting up another DB anyway, and all the management infrastructure around that. AWS also seemed like a good fit for things that were essentially document-oriented and it seemed that it would be efficient for this kind of data model.
  
  --
  }#q NO CARRIER
35. Re:Article summary by spongman · 2010-03-28 06:05 · Score: 1
  
  distilling the above:
  - you /can/ use sucky interfaces to it even though good ones exist.
  - SQL isn't standardized
  - most installers suck
  easily answered:
  - use the good interfaces (they exist)
  - neither is anything else (useful)
  - use non-sucky installers (they exist)
36. Re:Article summary by raynet · 2010-03-28 06:11 · Score: 3, Informative
  
  This might explain some of the problems with it http://www.sqlhacks.com/pmwiki.php/Dates/Timestamp
  Basically, the MSSQL timestamp ain't a timestamp.
  
  --
  - Raynet --> .
37. Re:Article summary by jfanning · 2010-03-28 06:25 · Score: 1
  
  Okay, I'll bite. You do know that "organisation" is a perfectly valid spelling in many countries?!
38. Re:Article summary by BarefootClown · 2010-03-28 06:25 · Score: 1
  
  And good luck on that no severance pay thing. "I'd fire anyone in my organization who suggested we callously disregard labor laws like that." :)
  Not everybody works in jurisdictions that require severance pay. In some places, employers are actually allowed to terminate an employer-employee relationship as freely as the workers themselves are.
  
  --
  "Make it ten--I am only a poor corrupt official."
  --Captain Louis Renault (Claude Rains), Casablanca
39. Re:Article summary by EnglishTim · 2010-03-28 06:32 · Score: 1
  
  I'd fire anyone who doesn't understand the difference between British English and American English.
  Just kidding, I'm a nice guy after all. But anyway - have a look at: http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#-ise.2C_-ize
40. Re:Article summary by Onymous Coward · 2010-03-28 06:40 · Score: 4, Informative
  
  Two orders of magnitude is not 20x, it's 100x.
  And for non-intensive applications, that's still fine.
  And SQLite isn't actually that slow anyway. It's comparable.
41. Re:Article summary by Anonymous Coward · 2010-03-28 06:45 · Score: 1, Insightful
  
  SQL isn't the problem, it's a tool. Bad programmers are the problem.
  I've actually seen people use SQL databases for _cdrom_ distribution, the idea being pack up the data into tables and distribute the whole database on the CD along with an "application" to run it. Or applications that accept a mysql table as input and (you guessed it) generate a mysql table as output... clearly a case for stdio.
  People over-use databases, (and worse, they always use mysql, for EVERYTHING) which I believe, leads to a sort of anti-SQL movement. There are a lot of cases (the CDROM as an example) where plain old text files would work better, or, DBM's with simple key/value pairs.
  Doesn't help that it's always "mysql" either. I'm glad for the NoSQL "movement" if, for nothing else, it's making people aware that you don't need the overhead of a full scale relational database when the filesystem itself works pretty good.
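
For the simple key/value case mentioned above, a sketch with Python's standard `dbm` module shows how little is needed when no relational features are required. The file name and the `track:N` keys are invented for the example, and which on-disk format `dbm` picks depends on what the platform provides.

```python
import dbm
import os
import tempfile


def demo():
    """Write a few key/value pairs to a DBM file, then read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "catalog")  # hypothetical file name

        # "c" creates the database if it does not exist yet.
        with dbm.open(path, "c") as db:
            db[b"track:1"] = b"Intro"
            db[b"track:2"] = b"Main theme"

        # Reopen read-only, as an application shipped on read-only media might.
        with dbm.open(path, "r") as db:
            return db[b"track:2"].decode("utf-8")


print(demo())
```

There is no schema, no query language and no server here; the trade-off is that anything beyond lookup by key has to be written by hand.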
42. Re:Article summary by andi75 · 2010-03-28 06:45 · Score: 1
  
  And in binary, two orders of magnitude is just 4 times longer.
  Hardly an issue for MythTV...
43. Re:Article summary by fyoder · 2010-03-28 06:48 · Score: 1
  
  Twice as many milliseconds is not two orders of magnitude. Two orders of magnitude is 20x as many milliseconds.
  Your comment doesn't sound nearly as brilliant anymore does it?
  Poor Phroggy must be burning with shame. We could be talking hundreds of milliseconds, HUNDREDS!
  
  --
  Loose lips lose spit.
44. Re:Article summary by Vancorps · 2010-03-28 06:50 · Score: 3, Interesting
  
  Given that Oracle has a java client and java is supported on OS/2 how did Oracle drop OS/2? Even with 10 and 11g you can still connect from an OS/2 box although I would say your application has some fundamental design flaws if workstations are directly connecting to a database.
  Also, some of the biggest general ledger applications deployed are running on MS SQL, that includes Great Plains and Navision.
  As for Oracle Power Objects you have the same situation, Oracle has another product that achieves the same functionality and more and it evolved into that. Much like Oracle Forms and Reports 10g has no 11g version, Oracle didn't drop support for Forms and Reports services though, they came out with a new product and have a clear and rather easy transition path provided you have a good amount of Oracle infrastructure.
  MSSQL timestamp is a really weak argument as well, as there is nothing that forces you to use its timestamp, which we'll agree is different from what you get with Oracle, MySQL, and Postgresql. We get around that by converting to strings since we work with multiple platforms. Each of them has serious strengths and, of course, serious weaknesses. I personally believe that the only product worthy of such animosity is mysql because the developers clearly knew nothing about databases in its design. Naturally they even admit that. They learned along the way and have created a flexible product but it has all the problems that Oracle had 20 years ago and that MSSQL had 15 years ago. When you rely on your application for data integrity you will run into problems again and again and again.
  Sounds to me like you weren't happy being forced off dying platforms, given how long Oracle extended support for both it seems you were quite stubborn. EOL for Power Objects was in 1995 and support actually ended in 2000. That is one seriously long transition period.
45. Re:Article summary by Hurricane78 · 2010-03-28 06:51 · Score: 1
  
  From big over huge and colossal to gigantic?
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
46. Re:Article summary by Hurricane78 · 2010-03-28 06:55 · Score: 1
  
  I’m sorry, but how is setting up a database not piss-easily scriptable?
  Install MySQL as a dependency in the package manager, and in your package, run a shell script that sets up MySQL-admin-settings and runs a SQL script for the database setup. In case there is already a MySQL database set up, tell the user in the post-install message to run the setup script with his own parameters. (Or even offer it right in the “first start” dialog.)
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
47. Re:Article summary by Vancorps · 2010-03-28 06:55 · Score: 2, Informative
  
  Hate to reply to my own thread but Power objects was released in 1995 not EOL'd. Oracle actually only recently dropped support.
48. Re:Article summary by dkf · 2010-03-28 07:09 · Score: 1
  
  ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
  O RLY? Care to point to a degree programme with such students?
  (SQLite has some other nice advantages in that it's much simpler to show that it isn't introducing new vulnerabilities into the machine. Adding a database server to a system makes that proof much harder.)
  
  --
  "Little does he know, but there is no 'I' in 'Idiot'!"
49. Re:Article summary by Kenneth Stephen · 2010-03-28 07:10 · Score: 4, Insightful
  
  Au contraire.
  While there are problems with SQL, 95% of its users are happy as a clam that it exists. The unhappy users are the ones who are pushing the boundaries of what SQL allows and those are the people who know SQL best. When you are writing SQL queries that span 200 lines of code, then, and only then do you begin to scratch at the limits of what SQL allows. Until then, you've only hit the limits of competence.
  I've been working with SQL for over 20 years now. I've worked with applications that didn't use RDBMS's. Some of them used flat files. Some of them used hierarchical databases. People who haven't had the same sort of experiences haven't come to the realization of why SQL was invented - and that results in them making ill-founded statements like "SQL is absolutely the worst database query language ever invented". Utter tosh. SQL has its problems, but it's one of the best. That's why it has left its competitors in the dust of time.
  I look around at all the frameworks that have evolved to not do SQL (EJB-QL, Hibernate, etc) and I laugh. None of those languages come close to handling the same breadth and depth of problems that SQL can be used to solve. Whenever I see advocates of these frameworks all puff up with fervour, I feel like shaking them and saying "Your emperor has no clothes!". The list of problems these frameworks can't solve is so huge that one wonders why anyone works with them at all. But I suppose there are plenty of people who work for small businesses who haven't encountered the kind of problems that big enterprises have.
  The parent poster that I'm responding to has apparently had problems porting SQL code. But guess what? Even on the unix platform, applications written in C have had trouble being ported from one Unix to the next. People have worked around it. Nobody goes around arguing that "C is absolutely the worst programming language ever invented".
  
  --
  There is no such thing as luck. Luck is nothing but an absence of bad luck.
50. Re:Article summary by BluenoseJake · 2010-03-28 07:13 · Score: 2, Interesting
  
  Have you used SQL Server? I thought not.
51. Re:Article summary by dirkdodgers · 2010-03-28 07:24 · Score: 1
  
  And what relational algebra languages "of all the alternatives" other than SQL are database professionals such as yourself using to query RDBMSes? Tutorial D?
  So-called NoSQL DBMSes started out by requiring you to use object oriented facilities of your preferred client language to painfully build up queries.
  It's no surprise to me that so-called NoSQL DBMSes are now developing SQL or SQL-derived query languages upon finding, as we seem to go through this same cycle every few years for the past 30, that SQL is pretty good at what it does.
52. Re:Article summary by K.+S.+Kyosuke · 2010-03-28 07:26 · Score: 2, Informative
  
  SQL has its problems, but it's one of the best. That's why it has left its competitors in the dust of time.
  Oh, bullshit. SQL succeeded because it came from IBM, and what comes from IBM must be good by definition...or not? If we're talking about *relational* databases, then SQL is about as good a relational query language as COBOL is a general purpose language. C.J. Date wrote The Third Manifesto for a reason.
  
  --
  Ezekiel 23:20
53. Re:Article summary by Lennie · 2010-03-28 07:32 · Score: 1
  
  SQLite is not slow in general; it is just significantly slower when it needs to handle concurrency.
  
  --
  New things are always on the horizon
54. Re:Article summary by Kenneth+Stephen · 2010-03-28 07:43 · Score: 1
  
  From whence comes your bile at IBM? Just because SQL was invented at IBM doesn't mean that it's actually bad, does it? Anyway - I wasn't bringing IBM into my argument. What I was saying was based on what I know SQL can do, and what it can't do.
  
  --
  There is no such thing as luck. Luck is nothing but an absence of bad luck.
55. Re:Article summary by jc42 · 2010-03-28 07:43 · Score: 4, Interesting
  
  ... the syntax of PROLOG, for example, seems much simpler, more powerful, and makes more sense to me.
  Yeah, wouldn't it be wonderful if instead of all the complex cruft usually needed to find the data you need in that morass, you could just write a prolog expression and let the interpreter resolve it? But when I mention this to Team Leaders, they inevitably look at me like I'm from Mars. They have no idea what prolog is or does. (And I'm actually from a planet much farther away than Mars. ;-)
  But when all is said and done, you can get familiar with most of SQL in a couple weeks.
  True, perhaps, and I did that years ago. But that doesn't deal with the major problem with SQL: In my experience, every relational database I've ever worked with was in the grips of a set of professional RDB priests, and you didn't do anything in SQL without their blessing. If they didn't approve of what you were trying to do (typically because they couldn't be bothered to listen to you), it wouldn't get done during your lifetime.
  So I've learned to cultivate them as an acolyte. I write my "prototype" to use flat files, typically small files full of name:value pairs, sometimes with the name part the file name and the value the contents, and a directory tree of multiply-linked files to classify stuff. I agree with their criticism of this, and say that I'd be happy to convert the code to use their DB when they have the time to help me get those subroutines working right. While they chew on that, I get the project working with the flat files, and get some users using it. When the priests finally face the fact that the project works without their help, they finally deign to help.
  But I've never seen them actually get the SQL working to the point that it can supplant the flat files. The parts that do work are always so slow that turning on the "useDB" switch makes it too sluggish to actually use. In some cases, I can get around this by writing "pre-pass" code to extract the common data sets from the DB and write it to flat files, which the interactive software can read through quickly.
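  For readers who haven't seen the flat-file approach described above, here is a minimal sketch of reading and writing a file of name:value pairs. The helper names (load_pairs, save_pairs) and the file name are hypothetical, chosen only for illustration; the poster's actual code is not shown in the thread.

```python
import os
import tempfile


def load_pairs(path):
    """Parse a flat file of name:value lines into a dict."""
    pairs = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Split on the first colon only, so values may contain colons.
            name, _, value = line.partition(":")
            pairs[name.strip()] = value.strip()
    return pairs


def save_pairs(path, pairs):
    """Write a dict back out as sorted name:value lines."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, value in sorted(pairs.items()):
            fh.write(f"{name}:{value}\n")


# Round-trip one record through a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    record_path = os.path.join(tmp, "user42.txt")
    save_pairs(record_path, {"name": "Ada", "role": "developer"})
    loaded = load_pairs(record_path)
```

  This is quick to get working, which is the poster's point; it offers no transactions, indexing, or concurrent-write safety, which is the DBAs' point.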
  It has long seemed to me that SQL and RDBs in general are Good Ideas. But unless we can find a way to end the stranglehold of the DB priesthood in an organization, it's all sorta hopeless for a mere "developer" to even consider jumping into the mess. It's better to just develop stuff that works, and let the DB experts handle the task of porting it to the DB. That way, we developers can keep our hands clean of all the theology, and actually develop stuff that works.
  Of course, this is all heresy to the True Believers ...
  
  --
  Those who do study history are doomed to stand helplessly by while everyone else repeats it.
56. Re:Article summary by digitalunity · 2010-03-28 07:46 · Score: 1
  
  Oracle DB on a single server isn't that fast. It scales really well though across servers and it's really good on multiproc systems.
  It's also fast enough for most uses, but their eBusiness java interface sucks ass. It's really a clunker; very slow, bogs down the client machine badly. Due to it being launched through a web browser as a java app, its MDI interface is kinda a hack, with lots of modality issues across different dialogs, I've found. The web interface is convenient when it's on a local lan but very slow on big datasets or across slow wans.
  For off the shelf CRM, eBusiness is pretty good. There are better though and I just don't recommend it unless you're already tied to Oracle.
  
  --
  You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
57. Re:Article summary by mabhatter654 · 2010-03-28 07:48 · Score: 1
  
  After being at several small/medium companies, the problem I see is that most older databases really don't use much more technology than MySQL offers... especially when you get into the old RPG/COBOL mainframes.
  The problem I see is that "pulling forward" data is ridiculously time consuming and expensive.. and that limits growing and merging businesses. I think this argument goes back to TBL's idea of the semantic web with XML as the base. Interesting data is not in SQL databases right now. Think about stuff that's stuffed in Word or Excel documents from 1995 versus HTML pages from 1995.... which is easier to mine 15 years later. More importantly, we have to stop the need to convert everything to new formats every 3-5 years!
  From a business standpoint does having orderly tables even matter? The critical documents to my business are still mostly discrete pieces of paper that represent business actions.... invoices, purchase orders, labor entries, payroll stubs, etc. If I could just keep the originals in a "digital shoebox" would all the complex relational stuff really matter? More importantly, something like rifling through a million XML pages is really easy for modern computers.... why try to keep massive "house of cards" of relations and not just work from source.... rebuild the "house of cards" as needed.
58. Re:Article summary by MightyMartian · 2010-03-28 07:49 · Score: 1
  
  There are a lot of reasons why people hate SQL. My experience from working with databases designed by other people is that they show absolutely no knowledge of indexing. Either they have no index other than a primary id index, or they have all sorts of indexes on bizarre fields, but little or nothing lining up with the fields they're joining. Some basic knowledge of indexes and of query optimization goes a long way, but most folks seem to come from the world of Access 2000, and they end up blaming SQL because of their own ignorance and lack of any sound RDBMS techniques. In essence, particularly in a world where anyone can get a LAMP server up and running in a couple of hours, you have the greenhorn effect; poorly thought-out, badly written, completely unoptimized solutions which collapse when you start dealing with lots of load.
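  To make the indexing point concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for whatever RDBMS is in use. The table, column, and index names are made up for illustration. It asks the engine for its query plan before and after adding an index on the column used for lookups, which is the same kind of lookup a join performs per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total REAL
    );
""")


def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


query = "SELECT total FROM orders WHERE customer_id = 7"

plan_before = plan(query)  # no index on customer_id yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the planner can now search via the new index

print(plan_before)
print(plan_after)
```

  The exact plan wording differs between SQLite versions and is entirely different on other engines, but most databases have an equivalent EXPLAIN facility, and checking it is usually the first step before blaming SQL for a slow query.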
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
59. Re:Article summary by that+this+is+not+und · 2010-03-28 07:52 · Score: 1
  
  SQL succeeded because it came from IBM, and what comes from IBM must be good by definition...or not?
  Also, because it comes from Oracle, and Oracle's marketing team are as aggressive as a pack of crack dealers.
60. Re:Article summary by K.+S.+Kyosuke · 2010-03-28 08:02 · Score: 2, Insightful
  
  I'm not saying that SQL is bad because it was invented at IBM, I'm saying that ((SQL succeeded despite being bad) because it was invented at IBM). [if you really need the parentheses :)] That's not the same thing.
  
  --
  Ezekiel 23:20
61. Re:Article summary by digitalunity · 2010-03-28 08:03 · Score: 1
  
  SQL, like any tool, isn't always appropriate. A lot of arguments I'm hearing against SQL are for cases where a relational database isn't appropriate anyway.
  If your data doesn't have a natural hierarchical structure, an RDBMS isn't the right tool. If it's just a big package of key/value pairs or tables of tangentially related data, a flat file or even SQLite might be more appropriate. If you have a small table of data that doesn't change that often, why are you putting it in a database at all? People tend not to ask these questions and reach for the DB instead, then talk shit about SQL being inflexible because it didn't meet the criteria they used when they chose the DB in error in the first place.
  
  --
  You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
62. Re:Article summary by BitZtream · 2010-03-28 08:05 · Score: 4, Interesting
  
  Considering that by the time you 'need' Oracle, the price of Oracle is a drop in the bucket.
  The only people that ever complain about the price of Oracle are the people who will never have the need to use it because they'll never have the traffic to it to require it.
  Sorry you haven't got to play with the big boys, but in general if you spend your time worrying about how much 'software costs' your business sucks. Software costs, even for Oracle, are trivial compared to the other costs that go into it.
  An Oracle DB serving internet facing customers for instance is going to cost an order of magnitude more for bandwidth in the first year than the cost of an Oracle license to deal with it.
  But you go ahead, keep pretending you have some sort of clue and are witty by pointing out it's expensive. If you ever make it to that scale, the last thing on your mind will be the price of an Oracle license.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
63. Re:Article summary by Anonymous Coward · 2010-03-28 08:07 · Score: 1, Insightful
  
  [citation needed]
64. Re:Article summary by K.+S.+Kyosuke · 2010-03-28 08:10 · Score: 1
  
  "Relational databases are quite useful. It's too bad they're hampered by such a lousy syntax though."
  The sad thing is that nobody remembers today that relational databases != SQL databases; just like today nobody remembers that low-level programming != C+Assembly language (poor BLISS...). You can have perfectly fine relational DBs without SQL, and arguably, relational DBs *are* better *without* SQL.
  
  --
  Ezekiel 23:20
65. Re:Article summary by JamesP · 2010-03-28 08:14 · Score: 1
  
  SQL, like any tool, isn't always appropriate. A lot of arguments I'm hearing against SQL are for cases where a relational database isn't appropriate anyway.
  Correct, but as things evolve, older technologies may become irrelevant.
  Like COBOL is completely inappropriate for any modern sw development.
  There are other ways to deal with hierarchical data, and you can have links between data in NoSQL just like Foreign Keys.
  
  People tend not to ask these questions and reach for the DB instead, then talk shit about SQL being inflexible because it didn't meet the criteria they used when they chose the DB in error in the first place.
  Yeah, but it's not always their choice. Things like reliability / backups may be only available to DB users in certain situations (like, apps in corporate infrastructure) not to mention familiarity with technology.
  
  --
  how long until /. fixes commenting on Chrome?
66. Re:Article summary by hobo+sapiens · 2010-03-28 08:22 · Score: 1, Troll
  
  Nobody ever got fired for choosing IBM is the way I have always heard it. And used in that context, I agree. My experience with IBM products has been nothing but bad.
  I don't know about other Oracle software, it's probably bloatware. But the Oracle DB is an awesome product. Proper handling of views, the ability to perform hierarchical queries, and SQL*Loader are alone worth the (high) cost of admission. Before you dismiss those, try working with hierarchical data in MySQL without nested querying and looping or a non scalable set of union queries. Also, try importing massive amounts of data and see what this does to performance on MySQL. Then try querying a view on some large tables in MySQL and notice the order in which the view's where clause is executed in relation to your query's where clause. For you to have that opinion, I can imagine you've never done these things in MySQL.
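  As an illustration of what a hierarchical query buys you, here is a sketch that walks a manager/employee tree in a single statement. Oracle's own syntax for this is CONNECT BY; the sketch instead uses a standard recursive common table expression in SQLite (via Python's sqlite3 module) as a stand-in, and the table and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        manager_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'root', NULL),
        (2, 'a', 1),
        (3, 'b', 2),
        (4, 'c', 1);
""")


def reports_of(manager_id):
    """Return (name, depth) for everyone below a manager, in one query."""
    return conn.execute(
        """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- direct reports
            SELECT id, name, 1 FROM employees WHERE manager_id = ?
            UNION ALL
            -- reports of anyone already in the chain
            SELECT e.id, e.name, chain.depth + 1
            FROM employees e JOIN chain ON e.manager_id = chain.id
        )
        SELECT name, depth FROM chain ORDER BY depth, id
        """,
        (manager_id,),
    ).fetchall()


all_reports = reports_of(1)
```

  Without something like this, the application has to loop and issue one query per level, or hard-code a union per level, which is the workaround the comment above complains about.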
  Oracle is a very good DB, but is also sensitive to being (im)properly managed. Since it's not a toy like MySQL or MSSQL, you need to hire a real DBA, one who actually understands databases and not some point-and-click monkey to work with MSSQL or someone who knows a few linux commands to work with MySQL (yay mtop and mysqldump!) And your devs need to actually understand how to design and index tables and write queries properly. IOW, basic competence for anyone who claims to know anything about databases.
  You don't like Oracle because either a) you have suffered a bad implementation at the hands of incompetent DBAs/bad programmers, or b) are one of the preceding.
  I have worked with MSSQL, Oracle, and MySQL extensively. Oracle leaves them all in the dust.
  
  --
  blah blah blah
67. Re:Article summary by hobo+sapiens · 2010-03-28 08:26 · Score: 1
  
  just to clear one thing up: the last version of MSSQL I used was 2000 (one place I worked was still using that exclusively but was keeping up pretty well with Oracle versions) I hear the newer versions are much better, more Oracle like. So didn't mean to totally knock MSSQL. But even comparing MSSQL 2000 to Oracle 7 favored Oracle.
  
  --
  blah blah blah
68. Re:Article summary by TooMuchToDo · 2010-03-28 08:35 · Score: 1
  
  Not to go too far off topic, but most of the US is at-will employment. You can be terminated at any point, for any reason (except those specifically exempted via law, such as race, sex, etc), without any sort of severance pay.
69. Re:Article summary by TooMuchToDo · 2010-03-28 08:46 · Score: 1
  
  Good points. To add to your comment, the .org registry runs on PostgreSQL.
  http://www.computerworld.com.au/index.php?id=760310963
70. Re:Article summary by amorsen · 2010-03-28 08:53 · Score: 1
  
  You can have perfectly fine relational DBs without SQL, and arguably, relational DBs *are* better *without* SQL.
  True, but because relational database engines are tightly connected to SQL-the-language, you can't switch language without switching engines. People who are good at designing computer languages usually aren't up to making an enterprise class database engine just to get their language out in the field...
  
  --
  Finally! A year of moderation! Ready for 2019?
71. Re:Article summary by DiegoBravo · 2010-03-28 09:00 · Score: 1
  
  I agree with 95% of your great comment. I even tried (the hard way) to apply some O/R frameworks, only to finally fall in love (again) with SQL. Now, it's a bit unfair to compare something like Hibernate (which is just a front-end to SQL) to the real thing; for sure there will be a lot of things that the simpler front-end can't do (and of course, anything non-trivial with SQL tends to be pretty weird with those frameworks.) AFAIK the only "important" alternative to SQL currently considered is XPath queries, which of course are several light years behind the former.
72. Re:Article summary by GaryOlson · 2010-03-28 09:03 · Score: 1
  
  And in hexadecimal, two orders of magnitude is F^F -- and that's an F'ing large number.
  
  --
  Every mans' island needs an ocean; choose your ocean carefully.
73. Re:Article summary by squiggleslash · 2010-03-28 09:14 · Score: 3, Interesting
  Define irony: A guy who clearly has no experience with large scale database system telling others how bad SQL is while using a tiny fringe asstastic software package as an example.
  Nah, irony is someone writing a petulant rejoinder to a comment claiming the author doesn't know what they're talking about when you haven't actually spent any time trying to understand the original comment.
  Virtually every assertion you've made is based upon a failure to even make an attempt to understand what you're responding to:
  
  "Define irony: A guy who clearly has no experience with large scale database system telling others how bad SQL is while using a tiny fringe asstastic software package as an example." - There's absolutely nothing in my comment suggesting I have no experience with large scale database systems, and the only "example" I give that involves a "tiny fringe asstastic software package" was a passing reference to MythTV to make a point about... why SQL is unpopular for non-enterprise work.
  "Perhaps you should investigate the various SQL standards out there before you talk out your ass. I have a large web app that runs on Oracle, PostgreSQL and MSSQL, with the same queries. Slightly different scripts to create the database to deal with the differences in stored proceedures, so theres a little bit of truth there, but I could have moved the stored procedures to a different location if I wanted to." - You're responding to a claim that it's impossible to write queries that work under all three implementations of SQL. In fact, my first paragraph says no such thing. One of the things it says that case dependencies mean that it's very easy to write standard SQL that doesn't work on different platforms. That's absolutely true, it's one of the reasons why you'll be hard pressed to find any enterprise development shop that developers and tests under anything other than the target RDBMS(es). In addition, the first paragraph also points out a range of other issues with SQL that your response doesn't cover, such as the handling of nulls and blanks. Your response does not, as you claim, prove that "Your entire first paragraph is based on 100% factually incorrect statements." That's basically a lie.
  "Your second paragraph is clearly written by ... well, again someone who has never used a high end database. Any high end database worth its salt is designed to deal with raw disk space for its tables..." At this point, I'm not even sure what crack you've been smoking to think that I implied anything contradicting that. I didn't even address how high end RDBMSes store data physically on a disk. You then go off on a tangent about how crap MythTV is without ever actually addressing the point being made.
  "As for the last two paragraphs ... why bother, you're clearly disconnected and the rest is just you talking out your ass. Perhaps one should consider that its not SQL that sucks since so many people are capable of doing things with it just fine. Perhaps you should look a little closer to home and consider that your inability to use it is what sucks." So you actually are under the impression that abstracting the underlying RDBMS and ensuring that the HLL is kept separate is... a bad thing? That someone proposing it is "disconnected" and "talking out of (their) ass"?
  It doesn't matter much I guess, but I've been working in an Oracle shop for about fifteen years now, and done my best to push free software alternatives such as PostgreSQL in recent years - and, more importantly, seen our application support people push SQL Server with much the same results. I'm directly familiar with the ability of, for example, PostgreSQL to "run the same queries as" Oracle when no thought has been put into the differences. And I'm directly familiar with the type of software that you start to get when 50 or more developers all "think" they're SQL Gods, writing their bits of our applications according to what they think is best pra
  --
  You are not alone. This is not normal. None of this is normal.
74. Re:Article summary by squiggleslash · 2010-03-28 09:19 · Score: 1
  
  Thank you. Someone "gets it". Others saw my criticism of SQL and assumed that this meant I was saying it was the wrong technology, and then proceeded to defend SQL at the costs of all truth and sanity.
  The GP's response is just weird. Where did I claim SQL was an actual RDBMS? When someone feels the need to put words in my mouth (he's not the only one) in order to respond to something, well, it's evidence of not having an argument.
  
  --
  You are not alone. This is not normal. None of this is normal.
75. Re:Article summary by lena_10326 · 2010-03-28 09:30 · Score: 1
  
  But to get performance and fault tolerance for Oracle, you need to throw a lot of money at it -- high end hardware, RAC licenses etc
  I agree with your comment but I have a nitpick regarding the bit about RAC.
  RAC is for high availability because a node failure doesn't impact availability as much as a primary failover does. Despite the marketing claims, in practice RAC doesn't scale that well so it's not very useful for high performance. I know this by working in 2 shops which tried RAC. Both migrated off it quickly after trying it. Think about it: more nodes = more synchronization overhead of the entire dataset. You get a lot more performance by partitioning the data and distributing among a fleet of independent nodes (thus cutting out a big chunk of synchronization communication). That is essentially the tactic NoSQL solutions take.
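  The partitioning tactic mentioned above can be sketched in a few lines. This is only an illustration under simplifying assumptions: each "node" is an in-process dict rather than a real server, the shard count is fixed, and the function names (shard_for, put, get) are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Four independent "nodes"; real ones would be separate database servers.
shards = [dict() for _ in range(4)]


def put(key, value):
    """Store a value on the one shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    """Look a value up on the one shard that owns the key."""
    return shards[shard_for(key, len(shards))].get(key)


put("user:1001", {"name": "Ada"})
put("user:1002", {"name": "Grace"})
```

  Each key lives on exactly one node, so nodes never have to synchronize with each other for single-key reads and writes. The cost is that anything spanning keys (joins, cross-shard transactions, changing the shard count) becomes the application's problem.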
  
  --
  Camping on quad since 1996.
76. Re:Article summary by haruharaharu · 2010-03-28 09:35 · Score: 1
  
  So it's a no-go on MSSQL for that Microsoft contract your company just got?
  Maybe he doesn't work for a whorehouse, er I mean body shop?
  
  And good luck on that no severance pay thing. "I'd fire anyone in my organization who suggested we callously disregard labor laws like that." :)
  He's probably in the US, so firing someone on the spot for having a tacky tie is legal.
  
  --
  Reboot macht Frei.
77. Re:Article summary by daver00 · 2010-03-28 09:42 · Score: 3, Funny
  
  But you go ahead, keep pretending you have some sort of clue and are witty by pointing out it's expensive.
  Cos you know, it's way cooler to just berate him for his obvious inferiority....
78. Re:Article summary by rkit · 2010-03-28 10:26 · Score: 2, Informative
  
  sqlite is extremely slow when writing data. The reason is its implementation of transactions with separate journal files for each transaction. Also, there is only a very basic query optimizer. The main advantage of sqlite is that it does not require administration, certainly not performance.
  
  --
  sig intentionally left blank
79. Re:Article summary by ppanon · 2010-03-28 11:07 · Score: 5, Insightful
  
  It's not heresy. However, I have seen a lot of crap data models produced by developers (even worse than what I come up with as GUI designs). I have also seen developers produce SQL that looked OK at first glance but performed abysmally under certain conditions (and have even saved the odd project by finding those and fixing them when the system started dying under load). If you access a SQL database like you would a set of flat files, it is never going to give you the performance that a flat file access will give you for raw throughput because you've got all the extra communications latency. However if you re-write your search and extract queries to pull your data in a single SQL statement instead of a statement for each of your N tables involved in the result, then SQL is going to kick ass as soon as you start getting enough data and users placing enough queries that all the indexes and caching can pay off.
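  A small sketch of the difference described above, using Python's sqlite3 module with an invented two-table schema. The first function treats the database like a set of flat files (one query per customer); the second asks for the whole result in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")


def totals_flat_file_style():
    """One query for the customers, then one more query per customer."""
    result = []
    customers = conn.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    for cust_id, name in customers:
        rows = conn.execute(
            "SELECT total FROM orders WHERE customer_id = ?", (cust_id,)
        ).fetchall()
        result.append((name, sum(r[0] for r in rows)))
    return result


def totals_single_statement():
    """The same answer from one statement; the engine does the join."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
        """
    ).fetchall()
```

  With an in-memory database the two are indistinguishable in speed. Against a networked server, the first version pays a round trip per customer, which is where the latency the comment describes comes from.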
  Flat files will work better for certain types of unstructured data, but most people who get crap performance out of SQL databases just don't understand how to use SQL databases properly. Which is why those True Believers tend to get upset about crap SQL implementations: because those tend to bog down a SQL server and slow down all the well-written apps too.
  No, the real problem with most SQL DBAs is that they haven't adapted to agile methodologies. They still want the data model to spring fully armored from Zeus' head according to classic waterfall planning. What they need to do is to get some data modelling tools that support round trip engineering so that they can make changes as the developer needs them and have upgrade scripts checked into source control along with the code on new builds. Right now there are only a few tools like ErWin and Data Architect that support that kind of development, and they tend to be ridiculously expensive. The one exception is DeZign for Databases Professional which is comparatively cheap. A lot of companies will lay out a lot of cash for developer tools but won't fork over the dough necessary for their data modelers/DBAs to properly support developer activity. So yeah, the DBAs tend to be a little reticent to do all that work by hand. While there are some developers who still use notepad or gedit by choice, nobody seems to expect them to do it, or to have the same productivity as someone with a decent tool chain.
  
  --
  Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
80. Re:Article summary by batkiwi · 2010-03-28 11:13 · Score: 3, Informative
  
  Timestamp in mssql is a misnomer, it's not a timestamp at all. It's more of a binary format concurrency key.
  This doesn't excuse the use of the name by MS, but once you realize that it makes the column useful again.
81. Re:Article summary by Jaime2 · 2010-03-28 11:24 · Score: 3, Informative
  
  ... and it was never intended to be. You link to an article stating that MSSQL timestamp isn't compliant with SQL 2003's timestamp definition. However, the first version of MSSQL out after 2003 deprecated the timestamp datatype. MSSQL timestamp is a unique update identifier that was never supposed to be a date/time. Think of it more as an update sequence number. If you want an actual timestamp, it's been there since the product was introduced in the form of the datetime datatype.
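  Here is a sketch of what an update sequence number is good for: optimistic concurrency. This is not MSSQL code. It uses SQLite through Python's sqlite3 module, and the row_version column is a plain integer bumped by hand in the UPDATE, whereas MSSQL maintains its timestamp/rowversion value automatically. The table and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER,"
    " row_version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")


def read_account(account_id):
    """Return (balance, row_version) as seen right now."""
    return conn.execute(
        "SELECT balance, row_version FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()


def update_balance(account_id, new_balance, seen_version):
    """Write only if nobody changed the row since seen_version was read."""
    cur = conn.execute(
        "UPDATE accounts"
        " SET balance = ?, row_version = row_version + 1"
        " WHERE id = ? AND row_version = ?",
        (new_balance, account_id, seen_version),
    )
    return cur.rowcount == 1  # zero rows touched means a stale write


# Two writers both read the row at the same version.
balance, version = read_account(1)
first_ok = update_balance(1, balance - 30, version)   # succeeds
second_ok = update_balance(1, balance - 50, version)  # stale, rejected
```

  The second writer is rejected because the version it read no longer matches, so it has to re-read and retry instead of silently overwriting the first change.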
  
  Saying MSSQL doesn't have a proper timestamp is like saying that Oracle doesn't have a proper VARCHAR because Oracle only has a VARCHAR2 data type.
82. Re:Article summary by Anpheus · 2010-03-28 11:25 · Score: 1
  
  Honest question from a software developer: is there a free or low price developer version of Oracle DB to test code against?
  SQL Server has Express, a highly limited version, Compact, a limited edition for distribution with retail applications, and Developer which is for development use only and is functionally equivalent to Enterprise (only difference is the price: Developer edition is $50.)
83. Re:Article summary by FooBarWidget · 2010-03-28 11:31 · Score: 1
  
  "My experience has made me believe PostgreSQL is better in every respect."
  Except when you insert a row with a specific primary key value instead of having the primary key sequence generate a new value for you. Like for example when you're restoring an SQL dump. That'll totally screw up the sequence which you'll then have to manually fix, otherwise the sequence might generate a primary key that already exists. Insane. PostgreSQL does a lot of things well but when it comes to something simple like this MySQL does it far better.
  
  "Look at Oracle solutions. All their fancy eBusiness software is still Oracle SQL DB backed and some of the biggest companies in the world are using it."
  Or maybe people mean that they want to scale to millions of users without having to pay licensing fees that can bankrupt them.
  
  "SQL isn't the problem, it's a tool. Bad programmers are the problem."
  And money.
84. Re:Article summary by derfel+cadarn · 2010-03-28 11:44 · Score: 1
  
  We're using both - about five days from our "go-live", and things look good. We just use what makes sense for each part of our application. For us, this means PostgreSQL for the parts that must be transactional ACID, and Amazon's S3 and SimpleDB for parts that don't.
  That makes sense, until you need to generate reports from the tables. How are you managing trying to do that?
85. Re:Article summary by CrashandDie · 2010-03-28 11:56 · Score: 1
  
  Oracle has a market penetration of nearly 100% in the Fortune 500 list.
86. Re:Article summary by ppanon · 2010-03-28 12:00 · Score: 1
  
  It's like saying that you should have a stick-shift car because automatic transmissions don't go as fast. It's just moronic.
  
  Bad car analogy. Automatic transmissions are less efficient than manual transmissions so given the same engine and all other things being equal, the car with the manual transmission will go faster. Now if you had said
  
  It's like saying that you should have a stick-shift because paddle shifters are gay. It's just moronic.
  
  then you might have had a point.
  
  --
  Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
87. Re:Article summary by TemporalBeing · 2010-03-28 12:07 · Score: 1
  
  There is no standardized set of data types in the real world. Simple issues with unstandardized case dependencies can make an application that works with Oracle and only uses standard "select" statements not work under, say, PostgreSQL. And these are the surface level technical issues: talk to any relational database guru and they'll come up with numerous philosophical issues too.
  
  Well, SQL itself is pretty standard - you can find a list at http://en.wikipedia.org/wiki/SQL#Standardization.
  
  The problem is that Microsoft (SQL Server), Oracle, and most proprietary DBMS's don't do very well against that standard - it's just not in their interest to do so, as that would mean portability away from their products and they would actually have to compete on things like performance, reliability, etc.
  
  MySQL (from my experience) and PostgreSQL (to my understanding) follow the standards very closely; and it's in their interest to do so. It makes a fool of the bigger players (for not doing so) and gives them portability as an advantage (Hey, try out MySQL, if you don't like it your app is portable to others as we use the standards).
  
  --
  Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
88. Re:Article summary by jonadab · 2010-03-28 12:09 · Score: 1
  
  > My experience has made me believe PostgreSQL is better in every respect.
  > It's more stable, has more features and is easier to use.
  
  However, it does take a little longer to initially learn to administer Postgres. It's little things like SHOW TABLES in MySQL versus learning which table the schemata are stored in under Postgres. It's not a very big deal, and IMO it's worth it, but there is a greater initial investment of time there.
  
  --
  Cut that out, or I will ship you to Norilsk in a box.
89. Re:Article summary by Jaime2 · 2010-03-28 12:30 · Score: 1
  
  Its handling of null and the empty string is incomprehensible and useless, in part because nobody involved ever had the cojones to do what needed to be done with both.
  Many languages handle null strings as different from empty strings, C# and Java are two examples.
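  For what it's worth, here is how the distinction looks in one engine that does keep NULL and the empty string apart. The sketch uses SQLite via Python's sqlite3 module with an invented table; other engines differ (Oracle, for one, treats an empty VARCHAR2 string as NULL), which is part of the portability complaint in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes (id, body) VALUES (?, ?)",
    [(1, None), (2, ""), (3, "hello")],  # None is stored as SQL NULL
)


def ids(where):
    """Return the ids matching a WHERE clause written by trusted code."""
    sql = "SELECT id FROM notes WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


null_ids = ids("body IS NULL")     # only the NULL row
empty_ids = ids("body = ''")       # only the empty-string row
not_empty_ids = ids("body <> ''")  # excludes the NULL row as well
```

  The last query is the one that surprises people: comparing NULL with anything yields NULL rather than true, so the NULL row matches neither body = '' nor body <> ''.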
90. Re:Article summary by ppanon · 2010-03-28 12:34 · Score: 1
  
  The "parse a string" API is simply indefensible when you want to build queries modularly, especially when there's no standardized grammar that's actually worth a damn for implementers.
  Actually SQL92 is pretty well supported by most modern databases. The real differences that would matter to an application developer tend to be in basic data types other than integers and character strings. Prepared statements and parametrized queries may not be as easy to use as "slap a string together" and do require a little care aforethought. On the other hand, they're not as prone to SQL injection vulnerabilities either. If you're careful, you can still build a query modularly so that it gets customized for a particular target database - you just can't keep changing it on the fly, so it has to be targeted for a single purpose. Maybe you mean something completely different by "building queries modularly" but if so you'll have to be a little more explicit because it's hard to see how you're going to do it in a DB-independent way.
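
  The difference between "slap a string together" and a parameterized query can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table, column names and the hostile input are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hypothetical hostile input that tries to rewrite the WHERE clause.
hostile = "nobody' OR '1'='1"

# String concatenation: the input becomes part of the SQL text itself,
# so the injected OR clause is parsed and matches every row.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the input is bound as a single value and is
# never parsed as SQL, so it simply matches no user.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

  The query text in the second form stays fixed, which is the "targeted for a single purpose" trade-off described above.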
  
  Even now the SQL standard still has no concept of indexes, so you will always be in vendor-land there.
  Huh? Why should the query language have any concept of indexes? It's an abstraction. Any index usage hints are going to be implementation specific. Asking for a generic SQL extension for indexes is a little bit like wanting standardized inline assembler directives in Java. While it is sometimes necessary to put in hints in queries, I would think those hints are always to provide extra information to work around the peculiarities of a particular query optimizer, and are therefore database specific.
  There are problems with SQL (you pointed out Date's issues with nulls). However most of the developer complaints I have heard tend to fall more under "Those who don't understand SQL are condemned to reinvent it, poorly."
  
  --
  Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
91. Re:Article summary by Chitlenz · 2010-03-28 12:41 · Score: 3, Interesting
  
  Ummm FTFL?
  
  Timestamp equivalent * Eventually, MS will convert the current timestamp of a unique row number, to an actual date and time. * Use ROWVERSION instead of timestamp. Row version provides the same functionality and the same value as the current timestamp.
  
  MSSQL 2008 and above is fine, and we use timestamps almost to an atomic precision in medical imaging... eventually came right after that post ... in 2007. SQL Server Vs. Oracle/MySQL is the only fight worth wasting time on. Here's the thing about RDBMS. Not only has it been the standard for 20 years, virtually assuring their own persistence because by very nature they grow.. a LOT, but it is one of the few standards that actually has a solid foundation. You see, in this age of marketing driven products, there are still a few things out there quietly running the world. And I assure you it's not XML pages.
  
  my 2cents.
  
  --chitlenz
  
  --
  Imagination is the silver lining of Intelligence.
92. Re:Article summary by caerwyn · 2010-03-28 12:46 · Score: 1
  
  Yes. You can download oracle to develop against for free, in fact; I recently had to do it myself.
  
  --
  The ringing of the division bell has begun... -PF
93. Re:Article summary by microbox · 2010-03-28 12:48 · Score: 1
  
  The last thing on your mind will be the price of an Oracle license.
  
  Pity that you really don't get that many more useful features that are applicable for the 95% of people who use oracle just because.
  
  It doesn't matter if you use oracle or if you use free software if your developers don't really understand programming theory because they just aren't that interested in computer programming but got a degree in it anyway.
  
  But you go ahead, keep pretending you have some sort of clue and are witty by pointing out it's expensive.
  
  I am sure you do wonderful things with oracle that you cannot do with free software. Good for you.
  
  However, I believe that most development shops could save a bundle of money by properly training and mentoring their staff. Big business can look a lot like the public service, in which case an Oracle license is just par for the course, but hardly necessary.
  
  --
  
  Like all pain, suffering is a signal that something isn't right
94. Re:Article summary by caerwyn · 2010-03-28 12:51 · Score: 1
  
  Its handling of null and the empty string is incomprehensible and useless.
  The empty string is, AFAIK, only handled in a screwy fashion in MySQL. I know that at least in PostgreSQL it's completely sane.
  
  To this you add another component that's always an issue: the entirely haphazard way in which relational databases are implemented on most operating systems, whereby the DBMS is another application, that manages its own files, and needs to be coached with kind words and a happy smile in order to get anything done. Does your app use a database for something back-endy, like, for example, MythTV does for its settings and lists of channels and TV programs? Well, either forget it, or be prepared to put your users through hell as they have to ensure that the entirely separate DBMS is installed and that usernames and passwords are set up for your application's use.
  This pretty much completely ignores the many embedded SQL dbs out there, starting with SQLite and moving up.
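
  As a rough sketch of what the embedded approach looks like for application settings, the following uses Python's built-in sqlite3 module. The schema and the helper names (open_settings, set_setting, get_setting) are hypothetical and are not taken from MythTV or any other application mentioned here.

```python
import sqlite3

def open_settings(path=":memory:"):
    """Open (or create) a settings store; no server, users or passwords."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn

def set_setting(conn, key, value):
    # 'with conn' commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )

def get_setting(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default

conn = open_settings()
set_setting(conn, "channel_source", "antenna")
set_setting(conn, "channel_source", "cable")  # overwrites the earlier value
print(get_setting(conn, "channel_source"), get_setting(conn, "missing", "n/a"))
```

  Passing a file path instead of ":memory:" would persist the settings in a single file that the application owns.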
  
  Because SQL sucks. It just sucks less than anything else designed to do the same thing.
  As someone who writes a lot of SQL, I've actually really got to disagree here. Thinking in SQL is not the same as thinking in other programming languages, but it's actually very straightforward for what it does, especially if you actually have some set logic background and can therefore reason out what's going on. I think the only bits that start to get hairy are the more recent additions- recursive queries, for instance.
  The biggest issue I see people have with SQL is thinking that they know more than the DB. When you back up a step and maintain a sufficient abstraction, the SQL pretty much just falls into place.
  
  --
  The ringing of the division bell has begun... -PF
95. Re:Article summary by ppanon · 2010-03-28 12:58 · Score: 1
  
  That's absolutely true, it's one of the reasons why you'll be hard pressed to find any enterprise development shop that develops and tests under anything other than the target RDBMS(es).
  So what? Do you know a lot of people who develop web pages for users of Internet Explorer (using "standard" HTML and CSS) and yet who only do their testing on Mozilla? Or who develop on Intel architecture for deployment on SPARC (be it Linux or Solaris) without ever testing on the target platform? While you can do some cross-development based on an abstracted interface, not doing testing on the target platform to eliminate implementation differences is a good way to wind up looking for a new job.
  
  --
  Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
96. Re:Article summary by shutdown+-p+now · 2010-03-28 13:18 · Score: 2, Interesting
  
  ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
  One of the following two things is missing in your post:
  1) A reference to back such a bold claim.
  2) A qualifier along the lines of "... with many concurrent writers".
97. Re:Article summary by shutdown+-p+now · 2010-03-28 13:22 · Score: 1
  
  You could say the same about assembly language. You could also say the same about threads, and dismiss things like functional programming and the actor model as fads.
  It's an interesting comparison, but I don't know if it's really apt. Consider this: when comparing e.g. hand-coded assembly, or even C, to FP or actor model, the latter is all about higher-level abstractions. In contrast, when you're comparing SQL to "NoSQL", the latter is less abstracted. It's like going from Java back to C.
98. Re:Article summary by geekpowa · 2010-03-28 13:32 · Score: 1
  
  One of the many issues I have with indexes not being exposed via SQL is that it requires feeding and caring of the underlying cost based planner.
  Now, in an enterprise environment you have DBAs available to care for and feed the CBP (though it amazes me how many people call themselves 'DBAs' yet have no idea how to look after the cost-based planner). I am not a DBA, I am a programmer, yet whenever I encounter a DBA, odds are high that I know more about critical concerns like CBP tuning than they do.
  But if you want to put a database into a low end environment, you are in trouble. All works well for a few months then the CBP goes haywire.
  Access to indexes provides predictability and a level of determinism that a reasonably competent programmer can understand. In low-technology environments and high-volume OLTP situations, this is desirable. In order to regain determinism for a SQL-type database you need to understand statistics, histograms, sample sizes and how the CBP implementation uses that data, and in my experience this is outside the skill set of a typical programmer (and most DBAs).
  I am part purist - and part of me believes in hiding implementation details - but there are situations, common situations where it is completely inappropriate. Again - low tech environments and OLTP.
  This issue highlights a key problem with SQL. SQL as a language is confused about who its intended audience is.
  Is it intended for programmers who know a lot about DBs, programmers who know little, low-tech end users, or DBAs?
  SQL does things like hide indexing details from programmers, so on one hand it is trying to protect programmers from the DB internals. All this results in is a constant fight between programmers and DBAs as they blame one another for rotten DB performance because of confusion of responsibility. Even though details of the implementation are 'hidden', how a programmer structures a query substantially impacts performance anyway, so responsibility for performance is diluted in spite of the best intentions of hiding indexing details.
  Yet SQL has massive gaps in it. #1 for me is that it treats entity relationships as second-class citizens. There is no consistent metadata for extracting these details, let alone strong enforcement at the language level (unlike JPQL, the Java Persistence Query Language, which solves the problem reasonably well). So programmers are required to understand the relational internals in terms of how they are expected to link entities together.
  Why hide indexes, but make programmers labour on relationships and think carefully about performance of SQL they write anyway? Again - SQL is confused about its target audiences and it doesn't serve any of the audiences well.
99. Re:Article summary by TheLink · 2010-03-28 14:12 · Score: 2, Insightful
  
  I find it hard to believe that's the problem with "timestamp" that the OP was talking about. After all, couldn't the OP have just used datetime instead?
  
  Whenever you port from one RDBMS to another you're going to have to put up with all sorts of stuff like this. So I don't see this as a showstopper.
  
  So that's why I was curious on what the OP's problem was. If it's just "MSSQL names stuff differently and I didn't bother to do a bit of research to find that out" then I can ignore that particular complaint about MSSQL.
  
  Whereas if it's something else, then it could be far more interesting and more useful to know.
  
  FWIW, I'd think Oracle's behavior of treating an empty string as NULL could be more annoying, but lots of people still use Oracle...
  --
  
  Too many replies beneath your current threshold
100. Re:Article summary by Anonymous Coward · 2010-03-28 14:18 · Score: 1, Interesting
  
  In general, asking programmers to get things right in a language where "true = !false" is not a true statement is gonna be a difficult proposition. Hence the avoidance of prolog.
101. Re:Article summary by einhverfr · 2010-03-28 14:33 · Score: 1
  
  You know, there are a few things I dislike about SQL but in general I have a lot of good things to say about it. The first one is that declarative logic rocks! I love being able to define valid data and data constraints and be able to guarantee that these are followed!
  
  It was designed from the start as a language that would be integrated into others, and yet simple real world realities make that impossible, with 99% of implementations being of the "Build a large string, and pass that string to "the SQL connector" to be parsed and interpreted" form.
  In general, I try to minimize this by using user defined functions as named queries. That simplifies things a great deal. However, part of the issue is that SQL is designed to be a declarative language. I.e. you declare a set of mutually dependent set operations which are then parsed and executed. This is just different from the way procedural (and structured, and object oriented) languages operate.
  
  Its handling of null and the empty string is incomprehensible and useless, in part because nobody involved ever had the cojones to do what needed to be done with both.
  You are either talking specifically about Oracle or you have a very different idea of what is incomprehensible and useless. (Yes the fact that Oracle treats empty string and nulls as equivalent is truly braindead.)
  
  To this you add another component that's always an issue: the entirely haphazard way in which relational databases are implemented on most operating systems, whereby the DBMS is another application, that manages its own files, and needs to be coached with kind words and a happy smile in order to get anything done.
  Maybe I am spoiled by PostgreSQL, but I have absolutely no idea what your complaint is here. Also note that Pg is quite capable of drawing user accounts from an external source, like the system (PAM), LDAP, Kerberos, or even the cn of the client cert used to connect via SSL.
  There are some braindead things about the SQL spec. I think the folding to upper case is a braindead element I am glad that PostgreSQL avoids. However it really sounds to me like you are using the RDBMS wrong.
  BTW, I do agree there are some cases where NULL handling is problematic. However, these are generally corner-case issues and SQL is probably easier to understand than what the mathematically correct behavior would be. Of course, from a mathematically correct viewpoint (TRUE or TRUE) should evaluate to FALSE but no programming language is that pedantic.
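
  For readers who have not run into the NULL corner cases, here is a small sketch of SQL's three-valued logic. It uses SQLite through Python's sqlite3 module as a convenient stand-in (an assumption of this example; the thread is mostly about PostgreSQL and Oracle). SQLite reports true as 1, false as 0 and unknown as NULL, which Python sees as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is a boolean expression involving NULL ("unknown").
eq, negated, or_true, and_false, is_null = conn.execute(
    "SELECT (NULL = NULL), NOT (NULL), (NULL OR 1), (NULL AND 0), (NULL IS NULL)"
).fetchone()

# Comparing or negating unknown stays unknown ...
print(eq, negated)
# ... but OR with true and AND with false are decided regardless of the unknown side.
print(or_true, and_false)
# IS NULL is the only test here that gives a plain yes/no answer about NULL itself.
print(is_null)
```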
  
  --
  
  LedgerSMB: Open source Accounting/ERP
102. Re:Article summary by pavon · 2010-03-28 14:33 · Score: 1
  
  If you start with something like SQLite rather than flat files, then you will get all the benefits of a RDB upfront, and it will be much easier to migrate to a "real" database if you ever need to in the future.
103. Re:Article summary by einhverfr · 2010-03-28 14:45 · Score: 2, Informative
  
  OK, so enlighten us with your brilliance! Share with us the ultimate answer of what should be done to differentiate a null (logically, "I don't know") with a blank string (logically, "We know there's nothing there") and what should be done differently?
  Well, the way PostgreSQL handles it is that a NULL is stored as a NULL and treated as one (i.e. NULL || ' more text' evaluates to NULL). '' is stored as an empty string and processed as one (i.e. '' || ' more text' evaluates to ' more text').
  Really, that strikes me as the correct way to do things (that seems obvious...). Oracle OTOH is braindead in its approach of treating NULLs and empty strings as equivalent.
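
  The two concatenations described above can be tried directly. The sketch below runs them against SQLite via Python's sqlite3 module, which is an assumption made only to keep the example self-contained; it does not exercise PostgreSQL or Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Concatenating onto NULL versus onto an empty string.
null_concat, empty_concat = conn.execute(
    "SELECT NULL || ' more text', '' || ' more text'"
).fetchone()

# An empty string is a known value, so it is not NULL.
empty_is_null = conn.execute("SELECT '' IS NULL").fetchone()[0]

print(repr(null_concat), repr(empty_concat), empty_is_null)
```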
  
  --
  
  LedgerSMB: Open source Accounting/ERP
104. Re:Article summary by pavon · 2010-03-28 15:00 · Score: 1
  
  You can use any number of technologies "under the hood" including those types of technologies commonly referred to as "NoSQL" and put an SQL interface in front!
  Except the NoSQL technologies don't support some of the most fundamental features of a RDB. You would end up with something that has the same syntax as SQL but which performs significantly different operations, with completely different data guarantees. In object-oriented jargon, it would break the Liskov Substitution Principle. The semantics of an API affect how you write code using that API far more than the syntax does. Syntax is easy, and I really don't think you would be buying yourself anything by using SQL as the API to a non-RDBMS.
105. Re:Article summary by SanityInAnarchy · 2010-03-28 15:14 · Score: 1
  
  In contrast, when you're comparing SQL to "NoSQL", the latter is less abstracted.
  Well, depends how you define NoSQL. The idea behind it is to use anything but SQL -- do you seriously mean to say that SQL is the only possible way to abstract database operations?
  Many of the new NoSQL databases use REST APIs. Some of them use Map/Reduce. Some of them use common abstraction layers like Lucene. I would argue that in most cases, they are more abstract than SQL. When we're talking about a system where I can design my application without knowledge of where data will physically be stored, yes, that's more abstract than a system where I must manually shard my data and send the SQL query to an appropriate server.
  So no, I would argue it's much more like going from Java to Erlang, or vice-versa. Erlang is more abstract with respect to threading, and less abstract with respect to string manipulation.
  
  --
  Don't thank God, thank a doctor!
106. Re:Article summary by smash · 2010-03-28 15:19 · Score: 1
  
  There are pubs in england that are more than 2x as old as that, where patrons have been chatting over a pint in real english.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
107. Re:Article summary by Tacvek · 2010-03-28 15:24 · Score: 2, Interesting
  
  What I find interesting is that one of the biggest users of a BASE[0] non-relational database (a NoSQL database), namely Facebook, which uses Cassandra [1], has created an SQL-style query interface named FQL. The interface includes some rather advanced SQL features like embedded sub-queries in addition to the traditional selecting on joined tables.
  Then again, that may be due in large part to the fact that they are using a database schema that is all but identical to a normalized schema used in relational ACID databases, and simply code with the expectation that the database may be inconsistent, so always expect broken references. That is not really the optimal way to use a non-relational database, but it works.
  [0] Basically Available, Soft state, Eventual consistency. Somewhat the opposite of ACID.
  [1] This can be pretty noticeable at peak hours, when you end up seeing an inconsistent database, one in which you are friends and are not friends with another user at the same time.
  
  --
  Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
108. Re:Article summary by dgatwood · 2010-03-28 15:32 · Score: 1
  
  Sure, you can learn the syntax pretty easily, but the syntax still makes it a pain in the backside every time you use it no matter how well you know it. It's just too clumsy. I mean you have:
  insert into foo (bar, baz) values ("x", "y");
  update foo set bar="x", baz="y";
  What kind of crack do you have to smoke to come up with a query language where two of the three most common operations have a completely different syntax for no good reason? Pick one style or the other. And to a lesser degree:
  create table foo (foo bigint, bar bigint);
  alter table foo add column foo_column bigint after bar;
  Why not standardize that:
  alter table foo add column (foo_column bigint, baz_column bigint, ... ) after bar
  And so on. Pretty much everywhere you look in the SQL syntax, the inconsistencies stick out like a sore thumb.
  For that matter, having relationships between a single table and multiple columns in another table is the exception rather than the rule, so why not design the tree structure into the database itself so that for the 99% case, you don't have to specify all that crap in the syntax? Imagine something like this:
  select fullname, subject, body from comments, users, docs where path='/path/to/foo.html';
  instead of
  select fullname, subject, body from comments, users, docs where path='/path/to/foo.html' and comments.doc_id=docs.id and comments.user_id=users.id;
  And in the case of ambiguity,
  select fullname, projectname from users, projects related by project_owner;
  select fullname, projectname from users, projects related by project_members;
  Oh, yes, and allow array data types. Having to create a separate table to associate projects with users for the above many-to-many relationship is absurd. Sure, it would probably be implemented as a table internally, but there's no reason for such an ugly implementation detail to be exposed in the syntax.
  Finally, given that security in SQL is a perpetual problem, why not design a syntax in which data values cannot be specified in the syntax and must be passed out of band? If you need to enter a query manually, the client could prompt you for the values.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
109. Re:Article summary by Hangtime · 2010-03-28 15:34 · Score: 2, Interesting
  
  From the immortal words of Joe Celko in response to a similar question you discuss and one of the most true statements ever written:
  
  My SQL program is trying to compete with a flat file system.
  If you want to get data to a single user, in a fixed format, you will
  lose. The reason we have databases is not speed. Databases are for sharing
  data (concurrency control and all that jazz), and keeping data integrity
  (normal forms, constraints and all that jazz).
  You can get to the ground floor a lot faster by jumping down an empty
  elevator shaft instead of waiting for the car to arrive. However, there
  are trade-offs ...
  --CELKO--
  If data has little to no value for you then you do not need a relational database. However, if data is of any importance to you then you have to think beyond a flat file. Flat files and hierarchical databases have been around since the dawn of computing. Relational databases were brought about to solve concurrency and integrity problems inherent in these models, not to make your application faster. As the quote implies, jumping down the elevator shaft is faster than taking the car, but there are trade-offs. I think the better question is why your database design or queries take so much time that flat files are faster when there are just a few users of the system.
110. Re:Article summary by RedMage · 2010-03-28 15:40 · Score: 1
  
  We've managed to avoid the question so far due to the nature of the data we've put into AWS. We won't be able to avoid it forever, and I suspect that some kind of decision support DB will be needed not too far into the future. Where that will live, I don't have an answer yet.
  
  --
  }#q NO CARRIER
111. Re:Article summary by shutdown+-p+now · 2010-03-28 16:09 · Score: 1
  
  Well, depends how you define NoSQL. The idea behind it is to use anything but SQL -- do you seriously mean to say that SQL is the only possible way to abstract database operations?
  The idea isn't to "use anything but SQL". Some "NoSQL" databases use a subset of SQL for queries.
  The idea is to "use anything but relational DB". And, of course, a relational DB needs not use SQL for a query language, either.
  
  When we're talking about a system where I can design my application without knowledge of where data will physically be stored, yes, that's more abstract than a system where I must manually shard my data and send the SQL query to an appropriate server.
  I think you misunderstood my application of "abstract". They are on the lower levels of conceptual abstraction - they expose the user to concepts that are more primitive than relational ones. For example, in a proper RDBMS (SQL or not), transactions are magic - they "just work". In a NoSQL DB, if you want ACID, you need to roll it out yourself. Or, say, joins - again, in an RDBMS, you specify the join, no matter how complicated, in terms of what you want to get, and let the query analyzer figure out how to get there; whereas most NoSQL DBs don't support joins at all, only key lookups (but, of course, you can do a join yourself - with dismal performance, though).
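
  A small sketch of the "transactions just work" point, using SQLite through Python's sqlite3 module. The accounts table, the CHECK constraint and the transfer helper are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit above was undone too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

overdraft_ok = transfer(conn, "alice", "bob", 150)
after_overdraft = balances(conn)
normal_ok = transfer(conn, "alice", "bob", 60)
after_normal = balances(conn)
print(overdraft_ok, after_overdraft, normal_ok, after_normal)
```

  The application never writes its own undo logic; the failed transfer leaves no trace of the partial credit.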
112. Re:Article summary by colonelquesadilla · 2010-03-28 16:10 · Score: 1
  
  I think his point was that automatics are only less efficient due to the hydraulic torque converter, but automatically shifting with an electronic clutch, which is done on higher end systems, doesn't affect efficiency at all.
  
  --
  It's either false dichotomies, or the terrorists win, you decide.
113. Re:Article summary by derfel+cadarn · 2010-03-28 16:20 · Score: 1
  
  Sounds like a data warehouse derived from multiple DBs from different vendors. I bet it will be lots of fun :)
114. Re:Article summary by Phroggy · 2010-03-28 16:35 · Score: 1
  
  Quite right, I was apparently not paying much attention. However, I believe my point is sound.
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
115. Re:Article summary by Phroggy · 2010-03-28 16:37 · Score: 1
  
  I'm also not paying attention now. 20x? Seriously?
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
116. Re:Article summary by FlyingGuy · 2010-03-28 16:59 · Score: 1
  
  Also, some the biggest general ledger applications deployed are running on MS SQL, that includes Great Plains and Navision.
  I don't know about Navision since I have never heard of it, but as for Great Plains, the only reason people use MS-SQL is because they have no choice since MS bought the company.
  This is not the same as using it because they choose to do so.
  
  --
  Hey KID! Yeah you, get the fuck off my lawn!
117. Re:Article summary by einhverfr · 2010-03-28 17:19 · Score: 1
  
  There is one area where RDBMS's bring substantial performance improvements to the table: after-the-fact reporting.
  In a hierarchical database or a flat file, you generally have to read all the data into your program to generate a report if the database wasn't designed from the beginning with the report you are generating in mind. In an RDBMS or similar, you have sophisticated aggregate and join capabilities which generally can save a lot of time in this area.
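
  As an illustration of a report nobody planned for when the schema was designed, the sketch below joins and aggregates in one query. It uses SQLite via Python's sqlite3 module, and the customers/invoices tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount INTEGER
    );
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO invoices VALUES (1, 1, 100), (2, 2, 250), (3, 3, 50), (4, 1, 25);
    """
)

# Invoice count and revenue per region: the database does the join and the
# aggregation, and only the summary rows reach the program.
report = conn.execute(
    """
    SELECT c.region, COUNT(*) AS invoice_count, SUM(i.amount) AS revenue
    FROM invoices AS i
    JOIN customers AS c ON c.id = i.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(report)
```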
  
  --
  
  LedgerSMB: Open source Accounting/ERP
118. Re:Article summary by ArsonSmith · 2010-03-28 17:47 · Score: 1
  
  I work in a company that deals with a lot of data. Mostly transaction log data that we have to do statistical analysis of, and we are getting more and more almost daily (to the tune of terabytes a day). We are wholly owned by our clients and have to deal with them mostly for funding. Even though we out scale Oracle quickly we can't afford it. As we move forward we are looking at Hadoop due to cheap and easy scalability along with the fact that there is processing power tied to every chunk of data that we can use our analysis software on.
  Working in what is effectively Majority Report of crime prevention is not an easy task and needs a lot of both storage and processing power.
  
  --
  Paying taxes to buy civilization is like paying a hooker to buy love.
119. Re:Article summary by Anpheus · 2010-03-28 17:52 · Score: 1
  
  Is there a particular SKU or could you link me to the relevant download and/or licensing info?
120. Re:Article summary by ArsonSmith · 2010-03-28 17:54 · Score: 1
  
  but my mythtv box does over a million channel lookups a second across 230,000 tuner cards.
  
  --
  Paying taxes to buy civilization is like paying a hooker to buy love.
121. Re:Article summary by ArsonSmith · 2010-03-28 18:00 · Score: 1
  
  Were the Democrats the "right wingers" in the early-to-mid 90s? sure seems that it was them that got kicked out for trying to pass this government money grab known as healthcare just over 15 years ago.
  
  --
  Paying taxes to buy civilization is like paying a hooker to buy love.
122. Re:Article summary by mini+me · 2010-03-28 18:03 · Score: 1
  
  CouchDB was essentially invented at, and receives development efforts from, IBM. If what you say is true, CouchDB and NoSQL by extension should see the success that SQL has received in the past.
123. Re:Article summary by einhverfr · 2010-03-28 18:14 · Score: 1
  
  The "parse a string" API is simply indefensible when you want to build queries modularly, especially when there's no standardized grammar that's actually worth a damn for implementers. Even now the SQL standard still has no concept of indexes, so you will always be in vendor-land there.
  I think you are way off base here. Really you have three ways to build queries in a modular fashion in SQL:
  1) Use views
  2) Use user defined functions
  3) Use some sort of SQL abstraction layer like an ORM.
  In all cases you still are parsing a string, but your queries can be modular and have some code reuse. The real problem though is that you are looking at the RDBMS wrong.
  You might as well say "The parse a string API (HTTP) is indefensible when you want to build web pages modularly....."
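
  Option 1 from the list above can be sketched like this, again with SQLite through Python's sqlite3 module. The orders table and the unpaid_orders view are hypothetical names; the point is only that the filter is defined once and then reused as a building block by other queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER, paid INTEGER
    );
    INSERT INTO orders VALUES
        (1, 'acme', 100, 1), (2, 'acme', 40, 0), (3, 'globex', 75, 0);

    -- The reusable module: what "unpaid" means is written exactly once.
    CREATE VIEW unpaid_orders AS
        SELECT id, customer, amount FROM orders WHERE paid = 0;
    """
)

# Two different queries built on the same view.
unpaid_ids = [r[0] for r in conn.execute("SELECT id FROM unpaid_orders ORDER BY id")]
owed = conn.execute(
    "SELECT customer, SUM(amount) FROM unpaid_orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(unpaid_ids, owed)
```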
  
  People like C.J. Date have issues with nulls too, but most people do grudgingly admit that they have their uses in the real world. This doesn't make them suck any less when it comes to dealing with tri-valued logic everywhere (or more accurately, the unpredictable ad-hoc behaviors you get on a per-app basis when they map it onto bi-valued logic)
  That's a good reason to put null handling into the db layer (i.e. properly handle it in your queries). I dunno about you but I think it is more important for a math engine to be mathematically correct to at least a reasonable approximation of correct than for it to be easy for programmers to integrate. I may be weird though.....
  
  --
  
  LedgerSMB: Open source Accounting/ERP
124. Re:Article summary by einhverfr · 2010-03-28 18:20 · Score: 1
  
  Unfortunately exposing indexes to the query wouldn't solve the problem you are describing. Let me tell you a story:
  I deployed a POS solution for a retail customer backed by an RDBMS. For a year everything went fine. Then all of a sudden the invoices were taking 45 seconds to post and re-initialize. Obviously this is not good for a cash-register-type program.
  After some debugging I discovered the problem was in CBP areas. I asked about it on the PostgreSQL list and got an insightful answer from Tom Lane.
  Basically, PostgreSQL makes a number of pessimistic assumptions about tables which were entirely empty at last check. In particular it assumes they may be being filled in quickly. Since my customer wasn't using two of the tables involved in one of the queries, it was assuming these took up ten disk pages and hence shifted from a hash join to a nested loop...... Indexes wouldn't have solved the problem. What I had to do was find the problem queries and patch them for that customer.
  Furthermore, I think basic queries should not have hints exposed. However, prepared statements should have hints exposed because you lack enough information in the preparation of the query to choose a good plan. Here, and only here, should indexes be exposed to the query.
  
  --
  
  LedgerSMB: Open source Accounting/ERP
125. Re:Article summary by einhverfr · 2010-03-28 18:29 · Score: 1
  
  The empty string is, AFAIK, only handled in a screwy fashion in MySQL. I know that at least in PostgreSQL it's completely sane.
  I assumed he was talking about Oracle.... MySQL handles empty strings and nulls as equivalent too? That sucks.....
  
  --
  
  LedgerSMB: Open source Accounting/ERP
126. Re:Article summary by geekpowa · 2010-03-28 19:07 · Score: 1
  
  Hi
  I don't quite follow your worked example. If I am getting it right, the CBP switched from doing a HASH join to a index join. To best of my knowledge this would only cause performance problems when dealing with very high data volumes. if you are looking up say 100k records, each join would cost a b-tree scan, but with a hash the cost of that would be reduced significantly.
  Do you have a link to the Postgres forum thread where the issue was discussed? I am curious now, given that you presumably have an OLTP-type interaction where the CBP got it so wrong, to the tune of 45 seconds.
  My experience with CBPs is that they are great for reporting, but 99.999% of the time for OLTP you want simple b-tree walks to resolve primary key and foreign key lookups. Even in situations where a hash join would consistently yield better results - in a non-SQL programming environment (like Berkeley DB) - you'd go for a temporary client-side cache to effect your hash join.
  For OLTP, in my experience, the cardinality of the data doesn't change significantly enough to justify employing a CBP to pick the best strategy at the time. Normally a decision made at programming time will serve all needs with sufficient performance, with the added benefit that the result will be deterministic and consistent. Extending this to prepared statements - if you are doing primary key lookups in a prepared statement you will always want it to use the b-tree index. Even if there are only 10 records in it and a seqscan is theoretically faster - we are talking about microseconds.
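  A rough sketch of the join strategies being compared, written in plain Python rather than inside a database. The table contents and names (`customers`, `invoices`) are made up for illustration, and a sorted list with binary search only stands in for a b-tree, so this shows the shape of the work each strategy does, not real planner costs.

```python
from bisect import bisect_left

# Hypothetical tables: (customer_id, name) and (invoice_id, customer_id).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
invoices = [(100, 2), (101, 1), (102, 2), (103, 3)]


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row: O(len(outer) * len(inner))."""
    return [(inv_id, name)
            for inv_id, cust_id in outer
            for cid, name in inner
            if cid == cust_id]


def index_lookup_join(outer, inner):
    """Binary-search a sorted key list per outer row, roughly like a b-tree walk."""
    ordered = sorted(inner)
    keys = [cid for cid, _ in ordered]
    result = []
    for inv_id, cust_id in outer:
        pos = bisect_left(keys, cust_id)
        if pos < len(keys) and keys[pos] == cust_id:
            result.append((inv_id, ordered[pos][1]))
    return result


def hash_join(outer, inner):
    """Build a hash table on the inner side once, then probe it per outer row."""
    table = {cid: name for cid, name in inner}
    return [(inv_id, table[cust_id])
            for inv_id, cust_id in outer
            if cust_id in table]
```

  All three return the same rows; they differ only in how much work they do per outer row, which is the trade-off a planner estimates from its row counts.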
127. Re:Article summary by arethuza · 2010-03-28 20:29 · Score: 1
  
  Navision isn't really that big - it's meant for fairly small implementations. SAP - that is large.
128. Re:Article summary by xelah · 2010-03-28 21:42 · Score: 1
  
  Except that 200 years ago the British English spelling was -ize. Organization with an 's' is quite recent in UK English, probably because most people here associate 'z' with Americans and don't want to risk dirtying their English with your weird American ways. The OED will still tell you that 'organize' is preferred. (And so will I, which obviously clinches it).
129. Re:Article summary by vegiVamp · 2010-03-28 21:43 · Score: 2, Funny
  
  But.. but... surely, if *one* order of magnitude is 10x, *two* orders of magnitude must be 20x ? Please ? *cough*
  
  --
  What a depressingly stupid machine.
130. Re:Article summary by vegiVamp · 2010-03-28 21:48 · Score: 1
  
  > Well spank my uncle and grease my kittens
  You owe me a keyboard decaffeination.
  Good post, though. I wish I could put things as eloquently as you.
  
  --
  What a depressingly stupid machine.
131. Re:Article summary by rmm4pi8 · 2010-03-28 21:53 · Score: 1
  
  Really? Because I can almost saturate a gigabit pipe for around $100k/yr these days. Say I've got MySQL on almost 100 cores...now sure you'll say Oracle is more efficient, and it probably is, but I still need at least two boxes in case of hardware failure. I haven't seen anything suggesting I can get Oracle on say 8 cores for $10k. Now maybe this just means I don't "need" Oracle, which in some sense is trivially true since it's running on MySQL and it works, but that doesn't seem very helpful.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
132. Re:Article summary by rmm4pi8 · 2010-03-28 22:04 · Score: 1
  
  It's 2010. "GB in size" no longer means something big anymore. That said, MySQL and PostgreSQL both handle datasizes up to a terabyte and several-billion-row tables just fine with mostly standard SQL using the usual tricks. If you're talking petabytes, now that's a separate grade of mess. But if you follow the people using this in production at scale and talking about it in public (eg Facebook, Google) you'll see that the issues for MySQL are really around update/insert performance, replication speed, and replication transaction safety. I don't see anybody talking about Postgres scaling quite that publicly, but in my own experience Slony also has some issues with replication speed (and hot standby is great, but until you can query the slave it's not solving a huge class of realworld problems--and that capability has been forthcoming for a long time, but I'm not sure we're really any closer). Anyway, that frustration aside, PostgreSQL is a damn fine database, and I wish I didn't have to deal with MySQL at all.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
133. Re:Article summary by rmm4pi8 · 2010-03-28 22:07 · Score: 1
  
  Hear, hear. NoSQL is all about running into the write performance limits of commodity hardware and realizing that moving from ACID to BASE is loads cheaper than sharding.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
134. Re:Article summary by Thundersnatch · 2010-03-28 23:24 · Score: 1
  
  You clearly don't know what you're talking about. MSSQL has never lost a byte of data on me since 1996, and one of our SQL server apps has 700,000 users doing 1200 requests per second. You've never actually used it in production, have you? It is by far Microsoft's best product. It is also the most ANSI-compliant of the major databases (save PostgreSQL of course).
135. Re:Article summary by thijsh · 2010-03-29 00:35 · Score: 1
  
  I really love PostgreSQL too, but in stability it is far from perfect... certain queries can bring it down easily. And there is a most annoying bug with stored procedures returning a SETOF some table row... When you alter the table it will bork with some vague error "structure of query does not match function result type", and nothing you can do will fix this inconsistent state short of removing the entire database and restoring it from a backup.
136. Re:Article summary by Gr8Apes · 2010-03-29 01:00 · Score: 1
  
  But I've never seen them [DBA priests] actually get the SQL working to the point that it can supplant the flat files. The parts that do work are always so slow that turning on the "useDB" switch makes it too sluggish to actually use. In some cases, I can get around this by writing "pre-pass" code to extract the common data sets from the DB and write it to flat files, which the interactive software can read through quickly ... It's better to just develop stuff that works, and let the DB experts handle the task of porting it to the DB. That way, we developers can keep our hands clean of all the theology, and actually develop stuff that works.
  You've just succinctly described why so many see DBs as slow and ponderous. No, it's not because they are, it's because the developer didn't do any DBA design nor understand the first thing about DBs to be able to design a decent architecture. Using flat files for your tasking reduces the DB to a pure full fetch model which flat files will usually be faster at for infrequent (ie, non-cached) requests.
  You'd be much better served running your POC against mysql or postgresql than flat files. (depending on what flavor of DB you would be running against and the assumption that your "priests" wouldn't give you a copy of their actual DB to play with) At least this would force you into various considerations of data design and to utilize DB functionality.
  
  --
  The cesspool just got a check and balance.
137. Re:Article summary by Thundersnatch · 2010-03-29 01:06 · Score: 3, Informative
  
  I'm fairly certain that SQL Server inherited its TIMESTAMP keyword from Sybase, and that this usage of TIMESTAMP predates the SQL-89 and SQL-92 usages of that keyword.
  In short, they can't fix it properly, because it would break a ton of existing (very critical) applications that use the existing Sybase and MSSQL semantics of TIMESTAMP. Microsoft deprecated its usage of TIMESTAMP long ago, but they can't just change it without pissing off a lot of people. Oracle is in the same boat with many of its features that "violate" the ANSI standards.
  It's sort of like bitching about IE6 not supporting CSS2 features. IE6 predated the CSS2 standards ratification. It's actually the fault of those writing the standards: they ignored widely-used software and practices. In this case, they chose to use the TIMESTAMP keyword when something like DATEWITHTIME would have been clearer and would not have collided with anybody.
  In my experience, MSSQL is actually the most ANSI-compliant of the major commercial DBs.
138. Re:Article summary by caerwyn · 2010-03-29 01:24 · Score: 1
  
  Just head to http://www.oracle.com/technology/software/index.html and grab the standard edition. The license for it allows for development against it, with some expected caveats: 1 download = 1 person, you can only use it for development (ie, you have to buy a license the moment you start using your app for anything other than testing), and you can only install it on one server.
  You *do* have to buy a license if you want to test more complex oracle deployments, though, it seems.
  
  --
  The ringing of the division bell has begun... -PF
139. Re:Article summary by Gr8Apes · 2010-03-29 01:25 · Score: 1
  
  I look around at all the frameworks that have evolved to not do SQL (EJB-QL, Hibernate, etc) and I laugh. None of those languages come close to handling the same breadth and width of problems that SQL can be used to solve. Whenever I see advocates of these frameworks all puff up with fervour, I feel like shaking them and saying "Your emperor has no clothes!". The list of problems these frameworks can't solve is so huge that one wonders why anyone works with them at all.
  Having had hibernate and JDO forced on me on three previous projects, I can fully support your assessment of these "helper" frameworks. They're functional in their extremely small niches, maybe. What I find humorous is that as soon as you deviate from their 1 or 2 tricks, you'll find hundreds or thousands of lines of code, with half of it being bad SQL to mimic what a simple 1 or 2 line well-formed SQL query would give you had you not used said framework. (I'll pass this on: small JDO project - 35K lines of vendor specific DB code, very small hibernate project, 10K lines of code, also vendor specific, and finally, a rather sizable project using hibernate, somewhere in the neighborhood of 40+K lines of code and still counting, but this one's special as it's also infected with the Spring framework. I'll know more in a month or so after I've stripped out both as I need real access to the persistence layer to allow for an entire order of magnitude in scalability and functionality)
  
  --
  The cesspool just got a check and balance.
140. Re:Article summary by Vectormatic · 2010-03-29 02:03 · Score: 1
  
  i think you mean F*F, or F^2
  and honestly 0x100 times slower would be pretty bad for a database, but F'ing large? peh!
  
  --
  People, what a bunch of bastards
141. Re:Article summary by Tanktalus · 2010-03-29 02:06 · Score: 1
  
  You're apparently measuring cost wrong.
  You don't compare the cost of your DB license against that of your bandwidth cost. That's apples and oranges. Moreover, they're orthogonal costs.
  You compare the cost of your DB license against that of its competitors. So, you compare the cost of Oracle vs DB2 vs MSSQL vs PostgreSQL vs MySQL vs Teradata, etc. Of course, some of these may affect your bandwidth cost (going with MSSQL will mandate Windows servers which may require a different hosting plan if you aren't self-hosting), but mostly don't affect it, so they're separate concerns.
  So then the question is: is Oracle expensive for its performance compared to DB2, MSSQL, etc? If you believe the answer to be yes, and I doubt that you, specifically do, but others might, then saying "Oracle costs a lot" is warranted. I am explicitly not saying what I think so as not to cloud the issue here.
  If you think Oracle is an outright steal for your requirements, and I'm sure many who work at Oracle would say so, then I'm sure you could argue that Oracle is not only not expensive, but the only obvious choice. And you might be right. At the same time as others might be right about Oracle being overpriced for their purposes.
  One possibility, and I've not looked into this so this may not represent reality, is bundling: Oracle may bundle everything, and make you pay for most of it. DB2, as an example, may have everything in little pieces. With Oracle, if you need most of the bundle, you're getting a deal because IBM would nickel-and-dime you to death. But, for others who may not need much over and above plain SQL access to their data, DB2 would be far cheaper. And you'd both be right.
  And both would be more expensive than the average developer's salary, which makes it look to the underinformed as though both are expensive, whereas they probably will save your company more than their cost in development of an in-house equivalent that has similar performance and reliability.
142. Re:Article summary by corbettw · 2010-03-29 03:03 · Score: 1
  
  It's like saying that cars suck because they don't have cruise control!
  "under the hood"
  It's like saying that you should have a stick-shift car because automatic transmissions don't go as fast.
  the "socialist" health car plan that just passed
  Do you see now? Do you see what happens when you use too many car analogies? Do you?
  
  --
  God invented whiskey so the Irish would not rule the world.
143. Re:Article summary by jc42 · 2010-03-29 04:09 · Score: 1
  
  ... so why don't you simply prototype your apps to use a local rdbms while you wait for the dbas to port it to production?
  Because that usually takes orders of magnitude longer than prototyping without the DB. I explained the reason: The DB is always under the control of the DB priesthood, and nothing can be done without their permission. Yes, I could set up my own "toy" DB, but that would be stepping on the priests' toes, thumbing my nose at them, or whatever metaphor you prefer to say that they would be highly offended by my intruding on their turf when they discovered what I'd done. It would permanently end any chance of them ever approving anything I wanted.
  Instead, I agree in principle that it would be better to use the DB, but while I'm waiting for approval of that, I write a "temporary" prototype using flat files. I even include a flag in my code to switch to the DB access routines when they're finally working. Invariably, the project goes live using my prototype. I don't think I've ever actually seen a project in which the DB interface part was ever implemented fully while I was working there. I did once talk to the client several years after I'd left and learned that they had augmented my code with DB access, though my flat files were still in use for most of the task. The guys I talked to explained that the DB was just too slow for most of the task, and the folks in charge of the DB had never gotten around to making it work well.
  The file and directory based system you describe sounds brittle to me.
  Nah; it's the DB approach that's brittle. Data in a DB can change its format at any time, without warning, on the whim of the DB priesthood. I've seen it happen too many times to trust the DB. If you want your data access to be reliable, the team supporting the app needs to also control the data. If the data is controlled by a different department, inter-departmental politics always leads to such disasters. It's all too easy for the people in charge of the DB to shoot down the apps of people they're currently not getting along with.
  Note that I'm not talking about anything technical here. Technically, there are good reasons to develop using the DB system. The problems are all due to organizational politics. Database systems have a strong tendency to become power centers that work independently of other parts of the organization. This is inevitable, due to the complex nature of DB systems, necessitating the professional "priesthood" that cares for the DB system, changes its diapers, etc. File systems can run indefinitely without the need for file-system experts on staff to keep them running, but this generally can't be done with DBs (except maybe for MySQL). So you get organizational politics and all the mess that that entails.
  When it's (organizationally) possible to have a small DB entirely controlled by the team supporting the apps, and no way for a separate group of DB experts to take control of the data, then DBs will become a reasonable tool for the development side of the house. Until then, DBs will remain the bureaucratic nightmare that they are in most organizations, and developers who want to get their job done will simply ignore the DB side of the house for most of the development process.
  
  --
  Those who do study history are doomed to stand helplessly by while everyone else repeats it.
144. Re:Article summary by xouumalperxe · 2010-03-29 05:37 · Score: 1
  
  Probably closer than that. When I read his post, I thought "Well, you certainly pulled that idea from Uranus".
145. Re:Article summary by roman_mir · 2010-03-29 06:01 · Score: 1
  
  I am not a 'NoSQL' proponent in any way, but an RDBMS is not the only way to control ACID properties and to be transactional. I am certain there are databases that are not RDBMS and are transactional and ACID compliant.
  RDBMS is mainly about relationships between table columns, foreign references and such.
  
  --
  You can't handle the truth.
146. Re:Article summary by Anpheus · 2010-03-29 08:05 · Score: 1
  
  That's a little disappointing, the bit about not being able to test more interesting deployment topologies.
  However I appreciate your help here, this will help me as it gives me something I can validate code against and say "Yes I can support Oracle."
147. Re:Article summary by SanityInAnarchy · 2010-03-29 08:17 · Score: 1
  
  The idea is to "use anything but relational DB".
  Then I should rephrase: Are you really trying to say that a relational database is the only possible way, or the highest possible way, to abstract database operations?
  
  I think you misunderstood my application of "abstract". They are on the lower levels of conceptual abstraction - they expose the user to concepts that are more primitive than relational ones.
  No, I understood what you meant, I just disagree that RDBMS is higher-level in any absolute sense. For example:
  
  in a proper RDBMS (SQL or not), transactions are magic - they "just work"... Or, say, joins - again, in an RDBMS, you specify the join, no matter how complicated in terms of what you want to get, and let the query analyzer figure out how to get there...
  ...if you're on a single machine with zero hardware issues.
  Beyond that, you have to think about it, and plan, and do things like shard your data -- which kills both joins and transactions across data on multiple shards.
  As an example, Google App Engine does support transactions. They're different, sure, but they exist -- they optimistically-lock, and they apply only to a single entity group at a time, but they are fully ACID within those limitations.
  Now, let's talk about joins. You're in the traditional RDBMS mindset, which means joins are important to you, all the time. It's pretty much the only way you can get the database to talk about any structure more complex than a single record. Say I want to add tagging to a given record type -- in an RDBMS, the obvious way to do that is to have at least two tables, one table full of tags, and another representing the relationship between a given tag and a given record. I'm also going to need some deep voodoo to query that properly.
  Contrast to, say, App Engine again -- arrays are natively supported, and can be queried on. I can simply create a 'tags' field on the record and fill it with a list of tags, and then I can query for which records have a given tag -- and the database natively understands this.
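  For comparison, the relational layout from the previous paragraph can be sketched with Python's built-in `sqlite3` module. The schema and names (`records`, `tags`, `record_tags`, `records_tagged`) are invented for this illustration, and SQLite is only standing in for whichever RDBMS you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One row per (record, tag) pair.
    CREATE TABLE record_tags (
        record_id INTEGER REFERENCES records(id),
        tag_id INTEGER REFERENCES tags(id),
        PRIMARY KEY (record_id, tag_id)
    );
    INSERT INTO records VALUES (1, 'first post'), (2, 'second post');
    INSERT INTO tags VALUES (1, 'sql'), (2, 'nosql');
    INSERT INTO record_tags VALUES (1, 1), (1, 2), (2, 1);
""")


def records_tagged(tag):
    """Return titles of records carrying the given tag, via two joins."""
    rows = conn.execute(
        """
        SELECT r.title
        FROM records AS r
        JOIN record_tags AS rt ON rt.record_id = r.id
        JOIN tags AS t ON t.id = rt.tag_id
        WHERE t.name = ?
        ORDER BY r.id
        """,
        (tag,),
    )
    return [title for (title,) in rows]
```

  Whether two joins count as "deep voodoo" is a matter of taste; the point of the sketch is only that the tag list is spread over extra tables instead of living on the record itself.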
  Or say I want to store a complex data structure -- let's say a cached RSS feed. This is going to be either an absurdly complex set of tables in SQL, or maybe I'll just dump the raw XML as a text field -- but then how would I query it? Or I could simply convert that XML to JSON (or, hell, leave it as a string) and dump it into CouchDB, in which any and all queries beyond get-by-key are map/reduce views. In this case, if I wanted to query by something like author, I could create a JavaScript map function which pulls that out. And Riak is shaping up to be better at this kind of thing than CouchDB -- in Riak, I'd probably store the raw XML, unmodified, and query it using any language I want.
  You can sort of emulate this in SQL -- create a table with a text field to store the raw RSS dump, then anytime you come up with a new kind of query, you add a column, then manually go through every record in your database to initialize the default value of that column, since there's no way you're going to be parsing and manipulating XML in a SQL UPDATE statement.
  But the key word there? Emulate. In other words, CouchDB or Riak give you higher-level concepts which natively support map/reduce, which have it Just Work, even scaling out to a cluster, while SQL is going to make you build it yourself.
  It's different primitives, but they aren't automatically lower-level just because they're not the ones you're used to.
  So again, SQL has its place. It certainly makes sense if you have a ton of data to log and analyze later, and you know it's in a fixed schema, you don't know how you're going to query it, but you do want those queries to run quickly. It also makes sense if you just need a quick place to store stuff on one machine -- SQLite is amazing. But neither SQL itself nor RDBMS in general is the end-all of databases, and they certainly aren't the highest-level databases.
  
  --
  Don't thank God, thank a doctor!
148. Re:Article summary by SanityInAnarchy · 2010-03-29 08:21 · Score: 1
  
  Sorry for the double-post... I did think of something else regarding Erlang.
  In Java, you get high-level primitives that let you do crazy things with threads, so long as you remember to synchronize everything -- if you want to avoid locking, you have to build something else yourself. In Erlang, you get processes which are arguably higher-level, but isolated from each other -- if you want to use any sort of shared state in Erlang, you have to build it yourself.
  That's more or less what I'm saying here -- if you want to join, or transact across every record in your database, SQL is the way to go. But you almost never want to do that, and even if you did, it'd be a means to an end which is probably as easily realized in a fundamentally different way.
  
  --
  Don't thank God, thank a doctor!
149. Re:Article summary by shutdown+-p+now · 2010-03-29 08:34 · Score: 1
  
  But what exactly is SQL (as a language, and not particular implementation thereof) or relational model (as a model, and not particular implementation thereof) missing that NoSQL databases have?
  In your example, if Erlang corresponds to relational/SQL, and Java corresponds to NoSQL, then Erlang is missing shared state. But, so far as I can see, everything a NoSQL database can do, an SQL database can also do. You can ignore transactions (READ UNCOMMITTED). You can have optimistic concurrency in form of snapshots/MVCC. You can have arrays - there's no reason why an array cannot be an atomic type for the purpose of an attribute definition, so long as it's also retrieved or updated atomically (which can, of course, be optimized away to direct element access in actual implementation).
  You can dump raw XML or JSON into strings or blobs - in fact, many RDBMS these days have dedicated types which also allow querying over that - and, of course, if you want to do filter on the client, you can also query that XML/JSON in any language and in any way you see fit. Or if your RDBMS is extensible (most are), you can write your UDFs for XML or JSON processing in a language that's more suitable for that, and then use it in SQL queries - SQL/CLR in MSSQL being a good example of that approach.
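  As a small-scale illustration of the UDF approach, Python's `sqlite3` module lets you register an ordinary Python function and call it from SQL. The function name `doc_field`, the `feeds` table and the sample documents are assumptions made up for this sketch; it says nothing about how SQL/CLR or other engines' extension mechanisms behave.

```python
import json
import sqlite3


def doc_field(document, field):
    """Return one top-level field from a JSON document stored as text."""
    try:
        return json.loads(document).get(field)
    except (TypeError, ValueError, AttributeError):
        return None  # NULL input, not JSON, or not a JSON object


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as doc_field(doc, 'name').
conn.create_function("doc_field", 2, doc_field)

conn.execute("CREATE TABLE feeds (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO feeds (doc) VALUES (?)",
    [
        (json.dumps({"author": "alice", "title": "On joins"}),),
        (json.dumps({"author": "bob", "title": "On shards"}),),
        ("not json at all",),
    ],
)

# The filter runs inside the database engine, row by row.
titles = [
    title
    for (title,) in conn.execute(
        "SELECT doc_field(doc, 'title') FROM feeds "
        "WHERE doc_field(doc, 'author') = ? ORDER BY id",
        ("alice",),
    )
]
```

  Rows that fail to parse simply yield NULL and drop out of the WHERE clause.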
  And "emulating" MapReduce in an RDBMS? That's what SQL SELECT pretty much is, no?
150. Re:Article summary by SanityInAnarchy · 2010-03-29 10:14 · Score: 1
  
  You can have optimistic concurrency in form of snapshots/MVCC.
  Out of curiosity, what about the appengine behavior, in which there is only a single version that succeeds, and any concurrent transactions that affect the same entity group fail?
  
  You can have arrays - there's no reason why an array cannot be an atomic type for the purpose of an attribute definition
  Does any existing database do it this way? And more importantly, can you query on the members of said array? In the above key example, a query for "Which records are tagged with a given tag?" is stupidly simple.
  
  You can dump raw XML or JSON into strings or blobs - in fact, many RDBMS these days have dedicated types which also allow querying over that
  Great, what about YAML? What about ID3 tags on MP3s?
  
  and, of course, if you want to do filter on the client, you can also query that XML/JSON in any language and in any way you see fit.
  Doing it on the client is not nearly as efficient as implementing a map/reduce which runs where the data actually is.
  
  And "emulating" MapReduce in an RDBMS? That's what SQL SELECT pretty much is, no?
  No, not even close, unless I'm missing a lot about what SQL is.
  A Map function is an arbitrary bit of code in a Turing-complete language which executes over every element in a given set, creating an arbitrary number (zero or more) corresponding elements which form the result set. A Reduce function is an arbitrary bit of code in a Turing-complete language which executes once for every element in a given set, taking that element and the result of reduce on the previous element as input, thus creating a single datum as output.
  The power of this approach is that a given map can run concurrently on all elements in a given set, wherever they are -- generally, it makes sense to do this where the data is physically stored -- and the results can generally be cached, again wherever the data is physically stored. While a reduce can't necessarily run concurrently, many of them can run in parallel, and again, it's run where the data is.
  A trivial example, for which there is probably a specific database feature, would be a fulltext keyword search. Just create a map function which emits a word and a count for each unique word in each document. Then it's just a matter of a key-based lookup on the result. I'm aware that there are many tools for searching in the relational world -- the point is that this is something you can do trivially, using a tool which isn't specific to search, without any sort of administrative overhead of dealing with cron jobs, rebuilding indices, etc.
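  The keyword-count example above can be sketched in a few lines of single-process Python. The names (`map_words`, `reduce_counts`, `word_counts`) and the sample documents are illustrative assumptions; a real map/reduce store would run the map step on the nodes that hold each document and cache the results, which this sketch does not attempt.

```python
from collections import Counter, defaultdict
from functools import reduce

documents = {
    "doc1": "sql scales fine fine",
    "doc2": "nosql scales out",
}


def map_words(text):
    """Map step: emit one (word, count) pair per unique word in a document."""
    yield from Counter(text.split()).items()


def reduce_counts(running_total, count):
    """Reduce step: fold one emitted count into the result so far."""
    return running_total + count


def word_counts(docs):
    # Group emitted values by key; each document could be mapped independently.
    grouped = defaultdict(list)
    for text in docs.values():
        for word, count in map_words(text):
            grouped[word].append(count)
    # Reduce each key's list of values down to a single datum.
    return {word: reduce(reduce_counts, counts, 0)
            for word, counts in grouped.items()}
```

  A keyword search is then a plain key lookup on the dictionary returned by `word_counts(documents)`.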
  Another example might be to take a given record that's an ODF and convert it to PDF, or extract out relevant metadata or keywords to search on.
  
  --
  Don't thank God, thank a doctor!
151. Re:Article summary by shutdown+-p+now · 2010-03-29 11:26 · Score: 1
  
  Out of curiosity, what about the appengine behavior, in which there is only a single version that succeeds, and any concurrent transactions that affect the same entity group fail?
  That is precisely how snapshot isolation using MVCC works. I know that it is available in at least Oracle, MSSQL, Postgres and Firebird, out of relational databases I have any experience with. The complete list is probably longer.
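  A toy sketch of the "first committer wins" behavior under discussion. The `EntityGroup` class and its methods are hypothetical and exist only for this illustration; this is not how App Engine or any MVCC engine is actually implemented, just the visible outcome for two overlapping transactions.

```python
class ConflictError(Exception):
    """Raised when another transaction committed first."""


class EntityGroup:
    """Toy versioned store: a commit succeeds only against the version it read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def begin(self):
        # A transaction remembers the version (its snapshot) it started from.
        return {"read_version": self.version, "value": self.value}

    def commit(self, txn, new_value):
        # First committer wins; anyone holding a stale snapshot must retry.
        if txn["read_version"] != self.version:
            raise ConflictError("entity group changed since the transaction began")
        self.value = new_value
        self.version += 1


group = EntityGroup(value=10)
first = group.begin()
second = group.begin()  # overlaps with `first`
group.commit(first, first["value"] + 1)

try:
    group.commit(second, second["value"] + 5)
    second_committed = True
except ConflictError:
    second_committed = False
```

  The losing transaction is expected to start over from a fresh snapshot, which is the usual retry loop around optimistic concurrency.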
  
  Does any existing database do it this way? And more importantly, can you query on the members of said array? In the above key example, a query for "Which records are tagged with a given tag?" is stupidly simple.
  Yes. See the PostgreSQL docs on arrays for comprehensive coverage, but the gist of it would be:
  
    CREATE TABLE foo (bar TEXT[]);
    SELECT * FROM foo WHERE 'tag' = ANY (bar);
  though such a search would be O(n) for every array (which, of course, isn't a problem for your designated use case of tag clouds).
  Interbase and Firebird also have arrays, though last I checked, no built-in facilities for searching them in this manner (though an UDF can be written in C to do that, or a third-party one can be used).
  I don't know of other databases, save that MSSQL does not have this out of the box. Since it allows extending both column types and functions via any .NET language, adding that would be trivial.
  
  Great, what about YAML? What about ID3 tags on MP3s?
  You can, obviously, shove anything into a blob or a text field - the DB won't preclude you from doing so. Querying it is also not a problem if the DB in question allows for custom user-defined functions written in a foreign language (practically all modern RDBMS do). Such a query will still be parallelized - it won't parallelize the body of your function, of course, but it will parallelize the application of that function to rows.
  
  A Map function is an arbitrary bit of code in a Turing-complete language which executes over every element in a given set, creating an arbitrary number (zero or more) corresponding elements which form the result set.
  That's what SELECT does, for a set of tuples (which can, of course, be single-element).
  Now, I'm not aware of a standard ANSI SQL way to get multiple output tuples from a single input one (though I suspect there is something in SQL99 or SQL03 additions). In MSSQL, I can define a UDF (written in T-SQL, with CREATE FUNCTION) that produces a table value from its input parameters, and then join on that. For example:
  
    CREATE FUNCTION map_x_to_ys(@x INTEGER)
    RETURNS @ys TABLE (y INTEGER)
    AS
    BEGIN
        DECLARE @i INTEGER
        SET @i = 0
        WHILE @i < @x
        BEGIN
            INSERT INTO @ys VALUES (@i)
            SET @i = @i + 1
        END
        RETURN  -- a multi-statement table-valued function must end with RETURN
    END
    GO

    SELECT ys.y
    FROM xs
    CROSS APPLY map_x_to_ys(xs.x) AS ys
  assuming that "xs" is a table (or a table-type value coming from another query). APPLY here is MSSQL shorthand for this kind of join, which avoids the needs to output keys in the UDF to later join on in SELECT. Effectively, SELECT..APPLY in MSSQL is precisely the "map" part of MapReduce, with exact same semantics.
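  For readers who don't use T-SQL, the same "one input row, many output rows" shape looks like this in Python. `cross_apply` is a name invented for this sketch, and the output order is fixed here only because Python lists are ordered; a SQL result set carries no such guarantee without ORDER BY.

```python
def map_x_to_ys(x):
    """Mirror of the T-SQL function above: yield 0, 1, ..., x - 1."""
    i = 0
    while i < x:
        yield i
        i += 1


def cross_apply(xs, table_function):
    """Run the function once per input row and concatenate all output rows."""
    return [y for x in xs for y in table_function(x)]


xs = [2, 0, 3]
ys = cross_apply(xs, map_x_to_ys)
```

  An input of 0 contributes no rows at all, which matches the "zero or more" part of the map definition.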
  In other SQL DBs, the same kind of thing - when you need multiple output values per each input value - could be achieved by using plain joins, albeit with a few more hoops to jump through. I suspect that those with array data types will let you use them to avoid joins, as well.
  
  A Reduce function is an arbitrary bit of code in a Turing-complete language which executes once for every element in a given set, taking that element and the result of reduce on the previous element as input, thus creating a s
152. Re:Article summary by SanityInAnarchy · 2010-03-29 12:56 · Score: 1
  
  such a search would be O(n) for every array (which, of course, isn't a problem for your designated use case of tag clouds).
  Wait, what?
  That sounds like it would be. Every time a user clicks on a given tag, O(n) for each array? That's a serious performance hit vs what AppEngine does here, which is a string lookup in a hash table. Even in SQL, things like acts-as-taggable-on do a better job.
  Of course, if it's just O(n) on insert and update, that's not a problem.
  
  You can, obviously, shove anything into a blob or a text field - the DB won't preclude you from doing so. Querying it is also not a problem if the DB in question allows for custom user-defined functions written in a foreign language (practically all modern RDBMS do). Such a query will still be parallelized - it won't parallelize the body of your function, of course, but it will parallelize the application of that function to rows.
  Still raises some questions, like: Will the results of the application of that function be cached and indexed for future queries?
  
  it won't parallelize the body of your function, of course, but it will parallelize the application of that function to rows.
  Will it do so across multiple machines?
  
  Now, I'm not aware of a standard ANSI SQL way to get multiple output tuples from a single input one (though I suspect there is something in SQL99 or SQL03 additions). In MSSQL, I can define a UDF (written in T-SQL, with CREATE FUNCTION) that produces a table value from its input parameters, and then join on that. For example:
  So, something in the additions somewhere might be able to do it. Either way, it's going to be ugly.
  
  Effectively, SELECT..APPLY in MSSQL is precisely the "map" part of MapReduce, with exact same semantics.
  Next question: Suppose I update, insert, or delete a row in xs. Do I have to reapply the whole thing?
  It seems like what I'm getting here is more or less what I expected -- yes, you can do it (probably, I still have doubts about the performance/reliability tradeoffs), but it's cumbersome, to say the least. In order to take full advantage of these, you're going to have to dig deep into the non-portable details of your chosen database.
  You also didn't really touch on scalability. I just checked the Postgres wiki, and found more or less the same situation I did awhile ago -- there is a total of one solution that can scale writes, and it does so with sharding. The closest is MySQL cluster, which currently holds all your data in RAM.
  And then there's Riak, which can scale today.
  So at the end of the day, what is SQL buying you as a language, and what is the relational model buying you here?
  
  --
  Don't thank God, thank a doctor!
153. Re:Article summary by shutdown+-p+now · 2010-03-29 13:47 · Score: 1
  
  Wait, what?
  That sounds like it would be. Every time a user clicks on a given tag, O(n) for each array?
  In the absence of an index - yes, of course. I mean, that's why it's called an array, which is, by definition, an ordered list of possibly duplicate values with O(1) indexed access. It doesn't have any hashes or indices of its own.
  And, yes, you can create an index (GIN) on an array field which would permit optimized =ANY() lookups on it. It also supports determining subset and superset lookups.
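  To make the difference concrete, here is a minimal sketch of the two lookup strategies being argued about: scanning every tag array versus consulting a prebuilt index. The sample data and function names are made up for illustration, and the dictionary only stands in for the idea of an index; it is not how a GIN index is actually implemented.

```python
from collections import defaultdict

# Hypothetical sample data: post id -> array of tags.
posts = {
    1: ["sql", "postgres"],
    2: ["nosql", "riak"],
    3: ["sql", "mysql"],
}

def find_by_scan(tag):
    """Check every array for the tag; cost grows with the total number of stored tags."""
    return sorted(pid for pid, tags in posts.items() if tag in tags)

def build_inverted_index(posts):
    """Pay the cost once up front: map each tag to the set of posts carrying it."""
    index = defaultdict(set)
    for pid, tags in posts.items():
        for tag in tags:
            index[tag].add(pid)
    return index

tag_index = build_inverted_index(posts)

def find_by_index(tag):
    """One hash lookup on the tag, regardless of how many posts exist."""
    return sorted(tag_index.get(tag, set()))

print(find_by_scan("sql"), find_by_index("sql"))
```

  Both functions return the same posts; only the work done per lookup differs, and the index has to be kept up to date on every insert and update.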
  
  Still raises some questions, like: Will the results of the application of that function be cached and indexed for future queries?
  Obviously, it depends on the implementation.
  For functions defined via CREATE FUNCTION, I believe the answer is "yes" for any modern RDBMS, so long as they are provably pure (i.e. only reference input and output arguments, and not global tables). MSSQL actually enforces this for all such functions.
  For external UDFs, it's a more interesting question. For MSSQL and SQL/CLR, you explicitly specify whether the function is deterministic or not with respect to its input; I presume that if it is declared as deterministic, the optimizer will use that fact.
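  As a rough illustration of why determinism matters here: if a function's result depends only on its input, the result can be cached and reused safely. The sketch below uses Python's functools.lru_cache; the function name and rows are invented, and nothing here claims to mirror what any particular optimizer does internally.

```python
from functools import lru_cache

call_count = 0  # counts real evaluations, kept only to make the caching visible

@lru_cache(maxsize=None)
def normalize_tag(raw):
    """Deterministic: the return value depends only on the argument."""
    global call_count
    call_count += 1
    return raw.strip().lower()

# Apply the function to a column of made-up values, as a query would per row.
rows = ["SQL ", "sql", "SQL ", "NoSQL"]
normalized = [normalize_tag(r) for r in rows]

print(normalized, call_count)
```

  The repeated input is evaluated once. A non-deterministic function (one reading the clock or another table, say) could not be cached this way without risking stale results.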
  
  Will it do so across multiple machines?
  Not that I know of. For pretty much everything SQL-related that I've dealt with, it was parallelization on multiple CPUs/cores, not on a cluster of machines.
  As a side note, though, at this point we're straying beyond the original question of "which is higher-level". It's also worth reminding that the claim in TFA was, in fact, "you don't really need a database cluster for your database - you're not Google".
  
  So, something in the additions somewhere might be able to do it. Either way, it's going to be ugly.
  Something in additions to what? ANSI SQL? I think it's a meaningless point to raise, because NoSQL does not have a standard to begin with; and, to be honest, neither do SQL databases. The best we can do is compare relational and NoSQL implementations one-to-one, in which case "APPLY" etc are fair game.
  
  Next question: Suppose I update, insert, or delete a row in xs. Do I have to reapply the whole thing?
  You mean, if you "save" the output within the database (say, via CREATE VIEW)? I do not know the answer to that, unfortunately. I believe that most implementations will recompute in full. In theory, of course, nothing in either the relational model or SQL prevents optimizing this.
  
  In order to take full advantage of these, you're going to have to dig deep into the non-portable details of your chosen database.
  I don't lay claims that those things are possible within the constraints of standard ANSI SQL (they may be, I'm just not an expert on it). My original claim, if you recall, was that they are all possible within the realm of a relational model. There is no contradiction between map/reduce, and relational model - so long as you map relations to relations (with each tuple mapped to one or more tuples), and reduce relations to tuples, it's still relational.
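  A small sketch of that last point, treating a relation as nothing more than a list of tuples. The rows and function names are made up; the map step turns each input tuple into one or more output tuples, and the reduce step folds the mapped relation into totals.

```python
from functools import reduce

# A "relation" here is just a list of tuples; the rows are made-up sample data.
posts = [
    (1, "sql scales fine"),
    (2, "nosql scales out"),
]

def map_row(row):
    """Map: one input tuple yields zero or more (word, 1) output tuples."""
    _post_id, body = row
    return [(word, 1) for word in body.split()]

def reduce_counts(acc, pair):
    """Reduce: fold one (word, count) tuple into the running totals."""
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

mapped = [pair for row in posts for pair in map_row(row)]  # still a list of tuples
totals = reduce(reduce_counts, mapped, {})
result = sorted(totals.items())                            # tuples again

print(result)
```

  Input, intermediate, and output are all sets of tuples, which is the sense in which the pipeline stays relational.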
  
  You also didn't really touch on scalability. I just checked the Postgres wiki, and found more or less the same situation I did awhile ago -- there is a total of one solution that can scale writes, and it does so with sharding. The closest is MySQL cluster, which currently holds all your data in RAM.
  And then there's Riak, which can scale today.
  I've mentioned that above. It is a separate (and also interesting) question, but I suspect that TFA is absolutely correct in that there are very few actual users who need that kind of scalability. For traditional relational databases, you usually just throw more powerful hardware at them. Judging by the sizes of some of the existing relational databases, it is a working solution. For example,
154. Re:Article summary by SanityInAnarchy · 2010-03-29 14:35 · Score: 1
  
  As a side note, though, at this point we're straying beyond the original question of "which is higher-level".
  A bit. I suppose if you can show that in principle it would be possible to scale, you win. It seems pretty difficult to me, and it seems like the abstractions many NoSQL databases have chosen are deliberately biased towards scale.
  
  It's also worth reminding that the claim in TFA was, in fact, "you don't really need a database cluster for your database - you're not Google".
  I'm not, but at the point where I need much beyond SQLite, I probably do want the ability to scale, or at least do some sort of real-time replication and failover.
  
  Something in additions to what? ANSI SQL? I think it's a meaningless point to raise, because NoSQL does not have a standard to begin with
  That's actually my point. If you're straying beyond ANSI SQL, and getting into more and more vendor-specific stuff, you lose the advantage of having a standard at all.
  
  The best we can do is compare relational and NoSQL implementations one-to-one, in which case "APPLY" etc are fair game.
  Fair enough.
  
  You mean, if you "save" the output within the database (say, via CREATE VIEW)? I do not know the answer to that, unfortunately. I believe that most implementations will recompute in full. In theory, of course, nothing in either the relational model or SQL prevents optimizing this.
  Seems like that depends how pure it is, whether or not it has access to other rows in the same table, or other tables.
  
  For traditional relational databases, you usually just throw more powerful hardware at them.
  At which point, it gets much more expensive.
  
  Yahoo had a 100TB Oracle database back then.
  I don't know how Oracle stores data, specifically, though I do remember them doing some interesting things in multimaster, shared media, cluster filesystems. To me, that spells expensive. It spells some sort of massive RAID or SAN which your database cluster has to be wired to directly... ...versus vanilla hard disks (not even RAID, but hard disks with some checksumming) in some beige boxes connected via Ethernet.
  
  The only real advantage of SQL is that pretty much everyone knows the basics of it. It's like XML or Java in that respect - not perfect, and in some cases just downright ugly, but easy to find experts in, has stable and mature solutions, and excellent tooling that, in part, alleviates the pain of having to deal with its deficiencies as a language.
  That's actually a really good analogy. If I may:
  There aren't many places I'd use XML now. It makes sense for document markup, maybe, and I like HTML, especially the extensibility of it. But it's used in a lot of places it doesn't make sense, like AJAX -- straight HTML or JSON is usually better there -- or REST -- if it's two web apps talking to each other, YAML is smaller and easier to parse -- or serialization, or config files -- YAML is smaller, easier to parse, and easier for humans to read.
  Having an XML expert is nice, but it'll take you far less time to become a JSON expert -- something like 20 minutes.
  Java is similar. I'm not sure the difference is quite as obvious, and it certainly takes longer to learn a new programming language, new frameworks, etc, than it does to learn a newer, lighter markup language. On the other hand, the productivity gains by switching to something like Ruby, despite missing certain tools Java has, are enormous. I mean, both are Turing-complete, and JRuby proves they aren't so different after all, but still, attr_accessor vs defining accessors yourself? Rails and convention-over-configuration vs piles and piles of XML? No contest.
  
  More broadly speaking, relational is buying me the ability to spe
  
  --
  Don't thank God, thank a doctor!
155. Re:Article summary by shutdown+-p+now · 2010-03-29 15:40 · Score: 1
  
  It seems pretty difficult to me, and it seems like the abstractions many NoSQL databases have chosen are deliberately biased towards scale.
  Well yes, of course. Isn't it kinda the whole point of NoSQL - to limit the DB only to those abstractions that are either easily scaled automatically, or can be scaled in a manual but obvious way by the user?
  
  There aren't many places I'd use XML now. It makes sense for document markup, maybe, and I like HTML, especially the extensibility of it. But it's used in a lot of places it doesn't make sense, like AJAX -- straight HTML or JSON is usually better there -- or REST -- if it's two web apps talking to each other, YAML is smaller and easier to parse -- or serialization, or config files -- YAML is smaller, easier to parse, and easier for humans to read.
  Having an XML expert is nice, but it'll take you far less time to become a JSON expert -- something like 20 minutes.
  Well, the trick here is that I do not really have to parse that XML at all. Say, I code in .NET - I just whip up a bunch of classes, and have XmlSerializer go at it and map it to XML as it sees fit. Ditto for Java.
  Of course, it's the same for JSON and YAML, but therein lies the catch: which one is most widely supported by most frameworks? Even more importantly, which one is best supported (i.e. least lines of code) on the one I'm using at the moment?
  The day a YAML parser ships out of the box in .NET Framework, I'll start using it for my configs. JSON is actually shipping already, but it doesn't have the same convenient mechanism to map to strongly typed classes, complete with validation (if I specify a property as int, it is read as int - or error is reported) on the go.
  Same for Java - I'm not sure what it has out of the box these days, but I'd expect JSON at least aside from XML - but I suspect the tools that are there to read XML are easier to use.
  Of course, the price is parsing overhead. XML isn't easy to parse if you do it all right - encodings, entities, etc (been there, done that), whereas JSON is trivial.
  
  But I'm still not convinced that the relational model is the right level of abstraction.
  I'm not convinced of that myself. Since I prefer at least some amount of OO on the application logic side, I really don't see why my DBs don't natively support that as well. I want to just navigate the object hierarchies in obvious ways (by navigating properties), in a language that gives some syntactic sugar for all the usual query comprehensions (e.g. LINQ is mostly good enough), and have it figure out all that needs to be done by itself. I don't care how it all ends up being stored, so long as how I see it is objects.
  And all ORMs I've seen so far are very leaky abstractions, so that is definitely not the answer...
  Even so, I know that I do want ACID for sure (preferably optimistic concurrency in form of snapshots) by default, with flexible scaling boundaries. There might be rare cases where I do not need the guarantees and do not want the overhead, but I'd want that to be opt-in rather than opt-out.
156. Re:Article summary by SanityInAnarchy · 2010-03-29 16:30 · Score: 1
  
  Isn't it kinda the whole point of NoSQL - to limit the DB only to those abstractions that are either easily scaled automatically, or can be scaled in a manual but obvious way by the user?
  That's one point. Another is that it can be much easier to plan for and reason about.
  As an example, apply any document-oriented database to email. Store the original email as a giant text blob. Include one view which exposes a hash mapping header names to values, and suddenly be able to query on any header. Include another which actually parses out the MIME structure and exposes attachments and the like. Again, all possible in SQL, but it goes against the grain when the initial step is something like "dump a giant text blob and figure it out later."
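  Roughly what those two views could look like, sketched with Python's standard email package rather than any particular document database. The message and the function names are invented for illustration, and a real view would also have to decide what to do with repeated headers such as Received, which a plain dictionary collapses.

```python
from email import message_from_string

# Made-up message standing in for the "giant text blob" stored as-is.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: NoSQL\n"
    "\n"
    "Body goes here.\n"
)

def header_view(blob):
    """View 1: expose a mapping of lower-cased header names to values."""
    msg = message_from_string(blob)
    return {name.lower(): value for name, value in msg.items()}

def attachment_view(blob):
    """View 2: walk the MIME structure and list attachment filenames."""
    msg = message_from_string(blob)
    return [part.get_filename() for part in msg.walk() if part.get_filename()]

headers = header_view(raw)
print(headers["subject"], attachment_view(raw))
```

  The stored document never changes; each view is just a function over the blob, so a new one can be added later without migrating anything.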
  Another example would be any schemaless database for rapid development -- just add a field ad-hoc, no need to deal with migrations, much less some massive ALTER TABLE once you're up and running.
  
  Well, the trick here is that I do not really have to parse that XML at all.
  Well, yes you do. You do have to deal with an XML tree, which is more complex than a JSON tree. You also have to be able to look at the XML yourself and figure out what it means.
  I mean, I certainly second the tendency to use a library, but given the choice between a simpler and more complex format, both with good libraries, I'll choose the simpler one, provided it meets my needs.
  
  Of course, it's the same for JSON and YAML, but therein lies the catch: which one is most widely supported by most frameworks? Even more importantly, which one is best supported (i.e. least lines of code) on the one I'm using at the moment?
  That is indeed the more important step. More important than that is which is best supported in the app you're using at the moment.
  I can see being apathetic about XML in that respect, but it can't be that hard to wrap the existing JSON stuff into something usable, even re-usable.
  
  complete with validation (if I specify a property as int, it is read as int - or error is reported)
  I don't know to what extent I care about that.
  
  Of course, the price is parsing overhead. XML isn't easy to parse if you do it all right - encodings, entities, etc (been there, done that), whereas JSON is trivial.
  Also the complexity of the tree you get out of it, and the format itself, and the bandwidth used.
  So, for example, I would again find it perverse to use XML as a wire format for AJAX. HTML, I understand, because it can easily be dumped into the DOM, but if the client is actually meant to read and understand the message, JSON seems obvious. Rails made it trivial to support all of the above -- HTML, XML, JSON, and YAML -- I would use JSON for the browser (until I switched to HTML) and YAML for other Rails apps or standalone Ruby scripts.
  
  all ORMs I've seen so far are very leaky abstractions, so that is definitely not the answer...
  Unless they haven't been done right yet. Out of curiosity, have you played with DataMapper much? It has an absurd number of backends on all kinds of things relational and otherwise.
  
  Even so, I know that I do want ACID for sure (preferably optimistic concurrency in form of snapshots) by default, with flexible scaling boundaries.
  What do you mean by "flexible scaling boundaries"?
  And one possible reaction to the article -- you're not a bank any more than I'm Google. How much consistency do you really need?
  
  --
  Don't thank God, thank a doctor!
157. Re:Article summary by shutdown+-p+now · 2010-03-29 16:56 · Score: 1
  
  Again, all possible in SQL, but it goes against the grain when the initial step is something like "dump a giant text blob and figure it out later."
  It goes against the grain of dogmatic old-school relational theory, perhaps; but most real-world databases I've seen used blobs, and quite a few queried over them. Of course, XML fields are also really a kind of loosely organized blob, with no enforced schema, and their direct support by mainstream RDBMS seems to imply that they are generally accepted.
  
  Well, yes you do. You do have to deal with an XML tree, which is more complex than a JSON tree. You also have to be able to look at the XML yourself and figure out what it means.
  Hmm, I wonder if I didn't perhaps get the point across, so let me try some code examples.
  For .NET, what I'd do is write the data structures as classes (i.e. what is easy and familiar to deal with) first:
  
  public class AppSettings
  {
      public int Property1 { get; set; }
      public decimal Property2 { get; set; }
      public DateTime Property3 { get; set; }

      public class SubSetting
      {
          public string SubProperty1 { get; set; }
          public XmlNode SubProperty2 { get; set; }
      }

      public SubSetting[] SubSettings { get; set; }
  }
  and then serializing & deserializing XML would be as simple as:
  
  var ser = new XmlSerializer(typeof(AppSettings));
  var settings = (AppSettings)ser.Deserialize(File.OpenRead("foo.xml"));
  ...
  ser.Serialize(File.Create("foo.xml"), settings);
  This would end up mapping to:
  
  <AppSettings>
    <Property1>...</Property1>
    <Property2>...</Property2>
    <Property3>...</Property3>
    <SubSettings>
      <SubSetting>
        <SubProperty1>...</SubProperty1>
        <SubProperty2>...</SubProperty2>
      </SubSetting>
      ...
    </SubSettings>
  </AppSettings>
  I do not deal with any XML aspects here - element/attribute distinction, encodings, schemas etc. If I really want to, I can customize the default mapping by decorating my classes, or generate a schema out of it to validate XML against, but most likely I won't bother.
  Now with something like YAML, the serializing & deserializing code will probably be just as short, but what I'll get out of it is an untyped array or a map, and in a statically typed language, I'll have to do all the casts/parsing myself. With the code above, I can walk the tree simply by walking over chains of properties. And I get IDE code completion to assist me there. Or I can bind to the resulting object tree directly from UI (in either WinForms or WPF/XAML).
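  The "read as int or report an error" behaviour is not tied to XML as such. As a sketch only, here is the same idea for JSON in Python: declare the shape as a class, parse, and check each field against its declared type. The class is a hypothetical stand-in loosely mirroring the C# one above, and load_settings is an invented helper, not a library API.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class AppSettings:
    # Hypothetical settings, loosely mirroring the C# class in the post.
    property1: int
    property2: float
    name: str

def load_settings(text):
    """Parse JSON, then check each field against its declared type."""
    data = json.loads(text)
    for f in fields(AppSettings):
        value = data.get(f.name)
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got {value!r}")
    return AppSettings(**data)

settings = load_settings('{"property1": 1, "property2": 2.5, "name": "demo"}')
print(settings.property1 + 1)  # typed attribute access, no casts at the call site
```

  Passing "property1": "1" instead raises TypeError rather than silently handing back a string, which is the property being asked for in the post.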
  
  So, for example, I would again find it perverse to use XML as a wire format for AJAX. HTML, I understand, because it can easily be dumped into the DOM, but if the client is actually meant to read and understand the message, JSON seems obvious.
  For AJAX, JSON is clearly preferred simply because 1) all JS frameworks have first-class support, and 2) all server-side frameworks these days have first-class support, too. E.g. for .NET 3.5+, there is a JsonSerializer which is somewhat similar to XmlSerializer - something that can map a strongly typed .NET graph to a bunch of JSON and back. So JSON wins by default by being the easiest to go with, and XML would have to be justified.
  
  Unless they haven't been done right yet. Out of curiosity, have you played with DataMapper much? It has an absurd number of backends on all kinds of t
158. Re:Article summary by ppanon · 2010-03-30 03:26 · Score: 1
  
  Heh. One of my friends once told me a story years ago about him walking down the street, hearing a horrible noise, and looking around to see this high end exotic sports car stopped in the middle of the street. Nothing had hit it, but apparently the driver had done something so seriously wrong (probably with the transmission) that they couldn't get the car to start again.
  
  --
  Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
159. Re:Article summary by SanityInAnarchy · 2010-03-30 04:26 · Score: 1
  
  Now with something like YAML, the serializing & deserializing code will probably be just as short, but what I'll get out of it is an untyped array or a map, and in a statically typed language, I'll have to do all the casts/parsing myself.
  Not necessarily. What, exactly, is typed about XML? As far as XML itself is concerned, you're just dealing with strings.
  In fact, the default YAML implementation in Ruby definitely encodes type information. For example:
  
  > puts 1.to_yaml
  --- 1
  > '1'.to_yaml
  --- "1"
  Or, let's just pick a random class:
  
  > require 'ipaddr'
  => true
  > puts IPAddr.new('127.0.0.1').to_yaml
  --- !ruby/object:IPAddr
  addr: 2130706433
  family: 2
  mask_addr: 4294967295
  And if this didn't happen, then yes, I would have to parse it out myself. Ruby may be dynamically typed, but it is strongly typed -- I can't just take a string and pretend it's an integer, I have to at least call to_i.
  
  For AJAX, JSON is clearly preferred simply because 1) all JS frameworks have first-class support, and 2) all server-side frameworks these days have first-class support, too.
  That, I didn't know -- I know towards the beginning, XML was the default.
  However, as I said, I like HTML. It depends on the app -- if I'm actually building a client-side browser application, I'd probably tend towards JSON. On the other hand, I like the fact that with microformats and CSS, I can effectively create a REST interface that just is the website, and I can easily pull out anything I need to deal with client-side with JQuery -- assuming I'm not just doing what I would be 99% of the time, which is taking the HTML response and injecting it right into the document.
  
  It seems to be a Ruby-only thing, and I've only toyed with Ruby as a language. I'm generally not very fond of dynamically typed languages, preferring something statically typed, hybrid OO/FP, like Scala.
  Definitely a Ruby-only thing. I have to wonder why you like something statically-typed, but that's another discussion.
  Point is, it provides a frontend which has reasonably high-level concepts for querying, working with records, relationships can be navigated as if they were just properties or queried on as if they were tables, etc. Yet much of this is decoupled from a specifically relational model -- it's up to the adapter to map the query (constructed in an entirely type-safe, injection-free manner by an internal DSL) to the database in question. It works very well on relational databases, but it also works on entirely different beasts -- I've been contributing to the Google App Engine adapter.
  (I realize I say App Engine a bit too much. I don't work for Google, it's just the database I've been using the most lately, aside from SQLite.)
  
  I.e. ability to scale transaction to multiple-entity updates, or a sequence of updates interspersed with queries.
  App Engine does the latter just fine, so long as the queries are all within an appropriate scope -- but that scope is fixed. The former depends how your model is defined -- you can easily update multiple entities, but they have to share an entity-group.
  It seems like many (most?) applications fit easily into that model. For example, take Slashdot -- my user preferences could all be stuffed into a single entity-group. It's hard to come up with an example where I'd want to atomically update the preferences of more than one user. This post would be its own entity-group, most likely, as there's no particular need to even update multiple posts on a given story simultaneously, and it's trivial to order them by posting date, and deterministically after that -- however, it would probably make sense to group all moderations associated with this post with the post itself. (You'd want to store all moderations to be a
  
  --
  Don't thank God, thank a doctor!
160. Re:Article summary by JAlexoi · 2010-03-30 22:53 · Score: 1
  
  That is bullshit! Oracle DBMS product's licensing costs are not that high, but RAC is the thing that really bites.
  I have a client, a fairly big multinational, that stays clear of RAC just because of the license costs. They are prepared to face downtime, but will not buy RAC licenses.
  I have no idea how much RAC would cost them, but they did cost benefit analysis. And they are sticking with Oracle since their industry solution is based on Oracle Forms.
161. Re:Article summary by rkit · 2010-03-31 03:20 · Score: 1
  
  I never claimed sqlite is the only solution without administration, but I think this is the biggest advantage it brings to the table. Also it seems to be very robust, and there are no license problems whatsoever. Its workings are very well documented. It is a very useful tool if you use it for the right tasks.
  
  --
  sig intentionally left blank
162. Re:Article summary by ckaminski · 2010-04-03 07:11 · Score: 1
  
  I concur. If Microsoft were to ever "die" I would want SQL Server saved from certain destruction.
Can't wait it to die? by sopssa · 2010-03-28 03:31 · Score: 2, Insightful

This is like saying "I can't wait for memcached to die" just because your site doesn't need it. Fact is, some do. It's your own fault if you choose to apply unnecessary techniques.
Don't change to newer fancy techniques if you don't understand what they are for and why you would need them.
1. Re:Can't wait it to die? by David+Gerard · 2010-03-28 03:36 · Score: 2, Insightful
  
  memcached is most useful when the underlying app is hideously inefficient, e.g. it's pretty much essential to a MediaWiki installation that gets any appreciable number of users.
  
  --
  http://rocknerd.co.uk
2. Re:Can't wait it to die? by outZider · 2010-03-28 03:46 · Score: 1
  
  Well, no, not entirely. Not many sites out there run purely from memcached. Memcached is a component of a larger architecture. The fact remains that technologies like NoSQL are usually used/desired by people who have no understanding of system architecture, design an inefficient application, and then blame the database software for their poor decisions.
  
  --
  - oZ
  // i am here.
3. Re:Can't wait it to die? by Anonymous Coward · 2010-03-28 03:54 · Score: 3, Interesting
  
  Facebook.com, the highest-traffic site on the Internet, serves more than 95% of its data out of memcached. Twitter, Wikipedia, etc are major users too. And of course, Google serves its web index out of memory.
4. Re:Can't wait it to die? by RightSaidFred99 · 2010-03-28 05:59 · Score: 4, Insightful
  
  No, he's saying he can't wait for the _hype_ over NoSQL to die.
5. Re:Can't wait it to die? by outZider · 2010-03-28 14:17 · Score: 1
  
  Facebook serves out of memcached backed by a normalized storage system, and had it been designed by people who weren't dolts in the beginning, things would probably be different now anyway.
  Now they have a PHP to C compiler. What.
  
  --
  - oZ
  // i am here.
6. Re:Can't wait it to die? by rmm4pi8 · 2010-03-28 22:10 · Score: 1
  
  What would be better for keeping every user's profile thumbnail in memory than memcache?
  And would it have gotten off the ground in the first place if it weren't written in a scripting language? Probably not. Now that they have a million lines of PHP code, would it survive a rewrite? Probably not (see: Netscape). So it's ugly to be sure, but it's almost certainly rational.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
7. Re:Can't wait it to die? by vegiVamp · 2010-03-28 23:57 · Score: 1
  
  True, but that doesn't mean that it can't be useful in any well-designed app, too. One of the things that come to mind, is a session store.
  
  --
  What a depressingly stupid machine.
8. Re:Can't wait it to die? by vegiVamp · 2010-03-28 23:59 · Score: 1
  
  Serve out of, yes. Permanently store in, no.
  Memcache is highly useful (and indeed, designed) for serving volatile key/value type data. It would be plain stupid to use it as the main store for your non-volatile data.
  
  --
  What a depressingly stupid machine.
9. Re:Can't wait it to die? by ThePhilips · 2010-03-29 04:14 · Score: 1
  
  The fact remains that technologies like NoSQL are usually used/desired by people who have no understanding of system architecture,
  
  I like it when DBAs claim that DB can do it all without deep tinkering. And then when it doesn't work, blame everybody else.
  And by system architecture - don't you really mean "RDBMS architecture"??
  
  design an inefficient application, and then blame the database software for their poor decisions.
  
  Try designing an efficient application whose work depends on selects against a table with NNNmln entries, totaling >10GB in size.
  Yeah, right, all RDBMS' promises suddenly fall flat.
  
  --
  All hope abandon ye who enter here.
Neither are going away so just shut the fuck up by Anonymous Coward · 2010-03-28 03:33 · Score: 1, Insightful

Hierarchical DBMSs have been around longer than SQL-style RDBMSs. OODBMSs have existed for a long time as well. Many of these "NoSQL" DBs don't provide the same restrictions or assurances that an RDBMS provides, but they often have other features.
BDB isn't going away and neither is SQL. Get over it.
The Actual Quote is by Daengbo · 2010-03-28 03:40 · Score: 5, Insightful

"MySQL or PostgreSQL," for what it's worth. PostgreSQL is a pretty powerful database, and you should have to make a pretty good argument for why a well-understood technology that powers a lot (and some of the largest parts) of the WWWeb needs to be trashed for something newer and less tested.

--
Put identity in the browser.
I can't wait for databases to die by Anonymous Coward · 2010-03-28 03:40 · Score: 5, Funny

XML text files all the way! /duck
1. Re:I can't wait for databases to die by WrongSizeGlass · 2010-03-28 04:09 · Score: 4, Funny
  
  Let me be the first to say whoosh .
2. Re:I can't wait for databases to die by Phroggy · 2010-03-28 05:52 · Score: 4, Funny
  
  Remember, XML is like violence: if it doesn't solve your problem, use more!
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
3. Re:I can't wait for databases to die by Hurricane78 · 2010-03-28 07:03 · Score: 1
  
  Actually, I can’t stand the “retardedness” that went into designing databases, file systems, XML, class hierarchies, etc, as trees, instead of proper graphs in a proper ontology. In fact a graph is the only structure you’ll ever need. Everything else can be represented as a graph. So there is no point in separate list, table and tree data structures. Not even for optimization, since the graph algorithms can be designed in a way that makes them exactly as efficient.
  (Been there, done it, and in fact designing a system based on it.)
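  The representability half of that claim is easy to show; the efficiency half is not something a toy example can settle. As a sketch with made-up data, here are a list and a tree both encoded as adjacency maps and walked by the same traversal.

```python
# Adjacency maps: node -> list of successor nodes. Sample data is made up.
linked_list = {"a": ["b"], "b": ["c"], "c": []}
tree = {"root": ["left", "right"], "left": ["leaf"], "right": [], "leaf": []}

def reachable(graph, start):
    """Depth-first traversal; the same code walks the list and the tree."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(graph[node]))
    return order

print(reachable(linked_list, "a"))
print(reachable(tree, "root"))
```

  What the generic encoding gives up is the guarantees a specialised structure carries by construction, such as a list having exactly one successor per node, so those would have to be enforced separately.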
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
4. Re:I can't wait for databases to die by Arancaytar · 2010-03-28 07:07 · Score: 1
  
  The wonderful thing is that they're so easy to make!
  Just slap
  <xmldata> ... </xmldata>
  around all of your data files, and you're done!
5. Re:I can't wait for databases to die by ceoyoyo · 2010-03-28 08:03 · Score: 1
  
  Kids.
  Binary files. And you'll like it.
6. Re:I can't wait for databases to die by GaryOlson · 2010-03-28 09:19 · Score: 1
  
  Agreed. But, most people -- programmers, DBAs, PHBs, sys admins, et al -- have problems with the basic mathematics behind trees; much less the concepts and math of graphs. Most of their education and life experience is based on the concept of tree based hierarchies. If basic math education started with proper graphs and explained hierarchies as a special form of graph, I imagine more than just databases would be designed differently.
  
  I can't wait to see the system you design and build and ship as a finished product. Until then, we use what is available.
  
  --
  Every mans' island needs an ocean; choose your ocean carefully.
7. Re:I can't wait for databases to die by vegiVamp · 2010-03-29 00:01 · Score: 1
  
  You got a syntax error, there. You probably meant XML text files all the way! .
  
  --
  What a depressingly stupid machine.
Why? by ThoughtMonster · 2010-03-28 03:43 · Score: 2, Insightful

Why should anything "die"? People choose solutions based on their individual merits. If something doesn't work, exchange it for something that does. I'm sure certain people find NoSQL-type databases perfect for their needs.

In short, people should just shut up about other people's choices and get on with their own.
1. Re:Why? by lukas84 · 2010-03-28 04:01 · Score: 1
  
  People choose solutions based on their individual merits.
  No, just no.
2. Re:Why? by Jeff+DeMaagd · 2010-03-28 04:18 · Score: 2, Insightful
  
  No, just no.
  That's about as information-free as one can get. I'd ask why, but then, I don't understand why I would have to, just saying "no" is void of context and explanation.
3. Re:Why? by Excelcia · 2010-03-28 05:23 · Score: 1
  
  Why should anything "die"?
  What I took from the article was that the author wasn't particularly advocating that NoSQL should die, but that the hype surrounding it should die. Or, rather, that the hype surrounding it was inflated with respect to its actual usefulness.
  
  People choose solutions based on their individual merits.
  Unfortunately, this just isn't the case. People choose solutions for a multitude of reasons that only in some cases intersect with the actual appropriateness of that solution to the problem at hand.
  
  In short, people should just shut up about other people's choices and get on with their own.
  If everyone shut up and essentially "minded their own business", then a lot of software solutions wouldn't exist. A lot of them exist because someone had a problem and thought "I bet a solution to this would help a lot of other people with the same problem". Are you using Firefox right now? Chrome? Good thing that the developers of those browsers foresaw the issues with IE that existed then and would exist in the future, wrote a solution for their own and other people's problems, and then advocated those solutions. Advocacy for or against how a particular solution fits a set of problems is needed and healthy.
  
  In my experience, IT works on a generational system just like many other things, just that the generations are much, much shorter. As short as a year or two. And each new crop of fresh-faced idealistic young developers and engineers has their own particular "cool beans" technology. It's the same anywhere, just it happens faster in IT. This article is much like any other generational lament, citing how the "young folk" with their "new fangled" ideas have it all wrong and how they should listen to the Voice of Experience(TM). In reality, like almost every other case, I expect the appropriate path will probably be somewhere in the middle of the new and the old.
  
  Have you never seen the hype around the technology-du-jour cause people to choose it over another, perhaps more appropriate technology? In that sense, the author of the article is absolutely right. Sooner or later we'll stop seeing back-to-back Slashdot articles that make it seem there is a mass exodus of large enterprise from SQL. We'll start to hear stories instead of how Company B got burned because some inexperienced, fresh-faced IT guru advocated NoSQL inappropriately due to its coolness factor. The next generation will be out with its technology-du-jour, and that will leave everyone more free to choose SQL or NoSQL solutions, as you say, based on their merit rather than on their hype.
  
  Maybe this article will help everyone to get over the hype more quickly.
4. Re:Why? by jreineri · 2010-03-28 05:50 · Score: 1
  
  I believe that you missed the real point of the article. It was clear to me the intent was less a slam at NoSQL and much more of a slam at the stupidity of everyone suddenly jumping across the fence from SQL to NoSQL for no other reason than it is the current "correct" thing to do.
5. Re:Why? by hedwards · 2010-03-28 05:50 · Score: 1
  
  Because in practice that doesn't work so well. That's how we got shafted with Windows, Flash and iPods. In none of those cases were the products superior to the competition. Point of fact they all suck and for the most part sucked even from the get go, however they had powerful people behind them tricking people into thinking they were the correct solution. Flash has managed to get worse with every iteration whereas the other two are somewhat debatable. Bill Gates himself is famous for convincing people that he's the only one that can give them what they need and that nothing else will do.
  
  Imagine what the world might be like if those abominations had been stillborn.
6. Re:Why? by Xtifr · 2010-03-28 09:46 · Score: 1
  
  I was tempted to write something snarky in reply, but upon reflection, I think I'd just like to ask where I can apply to emigrate to your planet.
  On second thought, never mind. A place where people make decisions based on rational analysis of the relevant facts, rather than leaping to judgment based on emotion, hormones, prejudice and force of habit, is probably more than I can take. I'm only human, after all. :)
7. Re:Why? by thePowerOfGrayskull · 2010-03-28 14:43 · Score: 1
  
  No, people shouldn't choose solutions based on merits? What, then, should they use as the basis for their decisions?
Picutres. by ROBOKATZ · 2010-03-28 03:44 · Score: 1

I liked the pictures. Is there a name for the muppet guy in the first one?
1. Re:Picutres. by styrotech · 2010-03-28 08:53 · Score: 1
  
  It looks like a poor copy of Zippy:
  http://en.wikipedia.org/wiki/Zippy_(Rainbow)
  http://images.google.com/images?q=zippy+rainbow
  And speaking of Rainbow, this is required viewing...
  http://www.youtube.com/watch?v=Kclq2zGQy4w&feature=related
There's a place for SQL, but there are some... by BLToday · 2010-03-28 03:46 · Score: 2, Informative

There's a place for SQL, but there are some cases where a BigTable-like store (e.g. HyperTable) works better. Our company manages data using SQL, but when we present data to the users it's through a HyperTable implementation. SQL is easier for data management, but HyperTable uses our server resources better.
Hardware is cheap. Developers aren't. by Anonymous Coward · 2010-03-28 03:47 · Score: 5, Interesting

It's really that simple. A standard dual socket server with the latest CPU's from Intel or AMD can handle hundreds of requests per second; if one isn't enough, just add more hardware, one month of salary can buy you another node, a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a few months extra for a team to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.
Slashdot runs on SQL. I run a site of 1M pages daily (1/3-slashdot according to Alexa) with just a single system with 2x Xeon E5420, Django/PostgreSQL at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL, just stick with a standard ORM, launch your site and start convincing customers and generate sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.
1. Re:Hardware is cheap. Developers aren't. by pavera · 2010-03-28 04:12 · Score: 4, Informative
  
  Pretty sure he meant 1M page views/day as he compares it to slashdot using alexa data.... Is reading comprehension really that hard? Context clues are your friend.
  I run a site using django/postgres, we do about 100k page views/day on a 512 MB, 10 GB virtual machine. It's not doing anything crazy like google, but yeah, we aren't close to needing more power yet. When we do, first thing we'll do is bump up RAM for increased cache space...
2. Re:Hardware is cheap. Developers aren't. by Vellmont · 2010-03-28 04:45 · Score: 3, Insightful
  
  It's really that simple. A standard dual socket server with the latest CPU's from Intel or AMD can handle hundreds of requests per second;
  
  Hundreds of requests for WHAT per second?
  Your idea of "just throw hardware at the problem" isn't generalizable. Throw hardware at WHAT problem? For some problems, you're right. For others, you couldn't be more wrong. There's really no point in saying anything further.
  
  --
  AccountKiller
3. Re:Hardware is cheap. Developers aren't. by Lazy+Jones · 2010-03-28 04:46 · Score: 4, Insightful
  
  Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL, just stick with a standard ORM, launch your site and start convincing customers and generate sales.
  Most of the buzz about these things comes from and is aimed at people who actually believe they'll build the next Facebook or Twitter. The fallacy is in their belief that it's the size/traffic of those sites that supposedly mandates NoSQL and not the simple data models. Some of the biggest, less spectacular projects out there run on PostgreSQL for example (Skype, Afilias = .info and .org).
  
  --
  "I love my job, but I hate talking to people like you" (Freddie Mercury)
4. Re:Hardware is cheap. Developers aren't. by Anonymous Coward · 2010-03-28 04:48 · Score: 1, Informative
  
  Pure dynamic. It's a datamining / analysis site, so every user is viewing their own set of data, slicing and zooming randomly. Caching is completely useless for 99.9% of the pages, but we do store some heavy "SELECT COUNT(*) ... GROUP BY ..." queries in memcached. We chose PSQL because it can handle the complex multiple table joins with many indexes required - just that one thing would mean endless pain in a non relational datastore.
  If you still have any doubt, just write your code the easy way and grab Apache JMeter to benchmark your site on localhost. You'll be surprised how well even the dev server works: on an average page with ~10 queries, it takes only 50-100ms to serve a page. At 10/sec/core, extrapolated to 24 hours, that means almost a million pages/core. You can just take this and run it on an 8-12 core node and survive any traffic surge imaginable, without cache. Add caching and I really can't see how a blog/news site/forum/CMS can ever require NoSQL to run, except when you reach "Facebook" popularity.
  PS.: We aim for these numbers for a non cacheable page: 1s = slow but manageable. 0.2s = good. 0.1s or less = perfect.
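  A minimal sketch of the extrapolation above, in Python. It assumes a flat 100 ms per page (the slow end of the 50-100ms range), perfectly even traffic over 24 hours, and linear scaling across cores; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(pages_per_second_per_core: float, cores: int = 1) -> float:
    """Pages served in 24 hours at a constant rate (assumes linear scaling)."""
    return pages_per_second_per_core * cores * SECONDS_PER_DAY


# 100 ms per page is 10 pages/sec on one core.
per_core = pages_per_day(10)
eight_cores = pages_per_day(10, cores=8)

print(f"one core:    {per_core:,.0f} pages/day")
print(f"eight cores: {eight_cores:,.0f} pages/day")
```

  One core at that rate comes to 864,000 pages a day, which is the "almost a million" figure. Real traffic is not flat, so the headroom against a surge is smaller than the daily total suggests.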
5. Re:Hardware is cheap. Developers aren't. by SanityInAnarchy · 2010-03-28 04:54 · Score: 1
  
  I run a site of 1M pages daily (1/3-slashdot according to Alexa) with just a single system with 2x Xeon E5420, Django/PostgreSQL at 10% load.
  Good for you -- that says nothing about how much you're actually doing for each page.
  
  just stick with a standard ORM
  As a rule, I do. I use DataMapper in Ruby. It's just that DataMapper has pluggable backends, some for SQL databases, some for more exotic things.
  
  --
  Don't thank God, thank a doctor!
6. Re:Hardware is cheap. Developers aren't. by JamesP · 2010-03-28 05:01 · Score: 2, Insightful
  
  For the type of loads 'front-page' slashdot (and your site, most likely) gets, SQL fits fine. But even then, NoSQL may give you a run for the money.
  Now think of the loads incurred in the comment tree of slashdot.
  Also think how something like GMail or even Google Search would fit in an SQL scheme. It doesn't, at least not without table juggling that would be very inefficient.
  
  --
  how long until /. fixes commenting on Chrome?
7. Re:Hardware is cheap. Developers aren't. by stimpleton · 2010-03-28 05:19 · Score: 1
  
  " just a single system with 2x Xeon E5420, Django/PostgreSQL at 10% load"
  
  For the best total running costs (power, cooling, cost of capital), running a server at 60% load is ideal, and you scale from there. It doesn't matter with a single server; you have to start somewhere. But when adding a second, you should start considering it.
  
  --
  
  In post Patriot Act America, the library books scan you.
8. Re:Hardware is cheap. Developers aren't. by slamb · 2010-03-28 06:59 · Score: 1
  
  Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL
  Significantly past 10M pages a day. That's only 12 per second, which unless your page views are tremendously complex shouldn't even be hard. Let's be conservative and say that you need to handle peak load of twice average, a rewrite would take a year and a half, and you grow at 25% per quarter. Then you should be looking at the number 88 pages per second when deciding if you need to start a rewrite. That still seems doable on one machine with a standard database engine, so you probably don't need to start yet.
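  The rule of thumb above can be written out as a small calculation. This is a sketch under the stated assumptions only (peak is twice the mean, 25% growth per quarter, six quarters for a rewrite); the function and parameter names are made up for illustration. Note that the 12 and 88 figures come out when the input is 1M pages a day; fed 10M pages a day, the same formula gives roughly 116 and 883 per second.

```python
SECONDS_PER_DAY = 86_400


def required_capacity(pages_per_day: float,
                      peak_factor: float = 2.0,
                      quarterly_growth: float = 0.25,
                      quarters: int = 6) -> float:
    """Peak pages/sec to plan for by the time a rewrite would finish."""
    mean_rate = pages_per_day / SECONDS_PER_DAY
    growth = (1 + quarterly_growth) ** quarters  # compound growth over the rewrite
    return mean_rate * peak_factor * growth


for daily in (1_000_000, 10_000_000):
    mean = daily / SECONDS_PER_DAY
    print(f"{daily:>10,} pages/day: mean {mean:6.1f}/s, "
          f"plan for {required_capacity(daily):6.1f}/s")
```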
9. Re:Hardware is cheap. Developers aren't. by An+Onerous+Coward · 2010-03-28 08:22 · Score: 1
  
  Sure, relational databases can handle lots of very high performance tasks. I don't think that's up for dispute.
  I'm skeptical of your premise, that the only problem NoSQL is trying to solve is, "what do I do if my site becomes the next Twitter?" If that were true, it wouldn't be getting the attention that it is right now.
  Many of the NoSQL databases have scalability "built in" in a way that relational DBs don't, but the bigger problem that I think NoSQL is trying to solve is, "What if my data doesn't fit easily into a relational schema?"
  There are times when you'd like to enrich some bit of your data, adding a new attribute or two, but your current database schema doesn't support the new form of data, and you have to decide how to extend your DB to allow for it.
  In my current project, my database includes lots of columns that just hold type data, to say things like "what sort of object does the foreign key refer to," and a couple of columns that hold marshalled hash objects. During a prior project, I suggested implementing what was essentially a key/value pair table inside the project database.
  With NoSQL databases, those sorts of contortions aren't necessary.
  Other contortions might be, depending on which database you're using. You might end up using a relational database for, say, financial information that requires all the overhead of ACID, transactions, prepared statements, etc., and a NoSQL database for parts of a user forum, and maybe even an ORM to stitch the two together.
  One suggestion: if you find yourself storing elaborate XML, YAML, or other formatted data in your DB, and especially if you're trying to use the data in a meaningful way (rather than just spitting out the raw data in response to someone's query), then NoSQL might have something to offer.
  I've done little with NoSQL myself, but the point is that they're both "just tools." Neither is going to be objectively better in all situations, both are going to be put through ill-fated attempts to do things that they were never intended to do, neither needs to die in a fire, and there is nothing particularly magic or exotic about NoSQL.
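  For readers who haven't seen the "key/value pair table inside the project database" contortion mentioned above, here is a rough sketch using Python's built-in sqlite3 module. The table and column names (item, item_attr, attr_key, attr_value) are hypothetical, not taken from any real project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A conventional table with a fixed schema.
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The key/value side table: one row per (item, attribute) pair, so new
# attributes need no schema change, but every value is stored as text.
conn.execute("""
    CREATE TABLE item_attr (
        item_id    INTEGER NOT NULL REFERENCES item(id),
        attr_key   TEXT    NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (item_id, attr_key)
    )
""")

conn.execute("INSERT INTO item (id, name) VALUES (1, 'widget')")
conn.executemany(
    "INSERT INTO item_attr (item_id, attr_key, attr_value) VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "weight_kg", "2.5")],
)


def attrs(item_id: int) -> dict:
    """Collect all key/value attributes of one item into a dict."""
    rows = conn.execute(
        "SELECT attr_key, attr_value FROM item_attr WHERE item_id = ?",
        (item_id,),
    )
    return dict(rows.fetchall())


print(attrs(1))
```

  The flexibility comes at a price: values lose their types, and the database can no longer enforce much about them, which is the sort of trade-off that makes a schemaless store look attractive for this kind of data.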
  
  --
  You want the truthiness? You can't handle the truthiness!
10. Re:Hardware is cheap. Developers aren't. by mini+me · 2010-03-28 18:31 · Score: 1
  
  You are absolutely correct. All of the NoSQL databases exist to solve problems that SQL databases do not solve elegantly. (Horizontal) scalability is one of those problems, but hardly the only problem. Like you mentioned in your post, one of the more interesting applications of some of the NoSQL databases is the ability to index unstructured data.
  The web is an unstructured place. Most web-facing applications can benefit from being able to store unstructured data which is why some of the NoSQL databases are a perfect fit for web development. Yes, you can mathematically provide those services atop SQL, but the implementation is ugly and defeats the purpose of using SQL in the first place.
  Ultimately it is about using the right tool for the job. SQL is the right tool for many jobs, but it definitely is not the right tool for every job. The NoSQL movement is interesting because it is pushing the idea to developers that SQL is not always the right tool for the job and that there just might be a better database for your application. It is important that developers be aware that these tools exist so that they can be utilized where appropriate instead of trying to shoehorn every task into a SQL server.
11. Re:Hardware is cheap. Developers aren't. by prockcore · 2010-03-28 18:50 · Score: 1
  
  The ironic part is that "hardware is cheap, developers aren't" is the primary reason for going with NoSQL. Throwing more hardware at NoSQL is brain-dead easy. Cassandra, et al, are designed to cluster extremely easily. SQL databases aren't. Replication is very tricky with most SQL databases... it certainly involves programmer time. Cassandra is as simple as turning on a new server and pointing to it.
12. Re:Hardware is cheap. Developers aren't. by Thundersnatch · 2010-03-29 10:03 · Score: 1
  
  Let's be conservative and say that you need to handle peak load of twice average
  That's not conservative at all.
  Most business-related sites are significantly more "spikey" than what you illustrate. Our 50th percentile is about 4 dynamic page requests per second. 95th percentile is about 125 req/sec. 99.9% is 202 req/sec and peak is 400 dynamic page req/sec (times about six if you include static items).
  So we target the 99.9% for capacity planning, but that still leaves ten minutes per week where the application is quite slow. (It happens around 13:00 on Wednesdays, US Central time.)
  Capacity planning is hard to do right, and is never as simple as "two times average load".
13. Re:Hardware is cheap. Developers aren't. by slamb · 2010-03-30 09:31 · Score: 1
  
  Let's be conservative and say that you need to handle peak load of twice average
  That's not conservative at all. Most business-related sites are significantly more "spikey" than what you illustrate. Our 50th percentile is about 4 dynamic page requests per second.
  
  Interesting figures. Yeah, my rule of thumb is probably not conservative for your particular site.
  Your point would have been stronger if you gave information directly comparable to my statement. By average I meant "mean", the most common/widely accepted definition of the term. With the figures you described, your mean is at least 8 requests per second (that's with 0 requests/sec 50% of the time, 4 requests/sec 45% of the time, and 125 requests/sec 4.9% of the time, and 202 requests/sec .1% of the time). I'd guess more like 20, which means you have a factor of 10 difference between peak and average rather than my "conservative" 2. (You're probably almost entirely US-based?)
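  The "at least 8" lower bound is a weighted sum: each band of time is charged at the lowest rate it could possibly have had. A quick sketch in Python, using only the percentile figures quoted above (the variable names are illustrative):

```python
# (share of time, minimum requests/sec during that share)
bands = [
    (0.50, 0),     # below the 50th percentile: could be as low as 0
    (0.45, 4),     # 50th to 95th percentile: at least 4 req/sec
    (0.049, 125),  # 95th to 99.9th percentile: at least 125 req/sec
    (0.001, 202),  # top 0.1%: at least 202 req/sec
]

total_share = sum(share for share, _ in bands)
lower_bound = sum(share * rate for share, rate in bands)

print(f"shares cover {total_share:.3f} of the time")
print(f"mean is at least {lower_bound:.3f} req/sec")
```

  The true mean has to sit above this bound, since within each band the actual rate is somewhere between that band's floor and the next percentile.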
  
  Capacity planning is hard to do right, and is never as simple as "two times average load"
  I don't mean to suggest you should take my rule of thumb numbers as gospel without confirming them against your own service, or that capacity planning is easy. (Without going into specifics, I spend...well, quite a bit...of time on capacity planning for my...large...service...which uses...many...machines and does not use standard...anything.) Likewise, as I already alluded to, if your requests are inherently extremely expensive, 88 requests per second may be legitimately infeasible with a standard database stack. I accept that there are exceptions, but the idea that most people need a complex setup to scale to 10M requests per day doesn't seem right, and your post doesn't change my mind about that.
14. Re:Hardware is cheap. Developers aren't. by Thundersnatch · 2010-03-30 13:32 · Score: 1
  
  I'd guess more like 20, which means you have a factor of 10 difference between peak and average rather than my "conservative" 2.
  Reasonably close. Arithmetic mean is 31 req/s. My point was that planning for 60 req/s would leave us short of capacity forty hours of every week! The overwhelming majority of sites have regional audiences, and most other sites I've been involved with have similar patterns. Truly global sites are quite rare.
  
  (You're probably almost entirely US-based?)
  Yep... a little Hawaii, Guam, Europe and Middle East in there too.
  
  I accept that there are exceptions, but the idea that most people need a complex setup to scale to 10M requests per day doesn't seem right, and your post doesn't change my mind about that.
  No, I agree with you entirely. Our application actually uses a single (replicated for HA and reporting) commercial SQL database instance. It isn't the bottleneck, even at full load. We have, like you, many application servers, and only a few static content servers.
  The NoSQL crowd is locking their data away, hiding it from standard analysis and reporting tools. Not to mention leaving relational integrity up to the application layer (which typically doesn't have transactional semantics.) Both are dumb moves unless you have absolutely no choice (like Google or Amazon).
  The only thing I disagreed with was the "two times average load" example you used.
Don't you dare tell me... by AmazingRuss · 2010-03-28 03:48 · Score: 4, Funny

... that I can't tell others what to do!
1. Re:Don't you dare tell me... by GaryOlson · 2010-03-28 09:21 · Score: 1
  
  Oh, shut up already!
  And get back in line!
  
  --
  Every mans' island needs an ocean; choose your ocean carefully.
Some docs can't wait for Cardiac Clamps to die. by Vellmont · 2010-03-28 03:48 · Score: 4, Informative

So you're in surgery for 3 hours doing a kidney transplant, having used your trusty medium vascular clamps that have served you for the past 20 years. You're finally done and the patient is in recovery, so you sit down to relax with the latest copy of JAMA. They've got a great article about the latest development of Cardiac clamps, and you think to yourself "Why not use a heart clamp for kidney transplants!" Brilliant. So you order up some new clamps from MedicalClamps.com, and use them on your next patient. The surgery goes fine, but 3 months later the patient is back in your office with a failed kidney. You open 'em up, and it's obvious the clamp exerted too much pressure on the artery, damaging it in the process. Stupid cardiac clamps! You're not a heart surgeon!

--
AccountKiller
1. Re:Some docs can't wait for Cardiac Clamps to die. by WrongSizeGlass · 2010-03-28 04:18 · Score: 2, Informative
  
  I think this would have been better if you'd used a car analogy ... maybe something with hose clamps?
2. Re:Some docs can't wait for Cardiac Clamps to die. by gazbo · 2010-03-28 04:43 · Score: 4, Funny
  
  You missed out the bit where the article about cardiac clamps talks about how much better they are than the old-fashioned medium vascular clamp. And how every subsequent edition of JAMA has several articles all trumpeting the glory of the cardiac clamps over the now-outdated vascular clamps (although all of these articles are written by first-year med students who have never actually performed an operation - but they did once have a nose-bleed and chose to use a cardiac clamp to stop it).
  Analogies are FUN!
3. Re:Some docs can't wait for Cardiac Clamps to die. by Vellmont · 2010-03-28 06:13 · Score: 1
  
  I think this would have been better if you'd used a car analogy ... maybe something with hose clamps?
  
  Too many people already think software development and IT are merely glorified mechanics. I much prefer the idea that it's more like the un-matured medical field of the early 20th century.
  
  --
  AccountKiller
4. Re:Some docs can't wait for Cardiac Clamps to die. by ErikZ · 2010-03-28 07:33 · Score: 1
  
  Nah. Futurama references can really take an analogy where it was never meant to be:
  http://www.youtube.com/watch?v=qf8_tn7lBIc
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
Different strokes for different folks... by Anonymous Coward · 2010-03-28 03:50 · Score: 1, Interesting

I think this fellow's blog entry sums this up pretty nicely - especially the last paragraph: http://blog.cleverelephant.ca/2010/03/nonosql.html
Resources vs. Smarts by RedMage · 2010-03-28 04:01 · Score: 2, Insightful

FTA:
"In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA is likely has decision makers who understand business reality."
Bad English aside, I just don't agree. Money != Reality. I have worked both sides of this coin - startups with plenty of money that don't see the value in proper maintenance of the data store (one was almost put out of business by a disk failure), and very smart startups that are running lean but do understand the risks.
That said, on the deeper level, why does business reality == SQL? Sure I can scale Oracle to support massive DB's (and have), but I could probably get more value from using Amazon's SimpleDB for things that don't require massive scaling. Use the right tool for the job - Hammers are for nails, etc. Do the design work up front, decide how it's gonna work, and the right tool should present itself.

--
}#q NO CARRIER
1. Re:Resources vs. Smarts by pavon · 2010-03-28 04:51 · Score: 1
  
  Sure I can scale Oracle to support massive DB's (and have), but I could probably get more value from using Amazon's SimpleDB for things that don't require massive scaling. Use the right tool for the job ...
  Isn't the entire point of these NoSQL databases that they offer better scalability at the cost of traditional ACID data guarantees? Why would you give up the flexibility and reliability of SQL if you didn't need massive scaling?
2. Re:Resources vs. Smarts by RedMage · 2010-03-28 05:56 · Score: 1
  
  I can answer why WE decided to in this case: Cost and flexibility. For our application, there is a traditional PostgreSQL DB for things that ACID does well. For places we didn't think we'd need that level of transaction we decided that it wasn't cost effective to manage another DB instance, and instead moved it to SimpleDB. Scalability wasn't the major driver at the moment, as we're already a distributed system.
  
  --
  }#q NO CARRIER
3. Re:Resources vs. Smarts by BitZtream · 2010-03-28 08:47 · Score: 2, Informative
  
  If you're worrying about the cost of an Oracle license, what DB you use is irrelevant, you simply aren't large enough to make a wrong choice.
  When you are large enough for this to matter, the cost of Oracle or the cost of a handful of DBAs is the least of your concern.
  It blows my mind how much value slashdot geeks put on the cost of software. You guys have absolutely no fucking clue how much a single employee costs a company excluding salary, do you? You've been spending far too much time living in the basement and drooling over free (as in no cost) software to realize that not everyone is broke like you are. Real businesses don't worry about software license costs, they are so trivial in the grand scheme of things. You realize repurchasing all the software on pretty much any worker's PC will be paid off in a couple months of their salary? Do you really not have any idea how 'cheap' Oracle is when you get to that scale?
  No, you don't. Clearly.
  Right tool for the right job is correct, and building your own or using someone else's half assed hacked together pile of 'OSS' is generally not the way businesses care to run. They typically want to use software from someone who has some sort of vested interest in the software not sucking ass. It's far less expensive to buy from Oracle than it is to deal with a finicky OSS developer. If you're going to hire your own in-house developer to maintain it you've instantly spent more than you would have spent just buying some software and you now have none of the advantages of such.
  Stop talking about business reality when you clearly haven't even been in that part of the real world.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
4. Re:Resources vs. Smarts by rmm4pi8 · 2010-03-29 02:03 · Score: 1
  
  That is, for better or worse, just not true. I work for a $400MM company, nearly half profit, and there's no way we'd invest in Oracle & DBAs for our OLTP systems. It's not the total dollar volume, it's that since you always try to grab the most profitable niches first, you tend to grow by eking profit out of places with less room for it, e.g. less profit per user/transaction/whatever, and thus anything but open source on commodity hardware means that your costs grow faster than your profits and the tech department is unpopular. You might think that's silly, but in the consumer online space that's how the business thinks.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
No all databases are for business by C3c6e6 · 2010-03-28 04:03 · Score: 1

I think the author of TFA is missing something: not all databases / datastores are developed for businesses to keep track of their inventories. These days, many scientific disciplines, such as bioinformatics, rely heavily on databases as well.
The latest experimental techniques produce so much data that "old-fashioned" RDBMSs just don't cut it anymore. So, for certain application domains, NoSQL seems to be the way forward at the moment. I'm afraid the author can wish all he wants but NoSQL is gonna be around for a while. Until something better comes up, that is.
1. Re:No all databases are for business by jda104 · 2010-03-28 04:56 · Score: 1
  
  I think you're committing the same sin of which you're accusing the author, just on the opposite side of the pendulum.
  Saying that all RDBMSs won't "cut it" for modern applications in any domain is pretty narrow-minded. It seems like a simple rule of thumb to me: put relational data in an RDBMS, put key-value data in a Key-Value DB...
  As an aside, having worked on the Information Retrieval side of bioinformatics for the past few years, I've found that the complex side of bioinformatics is generally in the computation, not the retrieval. I've been well served by a single RDBMS server up to this point, though I have played around with MemCached for a couple of web apps.
Totally confusing... by CoffeeDregs · 2010-03-28 04:04 · Score: 1

The article was stripped of all nuance and then injected with confusing bits. e.g.
>NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL
What? How was Rails marginalized by NoSQL?
Also, it's nice to see the whole BerkeleyDB-ish/key-value sector of the data storage world suddenly exploding with innovation. There's a lot of dogma on both sides of the NoSQL argument (and the name "NoSQL" doesn't help), but some of the many NoSQL tools look as though they'll be pretty useful. Cassandra and MongoDB especially. And big companies getting behind the growth of new tools is never a bad thing.
1. Re:Totally confusing... by LizardKing · 2010-03-28 05:24 · Score: 1
  
  How was Rails marginalized by NoSQL?
  I think what Ted meant, and I have seen this myself, is that the kind of people who are always itching to move onto the "next big thing" switched from RoR to NoSQL when it appeared. This is probably a good thing, as those kinds of people tend to be the shrill fanboys - pushing technology X as the best thing in the world, while berating those who stick with more mature technologies. When they move on, there's more of a chance to be able to sit back and see what advantages technology X really offers, minus the hype.
  By the way, is Ted mellowing a bit? In this article there's not a single swear word or nasty analogy in sight!
Re:Right! by WrongSizeGlass · 2010-03-28 04:06 · Score: 2, Insightful

Everyone's needs are different, and there are going to be different solutions for those needs. If NoSQL isn't for you then just don't use it (don't spend any time learning it, trying it out, running a site with it, etc, etc). I don't have a need for it yet, but we do all sorts of sites and programming so who knows if it will be the right solution for one of our future projects? I won't unless I learn about it, test it and get my hands dirty with it.

And as far as it being 'a product of the braindead and buzzword-infested effluents of the American "education" system, where nobody understands math or logic', I don't care if it came from the bottom of a well in the middle of a jungle where they are masters of logic and math, if it could possibly meet my client's needs then I'm going to give it the time and attention it takes to make the decision for myself.
The Article Is Right... And Wrong by SQL+Error · 2010-03-28 04:09 · Score: 3, Insightful

Real businesses track their data with SQL databases, true. However, real businesses have small numbers of transactions relative to their value. If Walmart had the same revenue but the average sale was a tenth of a cent, their fancy SQL database would be smouldering rubble.
That's what Facebook and Twitter and other large social media sites are facing. Just try running Twitter's volume and Twitter's page hits and API hits off MySQL. It doesn't matter how many replicas you run, it's not going to work. Maybe you could run it on a cluster of IBM Z-series mainframes running DB2 - but where is the money going to come from?
Cassandra and HBase and the other distributed NoSQL databases solve specific problems in specific ways. They won't work for Walmart, but they'll do the job just fine for Facebook and Twitter. If you have those specific scaling problems and can live with the restrictions (you lose ACID, indexes, and joins to varying degrees) then they'll work for you.
If all you know is that your site is running slow, then implementing NoSQL is unlikely to improve things.
Someone tell me again... by jim_v2000 · 2010-03-28 04:13 · Score: 1, Insightful

Why should I give two shits about what database system someone else uses?

--
Don't take life so seriously. No one makes it out alive.
1. Re:Someone tell me again... by hamburgler007 · 2010-03-28 05:01 · Score: 1
  
  Because we learn by example and experience.
2. Re:Someone tell me again... by jim_v2000 · 2010-03-28 05:50 · Score: 1
  
  So that means that every other week someone should post an article about why relational/non-relational databases should go away?
  
  --
  Don't take life so seriously. No one makes it out alive.
Some people just want the holy grail by SmallFurryCreature · 2010-03-28 04:14 · Score: 2, Insightful

I think some developers keep looking for the holy grail. Some magical solution that will turn development from punching in code, to Star Trek: "Computer do my job for me please".
Template languages, 4GL, NoSQL, Ruby on Rails... it is all part of an attempt to take the nasty out of development and they all... well... they all just don't really happen.
Because deep down, with all the frameworks and generators, if you want your code to do what you want it to do, you are still writing out if statements a lot.
And yes, OO and such also belong to this. Not the concepts themselves, but the way most people talk about them. OO means code re-use, right?
If you said yes, then you are a manager, go put on your tie, you will never be any good at coding.
You can re-use all code. And it has been done for a long time.
What, did you think that people who wrote BASIC for the C64 went "Oh I wrote this bit of code for printing, now I need the same functionality, I am going to write it all over again!"?
OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!
I see two kinds of developers. Those who hate their job and those who don't. The former want to be managers, to get away from writing code as fast as possible. And they will leap on anything that seems to make their jobs easier. Meanwhile the rest of us go on with actually producing stuff.
Just check, how many times do you get one of those manager wannabes introducing something they read in a magazine because it promises that you don't need to write another line of code ever!

--

MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
1. Re:Some people just want the holy grail by WrongSizeGlass · 2010-03-28 04:22 · Score: 1
  
  "Computer do my job for me please"
  [HAL] Certainly, Small Furry Creature ... would you like fries with that? [/HAL]
2. Re:Some people just want the holy grail by Vellmont · 2010-03-28 04:36 · Score: 3, Insightful
  
  In some ways I agree with the general idea of your post. But stepping back a bit, code HAS gotten easier to write over the long term. I'd hope nobody would dispute that writing a large application in a modern high level language is easier than writing it using 1970s technology in assembly. Those advancements in language came through a lot of trial and error (a lot of error). How many failed languages exist that turned out to be dead ends (though they spurred further advancements and refinements)? How do you know the technologies you mentioned won't turn into the next (your favorite productive language here)?
  You're right that endlessly pursuing the latest trend is just foolhardy, as most "new latest greatest technologies" turn out to be duds. The point being that those duds sometimes DO pan out. Anyone that thinks that relational databases are the end-all-be-all of persistent data storage hasn't done enough relational database development to understand some of the limitations.
  
  --
  AccountKiller
3. Re:Some people just want the holy grail by tukang · 2010-03-28 04:37 · Score: 1, Informative
  
  OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!
  You can reuse functions but you can't extend them and that's where OO's reuse shines. It's very powerful to be able to lay out your code as a tree and control the reuse 'flow' at the nodes.
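  A minimal sketch of what "extending at a node" can look like, in Python. The class and method names are invented for illustration and come from nobody's real codebase; the only point is that a subclass overrides one method and still reuses the parent's version of it.

```python
class Report:
    """Base node in the class tree (hypothetical example class)."""

    def header(self):
        return "REPORT"

    def body(self):
        return "base body"

    def render(self):
        # render() is reused unchanged by every subclass.
        return "\n".join([self.header(), self.body()])


class AuditedReport(Report):
    """Child node: extends body() instead of rewriting it."""

    def body(self):
        # Reuse the parent's behaviour, then add to it.
        return super().body() + " (audited)"


if __name__ == "__main__":
    print(Report().render())
    print(AuditedReport().render())
```

  Only `body()` changes in the subclass; `render()` and `header()` come along for free, which is roughly the reuse being described.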
4. Re:Some people just want the holy grail by SanityInAnarchy · 2010-03-28 05:05 · Score: 1
  
  I think some developers keep looking for the holy grail. Some magical solution that will turn development from punching in code, to Star Trek: "Computer do my job for me please".
  While true, that doesn't invalidate actual progress.
  
  deep down, with all the frameworks and generators, if you want your code to do what you want it to do, you are still writing out if statements a lot.
  Sure, no one claimed otherwise. The point is that I'd much rather be writing the if statements that have something to do with what I want my code to do, rather than the if statements that have to do with how to manage a session, or how to handle one URL vs another, or how to sanitize this particular query for the database.
  As an example, you bash Rails -- ActiveRecord is hardly the only ORM capable of this, but it's possible to build a Rails app without writing a single line of SQL, or opening yourself up to a hint of a possibility of a SQL injection attack. Similarly, writing in any language higher-level than, say, C or C++, will generally save you from the possibility of a buffer overrun.
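  To make the injection point concrete, here is a rough sketch using Python's standard `sqlite3` module rather than ActiveRecord. The `users` table, the function names and the payload are all made up for the example; it only contrasts pasting input into the SQL text with binding it as a parameter, which is the kind of thing an ORM typically does for you.

```python
import sqlite3


def make_demo_db():
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # String interpolation: the input becomes part of the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Placeholder binding: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has that literal name
```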
  Now, is that going to take out all the tedium? No, it might not even make my job less tedious at all. What it will do is mean I'm more productive -- I'm doing more in the same amount of time. That's usually valuable.
  
  yes, OO and such also belong to this. Not the concepts themselves, but the way most people talk about them. OO means code re-use, right?
  It's a way of organizing your program, and it actually does have a lot to do with code re-use. For example:
  
  Apparently you can't re-use functions. No way, no how. NEXT!
  One core goal of OO is to encapsulate and hide away the details of a given concept and expose only a simple, re-usable interface. Now, anything you can do in one Turing-complete language, you can do in another, but it can be a considerably different level of pain and complexity. Consider basic patterns like iterators in a language like C -- yes, you can do them, but does it really make sense?
  
  Just check, how many times do you get one of those manager wannabes introducing something they read in a magazine because it promises that you don't need to write another line of code ever!
  Well, I tend to advocate Rails, web apps, NoSQL, REST, higher-level languages, metaprogramming, and so on. I do this partly because of what I can do with those, and partly because they make my job easier, and thus more productive.
  I never do them because I expect to stop having to write code.
  
  --
  Don't thank God, thank a doctor!
5. Re:Some people just want the holy grail by raftpeople · 2010-03-28 05:08 · Score: 1
  
  But, in defense of the parent, when OO was first becoming popular on a large scale (early 90's), that was exactly the message in the media etc. OO will make programming a snap because you just plug together a bunch of objects.
  
  The reality is that OO's primary benefit is in reducing complexity by encapsulation, although for some specific areas like GUIs it has the additional benefit of easy extension.
6. Re:Some people just want the holy grail by lenski · 2010-03-28 05:18 · Score: 1
  
  ...is easier than writing it using 1970s technology in assembly
  Been there, done that: Assembly, FORTRAN (II, IV, V), COBOL, PL/1. (I remember when DB2 rented for several grand per month.)
  Developing has gotten way more easier, though what I've found is that the expectations are way more greater [sic]. Many experiments have died on the vine, but as you note, we needed those experiments to find the right solutions.
  
  Anyone that thinks that relational databases are the end-all-be-all of persistent data storage hasn't done enough relational database development to understand some of the limitations.
  Agreed: My experience supports the hypothesis.
7. Re:Some people just want the holy grail by Vellmont · 2010-03-28 05:45 · Score: 1
  
  You can reuse functions but you can't extend them and that's where OO's reuse shines. It's very powerful to be able to lay out your code as a tree and control the reuse 'flow' at the nodes.
  
  Each new tool brings a new way to abuse it. What I'd add is that for code re-use to work (in whatever language) you have to design it to be reused in the first place. A function written by a halfway decent developer with at least some re-use in mind is going to be 10 times better than a shitty OO object designed by what I call a "get the cookie" developer.
  
  --
  AccountKiller
8. Re:Some people just want the holy grail by Jane+Q.+Public · 2010-03-28 08:49 · Score: 1
  
  Mod parent up.
  
  Have you ever used Windows 95/98 API calls? Typically, you'd be calling a C function that had a list of arguments a yard long. And also typically, the results you really wanted were returned as changed argument values, not as the return value of the function at all. Which is precisely the kind of programming I was taught at University to avoid at all costs, i.e., passing parameters by reference. Because passing by value avoids side effects. But the fact is, Windows was built around functions that work via side effects.
  
  OO does much to improve that situation. Not everything, but a lot. Programming paradigms might still not be perfect, and they might still in some ways resemble the old-school way of doing things, but they are not the same, and they are demonstrably better.
  
  So along that line, I have to add that I would not work for the person who wrote GP. At least not for long. I use the modern methods, and I produce more. Don't tell me that you can produce as good and as much using the old-school techniques and tools, because I was there then too. You can't. These aren't fads, they are evolutionary steps. Not every one is revolutionary, but they are evolutionary. And as such, some of them will be dead ends. But Rails isn't one of them. In fact, depending on who you talk to, Ruby, thanks largely to Rails, is still the fastest growing language, and has been for some years now. Not exactly a flash in the pan.
  
  I used VB back when there were few alternatives (yes, Delphi was a viable alternative, and superior IMHO). I was good at it. I could hammer out a program in short order. BUT a lot of that was due to the built-in drag-and-drop UI. If you throw in database access and so on, it could take quite a bit longer. And a complex program can take a lot of time to do properly.
  
  I can do a complex program in Ruby in less time (not counting the UI unless you want it web-based), and be much more confident that it works properly because of the more robust and mature testing frameworks. And I have more fun doing it, too.
9. Re:Some people just want the holy grail by BitZtream · 2010-03-28 08:55 · Score: 1
  
  I'd hope nobody would dispute that writing a large application in a modern high level language is easier than writing it using 1970s technology in assembly.
  Depends on your goals. If I were aiming for high reliability and performance, Assembly is the best way to go, but you'll be waiting years to get it done.
  If I want it tomorrow in some way that works 50% of the time or more, I'll use a 'modern' language.
  The more 'modern' the language, the more complex it is in and of itself. Anyone who truly understands how the GCs in the .NET runtime and Java work also knows that they are extremely complex beasts and that 99.9% of the people who use them don't have a clue as to what goes on behind the scenes. This makes most modern languages in fact far more unsafe than lower level ones because while they make it easier at first glance, they are in fact FAR more complex than what people think of as being 'hard'.
  I've never had a problem with memory management in assembly, but a basic webserver is the most I've ever had to write. I don't think I've ever written a Java or .NET app that I haven't had circular reference issues preventing memory from getting collected or had some sort of cleanup performed on an object that was already tombstoned.
  Your statement is exactly the problem with 'modern' languages. People assume they are easier but really don't understand how they work.
  The problem is people think that throwing an exception box on the screen is somehow different than a crash with no warning other than 'it broke'. The end result is the same, the program crashed. It doesn't matter if the VM is able to provide some worthless message to the user, the user is still put out. The only difference is the 'developer' (not that they should be called that) can say 'oh it's a handled bug, don't do that' or some other bullshit they learned from MS.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
10. Re:Some people just want the holy grail by Anonymous Coward · 2010-03-28 09:17 · Score: 1, Insightful
  
  You can reuse functions but you can't extend them and that's where OO's reuse shines.
  Have you ever programmed in anything other than OO? I was reusing and extending functions in several different languages long before OO became fashionable.
  
  It's very powerful to be able to lay out your code as a tree and control the reuse 'flow' at the nodes.
  Yeah, until the customer asks for audit logging on everything that saves data and for a dozen distant nodes to start talking to each other. Then you have to redesign the tree from near enough the ground up. Which has happened on pretty much every OO enterprise application project I've ever seen.
  Face it, for almost anything except games and GUIs, OO bites.
11. Re:Some people just want the holy grail by sproketboy · 2010-03-28 11:16 · Score: 1
  
  I wish I had mod points. Good post.
12. Re:Some people just want the holy grail by ceoyoyo · 2010-03-28 15:42 · Score: 1
  
  True, and occasionally people actually do that. OO's biggest feature, though, is that it forces you to organize your code. You have to think about partitioning functional bits off into different objects and you have to think about the interfaces for those objects. It forces you to design modular code with defined ways for different modules to interact. It's nothing you can't do in a non-OO language, but OO forces you to do it.
13. Re:Some people just want the holy grail by einhverfr · 2010-03-28 16:38 · Score: 1
  
  Anyone that thinks that relational databases are the end-all-be-all of persistent data storage hasn't done enough relational database development to understand some of the limitations.
  Well, there are some important and underrated strengths of RDBMS's in this area. No, I don't believe that RDBMS's should replace temp files, and even XML has its uses. However, for anything you might possibly need to do complex reporting on at any point in the future, RDBMS's are the way to go. This is the single biggest reason to use an RDBMS. Also, where you already use an RDBMS, it can often be useful for other bits of info which are outside of the core RDBMS area.
  For example, LedgerSMB uses key/value modelling to store application menus and application settings. The data model sucks because one is representing freeform data in a tool designed for highly structured data. However, we put it in the db because it adds less complexity than adding yet another storage medium.
  So it's not just the complexity of the data model. And RDBMS's aren't perfect for all things. However, I would be very surprised if Twitter or Facebook started putting their real business data in NoSQL databases.....
  
  --
  
  LedgerSMB: Open source Accounting/ERP
Right Tools For The Job by Ashcrow · 2010-03-28 04:16 · Score: 1

I think the frustration is actually in some people not using the right tools for the job. I like NoSQL databases (specifically MongoDB), but I have not used them with anything I've written. Why? Because it wasn't the right tool for the job. I tend to use MySQL, Postgres or sqlite because they're so widely available and well known in how to administer. There are times that NoSQL will make sense, it's just not the area I work in.

I do think we are going to continue seeing an uptick in NoSQL related things since many companies are fixated on "the cloud" while not really knowing what "the cloud" is (heck, no one still really, truly has a common definition of what it means ...). Since NoSQL seems to be a popular tool, and "the cloud" is a popular buzz phrase, CIOs/CTOs will likely be pushing their shops to utilize "NoSQL in the cloud". While large scale applications which don't require relational information and need fast syncing across many servers are good grounds for NoSQL, these "NoSQL in the cloud" instances will probably not actually fit that status.

I do agree that it will be a good thing when "NoSQL for everything" dies. Just like it was a good thing when "PERL for everything", "Java for everything" and "Ruby for everything" died, but let's not throw out the whole idea because a lot of people use it wrong.
1. Re:Right Tools For The Job by Jane+Q.+Public · 2010-03-28 08:55 · Score: 1
  
  I predict an eventual "cloud bubble" much like the 2000-ish "tech bubble" (which was really a Web bubble). People will find out -- probably due to some catastrophic service crash -- that "the cloud" is not all it is cracked up to be, and their business models will be toast.
  
  Of course, many of us know what the cloud is and isn't good for already. But many don't seem to quite get it yet.
XML (of databases)? by Manip · 2010-03-28 04:17 · Score: 1

The company I work at is currently having a huge headache moving from files into databases. We currently store everything in XML, which gives us a great amount of freedom and adaptability. However, most database solutions fix you to a single (or a handful of) data definitions. While you can kind of re-create XML by defining all kinds of crazy relationships, it gets hugely convoluted (to say the least).
I would LOVE to see a document/XML-like database. It just needs to do the things that standard databases support (e.g. Security Model, Easy Mirroring, Search/Queries) to make it worth our while moving at all. Last I checked we're up to 260,000 XML files and approx 40 different distinct file "formats" (XML layouts).
1. Re:XML (of databases)? by DalDei · 2010-03-28 04:26 · Score: 1
  
  You want MarkLogic (www.marklogic.com). Really, you do. You'll never go back to Relational again. /Disclaimer ( I'm not related to MarkLogic, just a fanboy)
2. Re:XML (of databases)? by Gorath99 · 2010-03-28 04:37 · Score: 1
  
  Have you checked out something like XML DB? I haven't used it much myself, but it sounds like it may meet your needs. It comes bundled with the XE database, which is free as in beer. (But XE has some limitations that the enterprise product doesn't have, of course.)
  Disclaimer: I work for Oracle.
3. Re:XML (of databases)? by kronosopher · 2010-03-28 04:50 · Score: 1
  
  Use an object relational mapper or something similar to generate interoperable database/XML schematics. You should be able to write conversion scripts pretty easily that way. As for mirroring between XML and SQL datastores, I'm not sure if such a solution exists.
4. Re:XML (of databases)? by kuhneng · 2010-03-28 04:58 · Score: 1
  
  Oracle and DB2 both support the SQL/XML standard and provide quite a bit of functionality for native handling of XML. Both can store structured / compressed representations in a native XML type (with or without a predefined schema) and use XPath-based indexes for efficient query execution.
  Wonderful stuff, and one of the few features I really miss back in the PostgreSQL world.
5. Re:XML (of databases)? by matria · 2010-03-28 05:23 · Score: 1
  
  http://www.xpdo.org/
6. Re:XML (of databases)? by ckaminski · 2010-04-04 14:19 · Score: 1
  
  Check out Berkeley DB XML
  
  http://www.oracle.com/database/berkeley-db/xml/index.html
because all apps are the same by vonkohorn · 2010-03-28 04:21 · Score: 1

So they should all use the same data management tools as Walmart. Is that the reasoning? Better to use the right tool for each job. Some things work better in a NoSQL non-schema.

--
Better to light a candle than complain about the darkness.
The NoSQL debate never gives any real information by i_ate_god · 2010-03-28 04:29 · Score: 1

At first, I thought NoSQL like Cassandra should simply be used as a store for precomputed relationships. Then I thought NoSQL was just a structureless store that can scale in any given direction with no effort.
Both sound interesting, but then the debate against NoSQL is just "well, SQL can already do all that, but you get data integrity with it. If it doesn't scale, then just build a manly man's server and it will".
So, I dunno. The whole debate has gotten very religious very quickly and as a result, no one is really doing a proper comparison because no one seems to take the approach of "right tool for the right job, so here are the jobs NoSQL is right for, and here are the jobs your RDBMS is good for".

--
I'm god, but it's a bit of a drag really...
ORLY? by Anonymous Coward · 2010-03-28 04:32 · Score: 1, Funny

You can survive a slashdotting just fine without spending so much time on those exotic tools.
Care to provide a link to your site so we can test this?
More RDBMS dogma by Angst+Badger · 2010-03-28 04:32 · Score: 3, Insightful

Use the right tool for the job, except databases, eh?
The simple fact of the matter is that not every app is aiming for Google's scale. (Not every app is web-based or even going to be web-based, though people seem to forget that.) And even some large-scale apps don't fit the relational model very well, medical records being one of the more outstanding examples.
And yes, I have read Codd and Date and understand the relational model and its benefits very well, and it annoys me to no end when people break the relational model without realizing or understanding what it costs them. That said, sometimes those costs are acceptable, and sometimes an application requires features that the relational model does not (and in fact cannot) bring to the table.
It may be, as with every other silver bullet fad, that what's at work here is the basic human tendency to become familiar with something, begin to see everything in terms of it, and then try to persuade anyone who'll listen that they are in possession of the all-singing, all-dancing solution to all problems. Today, it's Ruby, multi-touch interfaces, and functional programming. But not very long ago it was COBOL and CICS. And while one must acknowledge that progress has been made, it is equally obvious that progress will continue to be made and that "one size fits all" is always BS, even in clothing.

--
Proud member of the Weirdo-American community.
1. Re:More RDBMS dogma by Jane+Q.+Public · 2010-03-28 09:07 · Score: 1
  
  Agree with the other poster. Medical records would seem to be an ideal candidate for relational mapping. Not all the information of course, there is plenty of room for free-form text. But that fits into the relational model, too.
  
  Certainly, some things would be huge. The sheer number of possible maladies, for example. Nevertheless, that is not an argument for non-relational storage. And files... relational DBs handle digital images (high-res scans of x-ray film, for example) just fine. If performance is an issue, it is not necessary to store the binary information in the database itself. And the location of the original physical files can be tracked easily.
  
  So unless there is something I simply haven't considered, I have no idea why medical records might be unsuited for a relational model.
2. Re:More RDBMS dogma by Joey+Vegetables · 2010-03-28 11:36 · Score: 1
  
  My roughly 20 years of development experience in litigation, publishing, financial, brokerage, Web, and embedded systems have only very rarely presented me with a data persistence problem for which some type of RDBMS was not an acceptable solution.
  I handle data models similar to those in the medical industry through the "EAV" (entity/attribute/value) model. In fact, the linked article discusses clinical findings as a typical example of where EAV might prove useful. There are tradeoffs for sure, but generally speaking, it works very well when you have a very large number of potential attributes that each entity *might* have, but a much smaller number of attributes that any particular entity *will* have. For instance there are hundreds of thousands of illnesses or courses of treatment a patient might have, but no one will ever live long enough (or have enough money) to go through all, or more than a tiny fraction, of these.
  What's great about EAV is that you still get most of the benefits of the relational model, as well as most of the benefits of indexing. The tradeoff is that you do end up having to write more complex code, and sometimes querying can get tricky if your values are of multiple, incompatible types (lots of casting, etc.). Also, your querying and reporting tools, and your PHBs, all need to be aware that attribute data is being stored as rows rather than columns, and sometimes naive ones aren't.
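  For anyone who hasn't seen EAV laid out, a minimal sketch follows, using Python's standard `sqlite3` module. The table names, attribute names and sample rows are invented for illustration and are not taken from any real clinical schema. It shows attributes stored as rows rather than columns, and the kind of cast you end up writing when values of different types share one column.

```python
import sqlite3


def make_eav_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One row per (entity, attribute); every value is stored as text.
        CREATE TABLE finding (
            patient_id INTEGER NOT NULL REFERENCES patient(id),
            attribute  TEXT NOT NULL,
            value      TEXT NOT NULL,
            PRIMARY KEY (patient_id, attribute)
        );
    """)
    conn.executemany(
        "INSERT INTO patient (id, name) VALUES (?, ?)",
        [(1, "Pat A"), (2, "Pat B")],
    )
    conn.executemany(
        "INSERT INTO finding (patient_id, attribute, value) VALUES (?, ?, ?)",
        [
            (1, "systolic_bp", "142"),
            (1, "allergy", "penicillin"),
            (2, "systolic_bp", "118"),
        ],
    )
    return conn


def patients_with_high_bp(conn, threshold):
    # The cast is needed because numeric and free-text values share a column.
    rows = conn.execute(
        """
        SELECT p.name
        FROM patient AS p
        JOIN finding AS f ON f.patient_id = p.id
        WHERE f.attribute = 'systolic_bp'
          AND CAST(f.value AS INTEGER) > ?
        ORDER BY p.name
        """,
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


def findings_for(conn, patient_id):
    # Each patient only has rows for the attributes that actually apply.
    rows = conn.execute(
        "SELECT attribute, value FROM finding WHERE patient_id = ?",
        (patient_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = make_eav_db()
    print(patients_with_high_bp(conn, 130))
    print(findings_for(conn, 1))
```

  The primary key on (patient_id, attribute) doubles as an index, which is part of why lookups stay reasonable even though the data is "sideways".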
  
  --
  
  Nonaggression works!
3. Re:More RDBMS dogma by mdielmann · 2010-03-29 07:39 · Score: 1
  
  ..."one size fits all" is always BS, even in clothing.
  That dress seemed to work pretty well for Alice. And if she had to do the shoot wearing a bikini so they could add the dress using CGI later, well, that works pretty well, too. ;-)
  
  --
  Sure I'm paranoid, but am I paranoid enough?
...because "there can be only one!" by Joce640k · 2010-03-28 04:38 · Score: 5, Funny

The whole of geek debating is based on the Highlander principle.

--
No sig today...
1. Re:...because "there can be only one!" by Troy · 2010-03-28 05:39 · Score: 1
  
  At some point in the future, I'm going to plagiarize this statement.
  It just seemed polite to give you notice.
2. Re:...because "there can be only one!" by ceoyoyo · 2010-03-28 08:04 · Score: 1
  
  That's the problem. I haven't been cutting off their heads when I beat them. Time to sharpen the sword.
"It's Just a damned popularity with you kids!" by Dark_Matter88 · 2010-03-28 04:41 · Score: 1

Sure, I've messed around with some NoSQL databases; they just aren't my thing. Give me MySQL, your spec and a cup of tea and I don't have to look round silly experiments to see the best way of doing things in new radical 'paradigms.' That being said, I am glad the experiments are being done by people who are in such an environment to experiment. I mean, like the article says, it's the social networks like Twitter and Facebook developing things like Cassandra, and it's good that there is someone pushing the bar, but they are the only people who CAN do this: they aren't necessary, and nobody's gonna die from a 5 minute outage of poking each other (that sounds bad). I haven't really understood the whole NoSQL thing. I haven't really ever had a problem with SQL-based databases; maybe that's just the nature of my work, but it all seems as though this has nothing to do with technology, just people who want to be heard...
Price may favor noSQL for some applications by cervo · 2010-03-28 04:43 · Score: 4, Informative

Many of the NoSQL sources scale better than a normal database and are available cheap. Oracle costs a fortune, and if you want to run Oracle on a cluster, good luck. They also don't let you publish benchmarks without their permission. But most people I know who use Oracle claim it totally beats everything else (without further clarification). DB2 includes a cluster edition that is also quite good. It uses a shared nothing architecture. But none of these solutions are free. Teradata is also cited as a good parallel database. If you are a start-up and your choice is a NoSQL solution that is almost free or 100,000+ for some commercial parallel database, which do you go to?

But no matter what, you will consume resources with a relational database on ensuring consistency (which many times is what you want, but not 100% of the time). Amazon's Dynamo works by not caring so much about consistency and trading consistency for availability of the overall service. For a shopping cart it is fine, but you wouldn't want to do your credit card processing using it. Google's GFS is optimized to do the file operations that Google does the most. However, there was an article in the ACM not that long ago comparing MapReduce (Hadoop's implementation) against two parallel databases, and it lost. Of course, the parallel databases were all not free... and Hadoop is...

So overall I'd say the decision comes down to price mostly (as it does with most startups). If you can make do with one server then sure, go with PostgreSQL (or MySQL... although they always tried to force licensing for commercial products even though it is GPL...). If you need a cluster, both have clustering solutions, but as far as I can tell they are not as good as the commercial parallel databases. If you have lots of money then sure, go with Oracle; by word of mouth, Oracle seems to be the best for both parallel and standalone performance. DB2 was good enough for a former job. They had terabytes in the mid 1990s using about 20 servers. Now that the hardware is much better I'm sure it scales even better... But if money is a consideration, then go with an open source NoSQL solution. A lot of people now swear by Cassandra; I haven't had a chance to check it out yet.
1. Re:Price may favor noSQL for some applications by konohitowa · 2010-03-28 06:17 · Score: 1
  
  Oracle no longer runs on clusters? When did they drop clustering support?
2. Re:Price may favor noSQL for some applications by Dahamma · 2010-03-28 09:07 · Score: 1
  
  I think it was that SQL injection to their source control web interface a while back... "Foo'); DROP TABLE clustering_support; --"
3. Re:Price may favor noSQL for some applications by ducomputergeek · 2010-03-28 11:25 · Score: 1
  
  There are different databases for different tasks. Teradata produces a damn good database, but it's primarily for data warehousing and BI tasks. That being said, they now have a version that will run up to 1TB (I believe it's 1TB) available free. Just like IBM has their DB2 Express-C edition for free as well. Both have suited our tasks for development work just fine.
  However, PostgreSQL 9 is supposed to have native replication/clustering/Hot Standby. Which would address my concerns with using the database in mission critical databases that aren't dealing with large datasets.
  
  --
  "The problem with socialism is eventually you run out of other people's money" - Thatcher.
4. Re:Price may favor noSQL for some applications by cervo · 2010-03-29 01:17 · Score: 1
  
  I meant good luck due to the price. Most companies I have worked for (including a giant real estate firm) see Oracle as so expensive that even a single instance is tightly locked down and only for mission-critical things. They also tend to use other databases for less sensitive things, i.e. Microsoft SQL Server, Sybase, etc. Now I work for one of the most successful asset managers, and even they only use Oracle for their financials/HR; for the rest they use Sybase. A single instance running on a single machine is super expensive. A cluster is even more expensive than that.
  
  Even my college has Oracle so locked down that if you don't have a class that requires Oracle, you don't get access. And they lock you out as soon as the semester ends. Meanwhile MySQL you can have all the access you want.
5. Re:Price may favor noSQL for some applications by konohitowa · 2010-03-29 10:40 · Score: 1
  
  Ah, okay. I haven't maintained any Oracle installations for a while, so I really didn't know whether they were still running the same clustered tech or not. Last I paid any attention, they were still doing the Sun/Oracle turnkey clusters. Then again, last I paid attention, Amazon and eBay were Oracle on the backend. I have no idea if that's still the case.
Re:The Article Is Right... And Wrong by ducomputergeek · 2010-03-28 04:45 · Score: 1

If you get to the size of Walmart doing anything, you have access to the capital to get a system from IBM or Oracle for OLTP and Teradata for data warehousing.

--
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
I'm Still Fuzzy on NoSQL by RAMMS+EIN · 2010-03-28 04:55 · Score: 4, Interesting

I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open). It's also long been known that, for example, Berkeley DB can be a lot faster than an RDBMS - as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs don't even use one of these database systems, but invent their own, custom format. Git is a very successful example of this.
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
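For anyone who has never touched the dbm family mentioned above, here is a minimal sketch of what that style of storage looks like, using Python's standard `dbm` module rather than the C `dbm_open` call. The file name and the key/value contents are made up for illustration; the point is only that the whole interface is "store bytes under a key, fetch bytes by key", with no schema, joins, or query language.

```python
import dbm
import os
import tempfile


def demo_key_value_store(directory: str) -> bytes:
    """Store and fetch one record in a dbm-style key/value file."""
    path = os.path.join(directory, "sessions")  # hypothetical file name

    # "c" opens the database read/write, creating it if it does not exist.
    with dbm.open(path, "c") as db:
        db[b"user:42"] = b"last_seen=2010-03-28"

    # Reopen read-only to show the value was persisted to disk.
    with dbm.open(path, "r") as db:
        return db[b"user:42"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_key_value_store(tmp))
```

Anything beyond lookup by key (secondary indexes, constraints, multi-record transactions) is the application's job here, which is exactly the trade-off being discussed.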

--
Please correct me if I got my facts wrong.
1. Re:I'm Still Fuzzy on NoSQL by jeremie · 2010-03-28 08:42 · Score: 1
  
  Agreed!
  If your system has a complexity of X to build, and using a traditional RDBMS solves Y% up front, it adds an ongoing Z complexity overhead in growing/maintaining the system over time. NoSQL may have a lower Y, but its goal is to also have a lower Z, and it can often win out in the bigger picture even without talking about massive scale. It's not true all the time, but in my experience it's been the general rule :)
  In any sufficiently complex system a combination of both usually works well. I heard a good offhand comment this week of "80% of the structure should be in an RDBMS and 80% of the data should be via NoSQL" which makes a lot of sense to me.
2. Re:I'm Still Fuzzy on NoSQL by ishobo · 2010-03-28 09:17 · Score: 2, Insightful
  
  Unfortunately the NoSQL people should have called their movement "nonrelational". You can have a relational database and not use SQL; the two are not dependent on each other, as there are nonrelational databases that allow the use of SQL. Although the movement for the use of nonrelational databases may be new, the use of nonrelationals is not. My first exposure to a business-class database was Pick in the 70s. There are plenty of these types of systems in use today. Nonrelationals have been going strong for over 40 years.
  This article is another case of somebody who does not have the breadth of experience in the field. The same applies to the people who started this movement. I have to ask, does college no longer teach the history of computing?
  
  --
  Slashdot - The great and glorious cluster fuck of Internet wisdom.
BS by ajung · 2010-03-28 05:05 · Score: 1

I call: bullshit
We have been using object-oriented databases like the ZODB for ten years when the data model is not relational
We are using relational databases when our data is relational
We are using relational databases and object-oriented databases together in the same app when we need both
We are not using MySQL when we are in need of a *real* database.
Use the right tool for each problem - only idiots use an RDBMS for all and everything.
1. Re:BS by shutdown+-p+now · 2010-03-28 13:57 · Score: 1
  
  I don't think OODBMS really count as "NoSQL" these days. So far as I can see, the major point of the latter is lack of any ACID guarantees. Now, I've never worked with an OO database in production, but for those few that I've tinkered with to satisfy my curiosity, they did have proper transactions, rigidly defined data models (different limitations compared to RDBMS, and generally fewer of them, but the schema is still there) etc.
SQL performance by garry_g · 2010-03-28 05:06 · Score: 3, Insightful

People complaining about SQL performance are most likely either using incorrectly scaled machines for the job, or believe they can throw a four-line SQL statement at the database and expect it to work out the optimization on its own ... query optimizers may be able to do a decent job on average, but once you go large databases (multi-million dataset tables), planning the query structure will go a long way preserving performance.
Yes one can write complicated queries to return exactly what you want in one query, but in many cases doing some logic around it and using smart grouping/loops will outperform the complex query ...
1. Re:SQL performance by Just+Some+Guy · 2010-03-28 13:13 · Score: 1
  
  query optimizers may be able to do a decent job on average, but once you go large databases (multi-million dataset tables), planning the query structure will go a long way preserving performance. Yes one can write complicated queries to return exactly what you want in one query, but in many cases doing some logic around it and using smart grouping/loops will outperform the complex query ...
  No offense, but are you primarily used to MySQL? My company's PostgreSQL tables are (I think) pretty small, on the order of a few tens of millions of rows in the larger ones. They're small enough that I've never bothered with setting up partitioning; each table really is just a big ol' table. Still, they're big enough to benefit from approaching them intelligently. In every case when smart grouping and loops were running too slow, I found that replacing the client-side mess with a few complex queries fixed the problem. In short, its query analyzer is far better at optimization than we are, on average. A database is exceedingly good at managing data. Why not let it?
  
  --
  Dewey, what part of this looks like authorities should be involved?
2. Re:SQL performance by garry_g · 2010-03-28 15:49 · Score: 1
  
  No, not only MySQL ... PostgreSQL also ...
  If you think a DB engine is fast, try doing joins and stuff between tables ... an optimizer can only be of limited efficiency, as it won't know what to expect ... e.g., doing a join of two large tables might result in either a very small result set, medium (many results from one part, few results from the other) or a very large result ... depending on the indexes etc. the wrong optimization may be _very_ costly ...
3. Re:SQL performance by Just+Some+Guy · 2010-03-29 02:08 · Score: 1
  
  If you think a DB engine is fast, try doing joins and stuff between tables
  Jjjjjj...oins? What are those?
  Seriously, I can think of a specific query I run that involves 3 or 4 subselects (depending on options passed to it), joins between multiple large tables, and predicates with values like "SUBSTR(UPPER(foo), 1, STRPOS(foo, ',') - 1)" that runs in milliseconds on PostgreSQL. The easiest "trick" is to see what EXPLAIN is doing and make sure you've got indexes on the join columns. In 99% of cases, assuming the query is actually sane (that is, you're not doing a cartesian join in a subselect then filtering the results later), that's enough.
  Another huge optimization is to do as much filtering as possible in PostgreSQL itself. The other common bottleneck is asking for way too much data ("SELECT * FROM t1 JOIN t2 on t1.key1 = t2.key1 JOIN t3 on t2.key2 = t3.key2") then winnowing the tuples on the client. You're almost always far better off selecting only the rows and columns you actually need.
  There. That's almost the entirety of my database tuning "magic" that actually gets used in production. I've got PostgreSQL set to log all queries that take longer than 3 seconds. Daily, that comes out to 10-20 queries total on a fairly heavily loaded server. I can live with that.
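The "check EXPLAIN, index the join columns, select only what you need" routine above can be sketched in a few lines. This is only an illustration under loud assumptions: it uses SQLite through Python's `sqlite3` module as a stand-in for PostgreSQL (so the statement is `EXPLAIN QUERY PLAN`, not PostgreSQL's `EXPLAIN`), and the `customers`/`orders` tables and the index name `idx_orders_customer` are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return "\n".join(row[-1] for row in rows)


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with two joinable tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            total REAL,
            notes TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total, notes) VALUES (?, ?, ?)",
        [(1 + i % 100, float(i), "x" * 50) for i in range(5000)],
    )
    return conn


# Ask only for the columns we need, and filter inside the database
# instead of pulling "SELECT *" to the client and winnowing there.
QUERY = """
    SELECT c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.customer_id = ?
"""

if __name__ == "__main__":
    conn = build_demo_db()
    print("before index:\n" + query_plan(conn, QUERY, (7,)))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print("after index:\n" + query_plan(conn, QUERY, (7,)))
```

The exact plan text differs between SQLite versions, and PostgreSQL's output looks nothing like it, but the workflow is the same: read the plan, add the index on the join column, read the plan again.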
  
  --
  Dewey, what part of this looks like authorities should be involved?
Re:The Article Is Right... And Wrong by Anonymous Coward · 2010-03-28 05:08 · Score: 1, Interesting

I've got news for you ... all the major stock exchanges, banks, and telecoms in the world use SQL RDBMSs to track transactions that match or exceed anything Facebook and Twitter are doing. I guarantee you, without a single doubt in my mind, that Facebook and Twitter could be run on a SQL RDBMS ... by that I mean Oracle, not MySQL.
There are times... by lenski · 2010-03-28 05:08 · Score: 2, Interesting

Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM, for instance, which are now resolved.) We even found a small but significant corner-case bug which, upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time), which has allowed the development organization to direct our time and money to our own application development.
But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".
Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.
1. Re:There are times... by einhverfr · 2010-03-28 15:00 · Score: 1
  
  PostgreSQL does have some limits. These include the lack of parallel execution of queries across nodes in a cluster (this is something Greenplum DB has added). Once you are in the TB range, and processing GB of data in each query, you have some inherent performance issues which the application is not able to solve in its current architecture.
  The solution at that point is to see whether any of Greenplum's proprietary products (built on PostgreSQL) will fill that gap in your operations.
  
  --
  
  LedgerSMB: Open source Accounting/ERP
NoSQL isn't just for large-scale apps by tomhudson · 2010-03-28 05:08 · Score: 1

It's quite possible to use the same design concepts to develop smaller apps that don't need the overhead of an sql database. Don't forget that before RDBMS came out, hierarchical file systems were used as data stores.
In Soviet Russia, NoSQL kills off Devs!
1. Re:NoSQL isn't just for large-scale apps by mjwalshe · 2010-03-28 05:19 · Score: 1
  
  Yes, I was a developer on a system in the '80s that ran on 17 PR1MEs and was written using ISAM files. However, it's a lot more work to handle the constraints by hand in FORTRAN.
2. Re:NoSQL isn't just for large-scale apps by tomhudson · 2010-03-28 06:51 · Score: 1
  
  I did it in C "way back when" in the days when a 286-20 couldn't give the performance otherwise. Not even ISAM files - just a program with enough smarts to calculate where in the flat-file "index" to find the ASCII representation of the integer offset to fseek to in the data file - and enough smarts to know which index to look into. Went from "press a key ... wait ... wait" to results in well under a second. Sure, it's more work in the sense of "you have to figure out in advance your data representation, etc." but the planning paid off in quicker development time, since you then had a clear target to aim at.
  I miss those days.
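For readers who never built one of these, the flat-file index trick described above can be sketched roughly as follows. This is a reconstruction of the general idea in Python, not the original C program: the fixed 10-character entry width, the file names, and the newline-terminated records are all assumptions made for the example.

```python
import os
import tempfile

OFFSET_WIDTH = 10  # each index entry is a zero-padded ASCII integer (assumed width)


def build_files(directory: str, records: list[bytes]) -> tuple[str, str]:
    """Write records to a data file and their byte offsets to an index file."""
    data_path = os.path.join(directory, "records.dat")
    index_path = os.path.join(directory, "records.idx")
    with open(data_path, "wb") as data, open(index_path, "wb") as index:
        for record in records:
            offset = data.tell()
            index.write(str(offset).zfill(OFFSET_WIDTH).encode("ascii"))
            data.write(record + b"\n")
    return data_path, index_path


def fetch(data_path: str, index_path: str, record_number: int) -> bytes:
    """Look up one record with two seeks and no scanning."""
    with open(index_path, "rb") as index:
        # Entry n lives at a position we can calculate, so no search is needed.
        index.seek(record_number * OFFSET_WIDTH)
        offset = int(index.read(OFFSET_WIDTH).decode("ascii"))
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.readline().rstrip(b"\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file, index_file = build_files(tmp, [b"alpha", b"bravo", b"charlie"])
        print(fetch(data_file, index_file, 1))
```

Because every index entry has the same width, finding record n costs one multiplication and two seeks, which is why the data representation has to be planned up front.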
3. Re:NoSQL isn't just for large-scale apps by mjwalshe · 2010-03-28 08:31 · Score: 1
  
  lol yes, when I ran the Telecom Gold billing system I loved seeing all 17 or so machines come up under my control - not so much fun when you did 36 hours without sleep.
Rails "marginalized" by NoSQL? by SanityInAnarchy · 2010-03-28 05:10 · Score: 2

Bullshit.
ActiveRecord? Definitely. Rails as a whole? You might consider replacing it with another Ruby framework, but the same ideas are going to apply. Remember how Rails and Merb are merging? Merb tends to be ORM-agnostic, but the recommended Merb stack suggested DataMapper, which does support a few NoSQL databases.
Even if you needed a different ORM per NoSQL database, it wouldn't marginalize Rails as a whole, but that simply isn't the case. Just use DataMapper, then plug in the flavor of the day.
As an example, Rails (and DataMapper) run on Google App Engine.

--
Don't thank God, thank a doctor!
Re:The Article Is Right... And Wrong by Front+Line+Assembly · 2010-03-28 05:12 · Score: 1

It should probably be called NoMysql instead of NoSQL...
Here are some good posts. Seems NoSQL is just the new xml. Sure, great for some things, but not really worth the hype...
http://www.yafla.com/dforbes/Responding_to_Joe_Stump_on_the_NoSQL_Debate/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Isnt_Scalable_Lie/
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
Re:Walmart's primary business isn't online by raftpeople · 2010-03-28 05:15 · Score: 1

At approx. 200 million transactions per day, does it matter whether the source is a website or a retail system?
Walmart's primary business isn't online: REALLY? by lenski · 2010-03-28 05:21 · Score: 1

I thought it was reasonably well understood that one of Walmart's primary characteristics is their *amazing* control over logistics. In fact, I thought one of their big process inventions was to bring logistical activity online.
I welcome clarification, since I haven't worked for Walmart.
Re:The Article Is Right... And Wrong by Tablizer · 2010-03-28 05:35 · Score: 1

Perhaps some are afraid that the No-Sql movement will leak into other niches out of hype. After all, OOP leaked out of physical modeling and into other niches without being fully tested for those niches, and people started clamoring for OODBMS. (I'm of the opinion that "everything OOP" is a no-no. Use it where it helps, but not where it doesn't.)

--
Table-ized A.I.
Storage as a Service by Qwavel · 2010-03-28 05:40 · Score: 2, Insightful

The article focuses on NoSQL's claim to scalability, but isn't that just one of the features of (some of the) NoSQL options?
Google, Amazon, and Microsoft all provide NoSQL storage as a service that is easy to use and cheap, particularly for getting started. Those are two pretty important features and I would imagine that it is those features, rather than dreams of needing vast scalability, that attract the many web startups.
depends on the problem by dominux · 2010-03-28 05:42 · Score: 1

Relational databases are great if you have a relational problem. For everything else there is NoSQL. It is surprising how much of the world's data looks like "a stack of documents" rather than "a collection of mathematically related sets of data". Lotus Notes was the only NoSQL player for 20 years; now there are lots. Notes sucked because it had no competitors; the concept was and is sound. Now there is competition, and lots of NoSQL database systems and application environments on top that suck less and less by the day.
Re:SQL by Anne+Thwacks · 2010-03-28 05:49 · Score: 1

The next development is logically called Egress, and designed to use a Fortran like syntax. Easily implemented using gcc and yacc.
(Since the PDP11 was designed to be a hardware Fortran machine, and C was its assembler, and the i86 a poor clone of the PDP11!)
Or maybe I iGress!

--
Sent from my ASR33 using ASCII
Re:The Article Is Right... And Wrong by fuzzyfuzzyfungus · 2010-03-28 05:51 · Score: 2, Insightful

For about 30 seconds, until the VC money dries up....

The point isn't (generally, there might be some pathological corner case) that the various web2.0 kiddies couldn't implement their stuff in SQL, but that they couldn't afford to do so. If you want to be able to serve large numbers of users in order to generate enough adsense pennies to keep the lights on until somebody buys you, your options are pretty much A) software with a more or less zero per-node cost, running on commodity x86s with no exotic interconnect, or B) bankruptcy.
WalMart doesn't use SQL by Anonymous Coward · 2010-03-28 05:52 · Score: 1, Insightful

WalMart has one of the largest Teradata installs, it doesn't use SQL.
1. Re:WalMart doesn't use SQL by shutdown+-p+now · 2010-03-28 14:02 · Score: 1
  
  I don't know enough about Teradata to say anything about it in particular, but just because it doesn't use SQL as a query language, it doesn't make it a "NoSQL" solution. It's kinda confusing - some of the "NoSQL" databases actually use a subset of SQL as a query language.
  There's no authoritative definition of NoSQL (as with many buzzwords of the day), but, so far as I can see, it's defined more in terms of what it doesn't do:
  1. Not relational.
  2. No ACID.
  3. No rigid table schema.
2. Re:WalMart doesn't use SQL by mini+me · 2010-03-29 14:28 · Score: 1
  
  No ACID.
  Many of the NoSQL databases are ACID compliant.
3. Re:WalMart doesn't use SQL by shutdown+-p+now · 2010-03-29 15:41 · Score: 1
  
  Can you give one example, and also explain how much it is ACID compliant? I.e. does it give ACID guarantees only for a single entity/row update? Or can it actually handle a transaction involving queries and updates over multiple entities/rows?
4. Re:WalMart doesn't use SQL by mini+me · 2010-03-29 18:23 · Score: 1
  
  CouchDB is ACID compliant for document transactions.
  
  Or can it actually handle a transaction involving queries and updates over multiple entities/rows?
  I realize this is a fundamental feature of relational databases, but NoSQL databases are decidedly not relational. If we look to CouchDB again, the document is the transaction. All of your "rows" and "columns" are stored within the document. You can be certain that all of your entities contained within the document will be saved in a consistent state.
  Again, CouchDB is not a relational database. If you need a relational database, I would recommend using one. With that said, there are many applications that do not benefit from relational systems and are much more appropriately implemented on NoSQL databases. It is all about choosing the right tool for the job.
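To make "the document is the transaction" a bit more concrete, here is a toy in-memory sketch. It is not CouchDB's API; `ToyDocumentStore`, `ConflictError`, and the integer revisions are invented for illustration (CouchDB itself tracks revisions in a `_rev` field and rejects updates made against a stale one). What it shows is only the shape of the guarantee: one document, with everything nested inside it, is replaced as a unit or not at all.

```python
import copy


class ConflictError(Exception):
    """Raised when a writer's revision is stale."""


class ToyDocumentStore:
    """In-memory stand-in: each document is replaced as a single unit."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[int, dict]] = {}

    def get(self, doc_id: str) -> tuple[int, dict]:
        """Return (revision, copy of the document body)."""
        rev, body = self._docs[doc_id]
        return rev, copy.deepcopy(body)

    def put(self, doc_id: str, body: dict, expected_rev: int = 0) -> int:
        """Replace the whole document if the caller saw the latest revision."""
        current_rev = self._docs.get(doc_id, (0, {}))[0]
        if expected_rev != current_rev:
            raise ConflictError(
                f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
            )
        new_rev = current_rev + 1
        # The whole document, with every nested "row", is swapped in one step.
        self._docs[doc_id] = (new_rev, copy.deepcopy(body))
        return new_rev


if __name__ == "__main__":
    store = ToyDocumentStore()
    rev = store.put("order-1", {"items": [{"sku": "A", "qty": 1}], "total": 5})
    _, doc = store.get("order-1")
    doc["items"].append({"sku": "B", "qty": 2})
    doc["total"] = 15
    print(store.put("order-1", doc, expected_rev=rev))
```

Note what is missing: there is no way to update "order-1" and "order-2" together atomically, which is the multi-entity case asked about earlier in this thread.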
Ummm.... by gbutler69 · 2010-03-28 05:58 · Score: 1

...and those OLAP systems are most likely ROLAP or perhaps ROLAP with MOLAP support also. In any case, the underlying non-aggregated data is probably in an SQL database that supports materialized views and auto-aggregate tables. The OLAP is simply a multi-dimensional aggregate cache that sits in front of it.

--
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
Re:The Article Is Right... And Wrong by RAMMS+EIN · 2010-03-28 06:07 · Score: 1

So you're saying that an RDBMS is the right tool for the job if your transactions have enough value, and, if the value per transaction is too low, you won't be able to afford an RDBMS, but you can still go with a NoSQL database? That's an interesting point of view.
So how do you make your system work with NoSQL? As you say in your post, "you lose ACID, indexes, and joins to varying degrees". To me, with my relational view of the world, it seems that you would want to use an RDBMS exactly because of these things. Specifically, the fact that your RDBMS does the hard work of keeping your data consistent for you. Wouldn't you have to implement that all by yourself if you went with a NoSQL system? If so, what realistic expectation can you have to come up with something that is both correct and as performant as an RDBMS which lots of smart people have worked on over the years?
Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES"? Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.

--
Please correct me if I got my facts wrong.
Mainframe-based HVTP Systems Don't use SQL by emes · 2010-03-28 06:16 · Score: 1

For well over 30 years, airline reservation, hotel reservation, and other high volume transaction processing (HVTP) systems that are mainframe-based have not used SQL in the core transaction processing system. They use either the built-in key/value subsystem of TPF/ZTPF, or a slightly more sophisticated subsystem known as TPFDB. Using facilities similar to z/OS, failover and recovery happen in record time should it be necessary. This successful real-world system and approach deserves the attention of those who would like to learn how this stuff really works.
Re:Not all databases are for business by C3c6e6 · 2010-03-28 06:23 · Score: 1

I didn't (want to) say that all RDBMS won't cut it. The only point I wanted to make was that while I can see the point of the author that solutions like Cassandra are a bit overrated for most business applications, for other applications domains they are becoming a viable solution.
Yes, it does. by gbutler69 · 2010-03-28 06:25 · Score: 2, Interesting

Each transaction of those 200,000,000 for WalMart is a fairly significant source of revenue, averaging on the order of $50.00 to $100.00 per transaction. That same 200,000,000 transactions for a web application would average like $0.03 (yes, 3 cents). Now, if the cost per transaction using a traditional RDBMS is something like $0.25 (25 cents), how is that going to work for the Web case? What if the cost is $0.01? Still epic fail for the web case.
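A quick way to see the gap is to express database cost as a share of revenue per transaction. The sketch below just plugs in the hypothetical figures from this comment ($50 and $0.03 revenue, $0.25 and $0.01 cost); none of them are measured numbers, and the function name is made up.

```python
def db_cost_share(revenue_per_txn: float, cost_per_txn: float) -> float:
    """Fraction of each transaction's revenue consumed by database cost."""
    return cost_per_txn / revenue_per_txn


if __name__ == "__main__":
    scenarios = {
        "retail, $50.00 revenue, $0.25 cost": (50.00, 0.25),
        "web, $0.03 revenue, $0.25 cost": (0.03, 0.25),
        "web, $0.03 revenue, $0.01 cost": (0.03, 0.01),
    }
    for label, (revenue, cost) in scenarios.items():
        print(f"{label}: {db_cost_share(revenue, cost):.1%} of revenue")
```

Under these assumed figures the database is a rounding error for the retailer, costs several times the revenue for the web app at 25 cents, and still eats about a third of revenue at one cent, before servers, bandwidth, or salaries.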

--
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
1. Re:Yes, it does. by mabhatter654 · 2010-03-28 08:47 · Score: 1
  
  A very good portion of those come from retail systems and EDI logistics... things that are very tightly controlled on both ends by developers, not the "wild west" format of something like Facebook.
  I think the argument is much like mainframe vs. PC, MaBell vs. TCP/IP, HTML vs. MS Word, or Akamai vs. BitTorrent. You can gain super duper efficiencies if you can tightly control inputs, outputs, and growth... and are willing to pay big bucks for it. This is a similar argument... trying to use "enterprise" tools versus distributed types of access to different types of data.
Missed in PostgreSQL? by gbutler69 · 2010-03-28 06:31 · Score: 1

I'm fairly certain that at least as of 8.4 PostgreSQL supports XML fairly robustly.

--
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
Being someone who does A LOT of medical records... by gbutler69 · 2010-03-28 06:35 · Score: 1

...in what way do they not map to the relational model? If you say "unstructured data", that is not an answer.

--
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
Walmart huh/ by iCEBaLM · 2010-03-28 06:43 · Score: 1

If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too.
Oddly enough I'm trying to get to walmart.ca right now, and it's down....
if you think devs are stupid... by ruurd · 2010-03-28 06:53 · Score: 1

Oh shoei. Must have been a slow day. Yes everybody that uses an SQL database for no good reason is insane. Yes everybody that uses a NoSQL database because it is the latest-and-greatest has the same affliction. Use what fits your purpose. SQL or NoSQL? Does not matter.

--
ruurd
Developers are the problem by Colin+Smith · 2010-03-28 06:55 · Score: 1

They have no clue how to scale their systems[1]. Therefore they pass the problem on to the underlying layer and say you do it for me.
[1] They don't understand the mathematics of what they are doing.

--
Deleted
Use what is appropriate not FUD by Joviex · 2010-03-28 07:28 · Score: 2, Insightful

It's a poor artist / programmer / cook / et al. that blames his tools. If you know the problem, you use the best tool to solve it. SQL or Document-DBs or Graph-DBs, whatever is the best fit to solve the problem is what you use. You don't go around saying something is crap because you have no need for it.
Future Data Storage by prefec2 · 2010-03-28 07:39 · Score: 1

Today many people store data on their private machine using classic file systems, and they use databases to store files and to tag them. In the future, tags or other kinds of attributes will become more important in information storage and retrieval. Therefore we need databases capable of managing such information. RDBMSs are very good at storing such information and at working with sets and subsets. And tags and attributes of objects/files/entities are nothing more than markers that show to which sets objects belong. So I doubt that SQL databases will go away.
Furthermore, objects in OOP languages are very restrictive. If you look for example at objects (called individuals) in OWL, you can see that data objects can have properties and relationships to other objects which cannot be expressed that easily in OOP language style. Therefore using DBs which are limited by the object model of OOP languages will not suffice.
Easy. by Estanislao+Martínez · 2010-03-28 07:42 · Score: 2, Insightful

Why should I give two shits about what database system someone else uses?
Because they're gonna tell your non-technical boss to make you use it, and he's gonna listen when they start telling him that Google, Twitter and Facebook do.

--
Are you adequate?
PhDs at Google at totally idiots then... by Liquid-Gecka · 2010-03-28 08:14 · Score: 1

Joe Stump wrote a post that is a perfect response to this insanity.
http://stu.mp/category/nosql
Why is it that all the people working at scale seem to be going with NoSQL solutions? Are all the devs at Google, Facebook, Twitter, Digg, Reddit, etc. total idiots, or is there in fact a real problem that they face?
Anybody that cites Amazon, Walmart or any large retailer as an example of why SQL scales is missing the point. Retailers have very few write operations compared to the read load. The vast majority of the load hits databases that serve reads and have a high tolerance for write latency. This is a field SQL is good at solving.
On the other hand, social sites that have massive cross-user data ties and constant write updates where latency is very important don't fit this model that well. Sure, you can remove SQL replication from the mix and use independent instances of MySQL serving fractions of the overall site, with redundancy between them, but if you do that you have functionally built a NoSQL data store. The concept isn't to get rid of SQL, it's to get rid of the relational aspect of data storage. You can no longer rely on all your data being available to a single SQL statement.
Being an operations guy, though, I should point out the number one failing of SQL in my world. If you assume that, on average, a machine will either crash or have some sort of hardware failure once a year, and you consider a site with 1,000 machines, then you see that nearly 3 machines will die every day. Even if you count on 2 years of continuous uptime, that is over 1 a day. With 10,000 machines your failure rate is 27 per day; with 100,000 machines it is 273. This means that any database layer that requires a large number of machines has to build in a recovery layer. Clients need to know that a node is down, and when it comes back it needs to have data uploaded to it, etc. The NoSQL solutions like Cassandra manage this automatically. Trying to do this with MySQL becomes really complicated and you end up implementing all the same logic and constraints as in NoSQL solutions anyway. I have seen this happen twice now.
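The failure numbers above are easy to reproduce. This is a back-of-the-envelope sketch that assumes failures are spread evenly over a 365-day year; the function name is made up and it models nothing beyond the simple division used in the comment.

```python
def expected_failures_per_day(machines: int, years_between_failures: float = 1.0) -> float:
    """Average daily failures if each machine fails once per given interval."""
    return machines / (365 * years_between_failures)


if __name__ == "__main__":
    for fleet in (1_000, 10_000, 100_000):
        print(fleet, "machines:", round(expected_failures_per_day(fleet), 2), "per day")
    print("1,000 machines, 2-year interval:",
          round(expected_failures_per_day(1_000, 2.0), 2), "per day")
```

That gives about 2.74 failures a day for 1,000 machines (1.37 with a two-year interval), about 27.4 for 10,000, and about 274 for 100,000, matching the rounded-down figures quoted above.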
1. Re:PhDs at Google at totally idiots then... by rmm4pi8 · 2010-03-29 02:12 · Score: 1
  
  I'm also on the ops side, and I think a lot of people not running dozens of SQL servers really underestimate the pain of this. I do worry a bit about the visibility with NoSQL though--if you trust it to manage that there are 3 copies of your data at all times or whatever, how do you really guarantee that? And how do you know which bits to back up? The promise is good, but I definitely worry about whether we're there yet as we start to deploy Cassandra in production.
  
  --
  U.S. War Crimes blog. Email for free Mandriva support.
Re:The Article Is Right... And Wrong by BitZtream · 2010-03-28 08:22 · Score: 1

Real business track their data with SQL databases, true. However, real businesses have small numbers of transactions relative to their value. If Walmart had the same revenue but the average sale was a tenth of a cent, their fancy SQL database would be smouldering rubble.
This might be true if they sold items for 1/1000th of a cent, but it's simply untrue for any sale anywhere.
Twitter's load isn't that impressive; it's a poorly written big mess of a service. It's pretty common knowledge that it could be made far better if they would just use some untrained monkeys.
Again, Facebook ... bad example.
You've taken two overnight one-hit wonders that will be gone in a few years and used them as if they are valid examples of how to do it. They aren't, they aren't even close. They are what happens when you grow so fast you don't have a chance in hell of keeping up, so you cobble things together as best you can to survive, knowing that it's just a matter of time before the fad passes.
Do I think FB and twitter could survive on MySQL? Probably not, but on a real DB with real DBAs, more than likely yes.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
hammer - all nails by kalman5 · 2010-03-28 08:25 · Score: 2, Insightful

As soon as people have a hammer in their hands, everything they see around them looks like a nail. The good IT worker is the one with different tools at hand and the ability to choose the right one at the right time. And before you forget it, remember that premature optimization is the root of all evil.
Re:Being someone who does A LOT of medical records by BitZtream · 2010-03-28 08:39 · Score: 1

The person you're replying to is clueless as far as to what 'medical data' is I think.
You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
even if you are not Google... by pydev · 2010-03-28 09:18 · Score: 2, Insightful

It's easy to hit intrinsic performance limits with SQL databases even on small apps. And for people who aren't database experts, it's even easier since they don't know the hoops to jump through to make their SQL databases perform well. For the average programmer, it's easier to get good performance out of no-SQL databases.
Using SQL databases programmatically is a fairly silly notion to begin with: SQL was originally intended as an easy-to-use query language for non-experts because people were having trouble with navigating data structures. But programmers are excellent at navigating data structures and designing efficient data structures. SQL is solving a problem that most programmers don't have, and you're paying a big performance penalty for that.
Sometimes an SQL database is the right thing to use, sometimes it isn't. People really need to use their head instead of blindly picking one or the other solution.
1. Re:even if you are not Google... by Pinky's+Brain · 2010-03-28 16:45 · Score: 1
  
  That's always the problem I have had with SQL as a (below) average programmer ... passing a query string over a socket to retrieve data which could be served from memory with a native function call 99.99% of the time. Then throw in opaque, automatically optimized indexes with an untold number of tuning parameters, ugh. I've never quite understood how web developers thought that was a good idea ... especially over a decade ago, when MySQL was not exactly stable and hardware definitely wasn't fast enough to make the inherent inefficiency irrelevant.
2. Re:even if you are not Google... by FlyingGuy · 2010-03-28 18:08 · Score: 1
  
  The problem with most databases is the lack of data analysis before a single table is created. You must understand the nature of your data long before you create a table. If you don't, you will fail no matter what system you use to store the data.
  
  --
  Hey KID! Yeah you, get the fuck off my lawn!
3. Re:even if you are not Google... by pydev · 2010-03-29 12:00 · Score: 1
  
  It's not sufficient to understand the nature of the data, you also need to understand the idiosyncrasies of the database. Something that works very fast on one database may be a dog on another.
  But why should programmers do this at all? As a programmer, I know how to build efficient data structures in memory and on disk; that's what I'm trained to do. I'm not trained to second-guess performance issues in some bloated commercial software package that was probably not even remotely designed for the kinds of problems I'm working with.
4. Re:even if you are not Google... by FlyingGuy · 2010-03-29 15:21 · Score: 1
  
  I agree with your first paragraph completely.
  Your second paragraph though...
  The problem with schemas that are created by anyone, and I do mean anyone, is that at the time we design them, no matter how forward-thinking we are, no matter how much we believe that we have thought through every eventuality, unless the schema is trivial, we will miss something. We will have to go back and start modifying things, and this is always a problem due to the side effects of the changes we make.
  This is the reason that DB engines and things like SQL were invented. Yes, some are more efficient than others, some are faster, some scale better, some do most things very well, some do a few things insanely well, but overall most of them manipulate data very well, within the scope of their design.
  Can you or I design in code a single-purpose database and have it run faster and more efficiently than the same schema in Oracle, MySQL, MS-SQL, SQLite or even dBase for that matter? Yes, we can. The problem comes into existence when we are no longer at the company, someone else is there, and the data "query" changes; then someone has to change real code and rebuild it, and that is when things typically go straight down the shitter.
  
  --
  Hey KID! Yeah you, get the fuck off my lawn!
any usable ones out there? by pydev · 2010-03-28 09:32 · Score: 1

OK, so which OODBMS do you recommend? I know of Ozon and db4o, both for languages that I rarely use. What about an OODBMS that I can access from C++, Perl, Python, and C? How widely used are they? How good is the support?
1. Re:any usable ones out there? by Kagetsuki · 2010-03-28 14:39 · Score: 1
  
  TokyoCabinet. We have an application that runs a C/C++ client accessing it and a Ruby application that binds it to the web. You can access it from Perl as well but I'm not aware of if you can use (somebody may have written a binding). It's used on everything from small applications to mega scale social networks (namely Mixi) and it's very, very pleasant to use as a developer.
2. Re:any usable ones out there? by Pinky's+Brain · 2010-03-28 17:03 · Score: 1
  
  How is that an OODBMS? AFAICS, it's embedded so I don't think the MS applies and to get object storage you'd have to layer that on top yourself (as well as code your own garbage collector).
3. Re:any usable ones out there? by Kagetsuki · 2010-03-28 18:05 · Score: 1
  
  Oh, I'm sorry it's not OO. I somehow misread OO as OSS. Feel free to just ignore me and move along then.
Re:The Article Is Right... And Wrong by Zironic · 2010-03-28 09:34 · Score: 1

"Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES""
Hmm, I could actually see that being the case for some applications, probably not one common in the business world but for research it's probably fairly likely you might generate huge datasets where losing individual records wouldn't matter much.
We need alternatives to SQL. by codealot · 2010-03-28 09:39 · Score: 1

As another poster pointed out, this is a false dichotomy. We're emerging from a technology monoculture of "every DBMS must be SQL" to "it's possible, even viable to design, implement and use a DBMS that does not implement SQL". Anyone advocating a mass exodus from SQL-land is a fool. Props to the NoSQL guys for opening our eyes to fresh ideas.
There's no need to stick with any single DBMS platform for 100% of your organization, unless you're so small that you have but one server total, and then I suspect this debate is largely irrelevant to you anyhow.
We're not Google, but we have some of the same problems Google faced with scaling up our applications for the Internet. We do use MySQL, having some 35+ instances of it. Our application processes in excess of 1000 transactions each second, and we know it'll be difficult to scale to tens of thousands of transactions per second without some fundamental changes. Today we survive by imposing limitations on developers writing OLTP applications--things like "all row operations must search by primary key" and "no table scans, ad-hoc queries or file sorts". The access language is still SQL, but increasingly we don't *need* SQL for OLTP transactions. We could plug in something today that is equivalent and far simpler.
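A rule like "all row operations must search by primary key" and "no table scans" can be checked mechanically by asking the database for its query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, since the poster's setup is MySQL and its EXPLAIN output differs; the `orders` table and the `uses_full_scan` helper are hypothetical names made up for illustration.

```python
import sqlite3


def uses_full_scan(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if SQLite's query plan for `sql` includes a table scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is a human-readable detail string
    # that begins with SEARCH (key or index lookup) or SCAN (full pass).
    return any(row[-1].startswith("SCAN") for row in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

by_key = "SELECT * FROM orders WHERE id = 42"             # primary-key lookup
ad_hoc = "SELECT * FROM orders WHERE customer = 'alice'"  # no index on customer

print(uses_full_scan(conn, by_key))
print(uses_full_scan(conn, ad_hoc))
```

A check of this kind could run in a test suite against every query an OLTP application issues, which is one way to enforce the restriction without relying on code review alone.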
What we're lacking in SQL-land is a good way to host DBMS applications on distributed infrastructure, e.g. in the cloud. There are clustered databases available, but these often require fast/short interconnects, may have difficulty scaling above bandwidth limits on a local network or SAN, and can be frustratingly fragile to use in the "real world". Not to mention expensive. This is due to the consistency model imposed on such systems, i.e. the "AC" in ACID.
Data sharding is a popular way to exceed the limits of SQL, but once you introduce sharding you're treading on "NoSQL" waters already. You can't retrofit sharding onto an application that isn't aware of it, not very successfully, in my experience. So developers need to become acutely aware of the storage tier and design for it, meaning they've already lost the perfect abstraction of SQL.
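To make the sharding point concrete, here is a minimal sketch of hash-based shard routing in Python. The shard names and both helper functions are assumptions for illustration, not any particular product's API; real deployments usually also need a plan for resharding, which this ignores.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def fan_out(query_one_shard, shards=SHARDS):
    """Run a per-shard query on every shard and merge the results."""
    results = []
    for shard in shards:
        results.extend(query_one_shard(shard))
    return results


# A single-key operation goes to exactly one shard.
print(shard_for("user:1001"))

# Anything not keyed by the shard key has to visit every shard, and the
# application does the merging that a single SQL database would have done.
print(fan_out(lambda shard: [f"rows from {shard}"]))
```

The application code now has to know which key routes a request and which queries need a fan-out, which is the loss of abstraction described above.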
I'm keenly interested in emerging products like the Cassandra database, and while I have no intent of ever abandoning SQL (and probably even MySQL) in our organization, we're absolutely going to take "NoSQL" for a spin to see what it delivers in terms of cost, complexity, reliability and performance.
Why I am considering NoSQL by Apoptosis66 · 2010-03-28 09:43 · Score: 1

My team is currently considering a "NoSQL" solution moving away from PostgreSQL, and the reason is: We desperately need multi-master replication over the WAN that handles split-brain situations gracefully. It's a tough problem and frankly no RDBMS handles it well. I suspect any group that has had to support multiple dispersed locations has had the same thought.
COBOL rules! by Anonymous Coward · 2010-03-28 10:36 · Score: 1, Insightful

What was wrong with COBOL? Didn't it solve most businesses' problems? What makes C/C#/C++/Java/Ruby/Perl/PHP so much better?
In other news ... by yelvington · 2010-03-28 11:14 · Score: 1

In other news, some random hammer enthusiast posted on his blog that he just can't wait for screwdrivers to die.
Re:The Article Is Right... And Wrong by SQL+Error · 2010-03-28 11:40 · Score: 1

Exactly. That's the point I was making with the value per transaction. The value of a bank transaction or a stock market transaction is considerable - and so are the fees. If Twitter charged you 25 cents per tweet - let alone $25 - they'd have no trouble buying a suitable SQL platform to store their data. Mostly because they wouldn't exist.
Object databases by countach · 2010-03-28 12:13 · Score: 1

All our languages are now object-oriented, but we're still using non-object databases and mapping between rows and columns and objects. WHY?? Yes, tools can help you map, but it's a band-aid, and it screws up performance on all but simple cases. And it means you can't do queries using the same model as your language.
Yes, relational algebra is a useful query tool, but there is no need to be beholden to relational table structures to get relational algebra. Nor are the so-called object-relational features of PostgreSQL going to cut it. You can't even do a query and get back a list of objects of different types, for goodness sake.
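For readers who have not written the mapping layer by hand, the sketch below shows roughly what getting "a list of objects of different types" out of a relational table involves. It uses Python's built-in sqlite3 module; the `shapes` table, its discriminator column and the two classes are all hypothetical, and this single-table layout is only one of several common ways to do it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Circle:
    radius: float


@dataclass
class Rect:
    width: float
    height: float


conn = sqlite3.connect(":memory:")
# One table with a discriminator column; columns that do not apply stay NULL.
conn.execute("CREATE TABLE shapes (kind TEXT, radius REAL, width REAL, height REAL)")
conn.executemany(
    "INSERT INTO shapes VALUES (?, ?, ?, ?)",
    [("circle", 1.5, None, None), ("rect", None, 2.0, 3.0)],
)


def load_shapes(conn):
    """Hand-written mapping from rows back to objects of different types."""
    shapes = []
    for kind, radius, width, height in conn.execute(
        "SELECT kind, radius, width, height FROM shapes ORDER BY rowid"
    ):
        if kind == "circle":
            shapes.append(Circle(radius))
        else:
            shapes.append(Rect(width, height))
    return shapes


print(load_shapes(conn))
```

The type dispatch lives in application code rather than in the query, which is the mismatch being complained about.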
1. Re:Object databases by FlyingGuy · 2010-03-28 18:19 · Score: 1
  
  The same model as what language?
  
  --
  Hey KID! Yeah you, get the fuck off my lawn!
Re:The Article Is Right... And Wrong by mukund · 2010-03-28 12:51 · Score: 1
One thing that many people don't seem to get right: Using these "NoSQL" databases doesn't mean that you don't get ACID. Many key-value databases support ACID just fine:
- Berkeley DB transaction support; Berkeley DB used to even include XA support for a distributed transaction manager.. but that seems to have been removed.
- Tokyo Cabinet transaction features
You've got to remember that (simplifying drastically,) SQL is a query language layered on top of a "NoSQL" style database (whether built into the SQL DBMS implementation, or a 3rd party one). Such "NoSQL" databases have to be ACID capable in their native API and implementation first.
--
Banu
Tagged union types, à la Haskell by Estanislao+Martínez · 2010-03-28 12:53 · Score: 1

OK, so enlighten us with your brilliance! Share with us the ultimate answer of what should be done to differentiate a null (logically, "I don't know") with a blank string (logically, "We know there's nothing there") and what should be done differently
The way languages like F# or Haskell treat "nulls" would be a straightforward definite improvement over the way SQL does (or for that matter, the common C/C++/Java/C# paradigm). A type whose value is either a Foo or nothing is just a tagged union type. So allowing columns to take tagged unions as their type would solve that right away--and also allow to impose further logical distinctions as needed by the application.
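A rough Python rendering of the tagged-union idea, for readers who do not know F# or Haskell. The three case names and the `describe` function are invented for illustration; in Haskell this would be a single data declaration, and Python only approximates it with classes and isinstance checks.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Unknown:
    """No information about the value ("I don't know")."""


@dataclass(frozen=True)
class Empty:
    """The value is known to be absent ("we know there's nothing there")."""


@dataclass(frozen=True)
class Known:
    value: str


# The "column type": exactly one of the three cases, each carrying its tag.
MiddleName = Union[Unknown, Empty, Known]


def describe(m: MiddleName) -> str:
    if isinstance(m, Known):
        return f"middle name is {m.value}"
    if isinstance(m, Empty):
        return "has no middle name"
    return "middle name not recorded"


# Ordinary two-valued equality applies; no special NULL rules are needed.
print(Unknown() == Unknown())
print(Empty() == Unknown())
print(describe(Known("Lee")))
```

Each case is an ordinary value, so comparisons stay two-valued and the application can add further cases (for example "withheld") without changing how equality works.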
The whole three-valued-logic "null is not a value" paradigm of SQL is a disaster, that's for sure. There are all sorts of query optimizations that are impossible to do on the face of it.
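The three-valued behaviour is easy to observe with Python's built-in sqlite3 module. The `people` table is a made-up example; the last query illustrates one rewrite an optimizer cannot apply blindly, since `middle = middle` is not true for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, middle TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", "Lee"), ("bob", ""), ("cy", None)],
)


def count(where: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM people WHERE {where}").fetchone()[0]


# NULL = NULL evaluates to NULL (unknown), which Python sees as None.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A predicate and its negation do not cover every row between them:
# the row with a NULL middle name matches neither.
eq = count("middle = 'Lee'")
ne = count("middle <> 'Lee'")
total = count("1")

# `middle = middle` cannot be simplified to TRUE: the NULL row drops out.
same = count("middle = middle")

print(null_eq, eq, ne, total, same)
```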

--
Are you adequate?
TokyoCabinet by Kagetsuki · 2010-03-28 14:35 · Score: 1

Our company has saved immense time and made our applications faster and easier to understand, as well as [theoretically] more secure by switching to Tokyo Cabinet. F*ck SQL and F*ck MySQL especially. I personally wrote some of the interfaces to the TokyoCabinet databases we are using and at this point I have decided I never want to do anything with SQL ever again. Seriously, SQL sucks - it's clunky, easy to introduce security flaws with, slow, breaks easily, difficult to access from multiple languages simultaneously, you often have to do things like create special users to do certain things, which then introduces more security risk... and on and on. SQL is crappy and should be considered deprecated.
1. Re:TokyoCabinet by ckaminski · 2010-04-05 06:19 · Score: 1
  
  Awesome astroturfing dude!
  
  Difficult to access from multiple languages simultaneously... what rock have you been living under the past 20 years?
2. Re:TokyoCabinet by Kagetsuki · 2010-04-09 02:55 · Score: 1
  
  TC doesn't lock up and break like SQL does. I've had a lot of trouble accessing with multiple applications with MySQL especially. PostgreSQL I admit I have not had such problems that I can recall; but regardless TokyoCabinet is significantly simpler to use and I have yet to encounter any lock problems or problems where multiple access corrupted the database. Then again I guess if YOU have spent the last 20 years dealing with SQL you may find it pretty easy; but you know what I don't want to spend any more time than I need to dealing with DB crap and just want working applications now so I'll stick with TokyoCabinet.
Re:Walmart's primary business isn't online by einhverfr · 2010-03-28 16:16 · Score: 1

You are right. There is no comparison. Having worked with POS frameworks and the like I can tell you performance is a MUCH bigger issue there.
Interestingly the two largest databases I help with regarding LedgerSMB are a financial services business with over a hundred employees and the other is a convenience store with two tills. And with the POS environment, you have to have top performance. A 10 second delay is something that needs to be fixed quickly and a 30 second delay is almost unworkable. So yes, no comparison. Performance is MUCH more important on the brick and mortar retail end.....

--

LedgerSMB: Open source Accounting/ERP
Re:Being someone who does A LOT of medical records by Angst+Badger · 2010-03-28 16:41 · Score: 1

He even had to do a quick google to find some old buzzwords to throw in, I almost want to give him points for throwing in CICS, almost.
What are you, twelve? I've developed COBOL and CICS applications, though thankfully, I work mostly in C++ and Java now.

You should have picked up on this when he starts naming books that he's read. The more name/buzzword dropping you see the more you know the person doesn't really have a clue.
No, but thirty-five years of software engineering has taught me that treading on the sacred turf of DBAs gets you one of two possible responses. If you don't make it clear at the outset that you do know what you're talking about, you're immediately dismissed as a clueless outsider. If you do make it clear that you know what you're talking about, you get responses like yours, which just descend into nonsensical nastiness. You can't have a meaningful discussion with people who aren't interested in dealing in good faith.
All of which serves to underscore my original point, which is that there is a deeply entrenched RDBMS faction that can only see problems in terms of the one tool that they have, and react to problems that don't fit the tool well (or at all) by simply denying their existence. The irony is that there is hardly anyone who denies the broad utility of the relational model. The hysterical reaction to the suggestion that not everything fits the model equally well and a few things don't fit it at all only highlights the blind dogma involved.

--
Proud member of the Weirdo-American community.
Re:The Article Is Right... And Wrong by grepya · 2010-03-28 17:08 · Score: 1

Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
That's exactly right. To any experienced server engineer/architect, it's obvious that much greater scaling -- both horizontal (automatic sharding of data across nodes) and vertical (more writes/second/node) -- can be achieved if you give up the absolute guarantee of zero data loss (...while still usually keeping data *consistency* in the non-lost portion of the data). Many of the social networking type applications... Twitter, Facebook and the like, can probably afford that risk. Given that... you can do many many more txns/s with a key-value type database instead of the transactionally oriented OLTP type databases. Now, for a smaller organization (shoe-string startups and such), the RDBMS model still has many benefits that can't be ignored --- vast "googleable" knowledge base behind the traditional software products, larger candidate pool with expertise in said systems etc. Unfortunately there really is no standard answer here other than to evaluate your own situation carefully and make up your mind based on all the data (most of which is available only to you).
It's not about "technology" purity. by xyourfacekillerx · 2010-03-28 18:25 · Score: 1

Look, guys. Let's be honest here. NoSQL has been around forever; it's the default approach for data storage unless a relational database is selected as a requirement of the software being written (am I the only one who still writes his own file formats and uses record-based random access for small-time data storage? If you don't need the complexity of XML or SQL, then don't use it...)
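For anyone who has not done it, record-based random access with a home-grown file format looks roughly like this in Python. The record layout (id, 16-byte name, balance) is a made-up example, and an in-memory buffer stands in for a real file so the sketch is self-contained.

```python
import io
import struct

# Hypothetical fixed-size record: 4-byte unsigned id, 16-byte name,
# 8-byte float balance. "<" means little-endian with no padding.
RECORD = struct.Struct("<I16sd")


def write_record(f, index: int, rec_id: int, name: str, balance: float) -> None:
    """Overwrite record number `index` in place."""
    f.seek(index * RECORD.size)
    # Names shorter than 16 bytes are zero-padded; longer ones are truncated.
    f.write(RECORD.pack(rec_id, name.encode("utf-8"), balance))


def read_record(f, index: int):
    """Jump straight to record number `index` without reading the rest."""
    f.seek(index * RECORD.size)
    rec_id, raw_name, balance = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("utf-8"), balance


f = io.BytesIO()  # stands in for a file opened in "r+b" mode
write_record(f, 0, 1, "alice", 10.0)
write_record(f, 1, 2, "bob", 25.5)
print(read_record(f, 1))
```

Lookup by record number is a single seek, which is why this approach stays attractive for small, fixed-shape data; what it does not give you is querying by anything other than position.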

That being said, NoSQL is just giving that obvious practice a name as if it is a new phenomenon in the development world. Agreed, now that it has a name it tends to mislead developers into discarding SQL DBMSs irresponsibly, but it does serve an extremely important purpose in the business world: It superficially inflates an otherwise vacuous business process, which under the guise of "innovation", drives business demand.

The IT world does this all the time. They re-package existing solutions, or disrupt them in favor of "new" solutions which to be honest are often unnecessary and more complex than the original solutions. But it drives business. It creates new hardware, new software, new job positions, new education criteria that academia can sell and creditors and government can tax, new system maintenance and migration hurdles; it turns businesses into consumers, it creates new consumers for those businesses, and justifies continuing relationships with consumers when the last product was already good enough.

The mere hype of IT "solutions", however irrelevant or pointless or unnecessary, perpetuates the industry. A lot of it is utter BS. It's all they can do during times when few real advancements are made... and sadly it works too well... and that is the REAL problem with the NoSQL trend. Not bad programming practice. Just artificial business fuel.
Re:The Article Is Right... And Wrong by SQL+Error · 2010-03-28 18:27 · Score: 1

So how do you make your system work with NoSQL? As you say in your post, "you lose ACID, indexes, and joins to varying degrees". To me, with my relational view of the world, it seems that you would want to use an RDBMS exactly because of these things. Specifically, the fact that your RDBMS does the hard work of keeping your data consistent for you.
Sure, if your transactions are worth something.
In the world of social networking, consistency is much less important than speed. If two different users see different data because the nodes are a few seconds out of sync, no-one cares. But slow answers are wrong answers.
You can't do that with a bank or a stock exchange. It would be a disaster. For a social networking site, no-one will care - no-one will even notice.
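The "nodes a few seconds out of sync" situation can be modelled with a toy Python class. Everything here is invented for illustration (the `Cluster` class, the explicit `sync` step standing in for replication lag); real systems replicate in the background and resolve conflicts, which this sketch does not attempt.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Toy model: a write lands on one node and reaches the others later."""

    def __init__(self, n: int):
        self.nodes = [Replica() for _ in range(n)]
        self.pending = []  # (node_index, key, value) updates not yet applied

    def write(self, key, value):
        # Acknowledge as soon as the first node has the value.
        self.nodes[0].data[key] = value
        for i in range(1, len(self.nodes)):
            self.pending.append((i, key, value))

    def read(self, node: int, key):
        return self.nodes[node].data.get(key)

    def sync(self):
        """Apply queued updates; stands in for replication catching up."""
        for i, key, value in self.pending:
            self.nodes[i].data[key] = value
        self.pending.clear()


cluster = Cluster(3)
cluster.write("status:alice", "at lunch")
stale = cluster.read(2, "status:alice")  # this replica has not caught up yet
cluster.sync()
fresh = cluster.read(2, "status:alice")
print(stale, fresh)
```

The write returns without waiting for the other nodes, which is where the speed comes from; the cost is the window in which two readers see different answers. For a status update that window is harmless, for an account balance it is not.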

If so, what realistic expectation can you have to come up with something that is both correct and as performant as an RDBMS which lots of smart people have worked on over the years?
We throw strict correctness out the window. That's where most of the performance gain comes from. You still have to build an architecture that can take advantage of this opportunity, though, and that's not trivial.

Or is it just that people are throwing consistency out of the window and saying "We can afford to lose a couple of records or have a couple of dangling references here and there, as long as it SCALES". Because I can build something that scales if it doesn't have to maintain ACID, too. The difficulty is in having _both_ ACID and scalability.
Consistency, scalability, affordability. Pick two... At most.
Re:Right! by rmm4pi8 · 2010-03-28 22:12 · Score: 1

Strongly agreed, though I do worry that many NoSQL projects' websites are overly blase about runtime issues, including crash safety and online schema changes, as well as upgrade-safety. Now this is really all about using alpha software rather than anything conceptual/design related, but it is a real issue at the present time.

--
U.S. War Crimes blog. Email for free Mandriva support.
What? by C_Kode · 2010-03-29 00:37 · Score: 1

I wrote this guy off the second he mentioned Walmart's database. How Walmart uses databases vs Facebook or Twitter is completely different. I'm pretty sure Walmart doesn't have 6 million people writing to that database at any given time.
Reply from Cassandra's Eric Evans by beemishboy · 2010-03-29 01:02 · Score: 1

Eric Evans, who coined the term NoSQL and is a committer on the Cassandra project, responded in a blog post:

http://blog.sym-link.com/2010/03/28/haters_gonna_hate.html
NoSQL is nothing revolutionary by reich · 2010-03-29 10:20 · Score: 1

Ugh.
The saddest part to me about the "new hotness" of NoSQL zealots is that a scalable, fast, flexible key-value store isn't new at all. It's called LDAP. Sadly, it continues to be a horribly misunderstood beast. Yes, it's more than a shared address book.
In the end, you use the right tool for the job. SQL is relational. LDAP is hierarchical. Neither is new hotness, so stop pretending to invent. Both perform their jobs exceptionally well.... if you use them for the job they are intended, and learn a little something about the concepts invented before your birth. Chances are, they've been thought through before, and you're being lazy. Go read up.
1. Re:NoSQL is nothing revolutionary by ishobo · 2010-03-30 13:22 · Score: 1
  
  The saddest part to me about the "new hotness" of NoSQL zealots is that a scalable, fast, flexible key-value store isn't new at all. It's called LDAP.
  
  The saddest part is you are commenting on a field you know nothing about. You may want to look a little further back in time, around the 1960s.
  
  --
  Slashdot - The great and glorious cluster fuck of Internet wisdom.
Mongo by GWBasic · 2010-03-29 14:49 · Score: 1

I've been pretty happy with MongoDB. Why? The document architecture makes ORM a lot easier.

--
No, I will not work for your startup
Re:PstrgreSQL and ACID? You are kidding right by rkit · 2010-03-31 02:57 · Score: 1

Can you give any specifics? Btw, you know that there is more than one isolation level available in Postgres?

--
sig intentionally left blank
Re:The Article Is Right... And Wrong by ckaminski · 2010-04-04 13:13 · Score: 1

Traditional OODBMSs have two major problems... well, maybe three, going against them:

1. Hard to restructure data ad hoc to do set-based modelling (is this really a downside?)
2. Schema evolution (changing a model from one version to another).
3. Lack of tools sophistication. For ObjectStore (which I have supported in the past and work for today) - we have always had a lack of easy-to-use tools like Crystal Reports and some visualization tools that the SQL market has had almost since day 1. Requiring a programmer to do your data mining is a serious downside to using a pure OODBMS.

Although since we added XQuery support to the product, it's getting easier to do ad hoc queries without requiring access to a C++ or Java compiler.
NoSQL is not just for scalability,it buys you time by nakubu · 2010-04-07 21:17 · Score: 1

This article seems to totally miss the point about why startups are using NoSQL databases, namely, those that are schemaless. It's because most startups are in the process of building their main product on the fly, pushing out new versions as often as every few days or hours depending on their deployment model. Schemas stand in the way of rapid development since you have to CONSTANTLY redefine them as you redefine your product. So updating your db goes something like this: "Oh, I have to change this relationship from one-to-one to one-to-many." "Well, now that we redesigned part of the database, let's migrate it." "OK, well, enough time has passed. Is it done yet? No? OK." "Ah, it's done. OK, take down the servers for maintenance and restart." Or you can use something like CouchDB, and just insert whatever new data you want on the fly, without defining a schema, without migrations, and without downtime. A win for startups. It's not just about scalability, it's also about being able to do a simple task.
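The one-to-one to one-to-many change described above looks something like this when documents are schemaless. Plain Python dicts stand in for documents in a store such as CouchDB; the field names and the `emails_of` helper are hypothetical.

```python
# Documents written by two versions of a hypothetical application.
users = [
    # Version 1: one email per user.
    {"name": "ann", "email": "ann@example.com"},
    # Version 2: many emails per user, written without touching old documents.
    {"name": "bob", "emails": ["bob@example.com", "b@example.org"]},
]


def emails_of(doc: dict) -> list:
    """Read either shape; the schema lives in this function, not in the store."""
    if "emails" in doc:
        return list(doc["emails"])
    if "email" in doc:
        return [doc["email"]]
    return []


for doc in users:
    print(doc["name"], emails_of(doc))
```

No migration or downtime is needed to start writing the new shape. The trade-off is that every reader has to cope with every shape ever written, so the schema has moved into application code rather than disappeared.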