Moving From CouchDB To MySQL

← Back to Stories (view on slashdot.org)

Posted by Unknown on Wednesday May 16, 2012 @01:37AM from the hep-cats-just-use-postgres dept.

itwbennett writes "Sauce Labs had outgrown CouchDB and too much unplanned downtime made them switch to MySQL. With 20-20 hindsight they wrote about their CouchDB experience. But Sauce certainly isn't the first organization to switch databases. Back in 2009, Till Klampaeckel wrote a series of blog posts about moving in the opposite direction — from MySQL to CouchDB. Klampaeckel said the decision was about 'using the right tool for the job.' But the real story may be that programmers are never satisfied with the tool they have." Of course, then they say things like: "We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails' ActiveRecord::Store. It’s not super well integrated with MySQL's feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs."

221 of 283 comments (clear)

Min score:

Reason:

Sort:

Not getting RDMS by Anonymous Coward · 2012-05-16 01:41 · Score: 5, Insightful

And in another three years they will switch to whatever is the coolest up-and-coming storage solution. Incompetent developers will always be incompetent developers.
1. Re:Not getting RDMS by gbjbaanb · 2012-05-16 02:24 · Score: 5, Insightful
  
  true, just reading their blog
  
  Things like SQL injection attacks simply should not exist.
  HTTP API. Being able to query the DB from anything that could speak HTTP (or run curl) was handy.
  
  so sql injection is real bad, bad design of SQL... yet allowing any old HTTP javascript queries is somehow ok. Yes, incompetent developers indeed.
  They also say
  
  Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?
  apart from the concepts of query caches - and stored procedures - so what if the language is related to COBOL, javascript is closely related to C which is almost as old. And that has plenty of relations to Algol which is even older.
  So yes, it sounds like they havn't really got a clue. Great advert for their business!
2. Re:Not getting RDMS by gorzek · 2012-05-16 02:25 · Score: 4, Insightful
  
  I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.
  Granted, I didn't come into this field with that background, either, but I made a point to learn it, and now I'm very cognizant of implementing sound database designs. This whole idea of throwing random strings of structured text into a database column, and then relying entirely on the program code to parse and use it... well, why the hell even use a relational database, then?
  Relational databases aren't suitable for every application, nor are "bigtable" and other NoSQL implementations. The problem is that developers use a particular kind of database without really understanding how to use it properly. If they can get data in, and get data out, that's basically all they care about. Never mind if they make it a maintenance nightmare in the process.
  
  --
  Check out my world simulator thingy.
3. Re:Not getting RDMS by arth1 · 2012-05-16 02:43 · Score: 2
  
  so sql injection is real bad, bad design of SQL... yet allowing any old HTTP javascript queries is somehow ok.
  HTTP isn't a subset of javascript - no javascript queries are needed for HTTP. Even for JSON and other javascript objects.
  That said, yes, the developers don't seem to "get it". An object/method based database query language, which they seem to want, has already been tried. Look where Informix is right now.
  Yes, parsing can be a bitch, and which is why using a structured database isn't always the right choice to start with. If you're just using it for data storage, it rarely makes sense.
4. Re:Not getting RDMS by Xest · 2012-05-16 02:45 · Score: 5, Insightful
  
  "Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?"
  I couldn't agree with you more, this quote makes me want to vomit. Is this really how low the average competence of today's web developer has stooped? Between PHP developers not getting why PHP is a pretty shitly designed and developed language and stuff like this, I barely get how the web even runs anymore.
  To answer the original quote, the reason we're "still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?" is because SQL is a language based on mathematically sound principles, and which is supported widely, and known widely, and is processed by database engines across the globe that have literally decades of stability behind them, data in them and so forth.
  There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL. The fact the kiddie (I can only assume he's a kiddie due to his blatant lack of knowledge and/or experience in the field) who wrote that blog post doesn't get this suggests he should absolutely not be trusted with your data as he'll only lose it.
  This is a classic example of someone bitching about something not because it's bad, but because they simply don't understand it and believe that rather than learn about it properly, it's better to bitch and hope you can somehow effect change by bitching.
  The advantage of most SQL/RDBMS is that they do adhere to the ACID principles, and for people who want to be able to have some degree of trust in their data source that's pretty fucking important. It's no surprise that they've moved over to MySQL though as it's one of the few RDBMS that is completely shit at adhering to the ACID principles and keeping uptodate with solid, stable implementations of modern database functionality.
5. Re:Not getting RDMS by SQLGuru · 2012-05-16 03:01 · Score: 4, Insightful
  
  I completely agree. A lot of non-DB centric people think that they can do more in the app tier, effectively using their databases as glorified file stores. Why even have a database server in those instances? I'm not saying that everything should be done in the database, either, but take advantage of every tool you have.
  NoSQL has a place, so does relational. Learn their strengths and determine which is the best fit for your project. Then, learn how to use the tool to its fullest.
6. Re:Not getting RDMS by serviscope_minor · 2012-05-16 03:04 · Score: 4, Insightful
  
  so sql injection is real bad, bad design of SQL...
  SQL injection actually has nothing to do with SQL.
  Exactly the same attacks happen in any system where you build up a string from user data and pass it off to an interpreter. SQL has nothing to do with it.
  Exactly the same thing used to happen with sudo shell scripts.
  Exactly the same thing happened with javascript injection in very early webmail systems.
  There are plenty of opportunities for code injection on poorly written PHP, too.
  
  --
  SJW n. One who posts facts.
7. Re:Not getting RDMS by Lisias · 2012-05-16 03:15 · Score: 1
  
  But yet, are these same developers that are being *highly* paid on these Web 2.0 times.
  Serious. I was of of them - but got kicked out because I made the huge mistake of pointing the obvious: you must be a skilled programmer to do programs right. Ruby On Rails will not make a good coder from a dumb ass.
  The dumb asses joined up em kick me out. =D
  
  --
  Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
8. Re:Not getting RDMS by Lisias · 2012-05-16 03:16 · Score: 2
  
  COBOL can be a bad language, but the best paid jobs around here are for COBOL programmers.
  It's hard to find a position (someone must die in order to open up a position), but once you get it, it's for life. =]
  
  --
  Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
9. Re:Not getting RDMS by thisisfutile · 2012-05-16 03:20 · Score: 1
  
  Well stated! (There, I'm not an "anonymous coward" anymore) ;-)
10. Re:Not getting RDMS by speculatrix · 2012-05-16 03:23 · Score: 1
  
  mod parent up
11. Re:Not getting RDMS by K.+S.+Kyosuke · 2012-05-16 03:41 · Score: 4, Interesting
  
  There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL.
  False. First of all, SQL is NOT based on mathematically sound principles of relational algebra. SQL took the mathematically sound principles of relational algebra and fucked them up. There should be no NULLs, there should be no natural ordering of "columns", there should be no possibility of having duplicate rows, there should be no possibility of inconsistent intermediate states in transactions (no deferred checking) etc. SQL has them all, and then some. Why? Because SQL simply ignores the relation model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model".
  Second, there is a separation between the surface structures of a language and its foundations. I really don't think that a language based on relational algebra has to look like SQL. That's like saying that a language with nouns having singular and plural and verbs having tenses has to look like English. Nope, it doesn't have to at all. Just look and VB.NET and C#. Basically two front-ends to a virtually identical language semantics, only one of them does not avoid non-alphabetic structural delimiters like the plague (and is so much more pleasant for it).
  
  --
  Ezekiel 23:20
12. Re:Not getting RDMS by gmack · 2012-05-16 03:49 · Score: 2
  
  That is a common reason for firing. A couple of years ago some programmers wanted me to support them with the boss on switching a project written in python to Java. Their justification? The python programmer called them a bunch of monkeys. No technical arguments at all.
  Unfortunately the boss sided with the monkeys and I was next on the chopping block for pointing out that a 200 Bingo player max using 3 machines (1 web 1 db, 1 backup db) was a design flaw.
13. Re:Not getting RDMS by Nadaka · 2012-05-16 03:59 · Score: 3, Insightful
  
  COBOL can be a bad language, but the best paid jobs around here are for COBOL programmers.
  It's hard to find a position (someone must die in order to open up a position), but once you get it, it's for life. =]
  In the end, there can be only one.
14. Re:Not getting RDMS by TheRealMindChild · 2012-05-16 04:32 · Score: 5, Interesting
  
  There should be no NULLs
  Then how do I, say, indicate the date of death for someone who hasn't died? An IsDead field? Really? (Yes, a NULL in a field is a shortcut for proper relationship, but a lack of relationship when using a linking table will still be represented by NULL)
  
  there should be no natural ordering of "columns"
  Does it really matter? The natural ordering of columns is the order in which you added them to the table. Ignore it. It isn't important, and not in need of a "solution"
  
  there should be no possibility of having duplicate rows
  Firstly, get to know your DISTINCT SQL keyword. Secondly, data in real life sometimes IS duplicate. What the hell should people do? Have a DuplicatedThisManyTimes field? Ugh.
  
  possibility of inconsistent intermediate states in transactions
  That is a property of the database engine, not SQL.
  
  Because SQL simply ignores the relation model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model".
  Where do you get this shit? Are you telling me the function of foreign key constraints and referential integrity, and the good ol INNER/RIGHT/LEFT join keywords are just smoke and mirrors and everything is really just a chaotic bowl of soup? References please.
  
  --
  
  "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
15. Re:Not getting RDMS by SolitaryMan · 2012-05-16 04:44 · Score: 1
  
  ...I barely get how the web even runs anymore.
  In the cloud, obviously.
  What, you didn't get the memo?
  
  --
  May Peace Prevail On Earth
16. Re:Not getting RDMS by Alex+Zepeda · 2012-05-16 04:46 · Score: 1
  
  Unfortunately the developers of these "NoSQL"databases seem to have the same idea. I'm working with one that shill remain nameless but sounds oddly like a piece of fruit right now. The generally accepted best practice for scaling is to pull as much of the logic out of the database layer. While there are fancy aggregation pieces, they're all impossibly slow (and hamper concurrency). Argh.
  
  --
  The revolution will be mocked
17. Re:Not getting RDMS by Anonymous Coward · 2012-05-16 05:08 · Score: 4, Insightful
  
  That's not how debate works. If you can't take a position and defend it against questioning, without resorting to "go away and learn more", then you have no position and shouldn't have posted in the first place.
18. Re:Not getting RDMS by Grishnakh · 2012-05-16 05:10 · Score: 1
  
  A lot of non-DB centric people think that they can do more in the app tier, effectively using their databases as glorified file stores. Why even have a database server in those instances?
  This is pretty easy to answer, I think: because databases offer ACID attributes. Reimplementing those on your own is a big project and likely to create bugs; it's a lot easier to just grab an existing database and use it.
  For instance, what if you need a "glorified file store" that multiple processes on multiple systems can all access at once without any data corruption happening? How are you going to accomplish that effectively without a database? In fact, that's one of the main features of a database. There's nothing that requires a database to do lots of processing on the data; early databases in fact didn't, and really were little more than "glorified file stores". It wasn't until some time later that the relational database was developed, with the idea that the database server would figure out the relations between data, rather than forcing the application to do that work. Today's NoSQL appears to be largely a return to that paradigm, for applications where they don't need to worry about relations, and only need a glorified file store that's able to handle a lot of traffic.
19. Re:Not getting RDMS by plopez · 2012-05-16 05:52 · Score: 1
  
  After RTFA all I could think was, "One rookie mistake after another." I'm glad I don't work there.
  
  --
  putting the 'B' in LGBTQ+
20. Re:Not getting RDMS by plopez · 2012-05-16 05:53 · Score: 2
  
  SQL is nothing like COBOL. Once again they show how they are clueless rookies.
  
  --
  putting the 'B' in LGBTQ+
21. Re:Not getting RDMS by hey+hey+hey · 2012-05-16 06:01 · Score: 1
  
  It wasn't until some time later that the relational database was developed, with the idea that the database server would figure out the relations between data, rather than forcing the application to do that work.
  That "some time later" was about 50 years ago. I think we can safely declare this "mature" technology that everyone can safely use...
22. Re:Not getting RDMS by godefroi · 2012-05-16 06:07 · Score: 1
  
  All sorts of things were added and improved (and made worse as well...) over the last 20 years, they just weren't put in the SQL standard. Vendor lock-in, it's not just for Microsoft.
  
  --
  Karma: Poor (Mostly affected by lame karma-joke sigs)
23. Re:Not getting RDMS by Joey+Vegetables · 2012-05-16 06:21 · Score: 2
  
  GP is correct, and your understanding of the relational model appears to be - no offense - a bit lacking. To address your first example: people and deaths are different, though related, concepts. Ideally, they should have separate tables, plus a view. If someone died, he or she has a row in a Deaths table, which joins to the People table; otherwise, not; no NULLS necessary. When interacting with the data from outside the database, you use a view, which can be engineered to appear to contain NULLs, duplicate rows, and so forth. The views can be updateable, using triggers and whatnot, so you can treat them as if they were tables wherever it is convenient to do so, and they will behave the way you appear to believe they should (or the way your ORM tool believes they should); but, behind the scenes, the data will be stored in 3NF and therefore will be far less subject to insert, update and delete anomalies than they might be otherwise. Now, no one is holding a gun to your head and saying you *must* use the relational model. But I do advise you to understand it, and its benefits, and to use it where it makes sense, and, if you don't use it, to understand the tradeoffs you are making.
  
  --
  
  Nonaggression works!
24. Re:Not getting RDMS by tgd · 2012-05-16 06:33 · Score: 2
  
  I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.
  Or a developer who is experienced enough to know how bad an idea an overly normalized database is for most applications.
25. Re:Not getting RDMS by gorzek · 2012-05-16 06:41 · Score: 1
  
  While I'm sure that happens, I have yet to encounter it. :-p
  I'm also not against having two databases: one that's not normalized but is instead fully tuned for application performance; another that's a fully normalized, deeply relational data warehouse. The former feeds the latter. I've seen applications that do this, so they can code for performance on the frontend, and have reliable data consistency/logic on the backend.
  
  --
  Check out my world simulator thingy.
26. Re:Not getting RDMS by SomePgmr · 2012-05-16 06:51 · Score: 1
  
  . If someone died, he or she has a row in a Deaths table, which joins to the People table; otherwise, not; no NULLS necessary.
  I'm no SQL genius, but people are either alive or dead, and can only die once. So it seems like you would represent a persons death as a boolean (and optionally, a potentially null date) as a property of the person. I guess what I'm saying is, breaking out a boolean w/ optional date seems like taking normalization too far... no?
  Addresses, phone numbers, etc. are the sort of thing I usually break out, because a person might have more than one, of different types.
27. Re:Not getting RDMS by Grishnakh · 2012-05-16 07:00 · Score: 1
  
  I never said it wasn't. But my understanding is that relational databases didn't really come around until the 70s, which is a little less than 40 years ago, not 50. 50 years ago, they were using hierarchical databases like the IBM IMS system used for the Apollo project. The point is that databases are needed to allow concurrent access (read and write) to data without creating corruption, something you won't get with a flat file unless you implement locking (which incurs a significant performance penalty).
28. Re:Not getting RDMS by Just+Some+Guy · 2012-05-16 07:18 · Score: 1
  
  You're right, as long as column order isn't used for any substantial business logic
  I once told a coworker that I was going to make the database connector randomize the column ordering if it wasn't specifically set in the SELECT clause. After he kept writing code like you describe, I showed him a working implementation of my idea and told him I was pushing it to master the following week. He got the hint and started accessing columns by name.
  
  --
  Dewey, what part of this looks like authorities should be involved?
29. Re:Not getting RDMS by Xest · 2012-05-16 08:01 · Score: 2
  
  "False. First of all, SQL is NOT based on mathematically sound principles of relational algebra."
  No, you've completely missed the point - I'm not saying SQL is an implementation of, and only of the relational model and nothing more, and nothing less, merely that those are it's foundations. SQL absolutely IS based on the principles of relational algebra - it's still ultimately based on much of the important set theory that underlies that when it comes down to it. The point being that sure, whilst SQL is far from perfect, it at least stems from far more solid principles than many alternative offerings nowadays which don't even consider looking at fairly sound mathematical principles as a starting point and so just end up a mess. This isn't to say you can't use them for anything - it's the same thing as PHP, sure these things "work", but you can't come crying when you inevitably encounter bugs that stem from the fact many such alternatives are poorly designed and have an abysmal foundation to their existence. At the end of the day, SQL RDBMS are still pretty much the most foundationally solid persistent data platforms we have, that are also practical to use and that's my key point here.
  It's also worth noting that because of these differing foundations, you can, if you choose to, use the subset of SQL features that do allow you to adhere to the relational model which genuinely does have a mathematically sound foundation. Thus, any issues you have with SQL allowing you to stray away from the relational model are entirely optional. Compared this to the alternatives, and many of these just don't even give you the option of ensuring your data is sound on mathematically sound foundations.
  I'd hence argue that you're being rather dishonest in saying "Because SQL simply ignores the relation model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model"." as that's simply not true, SQL doesn't ignore the relational model it's entirely based on it, the only difference being that IBM/Oracle et. al. have extended it.
  "I really don't think that a language based on relational algebra has to look like SQL. That's like saying that a language with nouns having singular and plural and verbs having tenses has to look like English."
  Well, okay, you're right, if we're being pedantic then yes of course you could change it, but fundamentally my intention was that the SQL language is the way it is because it's intended to map closely to the mathematical operations that define relational algebra, precisely because those are it's foundations. If you start to move away from that you lose those foundations, and whilst not entirely the same thing as replacing the SQL language itself, you only have to look at the problems that arise from ORM implementations to see what happens when you move away from those principles - it's okay for some projects, but in many other cases ORM just gets in your way and forces you to mangle your data into a form that no longer makes sense. So yeah, move things around a bit, change a bit of syntax if that's really what you want, but you're still going to need your selection, projection, and your joins and then what? you've got a language that no one knows, isn't supported anywhere, and that ultimately doesn't change anything of any value because it's still just mimicing those core relational operations.
  I guess if you want to get right to the crux of my argument it's this - people like the guy who wrote the blog we're talking about seem to think that the only choices between SQL and alternatives are a few features here and there, and whether you like the syntax - they completely miss the point that there's far more to it than that, that theres fundamental differences in the confidence you can have of the underlying data storage methods, in the data retrieval methods and so forth. Many NoSQL implementations basically just do away with one or more of the ACID principles to achieve their speed benefits and so forth, yet many people usin
30. Re:Not getting RDMS by Joey+Vegetables · 2012-05-16 08:11 · Score: 4, Informative
  
  From a purely pragmatic point of view, it may not seem unreasonable to model it that way. But you should be aware that you are trading one form of complexity for another, probably bigger one. For instance, now, if you want to know who was alive on some specific date, you have to write something like "WHERE DateOfDeath IS NULL OR DateOfDeath > @date." You also will not know for certain whether a NULL means "person is still alive" versus "person is dead but we do not know his or her date of death." When you try to compare different people's death dates any comparison to NULL will yield NULL and you will need special case logic in every such comparison. You will need tristate logic throughout any part of your application that does logical tests based on the date of death. Nullable values will sometimes require special treatment in your code, depending on the language (e.g., whether date/time values are considered to be nullable in that language). I could go on. I also could build you both tables, an updateable view, and a set of SPs to do your basic CRUD stuff on both tables plus "show me living people" and "show me dead people", in a LOT less time than it would take to handle all the code problems that would result from breaking 1NF. I am not an extremist on this subject, but I wear both DBA and developer hats, and when I'm acting as a DBA or in any other situation where I have control over the DB, I do try to get into 3NF, and then denormalize only if there are demonstrated reasons to do so. As a developer, I will sometimes take shortcuts if it's genuinely necessary, but, more often than not, I end up regretting them.
  
  --
  
  Nonaggression works!
31. Re:Not getting RDMS by SomePgmr · 2012-05-16 08:54 · Score: 1
  
  I see, and that makes sense. I guess it'd depend on how much work I'm likely to do with that particular field, then too. Though I realize that probably depends too much on being able to see the future. ;)
32. Re:Not getting RDMS by petit_robert · 2012-05-16 09:01 · Score: 1
  
  Man, this must be the stupidest argument I read in years.
  Typing the column names can easily be avoided with an appropriate query to list them, and a few strokes in a text editor will adapt the list to the query. Emacs has an sql mode which I use all the time for these things; very convenient.
  I am dumbfounded that you could put this forward as an argument against SQL. This workplace you mention really appears like an awful place.
33. Re:Not getting RDMS by TheLink · 2012-05-16 10:20 · Score: 1
  
  What if you know a person has died and cause of death, but you don't know when? How would you store that?
  
  I'd store person has died and cause of death but leave death date as null - to be updated later.
  --
  
  Too many replies beneath your current threshold
34. Re:Not getting RDMS by SQLGuru · 2012-05-16 10:35 · Score: 1
  
  If your data isn't relational, then you have a good candidate for NoSQL. If your data IS relational, you should consider a relational database. The rest of my post was about picking the right tool for your project and then learning to take full advantage of said tool. If you just need file access, use your file system.
35. Re:Not getting RDMS by Grishnakh · 2012-05-16 10:47 · Score: 1
  
  Using your filesystem doesn't work if you need concurrent file access by multiple processes (which may be on multiple different systems). You'll get data corruption.
36. Re:Not getting RDMS by JonySuede · 2012-05-16 10:59 · Score: 1
  
  Then how do I, say, indicate the date of death for someone who hasn't died? An IsDead field? Really? (Yes, a NULL in a field is a shortcut for proper relationship, but a lack of relationship when using a linking table will still be represented by NULL
  Usually the lack of a relationship is represented by not have an entry between the related entities. But your remark about nullable columns being a shortcut for proper relationship is quite insightful.
  
  --
  Jehovah be praised, Oracle was not selected
37. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 11:18 · Score: 1
  
  Thing is, SQL has specific semantics for NULL which only makes sense when it's interpreted as "unknown" (rather than "value missing" or "not applicable").
38. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 11:23 · Score: 1
  
  To answer the original quote, the reason we're "still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?" is because SQL is a language based on mathematically sound principles, and which is supported widely, and known widely, and is processed by database engines across the globe that have literally decades of stability behind them, data in them and so forth.
  If I get the idea correctly, TFA was not arguing against "mathematically sound principles", but rather against the notion that you have to use text for queries. There's no reason why one can't come up with an API to describe sound relational queries in a typesafe way, and avoid the whole overhead of constructing a text string which would then immediately be decomposed into an AST (and then eventually some internal query representation) by the database - one wonders why not just pass the AST to begin with?
  And, let's admit it, SQL syntax is horrible. It's extremely verbose and woefully inconsistent. It also doesn't represent relational algebra particularly well.
  
  There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL.
  Actually, I think it would rather look like Date's "Tutorial D".
39. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 11:25 · Score: 1
  
  Exactly the same attacks happen in any system where you build up a string from user data and pass it off to an interpreter. SQL has nothing to do with it.
  SQL has everything to do with it in a sense that, in order to perform a query on an SQL database, you have to build up a string from user data and pass it to the database. That's precisely the point that TFA makes. They don't want to send strings, they want a sane query API where all operations are exposed directly, and you only pass in values (think of it as a query builder).
40. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 11:27 · Score: 1
  
  The data itself is just data, it doesn't have any particular preference for how it's represented. You can represent any data set relationally if you want to. The trick is knowing when it actually makes sense.
41. Re:Not getting RDMS by complete+loony · 2012-05-16 11:28 · Score: 2
  
  I've worked on quite a few large-ish database applications (eg 800 - 2000 tables, some with multi-million rows), and I'd say I'm fluent with SQL. But the thing that annoys me most about SQL, from a maintenance perspective, is how much of the database structure ends up strewn around in your code base. SQL is *not* good at encapsulation.
  When a new requirement comes in that should cause you to change some of the primary relationships in your database, you have a look at how much code you'd need to change to do it properly, and end up just hacking in something really ugly instead. Maybe you should be able to use a foreign key, or many-to-many join table relationship, just by using its name (or something like that). Instead, every query must list the entire set of columns involved. And often provide hints to the database engine on which indexes to use first.
  And the problem gets much worse if you build your schema "properly" and heavily normalise the structure of everything. Because then any "simple" query easily involves 7 tables. And you probably need to code it into a stored procedure so you can build the data manually in a temporary table, since the database engine can't choose the right indexes and processing order and ends up scanning through millions of records to find the half dozen you actually wanted.
  Sure you could write a set of classes to build the SQL query strings for you, so you can encapsulate these details in one place. But then you'll probably end up with an ugly inner-platform that only the original designers really understand and your skills with using it wont translate directly into any other application or language.
  I'm not saying the NO-SQL folks have reduced any of the complexity of working with large complicated data sets. But SQL is not the silver bullet you make it out to be either.
  
  --
  09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
42. Re:Not getting RDMS by LurkerXXX · 2012-05-16 11:59 · Score: 1
  
  If that worked well, in the past several decades someone would have come up with a working implementation that's usable. They haven't.
  Every time SQL comes up, some pedant always comes along saying how aweful SQL is because it doesn't correctly implement the relational model. When asked to provide an alternative database system that does correctly follow Date, they have no answer, because there isn't anything available that's better than SQL.
43. Re:Not getting RDMS by LurkerXXX · 2012-05-16 12:01 · Score: 1
  
  "And you probably need to code it into a stored procedure".
  Probably a good idea. SQL injection worries go away right there if done properly.
44. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 12:18 · Score: 1
  
  If that worked well, in the past several decades someone would have come up with a working implementation that's usable. They haven't.
  They have, actually. The reason why those don't replace SQL is because, like many other things in life in general and IT in particular, SQL is "good enough" - there simply isn't sufficient value in replacing it with something that is marginally better, even for all its quirks, because of the associated cost of having to abandon mature existing DBMS and frameworks.
  Another similar example is C. It has plenty of problems from PL design perspective, but none of them are big enough to bother replacing it, especially with all the implementations and tools and code that's already out there.
45. Re:Not getting RDMS by LurkerXXX · 2012-05-16 12:24 · Score: 1
  
  Don't hold back. What is the name of the working implementation, and where can I buy/download it from?
46. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 13:02 · Score: 1
  
  Suneido is one example, though it's a somewhat different niche from traditional shared RDBMS.
  Seriously, though, why do you need examples? Do you find it hard to imagine an RDBMS that would have query language syntax better than SQL?
47. Re:Not getting RDMS by HornWumpus · 2012-05-16 13:15 · Score: 2
  
  You've got it backwards. The highly normalized database is connected to transaction processing. Highly normalized databases have few lock issues and are optimized for transaction processing. Also TPS is narrow so you have good coders dealing with the relatively little code that bangs on it hard.
  The read-only database denormalized for simplicity and query performance is the data warehouse. That's where the report monkeys work.
  
  --
  John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
48. Re:Not getting RDMS by LurkerXXX · 2012-05-16 13:54 · Score: 1
  
  I need an example, because as I said, pedants talk all the time about how SQL doesn't truely follow the relational model of Date and Codd, but never provide a working alternative that does. I said nothing about syntax at all. I make no claim that SQL syntax is great or better than any other. I said I want to see a workable database that follows the *model* correctly/completely. I've only looked at their site for a few minutes, but I seen nothing that indicates they follow the model any better. Only that they like their syntax better, and is object oriented. Neither have any bearing on following Date /Codd's model.
49. Re:Not getting RDMS by shutdown+-p+now · 2012-05-16 15:04 · Score: 1
  
  I need an example, because as I said, pedants talk all the time about how SQL doesn't truely follow the relational model of Date and Codd, but never provide a working alternative that does. I said nothing about syntax at all. I make no claim that SQL syntax is great or better than any other. I said I want to see a workable database that follows the *model* correctly/completely. I've only looked at their site for a few minutes, but I seen nothing that indicates they follow the model any better. Only that they like their syntax better, and is object oriented. Neither have any bearing on following Date /Codd's model.
  Date himself gave the description of a language that follows the model, so if you want to see an example, that's it. Here is a simple implementation of it.
  I had already explained the reasons why there are no production-quality, mature implementations - this has everything to do with ROI; SQL is simply not bad enough to bother.
  As for Suneido, judging by your mention of "object oriented", you've only looked at their application language, which doesn't have anything to do with relational model. What I was rather referring to is the query language for their integrated database; it's not object oriented in any way - it's pure relational algebra.
50. Re:Not getting RDMS by Xest · 2012-05-16 21:02 · Score: 1
  
  Why not use views?
51. Re:Not getting RDMS by complete+loony · 2012-05-16 21:39 · Score: 1
  
  Because you run into the exact same problems with efficiency.
  
  --
  09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
52. Re:Not getting RDMS by Xest · 2012-05-16 21:55 · Score: 1
  
  If you're reaching a point where you have such performance problems, and everything that should be indexed is etc. then you have a number of choices:
  1) Look into improving the platform - clustering etc.
  2) Look into denormalisation, the views can remain in the same state
  3) Look at the feasibility one or more of the ACID principles and switching to NoSQL but accept that you will lose integrity/reliability in doing this
  The problems you're talking about aren't so much about inherent problems with SQL and RDBMS's, but about dealing with that volume of data in general. The point is with views you can perform optimisations like denormalisation without screwing the interface to the code, which is what you were complaining about.
53. Re:Not getting RDMS by julesh · 2012-05-16 22:12 · Score: 1
  
  There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL.
  Actually, no it wouldn't. First of all, there's an assumption you're making here which is unwarranted: why do we need a query _language_ at all? A query API would remove a rather large class of serious bug: data quoting errors, typically leading to either SQL injection vulnerabilities, data corruption (ever see a site with all the 's turned to \'?) or both. It would also be more intuitive to new developers, better integrated with tools (autocomplete in language IDEs, etc), and more efficient (removing the need for text parsing).
  Secondly, SQL is actually based on some rather horrible abuses of relational calculus that make most theoreticians in the field cringe. Including Christopher Date and Hugh Darwen, two of the original designers of SQL. You should check out their manifesto for a relational database system not based on SQL. Among other things, they think SQL screwed up handling nulls, and that 'group by' and 'having' are a violation of relational principles that lead to inefficient queries.
  
  The advantage of most SQL/RDBMS is that they do adhere to the ACID principles
  Which is also their disadvantage. ACID is brilliant, and I agree that it should be the default starting position. But it has terrible performance implications, and when we're dealing with large quantities of data where it doesn't matter if (say) a small random selection of recent changes are lost in event of a system failure, or if different clients see an inconsistent view of what changes have happened, you can do an awful lot better performance-wise. Which is where CouchDB and their ilk come in. You just need to be sure, before starting down that route, that you really do have one of the minority of problems for which sacrificing that degree of reliability makes sense.
  Which is why I'm laughing at this article. There shouldn't be any case where you migrate from couchdb to mysql. You should only ever be using couchdb in the first place if you're sure SQL can't handle what you're trying to do.
  (That and the fact that putting non-relational data in an SQL field is a newbie error that I made about 15 years ago and have already learned the lessons of...)
54. Re:Not getting RDMS by julesh · 2012-05-16 22:26 · Score: 1
  
  There should be no NULLs
  Then how do I, say, indicate the date of death for someone who hasn't died? An IsDead field? Really? (Yes, a NULL in a field is a shortcut for proper relationship, but a lack of relationship when using a linking table will still be represented by NULL)
  By storing the data in a separate relation and not having a row present if the data isn't present. This is how the relational model is supposed to work, the use of null to represent missing data is a variation from the formal and valid mathematical model that the GGP post was praising and suggesting that as SQL implemented it perfectly we didn't need any other database interface. He was wrong.
  
  there should be no possibility of having duplicate rows
  Firstly, get to know your DISTINCT SQL keyword.
  The fact that SQL provides a means to allow you to force your data to be relational is no justification for the fact that by default it produces non-relational output to some queries. Again this is a deviation from the proven-valid mathematical that the GGP post was praising, etc.
  
  Secondly, data in real life sometimes IS duplicate. What the hell should people do? Have a DuplicatedThisManyTimes field? Ugh.
  If you have duplicates, your data is not in a normalised relational model. Data in a relation says "the values in these columns are related to each other in this specific way". This person has this phone number. This patient has an appointment to see this doctor at this date+time. The structure of those sentences is part of the formal definition of what a relation is. It has semantic implications, and one of those implications is that *duplicated rows are meaningless*.
  If you have duplicated data and you actually need it, you're missing some important piece of information because those two rows must actually be talking about two different things. There's a column missing from your relation.
55. Re:Not getting RDMS by julesh · 2012-05-16 22:37 · Score: 1
  
  I think you're missing the point. So here's an illustration of what I think GP is talking about. I want to write a query. I type: "SELECT ". Now I want a column name. The environment is unable to complete at this point because it doesn't know what table/join I'll be selecting from.
  If SQL syntax were the more natural FROM [join description] SELECT [projection] WHERE [selection] this problem wouldn't arrise. You could also then leave the projection section as optional, allowing queries like 'FROM contacts WHERE id=5' rather than the frankly bizarre 'SELECT *' syntax SQL forces you to use. The data source is both the most important thing and the only thing that is always required, so putting it at the top of the query makes most sense.
56. Re:Not getting RDMS by julesh · 2012-05-16 22:43 · Score: 1
  
  SQL is nothing like COBOL. Once again they show how they are clueless rookies.
  They are alike in a sense: both of them are designed on the misguided notion that making code read like a natural language makes it easier to work with. Both of them became entrenched simply because they were the first 'good enough' implementation of their respective ideas. Both of them are extremely hard to completely replace.
57. Re:Not getting RDMS by julesh · 2012-05-16 22:53 · Score: 1
  
  I'm working with one that shill remain nameless but sounds oddly like a piece of fruit right now.
  I think that description matches about half of them, doesn't it? What it is with these "next big thing" techs and strange names?
58. Re:Not getting RDMS by julesh · 2012-05-16 22:57 · Score: 1
  
  This is pretty easy to answer, I think: because databases offer ACID attributes. Reimplementing those on your own is a big project and likely to create bugs; it's a lot easier to just grab an existing database and use it.
  ACID is actually pretty simple as long as you can serialize transactions and lock individual objects while accessing them. This is actually possible for most applications that aren't large enough to require a distributed system (which is to say, it'd probably work for 90% of all systems that get built).
59. Re:Not getting RDMS by purpledinoz · 2012-05-17 00:10 · Score: 1
  
  SELECT * FROM MYTABLE t WHERE t.DATA = :par OR (:par IS NULL AND t.DATA IS NULL);
  BARF! So, why exactly a an equals comparison to NULL always result to false? I understand this from the theoretical point of view, but in reality, it ONLY causes headaches.
60. Re:Not getting RDMS by purpledinoz · 2012-05-17 00:27 · Score: 1
  
  I'm no database expert, but it seems to me that normalization is an optimization to sacrifice time in favour of reduced disk space. Of course normalization also prevents errors. Just wondering what people's opinions are about this. For example, if we have an Employee table, should the Country be normalized? Should PostalCode be normalized? Many people have the first name "John", should FirstName be normalized too (prevent duplicate data)? Essentially, every column in the Employee table could be normalized so that it's just a table full of FKs. Of course that's extreme, but it has to stop somewhere... I've worked with "normalization extremists" who wish to normalize every column, but I'm of the thinking that normalization should not be taken too far.
61. Re:Not getting RDMS by Joey+Vegetables · 2012-05-17 01:51 · Score: 1
  
  Some people would still insist on a separate table. I probably would not (even though Deaths probably should be). In most contexts, we can adopt the convention here that a birth date of NULL should be interpreted as "unknown." Again, I'm breaking the relational model - very slightly - but in a way I'm fairly confident, from experience, should not cause too many problems. BTW, there are contexts in which this model might *not* make sense. How about a maternity unit in a hospital? We might want to start recording information about a baby who has not yet been born. He or she most certainly exists, and may have various attributes that we can know and wish to make use of, yet birth date may not be one of them, because it hasn't happened yet and we don't know for sure when it will (or even if it will). There are good relational ways to model all of these attributes, and a nullable BirthDate column may or may not be an acceptable compromise. In my opinion (which is not shared by relational purists), it depends largely on whether there is a single, unambiguous meaning for that NULL.
  
  --
  
  Nonaggression works!
62. Re:Not getting RDMS by Grishnakh · 2012-05-17 03:25 · Score: 1
  
  You think it's easier for a mediocre programmer with very limited time to write an application to handle locking and concurrency issues, than to just grab a free database like PostgreSQL or SQLite and use that? If it's a small system and you're not dealing with a lot of data, SQLite only uses something like 300K.
  Reinventing the wheel is rarely a good idea.
63. Re:Not getting RDMS by lewiscr · 2012-05-17 05:52 · Score: 1
  
  When interacting with the data from outside the database, you use a view, which can be engineered to appear to contain NULLs, duplicate rows, and so forth.
  One of the base tenants of relational theory is that a view is indistinguishable from a base table. You can take 2 tables and make a view using them. Or you can flip it around, have a single base table and 2 views that resemble the original tables. Both methods are equally valid, and a user querying the data can't tell the difference.
  Until you outlaw NULLs in base tables. Now base tables and views are not interchangable.
  I agree that NULLs shouldn't be stored in a table, but I also believe that I should be allowed to hurt myself if I really want to. I don't want to do it, but it's just too slow to do it the correct way. So I compromise now, submit a bug report, and revisit when performance improves. And future-me swears at past-me for doing it.
64. Re:Not getting RDMS by bill_tvm · 2012-05-17 06:18 · Score: 1
  
  Perhaps Tutorial D is the right solution for this: a strong adherence to the fundamentals and a purer syntax
65. Re:Not getting RDMS by Tablizer · 2012-05-17 11:13 · Score: 1
  
  There are a lot of "purity" debates about whether RDBMS and/or SQL should allow Null's and/or duplicate rows and such debates go on and on like vi versus emacs.
  However, coder-centric issue that irks me about SQL is the inability to reference sub-queries by name in a given query; instead one has to physically nest them. Referencing sub-queries by name would allow not only more readable query code, but also allow factoring out of certain duplication, and simplify code snippet libraries because the top of the sandwich bread is not separated from the bottom bread, which deep nesting causes.
  It's almost like the concept of views, but they could be created for the scope of a given query alone.
  
  --
  Table-ized A.I.
66. Re:Not getting RDMS by Tablizer · 2012-05-17 11:21 · Score: 1
  
  Typically there would be a "status code" of some sort, with values such as "alive", "dead", "missing", "coma", "zombie", etc. Nulls can serve as a limited form of status indicator, but shouldn't necessarily do all the heavy lifting of describing state if more detail is needed.
  
  --
  Table-ized A.I.
67. Re:Not getting RDMS by Tablizer · 2012-05-17 11:31 · Score: 1
  
  Amen! I've been in similar such debates and ask for realistic scenarios where doing X would cause problems, but some "theory-oriented" debaters seem to dismiss real-world benefits/problems as useful to exploring trade-offs.
  It's almost as if they argue, "Doing X will undermine the general integrity of the system causing unspecified or non-describable problems". It's a rather ethereal and anti-empirical approach.
  If you cannot predict and describe real-world side-effects along with estimated probability of occurrence, then you don't understand the problem well enough. Or more specifically, you don't understand the issue's relationship to the real world well enough.
  In the end, most of IT comes down to an economic problem. We don't build temples to "pure" ideas just for the sake of having pure concepts in place. That's not what owners/customers/bosses want from us. We have to describe how our tools help THEM reach their goals.
  
  --
  Table-ized A.I.
68. Re:Not getting RDMS by Tablizer · 2012-05-17 11:39 · Score: 1
  
  API's to represent non-trivial languages often either tie the "side" language to a specific app language (making it non-portable), have the same problem in the end, and/or make code hard to read.
  It's almost like going from:
  NORMAL (expression):
  print(a + b)
  BLOATED (API):
  am = new math.ArithmeticManager()
  opA = new math.Operand((float) a)
  opB = new math.Operand((float) b)
  am.addOperand(opA)
  am.addOperand(opB)
  am.operator = new math.operators.Addition()
  am.executeMathOperation()
  system.io.output.print(am.mathOperationResult())
  
  --
  Table-ized A.I.
69. Re:Not getting RDMS by shutdown+-p+now · 2012-05-17 11:46 · Score: 1
  
  The problem here isn't with the form, it's with the representation. print(a+b) is perfectly okay from that perspective - it's concise, but it's still a type-verified etc expression with separate nodes indicating "a" and "b" and operator "+". The problem with SQL is that it's rather print("a+b"), where print takes the string and parses it in some special way that the language does not know.
  Your second example is typical of a language that is not sufficiently expressive to concisely express DSLs in a general way, and does not include a specialized DSL for this particular case - such as Java. In C#, for example, you can write something like Print((a, b) => a + b), and have Print obtain the AST for the expression that is passed to it (rather than the result of the computation); and it also special-cases queries by providing LINQ (for which AST can also be obtained if desired). So you can write things in their base form, with method calls, such as xs.Where(x => ...).Select(x => ...); or you can write from x in xs ... where ... select ..., just as concisely as SQL (in fact, usually more so).
70. Re:Not getting RDMS by Tablizer · 2012-05-17 12:00 · Score: 1
  
  But every app language will have different ways available and not available to build and describe non-trivial expressions.
  That means you have to learn how to do it for each and every app language, whereas the string "SELECT * FROM foo WHERE ..." is going to be pretty much the same thing in each app language: a string.
  And somebody doesn't have to build a new expression-builder API library for each app language.
  
  --
  Table-ized A.I.
71. Re:Not getting RDMS by shutdown+-p+now · 2012-05-17 12:21 · Score: 1
  
  But every app language will have different ways available and not available to build and describe non-trivial expressions. That means you have to learn how to do it for each and every app language
  Sure. How many app languages do you need to learn, though?
  On the other hand, when every app language has its own query syntax, that query syntax can be made more idiomatic to how that language does things. For example, in C#, standard string operations are expressed as methods on class String - things like ToUpper, Trim, Substring etc. Because LINQ is a part of C#, it also exposes those operations, and the LINQ provider for a particular database can then translate them to the corresponding database operations. From the perspective of a C# developer, he only had to learn a single language: C# - and he can now write uniform queries against various data sources.
72. Re:Not getting RDMS by jedwidz · 2012-05-17 15:56 · Score: 1
  
  There should be no NULLs
  Actually, NULLs are needed for outer joins.
73. Re:Not getting RDMS by petit_robert · 2012-05-17 20:20 · Score: 1
  
  "The environment is unable to complete at this point because it doesn't know what table/join I'll be selecting from."
  I got that
  "If SQL syntax were the more natural FROM [join description] SELECT [projection] ..."
  IMO this is a very minor inconvenience compared to the power of SQL for managing large datasets, when one knows how to use it. You are welcome to propose a new standard, but I wonder if you considered all the implications of this change on things other than the convenience of typing?
  With MS-Access, I used to solve this with a 3 line VB procedure that outputs the list of field names. With Postgresql, I simply run this query (the table has 34 fields) :
  SELECT column_name FROM information_schema.columns WHERE table_name = 'tblcontentieux';
  A few key strokes in Emacs to add the table identifier t1 to each field (just a simple replace-string) and the appropriate SQL. Very easy, I did it just for fun, and did not type any of these field names :
  SELECT t1.id_contentieux, t1.id_etat, t1.affaire, t1.libelle, t1.date_debut_contentieux, t1.date_fin_contentieux, t1.ref_dossier, t1.ref_avocat, t1.ref_huissier, t1.ref_assureur, t1.id_service, t1.notes_contentieux, t1.id_categorie, t1.id_position, t1.id_nature, t1.suivi_par, t1.montant, t1.id_site, t1.id_client, t1.description, t1.franchise, t1.id_keyword, t1.ref_archive, t1.remboursement_prevu, t1.address1, t1.address2, t1.ville, t1.code_postal, t1.id_budget, t1.responsabilite, t1.id_agent, t1.id_contrat, t1.id_garantie, t1.id_objet_parc FROM tblcontentieux t1;
74. Re:Not getting RDMS by Rich0 · 2012-05-19 01:17 · Score: 1
  
  If someone died, he or she has a row in a Deaths table, which joins to the People table; otherwise, not; no NULLS necessary.
  The problem with this sort of logic is that it gets really cumbersome to work with.
  I want to store an optional address for each record. So, that means a place table and a many:many relationship between people and places. Oh, but a place might or might not have an address line 2 or zip code, so that means a sub-street and zip table too, maybe 1:0-1 on the first and many:many on the second. You'd end up with 30,000 views in such a database, and those views basically look just like the tables you've gotten rid of, and those are what everybody would use, and if you wanted to take advantage of your highly-normalized design and assign two places to a single person your code would probably break since that now leads to duplicate rows in the view that the code wasn't designed to handle, since we don't value competent programmers in most enterprises that need these big complex databases anyway.
  I can get how mathematically it is more elegant to get rid of the nulls, but when you have people with relatively minimal skill programming and manipulating the database it isn't very practical.
75. Re:Not getting RDMS by Rich0 · 2012-05-19 01:27 · Score: 1
  
  I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.
  Unfortunately I've seen the opposite extreme as well - databases that are SO normalized that to find anything at all you need about 35 joins. Sure, everything is arbitrarily expandable, but basically there is no practical way to work with it without writing code. We have a database at work that anytime I have to actually look inside and see what is going on I end up copying and pasting from my personal library of 1.5 page long SQL SELECTs. People don't take much advantage of all the expandability since doing anything to the system costs a ton of money.
  
  This whole idea of throwing random strings of structured text into a database column, and then relying entirely on the program code to parse and use it... well, why the hell even use a relational database, then?
  And that is the opposite extreme. I've also seen a database at work where half the fields in the databases are CLOBs full of XML. That is even more painful to deal with than the 1.5 page SQL since you end up replicating half the database over the network to do anything that wasn't fielded at the DB-level - like a NoSQL from hell.
  Oh, and those two systems talk to each other. :)
The decision the simple by Anonymous Coward · 2012-05-16 01:41 · Score: 1

If you want to hire twentysomething engineers who have party during the week and have gadgets falling out of their pockets, go with CouchDB or Cassandra or MongoDB.
If you'd rather have people in their 30's and 40's with families, then go with MySQL or Oracle.
1. Re:The decision the simple by Sarten-X · 2012-05-16 01:47 · Score: 4, Interesting
  
  That's actually a rather insightful point...
  If your application fits well with the methodologies of a traditional RDBMS, use a traditional RDBMS, and hire people who are trained and experienced in using those methodologies to their full potential.
  If you're dealing with the latest Big Data paradigms and designs, where you can sacrifice some of the rigidity of a RDBMS to gain some flexibility and cheaper scalability, use a NoSQL database, and hire people who aren't stuck in their old RDBMS ways.
  
  --
  You do not have a moral or legal right to do absolutely anything you want.
2. Re:The decision the simple by Anonymous Coward · 2012-05-16 01:57 · Score: 5, Insightful
  
  That's actually a rather insightful point...
  If your application fits well with the methodologies of a traditional RDBMS, use a traditional RDBMS, and hire people who are trained and experienced in using those methodologies to their full potential.
  If you're dealing with the latest Big Data paradigms and designs, where you can sacrifice some of the rigidity of a RDBMS to gain some flexibility and cheaper scalability, use a NoSQL database, and hire people who aren't stuck in their old RDBMS ways.
  The real key is for the person doing the hiring to understand which of those of methodologies fits their application.
3. Re:The decision the simple by gstoddart · 2012-05-16 02:01 · Score: 2
  
  And most importantly, make sure you know the difference.
  Because I should think someone who thinks you should ditch your RDBMS when it's the thing you need to keep using is going to cause you more problems than they're worth. Of course, the opposite is true ... I remember someone who insisted in writing ER diagrams to describe our system, despite it not being an RDB, and not being accurately described by ER diagrams -- but to him everything was an ER diagram.
  It's not uncommon for geeks to push to use the latest stuff simply because it's the latest. (Or, as you point out, use something because that's what they've always used)
  I've actually seen someone suggesting we scrap an architecture to go with something he'd read recently -- despite having insisted we switch to the current architecture after reading about that.
  After a certain point, you just realize they're a technology magpie and tell them to STFU if they're not providing solid reasoning for why this is better in this context. After a while "because it's newer and better" becomes code for "shiny and pretty". Especially if these whims happen in shorter periods than your development lifecycle.
  
  --
  Lost at C:>. Found at C.
4. Re:The decision the simple by sycodon · 2012-05-16 02:07 · Score: 3, Insightful
  
  RDBMS systems can be flexible also. It just takes a bit of planning, a good understanding of your data and a well designed application...which you should do/have regardless of your storage solution.
  Call me set in my old RDBMS ways, but if I'm supporting it then I want to know what the hell is gong on with the data.
  
  --
  When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
5. Re:The decision the simple by TheSpoom · 2012-05-16 02:10 · Score: 4, Informative
  
  The real key is for the person doing the hiring to understand which of those of methodologies fits their application.
  This is insighful. I've worked extensively with RDBMS solutions and now quite a bit with NoSQL technologies. They each have their place. An entire article could be written on where each fits most naturally, but in general if you don't need to join between tables, need to throw data to your store at a high velocity (e.g. logging), and/or need a loose schema, a NoSQL solution works best. If what you're doing can be naturally modeled (i.e. users HAVE AND BELONG TO stations, stations HAVE MANY playlists, etc. etc.), use an RDBMS.
  One can see in the subtext of the GP that they may not get this, with their comment that people using RDBMS solutions are "stuck in old ways". It seems like they are saying that NoSQL is effectively always best. I'm curious why they think that. Nail, hammer, etc...
  
  --
  It's better to vote for what you want and not get it than to vote for what you don't want and get it.
  - E. Debs
6. Re:The decision the simple by sycodon · 2012-05-16 02:17 · Score: 1
  
  going, going, gong!
  
  --
  When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
7. Re:The decision the simple by h4rr4r · 2012-05-16 02:29 · Score: 3, Insightful
  
  Or use a better DB like Postgres. How the MySQL still is popular I will never know. I think it is a conspiracy to prove FREE DBs suck.
8. Re:The decision the simple by The+Dancing+Panda · 2012-05-16 02:43 · Score: 1
  
  Not all of the 20 something engineers are idiots. I'm 27, and I'd much rather use a SQL Database for most things. Also, I know what I'm doing.
9. Re:The decision the simple by Sarten-X · 2012-05-16 03:01 · Score: 1
  
  people using RDBMS solutions are "stuck in old ways". It seems like they are saying that NoSQL is effectively always best.
  No, no, no, no, no, no, no, no, no, and hell no.
  I'm referring somewhat-sarcastically to the RDBMS proponents who reject NoSQL out of hand. The ones who see "database" and think it must have a rigid structure, where all connections are made with JOINs. The ones who don't accept that NoSQL databases are inherently different and must be designed differently. If a programmer is actually stuck thinking in terms of an RDBMS, they should not be working in a NoSQL database. If the programmer is flexible enough to do both, good for them! They're hired!
  It's all about having the right tool for the job, and hiring someone who knows how to use the tool in question. The process for soldering pipes together is very different from soldering electrical components, but that does not prevent someone from being able to do both.
  For the record, a major part of my NoSQL experience was building a web application for medical research. We pulled data into a Hadoop/HBase store (for speed and cheap parallelism), ran processing on it, and put results into MySQL (for rigid structure and maturity). The web app ran entirely off the MySQL instance. It was the right tool for the job.
  
  --
  You do not have a moral or legal right to do absolutely anything you want.
10. Re:The decision the simple by TheSpoom · 2012-05-16 03:21 · Score: 1
  
  Fair enough, sorry if I misunderstood.
  
  --
  It's better to vote for what you want and not get it than to vote for what you don't want and get it.
  - E. Debs
11. Re:The decision the simple by gstoddart · 2012-05-16 03:33 · Score: 1
  
  The ones who see "database" and think it must have a rigid structure, where all connections are made with JOINs
  So, I'm actually curious about this part.
  I've worked in RDB's, and I've worked in things that are more based on Berkeley DB ... but I am actually having a hard time thinking of specific examples of where I'd want something database-ish and not have the need for JOINs.
  Berkeley gives you key value pairs, but the product I worked on which was based on it allowed us to do searching on multiple of those, which was kind of join-ish.
  I just can't think of a specific case in which I have data structured enough to be in a database, but unstructured enough to never have joins, schemas, or the rest of the RDB trappings. But it may just be that I've not encountered any yet.
  
  --
  Lost at C:>. Found at C.
12. Re:The decision the simple by siride · 2012-05-16 03:42 · Score: 1
  
  MySQL has a version 5 now, you should check it out.
13. Re:The decision the simple by XSpud · 2012-05-16 03:56 · Score: 1
  
  Of course, the opposite is true ... I remember someone who insisted in writing ER diagrams to describe our system, despite it not being an RDB, and not being accurately described by ER diagrams -- but to him everything was an ER diagram.
  I can't say whether entity relationship diagrams were appropriate in the situation you describe but there is nothing wrong in principle in using ER diagrams to describe non-RDB systems. ER diagrams describe the logical or semantic model, not the physical implementation, and are therefore DB agnostic. Yes, they are often used to help design an RDB schema but their real value is to understand your data at the semantic level.
  Unfortunately, many don't grasp this distinction and you'll see many RDB systems where the ER description is used directly to create a normalised DB schema. Except for simple and small databases these will usually perform poorly. And you'll also see both RDB and non-RDB systems where developers have not appropriately considered the semantic view of the data in the system. Except for simple and small databases these will usually become difficult to maintain over time.
14. Re:The decision the simple by Eponymous+Hero · 2012-05-16 04:16 · Score: 1
  
  except it doesn't look like they do.. from their blog:
  
  We were excited to try a NoSQL db, having spent too many years using MySQL in ways that the designers of relational databases never imagined. CouchDB seemed well suited to our needs.
  wow. and i thought i had some mean hubris.
  
  --
  insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
15. Re:The decision the simple by h4rr4r · 2012-05-16 04:19 · Score: 1
  
  Does it have transactions yet? In the default engine, not by going to someone else and getting InnoDB.
  Does it have booleans yet?
16. Re:The decision the simple by Anonymous Coward · 2012-05-16 04:32 · Score: 1
  
  It really gets my goat when people try to pretend like InnoDB is effectively not part of MySQL. Nobody uses MyISAM unless it's a toy project. Every project I've been on that involves MySQL uses InnoDB (or XtraDB if they are using Percona). MyISAM is intentionally simple. Look at the name! It's not going to be developed further because people who need regular relational concepts are using InnoDB. So let's let this dead horse die.
  Does MySQL need booleans? Even SQL Server and Oracle don't have explicit booleans. In fact, SQL Server is mean enough that you can't do things like "WHERE TRUE AND (...)" that you can do in MySQL. Nevertheless, it's not a big deal. In MySQL you can use TINYINT(1) which maps to a data type alias called BOOLEAN. In SQL Server you use a bit field. Whatever Postgres has is not worth it because, I mean, it's pretty damn small issue.
17. Re:The decision the simple by siride · 2012-05-16 04:34 · Score: 1
  
  The anonymous coward comment was by me. I forgot to log in on this account.
18. Re:The decision the simple by Alex+Zepeda · 2012-05-16 04:56 · Score: 1
  
  If you're dealing with the latest Big Data paradigms and designs, where you can sacrifice some of the rigidity of a RDBMS to gain some flexibility and cheaper scalability, use a NoSQL database, and hire people who aren't stuck in their old RDBMS ways.
  Well, no. If you're dealing with "big data", you still need to evaluate which tool is appropriate for the task. If you're calling it "NoSQL", you're probably referring to a rather immature set of products designed to pander to people looking for teh new hotness. If you're looking for a key-value store, mature solutions like Berkeley DB have been around for ages.
  It's not that key-value stores don't have their place, it's that people running around chanting the NoSQL mantra are really just reinventing the wheel. Poorly.
  
  --
  The revolution will be mocked
19. Re:The decision the simple by Grishnakh · 2012-05-16 05:14 · Score: 2
  
  Also, I know what I'm doing.
  This line really doesn't count for anything. How many people are really going to say "I don't know what I'm doing", or "I'm incompetent"? Everyone thinks they know what they're doing.
  You may or may not really know what you're doing, we have little way to know for sure, but you saying it about yourself is meaningless.
20. Re:The decision the simple by tixxit · 2012-05-16 05:35 · Score: 1
  
  At the end of the article, they say they don't do any joins in their MySQL DB (not joking). So, I'm guessing they still match the "should probably be using a NoSQL DB" but aren't camp. I'm not sure what to make of this.
21. Re:The decision the simple by gstoddart · 2012-05-16 05:41 · Score: 1
  
  I can't say whether entity relationship diagrams were appropriate in the situation you describe but there is nothing wrong in principle in using ER diagrams to describe non-RDB systems. ER diagrams describe the logical or semantic model, not the physical implementation, and are therefore DB agnostic.
  LOL, let me paint a more vivid picture ... what you say is true for the most part, but in this example it was fairly egregious.
  Imagine you're a new middle manager at a tech company. Imagine you come from a world where everything is an RDB. Now imagine that based on a few meetings, you've taken a system you know nothing about, and designed your own ER diagram and insisted everybody use it in their documentation.
  Now imagine that the resulting ER diagram had nothing to do with the actual system, incorrectly described the semantics of the whole thing, and was written in a piece of software which only that middle manager had a license to (his own personal license) and therefore nobody else in the company could access -- other than the exported image he provided us. Oh, and you've taken everything to fourth normal form making it apply even less to the system in question. It's a pretty, but meaningless, abstraction.
  In this case, the manager was completely wrong, and there were aspects of the behavior which couldn't be modeled using his ER diagrams -- they just didn't line up conceptually, and worked differently.
  This was a highly specialized problem domain, wasn't implemented using an RDB in any way, and had a couple of things which didn't map to the vocabulary available to him -- or at least it didn't map to what the tool provided.
  The guy was such a prat it took about two weeks for the technical people to start completely ignoring everything he said. It took about 2-3 weeks after that before the rest of the managers also started ignoring him. Not long after that, he wasn't around.
  In this context, in this system, I believe the ER diagram was incapable of actually expressing some of the relationships and/or semantics of the system (this was now well over a decade ago so I don't remember the particulars) and it was being championed by someone who was an idiot -- it may not be the ER diagram itself. But when we tried to explain how his model was completely inaccurate, he didn't seem to be able to grasp the reasoning and kept sputtering about how it must be correct because there was no way we could have implemented it without it being just like his diagram.
  So, you're right -- maybe in this context someone else could have been able to map the actual stuff to the model. But I do remember a couple of 20+ year geeks with big grey beards trying to explain why this was never going to be suitable in this case (to no avail).
  
  --
  Lost at C:>. Found at C.
22. Re:The decision the simple by h4rr4r · 2012-05-16 05:59 · Score: 1
  
  InnoDB is not part of MySQL. Deal with it or beg oracle to buy it.
  Does DROP DATABASE work yet on tables with foriegn key contraints or do you still have to do it 10 times?
23. Re:The decision the simple by Shimbo · 2012-05-16 06:16 · Score: 1
  
  In the default engine, not by going to someone else and getting InnoDB.
  InnoDB is the default engine.
24. Re:The decision the simple by h4rr4r · 2012-05-16 06:37 · Score: 1
  
  Embarrassed by performance? not at all. My db performs very well at not losing data.
  Why would I vacuum a table, auto-vaccum does that all for me. Guess you have not kept up in the last decade either.
25. Re:The decision the simple by tibman · 2012-05-16 07:18 · Score: 1
  
  The default engine, MyISAM, is by design a non-transactional engine. BY DESIGN. If you want transactions you have to use InnoDB. It has always supported booleans, that i've seen. bool/boolean is the same as tinyint(1). 0==false, anything non-zero is true.
  https://dev.mysql.com/doc/refman/5.5/en/other-vendor-data-types.html
  https://dev.mysql.com/doc/refman/5.0/en/storage-engines.html
  
  --
  http://soylentnews.org/~tibman
26. Re:The decision the simple by turbidostato · 2012-05-16 07:18 · Score: 1
  
  "I'm referring somewhat-sarcastically to the RDBMS proponents who reject NoSQL out of hand."
  Not out of hand but out of pragmatism.
  I'm old enough to have delevoped B-trees over filesystems for "plain database" access (well, more like ordered filesystem) so I basically understand (because I already did it) when and why fast access to loosly coupled datasets is advantageous and I know my way on relational databases while I know I'm not an expert on them too.
  Given that background I asure you that just systematically answering "no, you are wrong" to whoever comes to you talking about how nosql is a perfect fit for their project is the sounded decision. If and when one out of one hundred happens not to be the case, the one talking about nosql will know better arguments and, even if he doesn't, not so much will be lost than giving a chance to the other ninety nine.
27. Re:The decision the simple by petit_robert · 2012-05-16 09:23 · Score: 1
  
  They are, there is just disagreement over them:
  https://en.wikipedia.org/wiki/Boolean_data_type
  Postgresql has a boolean data type
28. Re:The decision the simple by petit_robert · 2012-05-16 09:31 · Score: 1
  
  "Why are you using booleans in your database? "
  I forgot to reply to this part : booleans are useful when you only ever want 2 (or perhaps 2+NULL) values allowed for the column. It's about consistency.
29. Re:The decision the simple by TheLink · 2012-05-16 10:33 · Score: 1
  
  You may or may not really know what you're doing, we have little way to know for sure, but you saying it about yourself is meaningless.
  Actually it is not meaningless. It's a bit like raising the bet in poker ;). But on Slashdot it doesn't cost much to challenge that.
  In absence of a "challenge" it can save some time and space.
  --
  
  Too many replies beneath your current threshold
30. Re:The decision the simple by julesh · 2012-05-16 23:10 · Score: 1
  
  I just can't think of a specific case in which I have data structured enough to be in a database, but unstructured enough to never have joins, schemas, or the rest of the RDB trappings. But it may just be that I've not encountered any yet.
  Consider graph structured data, where you access it by following labelled edges in a graph, and where nodes have attributes. Different node types have different attribute sets, but edges with a particular label may point to nodes with varying types in different circumstances. This is a rather loose structure, but structured enough that a database store for it is a good idea. It is also rather hard to model with relations and joins (it *can* be done, and there are software packages out there that attempt to do it automatically, but performance is generally considered suboptimal). If your data matches this description, a NoSQL db is possibly better for you than trying to force your data to conform to a more relational model.
  The downside of using this kind of database structure, though, is that operations that work on your data in bulk become much harder. If an edge with a particular label can't be relied on to point to particular type of node, querying based on its values may fail. Because you can't load the data in bulk and perform a join, operations on large subsets of your database are slower. If you need to perform bulk operations, SQL is much better than this kind of database.
  It's all tradeoffs.
  (By the way, in case you're struggling to think of data that fits the model I described above, consider a typical object-oriented program. Fields in your objects become attributes at nodes, pointers between objects edges in the graph. Some pointers may be polymorphic, i.e. you may have multiple subclasses of the type of pointer.)
31. Re:The decision the simple by Sarten-X · 2012-05-17 02:05 · Score: 1
  
  How much flux? How much heat? How much solder? What tool do you use to apply the heat?
  
  --
  You do not have a moral or legal right to do absolutely anything you want.
32. Re:The decision the simple by Sarten-X · 2012-05-17 02:29 · Score: 1
  
  This is a perfect example of the attitude I'm referring to. You dismiss NoSQL out of hand as being immature, then propose a solution that doesn't actually provide the same functionality. Yes, on the surface most NoSQL solutions are key/value stores, and Berkeley DB is a key/value store, but the similarities end there. BDB is not clustered, not networked, and not designed for distributed computation. Yes, with great effort you can use the replication to emulate a cluster... but then you're again dealing with an immature codebase managing your data.
  Berkeley DB is great for what it does, but it doesn't do the same job as those immature, pandering, mantra-inspiring NoSQL solutions.
  While I'll agree that "NoSQL" is a term that should die, the technologies it refers to shouldn't. I'd prefer a better name like "flexible-structured clustered eventual-consistency data stores", but that makes conversation difficult.
  
  --
  You do not have a moral or legal right to do absolutely anything you want.
33. Re:The decision the simple by Sarten-X · 2012-05-17 02:32 · Score: 1
  
  At the end of the article, they say they don't do any joins in their MySQL DB (not joking). So, I'm guessing they still match the "should probably be using a NoSQL DB" but aren't camp. I'm not sure what to make of this.
  Without knowing much about their application or architecture, I'd assume that they're gathering a lot of data that doesn't have many relationships to other data. There's lots of models that fall easily into a list structure, which can of course be put in a single table without needing joins.
  
  --
  You do not have a moral or legal right to do absolutely anything you want.
34. Re:The decision the simple by lewiscr · 2012-05-17 06:03 · Score: 1
  
  InnoDB is the default engine, as of version 5.5. Oracle fixed enough performance problems that InnoDB is faster than MyISAM.
35. Re:The decision the simple by tibman · 2012-05-17 08:05 · Score: 1
  
  Neat! Thanks for the info
  
  --
  http://soylentnews.org/~tibman
36. Re:The decision the simple by Alex+Zepeda · 2012-05-19 10:45 · Score: 1
  
  I dismiss these NoSQL tools as being immature because they are immature. I'm in the process of migrating away from a NoSQL solution named like a piece of fruit. I was able to crash it by doing something WILD AND CRAZY. I created an index on a live system. Known bug. Not documented in the wiki. That was fun.
  If these NoSQL solutions delivered on what they promised, or came close, that'd be one thing. But, at least with the DB that sounds like a fruit, the promise of all this distributed computation was for naught. MapReduce is the only way to distribute calculations across multiple nodes. However, MapReduce was not only the slowest way of munging data, but it still has/had major concurrency issues because it relies on process level locking. Not to be deterred I took a look at people who had been trying to use this distributed goodness (a.k.a. sharding). I was frightened away by the numerous complaints about this fruit-like database's inability to keep the partitions balanced, resulting in massive contention for the process lock on one node. That's an immature code base.
  Or you could read the TFA. CouchDB is simply not mature enough for any semblance of high availability. Most of the downtime was credited to CouchDB being unstable. But, more importantly, as I've already outlined in another post on this story, distributed computation with NoSQL stuff ain't all it's cracked up to be. The reason it's being pushed so hard by the NoSQL camp is because that's the only way these products can scale in the first place. The reason you don't see stuff like that in more mature products is because fancy concepts like MVCC or finer grained locking mean that you typically don't need to distribute calculations.
  Anyhow. At this point the only way we've been able to keep any semblance of performance (and we've got a tiny data set) has been to pull as much logic out of the NoSQL parts as possible. IOW it's a dumb key-value store... and if all you need is a key-value store, BDB's got you covered (and it's already way more mature).
  
  --
  The revolution will be mocked
Why not PostgreSQL? by JamesA · 2012-05-16 01:48 · Score: 5, Interesting

PostgreSQL 9.2 beta improves scalability, adds JSON
http://www.h-online.com/open/news/item/PostgreSQL-9-2-beta-improves-scalability-adds-JSON-1573815.html
1. Re:Why not PostgreSQL? by squiggleslash · 2012-05-16 02:18 · Score: 5, Funny
  
  Because it's an urban myth.
  The reality is there are only two SQL databases in the entire universe: MySQL and Oracle. You might have been told others exist, hell, you might even have worked on something called "SQL Server" in your .NET shop, but in reality: they don't. They're all figments on your imagination. Your imagination is SO determined to find better, more robust, faster, powerful, alternatives to MySQL and Oracle that an entire fantasy world comprised of "a successor to Ingres that makes MySQL look like a piece of crap" and "A Microsoft product that doesn't feel like a thirty year old mainframe product hacked onto a modern platform" develops in your head.
  C'mon, if these mythical products actually existed, sites like Slashdot wouldn't ignore them, right? Right?
  
  --
  You are not alone. This is not normal. None of this is normal.
2. Re:Why not PostgreSQL? by aclarke · 2012-05-16 02:40 · Score: 1
  
  I'm not generally a Microsoft fan, but I love SQL Server. However, I haven't started a new project with it in years, I guess since pricing for SQL Server 2008 was announced. I've not been in a situation where I could justify the costs as the project (hopefully) was successful and scaled up. I also don't like being forced to run my database server on Windows. For these reasons, I just don't use it any more except in projects where it was selected years ago. I know you have to look at TCO, but I still can't generally justify the costs for SQL Server and Windows Server.
  
  If I was working in a corporate environment with big budgets to throw at projects, then I'd probably still be using it SQL Server at times. In my world though, it generally just doesn't add up.
  
  --
  
  www.clarke.ca
3. Re:Why not PostgreSQL? by sourcerror · 2012-05-16 02:44 · Score: 1
  
  MySql-s MyIsam is much faster with reads than PostgreSql. I think for the things people use NoSql, MyIsam is perfect. And when you want to move better ACID support, you can effortlessly switch to InnoDB.
4. Re:Why not PostgreSQL? by vadim_t · 2012-05-16 02:49 · Score: 2
  
  That's all fine until you need to actually write to that table. With myISAM any write needs a table lock, and that makes performance drop like a rock.
5. Re:Why not PostgreSQL? by sourcerror · 2012-05-16 02:53 · Score: 1
  
  And how is it worse than NoSQL?
  Anyway, that's when you move to InnoDB.
6. Re:Why not PostgreSQL? by Lisias · 2012-05-16 03:22 · Score: 1
  
  Not really.
  MySQL is good enough for 80% of the currently web sites. Damn it, even PHP is good enough for a small site.
  As someone below states, if you need to be serious about ACID, you can switch to InnoDB.
  Are you pissed of with Oracle? Go for MariaDB.
  I like PostGRES, and acknowledge its technical superiority on every single aspect of MySQL.
  But I'm using MariaDB on my site: I (still) don't need the PostGRES superiority, and MariaDB is easier to maintain (not to mention its smaller memory footprint!).
  
  --
  Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
7. Re:Why not PostgreSQL? by Nadaka · 2012-05-16 04:19 · Score: 1
  
  I use mysql because it is the best database supported by my cheap ass webhost (access? hells no) for free (mssql for $20 a month? no thanks).
8. Re:Why not PostgreSQL? by Alex+Zepeda · 2012-05-16 05:08 · Score: 1
  
  And with InnoDB you still get table locks every time you update an autoincrement field. Performance then drops like a fucking rock. Anyway, that's when you move to Postgres.
  
  --
  The revolution will be mocked
9. Re:Why not PostgreSQL? by kriston · 2012-05-16 07:14 · Score: 1
  
  This post serves as the obligatory reminder that Microsoft SQL Server is a fork of Sybase 8.
  
  --
  Kriston
10. Re:Why not PostgreSQL? by sourcerror · 2012-05-16 10:22 · Score: 1
  
  MySQL uses table-level locking for MyISAM, MEMORY and MERGE tables, page-level locking for BDB tables, and row-level locking for InnoDB tables.
  
  http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
11. Re:Why not PostgreSQL? by TheLink · 2012-05-16 11:07 · Score: 1
  
  I've used both PostgreSQL and SQL Server, and I prefer PostgreSQL. What do you really like about SQL server?
  
  I have to use SQL server at work and I personally find transact sql really ugly. There are also other annoyances - but I think I've repressed them for now - I could probably dig them up ;).
  
  Overall the engine and backend is not too bad. But it was quite an eye-opener to find out deadlocking can be very common (unless you switch to MVCC ala "snapshot isolation" - which somehow was slower for our app). And also there was a weirdness where a query could take 1 millisecond or 100+ depending on the value of the parameter you used the first time you ran it after a reindex! e.g. reindex, run the query with param=A, run same query with param B. second query takes 1 millisecond. Reindex, run query with param=C, run same query with param B, second query takes more than 100 milliseconds... Same query for all parameters.
  --
  
  Too many replies beneath your current threshold
12. Re:Why not PostgreSQL? by aclarke · 2012-05-16 11:37 · Score: 1
  
  Well, off the top of my head, I really like SQL Server Management Studio. It's the best database front end I've used, by a wide margin.
  
  I also like SQL Server's profiling and optimization tools. The query plan is nice, the tracing tools are great, etc. The depth of options I have to performance tune SQL Server over anything I've seen in MySQL is significant. It might all be there in MySQL too, but it's easier to find and use in SQL Server.
  
  I spend much more of my time these days with MySQL than SQL Server. I can work with MySQL, but I really ENJOY working in SQL Server. It's the only Microsoft tool I actually actively like. I haven't used PostgreSQL enough to really form an opinion about it.
  
  --
  
  www.clarke.ca
13. Re:Why not PostgreSQL? by Alex+Zepeda · 2012-05-17 01:32 · Score: 1
  
  Looks a whole lot like table level locking to me.
  
  --
  The revolution will be mocked
14. Re:Why not PostgreSQL? by TheLink · 2012-05-17 02:59 · Score: 1
  
  Well MySQL is crap. No surprise SQL Server looks great compared to MySQL.
  It's a got a nice feature set on paper which impresses the bosses, but you can't use all of them at the same time. You want X that means you need backend P, which means you can't have Y which needs backend Q. You can run both backends, but then stuff using the backend that doesn't support transactions isn't rolled back...
  And it's not reassuring when they can allow bugs like these out the door:
  http://bugs.mysql.com/bug.php?id=31001
  http://bugs.mysql.com/bug.php?id=32202
  Note the similarity between #31001 and #32202 AND that the #31001 bug reappeared a year later!
  MySQL is the PHP of databases. Both products seem to be produced by developers who are working on something that is beyond their expertise.
  But yeah SQL Management Studio is nice. FWIW psql is decent for a cli, though of course it's not as sophisticated.
  --
  
  Too many replies beneath your current threshold
15. Re:Why not PostgreSQL? by sourcerror · 2012-05-17 06:09 · Score: 1
  
  You see it wrong. They're only locking on a single variable, the counter. It doesn't affect all writes that happen in the table (e.g. updates that don't change the ID field).
16. Re:Why not PostgreSQL? by Alex+Zepeda · 2012-05-17 15:12 · Score: 1
  
  Right. Inserts still lock at the table level, however. Your move.
  
  --
  The revolution will be mocked
17. Re:Why not PostgreSQL? by sourcerror · 2012-05-18 23:26 · Score: 1
  
  Your move to prove that Postgres doesn't block on inserts when using sequences (auto increment).
18. Re:Why not PostgreSQL? by Alex+Zepeda · 2012-05-19 07:40 · Score: 1
  
  RTFM.
  If you need to narrow down your search, look for the phrase "To avoid blocking concurrent transactions that obtain numbers from the same sequence".
  
  --
  The revolution will be mocked
Nosql in Postgres by rla3rd · 2012-05-16 01:48 · Score: 4, Interesting

You can get json support using the PLV8 extension http://code.google.com/p/plv8js/wiki/PLV8

or altenatively you can use the hstore data type.
The Uncle Larry feature by symbolset · 2012-05-16 01:49 · Score: 1

This makes me want to avoid your suggestion. If you presently don't have the Larry problem it's best avoided.

--
Help stamp out iliturcy.
Re:Has to be said by Anonymous Coward · 2012-05-16 01:49 · Score: 5, Funny

MongoDB is Webscale. MySQL is not Webscale, because it uses joins. SQL also has impetus mismatch.
programmers don't know how to store data by vlm · 2012-05-16 01:54 · Score: 2

But the real story may be that programmers are never satisfied with the tool they have.
Ah typo

But the real story may be that programmers don't know how to store data
They many not know because no one knows the business needs, but more often because they have no idea what they're doing WRT to data storage.
IT training tends to cover data manipulation pretty well "how to add two numbers'
IT training gets shakey on data structures "So, in junior level class we will talk about data structures, which is too bad because you've already developed at least two years of bad habits first"
IT training tends to pretty much skip data storage "In a senior level class, you might talk about scalability, maybe in an optional class. Or maybe you'll take a semester of cobol instead"

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
1. Re:programmers don't know how to store data by Zocalo · 2012-05-16 02:04 · Score: 2
  
  But the real story may be that programmers are never satisfied with the tool they have.
  Ah typo
  Possibly, but given how quick many programmers are to get into a fruitless pissing match over their favourite language it's quite apropos, no?
  
  --
  UNIX? They're not even circumcised! Savages!
2. Re:programmers don't know how to store data by vlm · 2012-05-16 03:48 · Score: 1
  
  1) I know my business needs.
  In some industries you can pretty well predict the future. In others.. no.
  One app I built years ago would have literally required geographic changes to expand. Then "surprise" it gets rolled out to 5 additional bigger cities. Well, that was unexpected... I had a O(n**2) algorithm in there that did pretty well for values of N around 7 where N can never increase beyond 7, but not so good for values of N around 57. whoops.
  
  --
  "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Is a DB even needed sometimes? by Viol8 · 2012-05-16 02:00 · Score: 1, Interesting

It seems to be a knee jerk reaction amongst a lot of developers and designers that as soon as your app starts requiring persistent data beyond ini values a database is needed. Why? For large but simply structured data something like json or XML or even a flat csv file is perfectly adequate. Performance can be an issue during searches but if for example you have a fixed record size with key sorted data then finding a given key is simple (binary chop or similar).
It seems to me that reaching for a DB is the easy way out taken by a lot of oders and they end up paying for it in maintanability, bugs and support.
1. Re:Is a DB even needed sometimes? by neorush · 2012-05-16 02:14 · Score: 1
  
  This logic works just fine until the customer goes, oh! lets add our customer's information, transactions, etc. to that completely static tool that we never planned to need a real database for. Even very simple tools and services we at least use SQLite, with PDO connections it's (mostly) painless to move the system up a level or two in relational database capability. I've found that schemaless databases have there place, but the jump on the schemaless data band wagon has been premature for many software projects. That said, look at draw something's expansion, zero downtime as they increase capacity because they used a scalable schemaless database.
  
  --
  neorush
2. Re:Is a DB even needed sometimes? by TheSpoom · 2012-05-16 02:19 · Score: 1
  
  Starting with a database avoids the pain of migrating flat files to a database later when the database is needed (and if your app gets at all popular, it will be).
  Sure, if you're only ever expecting 10k rows of data with very little concurrent access, go nuts with your flat files.
  
  --
  It's better to vote for what you want and not get it than to vote for what you don't want and get it.
  - E. Debs
3. Re:Is a DB even needed sometimes? by rtaylor · 2012-05-16 02:23 · Score: 4, Informative
  
  A CSV or XML or JSON file is a db (a DB is just structured data).
  Are relational DBs always required? Certainly not.
  The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption.
  What isn't obvious is that second application is often aggregate reporting for management. "How many customers are using $foo and where do they live geographically". With a relational DB, I might knock that query out in a few minutes across millions of customers.
  With a flat XML file per customer spread across a number of servers, this could take days to assemble, particularly if $foo is nested deep in the structure.
  Having spent far too much time writing one-off scripts to gather customer data because the middleware didn't support that type of query, I've actually gone the other way and started shoving some business logic into the DB.
  Functions such as isCustomerPaymentOverdue are now in the relational DB with a very thin model in the middleware to allow for much easier and faster reporting.
  
  --
  Rod Taylor
4. Re:Is a DB even needed sometimes? by Anonymous Coward · 2012-05-16 02:37 · Score: 1
  
  ...For large but simply structured data something like json or XML...
  You wouldn't happen to be the developer whose code I inherited, would you?
  Because if you are, I'd just like you to know that I hate you with ever fibre of my being.
5. Re:Is a DB even needed sometimes? by serviscope_minor · 2012-05-16 03:12 · Score: 4, Insightful
  
  The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption.
  Not just that, but good use of relations and normalization makes whole classes of bug impossible.
  
  --
  SJW n. One who posts facts.
6. Re:Is a DB even needed sometimes? by Lisias · 2012-05-16 03:43 · Score: 1
  
  Not just that, but good use of relations and normalization makes whole classes of bug impossible.
  That's precisely the motive the current cast of "developers" avoid it like the Devil.
  They NEED bugs in order to justify the overdue payments and overpaid weekend death marches.
  Software Development *must* be a arcane practice, not a scientific knowledge - or they will be measured under rational arguments, and ending up loosing their jobs.
  This kids think they are artists, and behavior as they are.
  
  --
  Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
7. Re:Is a DB even needed sometimes? by Ksevio · 2012-05-16 05:42 · Score: 1
  
  In the web world it helps to have a database managing concurrent reads/writes and access. Set up your file IO wrong for your csv file and a search bot could accidentally wipe your data.
8. Re:Is a DB even needed sometimes? by shutdown+-p+now · 2012-05-16 12:11 · Score: 1
  
  Even with a single application, having ACID, and especially transactions, can be very handy. Of course, a relational DB is not the only way to get that, but it's often the one that's most readily available, and also the only one that many programmers even know.
9. Re:Is a DB even needed sometimes? by shutdown+-p+now · 2012-05-16 12:20 · Score: 1
  
  When people use XML, JSON or CSV files, they usually just read the whole thing into memory and run queries against that, without any need to worry about concurrency etc. And doing that kind of thing is trivial in any decent high-level framework - e.g. in .NET you'd typically just write three lines of code to use XmlSerializer to deserialize the file as a collection of objects, and then use LINQ to Objects to query that.
  Of course, it is not a replacement for DB in general, but that's precisely the point - a lot of use cases don't need a DB - they don't need concurrency etc, and they're dealing with at most a thousand records, not millions.
10. Re:Is a DB even needed sometimes? by julesh · 2012-05-16 23:34 · Score: 1
  
  Of course, it is not a replacement for DB in general, but that's precisely the point - a lot of use cases don't need a DB - they don't need concurrency etc, and they're dealing with at most a thousand records, not millions.
  With 64-bit systems with 16GB of memory now retailing for under £500, even millions of records can now be handled in memory with ease.
11. Re:Is a DB even needed sometimes? by claytongulick · 2012-05-17 05:03 · Score: 1
  
  I agree with this completely, I just implemented a project that uses an XML data store.
  It's an interesting architecture, each instance gets a 20ish meg XML file, loaded into memory. This is then queried with e4x for the UI. It's blazing fast, and doesn't need any round trips to a database server, and it's quick to develop.
  Of course, the only reason I can get away with this is because it's read-only, so technically it isn't a data store, but a data transmission methodology. Still, in most traditional apps I've seen, this would have had an n-tier kind of thing, where the UI queries middleware, which talks to a DB. Using a big ole XML doc as a read-only data source has been pretty cool, and worked great. It's also massively decreased points of failure and opportunities for bugs.
  I went back and forth quite a bit on XML v/s JSON for this, but I ended up going with XML because e4x is pretty nice for querying data. I fully expect to redo all this in a few years if/when I migrate to HTML5 for the UI, because other than Mozilla no browser vendor will support e4x.
  
  --
  Drinking habits can be dangerous. You can choke on the cloth and the nuns will wonder where their clothes are.
Wikipedia and Slashdot use MySQL by tepples · 2012-05-16 02:05 · Score: 3, Insightful

Anonymous Coward wrote:

MySQL is not Webscale, because it uses joins.
Then how does a non-webscale database power popular web sites such as Wikipedia and Slashdot? If you don't do joins in the database, you'll probably end up doing the equivalent of joins (using one value as the key in another table) in your application.
1. Re:Wikipedia and Slashdot use MySQL by SuiteSisterMary · 2012-05-16 02:07 · Score: 5, Informative
  
  MongoDB is webscale. It has sharding. That is the secret webscale sauce.
  
  --
  Vintage computer games and RPG books available. Email me if you're interested.
2. Re:Wikipedia and Slashdot use MySQL by emorning · 2012-05-16 02:19 · Score: 1
  
  like he said, you end up doing joins in your application
3. Re:Wikipedia and Slashdot use MySQL by Hognoxious · 2012-05-16 02:33 · Score: 4, Funny
  
  MongoDB can write its data to /dev/nul/ for extra performance.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
4. Re:Wikipedia and Slashdot use MySQL by Anonymous Coward · 2012-05-16 03:36 · Score: 3, Funny
  
  If /dev/null is webscale then I will use it.
5. Re:Wikipedia and Slashdot use MySQL by cc1984_ · 2012-05-16 07:58 · Score: 1
  
  MongoDB can write its data to /dev/nul/ for extra performance.
  Just so you know, so can MySQL
  http://dev.mysql.com/doc/refman/4.1/en/blackhole-storage-engine.html
How is XML indexed? by tepples · 2012-05-16 02:09 · Score: 1

do they have an XML field type? MS SQL Server does [...] which allows you to essentially keep the table schema-less but still allows you to perform complex queries on the contained data.
But how does it index the data in the XML or JSON fields? How does it, say, tell an element containing a number from an element containing text? Does it act like SQLite, which is dynamically typed (and thus can store text in any field) but can be told to prefer to compare and index certain columns as numbers, dates, text with Unicode collation, or binary data?
1. Re:How is XML indexed? by serviscope_minor · 2012-05-16 03:10 · Score: 1
  
  Does it act like SQLite, which is dynamically typed
  That's one of the annoying thing to me about SQLite, and the justification is a marvellous case of doucle standards. It goes something like this:
  Static typing is bad because it limits flexibility and you can check elsewhere if you need to. We will never add it ever because we are right about this.
  Then, in release foo.bar of SQLite
  We've added foreign keys!
  Either constraints are good or they are bad. Typing is just another constraint, like foreign keys, and various other domain constraints. I cannot see any valid argument for having foreign keys but not type constraints. It jest seems bizarre to me. It's not like they could be optional or anything.
  
  --
  SJW n. One who posts facts.
2. Re:How is XML indexed? by SQLGuru · 2012-05-16 04:01 · Score: 1
  
  SQL Server really does have good online documentation. Just search for "SQL Server [feature]" and pick the MSDN link. In this case, I searched for XML INDEX. http://msdn.microsoft.com/en-us/library/bb934097.aspx
  Oracle supports XML Types with XPath style queries. I don't remember any XML Indexes, but you can always use a function based index against the XPath.
3. Re:How is XML indexed? by Christian+Smith · 2012-05-16 04:57 · Score: 1
  
  Does it act like SQLite, which is dynamically typed
  That's one of the annoying thing to me about SQLite,
  Either constraints are good or they are bad. Typing is just another constraint, like foreign keys, and various other domain constraints. I cannot see any valid argument for having foreign keys but not type constraints. It jest seems bizarre to me. It's not like they could be optional or anything.
  Foreign keys are good, but type independent. You just want to check that some foreign key actually references valid data in another table. If they match, they match.
  However, typing in SQL is almost certainly a case of implementation details leaking through the abstraction layer. The data type is defined by the data and how it is used. Traditional SQL databases require that type data up front so they can organise the data on disk. But you shouldn't care how data is organised on disk.
  When comparing dates, the data type is implicitly some form of timestamp. Whether that is stored on disk as a string with a "YYYY-MM-DD HH:SS" format, or as a UTC integer offset from 1970-01-01 00:00, it doesn't matter. Using the SQLite way, the two can be transparently stored and compared. There are probably better examples, like not having to declare varchar sizes versus text versus CLOB versus BLOB. They're all stored and indexed and searchable in the same way (not that you'd want to index any sort of LOB.)
  And if you don't like it, just declare the types of the columns up front, and SQLite will use type affinity to convert the data for storage.
  Personally, I like the optional typing in SQLite.
4. Re:How is XML indexed? by serviscope_minor · 2012-05-16 06:15 · Score: 1
  
  Foreign keys are good, but type independent.
  I know. I was just questioning the bizarre logic that leads the author to declare that some constraints are good and should be implemented, but others aren't.
  He's even implemented domain constraints, but for some reason the type of the data is missing from the possible domain constraints. That's really strange. You can limit the range of an integer, but you can't guarantee that it is actually an integer! That makes no sense whatsoever!
  However, typing in SQL is almost certainly a case of implementation details leaking through the abstraction layer. The data type is defined by the data and how it is used. Traditional SQL databases require that type data up front so they can organise the data on disk. But you shouldn't care how data is organised on disk.
  I don't care how it's organised on disk. What I want is as much type checking up front as possible: the same reason I prefer statically typed languages. Type checking is no more implementation details leaking through than using relations.
  Most of the time, you know up front what your data is going to be and that will never change. For instace, it is probably quite unlikely that you will decide you need to change from a DATE type to a VARCHAR type in the same column.
  If I'm going to have a brainfail and accidently try to insret a string into a date column, I'd rather know when I try to insert it (i.e. where the bug is), than have an error somewhere later where the bug isn't.If I accidently insert a string into a date column, then I'll have had a transaction succeed yet leave the database in an inconsistent state.
  Using the SQLite way, the two can be transparently stored and compared.
  No, not really, or more to the point only if it converts your knows how to convert the integer into a sensible date. And if you've just inseretd a seconds-since-the-epoch time, how can you convert if you don't even know the epoch, which is difference for Windows, SQL Server, UNIX, NTP and various other systems.
  Also what if a user inserts a string? It might parse as a date, or it might not. How do you know if it should parse as big endian, middle endian or little endian?
  What if it doesn't parse? Then at some point you will have a comparison between a date and a string. What does that even mean?
  And if you don't like it, just declare the types of the columns up front, and SQLite will use type affinity to convert the data for storage. Personally, I like the optional typing in SQLite.
  The typing in SQLite isn't optional: you never get to have type constraints at all. You can give it hints, but ultimately you can stuff any type into any column for any database.
  Not having to declare string lengths is more of a case of having good types available (i.e. variable length string type). That's orthogonal to not having type constraints checked.
  I dislike that hole for the same reason I disliked the lack of foreign key constraints until a few of years ago. In the ideal world, foreign key constraints are unnecessary because I wouldn't ever make a mistake. Same applies to type constraints. In both cases, I do make mistakes, and I'd rather have the database tell me early and noisily that I've fouled up, so bad data doesn't get into the database.
  
  --
  SJW n. One who posts facts.
5. Re:How is XML indexed? by ultranova · 2012-05-16 07:05 · Score: 1
  
  When comparing dates, the data type is implicitly some form of timestamp. Whether that is stored on disk as a string with a "YYYY-MM-DD HH:SS" format, or as a UTC integer offset from 1970-01-01 00:00, it doesn't matter. Using the SQLite way, the two can be transparently stored and compared.
  
  But it's not stored as a string or an integer, it's stored as a bit pattern. It's impossible to compare two bit patterns unless you know what they mean (even for equality - for example, the same string could yield two different patterns in two different encodings). In your example, how is the computer supposed to know that you're comparing dates if neither data item is marked as a date, in other words typed?
  
  --
  Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
6. Re:How is XML indexed? by shutdown+-p+now · 2012-05-16 12:08 · Score: 1
  
  But how does it index the data in the XML or JSON fields? How does it, say, tell an element containing a number from an element containing text?
  I have no idea about JSON support in SQL Server, but for XML, it doesn't actually store it simply as plain text. Instead, it internally decomposes it to something that is more or less relational. Similarly, when you do a query over an XML field - which, in MSSQL, involves XQuery expressions - it converts that XQuery into the equivalent relational query over the internal representation.
  With respect to types, obviously, in untyped XML everything is a string. However, it also permits you to specify an XML Schema for the field, in which case individual nodes become strongly typed according to what's in the schema.
7. Re:How is XML indexed? by julesh · 2012-05-16 23:28 · Score: 1
  
  SQLite does use types -- it just doesn't use typed columns. Each entry has a type associated, so it can determine what algorithm to use for the comparison when it pulls the row in from disk.
all db solutions have problems at some point by Torvac · 2012-05-16 02:09 · Score: 1

with big data comes great responsibility.
1. Re:all db solutions have problems at some point by JustOK · 2012-05-16 02:24 · Score: 1
  
  SELECT * and ye shall find.
  
  --
  rewriting history since 2109
Native JSON fields by Xanni · 2012-05-16 02:15 · Score: 2

PostgreSQL 9.2 (now in beta) includes native JSON fields:
http://www.h-online.com/open/news/item/PostgreSQL-9-2-beta-improves-scalability-adds-JSON-1573815.html
It's also available as an extension for the current 9.1 release:
http://people.planetpostgresql.org/andrew/index.php?/archives/255-JSON-for-PG-9.2-...-and-now-for-9.1!.html

--
http://www.glasswings.com/
1. Re:Native JSON fields by DiogoBiazus · 2012-05-16 02:19 · Score: 1
  
  You can also use the hstore contrib module in the current PostgreSQL versions, which has a gem (https://github.com/softa/activerecord-postgres-hstore) for ActiveRecord and Heroku support (at least in some database plans). Seriously, PostgreSQL is the best Not-Quite-So-SQL DBMS out there.
Re:Has to be said by TheSpoom · 2012-05-16 02:16 · Score: 1

I hadn't read the article this was based on before, thanks for the laugh. I encourage others to Google "webscale" :^)

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Then what's it called instead of a join? by tepples · 2012-05-16 02:19 · Score: 5, Insightful

In NoSQL systems such as MongoDB and CouchDB, what do you call the operation where you retrieve one document, pull an identifier out of that document, and use that identifier as the key to retrieve another document?
1. Re:Then what's it called instead of a join? by mj1856 · 2012-05-16 02:30 · Score: 3, Interesting
  
  Not sure about MongoDB or CouchDB, but I have experience with RavenDB, which is absolutely fantastic. Instead of "joins" you have "includes" or "live projections". See http://ravendb.net/docs/client-api/querying/handling-document-relationships
2. Re:Then what's it called instead of a join? by Anonymous Coward · 2012-05-16 02:31 · Score: 5, Funny
  
  Witchcraft.
3. Re:Then what's it called instead of a join? by Eponymous+Coward · 2012-05-16 02:50 · Score: 1
  
  If you have a lot of related data, then you should probably be using a relational db, no?
4. Re:Then what's it called instead of a join? by TheSpoom · 2012-05-16 05:44 · Score: 1
  
  In NoSQL systems such as MongoDB and CouchDB, what do you call the operation where you retrieve one document, pull an identifier out of that document, and use that identifier as the key to retrieve another document?
  A badly architected system that should probably be an RDBMS. If you're using a NoSQL database that way, you don't understand the use case for NoSQL (or RDBMS, for that matter).
  Imagine doing the same thing against MySQL. Now you're making two trips to the database where only one is necessary. See the problem? Multiply that delay by several queries and it adds up.
  
  --
  It's better to vote for what you want and not get it than to vote for what you don't want and get it.
  - E. Debs
5. Re:Then what's it called instead of a join? by bingbing · 2012-05-16 07:27 · Score: 1
  
  hmm, to use a NoSQL to avoid JOIN? A RDBMS can be designed to avoid joins easily with flat tables or text columns. So far, the only values I see from noSQL solutions is to sacrifice ACID for horizontal scalability to fit the user scenarios where ACID is not that critical. Everything else can be covered by RDBMS easily.
6. Re:Then what's it called instead of a join? by TheSpoom · 2012-05-16 09:04 · Score: 1
  
  So far, the only values I see from noSQL solutions is to sacrifice ACID for horizontal scalability to fit the user scenarios where ACID is not that critical.
  Bingo. There are a few of those use cases though (like the example I mentioned earlier, internal logging), especially when a lot of data is being generated / stored.
  
  --
  It's better to vote for what you want and not get it than to vote for what you don't want and get it.
  - E. Debs
7. Re:Then what's it called instead of a join? by shutdown+-p+now · 2012-05-16 12:00 · Score: 1
  
  Imagine doing the same thing against MySQL. Now you're making two trips to the database where only one is necessary.
  Erm, GP is specifically asking about doing a single trip to the database. That's why he mentioned joins!
8. Re:Then what's it called instead of a join? by julesh · 2012-05-16 23:19 · Score: 1
  
  hmm, to use a NoSQL to avoid JOIN? A RDBMS can be designed to avoid joins easily with flat tables or text columns. So far, the only values I see from noSQL solutions is to sacrifice ACID for horizontal scalability to fit the user scenarios where ACID is not that critical. Everything else can be covered by RDBMS easily.
  Relationships from data item A to data item B where the type of item B is not known in advance. RDBMSs suck at handling this.
9. Re:Then what's it called instead of a join? by Hognoxious · 2012-05-17 04:16 · Score: 1
  
  Documents are an outdated paradigm. Aspects are like molecules of data. Aspects do not complicate your stack. Aspects are agile. Aspects are webscale.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
PICK by kibbey · 2012-05-16 02:21 · Score: 2

Hop into the wayback machine and fire up any flavor of PICK. The database where schema is applied on use, not on storage. No length limits on fields and very fast on old hardware (really fast on new). Storing bits of xml and code are no problem. And for those users who simply must have SQL, many versions will support that too (UniData and UniVerse are two examples). It's not cool, not new, but it does work.
Re:Has to be said by emorning · 2012-05-16 02:24 · Score: 1

MongoDB is Webscale. MySQL is not Webscale, because it uses joins.
I see this comment a lot and I don't get it. Somehow Map/Reduce is considered *webscale* and joins are not. But I can implement a join with Map/Reduce so why don't the webscale databases provide join functionalty that is built on top of Map/Reduce and save us all the trouble of reinventing the wheel?
Riak by platypusfriend · 2012-05-16 02:31 · Score: 1

We're using a four node Riak cluster, in a production environment spanning two datacenters, and have zero downtime because we can lose two nodes (for example, a whole datacenter) and still read/write. Maybe CouchDB can do that; maybe it can't. Maybe our setup isn't even the best. But, as many people have already pointed out, there is a time and a place for a NoSQL solution, and you can have "near zero" (or, so far for us, zero) downtime. Here's the catch: We are using Secondary Indexing and MapReduce, in Riak, but for searchable analytics we fire off a stripped dataset to a smaller, less-redundant MySQL store.
Urban Airship by jjohnson · 2012-05-16 02:33 · Score: 3, Interesting

Urban Airship went PostgreSQL to MongoDB to Cassandra to PostgreSQL. http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf
It's a good presentation because they're in love with none of them and are moving for specific reasons each time, handling different issues. It's not coders chasing the new hotness.

--
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
Kneejerk Reaction by mj1856 · 2012-05-16 02:34 · Score: 1

So they love NoSQL but had a bad experience with CouchDB. And the solution is... move away from NoSQL??? Ridiculous. CouchDB is certainly NOT the only solution out there, and they all have their strengths and weaknesses.
I've recently switched from MS SQL Server to RavenDB http://www.ravendb.net/ and I am never going back! Many of the things they found wrong with CouchDB and MongoDB are superior in RavenDB.
1. Re:Kneejerk Reaction by gbjbaanb · 2012-05-16 02:45 · Score: 1
  
  what did they find wrong with MongoDB, apart from rumour.
  I doubt RavenDB is any different really, it looks and feels the exact same (except written in .NET so you have a lot of garbage and resource issues to deal with), same as Cassandra (written in java).
  Maybe erlang isn't as good as they say, or maybe CouchDB isn't as well written as they say.
Not quite true by Viol8 · 2012-05-16 02:39 · Score: 4, Informative

If all your application is ever going to do is read and write to fixed sized record structured data with little relational (or any) attributes then COBOL will suit you fine as that's what it was designed for. Unfortunatly those sorts of apps are few and far between these days, but in its ever decreasing niche COBOL is still good.
1. Re:Not quite true by Bill,+Shooter+of+Bul · 2012-05-17 06:53 · Score: 1
  
  No, COBOL is bad. Your post is prety much exactly what I meant when I said its bad. Its not popular. Its not the next big thing that people want to learn. This is why people don't want anything like it, but people do want things like c. C is still cool, still popular, still gets the chicks.
  
  --
  Well.. maybe. Or Maybe not. But Definitely not sort of.
Normalisation isn't a panacea by Viol8 · 2012-05-16 02:44 · Score: 1

Yes it makes sense up to a point , but it starts to suffer from the law of diminishing returns and at some point having to do complicated multi-table joins actually slows down your queries so much that it becomes simpler and faster to suffer duplicate data than normalise to the Nth degree.
1. Re:Normalisation isn't a panacea by Xest · 2012-05-16 02:49 · Score: 2
  
  It depends on the task though, I'd wager 90% of SQL work that is done by developers day to day isn't in such a performance sensitive environment that it needs to favour performance over normalisation, and I agree with the GP, there's far too many developers out there that just don't do it and hence simply don't have the performance excuse. It really is just bad database design as a result of incompetence most the time.
2. Re:Normalisation isn't a panacea by gorzek · 2012-05-16 02:57 · Score: 2
  
  I can definitely see the value in making an informed tradeoff, but like you said, a lot of the time it's not an informed decision--they just do it to make it work and don't really have the expertise to know which is the right way to go. I've definitely seen enough bad database designs to know that most developers just have no clue how to design them. The worst I've seen had bad designs and poor performance, and were built in a completely ad hoc manner without any eye toward maintainability, performance, or data integrity/consistency. The philosophy was just "make it work."
  I think developers need to realize that databases are a lot like code: first you prototype, then you throw away the prototype and do it right. (Then again, plenty of developers just keep the prototype and use it for production.)
  
  --
  Check out my world simulator thingy.
3. Re:Normalisation isn't a panacea by gorzek · 2012-05-16 03:02 · Score: 4, Insightful
  
  Yeah, it really depends on what you are doing. But any time you break normalization there should be a good reason. Performance is certainly a valid reason. "I'm too lazy to make a well-designed database," however, is not.
  If you find yourself breaking normalization all the time, then you've probably found a use case where a relational database isn't the best tool for the job.
  While there is a "right" way to use a given tool, there is no one tool that is right for every situation. People who get this backwards are zealots and will often make poor decisions.
  
  --
  Check out my world simulator thingy.
4. Re:Normalisation isn't a panacea by siride · 2012-05-16 03:39 · Score: 2
  
  And in many databases, there'd be more performance gains from proper normalization than pre-mature optimization. I'm working with a legacy database that has this problem. Proper normalization would probably make it lightning fast, but instead it's slow as fuck because too many concerns are put in one table when they should be put in several tables. Also, it uses functions to retrieve values, which is just...so wrong.
5. Re:Normalisation isn't a panacea by K.+S.+Kyosuke · 2012-05-16 03:55 · Score: 1
  
  Yes it makes sense up to a point , but it starts to suffer from the law of diminishing returns and at some point having to do complicated multi-table joins actually slows down your queries so much that it becomes simpler and faster to suffer duplicate data than normalise to the Nth degree.
  The question is whether this should be solved at the conceptual model level. As a developer, I don't care whether the database cheats and duplicates something to speed things up, as long as I don't have to do it in the data model and as long as the implementation is correct. The same logic applies to CPU caches and compiler optimizations. The computer is allowed to "cheat" if it can prove that the shortcut is correct. But you shouldn't be forced to do it manually, since it only makes your code (and data structures) complex and more brittle and muddles the semantics of your code for anyone trying to read it and modify it.
  Put simply, if current RDBMS implementations don't manage to run queries over properly designed relational data properly, it's the vendor's fault, not yours. You need a sufficiently clever DB engine, that's all, just as we have sufficiently clever compilers (for most tasks) these days and don't need to write in assembly most of the time. (We could have some, say, in twenty years. That's about the difference of age between first modern compilers and first modern databases.)
  
  --
  Ezekiel 23:20
6. Re:Normalisation isn't a panacea by HornWumpus · 2012-05-16 13:05 · Score: 1
  
  I'd say 90% of database apps are so simple there is no reason to favor normalization over simplicity.
  The classic example is invoice, lineitem. Do you keep a total in invoice or total up lineitem every time. Do you keep the price in Line Item or do you join to Inventory with ItemId and date? How do you handle special pricing? It gets complicated real fast if your dealing with someone who normalizes religiously.
  That's just third normal form (no redundant data). While I've run into fools who think Indexes are optional and a new column fixes anything. I've also run into fools who think _everything_ should be normalized as if it's part of a transaction processing system. For TPS highly normalized is high performance as it minimizes lock contention.
  
  --
  John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
7. Re:Normalisation isn't a panacea by julesh · 2012-05-16 22:51 · Score: 1
  
  The classic example is invoice, lineitem. Do you keep a total in invoice or total up lineitem every time. Do you keep the price in Line Item or do you join to Inventory with ItemId and date? How do you handle special pricing? It gets complicated real fast if your dealing with someone who normalizes religiously.
  Whereas not normalizing tends to lead to bugs. I've worked on a system that keeps total in invoice (or rather in a basket which was converted to an invoice by the checkout procedure) which had a flaw where if you submitted a new item to the basket and hit the 'checkout' button at the same time you could get that last item without paying for it about 10% of the time. Just because the original developer thought keeping a total price in the basket would be simpler than adding it up every time it was needed.
  Price in line item, however, makes a little more sense -- keeping a history of price changes is a little excessive for something that really doesn't need to be tied back to its source like that. Special pricing, however, should be handled with a normal line item with the price copied directly from the product followed by an extra line item showing discount. You get better reporting that way.
Re:Has to be said by i_ate_god · 2012-05-16 02:47 · Score: 1

I read this entire thread wondering if "Webscale" is really some kind of valid term or not. I'm not convinced that it has any valid meaning whatsoever.

--
I'm god, but it's a bit of a drag really...
Oh, the joy! by pz · 2012-05-16 02:49 · Score: 1

... the joy of schemaless DBs.
You mean working with a file system and not using a DB at all, not needing to pay a DBA, not dealing with corrupted databases, not using arcane tools, etc.?
I jest, but not entirely. Clearly there are purposes for which databases are the right toold for the job. I'm most definitely not convinced that big blob storage is one of them.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
1. Re:Oh, the joy! by The+Moof · 2012-05-16 03:20 · Score: 2
  
  not using arcane tools
  I know database concepts are difficult for some people, but it's by no means magic.
2. Re:Oh, the joy! by pz · 2012-05-16 03:58 · Score: 1
  
  not using arcane tools
  I know database concepts are difficult for some people, but it's by no means magic.
  Sorry, I beg to differ. You select a DB. Turns out that's just the interface, and you have to *then* select the actual DB engine. Some engines / databases allow checking for and repair of corruption on-line, some don't. There's locking. Line level, table level, database level. Oh, wait, you didn't know about tables vs databases? What do you do when your query takes too long? Didn't you know about connecting before making a querry, persistent connections, and how to interpret obscure error messages? What do you do when there is a corruption? Binary replication? Non-binary replication? I can export a table to CSV only when there is write access to the filesystem on the remote host? And don't get me started on the query syntax.
  This is all black magic to non-DBAs. It is arcane. When I use a file system for storing blobs -- as simple files -- I don't need a DBA. Back in the early years of modern computers, you needed a file system administrator, for more-or-less the same reasons that you need a DBA now: file systems were fragile. Now, file systems are one of the most reliable parts of computer systems: they Just Work. Until databases evolve to the point that they Just Work, and don't require hiring a specialist, there will be arcana.
  
  --
  
  Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
3. Re:Oh, the joy! by vadim_t · 2012-05-16 08:15 · Score: 1
  
  This is all black magic to non-DBAs. It is arcane. When I use a file system for storing blobs -- as simple files -- I don't need a DBA. Back in the early years of
  modern computers, you needed a file system administrator, for more-or-less the same reasons that you need a DBA now: file systems were fragile. Now, file systems are one of the most reliable parts of computer systems: they Just Work
  That's a dangerous and unfounded assumption.
  Filesystems are reliable only at the metadata level: if you yank the cord, the system will still boot afterwards. You won't end up with a filesystem doesn't mount, or where the system manages to mix two files up in such a way that writing to one damages another. But that doesn't guarantee much about your precious data. A half written file will be half written, or even corrupted, unless precautions are taken. And those precautions (in which order to write, how to ensure your data is safe, when to fsync) are just as arcane as the database stuff, if not more. Because databases deal with that crap internally and give you a much simplified interface.
  Take for instance a simple exercise: writing .jpg files with product images. But just because the file is there doesn't mean it's good, if you crash at the wrong time you might have half an image, or something that seems to be the file, but is really full of junk inside. So you need some way to determine when a file is really fully written. So now you're keeping a list of checksums which is prone to the same corruption, or doing durable renames, which is arcane magic, and still goes wrong when filesystem developers find another way to optimize while following the very lax spec (see the ext4 debacle)
Couchdb is great when it's used right by thanosv · 2012-05-16 02:52 · Score: 1

Clues to the source of some of Sauce Labs' problems can be gleamed from their list of Maintenance headaches:

View indexes are only updated when queried — insertion does not update the index. That means you have to write a script to periodically run all your views, unless you want them to be surprisingly slow when they haven’t been queried in a while. In practice we always preferred view availability to any performance boost obtained by not updating indexes on insertion, but writing reliable scripts to keep view indexes up to date was tricky.
Oh please doing an HTTP GET periodically is tricky ??? No it's not. With couchdb if your database is very dynamic you should either index periodically and very frequently. This creates a quantified and controlled performance demand on the server. Ideally read servers should never be the write server and replication should be filtered. Using Couchdb naively will lead to failure. Don't use javascript views, python views are 3-4 x faster, erlang views are 7-10 faster. Used in the right way and following the many tips that you can get from the Couchdb community will make Couchdb not just a great database but a great application platform.
And yes 1.2 is a great improvement.
Currently I'm using coucdb, mongodb, and MySQL all in one high profile project handling terabytes of data and millions of hits. Each has it's use. When it comes to reliability and performance all three DBs are NOT my problem.
1. Re:Couchdb is great when it's used right by Eponymous+Hero · 2012-05-16 04:26 · Score: 1
  
  coughcoughsandyvaginacoughcoughcough
  
  --
  insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
Re:Has to be said by emorning · 2012-05-16 03:13 · Score: 1

Yes, traditional joins run in a single process but AFAIK there no reason why a join could not be distributed across many processes by implementing the join with Map/Reduce. Right?
Re:Has to be said by dririan · 2012-05-16 03:16 · Score: 1

It's a quote.
Re:Has to be said by The+Moof · 2012-05-16 03:24 · Score: 1

I see this comment a lot and I don't get it.
Here you go. Now you'll get why it's funny, and not serious.
CouchDB just didn't work by Animats · 2012-05-16 03:29 · Score: 2

a majority of our unplanned downtime was due to CouchDB issues
Nowhere on the CouchDB home page is reliability even mentioned. And that's the real issue. Developing a reliable database system is a difficult design and programming task. It requires real software engineering. The hacks who write PHP and use JSON aren't up to a job like that. The "aw, we'll fix it in the next release" attitude doesn't cut it in databases.
1. Re:CouchDB just didn't work by tokul · 2012-05-16 04:46 · Score: 1
  
  Nowhere on the CouchDB home page [apache.org] is reliability even mentioned.
  
  If Microsoft says that Windows server is the most reliable piece of software, will you believe it?
  Reliability is not marketing material. If you don't have it, you have reports in your issue tracker.
So what do they do again? by FilmedInNoir · 2012-05-16 03:30 · Score: 1

They do cross-browser testing. Right? I can't really figure out what they sell or why I would want it. The real question, "Why haven't I started a mismanaged company with ambiguous products?"

--
Sig. Sig. Sputnik
SQLite static typing by tepples · 2012-05-16 04:11 · Score: 1

Typing is just another constraint, like foreign keys, and various other domain constraints. I cannot see any valid argument for having foreign keys but not type constraints. It jest seems bizarre to me. It's not like they could be optional or anything.
They are optional. It appears you can enforce static typing for a column with constraints like CHECK(typeof(x)='integer'). I'd give more details, but the document that Wikipedia cites about such constraints is a printed publication of which I happen not to own a copy.
1. Re:SQLite static typing by serviscope_minor · 2012-05-16 04:18 · Score: 1
  
  That's interesting. I'll have to look into that.
  It's more than a little mildly annoying that when you specify the type, it doesn't check it, more you have to specify it twice. Anyhow, if it can be beaten into shape, then that makes it more acceptable.
  
  --
  SJW n. One who posts facts.
COBOL is cool! by NotesSensei · 2012-05-16 04:28 · Score: 4, Funny

In what other language would this statement compile without error:

PERFORM makemoney UNTIL rich.

(Note the the full stop at the end)
If all you have is a hammer by tepples · 2012-05-16 04:31 · Score: 1

And someone with experience in the entity-relationship modeling that underlies a relational database schema is likely to see everything as "a lot of related data". For example, a Slashdot comment is related to a parent (a story or another comment), the user who posted it, and the moderations done to that comment.
Re:Has to be said by Alex+Zepeda · 2012-05-16 05:06 · Score: 2

So the thing is, traditional joins (on, say, Postgres or MySQL) aren't blocking operations. You can run more than one at a time. MapReduce (as well as writes, any aggregation, and any use of JavaScript) are blocking operations on Mongo. They block the entire mongo process. The MapReduce case gets around this with a bit of cooperative multitasking (yielding every few hundred or thousand rows), but writes, aggregation, and other use of javascript do not. So there's already a much bigger need to distribute MapReduce on Mongo than there is to distribute a JOIN on an SQL database.
Plus, MapReduce on Mongo is painfully slow, so you'll need to break things down into really small partitions to scale at all. Aggregating 800,000 documents (group by + sum) took me about 20 seconds with Mongo using the existing aggregation framework (which is universally credited with being faster than the MapReduce case). Porting the whole thing over to an SQL database allowed me to a.) not block the entire freaking process and b.) run the query in about 800ms.
So, sure, we could have partitioned the data and spread it across multiple nodes. Maybe that would have been faster (but you can only run MapReduce across multiple nodes, using the existing aggregation framework you can only operate on one node). Dunno. But it would have been a lot more expensive since we would have required more hardware to accomplish the same thing that an SQL database is optimized for.
The reason that "they" mock JOINs is because you simply can't do that efficiently with Mongo.

--
The revolution will be mocked
Re:Has to be said by emorning · 2012-05-16 06:33 · Score: 1

That's an informative reply, thanks for taking the time to post it.
Both sides are right and wrong by greywire · 2012-05-16 06:35 · Score: 1

I thought the article referred to here particularly interesting because it puts forth the idea that, essentially, NoSQL and Relational DB's are just two ways of looking at the same thing, and thus are both equally valid. Right now they are used situationally (at best) but I think you will see a convergence at some point with something "new" that can do both equally well using a common core and different wrappers around it.

--
-- Senior Software Engineer, Attorney appearance services, locallawyerapp.com.
Schemaless? by AlchemyX · 2012-05-16 19:22 · Score: 1

It is kind of a weird word, isn't it? Almost as shemales.
text versus CLOB by tepples · 2012-05-16 23:46 · Score: 1

like not having to declare varchar sizes versus text versus CLOB versus BLOB.
What's the difference between a text column and a character large object column? MySQL doesn't appear to have distinct types for the two.

They're all stored and indexed and searchable in the same way (not that you'd want to index any sort of LOB.)
If CLOB is the same thing as text, then why wouldn't you want to support a full-text index on a CLOB? That said, I tend to write my own full-text indexing tools to work around the limitation in MySQL's full-text index: no words shorter than four characters unless you're root, no common English words unless you're root, etc.
1. Re:text versus CLOB by Christian+Smith · 2012-05-18 00:21 · Score: 1
  
  like not having to declare varchar sizes versus text versus CLOB versus BLOB.
  What's the difference between a text column and a character large object column? MySQL doesn't appear to have distinct types for the two.
  Some databases limit the size of text and/or varchar data, so that rows have a limited size. Sybase, for example, used to have a 255 character limit in varchars, and a 2048 byte limit for row size (off the top of my head, details may be wrong.) To store any text larger than 255 characters, the data was effectively stored offline from the row, in a special table, with a reference to the large text in the row.
  DB2 also has row size limits, rows not being able to be bigger than ~0.5 the tablespace page size, so this can also require text to be stored offline in LOB tables.
  
  They're all stored and indexed and searchable in the same way (not that you'd want to index any sort of LOB.)
  If CLOB is the same thing as text, then why wouldn't you want to support a full-text index on a CLOB? That said, I tend to write my own full-text indexing tools to work around the limitation in MySQL's full-text index: no words shorter than four characters unless you're root, no common English words unless you're root, etc.
  I was talking about a table index. As in:
  CREATE INDEX foo_idx ON foo(textfield);
  This is not the same as a full text index, which generally stores an index row for each distinct word in the source row.
2. Re:text versus CLOB by tepples · 2012-05-18 02:15 · Score: 1
  
  Some databases limit the size of text and/or varchar data, so that rows have a limited size.
  For example, MySQL has VARCHAR and VARBINARY, which can be used in a key but are limited in length, and TEXT, MEDIUMTEXT, and LONGTEXT types for text and BLOB, MEDIUMBLOB, and LONGBLOB types for binary, which cannot be used in a key. These are limited to 64 KiB, 16 MiB, and 4 GiB respectively. One of the MySQL storage types for these long text and binary types stores the first 256 bytes in the main table and the rest in a separate table, but this is transparent to the application. So in MySQL, VARCHAR is a string and TEXT is a CLOB; I was just confused as to what you meant by a "text" type.
  
  This is not the same as a full text index, which generally stores an index row for each distinct word in the source row.
  Yeah, that's how the application-level full-text index on PhilsHobbyShop.com works, using a table with columns documentid, wordid, and freq. It's just that MyISAM (and no other MySQL engine) has its own built-in full-text index functionality, which I don't use because of its own drawbacks (MyISAM's lack of crash safety, combined with stopword limitations if you're not root on the database server).
Advantage with schema? by tepples · 2012-05-16 23:58 · Score: 1

Richard_at_work wrote:

XML field type [in] MS SQL Server [...] allows you to essentially keep the table schema-less
shutdown -p now wrote:

it also permits you to specify an XML Schema for the field
But what's the advantage of XML with a schema over just parsing the XML in the application and putting each value in a column? Perhaps I'm having trouble seeing it because my mind is trapped in a relational paradigm.
1. Re:Advantage with schema? by shutdown+-p+now · 2012-05-17 04:18 · Score: 1
  
  XML Schema is more flexible in terms of what it can describe than a bunch of columns - it describes a tree-like structure, after all, not just a sequence of elements all on the same level, and it doesn't require a specific number of them. Nor does it require strong typing - you can just leave an element as xs:any, or, say, specify it as a choice of xs:string or xs:integer. So, in practice, it lets you associate an arbitrary number of named values with every record, as well as defining some hierarchical structure between those values. And then fast XPath/XQuery queries over that - again, a query language tailored for tree structures, with operators such as depth search.