Yale Researchers Prove That ACID Is Scalable

Pfah. by stonecypher · 2010-09-01 04:53 · Score: 5, Interesting

NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

Digg's engineers wear clown shoes to work.

--
StoneCypher is Full of BS

Re:Pfah. by Anonymous Coward · 2010-09-01 05:00 · Score: 1, Interesting

NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL [...]
Is MySQL ACID?
Re:Pfah. by TheSunborn · 2010-09-01 05:05 · Score: 3, Insightful

It was never database size that was the problem, but the number of queries per second (aka performance) that could be executed.
You can run a Google-size database on MySQL, but you can't use MySQL* to implement a search solution with performance like Google's without requiring much, much, much more hardware.
*Or any other SQL database.
Re:Pfah. by mini+me · 2010-09-01 05:06 · Score: 5, Insightful

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?
I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.
Re:Pfah. by bluefoxlucid · 2010-09-01 05:14 · Score: 5, Interesting

NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google
Google uses BigTable, a NoSQL database.

--
Support my political activism on Patreon.
Re:Pfah. by Shados · 2010-09-01 05:19 · Score: 1

Depends for what part, but Walmart's site runs at least partly on a "NoSQL" (I use the term loosely in this case) system.
Re:Pfah. by TooMuchToDo · 2010-09-01 05:27 · Score: 2, Insightful

Google initially used MySQL for Adwords, tried to switch away from it, and then switched back (if I recall correctly). Your Googling May Vary.
Re:Pfah. by bluefoxlucid · 2010-09-01 05:28 · Score: 5, Insightful

There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.
Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).
Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).
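A minimal sketch of the Prius question above, assuming a hypothetical three-table schema (Persons, MYT, Cars) and using Python's built-in sqlite3 module as a stand-in for whatever RDBMS is actually in play. The table and column names are illustrative only.

```python
import sqlite3

# Hypothetical schema loosely following the Person / Car / Make-Model-Year example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE MYT (ID INTEGER PRIMARY KEY, Make TEXT, Model TEXT, Year INTEGER);
    CREATE TABLE Cars (VIN TEXT PRIMARY KEY,
                       Owner INTEGER REFERENCES Persons(ID),
                       MYT_ID INTEGER REFERENCES MYT(ID));
    INSERT INTO Persons VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO MYT VALUES (10, 'Toyota', 'Prius', 2004),
                           (11, 'Toyota', 'Prius', 2009),
                           (12, 'Ford', 'Focus', 2005);
    INSERT INTO Cars VALUES ('VIN1', 1, 10), ('VIN2', 1, 11),
                            ('VIN3', 2, 12), ('VIN4', 3, 11);
""")

def prius_owners(conn):
    """Return names of people owning at least one 1996-2010 Toyota Prius."""
    rows = conn.execute("""
        SELECT DISTINCT Persons.Name
        FROM Cars
        JOIN MYT ON Cars.MYT_ID = MYT.ID
        JOIN Persons ON Cars.Owner = Persons.ID
        WHERE MYT.Make = 'Toyota' AND MYT.Model = 'Prius'
          AND MYT.Year BETWEEN 1996 AND 2010
        ORDER BY Persons.Name
    """).fetchall()
    return [name for (name,) in rows]

owners = prius_owners(conn)
print(owners)  # ['Alice', 'Carol']
```

Alice owns two matching cars but appears once, because the question is about people rather than cars; that is what DISTINCT expresses here.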
Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.

--
Support my political activism on Patreon.
Re:Pfah. by DragonWriter · 2010-09-01 05:29 · Score: 4, Insightful

NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
Database size was never the main driving force behind the new move toward NoSQL databases. Support for distributed architectures is. In part, this is about handling lots of queries rather than handling lots of data; it also -- particularly if you are Google -- deals with latency when the consumers of data are widely distributed geographically.
And note that one of the companies that is heavily involved in building, using, and supplying non-SQL distributed databases is Google, who, as you so well point out, is very much aware of both the capabilities and limits of scaling with current relational DBs.
This new research may offer new prospects for better databases in the future -- but TFA indicates that the new design has a limitation which seems common in distributed, strongly-consistent systems: "It turns out that the deterministic scheme performs horribly in disk-based environments".
In fact, given that it proposes strong consistency and distribution, and relies on in-memory operation for performance, it sounds a lot like existing distributed, strongly-consistent systems based around the Paxos algorithm, like Scalaris. And it seems likely to face the same criticism from those who think that durability requires disk-based persistence, and who object to replacing storage on disks (which, one should keep in mind, can also fail) with storage in memory simultaneously on a sufficient number of servers (which, yes, could all simultaneously fail, but durability is never absolute; it's at best a matter of the degree to which data is protected against probable simultaneous combinations of failures).
So -- reading only the blog post that is TFA announcing the paper and not the paper itself yet -- I don't get the impression that this is necessarily a giant leap forward, though more work on distributed, strongly-consistent databases is certainly a good thing.
Re:Pfah. by Splab · 2010-09-01 05:33 · Score: 5, Funny

"Your Googling May Vary."
Yes, that is exactly the problem with NoSQL.
Re:Pfah. by sarkeizen · 2010-09-01 05:40 · Score: 1

Isn't that a very specific context though? The underlying assumption seems to be that there is one dataset per application. Which may well be the general case - in other words, what is the "same way your application does but without mapping" when your applications are written in different frameworks or languages, or when the data is accessed via, say, a reporting environment?
Re:Pfah. by Trieuvan · 2010-09-01 05:41 · Score: 5, Informative

It is if you use innodb .
Re:Pfah. by Anonymous Coward · 2010-09-01 05:47 · Score: 1, Insightful

Right, raw size is only one component. As a practical matter, if you have 100 trillion records in a DB, you probably also have ferocious insertion and query rates, as well. Not enforcing ACID has its advantages under those conditions.
Whether such a tack was logically required is an interesting question...
Re:Pfah. by Tablizer · 2010-09-01 05:55 · Score: 1

For this reason I suggest that app language designers work on better fitting RDBMS and SQL rather than the other way around (at least for data-driven apps). OOP may be nice, but it inherently conflicts with relational concepts and patterns. Generally, one is based around attribute-handling idioms and the other behavior-handling idioms. OOP also tends to be nested, hierarchical, and/or graph-shaped; while relational is set-centric. Either you de-emphasize one or the other, or deal with complicated and expensive translation layers. Barring some revolutionary breakthrough, something has to give. Right now it's like men wearing women's underwear and vice versa.

--
Table-ized A.I.
Re:Pfah. by GWBasic · 2010-09-01 06:01 · Score: 2, Insightful

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I 100% agree. Earlier this year I moved a prototype application built around SQLite and flat files to MongoDB. MongoDB is SQL-like in its ability to have queries and indexes; but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables. This dramatically reduced complexity in code that used to deal with 5-6 SQLite tables. In the case of MongoDB, I was able to replace 5-6 tables with a single collection of structured documents. MongoDB lets me write queries against data that's deeply-nested, yet it can return the full data structure so I don't have the performance hit (and programmer time hit) of running (and writing) many queries to hydrate data structures around foreign key relationships.
The other advantage to MongoDB is that its schemaless approach makes it much easier to handle inheritance. I can have documents with common parts for base classes, and varying parts for child classes. This is much harder in SQL, because I either need to design a super-table that can handle all variations of the base class, or I need to use a multi-join around all potential classes that I can query. MongoDB's document-based approach, as opposed to SQL's table approach, lets me write a single query that can handle future subclassing of the data, and future variations of the data.
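To make the "common parts for base classes, varying parts for child classes" idea concrete, here is a rough sketch using plain Python dicts. It deliberately does not use MongoDB or its API; `get_path` and `find` are hypothetical helpers that only imitate the idea of querying a nested field by a dotted path, and the vehicle documents are invented for illustration.

```python
# Documents share base fields ("name", "wheels"); subclasses add their own.
documents = [
    {"type": "Vehicle", "name": "cart", "wheels": 2},
    {"type": "Car", "name": "sedan", "wheels": 4,
     "engine": {"fuel": "petrol", "hp": 150}},
    {"type": "Truck", "name": "hauler", "wheels": 6,
     "engine": {"fuel": "diesel", "hp": 400}, "payload_kg": 9000},
]

def get_path(doc, dotted_path, default=None):
    """Follow a dotted path like 'engine.fuel' through nested dicts."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

def find(docs, dotted_path, predicate):
    """Return whole documents whose value at dotted_path satisfies predicate."""
    return [d for d in docs if predicate(get_path(d, dotted_path))]

# One query over the base-class field works for every current or future subclass.
many_wheels = [d["name"] for d in find(documents, "wheels",
                                       lambda w: w is not None and w >= 4)]

# A query on a nested, subclass-only field returns the full structure in one go.
diesel = find(documents, "engine.fuel", lambda f: f == "diesel")

print(many_wheels)              # ['sedan', 'hauler']
print(diesel[0]["payload_kg"])  # 9000
```

Documents lacking the queried field are simply skipped, which is the flip side of having no schema: nothing stops a document from omitting or misspelling a field.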

--
No, I will not work for your startup
Re:Pfah. by Tablizer · 2010-09-01 06:10 · Score: 2, Funny

After all, MySql is why slashdot is so relia~ `} v* m& + ' ,

--
Table-ized A.I.
Re:Pfah. by Anpheus · 2010-09-01 06:17 · Score: 3, Insightful

Well, and if you don't need it [the guarantees of ACID], why pay for it? I mean, if you have to spend any amount of time thinking about "How do I make that work?" that's a cost.
Whereas if all you care about is updating individual records without global consistency, well, don't enforce global consistency.
Re:Pfah. by GooberToo · 2010-09-01 06:26 · Score: 4, Funny

Funny. Insightful. Informative. So many options with your post. I'm sure at least one moderator will get it figured out.
Re:Pfah. by jimrthy · 2010-09-01 06:49 · Score: 1

I'm just curious...why was this modded funny?
I'm sure google uses some flavor of RDBMS for some things. But not for their bread-and-butter.
Re:Pfah. by jimrthy · 2010-09-01 06:52 · Score: 1

That's an interesting bit of knowledge. According to wikipedia, you're right: they tried to switch to Oracle, but it was too slow.
Re:Pfah. by snadrus · 2010-09-01 06:55 · Score: 1

SQL can turn frameworks into mere reporting applications. If all logic, constraints, triggers, etc are in the SQL structure, there's nothing left to do but show & decorate a report, and dumb input that pushes to a smart SQL backend.

This also multi-processes for programmers and is easier to audit when the saves and logic are together.

--
Science & open-source build trust from peer review. Learn systems you can trust.
Re:Pfah. by h4nk · 2010-09-01 06:55 · Score: 5, Insightful

Well said. This "problem" has more to do with architects and developers understanding the concepts of layering and information hiding. When programmers are allowed to dictate architecture under the pretense that certain interfaces to a Service should determine the structure of the Information itself, there is a huge problem at the business level. How does this happen? Uninvolved, or under-skilled DBAs and data architects. This is their job. My experience is that business managers and programmers have always seen the database as some sort of necessary evil without understanding its full purpose. Too many programmers with very little database experience are given direct access to databases themselves. The motivation of "Get it to work" takes precedence over well-researched and proven approaches, approaches that will only benefit in the long run. Companies that implement poor strategies for the sake of short-term gains usually have the idea that the best approach is somehow the one that takes the most time to implement. Short-sighted solutions are put into play and almost as soon as they are implemented, the scalability and data requirement issues begin to crop up. These poor strategies are often the result of inexperience and poor education on all levels. This is why it is so important to hire people that really know what they are doing from C-level management down to the programmers. I have seen bad thinking gut companies. A service built on sound architecture will have issues maturing, no doubt. How well it matures depends on the wisdom and skill of the company.
Re:Pfah. by bsdaemonaut · 2010-09-01 06:55 · Score: 1

Since when does Google use a traditional SQL database? Last I checked they used their BigTable, which is anything but traditional.
Besides, supporting reads on a scalable system has never been a problem; the problem is supporting scalable, ACID-compliant writes in a reasonably quick and efficient manner. Most systems end up having to have a single authoritative master because there is no way to ensure that the data is updated both quickly and consistently over a range of "masters."
Re:Pfah. by smooth+wombat · 2010-09-01 07:01 · Score: 4, Funny

they tried to switch to Oracle, but it was too slow.

What? Oracle too slow? How dare you besmirch the all-powerful Larry Ellison. We switched from a mainframe environment which handled all our sales data to an Oracle-based ERP system. I'll show you how fast this puppy now runs. Let me show you our sales data for the last month...

Hang on, I'll get the answer in a minute...

Bear with me, it will be here soon...

Here's a bottle of Mountain Dew while you wait...

Can I get you anything to snack on? M&M's? Doritos? A Snickers bar perhaps?

--
We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
Re:Pfah. by bsdaemonaut · 2010-09-01 07:08 · Score: 2, Insightful

NoSQL has a lot to do with scalability. Sure, there are other reasons, but not enough to recommend them over hash databases. Hash databases, which do what you propose and a lot more, have been around for decades; their main con is the lack of scalability -- hence NoSQL. BerkeleyDB is an example, but the list is too huge to continue.
Re:Pfah. by bluefoxlucid · 2010-09-01 07:17 · Score: 1

Nobody bothered to Wikipedia BigTable.

--
Support my political activism on Patreon.
Re:Pfah. by jimrthy · 2010-09-01 07:21 · Score: 1

NoSQL is a horribly misleading term--as the article points out. Unfortunately, they seemed to miss the point as well.
Then again, as most of the follow-up posters demonstrate, what the phrase actually means doesn't seem to be all that well defined. In most cases, AFAICT, it actually means a non-RDBMS database.
I'm not sure why you think Google kicked off the NoSQL movement with Big Table, but a lot of the talks I've watched from Google I/O about database performance on App Engine definitely seem to imply that performance was a pretty important factor.
When web developers start doing scratch figures about the seek time performance on disk heads, and they're serious about it being a major limiting performance factor, it's probably worth assuming they have a reason for it. (Especially when TFA mentions that their current implementation runs horribly if they have to access hard drives).
If you're coming from an RDBMS world, the de-normalized philosophy behind Big Table is mind-boggling (at least it is for me, and it's an extremely common source of confusion on the mailing list). Their engineers didn't come up with the idea (or, at least, make something the public became aware of...it's not like they invented column-oriented tables) just to kick off a new buzz word. Not being forced to remember to update 18 copies of the same column in different tables is one of the huge benefits of an RDBMS.
Big Table does offer ACID transactions (which is what the article's really about)...they just scale very poorly. I'm not sure how well they possibly can. If you have three different clients connecting to 3 different data centers scattered around the world, trying to transfer all the money from the same bank account into a Swiss numbered account...that needs ACID enforcement, and you will have a nasty performance hit. Two of those transactions absolutely must fail.
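The bank case can be sketched on a single node with Python's sqlite3 module. This is only an illustration of why two of the three transfers must fail; it says nothing about the cross-datacenter coordination cost, which is the hard part. The `account` table, the CHECK constraint, and `transfer_all` are assumptions made up for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN / COMMIT / ROLLBACK can be issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE account (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('source', 1000), ('swiss', 0)")

def transfer_all(conn, amount):
    """Move `amount` from 'source' to 'swiss' atomically; return success."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'source'",
            (amount,))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'swiss'",
            (amount,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft: undo the whole transfer.
        conn.execute("ROLLBACK")
        return False

# Three clients each try to move the entire balance.
results = [transfer_all(conn, 1000) for _ in range(3)]
balances = dict(conn.execute("SELECT name, balance FROM account"))

print(results)   # [True, False, False]
print(balances)  # {'source': 0, 'swiss': 1000}
```

Here the three attempts run one after another, so serialization is free. In a distributed setting the same outcome requires the three data centers to agree on an order, and that agreement is where the latency comes from.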
If you have the same scenario, but the clients are trying to update the visitor counter on your blog...it really doesn't matter much if the next few people who check happen to see the wrong number. Making those trade-off decisions up front is part of the pain of writing for that platform.
Maybe the article is describing a way to work around the bank example, but it didn't really sound that way to me. Either way, it'll be interesting to see what they come up with.
Re:Pfah. by harlows_monkeys · 2010-09-01 07:30 · Score: 1

NoSQL never was necessary. Traditional SQL database - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.
It isn't data volume that is the problem. It is often data organization. Traditional SQL databases are row stores. For some applications that is not a good way to store data. Column stores make more sense in data warehousing, for example. Michael Stonebraker has blogged about this a few times at the same blog site cited by the submitter.
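A toy illustration of the row-store versus column-store distinction, using plain Python data structures rather than any real storage engine. The sales records and the `to_columns` helper are invented for the example.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 50},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Column-oriented layout: each field is stored contiguously.
columns = to_columns(rows)

# An aggregate over one field has to walk every record in the row layout...
total_from_rows = sum(row["amount"] for row in rows)
# ...but touches only a single list in the column layout.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 250 250
```

Both layouts give the same answer. The difference is how much unrelated data gets read along the way, which is why warehousing-style aggregates tend to favor columns while fetching or updating one whole record favors rows.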
Re:Pfah. by SQLGuru · 2010-09-01 07:40 · Score: 1

This is why I call myself a database programmer. I'm not a DBA, never have been and don't want to be. I understand how to make the database do what it needs to do. At a high level, I understand how data is stored to disk, but I don't really care about that (that's a DBA's job). I also understand at a high level the questions that an application developer needs to ask (not a DBA's job at all). I bridge the gap and write code (sprocs, triggers, functions, etc.) to support the app. I tune queries and db code to support the database server (and the user experience). I have yet to meet an English question about data that can't be answered in performant SQL.
I'm not opposed to NoSQL (even with a nick like SQLGuru), but as was said earlier, a sledgehammer, a tack hammer, and a claw hammer all have their appropriate uses......some even drive screws in better than others. SQL is still the right tool for business applications (where aggregating and reporting is extremely important). Right tool + right job = cake (it's not a lie). Wrong tool + right job = frustration.
Re:Pfah. by Anonymous Coward · 2010-09-01 07:43 · Score: 1, Informative

But then it's not scalable.
If you want ACID and scalability, use Postgres.
Re:Pfah. by DiegoBravo · 2010-09-01 07:46 · Score: 1

After all, MySql is why slashdot is so relia~ `} v* m& + ' ,
--
Civilization = writing + taxes + booze + hookers
Re:Pfah. by TooMuchToDo · 2010-09-01 07:56 · Score: 1

This. Oracle: For when you couldn't figure out how to run PostgreSQL and had millions of dollars to blow.
Re:Pfah. by Krahar · 2010-09-01 08:09 · Score: 1, Insightful

Doesn't work so well if you've got a graph structure or a tree. If in a family tree, you want to find all 5'th descendants or all descendants of some guy, SQL won't make you happy. As far as I can see, you end up iterating a query to add children until you reach a fixed point, and SQL doesn't have fixed point operators so you have to do it by hand. Right?
Re:Pfah. by RAMMS+EIN · 2010-09-01 08:15 · Score: 3, Interesting

``There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.''
I agree, but ...
``While we've come up with some clever solutions like ORM to alleviate the problem,''
I don't think ORM alleviates the problem so much as entrenches it. The classes-and-instances object model and the relational model are different, but can be expressed in one another. Object-relational mapping makes this easy by pretending the models are the same, and doing the mapping behind the scenes. This works for some cases, but if you want to get the best performance, you have to express things in a way that takes into account the efficiency considerations of the actual implementation. With ORM, you run into the situation where what is most succinct to express in code is not necessarily what is most efficient in terms of disk access and network resource usage. So, for efficiency reasons, you end up breaking the abstractions that your ORM provided ...
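The efficiency gap described above is often called the "N+1 query" pattern. The sketch below reproduces it by hand with Python's sqlite3 module instead of a real ORM; the `author`/`book` schema and the `run` wrapper that logs statements are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

executed = []  # log of every SQL statement sent to the database

def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()

def titles_lazy():
    """The succinct object-style loop: one query per author (N+1 in total)."""
    result = {}
    for author_id, name in run("SELECT id, name FROM author ORDER BY id"):
        rows = run("SELECT title FROM book WHERE author_id = ? ORDER BY id",
                   (author_id,))
        result[name] = [title for (title,) in rows]
    return result

def titles_joined():
    """The relational version: a single join, regrouped in memory."""
    result = {}
    rows = run("""
        SELECT a.name, b.title
        FROM author AS a LEFT JOIN book AS b ON b.author_id = a.id
        ORDER BY a.id, b.id
    """)
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still get an entry
            result[name].append(title)
    return result

lazy = titles_lazy()
lazy_queries = len(executed)
executed.clear()
joined = titles_joined()
joined_queries = len(executed)

print(lazy == joined)                # True
print(lazy_queries, joined_queries)  # 4 1
```

With three authors the difference is four round trips versus one; with thousands of rows and a network between application and database, that gap is what forces people to reach past the abstraction.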
``why not just store the data directly without any mapping?''
There isn't really such a thing as "without any mapping". However, you can ensure that the constructs your API provides are equivalent to what you can efficiently fetch or store in your data store. Since typical RDBMSs are usually optimized to execute typical SQL queries efficiently, SQL is actually a fairly good starting point. You can optimize this by creating indices to speed up common operations, and by tuning your RDBMS to speed up common operations. And, no doubt, you can do even better by creating custom shortcuts for specific needs of your application.
This is sort of what so-called NoSQL databases do: they are optimized for specific scenarios, and thus may outperform stock RDBMSs that are optimized for "we don't know what you want to do, so we try to make everything reasonably fast". It's also worth noting that NoSQL systems often return stale data or even allow inconsistencies in order to improve performance. By contrast, the strength of a good relational database is preserving the integrity of your data no matter what happens. Different tools for different jobs - or at least, different optimizations for different scenarios.

--
Please correct me if I got my facts wrong.
Re:Pfah. by Tablizer · 2010-09-01 08:16 · Score: 1

The trend is toward multi-paradigm, which in general I think is a good thing. Each has niches where they seem to do well. But the trick is knowing where to use what in a team environment. There's no good road-tested design system/technique for identifying where in a given app a given paradigm works best, resulting in heated arguments.

--
Table-ized A.I.
Re:Pfah. by bluefoxlucid · 2010-09-01 08:19 · Score: 1

That is an excellent question for a DBA evaluation exercise.

--
Support my political activism on Patreon.
Re:Pfah. by m50d · 2010-09-01 08:36 · Score: 1

You're looking at this precisely backwards. The problem is that the programming-language representation is richer than the SQL one; SQL can only store tables (raw data), not, say, documents, which are more informational.

--
I am trolling
Re:Pfah. by loufoque · 2010-09-01 08:44 · Score: 1

While we've come up with some clever solutions like ORM to alleviate the problem
You probably meant to make the problem worse, as this is the only thing Object-Relational Mapping does.
Dealing with a relational database requires thinking in relational terms. Mapping it to a recursive object-oriented model is stupid and inefficient.
In particular, the queries should be at the center of algorithms, and there should be a constant number of them.
Re:Pfah. by NNKK · 2010-09-01 08:47 · Score: 3, Interesting

That is an excellent question for a DBA evaluation exercise.
So...
Efficient SQL Usage == Programmer + DBA
Efficient NoSQL Usage == Programmer
Thank you for making the case for NoSQL so clearly.
Re:Pfah. by modmans2ndcoming · 2010-09-01 08:57 · Score: 1

The graph structure or tree structure belongs in your program. The data should still be relationally stored in the database. A tree and graph can be built from RDBMS stored data since each node is related to the other nodes it is connected to in your tree or graph structure, you should be able to construct a relational schema that meets your storage needs and makes it easy to reconstruct the logical tree/graph in memory.
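A minimal sketch of that approach, assuming a hypothetical adjacency-list table `node(id, parent_id, label)` in SQLite: the rows are read once and the tree is rebuilt in memory as nested dicts.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        label TEXT
    );
    INSERT INTO node VALUES (1, NULL, 'root'), (2, 1, 'left'),
                            (3, 1, 'right'), (4, 2, 'leaf');
""")

def load_tree(conn):
    """Read every node in one query and rebuild the tree in memory."""
    children = defaultdict(list)  # parent id -> list of child ids
    labels = {}
    root_id = None
    query = "SELECT id, parent_id, label FROM node ORDER BY id"
    for node_id, parent_id, label in conn.execute(query):
        labels[node_id] = label
        if parent_id is None:
            root_id = node_id
        else:
            children[parent_id].append(node_id)

    def build(node_id):
        return {"label": labels[node_id],
                "children": [build(child) for child in children[node_id]]}

    return build(root_id)

tree = load_tree(conn)
print(tree["label"], [c["label"] for c in tree["children"]])  # root ['left', 'right']
```

This assumes a single root and a tree small enough to load whole; a very large graph would need the traversal pushed down into the database or a different store.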
Re:Pfah. by FlyingGuy · 2010-09-01 08:59 · Score: 1

NoSQL is not really about scalability, it is about modeling your data the same way your application does.
Of all the inane and ridiculous statements I have ever heard on /., this takes the cake. "The way your application does"? Really? I mean, you really believe that an application has a life of its own and does data modeling?
A programmer (an actual human) builds an application, and if he or she does not have a clue about databases, then yes, the data modeling is going to really suck.
You can model data hierarchically or relationally; you can even do a mix of both. You can choose a data model that suits the data requirements. Hell, you can even use flat files if you want your data to be full of white space, but that is a choice that a programmer made, NOT the application.

--
Hey KID! Yeah you, get the fuck off my lawn!
Re:Pfah. by DragonWriter · 2010-09-01 09:01 · Score: 5, Informative

Doesn't work so well if you've got a graph structure or a tree. If in a family tree, you want to find all 5'th descendants or all descendants of some guy, SQL won't make you happy.

A decade-plus ago, that would have been true.
Standard SQL from SQL-99 on will, in fact, do this quite easily via recursive Common Table Expressions. Now, some SQL-based DBMSs don't support enough of the standard to use this, but current versions of, I believe, DB2, Firebird, PostgreSQL, and SQL Server all implement standard CTEs well enough to do those examples in SQL fairly directly, and Oracle has its own proprietary syntax (CONNECT BY) that works for the examples that you pose, though it's less general than SQL-99 recursive CTEs.
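A small sketch of the family-tree example with a recursive CTE. It runs against SQLite through Python's sqlite3 module purely because that is easy to try locally (SQLite is not one of the systems named above, and needs version 3.8.3 or later for WITH RECURSIVE). The `person` table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO person VALUES
        (1, 'Adam', NULL),
        (2, 'Beth', 1),
        (3, 'Carl', 1),
        (4, 'Dina', 2),
        (5, 'Egon', 4),
        (6, 'Zed',  NULL);
""")

def descendants(conn, ancestor_id):
    """Return (name, depth) for every descendant of ancestor_id."""
    return conn.execute("""
        WITH RECURSIVE descendant(id, name, depth) AS (
            -- base case: direct children
            SELECT id, name, 1 FROM person WHERE parent_id = ?
            UNION ALL
            -- recursive step: children of anyone already found
            SELECT p.id, p.name, d.depth + 1
            FROM person AS p JOIN descendant AS d ON p.parent_id = d.id
        )
        SELECT name, depth FROM descendant ORDER BY depth, name
    """, (ancestor_id,)).fetchall()

all_desc = descendants(conn, 1)
third_gen = [name for name, depth in all_desc if depth == 3]

print(all_desc)   # [('Beth', 1), ('Carl', 1), ('Dina', 2), ('Egon', 3)]
print(third_gen)  # ['Egon']
```

The database iterates to the fixed point itself, so "all descendants" and "exactly the Nth generation" are the same query with a different filter on depth.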
Re:Pfah. by DragonWriter · 2010-09-01 09:12 · Score: 1

Big Table does offer ACID transactions (which is what the article's really about)...they just scale very poorly. I'm not sure how well they possibly can. If you have three different clients connecting to 3 different data centers scattered around the world, trying to transfer all the money from the same bank account into a Swiss numbered account...that needs ACID enforcement, and you will have a nasty performance hit. Two of those transactions absolutely must fail.
Two of them must fail, but that doesn't necessarily mean you have to have poor scalability. Scalable distributed systems with ACID guarantees have been demonstrated (TFA specifically addresses one, but others have been demonstrated before, e.g., Scalaris.)
Re:Pfah. by QuoteMstr · 2010-09-01 09:28 · Score: 5, Informative

An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.
You have no idea what you're talking about, probably because your brain has been irreversibly warped by MySQL. Concurrent writing is widely-supported.
Hint: MVCC.
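For anyone who has not met the term, here is a toy sketch of the multi-version idea in Python. It is not how any particular RDBMS implements MVCC: there is no locking, conflict detection, or cleanup of old versions, and `MVCCStore` is a made-up class. It only shows why a reader with an older snapshot is not blocked by a concurrent writer.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers use snapshots."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the most recent commit

    def snapshot(self):
        """Return a timestamp that fixes what a reader is allowed to see."""
        return self._clock

    def write(self, key, value):
        """Commit a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        value = None
        for commit_ts, candidate in self._versions.get(key, []):
            if commit_ts > snapshot_ts:
                break
            value = candidate
        return value

store = MVCCStore()
store.write("car", "Prius")
reader_snapshot = store.snapshot()  # a long-running reader starts here
store.write("car", "Focus")         # a writer commits while the reader is active

old_view = store.read("car", reader_snapshot)
new_view = store.read("car", store.snapshot())
print(old_view, new_view)  # Prius Focus
```

The reader keeps a consistent view without ever waiting on the writer, and the writer never waits on the reader.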
Re:Pfah. by Atzanteol · 2010-09-01 09:32 · Score: 2, Funny

Right now it's like men wearing women's underwear and vice versa.
You mean it makes me feel pretty?

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin
Re:Pfah. by lennier · 2010-09-01 10:01 · Score: 2, Insightful

Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer.
That's fine until someone asks you an unstructured question for which a two-dimensional array cannot contain the answer.
Like, for example, 'Here's an ordered DOM tree of nodes each containing tags, subtrees and/or chunks of CDATA'.
Or 'Here is a set of objects each of which contain their own custom properties not found in others.'
Not every form of useful information in the real world is strictly typeful and represents a well-formed relation over finite domains.

--
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
Re:Pfah. by jimrthy · 2010-09-01 10:02 · Score: 1

They've spent years watching millions (billions?) of users pounding on tens of thousands of data centers spread all over the planet? I'd consider that "proof."
When you start needing to coordinate transactions among continents, scalability will suffer. That's probably why (last I heard) Google has App Engine running from only 1 data center.
Don't get me wrong. What little I've been able to find about Scalaris sounds amazing. But it isn't a silver bullet yet.
Re:Pfah. by FooBarWidget · 2010-09-01 10:25 · Score: 1

But Postgres is not web scale. MongoDB however is.
Re:Pfah. by caluml · 2010-09-01 10:29 · Score: 1

That's fine until someone asks you an unstructured question for which a two-dimensional array cannot contain the answer. Like, for example, 'Here's an ordered DOM tree of nodes each containing tags, subtrees and/or chunks of CDATA'. Or 'Here is a set of objects each of which contain their own custom properties not found in others.'
Questions usually ask something, not state something :)

--
Get your own free personal location tracker
Re:Pfah. by prockcore · 2010-09-01 10:44 · Score: 1

You use Google to defend SQL against NoSQL, conveniently ignoring the fact that Google is a flag waving NoSQL fan. Why do you think they created BigTable?
Re:Pfah. by hey! · 2010-09-01 10:51 · Score: 5, Insightful

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
I've actually been in the business long enough to remember when relational databases were the new thing. What people seem to forget is that modeling your data in a different way than your application does *was the whole point*. The idea was to make data a reusable resource *across applications*. Of course, that turned out to be a lot harder than we thought it would be. Philosophically, one might well ask whether it is possible to understand data at all apart from its intended applications. Of course, by the time we'd figured that out, a whole new generation was coming up trying to create a Semantic Web.
I basically agree that SQL isn't always the right tool for the job. I happen to think certain aspects of the relational model are somewhat broken (e.g. composite keys), and SQL is a pretty crappy query language in any case. But I think because RDBMSs are a mature technology, recently trained programmers don't bother to understand them, and cover that lack of understanding by pooh-pooh-ing the stuff that's over their head. I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model. Naturally they had all kinds of problems on the relational end of things, and tended to view the RDBMS as a kind of pitfall in which bad things inexplicably happen. Consequently, they tended to think of the database as simply a backing store for the application *they* were working on. In some cases this is acceptable, but one often sees abominable schema that are the product of ignorance, pure and simple.
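The NULL confusion mentioned above is easy to demonstrate. This sketch uses SQLite via Python's sqlite3 module; the `employee` table is invented, but the three-valued logic it shows is standard SQL behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, manager TEXT);
    INSERT INTO employee VALUES ('Ann', NULL), ('Ben', 'Ann'), ('Cy', 'Ben');
""")

def names(sql):
    return [name for (name,) in conn.execute(sql)]

# "= NULL" is never true: comparing anything with NULL yields unknown.
equals_null = names("SELECT name FROM employee WHERE manager = NULL ORDER BY name")

# IS NULL is the test that actually finds missing values.
is_null = names("SELECT name FROM employee WHERE manager IS NULL ORDER BY name")

# Ann is excluded here too: "NULL <> 'Ann'" is unknown, not true.
not_ann = names("SELECT name FROM employee WHERE manager <> 'Ann' ORDER BY name")

# Even NULL = NULL evaluates to NULL, which Python sees as None.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(equals_null, is_null, not_ann, null_eq_null)  # [] ['Ann'] ['Cy'] None
```

A programmer who reads NULL as "empty value" expects the first query to return Ann and the third to include her; the relational reading of NULL as "unknown" explains why neither does.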
Naturally, non-relational systems are most attractive where performance is at a higher premium than flexibility. This characterizes many web applications that do a small number of relatively simple things, but do them on a scale that takes special expertise to achieve using a relational model. That was very much the case at the beginning of the relational era, when applications tended to be narrower in scope and query optimization primitive. You thought of order line items as "part-of" an order, whereas in relational thinking they could just as easily be considered attributes of products. This made the programmer's job a lot easier, so long as the RDBMS could process invoices fast enough to make the users happy.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:Pfah. by shutdown+-p+now · 2010-09-01 11:04 · Score: 1

NoSQL is not really about scalability, it is about modelling your data the same way your application does.
If that's the case, then most NoSQL databases would be OODBMS, which is not the case. Quite often they are just as inconvenient to map data to as relational databases are.
No, TFA is spot on - "NoSQL" really means "no ACID", and ACID is just damn too convenient to forgo. If they actually manage to pull off a truly scalable ACID DB (relational or not), it will be awesome, and we can get rid of the "NoSQL" hack for good.
Re:Pfah. by pankkake · 2010-09-01 11:04 · Score: 1

Is randomly losing data ACID? If no, MongoDB is not ACID.

--
Kill all hipsters.
Re:Pfah. by takev · 2010-09-01 11:08 · Score: 1

Oh, did he forget to tell you that the family tree goes all the way back to Adam and Eve and includes every person who ever lived and died on earth? In other words, the tree is quite a few gigabytes in size.

In that case you may want to look at a graph database with its own query language that is designed for this kind of data. Not everything that can be stored in a RDBMS should be stored in one.

To be honest, I have not found a perfect graph database yet, but I have a feeling that one will be created quite soon.
Re:Pfah. by LurkerXXX · 2010-09-01 11:14 · Score: 4, Informative

An ACID compliant RDBMS can't even get read access to the user, car, friend, picture and pet_survey_answer table set as long as any of the million users of the system is making a change to his data, even if the application only locks one table at a time for write access, let alone the problem of a million users trying to gain write access to the same table at the same time.
Wow. Just wow. Any serious ACID compliant RDBMS can do that with no problem.
Re:Pfah. by fishbowl · 2010-09-01 12:20 · Score: 1

"You can run a Google size database from MySQL, but you can't use to MySQL* to implement a search solution with performance like Google, without requiring much much much hardware."
There's nothing else that can meet such requirements without "much much much hardware", either.

--
-fb Everything not expressly forbidden is now mandatory.
Re:Pfah. by h4nk · 2010-09-01 13:44 · Score: 1

thank you for proving my point.
Re:Pfah. by daveime · 2010-09-01 13:47 · Score: 1

After all, MySql is why slashdot is so relia~ `} v* m& + ' ,
You can choose 100% reliability or 100% scalability, not both.
The same "results" from a NoSQL database would be :-
"After all, MySql is"
"After all"
"lemons"
NULL
But hell, at least you get your data FAST, who gives a fuck if it's the right data?
Re:Pfah. by Johnno74 · 2010-09-01 14:00 · Score: 5, Interesting

Totally agree. Only problem is writing recursive CTE queries is beyond most programmers. Hell, a lot of programmers struggle with anything but simple inner joins.
IMHO CTEs are one of the most underused and powerful features of SQL. Not just for recursive queries, but for bridging the gap between functional and procedural programming.
I write all my complex queries as a series of simple CTEs now - each CTE gets me one step closer to the actual query I need, and the magic of the query optimizer combines them all into a single query plan. Makes testing, debugging and maintaining a complex query about a million times easier.
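A rough sketch of that "one step per CTE" style, run through Python's sqlite3 module so it is easy to try. The `orders` table, its columns and the sample rows are invented for the illustration, not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, month TEXT, amount INTEGER);
INSERT INTO orders VALUES
    ('acme', '2010-06', 100), ('acme', '2010-06', 50),
    ('acme', '2010-07', 30),  ('zorg', '2010-06', 20);
""")

# Each CTE is a small, separately testable step; the final SELECT
# only has to read from the last one.
query = """
WITH june AS (        -- step 1: narrow to one month
    SELECT customer, amount FROM orders WHERE month = '2010-06'
),
totals AS (           -- step 2: total per customer
    SELECT customer, SUM(amount) AS total FROM june GROUP BY customer
),
big AS (              -- step 3: keep only the large totals
    SELECT customer, total FROM totals WHERE total >= 100
)
SELECT customer, total FROM big ORDER BY customer
"""
big_june_customers = conn.execute(query).fetchall()
```

To debug a step, point the final SELECT at `june` or `totals` instead of `big` and look at the intermediate rows.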
Re:Pfah. by hanshotfirst · 2010-09-01 14:50 · Score: 1

So what do you do when the boss asks for data in a way the application didn't anticipate? Give me how many widgets we sold in June to evil inventors in the tri-state area, for example. I know I can do that relationally, and I know I can with cube-based DBs, but I've not understood how NoSQL systems handle the unexpected ad hoc needs, so I'm genuinely interested and not trolling.

--
Why, oh why, didn't I take the Blue Pill?
Re:Pfah. by gfody · 2010-09-01 17:03 · Score: 1

PS. Don't mix terms like NoSQL and SQL. One is a product, the other is a query syntax.
which one is a product?

--

bite my glorious golden ass.
Re:Pfah. by Sircus · 2010-09-01 19:03 · Score: 1

His brain's warped, but not by MySQL, which supports this just fine, even with MyISAM.

--
PenguiNet: the (shareware) Windows SSH client
Re:Pfah. by mikelieman · 2010-09-01 19:39 · Score: 2, Insightful

Unless you're writing the code for the database engine, you are NOT a database programmer, you're an application programmer...

--
Technology -- No Place For Wimps! Grateful Dead and Jerry Garcia Chatroom -- http://www.wemissjerry.org
Re:Pfah. by SQLGuru · 2010-09-01 23:45 · Score: 1

But Application Programmer doesn't distinguish my strengths from any random front-side developer (don't get me wrong, I code Java and C#, too. But when playing up my strengths, it's writing good DB-side code.). I find my skills to be somewhat unique (few people strong in DB-side code) in the large companies I've worked for.....and once the skill is discovered, teams are amazed at how much they benefit from the skill. It is a different mind-set. When I started learning that side of it, I equated the shift as being at the same level as a shift from procedural to object-oriented programming -- a completely different paradigm. That's also why I think most people aren't strong on the DB side; making paradigm shifts is hard for someone who's been doing something for a while.
Re:Pfah. by bluefoxlucid · 2010-09-02 01:21 · Score: 1

MySQL InnoDB works pretty well. MyISAM doesn't.

--
Support my political activism on Patreon.
Re:Pfah. by bluefoxlucid · 2010-09-02 01:34 · Score: 2, Insightful

That depends. If I'm storing video data I don't want a relational database. A small-scale family tree might be good in a proprietary format. A large-scale family tree might also be good in a proprietary format. The Windows registry is inherently hierarchical and needs a non-relational model, just like file systems (quit arguing that file systems should be relational DBs; the current model is fine).
A large-scale family tree that I need to use to look up other information with absolute identity (i.e. there are 15 James Clyde Simmons in the world, 7 in my city somehow, and 3 in my zip code!) needs to at least sync its individual identifiers with the primary key of a RDBMS holding all the other data in any case where relational analysis is also needed i.e. find me all PERSONS with $ATTRIBUTE. Keeping these two things in absolute sync requires a specialized database engine; but you can write program code that fakes it for all useful cases if you keep the primary common identifier unique and static.
There are going to be tasks where an RDBMS is excellent and anything else is going to be complete failure. College information systems, forever, have to track students vs student IDs vs all completed courses and grades vs when those courses were completed vs what courses the student is enrolled in now vs if they've paid for their tuition... this is the wrong kind of information to list line by line (flatfile) or hierarchically. Maybe I want to see everyone enrolled in MATH314, or everyone enrolled in MATH314 class DXA, or everyone enrolled in MATH314 on Middlesex campus. Maybe I want to see all courses James Peak is enrolled in, or has enrolled in ever. For these tasks, you need an RDBMS.
There are also going to be good flatfile cases-- MP3s, video files, XCF, etc. As well, there will be stores of information that must fall into hierarchical organization-- file systems, genealogy databases, the Windows registry. These should optimally not use an RDBMS structure.
There will be tasks that operate on one set of data but bring a corner case that benefits from another method of organization. For example, looking through a database at an insurance company to check for dependents (parents/children/spouses). Of course hierarchical databases might be better for this operation; but all the information and all operations you'll ever do is going to go better in an RDBMS, and any other storage method will require either tons of cross-indexing (to the point of implementing a BAD RDBMS) or lots of memory and time to do 0.06 second queries in 10 minutes. Too slow, too broken. The corner case operations cause trouble, but what can you do?

--
Support my political activism on Patreon.
Re:Pfah. by DuckDodgers · 2010-09-02 01:34 · Score: 1

So how did you learn your particular skills? I've been forced to develop some small proficiency with database queries and stored procedures because I work at a small company and if I couldn't fix the problem, it did not get fixed. I used a combination of information I got from web searches and trial and error, and while my latest solutions are better than the earlier ones, I suspect a skilled database user would still run screaming from either.

Are there books you read? Classes you took? Certifications?
Re:Pfah. by bluefoxlucid · 2010-09-02 01:50 · Score: 1

SQL is for scanning through structured information, not documents full of bullshit. If I want to look for a candidate in my huge pile with VB.NET experience, I have two options:
Option 1: massive grep through all these 10,000 PDF/Word documents for "VB.NET," "VB .NET," "VB" and ".NET" ... then try to extract phone numbers and names out of them (they may be in odd, unstructured formats), then pick ones I like and look through first to make sure they've got 2 years experience and not "Learning VB.NET" or "Took a programming course on VB.NET" ...
Option 2: (SELECT tbl_candidates.lastname, tbl_candidates.firstname, tbl_skills.skillname, tbl_skills.yearsexp FROM tbl_candidates INNER JOIN tbl_skills ON tbl_candidates.ID=tbl_skills.CandidateID WHERE tbl_skills.skillname = 'VB.NET';) and hire an HR secretary to read each resume we get and enter/update the candidate's information. Store the resumes as files with a particular naming convention based on tbl_candidates.ID and leave a link next to the candidate in the Web application to retrieve the original resume.
I built a resume searching system for a company that had to hire an intern for 3 months to enter in all their resumes from 4 large filing cabinets, not to mention all the electronic ones they got online. There's a reason Monster and Dice want discrete skills and phone numbers entered, rather than a blob of text that they grope at.
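For the curious, Option 2 can be tried end to end with Python's sqlite3 module. This is only a sketch: the table and column names follow the query above, but the column types, the sample rows and the `find_candidates` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_candidates (
    ID INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT
);
CREATE TABLE tbl_skills (
    CandidateID INTEGER REFERENCES tbl_candidates(ID),
    skillname TEXT, yearsexp INTEGER
);
-- made-up sample data
INSERT INTO tbl_candidates VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
INSERT INTO tbl_skills VALUES (1, 'VB.NET', 3), (2, 'VB.NET', 1), (2, 'Java', 5);
""")

def find_candidates(skill, min_years):
    """Return (lastname, firstname, skillname, yearsexp) rows for one skill."""
    return conn.execute(
        """
        SELECT c.lastname, c.firstname, s.skillname, s.yearsexp
        FROM tbl_candidates AS c
        INNER JOIN tbl_skills AS s ON c.ID = s.CandidateID
        WHERE s.skillname = ? AND s.yearsexp >= ?
        ORDER BY c.lastname, c.firstname
        """,
        (skill, min_years),  # bound parameters, never string concatenation
    ).fetchall()

vb_veterans = find_candidates("VB.NET", 2)
```

The `yearsexp >= ?` filter is what replaces the manual "make sure they've got 2 years experience" pass from Option 1.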

--
Support my political activism on Patreon.
Re:Pfah. by SQLGuru · 2010-09-02 02:52 · Score: 1

Mine was the school of hard knocks and a really great team in my first job out of college. Back in 1994, the team I joined was a "rogue" group within the IS org (back when Information Systems was not lumped under Information Technology). Everyone else was doing Oracle and they were doing MS SQL Server 4.3 on OS/2 2.1 servers. Being the "new guy," I got most of the database work (everyone else thought it was less fun) as well as being tasked with making sure the servers were up. Even still, the team had some really sharp people in it that were also good mentors (in my opinion, the best way to learn). My first formal database class (even with a CS degree) wasn't until 2002 and that was for Oracle, but by then I was widely considered an expert in both SQL Server and Oracle and the class was just to get introduced to whatever the latest features were at the time.
Re:Pfah. by mr_mischief · 2010-09-02 03:17 · Score: 1

You could also use something like IMS if the hierarchy is so strong. Not all DBMSes are relational, after all.
Re:Pfah. by DragonWriter · 2010-09-02 06:53 · Score: 1

That's fine until someone asks you an unstructured question for which a two-dimensional array cannot contain the answer.
Like, for example, 'Here's an ordered DOM tree of nodes each containing tags, subtrees and/or chunks of CDATA'.
That's neither unstructured nor a question, and, since it is not a question, it has no answer. But you can certainly represent an ordered tree of nodes, each of which may be a tag, a subtree, or a chunk of text, in a relational schema, and answer pretty much any answerable question about it using standard SQL.
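One way this can look, as a sketch only: an adjacency-list table plus a recursive CTE, run through Python's sqlite3 module. The `node` table, its columns, and the zero-padded `path` sort key are all choices made for this illustration (the padding caps it at 1000 siblings per parent), and `printf` is SQLite's spelling of the formatting function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for the root
    position  INTEGER NOT NULL,             -- order among siblings
    kind      TEXT NOT NULL,                -- 'tag' or 'text'
    value     TEXT NOT NULL
);
-- <p>Hello <b>world</b>!</p>, inserted out of order on purpose
INSERT INTO node VALUES
    (5, 1, 2, 'text', '!'),
    (4, 3, 0, 'text', 'world'),
    (3, 1, 1, 'tag',  'b'),
    (2, 1, 0, 'text', 'Hello '),
    (1, NULL, 0, 'tag', 'p');
""")

# Walk down from the root, building a sortable path such as '000.001.000';
# sorting by that path yields depth-first document order.
DOCUMENT_ORDER = """
WITH RECURSIVE walk(id, kind, value, path) AS (
    SELECT id, kind, value, printf('%03d', position)
    FROM node WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.kind, n.value,
           walk.path || '.' || printf('%03d', n.position)
    FROM node AS n JOIN walk ON n.parent_id = walk.id
)
SELECT kind, value FROM walk ORDER BY path
"""
nodes_in_order = conn.execute(DOCUMENT_ORDER).fetchall()
text_content = "".join(v for kind, v in nodes_in_order if kind == "text")
```

Whether this is pleasant compared with a native tree store is a separate argument; the sketch only shows that the ordered tree and a "give me the text content" question both fit in plain tables.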

Or 'Here is a set of objects each of which contain their own custom properties not found in others.'
Again, not a question. The relational model can handle that kind of data, though, and SQL can be used to answer most reasonable questions about it, though to be sure it's not what most RDBMSs (or the SQL language) are optimized for. But that's more a problem with the particular products and SQL as a language than with the relational model.
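A common (and often criticized) way to hold per-object custom properties in tables is one row per property. The sketch below uses Python's sqlite3 module; the `object_property` table, the sample rows and both helper functions are invented for the example, and storing every value as TEXT is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_property (
    object_id INTEGER NOT NULL,
    name      TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (object_id, name)
);
-- three objects, each with a different set of properties
INSERT INTO object_property VALUES
    (1, 'color', 'red'), (1, 'wheels', '4'),
    (2, 'color', 'red'), (2, 'wingspan', '30'),
    (3, 'wheels', '2');
""")

def objects_where(name, value):
    """IDs of objects that have the given property set to the given value."""
    rows = conn.execute(
        "SELECT object_id FROM object_property "
        "WHERE name = ? AND value = ? ORDER BY object_id",
        (name, value),
    )
    return [object_id for (object_id,) in rows]

def properties_of(object_id):
    """All properties of one object as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM object_property WHERE object_id = ?",
        (object_id,),
    )
    return dict(rows)

red_things = objects_where("color", "red")
plane = properties_of(2)
```

Queries that combine several properties need one self-join per property, which is a fair example of SQL not being optimized for this shape of data.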
Re:Pfah. by GWBasic · 2010-09-02 08:46 · Score: 2, Informative
So, remember, NoSQL means that's anything but SQL. It's not a standard; rather, it's an honest effort to try to experiment with different database techniques where traditional SQL just isn't meeting an industry need. Key-value databases aren't going to satisfy the "give me how many widgets we sold in June to evil inventors in the tri-state area" need; but they do satisfy the scalability need for sites that have millions of concurrent users.
Regarding Mongo, the NoSQL database that I use, it can answer the "give me how many widgets we sold in June to evil inventors in the tri-state area." Basically, instead of having 100 tables with foreign key relationships, you'll have 10 collections of "documents," which are really just data structures. You can query deeply into data structures and return partial data structures.
Let's assume I have an "invoices" collection. Each invoice has an array of "line items", and each item has a count. I can do the following in Mongo:
- Give me all of the documents from the invoices collection that are for companies in the tri-state area and have widgets X, Y, and Z: http://www.mongodb.org/display/DOCS/Dot+Notation+(Reaching+into+Objects)
- (Use the above query,) but instead of returning full documents, just return the parts of the documents that represent the "line items" that match widgets X, Y, and Z: http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
- (Use the above query,) but just return the counts of widgets sold: http://www.mongodb.org/display/DOCS/Aggregation
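To make the three steps concrete without needing a running server, here is a sketch in plain Python over a list of dicts. It is not the Mongo API, just the same shape of data; the sample invoices, the `region` field, and reading "have widgets X, Y, and Z" as "have any of them" are all assumptions.

```python
from collections import Counter

# Hypothetical "invoices" collection: each document nests its line items.
invoices = [
    {"company": {"name": "Alpha", "region": "tri-state"},
     "line_items": [{"widget": "X", "count": 3}, {"widget": "Q", "count": 1}]},
    {"company": {"name": "Beta", "region": "elsewhere"},
     "line_items": [{"widget": "X", "count": 10}]},
    {"company": {"name": "Gamma", "region": "tri-state"},
     "line_items": [{"widget": "Y", "count": 2}, {"widget": "Z", "count": 5}]},
]
WANTED = {"X", "Y", "Z"}

# Step 1: whole documents, filtered by reaching into nested fields.
matching = [
    inv for inv in invoices
    if inv["company"]["region"] == "tri-state"
    and any(item["widget"] in WANTED for item in inv["line_items"])
]

# Step 2: only the matching line items, not the full documents.
items = [
    item for inv in matching for item in inv["line_items"]
    if item["widget"] in WANTED
]

# Step 3: aggregate the counts per widget.
sold = Counter()
for item in items:
    sold[item["widget"]] += item["count"]
```

In a real document store the filtering and aggregation would run server-side; the sketch only shows that the question is answerable from nested documents.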
Again, NoSQL isn't a standard. It's basically experimenting with different ways of having a database with the hopes of finding one that's easier to work with. Mongo is a lot closer to SQL than things like Key-Value databases.
--
No, I will not work for your startup
Re:Pfah. by DragonWriter · 2010-09-02 11:17 · Score: 1

For this reason I suggest that app language designers work on better fitting RDBMS and SQL rather than the other way around
I'd suggest instead that both app and query language designers should work on better fitting generalized handling of data using the relational model; SQL has a lot of limitations that have nothing to do with the essentials of the relational model, and much of the evolution of SQL has been toward shoehorning non-relational features into it rather than addressing the limitations that prevent it from solving some problems using the tools naturally available within the set theoretic model notionally underlying "relational" databases.
Certainly, the relational model has a lot to recommend it, and it's good that many application languages (enabled largely by the trend to move features from functional languages into popular OO languages) are developing convenient and elegant means of expressing some relational operations within the application language without resort to external languages. But there is no particular reason that conforming closely to the quirks of SQL should be the goal.
Re:Pfah. by badkarmadayaccount · 2010-09-02 18:23 · Score: 1

I think the issue is that SQL is procedural, and query languages are better when declarative. Or at least abstract away the table metaphor - a Google-like query interface.

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Pfah. by badkarmadayaccount · 2010-09-02 18:42 · Score: 1

I think ORM tools need to be subject to compiler optimizations.

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Pfah. by FooBarWidget · 2010-09-03 03:28 · Score: 1

Woooooosh.
Watch the video.
Re:Pfah. by DragonWriter · 2010-09-03 03:58 · Score: 2, Informative

I think the issue is that SQL is procedural, and query languages are better when declarative.

SQL, as such, is declarative. Many RDBMSs include, in addition to SQL, an SQL-derived procedural scripting language (Oracle's PL/SQL, and so on.)
Re:Pfah. by greg1104 · 2010-09-03 07:36 · Score: 1

To provide some better references to what you've said here: the article PostgreSQL 8.4: Common Table Expressions (CTE)... covers this feature in PostgreSQL, and includes links to the documentation of other database products that support this feature to compare against. Same author also did Using recursive CTEs to represent tree structures on this topic.
Re:Pfah. by m50d · 2010-09-03 11:18 · Score: 1

Again, you have it backwards - traditional SQL ends up *more* like a "blob of text" than the modern nosql systems. Your resumes are a very good example - it'd be much more effective to use something like couchDB, where you can store them in a resume data structure that makes sense for a resume, than an SQL database that thinks everything is a 2D table.

--
I am trolling
Re:Pfah. by modmans2ndcoming · 2010-09-03 13:49 · Score: 1

I know... which is why I said RDBMS. But I don't care how hierarchical your data is: if you write a proper schema for it, an RDBMS with correct indexing will return a query on a large dataset orders of magnitude faster than a hierarchical database.
Re:Pfah. by arivanov · 2010-09-05 05:20 · Score: 1

Bingo!!! SQL is a "fortran" era language and looks abhorrent to anyone who has been indoctrinated into object orientation.
However, there is a way out.
All you need is to map accessor methods in your objects onto SQL primitives. You fight your disgust with the fortran-likeness of SQL once or pay someone to fight it before you start coding.
Initially, you will have a much slower software compared to "normal" objects. However, you can now have as many threads as you like working on the same problem. You can have as many machines as you like. You can have machines in different locations. It all happens automagically if one condition is met - SQL is put there properly day 1 by design, not as retrofit and any "in-memory" operations which cache SQL locally are allowed only as special exemptions.
However, that is not what Joe Average Programmer does. He will copy all the data out of SQL, load it into objects (handcoded or through a persistence framework), work in memory and update SQL only from time to time. From there on Joe Average Programmer needs mutexes (you can pretty much forget about that concept with SQL accessors), HA frameworks, clustering, heartbeat and a LOT of other tripe that sits nicely and looks fancy on his CV.
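A small sketch of the "accessors mapped onto SQL" idea, using Python's sqlite3 module. The `SqlCounter` class and the `counters` table are invented for the illustration; the point is that the object keeps no copy of its state, so two handles on the same row cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
)

class SqlCounter:
    """Object whose only state lives in SQL; nothing is cached in memory."""

    def __init__(self, conn, name):
        self._conn = conn
        self._name = name
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)",
                (name,),
            )

    @property
    def value(self):
        # Accessor = SELECT: always reads the current row.
        row = self._conn.execute(
            "SELECT value FROM counters WHERE name = ?", (self._name,)
        ).fetchone()
        return row[0]

    def add(self, n):
        # Mutator = one UPDATE: the database serializes concurrent
        # increments, so the application needs no mutex of its own.
        with self._conn:
            self._conn.execute(
                "UPDATE counters SET value = value + ? WHERE name = ?",
                (n, self._name),
            )

first = SqlCounter(conn, "hits")
second = SqlCounter(conn, "hits")  # a second handle on the same row
first.add(2)
second.add(3)
```

It is slower per call than an in-memory field, which is the trade described above; in exchange, the second handle could just as well live in another process or on another machine talking to a shared database server.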

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:Pfah. by bluefoxlucid · 2010-09-07 00:52 · Score: 1

Each entry in an SQL table would link Education to (PersonID, SchoolID, Years, DegreeType, DegreeSubject, GraduationMonthYear). Each section of a resume (skills, work experience, education...) would have its own table, where we'd identify each row on the ID of the person.
HR wants to search for SKILL=$MYSKILL YEARSEXP=$YEARS, and the computer wants to do this by looking through a pile of data about skills with those attributes. It'll fetch the applicant's information by looking up the applicant. If you want the whole thing, it'll scan each table (indexed on PersonID) by the applicant's ID using a B*Tree or something.
Nobody wants discrete resume objects. The computer has to then search through each discrete resume object, and they're hard to index because they can be variable sizes because of repeated structured data that you happen to want to search on. You can either structure that data with all other data of that type and index it; or you can scatter it all over the place and try to index it still so you get the illusion of having a bunch of tables, but a nightmare of disk activity and cache misses trying to do searches.
The problem is you're thinking like a human, dealing with data a human wants to deal with in the way a human will deal with it. You're forgetting you need to work with a computer, and the computer needs to do certain logical operations. The only fast, scalable way to do this is to store the data suitable for a machine and index it with a data structure a machine can traverse quickly. The interface can make it look human-readable; this is called "output."

--
Support my political activism on Patreon.
Re:Pfah. by m50d · 2010-09-07 07:49 · Score: 1

Nobody wants discrete resume objects. The computer has to then search through each discrete resume object, and they're hard to index because they can be variable sizes because of repeated structured data that you happen to want to search on. You can either structure that data with all other data of that type and index it;
Of course you need to structure it. It's just that you can structure it in a way that makes sense for a resume, rather than cramming it into a set of 2d tables the way you have to if you're using SQL. Look at the performance of modern non-sql databases; I think you'll be pleasantly surprised.
The problem is you're thinking like a human, dealing with data a human wants to deal with in the way a human will deal with it. You're forgetting you need to work with a computer, and the computer needs to do certain logical operations. The only fast, scalable way to do this is to store the data suitable for a machine and index it with a data structure a machine can traverse quickly. The interface can make it look human-readable
The computer is perfectly capable of indexing it as a structured document - it's just that structure doesn't have to be SQL. Sure, the internal representation is probably some big bunch of tables - just as the internal representation of those is ultimately a big block of bytes. And you can do your own mapping from the logical resume to a set of sql tables if you want - but this leads to writing repetitive code to solve a problem that's already been solved, much like writing your own database would.

--
I am trolling

digg does not need to worry anymore by Dan667 · 2010-09-01 04:57 · Score: 5, Funny

digg has chased all their users away with the new version of their site so they could probably change over to MS Access and be ok.

Re:digg does not need to worry anymore by Pojut · 2010-09-01 04:59 · Score: 2, Insightful

offtopic:
Considering how fanatical digg users can be, I can't possibly imagine why they thought it was a good idea to implement the changes they've made.

--
Living With a Nerd
Re:digg does not need to worry anymore by Kaboom13 · 2010-09-01 05:13 · Score: 2, Interesting

Because the entire site had been completely overwhelmed by spammers? Digg went from a great site to go see what's new to a glorified RSS feed for cracked.com, college humor and reddit. They had to change something.
Re:digg does not need to worry anymore by BabyDuckHat · 2010-09-01 05:48 · Score: 5, Funny

Yeah, now instead of being a glorified RSS feed for reddit, they're an actual RSS feed for reddit. Great change!
Re:digg does not need to worry anymore by Dan667 · 2010-09-01 05:53 · Score: 4, Insightful

actually most of the change was to allow auto submitting of stories from big publishers/companies. They basically changed digg into a paid-for RSS ad service. If you hated the gaming of the old site digg I am sure you just stopped using the new site digg altogether. No one goes to a website to read ads.
Re:digg does not need to worry anymore by Anonymous Coward · 2010-09-01 07:06 · Score: 1, Insightful

The first time I boycotted Digg was when they had a top headline or story where the URL didn't even resolve. Like 2,000 diggs for a host not found. I then went back for the almost safe for work mindless BS that they had for a while. Remember, digg used to be called the L1 cache for slashdot. Now, it looks like some kind of Windows XP clone and I have no idea what the content is supposed to be targeted for, so I think I'm done for now with them.
All around, a poor website as time has gone on. It was at least useful as comic relief, but that is gone as well now, it's not really worth anything anymore...
Re:digg does not need to worry anymore by Dan667 · 2010-09-01 08:12 · Score: 1

you must not have visited the site since the upgrade. People are not stupid.
Re:digg does not need to worry anymore by mikelieman · 2010-09-01 19:43 · Score: 2, Funny

And they would have gotten away with it if it wasn't for those meddling kids!

--
Technology -- No Place For Wimps! Grateful Dead and Jerry Garcia Chatroom -- http://www.wemissjerry.org
Re:digg does not need to worry anymore by ukyoCE · 2010-09-02 07:42 · Score: 1

You lost me at "Digg went from a great site..." :)

Berkeley DB by nacturation · 2010-09-01 04:58 · Score: 3, Funny

Didn't Berkeley prove back in the 60s and 70s that acid was scalable?

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.

Re:Berkeley DB by Anonymous Coward · 2010-09-01 05:04 · Score: 1, Funny

Your grasp of the obvious is on par with a character from a Dan Brown novel.
Re:Berkeley DB by Zak3056 · 2010-09-01 05:19 · Score: 1

Didn't Berkeley prove back in the 60s and 70s that acid was scalable?
At the very least, they proved it was salable...

--
What part of "shall not be infringed" is so hard to understand?
Re:Berkeley DB by WolfWithoutAClause · 2010-09-01 13:15 · Score: 1

At least up to jumbo sizes, but then they fall over in a heap on the floor.
http://www.museumofhoaxes.com/hoax/top/experiments/

--
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"

Interesting thesis by Peeteriz · 2010-09-01 05:09 · Score: 5, Interesting

In essence, TFA claims that if the traditional ACID guarantee "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B" is not abandoned (as in NoSQL systems), but is even strengthened to a guarantee that the result will always be as if they arrived in A-B-C order, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
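A toy illustration of why a fixed order helps replication, in plain Python. The three transactions and the starting balance are made up; this is not the system from TFA, just the ordering argument in miniature.

```python
def apply_in_order(initial, transactions):
    """Run transactions one by one against a copy of the initial state."""
    state = dict(initial)
    for txn in transactions:
        txn(state)
    return state

def txn_a(state):            # A: deposit 100
    state["balance"] += 100

def txn_b(state):            # B: double the balance
    state["balance"] *= 2

def txn_c(state):            # C: withdraw 150 if funds allow
    if state["balance"] >= 150:
        state["balance"] -= 150

initial = {"balance": 50}

# Traditional guarantee: any serial order is acceptable, so two replicas
# that happen to pick different orders end up in different states.
order_abc = apply_in_order(initial, [txn_a, txn_b, txn_c])
order_bac = apply_in_order(initial, [txn_b, txn_a, txn_c])

# Stronger guarantee: every replica applies the agreed order A-B-C,
# so they match without comparing notes afterwards.
replica_1 = apply_in_order(initial, [txn_a, txn_b, txn_c])
replica_2 = apply_in_order(initial, [txn_a, txn_b, txn_c])
```

Both `order_abc` and `order_bac` are "correct" under the weaker guarantee, which is exactly why replicas would otherwise have to coordinate to agree on one of them.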

Re:Interesting thesis by capnchicken · 2010-09-01 05:37 · Score: 1

Determinism solves many things in DB design; that's why things like WITH SCHEMABINDING for views and user defined functions in MS SQL make things run so much faster. With over 40 years of RDBMS design, it's odd that this path has never been gone down before. But the whole turning "out that the deterministic scheme performs horribly in disk-based environments" makes perfect sense if this is something that scales very well in high memory environments that didn't exist until now.
Now THIS is news for nerds, it's too bad I had to scroll through so many LSD/Acid (hurr hurr drugs) jokes to get down to a comment of someone who actually read this.

--
A libertarian shat on my carpet once. Claimed the free market would sort it out. -Ford Prefect(8777)
Re:Interesting thesis by julesh · 2010-09-01 07:42 · Score: 1

In essence, TFA claims that if the traditional ACID guarantee "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B" is not abandoned (as in NoSQL systems), but is even strengthened to a guarantee that the result will always be as if they arrived in A-B-C order, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
Which, to anyone who has seriously thought about how to implement atomic transactions in a nosql environment, should not exactly come as a shock. It's the obvious solution to the problem, and I'm sure if you dig into it you'll find hundreds of implementations that work just like that.
The interesting problem then becomes coming up with an efficient way of deciding what that complete ordering on transactions is going to be. You can only get so far with a single server that assigns serial numbers to the transactions on arrival (at least if your data is distributed and not replicated; in the latter situation you should be able to do so easily).
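The single-server approach mentioned above is simple to sketch in plain Python. The `Sequencer` class is hypothetical; a lock stands in for "one server handles every arrival", which is also what makes it the bottleneck.

```python
import itertools
import threading

class Sequencer:
    """Hands out gap-free serial numbers in arrival order."""

    def __init__(self):
        self._numbers = itertools.count(1)
        self._lock = threading.Lock()

    def assign(self, txn):
        # Every transaction passes through this one critical section,
        # which is why a single sequencer only scales so far.
        with self._lock:
            return (next(self._numbers), txn)

sequencer = Sequencer()
stamped = [sequencer.assign(name) for name in ("A", "B", "C")]
```

The serial number is the complete ordering: anyone holding the stamped transactions can sort by it and replay them identically.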
Re:Interesting thesis by bar-agent · 2010-09-01 09:19 · Score: 1

I'd say the obvious solution is not to use a server to assign serial numbers, but instead to use, say, a hash of the operations/data involved or the submission timestamp coupled with the global static thread number or something.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Interesting thesis by DragonWriter · 2010-09-01 09:51 · Score: 1

Which, to anyone who has seriously thought about how to implement atomic transactions in a nosql environment, should not exactly come as a shock. It's the obvious solution to the problem, and I'm sure if you dig into it you'll find hundreds of implementations that work just like that.
The few distributed "NoSQL" implementations (like Scalaris) that offer strong consistency do this; but it's about consistency, not atomicity, and most distributed NoSQL implementations don't offer strong consistency guarantees, so they don't do this. (Note that this is a choice, not a failure: it has been proven -- the CAP theorem -- that for a system on a network which can partition, you can guarantee only one of consistency and availability, and there are definitely use cases for either guarantee.)
Re:Interesting thesis by julesh · 2010-09-01 10:05 · Score: 1

I'd say the obvious solution is not to use a server to assign serial numbers, but instead to use, say, a hash of the operations/data involved or the submission timestamp coupled with the global static thread number or something.
The problem then is that transactions may arrive out of their globally defined order... you either have to back out subsequent ones and reapply in the correct order if this happens, or wait for some indeterminate period of time before applying a transaction to make sure one doesn't arrive after it that should be executed before it.
I'm sure there is a solution to this that doesn't present a scalability issue, but I don't really see it right now. My current project is likely to be hitting this kind of issue in ~12-18 months, so I keep thinking about it, but right now the serial number server seems to be the only practical way.
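One reason the serial number server is practical, sketched in plain Python: with gap-free serial numbers a node knows exactly which transaction it is still waiting for, so it can hold back early arrivals instead of backing anything out. The `InOrderApplier` class is made up for the illustration and assumes serials start at 1 with no gaps.

```python
class InOrderApplier:
    """Buffers out-of-order arrivals and applies them in serial order."""

    def __init__(self):
        self.next_serial = 1   # the serial number we are waiting for
        self.pending = {}      # early arrivals, keyed by serial number
        self.applied = []      # transactions applied so far, in order

    def receive(self, serial, txn):
        self.pending[serial] = txn
        # Drain every transaction that is now contiguous with what
        # has already been applied.
        while self.next_serial in self.pending:
            self.applied.append(self.pending.pop(self.next_serial))
            self.next_serial += 1

node = InOrderApplier()
node.receive(2, "B")               # arrives early, held back
held_after_first = list(node.applied)
node.receive(3, "C")               # still waiting for serial 1
node.receive(1, "A")               # gap filled, everything drains
```

With timestamps or hashes there is no way to tell that nothing earlier is still in flight, which is the indeterminate wait described above.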

Possible != Practical by Tablizer · 2010-09-01 05:17 · Score: 3, Insightful

A bigger issue may be the cost of ACID even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing say 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.

--
Table-ized A.I.

Re:Possible != Practical by Tablizer · 2010-09-01 05:40 · Score: 1

Accounting must balance or you have boat-loads of headaches. I've seen some spend months tracking down a few pennies of difference. The transaction paths are often too complicated to just plug the difference with a fudge; it creates unintended consequences down the line. It's kind of like trying to lie about a subject that you don't know much about; your lie-web unravels on scrutiny from experts and the diligent.
And further if you are sued and it comes out that you skipped ACID to cut costs, the jury won't be very lenient. The car industry found this out for situations where they skipped safer designs to shave some bucks.

--
Table-ized A.I.
Re:Possible != Practical by sarkeizen · 2010-09-01 05:44 · Score: 1

Ok all puns on "acid" aside (especially when you add adjectives like "big-iron"). The point of the article seems to be about scaling out - specifically with cheaper hardware. I agree that one's choice of tools is a business decision (so is everything in business) but it's not like using MySQL or postgres is somehow cost prohibitive.
Re:Possible != Practical by Peeteriz · 2010-09-01 05:54 · Score: 2, Insightful

Typically the NoSQL approach just shifts the problems from the database layer to the application programmer - if it's simply ignored, a typical app can't cope with unpredictable/corrupt data being returned from the db, and results in weird bug reports that cost a lot of development time to find and fix; and with these fixes parts of the ACID compliance are simply re-implemented in the app layer.
You gain some performance of the db, you lose some (hopefully less) performance in the app, and it costs you additional complexity and programmer-time in the app.
Re:Possible != Practical by Tablizer · 2010-09-01 06:00 · Score: 1

Given all else being equal, supporting ACID is going to require more hardware than not supporting ACID (as long as living with or ignoring more integrity errors is not a huge problem, which is probably domain-specific).

--
Table-ized A.I.
Re:Possible != Practical by Tablizer · 2010-09-01 06:08 · Score: 1

Not necessarily. For one, it could be dumping the extra processing to the user's browser, whose CPU cycles are generally not part of a company's cost (if they don't overdo it).
Second, the cost of dealing with problems may not be that big in some domains. They may simply dump, skip, or ignore the error or transaction(s). It's all about weighing the trade-offs. Do you spend X dollars to reduce the number of pissed users by Y percent, for example.

--
Table-ized A.I.
Re:Possible != Practical by Troy+Roberts · 2010-09-01 06:16 · Score: 1

Now, if only you had read the article
Re:Possible != Practical by Tablizer · 2010-09-01 06:38 · Score: 1

Perhaps the word "big-iron" was misleading on my part. My apologies. As I mentioned nearby, ACID is going to cost more than non-ACID even if cheap boxes are being used. ACID will require more "cheap boxes" on average for the same user volume.

--
Table-ized A.I.
Re:Possible != Practical by sarkeizen · 2010-09-01 08:57 · Score: 1

Possibly (depending on what is meant by 'all else'), but outside of labour I doubt that distinction would show up very often. I suspect that we are talking about something like 20% per-cpu.
Re:Possible != Practical by shutdown+-p+now · 2010-09-01 11:08 · Score: 1

The point of "NoSQL" is that you don't actually always need all guarantees of ACID. It lets you dump those (depending on the specific solution you use, the set of those dumped is different) in exchange for scalability.
The problem, as you rightly point out, is that many people who don't understand ACID think they don't need it, while they actually do. It's the same fundamental problem as with every other tech fad - people hear a new buzzword, and start using it in new projects (or, worse yet, switch existing projects) just because it's new and "hip" and "cool" (and because "SQL is for old people"), without understanding why the new tech was introduced, and where it is reasonably applicable.
Re:Possible != Practical by DragonWriter · 2010-09-02 11:02 · Score: 1

A bigger issue may be the cost of ACID even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing say 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID

It's not clear to me that guaranteeing availability with only eventual consistency (the usual non-ACID "NoSQL" approach) is any cheaper than guaranteeing consistency in a distributed system and sacrificing availability, and it's even less clear that it is unavoidably more expensive.
The unavoidable cost (in a distributed system implemented in multiple nodes on an unreliable network) of consistency is availability, and vice versa. That's the essential meaning of the CAP Theorem.
Re:Possible != Practical by DragonWriter · 2010-09-02 11:08 · Score: 1

Given all else being equal, supporting ACID is going to require more hardware than not supporting ACID

If all else could be equal, that might be true.
A lot of the motivation for NoSQL systems is that most of them provide strong availability guarantees, at the expense of strict consistency. In a distributed system implemented over an unreliable network, that's an unavoidable tradeoff -- you can have availability or consistency, but not both. That's the upshot of the CAP theorem.

ACID does not imply SQL by LightningBolt! · 2010-09-01 05:28 · Score: 2, Insightful

For instance, Neo4J is a scalable graph-based "nosql" DB with ACID.

--
Old people fall. Young people spring. Rich people summer and winter.

Re:I hate SQL and Databases in General... by jeff4747 · 2010-09-01 05:34 · Score: 5, Insightful

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Because it works.

"It's old" is a terrible reason to replace something. Go back to your previous arguments and you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.

Re:I hate SQL and Databases in General... by poet · 2010-09-01 05:36 · Score: 5, Informative

Spoken with proud ignorance.

Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 applications servers to bring the thing to its knees. That said, most of your gripes equate to:

I am not a DBA and therefore I do not understand DBA and therefore I must complain.

Further SQL has nothing to do with ACID. AT ALL!

--
Get your PostgreSQL here: http://www.commandprompt.com/

NoSQL is also about arbitrary schemas by scorp1us · 2010-09-01 05:37 · Score: 1

NoSQL's two big features are scalability and the arbitrary schemas. While the paper covers the first (though I still think map/reduce has its place) NoSQL does do taxonomy-based (hierarchical) schema better. The only way to do that in SQL is to have a property table, where the parent object is an object RID, and a huge table of attached properties and values to that. You might be able to get your indexes to perform reasonably well, but only by duplicating some data. And on top of that, just try writing a query for hierarchical data! You'll have sub-selects for each level of hierarchy. This means in order to do something relatively simple, like KPCOFGS of species classifications, you'll need a select and 6 sub-selects. At least that one is well defined. If it's not, you just don't know how many, and you have to write a recursive function to generate your select query, or process the results from it. Either way, you repeatedly consider 99% useless records at every level. True, you can cheat at this because there are always 7 levels. But that is not true for most other trees.

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:NoSQL is also about arbitrary schemas by capnchicken · 2010-09-01 05:49 · Score: 1

There is more than one way to do hierarchical queries, it just depends on the RDBMS. Oracle has had it for years and SQL Server implemented it in the 2005 edition. You don't need sub-selects.

--
A libertarian shat on my carpet once. Claimed the free market would sort it out. -Ford Prefect(8777)
Re:NoSQL is also about arbitrary schemas by Anonymous Coward · 2010-09-01 05:52 · Score: 1, Informative

ANSI defines a query mechanism for walking hierarchies in Common Table Expressions. Granted, the garbage that is MySQL has no support for this, but the latest releases of PostgreSQL, Oracle, DB2 and Microsoft SQL Server all do using the same syntax. I will say that SQL is not really the best suited for such things, but it does work.
As for schema-less data, there are a couple of solutions which I believe are all DB-specific. Microsoft SQL Server allows storage of XML data as well as SQL extensions utilizing XQuery to query into that data. It also supports indexing that data and using XML schemas to constrict the nature of that data if necessary. Microsoft SQL Server 2008 also added sparse table support which is built on top of XML storage which allows a table to have 30,000 columns and optimized for the majority of those columns being NULL on any particular row. I know that ANSI does get into XML storage a little bit but I'm not sure which DBs actually implement the standard, if any, especially to a level where it would be a workable solution.
Re:NoSQL is also about arbitrary schemas by capnchicken · 2010-09-01 06:53 · Score: 1

Oracle's CONNECT BY is much much slower than a custom index based on nested sets...Tell me something about default SQL implementations...
Sure. Default SQL implementations are going to be more feature rich to accommodate for a larger set of use cases than a custom implementation which can make use of domain specific shortcuts for performance gains.
TMTOWTDI ... just sayin'
"premature optimization is the root of all evil"
-Knuth

--
A libertarian shat on my carpet once. Claimed the free market would sort it out. -Ford Prefect(8777)
Re:NoSQL is also about arbitrary schemas by akpoff · 2010-09-01 07:26 · Score: 1

And on top of that, just try writing a query for hierarchical data! You'll have sub-selects for each level of hierarchy. This means in order to do something relatively simple, like KPCOFGS of species classifications, you'll need a select and 6 sub-selects. At least that one is well defined. If it's not, you just don't know how many, and you have to write a recursive function to generate your select query, or process the results from it. Either way, you repeatedly consider 99% useless records at every level. True, you can cheat at this because there are always 7 levels. But that is not true for most other trees.
This is true if you use the Agency List Model for hierarchical data. Nested Set Models are a better solution to storing hierarchical data and are extremely fast and efficient for selecting arbitrarily deep nested data without tons of sub-selects. Though inserts are slow in theory (because you have to re-balance the tree) there are practical ways of inserting data so performance doesn't suffer.
See the MySQL site for their discussion of the Nested Set Model, this article on the same topic by Joe Celko and a question about insert performance in which Celko responds.
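For anyone who hasn't seen the Nested Set Model, here is a rough sketch using Python's built-in sqlite3 module. The node table, the lft/rgt column names, and the sample taxonomy are assumptions made up for illustration, not taken from the linked articles. Each node stores a left and right number; everything numbered strictly between a node's pair is its descendant, so a subtree or an ancestor chain comes back from one join with no per-level sub-selects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        ("Animalia", 1, 12),
        ("Chordata", 2, 11),
        ("Mammalia", 3, 10),
        ("Carnivora", 4, 9),
        ("Felidae", 5, 6),
        ("Canidae", 7, 8),
    ],
)

def descendants(conn, name):
    """All nodes whose interval lies strictly inside the named node's interval."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM node AS parent
        JOIN node AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
        """,
        (name,),
    ).fetchall()
    return [n for (n,) in rows]

def ancestors(conn, name):
    """All nodes whose interval strictly encloses the named node's interval."""
    rows = conn.execute(
        """
        SELECT parent.name
        FROM node AS child
        JOIN node AS parent
          ON parent.lft < child.lft AND parent.rgt > child.rgt
        WHERE child.name = ?
        ORDER BY parent.lft
        """,
        (name,),
    ).fetchall()
    return [n for (n,) in rows]

subtree = descendants(conn, "Mammalia")
lineage = ancestors(conn, "Felidae")
```

Here subtree comes back as Carnivora, Felidae, Canidae and lineage as Animalia, Chordata, Mammalia, Carnivora. The slow-insert caveat is visible in the data: adding a node means renumbering every lft/rgt to its right.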
Re:NoSQL is also about arbitrary schemas by julesh · 2010-09-01 09:45 · Score: 1

NoSQL's two big features are scalability and the arbitrary schemas.
It amazes me how often the third crucial feature is missed: efficiency of implementation. Sending data to and from a database server by serializing to text and back to binary again is a seriously inefficient way of processing it, yet this is how it is typically done with SQL databases. A binary in-process API for manipulating the data without serializing it can have a profound impact on speed, which is a different thing entirely from scalability.
Re:NoSQL is also about arbitrary schemas by DragonWriter · 2010-09-01 10:55 · Score: 1

And on top of that, just try writing a query for hierarchical data! You'll have sub-selects for each level of hierarchy.

Well, if you are using a database that doesn't support the recursive form of Common Table Expressions from the SQL standard (from, IIRC, SQL-99.) That I know of DB2, SQL Server, Firebird, and PostgreSQL support that, and Oracle has a custom syntax for hierarchical queries (I think Oracle 11 may support recursive CTEs with standard syntax, as well.)
Using recursive CTEs, it's possible to do queries against hierarchies (and other graphs) directly in SQL, including queries that handle arbitrary depth rather than known-depth structures of the type you describe.
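A small sketch of what that looks like, run through Python's built-in sqlite3 module (SQLite also supports WITH RECURSIVE). The taxon table, its columns, and the sample rows are assumptions invented for the example. It uses a plain adjacency list and one query that climbs to the root however deep the tree is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [
        (1, None, "Animalia"),
        (2, 1, "Chordata"),
        (3, 2, "Mammalia"),
        (4, 3, "Carnivora"),
        (5, 4, "Felidae"),
        (6, 4, "Canidae"),
    ],
)

def lineage_of(conn, taxon_id):
    """Walk parent_id links from the given row up to the root, root first."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM taxon WHERE id = ?
            UNION ALL
            SELECT t.id, t.parent_id, t.name, l.depth + 1
            FROM taxon AS t
            JOIN lineage AS l ON t.id = l.parent_id
        )
        SELECT name FROM lineage ORDER BY depth DESC
        """,
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]

lineage = lineage_of(conn, 5)
```

For id 5 this returns Animalia, Chordata, Mammalia, Carnivora, Felidae. The same query works unchanged on a tree of any depth, which is the point being made against the select-plus-six-sub-selects approach.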
Re:NoSQL is also about arbitrary schemas by DragonWriter · 2010-09-01 11:04 · Score: 1

This is true if you use the Agency List Model for hierarchical data.
You probably mean "Adjacency List" here.
Re:NoSQL is also about arbitrary schemas by akpoff · 2010-09-01 11:10 · Score: 1

Haha! You're right, I did.
Thanks!
Re:NoSQL is also about arbitrary schemas by shutdown+-p+now · 2010-09-01 11:24 · Score: 1

I know that ANSI does get into XML storage a little bit but I'm not sure which DBs actually implement the standard, if any, especially to a level where it would be a workable solution.
I don't know whether it has much in common with SQL/XML parts of the SQL:2006 standard - which define XML storage in database, and querying over it using XQuery - but Oracle definitely supports both XML-in-database, and XQuery for it. PostgreSQL supports SQL/XML storage and export, but not XQuery.

Re:I hate SQL and Databases in General... by mugnyte · 2010-09-01 05:38 · Score: 1

Actually, if you look at set theory and declarative languages, SQL is coming to more traditionally procedural environments. (MS's LINQ, for example.) It's an amazing language, good at what it's supposed to do. You could nearly complain the same about XML transforms as SQL. They just collect & format data. It's the programmers who make it complex.

Unavoidable bottlenecks in systems come from storage, searches and transforms. If you want to remove the DB from the equation, what layer of your system should be performing these things?

BTW: The math in set theory hasn't changed since the 1960's, it doesn't "get old" and need replacing. And you should learn to spell COBOL, your rants will appear more credible.

You hate what you don't understand by frist · 2010-09-01 05:38 · Score: 5, Insightful

Sounds like you don't really understand what you're talking about. The reason we continue to use ACID compliant RDBMS is because they work and they work well. If you don't think that RDBMS have changed over the years, you're simply lacking experience. I feel this is most likely the case as you complain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Complaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacesuit when you jog in the park.

They answered the wrong question by mysidia · 2010-09-01 05:45 · Score: 2, Insightful

We knew ACID can scale already.

With enough money poured into it, and new implementations, ACID can scale.

They solved some problems with scaling out, not necessarily the problems with it scaling up. Scaling does not necessarily just mean replicas and quick failover -- it means good performance without millions spent on hardware too, in terms of overhead, storage requirements, storage performance, server performance.

NoSQL scales in certain cases less expensively, with less work, and doesn't require complicated DBM algorithms. The representation of data is also simpler, and requires less work to maintain than tables.

It's just a result of major existing SQL implementations being so expensive with large datasets, that sometimes it costs more in terms of performance and required hardware, than simply using NoSQL.

I also love this gem from the article:

If the system is also stripped of the right to arbitrarily abort transactions (system aborts typically occur for reasons such as node failure and deadlock), then problem (b) is also eliminated. ... given an initial database state and a sequence of transaction requests, there exists only one valid final state. In other words, determinism.

I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time... I/Os never have unrecoverable errors, syscalls never return error codes, RAM is infinite, programs never crash.

The conclusion that ACID alone is the bottleneck is not necessarily true. The SQL language itself requires a complex implementation just to parse and implement queries, that can add latency.

Re:They answered the wrong question by NoOneInParticular · 2010-09-01 09:08 · Score: 1

How did you know that ACID could scale? Given three data centres each in a different continent, and a query on three (parts of) tables, each located in one of these three data centres, how can you make sure that the result is Atomic, Consistent, Isolated and Durable, all in finite time? This is far from a trivial problem, especially in the real world situation that a network can be unavailable for minutes at a time.
Amazon tried to solve this with the concept of 'eventual consistency' (relaxing the C in ACID). These guys seem to be able to do this transactionally. You seem to claim that this is a non-issue and that it's the parsing of SQL that is the real bottleneck. You sure seem to know what you are talking about.
Re:They answered the wrong question by mysidia · 2010-09-01 16:33 · Score: 1

Given three data centres each in a different continent, and a query on three (parts of) tables, each located in one of these three data centres, how can you make sure that the result is Atomic, Consistent, Isolated and Durable, all in finite time? This is far from a trivial problem, especially in the real world situation that a network can be unavailable for minutes at a time.
The problem you have stated is something more complex than the problem of scaling datasets...
Scaling database sizes does not necessarily mean building three datacenters and splitting your tables into parts..
If you have enough money, you can use one datacenter, one part.
Amazon's issue is splitting a database up between multiple datacenters, so they can better serve different areas.
That's nice, but it's a separate issue from scaling SQL and scaling ACID for performance with large datasets.
That is: Amazon's problem is not the major scaling problem that drives most people to NoSQL, this 'division' problem is a problem separating responsibility of a workload to multiple remote independent database servers, and maintaining the transactional integrity.
This is far from a trivial problem, especially in the real world situation that a network can be unavailable for minutes at a time.
It's far from trivial, and inherent when you want to separate responsibility for a database in a transactional system between multiple autonomous nodes. If your network connection to other datacenters is unavailable, and someone requests to "write data", you only have a few options, and none of them are particularly friendly to the users of the application.
(a) Server delays SELECT queries, until the network is available again, so you don't get to read any data until servers can actually connect back together and arrive at a new consistent state that they definitely agree about.
(b) SELECT queries still work but return old data, some servers delay new INSERT/UPDATE/DELETE/ALTERs, until the network is available again, so all servers can move to the next database state.
(c) Servers reach some advance agreement utilizing a clustering algorithm, about which writes can be done immediately by which server, when some servers are isolated, you apply (a) or (b) to anything that requires updating (or seeing respectively) data outside the local server's responsibility.
So yes, it is hard to avoid violating ACID in clustering scenarios. But I do not equate 'clustering' and 'scaling'.
Re:They answered the wrong question by ras · 2010-09-01 16:52 · Score: 1

I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time...
No, actually. They come from the land that drops a replica if it hits one of these problems.
Also, they don't actually mean you can't do aborts. A transaction can abort. It is just that all replicas must do the same thing with the transaction - commit or abort. Given you can achieve that, the rest of the paper follows easily. The magical bit is the pre-processor that guarantees all nodes (which might have different data) guarantee that without being a bottleneck or a central point of failure. Now that is magical, and I don't know how they do it.
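To make the "all replicas must do the same thing" idea concrete, here is a toy sketch. It is not the paper's implementation: apply_log, the account names, and the transfer log are all invented for illustration. Commit-or-abort is decided only from the current state and the agreed order, so two replicas fed the same log reach the same outcome without talking to each other:

```python
def apply_log(initial_state, log):
    """Apply transfers in log order; abort only for a state-determined reason."""
    state = dict(initial_state)
    outcomes = []
    for src, dst, amount in log:
        if state.get(src, 0) >= amount:
            state[src] -= amount
            state[dst] = state.get(dst, 0) + amount
            outcomes.append("commit")
        else:
            # Deterministic abort: every replica sees the same shortfall.
            outcomes.append("abort")
    return state, outcomes

initial = {"alice": 100, "bob": 20}
log = [
    ("alice", "bob", 70),   # commits: alice 30, bob 90
    ("alice", "bob", 50),   # aborts: alice only has 30
    ("bob", "alice", 90),   # commits: alice 120, bob 0
]

replica_a = apply_log(initial, log)
replica_b = apply_log(initial, log)
```

Both replicas end at alice 120, bob 0 with outcomes commit, abort, commit. The hard part, which this sketch skips entirely, is getting every node to agree on the order of the log in the first place.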
Re:They answered the wrong question by mysidia · 2010-09-01 17:33 · Score: 1

Yes.. I anxiously await a complete implementation....
Re:They answered the wrong question by DragonWriter · 2010-09-02 11:30 · Score: 1

Also, they don't actually mean you can't do aborts. A transaction can abort. It is just that all replicas must do the same thing with the transaction - commit or abort. Given you can achieve that, the rest of the paper follows easily. The magical bit is the pre-processor that guarantees all nodes (which might have different data) guarantee that without being a bottleneck or a central point of failure. Now that is magical, and I don't know how they do it.
I don't know how they do it, but the fairly well-established mechanism of doing that without a central point of failure is use of consensus algorithms (e.g., the Paxos algorithm and its various descendants.)

Re:I hate SQL and Databases in General... by amorsen · 2010-09-01 05:50 · Score: 1

The parent is not a troll, it is spot on. The problem is that the database backend and the language frontend are tied together. To invent a new query language you need to invent a database backend to go with it, and you can't try out a new query language on an existing database deployment. Similarly, any innovations in the database backend are hampered by the limited syntax of SQL. If you can't make a small extension to SQL to get it working, then you can forget about implementing it at all. This pretty much means game over for any database innovations.

Even Relational Algebra is infinitely easier to understand than the pseudo-English mess that is SQL. Much like even Haskell is easier to read than COBOL.

--
Finally! A year of moderation! Ready for 2019?

Re:I hate SQL and Databases in General... by PotatoFarmer · 2010-09-01 05:55 · Score: 1

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Which problem? Storing your data, retrieving your data, modifying your data while guaranteeing transactional integrity, analyzing your data in aggregate, providing ways to recover your data, providing ways to reset your data to a previous state?

I'm not saying a traditional relational database is the perfect solution to everything, but it's silly to think that every approach will address the same set of concerns.

Re:I hate SQL and Databases in General... by paulsnx2 · 2010-09-01 05:56 · Score: 1

I will absolutely agree that a well designed database does not have performance issues. However, I work in a segment of the industry that works with Health and Human services, and the databases have issues that make any reasonable DBA sick.

Nonetheless, database throughput is always an issue. Our applications scale just fine for our needs (as you imply) but it remains that even if only one person is running one application against the database, the throughput is just "meh" at best. This is because every operation requires queries against the database to move significant amounts of data from many different tables. Could we build applications with better performance? Absolutely, and using traditional Relational Databases too, if the Schema was properly designed.

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

Absolutely the developer doesn't have to build applications that inherit all these problems from the database. I have designed applications that sit on databases, and have none of these faults. But unfortunately not all the applications I work on were designed to avoid these issues.

Now you ARE right that I am not a DBA. But if I have a fault, it isn't because I don't understand the DBA, but that I don't understand the database....

And yeah, in my rant I criticized SQL and ACID and relational databases in general as if they were all the same. They are not, and in fact need not have any overlap at all. Still, I'll stand by my rant as an expression of my annoyance with various aspects (these and others) of this particular approach to the persistence problem.

Re:I hate SQL and Databases in General... by Peeteriz · 2010-09-01 05:57 · Score: 1

Any decent framework abstracts out the SQL syntax for you in a nice manner (say, ARel in the Rails 3.0 framework is quite nice), but you gain a lot of compatibility by using SQL, allowing you to choose from engines from SQLite in a flat file to Oracle on a cluster.

Re:I hate SQL and Databases in General... by jeff4747 · 2010-09-01 05:59 · Score: 1

You are not going to win points for reading comprehension if you don't read the whole sentence.

Irony much?

You might wanna read my 2nd sentence. I know, I know. That's really far into my post.

Not Proven (Yet) by TheNinjaroach · 2010-09-01 06:00 · Score: 1

I don't think they've proven it yet, they simply offer some solutions to what they admit is a very difficult problem. In other words, we'll see how their ideas pan out.

--
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..

Just in case anybody else doesn't know... by elwin_windleaf · 2010-09-01 06:01 · Score: 3, Informative

From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID)

"In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."

Re:Just in case anybody else doesn't know... by arose · 2010-09-01 08:29 · Score: 1

Also worth noting: just because it has ACID doesn't mean that it's SQL, and just because it's SQL, doesn't mean that it is ACID.
In other words, the only thing that is SQL vs NoSQL here is that the former tend to be ACID while the latter are all over the board because there is quite a bit that isn't SQL out there.

--
Analogies don't equal equalities, they are merely somewhat analogous.
Re:Just in case anybody else doesn't know... by DragonWriter · 2010-09-01 10:40 · Score: 1

Also worth noting: just because it has ACID doesn't mean that it's SQL, and just because it's SQL, doesn't mean that it is ACID.
Unless I recall incorrectly, to the extent an "SQL" system doesn't offer ACID guarantees, it is also divergent from the SQL standard.
Lots of things that use "SQL", of course, fall short of the standard in the details, and some things that use "SQL" (even as part of their names) aren't so hot on the fundamentals, either.
Re:Just in case anybody else doesn't know... by arose · 2010-09-01 11:39 · Score: 1

I think that when a particular standard or implementation isn't specified, SQL would be a database using SQL as its query language. Is there anything else that everyone can even remotely agree on?

--
Analogies don't equal equalities, they are merely somewhat analogous.

NoSQL is about a lot of things. by Ouija · 2010-09-01 06:02 · Score: 2, Interesting

SQL syntax is dated and very obtuse. Just look at the different syntax between insert and an update. ...wouldn't you rather just have "save"?

Object-relational mapping is cumbersome and mis-matched in SQL. 1:many either yields n+1 queries or a monster cartesian product set. And, what about inheritance? It just doesn't jibe.

It isn't about losing ACID- although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.

When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.

It's different. It solves different problems. Keep your mind open.

--

-Ouija- poke 53280,11:poke 53281,12

Not NoACID, NoSchema by bokmann · 2010-09-01 06:03 · Score: 2, Interesting

Interesting article (and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.

Most applications today are written in object-oriented languages like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. This post explains the kinds of problems better than most.

NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstance and not in others.

SQL databases do not have to be the 'one persistence data mechanism to rule them all'. We don't need one; we need many that solve differing classes of problems well.

Re:Not NoACID, NoSchema by PhrostyMcByte · 2010-09-01 11:05 · Score: 1

While it's true that being schemaless is a big plus for NoSQL, it's also true that the majority of NoSQL databases put a lot of focus on scalability and high-availability. They give up ACID and put a lot of restrictions on how you use them in the process. They also tend to have very poor performance for small-scale things -- I don't know if that's an inherent consequence of using Erlang (seems to be popular for these things), or if it's something else.
I'd love to see a MySQL-class schemaless database -- you know, one not meant for super-scaling to Google- or Digg-sized throughput. One that is good for 95% of projects, runs good on a single server, and implements all the nice sugar on top of things that people have come to expect like ACID transactions, indexes, complex server-side queries with arbitrary IO-heavy joins, etc.
Like any self-respecting coder, I enjoy jumping on new tech to try it out and have tried CouchDB, Riak, and Azure Table. Ultimately I just find it hard to make anything serious with, because even if you've got the time and dedication to learn how to use one correctly (which can be quite tough), you'll find out the next one you try is completely different! Limitations, data formats, query syntax, etc. -- they're all too different. If you write for one and choose to move to another, you've got a lot of rewriting and debugging ahead of you. There's a lot of non-standard SQL databases out there, but at least most have a significant standard subset that works well on all of them without any gotchas.
And one small correction to TFA: SQL Azure supports full ACID transactions with no unusual restrictions. He is thinking of Azure Table, which is their low-level NoSQL store. Both live in the cloud and provide high-availability, but Azure Table is much more scalable.

Re:I hate SQL and Databases in General... by davidbrit2 · 2010-09-01 06:07 · Score: 2, Informative

...because on every application I have ever worked on, the Database has always been the performance bottleneck.

What alternative have you seen that handles the same workload more efficiently? Flat files? I've seen plenty of database-related performance issues, but it's almost never inherent in the database - it's the idiot that wrote the lousy table-scanning code that's reading a couple rows out of a table with millions that's the problem.
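
To make the table-scan point concrete, here's a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are all made up for illustration, and other engines word their query plans differently, so treat this as a demonstration of the idea rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index, the only option is to walk the whole table.
plan_without_index = plan(query)

# With an index on the filtered column, the engine can seek straight to the rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of `orders` and the second a search using `idx_orders_customer`; the exact wording depends on the SQLite version.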

Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database.

If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state. And if you could somehow "back up" the database and "restore" it on a test server, or under a different name. That would be awesome.

And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

Checking your create/change scripts into source control is no more difficult than checking your C source in prior to compiling it.

SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-COBOL.

While I don't totally disagree on this point, calling SQL "fixed" is a bit like saying C# and Java are the same. I promise you any meaty SQL Server code will not run on Oracle without very significant changes that will have to be done by someone that will cost you a lot of money (and likewise with Oracle to SQL Server). The capabilities vary wildly by platform, and the syntax is only identical for the simplest of CRUD statements.

Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.

I have to give this one a LOLWUT. If you're using a big RDBMS, it's likely a multi-user system. If you've got multiple users and connections, you want ACID. This isn't like imposing sorting overhead on data structures, it's like imposing the basic memory protection, process isolation, and filesystem durability you find in any competent operating system. If you want to see what it's like without those protections, go use Mac OS 9 for a week or so, or an Access database used by a few dozen people over a network.

Re:I hate SQL and Databases in General... by medv4380 · 2010-09-01 06:08 · Score: 1

This reminds me of a quote I have at my desk.

Normalization is not just some plot by database programmers to annoy application programmers. (That is merely a satisfying side effect!)

Err... by WSOGMM · 2010-09-01 06:10 · Score: 1

Acid is definitely scalable if you use blotter paper.

Re:I hate SQL and Databases in General... by Have+Brain+Will+Rent · 2010-09-01 06:15 · Score: 1

Absolutely true. I rewrote an application that had a 70 table database to use a simple tree structured representation - it ran two orders of magnitude faster and the code was easier to understand because the data representation conformed well to the actual problem domain. Relational databases are great but they aren't always the appropriate answer.

But as an aside I don't think hyperbole is the enemy of critical thinking - it is just a tool (perhaps weapon) the proper employment of which requires immensely more skill than most people possess.

--
The tyrant will always find a pretext for his tyranny - Aesop

Re:I hate SQL and Databases in General... by iamhigh · 2010-09-01 06:19 · Score: 1

SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-COBOL.

Has relational algebra changed (no, it's complete)? Why would the basics of SQL change then? Sounds like you just don't understand relational math and structured information basics.

--
No comprende? Let me type that a little slower for you...

Prove? by Troy+Roberts · 2010-09-01 06:19 · Score: 1

The editors have a loose definition of the word prove. I read the article and they provide some compelling arguments. However, I saw no proof in a mathematical or scientific sense.

Re:Prove? by julesh · 2010-09-01 09:58 · Score: 1

The editors have a loose definition of the word prove. I read the article and they provide some compelling arguments. However, I saw no proof in a mathematical or scientific sense.
Perhaps you missed where they say they have a forthcoming paper?

It's scalable all right. by Major+Downtime · 2010-09-01 06:21 · Score: 1

Of course ACID is scalable, but you have to be very careful with the dosage. Even Albert Hofmann himself never doubted that.

Relaying my comments from the blog by Cyberax · 2010-09-01 06:23 · Score: 1

To achieve 'nonconcurrency' one needs to introduce a global ordering of transactions. Which WILL require a shared resource among ALL of the transactions. No way around it, sorry.

And what's funny, this resource brings back some of the problems of ACID systems. However, there should be advantages (no need for rollbacks, etc.).

Besides, all of this doesn't tackle another advantage of NoSQL systems: working with HUGE amounts of data. There'll still be problems in ACID systems if data access requires communication between several storage nodes.

And don't forget the CAP theorem. You can't get Consistency, Atomicity and Partition Tolerance at the same time. RDBMS typically 'solve' it by dropping the requirement for the partition tolerance. Usually by using quorum sensing schemas, etc.

Re:Relaying my comments from the blog by DragonWriter · 2010-09-01 10:27 · Score: 1

And don't forget the CAP theorem. You can't get Consistency, Atomicity and Partition Tolerance at the same time. RDBMS typically 'solve' it by dropping the requirement for the partition tolerance. Usually by using quorum sensing schemas, etc.
First, CAP is Consistency (which is different than the Consistency in ACID, incidentally, it's consistency across nodes, not consistency with integrity constraints), Availability, and Partition Tolerance.
Second, quorum-based systems sacrifice availability, not partition tolerance; they are typical of distributed, strongly-consistent, databases. Sacrificing partition tolerance essentially means that the system cannot be implemented on top of a network that can lose messages (this is typical of non-distributed RDBMSs.)
Re:Relaying my comments from the blog by Cyberax · 2010-09-01 11:14 · Score: 1

"First, CAP is Consistency (which is different than the Consistency in ACID, incidentally, it's consistency across nodes, not consistency with integrity constraints), Availability, and Partition Tolerance."
Sorry, a typo.
"Second, quorum-based systems sacrifice availability, not partition tolerance; they are typical of distributed, strongly-consistent, databases. Sacrificing partition tolerance essentially means that the system cannot be implemented on top of a network that can lose messages (this is typical of non-distributed RDBMSs.)"
That depends on your definition of 'partition tolerance'.
Quorum systems clearly fail the: "No set of failures less than total network failure is allowed to cause the system to respond incorrectly" criterion since one only needs to destroy quorum to stop the system from working.
Re:Relaying my comments from the blog by shutdown+-p+now · 2010-09-01 11:29 · Score: 1

Besides, all of this doesn't tackle another advantage of NoSQL systems: working with HUGE amounts of data.
Traditional SQL RDBMS can handle huge amounts of data very well - I recall there were known databases of hundreds of terabytes a few years ago, and at their rate of growth back then, a few should be over a petabyte today.
The "NoSQL" promise is rather to scale better as the number of concurrent requests increases.
Re:Relaying my comments from the blog by DragonWriter · 2010-09-01 12:10 · Score: 1

Quorum systems clearly fail the: "No set of failures less than total network failure is allowed to cause the system to respond incorrectly" criterion since one only needs to destroy quorum to stop the system from working.
That's not at all true. Not responding (which is what most quorum systems do when a quorum is lost) is not the same as responding incorrectly.
It is a failure of availability, not partition tolerance.
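
A toy sketch of that distinction, not any real system's protocol: a majority-quorum read that refuses to answer when it can't reach enough replicas. The function, the exception, the replica names, and the `reachable` set are all invented for illustration, and it assumes every write was acknowledged by a majority.

```python
class QuorumUnavailable(Exception):
    """Raised when fewer than a majority of replicas can be reached."""

def quorum_read(replicas, reachable):
    """Read from a majority of replicas or refuse to answer.

    replicas:  dict of replica name -> (version, value)
    reachable: set of replica names on our side of the partition
    """
    needed = len(replicas) // 2 + 1
    responses = [replicas[name] for name in replicas if name in reachable]
    if len(responses) < needed:
        # Loss of availability: no answer at all, rather than a possibly stale one.
        raise QuorumUnavailable(f"reached {len(responses)} of {needed} needed")
    # Assuming writes also went to a majority, any majority we read from
    # overlaps the latest write, so the highest version is the current value.
    return max(responses, key=lambda r: r[0])[1]

# Replica "c" missed the latest write and still holds the old value.
replicas = {"a": (2, "new"), "b": (2, "new"), "c": (1, "old")}

# Two of three replicas reachable: quorum holds and the answer is correct.
majority_answer = quorum_read(replicas, reachable={"b", "c"})

# Only the stale replica reachable: the system declines instead of saying "old".
try:
    quorum_read(replicas, reachable={"c"})
    minority_answer = "answered"
except QuorumUnavailable:
    minority_answer = "unavailable"

print(majority_answer, minority_answer)  # new unavailable
```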

Re:I hate SQL and Databases in General... by paulsnx2 · 2010-09-01 06:26 · Score: 1

Uh.... I never said that "being old" is a reason to replace something.... As you would have known if you actually read the sentence you quoted. Given this observation, what am I to say about the fact that the Core i7 is based on a 1960's view of a problem? Besides, the Core i7 ISN'T a 1960's based solution, but is based on a 1960's solution. There is an important difference between the two statements.

Everything we do in CS is based on work that goes back to 1939 and even earlier. However, in the case of the Core i7 (as an example) we CHANGE the approach to try and fix various problems we have with our performance.

Personally, I think going back to old ideas and realizing that we can now implement them better/faster/cleaner is a great way to approach many problems. That a solution is "old" isn't a problem, but it is a problem if a solution has known issues, and we just live with them.

Re:I hate SQL and Databases in General... by gbjbaanb · 2010-09-01 06:26 · Score: 1

hmm. or you could have put an index on the right columns... which generally are implemented as tree structures. I'm sure your code was perfectly understandable to all who came after you, thinking they were working with a DB :)

Re:I hate SQL and Databases in General... by Have+Brain+Will+Rent · 2010-09-01 06:30 · Score: 1

I don't know if you are using SQL and "relational database" as equivalent... it seems that way. Anyhow a long time ago there were many different database solutions and most of them weren't relational databases. Then relational databases became popular and anything else almost seemed to disappear. I didn't really get this enormous shift because there are lots of domains where a relational database is not the natural representation of the information being modelled. But for most applications that most people are interested in relational databases work well and SQL represents the ideas behind relational databases quite well. So SQL is still here relatively unchanged decades later because nothing better has come along - apparently it fills its niche quite well - well enough that it hasn't been dislodged.

As for "neo Cobol" I think it was either Wirth or Dijkstra that said that typing speed was not the limiting factor in programming.

--
The tyrant will always find a pretext for his tyranny - Aesop

Hear, hear. by Cyberax · 2010-09-01 06:30 · Score: 1

Yes, I'd like to be able to work with RDBMS data in REAL languages, not in ugly SQL or even uglier DB internal languages.

DB tables can be represented with lists, on which composable pure (side-effect free) functions could operate. So JOINs can be expressed as list comprehensions. 'where' naturally is expressed as filters, etc. Care should be taken to maintain purity of functions used in queries, so they can be optimized efficiently.
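
For what it's worth, here is a small sketch of that idea in Python rather than Haskell. The tables, rows, and the `is_large` predicate are all made up; the point is only that a JOIN plus WHERE maps onto a comprehension over plain lists with a side-effect-free filter.

```python
# Tables as plain lists of dicts; all names and rows here are made up.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 2, "total": 99},
]

def is_large(order):
    """A pure predicate: no side effects, depends only on its argument."""
    return order["total"] >= 25

# Roughly equivalent to:
#   SELECT c.name, o.total
#   FROM customers c JOIN orders o ON o.customer_id = c.id
#   WHERE o.total >= 25
large_orders = [
    (c["name"], o["total"])
    for c in customers
    for o in orders
    if o["customer_id"] == c["id"] and is_large(o)
]

print(large_orders)  # [('Ada', 30), ('Grace', 99)]
```

As written this is a naive nested-loop join. Because `is_large` is pure, an optimizer would be free to reorder it or push it down, which is where the purity requirement earns its keep.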

LINQ in C# has beginnings of something similar.

PS: Am I describing Haskell, by any chance? :)
PPS: If your query requires complex and non-trivial optimizations by the RDBMS engine, then it's a bad query.

Re:Hear, hear. by dumael · 2010-09-01 08:34 · Score: 1

To some degree yes.

http://research.microsoft.com/en-us/um/people/simonpj/papers/list-comp/index.htm

Re:I hate SQL and Databases in General... by gbjbaanb · 2010-09-01 06:31 · Score: 1

c'mon we use web services and only a few people complain about the inefficiencies there, we use XML and only some people complain about sprawling XML documents you can get.

You need to go learn a bit about DBs. SQL is pretty easy, once you've grasped the list-based concepts behind it. Stick to the simple bits and you're 90% done. They're not as bad as you think - it's just your ignorance that's confusing you.

All technology suffers from the flaws you point out, all technology is fragile and easy to create total crap out of. (I know, I've worked with some 'professional' developers who make the most godawful mess, some of them even think they really are god's gift to coding).

DBs incidentally are one of those strange technologies where a 'clean, elegant and well designed' schema is a bad thing. If you over-normalise a DB performance will suffer, as will the code you have to write to use it. If you cobble everything into a few tables, it actually goes faster and is easier to code against. Strange, but true.

Re:I hate SQL and Databases in General... by DragonWriter · 2010-09-01 06:36 · Score: 1

BTW: The math in set theory hasn't changed since the 1960's, it doesn't "get old" and need replacing.

It's worth noting that, in addition to the arguments from proponents of non-relational databases, SQL also gets criticism from proponents of actually doing set theory right (e.g., Date and Darwen.)

Really, SQL and the databases using it are shaped as much by optimization of disk-based storage using popular computing architectures of the time at which it took shape as any mathematical model of data.

As computing architectures and performance attributes (not speed, but relative costs of different access patterns) of storage media change, underlying database implementations and the languages that best leverage them may change, even when you want to be generally guided by set theory.

Field calls by Florian+Weimer · 2010-09-01 06:38 · Score: 1

This seems to be a reinvention of field calls, with a slightly different purpose.

Re:Cowboy Chic by Tablizer · 2010-09-01 06:40 · Score: 1

"Referential integrity? We ain't need no stinkin' referential integrity."

--
Table-ized A.I.

Re:I hate SQL and Databases in General... by GooberToo · 2010-09-01 06:43 · Score: 2, Insightful

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

Because fairly consistently, for the past forty years, every time someone says they've created something better than SQL and released it to the market, the market proves them woefully and completely wrong. As such, as much as people piss and moan about SQL, SQL has consistently proven to be an excellent, general purpose solution and amazingly poorly understood by the masses. And solutions such as MySQL have only made things worse. That's not to say there are not superior niche solutions, only that SQL is one of the few database technologies which has continued to survive for decades as a general purpose solution, and rightfully so.

It's like the world suddenly doing their own plumbing, framing, and mechanical work and then proudly exclaiming the state of architecture and the car industry stinks because the world is falling apart around them. In reality, that means we need far more qualified DBAs and far fewer people who can barely spell, "SQL", designing and condemning the world around us.

It's literally been years since I've run into a qualified DBA, despite the fact "DBA" was part of their title. Turns out, being able to spell, "DBA" is all too often enough to qualify one for such a position. And don't get me started on the even more common case of people who don't even know what a DBA does and yet they are responsible for actually creating the schema/data model.

Re:I hate SQL and Databases in General... by paulsnx2 · 2010-09-01 06:51 · Score: 1

Maybe it is the fact that RDBMS-based solutions are too fragile and are too often crap. We need to develop in representations that make sense to developers, and have the right sorts of compiler technologies and tools that build the proper run time representations for performance.

That you CAN build manageable/fast/testable/efficient applications is only the first step.

The second is wringing manageable/fast/testable/efficient applications out of mere mediocre developers.

Stonebraker's New Project by geoffrobinson · 2010-09-01 06:52 · Score: 1

Just in case anyone's interested: http://voltdb.com/

Stonebraker started an open source database to implement the concepts he talks about.

--
Except for ending slavery, the Nazis, communism, & securing American independence, war has never solved anything.

Re:I hate SQL and Databases in General... by jeff4747 · 2010-09-01 06:55 · Score: 1

I never said that "being old" is a reason to replace something

You said:

Why is it that we continue to use a technology based on a 1960's view of a problem

Your complaint: It's an old way of doing things.

My point: stick with everything else in your post, where you talk about efficiency and finding the language awkward. Your last sentence is summarized by "It's old, and we've thought of other things since then". That's not a useful argument.

When I explicitly referred to the rest of your post, that was kind of a clue that I read it.

we CHANGE the approach to try and fix various problems we have with our performance.

As long as we pretend that CISC is new, for example.

Re:I hate SQL and Databases in General... by DragonWriter · 2010-09-01 07:00 · Score: 1

Has relational algebra changed (no, it's complete)? Why would the basics of SQL change then?

Because SQL isn't a particularly faithful implementation of relational algebra?

Re:I hate SQL and Databases in General... by Have+Brain+Will+Rent · 2010-09-01 07:00 · Score: 1

Nope, trust me that wasn't going to work. It was much more natural to represent directly as a tree structure using XML.

--
The tyrant will always find a pretext for his tyranny - Aesop

Re:I hate SQL and Databases in General... by quanticle · 2010-09-01 07:09 · Score: 1

because on every application I have ever worked on, the Database has always been the performance bottleneck.

That means you need to fire your DBA and hire one that actually knows how to structure tables for performance.

Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database.

And how is that different from testing any sort of application that has a persistent state?

Configuring applications to use this database or that database also ends up being a problem for most applications.

Really? What sort of libraries are you using? Every framework and DB library I've used has had a priority towards making it very easy to connect to a database. Usually, if you're only connecting to a single database, all you need to do is write your connection string in the appropriate file, and you're set. The only time you need to change that is when you're deploying your application from development to test, and from test to production.

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Why do we use quicksort when there are other approaches to sorting?

--
We all know what to do, but we don't know how to get re-elected once we have done it

Re:I hate SQL and Databases in General... by quanticle · 2010-09-01 07:11 · Score: 2, Interesting

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

Those statements could be applied to any technology that's being used inappropriately. Why are our programs so sensitive to bad algorithm design?

--
We all know what to do, but we don't know how to get re-elected once we have done it

Re:I have to admit by spun · 2010-09-01 07:15 · Score: 2, Funny

I have a different image of ACID on Windows than they do.

Is it the image of Bill Gates in an Easter bunny outfit trying to force Steve Ballmer into a large cast iron kettle filled with Skittles and baby mice? 'Cause that's the image I have of ACID on Windows...

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton

ACID: Scale bigger, get slower by smcdow · 2010-09-01 07:15 · Score: 2, Interesting

TFA hints at this but doesn't come out and say it: the larger you scale, the more you swamp yourself with atomicity protocol overhead. If your database is geographically distributed, then you have to decide if atomicity is more important than forgoing the very large bills for the associated network usage. I suspect that this may explain a lot about why Google, Amazon, etc., went with NoSQL solutions.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.

Re:I hate SQL and Databases in General... by iamhigh · 2010-09-01 07:16 · Score: 1

I would agree with that, which is why I said *SQL basics*. You won't change the basic reduction down to working with sets no matter what you do with the language or abstractions.

--
No comprende? Let me type that a little slower for you...

I thought it was an array of structs by jonaskoelker · 2010-09-01 07:17 · Score: 1

"Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer."

I thought it was more like an array of structs, where each array entry is a row and each struct member is a column. In non-C you might say each row is an object, each field-of-a-class is a column (where class : table) and each field-of-an-object is a single cell.

Then the cartesian product operation on tables of types T1 and T2 (respectively) has a type which is the product of T1 and T2, and everything matches up neatly.
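
A minimal sketch of that view in Python, with made-up `Customer` and `Order` types standing in for T1 and T2: each table is a list of typed records, and the Cartesian product is a list of pairs whose type is the product of the two row types.

```python
from itertools import product
from typing import NamedTuple

class Customer(NamedTuple):   # the class plays the role of the table's type
    id: int                   # each field of the class is a column
    name: str

class Order(NamedTuple):
    customer_id: int
    total: int

# Each list entry is a row; each field of an instance is a single cell.
customers = [Customer(1, "Ada"), Customer(2, "Grace")]
orders = [Order(1, 30), Order(2, 99), Order(2, 5)]

# Cartesian product: every pairing of a Customer row with an Order row,
# so each result row has type (Customer, Order), the product of the two.
cross = list(product(customers, orders))

# A join is just the product filtered by a predicate.
joined = [(c, o) for c, o in cross if c.id == o.customer_id]

print(len(cross), len(joined))  # 6 3
```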

Re:I thought it was an array of structs by bluefoxlucid · 2010-09-01 07:23 · Score: 1

You mean a linked list. I'm not sure for your particular API.
The issue here is that you get rows that are effectively struct { char[]; int; long int; int; double; char[]; char[5]; }; which you can do. What you can also do is void* result[][], where (*(result[row][column])) (note that the inner set of parentheses is optional in this case, but syntactically valid and more visually clear) points to the correct data.
Working with arbitrary data gathered from an arbitrary information set is a pain. Consider the task, though: Keeping a pile of information and performing arbitrary operations on arbitrary subsets of that information, including arbitrary entries (rows) and/or arbitrary attributes (columns).

--
Support my political activism on Patreon.

Whose data is it? by sbjornda · 2010-09-01 07:17 · Score: 3, Insightful

but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables.

I take it this is not business-type data? Otherwise you're doing it backwards. Start with your Entity-Relationship diagrams, devolve into logical then physical data models, and THEN start programming.

I forget who said it but it's true: The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete. (Unless it's data that has a short "best before" date.)

--
.nosig

Re:Whose data is it? by GWBasic · 2010-09-01 11:19 · Score: 1

The data belongs to the business, not to the application. The data should be structured and stored in a way that it will still be readable years after your program has become obsolete.
Which is why breaking everything into tables is a PITA. Tables often have logical groupings of rows, which MongoDB is excellent at handling. You can still extract partial documents from the database if you don't need all of the relationships.

--
No, I will not work for your startup

Finally by Joebert · 2010-09-01 07:24 · Score: 1

Yale Researchers Prove That ACID Is Scalable

Finally. I've been telling Bob that for years, but nooo, he insists that we keep using blotter paper and sour patch kids.

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.

Re:I hate SQL and Databases in General... by the+eric+conspiracy · 2010-09-01 07:35 · Score: 1

The reason these other database types went away is because the relational db + SQL handles ad-hoc queries very well. In many if not most db applications that is a killer application.

Re:I hate SQL and Databases in General... by roman_mir · 2010-09-01 07:40 · Score: 1

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

- seriously. I have this same problem with the entire DNA thing - it's too damn old and hard to understand.

I say we switch to a new paradigm - NoDNA.

From now on we don't need all those silly As and Gs and Ts and Cs and the entire twin helical strand idea, it's too freaking old. We must move on with times, so that we can implement NoDNA-DNA2 paradigm. It's going to be faster and easier on the eyes, it's going to have more Zaz in it. Zing, Zork, Kapowza, Mazooma in the bank!

It's just what cool kids would use.

--
You can't handle the truth.

Re:I hate SQL and Databases in General... by Improv · 2010-09-01 07:41 · Score: 1

Oh no you don't like the syntax. That's a great reason to turn away from a technology that has been implemented enough times and had enough research to bring it to where it is today.

If you can get over not liking the syntax, the SQL standard is pretty awesome, as are many (but not all) actual databases that use it. It's powerful enough to let you do some pretty complex queries, it's reasonably easily optimisable (and there is a lot of literature about that) provided you're not using a lousy database engine (like MySQL which can't even handle basic relational calculus planning in a sane way), it's pretty fast, and it offers some great guarantees. I have absolutely no idea what you mean by being difficult to test - either you know how to test or you don't. SQL doesn't get in the way there. You have a production data store and a test data store; you test changes together.

Stored procedures are not so widely used because they're not standard enough. However, they're not hard to use with source code management - you're making the wrong argument.

Your last gripe is fair, and if you are *really* sure you don't need ACID overhead and you have a reasonable alternative database, go for it. You're giving up on all the other research that's gone into the common platform, but that's a tradeoff that might be worth it for some purposes.

--
For every problem, there is at least one solution that is simple, neat, and wrong.

Re:The holiest of holy wars by Tablizer · 2010-09-01 07:45 · Score: 1

and someday, the model that surpasses [relational] (at least for certain cases) will be produced in the same way. It might be a descendant of one of the current so-called NoSQL approaches.

Perhaps, but the question facing an individual company is do they want to take on the risk of being the Guinea pig for relational's challengers. Most such experiments fail to catch on, statistically. (Some do find a nice little niche, but often outside of where it was tested.)

But this in no way changes the fact that any general purpose approach, at least in some (but probably many if not most) cases, will be outperformed by a well-designed application-specific method.

But generally needs become more complex and diverse over time for any growing or competitive company, and one-trick-pony self-rolled databases fail to flex. If a sufficient-performing general-purpose RDBMS is only slightly more expensive than the self-rolled one at the start, it's better future-proofing.

--
Table-ized A.I.

Re:I hate SQL and Databases in General... by Qzukk · 2010-09-01 07:57 · Score: 1

abstracts out the SQL syntax for you in a nice manner (say, ARel in the Rails 3.0 framework is quite nice)

In the end it still boils down to SQL, and while I think SQL is pretty damn good and getting better, it's never going to become not a pain in the ass for certain complex operations. Some databases like postgresql work around it with extensions (such as postgresql's SELECT DISTINCT ON (...) which made questions like "Give me a list of every customer and the date and amount of their most recent purchase" trivial to answer in a single query before WINDOW became part of the standard, and even now it's easier to understand than the WINDOW syntax, at least as long as you know it's a postgresql special)
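
For anyone who hasn't seen the two forms side by side, here's a rough sketch of the "most recent purchase per customer" question using Python's built-in sqlite3 module. The `purchases` table and its rows are invented, and the window-function query assumes the bundled SQLite is 3.25 or newer. The DISTINCT ON form appears only in a comment because SQLite doesn't support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, purchased_on TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("alice", "2010-08-01", 20),
        ("alice", "2010-08-30", 35),
        ("bob", "2010-07-15", 10),
    ],
)

# PostgreSQL-only shortcut (not run here):
#   SELECT DISTINCT ON (customer) customer, purchased_on, amount
#   FROM purchases
#   ORDER BY customer, purchased_on DESC;

# Standard window-function form: number each customer's purchases from
# newest to oldest, then keep only row number 1.
latest = conn.execute(
    """
    SELECT customer, purchased_on, amount
    FROM (
        SELECT customer, purchased_on, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer ORDER BY purchased_on DESC
               ) AS rn
        FROM purchases
    ) AS ranked
    WHERE rn = 1
    ORDER BY customer
    """
).fetchall()

print(latest)  # [('alice', '2010-08-30', 35), ('bob', '2010-07-15', 10)]
```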

--
If I have been able to see further than others, it is because I bought a pair of binoculars.

Re:I hate SQL and Databases in General... by Just+Some+Guy · 2010-09-01 07:58 · Score: 3, Informative

And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

That's easily solvable:

Create a subdirectory called "storedprocs" inside your SCM directory.
Inside that subdirectory, make files with names like "checkinvoice.sql" that store the sequence of commands required to create a stored procedure - one per file. Start each one with a statement like CREATE OR REPLACE FUNCTION myschema.checkinvoice([...]).
Manage those files with your SCM system. Group them by database, or by project, or by phase of the moon, or by whatever else makes sense to you.
To update every stored procedure you've ever written, or to build out a new database: cd storedprocs; cat *.sql | psql

Stored procedures don't have to be any more difficult to manage than any other code.

--
Dewey, what part of this looks like authorities should be involved?

Summary by azmodean+1 · 2010-09-01 08:07 · Score: 4, Informative

Short Summary:
We make some claims about scaling ACID databases, but then don't support them.

Longer summary:
We don't like NoSQL and enjoy making baseless cracks about it such as it being a "lazy" approach.
In our paper we demonstrate that our unconventional version of an ACID database scales better than a traditional ACID database in a specific environment, while merely throwing away some robustness guarantees and changing how transaction ordering works.
No direct comparison to any NoSQL implementation is made.

So yea, I'm not holding my breath for companies to start migrating away from NoSQL.

Re:Summary by SolitaryMan · 2010-09-02 00:42 · Score: 1

I'm not holding my breath for companies to start migrating *to* NoSQL

--
May Peace Prevail On Earth

Re:I hate SQL and Databases in General... by jimrthy · 2010-09-01 08:08 · Score: 2, Insightful

Please don't take this wrong. I really do mean my comments respectfully and politely. It's been a long day, and I'm not sure I managed to write as sincerely as I intended.

... because on every application I have ever worked on, the Database has always been the performance bottleneck

Wow. We've had very different experiences, then. Sure, there have been plenty of times when the database was the bottleneck. But it seems like I've had more issues with network speeds. And I can think of a few cases where the file system was the issue. At my current day job, the system bus seems to be the most common bottleneck. Not that we touch databases all that often.

Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database.

Isn't that generally considered a "best practice" anyway? I mean, I've pretty much always just taken that as a given. What do you consider a feasible alternative?

Configuring applications to use this database or that database also ends up being a problem for most applications.

OK, now I really have to ask what kind of development environment you're using. That's always seemed like a fairly moderate "no-brainer." Sure, it's mildly inconvenient to make sure connection strings got changed when migrating from dev to test to staging to production, but it's not that big a deal.

Furthermore, while programming in general has continued to progress through many languages, exploring many different ways to describe problems, SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-COBOL.

That's one way of looking at it, sure. Maybe you're missing the point, though? I mean, so many other languages and approaches have changed so drastically over the years...maybe SQL hasn't because it's good enough for what it does?

Bottom line: SQL is tedious, ugly, slow, and difficult to test.

Compared to what? Keep in mind its original purpose: letting business users look up algebraic sets while programmers got on with the serious data analysis. It just happened that having a standardized API that made it relatively easy to swap out back-ends turned out to be the easiest way for programmers to do our jobs.

If you really do have access to some magic technology that lets you look up persisted data (in a way that's anywhere near as flexible as SQL) significantly faster than any of the major RDBMSs...why haven't you founded a business on that and made your fortune?

And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

You definitely need to look into some better tools. File | Save As... to stash your SP's in some directory, add to source control (if it's new), check in.

Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.

It's been a while since I had to mess with SQL, but I seem to recall specifying hints about how much transactional consistency I actually needed. I think you may be exaggerating the overhead a smidge. And I'm pretty sure there are ways to work around it. But that's getting way off track.

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Two suggestions. 1) It works. And DBA's hate learning new technology. 2) No one's come up with an alternative that's compelling enough to convince more than a tiny fraction of companies to

Re:I hate SQL and Databases in General... by RightSaidFred99 · 2010-09-01 08:21 · Score: 1

4 words: You're doing it wrong.

Re:I hate SQL and Databases in General... by localman · 2010-09-01 08:31 · Score: 1

I would be surprised if you've got significant experience. You sound like I did 12 years ago when I didn't understand. I had the same complaints you do. I patted myself on the back as I wrote several different storage systems for various projects. And every one of those systems was useful for no more than the toy problems I was involved in at the time. Once I had to make things that lasted in the real world, I saw the light.

For any non-trivial, multi-user persistent data storage, a database is usually the correct view, whether it came up in 1960 or not. The notion of information and the way things interrelate has not changed significantly since then, and is unlikely to in the near future. Those old timers did their homework. I eventually realized that they understood the topic far better than I did.

Yes, there are special cases where you can get an angle of advantage by skipping a traditional database, but you lose an enormous amount of power too. If the trade off is worth it, then great. Usually it's not. Usually you will eventually have to amend the structures, import and export data, distribute it to different teams, integrate with existing systems, provide reporting tools, etc, etc, etc and you'll wish to hell you had just had it all in a database.

Good luck.

Re:I hate SQL and Databases in General... by paulsnx2 · 2010-09-01 08:54 · Score: 1

My complaint ISN'T that the solution is old, but that very little has changed in the SQL/RDBMS approach through the years, despite the vast changes in hardware, the vast experience we have gathered in the awkwardness of using SQL to persist data, and the known performance costs to the SQL/RDBMS approach to persistence.

Maybe my summary statement stinks; it certainly has turned into a useless analysis of what the statement was taken to say vs. what I as an author think it says and meant for it to say.

So let's call you right about how poor the statement was as a summary, and me not so wrong in what I intended it to mean.

SQL != ACID by yyxx · 2010-09-01 09:38 · Score: 1

You don't need SQL in order to get ACID properties. And some common SQL-like languages don't provide ACID.

Furthermore, SQL wasn't designed for what it is being used for today; SQL was meant to be a database interaction language for non-experts.

And "information theory" doesn't mean what you seem to think it means...

My article summary by Anonymous Coward · 2010-09-01 09:51 · Score: 1, Insightful

Academic determines that if only you're willing to insert a single point of failure, all of your replication problems can be hand waved away. Also if you have this new single point of failure, somehow magically transactions will never need to abort ever again.

RDBMS is a golden hammer by yaphadam097 · 2010-09-01 09:51 · Score: 2, Insightful

The reason that NoSQL is necessary is that ACID is not the only thing that developers need to think about. RDBMS was an innovative solution to the limitations of mainframe hierarchical databases circa 1970. Since then it has been the only game in town (At least for most enterprise software. Some of us do other things occasionally.)

It turns out that there are reasons to do things other ways, and having other options allows you to consider trade-offs. For many applications eventually consistent data scales just fine. For some applications, both big and small, an enterprise RDBMS is overkill. Why not just persist objects to a document store? Or even the file system?

The research is interesting, although I agree that we already knew we could scale the ACID paradigm. The conclusion is ridiculous. NoSQL has nothing to do with ACID, and it brings a richness to the conversation that has been missing for far too long. Like the Perl folks say, TMTOWTDI.

Re:The holiest of holy wars by sloth+jr · 2010-09-01 10:02 · Score: 1

Thanks for this, this is EXACTLY what I think every time a Slashdot article concerning databases comes up - here come the blowhards who understand every possible application and its needs. How long before someone trashes MySQL? It's tiresome.

To be fair, I think this thread has a lot of signal compared to many - and then someone'll ruin it by asserting some absolute.

Re:I hate SQL and Databases in General... by shutdown+-p+now · 2010-09-01 11:15 · Score: 1

As such, as much as people piss and moan about SQL, SQL has consistently proven to be an excellent, general purpose solution

I would disagree with that assertion. It would be more precise to say that SQL has proven to be a solution that is better than anything else offered so far, when taking into account factors such as the existing implementations, supporting tools, learning materials, and availability of skilled professionals (which are all major factors when comparing a mainstream entrenched tech with any newcomer; see Java vs Ruby etc).

SQL by itself isn't particularly well designed as a query language - it was based on fundamentally wrong premises (as many "4GLs", it was supposed to "enable non-programmers to write code"; as everything that has ever tried that before and after, it failed). Consequently, its syntax is quirky and overly verbose with no good reason, and its semantics very non-orthogonal. And that's without even getting into the issue of standard conformance of real-world implementations, and portability of non-trivial queries between them...

SQL's real strength is that it's 1) already there, and 2) is good enough. In that regard, it's very similar to C. It's very easy to come up with a systems programming language better than C, but C is so widely used that it beats any competitors simply by virtue of being good enough; you'd need some real breakthrough to unseat it. Same thing here.

Re:I hate SQL and Databases in General... by SanityInAnarchy · 2010-09-01 12:07 · Score: 1

If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state.

...which means you now can't test transactions.

And if you could somehow "back up" the database and "restore" it on a test server, or under a different name.

That'd be obnoxious, but sure.

If you've got multiple users and connections, you want ACID.

Spoken like someone who's never written a web app.

Let's take Slashdot as a trivial example. Do I need transactions to be atomic? That'd be nice -- I don't want to see half a post succeed and the other half fail -- but it isn't really needed outside profile settings. Consistency? Posts really only depend on their parents, which are immutable -- pretty much by definition, if a user can see a parent post to reply to, by the time Slashdot will let them reply, that parent post will be committed. How about isolation? Nope -- if two users post two comments simultaneously, the application can trivially figure out which one should go on top -- no need for a transaction. Durability? Again, I don't see this being needed anywhere but in the user's profile.

So, we could take atomicity and durability as useful and easily drop isolation and consistency -- they simply are not needed and add zero value, unless I'm grossly misunderstanding what they are.

If you want to see what it's like without those protections, go use Mac OS 9 for a week or so,

Or an embedded system. Or an OS kernel.

--
Don't thank God, thank a doctor!

Re:I hate SQL and Databases in General... by SanityInAnarchy · 2010-09-01 12:09 · Score: 1

Why do we use quicksort when there are other approaches to sorting?

That's actually a good question, and many languages have switched to things like mergesort as a default sort.

--
Don't thank God, thank a doctor!

Re:I hate SQL and Databases in General... by davidbrit2 · 2010-09-01 13:15 · Score: 1

If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state.
...which means you now can't test transactions.

SQL Server supports nested transactions. I'd imagine other vendors do as well.
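
The exchange above can be sketched with SQLite savepoints, which are a stand-in here and not SQL Server's nested transactions. The test opens an outer transaction, the code under test runs its own "transaction" as a savepoint inside it, and the test rolls everything back at the end. The `accounts` table and the `transfer` function are invented for illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK statements below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")


def transfer(conn, src, dst, amount):
    """Hypothetical code under test: its 'transaction' is a savepoint."""
    conn.execute("SAVEPOINT transfer")
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("RELEASE SAVEPOINT transfer")      # inner "commit"
    except Exception:
        conn.execute("ROLLBACK TO SAVEPOINT transfer")  # inner "rollback"
        conn.execute("RELEASE SAVEPOINT transfer")
        raise


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


conn.execute("BEGIN")                 # test-level transaction
transfer(conn, 1, 2, 30)              # inner unit succeeds
during_test = balances(conn)
try:
    transfer(conn, 1, 2, 500)         # inner unit fails and undoes only itself
    overdraft_rejected = False
except ValueError:
    overdraft_rejected = True
after_failed_transfer = balances(conn)
conn.execute("ROLLBACK")              # undo everything the test did
after_test = balances(conn)
```

The inner rollback leaves the earlier successful transfer visible to the test, and the outer rollback restores the original balances. Whether a given server's nested transactions behave the same way is something to check per vendor.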

Re:The holiest of holy wars by gfody · 2010-09-01 17:44 · Score: 1

most geek arguments are based on the Highlander principle

--

bite my glorious golden ass.

Re:I hate SQL and Databases in General... by SanityInAnarchy · 2010-09-01 18:06 · Score: 1

Most major databases support nested transactions. If the ones you've had to test against don't, that would be a real PITA.

I'm probably thinking of something like SQLite. Good point, though.

Of course, there is the added problem of being able to access the result of running a given failed test before rolling it back. Should we dump the entire production database to do that?

You would need to do essentially the same things with any persistent store, right? Unless you mean to test against production data...

Eventually, the idea is to test against a clone of production data, which is I think what you're suggesting. However, there are a lot of tests you can do without even touching the database, and a lot more you can do with generated data. This means you need a quick way to get a completely clean datastore, and a quick way to get a datastore which is a clone of a given snapshot of production. (Has to be a given snapshot, otherwise you might have a test that fails on one run and succeeds on the next.)
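
A minimal sketch of those two helpers, assuming SQLite and Python's standard `sqlite3` module: one function returns a completely clean store, the other clones a fixed snapshot so every run starts from identical data. The `comments` schema and the helper names are made up for the example.

```python
import sqlite3


def clean_store():
    """Return an empty in-memory database with the (hypothetical) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def clone_of(snapshot):
    """Copy a snapshot connection into a fresh in-memory database."""
    copy = sqlite3.connect(":memory:")
    snapshot.backup(copy)  # sqlite3.Connection.backup, available since Python 3.7
    return copy


# Stand-in for a fixed snapshot of production data.
snapshot = clean_store()
snapshot.executemany(
    "INSERT INTO comments (body) VALUES (?)", [("first post",), ("Pfah.",)]
)
snapshot.commit()

# Each test gets its own copy, so a destructive test cannot affect the next run.
test_db = clone_of(snapshot)
test_db.execute("DELETE FROM comments")
test_db.commit()

rows_in_test_db = test_db.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
rows_in_snapshot = snapshot.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
rows_in_clean = clean_store().execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Because the clone is a separate database, it can also be inspected at leisure after a failing test, which a rolled-back transaction does not allow.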

OTOH, why do you care if your web app that doesn't need ACID happens to run on an ACID database? ACID is usually transparent to the application programmer, and I would assert that most web apps will never grow large enough to notice any performance hit, assuming a reasonably sane schema is in place.

As soon as you grow beyond a single database server, that's a serious performance hit, or quite possibly an administration nightmare, assuming your chosen database can scale to multiple servers -- last I checked, Postgres only does replication.

Also, SQL is ugly. If I don't need ACID, why should I put up with it? More generally, if my web app doesn't need ACID, why would I want to use an ACID-compliant database at all?

Hardly any of our web apps are ever going to become the next Google, Amazon, Twitter, or whatever.

Even without that, I'd like to be able to handle spikes in traffic gracefully. While I doubt it was the database that was the bottleneck, recently, a number of students in my physics class have been unable to work on or submit online homework, because the online homework system seems to buckle under the load of all the procrastinators. Being able to trivially scale the database with the application is still a Good Thing, even for a tiny app.

--
Don't thank God, thank a doctor!

Re:I hate SQL and Databases in General... by notknown86 · 2010-09-01 21:21 · Score: 1

Further SQL has nothing to do with ACID. AT ALL!

Mr Begin Transaction and Mrs Commit Transaction called for you. They aren't happy.

Re:TMTOWTDI by itsybitsy · 2010-09-01 21:56 · Score: 1

http://tinyurl.com/TMTOWTDI

Re:I hate SQL and Databases in General... by GooberToo · 2010-09-02 00:55 · Score: 1

Over all, I don't disagree with the substance of your reply.

it was supposed to "enable non-programmers to write code"

This is very important. The problem is, those same non-programmers believe they can design databases too.

Consequently, its syntax is quirky and overly verbose

There is a standard which reduces the vast majority of syntax quirkiness. Much of the quirkiness exists not from the SQL standard but from various SQL RDBMS which do not follow the standard.

I've frequently heard SQL is overly verbose but have never bought into that. Can you provide an example of SQL which is overly verbose and an imaginary, more concise example, which still meets "non-programmer" demands in readability?

In my own opinion, people confuse "overly verbose" with having many options. Again, in my opinion, many people don't understand the many nuances of various SQL clauses, and that is likely the root of the "overly verbose" argument.

Don't get me wrong, SQL is not perfect. There are certainly some oddities in the SQL standard. At the same time, I'm hard pressed to think of any technology by committee which doesn't have faults. Furthermore, I can't think of any significant technology base which is perfect.

Book: SQL Antipatterns by Futurepower(R) · 2010-09-02 02:02 · Score: 2, Informative

SQL Antipatterns may interest you. As one of the reviews says, "An excellent guide to database design tradeoffs".

Re:I hate SQL and Databases in General... by DragonWriter · 2010-09-02 03:09 · Score: 1

I would agree with that, which is why I said *SQL basics*. You won't change the basic reduction down to working with sets no matter what you do with the language or abstractions.

On that level, sure. But a lot of things that one might consider fairly basic to SQL aren't essential to leverage the generality of the relational model.

For instance, you could have a datastore where you assert tuples without reference to a particular relvar (or table), and then the datastore assures that the asserted tuple satisfies the constraints for all defined base relvars whose headers it satisfies. The body of a relvar is simply, then, the set of all asserted tuples that satisfy the header of the relvar.

If you do this, you'll want to have an independent namespace scheme for attribute (column) names rather than using relvars/tables as column namespaces as well.

This produces a database that is going to be very different from an SQL-based database on a fairly fundamental level, but still can leverage mathematical set theory and relational algebra in much the same way.

It also could conceivably support asserting (and querying) facts/tuples before they fit into defined relvars/tables, while still assuring that any defined integrity constraints on defined relvars were satisfied.
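
A toy sketch of that idea, under loud assumptions: "satisfies the header" is read as "has exactly the header's attribute names", attribute names carry their own dotted prefix as the independent namespace, and the class and method names are invented. It is not a description of any existing product.

```python
class TupleStore:
    """Toy datastore: facts are asserted globally, relvars are derived views."""

    def __init__(self):
        self.relvars = {}  # name -> (header, constraint)
        self.facts = []    # every asserted tuple, stored as a dict

    def define(self, name, header, constraint=lambda fact: True):
        """Declare a relvar by its header (attribute names) and a constraint."""
        self.relvars[name] = (frozenset(header), constraint)

    def assert_fact(self, fact):
        """Store a tuple after checking every relvar whose header it matches."""
        attrs = frozenset(fact)
        for name, (header, constraint) in self.relvars.items():
            if attrs == header and not constraint(fact):
                raise ValueError(f"violates constraint of relvar {name!r}")
        if fact not in self.facts:  # a relation body is a set, so skip duplicates
            self.facts.append(dict(fact))

    def body(self, name):
        """The body of a relvar: all asserted tuples matching its header."""
        header, _ = self.relvars[name]
        return [fact for fact in self.facts if frozenset(fact) == header]


store = TupleStore()
store.define(
    "person",
    {"person.name", "person.age"},
    lambda fact: fact["person.age"] >= 0,
)

store.assert_fact({"person.name": "Ann", "person.age": 41})
store.assert_fact({"city.name": "Oslo"})  # fits no relvar yet, still stored

try:
    store.assert_fact({"person.name": "Bob", "person.age": -1})
    rejected = False
except ValueError:
    rejected = True

person_body = store.body("person")
```

One gap in this sketch: defining a relvar after matching facts already exist does not re-check them, which a real design would have to decide how to handle.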

Re:I hate SQL and Databases in General... by shutdown+-p+now · 2010-09-02 06:02 · Score: 1

To give a specific example of SQL being messed up, WHERE vs HAVING. No other query language I'm aware of needs two distinct filtering clauses. In XQuery, for example, you simply apply "where" before or after "group by" as needed - they are fully orthogonal operators, so you can interleave as many as needed.
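
For readers who have not run into the distinction, here is a small sketch using SQLite through Python's `sqlite3` module (the `posts` table is invented). `WHERE` filters rows before grouping and `HAVING` filters groups afterwards; the second query gets the same result with `WHERE` alone by grouping in a subquery, which is roughly the orthogonality being asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("alice", 5), ("alice", 1), ("bob", 2), ("bob", 2), ("carol", 0)],
)

# Two clauses: WHERE runs before GROUP BY, HAVING runs after it.
with_having = conn.execute(
    """
    SELECT author, COUNT(*) AS n
    FROM posts
    WHERE score >= 2
    GROUP BY author
    HAVING COUNT(*) >= 2
    ORDER BY author
    """
).fetchall()

# One kind of clause: the post-grouping filter is just WHERE on a derived table.
with_subquery = conn.execute(
    """
    SELECT author, n
    FROM (
        SELECT author, COUNT(*) AS n
        FROM posts
        WHERE score >= 2
        GROUP BY author
    )
    WHERE n >= 2
    ORDER BY author
    """
).fetchall()
```

Both queries return only bob, who has two posts scoring at least 2.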

By the way, in general, I would point out XQuery as an example of a cleanly designed (syntactically, at least) query language from which SQL could learn a lot.

truth is by Colin+Smith · 2010-09-02 09:28 · Score: 1

Most developers simply drop their application scalability problems down to the DB layer and/or OS layer. Then bitch that those DBAs are dumbasses, the DB server doesn't scale.

--
Deleted

Re:truth is by badkarmadayaccount · 2010-09-03 03:37 · Score: 1

Leave scaling where it belongs - in the OS.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:I hate SQL and Databases in General... by badkarmadayaccount · 2010-09-03 04:12 · Score: 1

I'm tired of all this nonsense. A DB provides persistence and storage; integrate it with the storage interface of the PL in question (you do have one, right?). Heck, PL/I had handles and record-oriented storage, and any decent multi-paradigm language provides the necessary higher-order functions or equivalent; the rest is the compiler's job. FWIW

--
I know tobacco is bad for you, so I smoke weed with crack.

NULLS violate the relational model by DragonWriter · 2010-09-03 05:06 · Score: 1

I went through a patch a few years ago where I was interviewing programming candidates who had XML coming out of their ears but hadn't the foggiest idea of what "NULL" means in the relational model.

Why should they? NULL doesn't mean anything in the relational model; NULL is an SQL construct that violates the fundamental underpinnings of the relational model.

Re:NULLS violate the relational model by hey! · 2010-09-03 08:05 · Score: 1

E.F. Codd would have disagreed with you. True, it wasn't part of his landmark 1970 paper, but by the early 80s he was asserting that having a concept for "null" was a requirement of a system that was "really relational". The reason was that while you can have a nice, self-consistent mathematical model of relations without NULL, a practical system absolutely needs some kind of NULL. I know what I'm talking about here. Back before decent embeddable database systems existed, it was common to put "relational" facades on top of indexed file storage. Let me tell you, if you don't have NULL in a record keeping system, pretty soon you find yourself inventing your own version of NULL and littering your code with checks. The relational model without nulls is from a practical standpoint garbage.
Granted, SQL's concept of NULL is problematic, but you are ... let's say *seriously misguided* ... if you think a programmer can work with a SQL based relational database (which is the only kind of RDBMS there is) without understanding SQL's concept of NULL and how it affects things like existential predicates or aggregate functions.
In any case, I'd like to see a serious argument that NULL violates the fundamental underpinnings of the relational model. The fundamental underpinnings of the model are in the relational calculus and algebra. Adding NULL to all domains does not alter any calculation that does not specifically involve a null value, so it seems to be consistent with the "fundamental underpinnings" to me.
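
To make the point about aggregates and existential predicates concrete, here is a small sketch in SQLite via Python's `sqlite3` module. The `users` table is invented for the example; other engines should behave the same way on these queries, but that is worth verifying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, referrer TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", None), ("bob", "ann"), ("cy", "bob")],  # ann has no referrer
)

# Aggregates: COUNT(*) counts rows, COUNT(column) silently skips NULLs.
count_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
count_referrers = conn.execute("SELECT COUNT(referrer) FROM users").fetchone()[0]

# "Who never referred anyone?" with NOT IN: the NULL in the subquery makes
# the comparison unknown for cy, so no row qualifies at all.
not_in = conn.execute(
    "SELECT name FROM users WHERE name NOT IN (SELECT referrer FROM users)"
).fetchall()

# The same question with NOT EXISTS gives the answer most people expect.
not_exists = conn.execute(
    """
    SELECT name FROM users AS u
    WHERE NOT EXISTS (SELECT 1 FROM users AS r WHERE r.referrer = u.name)
    ORDER BY name
    """
).fetchall()
```

The `NOT IN` query returns nothing, while `NOT EXISTS` returns cy. That difference is the kind of thing a candidate who has never thought about NULL will get wrong.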

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.

Re:NULLS violate the relational model by DragonWriter · 2010-09-03 10:37 · Score: 1

E.F. Codd would have disagreed with you.

That depends when you asked him, I suppose.

but by the early 80s he was asserting that having a concept for "null" was a requirement of a system that was "really relational".

And by 1990, he was arguing that a relational system needed to have at least two different kinds of Nulls to operate correctly.
(Apparently, the beginning of the start of something in the db world which culminated with the proposal of requiring support for up to 128 types of distinguished NULLs in the early part of the standardization process of SQL-99.)
So, outside of the 1980s, Codd would seem to view SQL's use of NULL as a problem. Not that I'm fond of arguments from authority.

The reason was that while you can have a nice, self-consistent mathematical model of relations without NULL, a practical system absolutely needs some kind of NULL.

Practical systems need to be able to admit of records that are short of ideal because some data is either missing, unknown, or otherwise different from the basic assumptions.
Each of those possibilities -- missing, unknown, and each other possibility of difference from the basic assumptions -- can be identified with an actual value without resort to special non-values that are dealt with in a manner fundamentally different from values.
So NULLs aren't fundamentally necessary, and they tend to obscure information.
They're often initially easier than doing the analysis of the specific need, but then so is not using the relational model in the first place and just storing data in a haphazard manner.

Granted, SQL's concept of NULL is problematic, but you are ... let's say *seriously misguided* ... if you think a programmer can work with a SQL based relational database (which is the only kind of RDBMS there is) without understanding SQL's concept of NULL and how it affects things like existential predicates or aggregate functions.
First, SQL isn't the only "relational" query language in use, though it is by far the most common.
Second, I never said that understanding NULLs wasn't important to using SQL. I said that understanding the meaning of NULL in the relational model isn't important because NULL isn't part of the relational model.

In any case, I'd like to see a serious argument that NULL violates the fundamental underpinnings of the relational model.

I think C.J. Date & Hugh Darwen have handled this admirably; the first 13 slides of this presentation are as good of an overview of the issues as I've seen (slides 4 & 13 are the most directly relevant to the problem of NULL with regard to the underpinnings of the model.)

Re:NULLS violate the relational model by hey! · 2010-09-04 04:33 · Score: 1

The solution of further decomposing nullable tables and then joining the results is almost laughably impractical. The first thing programmers (especially these days) will do is join them back together so they can deserialize objects from the database.
In any case, that solution doesn't solve the real problem. It ensures that (A or not(A)) is true for any domain, but it doesn't address the very practical problem of identifying where information for some reason does not apply. For a null value A that means "unknown", (A or not(A)) should be a tautology. For a null value A that means "does not apply" (A or not(A)) has no defined value.
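
For reference, this is how SQL's own three-valued logic treats the expression under discussion, sketched with SQLite through Python's `sqlite3` module. It shows SQL's behavior only, not the decomposition approach from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a constant SQL expression and return it as a Python value."""
    return conn.execute("SELECT " + expression).fetchone()[0]


true_case = evaluate("1 OR NOT 1")        # A is true
false_case = evaluate("0 OR NOT 0")       # A is false
null_case = evaluate("NULL OR NOT NULL")  # A is NULL
```

With A true or false the expression evaluates to 1. With A NULL it evaluates to NULL (`None` in Python), so SQL treats it neither as a tautology nor as an error.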
Of course you *can* evade all this by introducing more base relations into your schema and doing more joins (hahaha). But it's simply not a practical solution. It's the kind of thing you suggest to make an academic point, not to make the model more usable. Codd is the one who's right here. You have to deal with these issues by representation, not introduce more relational decomposition into database design.
What we're wrestling with here is that the relational model is a wonderful metamodel of computerized record keeping systems. It's not a very good metamodel for creating models of the real world. You can see that by the lack of provision in the model for open world reasoning. There's a certain useful creative tension involved in this dichotomy; two arrows for our designer's quiver (object and relational modeling) instead of one.
There's nothing necessarily mathematically obnoxious about the "surprising" cases Date cites here, any more than sqrt(-1) is a bad thing in algebra. You just have to know what you are doing.

Second, I never said that understanding NULLs wasn't important to using SQL.

Yes, you were using my post as a springboard to make a related point.
I've been following the relational literature for almost thirty years, and practicing as a designer. I have an appreciation for the value of computer science to the practitioner, but there is a point where you are ultimately talking about academic math with no practical application to engineering. Basically the idea of eliminating nulls is the kind of idea that does more to keep the researcher publishing than it does for bringing better behaved, more expressive tools to market. The behavior of SQL's null is quite tolerable in practical systems, so long as you understand it.
The truth is that practical implementations of the relational model -- ones that competent engineers would find helpful in solving real world problems -- nearly always have some kind of NULL and if they don't it's a fault. Codd's idea of multiple kinds of nulls will almost certainly be needed to extend the relational model to allow queries to be explicitly made under open and closed world reasoning, if that ever happens.
By the way on slide 4 "Nulls ARE permitted in alternate keys..." Where did that come from? If by "alternate key" he means "candidate key that is not chosen as primary", Date is contradicting his own earlier writings. In any case, key candidacy is almost certainly the biggest practical weakness of the relational model per se, particularly if you allow composite candidate keys. The problem is one of fundamental epistemology. All the analysis you do to choose a candidate key is based on closed world reasoning. In practical terms system requirements evolve, undermining key choices, *especially* the subtle assertions you must make about normal forms beyond the 3rd.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.

Re:I hate SQL and Databases in General... by shutdown+-p+now · 2010-09-03 06:00 · Score: 1

Well, it's kinda the idea behind LINQ - integrate query comprehensions (and convenient syntactic sugar for them) into the language, and then provide extensibility points so that they can be mapped to various data sources as needed.

Re:I hate SQL and Databases in General... by SanityInAnarchy · 2010-09-03 14:59 · Score: 1

it seems to me that once the transaction being tested is committed and the test detects a failure, it should be able to record details before rolling back the test-level transaction.

It could, but would I be able to explore a snapshot of the data at that point in time? In particular, could I watch changes to the data as I step through with a debugger?

These are things which are easy if I have an isolated copy (as you described in SQLite), but don't work as well just using a transaction of a development or production database.

If you've got tests that don't need a copy of the store, I'd say partition those from tests that do need it;

It can be done, sure, but I think it's far easier just to have object factory classes -- though I'm not sure, now that I think of it, that these are easier without SQL, it seems to be fairly independent of datastore.

along with dump files to repopulate either DB schema.

Not just the schema, though. You'd want a clone of the production schema and (likely) the production data. But I see your point.

(they don't actually *need* Oracle for this app, but they've standardized on it for all their internal apps).

I hate when that happens, but then, I'm not sure I see any real applications of Oracle. It seems to me that when you get to the scale where you would need Oracle, you're very nearly at a scale which would make Oracle useless and force you to think in distributed terms.

I think you're right about PostgreSQL, but I'm too lazy to check.

I think there may, at one point, have been some proprietary options for true multimaster replication, but I don't think they went anywhere. And multimaster replication doesn't really buy you much in terms of scalability, if you have to keep all nodes in sync.

Anyway, you should get a firm grasp during planning of how large (or distributed) the data needs to be within a foreseeable time frame. If you anticipate truly huge data on multiple servers or a need for distributed data, then that should strongly influence what data platform you choose.

Point is, if I design and plan for truly huge data on multiple servers, it's still going to work at a smaller scale -- and as an added bonus, I find many of the NoSQL databases much more pleasant to use than SQL.

However, if I design and plan for a small scale, well, it's probably not going to be pleasant upgrading from SQLite to MySQL or from MySQL to Oracle, so you can imagine how not fun it would be to go from any of these to a truly sharded design, or to one of these NoSQL designs.

One thing I love about Google App Engine, for instance, is that it makes me think about the shape of my data. If I can think about how this works at a high level, I can design my application around it such that it can scale to pretty much any load anyone throws at it.

maybe not if the data model just won't work well with a key-value DB.

Well, but what data is that? Consider CouchDB -- you get "views", which are essentially mapping functions, and you can also create arbitrary "reduce" functions. This means you essentially get a cached view of the result of arbitrary code execution, and these can each be pretty much arbitrarily parallelized.
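
The shape of such a view can be sketched in plain Python. This is not CouchDB's API (real CouchDB views are typically JavaScript functions that call `emit`, and the index is maintained incrementally); the documents and function names below are invented to show the map and reduce steps only.

```python
from collections import defaultdict

# Hypothetical documents, as a document store might hold them.
docs = [
    {"type": "comment", "author": "alice", "score": 5},
    {"type": "comment", "author": "bob", "score": 1},
    {"type": "comment", "author": "alice", "score": 2},
    {"type": "profile", "author": "alice"},
]


def map_comment_scores(doc):
    """Map step: emit (key, value) pairs for the documents we care about."""
    if doc.get("type") == "comment":
        yield doc["author"], doc["score"]


def reduce_sum(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:  # each document is mapped independently of the others
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


view = build_view(docs, map_comment_scores, reduce_sum)
```

Since the map function only ever sees one document and the reduce only sees one key's values, both steps can be spread across machines, which is the parallelism the comment is pointing at.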

Most of the major databases can handle medium-sized data sets, say 10s of GBs to 100s of GBs, on a single server.

That doesn't tell anywhere near the whole story -- I handle terabytes of data on a single server with a flat filesystem, because they happen to be terabytes of video.

So what kinds of data, and what kinds of access patterns are we talking about? And how big a server -- how many cheap PCs could I buy for the same price? How are you going to handle availability -- that is, what happens if that one server goes down?

OK, you don't like SQL.

--
Don't thank God, thank a doctor!
