The NoSQL Ecosystem
abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'
Microsoft Access is here!
So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte-sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implementation.
#fuckbeta #iamslashdot #dicemustdie
With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.
The performance claims will probably be disputed by Oracle whizzes. However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management. Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified. We have dynamic app languages, so why not dynamic RDBMS also? Let's fiddle with and stretch RDBMS before outright tossing them. Maybe also overhaul or enhance SQL. It's a bit long in the tooth.
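As a rough illustration of the "dynamic relational" idea above, here is a minimal Python sketch. The class and method names (DynamicDatabase, DynamicTable, add_constraint) are made up for this example and do not come from any real product; it only shows rows as maps, missing columns read as Null, tables created on first insert, and constraints added later.

```python
class DynamicTable:
    """Toy table whose rows are plain dicts (maps); hypothetical, for illustration."""

    def __init__(self):
        self.rows = []
        self.constraints = {}  # column name -> predicate, added incrementally

    def add_constraint(self, column, predicate):
        # A constraint can be bolted on later, once the schema has "solidified".
        # Existing rows must already satisfy it.
        for row in self.rows:
            if not predicate(row.get(column)):
                raise ValueError(f"existing row violates constraint on {column!r}")
        self.constraints[column] = predicate

    def insert(self, **row):
        for column, predicate in self.constraints.items():
            if not predicate(row.get(column)):
                raise ValueError(f"constraint failed on {column!r}")
        self.rows.append(row)

    def select(self, *columns, where=lambda row: True):
        # A column that a row never set comes back as None (Null), not an error.
        return [tuple(row.get(c) for c in columns)
                for row in self.rows if where(row)]


class DynamicDatabase:
    def __init__(self):
        self.tables = {}

    def table(self, name):
        # Tables come into existence the first time they are referenced.
        if name not in self.tables:
            self.tables[name] = DynamicTable()
        return self.tables[name]


db = DynamicDatabase()
db.table("people").insert(name="Ann", age=41)
db.table("people").insert(name="Bob", nickname="bobby")  # no "age" column here
print(db.table("people").select("name", "age"))

# Later, tighten the schema: "name" becomes required.
db.table("people").add_constraint("name", lambda value: value is not None)
```

Queries stay column-centric, as in ordinary SQL; only the handling of missing columns and the timing of constraints differ.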
More at:
http://geocities.com/tablizer/dynrelat.htm
(And you thought geocities was de
Table-ized A.I.
I think I've heard of non-relational databases before. There's a particularly famous one, in fact. What could it be? Let's see: first started shipping in 1969, now in its eleventh major version, JDBC and ODBC access, full XML support in and out, available with an optional paired transaction manager, extremely high performance, and holds a very large chunk of the world's financial information (among other things). It also ranks up there with Microsoft Windows as among the world's all-time highest grossing software products.
....You bet non-relational is still highly relevant and useful in many different roles. Different tools for different jobs and all.
I'm a huge PostgreSQL fan and took classes in formal database theory in college. I'm saying this as someone who understands and thoroughly appreciates relational databases: I'm starting to love schema-less systems. I've only been playing with CouchDB for a few weeks but can certainly see what such stores bring to the table. Specifically, a lot of the data I've stored over the years doesn't neatly map to a predefined tuple, and while one-to-one tables can go a long way toward addressing that, they're certainly not the most elegant or efficient or convenient representation of arbitrary data.
I'm certainly not going to stop using an RDBMS for most purposes, but neither am I going to waste a lot of time trying to shoehorn an everchanging blob into one. Each tool has its place and I'm excited to see what niche this ecosystem evolves to fill.
Dewey, what part of this looks like authorities should be involved?
In the inbox example, no user has to look at another user's inbox, so the first step is simply to find the current user's mail.
I typically use MD5 since it's very good at evenly distributing information. For example stock symbols are heavily weighted to common letters so there are lots of stock symbols that start with "s". But, if you MD5 the stock symbol you get an even distribution based on the first two hash characters to put the historical data into 256 tables. You could also just put it all in one massive table and use the first two characters in their own column with an index. The advantage of using multiple tables is that it's easier to later split the tables onto multiple physical systems.
So MD5 the Facebook user ID. Use the first four characters to pick the database server. Use the next four characters to pick the table and then select from there. By the time you're even referencing the table you're down to a handful of accounts sharing one table. Searching the user's email is then trivial as the dataset is small.
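A minimal Python sketch of the hashing scheme just described, using the standard hashlib module. The server and table counts, and the names "db00" and "inbox_000", are assumptions picked for the example, not anything Facebook is known to use.

```python
import hashlib


def md5_hex(value):
    """Hex MD5 digest of any value, via its string form."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


def bucket_for_symbol(symbol):
    # First two hex characters -> one of 256 evenly loaded buckets/tables.
    return md5_hex(symbol)[:2]


def shard_for_user(user_id, num_servers=16, tables_per_server=256):
    """Pick a (server, table) pair from the hash of the user ID.

    num_servers and tables_per_server are assumed values; powers of two
    keep the modulo of a 16-bit hash slice evenly distributed.
    """
    digest = md5_hex(user_id)
    server = int(digest[:4], 16) % num_servers          # first four characters
    table = int(digest[4:8], 16) % tables_per_server    # next four characters
    return f"db{server:02d}", f"inbox_{table:03d}"


server, table = shard_for_user(123456789)
print(server, table, bucket_for_symbol("SUNW"))
```

The mapping is deterministic, so any application server can work out where a user's mail lives without a lookup table. The catch is that changing num_servers later moves most users to a different shard.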
Another example of MD5 awesomeness is finding a URL and associated data very quickly (useful for DMOZ data). In MySQL varchars can be up to 255 characters while URLs with various parameters can be any length so you could try to index the TEXT field OR you simply hash the URL and when you want to look up a URL you search for the easily indexed hash.
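The URL lookup could be sketched like this in Python. SQLite (via the standard sqlite3 module) stands in for MySQL here, and the table and column names are invented for the example.

```python
import hashlib
import sqlite3


def url_key(url):
    # Fixed-length, easily indexed stand-in for an arbitrarily long URL.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " url_hash TEXT PRIMARY KEY,"   # the indexed 32-character hash
    " url TEXT NOT NULL,"           # the full URL, not indexed
    " title TEXT)"
)


def store(url, title):
    conn.execute(
        "INSERT OR REPLACE INTO pages (url_hash, url, title) VALUES (?, ?, ?)",
        (url_key(url), url, title),
    )


def lookup(url):
    row = conn.execute(
        "SELECT url, title FROM pages WHERE url_hash = ?", (url_key(url),)
    ).fetchone()
    # Compare the full URL as well, in case two URLs ever share a hash.
    if row is not None and row[0] == url:
        return row[1]
    return None


long_url = "http://example.com/search?" + "&".join(f"p{i}={i}" for i in range(200))
store(long_url, "A very long URL")
print(lookup(long_url))
```

The index only ever sees 32-character keys, no matter how long the URLs get.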
Working with large sets of data is only a problem if you don't devise ways to break up the data. If Facebook needs to search all users' email for various stuff then they can run a script that goes through every table in every database. They don't have to run a single query which would take forever. With distinct sets of data you can quickly start getting results to verify your code is accurate and start digging through the results while the script continues to run.
Work Safe Porn
Hi Monkeys. There are MPP databases that scale way past this and give you speedy access, including ANSI SQL access (petabytes in Teradata's case). The newer compressed column-store engines destroy Hadoop in many analytics use cases, both in performance and in needing far fewer machines, plus you get the ability to use SQL.
Stop the tripe hype.
There was a similar story on Slashdot a few months ago:
http://tech.slashdot.org/story/09/07/02/219247/Enthusiasts-Convene-To-Say-No-To-SQL-Hash-Out-New-DB-Breed
Table-ized A.I.
We didn't start with relational databases. RDBMSes were responses to the seductive but unmanageable navigational databases that preceded them. There were good reasons for moving to relational databases, and those reasons are still valid today.
Computer Science doesn't change because we're writing in Javascript now instead of PL/1.
That is indeed suspicious. But if they want to sell clouds, then make an RDBMS that *does* scale across cloud nodes instead of bashing SQL. (SQL as a language doesn't define implementation; that's one of its selling points.) It may be that since there's not one out yet, they instead hype the existing non-RDBMS that can span clouds.
(I agree that SQL could use some improvements, such as named sub-queries instead of massive deep nesting to make one big run-on statement. Some dialects already have this to some extent.)
Table-ized A.I.
Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads.
I'm sceptical. Why is the problem worse now than in the past? Relational theory in practice is abstracting the data such that a human/application can understand it as logical constructs. How the data is PHYSICALLY organised is a matter of implementation - the relational theory doesn't place any constraint (!) on how the data is organised/retrieved/updated - except that by giving a broad design pattern, duplication is minimised, and so then is processing overhead. MPP (Parallel Processing) lends itself quite neatly to any large set of data - many implementations will continue to scale linearly above the PB size (e.g. Teradata). Looks to me like a sales pitch.
I was an admin on a system that spread the data across 10 database servers. Each server had a complete set of some data, like accounts, but the system was designed so that ranges of accounts stored their transaction-type data on a specific server, and each server held about the same number of accounts and transactions. As data came in, it was temporarily housed on the incoming server until a background process picked it up and moved it to the 'correct' one. This is a very simplistic view, but the reality was that it worked quite well. Occasionally, there was a re-balancing that had to be done. But it was very scalable. The incoming data wasn't time sensitive, so if it took a few hours to get moved, everything was still OK. When an 'online' session needed data, it knew which server to connect to in order to get it. Processing was done overnight on each server, then summarized and combined as needed.
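A toy Python sketch of that layout, with in-memory lists standing in for the database servers. The class name, the account ranges and the method names are all invented for illustration; the real system was certainly more involved.

```python
import bisect


class RangePartitionedStore:
    """Accounts are split by ID range; new data waits in a staging area."""

    def __init__(self, upper_bounds):
        # upper_bounds[i] is the highest account ID kept on server i (sorted).
        self.upper_bounds = list(upper_bounds)
        self.servers = [[] for _ in self.upper_bounds]
        self.staging = []  # the "incoming server"

    def server_for(self, account_id):
        index = bisect.bisect_left(self.upper_bounds, account_id)
        if index == len(self.upper_bounds):
            raise KeyError(f"no server holds account {account_id}")
        return index

    def receive(self, account_id, transaction):
        self.server_for(account_id)  # reject unknown ranges up front
        self.staging.append((account_id, transaction))

    def run_mover(self):
        """Background job: move staged rows to the server that owns them."""
        moved = 0
        while self.staging:
            account_id, transaction = self.staging.pop(0)
            self.servers[self.server_for(account_id)].append((account_id, transaction))
            moved += 1
        return moved

    def transactions_for(self, account_id):
        # An online session goes straight to the one server that owns the range.
        owner = self.servers[self.server_for(account_id)]
        return [t for a, t in owner if a == account_id]


store = RangePartitionedStore([999, 1999, 2999])
store.receive(1500, "deposit 20.00")
store.run_mover()
print(store.server_for(1500), store.transactions_for(1500))
```

Re-balancing would amount to changing upper_bounds and moving the affected rows, which this sketch leaves out.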
...people have been coming up with innovative ways to solve these problems for a very long time.
So yes
And they will continue to do so.
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
I just sharded
Let's not forget where the bottleneck is - the I/O. It's expensive but once you build a fast and solid storage system, correctly configure it and partition your data properly over a sufficiently large number of hard drives, RAIDs, LUNs etc., you might be able to use SQL. We run a database of 10TB on MS SQL with hundreds of millions of records with an equal rate of reads and writes and could not be happier.
Worse, sharding and other such solutions usually end up requiring the application to know way, way too much about the back end structure, how tables are split, where they are split, and so on.
And your solution to improving the storage engine doesn't help. At some point in an RDBMS you need to do joins and so forth, and that assumes that the machine doing the join is capable of doing so AND of handling the load and the number of transactions being tossed at it. Hence we start getting into clusters and other solutions that again need to be understood and managed.
The NoSQL solution lets you toss your request out to the "cloud" and get an answer without needing to know clusters, shards, tables, or really anything on the physical implementation side of the fence.
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
Most RDBMS implementations on the web are generally only used to store data and perform very basic queries such as get and store operations. Personally, I don't really see the issue with using one for web applications, since they are proven to work well and, with the right design and caching solution, are more than capable of handling a popular website such as Digg or Facebook. The only real issue with these sites is that to prevent bottlenecks you would generally need to throw more hardware at it than may be necessary (although memory is very cheap these days, so it's a non-issue for most companies).
Memcached has been shown to really help solve many performance issues for relational databases, because the database no longer has to run the same complex queries over and over; the application just pulls the result from a hashed index stored in memory. MemcacheDB (http://memcachedb.org/memcachedb-guide-1.0.pdf) looks very promising as a way to get rid of an RDBMS altogether for certain data such as user sessions, since it focuses on performance rather than functionality. Even then I think it all really boils down to choosing the right tool for the job: if there's data that you know is going to be a performance bottleneck in the database, you look for more creative solutions to store and process that data. There's nothing stopping you from running two or more different types of databases for the task at hand.
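The caching pattern can be sketched in a few lines of Python. A plain dict stands in for a memcached client and SQLite stands in for the relational database; both substitutions, plus the table and key names, are assumptions made to keep the example self-contained.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

cache = {}                  # stand-in for memcached: key -> value, held in memory
stats = {"db_queries": 0}   # counts how often we really hit the database


def get_user_name(user_id):
    key = f"user:{user_id}:name"
    if key in cache:
        return cache[key]             # served from memory, no SQL executed
    stats["db_queries"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None                   # misses are not cached in this sketch
    cache[key] = row[0]
    return row[0]


def rename_user(user_id, new_name):
    db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    cache.pop(f"user:{user_id}:name", None)   # invalidate the stale entry
```

Only the first read for a given user reaches the database; later reads come from the cache until a write invalidates the entry.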
Hmm... Before 1979, market share for RDBMS was TINY. It really didn't begin to "serve us well" until the mid 80's.
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading
Wow, an "object oriented" database discussion again. I've never read one of these :P I've only been doing this 15 years and I've lost count of these talks a long time ago.
What is the difference between schema-less and schema-rigid anyway? I don't see what that has to do with performance. The real issue is uptime and transaction support. People want to add a column or index without taking the system down. That is different than dealing with PBs of data. Most table structures can easily deal with that much data.
If you have a DB that is big you have lots of outs. Pay...get Enterprise version of whatever. Break it into many DB/tables and merge together. Archive. Archive I bet will get most people by. Does eBay really need all that bidding info for items over a few weeks old...only for analysis maybe. Move that old stale data out of the active heavily hit data tiers.
The fact remains that MySQL should be able to scale to TBs of data. The fact that it can't is a failure of the product. All the others have been for a while. Why can't it...I don't know...the fact that it uses a F'in different file for each index on a table. If you don't understand how old school that is start using Paradox. Just because it is open source doesn't mean it has to be so damn out of date. Please for the love of god save multiple tables/indexes in the same pre sized file...god.
Google has all the power to go and use something different. Google gets to cheat. Google is a collection of pretty static data. They scan the internet a lot, but imagine if every time you did a search Google had to scan every web page on the planet, index them, and then give you search results. That would be impractical for sure. So for now they just store big collections of blobs and a big fast index for searching keywords and links to pages. Impressive nonetheless, but it's not like your typical app. GMail is...funny that it is one system they've had problems with. Even then EMAIL DOESN'T CHANGE. It's user specific, but it's still f'in static. GoogleTastic if you ask me.
The fact is people are using RDBMS right now to solve real world problems. Some start up is finding a way to tweak MySQL to do something cool and then posting it on a blog...then all of a sudden RDBMS is dead. RDBMS is fine, it will be fine for at least 10 years if not longer. In that time it will evolve as well so that it will be around for even longer. MySQL in 5 years will have online index addition, performance hitless online column addition, partitioning, geo indexing, XML columns, BigASS table support, Oracle RAC like support, and a thousand other features that some RDBMSs have today and some will not see for even longer. Then developers that spent all that cash developing custom shit will revert and post comments like this one.
That's the way it goes in software development. The middle tier gets bigger, gets inept, custom shit comes out, it gets integrated into the middle tier shit....continue;
Instead of pronouncing death, start talking about how dated a 2-dimensional result set is. JOINs should return N-dimensional result sets similar to XML with butt loads of meta data. ODBC/JDBC are dated...so update them.
select u.login, ul.when from users u join user_logins ul as logins.login ON ul.user_id = u.user_id where u.name = 'me' should equal something like a nested XML packet instead of duplicated crap when there is more than one user_logins.
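To make the complaint concrete, here is a small Python sketch of what an application has to do today to turn the flat, duplicated JOIN rows into the nested shape asked for above. The function name and the sample rows are made up for the example.

```python
def nest_logins(rows):
    """Collapse flat (login, when) JOIN rows into one record per user.

    rows is what a conventional driver hands back: the user's columns are
    repeated once for every matching user_logins row.
    """
    nested = {}
    for login, when in rows:
        nested.setdefault(login, []).append(when)
    return [{"login": login, "logins": whens} for login, whens in nested.items()]


flat_rows = [
    ("me", "2009-11-01 08:00"),
    ("me", "2009-11-02 09:30"),
    ("me", "2009-11-03 07:45"),
]
print(nest_logins(flat_rows))
```

The suggestion in the comment is that the database interface should hand back the nested form directly, so every client does not have to repeat this regrouping step.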
Can we agree that SQL is a high level language for capturing the set theory query logic and is COMPLETELY INDEPENDENT of the engine and physical storage that actually generates the query plan and makes the heads fly to cache and return data?
Structured
Query
Language
not
Stupid
Quixotic
Layout
(Of tables, pages, indexes, drives, heads,spindles, SANs, etc...)
Right?
"Knowing everything doesn't help..."
Has this ever occurred to you: Maybe people just choose not to answer you? :)
I've seen OLAP systems in the 100TB range which work fantastically well on Oracle.
Object databases could be a nice idea, but not for performance or scaling reasons. An object oriented database would be beneficial as a method to sidestep ORM. So you can, effortlessly and without any significant amount of extra work, persist the state of your objects.
Then you can build POxOs to represent your objects and just implement a few lines of code to have them persisted.
Not sure if anything like that already exists. I certainly don't know of anything in the C# world, but I expect there's some funky named java project which does it.
MS-Access had some really great features: it could be accessed with both SQL and with a blazingly fast (because almost running on the bare OS) ISAM-style library. I am still missing anything like it on Linux. SQLite is a file-system database, but why on earth should it parse full-blown SQL at runtime and why on earth should my program write another program in SQL at runtime just to load some data? Get serious. Parsing and building SQL is just overhead, and especially parsing SQL is no easy and light task.
Since I switched to OO programming, most (95%) of my queries are "This table/index. Number 5 please." In essence that is the get/put method, or the ISAM style method. I really would like something like that to exist on Linux. The closest thing around is MySQL's HANDLER statement, but that can only be used for constant data (because it does dirty reads) and for reading only.
SQLite could even be faster if it just accepted some basic "get row by index" and "put row by index" commands that do not try to parse, optimize or outsmart anything. The problem with "modern" databases is that they are either "SQL" or "NoSQL". That's awful. Some programs speak SQL (because of compatibility, because it is a reporting program or just because the programmer does not know anything else) and some programs are better off with direct row management. That does not mean that the data should not be accessible by both programs. I really wish that the regular SQL databases would develop ISAM-style access methods. Programming would be a hell of a lot easier then, and the programs themselves would speed up significantly as well.
This is no idle remark. I worked a lot with MS-Access and most rants about it being slow come from the fact that most programmers treat the file-system database as a server. So it must emulate a server and do a lot of housekeeping and parsing, and it does not even have a physical server to relieve its load.
But if you know how to program a file-system database with ISAM-style methods, MS-Access is by far the fastest database I ever encountered. No Joke. Really. It can be fast because there is no need to do all these housekeeping jobs to just dig up a row.
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
It's simpler to switch to a different RDBMS when your queries are already in SQL.
It's mostly just human ignorance and laziness.
Deleted
You are aware of PostgreSQL's hstore? It is a type representing basically a name-value mapping (think Perl hash or Python dictionary). You can put an index on it to answer queries like "find all records where the field has a mapping foo => bar, or contains the mappings {foo => bar, baz => grumble}" and more.
Cool stuff.
E-Mail servers associate data with only one index: the e-mail address.
...Valid points, except for your use of the word "one". My email can be retrieved by my email address, but also selected by the folder that it's in, sorted by sender, subject, date or priority, and searched by keyword.
There are only a couple of handfuls of things that need to be indexed, but certainly more than 1.
I work on a very large DB2 system. Enterprise systems cost money because they work. There still seems to be this ignorant, self-absorbed counterculture which believes big iron and the like (anything beyond "look what I can build in my basement") isn't cool, so it cannot work.
Between radix, sparse, derived, and encoded vector indexes I can pretty much serve up anything my partners want, whether they are native or foreign DB2, JDBC or ODBC connected. With the tools I have at my disposal I can analyze statements presented by developers to ensure I have the access paths needed for their work and guide them to better data retrieval. I can tell if their choices result in full table scans, index probes, hash tables, rrn tables, etc. If I need support it's a phone call away.
I do not care who my client is, data is my job. As such I need tools which are so reliable that the only concerns I have are just what my customer is doing and how I can make their request better. When they query 5 TB tables and don't even notice a delay I think I am doing just fine.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
SQL is hardly a hammer - a hammer only has one general use. It's not a Swiss army knife either - a lot of fairly low-grade tools that are convenient in a pinch. If anything, a Swiss army knife is a spreadsheet.
An RDBMS is more like a well equipped workshop that you build and equip at the site of likely problems. It will take far more work to set up than buying a leatherman tool. However, it will solve almost any unanticipated problem you throw at it, once it is built. That is the beauty of an RDBMS, and why businesses and governments like to build both workshops and relational databases.
Of course, there are circumstances where a RDBMS is not called for. If you are doing anything that needs to be highly optimized for just one thing, and will only ever be used for just that one thing, then you do not use an RDBMS. (e.g. an FPS). Much like you wouldn't use a workshop if all you are going to be doing is manufacturing widgets - then you need a factory.
I guess the analogy kind of breaks down there, because a workshop isn't efficient enough to run as a business compared to a factory - it is support infrastructure. For many, many things though, an RDBMS can be the core of a business information system and can also quickly and conveniently answer questions that weren't thought of at design time. Their RDBMS problem domain will only increase as computing power grows, unlike more specialized systems. I would not be surprised at all if SQL is still dominant in a hundred years time.
If I have seen further it is by stealing the Intellectual Property of giants.
SQL databases, if designed properly, DO handle enormous datasets. The problem starts when you have wits designing the database and then managers attempting to use the DB for purposes it wasn't meant for.
If you mod me down, I will become more powerful than you can imagine....
All these programmers that know how to create tables and normalize a DB but that don't really understand advanced programming techniques are cornering themselves in the "Vietnam of software development": endless OO-to-RDB plumbing. They invent lots of tools to ease their immediate pain, without looking at the big picture: OO and RDB do not match.
You have hierarchical data? Then learn something new: learn what OO really is about, use an OO DB. It has proven to be really fast.
But I don't expect this to become mainstream: most people don't understand advanced programming techniques. They don't understand OO, they don't understand multi-threaded programming. Hence they rely on the SQL DB to "keep things in synch" and to "organize" their data. It kinda works, for naive stuff.
Once it comes to real amounts of data, then the relational paradigm and especially the SQL implementation of that relational paradigm simply ain't cutting it anymore.
I mangled the first link.. http://tinyurl.com/ybepcqr/
Database size is usually not an issue for modern RDBMS, such as Microsoft SQL, Sybase ASE, Oracle, or IBM's DB2. I am running an ERP on Sybase with 3 TB worth of data, a datamart on Microsoft with 5 TB, a Patient Record System on Microsoft with 20 TB, a HR system with 2 TB, and a Patient Accounting system on Oracle with 8 TB of data. All of these systems talk with at least one other system, usually with the assistance of SSIS (Thank god for SSIS, our ETL is heavy lifting, approx. 5 TB a night of incrementals). With enough server hardware, we can scale up to very large levels easily. We forecast our data size needs out for the next three years and have been very accurate, not running across SAN issues.
The only systems we have had issues with in the area of data size are MySQL and Informix.
In God we trust, all others require data.
You do realize that a site can use more than one database, right? A database that is primarily read-only to display data quickly, and another database to handle the financial transactions. I really hope that you are not telling me that any business - web business or not - does not depend on ACID requirements.
Also, there's more to the RDBMS world than MySQL, PostGres, etc. - the commercial databases, on proper hardware and proper database design, can actually scale up and scale out quite well: You just have to learn to use the right tools for the job!
The typical architect who opts for a NoSQL approach is basing her decision on what an RDBMS can do / can't do primarily on experience with MySQL. She would never consider something much more scalable on the extreme like Oracle or even, heaven forbid, DB2. She has never tuned, let alone touched, one of these real RDBMS. Similarly, her idea of hardware doesn't much transcend a set of independent servers linked with GBE. So her hammer is anything but an RDBMS and the conclusion is totally foregone that an RDBMS won't work. The real conclusion is that MySQL won't work, which is totally accurate. Go look at the Larry Ellison video of the Oracle/Sun database machine which will eat most of these "unsolvable" problems for lunch. Yes it is expensive, but building an empire so your pet project can succeed is also expensive and probably more risky as well.
... that too many developers and integrators will just use an SQL database by default without considering whether or not it is appropriate for the task. I see so many databases where there is little or no hint of any relationships even being involved. Some forums, for example, store postings in a database where the message content is a blob and it is indexed by a number. To get a post, look by number. While an SQL database can do this, so can many other database types. There's no complex relational searching with this; it's just basic indexing (with maybe a tree of index relationships). I'd sooner do this with a B-tree based filesystem.
now we need to go OSS in diesel cars
Actually, the real problem is that MySQL sucks. Sure, you can patch over some of its suck with Memcache, but at some point you're still stuck waiting 30 seconds for a query to return, no matter how optimized you make it. Yes, it's trivial to get Oracle and MSSQL to scale to billions of rows, but those cost money no one is willing to spend. NoSQL is wonderful in that it scales easily and is free.
Sure, you have to denormalize your data, but you probably already were to try to squeeze the last bit of performance out of MySQL.
You want people to use RDBMS? Make a free one that doesn't suck donkey balls and they will.
CODASYL Hierarchical Databases are faster for large complex databases. I've supported extremely large databases and user bases with 3 second or better end-to-end response times for over 300,000 real-time customer service rep users with such software. These databases allow precise physical positioning; including the ability to group related child record rows on the same physical page. One I/O can retrieve the entire set. They also support hash or other custom indexing that directly yields the physical page address instead of wading thru relational index pages to get there. Tool support is not as good and it takes someone who understands them to get the best results. Functionality such as producing report output is more work. But they work great on large datasets.
I can quickly find anything on my desk using my index of food wrappers and containers.
I know that report was done about the time I ate that Snickers, ah found it.
I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
Let's call it Nazgul instead? That's how I pronounce NoSQL anyway :)
At the ACM site Michael Stonebraker wrote an article titled "The "NoSQL" Discussion has Nothing to Do With SQL" where he discusses how the NoSQL group is solving real problems, but using a name.. that well.. really has nothing to do with the problems getting solved.
http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext
For anyone not familiar with Stonebraker..
http://en.wikipedia.org/wiki/Michael_Stonebraker
Great article from someone who truly knows what he is talking about.
MongoDB starts as a service on my MacBook and on my local network I always keep services for Sesame (RDF data store, SPARQL endpoint), MongoDB, and CouchDB running.
It is easier to use NoSQL datastores (when they are appropriate) if you always have them running, have client libraries in place, etc.
If you want to use a relational database, you don't have to stop to install it, get client libraries, etc. I think the same 'ready at hand-ness' should apply to whatever NoSQL datastores meet your needs.
The world relies on IMS
Are all of those in Java? What about people who want something efficient and scalable without running JVMs everywhere? Have some of them been ported to Mono?
Everything stored on disk or memory is a database! Many filesystems use B-Trees to organize files. Those directory paths are essentially schemas. The simple pointer address to a struct, record, or object in memory is organized data. A memory block and the ASCII charset define a string.
Every 50TB block is a blob with interpreted meaning. Whatever that meaning is, that's your database!
The JOIN syntax should not affect the query plan. Most SQL-engines today create equally good plans for both SQL-86 and SQL-92 JOINs. This is true among MySQL, PostgreSQL, & Oracle from the last decade.
The query plan will read any indexes it has to find column C in both tables X and Y. Then it will do an index-scan to retrieve the columns. See the "EXPLAIN PLAN" (Oracle) or just "EXPLAIN" statement to see the query plans.
SQL> select x.a, y.b from x, y where x.c = y.c;
SQL> select x.a, y.b from x join y on x.c = y.c;
Also, there is nothing restricting these SQL queries to one computer. The SELECT can occur recursively on a server cluster. Treat the queries on computers A & B as the UNION of the two single-computer SELECTs. SQL can be used in a cloud!
SQL> (NOTE: AT... is not SQL!)
select x.a, y.b from x join y on x.c = y.c AT Computer A
UNION
select x.a, y.b from x join y on x.c = y.c AT Computer B;
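Since "AT Computer A" is not real SQL, here is a hedged Python sketch of the same idea: send one SELECT to every node and merge the answers like a UNION. Two in-memory SQLite databases stand in for computers A and B, and the helper names are invented. It also assumes rows of x and y with the same value of c always live on the same node; without that co-location, the union of the per-node joins would miss matches that span nodes.

```python
import sqlite3

QUERY = "SELECT x.a, y.b FROM x JOIN y ON x.c = y.c"


def make_node(x_rows, y_rows):
    """Build one stand-in 'computer' holding its share of tables x and y."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a TEXT, c INTEGER)")
    conn.execute("CREATE TABLE y (b TEXT, c INTEGER)")
    conn.executemany("INSERT INTO x (a, c) VALUES (?, ?)", x_rows)
    conn.executemany("INSERT INTO y (b, c) VALUES (?, ?)", y_rows)
    return conn


def scatter_gather(nodes, query=QUERY):
    """Run the same SELECT on every node and merge with UNION semantics."""
    merged = set()                      # UNION removes duplicate rows
    for node in nodes:
        merged.update(node.execute(query).fetchall())
    return sorted(merged)


computer_a = make_node([("a1", 1), ("a2", 2)], [("b1", 1), ("b2", 2)])
computer_b = make_node([("a3", 3)], [("b3", 3)])
print(scatter_gather([computer_a, computer_b]))
```

A real cluster would run the per-node queries in parallel and push sorting or aggregation down where possible; the sequential loop here is only for clarity.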
There's something to be said for using the right tool for the job. A general purpose database will be optimized for the general case, not for your specific problem. Large databases spanning multiple servers, taking extreme traffic, are sufficiently outside the scope of normal database operations, that a custom solution can be the only way to do it.
I suggest:
* figure out exactly what needs to be solved
* check if existing solutions solve it
* if not, then develop a solution (and if you're in Canada, claim IRAP and SR&ED for it ;-) )
Reasons why my suggestion would not always work:
* risk, both financial and project
* skills bias (preference to change the problem to match what skills are available)
* technology bias (preference to change the problem to match a specific technology)
* vendor bias (preference to change the problem to match a specific vendor)
SR&ED
50TB of data? OMG! WTF! MOREACRONYMSINCAPS! With an index and an average allocation unit of 1kB and no caching whatsoever, that could be, like, up to almost 37 seeks!!! OH NOES! DO WE HAVE ENOUGH POWER?!?!?
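The seek count is easy to check. This Python sketch assumes the index behaves like a plain binary search over one entry per 1 kB allocation unit, with nothing cached; a real B-tree with a wide fan-out would need far fewer reads. The function name is made up.

```python
def worst_case_probes(total_bytes, unit_bytes):
    """Worst-case binary-search steps over total_bytes / unit_bytes entries."""
    entries = total_bytes // unit_bytes
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (entries - 1).bit_length()


probes = worst_case_probes(50 * 10**12, 1000)   # 50 TB in 1 kB units
print(probes)
```

That comes out at 36 probes for fifty billion entries, in line with the "almost 37" above.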
I too get annoyed by database luddites, especially the ones who are in there because they have no social skills, no desire to co-operate with others, and who know all the latest MS terminology but don't, for instance, actually understand how indexes work because they have never really learnt system programming. But valuable corporate data does need to be protected; its loss or corruption costs profits and jobs. SQL is a proven language with a strong track record that is largely portable and, except when queries are generated by some hopeless automated query generation engine, can be made human readable and checkable. Way to go for corporate data.
If you had a nickel as you suggest, you probably wouldn't have enough to buy lunch in a decent restaurant. If you had a nickel for every swear word uttered by every dba or IT manager sweating blood trying to overcome data loss or corruption, you might be able to retire as you suggest.
Remember: social networking applications are not mission critical business processes, and they do not have significant SLAs to meet.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
> Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall)
All of these use SQL though.
SQLite is a nice alternative for embedded systems. The whole distribution is less than half a meg. Works quite well for the opposite side of the spectrum covered by TFA. Smaller than Access, smaller than darn near anything, a fully self contained SQL environment expressed in a file. For the Big Huge scale (petabytes) look at Google's Bigtable.
Do not mock my vision of impractical footwear
No. The EAV model creates a row-centric view of attributes. My suggestion keeps the traditional column-centric view intact. Other than being more careful about implied types when comparing and asterisk usage, most SQL will look just like it does in a "static" RDBMS. This is not the case with EAV's; they completely change the way one queries.
Table-ized A.I.
PostgreSQL at least, and probably other databases, has a generic "key-value store" data type: http://www.postgresql.org/docs/8.3/interactive/datatype.html. With it, rows can contain some strictly-typed data (such as IDs, types, other metadata) and also contain a field (or many fields) which store all other loosely-typed data. And since it's PostgreSQL, all data is safe, can be replicated, you can have complex indexes, full text search, etc.
-- Sig down
...One database per registered user.
I am not devoid of humor.