Beyond Relational Databases

KISS by mcrbids · 2005-05-24 07:14 · Score: 5, Insightful

Some of the biggest problems that "new" database designs have:

1) Overly complex

2) Don't scale

3) Tied to a single platform/implementation

4) Poor performance

It's typical to see all four in a single try!

SQL, on the other hand:

1) Reasonably simple API

2) Scales to very large databsaes

3) Cross-platform/architecture

4) Performs very well.

Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:KISS by mmkkbb · 2005-05-24 07:22 · Score: 1, Interesting

There is a language called OOSQL, but looking at it makes me squirm.

--
-mkb
Re:KISS by eggoeater · 2005-05-24 07:23 · Score: 2, Interesting

Very true. My group had to rebuild an overly-complicated MIS system for a large call center about 5 years ago. The old system was built with RedBrick DB engine, informatica data transformations (i.e. pretty GUI..oooooohhhhh), and Brio reporting. It was a mess and NEVER worked. We replaced it with SQL Server and some relatively simple SQL stored procs and dumped the POS informatica and RedBrick.

IMHO, there are too many GUI tools out there to do data transformations. SQL works great but most people don't want to take the time to learn it!

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:KISS by Anonymous Coward · 2005-05-24 07:30 · Score: 0

BDB is simple and scales well too. I was hoping they were advocating a move to something more like BDB for the future post-relational database stuff.
Re:KISS by NoOneInParticular · 2005-05-24 07:51 · Score: 3, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.
As for SQL I do not agree with point 4: SQL does not perform very well. We're in the age of Ghz processors, fast disk drives and it *still* is an performance issue to add a few million records to a database? What SQL sorely lacks is a recognition that the 4th generation of software languages was a bad mistake, and get back to a third generation language: explicit indices, explicit loops over these indices and fast (compiled) execution of said loops. Just freaking program the database instead of waving chicken bones at it.
Re:KISS by MSBob · 2005-05-24 08:00 · Score: 3, Insightful

Relational algebra is a very nice and tidy concept but SQL is a piss poor (and limited) implementation thereof.
In other words, relational databases are very nice and elegant but their interface (SQL) is bad and should be replaced.

Also relational databases by themselves can't supply the needs of a typical enterprise and that's where technologies such as OLAP are built on top of RDBMS to make certain data manipulations efficient.

--
Your pizza just the way you ought to have it.
Re:KISS by ignorant_coward · 2005-05-24 08:01 · Score: 1

WTF is BDB? Another buzzword du jour?
Re:KISS by Frank+T.+Lofaro+Jr. · 2005-05-24 08:02 · Score: 2, Informative

Umm SQL does have explicit indexes and they make a big difference:

create index t_c_idx on t (c);

creates an index "t_c_idx" on column "c" of table "t".

I've shaved 2 multi-second queries to under 1/10 of a second (it was an overall speedup of over 100X) just by adding indexes and letting them be used (e.g. you can't have where some_function(c)=value and expect the DB to use the index on c).

--
Just because it CAN be done, doesn't mean it should!
Re:KISS by ignorant_coward · 2005-05-24 08:03 · Score: 1

As for SQL I do not agree with point 4: SQL does not perform very well. We're in the age of Ghz processors, fast disk drives and it *still* is an performance issue to add a few million records to a database?

With tuning it isn't so bad. Usually a piss-poor database reflects on piss-poor developers.

Also, the better SQL engines out there have a decade+ of refinement and stability, along with transactional integrity on all those millions of inserts.
Re:KISS by NoOneInParticular · 2005-05-24 08:12 · Score: 1

My problem with tuning is that I want to do this programatically, not by measuring cache utilization at the processor level. I have the same problem with Java, where for performance issues you end up using generate and test to set the parameters for the garbage collector and JIT parameters. In my opinion this is a step back from total control from the language itself.
But definitely: stability and acidity of the database are extremely valuable, I just wish I could simply get rid of it sometimes. Again programatically.
Re:KISS by NoOneInParticular · 2005-05-24 08:21 · Score: 2, Interesting

Yes, you can expect the DB to use the index, which is subtly different from instructing the database to use the index. The whole idea behind SQL is great: a declarative language for specifying the job, and a procedural, smart, engine for performing the job. Unfortunately, databases have failed to deliver on the latter part, leading to implicit techniques to get the performance out. What I claim here is that we should take a step back, not forward, with SQL and get back to explicit programming of the databases, with strong guarentees about the performance. No more "let's add an index and I'm sure it will go faster", but, "instruct the query to first make and then use the index". Much easier to comprehend, and much faster.
The relational model is great, but what we need for speed is absolute control over the order in which records are stored. If you know that order and you have an underperforming "general" query, you might be able to speed up the thing with smart programming instead of trying to prepare the environment in such a way that the procedural engine does its job. Say you want to insert a bunch of records which have consecutive primary keys. Writing the procedure for adding them is trivial, letting the database sort it out often leads to dissapointing performance.
Re:KISS by handslikesnakes · 2005-05-24 08:27 · Score: 1

Berkeley DB, I'm guessing. And the buzzword de jour is "buzzword".
Re:KISS by Anonymous Coward · 2005-05-24 08:38 · Score: 0

du jour == of THE day
Re:KISS by 3nuff · 2005-05-24 08:41 · Score: 2, Informative

I'm currently working on one of those GUI applications, not Informatica. I am pleased with it's capabillities and I think it's a very powerful and reliable tool. Big projects can take a couple days once a good template is built and the metadata management is simple and useful. Reusable code and SMP processing are available and greatly enhance the speed of data transfer.
In the wrong hands though, such a tool can be made into a piece of crap. The problem is that too many "programmers" don't know SQL before they use these tools. They just go at it. No standards in place; just head down. Judgements need to be made on when to use the tool's sql generation and hand coded SQL to get best performance.

Too many companies get substandard people in place and then wonder why their $1M software doesn't work, it puts these tools in a bad light, that is undeserved.

--
"Give me taste, give me funk, give me fury, gimme some more."
Re:KISS by WaterBreath · 2005-05-24 08:49 · Score: 2, Interesting

instruct the query to first make and then use the index
build multiple indices on a 20M row table, on the fly? Is it worth the effort to build it on the fly if the database can figure out the best way to make and use the indices on its own 99% of the time?

I think that what you're looking for is a database that will allow, but not require, you to tinker programmatically with creating and using temporary indices on the fly. It's still indespensible to have optimizing index functionality in the database for the 99% of proper uses that don't require the extra oomph. And insisting on reinventing the wheel every time simply because "I can do it better than some stupid program someone else wrote" is a rather naive development philosophy. Yes there are times when it's better. But they are vastly outweighed by the times when it's just not worth it. The need for packaged indexing functionality is there.
Re:KISS by WaterBreath · 2005-05-24 08:57 · Score: 3, Insightful

Usually a piss-poor database reflects on piss-poor developers.
This is a BIG IMPORTANT POINT. But, unfortunately, the people that are architecting the databases in these cases usually don't realize that they're producing bad designs. All they see is "The database is slow. Why the *!?$ is it so slow!?" There's usually one or two answers: 1) Your schema is organized poorly. 2) You're just plain juggling too much data at once. And let's not forget the deadly combination of both 1 and 2. 100M rows, for example, is a lot of data, any way you look at it. If you've got 100 bytes per row (remarkably small in many cases), that's 10G of data. If you're shuffling through all that data every time, yeah, it's gonna be slow. Ya might want to look into reorganizing, or archiving.
Re:KISS by WaterBreath · 2005-05-24 08:59 · Score: 1

I swear I hit "Preview". Let's try it again:
Usually a piss-poor database reflects on piss-poor developers.

This is a BIG IMPORTANT POINT. But, unfortunately, the people that are architecting the databases in these cases usually don't realize that they're producing bad designs. All they see is "The database is slow. Why the *!?$ is it so slow!?"

There's usually one or two answers:

1) Your schema is organized poorly.

2) You're just plain juggling too much data at once.

And let's not forget the deadly combination of both 1 and 2.

100M rows, for example, is a lot of data, any way you look at it. If you've got 100 bytes per row (remarkably small in many cases), that's 10G of data. If you're shuffling through all that data every time, yeah, it's gonna be slow. Ya might want to look into reorganizing, or archiving.
Re:KISS by Retric · 2005-05-24 09:00 · Score: 1

The problem with this is code rot. It's faster now but when the DB is 20x as large but your hardware is 200x as fast and you need to change some things your not going to want to mess with low-level data storage issues.

I think the best solution is to have a good administrative system so you can optimize the database based on system load without changing your code. You end up with the old Oracle / DB administrator but you get a highly tooned system that is still portable.
Re:KISS by kurzweilfreak · 2005-05-24 09:05 · Score: 1

le pwned

--
kurzweil_freak
5th Kyu Genbukan Ninpo/KJJR student
Be the darkness that allows the light to shine.
Re:KISS by waveman · 2005-05-24 09:08 · Score: 2, Insightful

> SQL, on the other hand:
> 1) Reasonably simple API...

The outrageous success of SQL is in part because it complies with the "second law of axiomatic design": a good design has minimum entropy. Prior to SQL a lot of application knowledge was duplicated in the database which increased entropy. Also a lot of knowledge of the physical structure of the data was embedded in the programs which also increased entropy. This is bad because the more redundant information the system contains the harder it is to change.

With SQL you can change the physical structure without rewriting all your programs. And you can change your code without affecting the DBMS design eg you can get the rows back in a different order. In the bad old days if you wanted things in a different order you had to reload the database and recompile all your programs.

Even though SQL uses 3 to 10 times as many CPU cycles as hierarchical DBMSs or ISAM files it took over because of this flexibility.

Similarly, I think that one of the reasons OO programming has not delivered the productivity benefits it was supposed to is that OO somehow encourages people to embed a lot of duplicated knowledge of the application into the class structure. This duplicated information makes it hard to change the system. I have seen this again and again. In contrast, before OO a lot of this information ended up in data or in a database, where it could be easily changed.
Re:KISS by fritzenheimer · 2005-05-24 09:20 · Score: 1

Some of the biggest problems that "new" database designs have:
1) Overly complex
The whole point of the article is that current RDBMS schemes are overly complex. Several applications are listed that RDBMS isn't well suited for: data warehousing, directory services, web search, mobile device caching, XML management, and stream processing. RDBMS was designed for workloads characterized by ad hoc queries (which SQL shines at as several of you have commented on), significant write traffic, and the need for strong transactional guarantees. The "problem" applications above don't have a need for any of these RDBMS design requirements. It's too much for the task at hand.

The article author advocates "flexible solutions" "tailored to the needs of a particular application."

--
RFM
Re:KISS by timeOday · 2005-05-24 09:22 · Score: 1

The logic programming community already spent a decade or two trying to replace SQL with logic languages like Prolog. SQL vs Prolog is like C vs lisp all over again.
Re:KISS by Kent+Recal · 2005-05-24 09:26 · Score: 1

Just freaking program the database instead of waving chicken bones at it.

Best quote about databases I have read in a long time. Is it by you?

We have, too, hit the ceiling on our reporting SQL database (postgres) a while ago and it's slowly becoming a pain to work with it.

There are adjustments that we can make to the schema to improve the situation (drastically even) but its quickly approaching a point where the sheer amount of data makes some of the more interesting queries less and less feasible because of the performance hit.

The next step (after improving the schema) will be to throw more hardware at it (currently: dual-xeon, 3GB ram, fast scsi disks) but still, sometime not far in the future we'll have to switch some of the interesting questions to custom stuff (like grepping/scripting on huge files, cdb, custom stuff) or consequently reduce the working set (like, only process a month instead of three) because the damn queries don't scale lineary but more exponentially.

Instead of rolling most of that custom stuff on our own we'd much prefer to have a toolkit available - just as you suggested.
Let *me* decide how to get to the data. Yes I *do* know better than any query optimizer how to approach *my* problem.

I think this is a huge market gap (think goatse only bigger!) since any friggin website is database driven nowadays. And anything based on a SQL DB that grows to any noteworthy size very quickly regrets to be SQL based in first place. Don't underestimate the overhead of clustering (in addition to the overhead of a SQL database in itself).

There are many situations where the flexiblity of SQL is just unnessecary and the waste of ressources unjustified. In these situations it would to not be forced to write it *all* from scratch but just to use something that has been time-tested, peer-reviewed and definately works. I know many people who would pay serious money for such a thing.
Re:KISS by Kent+Recal · 2005-05-24 09:31 · Score: 1

I missed a "be nice" in my above post. You'll figure it out. :-)
Re:KISS by Kent+Recal · 2005-05-24 09:46 · Score: 1

100M rows, for example, is a lot of data, any way you look at it. If you've got 100 bytes per row (remarkably small in many cases), that's 10G of data. If you're shuffling through all that data every time, yeah, it's gonna be slow. Ya might want to look into reorganizing, or archiving.

10G is a lot of data?
Well, when you know how to treat it it doesn't have to be.
My problem with SQL databases is that they often refuse to do what I want them to do. They refuse to use an index for some obscure reason or take another, entirely wrong approach.
When that happens you're down to hinting the query optimizer (which rarely worked for me) or drag the stuff through various temp tables which works faster than the initial query but still not remotely as fast as it could be if i could just tell the damn thing exactly what to do and in what order.

When a temp_table is involved most RDBMS will do the first query (read/process/write), second query (read/process/write) etc.
With more control over the actual processing steps taking place after the first "read" I could often save quite a few of them. Stupid query optimizers tend to waste *a lot* of precious RAM and once they hit the boundary of physical mem it usually means: game over (as in: "let's try to do this with yet another intermediate step"...)
Re:KISS by gfody · 2005-05-24 09:50 · Score: 1

KISS never scales.

here's an example of a "new" problem. Take an event, let it have some properties.. when, what, where, who, etc. Now put 100 billion of them into a table.

If we've kept it simple we have one table with 100 billion records in it, and our attempts at producing summary reports (table scan w/aggregates) fail miserably (by taking too damn long, or causing the tempdb to expand until something breaks).

In fact, any new problem is only a problem because of current RDBMS's inability to scale. The solution to the above problem is to dimensionalize the data, keep one scalar 'fact table' and one or more 'periodic snapshot' tables. This adds lots of complexity for the purpose of being able to run summary reports more than once every 30 something hours. Not to mention using way less overall system resources.

I'm curious why you mention poor performance and no scalability as drawbacks of a "new" system. The new system proposed by TFA is hypothetical. I also wonder if you even know what scalability means?

I can't think of any situation where the simplest method is also the most scalable. Especially in database scenerios where working around the IO bottleneck requires some novel approach. Of course, I don't think anybody in their right mind would refer to today's RDBMS's as "simple" either.

--

bite my glorious golden ass.
Re:KISS by snorklewacker · 2005-05-24 10:01 · Score: 1

> Just freaking program the database instead of waving chicken bones at it.

You go right ahead. You're such a real man, you go write your own execution plans for every query. I prefer a declarative language that does it for me, and no I don't have a problem with adding millions of rows. SQL doesn't even enter into it. You think it's parsing and interpreting SQL for every column of every row going into the database?

It's not even a very verbose language: 90% of SQL queries are names of tables and rows. So go turn SELECT, FROM, WHERE, and ORDER BY into one-character punctuation sigils, it won't exactly result in an explosion of productivity.

Golly. Try learning something about databases before you go off on a tirade.

--
I am no longer wasting my time with slashdot
Re:KISS by gfody · 2005-05-24 10:03 · Score: 1

I think what you want already exists in the form of optimizer hints.

you can specify the index to use and even the join method via hints like so:
select * from
sometable (index=someindex)
inner hash join othertable (index=...)
inner merge join ....

the last trick is to use "forceplan" which will cause the executer to join the tables in the order you have them listed in the query. this makes SQL less declarative and more procedural in a way.

--

bite my glorious golden ass.
Re:KISS by Anonymous Coward · 2005-05-24 10:04 · Score: 0

> Yes, you can expect the DB to use the index, which is subtly different from instructing the database to use the index.

Pardon? Most real databases can be told to use or not use an index. Oracle most certainly lets you write your own query plans, if you're that much of a masochist.

I suggest you stop judging databases based on what MySQL can't do.
Re:KISS by eggoeater · 2005-05-24 10:05 · Score: 1

Right...that's why a lot of these GUI tools are dangerous. People who don't know SQL can quickly get into trouble, especially with respect to design.

The trouble with informatica (at least the version I used,) was it didn't generate any SQL. It actually by-passed the SQL engine on the server and performed the data transformation row-by-row. This is equivalent to doing all your SQL with cursors...slow as hell.

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:KISS by Brian_Ellenberger · 2005-05-24 10:20 · Score: 3, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

Google is a very unique case. There are two things in Google's advantage that most RDBMS system have to take into account:

1. Google does not have to update in realtime. If I add a page to my website it is not immediately available on Google. Contrast this to a normal RDBMS where if I add a record it must be available immediately. Google is much more similar to a Data Warehouse than a RDBMS.

2. Websites indexes are more easily parallelized because complex joining of data is not needed. Naively, Google can store all websites starting with 'A' on a server, 'B' on a server, etc. You can store the User table and the Address table on separate database servers and expect to query on users and their addresses with any sort of performance.

3. Going along with 3, the queries expected out of most RDBMS systems are much much more complex than any queries expected out of Google right now. I'm assuming you haven't seen any complex financial reports or statistics that have been generated from a RDBMS. Databases do alot more than "select name from user where id=1234".

Brian
Re:KISS by isomeme · 2005-05-24 10:30 · Score: 2, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

Google's performance and value are both amazing. But it's easy to drive yourself nuts (if you're an enterprise software architect, which I am) trying to get that kind of performance out of other types of application. You see, Google has two major advantages over nearly all other large data-backed applications:

1. There is no "right" answer

Google keeps their ranking and indexing schemes proprietary, so nobody can say what "should" come back from a given search. Indeed, execute the same query at the same time from different machines, or the same machine at different times, and you sometimes get wildly varying results. SEO folks call this the "Google Dance".

2. Writes are asynchronous with reads

As near as anyone can tell, the Google index is rebuilt by their crawler over the span of a few weeks, and then the whole new index is exported to production machines over the span of a few days. Only the crawler writes data to the index, and the index as it's being built is not read by end-user clients; the production index is *only* read.

These advantages let Google use a distributed, loosely coupled, inconsistent server farm made of cheap boxes that needn't be in sync with one another. It doesn't matter if queries to two of them give different answers, after all.

Contrast this with (e.g.) an online bookstore; once you order a book, every node in every part of the system needs to know about that order in order to keep everything consistent (stock level monitoring, end-user purchase tracking, and so forth). This is a much, much harder problem to solve.

So again, not to take anything away from Google's tech (which I more or less worship), but it's not a fair point of comparison for most large enterprise apps.

--
When all you have is a hammer, everything looks like a skull.
Re:KISS by ShieldW0lf · 2005-05-24 10:57 · Score: 1

Sounds to me like you're trying to use a database that was designed for OLTP to run queries better suited to an OLAP design. Or, to put it another way, you sound like someone who doesn't understand how to use the tool complaining that it won't do what you need when you're just using it wrong, which is what pretty much every complaint I've personally heard about SQL databases boils down to.

If I were you, I'd be looking at adding a second database on a second box for these queries, then horizontally partition your data and move the older stuff to the second box, index the shit out of it, create tables where you'd otherwise use views, etc and design your reporting software accordingly.

There's only one data solution out there that is time-tested, peer-reviewed and definately works. It's called a relational database.

--
-1 Uncomfortable Truth
Re:KISS by Kent+Recal · 2005-05-24 11:03 · Score: 1

Well, you have a point. (I'm responding because I wrote a post quite similar to the one you replied to)

But it's really just that, why is none of the dbs letting us do our own execution plan (and do mutations on the intermediate sets)?

In my environment the number of queries that need to run frequently is relatively limited. Squeezing the last bit of performance out of them would definately be reasonable. More reasonable than starting from square one (sleepycat) or below.

Why is it that we have these powerful, mature database backends in open source but the only means to trigger their magic remains so limited?

I don't claim to grasp much of the underlying hard math. But I do realize when I run into a corner case that could be resolved with a minor tweak.
Re:KISS by uncqual · 2005-05-24 11:13 · Score: 1

With tuning it isn't so bad. Usually a piss-poor database reflects on piss-poor developers.

I assume this is a comment about those developing the application (i.e., determining the logical and physical database design,queries, and the platform) rather than the developers of the underlying DBMS (Oracle, SQL Server etc.).

In my experience, piss-poor database performance often comes from the failure of the customer to redesign schemas as they toss loads and queries onto the system that were not anticipated when the application was originally created.

For example, the developers were initially assured that the customer service droids would only be able to query an individual customer's service history. However, two years after rollout, someone adds an "upsell" function to the customer service interface and this results in some dreadful join that was never anticipated - and, alas, this change is deployed without DB schema changes or adequate performance testing under load. This seems to be most common in shops without a strong IT department which "owns the world" (and therefore can't stop "progress" for pesky things like design).

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:KISS by jbplou · 2005-05-24 11:21 · Score: 1

You forgot about developers who don't create proper indexes or use too many triggers.
Re:KISS by Pseudonym · 2005-05-24 12:02 · Score: 1
On the other other hand, SQL-based databases:
- do not provide a "natural" representation for most kinds of data (i.e. most data doesn't naturally come in tables),
- can't handle some kinds of data at all, particularly in any situation where you need to extract an unknown number of indexable keys from a single field efficiently (e.g. indexing text by word),
- is incompatible between different kinds of server for anything nontrivial (e.g. joins), and
- rely on stored procedures, which are even more incompatible.
It's also not true that any database design which isn't relational is "new". Z39.50, for example, has been around for longer than the SQL standard. (It used to be known as WAIS before it got its ISO number.)
--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Re:KISS by snorklewacker · 2005-05-24 13:00 · Score: 1

> But it's really just that, why is none of the dbs letting us do our own execution plan (and do mutations on the intermediate sets)?

Well, there's only so much tuning you can actually do. Oracle will let you pick indexes to use, possibly the order (I'm really not sure), or whether to use them at all. Creating them "on the fly" is of course impossible, since creating an index in the first place requires a full scan. Temporary tables are for when you need to mutate the data "mid-stream", and many databases include a first-class table type to store resultsets in, without need to even use the temporary tablespace. Combined with the procedural languages that many databases offer, you really do have that sort of control if you want it -- sometimes it's just painfully verbose to express it.

Aside from the nasty verbosity, using a prepared query like "select blah from table where primarykey = ?" really is just like selecting from a hash db like sleepycat. It's probably not launching into some exhaustive grandiose search logic (though it does have to pass through some extra layers -- the price one pays for abstraction) unless the table was specifically set up to use such logic, since that table could be partitioned, it could be a view, but you don't have to know it. Databases like PostgreSQL have even more direct interfaces via the OID column, making them act very much like an OODB.

--
I am no longer wasting my time with slashdot
Re:KISS by ejamie · 2005-05-24 13:09 · Score: 3, Informative

I fully agree with this the parent post. I've built several systems (web or otherwise) running on SQL Server over the last 5 years.

I've learned this from experience myself: There is no reason why a database driven application should be slow, provided the database is layed out appropriately and the built-in performance facilities of the RDBMS are utilized.

Show me a slow performing database-driven application, and I will show you a set of indexes, stored procedures, vertical/horizontal table partitioning, or table normalization/denormalization that will fix the problem.

The other cool things about DB optimization (one of my favorite development subject areas), is that generally putting in effort to optimize in these areas will give you an immediate and many-fold increase in performance. Not so easy to do in non-RDBMS environments!

--
Hey! Stop copying my sig!!! Stop copying my sig!!! Stop copying my sig!!! Stop copying my sig!!!
Re:KISS by vsync64 · 2005-05-24 13:56 · Score: 0, Troll

Your nickname is accurate, for sure...

--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
Re:KISS by Anonymous Coward · 2005-05-24 14:38 · Score: 0

You're stupid. SQL isn't an implementation of relational algebra. It's an implementation of relational calculus plus a bunch of extra features (sorting, analytics, and whatnot). The idea is that the user should be able to specify what they want (relational calculus) instead of how to get it (relational algebra).
Re:KISS by HornWumpus · 2005-05-24 18:10 · Score: 1

not by measuring cache utilization at the processor level.
Is that how your tuning you queries or are you looking for problem tasks with the processor stats.

If the former you are being mislead, if the latter ignore this message.

The first step for tuning slow queries is the query plan. After that you need to reformat your query to trick the optimizer into doing sane things. Stored procedures are programatic so that should work for you to begin with.

For example under SQL server these two querys return the same result. One is much faster:
SELECT * FROM i INNERJOIN c ON i.co_id = c.co_id WHERE c.co_name = 'acme';
SELECT * FROM i WHERE i.co_id = (SELECT co_id FROM c WHERE c.co_name = 'acme');

The steps the query engine does for the first is typically much slower. This is what I would call waving chicken bones at the query engine. Still once you know a few of these tricks it can be worth a few bucks. The real trick is reading the execution plan then running a few different versions against real test data.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Re:KISS by Stu+Charlton · 2005-05-24 19:12 · Score: 2, Interesting

I disagree that a programmer can manually create a better query plan than (for example -- others apply) Oracle 10g. In certain cases -- perhaps. A couple of questions though....

A) will a programmer be expected to do statistical analysis of the data, checking for value skew, and also noting the current load of the system before coming up with the best way to perform a query? This is what happens automatically today, each time you parse a SQL statement.

B) if the skew of data changes, do they have to rewrite their query to switch access algorithms?

C) if I want to some day perform the query in parallel, do I also have to rewrite it?

D) if the business needs change, and I have to change the way I look up the data (its access path), do I once again have to rewrite the query?

E) if I rely on an exact order of rows in the database, doesn't that imply an O(log n) insert / update / delete penalty (assuming a B*Tree structure, which generally is the means of preserving order on disk).

F) if the volume of data in the table grows, wouldn't I have to rewrite the query at some point?

These are the reasons we have SQL today. And we have things like optimizer plan stability and now (with Oracle 10g) the ability to actually re-write the plan for a poorly written SQL statement. I think it's best to stick with declarative, logical querying. I'd love a replacement for SQL, but moving back to 3GL-based physical access is a step backwards.

--
-Stu
Re:KISS by millwall · 2005-05-24 23:34 · Score: 1

they're producing bad designs

What about yourself? Look at the design of your post! Half of it in bold, no line breaks.

Pfff! :-P
Re:KISS by ahmusch · 2005-05-25 01:21 · Score: 1

Exactly how is full-scanning a table, sorting it, writing an index, and then performing data access by the index faster than simply full-scanning the table? It seems to me that if you just do the full scan and process based off of that, you've done less work.

If you need absolute control over how data is stored, well, that's why various databases give you various options. Always join two tables? Cluster them (as in physically store together) the two tables. Always want the data in sorted order? Use an index-organized table, or reorg it regularly about a clustering index.

This is the sort of knowledge that database architects and administrators are supposed to have, and are supposed to leverage... except when the coders decide that they're going to use that database as a data dump never bother consulting with them about "how should I get this data?"

Explicit programming of database processing is how you get slow-by-slow, er, row-by-row processing. Database are designed to work on sets of data. You can't outsort it programmatically and you can't outjoin it programmatically. If you try, then you're either wasting your time or wasting money - and probably both.
Re:KISS by AdamPiotrZochowski · 2005-05-25 01:41 · Score: 1

you can't have where some_function(c)=value and expect the DB to use the index on c

just to nit pick, but some sql let you do this, kinda.

Add a new column where the value is computed. Its something along the lines of:

alter table [table] add new_column_name as someFunction(c);

Now add index on that column. The trick is that the function has to be deterministic, meaning, work in a predictable fashion such that sql knows for 100% all the conditions when the result of someFunction would change, and only update index when any of such conditions is met.

If you cannot add index then your function is not predictable/deterministic and the function is called everytime you read a row, not something I would recommend.

Besides having index on a function you can also achieve indexes over aggregates with use of materialized views. Such index will be reused whenever possible even when materialized view is not directly refered to.

--
/apz, sql is god, its that all implementations suck
Re:KISS by Kent+Recal · 2005-05-25 08:25 · Score: 1

Well, you mentioned the exact thing that I'd be interested in:

launch into some exhaustive, grandiose search logic

...because some queries could be tackled quite easily with a little script
once the relevant rows have been fetched to memory. as it is, some things
that cannot be expressed well in SQL would be deadsimple if you could
throw a little "while"-loop in the mix. I know most RDBS have some kind of
scripting language but the syntax is usually horrible (I have only tried
postgres, i think they call it plsql) and its still closely tied to SQL,
as in you can usually only operate on the result-set of a query, not on the
intermediate steps.
Re:KISS by snorklewacker · 2005-05-25 08:39 · Score: 1

I don't know of a single "real" database that doesn't have cursors in their high-level language and per-row callbacks in their C interface. You're dealing with an external system, get used to an event-driven interface. Otherwise, go and hack the source yourself.

What you're asking for exists and does not require reinventing the DB engine. And if you need a different type of engine, there are things like "tick databases" made for storing and fetching truly massive amounts of log-type data quickly (usually stock quotes). There are even SQL databases that are real time and used in industrial automation. Expand your horizons before judging the entire field.

--
I am no longer wasting my time with slashdot
Re:KISS by Anonymous Coward · 2005-05-25 08:45 · Score: 0

Google is much more similar to a Data Warehouse than a RDBMS.
Data warehousing however is not at all unique and as the article points out, this is the first mentioned use case where we should be looking at alternatives to RDBMS.
Re:KISS by Kent+Recal · 2005-05-25 09:48 · Score: 1

I don't know of a single "real" database that doesn't have cursors in their high-level language and per-row callbacks in their C interface. You're dealing with an external system, get used to an event-driven interface. Otherwise, go and hack the source yourself.

Cursors and per-row callbacks, at least the implementations that I have seen, trigger *after* the query has been processed. There is no means to tell a JOIN to "stop and go no further" when a certain condition is met.
There is no way to perform loops, searches or complex mutations on intermediate sets beyond what SQL has to offer.

What you're asking for exists and does not require reinventing the DB engine. And if you need a different type of engine, there are things like "tick databases" made for storing and fetching truly massive amounts of log-type data quickly (usually stock quotes). There are even SQL databases that are real time and used in industrial automation.

Yes, I know there are tick databases, bibliographical databases and quite a few others. What do they have to do with the shortcoming of most SQL databases that I described? Are you saying that instead of thinking about how to improve a tool that "almost fits" I should rather go and start with one that was made for an entirely different purpose?

Expand your horizons before judging the entire field.

Make sure you comprehend the question before talking down like that?

I'm by no means claiming to be a database guru or that I know things better than the smart people who actually design these things. If I sounded like that then I apologize, it was not intended.

I just wondered why an obviously *possible* and probably very useful bit of flexibility has not been considered yet.

Feel free to enlighten me on database theory and why manual setup of execution plans is much more difficult than it may appear to me by looking at the output of "explain analyze" but don't point me to a motorbike (which is fine for certain purposes) when I'm looking to pimp my car...
Re:KISS by handslikesnakes · 2005-05-25 11:59 · Score: 1

Moral of the story: rural Alberta schools have shitty French programs.
Re:KISS by GoofyBoy · 2005-05-25 14:38 · Score: 1

>No more "let's add an index and I'm sure it will go faster",

No serious SQL programmer/DBA will say this. Doing this is like saying "just add one to the loop varaiable and hopefully we will get the correct answer.

> but, "instruct the query to first make and then use the index".

As other mentioned before - huh? Can you show me what serious DB does this? I think that you would be better off doing a table scan than creating, using and then destroying an index.

>is absolute control over the order in which records are stored.

Um.. maybe you should talk to a DBA. This can be done in alot of modern DBs. I've been doing this since mid-90s.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Re:KISS by Patrik_AKA_RedX · 2005-05-25 21:18 · Score: 1

Similarly, I think that one of the reasons OO programming has not delivered the productivity benefits it was supposed to is that OO somehow encourages people to embed a lot of duplicated knowledge of the application into the class structure. This duplicated information makes it hard to change the system. I have seen this again and again. In contrast, before OO a lot of this information ended up in data or in a database, where it could be easily changed.
IMO the big problem with OOP lays in the way it is taught. It's always put forward in a all or nothing point of view: either you write an OO program or you don't. I think they should teach OOP (and any other programming paradigm) as being a tool, not as The One True Magical Solution.
Using other paradigms could deal with redundant code far more easely than OOP can.
OOP works wonderful on ADT, but ordinary functions and procedures beat OOP easely on reducing repeated code. IMO OOP combined with procedural and functional programming could provide the promised productivity benifits.

With databases it's pretty much the same. Instead of trying to get one technic to work, perhaps they should be looking in new combination of technics, borrowing from other sciences. Perhaps the information available on how the brain works or AI technics can be adapted for large databases.
Re:KISS by Bush+Pig · 2005-05-26 09:01 · Score: 1

I used to work for an oil company, and they had several Oracle instances, each of about a terabyte. We found that some queries were taking a _long_ time (often they'd time out, or they'd be terminated because they were about to interfere with the nightly backup). The problem was a braindead database design (Oracle Financials with some customisation) and these queries were looking at views of views of views of ... well, you get the picture. One of the blokes I was working with looked at the queries, and found that they were actually quite simple if you unrolled them, and got them down from hours to, in some cases, seconds. Often the problem was that the queries were unable to use automatic query optimisation, because of their surface complexity.

However, the problem certainly _wasn't_ a lack of control over the processing steps.

--
What a long, strange trip it's been.
Re:KISS by Bush+Pig · 2005-05-26 20:52 · Score: 1

I've just recently acquired responsibilty for supporting a system which was originally written for ISAM files (it's _really_ old), but the DB back end (in our instance) has been force-fitted into an Oracle database. So ... I've got these highly denormalised tables, with the ISAM-style indices built into them (multiple copies of the same data in different columns in the same fucking table). Christ on a bike! All the responsibility for data integrity and security and all the business rules are in the application. (And it does a pretty piss-poor job, let me tell you.)

The upside is, I reckon I've got a job for life.

--
What a long, strange trip it's been.
Re:KISS by Bush+Pig · 2005-05-26 20:56 · Score: 1

The really important thing about relational databases is not the ability to perform ad-hoc queries (although that _is_ a nice feature), it's the fact that they guarantee referential integrity, and they have a rigorous mathematical model underpinning them. And yeah, I know that there aren't any perfect commercial implementations of the relational model, but the main players are pretty bloody good.

--
What a long, strange trip it's been.
Re:KISS by aug24 · 2005-05-27 19:29 · Score: 1

it *still* is an performance issue to add a few million records to a database?

What the fuck are you using, Access?! I don't consider my databases big till they are a couple of orders of mag over that. And no, I don't have any speed issues.

The problem is not SQL. The problem is developers who don't understand databases. For example (personal hate) surrogate keys on every table instead of actually understanding what the keys are, dictated by the business need. Un-indexed foreign keys because the developer doesn't realise that the connection goes both ways (find the parent from the child child from the parent).

If you step back a little, and think about what a SQL engine does - work out the best way to produce exactly what you said we should manually write, then write it and execute it - you'll realise that you might as well complain that Awk does pattern matching for you and we should walk the byte arrays ourselves. Or do you turn off all code generation tools when working with EJBs (say) because you know better?

Justin.

--
You're only jealous cos the little penguins are talking to me.
Re:KISS by tigersha · 2005-05-28 20:36 · Score: 1

I can second that. One thing that irks me in OO is the idea that simple things like adding two numbers is also supposed to be a message sent to an object. Link 2 + 3 is the message +3 send to 2. This is complete crap.

OO's problem lies in the deep fundamentals, the basic values from which objects are constructed. Despite all the Python people's whining, Java did it right with the distincion of an Integer and an int. An Integer has both a value AND a location. It has a specific ID. An int represents the universal value. This is something the fanatic-OO languages like Python and Ruby get wrong. There is a difference between a value and the location with a refernce in which the value is stored. A fundamental one.

--
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism

Just because something is "old" by Anonymous Coward · 2005-05-24 07:17 · Score: 4, Insightful

Doesn't make it obsolete. "Databases are old and kludgey. Teh suXX0rs for R0xxng H4XX0rs liek me.

Just because people are too stupid to take the time to read and understand the theory and learn the application doesn't mean the technology is no longer relevant.

Of course no solutions are proposed. There are none because relational theory is correct, and appropriate for real database driven applications. Little crap bulletin boards can use MySQL.

Netcraft confirms relational databases are dead!

Re:Just because something is "old" by speed-sf · 2005-05-24 07:37 · Score: 1

The real solution is in small lightweight autonomous agents that can query the 'monolithic' RDMS in an evolutionary fashion. No need to scrap a proven system or to scrap SQL. Just use it smarter.

--
All your database are belong to us
Re:Just because something is "old" by Anonymous Coward · 2005-05-24 08:03 · Score: 0

Careful son.. it's not like the big SQL database products are that much closer to the relational model than MySQL. No updateable views, no general constraints, no basic relational operators like equality (between *relational values*).
Re:Just because something is "old" by Frank+T.+Lofaro+Jr. · 2005-05-24 08:06 · Score: 2, Insightful

Little crap bulletin boards can use MySQL.

Like Slashdot :) Remember yesterday's fiasco where posts were migrating into other articles and the time before that where it happened and the time before that. :)

Why people don't use PostgreSQL is beyond me - unless they don't like fact it is under a BSD license instead of the GPL (please don't get me started on that).

--
Just because it CAN be done, doesn't mean it should!
Re:Just because something is "old" by darthium · 2005-05-24 08:07 · Score: 2, Insightful

I concur with this opinion, I've seen a lot of supposedly 'Top notch' designs and I've seen flaws that made me blush myself....mostly the problem is with stupid design mistakes and poor concepts.

Instead of blaming the technology and tools, they should improve skills in the Sytem Artchitects and all the way down the road the people involved in Software Development.
Re:Just because something is "old" by orderb13 · 2005-05-24 08:55 · Score: 1

Gee, looking at SQL Server Books online and typing in "Views" I see something that says "modifying data". Reading the article I see that you CAN update a view with new data.
Re:Just because something is "old" by Anonymous Coward · 2005-05-24 09:15 · Score: 0

Uhh, wasn't that just some not-so-clever trolling? All of those posts were AC posts.
Re:Just because something is "old" by Kafka_Canada · 2005-05-24 10:18 · Score: 1

Why people don't use PostgreSQL is beyond me - unless they don't like fact it is under a BSD license instead of the GPL (please don't get me started on that).

I think it's more because people look at the name, try to figure out how to pronounce it, then give up and use MySQL.

--
Fuck it

Wake me up when it's ready by Vile+Slime · 2005-05-24 07:17 · Score: 3, Insightful

People,

Have been crying for the need to replace relational databases since the early nineties at least.

We can all see where that got them.

--
---- Go ahead, mod me down, I'll just post it again and you lose your mod points.

Re:Wake me up when it's ready by Eravnrekaree · 2005-05-24 12:30 · Score: 1

Few have ever used a truly relational database, since SQL does such a terrible job it cannot be called truly relational. The relational theory I believe when properly used leads to well designed, easy to maintain, and well performing databases. I dont really think OO databases or some other design will change things much, since the same issues remain as to actually how to physically store and retrieve data. The perfomance issues we see today I believe are the result of bad implementation, in the engine itself, in such a poor adaption of relational concepts as SQL, or in the design of the application, not the relational theory itself which is solid.

If someone wanted, as well, OO based layers can be done on top of relational databases, using the same engine. Then one can choose which perspective to utilise. There are good methods to store hierarchical data on a relational database if thats needed.

. . . no concrete solutions are included. . . by kfg · 2005-05-24 07:18 · Score: 3, Insightful

Funny how they never are, eh?

KFG

Re:. . . no concrete solutions are included. . . by Anonymous Coward · 2005-05-24 11:52 · Score: 0

I vote for replacing all RDBMS'es with a couple CSV files - or better yet, a LOT of XML consisting mainly of CDATA tags ;)

Author's conclusion in case of slashdotting by Anonymous Coward · 2005-05-24 07:19 · Score: 5, Funny

The future will not be found in the relational model, object model, or hybrid, but in the comma-delimited list.

Re:Author's conclusion in case of slashdotting by Anonymous Coward · 2005-05-24 07:32 · Score: 0

Post-It notes. Lots and lots of Post-It notes.
Re:Author's conclusion in case of slashdotting by kfg · 2005-05-24 07:44 · Score: 1

Although how he came to that conclusion after four pages of rambling along over half formed ideas is beyond me.

The theorists have been demonstrating that for decades in no more than a few paragraphs of plain English and a little logic.

KFG

Re:I did not RTFA by AKAImBatman · 2005-05-24 07:19 · Score: 5, Insightful

I didn't RTFA but for my needs

Or the summary

mySQL suits me quite well.

That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

The power of PHP and mySQL is all I need.

And a moped is all you need to get to work. If you want to haul 300 metric tons of rock from point A to point B, you need a dump truck. Again, that's what this article is about.

Back on topic, this entire article is mostly speculative for the moment. A lot of excellent work has been done in OODB and XMLDB designs, but no singular design has yet emerged to solve all our woes. For example, I love the Prevayler concept. It solves a lot of problems, lowers data access times, and provides for complete data security. It also isn't usable or scalable without a lot more design work.

The future will hold some very interesting things, but for now we'll have to keep inventing until we come up with a consolidated solution.

--
Javascript + Nintendo DSi = DSiCade

SQL Dying...film at 11...NOT! by Gorm+the+DBA · 2005-05-24 07:19 · Score: 4, Interesting

Wow..."SQL and Relational Databases to be replaced by new technology"...film at 11.

See "COBOL to be replaced...." for an example of just how unlikely that is...sure, the latest hip "Tres Kewl" software for business might be written in something else, but SQL will be around for a long, long time.

Consider just the fact that "Relational Database" technology as laid out by Cobb back in the early days specifically says "You don't *HAVE* to do it this way, but it will be more effecient if you do"...realize that SQL handles Denormalized Warehouse and Datamart tables just as well as it does the 5th normal form model of perfection...and relax...it ain't goin nowhere.

Re:SQL Dying...film at 11...NOT! by tyler_larson · 2005-05-24 07:30 · Score: 2, Insightful

After extensive research, our Databases Of The Future department has come up with the following prediction:
In the future, databases will be more "good."

While we can't yet go into detail as to how this will all work, suffice it to say that we have a pretty solid idea what the future holds.

--
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
Re:SQL Dying...film at 11...NOT! by Alien+Being · 2005-05-24 07:32 · Score: 1

s/Cobb/Codd/
Re:SQL Dying...film at 11...NOT! by Anonymous Coward · 2005-05-24 10:53 · Score: 0

> See "COBOL to be replaced...."

Buddy, COBOL has been replaced. By SQL. COBOL was typically used for ISAM databases, and you wrote your "queries" in ISAM statements embedded in COBOL, whereas SQL embeds into damn near anything -- yes, including COBOL, but also Perl or PL/SQL or Java or whatnot. It's expressive enough that just the SQL shell like isql or sqlplus is often enough, whereas COBOL was needed to be the ISAM shell.

The only thing COBOL is used for nowadays is supporting existing COBOL codebases. Nobody starts a project in COBOL.

Close... by techwolf · 2005-05-24 07:20 · Score: 1

but not a real http://it.slashdot.org/article.pl?sid=05/05/02/194 4248&tid=221&tid=198&tid=8 dupe.

I don't know what new structure they'll come up with for storing data, but I'm sure someone will try to port Linux to it.

--
I don't do this for karma, I do it for cash. It's much better.

Databases? Bah! by GillBates0 · 2005-05-24 07:20 · Score: 3, Funny

Real men work with raw data.

Nothing builds character like manually searching megabytes of raw, unorganized information for a relevent entry. Except maybe sorting it by hand.

Databases are for sissies.

--
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam

Re:Databases? Bah! by techwolf · 2005-05-24 07:22 · Score: 2, Insightful

That's what Perl is for...

--
I don't do this for karma, I do it for cash. It's much better.
Re:Databases? Bah! by The+Angry+Mick · 2005-05-24 08:07 · Score: 2, Funny

Nothing builds character like manually searching megabytes of raw, unorganized information for a relevent entry. Except maybe sorting it by hand.

True, but eventually, you have to give in to the need for less complexity - after all, there's only so many hours in a day.

Now I keep all my "1"'s and "0"'s in two separate containers. This makes it tremendously easy to find exactly what I looking for . . .

--
I'm not tense. I'm just terribly, terribly, alert.
Re:Databases? Bah! by ArsonSmith · 2005-05-24 08:18 · Score: 1

No, Reading perl is like manually searching megabytes of raw, unorganized information for a relevent entry.

--
Paying taxes to buy civilization is like paying a hooker to buy love.
Re:Databases? Bah! by temojen · 2005-05-24 08:21 · Score: 2, Funny

As are filesystems. "Real men" grep their block devices for magic numbers like "ReIsEr34" when looking for the start of their data.
Re:Databases? Bah! by shish · 2005-05-24 08:53 · Score: 2, Funny

Nothing builds character like manually searching megabytes of raw, unorganized information for a relevent entry
Wow, someone who likes slashdot's search feature :o

--
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
Re:Databases? Bah! by 123abc987 · 2005-05-24 10:45 · Score: 1

There are relevent entries in slashdot?!
Re:Databases? Bah! by spasticus74 · 2005-05-24 14:58 · Score: 1

halleluljah brother!

--
"I'd like to think oysters transcend national barriers Adrian"

Relational Filesystems by Doc+Ruby · 2005-05-24 07:21 · Score: 5, Interesting

How about just getting filesystems to be relational? Replace the ancient 1960s-era hierarchical inode database that underlies filesystems with a modern relational one. Then distributed databases can provide a more consistent platform for all our distributed apps.

Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them. Enough fake hierarchies inconsistent with accurate data models, forcing whole technologies like Apple Spotlight, GNU Dashboard, and Google Search just to transact basic relatioships buried in the data. Enough reinvention of the wheel with every initial RDBMS schema, just a layer on top of the DB's actual hierarchical filesystem - a shell for an inode database. Enough empty promises of "WinFS" and "OLEDB" vapor - get relational filesystems into developers' hands, and developers will move beyond them, building apps that meet users actual needs, dragging the database tech along.

--

--
make install -not war

Re:Relational Filesystems by AKAImBatman · 2005-05-24 07:35 · Score: 3, Interesting

How about just getting filesystems to be relational?

For what purpose? I used to work with a Unisys NX Series (and a predecessor that I don't remember) and it had all data stored in database tables. PC files were stored in new tables with a record length matching the size of the file. It was more a PITA than actually useful.

Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them. Enough fake hierarchies inconsistent with accurate data models, forcing whole technologies like Apple Spotlight, GNU Dashboard, and Google Search just to transact basic relatioships buried in the data.

You don't need a relational model for that. All you need is a metadata area in the FS, and a metadata indexing scheme. You can then save psudeo-directories as metadata searches. i.e. A bit like the Label system used by GMail. This concept has been done to death in the reseach community, but no one has yet had any success in getting users to accept the idea.

--
Javascript + Nintendo DSi = DSiCade
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 07:48 · Score: 1

You just described the relational "views" feature. Others have other relational features at the top of their priorities. A RDBMS, instead of the hierarchical DBMS we use for our filesystems, offers your feature, their feature, my feature, all in a package. Which is well understood, with lots of applications that use it. We've already got RDBMS tech - why reinvent an inadequate version of it?

--
--
make install -not war
Re:Relational Filesystems by turgid · 2005-05-24 07:54 · Score: 3, Informative

The Pick OS.

--
Stick Men
Re:Relational Filesystems by RiotNrrd · 2005-05-24 07:56 · Score: 2, Interesting

How about just getting filesystems to be relational? Replace the ancient 1960s-era hierarchical inode database that underlies filesystems with a modern relational one. Then distributed databases can provide a more consistent platform for all our distributed apps.

I think that this is what Microsoft was trying to do with WinFS. The only problem would be that the RDBMS that replaced the FAT would be an embedded version of SQL Server.

Of course this would be *great* for p2p apps!
Re:Relational Filesystems by AKAImBatman · 2005-05-24 07:57 · Score: 2, Insightful

We've already got RDBMS tech - why reinvent an inadequate version of it?

Because current RDBMS designs are unsuitable for filesystems. Relational theory still holds (just as it does for OODBs), but the physical design should be quite different if it's going to be effecient.

As I said, this has been beaten to death in the research communities. BeOS even included a DBFS design, but it went largely unused. NTFS also has all the necessary stuff in it, but Microsoft constantly removes it in final releases. ReiserFS has DBFS features, but these also go largely unused.

I think the problem is that making effective use of a DBFS requires a very different set of applications. i.e. If the applications are aware of the functionality, then they can assist the user and provide useful support. But without this form of OS and application support, the user will find that the metadata is nothing but added confusion.

--
Javascript + Nintendo DSi = DSiCade
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 08:09 · Score: 1

That's what I meant by 'Enough empty promises of "WinFS" and "OLEDB" vapor'. What MS was (now so obviously) trying to do with WinFS was lie about innovation, while selling the same old crap in new bags. There's probably an "integration" angle that a working version would offer to MS competitors, that MS couldn't monopolize (secret APIs are harder in DB schemas than in DLLs). But we'll never know, because MS has announced this tech as "immanent" for decades. All we know for sure is that it's popular enough in the market that MS keeps offering it, baiting & switching everyone into the next version of the same crap.

--
--
make install -not war
Re:Relational Filesystems by ajs · 2005-05-24 08:10 · Score: 1

"Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them."

Well, that seems fine, except for the fact that you want your filesystems to PERFORM well. You think nothing of launching 10,000 "queries" at your filesystem, but when it comes to dealing with a relational database, you have to back off and think about caching strategies because the database is expensive.

Why? Because that flexibility requires abstraction, which costs cycles and memory.

Instead, why not re-design the filesystem to handle the most basic operations required by both databases and the infosystems that resemble, but are not databases... fortunately, someone has already done much of that for you.
Re:Relational Filesystems by peragrin · 2005-05-24 08:11 · Score: 1

Two big points. Relational databases are slow at accessing data quickly. IE it's fast enough for web browsing but do you want your game to have a few milisecond lag, so it can load up a character?

Second who controls the networked resources? What happens when you use linux as a file server and Windows on one desktop and a Mac on the other?

It can sometimes be a pain keeping Mac metadata together when transfering files on non-mac FS. what happens when Windows has it's own setup.

You know they aren't going to be compatible or work together without lot's of overhead right?

--
i thought once I was found, but it was only a dream.
Re:Relational Filesystems by halber_mensch · 2005-05-24 08:13 · Score: 1

How about just getting filesystems to be relational?

...could you please specify what color you would like your fantastic bikeshed painted?

--
perl -e "eval pack(q{H*},join q{},qw{70 72696e74207061636b28717b482a7d2c717b343 637323635363534323533343430617d293b})"
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 08:13 · Score: 1

BeOS itself was largely unused. NTFS never exposed the features for apps. And ReiserFS doesn't have a simple API, or critical mass of RDBMS features. It takes that critical mass to catch on.

What would work would be a SQL interface to current files, and vice versa. I've tried to get a Samba/MySQL hybrid OSS project going several times in the past 10 years, but the teams have never come together enough to get past architecture. Sometime someone (maybe me) will be able to make the case to a paying customer, and develop it as middleware between FS and SQL apps. Then we'll see it take off.

--
--
make install -not war
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 08:13 · Score: 0

Personally I'm just waiting for a stand-alone relational database with a good query language and true relational operators. Your filesystem could be built on top of it (as could any XML or Object database).
Re:Relational Filesystems by Plugh · 2005-05-24 08:15 · Score: 1

Have a look at Oracle iFS (recently renamed to "Oracle Content Management SDK", apparently)
This might be pretty close to what you're looking for:
http://www.oracle.com/technology/products/ifs/inde x.html
http://www.orafaq.com/faqifs.htm

--
Part of the Second American Revolution!
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 08:17 · Score: 0

One more time: the relational model is a "MODEL", it doesn't have any inherent perfomance. Think of basic integer arithmetic. What is the performance of addition? Of multiplication? The question doesn't even make sense.

Specific *implementations* will have different performance characteristics.
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 08:18 · Score: 1

IBM's extremely popular AS/400, in its "i5 server" version, uses DB2 as its fileystem. Those apps are very demanding - of performance, reliability, and flexibility. Reiser adds RDBMS features with every release, and might evolve into a true DB/FS hybrid. At the very least it proves both the concept, and its appropriateness to actual apps.

--
--
make install -not war
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 08:24 · Score: 1

One reason RDBMS'es are slower than inode DBs (filesystems) is that the RDBMS is just a layer on the inode DB. Another is the inclusion of many features not necessary for a filesystem. Strip down the RDBMS to its essentials, and let it write directly to the disk geometry (with bus interrupts), maybe even integrated directly into the disk HW, and performance will compete directly with inode filesystems. Maybe even exceed them, especially in complex storage systems, even RAIDs. And certainly when an app calls for RDBMS features - more required all the time.

As for access policies on the networked resources, that's much easier on RDBMS systems than inode filesystems. Compatibility across platforms is much simpler, too. One of the favorite uses for an RDBMS is merely a "neutral ground" where data can be exchanged among those platforms you mention, more easily than any inode "sharing" tech, while offering benefits beyond that sharing alone. Your scenario is one of the best reasons why RDBMS is better than inodes.

--
--
make install -not war
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 08:28 · Score: 1

Yep, they rolled out a halfassed version with Oracle 8i, and seem to be getting there with 10g. IBM is also trailblazing with i5 Server, an AS/400 version with DB2 for its filesystem. Today's minicomputer architectures are tomorrow's distributed PC platforms. Once companies like IBM and Oracle stabilize the approach, Linux filesystem developers will be there to make it cheap, flexible, and available to everyone.

--
--
make install -not war
Re:Relational Filesystems by Twinbee · 2005-05-24 08:46 · Score: 1

I agree that the relational filesystem (synonym with a metadata filesystem presumably) would be the way to go. The increase in speed and power would be phenomenal. If we take the paradigm to its logical conclusion, then only one 'folder' would be needed for all files, though of course, there's no reason to prevent the two models existing together.

You may be interested to read the following article from my site:
Towards a single folder metadata (database) filesystem

--
Why OpalCalc is the best Windows calc
Re:Relational Filesystems by Frank+T.+Lofaro+Jr. · 2005-05-24 08:50 · Score: 1

Try Samba + PostgreSQL.

I won't even trust MySQL to a bulletin board, let alone my files!

--
Just because it CAN be done, doesn't mean it should!
Re:Relational Filesystems by 0xABADC0DA · 2005-05-24 08:55 · Score: 1

Users don't give a flip about how anything you mentioned is implemented, which is all you are talking about when you say "relational filesystem this and that": how it is implemented.

Apple's OS X has plenty of directories that appear as files to users, and that is fine since users don't care about the details. If all docuements and files were in a "relational filesystem" (whatever that is), you would still need something like Spotlight to convert all the different schemas and data into a usable, searchable model.

So what problem is a relational filesystem going to solve? Unless users will start dragging tables around instead of icons then the only thing it could do is make some operations more convenient and faster -- but the price will be lots of interdependent schemas and tables. For example, everything will break when you change a file size from being a simple value to being a key into the file size table, which stores the type-specific size (ie width x height for images, duration for songs, etc).

Sure that's a simple example that maybe nobody would do, but the fact is that a relational filesystem will just jumble lots of currently separate information together making for even more dependencies and likelyhood for breaking. Would you let any random joe add new tables, columns, etc to your companies databases? So why would you want them to mess with your personal data either?
Re:Relational Filesystems by 0xABADC0DA · 2005-05-24 08:59 · Score: 1

to your companies databases?

Note to self: when you type in the new script-prevention code and hit return, it posts immediately intead of previews
Re:Relational Filesystems by 0xABADC0DA · 2005-05-24 09:01 · Score: 1

Note to self: when you type in the new script-prevention code and hit return, it posts immediately intead of previews

Note to self: check spelling before posting...
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 09:02 · Score: 1

The key to getting a project like that to succeed is to get something, anything, to work - even a little bit. Then upgrade the parts that don't work. MySQL is much simpler, so much less overhead in hacking it into Samba. And it's got a much larger developer community. But even so, I couldn't get a committed team. Maybe next time.

--
--
make install -not war
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 09:21 · Score: 1

Users care about results. Relational filesystems deliver the kinds of results we're seeing in the technologies I mentioned from Apple, Google, GNOME, and followers like Microsoft etc. Other approaches, like Oracle's IFS and IBM's i5 Server, also deliver those kinds of benefits. At much lower cost to developers, in time, complexity and interop. So users get better apps faster, that work together better.

The problems you mention are all possible with bad implementations of the good RDBFS idea. There's no reason for tables to show up in the UI, but groups of related items manipulated at once would be more manageable. Especially when they're freed from their current storage-layer constraints, like "files" being contiguous datasets in the same "folder" for drag/drop, when the user wants various segments of each dataset with some characteristics in common, regardless of where they're stored, and among what other extraneous data. "I sent demos containing clips of my Ford ad to people in Michgan, while I was working on that Mercedes ad. Open a new message to every one of them who hasn't replied to any message since the Mercedes ad was in the Superbowl, with that _Ad Age_ article about me quoted." - that's a SQL-style operation that could have a desktop UI as simple as file/folder drag/drop, but not with the file/folder paradigm. Those kinds of operations are useful every day, but only when the "business logic" can easily access the stored data. Moreover, relational access systems are much more secure and flexible than filesystems, especially when modeling collaboration.

The whole point is that RDBMS is the most practical framework to model today's data use, especially in complex organizations, with important security models. No one wants to write those apps for ancient filesystem semantics, which usually need an RDBMS grafted on to work, or to develop. Let's drop the filesystem bottleneck, and get on with the real work already.

--
--
make install -not war
Re:Relational Filesystems by timeOday · 2005-05-24 09:48 · Score: 1

A RDBMS, instead of the hierarchical DBMS we use for our filesystems, offers your feature, their feature, my feature, all in a package.
It is better to have just the functionality you need than a superset of what you need. Extra stuff just muddles everything, confusing end users, making implemention harder (bugs), and constraining optimization.
Most of what people store on their filesystems just isn't very relational.
Re:Relational Filesystems by timeOday · 2005-05-24 09:55 · Score: 1

One reason RDBMS'es are slower than inode DBs (filesystems) is that the RDBMS is just a layer on the inode DB.
Not necessarily:
A raw partition is a portion of a physical disk that is accessed at the lowest possible level. Input/output (I/O) to a raw partition offers approximately a 5% to 10% performance improvement over I/O to a partition with a file system on it.
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 10:31 · Score: 0

You mean "lots"?
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 10:34 · Score: 1

It's clear that there's a balance between software customized in features for each user, and the extreme of bloatware. The former can't be served by any developer community we know of, in meaningful scale, while the latter makes everyone pay too much for features they'll never need. The real-world pressures demanding relational features are growing. IBM and Oracle already offer i5 Server and IFS, respectively, to serve enterprises with fullblown RDBMS distributed filesystems. And ReiserFS4 has ever-more relational features for individuals and small applications. The relational demands, though the minority by datasize and even usage frequency, are pushing the threshold where it makes sense to offer an optional filesystem based on RDBMS, for the critical mass who will use it in real apps. Today's minicomputer is tomorrow's PC, and those minicomputer apps will serve us well when they arrive.

--
--
make install -not war
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 11:07 · Score: 0

> One reason RDBMS'es are slower than inode DBs (filesystems) is that the RDBMS is just a layer on the inode DB.

This is not the case ... ever. Oracle typically uses a partition. Even Postgres, even MySQL use a single file. They aren't mapping anything to inodes, they're mapping rows to btree nodes that are stored in regions of a single file per table, files that are opened and kept open and not re-accessed for every database access.

The common wisdom was that using a raw partition would be slower, but buffer caches are MUCH smarter these days, and the process that flushes them is typically tuned to the drive controller that will reorder the requests for performance. To ensure durability, they issue sync() calls, but sometimes they cheat and don't do that immediately.

A database that used discrete files would be utterly insane from the syscall overhead alone. I know of a "database" that did that, called /rdb (not to be confused with RDB), but it was more of a gimmick, albeit a really neat one.
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 11:54 · Score: 0

You are so completely off the mark it isn't funny.
Re:Relational Filesystems by Anonymous Coward · 2005-05-24 11:56 · Score: 0

How would this be great for P2P apps?

I'm curious, what's your vision?
Re:Relational Filesystems by AKAImBatman · 2005-05-24 13:14 · Score: 1

What would work would be a SQL interface to current files, and vice versa.

Do you mean a current interface as in a metadata wrapper around the existing file system, or a full network file system concept? The former concept sounds feasible, but the latter would be unlikely to work very scalably. :-)

--
Javascript + Nintendo DSi = DSiCade
Re:Relational Filesystems by MrCreosote · 2005-05-24 13:22 · Score: 1

I used Pick once. I wish that at that time I had remembered what my mother used to tell me

'If you pick it, it won't get better'

--
MrCreosote Meow!Thump!Meow!Thump!Meow!Thump! "You're right! There isn't enough room to swing a cat in here!"
Re:Relational Filesystems by Tablizer · 2005-05-24 16:54 · Score: 1

PC files were stored in new tables with a record length matching the size of the file. It was more a PITA than actually useful.

Relational does not hard-wire column sizes into the theory. Hard-wired sizes are an implementation detail. Vendors could deviate from if they want, and even offer "blob" or "varText" columns, but they tend not to optimize for such. I would like see a Dynamic Relational DBMS implemented where tables and columns can be defined on the fly just by using them, similar to how dynamic and type-free programming languages treat variables.

--
Table-ized A.I.
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 17:04 · Score: 1

That's 5-10% right there. And then there's the overhead of the protocol translation itself, from RDBMS<->inodes.

--
--
make install -not war
Re:Relational Filesystems by Unordained · 2005-05-24 17:13 · Score: 1

Is that the same "post-relational" as with Intersystem's Caché? Their database is based on (-is-) the MUMPS/M (also 1960's) built-in "global" database. To me, looks a lot like a php array gone nuts -- each element of the array can itself be an array or a scalar value, etc. You do everything through loops of table_name[i][j][k][l][m] to find what you want, etc. And they're telling me that this is a good thing? Sparse matrix technology to the rescue? *sigh* At least they claim it also supports SQL ...
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 17:14 · Score: 1

I like the idea of a Samba server (frontend) with a SQL VFS to a RDBMS server backend. That lets existing apps with a filesystem interface connect via Samba, which translates to SQL, and new apps querying the DB directly in SQL. Throw a tuned Linux server with hi-SMP, GB ethernet and a large RAID at the server, and performance will stay high, while apps get whichever API they prefer. Concurrency, replication, and all kinds of other RDBMS features come with the package.

--
--
make install -not war
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 17:28 · Score: 1

And what do you think writing to those "files" means? It means writing to the inode database. Which sector, filesystem metadata, etc. Except in rare cases, like one mode of Oracle, which writes directly to the disk geometry - without even an OS.

--
--
make install -not war
Re:Relational Filesystems by Doc+Ruby · 2005-05-24 17:32 · Score: 1

You are such a paradigmatic Anonymous Coward, with your content-free post only an excuse to make insupportable insults.

--
--
make install -not war
Re:Relational Filesystems by ckaminski · 2005-05-24 17:41 · Score: 1

Accessing files like so:

\\server\%%Quarterly Reports%% sucking down every document in server with a permutation of that string, or said title?

I could see filesysetems benefitting from such syntactic sugar. Simplicity in design, however. Should be something supportable by existing applicatioins. Simple, extensible, repeatable.

I cannot commit to the development of such a system, but I can see something very simple based on wildcarding would be beneficial. Here's hoping someday, someone, somewhere, can deliver such a solution. There are management problems associated with operating sch a system, but things like offline retrieval and metadata storage lend themselves fairly nicely to such a solution without having to be grafted on like 5th wheels.
Re:Relational Filesystems by ajs · 2005-05-25 00:30 · Score: 1

"IBM's extremely popular AS/400, in its "i5 server" version, uses DB2"

I don't know AS/400s, but I think you'll find that DB2 has special support for acting as a filesystem and/or caching metadata in a vnode-like system, and you're not performing true relational operations in order to find your file and look up its modification time.

"Reiser adds RDBMS features with every release, and might evolve into a true DB/FS hybrid. At the very least it proves both the concept, and its appropriateness to actual apps."

No, that's evidence that the model that I was suggesting is practical (hence the reason that I linked to Reiser). What you want is a low-level that supports the semantics that routine filesystem use AND databases can take advantage of so that your filesystem's data is available to the database and visa versa. This is what Reiser is doing, and it's, IMHO, the next logical evolution of the filesystem without taking a giant performance hit from having a relational database for a filesystem.
Re:Relational Filesystems by Anonymous Coward · 2005-05-25 04:48 · Score: 0

And you are such a typical slashbot who needs to hijack EVERY topic of discourse into some frothing anti-microsoft rant not supported even by cites to anything resembling external data, let alone evidence.

Get a new note, Mr. Glass.
Re:Relational Filesystems by Doc+Ruby · 2005-05-25 05:02 · Score: 1

You are the Anonymous poster child Coward. The post you're whining about cites Microsoft's well-known failure to introduce the OLE-DB and other "database filesystem" technologies they're always baiting the market with, then switching at the last minute to some other, less exciting filesystem tweak. But Microsoft worshippers like you just suck it up every time. Your brainwashing keeps you from seeing Microsoft's sleazy marketdroid mouthpiece lying in every possible tech niche, while hardly ever delivering. All you can bitch about is the reasonable response: informed calls of "bullshit" from people with brains who can tell the difference. At least you zombies can muster only worthless posts like the ones you're spewing in this thread - playground "nyah nyahs" that ignore the facts, and just grumble the echoes of defensive Microsoft cliches. Shamble back to your safe little MS user group, where your kind can distract only one another, without getting in the way of the living.

--
--
make install -not war
Re:Relational Filesystems by lostguy · 2005-05-29 15:34 · Score: 1

Wait, are you calling OLEDB a "database filesystem"?

OLEDB's a db-abstraction technology to provide a db-like interface to multiple providers, one of which can be a filesystem, but that's not its primary purpose. Sure, it's nice that there are providers for Excel and other non-RDBMS, but OLEDB simplifies the heck out of integrations that otherwise would have to use ODBC or some vendor-specific interface.

At the time OLEDB was born, around the time of ODBC 2.0, you couldn't really plug in random RDBMS engines behind an ODBC application. You still couldn't with the first versions of OLEDB, but it was a lot closer.
Re:Relational Filesystems by Doc+Ruby · 2005-05-30 04:15 · Score: 1

I'm calling it vaporware, precisely because it isn't a database filesystem. Because Microsoft told us that's what it would be, to keep us excited about MS innovation, then gave us something much less, while sticking us with the ancient, regular filesystem. That's what this entire subthread is about.

--
--
make install -not war
Re:Relational Filesystems by lostguy · 2005-05-30 13:19 · Score: 1

Are you sure you're talking about OLEDB then? Because it's in widespread use, so it's not vaporware. Either you're confused and calling a particular, promised OLEDB _provider_ vaporware, or you're thinking of something else.
Re:Relational Filesystems by Doc+Ruby · 2005-05-30 13:23 · Score: 1

I'm talking about the OLEDB they promised, which would replace FAT (and NTFS, for that matter) as the default filesystem, with a relational API for developers to all filesystem data. The promised original is vapor, and what they sell is a shadow of their promise.

--
--
make install -not war

Connectivity by FriedTurkey · 2005-05-24 07:22 · Score: 1

Local caching data gets ugly. Eventually the connectivity issues will be fixed. Mobile devices will seamlessly connect to databases preventing the need for caching.

"Organizing" issues aren't as important as getting the data faster and handling a large amount of users. Databases are getting larger and larger. Query optimizations can only improve queries to a certain extent. Eventually the amount of users and data reaches a threshold that more and more hardware doesn't always fix.

Re:I did not RTFA by AKAImBatman · 2005-05-24 07:25 · Score: 1

s/data security/data safety/g

I just realized that line might be confusing.

--
Javascript + Nintendo DSi = DSiCade

PostgreSQL is object-relational by Anonymous Coward · 2005-05-24 07:27 · Score: 1, Interesting

I remember the hype surrounding object databases and xml databases. What happened?

I believe there is simply too much existing data+code that depends on traditional RDBMS infrastructure.

To get the best of both worlds, I use PostgreSQL for both traditional RDBMS and the less well-publicized ORDBMS features.

With version 8.0 available as a native Windows software, I think it'll start grabbing marketshare at a much faster pace. Having it available only on Linux and *nix natively kept adoption rates at bay.

ps

For many applications, I found sqlite3 to be sufficient. Low overhead and speeds that blow away client/server solutions like PostgreSQL or MySQL. And cdb is even faster than all of these for read-only data.

Re:PostgreSQL is object-relational by Fizzog · 2005-05-24 08:01 · Score: 1

"I remember the hype surrounding object databases and xml databases. What happened?"

There are many large companies (finance, insurance, shipping, utilities, etc.) who are using GemStone (www.gemstone.com) very successfully.

We use GemStone as our persistence layer and application server on my project (at a major bank).

The only limitation at the moment is it only provides 32 bit support, so we are limited to about one billion objects, but I believe GemStone are working on a 64 bit version.

It can be kind of funky at times but is also an exceptionally productive application server development environment, and when coupled with a Smalltalk dev tool (VisualAge, VisualWorks, etc.) is probably the most fun you can have developing software.
Re:PostgreSQL is object-relational by Anonymous Coward · 2005-05-24 08:10 · Score: 0

What happened? Well, in the 60's we had hierarchic databases. Those were rejected because they weren't general enough. Then we had network databases. Those were rejected because they weren't general enough, The relational model is a general model for data storage and manipulation. But here we are again, back to hierarchic (XML) and network (Object Database). The smart people know this is just more ad-hoc nonsense, and it will be rejected soon enough once the people using it realize the limitations. Of course, it would be nice if people would just crack open a book on the relational model and spend a few minutes understand what the hell they are getting paid to do, but I guess that's asking too much. More consulting dollars for the rest of us I guess...
Re:PostgreSQL is object-relational by Anonymous Coward · 2005-05-24 08:42 · Score: 0

Use postgres as a relational database, ignore those useless "object" features which don't even work (can't do constraints properly).

I've found that with triggers and careful coding, I can translate most of my relational designs into PostgreSQL without losing a lot of integrity constraints.

But do stay away from the inheritence features. There's nothing there that you can't do better with view and triggers.

Re:I did not RTFA by techwolf · 2005-05-24 07:28 · Score: 5, Insightful

Quite true. MySQL does very well into the gigabytes. I haven't seen any good evidence of its abilities in handling terabytes of data. Don't get me wrong, I'm a huge fan of the MySQL, but I'm a bigger fan of using the right tool for the job. For your web message board, MySQL works fine. For holding product, sales, distribution, etc. information for, say Levis, it would not.

--
I don't do this for karma, I do it for cash. It's much better.

SQL isn't a database by Nytewynd · 2005-05-24 07:30 · Score: 5, Informative

SQL, on the other hand:
1) Reasonably simple API
2) Scales to very large databsaes
3) Cross-platform/architecture
4) Performs very well.
Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)

SQL is a language for set operations. By itself it isn't a database or storage utility. There are some different versions similar to what you describe. Oracle's PL/SQL allows you to make temporary tables and materialized views. Neither solves the overall problem the article describes.

SQL by itself doesn't perform. It is based on the database engine, and how good the developer is. I have gotten SQL queries that took minutes to exectue in seconds by adding indexes, analyzing tables, and totally rewriting inefficient code. It is only "cross-platform" if you follow the ANSI SQL standard. Each database has it's own set of handy functions that make the code database centric.

SQL doesn't really have an API. It is a specification that is sometimes followed by database designers, and sometimes ignored. For example, in Oracle you can either use the ANSI joining sytax (LEFT OUTER JOIN) or use the (+) in the where clause.

It scales to large databases only when they are designed properly. I work with 18 terabytes of data. My sql code wouldn't work so hot if the tables weren't designed correctly. Indexing, partitioning, and table structure have more to do with performance at that level than the code. The code can make a large difference too, but if the underlying structure is wrong, even the best SQL won't help you.

--
/. ++

Re:SQL isn't a database by Just+Some+Guy · 2005-05-24 08:13 · Score: 1

SQL by itself doesn't perform.
That's kind of disingenuous, though, like saying that "C by itself doesn't perform". That's true, in as much as both are standards and not implementations. However, it's a bit misleading because C, at least, has features that make it inherently difficult to optimize (as compared to other languages like Fortran).

IANA DB theorist, but I could imagine that SQL might also contain some "traps" that might necessarily make its implementations slower than what another query language might make possible.

--
Dewey, what part of this looks like authorities should be involved?
Re:SQL isn't a database by mcrbids · 2005-05-24 08:22 · Score: 3, Informative

SQL by itself doesn't perform. It is based on the database engine, and how good the developer is.

A truth I hold to be self-evident. The language of SQL provides all the tools you need to make your application perform well, as you state.

SQL doesn't really have an API.

Realistically, SQL is an API. It's a highly abstracted interface for communicating between two programs. (your app, and the DB server software)

It is only "cross-platform" if you follow the ANSI SQL standard.

Sorta. See, I can write a script using PHP with a particular SQL call, and do the same thing in Perl, Java, ASP, C, C++, Python, Ruby, and even BASH, on Linux, Windows, Mac, or just about anything else with a tcp stack and a compiler. Sure, SQL implementations are different, with various shortcuts and extensions, but I'd call that cross platform if ever there was one.

Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:SQL isn't a database by klausboop · 2005-05-24 09:28 · Score: 1

I think you may be missing the parent's point. You can create a table with a column of VARCHAR2 in an Oracle database, and you can send that SQL command using any of the languages you mentioned. But if you send that table creation command to SQL Server or DB2 or what have you, regardless of the programming language used to write the app communicating with the database, the command will fail because Oracle's VARCHAR2 datatype is proprietary (i.e. VARCHAR2 is not defined in the ANSI standards)

What you're saying is akin to saying that "Hello World" is cross-platform because you can output it using a variety of programming languages. But that forgets the recipients of the output that expect "hola mundo." (Of course this analogy doesn't fully hold up because there is no human language standards body equivalent to ANSI (yet))

--
Some of you already have those cute little shirts on that say disco sucks, right? That's not all that sucks.-Frank Zappa
Re:SQL isn't a database by drew · 2005-05-24 09:48 · Score: 1

Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?

I see quite a few that only work on mysql (many of the form phpMy*). it seems to me that anything that was written originally for mysql will pretty much always be mysql centric, while projects that were originally written for databases other than mysql quickly branch out to support a variety of databases.

--
If I don't put anything here, will anyone recognize me anymore?
Re:SQL isn't a database by Shadowlore · 2005-05-24 13:49 · Score: 2, Insightful

Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?

Quite a bit actually. In my experience most only support MySQL. Why? They don't understand the value of data integrity and the features MySQL lacks/lacked. As a result they have had to poorly implement things that the DB itself should do, and when they try to "port" this to another DB they run into all sorts of issues.

And SQL is a language, not an API.
So they give up.

--
My Suburban burns less gasoline than your Prius.

Synchronization by 3770 · 2005-05-24 07:30 · Score: 3, Informative

Most mainstream databases support replication. They are designed to be as fast as possible under heavy load.

Synchronization for a mobile device has another main requirement, robustness when the connection to the server is lost. A mobile device has to gracefully handle when the owner runs down into the subway.

--
The Internet is full. Go Away!!!

Re:Synchronization by duffer_01 · 2005-05-24 07:39 · Score: 1

Oh man, you are so correct. There is so much more that people do not realize to data synchronization in mobile enviroments. I mean take for example data conflicts. If someone makes a change on Monday and sync's on Friday and then someone elese makes a change on Tues but sync's immediately. Who wins? There are so many issues people just take for granted when they only use online systems.

--
Adventure City Tours

Object-relation databases by promantek · 2005-05-24 07:31 · Score: 1

I'm taking a DB course right now and we've been discussing the future of databases, and the prof seems to think it's Object relation databases.

How wide-spread is OR DB is the industry? I've never used one. What's the story with them?

Re:Object-relation databases by m50d · 2005-05-24 07:38 · Score: 1

People have been talking about moving to an object model for about as long as OOP has been around. Zope actually implements it, I think. People don't seem to use it, or see big advantages to it. But that could just be inertia.

--
I am trolling
Re:Object-relation databases by 3770 · 2005-05-24 07:39 · Score: 1

Object-relation databases has been the future of databases for at least 10 years now.

And it remains in the future.

Maybe it'll happen some day but so far I haven't seen anything happen in the industry.

support for XML in databases is far hotter.

--
The Internet is full. Go Away!!!
Re:Object-relation databases by Anonymous Coward · 2005-05-24 08:29 · Score: 0

The object database models are usually either ad-hoc, which makes them useless for guaranteed behavior, or just a subset of the relational model, which makes you wonder what the point is.

As for your prof, anybody who makes that claim is incompetent and shouldn't be teaching others.

Ask him to demonstrate 1) what the object relational *model* is and 2) why it doesn't reduce to the relational model. If he doesn't even understand the question, or answers in terms of specific products or implementations rather than *models*, quit the class immediately, you will learn nothing of value.
Re:Object-relation databases by Anonymous Coward · 2005-05-24 09:02 · Score: 0

==>the prof seems to think it's Object relation databases.

My prof tried to push that on us back in 1992. "It's the future". Ayup. That happened, didn't it. It's going on 13 years since I heard that statement and I ain't seen it yet.
Re:Object-relation databases by benow · 2005-05-24 09:50 · Score: 1

Yes, well, it's perhaps a philosophical viewpoint more than practical. Working with objects from an object language is nice (and 'natural'). There is also the promise of versioning, quick object traversal and simple querying. In practice, those are very hard to implement seemlessly and standards for which are few (odmg, jdo). I've been using a homebrew object-relational mapping layer for years. It's slow (needs optimization), cumbersome to maintain (somewhat less than others, needs use optimization) and quite proprietary... but, it allows for dealing with (data) objects at a high level... modelling from the objects to the tables and not the tables to the objects. With a couple months work, the thing could be quite nice, replacing a 'real' (read expensive) oodbms (for smaller scale apps), but with the x-app sql binding and less data layer lock-in (and ability to migrate to more capable rdbms backends). I'd like to get it to the point where the changes to the object model roll across to the dbms schema (with override), which would allow for a much more fluid relation between code and data... and half the fun is doing it.
The fact that RDBMS' have been around for years means they've been under the scrutiny of many eyes for many years and are rock solid, if cumbersome. Of course, the fact that the few existing high performance oodbms' that do exist are horrifically expensive. Objectivity being the best I've seen... able to capture objects at many Gigabytes/Sec into multinode distributed object network (for CERN particle data), but, yow, oracle is cheap in comparison.
Re:Object-relation databases by anomalous+cohort · 2005-05-24 10:21 · Score: 1

Object-relation databases
There is no such animal. There is such a thing as object/relational technology. Some common examples of this are hibernate and EJB. These are not alternatives to relational databases. Rather, they serve to persist objects to and from relational databases. They are built on top of relational databases. They do not replace relational databases.

Alternatives to relational databases are LDAP databases such as openLDAP, OLAP databases such as Hyperion, and XML databases like exist. None of these technologies will replace the relational database. It's more about using the right tool for the job. Relational databases work best with operational data. OLAP works best for planning and forecasting. Think of LDAP as a distributed hierarchical database.

Various relational database products have proprietary extensions that may confuse you into thinking that they are alternatives to relational databases. For example, there are extensions to SQL Server that make them seem to act more like OLAP or XML databases.
Re:Object-relation databases by jbplou · 2005-05-24 11:25 · Score: 1

support for XML in databases is far hotter. Thats because XML has become a marketing standard, add XML support to your product and put it on the bulleted list of new features. I personnaly think xml is horrible for storage of a large amount of data, too much extra meta data per record.
Re:Object-relation databases by macsuibhne · 2005-05-26 01:31 · Score: 1

The term Object/Relational is defined by Michael Stonebraker (progenitor of Ingres and Postgres) and Paul Brown in the book linked above. It is not the same as object persistence as implemented in hibernate or in the emerging EJB API, or older technologies such as Versant. It's impossible to have a meaningful dialogue about this without agreeing on fundamental terminology. Since Stonebraker coined the term, it seems reasonable to defer to him on its meaning. Extant implementations are to be found in Informix, Illustra, and Postgres, from which the former two derive.

Tony.

--
-- "Quis custodiet ipsos custodes?" -- Juvenal
Re:Object-relation databases by anomalous+cohort · 2005-05-26 10:20 · Score: 1

Ah, I get it, the Postgresql object oriented SQL syntax. I missed that because I was interpreting the original post in the context of the increasing need for intelligent caching on the local hardware.

Okay, here's a question for you. If an RDBMS extends the SQL syntax, is it still an RDBMS? My take on that is this. If the mental paradigm is still set theory with predicate calculus and 2D tables with columns, then it is relational.

It's already done by duffer_01 · 2005-05-24 07:32 · Score: 2, Interesting

"one size no longer fits all"

This is absolutely correct when referring to databases for different application. However, why do people always assume that they have to choose the Oracle's, MS and IBM's out there? There are already databases that have been tailored for certain application environments. Take for example http://www.ianywhere.com/ who has databases like SQL Anywhere and UltraLite which are tailored for smaller workgroups and mobile devices.
I don't think the solution to the problem is to build a more complex non-relational system but rather to choose the right tool for the job. Why reinvent the wheel when you don't have to?

--
Adventure City Tours

Re:I did not RTFA by abigor · 2005-05-24 07:32 · Score: 5, Funny

Yeah, for those terabytes of data taken up by your mom's recipes and your cd collection, the extreme power of PHP and MySql is all you need, man.

5th normal form model of perfection. by HornWumpus · 2005-05-24 07:33 · Score: 1

Ivory tower comp sci people have defined normal form way past the fifth.

In the real world you get to third, then carefully denormalize for performance.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:5th normal form model of perfection. by Frank+T.+Lofaro+Jr. · 2005-05-24 08:45 · Score: 2, Insightful

If full 3rd normal form is too slow, you need better hardware.

If a few joins (assuming you've got indexes, etc setup) make it too slow you are too close to the edge.

Saving on hardware but spending more in dealing with data problems is false economy.

--
Just because it CAN be done, doesn't mean it should!
Re:5th normal form model of perfection. by orderb13 · 2005-05-24 08:58 · Score: 3, Funny

I actually once asked a Terradata consultant what the 4+ normal forms were used for. His response was "So high priced consultants can finish out their contract without actually having to do anything."
Re:5th normal form model of perfection. by Anonymous Coward · 2005-05-24 10:58 · Score: 0

> If full 3rd normal form is too slow, you need better hardware.

What delightfully uninformed statement. Data warehouses are denormalized because they expects access patterns that are among other things, read-only.

You go ahead and buy four times the hardware required so you can put your database in 3NF and run it at the same speed the older hardware would have for a denormalized table. Meanwhile I'll run the same warehouse tables 20 times faster once I cut the disk seeks merely in half from the better hardware.

Your department's performance will go south in a budget freeze. Mine will have money to spare.

Welcome to the real world, theory boy.
Re:5th normal form model of perfection. by tepples · 2005-05-24 11:37 · Score: 1

Data warehouses are denormalized because they expects access patterns that are among other things, read-only.

Then why can't the denormalization happen within the DBMS? Indexed views are still compatible with ivory-tower relational theory.
Re:5th normal form model of perfection. by HornWumpus · 2005-05-24 12:01 · Score: 2, Informative

Sure it could. But it does'nt.
The DBMS could theoretically fix badly stated SQL and do it right anyhow. But it does'nt.

I've tuned data collection processes that were originally written by a moron who wrote his masters thesis on query optimizing. My version turned a two and a half minute data collection, reduction and display process into a fifteen second process. Hearing him exclaim 'that's impossible' over the cube walls was a highlight of that job. (Bozo did a masters thesis on optimizing, but did'nt know how to read a query plan).

Just because someone has ivory tower credentials does'nt mean they know their head from their ass.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Re:5th normal form model of perfection. by TapeCutter · 2005-05-24 14:03 · Score: 1

So you are saying, one high priced consultant has no idea about what the another high priced consultant is talking about?

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Why 'Beyond'? by Anonymous Coward · 2005-05-24 07:34 · Score: 5, Insightful

Designed in the 1970s, the RDBMS has nevertheless proven to be the cornerstone of Web development three decades later. Thanks to systems like MySQL deployments are surely at record levels.

Essayist Clay Shirky has gone to far as to suggest that MySQL is at the center of a whole new software movement.

In my experience with Web applicaions the chief problem with the RDBMS seems to be that it does not do text indexing and search very well, so I have to keep a second store of data in something like Lucene.

The other major problem is the level of skill required to tune the database to achieve high-performance SQL queries, so hopefully the RDBMS will evolve with more self-configuration capability.

The article, which I only skimmed, actually addresses these two concerns but seems to pooh-pooh the notion of simply refining the existing RDBMS systems. Instead it says " Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems. "

The paper seems awfully squishy on what this means. The clearest I found was a call to "produce a storage engine that is more configurable so that it can be tuned to the requirements of individual applications."

But this call for new highly modular/configurable storage "engines" seems to me to require at least as much fussy care and feeding as a traditional RDBMS. You're just replacing one DBA with another. And throwing out decades of refinement in the process.

The raison d'etre of the RDBMS is to allow the programmer to treat storage as a black box while gaining nifty ACID features. Extending this to text indexing seems logical.

Less is more by WiFireWire · 2005-05-24 07:35 · Score: 0, Troll

'nuff said. thekeyboardgoblinsstolemyspacebar

Kind of OT SQL Question by Over_and_Done · 2005-05-24 07:35 · Score: 2, Insightful

I agree with the sentiments of the posters that SQL is not going anywhere, but I had a question.
As I am designing more and more complex web apps, I am constantly having to think of new, innovative ways to design the tables and databases and am currently making it up as I go. Does anyone have a reccomendation for books/sites that talk about good design proactices, that is not "How to use SQL" and relatively agnostic on the specific brand on DB?
Sorry for the OT post, its just something that has been bugging me for a while

Re:Kind of OT SQL Question by Anonymous Coward · 2005-05-24 07:57 · Score: 0

What a question.. like a cell phone engineer asking for books on physics.. didn't they teach you this in school?

But since you are curious: you need to start with relational theory. I recommend Date's new book on O'Reilly:

http://www.oreilly.com/catalog/databaseid/

Read and understand the relational model, and DO NOT think about how current SQL databases work. Think abstractly.

Design your databases with this model in mind, then translate it into your favorite SQL product. Find a way to implement the general integrity constraints of relational model with, e.g., SQL triggers. Once you master data *integrity*, your database will hum along smoothly.

If you don't start shaking your head and thinking "WOW SQL really sucks, it's nothing like the relational model which is so general and powerful" .. read the book again. You'll come around. The more people who are demanding better products, the more likely a vendor will deliver them.
Re:Kind of OT SQL Question by thirt · 2005-05-24 08:01 · Score: 1

Are you refering to the concept of UDM? Univeral Data Model? Whatever you're working on, its data structure has been modeled before. Really. It has. In fact, it's probably been modeled times before. Some cool recommended reading: http://www.amazon.com/exec/obidos/tg/detail/-/0932 633293/qid=/sr=/ref=cm_lm_asin/002-7630614-0141659 ?v=glance http://www.amazon.com/exec/obidos/tg/detail/-/0471 380237/qid=/sr=/ref=cm_lm_asin/002-7630614-0141659 ?v=glance
Re:Kind of OT SQL Question by genus+babbage · 2005-05-24 08:12 · Score: 2, Informative

Depends what level you're after really, but here are some pointers that might help:
I know it says access on the cover, but Steve Roman's book Access Database Design & Programming is pretty good as a starting point, IMO.

You could also try Michael Hernandez's Database design for mere mortals

I found both to be pretty readable and a good introduction to the theory.

Once you've got the basics down, there's lots of further stuff you can try, from the obvious books by Codd and Date, to the more esoteric and weird.

I would suggest, however, that you don't stress the "platform independance" of databases; you can do it, but it's not going to be pretty - most of them have their own syntax, additions, quirks and foibles, and that's just in the SQL; it gets worse when you actually try and design code stuff; for example if you program your application to work with Oracle for locking, and then switch to, for example, SQL Server, you're going to have to redesign the way it works, since they have completely different strategies (or at least they did - been a while since I've done any SQL Server).

If you try for DB independance, you'll also end up coding to the lowest common denominator (no sequences for example... 'cause they all have their own standard...), which kinda takes the fun out of anything.

For Oracle, Tom Kyte is probably a good bet - Expert One-on-One Oracle is very good, IMO, well worth the asking price; I can't really recommend anything for non-oracle though, maybe someone else can help out; O'Reilly are bound to have something for pretty much everything and everyone.
Re:Kind of OT SQL Question by Colin+Smith · 2005-05-24 10:51 · Score: 1

Search Google for "Codd".

--
Deleted
Re:Kind of OT SQL Question by anarxia · 2005-05-24 11:58 · Score: 1

IMHO locking is better handled with stored procedures to do the whole transaction in one go (if possible of course). Sure it's more code (ugly code) but you can optimize for your particular database and it will work if you change vendors (once you rewrite them of course) without many changes to your application code.
For sequences I usually use helper functions to do my inserts. Something like DoInsert(sql, field_name). For MS Sql Server I append ";SELECT @@ IDENTITY", for postgres and oracle ";SELECT currval('field_name_seq')" etc.
There are ways to use "advanced" features and remain fairly portable but it needs effort.
Re:Kind of OT SQL Question by popeyethesailor · 2005-05-24 16:05 · Score: 1

You've got some vague answers mentioning books on relational theory.
I would recommend you pick up Joe Celko's SQL for Smarties. This covers practical problems, a number of database implementations; the author also highlights where SQL shines, and where it doesnt.

The real benefit of this book is it makes you start thinking in Sets; this is extremely important if you want to get any reasonable performance out of your DBMS. Some problems which seemingly require procedural solutions can be tackled with SQL, for eg. hierarchies, trees etc.

Newer DBMS systems offer even more richer SQL constructs ; The analytics/ data mining extensions for Oracle really rock. I'm told SQLServer 2005 will have similar DW extensions. Taking the time to read product documentation will also help tremendously. Most enterprise DBMS vendors offer very good documentation.
Re:Kind of OT SQL Question by Tablizer · 2005-05-24 17:00 · Score: 1

Search Google for "Codd".

Hold on. He might be a genious, but articulate he is not. Newbie no go.

--
Table-ized A.I.
Re:Kind of OT SQL Question by ZenFu · 2005-05-25 02:38 · Score: 1

Just to reinforce recommendations others have made...

It's been years, but for me, simultaneously struggling through Date's Intro to Databases to understand the theory and Celko's SQL For Smarties to understand SQL did the trick.

I did not not like Hernandez's (sp?) book database design for mere mortals. Although I appreciate the author's consulting approach, I prefer to develop my understandings on the more theoretical works - such as Date's work. I also like Date's aggressively opinionated style - check out his website database.
Re:Kind of OT SQL Question by GoofyBoy · 2005-05-25 15:05 · Score: 1

You shouldn't have to think of new ways of doing tables, it should have already been done before. There really should be "patterns" for data modelling.

One of the best design books I have is;
http://www.oreilly.com/catalog/oracledes/index.htm l

It does weigh towards Oracle, but trust me you can apply the ideas anywhere.

Also, look into design tools such as ERWin or Oracle Designer and their cheaper OpenSource equivlents. It will help show potential problems and keep you consistant.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.

Scanned consciousness.... by SpectreBlofeld · 2005-05-24 07:35 · Score: 1

It's cool. We'll start scanning peoples' consciousnesses into computers, as per yesterday's article, and make them our database-indexing cyberslaves.

AS-400 by HornWumpus · 2005-05-24 07:35 · Score: 1

IIRC uses DB2 as its 'file system'.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:AS-400 by Doc+Ruby · 2005-05-24 07:50 · Score: 1

And AS-400 is the most popular computer series ever to be subjected to the demands of real IT professionals, scientists, and corporations. Sounds like it works. If that kind of minicomputer tech was in the hands of millions of PC users, like the rest of Linux, we'd see both Linux and the filesystem improve - and the users' experience along with it.

--
--
make install -not war
Re:AS-400 by Monkelectric · 2005-05-24 08:04 · Score: 2, Insightful

As/400's are great for accounting and manufacturing. They are *NOT* used by IT professionals or Scientists and are only used in accounting (primarily gambling -- to monitor slot machines) and manufacturing.
AS/400's are hopelessly complex even to seasoned IT professionals such as myself, and they're only around is not because people like them but because the work SO GOD DAMNED WELL :)

point being, tractors work well to but you dont drive one day to day do you? :)

--
Religion is a gateway psychosis. -- Dave Foley
Re:AS-400 by ignorant_coward · 2005-05-24 08:08 · Score: 1

That's probably so IBM can charge more in licensing and support.
Re:AS-400 by LWATCDR · 2005-05-24 08:25 · Score: 1

I worked an AS-400 as a hospital when I was in college. It was only used for accounting. The lab ran a different system. Yea they worked well.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:AS-400 by Anonymous Coward · 2005-05-24 08:28 · Score: 0

No it doesn't. DB2 is running as a server on top of OS/400, you can see it has it's own subsystems. DB2 was wiggled in to fit with the existing database mechanism that is native to OS/400, which traces its roots back to the System/38, maybe further. Once DB2 was added to the AS/400, database performance dropped because the DB2 subsystems would take up resources that previously were not part of the system, and certainly aren't required if your application use native database access, rather than clunky SQL.
Re:AS-400 by A+Numinous+Cohort · 2005-05-24 17:05 · Score: 1

I teach the AS/400 elective track at a small computer college.

It's easy to learn how to use and operate the AS/400. That's not just my opinion developed over the course of 25 years of using the AS/400 and its predecessors going back to the S/34--the students in my Introduction to AS/400 class would agree with me.

SQL and the other data utilities provided by IBM are also easy to pick up--seniors without any AS/400 background learned to use these in a single term.

What they find challenging is using the development tools: the SEU editor, IBM's data language DDS and most specially the RPG programming language.

Maybe these utilities are better on the iSeries--the box I use is pretty old (OS/400 version V3R7).
Re:AS-400 by Doc+Ruby · 2005-05-24 17:16 · Score: 1

Other IT professionals disagree.

--
--
make install -not war
Re:AS-400 by Monkelectric · 2005-05-25 10:15 · Score: 1

That is madness :)

--
Religion is a gateway psychosis. -- Dave Foley

Fabian Pascal, I summon thee! by Anonymous Coward · 2005-05-24 07:41 · Score: 0

Fabian Pascal trolls, thee also do I summon!

Let the flamage commence!

Re:Fabian Pascal, I summon thee! by Anonymous Coward · 2005-05-24 08:00 · Score: 0

I think those types of people don't even both reading garbage like this any more. I'd rather read a book full of Pascal flames than more of this "beyond relational" junk. What exactly is "beyond" the relational model? Funny how none of these papers actually describe any new model!

ACM == the Academy of Country Music by Anonymous Coward · 2005-05-24 07:41 · Score: 0

They have some interesting things to say about databases, and monster truck rallys.

According who WHOM? by OpenYourEyes · 2005-05-24 07:43 · Score: 1

According to the ACM...

No... Acording to Sleepycat, who have a great name and logo, but an otherwise very annoying data store.

Re:According who WHOM? by Tool+Man · 2005-05-24 08:01 · Score: 1

What's wrong with the Berkeley DB? It isn't supposed to do everything you might use an SQL database for, but is arguably much better at some tasks than the SQL database people might think to use first.

This isnt a new discussion by Anonymous Coward · 2005-05-24 07:44 · Score: 0

The Third Manifesto came out years ago.

Re:I did not RTFA by cuntzilla · 2005-05-24 07:44 · Score: 1

If you want to haul 300 metric tons of rock from point A to point B, you need a dump truck.

I do believe you would need more than one. Perhaps a convoy of them.

Multi-Terabyte Data Warehouse and MySQL by MexicanMenace · 2005-05-24 07:48 · Score: 0, Troll

That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

http://www.mysqluc.com/cs/mysqluc2005/view/e_sess/ 6218

Sabre is using MySQL to analyze data that was not feasible to analyze before by replacing tape drives with online accessible information that is affordable. This talk outlines the project--data warehousing with large multi-gigabyte tables containing billions of rows of information--and gives details into the architecture, design, and rollout. Benzinger provides information on design decisions, lessons learned, performance tuning measures, and results obtained. Download presentation file

Slide 6 shows their MySQL database to be 26TB in size with 600,000 I/O per second.

Just so you know.

Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · 2005-05-24 08:03 · Score: 4, Interesting

I have no doubt that terabytes could be stored in MySQL. My overall point is that MySQL is not designed to effectively manage that much data. For example, the presentation you link to shows that Terabase is the workhorse of the business. Data is then offloaded to a disposable MySQL database for data warehousing analysis. The database is then purged after one week.

The holy grail of information technology would be to eliminate the need for such cumbersome replications, and instead have a single, reliable data source that can be queried for any information needed at any time. Unfortunately, MySQL isn't it. ;-)

--
Javascript + Nintendo DSi = DSiCade
Re:Multi-Terabyte Data Warehouse and MySQL by Anonymous Coward · 2005-05-24 08:55 · Score: 0

Troll. Parent is taking things out of context and attempting to start a flamewar. Please mod down.
Re:Multi-Terabyte Data Warehouse and MySQL by Anonymous Coward · 2005-05-24 09:21 · Score: 0

> I have no doubt that terabytes could be stored in MySQL

oh, yeah theoretically. Just like my pc can theoretically support far more memory than my motherboard will actually accept.

but practically...lets see:
- no query parallelism (which gives db2 & oracle *linear* performance improvements)
- primitive rules-based optimizer (which means queries in mysql can *easily* tank when they join in non-optimal ways)
- no partitioning (which means full table scans all the time, while db2 & oracle will be scanning just 10% of the data)
- lack of asynchronous agent processing? (means inefficient processing & serial workloads)
- primitive bufferpools (which means it's hard to optimize memory applied to various tables, indexes, etc)

ah, and it's mysql. So unlike every other option in the marketplace, you can expect to get invalid dates, truncated strings, truncated integers, etc. And according to mysql this isn't a bug, it's a feature we should be glad of since it improves their performance. Yea.

Bottomline: this is just mysql fanboy fud. MySQL doesn't even come close to doing what db2 & oracle do easily (and correctly) at the high end. Just because mysql can do an indexed-lookup on an isam file of 100-thousand rows very quickly has nothing to do with the performance required to answer queries against a ten-billion-row fact table, while joining to a dozen outer dimension.
Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · 2005-05-24 09:38 · Score: 1

Dude, you're repeating my point. (Granted, with much more detail.) Just about any database can *store* as much data as its disks can hold (as long as the internal data offsets are large enough numbers). The whole point of my post was that MySQL lacks the very features that make that much data *useful*. :-)

--
Javascript + Nintendo DSi = DSiCade
Re:Multi-Terabyte Data Warehouse and MySQL by Anonymous Coward · 2005-05-24 09:56 · Score: 0

> Dude, you're repeating my point.

Yep, i'm entirely agreeing with you. Probably should have phrased that part better ;-)
Re:Multi-Terabyte Data Warehouse and MySQL by Anonymous Coward · 2005-05-24 10:35 · Score: 0

My overall point is that MySQL is not designed to effectively manage that much data.

I wouldn't use MySQL for it either, but for the record, that is exactly what it was designed for. Read up on the history of why MySQL was developed...
Re:Multi-Terabyte Data Warehouse and MySQL by Tim+C · 2005-05-24 10:58 · Score: 1

There's a difference between being able to store TB of data in something, and that thing being able to handle it (eg access/process it effectively, store it safely, etc)

For example, you can store gigabytes of data in a flat file rather than a database, but I wouldn't want to try to query it. (Sure, you *could* use grep...)

--
It's official. Most of you are morons.
Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · 2005-05-24 11:52 · Score: 1

Please don't feed the trolls. He's one of the guys who runs around badgering me because he doesn't like the fact that I have insightful and intersting opinions. All he really wants is to get someone into a time-wasting argument.

Thanks. :-)

--
Javascript + Nintendo DSi = DSiCade
Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · 2005-05-24 12:03 · Score: 1

Read up on the history of why MySQL was developed...

Waaaayyyy back in '98/'99 I actually thought that MySQL might be a good database after I read up on the history of it. The whole "it was designed to store gigabytes of data for our company" seemed very convincing at the time. When I actually started to use it later on, however, I quickly figured out that it suffered from a sydrome common to in-house software: It was designed only for the task at hand.

Many of the things outside of what the parent company needed were not implemented or were poorly implemented. When MySQL split off into its own company, it finally began to patch some of the holes. Unfortunately, they've had some difficulty in adding features typically found in a commercial database. Whether this is due to the state of the codebase or priority being placed on other areas is unclear.

--
Javascript + Nintendo DSi = DSiCade
Re:Multi-Terabyte Data Warehouse and MySQL by Anonymous Coward · 2005-05-27 19:47 · Score: 0

www.kx.com,

Celko's article

http://www.intelligententerprise.com/010327/celko_ online.jhtml

read about tick
http://www.kx.com/products/kdb+tick.php

q is used for terabyte datawarehouses on wall street. realtime algorithmic trading.

milan's page.
http://homepage.hispeed.ch/milano/k4kdb+.htm

the usual database blatherings by Anonymous Coward · 2005-05-24 07:49 · Score: 2, Insightful

by MARGO SELTZER, SLEEPYCAT

Sleepycat? The guys who make a brain-dead key/value database with no data manipulation or integrity capabilities? Who are they to educate others on the topic of relational databases? (Sleepycat's products are useful tools, but they are not true databases).

while data management has become almost synonymous with RDBMS, however, there are an increasing number of applications for which lighter-weight alternatives are more appropriate.

Ahh, so the proper title of this paper should be: "Beneath Relational Databases" or "Below Relational Databases". Because the relational model is a *complete* model for data storage and manipulation, so if you have a subset of this functionality, you are not "beyond" it.

As argued by Stonebraker, the relational vendors have been providing the illusion that an RDBMS is the answer to any data management need. For example, as data warehousing and decision support have emerged as important application domains, the vendors have adapted products to address the specialized needs that arise in these new domains. They do this by hiding fairly different data management implementations behind the familiar SQL front end. This model breaks down, however, as one begins to examine emerging data needs in more depth.

Well, the mention of Stonebraker's name as an authority on databases is generally an indicater of a content-free paper, but let's be sure we're talking about the same thing: the relational *model* is a *complete* model. There is no other more effective model, in fact as far as I know, there are no other complete models!

So if you want to use the relational model as a foundation to build new database products, go right ahead. If you're talking the same old vendor BS about "post relational" or "XML" (hierarchical) or "object" (network and/or hierarchical), then please shut up!!

My feeling is when he says "in depth", he means "less depth".

As more documents are created, transmitted, and operated in XML, these translations become unnecessary, inefficient, and tedious. Surely there must be a better way. Native XML data stores with XQuery and XPath access patterns represent the next wave of storage evolution. While new items are constantly added to and removed from an XML repository, the documents themselves are largely read-only.

Uh, yes there is a better way: create an XML data type in a relational database with a full set of XML operators. The relational model doesn't care about data types.

I have no interest in giving up the general relational model for a hierarchic model (rejected decades ago as not being general enough) based on a TEXT FILE FORMAT.

Stream processing is a bit of an outcast in this laundry list of data-intensive applications.

I smell Stonebraker.. yes, it's an outcast because stream processing has nothing to do with data storage!!!

Some argue that database architecture is in need of a revolution akin to the RISC revolution in computer hardware

Yes, all these people need to study and understand the relational model which was developed 30 years ago and is still the only complete data model. The relational model can be described in half a page, and consists of a small number of core operations from which any possible data storage and manipulation need can be developed. Stop thinking about implementations, think about the *model* and then use that develop new implementations!!

Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems.

What does this mean exactly? I need to store and manipulate data without limitations. The relational model offers this. What is "old" or "new" here? I'm not going to switch to an ad-hoc subset of the relational model because it's "new".

This "paper" (wasn't there one a couple weeks from some Microsoft dude, which was equally useless?) commits the same old sins: 1) look at existin

Re:the usual database blatherings by Tungbo · 2005-05-24 10:15 · Score: 1

"the mention of Stonebraker's name as an authority on databases is generally an indicater of a content-free paper"

That is certainly a bold claim coming from an AC. In case you didn't know, he has been building relational DB probably while you were still in diapers. Heard of something call Ingres perhaps?

The relational model could be defined to be "complete" in one technical sense, but that by no means make it the most effective tool for your application. For example, graphical data is not efficiently stored in a RDB, whether they are pixels or display lists. Typically, they have to get stuffed into a BLOB if one insists on saving into a RDB. That mean the RDB simply have no way to manipulate the data semanics.

To put it another way, every computer is equivalent to a Turing machine - arguable the simplest representation of computation. Does that mean we should all do programming in Turing machine exclusively?

The article is not arguing for getting rid of the RDB. It's simply pointing out areas of limitations which may inspire development of other data manipulation approaches. A better fitting tool will enhance your life.
Re:the usual database blatherings by dcfix · 2005-05-24 10:35 · Score: 1

I don't know of a professional database programmer that would store an image in a database. BLOB (Binary Large Object) storage in a database is a terrible waste of resources and defeats the whole purpose of a relational database. It's to manage relationships BETWEEN data!

FYI, instead of storing an image in a database, you want to store a path name to the image. The application can pull up the image just as fast without screwing up the other production users of the database. Besides, who the hell wants a BLOB being included in your sql db?

A point the original poster missed, and my favorite quote from the advertisement from sleepycat.
"Transactions provide the illusion that a collection of operations are applied to a database in an atomic unit..."
Actually, that's exactly what a transaction does... applies a set of actions against the database as an atomic entity. Google 'Database ACID principles' for more info.

--
What cod piece?
Re:the usual database blatherings by kimanaw · 2005-05-24 11:13 · Score: 2, Interesting
This "paper" (wasn't there one a couple weeks from some Microsoft dude, which was equally useless?)

I suspect you mean this article by (the highly regarded) Jim Gray, whose opinion I'm much more inclined to seriously consider than the ramblings of the subject article's author (someone who, based on his company's product, apparently considers a B-Tree to be an innovative solution).

But I admit I found Gray's article equally disconcerting (perhaps he's been spending too much time on the sailboat in the SF Bay, and MSFT needed a shill to push their "XML Uber Alles" and Longhorn vision?).

SQL has managed to overcome a lot of derision over the years, and, tho the language has often been a bit slow on the uptake, it has managed to adapt admirably to abstract datatypes, recursive queries, OLAP, etc. Perhaps as importantly, the 90's left us with empirical evidence that the O-R juggernaut may have been ill conceived, with the failures of GemStone (which has since re-invented itself), and the disasterous impact of Informix's Illustra (nee Postgres) adventure.

Let's face it, SQL just works. Most people I've met who complain about SQL usually
- (a) don't know how to tune a database and/or
- (b) don't know how to properly use SQL and/or
- (c) are stuck using a half-assed DBMS
There may be an inherent beauty/advantage to a declarative language, in that it can adapt to change by adding a few nouns, some adjectives and/or adverbs, and, as a last resort, one or two new verbs, rather than forcing a redefinition of the entire language (C -> C++, COBOL -> ObjectCOBOL, etc.).
So, to paraphrase Twain, "The rumors of SQL's death have been greatly exaggerated."
--
007: "Who are you?"
Pussy: "My name is Pussy Galore."
007: "I must be dreaming..."
Re:the usual database blatherings by Anonymous Coward · 2005-05-24 12:04 · Score: 0

No, I haven't used BLOBs either, but only because I'm wary of their implementation in today's databases; not because they don't belong.

I would love to treat ANY and ALL business data through a relational access mechanism to take advantage of the ACID properties rather than managing two different data stores. I haven't heard of any filesystem resource managers that will allow me to use a file API yet give me two phase commit with my RDBMS.
Re:the usual database blatherings by AtrN · 2005-05-24 23:19 · Score: 1

Maybe you should read some of Margo's other papers before jumping to conclusions.
Re:the usual database blatherings by Tungbo · 2005-05-25 00:32 · Score: 1

"I don't know of a professional database programmer that would store an image in a database."

I quite agree. That shows exactly that not all data semantic is appropriately modeled by the RDB approach - my original point.

As for the article calling transaction an "illusion", it is exactly correct. All the components of the transaction do NOT occur at once. They are completed in sequence with provision for rolling back in case of failure to complete ALL of the components. I'm quite sure that you actually would agree with this description.
Re:the usual database blatherings by dcfix · 2005-05-25 03:37 · Score: 1

A transaction is not an illusion due to the fact that an atomic unit does not mean that everything happens simultaneously. It means that all of the sub-transactions occur or NONE of them occur.

Only in the most basic sense could a transaction be considered a simultaneous act. Unfortunately, it's in this most basic sense that you wouldn't need to roll back the transaction, because there can't be multiple actions in the transaction you describe.

--
What cod piece?
Re:the usual database blatherings by Anonymous Coward · 2005-06-05 12:52 · Score: 0

"I don't know of a professional database programmer that would store an image in a database."

It's done in circumstances where you can guarantee availability and distribution of a database, but you cannot make that guarantee for a filesystem. Life is better in the age of the ubiquitous internet, but this can still be a huge problem.

Re:I did not RTFA by tha_mink · 2005-05-24 07:50 · Score: 1

One dump truck, 15 trips.

--
You'll have that sometimes...

A thought on XML documents by It+doesn't+come+easy · 2005-05-24 07:54 · Score: 2, Interesting

I woke up and had an interesting thought...

I can imagine XML documents created in such a manner that they could constitute an object from an OOP (Object Oriented Programming) perspective, containing their own schema, characteristics, relationships and data. Further, I can imagine the ability of accepting such an XML document object into a range of other things, such as a modular program (dynamic program extensibility) or a database (temporary database extension). I can imagine the ability to package up such XML documents so that databases can be built simply by linking the XML documents together.

So (for example), one could send a request to the XML index in the sky, find and link documents containing a subset of know medical facts, and then kick off a data mining process that could discover previously unknown medical relationships. All without needing to know anything other than where to find the XML files.

Now imagine a tool that could convert all of the terabytes of data the world is generating every day into small, linkable, OOP like XML files... Sounds like a great open source project to me...

--
The NSA: The only part of the US government that actually listens.

Re:A thought on XML documents by The+Slashdolt · 2005-05-24 08:29 · Score: 4, Insightful

Here is the problem with your idea. Unlike the relational model, XML does not link facts. XML documents can be joined in any way, either valid or invalid, without you knowing one way or another. The relationships between documents are weak. There is no referential integrity. Within a proper relational model you are stating facts and factual relationships. Joins of those facts generate derived facts that are as true and accurate as your original model. Why add the overhead and complexity of xml? Why not just use a proper relational model?

--
mp3's are only for those with bad memories
Re:A thought on XML documents by Rui+Lopes · 2005-05-24 08:53 · Score: 1

Oh, you mean something like this ?

--
var sig = function() { sig(); }
Re:A thought on XML documents by Anonymous Coward · 2005-05-24 09:00 · Score: 0

True.. but you need to state the obvious point: XML is a text file format, that's why it doesn't have those things.

If people who want "XML databases" kept this mind (for instance, mentally substitute 'CSV database' or 'PNG database' and see if that makes sense), I think the world would be a better place.
Re:A thought on XML documents by SvendTofte · 2005-05-24 13:27 · Score: 1

I'm not a buzzword hater, but express your idea without XML, and it will be far more interesting. XML is not a solution to anything, it is a tool.

Improving the efficiency???? by Dammital · 2005-05-24 07:56 · Score: 5, Informative

"Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems.

Huh? Go back and reread some of Codd's papers (in the late 60's, BTW) and you'll see that efficiency was never a motivator. Simplicity was his aim, filesystem details were made irrrelevant, explicit navigation was obsoleted, and a built-in security model was included.

When relational systems finally began to appear (and I'm thinking specifically about IBM's System R) they were dog slow, and the extant hierarchical and CODASYL network databases of the day ran rings around them. Still do, unless you throw lots of hardware at the RDBMS.

RDBMS have lots of advantages over older technologies, but performance is not among them.

Re:Improving the efficiency???? by Anonymous Coward · 2005-05-24 08:06 · Score: 0

RDBMS have lots of advantages over older technologies, but performance is not among them.

The relational *model* says nothing about performance, which is an implementation detail.

Saying the relational model is "slow" is like saying arithmetic is "slow". It doesn't make sense.

This is the main problem with the database industry today: confusing the model with the implementation!
Re:Improving the efficiency???? by Unordained · 2005-05-24 14:28 · Score: 1

Everyone's always so quick to say the relational model is dead ... I wonder what NetCraft has to say about it.

a) The relational model is not slow. It's a mathematical model. It's like saying that algebra is slow because your calculator is slow. Get a better calculator.
b) The relational model really can store anything you can describe. It's a lot like algebra and accountants: if you can't make it quantifiable, you won't be able to do the math. A lot of people are simply unwilling to come up with good metadata for a relational database, and declare that it can't handle "unstructured" data. (Unstructured doesn't mean is has no structure -- it means they won't tell us what it is, or don't care.)
c) The relational model doesn't say you can't store a value of "TX, AZ" in a field. It only says that such a multivalue field cannot be seen as if it were "TX" in one row and "AZ" in another row. "TX, AZ" is equal only to "TX, AZ", but not to "TX" or "AZ". You can define your own functions for Contains(), Overlaps(), etc. if you like.
d) The relational model doesn't say you can't store XML values. It only says that when you do, you should be prepared to write very complex functions to work with those values. If you're willing to, then that's your problem.
e) Relational databases are slow compared to custom solutions, yes. So are general-purpose operating systems, compared to custom embedded OS's. Unlike custom solutions, they have great flexibility, a very rich toolset. Custom solutions won't give you a working OUTER JOIN operator, for example. Graphics CPU's vs. general-purpose CPU's ...

The submitter seemed to think this was the end of relational databases. The article itself was much less harsh -- its criticisms of the relational model are more criticisms of current products. The author wants a-la-carte features, not having to pay for what they don't use. Oracle is huge, yes. Firebird isn't -- tiny footprint, yet the features are there. Not all RDBMS's take up a gig of hard drive space (and ram!) just to start up. And hey, if you don't want transactions, you can always use MySQL! [snicker]

The relational model is fine. Yes, we should improve speed. No, we can't make it as fast as following pointers around in a file -- but businesses long ago realized the cost was too high (pointer-based databases don't give you fast joins on non-FK fields) ...
Re:Improving the efficiency???? by fnc · 2005-05-25 05:58 · Score: 1

You are right in saying that efficiency wasnt originally one of the objectives of the relational model, but today performance is better than with old technology. See this quote from an interview with Jim Gray (http://www.acmqueue.org/modules.php?name=Content& pa=showpage&pid=43. )
Nonetheless, the relational guys took the challenge and said, "If we're going to be successful, we're going to have to perform as well as the IMS on the bread-and-butter transactions." A great deal of energy went into this, and I think it is fair to say that the relational implementations did okay. Today, all the best TPC-C [Transaction Processing Performance Council Online Transaction Processing Benchmark] results you see are with relational systems. The IMSes of the world are not reporting TPC-C results because, frankly, their price performance isn't very good.

Re:KISS (I can prove SQL will be around) by gosand · 2005-05-24 07:58 · Score: 5, Funny

SQL, on the other hand:
1) Reasonably simple API
2) Scales to very large databsaes
3) Cross-platform/architecture
4) Performs very well.

I am proof that SQL will be around for a while. When I first saw Unix back in the late 80s, I thought "this is too hard to use, why would anyone need this?" I have been a Unix/Linux user since about '92.

When I took my first SQL class, I thought "these queries are very cumbersome. SQL is stupid." I still use it today.

In '93 I heard about this thing called the World Wide Web, and thought "This is unnecessary. I can find whatever I need on gopher and ftp sites. Why would I want a gui thrown on top of it?"

As you can see, I am quite the visionary.

--

My beliefs do not require that you agree with them.

You are likely correct by jd · 2005-05-24 08:01 · Score: 1

Indeed, the most "obvious" solution would be to have views (which are normally virtual tables) become physical tables on the remote device. Then, only the data that is actually important would be stored on local devices.

The problem is not that you have terabytes of data in total, because you don't deal with the total data. It's no different from swimming - you're only using the surface. The problem is extracting only that surface, so that it can be used on a local device.

The reason this is a problem is that conventional "intelligent" devices have extremely limited memory - far too limited for even a views-based approach to work. You would need to build devices capable of handling between ten to a hundred times their current capacity to realistically handle such a system.

Not all is lost, though. Most of a PDA is empty space. Lots of chips means lots of casing, and casing is largely vacant. If you had wafer-scale memory, or even had a single strip of silicon instead of individual chips, you could pack in far more memory with actually LESS space being taken.

(You only need one case on a wafer, and therefore only have one lot of overhead. A typical SIMM board has 9 chips, therefore 9 times the overhead. To get 100 times the memory, you'd currently need 900 chips, which would be 900 times the overhead. But you'd STILL only need one wafer, because you can get over 900 chips off a single wafer.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:You are likely correct by pdbaby · 2005-05-24 10:52 · Score: 1

(You only need one case on a wafer, and therefore only have one lot of overhead. A typical SIMM board has 9 chips, therefore 9 times the overhead. To get 100 times the memory, you'd currently need 900 chips, which would be 900 times the overhead. But you'd STILL only need one wafer, because you can get over 900 chips off a single wafer.)
The problem being that your yield suddenly drops to 0 when you have to produce a huge wafer. It's nice to use multiple wafers because then you can throw away those that don't work (or relegate them to a lower speed rating)

--
Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
Re:You are likely correct by jd · 2005-05-24 11:21 · Score: 1

You then stick a filesystem-like onto the start of the wafer, and mark areas that don't work as if they are bad sectors, thus avoiding the problem.

Or you make the chip in two stages - etch the chips in round 1, then etch on the interlinks to join them together, once you know which ones work.

Or you download Badmem off Freshmeat, and use Linux.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

It boils down to memory by part_of_you · 2005-05-24 08:03 · Score: 0

It's funny how memory really works on a large scale. I mean, the things that existed 125 years ago, cannot be remembered by anyone here now. As a group, we cannot even conjure up what it must have been like 125 years ago, no more than we can for something that was 3,000 years ago. Practicle knowlege is what we really want...

Freemasons knew this, so they didn't write anything down. It's funny how "information" is dispursed, and kept. People are fighting over a way to get information in such a way that it is indestructible, but look at the Egyptians did. They tried, even to use pictures, like NASA did when they launched that space pod, in order to express to aliens where Earth was, and what was on Earth. (they had to use pictures do to obvious problems with language)

My point is that I don't think this problem with finding ways to keep data will ever be figured out. There are physical laws that govern the way matter acts. None of which, seem to hold data about a certain time, or place.

Bah... by Danathar · 2005-05-24 08:06 · Score: 1, Funny

ASCII Flat files are the way to go. If I can't fgrep it...then to hell with it. SQL my ass..

The future exists today by erasmix · 2005-05-24 08:10 · Score: 2, Informative

Many of the capabilities that, according to the article, should exist in the next generation of databases exist today in IBM Informix Dynamic Server. Examples: The hability to use other than B-Tree+ indexing technology, the hability do describe data beyond traditional types, etc. Furthermore the article relies heavily on comments by Stonebreaker, the father of Informix's Object Relational capabilities.

Re:I did not RTFA by arkanes · 2005-05-24 08:12 · Score: 1

One of these can carry 290 metric tons (tonnes?). Thats pretty close.

Slashdot must write about lack of concepts before by darthium · 2005-05-24 08:14 · Score: 1

Many of the 'problems' and concerns regarding current tools, could be avoided with a previous set of strong skills and concepts in the software development team. I see that a much bigger problem than the ones analyzed in most of the articles analizing current weaknesses in software development, where they talk about problems with the technology, IMO, problem with humans in previous and more critical.

ORDB is too complicated by mveloso · 2005-05-24 08:14 · Score: 1

It's amusing, but ORDBs are just too complicated for the market.

They should be simple, but they're not. Plus, the implementation tends to tie you to a given vendor. It's not a real problem since most people never move off of their database, but people don't like thinking about it too much.

The relational model is nice because it's easy to understand, and you can always flatten an object graph into tables if you need to. You can also poke around and visually see the structure, which is nice.

Lastly, there's no real benefit too ORDBs compared to RDBMSs. People can do everything they need/want to do today with RDBMS, so there's no reason to move. There hasn't been anything really compelling, I suppose.

Re:ORDB is too complicated by promantek · 2005-05-24 08:30 · Score: 1

those are good points.

the compelling (only) reason we've been exposed to is the attributes you get through OO - polymorphism, encapsulation and inheritence, with inheritence being the real payoff to switching.

for example, if you sell widgets and you have three types of customers - individuals, wholesale, and resellers - certain information would be common which would be a "super table" with the others inheriting from it. but it seems like it's just not worth the effort required to do a complete ORDB for that simple example.

i'm sure there are other reasons, but this is what i've been exposed to.

MUMPS by stuffduff · 2005-05-24 08:15 · Score: 3, Interesting

MUMPS had one data type, a string and one data structure, an array. Commands abbreviable to a single letter. MUMPS ran whole Hospital Information Systems on DEC PDP-11's where MUMPS was both the Operating System and the Language and was called Digital Standard MUMPS or DSM for short. It was way ahead of it's time, it predated SQL, and was one of it's progenitors. Anyone who offers a survey of databases and fails to include it is seriously myopic. Google 'MUMPS Language' for the tip of the iceberg.

As for the future, I think that while OO databases are all the vogue, that O databases, where each bit of information is represented by it's own object and will have the ability to demonstrate autonomous agency is a good area for research. Then we can just let the data speak for itself! ;^)

"Every person takes the limits of their own field of vision for the limits of the world."
- Arthur Schopenhauer

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"

Re:MUMPS by Anonymous Coward · 2005-05-24 08:22 · Score: 0

Mumps is a very basic "dumb" data store. The model is similar to the network model. Definitely a subset of the relational model. As you say yourself, it only has one data type, which makes it pretty useless. How do you add constraints ("this number must be positive" "this XML document must have this structure" "this jpg must be this size")? How do you connect one element of the data to another? You don't, that's why mumps is used a low-level layer within an application, and you have to go through that application for all data manipulation.

Mumps-style DB's are exactly the kind of thing the relational model made obsolete. People are still using them strictly for performance reasons, the data integrity capabilities are terrible (because there are none!!).
Re:MUMPS by stuffduff · 2005-05-24 08:35 · Score: 1

II respectfully disagree. I used MUMPS for a decade before switching. If you look at the MUMPS command structure, the pattern matching and the ability to use indirection, you will see that an even finer level of control was achieved. For example, using a relational database, containing different structures, some finite, like an integer and some not so finite, like a blob, the relational database has to make a much larger number of disk hits just to locate the data. You also don't have to return the entire result of a query before being able to act on records. In MUMPS you can advance through the records one at a time to process them, again much less work for the system to achieve the same goals. With MUMPS and 16K of memory on the server and a VT100 as the client, you could do in text mode, things that you have to have a 1GHz+ PC and web browser to do. I've dismantled record crashed from RDBMS and MUMPS systems, the MUMPS is much easier to recover, the RDBMS you're pretty much screwed. MUMPS hasn't been made obsolete either, just look at CACHE, it's nothing but MUMPS underneath. RDBMS look great when all you're dealing with is similar in nature, a table of this, a table of that. Now let's record your medical record in it' suddenly you discover that for dissimilarly structured data the RDBMS is about as effective as a bazooka at a watchmaker's shop. Someone sold you a bill of goods my friend, time to think outside of the box.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Re:MUMPS by Anonymous Coward · 2005-05-24 09:14 · Score: 0

For example, using a relational database, containing different structures, some finite, like an integer and some not so finite, like a blob, the relational database has to make a much larger number of disk hits just to locate the data.

This is a physical implementation issue, and has nothing to do with the relational model.

You also don't have to return the entire result of a query before being able to act on records. In MUMPS you can advance through the records one at a time to process them

Irrelavent physical details. What is the *model* of a Mumps database, and why is it superior to the relational model? How does it guarantee constraints?

I've dismantled record crashed from RDBMS and MUMPS systems, the MUMPS is much easier to recover, the RDBMS you're pretty much screwed.

Which RDBMS? You're talking *PHYSICAL* details. If Mumps is so great for this, then I'll just build a relational DB on top of it. But I won't be using Mumps directly.

Now let's record your medical record in it' suddenly you discover that for dissimilarly structured data the RDBMS is about as effective as a bazooka at a watchmaker's shop.

Either you understand your data or you don't. If you do, then you can express it with the relational model, which is the only complete model for the manipulation and retrieval of data. If you don't, then it really doesn't matter what you use.

I suggest learning some basic relational theory, and the difference between a model and an implementation.
Re:MUMPS by FranTaylor · 2005-05-24 09:50 · Score: 1

Check out Intersystems Cache. http://www.intersys.com/ It successfully merges SQL and objects in the same database. Very nice. (Disclaimer: I work there).
Re:MUMPS by Unordained · 2005-05-24 17:38 · Score: 1

VistA (for Veteran Affairs) and RPMS (for Native Americans / Alaska Natives) are MUMPS/M packages for hospitals and clinics. I think I heard recently that RPMS is switching to Caché instead of the M globals, but I doubt that makes much difference (except they can start using some form of SQL if they want to.) MUMPS is still very much being used to run medical services ... doesn't appear to be dying anytime soon. They're even investing in switching to GUI interfaces (a few actually look almost good -- for medical apps, that's amazing). But if you want to have some fun, ask RPMS guys for their table layouts ... (sparse matrix is hard to describe in terms of table layouts.)
Re:MUMPS by stuffduff · 2005-05-25 03:09 · Score: 1
I can concede the most of the implementation VS model points, no problemo.
Either you understand your data or you don't. If you do, then you can express it with the relational model, which is the only complete model for the manipulation and retrieval of data. If you don't, then it really doesn't matter what you use.

Here too, expressing it in the 'model' is one point, but the implementation is another. I can actually build a mumps database in a relational database half dozen fields.
- A unique numeric node ID
- A pointer representing the ID of the parent node above this level
- A pointer representing the ID of the previous block at this level
- A pointer representing the ID of the next block at this level
- A pointer representing the ID of the first Child node below this level
- The Data Content of the node, represented as a string, which contains internal segments representing the subscripts or the data elements of a particular 'global'.
Top level directory blocks have '0' parents, bottem level data blocks have no children, etc.

I remember the 'root' block of a DSM MUMPS implementation would store the pointers to the first data and routine blocks in block 0, but it was referenced as block -16777216. Because the operating system was stored after the boot sector it was not unusual to see pretty big offsets to the first data and routine blocks. As I pointed out before the routine blocks were pretty normal. The top level held the list of routine names, each included a pointe r to the first block of the routine's text. If the routine overflowed the block, the 'next' pointer pointed to the next block of it.

The 'globals' were a bit different. First off the array index used key compression. The first array index in the block was complete. If there was any overlap in the index between the one before and the one after, it was indicated by a flagged numeric, for example if 'ABCDE' was followed by 'ABCDF' it would be stored as '0ABCDE'{pointer to block containing ABCDE's data}{delimiter}4F{pointer to ABCDF's data} etc.

In Mumps, if we stored these in global M these would be coded as:

^M('ABCDE')={ABCDE's data}
^M('ABCDF')={ABCDF's data}

The implementation was very precise, the containers wasted very little space, etc. A global that contained little or no data would contain its entire structure in 1 or 2 blocks. With 16K blocks that is a max of 32K for an empty global. It would absorb more than 24K of data in most cases before a block split was required, and the structure required no meta data other than the actual global storage representation itself. An empty database implementation of any RDBMS, say Access for example) is a huge waste of space. 92K just to have the empty container structure. Pointers to separate and segregate tables and fields special formatting and handling of each data type.

Enough of implementation, anyway. My chief complaint is that the relational model is inflexible in it's implementation by definition. MUMPS provided the programmer the ability to navigate the data in a tree format. If you wanted to just 'tack' something in somewhere, just like creating a new synapse in your brain, you could. By convention you could explore and extend it as a tree. If special code was needed to access the new data, it could be included in the data structure itself, just as it can in your brain. I see the RDBMS guys just throw up their hands as the number of tables in their databases become unmanageable and that they have to 'just say no.' to projects that just don't fit their 'understanding' of how their implementation requires them to do things. As long as I stick to object stores for data that 'fails to normalize' properly and continues to grow exponentially I do fine. I find that RDBMS is great for large collections of things that are all basically the same, and I routinely use them for those tasks. But for working with an extensive amount of dissimilarly structured data it's about as effective as ray tracing in COBOL. Yes, it can be done in the model and in the implementation, but who would want to maintain it? ;^)

So, we just think differently...
--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Re:MUMPS by stuffduff · 2005-05-25 03:14 · Score: 1

I remember when I could actually call Mr. Brown himself with crash dumps. During the first few weeks of release for the IBM PS2 version. It has come a long way and has a wonderful toolkit. I really like the visual implementation. It's really wonderful. You guys do good stuff there! For those who are interested in MUMPS I agree they should check out your site!

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Re:MUMPS by stuffduff · 2005-05-25 03:20 · Score: 1

Your point about the sparse matrix is well taken! That is one of the places where MUMPS excelled. The VA is doing amazing things these days. VistA is a project that is making the 'bleeding edge' a functional reality for many people in medicine.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"

Re:Sifting Rapidly Through Petabytes of Data by Anonymous Coward · 2005-05-24 08:15 · Score: 0

Go fuck yourself, Roland.

Sleepycat? by halosfan · 2005-05-24 08:19 · Score: 3, Interesting

To me, this entire article reads like a plug for Sleepycat, written, not surprisingly, by a Sleepycat person.

Relational data model is just that -- a data model. It doesn't concern itself much with implementation (and therefore, performance in any particular environment) or with how applications use the data. And that's really the point -- relational databases are application-agnostic. They are designed to store the data that will be possibly accessed by applications that are not yet conceived. That's the reason they put great emphasis on internal correctness of the data. Once you have a database specialized for one application, that's not really a database in the relational sense -- that's a way to persist your application data. And that's where Berkeley DB shines. It doesn't replace relational databases, it just serves a totally different purpose.

--
My only problem with Microsoft is the severity of bugs in their software.

How about.... by plopez · 2005-05-24 08:21 · Score: 4, Informative

really implementing a relational model to begin with? Then we can decide if the relational model is broken or just the vendor implementation.

How about... a query language that is fully set operations compliant, i.e., something other than ANSI SQL which is a strange mixture of set and bag operations, and a mixture of relational algebra and relational calculas and some other 'extensions'.

How about... realizing that a major design goal for the relational model was data integrity. Modularity and configurability are also good goals but if you are serious about your data, integrity will be at the top of the list.

The biggest problems I see with databases is very few people understand how to use them. Here's a few tips:
1) a table is *not* a class or an object. Tables + constraints + user defined types + constraints etc. when used properly can define domains which are close to classes and objects.

2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule. As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

3) Point 2 is often the consequence of mindlessly slurping in spread sheets or MS Access database tables. Anyone doing this has no business being within 50 feet of an IDE.

4) Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

5) Have multiple channels for data, transaction logs, large indices and O/S or user applications to reduce bottle necks. This is expensive but for large databases going cheap will hurt you.

6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble. Do not be afraid of learning about the technology you are using. After all, technology is what you are good at, right?

7) If it is a read only database, turn off logging for speed (impossible to due under SQL Server 2000 btw). Also, if a table is on a purge and load paradigm (many reporting and/or datawarehouse tables are) turn off logging on the table level if your version of database engine allows you to do so. Likewise, turning off logging on a hand held or other single user system may be appropriate, just make sure two people do not try to use the database at the same time.

8) Avoid XML. Too much bloat.

9) Learn how to use indices on tables.

10) Learn how to read a perfomance monitor/top etc.

Postgresql is both working hard to become truly relational AND is adding support for geographic objects and objects. The MySQL crew is working hard to improve. Oracle has some nice perfomance features but I think their 'Object/Relational' implementation is broken. SQL Server is getting 'long in the tooth'. There is also a great need for temporal databases and lightwieght engines. But remember, there is no 'silver bullet', no short cuts. Just hard work to be done.

--
putting the 'B' in LGBTQ+

Re:How about.... by Anonymous Coward · 2005-05-24 08:39 · Score: 2, Insightful

1) a table is *not* a class or an object.

Good advice. A table is simply a snapshot of a relational value. True relational values have no top/bottom or left/right ordering so you can't really show them on paper or on the screen. It doesn't make sense to map classes or objects to tables.

2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule.

Normalization has nothing to do with integrity per se. You can add constraints to a denormalized database that make it logically equivalent to a normalized one. I'm not aware of any "once and only once" rule in relational theory.

Normalization is about dependencies: your data shouldn't have certain types of dependencies in it.

As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

Wrong, the number of fields has nothing to do with normalization. And you should review your data model even if it has 1 field.

Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

Yes, 0+1 is faster, but lots of RAM is even faster. However this is a physical detail and not anything to do with the relational model.

6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble.

This should be point #1 with a double underline. The lack of understanding and even hostility toward theory is frightening in the IT industry. If the author of the paper above had any foundation knowledge her paper would be much different.

8) Avoid XML. Too much bloat.

Just remember that XML is FILE FORMAT like Jpeg, CSV, and you'll do fine.

9) Learn how to use indices on tables.

Christ, I hope people know this one! They should also realize that an index is a PHYSICAL manifestion of a logical idea: keys.
Re:How about.... by Rebar · 2005-05-24 15:03 · Score: 1

You sound like you know what you are talking about, but your 10 bullets are not silver either. Once you have those down you can move on to areas where they don't apply. Some counterpoints:

(2) learn when to NOT normalize. If most of your access patterns pull data from several tables, you can avoid expensive joins by not over-normalizing. You can build the world's best data model where nothing is replicated and you'll have a system that can't answer anything without multiple table joins. That's OK in OLTP systems - but not OK in the world of DSS.

(4) On low-end hardware, you know of what you speak, but it doesn't hold true if the raid controller's cpu is fast and the cache is large. Sure you can create data files faster on 0+1, but on customer queries (even sort intensive ones), you can purchase raid-5 controllers fast enough that your bottleneck is somewhere else. Does the controller and FC disk cost eat up all your savings over a cheaper raid-0+1 on ATA or SCSI disk kit? probably.

(5) old school. New world order is stripe-and-mirror-everything. Disk controllers are smart enough these days that seek contention on multiple I/O streams is a thing of the past. I wouldn't waste valuable controller bandwidth on something that is going to be utilized 5% of the time. The key is to have *lots* of channels, not necessarily *separate* channels, and that boils down to spend more, get a faster database server.

(6) Damn straight. Can I get an "AMEN". And here you thought I was refuting points.

(9) Learn when to NOT use an index. You've had the "ah-HA" moment when adding an index to a frequently accessed table speeds up queries dramatically, but if your access patterns have you constantly pulling more than a small fraction of the rows, tune for a hash join or a full table scan on a single table... but not without seeing (5) above. Indexes can *kill* your performance if your customers ask the wrong questions. And then you miss your update window due to index maintenance... PostGreSQL has hash joins now, for which I am very happy. Too bad it's still 1/3rd the speed of Oracle*.

Everything say is spot-on in some, maybe most cases, so I know you are a true fellow database geek... just one with a different perspective than mine.

(*) when Oracle is working right. When it doesn't work right, damn near everything is faster than Oracle. If you want to be blown away by a fast database system - go benchmark a Netezza. No, I don't work for them.
Re:How about.... by TapeCutter · 2005-05-24 15:22 · Score: 1

I'm not aware of any "once and only once" rule in relational theory. Normalization is about dependencies: your data shouldn't have certain types of dependencies in it. - Normalisation is "about" minimising the amount of space you data set consumes by creating tables that are related to each other via keys (dependencies). The general aim of normalisation is to avoid storing redundant data and can be paraphrased as "store once and only once".

Also you do not need an "index" to implement a key. The key is one or more fields that exist as columns in a table, but the rows are stored in no particular order (fast insert, slow fetch). An index replicates the key's values from a table in a certain format (eg:hash table). The sole purpose of an index is to enable fast lookup of the row that contains the key you are searching for (slow insert, fast fetch).

Like so many other things, implementing a database is a trade-off between speed, size and effort.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

everything old sucks by pyrrho · 2005-05-24 08:25 · Score: 1

for example, the wheel, which is why I've invented the square.

And also, the computer... fucking things ancient... hello! I use the Plumation Machine instead because it's knew knew new!

I'm pretty sure whatever is next will be put on top of relational databases... rather than screw with decades of optimizations.

but SQL does suck. Really. not because it's old. it always sucked.

--

-pyrrho

sometimes all you need is simple object store by BigGerman · 2005-05-24 08:25 · Score: 1

.. like MAOS

Re:I did not RTFA by menkhaura · 2005-05-24 08:33 · Score: 1

Ah, the good days of Carmageddon...

--
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog

Hmm .. by ghakko · 2005-05-24 08:38 · Score: 5, Informative

Has anyone noticed that the author of the article is from Sleepycat (which sells commercial licenses for Berkeley DB to embedded systems developers)?

She puts forth a case against SQL and relational databases in general and claims that many applications (like directory services and search engines) have read-heavy, hierarchial access patterns which favour lighter-weight, non-relational, transaction-optional databases.

And .. it just so happens that Sleepycat's flagship products are Berkeley DB (a flat-file database) and DBXML (an XQuery engine built on top of that).

Re:Hmm .. by Anonymous Coward · 2005-05-24 15:16 · Score: 0

Of course. It's not like she's trying to pull the wool over people's eyes. Her work at Sleepycat came from her beliefs that SQL was not appropriate for many applications, and that something like Berkeley DB was needed. So here she is with an article justifying her position.
Re:Hmm .. by ZenFu · 2005-05-25 03:06 · Score: 1

I spent the last year working with xslt and storing data files in xml. At one point I did a search on xml databases and was pretty excited when I read about the DBXML product. (Unfortunately, I haven't had the time to learn more about the product, the alternatives, or the actual preformance.)

Although you have a point on full disclosure, her/her company's background is as much a reason to listen as to remain skeptical. They are the one's that have decided to spend time (and I assume money) on the effort.

WooHOO! by Anonymous Coward · 2005-05-24 08:39 · Score: 0

Comma-delimited lists? YES! Lisp will rise again!

Re:I did not RTFA by Anonymous Coward · 2005-05-24 08:41 · Score: 0

Oracle, and (blech) DB2

"Blech?"

Can you back that up with some real-world examples where DB2 was worse than Oracle?

Issue of dimensions in RDBMS by jonbrewer · 2005-05-24 08:42 · Score: 2, Interesting

One of the problems of current databases when is that a typical relational database doesn't have enough dimensions. Designing a table to store data is trivial - but what happens when you need to know the intersection of X and Y at time Z?

This is a fairly common question in data warehousing: What is the data today, what did it look like yesterday, last week, and last year?

I have seen it worked around in silly ways (snapshot and rename a table every day/week/month) and more clever ways (use separate transaction tables to record changes), but never in a particularly elegant way.

Wiser colleagues whispered to me the dirty answer "object relational" and scurried away to their dens of Rob Zombie and J2EE. I never got my head around object relational databases before leaving that world, and so am left to ponder papers from IDC with statements like this one:

"putting object extensions on RDBMSs is tantamount to adding stereo radios and global navigation systems to horse-drawn carriages"

Ouch, is that a swipe at Oracle? Seems that as far back as 1997 pundits have said that the future is in ODBMS, and not RDBMS or ORDBMS. Hmm...

Re:Issue of dimensions in RDBMS by Anonymous Coward · 2005-05-24 08:54 · Score: 0

The solution is the same as it always was: implement a proper relational database with temporal data types, and data "warehousing" will become trivial.

Until then, you might want to read this book (if you haven't already). Just looking through the examples makes me weep for a true relational implementation but we're stuck with SQL for now.
Re:Issue of dimensions in RDBMS by f00zbll · 2005-05-24 09:44 · Score: 1

Oracle's OLAP support is very object oriented, so it already exists. Just download the oracle API for OLAP in 10g and you'll see the approach is very object centric. It's much cleaner to use an OLAP approach when you need to slice and dice data. This is especially true of cases where one needs to do adhoc queries of a given time slice.
Re:Issue of dimensions in RDBMS by metamatic · 2005-05-24 10:06 · Score: 1

When you need to be able to query for the intersection of multiple dimensions, the solution is to build your relational database using a star schema. You don't need any kind of special "object relational" anything, and the end result can be quite elegant.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Issue of dimensions in RDBMS by dcfix · 2005-05-24 10:27 · Score: 1

There are 2 very simple ways to solve this problem.

1) Create a versioning table that includes a timestamp field. Each time you load a new batch of data, increase the version and tie it together with a foreign key. Presto chango, you now have a 'temporal' datawarehouse.
2) Create the timestamp field in the actual table and query based on that timestamp... (hint - use a correlated sub-query to return the data with the maximum date that is less than a set date)

Both of these are pretty standard approaches.

BTW, when you're working with databases, you need to remember that the data is the important thing. Not the interface. Not the cool jargon. Not the CV boosting bullshit that has become OODB. Try writing a quaterly report for an insurance company that aggregates 25 fields in 5 million rows... using a OODB. Hah!

--
What cod piece?
Re:Issue of dimensions in RDBMS by MikeBabcock · 2005-05-25 01:14 · Score: 1

What's wrong with storing every new item in a table and disallowing updates? Use triggers to give yourself quick "current" views of the data if you wish.

--
- Michael T. Babcock (Yes, I blog)
Re:Issue of dimensions in RDBMS by GoofyBoy · 2005-05-25 14:52 · Score: 1

Just so I understand;

1) Create a versioning table that includes a timestamp field.

So if you want to insert one row (into table T1) for today's date you need to create an new entry in the versioning table and copy all data (from T1) from the previous version into T1 using the new version along with the new row of data?

2) Create the timestamp field in the actual table and query based on that timestamp

So if you want to insert one row for today's date, you just insert it adding today's date into one column. You then would query "as of a certain date"?

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Re:Issue of dimensions in RDBMS by dcfix · 2005-05-26 03:25 · Score: 1

1) Create a versioning table that includes a timestamp field.

So if you want to insert one row (into table T1) for today's date you need to create an new entry in the versioning table and copy all data (from T1) from the previous version into T1 using the new version along with the new row of data?

As for option #1:

Create a version table that looks like this

Create table version
(
VersionID int identity primary key,
Created datetime
)

Create table MiscData
(
miscdataID int identity primary key,
versionID int references Version(VersionID),
MiscDataDescription varchar(255),
MiscDataIntStuff int, ...
)

So, every time that you load a new set of data (think archive as opposed to a regular data table)

2) Use this option if you're going to have a lot of dynamic data, like a pricing table where individual prices can change during the day, but you need to be able to figure out what the price list was at a specific time during the day.

--
What cod piece?
Re:Issue of dimensions in RDBMS by GoofyBoy · 2005-05-26 14:40 · Score: 1

Thanks, I just wanted make sure I understood you just in case I come across this problem.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.

Re:KISS (I can prove SQL will be around) by Anonymous Coward · 2005-05-24 08:48 · Score: 1

these queries are very cumbersome. SQL is stupid

You had the right idea on this one, the rest of the industry has it's head up it's ass.

SQL: SELECT * FROM Customer

Relational algebra: Customer

SQL: SELECT first_name, last_name, order_id FROM Customer, Order WHERE Customer.customer_id = Order.customer_id

Relational: Customer JOIN Order { first_name, last_name, order_id)

And that's just simple queries.. try to write a query in SQL that depends on two query results being equal (for instance, "show me a list of all customers who have each bought at least one of every product".

No data integrity by Tony · 2005-05-24 08:51 · Score: 2, Interesting

The problem with MUMPS is the data integrity issue. As essentially a collective of hashed indexes where nodes can contain both data and more branches, it sucks for *any* kind of serious data retrieval. You basically have to create and maintain your own indexes, unless you go with Fileman, which is a massively moby hack, but a hack nonetheless.

The idea of the relational model is simple: it is based on set theory, which has a strong mathematical model. There is no equivelent model for object databases, nor for tree-based databases like MUMPS. There is no strong mathematical basis by which you can judge the integrity of your data.

Cache', by Intersystems (a Post-Relational Database!) is based on MUMPS. You've seen their adverts here on Slashdot. They claim to be object-relational, but they are no such thing: they are MUMPS. They went on a buying spree and purchased up most of the failing MUMPS vendors (DSM, MSM, etc), and now they are the big guys in the M world.

They have some pretty nify hacks which compiles their "object-oriented MUMPS" programming language (I forget what it's called) into straight M. Fun. Doesn't stop it from sucking hard.

MUMPS is, at best, a fairly bizarre language with persistent storage of global arrays.

MUMPS drives me nuts. It uses whitespace for blocking just like Python, but they had so much trouble with it, they eventually allowed a '.' to replace the whitespace, so you end up with code like this:

S node="" F S node=$O(^ZZZYYY(node)) D .. Q:node="" .. W "^ZZZYYY(node)=",node,! .. R Pause

(I stole that from this duscussion.)

(Sidenote: I have to admit, my exposure to MUMPS is one of the primary reasons I despise Python's whitespace-as-blocking. It seems replaces the poor aesthetics of brace-blocking with something more error-prone and stupid-looking, though more aesthetically pleasing. But all that's just opinion. I'm sure Python is a good language, just as I'm sure MUMPS is not.)

--
Microsoft is to software what Budweiser is to beer.

Re:No data integrity by stuffduff · 2005-05-24 09:36 · Score: 1

MUMPS didn't lack data integrity. It just didn't follow the relational model, as you pointed out. Mumps was based on binary trees. In fact the whole big thing was one big tree. At the top were two blocks, one for globals and the other for routines. The routines were the simple one, just like a file directory. The data on the other hand was pretty damn complex, with top level, middle level and data level blocks.
Your problem with external indexing would seem to indicate that your global structure could have been better. I have never seen a good mumps structure that failed to represent a self referential index when ordered through at a particular level.

As for the whitespace, well I adapted quite well to Python, I guess, in part because of that decade of MUMPS.

In any case it is not the language, but the structure of the mind that can appreciate it that makes a programmer productive. You seem to feel pretty comfortable with the relational model. When I want to do something like catalog my music, I don't want to only store a name and an artist and a song, I want to explore the influences, histories, etc, call it multideminsional model. It's the same with medicine. There are nuances which the relational model will never be able to accomidate. Sure it's good at what it does, but it will never think like a human being. And that's what we're all going to want sooner or later.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Re:No data integrity by Larthallor · 2005-05-25 14:11 · Score: 1

They have some pretty nify hacks which compiles their "object-oriented MUMPS" programming language (I forget what it's called)
It's called ObjectScript.
Re:No data integrity by NickFortune · 2005-06-05 00:26 · Score: 1

I stumbled upon this in meta-mod and thought I'd chip in. Apologies for coming so late to the party
The problem with MUMPS is the data integrity issue.

it sucks for *any* kind of serious data retrieval.

Umm... no. Ive used it professionally, as I have Oracle and a handful of other DBMS packages. Mumps and it's successor Cache are very fast. Data integrity isn't a factor in retrieval, it only affects updates and deletions (inserts too, arguably).

Cache', by Intersystems (a Post-Relational Database!) is based on MUMPS. You've seen their adverts here on Slashdot. They claim to be object-relational, but they are no such thing: they are MUMPS

There isn't actually an accepted object-relation model, so anything that says "object-realtional" can be read as "marketing spin". The best contender, IMO, for an OR data model is probably Darwen and Date's Third Manifesto, but there's been regrettably little corporate take-up of their ideas.

What cache do claim is to be post-relational. Which is to say, they assert that theirs is (one of) the data-model(s) that is taken over from the (presumably) obsolete Relational Model. I can't say I agree with them there, since the RM is the best thing since semiconductor junctions. Still, that doesn't make Cache/Mumps a bad product.

MUMPS is, at best, a fairly bizarre language with persistent storage of global arrays.

Stop right there! Never, never, NEVER make the mistake of confusing the data manipulation langauge with the underlying model!

That sort of loose thinking is why so many people think they hate the Relation Model, when all the actual suckage is in SQL. (We're talking about a plugin for the PL/1 language which has hardly been used for thirty years, one which has formally abandoned the Relational Model. and one where the secretary of its own standards committee has publicly questioned the value of SQL as a standard. If you want me to suupport all that say so. I'll dig out my dissertation and post some references).

That said, MUMPS, Cache-style isn't as bad as it looks at first. If you can ignore the culture of BASIC style keyword abbreviation and structure your code properly, it becomes a very powerful language - although like many such, it can take a while to get your head around it all. It's not to everyone's taste, granted, but it is powerful.

--
Don't let THEM immanentize the Eschaton!

Re:I did not RTFA by AKAImBatman · 2005-05-24 08:57 · Score: 1

"Blech?"

Can you back that up with some real-world examples where DB2 was worse than Oracle?

I could, but it would only start a useless argument over what database everyone prefers. Let's just say that my experience with DB2 has left me with less than stellar feelings toward that database and leave it at that. :-)

FWIW, my experience is with UDB and not the Mainframe DB2. At the end of the day, the two are very different beasts.

--
Javascript + Nintendo DSi = DSiCade

Re:I did not RTFA by daikokatana · 2005-05-24 08:57 · Score: 1

'fraid not. One dump truck should do the trick.

--
http://jcsnippets.atspace.com/ - a collection of Java & C# snippets

JSR 170 - Unified Data Repository by uss_valiant · 2005-05-24 08:59 · Score: 3, Informative

Another approach to the problem: JSR 170: Content Repository for JavaTM technology API
Standardizing the interfaces to various data resources (filesystem, database, cache, ...).

The expert group reads like a who's who in data management. And it seems to be very near to the final draft.

Re:JSR 170 - Unified Data Repository by gedhrel · 2005-05-25 03:50 · Score: 1

This is a different problem area and principally aimed at unstructured or semistructured data.

JSR170 _is_ a pretty baked model, however. The major thing it lacks is the notion of a prepared statement (a la JDBC) - which is ironic, considering (as you say) the cast list on the committee.

The problem with the lack of parameterised prepared statements is that the quoting rules in the query languages are complex to explain, and easy for a developer to screw up (vide any PHP app :-) ). So we're likely to see the widespread adoption of this API in the java world (it's certainly about time) coupled with a whole slew of injection attacks against JSR170-based web apps, alas.

Don't get me wrong: I think the JSR is a very good piece of work. However, despite the pressing time constraints, I really feel that this is an issue that should be corrected before this spec hits rec.

Another one? by ColdWetDog · 2005-05-24 09:03 · Score: 2, Funny

I've already got Windows and a girlfriend, I really don't need another irrational database.

--
Faster! Faster! Faster would be better!

Re:I did not RTFA by Anonymous Coward · 2005-05-24 09:12 · Score: 0

having spent ten years using both, I can tell you that:

Pro-DB2
1. db2 is less than 50% of the price of oracle
2. db2 is much easier to administer than oracle (check out oracle recovery procedures!)
3. db2 is much less vulnerable to being corrupted than oracle
4. db2 scales higher than oracle
5. oracle's Larry Ellison is an offensive nut-case
6. oracle's sales team will screw you

Pro-Oracle
1. oracle's locking methods are easier to develop for
2. oracle has better third-party tool support
3. oracle has more expertise in the marketplace
4. oracle has some features that can sometimes be lifesavers - like compressible partitions. When they aren't they just get in the way tho.
5. db2 fixpacks are sometimes flaky

And I'd usually choose to work with db2 over oracle: much easier to admin, rock solid. You can easily modify the databases or instances right online, restart if necessary (seldom is these days) - all without a worry about backing things up first. With oracle you don't even touch it without doing a backup. That tells you a lot about the differences right there.

Plus, it's easy to train junior IT personnel to become dbas. A year later you've got somebody who's a completely productive prod & dev dba. That never happens with oracle - where you need much more specialization.

Also, until db2 v8 came out two years ago, it was pretty far behind oracle 8 & 9. Now, it's in a great competitive position.

JDBC further standardizes SQL dialects by irritating+environme · 2005-05-24 09:15 · Score: 2, Insightful

JDBC (probably ODBC too, tho haven't used it in eight years) helps to standardize key generation (in JDBC 3.0 FINALLY), and Date processing (christ, date functions are so annoying). Most other operations can usually be done by the platform language that is processing the data, so you can avoid the tie-in that results from various SQL dialects' built-in functions. XML databases were a total flop, and so were object databases. I agree, SQL engines are so mature now that I don't see any database tech replacing them for another ten years. By the way, can we get PostGres (and now Oracle's) support of regular expression LIKEs standardized? And can we please get JDBC to support something like: INSERT $price INTO pricetable where productid = $productid rather than using ?'s and counting which ? to set values to? Hibernate's query language does this, and I really like it.

--

Hey, I'm just your average shit and piss factory.

Re:JDBC further standardizes SQL dialects by mcrbids · 2005-05-24 09:34 · Score: 1

To get around all the date/time/timestamp confusion, I switched all date functions to integer data types and use *nix epoch to keep time values. I also frequently use a "date" format based on YYYYmmdd when I want to ensure a granularity of a calendar day. This indexes quickly, and works well with the php date() functions. (my language of choice)

And can we please get JDBC to support something like: INSERT $price INTO pricetable where productid = $productid rather than using ?'s and counting which ? to set values to? Hibernate's query language does this, and I really like it.

If I were to change ANYTHING with SQL, it would be to make update and insert syntax the same as update. EG:
Insert into $TABLE set name='$name', address='$address', phone='$phone';
This would make dynamic query generation SO MUCH EASIER and more readable!

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

What I'd really like to be able to do... by thewils · 2005-05-24 09:17 · Score: 2, Interesting

If any smart DBMS developers are listening, is to define a set of queries within the database (like for a _simple_ example "male" and "over 60" and "salary x") and then be able to refer to these criteria by name only, having the database build the query based on these rules as I choose to combine them (select xxx from yyy where 'male' or where 'male' and 'over 60').

Sort of like stored procedures in implementation - they could be called stored query definitions.

Because these query definitions would already be parsed, they don't require overhead to re-parse each time the stored query definition is executed.

Please have this feature ready in about 6 months :)

--
Once I was a four stone apology. Now I am two separate gorillas.

Re:What I'd really like to be able to do... by Spaceman40 · 2005-05-24 09:44 · Score: 1

Ever heard of 'views'?

--
I [may] disapprove of what you say, but I will defend to the death your right to say it.
Re:What I'd really like to be able to do... by thewils · 2005-05-24 10:14 · Score: 1

Yep, sure have, used them too. But they still act like a table, right? You still need to join them into your query using (sometimes) extremely convoluted SQL.

--
Once I was a four stone apology. Now I am two separate gorillas.

XML documents are an hierarchical database by irritating+environme · 2005-05-24 09:18 · Score: 2, Insightful

An XML doc is a tree, thus it is hierarchically organized data. There have been hacks to try to extend around this limitation, but relational data still has superior flexibility.

that's why XML databases flopped

--

Hey, I'm just your average shit and piss factory.

Re:XML documents are an hierarchical database by The+Slashdolt · 2005-05-24 09:31 · Score: 1

Yes, heirarchical databases. Exactly what RDBMS's were designed to move away from. Amazing how things come full circle. If you ask me, an xml schema or dtd is just a modern version of a cobol copybook!

--
mp3's are only for those with bad memories

Re:I did not RTFA by pthisis · 2005-05-24 09:21 · Score: 1

That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

Add postgres to that list, too.

Tough to find examples now. They used to ask people who had DBs over 1TB to post to the list, but they stopped several years ago because they had plenty of them.

I remember Datainfosys's spam rules DB is over 1TB, and the American Chemical Society's historical archives, and there was some genome mapping project. Really they had a couple examples a month, so it's not like there are only 5-10 dbs out there in postgres in these sizes.

--
rage, rage against the dying of the light

You're Right... by irritating+environme · 2005-05-24 09:25 · Score: 1

You may be being facetious, but that is something that SQL doesn't do well.

Although since PostGres can do LIKE's with regexps in them, things are better...

--

Hey, I'm just your average shit and piss factory.

Re:You're Right... by La+Camiseta · 2005-05-24 10:01 · Score: 1

I would say that with certain extensions, you're completely wrong. PostgreSQL comes with an extension that you can install which implements a vector-based full text search.

Tsearch2 Homepage

Quick and dirty HOWTO
Re:You're Right... by irritating+environme · 2005-05-25 08:28 · Score: 1

"Certain extensions" implies pretty nonportable stuff, although you can say that about RLIKE too.

The SQL standards people really have been pretty lazy, if you ask me. JDBC is really the only thing pushing conformity in the database vendors, although it hasn't done much to reel in Oracle's big bad boy maverick streak.

--

Hey, I'm just your average shit and piss factory.

Current relational alternatives by nektra · 2005-05-24 09:31 · Score: 1

I think relational databases are overestimated in detriment of other high quality existent and open source databases. One of the most important and fastest databases are (and from the same author!):

- Gigabase
- FastDB
- Perst
- DyBase
- GOODS

All of them at: Konstantin Knizhnik home page.

Re:I did not RTFA by AKAImBatman · 2005-05-24 09:35 · Score: 2, Interesting

Also, until db2 v8 came out two years ago, it was pretty far behind oracle 8 & 9. Now, it's in a great competitive position.

Last time I did a comparison, it was DB2 7.1 vs. Oracle 8. Most of what bit me in the ass with DB2 was its flakey management tools and multitude of minor details that needed tweaking from the days when it was a mainframe database. (e.g. Why does it need a buffer large enough to hold the entire blob chunk that's going to be transferred? That's just stupid. It should pull across as much as the buffer can hold, fill the requesting array, then go back for more and repeat.)

As I said, it would only start a pointless discussion on who likes what database.

--
Javascript + Nintendo DSi = DSiCade

Interesting Alternative by Anonymous Coward · 2005-05-24 09:44 · Score: 0

For those who are around Pittsburgh or CMU,
an interesting company sprang up building
P2P based database technology (Maya Design).

For a good read, check out the following paper:
http://www.maya.com/web/what/papers/maya_universal _database.pdf

Need more dimensions? by cr0sh · 2005-05-24 09:49 · Score: 1

Have you looked into the PICK data model, aka MultiValue? May or may not be exactly what you need, but if you are needing more than rows and columns - this data model should be looked into.

It has been around since the late 1960's, in fact, one could argue that PICK (actually, PICK BASIC) was one of the first successful commercial instance of a "virtualized processor" system - that is, the PICK core was a VM that ran PICK assembler p-code (of a sort), and the VM was implemented in software running either as the OS or as part of the OS (ie, in *nix implementations) - and PICK BASIC applications were compiled to the p-code - and in theory (which actually worked quite well, IIRC), the compiled objects could be run on any PICK implementation (barring vendor-specific implementation details - always inevitable in this kind of situation, re: Sun Java vs MS J++). Another point of fact is that some companies (I think Fujitsu was one) created hardware implementations of the PICK VM - in other words, "PICK processors" - which obviously ran the code much faster than the software version.

I know that PICK is still available from various vendors (D3 is one - at least, it was not too long ago) - I also think an open-source version is in the works. It was long used for "green-screen", head-down vertical market type applications, but today there are other interfaces to it beyond a serial terminal (GUI, Web, etc). There are also companies who have created completely different DBMS systems based on the PICK data model, but not using PICK BASIC or all of the other old methods...

Finally, it is possible (though very kludgy, and I wouldn't reccommend it except as a way to "play" around) to simulate extra dimensions in a standard relational DB - set up a column as a TEXT or BLOB data type, then store the data in that, separated by a non-keyboard delimiter (ie, ASCII 254 or something). Parsing, insertions and deletions won't be easy (nor fast), but I would imagine one could set up stored procedures to handle such needs. It isn't pretty, it isn't reccommended, but it can be made to work as long as the data being stored isn't too complex...

--
Reason is the Path to God - Anon

More interested in DBMS able to cope with Biology by WillAffleckUW · 2005-05-24 09:49 · Score: 1

While the XML portion looked interesting, all the other "needed features" were totally useless from my perspective. I'm more concerned with the ability to handle the facts that our biochemical/biological recordings alter as our workflow progresses and we learn new techniques or alter existing techniques. And how we data mine from that changing universe.

So, mostly not very useful from my perspective, and they missed the important things that would actually matter here.

--
-- Tigger warning: This post may contain tiggers! --

Open Source PICK - MaVerick by cr0sh · 2005-05-24 09:54 · Score: 1

Here is the site for MaVerick, the open source implementation of PICK (crazily enough, running under the Java VM!!!) - which also uses a regular DBMS as the backend (Berkeley DB, MySQL, PostgreSQL) - hmm - makes me wonder if they are simply doing what I said to try, but in a more maintainable manner...?

--
Reason is the Path to God - Anon

Re:Open Source PICK - MaVerick by cr0sh · 2005-05-24 10:00 · Score: 1

BTW - this project does appear to still be actively developed - look at the CVS changelog in the status area, which has recent commit dates and changes. Also, it appears that in addition to third-party data stores, a "native" data store can also be used...

--
Reason is the Path to God - Anon

SQL will live, but Temporal will come as well by photon317 · 2005-05-24 09:57 · Score: 1

SQL will enjoy a long life. There will probably be another update to the SQL standard, SQL07 or something, just about when everyone gets caught up tot he SQL99 standard of today. It does what it does very well. While the syntax is cumbersome, there aren't many better ways to represent the relational complexity. Sure, they could stand to fix the god-awful decisions that were made with regards to quoting and nesting characters and things of that nature, but the essential syntax really needs to be about like it is.

THe big thing where I see a real major follow-on to SQL emerging is in the Temporal area. There's some existing work out there, lots of papers and designs and whatnot, for Temporal SQL. SQL just isn't that great at handling time-related data, even though a lot of people coerce it into doing so. The Temporal SQL replacements/extensions attempt to remedy these, and someday whatever comes of the Temporal research will become mainstream.

--
11*43+456^2

Re:SQL will live, but Temporal will come as well by joib · 2005-05-24 18:30 · Score: 1

There is already a SQL 2003 standard.

Re:I did not RTFA by Anonymous Coward · 2005-05-24 10:00 · Score: 0

> Most of what bit me in the ass with DB2 was its flakey management tools and multitude of minor
> details that needed tweaking from the days when it was a mainframe database.

yeah, i didn't like those earlier versions of db2. in fact, back in the 90s on a really huge project IBM offered my project db2 for free. We opted to spend $800k on informix instead! That was v5 if I remember, but I don't think that v7 was all that much better really.

Close...Fragmentation Promenade. by Anonymous Coward · 2005-05-24 10:00 · Score: 0

Databases are going to fragment. Different types for different data.

Re:KISS (I can prove SQL will be around) by duckworth · 2005-05-24 10:12 · Score: 1

How about:

SELECT SUB.CustomerID, COUNT(*) CNT FROM ( SELECT DISTINCT C.CustomerID, OD.ProductID FROM Customers C INNER JOIN Orders O ON O.CustomerID = C.CustomerID INNER JOIN [Order Details] OD ON OD.OrderID = O.OrderID ) SUB GROUP BY SUB.CustomerID HAVING COUNT(*) = (Select COUNT(*) FROM Products)

MS SQL SERVER 2000 in Northwinds DB

3rd form, no dup data==no totals on invoice. by HornWumpus · 2005-05-24 10:18 · Score: 2, Informative

You're advice: don't store total's on invoices and just requery the line items every time? (don't let me construct a straw man please correct me if I'm misinterperting)

You do realize you don't know how big my parent table is nor how infrequently the children change.

All generalizations are false.

IMHO When the child data changes very rarely or never, update triggers that recalculate parent totals are sometimes the way to go. This violates third normal form and is the most common de-nomilization I've done. Hell I've lived without the update triggers and just stored totals. When I was a kid we ran data checking batch jobs to check data validity.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:KISS (I can prove SQL will be around) by snorklewacker · 2005-05-24 10:20 · Score: 1

"show me a list of all customers who have each bought at least one of every product"

Congratulations, you've discovered that SQL does not express the full relational algebra. Here's your sticker and decoder ring showing you belong to the Captain Obvious Fan Club!

Speaking of obvious, I don't know how obvious that query is in relational algebra, and hell if I can remember how to express it in SQL. It is possible, but I do guarantee it will be an expensive query, since you'll hit every last row in your orders table to calculate that. A real-world solution would be to just sum up the number of distinct products each customer has ordered and compare that with the total number of products offered. If you haven't removed products, that number will give you an exact answer.

The real irksome thing is that most people wanting to throw out SQL are trying to replace it with something even less expressive.

--
I am no longer wasting my time with slashdot

"According to the ACM" - not! by OnanTheBarbarian · 2005-05-24 10:24 · Score: 1

The lead-in carelessly claims that the opinions in the ACM Queue article are those of the ACM. This is almost certainly not true; the ACM merely operates an on-line journal where authors can express their own opinions in this case.

Once again, Slashdot manages to bollocks up a lead-in with careless, inaccurate, flip or overly opinionated lead-ins that might have taken about 2 minutes to clean up.

Re:I did not RTFA by Anonymous Coward · 2005-05-24 10:29 · Score: 0

Heh, MySQL...the moped of databases!

Metakit by fbonnet · 2005-05-24 11:20 · Score: 1

Metakit is a radical alternative to conventional RDBMS. Portable, self-contained, on-the-fly restructuring, fast, memory-mapped...
IIRC Apple uses it for MacOS X's address book.

Re:Metakit by Anonymous Coward · 2005-05-24 11:45 · Score: 0

No one uses Metakit any more. Everyone uses Sqlite which is much better - it's also in Apple's OSX.
Re:Metakit by yerM)M · 2005-05-24 12:59 · Score: 1

Responses like this are bound to be ignored out of hand. Metakit is used widely in certain areas such as bioinformatics and chemistry. I have found that it scales far better than sqlite for certain circumstances and is very, very fast when used correctly. I have made several metakit applications as fast as berkeley db and with full relational algebra support! I liked it enough that I wrote a Dr. Dobb's article on metakit.
The fact that views can automatically hold subviews is a huge bonus. I.e tables in tables so there is no need for joins.

I have a 1.5 gig metakit database the millions of rows per table that dies under sqllite. Conversely, I have some complicated sql that works very well under sqllite that metakit doesn't handle very well.

Now, I have to mention the new metakit virtual machine thrive, while currently only implemented for integer datafields, is incredibly amazing without really any optimization, but, alas still a while out.

Now, if you are stuck with sql, by all means use sqllite, your transition will be much, much faster. But if you want to use a full relational algebra approach, try metakit. If you want to do financial transactions, try ksql (built on kay) holy crap is that fast for financial data.

To pronounce PostgreSQL, read the FAQ. by tepples · 2005-05-24 11:22 · Score: 2, Informative

I think it's more because people look at the name, try to figure out how to pronounce it, then give up

Why, given that this FAQ entry clearly states that it's "post-gress-cue-ell"?

Re:To pronounce PostgreSQL, read the FAQ. by Anonymous Coward · 2005-05-25 04:42 · Score: 0

I just call it Postgres. That was the original name, before it even supported SQL. Most managers have heard of Ingres, so "postgres" sounds similar.

Theoretical Problem with Relational Databases by tjstork · 2005-05-24 11:23 · Score: 1

All relational databases have a theoretical problem that make it possible for entrants to a market place to succeed.

a) They are based on relational algebra. This is good, but, relational algebra will only take you so far. There is an entire domain of relational calculus, and if, you cannot implement all of it, you can at least make a go of cherry picking a piece of it for your own application. I would like to think my own domain is a good stab at this, but, since I'm maxed on all my credit cards and drive two cars that I can't afford, I'll assume not!

b) They are ruined by the languages that talk to them. Yes, I hear about the virtues of object oriented programming, but relational algebra, for what it is worth, is much more theoretically complete.

I write my own shareware, specialized, non-relational database and I've come to the conclusion that it is enormously difficult to match the performance of experienced relational database designs when trying to do relational types of things. I can make my database load a certain kind of data many, many times faster than SQL Server can, but when I try to do things that are more to the strength of a relational engine, my stuff looks pretty weak.

Anyone can make a relational database engine that looks really good at a million rows. But jack that up to ten million, or a hundred million, and then see just how well your design stacks up!

--
This is my sig.

TreSQL? by tepples · 2005-05-24 11:25 · Score: 1

sure, the latest hip "Tres Kewl" software for business might be written in something else, but SQL will be around for a long, long time.

Heck, by popular demand, "Tres Kewl" will probably be extended with an SQL backend called "TreSQL".

So They Finally Discovered Prolog? by Anonymous Coward · 2005-05-24 11:26 · Score: 0

Prolog is a relational developer's wet dream. Given orders that consist of items that are shipped by shippers, then to find all orders for the item "soap" that were shipped by "Marathon, Inc." I issue the a query like:

order(Orderid, Orderedby),
orderitem(Orderid,Itemid),
item(Item id,"soap"),
itemshipper(Itemid,Shipperid),
shipp er(Shipperid,"Marathon, Inc.").

and Prolog finds them all for you. IOW you specify a query by specifying the relationships among your tables (5 tables here) and by specifying constraints (e.g., "soap" and "marathon", above).

Learning Prolog shows you how SQL should be used, but also shows you how limited SQL is.

Indexed views by tepples · 2005-05-24 11:33 · Score: 1

You're advice: don't store total's on invoices and just requery the line items every time?

In the ivory-tower relational design, you're supposed to set up a view on SELECT SUM(...) and then index that. This way, you gain the performance benefits of denormalization, but the DBMS takes responsibility of maintaining consistency on update. Application-level denormalization happens primarily because the current SQL DBMS implementations do a urine-poor job of handling indexed views.

One problem is that the price of a given line item often changes as a function of time, but the amount charged on a given invoice is intended to use a snapshot of the price offers at the time the order is agreed upon. In that case, representing each invoice as a separate negotiated offer in the database might still follow the normal forms. A true relational ivory-towerist would add the effective date of a price change as an additional field.

Re:Indexed views by HornWumpus · 2005-05-24 11:49 · Score: 1

When I set up an Invoice table and a lineitem table and constructing appropriate update triggers does'nt what I'm doing amount to helping the backend optimizer get the indexed view right? (assuming the data changes infrequently enough that this is the right tune for the problem)
What I'm saying is should'nt the view compiler recognize the total scenario (it's bloody common), then (perhaps with some compiler hints) setup the view basically as I've discribed above? I suppose the problem is recognizing when the lineitems are changing so frequently the update trigger server load exceeds the Select Sum(),,GroupBy server load.

What's the term for a view that is stored in a temp table (realized view?).

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Complexity theory by tepples · 2005-05-24 11:52 · Score: 1

The relational *model* says nothing about performance, which is an implementation detail.

Models have inherent performance limits because each model is only as fast as its best implementation. One thing studied in computational complexity theory is the complexity of some optimal implementation of a given model. If model A's optimal implementation is slower than model B's at a given sequence of operations, then I'd overload the terminology a bit to state that model A itself is slower. How would you disagree?

Thats because mySQL is more like DBase then Oracle by HornWumpus · 2005-05-24 12:06 · Score: 1

You open tables directly with the client. etc.

I know the newest version is slightly less retarded, but I bet most of the client code still uses the DBase III type methods.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:KISS (I can prove SQL will be around) by earthbound+kid · 2005-05-24 12:28 · Score: 1

I know how you feel. In high school, my friend was like, "Hey, check out this new thing called 'Napster.'" I took a look, but decided that it could never compete with palavista and the other FTP search & trade sites. Napster was kid's stuff. All the real MP3 heads will stick with their FTP sites, thank-you-very-much.

Of course for real vision, no one beats CmdrTaco's "No wireless. Less space than a Nomad. Lame."

Comment removed by account_deleted · 2005-05-24 12:35 · Score: 1

Comment removed based on user account deletion

what about cache? by asciiRider · 2005-05-24 13:12 · Score: 1

The healthcare org I work for (3 hospitals, 3500 users) runs the Cache database - it's supposed to be and insanely stable - how does it differ from the traditional oracle/db2? Anybody care to explain in terms your typical slashdotter might understand?

Re:what about cache? by sdjunky · 2005-05-24 14:17 · Score: 1

Not to mention that all VA's run Cache for their VistA system.

They've been using it for years since it's highly scalable.
Re:what about cache? by Anonymous Coward · 2005-05-24 15:25 · Score: 0

I think it's just a wrapper around Microsoft Access. HTH

theory to practice by icepick72 · 2005-05-24 13:27 · Score: 1

Although no concrete solutions are included,

That's seems to be the problem. From reading, database seem to be one of the the most theorized technologies, with a comparatively low set of conrete solutions. Sure a lot of databases and tools exist, but the industry is having a really hard time moving away from the relations model, and maybe for good reasons. I think instead of a better database technology coming along, the database will just gradually evolve and a long from now somebody will notice and write an article and lament a about the way it used to be.

Re:theory to practice by Anonymous Coward · 2005-05-24 13:30 · Score: 0

I just re-read my post and it's chock full of spelling and grammatical errors, so I'm apologizing as AC. I'm trying to get some work done for school tomorrow evening but cannot pull myself away from Slashdot. Damn! So I'm just compromosing by typing faster on /.

What about OLAP??? by Anonymous Coward · 2005-05-24 13:34 · Score: 0

Although relatively new, OLAP (online analytical processing) systems like Essbase and OFA pack a lot of whallop when doing analytical work. I'd like to hear more about OLAP uses for analyzing web click data...

Reply from Ivory Tower. by TapeCutter · 2005-05-24 13:45 · Score: 1

Pokes head out of window and takes in a breath of "real world" air... cough, choke, splutter...

Hello down there. We don't claim that normalisation is suitable for anything except optimising the amount of space required to hold your data set. For space optimisation nothing has yet been devised to improve apon it.

...Pulls head back inside, thinking there must be a better building material than dead elephants.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Re:KISS (I can prove SQL will be around) by panaceaa · 2005-05-24 14:12 · Score: 1

What do you find cumbersome today? :)

--
my blog

Which part of COMPLETE do you not understand? by Anonymous Coward · 2005-05-24 14:25 · Score: 0

Whatever your data is, you can represent it under the relational model. You need cubes to model multidimensional data? You can use a star schema, or a number of other representations embedded within the relational model. Need to keep track of the history of the changes of the values of a property of one entity? Put the value assignments in a historical join table that points back at your "main" table.

bad bad bad by Anonymous Coward · 2005-05-24 14:59 · Score: 0

when will people learn that SQL was created in a time of ram scarcity? Oracle was designed back when machines had 4 or 8 MB of RAM __tops__. New developers should experiment with vector based systems instead of row based systems. Before anyone boohaahaas this as a toy, I recommend you look at www.kx.com customer's list, all the top Ibanks and bond houses in the __world__ use it. Lehman brothers has a 50 TB bond datbase that uses this technology.

http://www.kx.com/

http://www.kx.com/a/kdb/document/contention.txt

read up on http://www.kx.com/ they have a almost fully compliant sql engine written in 200K of code. The interpreter fits in a couple lines of cache.

This is the original J interpreter (written by Arthur Whitney), it looks like line noise, but use the 'indent' command and you will see its beauty:

typedef char C;typedef long I; typedef struct a{I t,r,d[3],p[2];}*A; #define P printf #define R return #define V1(f) A f(w)A w; #define V2(f) A f(a,w)A a,w; #define DO(n,x) {I i=3D0,_n=3D(n);for(;it=3Dt,z->r=3Dr,mv(z->d,d,r); R z;} V1(iota){I n=3D*w->p;A z=3Dga(0,1,&n);DO(n,z->p[i]=3Di);R z;} V2(plus){I r=3Dw->r,*d=3Dw->d,n=3Dtr(r,d);A z=3Dga(0,r,d); DO(n,z->p[i]=3Da->p[i]+w->p[i]);R z;} V2(from){I r=3Dw->r-1,*d=3Dw->d+1,n=3Dtr(r,d); A z=3Dga(w->t,r,d);mv(z->p,w->p+(n**a->p),n);R z;} V1(box){A z=3Dga(1,0,0);*z->p=3D(I)w;R z;} V2(cat){I an=3Dtr(a->r,a->d),wn=3Dtr(w->r,w->d),n=3Dan+wn; A z=3Dga(w->t,1,&n);mv(z->p,a->p,an);mv(z->p+an,w->p ,wn);R z;} V2(find)true V2(rsh){I r=3Da->r?*a->d:1,n=3Dtr(r,a->p),wn=3Dtr(w->r,w->d) ; A z=3Dga(w->t,r,a->p);mv(z->p,w->p,wn=3Dn>wn?wn:n); if(n-=3Dwn)mv(z->p+wn,z->p,n);R z;} V1(sha){A z=3Dga(0,1,&w->r);mv(z->p,w->d,w->r);R z;} V1(id){R w;}V1(size){A z=3Dga(0,0,0);*z->p=3Dw->r?*w->d:1;R z;} pi(i){P("%d ",i);}nl(){P("\n");} pr(w)A w;{I r=3Dw->r,*d=3Dw->d,n=3Dtr(r,d);DO(r,pi(d[i]));nl() ; if(w->t)DO(n,P("p[i]))else DO(n,pi(w->p[i]));nl();}

C vt[]=3D"+{~=3D'a'&&a'9')R 0;z=3Dga(0,0,0);*z->p=3Dc-'0';R z;} verb(c){I i=3D0;for(;vt[i];)if(vt[i++]=3D=3Dc)R i;R 0;} I *wd(s)C *s;{I a,n=3Dstrlen(s),*e=3Dma(n+1);C c; DO(n,e[i]=3D(a=3Dnoun(c=3Ds[i]))?a:(a=3Dverb(c))?a :c);e[n]=3D0;R e;}

main(){C s[99];while(gets(s))pr(ex(wd(s)));}

Here is another excellent page:

http://www.kuro5hin.org/story/2002/8/30/175531/763

Indices are a waste of space. by HornWumpus · 2005-05-24 16:05 · Score: 1

But I'm not sure if I'm remembering the form correctly.

Unique identifier or index required?

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:Indices are a waste of space. by TapeCutter · 2005-05-24 18:10 · Score: 1

"Unique identifier or index required?" - Exactly, everyone thinks there is no difference.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Re:KISS (I can prove SQL will be around) by Bronster · 2005-05-24 16:21 · Score: 1

try to write a query in SQL that depends on two query results being equal (for instance, "show me a list of all customers who have each bought at least one of every product")

SELECT Customer.* FROM Customer, (
SELECT CustomerId, COUNT(DISTINCT ProductId) AS NumProducts FROM Purchase GROUP BY CustomerId
) AS DistinctPurchases, (
SELECT COUNT(DISTINCT ProductId) AS NumProducts FROM Product
) AS DistinctProducts
WHERE DistinctPurchases.NumProducts = DistinctProducts.NumProducts
AND DistinctPurchases.CustomerId = Customer.CustomerId;

Or in English:

Count how many different products have been purchased by each customer
Count how many products exist
Show me all the customers where the two numbers above are equal

Not that hard really.

Dynamic Relational by Tablizer · 2005-05-24 16:21 · Score: 1

Dynamic Relational may be a solution to some of the problems raised. Columns and tables could be "created" on the fly. Another improvement is to replace SQL with a better relational query language. Alternatives include Tutorial-D and my pet, SMEQL (originally called TQL but found a name overlap).

--
Table-ized A.I.

Hierarchical database by Anonymous Coward · 2005-05-24 16:24 · Score: 0

I have often thought about how cool it would be with an Hierarchical database which you could sort huge amounts of data very easily and then have like symlinks (symolic links) that point from one place to another.

That could sort a huge amount of data in a very simple and hierarchical way kind of like XML or a file system.

Imagine storing the information of a brain in such a database.

SQL Showdown by Tablizer · 2005-05-24 16:36 · Score: 1

SQL isn't an implementation of relational algebra. It's an implementation of relational calculus plus a bunch of extra features (sorting, analytics, and whatnot). The idea is that the user should be able to specify what they want (relational calculus) instead of how to get it (relational algebra).

The author was not disputing the concept of declarative techniques (such as constraint-based programming where you ask for what you want instead of how), but rather complaining about SQL as a language for doing such.

I have to agree with the SQL complaints. I even designed a new query language intended to replace SQL (but have no test implementation yet). I will pit it against SQL as far as simplicity and elegance any day (although measuring "elegance" can be subjective).

--
Table-ized A.I.

Re:SQL Showdown by Anonymous Coward · 2005-05-25 04:31 · Score: 0

> (but have no test implementation yet)

And never ever will. There isn't even a coherent spec for it. If there were, it would be a trivial exercise to translate it into a HLL like perl with DBI calls.
Re:SQL Showdown by Tablizer · 2005-05-25 15:39 · Score: 1

And never ever will.

Ye of little faith. Actually I began building an implemention (for experimentation only, not production). As long as I don't procrastinate, someday it will be avail.

--
Table-ized A.I.

Re:More interested in DBMS able to cope with Biolo by Anonymous Coward · 2005-05-24 16:54 · Score: 0

Data management does not equal knowledge management.

The biology example focuses on the changing interpretations of data, which, assuming that the data was well modelled, is really a problem for the KM application layer, rather than the data management layer.

Remember -- data itself doesn't change, only interpretations do.

Re:I did not RTFA by Tablizer · 2005-05-24 16:57 · Score: 1

Yeah, for those terabytes of data taken up by your mom's recipes and your cd collection, the extreme power of PHP and MySql is all you need, man.

Well, us nerds need *some* way to justify the cost of a 4-way CPU box to the folks. How else are we gonna run our 3D porn server?

--
Table-ized A.I.

Unless the PHB is the problem by HornWumpus · 2005-05-24 18:23 · Score: 1

Show me a slow performing database-driven application, and I will show you a set of indexes, stored procedures, vertical/horizontal table partitioning, or table normalization/denormalization that will fix the problem.

You discount human factors. If there are no resources and big jobs to do, you're stuck.

I've enjoyed a few 'That's impossible' moments myself, once at the expense of a over educated 'expert' that did'nt know how to read an execution plan. Same deal, 3 minutes turned into 15 seconds. I was the twentyeth or so person thru the code looking for speed. There sure are a lot of clueless people out there selling themselves as SQL experts.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Return to the Business Object by JPyObjC+Dude · 2005-05-24 18:38 · Score: 1

The full features of RDBMSs are totally overkill for about 90 percent of business applications. Standardized Business Object level data management is all that is needed.

I am suprised that more developers are not thinking this way while most f500 companies are depending on such systems for their core product and lifecycle management.

Relational Object Bridging is nice but still to programmatic of a solution. A Business layer is the way to go that is a black box with language independent API.

JsD

well by Stu+Charlton · 2005-05-24 19:21 · Score: 1

Most databases do allow you to turn off logging (durability) and allow you to ratchet back the isolation level. I can't think of any sane technical reason to eliminate atomicity and consistency.

--
-Stu

Re:KISS (I can prove SQL will be around) by Ed+Avis · 2005-05-24 23:46 · Score: 1

In recent SQL versions you can at least say

SELECT FIRST_NAME, LAST_NAME, ORDER_ID
FROM CUSTOMER
JOIN ORDER USING CUSTOMER_ID

I agree about the 'SELECT * FROM' crap though. It wouldn't be a big extension to SQL to let you just say 'CUSTOMER'.

All customers who have bought at least one of every product? Not too hard, surely:

SELECT *
FROM CUSTOMER C
WHERE NOT EXISTS (
SELECT *
FROM PRODUCT PROD
WHERE NOT EXISTS (
SELECT *
FROM PURCHASE PUR
WHERE PUR.PRODUCT_ID = PROD.PRODUCT_ID
AND PUR.CUSTOMER_ID = C.CUSTOMER_ID
)
)

OK, a bit cumbersome - 'where does not exist any product he didn't buy' - but you get used to it after a while. A true set difference operator would be useful, or perhaps you could rewrite part of the above using outer joins and testing for a null value in the result of the join.

--
-- Ed Avis ed@membled.com

Re:More interested in DBMS able to cope with Biolo by WillAffleckUW · 2005-05-25 07:09 · Score: 1

Data management does not equal knowledge management.

The biology example focuses on the changing interpretations of data, which, assuming that the data was well modelled, is really a problem for the KM application layer, rather than the data management layer.

Remember -- data itself doesn't change, only interpretations do.

It's not so much the changing interpretations of the data as a combinatorial explosion of altering procedures (even for one set as it progresses thru the labs), altering models (we learn things in biochem, realizing that what we thought was just an ATP process is more complex, as other scientists (or ourselves) discover how things work), and the variation in actual recording between different scientists and assistants.

Most DMBS users exist in a world of facts with static states, here we measure phase transitions in milliseconds and conformational changes of protein domains in varying temperatures and conditions. We need the data, not theories about how we should measure it the way you measure the slow non-biochem world.

But thanks for thinking about it. We'll keep coding in Perl and using Terabytes of storage while designing the next generation of computers that run on these processes ... ok, that's not my department, but some others here do that.

--
-- Tigger warning: This post may contain tiggers! --

Not quite true by leonbrooks · 2005-05-25 12:35 · Score: 1

This article speaks of MySQL serving ~1TB in real life, and links to a benchmark which has MySQL scaling as well as Oracle (just one benchmark, but I think it makes the point).

MySQL has a number of features, restrictions and peculiarities which I find irritating, but in terms of raw performance, especially on reads, it doesn't seem to stop when your database gets seriously large. I find PostgreSQL much more pleasant to use, and this article speaks of Fujitsu helping to add Table Spaces to make management of data "into the hundreds of gigabyte" easier, with the implication being that people already have PostgreSQL databases that large, and the feature is basically a bonus. This article also mentions a PostgreSQL database of over a terabyte.

I think you'll find that the limitation is not the software, the limitation is that precious few MySQL DBAs are familiar with databases larger than you can squeeze into a desktop machine (the machine in front of me will take 4x250GB IDE disks for a total of 1TB of storage, for example, and if you had matching SATA drives as many controllers do, put in a new PSU and double that).

The developer.com article mentions that Oracle was harder to tune for larger databases than MySQL, so perhaps this is changing, perhaps we will see more people asking if it's worth spending the extra money for a database that's harder to operate, and no faster. Perhaps it would be cost-effective to spend the money on more servers instead (you can get a pair of jaw-droppingly impressive servers for the price of a single high-end Oracle licence), and rely on redundancy rather than expertise. PostgreSQL supports replication, and there are bolt-ons to do the same for MySQL, kinda-sorta, so it's not an unreasonable proposition and can only get more attractive as these features are improved.

--
Got time? Spend some of it coding or testing

You've made an authoritative claim by leonbrooks · 2005-05-25 12:44 · Score: 1

Now back it up with Real Life(tm) references.

I don't like MySQL (I prefer PostgreSQL or ibFireBird) but at 100 tickets a second it does seem to cut the ice for large applications.

Arjen also routinely mentions "terabyte" databases, although he tends to speak more in terms of "billions of records". If in doubt, email him. You'll get an authoritative answer.

--
Got time? Spend some of it coding or testing

db4o object database is new and more KISS by Chris_Stanford · 2005-05-25 13:18 · Score: 1

Some "new" databases exactly meet your requirements. db4o, the open source object database, native Java and .NET is:

- very lean and not complex at all (one line of code stores any object)
- scales extremely well
- is cross platform Java and .NET
- very performant (up to 44x faster than Hibernate+MySQL, for instance)

Chris
http://www.db4o.com/

"Data, Get Smart!" by Klaus+Wuestefeld · 2005-05-25 13:45 · Score: 1

The article is basically saying data has to get smart.

That's precisely what object-orientation is all about: bringing data and behavior together.

OODBMSs such as http://www.db4o.com/ will be key players in this future of his.

See you, Klaus.
Db4o - The Open Source Object Database
Prevayler - Persistence is Futile

Re:More interested in DBMS able to cope with Biolo by GoofyBoy · 2005-05-25 15:11 · Score: 1

>we measure phase transitions in milliseconds and conformational changes of protein domains in varying temperatures and conditions.

Look up star schema data models, that might give you some ideas.

Think of events with values that occur over time. So its either sales of blue jeans of size 14 in New York state during the period of Dec 1 to 15th, or the changes of protiens at a certain temp/conditions during the 121 second and 300 second marks.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.

Re:More interested in DBMS able to cope with Biolo by WillAffleckUW · 2005-05-26 03:47 · Score: 1

Think of events with values that occur over time. So its either sales of blue jeans of size 14 in New York state during the period of Dec 1 to 15th, or the changes of protiens at a certain temp/conditions during the 121 second and 300 second marks

I think you mean 121 millisecond to 300 millisecond.

Conformational structural changes in proteins are measured in very short time segments, nowadays using lasers to heat them up fast.

We also freeze them at -80 and -40 C temps, or use 4 C, 20 C, or 40 C temps (hint, 20 C is room temp).

--
-- Tigger warning: This post may contain tiggers! --

360 comments