Brian Aker On the Future of Databases

Well by CrispBH · 2008-06-03 11:03 · Score: 5, Funny

I couldn't... agree... more... I'd say that some... very valid... points... have been... raised.

Re:Well by felipekk · 2008-06-03 11:06 · Score: 2, Funny

Very ... funny
Re:Well by emurphy42 · 2008-06-03 11:25 · Score: 2, Informative

Service-oriented architecture
Re:Well by njcoder · 2008-06-03 12:04 · Score: 2, Funny

SOA = Shit-Outta-Acronyms

--
Open Source Java DAO Generator
Re:Well by dgatwood · 2008-06-03 12:20 · Score: 0, Offtopic

D**n it, Bill, I'm an actor, not a doctor. Wait, what?

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Well by Bogtha · 2008-06-03 12:21 · Score: 2, Informative

Come on, he's talking about the future of databases. He was just trying to set the mood by doing his best Kirk impression.

--
Bogtha Bogtha Bogtha
Re:Well by dintech · 2008-06-03 20:56 · Score: 0, Offtopic

Oh, come on. Even Ned Flanders says damn. :)
Re:Well by Anonymous Coward · 2008-06-03 22:41 · Score: 0

Mr. Aker's only rehashing what I discovered back in 1996 for EEC Systems (now SuperSpeed.com), albeit nearly 13 yrs. later... or, perhaps, he's just saying "it's time now for this stuff".

(As regards using ramdisks for database processing to remove random access latencies & lowering seek time)

Plus, what I later expounded on for CENATEK (as regards their TRUE SSD, the RocketDrive, which can be spanned/striped into a 16gb unit via 4 of them)... using hardware based units, not software based ones (as was done for EEC Systems & their SuperDisk product, a mirroring back to backing HDD software based ramdisk).

This idea, for database work, took EEC Systems (SuperSpeed.com, whom I had also reprogrammed their other product, a disk level cache called SuperCache II, for up to 40% better efficiency + performance as well on a later paid contract) to a finalist position @ Microsoft's Tech Ed 2 yrs. in a row in the hardest category there: SQLServer Performance Enhancement...

It works, no questions asked.

I love seeing this... finally, the technology is becoming more "mainstream" in SSD's, although, I would never put "flash based SSD's" into the same category as units like Gigabyte's SATA 150 IRAM or CENATEK's PCI 2.2 133 RocketDrive (or better ones like them, TRUE SSD's, & NOT flash based stuff with its limited write cycles).

APK

P.S.=> It just makes sense - you can do all the optimization @ the software level there is, & I am NOT knocking this by any means (programmer/analyst - software engineer here for 15++ yrs. running as a pro - & big fan of better algorithms/engines, by ALL means), but "there is no software substitute for GOOD hardware" & that old rule usually stands the "test of time", everytime... combine them both? You get the best of BOTH worlds, & then some! apk
Re:Well by deroby · 2008-06-03 22:56 · Score: 2, Funny

Wouldn't he rather say something along the lines of : damnedideloodi neighbour !

--
If there is one thing to be learned on slashdot, it has to be sarcasm.
Re:Well by liloldme · 2008-06-04 01:13 · Score: 3, Funny

Same Old Architecture

Dear Slashot by CopaceticOpus · 2008-06-03 11:07 · Score: 5, Funny

Can we please have another loud, circular debate over which database is best? It's the only way your favorite database will ever win.

Thank you.

Re:Dear Slashot by Anonymous Coward · 2008-06-03 11:42 · Score: 1, Funny

MyPostGroracleSQL Server 2008
</thread>
Re:Dear Slashot by CastrTroy · 2008-06-03 12:46 · Score: 1

Since you brought up the topic let me bite. I was just doing some testing today to see how certain queries we were running would run on MySQL, as compared to SQL server. Just to see if the query was so slow because of the size of the dataset, or because SQL Server is made by MS. Anyway I was in the process of transferring data, and I ran into this weird 4GB MyISAM table size limit. You can run a query and alter the table so that it can store more data, but what kind of DB in the year 2008 doesn't by default, out of the box, support storing more than 4GB of data in a single table. it's really not that much data anymore. Yes I realize that InnoDB doesn't have these crazy limitations, but MyISAM is much faster, and I don't require any of the transactional stuff for this table.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Dear Slashot by chromatic · 2008-06-03 16:19 · Score: 1

Isn't that a filesystem limit?

--
how to invest, a novice's guide
Re:Dear Slashot by dave87656 · 2008-06-03 16:48 · Score: 5, Interesting

Okay, I'll bite too ...

We've been running MySQL using MyISAM since 2002. It's delivered acceptable performance until recently as we've expanded our application and the data volumes have increased. Now, we have to reorganize it on a frequent basis (we just backup and restore).

But, we really need to move to a transactional model so I've done some benchmarking between InnoDB and Postgresql. In almost all cases, Postgresql was significantly faster. Our application is very transactional with alot of writes.

And from what I've read, Postgresql scales well to multiprocessors and multiple cores where as MySQL does not. I know Falcon is coming but it was still very Alpha at the time I compared - I couldn't get it to run long enough to perform the tests.

Has anyone else compared Postgres to MySQL/Innodb?
Re:Dear Slashot by Bacon+Bits · 2008-06-03 17:23 · Score: 1

No. MyISAM uses a 32-bit address for row pointers for tables. A 32-bit address limits you to a 4GB addressable size.

Besides, the last file system to have a 4GB file size limit was FAT32. NTFS's and HFS+'s limits are in exabytes and ext2's is in terabytes.

--
The road to tyranny has always been paved with claims of necessity.
Re:Dear Slashot by Bacon+Bits · 2008-06-03 17:41 · Score: 2, Informative

Simply put, MyISAM isn't meant for data sets that large. It's meant to be fast with less regard for data integrity than the alternatives. That's by design. When you increase the max size of of the table, you change the bit length of the addresses used for indexing the table and such. Increasing the bit length slows the system, particularly when the bit length exceeds the bit size of the processor. I'd argue more that the default engine should be InnoDB rather than MyISAM, and that internal tables should also be run as InnoDB now.

Additionally, I'd argue that comparing a MyISAM table to SQL Server (or any other transactional, ACID-compliant RDBMS) is not a fair comparison. If all you care about is speed, then you can get even more if you go with an embedded database like Firebird or SQLite. Or try a flat file. Those are terrifically fast if you do them right. Why do you think file systems are so much more efficient than RDBMS's?

Honestly, there are better ways to optimize most databases which don't involve sacrificing data integrity to do so. Examine your indices and views. Maybe your DB isn't normalized properly. IMO, sacrificing OLTP integrity to satisfy OLAP speed is like taking supports from the first floor to finish the roof.

--
The road to tyranny has always been paved with claims of necessity.
Re:Dear Slashot by Ed+Avis · 2008-06-03 20:03 · Score: 1

Yes I realize that InnoDB doesn't have these crazy limitations, but MyISAM is much faster,
That in a nutshell is the problem with any discussion of MySQL - or indeed with MySQL itself. Any limitation like 'doesn't support tables >4Gibyte' can be answered with 'well use a different table type'. Meanwhile, any benchmark or scalability test can be won by MySQL by using MyISAM tables. It's not really meaningful to talk about 'MySQL' at all; you should say 'MySQL/MyISAM', 'MySQL/InnoDB' and so on. They are different database systems with different characteristics.

--
-- Ed Avis ed@membled.com
Re:Dear Slashot by starsky51 · 2008-06-03 20:59 · Score: 1

MyISAM uses a 32-bit address for row pointers for tables.
Surely that would limit it to ~4.3 billion rows, not a 4GB file size.

--
There are 2 types of people in this world. Those who understand ternary and those who don't.
Re:Dear Slashot by Two9A · 2008-06-03 21:28 · Score: 1

The row pointer refers to the point in the table file where the row begins. It's a byte-level address into the file; because it's 32-bit, it can only address a row up to (2^32)-1 bytes into the table file. Hence, 4GB limit on table size.

--
xkcdsw: the unofficial archive of Making xkcd Slightly Worse
Re:Dear Slashot by SteveAyre · 2008-06-03 22:06 · Score: 1

Not any longer - I'm using 5.0 and have an 8GB MyISAM table currently... I believe they've changed it so the 32bit number is no longer an offset but is instead a row, so the limit is now 4.3billion rows which is much larger. There's still a limit, but it's several orders of magnitude larger.

However, you should look at moving to using InnoDB once your data gets that large as handles locking far better and AFAIK has no limit on the size of tables. You'll also probably find it faster for many queries since its locking is far more intelligent. It's far better at handling the database crashing than MyISAM too (ever tried repairing an 8GB MyISAM table - it can take days to rebuild the index).
Re:Dear Slashot by CastrTroy · 2008-06-04 00:51 · Score: 1

So then why does MyISAM even exist? For large tables, you have to use InnoDB, because MyISAM doesn't handle lots of rows easily, and for samll numbers of rows, it doesn't make much of a difference, because both will be fast if you are only dealing with a small amount of data.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Dear Slashot by ecasociados · 2008-06-04 03:21 · Score: 0, Offtopic

Put The Message Where It Matters! WideCircles aka Wide Circles represents relevant, distributed, highly targeted and efficient internet word of mouth marketing using entertaining or informative messages that are designed to be passed along in an exponential fashion using social network mediums such as blogs, forums, wikis and so on. http://widecircles.com/
Re:Dear Slashot by ChrisA90278 · 2008-06-04 04:42 · Score: 1

That's exactly what my tests showed too. The MySQL does not scale well. If you have many concurrent clients it slows. The reason is just that it locks entire tables. PostgreSQL and others have a finer ganularity locks that ca lock rows. With the fine lock more people can get in at ones ut with my SQL they all just stand in line waiting for their turn to hold the lock.

We were hitting MySQLs limit at about 10 concurrent users but it would vary based on how active each of them are and the desiggn of the database. In out case One user was doing on order of a thousand writes per second while the others were doing periodic reads, ever minutes or so they read a few thousand rows out. Wit 10 users the thing just died.

PostgreSQL and Oracle were about the same except that PostgreSQL had some oddball cases hwere some little thing you did "wrong" could just kill perfomrance, and a siple re-write the query would fix it. Oracle was more even and scales well.
Re:Dear Slashot by crucini · 2008-06-04 05:10 · Score: 1

I was with you until this part:
If all you care about is speed, then you can get even more if you go with an embedded database like Firebird or SQLite. Or try a flat file. Those are terrifically fast if you do them right.

In my experience, SQLite is slower than MySQL, despite being in-process. SQLite is transactional, remember? And flat files will only support linear scan lookups, unless you implement your own indexing scheme. Which would make them "non-flat". Unless the table is very small, MySQL will be faster on SELECTs (by PK) than the flat file.
Re:Dear Slashot by encoderer · 2008-06-04 05:47 · Score: 0

There's a few reasons:

1. Inno provides row-level locking which is what the GP mentioned when he suggested Inno. The presumption being that any table with that many rows is going to have a lot of transactional activity.

That's not always the case. In just my small slice of the world I've been involved in 2 projects that fit that bill:

One was robotic cells in a manufacturing plant where each cell was attached to a PC that, among other things, ran a MySQL/ISAM DB that recorded a near-continuous stream of data from the robot. Anywhere between 500,000 and 1,000,000 writes per day that got uploaded to an Oracle db every night along with every other robots data stream. In this case locking was not an issue because every insert came from a single client. But speed was. Because we were writing between 5 and 10 rows per second.

The second example was a read-only reporting database.

2. There is no fine line between "large" and "small." 4bn rows is unquestionably large. Likewise, anything below, say, 500k rows is unquestionably small. But that leaves an awfully large gray area.
Re:Dear Slashot by encoderer · 2008-06-04 05:58 · Score: 0

FWIW, Inno (and other storage engines) do offer row-level locking. Only MyISAM tables are limited to table-level locking.

I'm curious.. if you're seeing 1000 t/s from a single user, what were the reasons you chose MySQL/MyISAM to begin with? (I'm genuinely interested. I'm a developer not an admin, but I do have a large set of experience with databases and datasets. My experience lies more on the capacity side than throughput. I've built systems on top of databases with billions of rows (the DNC voter history database, among others) but these systems were invariably fed millions of rows at a time by replication/dts and had very few cases of sustained transaction activity. Any given day would see millions of reads but just thousands of writes.

Very easy in nearly any modern DBMS to optimize such a system for reads. Biggest downside is the minutes needed to rebuild indices after batch-imports.
Re:Dear Slashot by dave87656 · 2008-06-04 06:09 · Score: 1

Actually, your application (lots of reads and a few writes or writes infrequently in batch mode) shouldn't cause MySQL as much trouble as a transactional environment with lots of reads and writes with multiple tables involved.

That said, your application would still probably be better under Postgresql (PG) as it would scale better to multiprocessor and multicore hardware. InnoDB for MySQL is extremely slow when adding columns to large tables and with some other admin tasks. But, Oracle is coming out with a new version for MySQL which should improve that. MySQL is coming out with Falcon next year which should improve things.

Keep in mind that Sun is targeting MySQL to Web Space applications and not high volume production applications.
Re:Dear Slashot by krow · 2008-06-04 13:37 · Score: 1

MyISAM is an analytical engine. Load data into it and then run queries. It has some limited concurrent functionality but really its a "load and query" storage engine.

In 90% of the cases I have seen, Innodb is the right pick. Most people learn this the first time they have to deal with a corrupt MyISAM table.

--
You can't grep a dead tree.
Re:Dear Slashot by TheLink · 2008-06-05 03:47 · Score: 1

Seems there are (were?) some performance issues with innodb when it comes to concurrent inserts into tables with a "autoincrement" primary key.
--
- Too many replies beneath your current threshold

Leaky abstractions by yoris · 2008-06-03 11:10 · Score: 5, Interesting

Gotta love that link between the hardware limitations and the software concepts that may seem fancy but are essentially only built to get around them. I believe someone once called it "the law of leaky abstractions" - would be interesting to see what the new limitations would be if you start combining solid-state storage with pervasive multiprocessing, i.e. what can you do with a multi-processor multi-sdd server that you can not do with a single-processor single-hard drive server?

I think TFA is pretty right on the money that parallellization and massive use of SSD could cause some pretty fundamental changes in how we approach database optimization - if I were to imagine that rack that I'm staring at being filled with SSD drives and processors instead of with nothing but hard drives... locality of data takes on a whole new meaning if you don't require data to be on the same sector of the HD, but rather want certain sets of data to be stored on storage chips located around the same processor chips to avoid having to overload your busses.

Then again, I haven't been in this game for so long, so maybe I'm overestimating the impact. Oldtimer opinion would be very welcome.

Re:Leaky abstractions by Crispy+Critters · 2008-06-03 11:57 · Score: 1

I don't think that is the way that "leaky abstractions" is generally used. Leaky abstractions occur when you need to understand the underlying technology to predict how something will act.
One example I know is in Python.
>>> a=[[5]]
>>> b=a
>>> a[0][0]=3
>>> b
[[3]]
Should changing one variable also change another? It makes sense if you think of a Python list as linked list of pointers. (It acts that way -- I know nothing about the internals of Python.) The behavior is very mysterious at face value.
Re:Leaky abstractions by tepples · 2008-06-03 13:03 · Score: 1

Should changing one variable also change another? If both variables point to the same object, yes. This is the case for Java's ArrayList and Python's list, which share similar semantics, and for Java's HashMap and Python's dict.
Re:Leaky abstractions by ZerothOfTheLaw · 2008-06-03 13:35 · Score: 1

Well, thats because python, to save memory and processing time with lists, passes by reference, not by copy. The way to do what you want is this: >>> a=[[5]] >>> b=[a[0][:]] >>> a[0][0]=3 >>> b [[5]] However, if you are going to be passing around lists, and want copies to modify, then do this: b=a[:] to do a one-level copy of the list's contents. Any lists in the list, as above, will also be passed by reference, not copy. If you do have nested lists, then take a look at the copy module.
Re:Leaky abstractions by mrbluze · 2008-06-03 13:57 · Score: 1

Oldtimer opinion would be very welcome. Get off my lawn already!

--
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Re:Leaky abstractions by Anonymous Coward · 2008-06-03 13:58 · Score: 0

if I were to imagine that rack that I'm staring at being filled with SSD drives and processors instead of with nothing but hard drives

Welcome to Slashdot folks!
Re:Leaky abstractions by K.+S.+Kyosuke · 2008-06-04 00:44 · Score: 1

It does not change the variable. In Python, a variable is a lexically scoped named reference to a data object. That means that (lexically - by position in the source code) the same variable can point to several different objects at the same time, if you create multiple closure environments (e. g., by returning an inner function from within the enclosing function, where the inner function refers to a variable in an enclosing scope), or simply if you have several frames of the same function on the call stack, each of them has "its own variable X". And conversely, you can refer to a single data object from multiple variables (which makes a difference undesr certain circumstances, see below). But that does not matter that much, the most important thing is that each of these is just a reference to something, and the qualities of this 'something' are of importance here.

The most important thing is this: The values (real data objects) in Python can be mutable. When you introduce mutability of data objects (and even the mutability of variables - those "mere references" - in case of languages with first-class closures) into a language, your equivalence predicate essentially has to split into two. Just a single '==' (or '=' in not-quite-like-C languages) is no longer sufficient in case of mutable data. In Python, you get an 'is' operator for the "most strict" identity of objects. If 'a is b' in Python, it is required that 'a == b' also be true, but the implication does not work the other way round. If you have a=[1]; b=[1]; c=b in Python, then (a==b, a==c, b==c) returns (True, True, True) and (a is b, a is c, b is c) returns (False, False, True).

One important consequence of 'a is b' being false is that mutating a does not influence b. The other consequence of introducing mutability is that it introduces the time factor into the system, essentially breaking things like "the order of evaluation does not matter" that work in "pure math". (You can see that in certain C expessions like c[i] += a[i] * b[i++] - the result is in fact undefined - please correct me if I am wrong. Here the variables themselves are the data objects, as they are values, not references, but that is just a detail.) But is that an "abstraction leak"? I am not sure. Perhaps you should study the Chapter 3 of the marvellous SICP book (I can not recommend it enough), named Modularity, Objects, and State, and decide on your own whether it is an "abstraction leak". Maybe from the point of view of a mathematician, but there are certain benefits that it brings. (See the linked text. And no, I simply cannot see Your Average Java Programmer grokking monads any time soon, or at least not all of them.)

The very fact that the data object referred from b changes when you change a actually proves here that b is "the same as" a in the most strict meaning of "sameness" that Python has to offer. But "sameness" is a relative concept and it very much varies from language to language, and very often you find yourself switching between different concepts of "sameness" even when working within the bounds of a single language, dependening on the issue at hand. And you have to rely on the actual specification - if your language has one. :-) Only in languages like Haskell are you operating with all values immutable.

--
Ezekiel 23:20

Get ready, aim, fire! by Anonymous Coward · 2008-06-03 11:13 · Score: 1, Funny

Eagerly awaiting the gratuitous Sun trash talking! Every thread seems to devolve into Sun bashing (especially with regard to the MySQL acquisition). Everyone: Sun did not decide to close source MySQL. If anything they are pushing against it.

This IS news! by Bluesman · 2008-06-03 11:16 · Score: 5, Funny

MySQL has people who are responsible for *designing* it? I'm shocked, Shocked.

--
If moderation could change anything, it would be illegal.

Re:This IS news! by BigGerman · 2008-06-03 12:07 · Score: 1

bravo. I wish I had modpoints.
Re:This IS news! by Anonymous Coward · 2008-06-03 13:49 · Score: 0

Well, they don't seem to be well versed in database design:

"random I/O being [in]expensive" "will... cause a rethinking in.. the standard Jim Gray relational database design"

Wtf the relational algebra have to do with random I/O ? That is the absolute opposite: thanks to Jim Gray (a CJ Date, etc) work, relational database design have no dependency on the physical implementation, so SSD is a no problem for relational databases...
Re:This IS news! by gfody · 2008-06-03 15:36 · Score: 2, Informative

I'm sure he meant the 'implementation of'
Relational algebra has nothing to do with random IO however building a relational database system has everything to do with random IO because it is by and large the worst bottleneck in the system. The best performing RDBMSs are the ones completely designed around avoiding random IO. That's why TFA says a new RDBMS could be created from scratch and blow the existing players out of the water in the new SSD world.

--

bite my glorious golden ass.
Re:This IS news! by Grapedrink · 2008-06-04 01:33 · Score: 1

Amen. All students should study the MySQL source to learn what not to do when designing a database. It's only fitting it is the database of choice for PHP. Together, they form voltron, or is it Mr. Hankey?

I must give them credit, they have improved the database by many orders of magnitudes between versions. My mom said learn from your mistakes. I guess they took it a little too seriously at the start.

Re:Too small by Tablizer · 2008-06-03 11:18 · Score: 2, Funny

Solid state disks will never match with hard disks.

In 2025, this will be grouped with "640K RAM will always be more than enough." and you'll feel like such a Bill G.

--
Table-ized A.I.

Re:Too small by bobwrit · 2008-06-03 11:20 · Score: 0

You fail to mention the human factor. We maybe so accustomed to hard disks that we will get sick of them and then switch.

--
-- (this is a sig) My Computer Programming Forumhttp://www.programers.co.nr/

Re:Too small by Tablizer · 2008-06-03 11:26 · Score: 1

You fail to mention the human factor. We maybe so accustomed to hard disks that we will get sick of them and then switch.

Reliability should also be a driving force. The mechanics of hard-drives make them one of the most fragile and problematic parts in a computer. (Although RAID may dampen that somewhat, but double the cost.)

--
Table-ized A.I.

Admittedly.... by Enderandrew · 2008-06-03 11:27 · Score: 1, Interesting

I haven't read the article yet, but that summary terrifies me. I keep hearing how in the modern age we shouldn't think about optimal programming because people have more resources than they need.

Databases need to scale to disgusting large numbers. Memory and disk resources should always be treated as expensive, precious commodities, because they might be plentiful on a simple database on robust hardware, but there are plenty of people out there with massive friggin' databases.

In corporate America, Oracle and MSSQL sadly are king. MySQL has some interesting advantages, one of them is performance over MSSQL, but if they squander that, what will they be left with? And ffrankly, I don't think Sun paid a fortune for MySQL just to piss away opportunities at gaining ground in corporate America.

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.

Re:Admittedly.... by od05 · 2008-06-03 11:52 · Score: 1

I agree that there hasn't been enough progress in this field, just slow, incremental improvements.

I'm constantly having to compromise with the physical limitations of my database, such as having to pick and choose which attributes to delete, as the tables in my Oracle database reached it's 1000 column limit a long time ago. While these limits have forced me clean up the data and consolidate, it's a major annoyance that I shouldn't have to deal with.

MySql's 4096 column limit is enough to make me want to switch.
Re:Admittedly.... by m0rph3us0 · 2008-06-03 12:03 · Score: 1

wow.

I imagine a schema like OrderItemId1, OrderItemId2 .. OrderItem500. I have a feeling if you designed your schema, it would be much faster.
Re:Admittedly.... by njcoder · 2008-06-03 12:12 · Score: 1

I'm constantly having to compromise with the physical limitations of my database, such as having to pick and choose which attributes to delete, as the tables in my Oracle database reached it's 1000 column limit a long time ago. You do know you can create more than one table right? You don't have to normalize every schema so much :)

Seriously though, I'm curious what type of application would need and actually benefit from such a design.

--
Open Source Java DAO Generator
Re:Admittedly.... by Samah · 2008-06-03 12:33 · Score: 2

Just as a reference Re: big applications and databases, the company I work for maintains a transport management system for the largest rail freight provider in Australia. The database for this application (Oracle 10g) currently stands at 1100+ tables (some containing a few million rows) 2400+ packages, and 450+ triggers. The client application is comprised of 2100+ java source files and 500+ interactive Swing forms.
Your statement of "Databases need to scale to disgusting large numbers." is spot on. ;)

--
Homonyms are fun!
You're driving your car, but they're riding their bikes there.
Re:Admittedly.... by Enderandrew · 2008-06-03 12:46 · Score: 2

I used to work for a casino company (the largest in the world at the time, and largely is).

Our slots database didn't just track totals for rewards credits and such, we kept every transaction ever. We can see how long you played at each machine, for how long, when you put money it, when you hit, when you took money out, etc. We know when you want to come in and play, which machines, etc. Consider that for every customer in a casino. Now consider than on a scale of 60 major casinos.

I'm not a DBA, so I was only responsible for resource managing on the servers.

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Re:Admittedly.... by CastrTroy · 2008-06-03 12:54 · Score: 2, Informative

I'm going along with the other two guys. I can't see what application would need more than 1000 columns in a single table. What really gets my is the MS SQL Server 2000 maximum of 8 KB ( SQL Server 7 was 2 KB) in a single row. Now there's a limitation that's badly designed. Oh, and you can define a table with 15 Varchar(8000) fields, just don't try filling every field. 1000 columns I could do just fine with (SQL Server supports 2048?) but the big killer is that you can't even use 2000 columns, because if you did, you would run out of space in the row, unless the average field size was under 4 bytes.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Admittedly.... by CrazedWalrus · 2008-06-03 13:09 · Score: 1

?!

1000? That's "One Thousand" ?? As in ten times more than 100?

You might want to take some database design courses if you consider a 1000 column limit to be unreasonable or unworkable. The widest table I'd ever seen was one where the original programmer did a poor job of normalization (that is, he didn't do it), and even that only hit about 150 to 200 columns (times about 60 million rows. Ouch.) Not only is it difficult to maintain wide tables like that, the database will probably perform poorly as well when dealing with that table -- especially if it needs to table scan.

With proper schema design, you should never have to pick and choose which attributes to keep due to table width limitations. Also, I get the feeling that, when you say "consolidate", you mean moving several columns into a single delimited field, which is the makings of another unmaintainable mess.

If this is a database that a real company depends on, you really might want to get someone in who can refactor your data model. It'll be painful at first, but it'll save you some real trouble down the road.
Re:Admittedly.... by TheFlamingoKing · 2008-06-03 13:21 · Score: 2, Interesting

I wonder if I've been reading Slashdot too long - I can't tell whether this is a troll, a joke, a newbie, or an actual legitimate issue...
Re:Admittedly.... by od05 · 2008-06-03 13:39 · Score: 1

My database stores the behaviors of visitors to develop predicative modeling applications for a mid-sized online retailer. The data collection process is automated, and responses can fill up the tables very fast. I've found that takes a lot of data to accurately run a regression, as the behaviors of Internet users is pretty much random. In my system, each participant is a row, and each response is a column. Many people tend to revisit the site on a regular basis, and it is not uncommon at all for the click-throughs of certain users to exceed the 1000 column limit over a period of time.

I don't mind having to break the data up into chunks when it comes time to import the data into SPSS for analysis, however, I do get annoyed by having to store the data in chunks.

I think there is a high demand for scalability on disgusting proportions, I for one, would benefit from this.
Re:Admittedly.... by Anonymous Coward · 2008-06-03 14:00 · Score: 0

you do realize thats a horrible schema for storing that data. instead, you should have a People table with one entry per person, and a Response table, with one entry per response that has a FK back to the People table.

select * from Response as R where R.people = somebody;

would be how you'd collect all the responses from a single person.
Re:Admittedly.... by allanw · 2008-06-03 14:07 · Score: 2, Informative

If you find you have to create thousands of columns response_0001, response_0002, ... response_4096, then you should probably realize that there's something wrong with your schema. It's just basic database normalization. (Though I suppose you might have a reason for doing it this way. But it sounds incredibly horrible.)
Re:Admittedly.... by allanw · 2008-06-03 14:09 · Score: 1

The guy is absolutely right. Anyone designing a database schema should know this.
Re:Admittedly.... by njcoder · 2008-06-03 14:21 · Score: 2, Insightful

Yeah, like the other responders have mentioned. Sounds like you're doing it horribly wrong.

I don't know much about SPSS, but if you need to present the data in such a flat format, you can always store the data the proper way and create a view when you need to import it into SPSS.

I'm not sure if there is some sort of limit on the number of columns in Oracle views or not.

--
Open Source Java DAO Generator
Re:Admittedly.... by CastrTroy · 2008-06-03 14:24 · Score: 1

All software should be tested and optimized for performance. I ran my web app through the profiler today, and found out it was doing tons extra work, which basically accomplished nothing, and which ended up slowing things down quite a bit. It's amazing how much things can be sped up just by running a profiler. All your unit tests may pass, and everthing may appear to be working correctly, but running through a profiler can fix some major bottlenecks quite quickly.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Admittedly.... by hemp · 2008-06-03 14:37 · Score: 1

Sounds like your web app was badly designed ( or maybe not at all ). You should have not spent any time coding superfluous functions and procedures in the first place.

--
Skip ------ See the latest from http://www.anArchyFortWorth.com
Re:Admittedly.... by CastrTroy · 2008-06-03 14:48 · Score: 1

Yeah, well back here in the real world, people have to deal with applications that are 15 years old, and have tons of legacy code, a good portion of which wasn't "designed". It's nice to talk about how things should be in and ideal project, but in real life, things are usually not ideal.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Admittedly.... by Samah · 2008-06-03 15:49 · Score: 2, Funny

UPDATE SlotMachines SET ChanceToWinPercent = 100;
COMMIT;
try {
machine.collectMoney();
player.run();
} catch(SecurityException e) {
player.resist();
if(player.isFast())
casino.exit(0);
else
player.yield();
}

--
Homonyms are fun!
You're driving your car, but they're riding their bikes there.
Re:Admittedly.... by Alpha830RulZ · 2008-06-03 16:49 · Score: 1

I believe Edgar Dijkstra said it well: "Premature optimization is the root of all evil." Profilers exist for a reason, which is that in significant systems, it can be hard to predict where the system and users will spend all their time. It's not economic for a project to optimize everything, all the time. Nor should you - the job is to get good work done. A slow, correct routine is often just fine, for my work at least.

--
I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
Re:Admittedly.... by spatialguy · 2008-06-03 18:24 · Score: 2, Informative

A database table is not an Excel sheet with fewer limits! Have some local wizard help you in the design. And if you use postgresql or any other full featured database, you can use views to retrieve your data in a format you need for analysis.
Re:Admittedly.... by aproposofwhat · 2008-06-03 23:24 · Score: 1

finally {
casino.profit();
}
:P

--
One swallow does not a fellatrix make
Re:Admittedly.... by Yoozer · 2008-06-03 23:33 · Score: 1

, I for one, would benefit from this.
You would benefit by reading a book on database design, especially chapters on normalization. We would benefit if you'd submit your code or your model to The Daily WTF.

Here's free advice:
In my system, each participant is a row, and each response is a column
Make two tables. One with the participants. The other you call "Response", and it contains 2 columns - a field with the unique id of the participant and a field with the value of the response. You "connect" these using a join in your SQL statement.

You won't run out of anything that way, and it'll be a hundred times faster too, since you don't have to do SELECT * FROM a thousand times. Your query will look like "SELECT * FROM responses WHERE participant_id = 1234" if you want the response for a particular participant.

To untangle that ungodly mess that is in your database right now, you could either perform more SQL wizardry, but it's probably safer if you just write a script that outputs everything to a bunch of INSERT SQL queries. Less of a chance to go wrong.

Don't forget to back up.
Re:Admittedly.... by bryce4president · 2008-06-04 00:44 · Score: 1

The sad part is that sometimes incompetent people get put in charge of important things... *sorry mr. president...
Re:Admittedly.... by WuphonsReach · 2008-06-04 02:10 · Score: 1

I believe Edgar Dijkstra said it well: "Premature optimization is the root of all evil."

Pretty much spot-on. When designing / developing a system, you should work as fast as possible without doing anything stupid (like using a bubble sort). Get it working, make it faster later.

The art lies in knowing when to do which method (quick and dirty vs smart and efficient) so that you don't end up with something so slow as to be a project-killer.

--
Wolde you bothe eate your cake, and have your cake?
Re:Admittedly.... by WuphonsReach · 2008-06-04 02:29 · Score: 1

Even though I'm a firm believer in normalization, there are specific cases where you will end up with 1000+ attributes for a single object/entity. Sure, you could refactor that into a single-many join into a new table, but that can often be unnecessarily complex.

For example, an entity with a few hundred attributes, all of which are of slightly different data types (some are single byte, some are 2-byte integers, some are varchar). To create a single-many sub-table to store those attributes, there are no clean solutions. Either you setup your sub-table with multiple fields of the different data types and then make a business rule that you can only write one of the multiple fields. Or else you create multiple sub-tables.

The first method burns a lot of extra disk space (4 or 8 bytes for the primary key plus the sizes of the unused data columns), the second method is ugly and overly complex (and still burns 4-8 bytes for the primary key for each attribute).

This becomes an even bigger issue when you're doing on-off projects where each project defines the entities out with wildly different sets of attributes. So you end up simply creating multiple tables, all with the same primary key, and spread the few hundred attributes across multiple tables. Which also makes a lot of types of reporting and filtering a whole lot simpler.

For example, show me the entities who matched attribute criteria X, Y and Z. Including those where attributes A+B+C added up to 42. Excluding those with attributes Q. Which is an easy query to write if all the attributes are in the same table (or view), but a real pain if you have to do sub-selects or pivot the table.

Limitations like 2048 bytes per row or the max # of fields in a table/query are definitely a PITA at times.

In summary, there are times when having lots and lots of fields is a valid design choice (the lesser of multiple evils).

--
Wolde you bothe eate your cake, and have your cake?
Re:Admittedly.... by hackstraw · 2008-06-04 04:25 · Score: 1

..the big killer is that you can't even use 2000 columns, because if you did, you would run out of space in the row, unless the average field size was under 4 bytes.

I'm so glad I've never actually needed to put, like, data in a database. Sounds tricky.
Re:Admittedly.... by spatialguy · 2008-06-04 05:40 · Score: 1

This might be true, however only if: 1 every attribute changes for every record 2 if they are not related to each other 3 when each column can appear in a WHERE clause. And IMHO it is still bad design to have 1000+ separate unordered attributes for a single entity. I would like to see a real world example where this is really needed. When you can make a logical subdivision: PostgreSQL implements array's (also 2D), where you can store vector type data(or "multiple columns") up to 1 Gb, but more importantly it implements user defined composite datatypes: http://www.postgresql.org/docs/8.3/static/sql-createtype.html Assuming there is some logical subdivision to make in your 1000+ attributes, this can be implemented in a "few" user defined types. And another method in postgresql would be to use inherited tables, where the attributes that are the same for every record are inherited and the attributes that are specific per project are added to the child table for that project. And yes, sub-queries can be difficult, but wading through a 1000+ attributes in a single table and maintaining integrity constraints are as well.
Re:Admittedly.... by Samah · 2008-06-04 11:41 · Score: 1

...I so wish I'd thought of that!
:)

--
Homonyms are fun!
You're driving your car, but they're riding their bikes there.

Re:Help with MYSQL! by Anonymous Coward · 2008-06-03 11:32 · Score: 0

mysql-admin
mysql-query-browser

$ mysql -u -p

Re:Too small by Anonymous Coward · 2008-06-03 11:33 · Score: 3, Interesting

It's more accurate to say that there will probably always be a tradeoff between slow and fast storage, there will probably always be a tradeoff between permanent and temporary storage, and there will probably always be a tradeoff between expensive and cheap storage.

In 20 years, I do not know what form slow, or cheap, or permanent storage may take. It may not be spinning magnetized platters. But I do know that in 20 years, every well-written database will have algorithms and data structures to deal with slow storage, permanent storage, and cheap storage.

BigTable, HBase and SimpleDB are the future by atomic777 · 2008-06-03 11:33 · Score: 3, Informative

I recently blogged on this, but essentially, as long as your average PHP developer thinks of MySQL as a glorified flat file system to place their serialized PHP objects, an always-available, pay-as-you-go distributed database is going to revolutionize application development in the coming years. For those that want to keep control of their data, HBase is coming along quite nicely.

Can I ask a stupid question... by Enderandrew · 2008-06-03 11:34 · Score: 2, Interesting

I'm actually reading the article now, and as he is talking about design for a database taking multiple cores into consideration, etc, I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?

Again, I'm sure this is a stupid question, but perhaps someone could clue me in.

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.

Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-03 11:43 · Score: 0

When you can get groups of record (not tables) to be their own lockable object(s), with dedicated processor(s), disk(s), you'll end up with the endgame in functional databases that is only limited by throughput. Just a matter of time.
Re:Can I ask a stupid question... by Johnno74 · 2008-06-03 11:58 · Score: 3, Informative

Umm I'd say you have it wrong - "Traditional" databases have many different lock granularities, such as Table locks, page locks and row locks. SQL server and Oracle certainly do this.

MySQL only does table locks, which are much simpler and much faster for light workloads, but as I'm sure you can imagine when you have many CPUs trying to update the table at once in the end each thread has to wait their turn to grab the lock and perform their updates sequentially.

In SQL Server, Oracle, or any other "enterprisey" db multiple threads can update the same table at exactly the same time, as long as its not the same row.

Stuff like this is exactly why people who use MS-SQL and oracle look down their nose at people who use MySQL and claim it is capable of playing with the big boys.

Once again, despite what MySQL are saying there is nothing innovative here. All this stuff has existed in the mainstream database engines for many, many years and they are still playing catchup.
Re:Can I ask a stupid question... by kestasjk · 2008-06-03 12:03 · Score: 1

I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations I'm not sure what you mean, fine-grained locking is up to the engine, and MySQL can't really do it.
InnoDB is the classic example; you can get MySQL to lock entire InnoDB (or MyISAM) tables, but if you want to lock a certain row in an InnoDB table MySQL can't help, because it doesn't know about InnoDB.

If you're saying "the MyISAM/MySQL table-based locking won't work in a parallel environment", then you're right, but that's nothing new that relates SSDs and micro-cores.

Fully transactional engines have to be very good at being parallelizable. In databases things had to be "parallel" even before multi-core was around, because being able to interact in more ways with the database at a time means better performance.
You don't need two CPUs to have multiple clients trying to write to the same data without treading on each others' toes.

Databases are inherently scalable and parallelizable, so I think more low-latency memory and more CPUs will have a big effect.
But I could be wrong, I only deal with databases on the other side of the SQL parser.

--
// MD_Update(&m,buf,j);
Re:Can I ask a stupid question... by Enderandrew · 2008-06-03 12:13 · Score: 1

As far as I understand Interbase (and now Firebird) use a versioning approach as opposed to locking. I wonder if this approach is more beneficial with parallel processing, but you're right in that the concern was already relevant given multiple users in a database at once.

I wonder which one has better performance for multiple operations at once.

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Re:Can I ask a stupid question... by XanC · 2008-06-03 12:24 · Score: 3, Informative

What you say is true for MyISAM tables, but MySQL's InnoDB tables fully support row-level locking. And I believe their BDB tables support page-level locking.
Re:Can I ask a stupid question... by anarxia · 2008-06-03 12:34 · Score: 2, Informative

It is called MVCC. Other databases such as oracle and postgres also use this approach. MVCC has its pros and cons. It allows for higher concurrency, but it might require extra copies of data and that translates to more memory and disk space. On a "weak" server it might actually end up being less concurrent.
Re:Can I ask a stupid question... by merreborn · 2008-06-03 13:01 · Score: 1

I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. Wouldn't the interbase approach work better in a parallel environment?
Both interbase and MySQL's InnoDB engine use the concept of Multi Version Concurrency Control (MVCC) to address the issue of simultaneous access -- although their implemenations differ: InnoDB does make use of row-level write locks. The upcoming Falcon engine is said to do the same with even fewer locks than InnoDB.

All should parallelize fairly well -- it's the table-level locks of MySQL's default MyISAM engine that really kill simultaneous access.
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-03 13:02 · Score: 1, Interesting

InnoDB uses MVCC as well. As storage goes, InnoDB is perfectly serviceable. It's just the rest of the DB engine around it that's out of whack.
Re:Can I ask a stupid question... by Johnno74 · 2008-06-03 14:16 · Score: 1

Cheers for the info, I thought it still did table locks only.

Don't get me wrong, I have nothing against MySQL... It has its place, and it does seem to be improving. What does make me grind my teeth is when people say its a serious competitor to MS SQL, or oracle. The typical argument is all the other things that these DBs give you (like proper ACID transactions, distributed transactions, fine-grained security, triggers, functions, etc etc) are just fluff and you don't need them anyway....
Re:Can I ask a stupid question... by XanC · 2008-06-03 14:43 · Score: 1

MySQL 5 may be worth a look. I'm not saying it's at the level of the other databases, but it does now support I believe all the features you mentioned. And if you set strict SQL mode, a lot of the other gripes about data handling go away as well.
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-03 20:50 · Score: 0

"MySQL only does table locks"

Why do people answer when they don't know what they are talking about?

It's just stupid and lazy in today's world where a quick search on google for "mysql row locking" brings up this page:
http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html

Was that too hard?
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-03 22:08 · Score: 0

Also PostgreSQL has got row-level locking since forever.
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-04 00:29 · Score: 0

Cheers for the info, I thought it still did table locks only How about doing simple research?

It's a sad state of affairs when you get marked "Score:4, Informative", when you are giving out incorrect information that could quite easily be corrected with a simple google search.
What does make me grind my teeth is when people say its a serious competitor to MS SQL, or oracle. What about lazy uninformed people diluting the helpful knowledge on the net by about subjects they don't know enough about?
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-04 02:55 · Score: 0

MySQL only does table locks Um, not according to http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html which says
MySQL uses table-level locking for MyISAM and MEMORY tables, page-level locking for BDB tables, and row-level locking for InnoDB tables.
Re:Can I ask a stupid question... by Anonymous Coward · 2008-06-04 05:23 · Score: 0

I'm actually reading the article now, and as he is talking about design for a database taking multiple cores into consideration, etc, I'm wondering if the traditional lock approach used in MySQL (and most SQL databases as far as I know) somewhat kills parallel operations. That isn't actually the issue with multiple cores.

The traditional locking/MVCC etc things you hear about are about processing multiple queries in parallel, e.g. multiple connections to the database each doing their own thing at the same time.

The issue with multiple cores is in making good use of them for CPU-bound tasks, such as spreading the sort operation of a single query across several cores so that it finishes faster -- essentially the same type of multithreading problem faced by everyone else in the programming field.

Some databases address do such "parallel query processing" already, but most of the focus has been on spreading not just across CPUs, but across disks or entire servers. There's still plenty of room to grow for all the players in a multi-core SSD arena.
Re:Can I ask a stupid question... by Tim4444 · 2008-06-04 05:29 · Score: 1

I have to agree. A lot of those features have been present for some time now.

Just some fun trivia I saw on MySQL.com:

InnoDB is used in production at numerous large database sites requiring high performance. The famous Internet news site Slashdot.org runs on InnoDB...
It's true that MySQL isn't getting used for a lot of the large mission critical applications. Slashdot's great, but I think it would be a stronger performance endorsement if they could say that the banking systems, or healthcare systems, or even Walmart's datacenter were using MySQL. Unfortunately, I think a lot of the large corporations overlook MySQL, just as they overlook Linux, for political reasons. They need to have solid support agreements (ie. a strong company they can sue) in case something goes wrong. For this reason, MySQL doesn't even get included in their performance metrics. Maybe that will change with Sun standing behind it. (so long as Sun doesn't rewrite it in Java : )

Re:Help with MYSQL! by Anonymous Coward · 2008-06-03 11:35 · Score: 0

MYSQL is a seriously flawed product in my opinion.

Download Tortoise SVN, then try to check something out and claim that SVN is seriously flawed because it doesn't work out of the box. Please LEARN until you understand what you are doing before trying to place criticism.

I start the administrative console.

I dont think you started what you think you did.

http://www.mysql.com/products/tools/administrator/

Cores? by bar-agent · 2008-06-03 11:40 · Score: 1

In the article, they say the multiple cores will have a big impact on the design of databases. I don't understand this. Databases are already threaded, are they not? Why does multi-core support require different code than multi-thread support?

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]

Re:Cores? by eggfoolr · 2008-06-03 12:07 · Score: 1

Most DB's are sudo threaded. They run multiple single threaded processes and do have issues scaling. To be fair, it is also the application design that limits scalability, not just the database.

For a very simple example, you could choose to traverse a table sequentially, or you could split it into smaller chunks and perform multiple operations in parallel.

The problem on a single or small number of processors is that if you make your application highly threaded, you will generate a huge amount of context switches.

If you get really threaded then have a look at Sun's T2 processor. That's the future!
Re:Cores? by bersl2 · 2008-06-03 13:04 · Score: 4, Insightful

"sudo" is that command which grants one user authorization to act as another user.

"pseudo-" is that verbal prefix which means "false".

I'm seeing language devolve in front of my eyes...
Re:Cores? by blckholehorizon · 2008-06-03 13:15 · Score: 1

... and i though sudo was for mind control... http://xkcd.com/149/

--
my UID is Prime. It makes me special.
Re:Cores? by Anonymous Coward · 2008-06-03 13:26 · Score: 0

> Most DB's are sudo threaded.

OK, name one DB that has to run as root? MySQL, PostgreSQL, Microsoft SQL, and DB2 do not. Back-up your claim.
Re:Cores? by eggfoolr · 2008-06-03 13:30 · Score: 1

hahaha. It was my keyboard. I had nothing to do with it. It just types what it assumes I think.
Re:Cores? by eggfoolr · 2008-06-03 13:32 · Score: 1

Ummmm, Oracle. In order for it to run DISM on Solaris it needs setuid.

Bloody hell, I'm letting my real identity escape now!
Re:Cores? by MBCook · 2008-06-03 13:34 · Score: 1

Right. MySQL, at least under Linux, just starts to fall flat on it's face if it's given more than 4 cores. That's the case for 5.0.x, and I think it is still true of 5.1.x (but that's still beta anyway). I seem to remember the situation being better under Solaris, but it's not fixed entirely.
Oracle, DBII, and (probably to a lesser extent) PostgreSQL can handle many more processors.
Just like any other software, multithreading is hard. But where there are times in some programs (say games) where stale data may be OK (i.e. for drawing a frame) or a double update may be tolerable (setting something to be dead twice is probably pretty harmless) when you are running a multi-billion dollar system those kind of errors are absolutely unacceptable. You just have to be far far more careful.
You're right that in some circumstances this isn't that bad. In some cases each processor could almost "own" a table and that might work. But when you have one giant table and a couple of small ones, you just have to share the giant table and lock on individual rows, sections, partitions, whatever.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:Cores? by felix9x · 2008-06-03 18:00 · Score: 1

As far as I understand it, its not enough just to create many threads. The threads have to cooperate with each other and work on common data. How to do this effectively is the hard part. To make an analogy, in recent versions of java some new hashtable implementations where introduced that allowed many thread to access the data concurrently without locking. It may be complex stuff to us who don't understand how the stuff works under the hood but to an application developer it just works and easy to use.

Re:Too small by bluefoxlucid · 2008-06-03 11:48 · Score: 1

RAID 5 against (n) disks. Given non-failure probability (p) of one disk in a given time window, we have sqrt(p^n). So for 3 disks at a point in time where p=0.5 (0 p 1 in all cases, any number will illustrate the effects), 35% chance that the whole array still works. 5 disks, 17%. 2 disks (not doable), 50%.

Point: RAID5 doesn't statistically increase reliability. I know someone who worked for a data recovery firm, lots of people swapped the drives the moment they failed and had a second go while the first hot spare (i.e. was already in there for auto-activation on one disk failure) was rebuilding.

Mirroring is okay but gets expensive. For two storage devices (disks, or RAID5 arrays) p=50%, 1/Sqrt(2) is your failure probability of the whole array. And the two arrays likely don't share identical data layouts; though a specific RAID10 might, who knows.

--
Support my political activism on Patreon.

The furture of databases you say? by Anonymous Coward · 2008-06-03 11:52 · Score: 0

2 words....SQL Server

Oracle is going the way of UNIX, and by UNIX I mean HPUX, Solaris, & AIX. In 10 Years you'll see the trend towards MS SQL Server for all but the top-tier business critical apps. MySQL and friends will likely remain a popular choice for meta-repositories and web tech solutions. The dirty secret is that tons of Oracle 8 & 9 mid-range systems are getting replaced with SQL Server every day as they get technology refreshes. You really can't argue the economics, and Oracle, like legacy big-iron won't change fast enough to turn the tide.

Flame on -

OCP Certified Oracle DBA

Re:The furture of databases you say? by njcoder · 2008-06-03 12:34 · Score: 3, Interesting

If I was migrating away from Oracle, MS SQL Server wouldn't be my first choice. Postgresql would. Given the choice between a free version that is similar to the original vs a product that is very different that I need to pay for it's a no brainer. Also take into consideration that for some database applications you're going to need some serious horsepower. You're limited in the number of procs you can have in a Windows system. Last time I checked, once you get past 8 processors Windows doesn't scale as well. Even linux doesn't do as well as Solaris, AIX or HPUX past a certain number of procs.

Oracle's RAC seems to be a better solution than MSSQL's approach. PostgreSQL (and EnterpriseDB) are working on a more RAC-like approach.

This is a good story about a company that successfully moved from Oracle to Postgresql. Basically, they had 2 database systems running Oracle, a data warehouse and an OLTP system. They moved their data warehouse over to Postgresql running on Solaris 10, then they used the licenses they no longer need for the data warehouse to boost the computing power of the OLTP system.

--
Open Source Java DAO Generator
Re:The furture of databases you say? by foniksonik · 2008-06-04 03:01 · Score: 1

Maybe the article should more clearly say "The future of databases in regards to web application development"?

Then you have to think about how many mid-range client/server desktop applications are being replaced with client/server web applications every day as they get technology refreshes...

There are more and more *relatively* cheap web developers getting hired to replace traditional desktop app developers for non-critical business applications.

SO there is a disconnect here with your statements in general... yes there may be more MSSQL server installs coming in to replace Oracle, etc. for desktop apps... but there are even more MySQL/Postgres/SQLite web apps coming in to replace all those desktop apps.... so Oracle is losing but I wouldn't say that MSSQL is the replacement, it's just one of many possible replacements, depending on what HR resources a company has invested in. If they are still heavy with desktop app developers or newly acquired/trained .NET developers then they will go with MSSQL. If they have an investment in Linux or Unix developers then they may be moving to something different.

--
A fool throws a stone into a well and a thousand sages can not remove it.

MonetDB by Frans+Faase · 2008-06-03 11:55 · Score: 1

It almost sounds like he is talking about MonetDB.

Re:MonetDB by alxtoth · 2008-06-03 20:04 · Score: 1

Nice to hear others use MonetDB too! But please please understand, the vertical databases like Monet are not for transaction processing, but for "analytics" = big group by, big joins .. -Alex

--
http://revj.sourceforge.net

Don't have to wait 10 years by VampireByte · 2008-06-03 12:10 · Score: 1

I think it's happening already. I develop in both Oracle & SQL Server environments and for years organizations I've encountered used Oracle for one purpose and SQL Server for another. In the last two years however, many of those companies are interested in totally replacing their Oracle databases with SQL Server.

As far as MySQL, after doing a lot of development with it over several years during the late dotcom/early bust days, I haven't worked with it lately because there hasn't been much demand for it. It'll be interesting to see if that changes with Sun's involvement.

--

Run and catch, run and catch, the lamb is caught in the blackberry patch.

Re:Too small by dgatwood · 2008-06-03 12:16 · Score: 2, Insightful

Worse than that. Part of that is the fact that RAID vendors tell people to go with drives from the same manufacturer to optimize performance, when in reality, they should be doing with the most diverse pool of drives possible, dramatically reducing the probability of multiple catastrophic failures at the same time (except through human error, some sort of physical impact, electrical surge, etc.). If a drive has a design defect, it isn't at all uncommon for them all to fail at n+/-k hours for some relatively small value of k. If all your drives are the same model from the same batch, your probability of losing the entire array is suddenly remarkably close to 100%. RAID with the same model of drive dramatically reduces reliability in the name of performance.

Even mirroring is basically useless for reliability if the drives are from the same vendor unless you swap out the mirrored drive daily so that no single backup has been operating more than half as long as the main drive. RAID is a nice idea in theory, but the reality is that with what seems to be a rapid decline in hard drive reliability over the past few years, in practice, performance notwithstanding, RAID just raises your power bill and guarantees you have two drives to ship back to the manufacturer for replacement instead of one.

As an aside, I had two personal hard drives die in the past week (and four within the last year). When you consider that I only have about 5 drives in regular use, that's alarming. Both of these drives were under a year old. One was a Seagate 500 GB drive in continuous operation in a heavily cooled tower (sides off the tower, four fans blowing outside air straight across the drive). Came into the house and it sounded like someone was using a radial arm saw. Cloned off enough data before the drive stopped reading any blocks at all, so my MythTV box is up and running again. That lasted about 9 months. The other was a Seagate 160 GB 5400 RPM laptop drive. Lasted 11 months and suddenly went into click-of-death mode where no data was accessible from the drive. It's going to be a long time and a lot of therapy before I'll ever trust anything important to a Winchester drive again... and Seagate went in a single week from my "high reliability, buy over all other vendors" list to my "not in a million years" list.

--

Check out my sci-fi/humor trilogy at PatriotsBooks.

Re:Too small by Anonymous Coward · 2008-06-03 12:24 · Score: 2, Informative

Except that Bill Gates never said that. Bluefoxlucid did.

I'm sure he'll feel lots worse. While Gates gets hounded for something he never said, at least he has mountains and mountains of cash to console him.

Re:Too small by Anonymous Coward · 2008-06-03 12:27 · Score: 0

err.. "they should be doing it with". My bad.

Remember kids. by Anonymous Coward · 2008-06-03 12:48 · Score: 0

Remember kids this is the same guy who said that ACID wasn't important and the emulating foreign keys was as good as having support for them in the database. Face facts - he will do / say what ever it takes to get you to use / buy his product. History shows this to be true.

nothing new : move along by mojorisin67_71 · 2008-06-03 12:49 · Score: 1

Main-memory databases in the early 90s were figured out the issues when all pages can be accessed in constant time. A solid state
disk characteristics is no different from main memory.

See http://www.bell-labs.com/project/dali/ for one of such projects.

Re:Too small by CastrTroy · 2008-06-03 13:03 · Score: 1

Personally, I've had way less problems with hard disks than I have had with RAM chips, CPUs, video cards, motherboards, network cards. I've only ever had 1 hard drive die. I've had countless of the other devices die. Hard drives have very simple circuits, so it's very unlikely that the chips will overheat, or have any capacitors and such blow. The only thing left to fail is the mechanics. Since they are all sealed in a dust free chamber, the only thing to kill it is heat. Keep you hard disks properly cooled, and you really shouldn't have many problems with them. I really don't understand why people say hard drives are so problematic. Most computer problems I've experienced are due to faulty solid state hardware.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.

Brian's Blog by apachetoolbox · 2008-06-03 13:04 · Score: 1

http://tangent.org/

Sup Brian!

-Bryan (from apachetoolbox)

Re:Brian's Blog by Anonymous Coward · 2008-06-03 14:40 · Score: 0

Heh.

I lived in Brian's basement for a few weeks when I first moved to Seattle.

Sadly, none of his brainpower settled into his basement and into my brain. o.O

(Even if some of you folks don't like MySQL and/or some of its features, you gotta admit Brian's done more than most of us...)

Re:Too small by Anonymous Coward · 2008-06-03 13:18 · Score: 0

Your Myth drive was probably killed by the shit you were recording off fox.

Jus sayin

self-fulfilling predictions? by BigGerman · 2008-06-03 13:36 · Score: 1

If one's product does not support advanced features or, sometimes, even basic common sense features, no wonder one's predictions call for the world that does not need those features.

Locality is the key by Dave500 · 2008-06-03 13:55 · Score: 5, Interesting

In my mind as a database engineer for a wall street bank, the biggest change in the near term that we forsee is data locality.

Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.

Why? Latency. Consider your average SOA application which reaches out to 4-5 remote services or dataserver calls to execute its task. Each network/rpc/soap/whatever call has a latency cost of anything between one and at worst several hundred milliseconds. Lets say for example that the total latency for all the calls necessary is 10 milliseconds. 1000/10=100 transactions per thread per second. Oh dear.

The amount of memory an "average" server ships with today is in the 32-64GB range. Next year it will be in the 64-128GB range. The average size of an OLTP database is 60-80GB.

So, the amount of memory available to the application tier will very soon be greater than the size of the database, warehouses excluded. Moore's law is quickly going to give the application tier far more memory than it needs to solve the average business state, exceptions noted.

The final fact in the puzzle is that for transaction processing, read operations outnumber write operations by roughly 20 to 1. (This will of course vary on the system, but that *is* the average.)

This situation is strongly in favor in migrating read only data caches back into the application tier, and only paying for the network hop when writes are done in the interests of safety. (There is a lot of research into how writes can be done safely asynchronously at the moment, but its not ready yet IMHO.)

Challenges exist in terms of efficient data access and manipulation when caches are large, performant garbage collection and upset recovery - but they are all solvable with care.

Its my opinion that in the near future large data caches in the application tier will become the norm. What has to be worked out is the most effective way of accessing, manipulating and administering that data tier and dealing with all the inevitable caveats of asynchronous data flow.

Some (not complete) examples of implementing this:

Relational Caches (there are many more):
http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_for_java.html
http://www.alachisoft.com/ncache/index.html

Object Caches:
http://www.ogf.org/OGF21/materials/970/GigaSpaces_DataGrid_OGF_Oct07.ppt
http://jakarta.apache.org/jcs/

Re:Locality is the key by behindthewall · 2008-06-03 14:23 · Score: 0, Redundant

Interesting. Thanks.
Re:Locality is the key by eekygeeky · 2008-06-03 15:10 · Score: 1

So the future is giant databases running in RAM, backed up to nearby RAM and only written to disk on schedule or in extremis? sounds... kind of cyberspace?
Re:Locality is the key by Dave500 · 2008-06-03 15:19 · Score: 1

You are right - it does.

The best methodology on how to efficiently and robustly manipulate large data sets in the application tier is still up for debate. In memory databases are but one potential solution to this requirement - there are many others. The javaspaces/gigaspace grid style solution is another.

Every current approach to this issue at present has is pros and cons - that's the price for being on the bleeding edge I guess.

What cannot be ignored is that we are on the edge of a potential paradigm shift - one powered by the fact that available system memory is about to surpass the size of the average state of a complex system. How this will effect our application design is still very much up for grabs, but be assured it will.
Re:Locality is the key by ppanon · 2008-06-03 15:31 · Score: 4, Interesting

Interesting ideas, but it would seem that, once your application tier is spread over multiple servers that don't share a memory space, you are going to have significant distributed cache coherency issues. While I can understand the desire to avoid the marshalling overhead involved in database reads and updates, you're also going to have to reinvent the wheel of distributed concurrency control for each application when it's already been solved in a general way in the clustered database.

For instance, from the JCS link you provided:
JCS is not a transactional distribution mechanism. Transactional distributed caches are not scalable. JCS is a cache not a database. The distribution mechanisms provided by JCS can scale into the tens of servers. In a well-designed service oriented architecture, JCS can be used in a high demand service with numerous nodes. This would not be possible if the distribution mechanism were transactional.

So if you're having to give up transactional integrity to have your distributed cache, I think it's going to have limited applications because it doesn't solve that 1000 transactions per thread problem you indicated. Sure you can work your way around it a little by treating it as extremely optimistic locking to maintain transactional integrity on writes, but it also does limit the accuracy of the cache and for some applications (financial for starters, I would expect) that's going to be an issue.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
Re:Locality is the key by Dave500 · 2008-06-03 15:49 · Score: 2, Interesting

Extremely valid point.

Not to bash Oracle, but the ultimate scalability of their multi-host database partitioning solution (RAC) is indeed limited by the amount of communication the distributed lock manager needs to make to ensure transactional isolation as the number of partitions/hosts increase. (Caveat to Oracle fans - 80% of requirements are beneath this threshold - so I understand Oracle's strategy.) (An alternative solution is the "shared nothing" partitioning approach (example - db2's DPF) but this has its own drawbacks too.)

I don't pretend for a second to know all the answers - indeed I suspect that some of them are yet to be invented/utilized effectively by industry.

My major point is that having distributed application side data caches will soon become very tempting in terms of the latency involved with accessing data. There are admittedly great challenges involved with doing this safely, in a way which is scalable as you point out and providing a productive application interface.

It will be very interesting over the next few years as we collectively work out the best approach to these requirements. Anybody can be wrong - me of all people - but my bet is that most of these problems will be solved. How they will be is the coolest part :) .
Re:Locality is the key by atomic777 · 2008-06-03 18:15 · Score: 1

While memory sizes may be increasing such that available memory is not too much smaller than a typical OLTP database, there is always a computational Jevon's paradox at work. Most applications these days could run just fine if I revoked the delete privilege.
eg. the transactions involving relatively large (> $10, let's say) sums of money in systems of 10 years ago have given way to databases that keep track of granular clicks and impressions. $10 "worth" of data might be a record of over 10,000 pageviews.
In 5-10 years, as computational and storage capabilities increase, we may be tracking even more granular things, like, say, specific, movements through much more interactive applications generating massive volumes of data by today's standards. By the time that happens, we may think of magnetic disks much as we look at tape drives these days and the database and application landscape will look much different.
Re:Locality is the key by Anonymous Coward · 2008-06-04 00:44 · Score: 0

JBoss Cache
Re:Locality is the key by Anonymous Coward · 2008-06-04 01:47 · Score: 0

It seems to me like a better way to solve this guy's latency problem (although 10ms sounds a little high for a really fast interconnect) is to pipeline requests, not worry about the round trip time. Then you're only limited by throughput, not latency, and we've got plenty of throughput.

I think the real problem is that current database drivers are oriented towards a very basic request-response model that doesn't make this easy. Perhaps instead of adding the complexity of a non-blocking interface to the database driver API, however, we should simply accept that we need a massive number of threads at the application level to feed the database. If the application tier hardware is really so powerful, this shouldn't be an issue.
Re:Locality is the key by ppanon · 2008-06-04 08:36 · Score: 1

Yeah, I had some thoughts running that way as well. I was also thinking about pipelined asynchronous queries. I think people are trying to avoid retraining the application developer populace to know how to deal with asynchronous (and concurrent) processing because it's much more error prone and many programmers aren't up to the task. People that can do that are going to be more expensive and that's hard to sell to business.

But heck, if you're really running an application with a workload that needs to scale that much, then you can bloody well pay more for somebody that can handle that complexity. If not, then you're doing the large scale equivalent of hiring your neighbour's pre-teen kid at $8/hour to run your small office network because he worked out well for your parent's Windows PC. Sometimes, you need to pay more to get what you need.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire

Re:Too small by Anonymous Coward · 2008-06-03 14:10 · Score: 0

RAID 5 against (n) disks. Given non-failure probability (p) of one disk in a given time window, we have sqrt(p^n). So for 3 disks at a point in time where p=0.5 (0 p 1 in all cases, any number will illustrate the effects), 35% chance that the whole array still works. 5 disks, 17%. 2 disks (not doable), 50%. Wrong.

Point: RAID5 doesn't statistically increase reliability. I know someone who worked for a data recovery firm, lots of people swapped the drives the moment they failed and had a second go while the first hot spare (i.e. was already in there for auto-activation on one disk failure) was rebuilding. Selection bias anyone?

Mirroring is okay but gets expensive. For two storage devices (disks, or RAID5 arrays) p=50%, 1/Sqrt(2) is your failure probability of the whole array. And the two arrays likely don't share identical data layouts; though a specific RAID10 might, who knows. Wrong again, Statistics 101.

Everyone knows which database is the best! by csoto · 2008-06-03 14:26 · Score: 1

PC-File (from good 'ol Jim "Button" Knopf of Buttonware). That and PC-Write (Bob Wallace, Quicksoft) are all everyone ever needed. That and 640K. Where's my box of floppies?

--
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom

Moores Law vs. Micromechanic Law by Crass+Spektakel · 2008-06-03 14:53 · Score: 1

This is just another case where Moores Law bypasses another Law, in this case the Micromechanical Law.

Microchips are developing faster than Micromechanic so by leaving behind harddrives we might see flashbased storage become bigger and bigger.

But even more interesting, I think we will see SATA-attached flash being a short lived exception. It is much cheaper, faster, more flexible and more direct to attach it directly into address space (doesn't matter if NAND or NOR, the memory controller will take care of that). With that done we will also see DRAM becoming some sort of cache for flash and flash replacing DRAM. And as memory and inner circuit bandwidth will increase even further there really is no reason to safe bandwidth or storage any more - who cares if your 100GB database uses 200GB because you deflate/multi-index it for faster access times? The only limit here is that 64bit address space may be a little narrow but then we just see 128bit architectures ten years earlier.

Ah yes, the future of SQL... well, I think a small SQL-to-DBM interface should do the trick until all applications map the whole db into real address space. There is little use for a overcomplicated interpreting language if you can just put your whole 200GB db into an associative array.

--
"Life is short and in most cases it ends with death." Sir Sinclair

Solid state storage devices are more than disks. by Animats · 2008-06-03 15:09 · Score: 4, Interesting

Until recently, solid state storage devices have been treated as "disks". But they're not disks. They have orders of magnitude less latency.

For files, this doesn't matter all that much. For databases, it changes everything. Solid state devices need new access mechanisms; I/O based seek/read/write is oriented towards big blocks and long latencies. The optimal access size for solid state storage devices is much smaller, more like the size of a cache line. With smaller, lower latency accesses, you can do more of them, instead of wasting channel bandwidth reading big blocks to get some small index item. It's not RAM, though; these devices usually aren't truly random access.

It starts to make sense to put more lookup-type functions out in the disk, since getting the data into the CPU has become the bottleneck. Search functions in the disk controller went out of fashion decades ago, but it may be time to bring them back. It may make sense to put some of the lower-level database functions in the disk controller, at the level of "query with key, get back record". Cacheing at the disk controller definitely makes sense, and it will be more effective if it's for units smaller than traditional "disk blocks"

This could be the beginning of the end of the UNIX "everything is a stream of bytes" model of data storage. We may see the "database is below the file system" model, which has appeared a few times in mainframes, make it to ordinary servers.

Re:Too small by gfody · 2008-06-03 15:23 · Score: 1

Volatile, nonvolatile, expensive, cheap are all hardware concepts not applicable to software. Applications in the future will just use 'memory' and assume it's persistent and as fast as possible.

The only reason applications are forced to account for slow storage is because hard disk drives have been our biggest bottleneck for 50 years!

--

bite my glorious golden ass.

Well... by Anonymous Coward · 2008-06-03 15:40 · Score: 0

Considering SSD drives won't be mainstream for a few years on production servers for critical data storage, that a shift from relational to "something else" won't happen overnight due to inertia and business conservatism, that this "something else" system of storage hasn't been developed yet, wake me up in 10 years and I will look into it then, and I probably still will be an early adopter.

Re:Solid state storage devices are more than disks by Anonymous Coward · 2008-06-03 15:54 · Score: 0

I wouldn't count on it. The main problem is the tension between standardized, interoperable, commodity physical devices and the ever-changing needs of the application software. The byte-stream or block-based I/O models won because they are painfully general purpose!

Instead, consider "the system is a cluster"... your local processing will occur finally when the full OS and application stack can migrate tasks onto the storage-local coprocessors and do message-passing over the network that used to be the "system bus".

Re:Solid state storage devices are more than disks by gfody · 2008-06-03 16:05 · Score: 1

SSD hasn't changed the whole seek/read/write paradigm because the current batch are tailored to be retrofit to existing HBA/RAID storage systems.
Once the IO drive is here we'll have a true random access to nonvolatile storage with latency measured in nanoseconds.

--

bite my glorious golden ass.

Re:Too small by dave87656 · 2008-06-03 16:51 · Score: 1

Once you've lost all your data to a disk failure, SSD starts looking pretty nice.

Nope. by HornWumpus · 2008-06-03 18:13 · Score: 1

IIRC that's already the way AS-400's try to work.

It never works out in practice.

You sometimes (often) need to manage and understand your memory usage.

If you think that won't be true in the future implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:Nope. by gfody · 2008-06-03 19:17 · Score: 1

Never say never - the reason existing storage abstractions fail is because the ridiculously bad performance of random IO on a mechanical disk drive. Hiding that bottleneck from the developer is a bad idea especially in retrospect when you consider that it's a bottleneck that's been with us for over 50 years.
If you think that won't be true in the future implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity. I could not parse that sentence, come again?

--

bite my glorious golden ass.
Re:Nope. by HornWumpus · 2008-06-04 11:31 · Score: 1

If you think that won't be true in the future, implicit in your thought is that the problems being solved (or at least crunched on) will not keep up with hardware in complexity.
Sorry missed a comma. Perhaps that should be a simicolon? Who cares?
What I'm failing to get across is that people have thought like you before and been wrong. 'In the future computer resources will be free and virtually limitless' has proven to be almost as wrong as '640K should be enough'.
New things to do with computers will be discovered/built that push the limits of hardware. Future programmers will have to live in 1TB of RAM and be aware the new isn't free (even if they have a 1000TB flash storage for VM). The data touched by the low tight loops had better be in RAM or the thing will crawl.
Think Holodeck or something similar. Imagine trying to model a human brain cell by cell, synapse by synapse. That brain.braincells.tick event better execute in RAM, no matter how many CPUs are running braincell.tick. Let's not get into the possible complexity of modeling a female brain...if such a thing is even possible.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Wenxiong is that you? by HornWumpus · 2008-06-03 18:23 · Score: 1

Still up to your old tricks I see.

When you run out of columns just join to Participant2 and keep adding columns.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:Too small by symbolset · 2008-06-03 18:44 · Score: 1

I should think that for guys that can write MySQL solving the filesystem problem should be child's play. I can think of several answers that banish these issues permanently. And no, I don't know where Nina is.

The spinning platter problem... No. At the scale of problem we're thinking about here the price difference between SSD and Spinning disks is not that much of an issue.

--
Help stamp out iliturcy.

Cores, cpus, nodes, .. by tuomoks · 2008-06-03 18:51 · Score: 3, Informative

Had to comment, the reference to Jim Gray was a little weird? I was lucky to work with Jim and we were often talking about technology changes and enhancements. Now - see what for example Tandem did call "massively parallel" database! The system was already built to allow several cpus and several nodes to interconnect transparently, Jim did see how that could be used and how the database optimizer really could work. Of course making direct access to any disc faster will help, especially now when the SDD's are getting bigger but the theory is nothing new. Even SQLite can show you that and think systems where you have 32, 128 or even 256 bit flat, memory speed but storage backed world - will change the picture, or? But be careful, we have already gone through many iterations making part of the system faster, as fixed head disks and even indexing in solid state, and found that it may (will) create other problems, not always seen upfront (except by JG!)

Re:Cores, cpus, nodes, .. by krow · 2008-06-04 13:41 · Score: 1

Hi!

A good number of the people who write databases use Jim Gray's book (aka the black book) as the "way to build a database". It is a great read, but I do not believe that in the face of change that the basic design needs to stay the same any longer.

Look at later interviews with Jim and I believe you will find that he too had noticed that the basic design around B-trees really would not cut it for the hardware we see being shipped today.

Cheers,
-Brian

--
You can't grep a dead tree.

OLTP vs OLAP by alxtoth · 2008-06-03 19:55 · Score: 1

IMO, sacrificing OLTP integrity to satisfy OLAP speed is like taking supports from the first floor to finish the roof. Ideally, there is one database for OLTP where you indeed need all the integrity you can get. And there is the reporting database, which might be replicated, and where speed is important. If you try to do analytics in your transactional database, something isn't right

--
http://revj.sourceforge.net

Re:OLTP vs OLAP by Bacon+Bits · 2008-06-03 22:14 · Score: 1

Oh, I quite agree, but given the complaint about query speed here and the limitations of MyISAM on MySQL, the only situation I could envision was one that involved some highly complex query on the order you'd see in OLAP. Or some poorly designed database. What kind of query would take so long with no need for procedural SQL, triggers, writable views, or foreign key constraints? Obviously, I haven't seen the DB in question, but it's suspicious.

--
The road to tyranny has always been paved with claims of necessity.

Re:If you... by BiggerIsBetter · 2008-06-03 23:06 · Score: 0, Redundant

If you.... edit an article.... enough.... you can make it.... say just.... about anything. Shatner? Is that you?

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.

Re:Too small by Anonymous Coward · 2008-06-03 23:13 · Score: 0

Yes, what a fool. Everyone knows that Fox broadcasts a flag that MythTV uses to send the THRASH_DISK_HEAD command.

gilding the lily by Hognoxious · 2008-06-04 00:09 · Score: 1

In my system, each participant is a row, and each response is a column.

I was going to post a comment along the lines of "why, to keep track of what products each customer buys, of course!" to the grandparent post.

However sometimes the truth is stranger than fiction and sarcasm is superfluous.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:Too small by ZerdZerd · 2008-06-04 01:11 · Score: 2, Interesting

I bought the same batch, but before setting up the RAID, I used each drive differently (ran benchmarks, copied files etc.). Some heavily, some light usage. The probability of them crashing at the same time should be smaller then if all of them get the same wear and tear.

--
I'm not insane! My mother had me tested.

Object Databases? by Grapedrink · 2008-06-04 01:27 · Score: 2, Interesting

Not trying to start a war here, but seriously Databases != RDBMS. It seems like no one knows that object databases have been around a long time too. In the context of the article, many of the points can be applied to all types of databases, but it's so focused on the RDBMS (no shock considering the author).

There were a multitude of issues in the past with object databases from agendas, performance, complexity, etc that put relational databases at the forefront. Hardware and the quality of object databases has more than caught up, so why are object databases so rarely used still?

One answer why object databases are ignored to a large degree is that people don't like to stray from the norm and tend to implement what they know. Another possibility is many people simply have never even heard of the concept of object databases. Further, in academia we almost exclusively focus on relational databases in most courses. Finally, legacy data is perhaps the biggest hurdle.

A corollary to the issues above is that there is an entire industry of DBAs and developers that fight learning something new. There's also mega corps with billions invested in the concept of the relational databases. I don't blame MySQL and some of the things said in the article because they're just trying to improve, but on the user level, it's amazing how much effort goes into adapting the RDBMS into the online world and resulting crazy architecture/technologies/code.

Object relational mappers are a great example of our unwillingness to leave the RDBMS world(unless you're working with legacy/existing data of course, but even then, investigate the possibility to migrate). Why do we need ORMs in the first place? They are a product of using relational databases. When I'm programming, I want to work in objects and not bizarre mapping layers, complicated DALs, etc. We spend so much time on mapping and layers to build bridges between a relational and object world at the cost of productivity and performance simply to continue to hang on to our old RDBMSs.

I've found that in most cases, object databases are faster for my projects. I've also tried related databases like grid/network databases. There are definitely cases where relational databases are also better, but I would use one over the other on a case-by-case basis. I find for the average case I've seen, hardware and architecture tips the balance in favor of object databases because of the way how we want to model things using objects anyway. If we look at a popular type of app right now, a social network... why use a relational database? Typically the associations and structures we make are objects and hierarchies or networks. Relational databases are ill suited at both. Instead, we start to develop hack and wtf schemas, rely to heavily on the app to sort out the data, or introduce object database-like concepts like table inheritance. This also forces us to introduce and learn yet another language.

SQL is a huge discussion in itself. I find SQL brilliant and easy to use, but nonetheless ill-suited for many tasks. Once cursors, user defined functions, etc. were introduced, the nightmare got worse. I find procedural and object constructs instead of set based constructs in SQL created by clients all the time as a result. This ends up crippling performance and instead of fixing the issues, decision makers will just throw more hardware at the problem or ignore it all together. There's also this myth that somehow SQL creates a way for the layman to query data in the database. This is true to a small degree, but has mutated into something not unlike "human readable" for XML.

I'm certified in SQL Server and Oracle, and Postgres is my home RDBMS of choice, so certainly I have a lot invested, but if I'm offered something that is better I will gladly abandon all my intellectual and time investments in these systems. I use whatever works the best for the task. After building several apps using Gemstone over the years, I have to cringe every time I return to Oracle or even w

Re:Object Databases? by jadavis · 2008-06-05 04:10 · Score: 1

It seems like no one knows that object databases have been around a long time too.

Actually longer. An "object database" is really just a graph database. Objects refer to eachother via pointer chains, and it forms a graph. There's really no extra semantic value there, object databases are just graph databases adapted to OO development styles.

Relational database systems were invented to overcome the semantic problems with graph databases, not the performance problems.

One answer why object databases are ignored...

Here you start a list of very condescending remarks about relational database systems. Have you considered the possibility that many people use relational database systems because they are informed and implementing a new application; not in spite of those facts?

...to continue to hang on to our old RDBMSs

More of the same. Object databases aren't newer or better, they are repackaged graph databases. Businesses are "hanging on" to RDBMSs because they are the best thing out there (although you may be able to criticize some specific implementation).

I think you should really consider whether you're looking for a DBMS at all. If all you want is persistence, you should consider solutions along those lines.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Object Databases? by Grapedrink · 2008-06-06 05:05 · Score: 1

Actually longer. An "object database" is really just a graph database. Objects refer to eachother via pointer chains, and it forms a graph. There's really no extra semantic value there, object databases are just graph databases adapted to OO development styles.

This is what I was implying of course.

Relational database systems were invented to overcome the semantic problems with graph databases, not the performance problems.

True, but that is not the only reason. It's a long discussion and I mentioned there are many limitations.

Here you start a list of very condescending remarks about relational database systems. Have you considered the possibility that many people use relational database systems because they are informed and implementing a new application; not in spite of those facts?

I didn't I say it depends on the situation? Just because someone knows about object databases doesn't mean they are stil making the right choice. In fact, I would say it's the norm that people have the knowledge but don't properly apply it when it comes to IT projects.

More of the same. Object databases aren't newer or better, they are repackaged graph databases. Businesses are "hanging on" to RDBMSs because they are the best thing out there (although you may be able to criticize some specific implementation).

I think you should really consider whether you're looking for a DBMS at all. If all you want is persistence, you should consider solutions along those lines.

It doesn't sound like you have used any quality ojbect databases. You're also pointing out the obvious, even more so given I actually did mention graph databases. You're also behaving like what I was hinting at; that there is this culture of resistance. I will use any technology that works and is proven. I don't feel any loyalty to any company or technology. Show it to me and I will learn and use it if it is quality.

I use both relational and object databases, again, depending on the task. My point was people equate database with RDBMS, and what the reasons for that are, along with some of the implications of switching. Show me some concrete data that prove RDMBS are the "best" thing out there as you claim. I guarantee in many situations both in terms of productivity for developers and database performance that RDBMS get smoked. The reverse is true in some cases as well. Ultimately, people need to look at their problem domain and decide which is a better fit. You'd be shocked how many web applications are a great fit for an object DB.

You're chiding me for given an honest opinion rather than trying to make nice, but you're doing the same thing in your reply. There's no denying things like ORMs and the mess the relational world has created. The industry has spent so much effort on skirting around these problems while there are existing implementations that are free from these problems because it's again a matter of applying technology correctly. Sure, there are plenty of situations where we want relational and ORMs and such, but arguments can be made there also on the application side.
Re:Object Databases? by jadavis · 2008-06-08 03:35 · Score: 1

I re-read your post and it still seems to have a strong negative attitude about relational database systems. It seems to suggest that they are somehow a legacy that has been outdated.

Show me some concrete data that prove RDMBS are the "best" thing out there as you claim.

Best is subjective. But, despite a huge amount of investment, "research", and marketing, neither OODBMSs nor XML Database Systems have made much progress into useful production applications. I would claim that's because those are a dead end and will never be useful enough to overcome the current relational database market.

I guarantee in many situations both in terms of productivity for developers and database performance that RDBMS get smoked.

By "productivity for developers" do you mean "got the first iteration of some application finished" or "created something that will usefully collect and manage data over a long period of time"? I suspect that OODBMs may be faster to accomplish the former, but I doubt they are more effective at the latter.

There's no denying things like ORMs and the mess the relational world has created.

That the relational world created? I think ORMs are something the OO world created. Interesting opinion, however.

If your only point is that OODBMSs are better than ORMs, you may be able to convince me of that. ORMs are certainly a mess, I'll give you that.

--
Social scientists are inspired by theories; scientists are humbled by facts.

He doesn't get it by plopez · 2008-06-04 01:54 · Score: 1

Memory and disks (solid state or platters) are *hardware*. Transactions are *software*. Transactions are implemented in code to insure the information gets written to disks to faithfully represent what happened with the information (updates, deletes, insertions, reads. With distributed applications, this is even more important than ever.

Also looking at the spec. of the Tokyo cabinet DB model and all I see is a return to network databases. Having suffered through IDMS, I can tell you this is the wrong approach.

Things haven't changed in a while because they work. Stop trying to reinvent the wheel!

--
putting the 'B' in LGBTQ+

Re:Too small by lanswitch · 2008-06-04 02:30 · Score: 1

Once you've lost all your data to a disk failure, you should realize the importance of backups...

Re:Too small by WuphonsReach · 2008-06-04 02:38 · Score: 1

The bigger issue with RAID5:

Rebuild times scale up as the number of disks in the array increases. Which means that as you RAID5 over more disks, your recovery window (during which a 2nd failure will kill the array) grows increasingly larger.

Which also leads to issues that as the array rebuilds, it puts extra wear and tear on the other drives in the array (as it frantically attempts to recalculate parity bits for the new hot spare). Since your recovery window is a lot larger, and you're exercising all of the disks in the array to do the rebuild, your chance of a 2nd failure goes up.

Now, some (most?) of that is addressed with RAID6, which can handle a double-drive failure.

(Personally, I prefer RAID10, where the recovery window is based on the size of an individual drive and not the size of the array.)

--
Wolde you bothe eate your cake, and have your cake?

SSD will change everything by theonetruekeebler · 2008-06-04 02:50 · Score: 1

I think SSD is going to greatly simplify DBMS design and make it possible for vendors to concentrate more on features facing the user rather than the server:

Everybody knows that with spinning drives sequential writes are much faster than random. RAID and SAN technology has reduced the cost of random writes significantly (and in many cases made serial writes a lot more seeky than you'd assume). But in SSD the concept of physical distance between chunks is completely meaningless. Data can be Ghod-knows-where---and due to rewrite minimization algorithms on the drive, it probably is. The only advantage one large write has over many small ones is in the cost of setting up an atomic I/O operation.

Internally, most of the DBMSs I've worked with are architected around the idea of serial writes==fast. They have two basic phases for how they write to disk. Whenever you do an update---for every update that needs to be written to disk---the DBMS first writes it out to a serialized log file that contains transaction IDs, serial IDs within the transaction, and from- and to-images of the affected rows or pages or whatever. When it's time to flush that to disk, the DBMS starts doing a sequential read through this file, applying changes to the affected on-disk pages, probably going through an intermediate step where it sorts the writes so the random seeks require the least amount of head movement.

One of the problems with this technique is that the log file gets used like a circular buffer: An old, uncommitted write from this morning can bite you hours later when you've wrapped around to that point in the log.

With SSDs, all this clever thinking goes straight out the window: just tag the old disk pages with one ID, the new ones with another ID, and when the transaction is done you point to the new IDs. Done right it's wicked fast, and you no longer have circular buffer problems: The volume of uncommitted data you can handle is limited only by available disk space.

It's kinda neat, and I think it can revolutionize and simplify DBMS persistence even more than the introduction of RAID and SAN storage. I've done DBA work long enough to remember obsessing over which spindle I was putting what type of data on---never put an index on the same spindle as its table, sort of thing. I think about how much simpler RAID made my job and I can't help but wonder how much simpler SSD will make things for the DBMS itself.

--
This is not my sandwich.

Re:Too small by dave87656 · 2008-06-04 03:16 · Score: 1

Once you've lost all your data to a disk failure, you should realize the importance of backups... True, which I do. But I only backup my data. Reinstalling all the software is a pain in the butt.

why are you blocking? by Chirs · 2008-06-04 03:27 · Score: 1

Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.

Lets say for example that the total latency for all the calls necessary is 10 milliseconds. 1000/10=100 transactions per thread per second. Oh dear. Your numbers assume that you are blocking while waiting for the responses. It should be possible to process other transactions during that time. Some form of async messaging and a state machine, and you should be good to go.

Even if you don't want to go with asynch processing, why is it a problem if you can only handle 100 transactions per thread per second? Linux will happily run with thousands of threads.

I'm sure there's some reason why it's more complicated than it sounds, I'm curious what it is.

Flexibility by Tim4444 · 2008-06-04 05:03 · Score: 1

Saying MySQL/InnoDB or MySQL/ISAM gets complicated since you can access tables from different engines within a single query. Maybe there's no silver bullet engine for all scenarios. Ultimately, flexibility is the key feature. That way you can gather real performance data and choose the best database/engine for your specific application and data set.

parallel processing and RDBMS is old stuff by Anonymous Coward · 2008-06-04 16:04 · Score: 0

Check out Informix and DB2 . Even Oracle does parallel scaling after a fashion.

This new fangled stuff like "row locking",
"parallel processing", etc. is only new to MYSQL
not to other RDBMS. So just copy what has been out there for a decade or two and stop telling us
MYSQL invented the Internet, everyone knows Al Gore did that.

Re:Too small by Allador · 2008-06-04 19:32 · Score: 1

Even mirroring is basically useless for reliability if the drives are from the same vendor unless you swap out the mirrored drive daily Can we please leave the ridiculous hyperbole at home?

Although what you say sounds reasonable in theory, it doesnt work out that way in the real world.

Even people using multiple deathstar drives didnt see consistent failure times.

Very very few drive failures are because of defects. The deathstar notwithstanding. Even that though isnt relevant because no one in their right mind would use a drive like that for something important. I mean ... it actually said on the box that it wasnt intended for 24x7 use.

Everyone that has released data and in my personal experience, hard drive failures are quite random and arbitrary. Even when a huge number of them is from the same manufacturer and model.

So please be careful what kind of mal-information you bring here, because someone might believe you and try to carry this bad information back to the workplace.

The risk level on RAID is quite simple. It's a ratio between expected drive failure rate and rebuild time.

There's a reason corporate systems still tend to build arrays out of 36, 75, or 150GB drives. It's because rebuild rates are reasonable (and also more spindles equals more speed). So if you ever see someone building a raid array out of 500GB consumer level drives, you know they are either:

1. Working with data that can take a much higher risk of multi-drive failure, since rebuild times are huge.

2. Working with multiple systems networked, and using replication between them to handle multi-drive failure risk.

3. Working with data availability needs that can stand going to tape if the whole array downs.

4. Idiots.

If your rebuild rate is more than an hour or so on very important data, then you use 10 or 6 or one of the higher raid levels.

So you plan what is the cost of losing a second drive while the first is rebuilding, and you design your storage around that.

Like everything else in IT, you pay for how much risk reduction you want.

In my experience, 2 drives failing in the same array close enough in time to lose the array is extraordinarily rare. However, sa user-error while trying to rebuild the array (ie, do it wrong and down the whole array, or pull the wrong drive) is fairly common.

If that happens, you go to your backups. If you cant afford the downtime by that, then you use a RAID level that you can afford and provides you the level of protection you need.

But RAID is not useless, and telling people that is just feeding bad information to people who dont know any better.

Re:Too small by TheLink · 2008-06-05 04:44 · Score: 1

Once you've failed to restore from a backup you realize the importance of working restores ;).

Best to realize a lot of things before they happen to you.

--

Too many replies beneath your current threshold

Relational databases by Anonymous Coward · 2008-06-05 06:45 · Score: 0

I have always been a user of relational databases such as DB2 and Oracle. I am interested in the changes that would take place.

Choppy ungrammatical speech needed editing. by Seor+Jojoba · 2008-06-05 09:58 · Score: 1

It often happens that people being interviewed talk ungrammatically. But usually, the interviewer fixes up the text, gets approval from the interviewee on the edited text, and something easy to read gets published.

This is almost certainly a verbatim dump of Brian Aker's speech, and I think it makes him sound like a scatter-brained schizophrenic. Which isn't really fair, because most of us would sound this way when giving interviews. Read TV closed captions without audio on news programs, and even press secreteries speak in these fragments. Aker, or whoever manages media contacts at MySQL should insist on proper editing for interviews.

Re:Too small by Tablizer · 2008-06-06 15:13 · Score: 1

Except that Bill Gates never said that.

Dammit, don't let reality ruin my fricken joke!

--
Table-ized A.I.

Slashdot Mirror

Brian Aker On the Future of Databases

175 comments