MySQL Clustering Software Launched

Posted by timothy on Wednesday April 14, 2004 @07:08AM from the englobulate-positively dept.

lawrencekhoo writes "MySQL AB announced yesterday that software for building a MySQL Cluster will be available for download by the end of April. Articles available from Computerworld, Internetnews, Linux Electrons, and PHP Architect. Great! Now my website can finally have 99.99% availability ..."

16 of 48 comments

More info... by Apiakun · 2004-04-14 07:13 · Score: 5, Informative
Here are some direct links to more information:
- The whitepaper
- FAQ
- Data sheet (it's a PDF)
Oh, and they say availability is 99.999%, not just 99.99% :)
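The gap between those two figures is bigger than it looks. Here is a minimal sketch of the arithmetic, assuming a 365-day year and counting all downtime (planned or not) against the budget. The function name is just for illustration:

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumption: a 365-day year (no leap day), all downtime counted.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for an availability %."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for nines in (99.99, 99.999):
        print(f"{nines}% -> {allowed_downtime_minutes(nines):.1f} minutes/year")
```

By that arithmetic, 99.99% allows about 52.6 minutes of downtime a year, while 99.999% allows only about 5.3 minutes.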
Re:set nitpicking = on by dacarr · 2004-04-14 07:33 · Score: 4, Insightful

It's PR. Remember, The SCO Group is "a leading provider of UNIX-based solutions", per many of their press releases. It doesn't make it any more acceptable, it's just a tactic. Chill.

--
This sig no verb.
Re:set nitpicking = on by Apiakun · 2004-04-14 07:34 · Score: 2, Interesting

It really depends on what the meaning of "is" is. Does popularity mean that it is the most used, or the most liked? I would think that popularity and usage are different metrics.
What about PG? by Anonymous Coward · 2004-04-14 07:35 · Score: 5, Interesting

I remember someone developing rather advanced multi-master replication and clustering for PostgreSQL. Does anyone know how far along that project is? Has it entered the testing phase yet?
From what I've read it looked very, very promising, but it doesn't do much good if it's only on paper...
set error_detection = on by jtheory · 2004-04-14 07:48 · Score: 4, Informative

Apples to oranges. The press release should have been more specific than just "database", but still... Berkeley DB is not a "database" as most developers think of the term (relational, accessible using SQL, etc.).

Berkeley DB is code that manages a data store, and you access the data using method calls within your app (you compile their code into your project), NOT using SQL, and NOT by connecting to an independent application. No remote access, no ODBC or JDBC, etc. Great product, but a completely different animal from MySQL and other relational databases.

In fact, MySQL used to offer Berkeley DB (as opposed to InnoDB, etc.) as a data storage option WITHIN the MySQL product.
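To make the "method calls, not SQL" distinction concrete, here is a small illustration that uses Python's standard-library `dbm.dumb` module as a stand-in for an embedded key-value store. It is not Berkeley DB, and its API is not Berkeley DB's; the function name and the `users` file name are hypothetical. The point is only that the store is a set of local files opened by your own process, with no server, no network connection, and no query language:

```python
# Illustration only: dbm.dumb stands in for an embedded key-value store such as
# Berkeley DB. It is NOT Berkeley DB; the names below are hypothetical.
import dbm.dumb
import os
import tempfile


def demo_embedded_store() -> bytes:
    """Store and fetch one record through plain in-process calls."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users")  # local files, no server process
        with dbm.dumb.open(path, "c") as db:  # "c": create the store if missing
            db[b"user:42"] = b"alice"  # write via a mapping/method call, no SQL
            return db[b"user:42"]  # read back the same way
```

Contrast this with a client/server RDBMS such as MySQL, where the application sends SQL over a connection (possibly remote, possibly via ODBC or JDBC) to a separate server process.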

--
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
In memory only? by diegomontoya · 2004-04-14 08:01 · Score: 4, Insightful

If this is the required deployment, then for people like us, whose DB size is over 20GB (and yes, the big blobs are already stored compressed), this would not be economically practical to use. Factoring in the OS and caching, I need to get 22GB of memory for each node? Last I checked, the 2GB chips are still nasty expensive.
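As a back-of-the-envelope check on the 22GB figure, here is a minimal sketch, assuming all table data must live in RAM and that each of `replicas` copies is spread evenly over `nodes` machines. When every node holds a full copy (as the diagrams are read later in this thread), `replicas` equals `nodes`. The parameter names and the flat per-node overhead are illustrative assumptions, not MySQL's official sizing rules:

```python
def ram_per_node_gb(data_gb: float, nodes: int, replicas: int,
                    overhead_gb: float) -> float:
    """Rough RAM needed on each node of an all-in-memory cluster.

    Illustrative assumptions (not MySQL's official sizing formula):
      * every byte of table data must be held in RAM,
      * each of the `replicas` copies is spread evenly over `nodes` nodes,
      * a flat `overhead_gb` per node covers the OS, caches, and buffers.
    """
    if nodes <= 0 or replicas <= 0 or replicas > nodes:
        raise ValueError("need 0 < replicas <= nodes")
    share_of_data = data_gb * replicas / nodes  # this node's slice of all copies
    return share_of_data + overhead_gb
```

With 20GB of data, 2GB of overhead, and two nodes that each hold a full copy, `ram_per_node_gb(20, 2, 2, 2)` gives 22.0, matching the poster's estimate. If the same two copies could instead be spread over four nodes, it would drop to 12.0 per node.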
1. Re:In memory only? by Unknown+Relic · 2004-04-14 11:21 · Score: 4, Informative
  
  I was wondering this as well. Also the FAQ mentions that "Data that needs to be highly-available must reside in the MySQL Cluster storage engine. If existing MyISAM and/or InnoDB data needs to be made highly-available then it has to be migrated to the MySQL Cluster storage engine." I'd assume that the clustered table types have support for transactions like InnoDB tables do, but there's nothing here to confirm this.
  
  From the way I'm reading it, this type of cluster would be most economically used in conjunction with a traditional replicated MySQL database. You would use the clustered engine for transactional tables where a large number of inserts or updates occur. For tables holding a lot of historical or read-only data, you would use standard replication, where you could tolerate a few minutes without the ability to insert or update should the master fail. To reduce the memory requirements for the cluster, you could also move old transactions from the transactional tables to historical tables that use InnoDB/MyISAM.
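The archiving idea in that last sentence can be sketched as one transaction that copies old rows into the history table and then deletes them from the hot table. This is a minimal illustration that uses Python's built-in `sqlite3` as a stand-in for both engines. In the setup described, the two tables would live in different MySQL storage engines, and the thread does not confirm whether one transaction can span them. The table and column names are hypothetical:

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical hot/history tables with a few sample orders."""
    conn = sqlite3.connect(":memory:")
    for table in ("orders_hot", "orders_history"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, created TEXT, amount REAL)"
        )
    conn.executemany(
        "INSERT INTO orders_hot VALUES (?, ?, ?)",
        [(1, "2004-01-15", 10.0), (2, "2004-03-02", 25.5), (3, "2004-04-13", 7.25)],
    )
    conn.commit()
    return conn


def archive_old_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows created before `cutoff` from orders_hot to orders_history.

    Both statements run in one transaction, so a row is never lost or
    duplicated if something fails halfway. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders_history SELECT * FROM orders_hot WHERE created < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders_hot WHERE created < ?", (cutoff,))
    return cur.rowcount
```

Run periodically, `archive_old_rows(conn, "2004-04-01")` keeps only recent rows in the memory-resident table, which is the memory saving the poster is after.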
  
  That being said, there must be SOME use of the disk on the cluster, because their recommended node system has RAID plus four 73GB SCSI hard drives... major overkill if everything except the OS/software is stored in memory!
You know you're a database geek when: by denubis · 2004-04-14 11:10 · Score: 4, Funny

You know you're a Database geek when you see the headline and immediately think: "Ah hah! Clustered indexes! That'll save some time during joins! Oh. Wait. They're talking about boxes. Drat."
drive usage and thoughts... by diegomontoya · 2004-04-14 12:22 · Score: 5, Informative

Nowhere did they mention battery-backed RAM modules as a recommended config, so I believe you're correct to assume that disk not only has to be used, but MUST be used.

Without RamSan-style battery-backed RAM, there is no way any enterprise would trust clusters of any kind to RAM-only storage for write commits.

It looks like each write transaction will be synchronized across all nodes, which would explain the gigabit and lower-latency interconnects. Still, this is crazy complex to make fast and reliable.

So to make it truly synchronized, they have to write to disk (for backup/log) before committing the data to RAM. So regardless, writes are slow, and I'm waiting to see how they bypass this disk write commit latency. Add to that the fact that they have to do this on all nodes before responding to the app, and writes are crazy slow, relatively speaking, since they can affect indices, force flushes of cached/RAM-resident data, etc. It would be interesting to see how they handle this.
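The "log to disk first, then apply in RAM, then answer the app" sequence described above is essentially two-phase commit. Below is a minimal in-process sketch of that idea. It is not a description of MySQL Cluster's actual protocol, which the announcement material discussed here doesn't spell out. Python lists stand in for each node's on-disk log, so nothing is really fsync'd, and the class and function names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica: a 'durable' log (a list standing in for disk) plus RAM data."""
    name: str
    fail_prepare: bool = False  # lets us simulate a node that cannot prepare
    log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, txid: int, key: str, value: str) -> bool:
        # Phase 1: record the intent in the log *before* touching memory.
        if self.fail_prepare:
            return False
        self.log.append(("prepare", txid, key, value))
        self.pending[txid] = (key, value)
        return True

    def commit(self, txid: int) -> None:
        # Phase 2: log the decision, then apply the change in RAM.
        key, value = self.pending.pop(txid)
        self.log.append(("commit", txid))
        self.data[key] = value

    def abort(self, txid: int) -> None:
        # Discard a prepared-but-uncommitted change.
        if self.pending.pop(txid, None) is not None:
            self.log.append(("abort", txid))


def replicated_write(nodes: list, txid: int, key: str, value: str) -> bool:
    """Acknowledge the client only if every node prepared; otherwise roll back."""
    prepared = []
    for node in nodes:
        if node.prepare(txid, key, value):
            prepared.append(node)
        else:
            for p in prepared:  # undo the nodes that already said yes
                p.abort(txid)
            return False
    for node in nodes:
        node.commit(txid)
    return True
```

On a real node, each `prepare` would be a disk write plus a network round trip, and the coordinator must wait for the slowest node before acknowledging. That wait is where the write latency the poster is worried about comes from. A production protocol would also log the coordinator's own commit decision so it could recover from its own crash.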

Also, I'm interested to see what kind of checking code/algorithm they use to tell which nodes are healthy and which ones are corrupt (not dead, since dead servers are the easiest to detect). From their diagrams, it looks like N-way replication, so each node is an exact synchronized duplicate of all the others. But how do you know for sure which one is the "safe" one when corruption happens?
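One common way to decide which copy is "safe" (not something MySQL AB describes in this thread) is to checksum each replica and trust the majority. Here is a minimal sketch, assuming each node can hand over its data as bytes and that a strict majority of nodes is healthy. Hashing an entire data set in one go is a simplification; real systems checksum individual pages or rows:

```python
import hashlib
from collections import Counter


def find_suspect_replicas(replicas: dict) -> list:
    """Return the names of replicas whose contents disagree with the majority.

    `replicas` maps a node name to that node's serialized data (bytes).
    If no strict majority agrees, every node is reported as suspect,
    because there is no way to tell which copy is the safe one.
    """
    if not replicas:
        return []
    digests = {name: hashlib.sha256(blob).hexdigest() for name, blob in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):  # no strict majority
        return sorted(digests)
    return sorted(name for name, d in digests.items() if d != winner)
```

With only two replicas there can never be a strict majority once they disagree, so the sketch flags both. That is one reason quorum-based designs prefer an odd number of copies.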

Also, I wonder how they handle gigantic inserts/updates such as "replace into table2 select * from gigantic_table1". They can't assume or dictate that we stick only to small write transactions, right?
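If a cluster engine did cap transaction size (the thread doesn't say whether MySQL Cluster does), one application-side workaround is to split a statement like the one above into many small transactions, walked in primary-key order. Here is a minimal sketch that uses Python's `sqlite3` as a stand-in for MySQL (SQLite also accepts `REPLACE INTO`). The `id` and `payload` columns and the default batch size are hypothetical:

```python
import sqlite3


def copy_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy gigantic_table1 into table2 in key-ordered chunks.

    Each chunk is its own small transaction instead of one huge one.
    Assumes non-negative integer primary keys in a column named `id`.
    Returns the number of rows copied.
    """
    copied = 0
    last_id = -1
    while True:
        with conn:  # one transaction per chunk
            rows = conn.execute(
                "SELECT id, payload FROM gigantic_table1 WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
            if not rows:
                break
            conn.executemany("REPLACE INTO table2 (id, payload) VALUES (?, ?)", rows)
        copied += len(rows)
        last_id = rows[-1][0]  # resume after the last key copied
    return copied
```

The trade-off is that the copy is no longer atomic as a whole: a failure midway leaves table2 partly updated. Rerunning it is safe, though, because `REPLACE` overwrites rows that were already copied.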

Cheap N-way synchronized replication is my holy grail, and probably that of most DBMS managers, so I'm crossing my fingers that MySQL gets this right.
Re:set nitpicking = on by jonadab · 2004-04-14 12:37 · Score: 2, Insightful

I think they're using "database" here to mean RDBMS. Technically a database is
just anything that organises data, so a filesystem would count, but that's not
how the term is generally used. Usually these days when people say database
they mean RDBMS.

The other thing is, "most installs" is not the only reasonable measure of
popularity. I'm pretty sure more people have daily interaction with MySQL
than with Berkeley DB directly. Berkeley DB is installed so widely because
it's been around longer and because certain key pieces of software depend
on or use it for historical reasons, not because people like it better.

Note that I'm not trying to say Berkeley DB is bad or anything, or that MySQL
should replace it; they're really quite different things, and they exist for
different purposes and fill different niches. I wouldn't consider them to be
direct competition really -- well, not mostly. MySQL is in competition with
PostgreSQL mainly, and to a lesser extent the major commercial database
offerings (Oracle, MS SQL Server) and various lesser-known projects (e.g.
Firebird SQL). Berkeley DB competes with, I think, certain GNU libraries and
maybe some other things I'm even less aware of. Not that MySQL and Berkeley
DB are in _completely_ different worlds; they both might reasonably be said
to compete on some level with SQLite for example, so there is some overlap
between their areas of application. But still, they're mostly not really in
the same category.

Sure, they're both databases. But to say one is more popular than the other
is like arguing whether traceroute is more popular than Mozilla. They are,
after all, both internet software.

--
Cut that out, or I will ship you to Norilsk in a box.
Node requirements by Anonymous Coward · 2004-04-14 18:22 · Score: 2, Insightful

The standard requirements for the node surprised me.

It states that you need 16GB of RAM! Why do they say that? Doesn't the amount of RAM depend on the size of your database? If my InnoDB database file is only 3GB, why would I need more than 4GB of RAM?

Also, why the hell would you need SCSI drives for an in-memory database?
1. Re:Node requirements by Daniel+Dvorkin · 2004-04-15 17:13 · Score: 2, Informative
  
  16 GB is the "preferred" requirement; the minimum is 512 MB. Quite a difference.
  
  --
  The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Does MySQL AB have credibility here? by Anonymous Coward · 2004-04-14 22:50 · Score: 5, Insightful

I mean, this is an enterprise-scale storage engine from the same engineering team that used to deride ACID transaction isolation and rollback as unimportant, and whose parser still silently ignores any attempt to use integrity constraints that aren't supported. Are these the right people to achieve the robustness that needs to accompany "five nines"?
1. Re:Does MySQL AB have credibility here? by ldspartan · 2004-04-15 04:39 · Score: 4, Insightful
  
  No, they're either morons or criminally ignorant of what is considered a standard feature set for RDBMSs. For all the reasons you mentioned and more.
  
  --
  lds
MySQL Cluster white paper by Vexware · 2004-04-15 06:36 · Score: 2, Informative

For the lazy among you (and lazy you have to be to find the task of entering a few fields in a form exhilarating), I have uploaded the MySQL Cluster white paper to another FTP site, where you can access the mirrored file: mysql-cluster-whitepaper.pdf (the document is a PDF file, so fear the Adobe Acrobat Reader loading time).

--
"Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect" -- Linus Torval
1. Re:MySQL Cluster white paper by Unknown+Relic · 2004-04-15 21:32 · Score: 2, Informative
  
  Note that they've now posted a technical whitepaper outlining the architecture which wasn't there yesterday. It's worth a read, goes into a lot more detail than what was there previously and talks about replication options, failure scenarios, etc. It mentions that disk storage is used in addition to memory storage, which confirms the speculation made earlier in the discussion, though it still doesn't explain exactly how disk storage is used.