Open Source Databases Revisited
pusakat writes "If you've been following performance comparisons of the different Open Source databases, Tim Perdue revisits PostgreSQL v. MySQL with production data from SourceForge and comes up with interesting results. This may be fodder for yet another 'my database is better than your database' exchange from both camps but the results are interesting anyway."
Indeed, I'm working on a crisis for a customer who runs a large web site with three Linux/Apache/PHP app. servers hitting one database server (via ODBC) running our DBMS.
Guess which component is failing?
Oh man, yet another case of Slashdot putting up a comparison that can't possibly be made, but it will generate a lot of traffic and bring down servers to make it look like this is the happening place when it's just a geek version of Jerry Springer. This survey has been an old story on the Oracle newsgroups for a week and it's not resolvable, just "We're better than you" followed by "no, we're better, biatch." If you use these systems, you know what happens... different products are work better under different circumstances, nothing new. You are starting up a website, have no money but lots of know-how and want to get something up and running until you get loaded with cash (okay, dream). Right, you download MySQL for nothing and get up and running. You are getting a huge number of hits from tons of users (row locks become a little bit of an issue, right?) and you have some money, you get set up with Oracle. Which is better? I hit a website (a newcomer site, not up and going for very long) Friday night to place an order and for the second weekend in a row (okay, it was a food delivery place - I have no life and order in, what's the surprise?) I was returned a MySQL error. But you know what? A week ago I got multiple ORA-00020, and ORA-00018 errors trying to access my work late at night (and guess whose product that is). Lot's of factors come in, not the least of which is the talent running the whole show. It seems to me like this is a big picture issue, and isn't it good for us all that there are free alternatives that get us through testing and maybe even up and running, but realistically, how far can something free take us? Then if our idea works, and the users come, we pony-up the dough for something else. That seems to work. So what does everyone think?
"My mother works for Microsoft now. A whole other cult."
Either im in class, at home studying, restoring my truck,
designing someone's page, setting up a router,
or fixing some lemming's (like yourself) machine because
"The browser crashes when they get online" bullshit.
My machine stays online all day. I don't bother hanging up the modem.
You are a complete bastard child, I would assume you are no
older than 13, living with your mommie. She fucked
some bum, and you have no idea who the hell your father is.
You know what? Im gonna go dig up your grandmother's grave
and let SIGNAL 11 have his way with her. In fact, ill make sure
he digs up your grandfather too.
ETRN x
Sorry kat, i have more brain in my pinky than you do in
your whole body.
ETRN x
You can compile postgres yourself and the compiler will optimize for your processor.
"Fat, drunk, and stupid is no way to go through life."
"You're gonna need a bigger boat." - Chief Brody
Quite. Let's see how well these "open source" databases do in TPC-C and -H.
Here's the specifications for Interbase: http://www.interbase.com/open/research/tech_specs. html
Maximum size of database: 32TB using multiple files; largest recorded InterBase database in production is over 200GB
Maximum size of one file: 4GB on most platforms; 2GB on some platforms
Maximum number of tables: 64K Tables
Maximum size of one table: 32TB
Maximum number of rows per table: 4G Rows
Maximum row size: 64KB
Maximum number of columns per table: Depends on the datatypes you use. (Example: 16,384 INTEGER (4 byte) values per row.)
Maximum number of indexes per table: 64K indexes
Maximum number of indexes per database: 4G indexes
"Fat, drunk, and stupid is no way to go through life."
"You're gonna need a bigger boat." - Chief Brody
On another subject, did you drop all the users from blacknova.net? My user (tolket) is gone....
Is that the official release or is that still in beta ? I heard they plan to do it, but with those versions I worked with, didn't support it.
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
I copied the following message from http://www.phpbuilder.com/annotate/message.php3?id =1001760:
The opensource Interbase is alive and well: At: http://firebird.sourceforge.net
http://www.ibphoenix.com
http://www.interbase2000.com
And in newslists at http://www.mers.com
Generally it's called Firebird, and Inprise don't want to know or link to us.
On another subject, did you drop all the users from blacknova.net? My user (tolket) is gone....
Laugh, I'm not in charge of that, though I'm one of the OpenSource developers.. Yes, the DB was completely flushed... You need to create a new character.. Read the forums.
-Michael
-Michael
Your comment minimizes the performance issues. You can create a beautiful, 3rd normal form database and find that it is slow as hell when you try to read or write to the schema.
Most real-world databases require a whole lot more {tables}.
For a web site, pages need to return in 1-2 seconds, but if your 16 table join takes 1-2 seconds, you're in trouble because your response time now depends entirely on the network conditions.
Theoretical ideals of relational theory applied to databases are fun to talk about and design, but the real-world dictates some compromizes. Often, denormalization is the first compromize. This reduces the number of joins required to read read the database. It also speeds up writes because you have to issue fewer SQL statements.
Later.
"Fat, drunk, and stupid is no way to go through life."
"You're gonna need a bigger boat." - Chief Brody
I've evaluated several free (as in I can make money with it without paying anyone a dime) DB servers for a small startup company, and Interbase so far looks by far the best. Not only are the features attractive, it also has more name recognition than MySQL or Posgres, and for some people that's important. We're using it particularly from Delphi apps, and support for that is great. It's amazing that the entire server can be configured to take less space than the BDE itself.
Anyway, I'm not saying that Interbase is an Oracle or MS SQL killer. But for a lot of clients and a lot of applications it's a suitable choice, and the client saves a stack of money to boot. Add to that the fact that you can deploy it on Linux, and for cost conscious customers it's essentially a no-cost solution. Just buy the box, the rest is free.
Your database software failed. I lost all my data, but I can at least sue you and get some money. My business might be ruined, but hey I can cash out from winning a law suit against you. Of course I have to pay the lawyers and might have no income while the suit goes on but who cares.
Trying to be cynical ? Probably. But your logic fails utterly: In a mission critical environment you sure as hell want a database vendor, who:
is likely to provide professional, contractual support on various levels
is not likely to go away next week
who can provide an annual report indicating the amount of $ invested in R&D
ultimately who is accountable.
I'm aware that you can purchase professional support with PostgreSQL (which is likely better then what the big 5 have to offer), but frankly, being responsible for a project I wouldn't bet my carreer on it.
I never said that I necessarily agree with this mindset, but from a (project) management's perspective it's understandable. If I invest $200 000 000 into a mission critical system, the possible savings by using open source products in core elements (database, middleware) is probably not an option.
Now for the rest:
Reliable Database How do you determine reliable ? For millions M$ Access is the idea of a reliable database. Tell you what: It's not! Further disks will and go inevitably bad. Even if you have redundant disk subsystems a logical error or a pointer going haywire may have been written to disk.
good backups We certainly don't have an argument here. However, if you can't afford down time or you can't afford even five minutes of lost data this might not be enough.
Security Of course, but it's incredible difficult to provide good security. Especially in a multi tiered environment. Granted, that has nothing to do with the fact if a database is open source or not.
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
Well, the effect of caching query plans, or pre-parsing or pre-compilation will depend heavily on the query. For a query doing many joins that return little data using tables with appropriate indices, optimization time most likely will be an expensive component of the total time of query execution.
On the other hand, a quick query on a single table might spend a fairly high percentage of the execution time in the parser, because the optimizer has few options to explore and the resulting execution is likely to be fast.
Preliminary tests by the person who put together the current test implementation of the query cache code reports about a 3x improvement in the execution of simple queries.
Here's a reason why your test of precompiled functions might not help as much as you'd expected: though the function itself is precompiled, the query you use to reference the function is not. The time required to parse and optimize this query might be enough to overcome the fact that the function itself is pre-compiled.
Also, you talk of "stored procedures". I don't use PG's stored SQL functions, only PL/pgSQL. Stored SQL functions might not be pre-compiled, I can't say one way or another.
I'm not sure exactly what you mean by "raw calls to table operations". I do know, though, that a PG plan consists of a tree of operations like "merge join", "indexed scan", etc. These are "table operations" and they're "raw", but they're complex table operations.
Sorry, Mr. AC, but ORacle 8i definately, and without a doubt, is Object Oriented. Read up, bucko.
like your post couldn't be dismissed as a troll from the title alone...
"If you think education is expensive, try ignorance" - Derek Bok
Linux is great for manufacturing.. I work for a mid-sized auto remanufacutuer, and we use postgresql all over the place, to do inventory, and core checkin. postgres has made huge strides in the last few years.. we started out with an old pentium running 6.3 postgres.. it was slow as dirt.. we are currently running 6.5.3, on a 600mhz alpha system.. and will probably move to 7 one of these days.
you have a budget? what's that? *smirk* I'm lucky if i get $50 for a stack of CD-R's.
There used to be an 8K limit on row-sizes.
The author of the article says its going away next release.
I think the lack of tools, row-level locking, transaction support, etc. in MySQL are a shame. Yet, people talk like its the best thing since sliced bread. This has lead me to discover two kinds of MySQL users:
I noticed, though, that it would not understand insertion of large text object data within an sql INSERT command as opposed to MySQL.
Postgres supports Large Objects but I've never had to use them. In fact I'm not even sure how to use them. :-)
You're forgetting Informix which runs many enterprise-class installations, 24/7, OLTP, Tbytes of data, all that rot.
--
As a matter of fact, I am a lawyer. But I play an actor on TV.
Did anybody else notice that PostgreSQL actually improved performance with more concurrent users ?
I didn't notice that. In fact, I noticed the opposite (which is exactly what was expected.)
What I noticed was that as you increased the number of clients, the total number of requests served increased, while the number of requests per client was lower - which is exactly what should happen - the performance decreases as the number of clients increases.
The problem is that you don't know how to read a graph.
Try looking here . Version 7.1 should allow you to insert large (>8kB) amounts of text without having to use the large object interface (i.e. a simple INSERT/SELECT/UPDATE will do). If you're planning on implementing large object storage under PHP, read here about the pg_lo* functions, it has some useful examples.
That's bullshit. First of all, jsut because you're doing open source stuff, and have never even seen a real enterprise app, don't tell me that one terrabyte is 'enormous'. It's fairly standard. I've worked on several projects that had databases this big.
Secondly, Oracle IS object oriented. You can use eithe ra relational or object-oriented structure in Oracle.
Thirdly, I've never even heard of these DB's that you mention. Do they even support databases of this size?
It's got both stored procedures and triggers. Some people I know that moved from Interbase to MSSQL have liked both features better on Interbase. Of course, MSSQL comes with a lot of eye candy such as Enterprise Mangler. Also, a lot of people consider Interbase one of the most rock solid db servers out there. I'm running one setup at home using Interbase 6 on RedHat 6.2 on a 486DX2/66 with 32MB RAM. Performance is quite decent for 3-5 users. The limiting factor is the slow-ass drive in there (250MB IDE). More RAM and a faster drive would grease it up.
Not necessarily. Database design is one of a very few true black arts (read: serious voodoo.) The number of queries alone isn't a problem. The layout of the data, indexing, number of queries, complexity of queries, and the volume of returned data all factor in.
Let me provide a real world example. Some years ago, it was my task to roll out RADIUS for an ISP's dialup network. The stock setup for the USR RADIUS server -- it was the only one I could get to talk to a SecurID server correctly -- was, shall we say, non-optimal. USR's stats said one (1) authenication per second when using postgres. Let's ignore the way they had postgres setup (fsync enabled) for the moment. The reason everything was so bloody slow was the database schema. Even though there was only 1 query against the USERS table, it had to return 5k. Each row of the USERS table contained everything you could ever imagine (well, that USR could imagine.) Each row consumed almost 5k of space.
After three months of testing, data collection, and analysis, I redesigned the mess. USR literally laughed at my design because it made between three and six queries depending on the user. They stopped laughing when I showed one of their engineers the setup actively handling over 100 authentications per second. The USERS table, holding 10,000 entries, was about 2M.
Uhhh...you can use Oracle 8i for DEVELOPMENT for free for as long as you want, or use an EVALUATION copy for 30 days.
... when you deploy, you must pay. Read your licence, please. While this wouldn't bother me a lot, it might bother Larry's lawyer's, and if your resulting site's visible I suspect they'll notice. Since you've posted your URL on slashdot, the odds that they'll notice may be substantially greater than it was until you posted :)
But
The resulting lawsuit might not be terribly pleasant.
While you're free to do what you want, and while Oracle's not known for going after small-fry and seems to be much less obnoxious than MicroSoft in terms of license enforcement, folks should at least know that what you are proposing is a violation of your Oracle license.
At the risk of being redundant - deployment or other non-development, non-evaluation use = $$$.
actually, in the mysql fulltext search docs the mysql guys do a decent job of detailing the _two_ options for querying the fulltext index. putting the "match (a,b) against ('words')" in the "select" list returns the relevance rank against all rows in the table, including those with zero relevance. if you put the match clause as part of the "where" clause then mysql returns only those rows with non-zero relevance, ordered by the relevance score (but the relevance score is not one of the return columns).
- mark
1. PostgreSQL does run under NT. It runs under cygwin, and I have no personal experience running it under NT, but it does run under NT (and, yes, I do mean the server, not just various clients).
2. In PostgreSQL, writers never block readers, so your statement to the contrary is a factual error. InterBase and PostgreSQL use a very similar paradigm for the storing of actual data, and InterBase's "Multi-Generational" stuff is equivalent to PostgreSQL's "Multi-Variate Concurrency Control", or "MVCC" for short).
Again, I'm not disparaging your use of InterBase, a fine product that hopefully will survive the current political machinations that surround it. But I can't let your misconceptions about PostgreSQL pass without comment.
But "nonzero relevance" doesn't mean "contains all terms"! Go back to the docs and take a careful look at the examples they give.
The only use I've found for the full text search is in ordering items found by other means. Marginal utility, at best.
It would also be nice if they gave a little formula somewhere for the "relevance". What's it really measuring?
--
Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
Does anyone know when the PostgreSQL 7.1 beta is supposed to be released?
I had not considered the cygwin option and I concluded that PG does not run on NT natively when I went to download a binary distribution. Have you tried it under cygwin? Is there a performance hit because PG would run in the cygwin emulation, rather than native?
I was unaware of the MVCC of PG because the NT part was the first issue, so I did not explore use of PG after I didn't find a binary NT version.
Thanks for the corrections!
"Fat, drunk, and stupid is no way to go through life."
"You're gonna need a bigger boat." - Chief Brody
Well, I just recently attended the Boston.pm group meeting which this time around was hosted by NuSphere. These are the folks that are doing MySQL modifications for their release of MySQL/Perl/Apache and PHP.
The big news was that a re-worked version of the *Progress* database back-end would be turned into an open-source MySQL-compatible back-end called Genie. This would be pretty huge, as Progress is everything that people complain that MySQL isn't. Also, MySQL is adding (either in their next release or the one after that, depending on the feature):
* Subqueries
* A boolean type
* Foriegn key constraints
It's good to see that Postgres has shown you can include real RDBMS features and still be faster than MySQL.
Was the author using BDB tables? If so, that's probably the problem. MySQL allows you to use BDB tables, which support transactions, but they are much slower. It is also possible that transactions, which are somewhat new to MySQL, are not yet well optimized. Either optimize your database to make transactions unnecessary, or use postgresql, at least for now. Why MySQL made the mistake of supporting transactions, I will never understand.
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
Which of them is most stable (with HUGE databases) ?
What does HUGE mean? Gigabytes? Terabytes? Exebytes? Also, size is not the only metric involved in stability. Stability with a given number of records or fields, for instance. A 100 gigabyte database could have 10 records or a million records. Depends on how big each record is. Another metric is the number of simulatenous requests it can handle. One might work fine if it gets a 100 requests at a time, but chokes if it gets a 1000 requests at a time. A million record database might be only accessed by 2 clients at a time...size doesn't equal number of users.
Which of them is most likely to be able to recover withouth glich from an "impure" shutdown ?
Good point, but they're probably about the same in that regard (just taking an educated guess here).
Which will benefit more from a HUGE ram quantity or multiple cpus ?
Most well-written software will take advantage of as much RAM as it can get. The question would be which program is better at managing its own memory?
As far as mutiple CPUs go, programs have to be written specifically to support multiprocessing. On Linux or Windows NT, for the most part, this means writing the code to use multiple threads. I'm not sure if either of them are or are not, but that would be a very good point.
My journal has hot
What are you talking about? The linked article has a date of 20001112. At the top of this (new) aticle, the guy refers to a previous article he wrote back in July...
rm -rf / is the evil of all root
but how does SourceForge function when it has pages with 16 queries, crashes with 30 simultaneous clients, and serves .77 pages/s with only 5 clients? I'm no web/database guru but that kind of performance seems crazy. I guess those projects can't be all that busy or the site would be locked-up all the time? or what?? Judging by the performance of this column at phpbuilder.com maybe that site has some performance issues as well. I guess my question is, should I be using these sites to learn PHP/Database programming? Do these performance problems show actual problems with the database server or with the database schema and program designer(s)?
As I said, maybe I'm ignorant and missing something as I am new to this type of development...
The link to the supposed IBM study returns an error. Which is too bad, because I'd love to critique the study - Tim Perdue's benchmarking work isn't the first to show that MySQL's table locking paradigm breaks down under high load. Remember that he's been studying alternatives because SourceForge, using MySQL, has had problems in this area. And remember that Slashdot has helped fund the integration of the Berkeley DB backend and MySQL for the same reason.
It's not all myth, folks. Table locking sucks, page-level locking (Sybase) is much better and row-level locking (Oracle, Interbase, Postgres) much better still in high concurrency systems.
I don't know about MySQL, but from what I can tell psql for postgreSQL is fairly full featured in terms of wrapping SQL in functional language-- and comparing the minimal work I've done in GUI DB tools to the experiences I've had working in command line or Perl with pgsql, I'd take the non-gui process any day, it's much more direct, flexible, and doesn't suffer from the same tool-specific learning curve (i.e. solid Perl + solid SQL = extremely portable skill, whereas any of the listed commercial tools may or may not be portable-- this is important for both ramp-up time on new experiences/jobs/whatever, and for personal development or career building). As for the tools you mention, do any of these function via ODBC, and therefore have the capability to connect to any database with drivers? Not to say that you don't have a valid point, I'm just wondering if there isn't a good reason why the Free Software types haven't embraced GUI and ungeek-friendly tools...
I do not have a signature
I wonder what www.phpbuilder.com uses, Postgres or MySQL, because right now, its slow as molasses.
I haven't had time to evaluate it myself, either. Anybody out there used it?
Interbase is free, it is not restricted by the GPL.
Interbase has been released under a variant of the MPL that "protects" the Interbase name (IOW, only Interbase may release certified Interbase builds). Anyone may grab the code and use it whichever way they like.
This gets right at the heart of why good Oracle DBAs often pull salaries in excess of $150K. At the end of the day, speed and reliability count for a lot more than other peoples' preconceptions.
Remember that what's inside of you doesn't matter because nobody can see it.
We were using Informix at my workplace for a while; in less than a month the thing completely "melted down" and dropped all of our data. The suits were in a panic.
:)
We switched over the Postgres in one swift maneuver, and it's been running great ever since. Speed was never in question; we just needed a database which wouldn't drop all its data one night while no one was looking.
Besides which, the ultra-cool pg_dump command makes backups a no-brainer.
Kudos to the Postgres team. And my company won't be dump enough to try to use proprietary software again in the future, that's for sure...
I've been using Pg since the 6.4 series of builds, and about a month and a half ago switched to 7.0.2. It is worth the upgrade IMHO, but of course since you're relying on it for critical business data you'll probably want to test it some first... ;-)
Oh, and if you don't have a budget how come you have a 600mhz alpha? 600 amd mhz is pretty cheap, but my impression is that 600 Alpha mhz are very much not... :-)
--
News for Geeks in Austin, TX
That may be correct. I don't remember off-hand which product it sprang from but they've taken vastly divergent paths since then.
--
As a matter of fact, I am a lawyer. But I play an actor on TV.
Well, the printable version is one page, read it.
.sig --
--
That license is the most insanely tortured piece of legalese I've seen in a long time. How is it useful to call something Open Source when, after reading the license, I have almost no clue what I'm allowed to do with the software? Is this not proof that Open Source is a phrase that is too vague to be useful, since so many seem to feel the need to write these kind of "Open Source" licenses that confuse the issue of user freedom to a point beyond which it becomes almost unrecognizable?
I do not have a signature
I'm not saying that 1TB and greater DB's are not out there, I know one individual who designed a multi TB database for some phone company. These, as the previous poster mentioned, are specialized. You are probably a consultent in a specialized market without even realizing it. The majority of the market does not see databases this size yet.
Cheers
"A few great minds are enough to endow humanity with monstrous power, but a few great hearts are not enough to make us w
I'd love to use db.linux on a couple of projects but this limitation is a killer. The complete license is at License Terms.
Please note that this benchmark (and any benchmark, for that matter) applies only to the system and application that it's testing. Database applications vary so widely that it's very difficult to get any meaningful numbers outside of very specific areas.
As an example, I recently ran some tests that came up with the exact opposite results. My application is a large message tracking database. Query speed is secondary; insertion speed is critical. MySQL handles my test data set in 20 minutes; PostgreSQL takes over 3 hours (!). For this application, PostgreSQL is Right Straight Out.
Other notes:
1. PostgreSQL's tables took up roughly twice the space of MySQL's.
2. MySQL's lack of transactions is a real pain. I can, however, work around it (in this particular application, at least).
3. MySQL's "text indexing" is useless. The evaluation function returns every record that contains any of the search terms; there is no way I've found to require all search terms. No documentation, of course.
4. The latest beta of MySQL can use Berkeley DB tables to get real transaction handling. Unfortunately, this is even slower than PostgreSQL.
Obvious conclusion: Run your own tests and draw your own conclusions.
--
Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
I'm pretty certain few people actually think MySQL is a wonderful DBM that's the best thing since sliced bread. Most are keenly aware of its deficiencies but defend it because of two features that are invaluable to a successful open-source product:
There's no doubt that PostgreSQL is a good DBM, but MySQL will remain more popular until the support for it arrives.
Ita erat quando hic adveni.
the url you posted is wrong, please post the correct URL, I do like to read IBM's report.
thank you.
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
pgadmin - 32bit app for win*
phpPgAdminweb interface admin. This is a port of phpMyAdmin for MySQL (which I believe can be found at zend.com)
I'm pretty sure k5 sets a limit of <100 queries on a page before they start worrying about performance. At least that's what Rusty said if I remember correctly. I'm guessing they probably run about 10-20 queries on most pages, however.
My site runs about 10-20, but thats only after caching lots of data in memory.
--
Mysql uses multi-threading (each connection gets it's own thread), but I've read slashdot posted benchmarks that show Mysql failing after so many threads are spawned via concurrent connections(I think 60). Perhaps they're using some finite resources and the thread spawner was not using semephores or some other such resource control.
Postgres uses multi-processing (each connection gets it's own process) with shared memory segments and a controlling/monitoring process (postmaster).
Thus if Mysql could do a better job of managing resources under heavy loads, then it would waste less physical memory. Additionally, I don't know how good Linux is with multi-threading under heavy loads. Can it faithfullly handle hundreds of potentially thousands of concurrent/competing threads? I've never seen any tests to show this... I'm aware that the default max processes/threads was hard-coded in the kernel to 512 at one point (I believe it requires a recompile to change). This might have something to do with mysql's crashing.. You'd have to go to single-user mode to max this puppy out. Additionally I don't know if it's possible for an app to determine how many more threads can be spawned by the OS.. If a thread fails to spawn, then you can corrupt the entire process.. In MP, if you fail to spawn a process, then you simply lose that connection.
In general, I'd prefer MP on a Linux system. It's a similar story with Apache. You get a lot of reliability, and not _that_ much memory waste when compiled correctly. Additionally, MP has the potential to span computers (as in a slasdot article on a multi-hosted extension to Linux)
-Michael
-Michael
How would these 2 compare on a low end machine?
Around 200mhz / single processor?
Since i run a small site that uses MySQL, Apache/Php4,
I would like to know. I might convert the server over to
postgres if it would show performance above mysql on
this type of machine.
I don't think it would make much difference, accept
that it would give me more flexability with my site
and the same/ or better speed. Most of my
traffic is on FTP anyway.
ETRN x
The link you included to IBM returns a 404. Got another one?
Yes, this is a plug for something that I am working on, but it's something that we are very proud of and which is going open source on x86 GNU/Linux.
Check out GT.M at http://sourceforge.net/projects/sanchez-gtm. This is a vetted, industrial strength high speed transaction processing database that is in production worldwide at a number of large (and small) banks, and has been benchmarked at 1,200+ online financial transactions per second on a high end RS/6000 (http://www.sanchez.com/news/pr000828.htm), every transaction having full ACID properties.
It's not open source yet, but it will be soon (other niceties such as a project home page are not quite there yet because we're busy scrambling to get a release out). Meanwhile, a binary can be downloaded, as can PDF manuals.
GT.M also has some unique database replication capabilities for creating an application that is continuously available 24x7, even if the data center is blown away or even as the underlying database schema is upgraded. Read the manual for details.
What's missing? SQL/ODBC/JDBC access requires a commercial product (an open source freeware solution is not yet available).
Actually, I think "Exebyte" is a company that manufactures 8mm tapes and tape drives. Tee hee!
--Be human.
If it's not free then it is not open source.
But do realise that you can pay money for software which is open source (or free -- they're the same thing).
You're a suburbanite.
PostgreSQL, as distributed, comes configured with an 8KB blocksize, which effectively limits the size of varchar to a bit under that. However, you can configure it to use a blocksize of up to 32KB, which means you can handle the largest string Netscape will return from a textarea widget.
Among many other enhancements, PG 7.1, due for beta very shortly and the version Tim Perdue reviewed in his article, removes this restriction. You can have varchars (and other large types) of virtually any length - though at some cost in performance. Some types (like text) can also be compressed much as InterBase does with its text type.
Their HTTP serving seems to be a quick as can be, it's only while attempting to access download.sourceforge.net that it becomes unusable.
The wheel is turning but the hamster is dead.
The wheel is turning, but the hamster is dead.
No they do not.
this means writing the code to use multiple threads.
No it does not. It is the path of least resistance, though. Unfortunately, it tends to be the buggiest path, as well.
--
--
Eat right, exercise regularly, die anyway.
Do either MySQL or PostgreSQL include the ability to use stored procedures? I've found them to be a life saver on SQL Server and wouldn't mind trying open source/gpl'd databases if they had that feature.
However, it does make good use of postgres' triggers and someone with some time could enhance it to make it useful.
In particular, someone should enhance it to make the "noise words" list selected out of a database table, instead of hard-coded in the C-code of the.
Actually, this probably means that the database was very well-designed. I have seen way too many databases that consisted of one to three tables. Most real-world databases require a whole lot more. Maybe you should also look at the page he is talking about before you comment. That page has a lot of functionality.
Engineering and the Ultimate
Just use the appropriate ODBC driver (myODBC for example) and keep using your tools. Works fine.
I/O Error G-17: Aborting Installation
Since you bring this up, you seem to bring it up as if you would like to use it. Open and free software come as a result of programmers wanting to scratch their itch, if you have this itch, why don't you start the project? What are you waiting for? You can do it, even if you are not a great coder, start something, look at slashdot, look at where it is, it didn't come from a great codebase, but it is here because it was started. Scratch that itch! :)
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
Nonsense. One page may well be doing some extremely complicated processing, which has evolved over many iterations of the code - it's not hard to imagine that such a page could have 16 queries in it, especially if it's being written in an oo style. Sure, in an ivory tower you could spend days / weeks refactoring and cut down the number of queries a bit, but computer time is now significantly cheaper than programmer time.
Oh, easy. I'm an ASP/COM/Oracle developer. I jsut got finished converting a state's Medicaid system from Mainframe to Web-RDBMS based. The Medicaid claims filled approx 1.5 terabytes, and grew at a rate of 100 gig/month.
Such things exist, and you, Sir, are talking out of your hat.
Pray tell me your criteria for deciding that it was 'very badly designed', and how you would do better.
You can get Oracle 8i for free from oracle, and while its a bear to setup, I've found it to be faster than PostGRE 7.0 in a lot of ways ... of course, you'll make up for that with all the money you'll spend in training costs ;).
Finally, a SQ..., *cough* I mean, sequel to the the last database article.
What *I* noticed is that in one graph the number of request *per*second* increseased for PostgreSQL. And this does not look right, so the original poster is quite right.
.. With some substantive data information, we could find out what the reasons for the results were, but right now, all we have is conjecture.
Sorry, No.
Yes, the number of requests-per-second increased (by a large amount at first, then leveled off after you hit 20 clients) but keep in mind that you're MAKING MORE REQUESTS. If the database is properly scalable, the number of simultaneous connections will increases until the DB hit's its' processing limit, and then level off, which is exactly what happened in the graph on pages 3 and 4.
Think about it - (on page 3) the DB is capable of handling (approx) 3 "complex queries" per second - it doesn't matter how many clients are making those requests... now, if your clients take too long to process the data, and (with 5 clients) only request 1.5 pages per second, then that's all that the DB will serve.
The initial scaling period (from 1.5 to 3) tells me that at 5 clients, the DB is waiting around, and that the web server (or whatever is processing the returned data) is the bottleneck - you increase the number of clients, the DB sever has less waiting around to do.
The problem with the article is that there is no information on the queries themselves, or how he conducted the tests (did he run each one several times and average the results? what were the test systems?, etc)
One of the things not often mentioned with multiprocess vs multithreaded applications is the penalty you must pay for keeping the cache on multiple CPUs sync'ed when dealing with multiple threads. Also with linux threads are scheduled in the exact same fashion as processes so you gain even less.
Now on an OS like Solaris, you might gain from multiple threads, but only because Solaris' scheduler can tend to suck hairy nuts at times...
What does HUGE mean? Gigabytes? Terabytes? Exebytes? :^) The SI prefix you're looking for is exa. On the other hand, perhaps exe is handy as a "computerization", along the lines of the "K" for 1024 in KB (SI uses "k" for 10^3)... More information about SI prefixes is available here. Or, maybe your fingers slipped and I'm just being more nitpicking than usual. ;^)
Ah, "exebytes". That's the unit of choice for measuring the size of executables produced by that big & rich company, right?
main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
paraphrased: Dumb newbies use Mysql, but us smart people use postgres.
This is insightful? Surely it should be moderated as flame bait.
Sigs are awesome huh?
Even though I haven't tried, I wouldn't recommend it.
My general rule is: don't use stuff made for Unix on NT, unless you're just messing about, looking at features.
One of the main reasons is often stuff made for Unix assumes forking isn't that expensive. But that is not true on NT. Forking in NT is a real killer. It's almost like double clicking and launching an app. You'll have to wait for the app to do threads _well_ first.
anyone looked at SAPDB by SAP ?
They claim Enterprise strength, Free and source available April 2001
Binaries available now at www.sapdb.org
The functional needs of an application (and users) should drive the technical design, and scaling is just another challenge to be met after addressing functional requirements.
I just came off a site design that has 10+ complex queries per page, and this was after extensive optimization. Mind you, the pages have extraordinary, dynamic, user-responsive functionality. If we needed 16 to deliver the needs, then so it would have been.
As for database scaling, there are many options for clustering and replication and multi-processing. Your own words suggest you don't think in this realm. My world is perhaps not so simple as yours.
You speak with such authority, and yet, you are plainly presumptuous, uninformed, and hasty in your conclusions. Listen more. Talk less. THINK.
>Did anybody else notice that PostgreSQL actually
>improved performance with more concurrent users ?
>The only explanation here is the caching
>behavior of the data base.
No, the other explanation here is that the test was run on a quad processor machine!
Perhaps PostgreSQL can perform simultaneous queries in different threads?
Tom
Tom
I have discovered a wonderful
Well, if you have 4 GB of RAM and 50 GB of disk space, you might be able to find out. :-)
Paul
Stability from an impure shutdown? This is definitely a win for MySQL. Postgres has completely self-destructed for me a handful of times when the machine it was on hard-locked. The only recovery method is to create a fresh database from a backup dump.
Huge RAM and multiple CPUs? This is probably a function of the OS more than anything. MySQL is limited more simply because of its locking problems, right? So no matter how many CPUs you have - only one can update a given table at a time. That's a pretty important limitation if you have a database that has any updates simultaneously with selects.
Why? The answer is simple: load controlling. Basically, as concurrency increases, the cost of locking and that of buffer pool misses become prohibitive and transactions begin to starve. So most "smart" DBMSes will queue a few transactions for later processing after a certain point.
Because MySQL uses such coarsely-grained locks (at the table level), the opportunity for concurrency is very low to start with; hence the decreased throughput and starvation.
If you're interested in this stuff, look at Jim Gray's seminal paper "Granularity of Locks and Degrees of Consistency in a Shared Database" and Chou and DeWitt's "An Evaluation of Buffer Management Strategies for Relational Database Systems." You should be able to find either from the ACM Digital Library or in an anthology (like Stonebraker's _Readings in Databse Systems_).
~wog
Essentially what that test showed was that mysql had to lock all those tables in order to do the join.. And that with Postgres's versioning system, it was a trivial matter.
I remember reading on the topic of excessive joins in a Sybase Admin book once (excelent book on "practical" database, since it shows you exactly how one company solved all the DB problems). It essentially said that determining join order was actually a very slow process.. Exponential in fact.. Especially if your DB tries to determine which indexes to used based on statistics... The selection of indexes increases the number of possible variations incredibly. Sybas's solution was to only compare a select few of the possible join methods - which is a trade-off.. Alternatively, if you were to create a stored procedure, then which-ever method was best at creation time will be used from there-on-out...
Stored procedures - in Sybase at least - reduce a great deal of the overhead, especially in a 16-way join. Sadly, I don't know if Postgres does ANY pre-compiling of stored procedures. They have two main methods: straight named 'sql' statements, and 'psql' (a true stored procedure 'language'). My guess is that they do little more than pre-parse the statment (if even that). If this is the case, then using a stored procedure might even be slower than raw SQL since a call requires the parsing of the function name, then loading of the function into memory from a table, then the parsing of that statement (two more steps than raw processing). It is, however, necessary for triggers and rules.
Postgres (and I believe mysql) has raw-c extensions, but I really dislike these for general DBA operations (not to mention I haven't spent the time learning them).
As for the 16-table join, a general web page probably shouldn't do this (especially a heavily used page), but a database SHOULD at least let the DBA perform the ad-hoc query when trying to extract information. The only real solution is summary information, which often tends to be more trouble then their worth... Another solution is de-normalization (which makes me feel dirty)... Basically like what Blacknova.net uses... If you have 1 to N data, then just pick some upper bound N and include that many extra columns directly in your main table. Fast and efficient, provided that you choose the right N.
-Michael
-Michael
> What I noticed was that as you increased the number of clients, the total number of requests served increased, while the number of requests per client was lower - which is exactly what should happen - the performance decreases as the number of clients increases.
What *I* noticed is that in one graph the number of request *per*second* increseased for PostgreSQL. And this does not look right, so the original poster is quite right.
> The problem is that you don't know how to read a graph.
Compare graphs on page 3 and page 4. Then come back explaining us how *you* read those graphs.
Interpreting profiling results is a very difficult task. In that case, I suspect that it is impossible without getting the hands on the test configuration to try various alternative loads. Results are (IMHO) too strange to be accepted without further investigation.
Cheers,
--fred
1 reply beneath your current threshold.
Also, the article clearly states that these are worst-case scenarios, chosen to test the database, not the PHP code.
Isn't Informix based on an old version of Postgres?
Engineering and the Ultimate
The online PostgreSQL book is ~490 pages and the latest online MySQL manual I have is ~460, which is comparable. But MySQL also has an >700 page "New Riders" book and an >460 page O'Reilly book. Other than the online text, I haven't found any other PostgreSQL books, which is unfortunate.
Python, in particular, has PostgreSQL bindings but they don't yet conform to the 2.0 DB spec, which is a pain. In general, the PostgreSQL modules seem older and less maintained than the MySQL modules, which is both a cause and a symptom of support problems.
Again, I see nothing technically wrong with PostgreSQL. But if it's going to get more popular, I still believe it needs more support in terms of documentation and languages.
Ita erat quando hic adveni.
I have been unable to locate binaries for Windows NT/2000? Does anyone know where/if such a thing exists? I would like to evaluate PostgreSQL for our project instead of SQL Server 7.0 or 2000. I would also appreciate comments anyone has on such a move.
Thanks in advance!
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
On the other hand, if you throw gobs and gobs of memory at your db, shouldn't it be able to handle the caching for you?
Subject says it all.
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
pgaccess.
Tcl/tk and in the distribution
Rocks
realkiwi
This is absolutely not true. There are three Python 2.0 DB compliant PostgreSQL drivers. PyGreSQL has just added 2.0 compliancy, PoPy is 2.0 compliant, and there's another one that also purports to be 2.0 compliant.
Apart from Momjian's excellent online manual there is still the existing PostgreSQL documentation. Everything from basic queries to creating your own types, to hacking out extensions in C are covered by this documentation. MySQL may have more books about it, but most of the information in these books will be duplicate.
The fact of the matter is that PostgreSQL has more features than MySQL, is more standards compliant (SQL-92), is released under a more liberal license (BSD style versus GPL), and it is now even starting to outperform MySQL for simple selects and joins.
The reasons for staying within MySQL's limited abilities are getting smaller and smaller. Why not just use PostgreSQL in the first place? Eventually every successful database project is going to want the features that PostgreSQL has, why start with MySQL and face the prospect of migration later.
I'm sorry but I think this grossly misses several points and undermines the appropriate difference between threads and processes.
One way to think of threads is as a solution to bloated processes. In operating systems like Solaris, which are designed to scale to 32 or 64 (or more) processors, the number of locks in the kernel is enormous. As a result, context switches and process creation time tend to be higher. In gross terms, you pay for the concurrency at high numbers of processors with poor performance at lower numbers of processors. This is not necessarily bad (especially if you plan to use 16 or more processors at some point) but it is just the way things are.
Linux, on the other hand, penalizes scalability for large numbers of processors in order to get much better performance with smaller numbers of processors. Linux does this because Linus doesn't believe that 16+ processor machines are common or sensical and that the kernel should be optimized for common (and sensical) cases.
Why do I mention all of this? Because Linux process creation times are slightly faster than Solaris thread creation times. Anyone who fetishizes multi-threading versus multi-processing doesn't really understand the difference between the two and when it really matters.
Processes are contexts of execution. Threads are contexts of execution that may share a memory space with other contexts of execution. These are fundamentally different things in some operating systems. In Linux, a thread is just a process that shares a memory space with another process. This is because processes are *fast* to create and switch to in Linux (optimized for the common case, remember?).
Anyway, I don't mean to bash maraist at all, just to point out a common set of misconceptions about these things. When considering these architectures, each database system must try to optimize across all of the operating systems they plan to be deployed on (just like the apache project, which i think was an excellent analogy).
I work for a large manufacturing company, and recently we submitted our budget for next year (2001)... well, on there we had SQLServer, and a bunch of other commercial software. We got our budget back in its final form, and it did NOT include our SQLServers... we needed a database system, so I threw together a proposal for some smaller MySQL servers.
My manager liked the idea, but wanted info on other open source databases... I pointed him towards Tim's article (this was last week) and we've been on our merry way since. We now have both MySQL and PostgreSQL boxes in production for testing, and will make our determination later on!
Tim's article played a HUGE roll in our initial impression forming, and I would recommend it to all and any who are looking for info on open source databases!
-C
"This above all, to thine own self be true"
Postgres has FTI.. Well sorta, it's a hack included with postgres but it works reasonably well.
They should plunder ispell for ideas.
Which of the two is the more widely used? All you hear is that MySQL is the best. Seems that this is not the case...
There are methods for determing (theoretically) nice ways of breaking up tables to eliminate redundancy and ensure lossless joins. Look up Boyce-Codd Normal Form, 3rd Normal Form, et cetera.
Of course, if your DB gets real large, you may not want to do joins anyway.
Paul
We are a medium sized company developing web apps. We are not poor, but we would be if we ruled out open source software. We use db2 for most work, but postgres is being investigated and we have found it quite workable for our many needs.
Your comments remind me of similar ones a few years back when we started using linux and ppl said "serious companies don't use linux" Nobody (except a small-minded minority of change-haters) is saying that now.
Is this for the support that you get, or are these genuinely better at handling large databases (>100Gb)? Any pointers to stats on MySQL/PostGRE handling large databases?
Anyone else find it ironic that in the published report the graphs look a lot like they were done in M$ Excel (at least to me it looks that way)? I guess it still shows that even us open-source pundits find it convenient to stick with commercial apps sometimes. One might have tried putting the benchmark data into either the MySQL or Postgres open-source DB, then whip up a bit of Perl or Python or something and fire it through some open-source tools for graphics. Does anybody doing this sort of thing on a regular basis have any suggestions?
Anybody want a peanut?
The only explanation here is the caching behavior of the data base. Howerver, this also indicates that benchmarking databases is not a really trivial task, because exactly such effects must be considered for database benchmarks.
Further, besides a few graphs. The test says actually nothing different then: PostgreSQL improved strongly, while MySQL is a dog. Actually it says nothing at all.
Not that I mind the results. I worked with both databases briefly and believe that PostgreSQL is far closer to an industrial strength database (Lack of transaction control disqualifies MySQL for that in the first place). Nevertheless, I think the results really lack any significance.
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
- Which of them is most stable (with HUGE databases) ?
- Which of them is most likely to be able to recover withouth glich from an "impure" shutdown ?
- How much disk space can they waste in tablespaces with frequent spaces ?
- Which will benefit more from a HUGE ram quantity or multiple cpus ?
--
1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
> Well, if you have 4 GB of RAM and 50 GB of disk space, you might be able to find out. :-)
The article said the hardware was a Quad-Xeaon with 1Gb of RAM. I beleive that ORACLE may be able to get past the fabulous 3 request per-second MySQL / PostgreSQL did...
Cheers,
--fred
1 reply beneath your current threshold.
I don't agree with that statement. In cases where you need data using 16 different queries, no doubt many of those could have been cached for a huge speedup. I've written code for back ends for several major sites and I've seen very complex funcationality indeed where only a few queries were needed per page - most data was pulled from caches. Also, OO helps you write code where you can fetch data lazily. For instance in a discussion forum, you might keep all root-level messages headers in a cache, the 1000 last accessed bodies in a cache, 10000 last accessed headers in cache, parent-child relationships in cache.. In this way, you don't have to go to the database 16 times every time users come to the front page of the discussion forum and are served the exact same page - 30 times per second.
Like you said, computer time is now significantly cheaper than programmer time. Well, memory is also cheap. Keep stuff in memory and use good old abstract data types to get your data fast.. RDBMS's are very overused these days.
Perhaps the database was well designed from a data modelling point of view but not from a performance point of view. A good database design for a web site that needs to scale is such that it allows you to cache the data that is used most often. On a web site that gets 30 hits per second but where data only changes about once per 30 seconds, one should start thinking why the data is pulled from the database every single time. User profiles on personalized pages can be cached, data can be pre-fetched with one query, cached and then accessed from the cache when needed instead of doing several queries for a smaller result set etc. etc.. 16 queries is insanity!
Commentary like this disgusts me, because the writer makes it clear he doesn't know anything about Postgres but merely presumes it's a poor implementation.
For the record, PL/pgSQL queries are pre-compiled, which is why there's no dynamic SQL capability in the language. You can create dynamic SQL from one of the other languages (pltcl, plperl) which exposes the SPI interface to the query engine to the user.
Work is under way to allow optional caching of individual query plans (i.e. by explicit statements), which allows one to avoid parsing and optimization overhead. When completed (7.2?) this will be great in the web environment.
I know little about Sybase so will avoid speculation about that RDBMS, might you be so kind as to avoid speculating about Postgres as well?
As a followup idea.. 16 joins reminds me of a project that I did once where we used ID's for every table.. Thus every table stored meta data and a single piece of actual data.. Thus if you had 17 pieces of data to display, then you'd have to do 16 joins. There were three solutions to this:
First, cache all the smaller tables, and do the translation (from say user_id to user_name) in an external hash (I know perl can easily do this; don't know about php). For static fields such as user-names, email addresses, etc, this is easy. Unfortunately this doesn't help when trying to find all users with email addresses going to aol and souce code that's over-due, etc. But it does solve the summary report problem.
Second is to break up the query into multiple operations.. You need an incredible amount of memory to do certain types of joins, and sometimes the sql-processor guesses the wrong method. Even though postgres's "explain" command told me it was doing hash-lookups for a user-name summary, I found that it was faster to first get all the data with user-id's, then perform a second search with
"select user_name from users where user_id in (x,y,z,....)"
So long as you don't have more than a couple dozen entries, this works nicely.. ESPECIALLY if you have thousands of rows of real-data with only one or two unique user-names. This was a nice-short term fix, but "clean" sql should all be done on a single statement.. Soo onto the third
Denormalize back to form 4 or 3. Namely, whenever you have a user-name, don't use a user-id.. Sure you consume 8 more bytes of memory per row, but it greatly speeds up searches. The method is to ONLY use ID's when you can not uniquely represent a primary key with a single column.. (such as first-name, last-name,...).. But in our computer world, "user-name" tends to be unique. "project-name" should be unique, "working-branch" should be unique.. In which case, at the design level, it's worth the extra consumed space.. Now I know that Mysql advocates (including me a year ago), cringe at the idea of keying off variable length strings.. For me, it was the idea that you can compare integers much faster than strings.. BUT, if the string is an index, then your sorting is going to be MUCH faster, strings distribute much better than auto-incremented integers (which tend to want to be linear). Additionally, the fact that you won't have to perform a join at all removes almost all of the performance problems.
When you don't use meta-data as primary keys, then you find that you only perform joins when you actually have relational data... That joins based on summary data almost always goes away.
Additionally, if your DB ever get's corrupt (say you lose a table), then you don't have to fret over "who the hell was user-id 155?". Given that Mysql doesn't have roll-backs, the potential for data corruption with meta-data is enormous (when you do both reads/writes).
-Michael
-Michael
You obviously don't know the meaning of an OO database. Oracle is a relational database, end of story. Just because their dopey little GUI says "Create relational structure" does not make it a relational database.
What? Can you explain this to those of us who aren't as smart as you think you are? First of all, Oracle is an object relational database. Second, you first claim that Oracle is relational, and then go on to tell us that the GUI(as if anyone uses the crummy GUI that comes with Oracle) does not necessarily make it a relational database. Hello? Don't you have recess or something now?
My other
Actually, you are quite wrong. PostgreSQL has more documentation, I know this cuz I have combed every possible link on both site. Go back to PostgreSQL page, there is a full book free on PostgreSQL on their documentation page. Also, PostgreSQL has bindings for a zillion languages as well, what do you want? Perl? C? Python? C++? more? Before you post, please update yourself before you post your misinformations. I like both database, and I use both. :)
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
First, I started off with the words "Sadly, I don't know".. Therefore removing myself as an expert in that area. It was more a question than anything else.
As for the other languges, perl, tcl, etc, those can only be less optimial forms, and are more for procedural operations, which has nothing to do with over-comming the 16 join problem. (With the exception maybe of hashed-lookups, but that's dangerous)
It's great to hear about cached query plans (I'd forgotten the term).
As for the problem with speculation.. If nobody were to speculate, and those in the know don't impart their knowledge, then has anyone gained anything? 'Qualified' speculation does exactly as here; Inciting those knowing specific facts to quality errors. My speculation _was_ based on emperical analysis, however, so it wasn't as if I was blindly stateing something.
But now, as a question, is the compiled sql stored in any more optimal fashion? Or does postgres use an interpreter (which does little more than tokenize). To me, a compiled sql statement means that its making raw calls to table operations..
Simple benchmarks that I've done haven't shown any performance increases with using stored procedures, but that might just be because the compilation / interpretation stages were insignificant with respect to the overall operation. The other interpretation is that little is actually gained by the compilation of the sql statement (at least until the cached query plans come around).
-Michael
-Michael
Long live Linux! Ahem... Of course, for that system I ended up with SQL Server, because it needed to be Windows. Maybe one day...
Naturally, if I'm wrong and someone does have some stable binaries, I'd be most interested.
Open source databases -- great. But how about tools to facilitate database development? In commercial terms you've got modeling systems (e.g.: Erwin), Extract-Transform-Load (ETL) tools (e.g.: Informatica, Sagent), Reporting tools (e.g.: Brio, Business Objects, etc.). GUI managers, GUI drag-n-drop development tools, etc.
In the open-source world you've got... Well you've got to custom code in some non-database-centric language, e.g.: C, C++, Java, Perl, etc.
Anyone want to start up an open-source alternative for an ETL tool? Target open-source databases as source and target systems. A usable open-source system would really raise the visibility of open source in the (hitherto) commercial-only data warehousing market.
"But actually trying to use m4 as a general-purpose langage would be deeply perverse" --ESR
No way. 'Large' is a terabyte or bigger. 100 Gig is no big deal.
Also, thanks to the remarkable synergy that MySQL has with the commercial Website's staff (thats what my marketing manager says, anyway;), PostgreSQL will have difficulty generating the same loyal following. Lest we forget, technical superiority is only a very small part of the battle, you must also win Mind share. PostgreSQL is the BSD of the database world - limited to an irrelevent core market segment.
KTB:Lover, Poet, Artiste, Aesthete, Programmer.
KTB:Lover, Poet, Artiste, Aesthete, Programmer.
There is no
Anyone who has designed a site and its database in such a way that generating a page requires 16 queries should not taken seriously when conducting a database performance test. Still, it does seem to suggest that MySQL does not perform too great (which doesn't come as a surprise to me really.. All I'm saying is that a site with a page that requires 16 queries is *VERY BADLY DESIGNED* and will not scale very well at all!
Centura Software makes an open source mobile database software called db.star and db.linux
db.star is the first open source, embeddable data management solution for Information Appliances (IAs). "It is a high performance, small footprint, developer friendly database engine which facilitates the development of rich and powerful applications for popular IA platforms, such as the Palm® Computing Platform and Windows®-powered Pocket PC. "
Open source but not free.
db.linux is OSS Data Management for Information Appliances
Open source and royalty free.
The page currently linked to refers (and links) to the database benchmarking article as "my July Article". Way to stay recent, Slashdot.
--Carl
PostgreSQL is faster the MySQL for this guy.
But, as always in the case of database server choices, nothing replace real world test. You should test the various servers on *your* data.
Frankly, I found the numbers quoted in the article ridiculously low. But as we don't know te volumetry of the data, the hardware used, the database cache size, etc, etc, this boils down to 'someone found PostgreSQL better than MySQL for its usage'.
It is not uncommon to have orders of magnitude performance differences between databases. I would love to know how what a tuned ORACLE would get.
Cheers,
--fred
1 reply beneath your current threshold.
Thanks for your experiences. I may yet collapse and simply use Linux for a test RDBMS server, but I suspect my client will opt for the perceived safety of SQL Server 2000. That's OK, it'll look good on my resume! ;+)
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
Scenerio:
Your IIS has just been hacked ( no surprise ). The hacker
has used a common exploit to gain access to your server,
and M$ will not release a fix untill 2 weeks later. You have
10,000 users depending on that server for every day work.
The hacker is getting in again, and you have 5,000 users
online. NOW FIX YOUR WEBSERVER !!!
Answer: You can't fix it.
Real life situation here funny boy. Id like to see
you fix this since 3 MCSE's can't.
You are a joke.
ETRN x
After having used MySQL for quite a while I played with Postgresql and was quite impressed. I noticed, though, that it would not understand insertion of large text object data within an sql INSERT command as opposed to MySQL. I can't find comments about this anywhere... Does anybody know if it is something you wouldn't (shouldn't) do for some reason? Or is it another limitation of Postgresql?
Also, what is the real value of scalabilty for Avergae Jos User? It's relevant for a phone company billing system where the CDRs come flushing in by the millions per hour. But if you require that horsepower, you certainly don't lack resources to spend mucho bucks on a commercial database (from a company you can sue if something goes wrong), the iron to run them all (with a few nifty EMC disk arrays) and serious consulting.
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
I should say at the outset: I've used both Postgres and MySQL on production projects and like both for very different reasons. But here's the thing (and this is a common problem with technology product evaluations): these products have very different design goals and aren't really that comparable.
I'm not saying the standard "MySQL's not a database because it doesn't support transactions and databases have to. ACID!" (although I'm sympathetic to that point of view, I don't think individual words like "database" are worth fighting over--If MySQL wants to call itself a database, fine). What I am saying is that Postgres was designed to be a full-fledged SQL92-compliant database with transactions and triggers and foreign keys and the whole lot. MySQL was designed to be a SQL-based front-end to a very fast database file format. These products are not the same and comparing them without agknowledging that seems foolish.
I've been very pleased by the speed improvements in Postgres recently (partisan testing aside, Postgres 7 really is much faster). I've also been impressed by feature additions in MySQL (although it still isn't close to what you would expect to get if you're coming from the Oracle or DB2 world). But both remain inherently died to their design goals. This isn't a bad thing, at all, because different projects need different products with different design goals.
I compare this to the (often senseless) comparisons of NetBSD, OpenBSD and FreeBSD. One is designed for portability, another for security (features be damned!) and the last for performance, features and multi-processing under (primarily) intel. Although they come from a common code base, they have obviously diverged in design goals. Instead of bashing one product or another (all of the *BSD's and the two databases discussed have *major* problems that are bashable) use the one that seems most appropriate to your needs.