UML, PostgreSQL Get Corporate Support
tcopeland writes "An article on NewsForge highlights some changes in the upcoming PostgreSQL release (v7.5) that are funded by Fujitsu. PostgreSQL core team member Josh Berkus says that "Tablespaces, Nested Transactions, and Java support" are being underwritten by Fujitsu; this has also been mentioned on the postgresql-hackers list. He also says that 7.5 will be "...the most significant new release of the software since version 7.0 almost four years ago". Good times for PostgreSQL users!" And ggoebel writes "Jeff Dike posted a notice to the UML [User-mode Linux] developers mailing list: 'The first bit of news is that as of last Monday, I am working for Intel. They
generously offered a full-time position, off-site, with my time mostly spent
on UML. This basically means that UML is no longer a part-time, after-hours
thing for me, so we should start seeing more work happening on it, especially
compared to the last month or two.'"
It's really the future of "shared" webhosting because it balances the power of a full server against the cost of a shared one. Some hosts like JVDS and RimuHosting are already doing this and it's great.
UML ....
(1) Unified Modeling Language?
or (2) User Mode Linux?
Methinks (2), given that I work a lot with (1) and have never heard of Jeff Dike
the primary DB System for so long has been MySQL. PHP coders don't have too much for an alternative
Au contraire, there are PHP interfaces for PostgreSQL, Oracle, Sybase, and MSSQL built right in to the source distribution. I seem to recall that back in the Bad Old Days before Mac OS X, when you had to compile things yourself, building PHP with all the necessary libraries was a huge pain, but now it's a trivial thing. Marc Liyanage maintains a PHP module package that snaps right into the built-in Apache web server on your Mac, and it already has most of the necessary bells and whistles built in.
...some on PGFoundry, some still on GBorg.
PLUG: For example, there's this little SQL query analysis utility!
"Tablespaces" allow you to put individual tables on different storage devices. Prior to tablespaces, an entire database had to be on one device*.
You are referring to two completely different technologies:
(1) "Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?
(2) "fragment tables across spaces" - By that you mean "Table Partitioning". That allows you to break up a single table across multiple storage devices. That would be very valuable technology, but as far as I know, won't make 7.5.
If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.
*: There are some tricks you can use if you need to move a single table to a different device prior to 7.5. I think symlinks work fine, but if it's important, I'd wait for 7.5 or ask on the -general list to make sure it's correct.
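For anyone wondering what using tablespaces looks like in practice, here is a rough sketch. It assumes the `CREATE TABLESPACE ... LOCATION` and `CREATE TABLE ... TABLESPACE` syntax that PostgreSQL eventually shipped; the tablespace name, directory, and table below are made up for illustration, and the helper functions are hypothetical (they only build the SQL text, they don't talk to a server).

```python
def create_tablespace_sql(name: str, location: str) -> str:
    """Build a CREATE TABLESPACE statement pointing at a directory.

    The directory would normally live on the storage device you want
    the tables to end up on.
    """
    # Double any single quotes so the path is a valid SQL string literal.
    escaped = location.replace("'", "''")
    return f"CREATE TABLESPACE {name} LOCATION '{escaped}';"


def create_table_sql(table: str, columns: str, tablespace: str) -> str:
    """Build a CREATE TABLE statement that places the table in a tablespace."""
    return f"CREATE TABLE {table} ({columns}) TABLESPACE {tablespace};"


if __name__ == "__main__":
    # Hypothetical names: 'fastspace' and the mount point are assumptions.
    print(create_tablespace_sql("fastspace", "/mnt/fast_array/pgdata"))
    print(create_table_sql("access_log",
                           "id integer, hit_time timestamp",
                           "fastspace"))
```

You would paste the printed statements into psql (as a superuser for the first one). Treat it as a sketch and check the docs for your version before relying on it.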
...If you want to manage a lot of UML virtual machines, I _highly_ recommend UMLazi. It has a very slick configuration file format (configuration directories instead of a single file, which makes it really easy to manipulate with scripts), and they've obviously put a lot of thought into security.
I had a few problems getting it started, but the developers were very helpful.
"Tablespaces" allow you to put individual tables on different storage devices. Prior to tablespaces, an entire database had to be on one device*.
Strictly speaking, that's not true. You can move things around manually, and some have done so, but it's not pretty, not easy, and not easy to maintain. Implementation of tablespaces in PostgreSQL simply allows its users to easily do what was previously an arcane-voodoo art. So clearly, it's a big step up. But, you already knew that.
"Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?
That's correct. AFAIK, there is no desire to implement raw partition support. The speed difference is minimal and the required code is large. Basically, you wind up writing a FS and associated buffer management into the database. The return generally is not very high. It used to be, many years ago. These days, filesystem technology and implementations are plenty fast. Those that want raw partition access, IMO, are simply living in the past.
If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.
You are correct. According to the list, the numbering constantly goes back and forth. From what I gather, they are waiting to see what features actually make it in. Depending on the scope of changes, they'll then determine the version number. As a rule of thumb, people are calling it 7.5, simply because nothing else has been blessed.
Please don't think I'm correcting what you've said. You've said nothing that I disagree with. I'm simply adding a followup remark.
Cheers!
7.5 will contain a native Windows port with no external dependencies. You can find the current binary version here.
Even though it is currently in beta it works very well. The port is now being downloaded over 2000 times a week and increasing all the time.
Looks like version 7.5 will also include a native Windows port. Prior to this, PostgreSQL on Windows has always required Cygwin (which offers a lot of great stuff in and of itself) to run.
> For the uninitiated and lazy, is there any compelling reason why that's better than putting the
> database files on a RAID and letting the OS split the table across devices?
Sure, you might want to distribute your data across multiple arrays. For example, keep your logs and temp space on a fast & expensive RAID 0+1 array of fast (15k) drives. Then put small OLTP stuff on another RAID 0+1 array. Then put your huge graphic images, documents, etc. on a much more economical RAID 5 array.
I use multiple arrays all the time for performance and economics (in DB2 & Oracle), so it's cool to see Postgres pick it up.
However, for larger or more complex systems there are some advantages to splitting tables over multiple disk systems. For example, tables with lots of little niggling disk writes (access tables, change logs, temp tables) can go on a fast (possibly striped) disk system. You don't have to waste high-priced, high performance RAID on archived data (if it crashes, restore from tape), or on large media files etc stored as blobs or clobs.
These are just examples, but on a large server with several different disk systems available, this technology lets the database designer match storage system performance characteristics much more accurately than a simple RAID.
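To make the idea of matching tables to storage concrete, here is a small sketch of the placement rule described above: write-heavy tables go to the fast striped array, archival and large media tables go to the cheap array, and everything else stays on the main RAID. The tablespace names (`fast_striped`, `cheap_archive`, `main_raid`), the `TableProfile` class, and the table names are all invented for illustration. The generated statements assume the `ALTER TABLE ... SET TABLESPACE` form that PostgreSQL later shipped; nothing here connects to a database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TableProfile:
    """Hypothetical description of how a table is used."""
    name: str
    write_heavy: bool   # lots of small writes: access tables, change logs
    archival: bool      # archived data or big media blobs


def pick_tablespace(table: TableProfile) -> str:
    """Choose a (made-up) tablespace name from the table's usage profile."""
    if table.archival:
        # Restorable from tape, so cheap storage is acceptable.
        return "cheap_archive"
    if table.write_heavy:
        # Many small writes benefit from the fast striped array.
        return "fast_striped"
    return "main_raid"


def placement_sql(tables: List[TableProfile]) -> List[str]:
    """Build one ALTER TABLE statement per table."""
    return [
        f"ALTER TABLE {t.name} SET TABLESPACE {pick_tablespace(t)};"
        for t in tables
    ]


if __name__ == "__main__":
    profiles = [
        TableProfile("change_log", write_heavy=True, archival=False),
        TableProfile("media_blobs", write_heavy=False, archival=True),
        TableProfile("orders", write_heavy=False, archival=False),
    ]
    for statement in placement_sql(profiles):
        print(statement)
```

A real DBA would decide this from measured I/O patterns rather than two booleans, but the shape of the decision is the same.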