UML, PostgreSQL Get Corporate Support

Posted by timothy on Thursday July 1, 2004 @06:20AM from the what's-good-for-'em dept.

tcopeland writes "An article on NewsForge highlights some changes in the upcoming PostgreSQL release (v7.5) that are funded by Fujitsu. PostgreSQL core team member Josh Berkus says that "Tablespaces, Nested Transactions, and Java support" are being underwritten by Fujitsu; this has also been mentioned on the postgresql-hackers list. He also says that 7.5 will be "...the most significant new release of the software since version 7.0 almost four years ago". Good times for PostgreSQL users!" And ggoebel writes "Jeff Dike posted a notice to the UML [User-mode Linux] developers mailing list: 'The first bit of news is that as of last Monday, I am working for Intel. They generously offered a full-time position, off-site, with my time mostly spent on UML. This basically means that UML is no longer a part-time, after-hours thing for me, so we should start seeing more work happening on it, especially compared to the last month or two.'"


UML is pretty awesome by Anonymous Coward · 2004-07-01 06:22 · Score: 3, Informative

It's really the future of "shared" webhosting because it balances the power of a full server against the cost of a shared one. Some hosts like JVDS and RimuHosting are already doing this and it's great.
1. Re:UML is pretty awesome by Anonymous Coward · 2004-07-01 07:01 · Score: 1, Informative
  
  linode.com offers UML-based hosting as well. [No, I neither work there nor use them.]
2. Re:UML is pretty awesome by Anonymous Coward · 2004-07-01 07:01 · Score: 1, Informative
  
  ... also Linode.com, which has the largest deployment of UML.
clarification please... by Anonymous Coward · 2004-07-01 06:23 · Score: 3, Informative

UML ....

(1) Unified Modeling Language?

or (2) User Mode Linux?

Methinks (2), given that I work a lot with (1) and have never heard of Jeff Dike.
UML by Un+pobre+guey · 2004-07-01 06:25 · Score: 5, Informative

OK, UML is User Mode Linux. Got it. No, no, I'm not confused, I get the coincidence with the other extremely widespread use of the acronym. No prob, Dude.
UML by lorcha · 2004-07-01 06:25 · Score: 2, Informative

Who the hell is Jeff Dike and why is he working on the Unified Modeling Language? And why does Intel care about it?
Oh, you meant User-mode Linux? Well, why didn't you say so? Sometimes I think these writeups are intentionally confusing.

--
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent
Re:Good to Hear... by Twirlip+of+the+Mists · 2004-07-01 06:36 · Score: 4, Informative

the primary DB System for so long has been MySQL. PHP coders don't have too much for an alternative

Au contraire, there are PHP interfaces for PostgreSQL, Oracle, Sybase, and MSSQL built right in to the source distribution. I seem to recall that back in the Bad Old Days before Mac OS X, when you had to compile things yourself, building PHP with all the necessary libraries was a huge pain, but now it's a trivial thing. Marc Liyanage maintains a PHP module package that snaps right into the built-in Apache web server on your Mac, and it already has most of the necessary bells and whistles built in.

--

I write in my journal
Good tools out there for PostgreSQL.... by tcopeland · 2004-07-01 06:37 · Score: 3, Informative

...some on PGFoundry, some still on GBorg.

PLUG: For example, there's this little SQL query analysis utility!

--
The Army reading list
Re:Table spaces? by jadavis · 2004-07-01 06:41 · Score: 5, Informative

"Tablespaces" allow you to put individual tables on different storage devices. Prior to tablespaces, an entire database had to be on one device*.

You are referring to two completely different technologies:

(1) "Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?

(2) "fragment tables across spaces" - By that you mean "Table Partitioning". That allows you to break up a single table across multiple storage devices. That would be very valuable technology, but as far as I know, won't make 7.5.

If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.

*: There are some tricks you can use if you need to move a single table to a different device prior to 7.5. I think symlinks work fine, but if it's important, I'd wait for 7.5 or ask on the -general list to make sure it's correct.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Table spaces? by rgigger · 2004-07-01 06:43 · Score: 2, Informative

No, it does not write directly to the disk cluster, if you mean that it can write to a raw unformatted disk. They want to get all of the nice buffering for free from the OS because they can't beat its performance yet. Writing directly to the raw disk would slow it down right now. They are going to reconsider this if someone can write a caching system that can beat the OS, but so far that hasn't happened.

They also do not have table partitioning. It has been discussed and it is a high-priority feature, but it doesn't seem like anyone has seriously tried to tackle it yet. I'm guessing that it will be on the radar for the next release though.

Tablespaces basically just let you partition your db across different volumes, but a single table cannot be split up.

I am not a developer but this is what I have gleaned from the hackers list.
More servers running PostgreSQL... by tcopeland · 2004-07-01 06:56 · Score: 2, Informative

...can be found on the Big List Of GForge Sites.

Props to Tim Perdue for picking a solid database on which to build GForge!

--
The Army reading list
User-Mode Linux Management by Anonymous Coward · 2004-07-01 06:59 · Score: 5, Informative

...If you want to manage a lot of UML virtual machines, I _highly_ recommend UMLazi. It has a very slick configuration file format-- configuration directories instead of a single file, which makes it really easy to manipulate with scripts--, and they've obviously put a lot of thought into security.

I had a few problems getting it started, but the developers were very helpful.
Re:Table spaces? by GooberToo · 2004-07-01 07:26 · Score: 4, Informative

"Tablespaces" allow you to put individual tables on different storage devices. Prior to tablespaces, an entire database had to be on one device*.

Strictly speaking, that's not true. You can move things around manually, and some have done so, but it's not pretty, not easy, and not easy to maintain. Implementation of tablespaces in PostgreSQL simply allows its users to easily do what was previously an arcane-voodoo art. So clearly, it's a big step up. But, you already knew that. ;)

"Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?

That's correct. AFAIK, there is no desire to implement raw partition support. The speed difference is minimal and the required code is large. Basically, you wind up writing a FS and associated buffer management into the database. The return generally is not very high. It used to be, many years ago. These days, filesystem technology and implementations are plenty fast. Those that want raw partition access, IMO, are simply living in the past.

If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.

You are correct. According to the list, the numbering constantly goes back and forth. From what I gather, they are waiting to see what features actually make it in. Depending on the scope of changes, they'll then determine the version number. As a rule of thumb, people are calling it 7.5, simply because nothing else has been blessed.

Please don't think I'm correcting what you've said. You've said nothing that I disagree with. I'm simply adding a followup remark. ;)

Cheers!
windows port by MagicMerlin · 2004-07-01 07:26 · Score: 3, Informative

7.5 will contain a native windows port with no external dependencies. You can find the current binary version here.

Even though it is currently in beta it works very well. The port is now being downloaded over 2000 times a week and increasing all the time.
Also in PostgreSQL 7.5 - Native Windows Port by john_smith_45678 · 2004-07-01 07:26 · Score: 3, Informative

Looks like version 7.5 will also include a native Windows port. Prior to this, PostgreSQL on Windows has always required Cygwin (which offers a lot of great stuff in and of itself) to run.

--
John Kerry is a Joke!
Google didn't exist when user-mode linux started by mec · 2004-07-01 07:53 · Score: 2, Informative

Jeff Dike started user-mode linux in February 1998

message from jeff

Unified Modelling Language may have existed in early 1998; I first saw it in April 1999. But Unified Modelling Language was a lot smaller back then.

And Google did not exist in February 1998!

These days, when I need to name something, I stick the name in google and check for conflicts.
Re:Table spaces? by kpharmer · 2004-07-01 08:11 · Score: 4, Informative

> For the uninitiated and lazy, is there any compelling reason why that's better than putting the
> database files on a RAID and letting the OS split the table across devices?

Sure, you might want to distribute your data across multiple arrays. For example, keep your logs and tempspace on a fast and expensive RAID 0+1 array of fast (15k) drives. Then put small OLTP stuff on another RAID 0+1 array. Then put your huge graphic images, documents, etc. on a much more economical RAID 5 array.

I use multiple arrays all the time for performance and economics (in DB2 and Oracle), so this is cool to see Postgres pick it up.
Re:Google didn't exist when user-mode linux starte by red+floyd · 2004-07-01 08:13 · Score: 2, Informative

Rumbaugh, Booch, and Jacobson started on UML in the mid 90s.

According to this, UML 0.9 was from 1996, UML 1.0 was 1997.

--
The only reason we have the rights we have is that people just like us died to gain those rights. -- Cheerio Boy
Re:Table spaces? by Java+Ape · 2004-07-01 08:29 · Score: 3, Informative

The answer to this question depends on what your database looks like. For most small, general purpose databases the RAID approach is great. Fast, simple and not much planning required.
However, for larger or more complex systems there are some advantages to splitting tables over multiple disk systems. For example, tables with lots of little niggling disk writes (access tables, change logs, temp tables) can go on a fast (possibly striped) disk system. You don't have to waste high-priced, high performance RAID on archived data (if it crashes, restore from tape), or on large media files etc stored as blobs or clobs.
These are just examples, but on a large server with several different disk systems available, this technology lets the database designer match storage system performance characteristics much more accurately than a simple RAID.
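
The placement idea described above can be sketched as a small script that emits DDL. This is only an illustration: it assumes the `CREATE TABLESPACE ... LOCATION` and `ALTER TABLE ... SET TABLESPACE` statements that PostgreSQL eventually shipped with tablespaces, and every tablespace name, directory, and table name below is hypothetical.

```python
# Hypothetical storage tiers: tablespace name -> directory on a given array.
TABLESPACES = {
    "fast_raid10": "/mnt/raid10/pgdata",
    "bulk_raid5": "/mnt/raid5/pgdata",
}

# Hypothetical tables and the tier each one should live on.
PLACEMENT = {
    "access_log": "fast_raid10",   # many small writes
    "orders": "fast_raid10",       # small OLTP data
    "media_blobs": "bulk_raid5",   # large, rarely updated files
}


def tablespace_ddl(tablespaces, placement):
    """Return SQL statements that create each tablespace and move each table."""
    statements = []
    for name, path in sorted(tablespaces.items()):
        statements.append(f"CREATE TABLESPACE {name} LOCATION '{path}';")
    for table, space in sorted(placement.items()):
        if space not in tablespaces:
            raise ValueError(f"unknown tablespace {space!r} for table {table!r}")
        statements.append(f"ALTER TABLE {table} SET TABLESPACE {space};")
    return statements


if __name__ == "__main__":
    for stmt in tablespace_ddl(TABLESPACES, PLACEMENT):
        print(stmt)
```

The script only builds strings; running the statements would need a superuser connection and directories that already exist and are owned by the database user.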
Re:Why corporate self-interest can be good for OSS by nsayer · 2004-07-01 08:43 · Score: 2, Informative

Using a raw partition typically means bypassing the filesystem code in the OS. Since most databases simply consist of a small number of large files that are randomly accessed by the database system, the overhead of the filesystem is unnecessary. Not having a filesystem between your database and the disk also means faster crash recovery - there's no need to run FSCK on the (largely irrelevant) filesystem AND run a database consistency check - you can jump right to the latter.

You're right about this being for dedicated postgres boxes, but then dedicated database machines are exactly what you find in large enterprises. The "dot com" I work for has a big iron Sun running Oracle and nothing else, and a large number of smaller machines that do the "everything else". I think you'll find that fairly typical.
Re:GUI Tools by Anonymous Coward · 2004-07-01 08:56 · Score: 1, Informative

I use this one:
http://ems-hitech.com/pgmanager/index.phtml
Re:OLAP still missing... by stuktongue · 2004-07-01 09:46 · Score: 2, Informative

Okay, I'll bite. While I am certainly not an OLAP expert, I have found a need to learn a little about it and I plan to use it as part of an application I am developing for personal use.

For the uninitiated, OLAP stands for online analytical processing. In layman's terms, this refers to the process of interactive analysis of data, typically via incremental queries that progressively slice, dice, and refine the data set in order to reveal non-obvious relationships between various parameters.

OLAP is typically performed on data that is of medium age; i.e., not just current data, as would be found in a typical operational database, but maybe not the full long-term historical data, as would be found, say, in a data mining environment. Of course, different types of data and different application scenarios make such generalizations somewhat problematic, but, generally, OLAP is focused on analysis of, say, the last year or two of data. Regardless, the data sets returned by OLAP queries are typically quite large. As a result, special techniques, distinct from those used for traditional transaction processing, are usually employed in order to meet query response time requirements, which are often key requirements for OLAP systems.

One technique often employed is the use of so-called "star" or "snowflake" schema. This form of schema is quite different from the very normalized schema of transaction processing systems in that the data are organized into central "fact tables" with related dimension tables. Dimensions are things like date, location, product, etc., and have attributes that allow fine-grained querying of the facts in the fact tables. These dimension tables are also constructed in a way that reflects natural hierarchies; e.g., a date dimension would allow queries by year, month, week, day, etc.

While such schema can be defined in traditional transaction processing systems, OLAP-aware database systems typically incorporate design elements that optimize processing of queries on such schemas. OLAP queries are focused on examining aggregates of data across the various dimensions, such as sums, averages, etc. These aggregates may be precomputed on selected chunks of the overall data set to speed up online queries, but the query processor needs to be able to identify opportunities to take advantage of such things. So, optimizing queries for OLAP is a key feature of an OLAP-aware system.
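
To make the star schema and roll-up ideas concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not an OLAP engine: the table names (`fact_sales`, `dim_date`, `dim_product`), the sample rows, and the `sales_by` helper are all made up for illustration, and nothing here is precomputed or optimized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "by what" of each fact.
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER
    );
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- The central fact table holds the measures plus keys into the dimensions.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
""")
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
    [(1, 2003, 12, 31), (2, 2004, 1, 15), (3, 2004, 1, 20), (4, 2004, 2, 1)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)", [(1, "widget"), (2, "gadget")]
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0), (4, 1, 7.5)],
)

DATE_LEVELS = ("year", "month", "day")  # the date hierarchy, coarse to fine


def sales_by(*levels):
    """Sum sales grouped by the given levels of the date dimension."""
    for level in levels:
        if level not in DATE_LEVELS:
            raise ValueError(f"unknown date level: {level}")
    cols = ", ".join(f"d.{level}" for level in levels)
    query = (
        f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
        f"JOIN dim_date d ON d.date_id = f.date_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(sales_by("year"))           # roll up to yearly totals
    print(sales_by("year", "month"))  # drill down to monthly totals
```

Calling `sales_by("year")` and then `sales_by("year", "month")` is the "roll up" and "drill down" pattern; an OLAP-aware system would try to answer such queries from precomputed aggregates instead of rescanning the fact table.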

Another feature of an OLAP-capable system is some sort of API for creating the various components needed, e.g., the schema, definitions for any pre-computed aggregates, defining rules for "rolling up" from lower levels of a dimension's hierarchy to higher levels, etc. Oracle's OLAP, for instance, provides several techniques for accessing OLAP data and metadata, but they mostly boil down to either a Java API (high-level) or a more arcane, lower-level API for more direct access. The API(s) available to program an OLAP application can be critical in determining the ease with which applications can be created, and the types of applications that can be created.

Does this help a little?
Re:Why corporate self-interest can be good for OSS by stuktongue · 2004-07-01 10:16 · Score: 2, Informative

My experience is with Oracle, so my comments here will be mostly restricted to that context. You are correct in saying that database servers are best dedicated to that function alone; the resources involved (memory, network, etc.) in running a non-trivial database server usually demand their own machine.

I take some exception, however, to your view on raw partitions vs. filesystem-based storage. At least in the Oracle world, most studies and expert opinion I have viewed generally recommend against use of raw partitions. With appropriate use of RAID and suitable filesystem selection, the overhead associated with filesystem storage is usually not considered significant, despite many folks' assumptions otherwise. When you consider the difficulties in managing storage over time (e.g., altering tablespace mappings to files, expansion of tablespaces, equalization of I/O), use of filesystems makes such administration much more straightforward. Tom Kyte, a highly respected technical expert at Oracle, strongly recommends against the use of raw partitions unless you just can't stand the 2-3% performance hit.

That said, raw partitions have been required in "Real Application Clusters" (RAC) environments (previously known as Oracle Parallel Server (OPS)), at least until the mainstream acceptance of so-called cluster filesystems. It is my understanding that Oracle's work on clustered filesystems is aimed at allowing RAC systems to enjoy the substantial benefits of filesystem storage.
Re:PostGreSQL needs online backup by Anonymous Coward · 2004-07-01 12:35 · Score: 1, Informative

PostgreSQL has had "hot backups" for a long, long time. Since it uses MVCC, this is quite easy for it. Unless you meant something else?
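
Why MVCC makes this easy can be shown with a toy model. The sketch below is not PostgreSQL's implementation; the `Store` class and its methods are hypothetical, and it assumes every write is its own already-committed transaction. The point is only that a reader holding a snapshot keeps seeing the row versions that were current when the snapshot was taken, so a dump can run while writes continue.

```python
class Store:
    """Toy multi-version store: each write adds a new row version."""

    def __init__(self):
        self.txid = 0
        # Each version is [key, value, xmin, xmax]:
        # xmin = transaction that created it, xmax = transaction that replaced it.
        self.versions = []

    def write(self, key, value):
        """Write a value as a new transaction, superseding any live version."""
        self.txid += 1
        for version in self.versions:
            if version[0] == key and version[3] is None:
                version[3] = self.txid  # mark old version as replaced, keep it
        self.versions.append([key, value, self.txid, None])
        return self.txid

    def snapshot(self):
        """A snapshot is just the latest transaction id visible right now."""
        return self.txid

    def read_all(self, snap):
        """Return the data as it looked at the time of the snapshot."""
        return {
            key: value
            for key, value, xmin, xmax in self.versions
            if xmin <= snap and (xmax is None or xmax > snap)
        }


if __name__ == "__main__":
    store = Store()
    store.write("a", 1)
    backup_snapshot = store.snapshot()  # the "backup" starts here
    store.write("a", 2)                 # writers carry on during the backup
    store.write("b", 3)
    print(store.read_all(backup_snapshot))   # consistent view for the backup
    print(store.read_all(store.snapshot()))  # current view
```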
Re:Postgres is kicking butt by Sxooter · 2004-07-01 14:19 · Score: 2, Informative

Do a google search for slony. It's in early beta right now, but looks very promising.

--

--- It is not the things we do which we regret the most, but the things which we don't do.
Re:OLAP still missing... by Sxooter · 2004-07-01 14:24 · Score: 2, Informative

Take a look here:

efeu

--

--- It is not the things we do which we regret the most, but the things which we don't do.
Re:postgre who? by Anonymous Coward · 2004-07-01 15:06 · Score: 1, Informative

If you _want_ mysql, then please USE mysql.

Many people who use Postgresql want it to continue to advance, and do NOT want it to become like mysql.

None of these features will make it harder to install or use a basic installation; they are advanced features that allow particular economic or performance requirements to be met.
Re:GUI Tools by lexus99 · 2004-07-01 16:32 · Score: 2, Informative

Actually, you may find pgadmin2 a better choice for now. It has a migration plugin that works wonderfully. AFAIK, this plugin is not yet available for pgadmin3, and doesn't yet appear to be a priority, as it should be IMO.

PGAdmin2 is not available for Linux. I can only assume you use Linux since you mentioned pgaccess. I've not heard of a Win port of it, but since it is written in TCL/TK, it would probably be fairly easy to port. PGAdmin2 may even run fine under WINE (not tested)

However, with that said, the former poster was correct, MS Access DOES work very well with postgresql. There are a few problems, but I've always managed to work past them.
LeX