The Future of Databases
gManZboy writes "Ever wonder where database technology is going? This is something that Turing award winner Jim Gray from Microsoft has given a lot of thought to. He recently published an article in which he looks at the many forces pushing database technologies forward, and what those new technologies will look like. Gray writes, 'the greatest of these [research challenges] will have to do with the unification of approximate and exact reasoning. Most of us come from the exact-reasoning world -- but most of our clients are now asking questions that require approximate or probabilistic answers.'"
Wow! That's a wonderful future! :-P
As in, he passed the Turing test?
"...the greatest of these [research challenges] will have to do with the unification of approximate and exact reasoning. Most of us come from the exact-reasoning world -- but most of our clients are now asking questions that require approximate or probabilistic answers."
Just nod your head and agree...umm..yep, I concur!
Disclaimer: I don't know a damn thing about databases.
Could someone summarize it without using the letter 'e'?
Starsucks
How many times have we heard of huge sites going down because databases become corrupt or unrecoverable, or of the huge resource strain (memory and CPU) from a large database?
In my opinion, the future of databases is nothing so complicated as pitched here -- but rather a move to simpler, more reliable back ends where the filesystem is the database. This is certainly the vision pitched by Hans Reiser and reiserfs, which aims to put more database-like intelligence within the filesystem. That way you eliminate unnecessary extra layers that just eat up resources and create fragile databases.
Data is data. Just data. Save it, read it, sort it how you like. Efficient results mean having rapid, low-latency access to data.
... except that you still do, because SQL isn't a way of avoiding logical errors. ... and that they still don't save time. At best, they allow for some parallelism, external access to the data, and a separation of concerns.
Add code to it, and you have data+code.
Of course, code is data, and thus data can be treated as code, and handled by other code. LISP does this moderately well.
But you can't avoid the fact that, as it stands, databases are just engines for keeping your data structures outside your code, or when you add code to them, engines for reading your data structure for you so that you don't have to think about how to do it.
I'm getting rather tired of the fad that databases should be tacked on to everything, ranging from a shopping list to guidance systems. When did adding overhead become the mark of skill?
The requirements for a database today aren't too much different from those twenty years ago - except for what we want to get out of them.
Now that data mining is a $[insert large number here]million industry, databases are being asked to do a lot more processing with this data than before. For example: old database query = get these attributes from tuples that match this pattern. New database query = determine how likely a user who has accessed 30 or more times this last month is to subscribe to the second-level pay service within the next ninety days, with or without an email advertising said service.
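To make the old-query/new-query contrast concrete, here is a rough sketch using Python's built-in sqlite3 module. The `users` table, its columns, and the six rows are all invented for illustration, and the "probability" is nothing more than an observed frequency over past users; the ninety-day window and any real statistical modelling are left out.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory table; schema and rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "name TEXT, visits_last_month INTEGER, emailed INTEGER, subscribed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [
            ("ann", 45, 1, 1),
            ("bob", 31, 1, 0),
            ("cat", 60, 0, 1),
            ("dan", 35, 0, 0),
            ("eve", 5, 1, 0),
            ("fay", 40, 1, 1),
        ],
    )
    return conn


def exact_query(conn):
    """Old style: return the tuples that match a pattern."""
    rows = conn.execute(
        "SELECT name FROM users WHERE visits_last_month >= 30 ORDER BY name"
    )
    return [r[0] for r in rows]


def subscription_rate(conn, emailed):
    """New style (crudely): observed subscription frequency among heavy
    users, with or without the email ad. Returns (rate, sample_size)."""
    return conn.execute(
        "SELECT AVG(subscribed), COUNT(*) FROM users "
        "WHERE visits_last_month >= 30 AND emailed = ?",
        (emailed,),
    ).fetchone()
```

The first function has one right answer. The second returns an estimate whose quality depends on the sample size, which is why it hands back the count along with the rate.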
I [may] disapprove of what you say, but I will defend to the death your right to say it.
... MBA's want the magic glowy box to do their thinking for them.
Fortunately, Microsoft will be there to take their money.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Could indeed be useful at Microsoft.
At support desk:
SELECT user, probability(likly_madness_level) FROM caller_queue
-- to see if you should take the next call or let the person next to you take it.
At beginning of important day:
SELECT probability(crash_counter) FROM computer_log WHERE date=now();
-- to see if you should go on with your important report or just call it a day and play minesweeper.
$technology is dying. It will be replaced with $replacement. Insert 4000 more words sprinkled with $random_buzzwords. I am so smart! The end.
most of our clients are now asking questions that require approximate or probabilistic answers
Bioinformatics databases are a good example of this. DNA and protein sequence databases are often searched with approximate string-matching algorithms, ranging from "dynamic programming" to hidden Markov models and other stochastic grammars.
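For anyone who hasn't met the dynamic-programming approach, here is a minimal sketch in Python. It assumes plain unit-cost edit distance, which is much simpler than the scored alignments that real sequence search tools use, and the function names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def approximate_search(query, sequences, max_distance):
    """Return (sequence, distance) pairs within max_distance, closest first."""
    hits = [(s, edit_distance(query, s)) for s in sequences]
    return sorted((h for h in hits if h[1] <= max_distance), key=lambda h: h[1])
```

Scanning every stored sequence like this costs time proportional to query length times total database size, which is presumably why there was a market for accelerated hardware.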
Historically, drug target-hunters in Big Pharma created a market for accelerated hardware to facilitate dynamic programming searches, some of which (e.g. Paracel's Fast Data Finder chip) was originally marketed to government agencies who, um, shared an interest in approximate string-matching ;)
The rise of Sarbanes-Oxley highlights a key insecurity in the accountability of enterprise systems. Although the high-level applications can do a good job of tracking who did what to the financial data, the core DB may be open to tampering. If a DB admin with the right password can manually diddle a field in a database, they can change the financials of the company.
In contrast, a secure bitemporal DB would record not only the date of what the data refers to (e.g., the purchase order was entered on March 3rd, 2004) but also the date(s) of any modifications of the data (the quantity and total were changed on December 31, 2004, Uh-Oh!).
This is more than just securing the DB with a hierarchy of privileges; it means that no one can overwrite the old data or change any data without creating an audit trail. This, of course, also means changes in the DB, OS or file system to make critical data accessible only through a secure DB layer that tracks changes (e.g., no accessible plain-text DB data structures). These same concepts could be used (probably are, for all I know) for OSS version control to track who did what and when to the code.
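A rough sketch of the two time axes, using Python's built-in sqlite3 module. The table, column, and function names are hypothetical. The triggers only illustrate the append-only idea: anyone with access to the database file could drop them, which is exactly the gap described above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE po_history (
    po_id       INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    valid_date  TEXT NOT NULL,   -- when the fact holds in the business world
    recorded_at TEXT NOT NULL,   -- when this row was written to the database
    recorded_by TEXT NOT NULL
);
CREATE TRIGGER po_history_no_update BEFORE UPDATE ON po_history
BEGIN
    SELECT RAISE(ABORT, 'po_history is append-only');
END;
CREATE TRIGGER po_history_no_delete BEFORE DELETE ON po_history
BEGIN
    SELECT RAISE(ABORT, 'po_history is append-only');
END;
"""


def open_ledger():
    """Create an in-memory ledger with the append-only triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, po_id, quantity, valid_date, recorded_at, user):
    """A correction is a new row; old rows are never touched."""
    with conn:
        conn.execute(
            "INSERT INTO po_history VALUES (?, ?, ?, ?, ?)",
            (po_id, quantity, valid_date, recorded_at, user),
        )


def as_recorded_on(conn, po_id, when):
    """Quantity the database held as of ISO date `when`, or None."""
    row = conn.execute(
        "SELECT quantity FROM po_history "
        "WHERE po_id = ? AND recorded_at <= ? "
        "ORDER BY recorded_at DESC, rowid DESC LIMIT 1",
        (po_id, when),
    ).fetchone()
    return row[0] if row else None


def audit_trail(conn, po_id):
    """Every version of the purchase order, oldest first."""
    return conn.execute(
        "SELECT recorded_at, recorded_by, quantity FROM po_history "
        "WHERE po_id = ? ORDER BY recorded_at, rowid",
        (po_id,),
    ).fetchall()
```

With the example above, the March entry and the December 31 change both stay in the table, so asking what the database said in June still gives the original quantity, and the audit trail shows who changed it afterwards.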
Two wrongs don't make a right, but three lefts do.
Not when my credit rating is at stake! Or when an airline agent mistakes me for a member of Al Qaida and denies me a seat on the plane.
In these cases I want *exact* answers to everything time related.
Better indexing, faster lookups...
That's... about... it...
Object relational was the "new thing" that didn't really take off as well as they'd hoped.
Hell, I work with people who still can't handle compound keys and joins well...
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
The "next great advancement" in databases will be when I can setup 2 or more linux servers and have them act as a single database server. Our database server is the most expensive item in our datacenter because it's an N-way IBM server.
Queues? Workflows? Business logic? Excuse me for thinking that a database should just store data. I guess that makes me a caveman or something.
He didn't mention the biggest problem with databases - it's that the clowns down the hall (the DBAs and sysadmins) own it and you don't. I've seen teams store their data in flat files just because they didn't want to deal with those bozos.
Tristan Yates
Slashdotters- please remember that comments are better when the articles are actually read... that being so...
... it's NOT going to happen...
Here are my problems with what Gray et al said.
Under the mounting onslaught, our traditional relational database constructs--always cumbersome at best--are now clearly at risk of collapsing altogether.
In fact, rarely do you find a DBMS anymore that doesn't make provisions for online analytic processing. Decision trees, Bayes nets, clustering, and time-series analysis have also become part of the standard package, with allowances for additional algorithms yet to come.
He seems to be confusing two issues- one is finding what (data) to find and the other is finding that data. A database enables the second...a program, a human or both in combination with the technologies he mentions is responsible for the first. We should never confuse these two. A database is a way to identify unique things by their properties, literally, a way to distinguish "things" from each other. If it's done that in a manner that provides for ACID and retrieval then our work is done. Finding WHAT to find is another completely different concern.
The relational model is not going anywhere- and that's what every database is - an implementation of the relational model; some better than others.
Gray is a great researcher, and maybe he was talking about the need to make current database PRODUCTS "better".. but it doesn't read like that... it reads like a battle cry for us to move beyond the relational model.
awesome.
"If you are a dreamer, a wisher, a liar, A hope-er, a pray-er, a magic bean buyer
I can see they've no hope of being any competition at all come the real db revolution.
Seastead this.
More seriously, this means something like + Lucene[1] (or, more likely, lucene4c [2])
[1] http://lucene.apache.org/
[2] http://incubator.apache.org/lucene4c/
Simpy
If you want more Jim Gray, head on over to Channel 9 to see a couple of sit-downs with him. Personally, I found both Part 1 & Part 2 quite interesting and thought-provoking.
Help Brendan pay off his student loans
XML.
You can already do this with Oracle RAC. We run this at the office. Works great.
...
Well, I should no longer expect news from Slashdot, but this is almost the same content as his SIGMOD 2004 speech, which has been available since April 2004:
http://research.microsoft.com/research/pubs/view.aspx?tr_id=735
How is the refurbishing of a one-year-old article news?
(And, BTW, I find the keynote speech better structured than the refurbishment.)
I have discovered a truly remarkable proof for my post which this sig is too small to contain.
The parent makes a good point, and it's pretty easy to see why if one holds off the usual anti-Reiser reactions and thinks it through a bit.
Databases require a mechanism for atomicity to create their transactions, and because no common operating system has ever provided such, they need to implement it themselves at application level. It's like the bad old days before PCs provided networking, and you had to run up your own networking stack if your application needed comms.
Well reiserfs has the goal of providing atomic transactions at filestore level, so in principle it will become possible to leave a good chunk of the very hairy rollback processing of conventional RDBMSs to the operating system.
It won't remove the need for proper RDBMSs for power database applications, nor will it in any way obviate the need for database distribution, but it should make professional databases both simpler and more robust. And it should also allow mini-database applications to be coded directly around the filestore with better transactional properties than the traditional flat-file designs.
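For contrast, this is roughly what a flat-file application has to do today to get an all-or-nothing update of a single file. It is a sketch in Python that assumes a filesystem where renaming within one directory is atomic; `atomic_write` is a made-up helper name. Note that it covers one file only. There is no way to roll back changes across several files, which is the part an atomic filestore would be taking over.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace `path` with `data` so readers see the old or the new content,
    never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```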
There's been a bit of movement in this direction for a while now. Oracle has had soundex() for a while at the least. It's never worked very well, though... :|
most of our clients are now asking questions that require approximate or probabilistic answers
What are my chances of getting laid tonight...
What are the odds of my winning the lottery...
What are the chances that my boss will find out about that phony dinner receipt...
Seriously, SAS stat analysis software does exactly what this numbskull is talking about. You don't need a new kind of database, merely somebody with training in stats.
Can you do this without shared storage?
I would LOVE that, but I have little experience with Oracle (or any other high end DBs).
Could someone summarize it without using the letter 'e'?
Sure.
Th Futur of Databass
Postd by timothy on Monday May 02, @08:12PM
from th your-flight-status-is-'mayb' dpt.
gManZboy writs "vr wondr whr databas tchnology is going? This is somthing that Turing award winnr Jim Gray from Microsoft has givn a lot of thought to. H rcntly publishd an articl in which h looks at th many forcs pushing databas tchnologis forward, and what thos nw tchnologis will look lik. Gray writs, 'th gratst of ths [rsarch challngs] will hav to do with th unification of approximat and xact rasoning. Most of us com from th xact-rasoning world -- but most of our clints ar now asking qustions that rquir approximat or probabilistic answrs.'"
Hmmm, I kind of like 'databass'.
Should I take the Teradata Physical Implementation test? Or just let it slide since databases will eventually disappear anyway?
And since Oracle is *way* cheaper than IBM, it's problem solved!
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
funny++
I'm sorry, you must have misread the article (MTFA). I think you mean, "The 'next great advancement' in databases will be when I can setup 2 or more Microsoft servers."
DT
Is this thing on? Hello?
"most of our clients are now asking questions that require approximate or probabilistic answers.'"
Fuzzy Logic
From the article:
The problem starts, of course, with Cobol
Damn those Cylons! Why won't they leave humanity alone?!
"I think I agree with the parent. Databases are methods of storing and retrieving data. Trying to make queries fuzzy, or less structured is just wrong."
The reason we can't is more economic than technological. Now just imagine a terabyte database sitting on top of banks of associative memory.
640K is enough for everybody applies as much to databases as anything else.
You certainly can do it, it's just not easy. I agree, though, that the first group that can make this easy enough (run program, login to DB network with DB servers, access DB network just as single DB is accessed), will become very popular.
$sarcastic_comment
Slashdot is well on the way to automation of both articles AND commentary.
Faster computers allow more complex queries.
No fucking shit.
I've got to get off this hamster wheel of doing actual work and become a "computer scientist" or "theoretical physicist". Then I can just state the obvious or make shit up.
> The "next great advancement" in databases will be when I can setup 2 or more linux servers and have
>them act as a single database server. Our database server is the most expensive item in our datacenter
>because it's an N-way IBM server.
lol, IBM has supported *exactly* what you are talking about for at least five years.
That is, you can spread your db2 database across 10, 100, or 1000+ linux commodity boxes (ideally blades). Or you can use windows, or aix, or solaris, or hp-ux, etc. Of course, those individual boxes can be SMPs in their own right - so a thousand 8-way aix boxes is certainly possible, if not cheap.
Oracle is now in this game as well - oracle 10g can certainly support 32, and maybe 64 individual linux boxes in a cluster. The techniques are different between the two - oracle might be better at transactional systems. db2 is definitely better at data warehousing, data mining, etc.
Of course, there are still benefits to a big smp: a single P570 16-way will cost you $250k. But each of those 16 cpus is multi-core (and far faster than intel or amd), and with its micro-partitioning - it can run at least 150 linux or aix lpars (logical partitions). These lpars can grow or shrink as they need - so you aren't always over-buying for size, buying new hardly-used hardware, or having to colocate apps on a busy server - when a different os would be preferable. Not to say everyone should go this way - but there are definite benefits.
As a 15-year DBA, I'm currently working with some of the would-be far-reaching (to most people) concepts described in this paper. The idea of a TRUE SQL debugger is like, so big it's sick. Quest offers some tools that kinda sorta do this for Oracle systems, but a true realtime debugger would save me YEARS of work during my career as a SQL coder. For an idea of scale, the last replication project I wrote for an employer propagated over Oracle DB_LINKS via triggers to synchronize a dataset in two cities, log it, and do something with errors. Because this particular system was a PeopleSoft installation, it was a subset of 6800 tables and 15k lines of code, give or take some triggers, with NO debugger. OMG, it's like a "finally" moment to have someone even claim to be fixing this soon in their architecture.
Next, there was some inane reference to reiserfs above, which clearly ignores what a database fundamentally both is and is becoming. It really began (and I hate to admit this as a former Solaris/Oracle admin) with SQL Server 7 and Oracle 8, and the concept that a database should be object programmable. Reiser is not going to be streaming still frames of image data fast enough to a remote client to rebuild seamlessly into a movie, for instance. Or recalculate all of a company's business logic for point of sale systems so that, for instance, the wrong type of credit card gets rejected, or so a supply chain gets populated; the list is endless. Reiser, and for that matter VFS and the other myriad of database enhanced filesystems, are tools. Good ones, but tools...
It's interesting to note that MS has finally figured out that the "n-tier" was a dumb idea. It's almost like, well you take all this shit, then sell it through a middle man, but expect to not have to pay him anything for brokering. Like, duh. We actively benchmarked this process, in fact, and discovered that it does, not surprisingly, take time to pass data through an extra server.
Workflow is life. It's what makes this page exist (SD is, I believe, run on MySQL). The idea of publishing-subscribers with atomic transactions is hardly new, but I agree with the authors that this is the direction of the market, simply because businesses now are getting spread all over. Read: if your job just went to India, learn to be a DBA, cuz when all that shit they sent over there comes back, you can bet it's going to be a mess (and is a mess actually already, which is why, in particular, people in ERP fields that intertwine with mine (as a DBA) demand and receive very large salaries; $200 US an hour is not unusual). The reason this particular ramble is relevant is that lots of global companies are either looking at, or are already implementing, the idea of data grids, where all the data servers inside a global network stay in sync. Suzy the secretary checks out a document in Baltimore, and that document flags as in use in Madrid through transactional replication within a kind of database trust-relationship network. It's a very very good way for companies with lots of data to keep it all together, but today it's still a pain in the ass to manage.
Vertical partitioning is pretty much worthless except to data warehousing installations, most of whom are probably running on strong equipment already (to have that much data). Not to mention, I believe (I'd have to check, since it's not a feature I'd really use) Oracle's 10G product allows for this already if you really want it. Materialized views are another point here that raises my hackles. This guy is writing about the wonder of materialized views and column partitions, which ARE a cool performance cheat in large systems, but make no mistake that by the time you get to this point, you are probably rearranging deck chairs on the Titanic anyway. Essentially, materialized views precache SQL resultsets into a temporary table which gets constantly updated so it can always provide a full resultset without having to parse the parent table. This is processor and space expensive. Vertical par
Imagination is the silver lining of Intelligence.
the huge resource strain (memory and CPU)
... will come, not from your dbms, but your os.
The difference being?
Mongrel News all the news that fits and froths
Does anyone besides me get the feeling Microsoft is going to take over the world one day? Jeremy MCSE MCSA CCNA http://www.n2networksolutions.com/ Arizona computer consulting
"The 'next great advancement' in databases will be when I can setup 2 or more Microsoft servers."
This actually works too. Stably too, go figure.
--chitlenz
Imagination is the silver lining of Intelligence.
Object relational was the "new thing" that didn't really take off as well as they'd hoped.
You're thinking of Object databases, which indeed did not take off at all.
However Object-Relational systems are EVERYWHERE. There's hardly a big database anymore that doesn't have several object-relational mapping systems between it and code...
Object->Relational mappers have taken off in a big way, which is good in a way since the databases can remain the nice placid solid systems they've always been and you can go to them directly when things get inefficient.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
From a practical point of view, the kind of databases that people like Gray have made a career out of have been very useful up to a point, no question.
On the other hand, those databases have already been pushed far beyond their limits: people have been using them inappropriately in many applications. Much of the "hot" recent stuff Gray mentions is not new technology: smart people have been proposing it and using it for years, only to be beaten down in the market by the relentless push behind relational technologies beyond any reasonable limits.
The research problems aren't new either: approximate, probabilistic, and inferential retrieval have a long tradition, but they have received relatively little funding because most of it has been going to fairly meaningless and incremental improvements in the kinds of traditional databases Gray has made a career out of.
Let bygones be bygones, but perhaps it's time for people like Gray to retire and leave the next generation of databases to people who actually have the background to work on areas like approximate and probabilistic retrieval. Research in those areas can build on the core storage technologies for existing databases, but the kind of expertise required for making new discoveries in those areas is completely different from what someone coming out of a database research group will have learned.
How many times have I read database articles that says the same damn things through a thick and wooly fog of imprecise language and confused concepts, offering absolutely nothing: 1) the world is different now 2) the relational model is dead 3) XML, objects, blah blah blah.
*sigh* Where to begin? First of all, the relational model is just that: a model. In fact it's safe to say that it's the model for data storage and manipulation, since nobody has presented one that's 1) more general and 2) doesn't reduce to the relational model. Clearly what the author is referring to are current poor implementations of non-relational SQL-based database applications. Normally, a knowledgeable person should stop reading. But let's continue. Perhaps, even with poor wording, he still has something to offer. Even if it's just more quotes of the week.
Aside: has there been any time in modern history when we weren't in a time of extreme change, and there wasn't an onslaught of information? Why were databases invented in the first place?
This was a particularly funny quote.. what did they depend on before?? The printer?
Another great quote! Databases are used to deliver IDEs?? What is he selling? Maybe he meant applications, not application development environments, but that doesn't make sense either.. applications have always been about data + algorithms + UIs.
Good lord.. hasn't he ever heard of a constraint? A stored procedure? A user-defined function? Column types?? That's business logic! At least he got some new terminology out of it: "active database". I'm now confident predicting that this article will consist mainly of coming up with new names for old things.
What does any of this mean? What was I using to write stored procedures before? Why are you calling them "modules" now? What is a "database object" and how can you define an "object" as a "class"? I thought a class was a template for an object?
Anybody remember this TV commercial: "When pizza's on a bagel, you can have pizza anytime". I don't know what that meant, and I don't know what he means here either.
Yup, I was right... let's rename everything, that makes it new and fresh! Fields are now objects, values, or references.. records are now vectors and tables are sequences (is that different than a vector?). And somewhere in th
I'd personally ask a Google employee where the future of databases is heading. The Google FS really shows where databases are moving...
I give Gray a lot of respect in most cases because he's a really smart guy. But the math and computationally-intensive parts should be focused in the probabilistic searches.
In one sense, though, Gray is quite right. And this is the direction of speech recognition. I might add that the Speech Server beta out by Microsoft is quite good...even at this stage.
This sig donated to Pater. Long live
I see the future now and it will happen in two phases: getting rid of SQL and then replacing it with something half-way decent (like a properly implemented relational algebra.)
Picture this... memory nowadays is a hell of a lot cheaper than a full Oracle licence. So, instead of investing in a DBMS, why not buy massive quantities of ECC memory and keep all instances of your data in-memory for near instant access?
Crazy idea, huh? What if I said that this can be as fast as 8000 times faster than Oracle? And 3000 times faster than MySQL!
Crash recovery? No big deal, keep a serialized version of your in-memory-objects, and a transaction log and you're set!
Read more at:
http://www.prevayler.org/
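The "serialized version plus transaction log" recipe looks roughly like this. It is a sketch in Python of the general idea only, not Prevayler's actual API (Prevayler is a Java library); the class name, file names, and JSON format are all invented here.

```python
import json
import os


class TinyPrevalentStore:
    """In-memory dict made durable by a snapshot file plus an append-only log.

    Keys must be strings, because the state is serialized as JSON.
    """

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "log.jsonl")
        self.data = {}
        self._recover()

    def _recover(self):
        # Crash recovery: load the last snapshot, then replay the log.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))

    def _apply(self, op):
        if op["op"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["op"] == "delete":
            self.data.pop(op["key"], None)

    def _log_and_apply(self, op):
        # Write the operation to disk first, then change memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)

    def set(self, key, value):
        self._log_and_apply({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._log_and_apply({"op": "delete", "key": key})

    def get(self, key, default=None):
        return self.data.get(key, default)

    def snapshot(self):
        # Write a new snapshot atomically, then empty the log. Replaying an
        # old log over a newer snapshot is harmless because set and delete
        # are idempotent.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()
```

Reads are plain dictionary lookups, which is where the speed comes from. Everything a DBMS adds on top (concurrent writers, ad-hoc queries, data bigger than RAM, handling of a half-written log line) is missing from this sketch.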
---- You know how some doctors have the Messiah complex - they need to save the world? You've got the "Rubik's" complex
Uhm, keeping your data in RAM with a serialized version on disk is a database, what makes you think it isn't?
But what if you want to access your DB from a different application that has a different serialization format? What if you want to perform arbitrary, ad-hoc queries that have nothing to do with your original object structure? What if my DB grows beyond my RAM? Oops. Welcome to 1970, we're working on solving these problems.
(For the record, the author did talk about memory databases.)
Can you do this without shared storage?
Why would you want to?
With shared storage (hello, VAXcluster 1984!), you still have access to all of your data as long as one of the nodes stays up.
"I don't know, therefore Aliens" Wafflebox1
will this still work in future databases?
1. On a new Worksheet, Press F5
2. Type X97:L97 and hit enter
3. Press the tab key
4. Hold Ctrl-Shift
5. Click on the Chart Wizard toolbar button
6. Use mouse to fly around - Right button forward/ Left button reverse
Yes, i AM aware that excel is a SPREADSHEET. it's my feeble attempt at a joke you vultures.
Show me the cost of airline tickets when The Who were touring during winter and compare that, inflation adjusted to airline tickets today that I can purchase now.
Don't laugh, these people have one goal in mind - answering questions based on data on your disk or on the web.
The use of a database as a file system will require radical new technological advances in database theory as the current methods break down under the new requirements. The functionality of the file system will change as the capabilities of an underlying database are realized. The two forces together will create an interesting discontinuity in the industry, the kind the venture capitalists look for.
It's all good. Pray for WinFS.
A most overlooked advantage to owning a computer is if they foul up there's no law against wacking them around a bit.
Digital did this over ten years ago. One of the things that Oracle inherited when they bought RdB from Digital was the cluster support. However it seems they took a long time to get the technology into their own RDBMS.
2 servers acting as a single database server has been available for many years...e.g., Oracle 9i RAC, Oracle 10g, DB/2's something or other, etc.
Advice: on VPS providers
> Digital did this over ten years ago. One of the things that Oracle inherited when they bought RdB
> from Digital was the cluster support. However it
> seems they took a long time to get the technology
> into their own RDBMS.
I've got fond memories of an 800 gbyte billing & customer data warehouse on rdb around 1995 - giving sub-second access and running on a vms quad. That was such a slow system compared to what we've got now - but it sure handled a ton of data well.
On the other hand, I don't remember how much was due to the excellent clustering in vms - or how much was due to rdb...
Man, is there anything he left out? My God, you'd think that everything (my TIVO, my IPOD, XML, streaming data, web servers and my mother's apple pie) was a database. This guy was a stoner, but his brain's too fried to qualify now. Most hippies were a lot brighter and did good theoretical work; this fellow's little more than the burned-out husk of what once passed for a hacker.
At one time, I thought object oriented databases were going to be the next big thing.
My original post said:
The "next great advancement" in databases will be when I can setup 2 or more linux servers and have them act as a single database server.
i.e. Out of the box, I can setup a database cluster. I'm talking about costs for HA. If I can dump the big iron for 2- or 4-way x86 servers, I'd save money. But if I need to pay a lot for support for Oracle's RAC, or a custom setup/installation, etc., then I'm not saving money. It all comes down to the bottom line.
Prolog can do everything SQL does and much more. It is the natural language of relational databases.
-xeo_at_thermopylae@yahoo.com
Regex! As processors get faster, memory gets cheaper.... I wouldn't be surprised to see better, faster, etc. implementations of regex that allow doing what full blown databases do today. Of course that's in a read-only context, but I've implemented full blown "database" applications centered around the regex. And while some will point out that regex doesn't deal with integrity and data management issues, I would point out that many databases are implemented in overkill mode, where data integrity and management are handled sufficiently and nicely with underlying OS mechanisms and the database engine itself becomes unnecessary (sometimes even adds overhead).
Personally, I think so many things are "database" implemented because some glossy brochure somewhere convinced a room full of PHB's they needed a database solution.
Again, let me reiterate: I wouldn't suggest this replaces and/or solves database issues and becomes the new direction of database technology, but the increased processor speeds DO allow for implementations relying solely on "crufty" technologies (e.g., regex, be it perl, awk, python, whatever) instead of databases costing tens of thousands of dollars (and more).
> So, instead of investing on a DBMS why not buy massive quantities of ECC memory and keep all
> instances of your data in-memory for near instant access?
because a *well-tuned* relational database with a 1:4 ratio of memory to disk is almost as fast as an in-memory database - due to efficient caching
because some queries require an enormous amount of temp space. supporting them can easily double your space requirements - which have to be purchased in memory.
because if you just want to run your database in-memory you can already do that with most databases.
because you don't have the same speed requirements for every piece of data in your database. You might have some tables used for session & user management that are often read & written to and must be very fast. But other tables that just hold seldom-accessed historical data. A modern database would allow you to keep the small & fast tables effectively in memory, and the huge 100 gb history table on disk. And you don't have to buy 100 gbytes of memory to do it.
because...it's just a bad idea.
www.teramanager.com Teramanager - HRI (Historic Retrieval Interface) In some cases, Customer Care Departments need to have historic information online to fulfill the customer's requests. These requests force the operator to be online with the master database, generating significant traffic over the network and an extra workload for the host. HRI offers a way to avoid these problems, making a large historic database very easy to handle, distribute and install in remote locations without having to be connected to the "real" database. This module is used to publish information over the internet while avoiding exposure of the main database. The information is secure and is accessed at very high speed. Applications: Reduce host traffic and database inquiries. Access security (users do not access the host database). Internet access.
[misc drivel] Read more at:
http://www.prevayler.org/
Oh my dear god. You've never actually used Prevayler have you? Prevayler isn't nearly as useful on actual data problems as Prevayler's worshippers would have you believe.
I know this because I tried to use it. If you'd ever tried to use it, you'd know how unbelievably poorly it performed when attempting to implement real world queries. You have to implement every query in Java, and Java is a particularly poor implementation choice for creating complex queries.
What if I said that this can be as fast as 8000 times faster than Oracle?
This "performance comparison" that the Prevayler group trots out is particularly funny as their test uses a single ArrayList of objects as in-memory "storage" and then "queries" it by index. Not exactly a realistic problem. Try a query across four classes with a few million instances of each class and you'll quickly discover what relational databases are good for.
Regards,
Ross
Prevayler exposed.
there isn't a +6(Interesting).
The big end of the database world has always seemed strange to me. Your post provides some interesting views on that area.
. . . I no longer have to debunk the sweeping claims of an AC--I can simply provide a link.
-- . . ramblin' . . .
This notion of "active databases" seems to me to be interesting but fraught with problems.
Not least of which is the old bugaboo - documentation. How do you document a system composed of myriad triggers scattered on myriad tables in myriad databases communicating over the Net?
All I know from trying to decipher ONE Oracle Forms application at City College of San Francisco is that it is nearly impossible to get a handle on what happens where when. There appears to have been NO effort made by Oracle to enable a coherent method of documenting an application developed with their Forms technology - or of reverse-engineering such an application in order to develop such documentation.
Just printing out a bunch of trigger code and GUI design panels says nothing about how the app actually is supposed to WORK.
Great for obfuscating your proprietary code, I suppose.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Well, since Microsoft recommends running a separate server for every server function, I imagine they'd say if you want to run two SQL Server databases, you'd best use two SQL Server engines running on two separate Windows Servers on two separate hardware systems - for which of course, you pay for two licenses (and two more for the Windows Servers).
Of course, Oracle with their database layout basically says the same thing - except they want you to put your indexes, your tablespaces, your logs and everything else on at least SEVEN separate servers...
Funny how that works out to mean more licenses to buy...
I view this article as meaning that Microsoft intends to introduce a new "Data Mining Server" - which they will recommend running on yet another Longhorn server running on yet another PC...
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
This is one of the most basic cluster services in OpenVMS. It is fast and scalable (I use the present tense as there are still some big installations knocking around). The main thing about it is the way it allowed you to keep buffer caches synchronised across a cluster. The I/O system underneath Rdb was actually part of Digital's CODASYL product, DBMS-32, which had been clustered quite happily over twenty years ago, so it was well proven.
To be fair, this is probably why Oracle had trouble using Digital's technology as Oracle needed to be platform independent and not many platforms support the multiple functions of DLM in such an elegant way.
Yes, a good DBA and/or database developer is a very valuable addition to any team.
The problem is that in a lot of corporations (e.g., the one I work for), they -- and all other admins -- have been taken and put in a different building. And more importantly they don't actually have to cooperate with any team.
Their job's goal is no longer the same as the developers': to get a program done by a deadline. They've been turned into a bureaucracy whose only job is to see that the servers run. No more.
That's an _awful_ job description, because it directly makes the developers their enemy. I'm not even talking "slippery slope", but direct cause-effect. Instead of being "the other half of the team that will make this program work", developers just become "those assholes who crash our servers."
It's not hard to get from that point of view to pathological cases like the admin who limited our production servers to 3 connections per server. He kept his own servers running perfectly (which is his job description) at the expense of making the company's production programs grind to a halt (hey, it's not in his job description to care about those.)
That's the problem with that kind of internal organization. As one BOFH-wannabe once said "The source of the problems on my network are the users. Would you prefer that I cut your access? Then there wouldn't be any problems any more." Another one threw a hissy fit that we dared ask that he does his job, during work hours. Yeah, how dare we bother him by asking if he could please reboot the test server he's managing.
That's the underlying problem. Instead of providing a service _to_ the users, a whole caste has been created whose job is to serve the computer, and the users are just those pesky assholes disturbing his majesty the computer. That's a very unproductive situation to create.
Worse yet, a bunch of companies invented the devastating practice of internal invoices. The admins in one department won't even go to the toilet unless they can send a bill to another department for it.
They won't even talk to each other (e.g., the WebSphere admin telling the DBA and the Unix admin that he needs a Solaris patch and a newer version of Oracle for the "transactionBranchesLooselyCoupled" setting.) No, you have to personally talk to all three of them, because otherwise they can't send three bills for it.
And predictably, they'll do _nothing_ more than the bare minimum that was requested and billed. E.g., you have to tell the DBA explicitly to set this and that, to this and that value, because she won't do that on her own. Which basically means you already need to have all the knowledge of a DBA, and she is just acting as a proxy over the phone... and sending you a bill for it.
Basically if you're not that kind of a DBA, you have my respect. All I'm saying is that when you read about "teams of clowns" or about people who'd rather invent their own storage than deal with a DBA... well, they're not necessarily avoiding _your_ kind, but the kind of clown I've described above.
A polar bear is a cartesian bear after a coordinate transform.
It's called... Oracle 10g.
Of course, Oracle with their database layout basically says the same thing - except they want you to put your indexes, your tablespaces, your logs and everything else on at least SEVEN separate servers
No, at least in the Oracle books I've read, it's spreading all that crap out across seven separate disk devices (i.e., SCSI, not IDE), with some on separate SCSI controllers. It's increasing that parallelism in data i/o...
Inter-server communications with linked DBs is slow.
There's a lot of talk in database circles about the fact that open source databases may do to commercial databases what Linux did to commercial Unixes, i.e. wipe them out. Recently LazyDBA, one of the most well-known websites for database administrators, started supporting open source databases. Add to that the fact that Oracle is going on an app-buying fest (PeopleSoft and now maybe Siebel), and database people see that the commercial database is in danger.
Even better, add or modify attributes on those tables in the RDBMS, vs. doing so in an object database (even the object layer in Oracle). Typically, it means tearing down the database and rebuilding the schema...er, object hierarchy, with 100% data loss...
"Ever wonder where database technology is going?"
Yeah, all the time.
Don't take life so seriously. No one makes it out alive.
Pity he neglected to mention graph-based databases (as in DAG). A substantial problem lies in the dynamic nature of information. Relational databases are lousy at storing relationships between data that were thought to be unrelated. Having to change the database structure all the time is a nightmare anyone can do without. Graph databases are able to model knowledge much more accurately, with the added benefit of being able to store relationships between nodes without changing the design. Bioinformatics has been a good example of an application area for graph-based databases. Here the masses of information (ontologies, pathways, RNA sequences) need to be related in many different ways. Graph-based databases allow querying information in novel ways that relational databases simply aren't capable of handling. From that aspect, the requirements today are really very different from 20 years ago.
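To make the "store a new relationship without changing the design" point concrete, here is a toy in-memory sketch. The Graph class, the edge labels and the gene/pathway names are all hypothetical and do not correspond to any real graph database product.

```python
from collections import defaultdict

class Graph:
    """Toy in-memory graph store: nodes are strings, edges carry a label."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> {(label, node), ...}

    def relate(self, src, label, dst):
        self.edges[src].add((label, dst))

    def neighbors(self, src, label=None):
        """Direct neighbours of src, optionally restricted to one edge label."""
        return sorted(d for (l, d) in self.edges[src] if label is None or l == label)

    def reachable(self, start):
        """All nodes reachable from start by following edges of any label."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

g = Graph()
g.relate("geneA", "encodes", "proteinA")
g.relate("proteinA", "part_of", "pathwayX")
# A relationship nobody planned for: just add another labelled edge, no schema change.
g.relate("geneA", "mentioned_in", "paper42")
print(sorted(g.reachable("geneA")))
```

In a relational schema the third relationship would typically need a new join table or column; here it is one more edge.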
Honestly, folks, databases are like crutches: pathetic, but when you need them, there's hardly an alternative. They are the living proof that abstract concepts and computer simulation of those on real-world hardware need the strangest type of hacks to be mended together.
On top of that - and this is the worse part - what we call databases today is nothing much more than a historically grown apocalyptic chaos, with one of the crappiest programming languages ever as a cornerstone of its technology. A weedy mumbo jumbo of wanna-be virtual machines, wanna-be server daemons, makeshift security layers, obtrusive user management, pseudo operating systems and a bazillion proprietary variants of said programming language, with features bolted on left, right and center. This basically is the case with any current DB in widespread use, be it MySQL, Oracle or anything in between.
And if you look at the core of database technology and how long it has been that way, there isn't much hope that DBs will go anywhere anytime soon.
Then again, if you want to get a glimpse of a possibly brighter future, I'd actually recommend Zope. I consider its object relational DB a working proof of avant-garde "database" concepts and a prototype of what DBs generally could look like in the future if anyone were interested.
We suffer more in our imagination than in reality. - Seneca
This sounds very much like a program I assisted with in Cybernetics nearly 30 years ago. It was based on modeling intuition and pitted two automatons against each other playing a game (tic-tac-toe); one automaton kept a 'database' of failed moves, learned from them and got smarter. This was not new even then, and while it was written in FORTRAN IV, it was a primitive version of this idea.
My thought is this: if this is the best Microsoft can do, dredging up ideas 30 years old as new thought, no wonder they are constantly reinventing the same wheels in development.
RJ
There is an interesting analysis of databases in filesystems (and metadata...) in the Ars Technica review of OSX: extended attributes managed at system level, an application like Spotlight making (some) use of this, etc. http://arstechnica.com/reviews/os/macosx-10.4.ars/
(this link was already given in the recent OSX Tiger discussion here)
Hervé
Herve S.
Well, of course they do, but I think the database concept has evolved as far as it makes sense. Relational and object oriented databases are more or less the logical limit to what you can meaningfully do about organising data - and don't forget, databases are about organising data, not about how you use it afterwards. You could argue that retrieval methods are part of what a database is, since eg. indexing is a way of retrieving data - but that is bad thinking, in my view. An index is just data organised in a certain way.
That is just my opinion about things; now roll out Wikipedia and Webster dictionary to 'prove' me wrong, I don't really care. But why make a fuss about it at all? Well, I feel there is a trend to muddle the concept, and I think it is a good idea to keep those things clear and simple; otherwise it all just ends up being marketing drivel. Take for example the way 'the internet' has become equivalent with 'web sites' - which is obviously nonsense. The internet is a physical network plus a number of protocols, of which http is one. But we still see from time to time some stupid sod blaring out 'The Demise of The Internet' because of some new virus or other nuisance that affects a large number of web sites. Hey, let us keep our minds clear - certain applications of the internet may go out of use, but the internet continues.
So, back to databases - OK, some wise guy thinks that we will see databases more like so and so, and that it could be cool if whatever. All well and good, but it's not that databases are changing fundamentally, it's just new applications.
I suspect that may translate as "most of our clients want to be given easy answers to difficult questions".
I'm sure there'd be a big market for a database system that stored flight bookings and could answer the question "which of our customers is a terrorist?". You don't address that market with new technology, though, but by developing new sources of snake oil.
Shared storage hardware is VERY expensive for me (I am in India), so I much prefer a shared-nothing system.
Replication is nice, but a true multi-master setup would be even better.
Both the parent and grandparent are right. The fact is, you're talking about apples and oranges.
When I look at things like Prevayler and XL2, I see systems designed for making it so your in-memory object graph is persistent in case of power loss. Yet these objects are still traditional objects with complete data encapsulation.
When I look at RDBMSs, I see systems designed to store data, queried in arbitrary ways, and violating object-oriented encapsulation (your data is no longer only accessible through your object).
With an object persistence mechanism, if you want to "search" through your object graph, you have to build your own indices and lookup mechanisms (HashMap, TreeMap, BTree data structures in memory, etc.). With an RDBMS, you get that for free. There are supposed to be OODBMSs that will let you do what you're used to in an RDBMS, but I've not used one.
But the thing is, often all I need *is* object-graph persistence, not a full-blown RDBMS. And I can throw in a few hand-crafted mechanisms for indexed retrieval of my objects when necessary.
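A rough sketch of what "a few hand-crafted mechanisms for indexed retrieval" can look like over plain in-memory objects. It is written in Python rather than Java for brevity, and the Customer class, its fields and the sample data are invented; persistence itself (what Prevayler would add) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Customer:
    id: int
    name: str
    city: str

class CustomerStore:
    """Plain objects plus indices that the application maintains by hand."""

    def __init__(self):
        self.by_id = {}                   # primary-key lookup
        self.by_city = defaultdict(list)  # secondary index, updated on every add

    def add(self, customer):
        self.by_id[customer.id] = customer
        self.by_city[customer.city].append(customer)

    def in_city(self, city):
        """Names of customers in a city, via the secondary index (no scan)."""
        return [c.name for c in self.by_city.get(city, [])]

store = CustomerStore()
store.add(Customer(1, "Ann", "Oslo"))
store.add(Customer(2, "Raj", "Pune"))
store.add(Customer(3, "Liv", "Oslo"))
print(store.in_city("Oslo"))
```

Each new kind of query means another index and more bookkeeping in add(), which is the part an RDBMS hands you for free.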
The future of databases must surely be Microsoft Access.
Your Average Joe
Microsoft has hired a bunch of pie in the sky forward thinkers from the golden age of computers. The question is - will their knowledge and visions be relevant in the future? How many visionaries manage to remain visionaries through the passage of time?
Well that's the essence.
The DB as a big artificial brain.
Associations are weighted. Parts of it 'dream' up new associations. It organizes itself based on assorted 'self conceived' recipes.
It's one very fuzzy wart on the ass of the AI-as-a-black-box-brain rhinoceros. Kinda cool that folks are thinking about AI in functional rather than structural terms, even if they may not know it.
Mind you this is just a first cup of coffee opinion!
The reason that database technologies are a sticking point, IMO, is that as a relatively mature (although still developing) technology, they don't attract much mindshare, and overall skill levels in using the technology are stagnant or deteriorating.
I've known people with newly minted advanced degrees who could tell you about the finer points of implementing model view controller using web frameworks and talk your ear off about user principals and aspect-oriented method interception, yet they get stopped like a deer in the headlights when faced with how to construct a relatively simple join query. Nested queries, outer joins, and null value semantics in aggregate functions are completely exotic to them. They know just enough to get by; they have no art in this area.
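For anyone who wants to see the outer-join and NULL-in-aggregate behavior in question, here is a small sketch using SQLite via Python's standard sqlite3 module. The authors/posts tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, score INTEGER);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 5), (11, 1, NULL);
""")

rows = conn.execute("""
    SELECT a.name,
           COUNT(*)     AS joined_rows,  -- counts the NULL-padded row for bob too
           COUNT(p.id)  AS posts,        -- COUNT(column) skips NULLs
           AVG(p.score) AS avg_score     -- NULL scores are ignored, not treated as 0
    FROM authors a
    LEFT OUTER JOIN posts p ON p.author_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(rows)
```

The outer join keeps bob even though he has no posts, COUNT(*) and COUNT(p.id) disagree for him, and ann's average is taken over her one non-NULL score.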
Yet, I can tell you that twenty-five years ago, the much smaller geek community of the day had tons of people who could argue the finer points of relational calculus.
It's technological fashion. These days, database technology is to informatics what plumbing is to architecture. Unglamorous, but you can't live without it, and when all the toilets are plugged up it doesn't matter how spiffy your new corporate headquarters are.
Of course, now that the supply of database talent has dried up, people want their corporate information not only to flow, but to do tap dances and pull rabbits out of a hat. It seems to me that what Mr. Gray is talking about is the union of two highly useful but passé technologies: database technology and fuzzy logic.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Language certainly requires a probabilistic database, at least for the lexicon. Syntax, semantics, morphology and phonology all respond more to the rules of probabilistic databases than relational ones. If such a key function in human life/culture is a probabilistic database system, isn't it logical that higher-level functions would also be?
So long and thanks for all the fish . . . !!!
Actually folks, this article is timed with SQL Server 2005's release. One only need do the math to see that Rational Rose has a M$ smack light on it. SSIS is DTS with more "Lower CASE" handling. There was no tie-in with the Visio/UML solution; I figure it's just around the corner. Other businesses that make money off of SQL Server were overheard grumbling at the Anaheim conference about the SQL Server 2005 release. M$'s BI (Business Intelligence) interface can do some matching with fuzzy and neural logic, along with 3 other methods of "best guessing". I give MySQL/Apache/Perl/Firefox/OpenOffice about a year to catch up on what the 2005 product is doing.
IMHO, continuing to evolve the db to be as simple as possible is a good direction to go in. If you need fancy fuzziness, can't this just be another layer that enables various types of indexing? It seems to me that a modular approach to db, like MySQL uses, is a solid architecture to invest in. The modular system of back ends (i.e. MyISAM, InnoDB) really empowers us to tailor an application infrastructure appropriately for all the different db use cases: primarily a read db in the case of dynamic web sites vs. read/write with heavy contention, etc. Doesn't it seem appropriate to take the same approach to indexing functionality? To abstract it into a different layer? There could be modules for the r-tree stuff, full text searching, and all sorts of other fuzzy stuff. Giving the designer/programmer choices seems to be the best way to go; the right tool for the job. Wasn't this the philosophy behind Wirth's Oberon? To have collections of modules instead of distinct applications? It also reminds me of how/why *nix is so powerful and successful: collections of small simple tools that can be put together like building blocks. Abstract modules or layers is obviously a resilient, efficient way to design a system.
$famous_person is such an idiot. I thought about $hard_problem (without reading the article) for 30 seconds and came up with $obviously_broken_solution. Insert 4 or 5 more misspelled words. The end.
I set up an IMAP server, told the developers it was Oracle, told the boss they loved it, saved $15K and got a 10K raise in salary.
And then...I woke up.
Whenever you read this sig someone's refrigerator light turns on.
.. future of Database Technology. Actually you don't need to ask them. Just go to any bookstore and buy one of their books and you will quickly learn that relational doesn't mean SQL. Relational databases are about two-valued predicate logic and set theory, and there is nothing more solid than this to be used as a basis for storing and manipulating information. Future databases will be truly relational truth systems with support for user-defined types and temporal data at the logical level and a much better implementation at the physical level. Jim Gray is an authority in the area of transaction processing, but not in the area of databases and database languages in general.
Prevayler is useful as an object store, that's about it. As a database, it fails pretty miserably, but I don't think anyone had pretensions of it ever being a database.
There's a very easy way to see that Prevayler is not a db: Prevayler works by loading the entire db into memory. The exact purpose of a db is to hold data which cannot fit into memory and still perform queries on it.
If you just want an easy way to store your objects, Prevayler ain't half bad. It's very useful for saving state.
Myren
These 'distinguished Microsoft engineers' are essentially nazgul. Once powerful thinkers, corrupted by greed, they have thus been subverted by the power of the dark lord, and betrayed all the values they may have once stood for. Now they are kept alive by spells.
My particular field is medical visualization for Radiology, so essentially I have to organize huge sets of patient data in a way that I can do things like, well, volumetrically render your skull to see if you have a lesion, etc. Today, I have to pull this to the workstation, organize the dataset, and render the scene from the dataset onto the stage. Because of the flowing nature of our data (that is to say, this isn't like a game where you can pre-cache models on the local workstations, since every patient is a different model), I would like a way to tie Direct3D to a pre-render engine at the database layer so that all I would have to provide to a client like a web page is the end product. I'm working with MS SQL atm, so I'll use it as an example: a typical MRI image of your chest comes out of a scanner in some stupidly high resolution. That scan typically contains voxel data which is defined by the mm thickness of the slice. Your POV as an end user over the web is, 'all I care about is this one particular diagnostic output', or one image, let's say. To actually GET that image may or may not require that a set of transformations be applied to a large subset of slices in any particular study. It would be really nice to not have to add external services (another app), and instead be able to natively access the inner workings of the database engine to do this directly, instead of offloading it to the local OS. Object programmability, in the
We're about to start on a big database backend for scientific and engineering frontends, and I'm having the damnedest time trying to find a product that was designed with an eye towards what I'd call "basic mathematics".
Our short-term needs:
Long term, a future interest will be in the area of what I might call a systematized approach to scientific data analysis, and particularly things that go under the guise of e.g.
For a while there was a big push here for auditing the DBA team. For every option they put forth, I told them how I could get around it. We are the superusers of the data; at some point we have to be trusted.
The funny thing is, SOX was brought on by executives who broke the law... Maybe it's just me, but the only people that seem to be affected by SOX are the people who weren't involved in the recent scandals. I.e., my productivity has been impacted by 30-50% because of all the extra process. But when asked, no one is monitoring the effect these processes are having on productivity and whether headcount needs to increase because of it.
"Thanks to the remote control I have the attention span of a gerbil."
"You can yank the power cord out of the wall anytime you like, and the database won't get corrupted." - That's how I've generally heard the main advantage of a good DB described to non-technical people. Power failure won't cause corruption is the cannonical example.
I only vaguely recall the incident you mention; I thought at the time it was a case of "Why MySQL isn't a real DB", but before MySQL fans jump down my throat, I'll admit that I didn't (and don't) really know. It may well have been, as you describe, hardware outright lying to software, in which case, to return to the original discussion, using only a filesystem and not a db would not have helped. Certainly there might be bugs in hardware that a DB can't cover for. But there are many, many sorts of bugs and other failures that a DB will prevent from corrupting your data, because that is one of the main design goals of a good DB.
I'll stick with my original point: If using a DB makes your data storage more fragile, you are doing something very wrong.
Actually, Oracle supported this in 9i as well.
DB2's clustering doesn't use shared storage iirc. In Oracle what you're looking for is replication, not clustering.
and since none of us on this board are into opensource your solution remains informed.
Shared storage hardware is VERY expensive for me (I am in india)
Ah.
Replication is nice, but a true multi-master setup would be even better.
Huh? Multi-master is replication.
Are you confusing replication with clustering?
"I don't know, therefore Aliens" Wafflebox1
But I don't think "IT" is.
... this query is running slow, maybe I should build an index and see if that fixes it... Hrrrm these results have improved nicely with that new index.
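The observe/hypothesise/test loop in that example can be reproduced in a few lines. This sketch uses SQLite through Python's standard sqlite3 module; the orders table, the index name and the data are invented, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 100}", float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer = 'cust7'"

def plan(sql):
    """Return SQLite's query-plan description for sql as a single string."""
    return " / ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # observation: the planner reports a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(QUERY)   # test: the planner should now search via the new index
print(before)
print(after)
```

If the second plan did not mention the index, the hypothesis "an index will fix it" would be falsified, which is the point being made.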
CS follows the "scientific method" of 'observe, hypothesise, test' and is falsifiable in a Popper-esque sense.
Hrrrm
Looks like a science to me.
What exactly do you mean when you say it is not a science? What is your basis for that?
We live in a time of extreme change, much of it precipitated by an avalanche of information that otherwise threatens to swallow us whole.
EEEEEEKK!!! Run for your lives!!!
Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
Actually Oracle Parallel Server has supported this for years, at least since 8 officially.