World's Largest Databases Ranked
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
I would've expected to see Google in there somewhere.
scored a measly 17th. Oh well, time for more surfing.
Does the SQL Server mean MS-SQL?
I would have liked to see SQL vs non-SQL ranking too.
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
I wonder how many of the spammers allowed their databases to be evaluated for this list.
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed
You're off by 3 orders of magnitude. The largest is 30TB.
I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.
Lasers Controlled Games!
Based on all the hype about the national Do Not Call registry, I would have expected to see that up there somewhere. Then again, it probably consists of like one table and 3 fields. It certainly would qualify as a very popular database.
Leaving off a couple zeroes, my friend. Largest database in the survey is 30,000GB. Not to mention, of course, that you probably have to actually request to be included in these tallies. There could very well be much larger databases (maybe government agencies with three letters in their name?) that are unknown to the people running these numbers.
Game... blouses.
OK so this is obviously only vendors of databases and RDBMS systems.
In a broader sense aren't such things as the wayback machine a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN. Who's the daddy of these guys?
my other sig is written in brainfuck
I would imagine that the Winter Corporation's db is now climbing up the peak performance for online transactions right now ;o)
$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.
Nope. not even by a large measure.
France Telecom's Oracle database is around 30 TB in size (29,232 GB; that's a comma, not a decimal point).
- mritunjai
I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
Shouldn't the World Wide Web be ranking 1st with its huge pr0n database?
Real geeks use acronyms.
29TB actually. (Due to rounding, precisely 28.547 TB)
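The 29.2 and 28.5 figures traded in this thread look like a units question rather than rounding: the same gigabyte count divided by 1,000 or by 1,024. A minimal sketch, assuming the survey's 29,232 figure is in gigabytes; the function names are made up for illustration.

```python
def gb_to_tb_decimal(gb: float) -> float:
    """Convert gigabytes to terabytes using 1 TB = 1000 GB."""
    return gb / 1000


def gb_to_tb_binary(gb: float) -> float:
    """Convert gigabytes to terabytes using 1 TB = 1024 GB."""
    return gb / 1024


france_telecom_gb = 29_232  # figure quoted in the thread
print(f"decimal: {gb_to_tb_decimal(france_telecom_gb):.3f} TB")
print(f"binary:  {gb_to_tb_binary(france_telecom_gb):.3f} TB")
```

Either way, the answer is tens of terabytes, not tens of gigabytes.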
"If anyone needs me, I'm in the angry dome."
I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...
I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?
.
AT&T 94,305GB Daytona SMP AT&T Sun Sun
I wonder how much of this database is records of every time a user has switched to and from AT&T to get those cash bonuses!
Wait I'm confused... Is this supposed to be sarcasm?
Real geeks use acronyms.
Where I work, we recently (for an IT pat on the back) calculated our total network-accessible storage capacity and came up with a rough estimate of about 150TB. That is a giant swath of data, and a decent amount of it is in databases (an MSSQL farm), but scattered across thousands of DBs.
It takes a truly amazing staff to maintain the servers (backups, administration, maintenance, sitting and staring at screens) and preserve the integrity of the data, but good lord...
A 94.3TB database? My utmost and highest kudos to the DBAs and admins there. That is one gigantic task to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up at least a good three nines, I'm sure.
Regardless of the results of the study (which, without reading the whole thing, look like the short version of a geek pissing contest), I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBAs' jobs and certainly wouldn't want to be a DBA on a 94.3TB farm, but I know people who do it and love doing it. It's a specialty skill and apparently these guys do it right...
Kudos...
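For a sense of what "three nines" allows, here is a rough sketch of the downtime budget, assuming a 365-day year and availability expressed as a count of nines. The helper name is hypothetical and nothing here comes from the survey.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(nines: int) -> float:
    """Yearly downtime permitted at an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return HOURS_PER_YEAR * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours of downtime per year")
```

At three nines, that is under nine hours a year for everything: patching, failed disks and the midnight pager calls.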
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here
If you can read this sig - the bitch fell off.
CIA and RIAA are given a raw deal!
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500GB of data for Microsoft Corp.'s Windows and NT platforms and 1TB of data for all other platforms.
I cannot see what OS each DB is running on. Is that irrelevant?
France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.
:-)
A case of acronym confusion, I guess.
Sig ?
Especially in the peak workload category. I saw a lot of MS SQL databases working overtime when Slammer first came out!
Lastly, in the Windows OTLP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.
Neither WindowsNT, nor MS SQL are generally a choice for the top databases. In fact, to make the entry in this list, a Windows-Database was required to be only half as big as databases on other platforms:
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms
ms
Large yes, but I'm sure that their list of who they want to sue must be huge! (Atoms in the universe huge.)
One line blog. I hear that they're called Twitters now.
Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.
While it is nice to see the ranking in terms of size and usage, it would be nice if the survey ranked other factors like maintenance time and number of users to see how they really compare in operation. Largest number of OLTP might signify lower downtime but maybe not.
Well, there's spam egg sausage and spam, that's not got much spam in it.
The guy who did the summary is going to have a bill on his way... :)
Quote: "If this is your website please contact Verve Hosting"
And Verve hosting address is billingadmin@vervehosting.com...
JP
troll ?, Thought it was kinda funny
wanted: one clever sig,apply within
Winter Corp's own results database shoots to number one in the 'Peak Workload' rankings after being linked to from Slashdot...
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Hmm - how to /. your own website in one simple step?
Boy, is this slanted. I work on large IBM machines with DB2 built in... Where are those?
Someone else wrote about Google; it should be in this listing too, even if it is using an in-house developed DB.
Platforms: Windows or Unix... BAH!
does anybody believe that the "SMP" used in reference to the French Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? how are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?
Even better is the Google Glossary to solve your acronym hell.
Some are scored 10-1
shouldn't the overall best performer have been ranked 1984? and the rest from there?
every day http://en.wikipedia.org/wiki/Special:Random
From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.
I'm absolutely shocked that the NCBI's (National Center for Biotechnology Information, part of the NIH) genomic and proteomic search engine, BLAST, isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of several organisms. I'm similarly shocked that MEDLINE / PubMed isn't included, as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites, not Amazon.
Oracle is 1st (France Telecom). I bet Larry Ellison is launching a *big* advertising campaign based on these data.
They are going to exploit this thing "ad nauseam". Wait and see.
Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating, you'll have realized that this company and its peers (Equifax etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad and, in the case of a mistake, changing it is an expensive and time-consuming task.
If this were Usenet, I'd killfile the lot of you.
Anyone else notice if you go to wintercorp.com it states:
Makes you wonder how definitive this survey really is.
#exclude <ms/windows.h>
Even funnier is that there's no such thing as OTLP: it's OnLine Transaction Processing. On-Transaction Line Processing???
This was tested against a live directory with the same number of users and objects each time?
How was your test environment organised?
Oh no, you were being ironic, I must pay more attention.
We have dev systems bigger than 100GB.
Slashdot, as the biggest SCO Flames database ..
The registry of some of my NT5 servers that has become HUGE after 2 years
My pr0n cd's sql database : )
WTF am I doing replying to an AC at 5 A.M on a Friday night?
Gee, it's too bad they couldn't get any responses from some of the big SPAMMERS. I bet their db tables and #rows are pretty PHAT too!!
pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n+4))-(1/(8n+5))-(1/(8n+6))]
I believe you have the right to, once a year, get your credit rating, for free, on demand (usually written.)
Here in Colorado, Equifax sends me a notice every year that my credit was checked and offers me a free copy of the report they (allegedly) are sending out.
What remains scary is that, although my credit report was dead on, I have in the past had reports that were so wildly inaccurate I had to laugh out loud. But because the person whose information was included on my report had such great credit the credit reporting company (not Equifax, the other one), told me to just leave it on there and take the benefits.
So thank you, Mr. X in Texas! Without your lack of control and deep pockets we probably wouldn't have got our house. Merry Christmas!
Obviously, you would be crazy not to use some middleware, but things aren't as simple as the PR guys claim. Running queries asynchronously creates a different set of problems and complicates the entire architecture. If you look at the biggest installations, they all use middleware and most of them use Tuxedo. This includes most, if not all, MS SQL Server deployments. OLEDB can't handle that kind of load and neither can standard COM+. Just read the full disclosures for TPC. You'll see all the MS SQL Server tests wrapped Tuxedo with COM+. As much as Microsoft likes to slam EJB and Tuxedo for being too expensive, you can't scale SQL Server for really heavy deployments without using Tuxedo.
I find it interesting that the largest database is only 2TB larger than the one I recently built. It is a medical system. 66 mysql servers bear the load but I only usually have 30 of them actually active as the rest are mirrors and logging masters. Typical connections: 4500 at any given time.
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
I wonder if any of these are large government surveillance databases?
OK I'll be flamed for technical illiteracy, but there are a number of archival systems which go into the Petabyte (1000 Terabyte) range but are still relational databases with row level access.
One I worked on stored the output of Cray supercomputers running modelling programs 24x7. The data was output to a bank of Teradata boxes and then archived to tape. The system had a robot tape librarian at the back end but could still operate as a relational database.
The historical data should all be in there by now which would make around 1.5Pb.
The vendor of the software that managed it all was talking about telephone companies planning similar systems to put up to 5Pb in a system.
Anyone top that?
I had always heard that walmart maintained one of, if not the biggest database in the world. Kmart appears on one or two of the top ten lists here, but not walmart. Anybody know what gives?
I'd truly expect the truly largest databases to be maintained by financial institutions (banks, credit card companies, the stock market, etc) based on the sheer volume of transactions. Either them or the NSA or the FBI.
But in France, don't they use commas as decimal points?
Tim
Omnia vestra castrorum habetur nobis.
Well, the results are wrong. Where I work, they were told by Microsoft that they had the largest operational (all live) MS SQL database, at 18 terabytes...
My database professor gave us the rundown of the technologies that the NIH databases employ; it's some impressive business! Researchers all over the world are indexing and adding papers... SCREW Amazon!
In the future, I would want to not be isolated from my friends in the Space Station.
Could the rest be just logs of past telephone traffic? All phone traffic ever made through the company? What portion of these databases contains data that is actually used (data that is likely to be used in business), rather than just stored historic data? Are companies keeping huge amounts of old data because they can? Because it gives the db administrator a stiffie to think he's got $many terabytes in his db rather than on old tapes in the basement?
We have databases in our organization (Star Schema, Red Brick) where the fact tables literally have billions of rows. I'm sure there are many other organizations (especially government entities) that have huge databases not on this "list". For those interested in operating at this scale, other interesting hardware/software data mining solutions in the same vein as Teradata are Netezza Corp's database appliances.
I left MasterCard in 1999 after working with their data warehouse. At the time they recently bought a 3 terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised to not see them on the list. They work closely with Oracle, who have an office down the street, since they have high volume. Just the credit card transactions table alone gets 14 million new records on average every day.
I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
Developers: We can use your help.
France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.
<grin>
Somebody give Mr. Fonseca a clue. With so many unemployed geeks running around, why can't eWeek find somebody who knows this stuff (even cursorily) to write?
I would be surprised if some government databases, such as Social Security's, would not rank on this list if they were allowed to be analyzed.
Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google search reveals a pdf and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.
I used to be really interested in what DB2, MySQL etc. could do until I was turned on to Hans Reiser's vision with respect to file systems. In his view, the storage layers above the file system (complex database software) can be replaced by a more intelligent filesystem that itself acts like a database. I'm currently trying out ReiserFS (a filesystem included in the Linux 2.4 kernel) which internally uses balanced trees to achieve much higher performance in large directories. ReiserFS also wastes much less space in the storage of small files.
I'd say Oracle. Have you tried installing that baby? It makes MS Office look like Twiggy.
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.
If you go to the article, you will find that AT&T had the largest listed database at 94.1 TB - that's 9 times your speculation for Google.
And I used to work with some of the AT&T databases. Heck, the payroll system alone would have probably made the list in those days. (And I was the DBA for payroll for a while).
Also, some of the winners were using IDMS, a network-model DBMS implementation, not a relational one.
n/t
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
From my forays into mainframe land, 93tb could be supported by:
2 sysprogs,
2 dbas,
2-3 operators,
1 applications programmer,
and thousands of data entry personnel.
Everything would run batch (including dumps into satellite DBs for regional or department uses) except for the online data entry, and the apps programmer would be setting up jobs for the operators to run at night.
And the first time any of them hosed up the DB would be their last day of work on a mainframe.
putting the 'B' in LGBTQ+
This is over a month old already. Oddly enough I thought that I read it on slashpot in the first place. But maybe here.
I was never so happy as the day I was able to burn all my Teradata manuals, cause I ain't going back to one of those turkeys ever again.
The largest machines were about 250 nodes (Kmart, and look where they are today, and Walmart). I worked on machines up to about 135 nodes (Amps) (and 50 or so COPS) The performance never matched anything I've seen in DB2.
And even today, the performance tuning tools suck.
Oh, and as for your 1500 node limit, better check your manuals. Tucked away in the manual, and hardcoded into the operating system is a little limit - 1024 nodes - the origin of the name Teradata....
Oh, and it only takes 54 legal commands to crash one of those suckers (if you know the right commands, because of a hard coded limitation in the os as well).
I used to work for a company called Epsilon Data Management[1], in Burlington MA. They've been bought since I left them a while ago, but they were the keeper of the AmEx customer transaction database for data mining and direct marketing (junk mail and phone calls).
Big. 7 data silos big. Each silo holds 50k tapes, each tape was 30gb, and it usually took 4 days to load.
[1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
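Taking the silo numbers above at face value gives a rough ceiling on what that tape library could hold. A back-of-the-envelope sketch, assuming every tape slot is full and "30gb" means decimal gigabytes; the variable names are illustrative only.

```python
silos = 7
tapes_per_silo = 50_000
gb_per_tape = 30

# Upper bound: every slot in every silo holds a completely full tape.
total_gb = silos * tapes_per_silo * gb_per_tape
total_tb = total_gb / 1000
total_pb = total_tb / 1000

print(f"{total_gb:,} GB = {total_tb:,.0f} TB = {total_pb:.1f} PB")
```

That is raw tape capacity, not the size of a queryable database, so it is not directly comparable to the survey's rankings.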
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIII
My first reaction is that, if France Telecom has the largest (non-hybrid) proprietary relational data storage, at 29 TB, ahead of AT&T and SBC, at around 26TB each, that France Telecom must have a bunch of redundant data lying around.
As of 2001-01-01, France had a population of about 59 Million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that
FTE had 2002 revenues of 49B, with 240,000 employees.
ATT had 2002 revenues of 40B, with 71,000 employees.
Finally, SBC had 2002 revenues of 43B, with 175,000 employees.
So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
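The revenue-per-employee comparison behind that jab can be worked out from the figures quoted above. A small sketch, assuming "B" means billions in a single unspecified currency (the post does not say which) and using only the numbers in the post:

```python
# (revenue, employees) for 2002 as quoted in the post; currency not stated.
companies = {
    "FTE": (49e9, 240_000),
    "ATT": (40e9, 71_000),
    "SBC": (43e9, 175_000),
}

revenue_per_employee = {
    name: revenue / employees for name, (revenue, employees) in companies.items()
}

# Print highest revenue per head first.
for name, value in sorted(revenue_per_employee.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:,.0f} per employee")
```

On these numbers ATT comes out well ahead, with FTE and SBC fairly close together, so the joke applies about as much to SBC as to the French.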
The cure for cancer is coming: Reovirus
I once helped out in a study for the largest database AT&T wanted to do. To just store the data would have been 6 times the huge Walmart database's size or more. And this was just for a 3 month rolling store of the calls made on the AT&T network.
The 94.3 TB database is nowhere near what AT&T has to store. That is just one of 7 (last count I had) data centers they maintain. The total size of all the AT&T data approaches several THOUSAND terabytes. They maintain a converted bunker just to store tapes in!
Think about it, they have to keep records for YEARS about every call made on the worldwide network.
This is like ranking projects based on largest number of lines of code.
Without system descriptions (like in TPC), it merely shows that such a top end is feasible.
What about total cost?
annual cost?
time to build?
software versions?
hardware?
staffing composition?
I mean really, a 500 gbyte database on a modest single CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.
Under the category: Database Size, All, DSS
#7 Claria Corporation 12,100 Oracle SMP Oracle Sun Hitachi
The largest in the survey is 30GB.
Is my organisation the new record holder?
Yeah, I thought the same with my 60GB databases. I did a double-take after this post and it's right: thirty THOUSAND gigabytes!
Rank them by load, and you'll note the winners =)
Am I the only one surprised to not find eBay on the list? I suppose on one hand it is respectable to have a large and complex database, but on the other companies with massive databases as part of their business that DON'T show up on the list impress me more.
I work for a company that makes billing software for tier 1 telcos. My job is to tweak performance of the billing system and environment as we deploy into the client's production environment.
My team has an internal 17 TB database we use to test performance against, and every one of our clients has at least a 15 TB database. I can list four of our clients who maintain at least a 40 TB database. Not one of our clients is listed on that list (nor are we).
"The market alone cannot provide sufficient constraints on corporation's penchant to cause harm." -- Joel Bakan
The largest DB I've done was about 1 billion rows, processing the weblogs of a large ISP into SQL Server. It was about 1.5 TB.
:-)
I wrote some queries that reduced the processing time from 6 hours to 45 minutes
Me = smart
Microsoft OLE DB Provider for ODBC Drivers error '80004005'
[Microsoft][ODBC SQL Server Driver]Timeout expired
/vldb/2003_TopTen_Survey/TopTenWinners.asp, line 99
Bad web monkey!
1) ASP blows
2) You didn't catch your error
3) You let your error get spit out on the web page for me to start learning about your source code.
4) You should have used the OLEDB driver.
5) You should have cached those results instead of crippling your sql server fetching the same damn info 1 million times.
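Points 2, 3 and 5 in that list can be sketched in a few lines. This is a hedged illustration in Python rather than ASP, and `CachedQuery` and `render_page` are hypothetical names, not anything from the WinterCorp site: cache the result for a while, and catch the failure so the visitor never sees the driver's error text.

```python
import time
from typing import Callable, List, Optional


class CachedQuery:
    """Re-run an expensive query at most once per `ttl_seconds`."""

    def __init__(self, run_query: Callable[[], List[str]], ttl_seconds: float = 60.0):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._cached: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get(self) -> List[str]:
        now = time.monotonic()
        if self._cached is None or now - self._fetched_at > self._ttl:
            self._cached = self._run_query()  # hit the database only here
            self._fetched_at = now
        return self._cached


def render_page(query: CachedQuery) -> str:
    """Return page text; on failure show a generic message, not the raw error."""
    try:
        rows = query.get()
    except Exception:
        # A real site would log the details server-side here.
        return "Sorry, the results are temporarily unavailable."
    return "\n".join(rows)


if __name__ == "__main__":
    winners = CachedQuery(lambda: ["France Telecom", "AT&T", "SBC"])
    print(render_page(winners))
```

With a one-minute TTL, a flood of visitors costs the database one query a minute instead of one per page view.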
I didn't pay attention to politics until my country started to scare me. Recently.
That's a lot of data, folks. For comparison, take Microsoft's TerraServer, which, in cooperation with the USGS (the U.S. Geological Survey), maps the entire surface of the United States with photographs from the air, satellites, and so on.
That database of pictures is around 6 TB.
Some of the databases listed in the survey are even larger, approaching tens of terabytes!
I wonder how big SkyServer will be (the new successor to TerraServer, designed to collect and stitch together a map of the entire sky in 3D from all known and future telescope pictures).
Natural != (nontoxic || beneficial)
Since neither PostgreSQL nor MySQL showed up in the list (not surprisingly), does anybody know what the largest databases are running either of them?
I would guess that PostgreSQL maxes out larger than MySQL. </fuel-on-the-fire>
France Telecom : 29,232 : Oracle : SMP : Oracle : HP : HP
AT&T: 26,269 : Daytona : SMP : AT&T : Sun : Sun
SBC : 24,805 : Teradata : MPP : Teradata : NCR : LSI
***Anonymous*** : 16,191 : DB2 for Unix : MPP/Cluster : IBM : IBM : IBM
16 terabytes, and anonymous.... Hmmmm.... I know! It's the motherlode of all porn sites! Either that or the NSA. Same thing, really...
Mmmmm Condeeeeeeeeee!
RS
Shoes for Industry. Shoes for the Dead.
I'd be very surprised if there aren't megalithic databases churning away in black budget projects operated by unnamed government agencies that make these commercial ones look puny by comparison.
For that matter, I'm curious as to who "Anonymous", the operator of the #3 db in terms of size, is...
---anactofgod---
"Equal opportunity swindling - *that* is the true test of a sustainable democracy."
As a former employee (in the store, not at ISD), I know that most of that 240 terabytes is going to be in a database, not just files. I know Walmart keeps a lot of stuff secret, but they are rather proud of their IT stuff, and I'm surprised it didn't make the list.
How big is Slashdot's database?
Stanford Linear Accelerator Center weighs in at 500TB. They run Objectivity.
Internet Archive weighs in at 300-400TB and runs Linux.
Google is probably somewhere in that range, but they don't tell. A rough guess would be 3307998701 pages * 100KB/page / 1024KB/MB / 1024MB/GB / 1024GB/TB = 308TB. They run pigeons
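The Google guess above is easy to reproduce. A short sketch, where the 100 KB average page size is the poster's assumption rather than a measured value and the function name is made up for illustration:

```python
def estimate_index_tb(pages: int, kb_per_page: float) -> float:
    """Estimate storage in TB (1024-based units) for `pages` pages of `kb_per_page` KB."""
    return pages * kb_per_page / 1024 / 1024 / 1024


pages_indexed = 3_307_998_701  # page count quoted in the post
print(f"{estimate_index_tb(pages_indexed, 100):.0f} TB")
```

The result scales linearly with the page-size guess, so halving it to 50 KB per page halves the estimate.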
You missed one important point!
FT provides a public service in France which means that they are not expected to make as much profit as a company in the free market.
For example, FT has the obligation to maintain the telecommunications for remote parts of France (mountains, islands,...).
A private company would just refuse to do it or would charge a lot more than FT.
You would define it as 'none'. Lack of an eye color is a perfectly valid piece of data. It's not unknown - you know they have no eyes, therefore, they have no eye color: 'none'.
A Null value is one that does not have a value.
I'm not trolling (despite some clueless moderator's beliefs otherwise; I wish mods wouldn't moderate posts on subjects they don't understand...) or trying to belittle you or anything here. I'm just trying to point out that RDBMSs don't really exist, and things like stupid NULLs are what's to blame. Oracle, SQL Server, hell, even my favorite, PostgreSQL, are all to blame for stuffing non-relational tools down people's throats while screaming about RDBMSs. Think about how illogical your statement is:
A Null value ... does not have a value.
That makes absolutely NO sense. How can a value NOT have a value? A NULL is meant to represent something YOU DON'T KNOW. However, if you regularly find you don't know how to describe something completely, you probably shouldn't be trying to describe it within a relation. If you're occasionally going to have the need to temporarily represent an "unknown" value (perhaps you haven't seen this individual yet to know what eye color they have), why not just use the string 'unknown' as a placeholder? It's logical, it's true, and it signals that it needs to be changed eventually. Simple.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Wally World, errr Walmart is suspiciously absent from that list.
They have a HUGE (200+ node) Teradata install.
Wherever You Go, There You Are
(insert 'insert into' joke here)
So, what database does Slashdot use and how big is it ?
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Semantically, I don't see the difference between using 'unknown' as a placeholder and using an explicit null value. Except that using the placeholder is more awkward and it isn't clear how to store 'unknown' in a field that contains integers, for example. Further, databases have support for using null values, for example in outer joins and in aggregate functions. Perhaps you could modify an RDBMS to recognize the string 'unknown' so it could be used cleanly in these cases, but what would be the point?
If you object to the very idea of storing 'a value which is not known' then I don't see why this is any better when represented as the magic string 'unknown' or the magic value null. I pointed out in my post above that nulls have been part of the relational model since the beginning.
-- Ed Avis ed@membled.com
Just a note - I do speak from current experience when saying that a magic 'unknown' string or similar is painful to deal with and a source of bugs.
-- Ed Avis ed@membled.com
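The point about outer joins can be seen in a toy case. A minimal sketch using Python's built-in sqlite3 module, with a made-up two-table schema (the `person` and `eye` tables are purely illustrative): the join itself produces NULL for a missing match, and COUNT on a column skips it without any special handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE eye (person_id INTEGER, colour TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO eye VALUES (1, 'green');   -- nothing recorded for Bob
""")

# The outer join fills in NULL (None in Python) where no eye row matches.
rows = conn.execute("""
    SELECT p.name, e.colour
    FROM person AS p
    LEFT OUTER JOIN eye AS e ON e.person_id = p.id
    ORDER BY p.id
""").fetchall()
print(rows)

# COUNT(*) counts joined rows; COUNT(e.colour) counts only non-NULL colours.
counts = conn.execute("""
    SELECT COUNT(*), COUNT(e.colour)
    FROM person AS p
    LEFT OUTER JOIN eye AS e ON e.person_id = p.id
""").fetchone()
print(counts)
```

A magic string such as 'unknown' would have to be filtered out by hand in every such query, which is where the bugs tend to creep in.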
Number one in Decision Support System Peak Workload for Windows!
Number eight in the same category for all platforms!
See, small guys can do big things! We're a small to midsize consulting firm (50 or so employees), and yet we're on the top ten list of largest databases in the world!
*pops the champagne*
I am disrespectful to dirt! Can you see that I am serious?!
I suspect a more realistic guess is that its one of Teradata's larger installations that is preparing to defect to a DB2 EEE install and they aren't quite ready for NCR to know just yet.
Or as in the case of the US, would be required to provide service to everyone in their service region. I've been in some rather isolated areas in the US that had cheap affordable phone service. But I got to agree, we can't expect a public service to be run efficiently.
required msg
I'll respond to your three posts here for convenience.
Yes, you're right, of course, on Codd's assertions regarding the NULL. I'm speaking strictly in the sense of SQL, however, and of the issues raised by the current implementations of that messy language and the resultant DBMSs. Of course, I realize I didn't SAY that, so my apologies for confusing the matter (perhaps I need to add myself to my last journal entry regarding people who type before they think...).
At any rate, back to NULL. It's not that storing unknown values is bad, it's that it's abused. It turns into a catch-all like the eye color issue the AC raised. NULL (in the current SQL sense - perhaps this wouldn't even be an issue if NULLs were treated properly) makes no sense with that issue because the correct eye color for someone without eyes is "none", not NULL. People assume that anything that's not a cozy little tailor-made fit with their views of things can be represented as NULL, which is simply not the case.
That goes back to my other two posts though (that have, apparently, been bitchslapped to Troll) regarding people like Ellison and companies like Oracle that develop systems that claim to be relational because they implement PORTIONS of the Relational Model, and then sell them to people who don't want to bother learning what the Relational Model really is. I may not always agree with Fabian, but I think he and Date are right when they say that a truly relational system would render the whole frenzy over "XML Databases" and "Ob-Relational Databases" and all that other garbage moot, and it's a crime that these companies get away with pushing the garbage they do by claiming it's something it's not. If a truly relational system were implemented, it could represent data in the ways that those systems do simply through the proper use of data types and attribute definitions (why, pray tell, could one not simply define an XML datatype in a true RDBMS?). It is a great source of annoyance to me that people will sit and argue with me about the nature of a Relational Database based solely on the fact that a vendor says it's relational. I actually sat and argued with a professor for hours once that Access barely qualifies as a DBMS, much less a relational one. He kept arguing, however, that, because it had "tables" and keys and a handful of data types, it must be relational. I seriously wanted to clobber him with the copy of "Intro to Database Systems" that I had on hand (that's one heavy freakin' book in case you've never held a copy).
BTW - I don't condone the use of a kludgy string unless it's necessary. By necessary I mean "you happen to realize part way through the process that you don't know a value, but you will get it and replace the kludge as quickly as possible". In fact, I don't think ANY kludge like that should ever go to production because it causes problems in app development later on.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Null may be abused, but it is by far the less bad option compared to most other ways of modelling unknown or not-applicable values. Sometimes you have a genuine need to model data where some things are really not known. This is not the same as a kludge you reach for because you don't know what value to put. When designing your relational database you make a conscious decision that it makes sense to have rows where 'height' is not known.
Of course the correct eye colour for someone without eyes is 'none', or 'not applicable' as it is sometimes called. But you may decide as part of your data model to represent 'none' using null. This might not be ideal but it is certainly no worse than representing it using a magic string.
Of course, you are now confusing unknown eye colour with not-applicable eye colour, and this was one of Codd's criticisms of SQL, that it has only a single null value but needs two. However, I don't think this single failing is enough to disqualify a system as a relational database. And if you are using a particular RDBMS that supports a single null value it usually makes more sense to go with what the database supports natively rather than try to reinvent it with special 'none' strings. Even if you wish that the native support could be a bit different, you'd be silly not to use what is there. As I mentioned earlier, aggregate functions such as avg(height) are aware of null but not aware of any special value you might choose.
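To make the avg(height) point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the sample heights and the -1 sentinel are all made up for illustration; other RDBMSes should behave the same way for avg and count, but this was only written with SQLite in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables: one stores an unknown height as NULL,
# the other uses a made-up sentinel value (-1) for the same purpose.
conn.execute("CREATE TABLE person_null (name TEXT, height REAL)")
conn.execute("CREATE TABLE person_magic (name TEXT, height REAL)")

known = [("alice", 160.0), ("bob", 180.0)]
conn.executemany("INSERT INTO person_null VALUES (?, ?)",
                 known + [("carol", None)])   # None is stored as NULL
conn.executemany("INSERT INTO person_magic VALUES (?, ?)",
                 known + [("carol", -1.0)])   # sentinel for "unknown"

# avg() skips NULL rows, so only the two known heights are averaged.
avg_null = conn.execute(
    "SELECT avg(height) FROM person_null").fetchone()[0]

# avg() has no idea that -1 is special, so it drags the average down.
avg_magic = conn.execute(
    "SELECT avg(height) FROM person_magic").fetchone()[0]

# count(*) counts rows, count(height) counts non-NULL heights.
count_star, count_height = conn.execute(
    "SELECT count(*), count(height) FROM person_null").fetchone()

print(avg_null, avg_magic, count_star, count_height)
```

With the sentinel you would have to remember a `WHERE height <> -1` in every query that aggregates the column, which is exactly the reinvention of null being argued against here.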
If you don't condone using such a value in production, but only as a kludge during development, then fair enough. If you've never needed to model unknown values or not applicable ones, you've been lucky. (By good design and normalization it's possible to reduce the number of not-applicable values you have to store, but they can't always be avoided entirely.)
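As a rough sketch of what "reducing not-applicable values by normalization" can look like, again with Python's sqlite3 and a hypothetical two-table schema: eye colour lives in its own table, and a person it does not apply to simply has no row there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Hypothetical side table: a row exists only when an eye colour applies.
CREATE TABLE eye_colour (
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    colour    TEXT NOT NULL
);
INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
INSERT INTO eye_colour VALUES (1, 'brown');  -- nothing stored for bob
""")

# No NULL and no magic string is stored anywhere in the base tables.
stored = conn.execute(
    "SELECT person_id, colour FROM eye_colour").fetchall()

# An outer join brings the "missing" value back as NULL in the result.
joined = conn.execute("""
    SELECT p.name, e.colour
    FROM person p
    LEFT JOIN eye_colour e ON e.person_id = p.id
    ORDER BY p.id
""").fetchall()

print(stored)
print(joined)
```

Note that the null comes straight back as soon as you outer-join the two tables, and the design still does not say whether bob's eye colour is unknown or not applicable, so this reduces the problem rather than eliminating it.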
You're right that Oracle comes with a lot of non-relational crud like 'XML databases'. However, don't let that distract you from the fact that Oracle, the core product, is a pretty good implementation of a relational database system. The same goes for most other RDBMSes. I believe you can choose simply not to install the XML / object-relational stuff.
I'd say Access probably is a relational database because it stores all data in relations and you use a query language based on relational algebra (even though it may have a GUI front end) to retrieve and update the data. It might not meet all 12 of Codd's rules, but in my opinion it's close enough. You might sound a bit unreasonable if you insist that a database storing relational data is not a relational database.
-- Ed Avis ed@membled.com
Particle physics experiments routinely collect far more data than this. The BaBar experiment that I participated in stores enough data that its database is an order of magnitude greater in size than anything in this article (current size: 895.0 TB).
See: BaBar Database for details, it uses an OO database (which in my experience was very painful for users)
Dan
Or largest publicly admitted-to databases.
Mine at work is 44TB, DB2 for AIX, running on an RS/6000 system with 128 nodes. DSS only.
But I work for a really huge US company that doesn't talk to the media much.
It makes me wonder how many really huge ones are also flying "under the radar screen". Such as SCO's database of all Linux users, perhaps...
I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And, since we also have several computing farms totaling several thousand CPUs which process the data constantly, I doubt that they have a higher load.
Press release:
http://www.slac.stanford.edu/slac/media-info/20020412/database.html
Cheers
KdenLive/PIAVE - non-linear video editing
"France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture."
The author of this article just failed my bullshit filter. SMP in this context is "symmetric multiprocessing" -- yes, SMP "Symbolic Manipulation Program" was the name of what Stephen Wolfram wrote back in the early 80's while a grad student at Caltech, and open-sourced, and got heaps of shit for, because of a nasty copyright battle with Caltech over it. He was a student, and felt he owned the code he wrote while a student. Caltech felt differently when it started giving Macsyma a heck of a run for its money -- and Maple started raising their prices.
But this has abso-fucking-lutely nothing to do with database architecture. What "geek dictionary" did this writer look up this acronym in? Doesn't know what he's writing about. At all.
> FT also runs Minitel, which some might scoff at but is not trivial to run
Minitel is also one of the largest PORN DATABASES in the world. You can find millions of PORN images there in high Minitel resolution (40x25 in 16 colors).
I hope that there is no spam filter on /.! Can I say PORN two, ... three times without being moderated down?
You might sound a bit unreasonable if you insist that a database storing relational data is not a relational database.
You mean like... oh, say, Fabian Pascal, that guy I keep reading? :)
Everything you've said now comes down to the realm of current practical implementations. Yes, Oracle is "pretty close" as is SQL Server, PostgreSQL, and a slew of others, but they're just not there yet. I don't really have a problem with Oracle (excepting price...), SQL Server, or PostgreSQL. In fact, I love PostgreSQL. However, when I hear people arguing for OOP Database architectures or XML Databases, they always try to argue that "relational database management systems just don't meet the needs of the data being modeled in these circumstances" (note, now, we're in the realm of theory). This is total BS. There's no reason you couldn't have an XML data type in a real RDBMS or create a network model of your data within the RDBMS (although the latter would be stupid, you COULD do it). The problem is that vendors aren't offering relational systems in the true sense of Codd's works, and when something comes along like XML, people jump up and run around thinking they need to have a new DBMS and yell that the "relational system is dead".
Baloney!
And, since I've already gotten the two original posts modded to Trolls, why not go for a three peat by tossing out a totally opinionated, offtopic statement :)
XML is a stupid idea anyway.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Why do you think that relational database vendors are not offering a true RDBMS in the sense of Codd's works? You mentioned null values before, but I explained that this is definitely part of the relational model and not a deviation from it. What else do you think is missing?
-- Ed Avis ed@membled.com