World's Largest Databases Ranked

Google by ScribeOfTheNile · 2003-12-12 01:09 · Score: 5, Interesting

I would've expected to see Google in there somewhere.

Re:Google by tinrib · 2003-12-12 01:14 · Score: 5, Informative

Doesn't Google use 'big files' rather than a database for storing all its data?

see http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf which describes the Google filesystem.
Re:Google by lewp · 2003-12-12 01:21 · Score: 5, Informative

Even if Google qualified, which it probably doesn't due to the methods it uses for its data storage, if I read the article properly the database vendors are responsible for naming the participants.

Since Google's stuff seems to be developed in-house, they don't have a major database vendor to nominate them.

--
Game... blouses.
Re:Google by stripmarkup · 2003-12-12 01:25 · Score: 5, Informative

It seems that they are comparing relational databases. Search engines use proprietary databases which, among other things, do not allow for live insertion of records, SQL commands, etc. As for data volume, Google (or Yahoo or MSN, for that matter) is probably in the ballpark. The average HTML page is around 10 KB. Google probably stores at least 10^9 raw web pages in its cache (that's 10 TB alone), plus a lot of meta information about links to and from many others.
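
The back-of-the-envelope figure above can be checked in a few lines. This is only a sketch: the page count and average page size are the poster's guesses, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Rough cache-size estimate; both inputs are guesses, not measured figures.
PAGES = 10**9                 # assumed number of cached pages
AVG_PAGE_BYTES = 10 * 10**3   # assumed average page size: 10 KB (decimal)

total_bytes = PAGES * AVG_PAGE_BYTES
total_tb = total_bytes / 10**12  # decimal terabytes

print(f"{total_tb:.0f} TB")  # prints "10 TB"
```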

--
See charts for twitter trends on Trendistic
Re:Google by Ilgaz · 2003-12-12 01:59 · Score: 2, Insightful

What about visa/mastercard/american express?

IMHO some of them didn't want to be in that list.
Re:Google by KarmaPolice · 2003-12-12 02:13 · Score: 2, Informative

What about visa/mastercard/american express?

IMHO some of them didn't want to be in that list.

If you look at "database size", number 4 is listed as anonymous. They probably aren't too interested in telling everyone what database and platform they are using to store very critical data.
Re:Google by Wastl · 2003-12-12 02:35 · Score: 5, Informative

The term "database" is rather imprecise.
One might see a database as merely a "big file" with mechanisms to access and modify it consistently (and surely, Google has some means to ensure consistency). A big file does not disqualify for the term "database" just because it is not produced by one of {Oracle, MS-SQL, ...} or cannot be queried by the language SQL.
It is also possible to consider the Web to be a database (of Web sites). Or an XML, BibTeX, dbm, whatsoever file.
Sebastian
Re:Google by Grayputer · 2003-12-12 03:51 · Score: 1

How about NASDAQ, or any of the exchanges, strikes me they would rank somewhere in the OLTP section. OH wait, it's UNIX and MS, not mainframes ...
Re:Google by rawgod0122 · 2003-12-12 03:52 · Score: 1

>It seems that they are comparing relational databases. Search engines use proprietary databases which,
Yes and Yes

>among other things, do not allow for live insertion of records, SQL commands, etc.

If they cannot insert (bulk load is an insert), how are they going to expand their databases?

>Google probably stores at least 10^9 raw web pages in their cache(that's 10 TB alone)

Google does not store raw web pages. Use Google to find the PageRank and pagerate algorithms. Also find the paper on how the Google filesystem runs.
Re:Google by MattRog · 2003-12-12 04:37 · Score: 2, Informative

A database is any collection of data. A database management system (which is what most people erroneously call a database) is a system of programs (say Oracle/MS SQL) to maintain the data in a database.

--

Thanks,
--
Matt
Re:Google by jfmiller · 2003-12-12 07:55 · Score: 1

Is this also the reason none of the major credit card services show up? I.e. Visa and MasterCard?

--
Strive to make your client happy, not necessarly give them what they ask for
Re:Google by stimpleton · 2003-12-12 10:45 · Score: 1

The universe consists of 2 broad data storage formats:
1) Database(DBMS) systems.
2) Flat files(usually text files).

When I went to school, database meant a dbms.
"Big Files" were known as a flat file system.
Jane the Office Girl calls her Excel spreadsheet of clients a database, when it really is only a flat file.

A database (DBMS) should be separated from flat file systems, which do not deserve the term database.

Agreed, with the advent of ODBC etc., even a comma-delimited text file can be queried with SQL, but flat files do not share the qualities of their database brethren, i.e. concurrency control, failure recovery, etc.

--

In post Patriot Act America, the library books scan you.
Re:Google by ydrol · 2003-12-12 12:54 · Score: 1

"In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500GB of data for Microsoft Corp.'s Windows and NT platforms and 1TB of data for all other platforms. Respondents offered details through a questionnaire based upon their deployment's size, workload and DBMS, server and storage environments."
Re:Google by Directrix1 · 2003-12-12 17:29 · Score: 1

Oh, the ignorance. I can't stands it no longer. Most people do refer to a Database while meaning a DBMS, but flat files do not usually denote text files. A flat file usually denotes a database file which is accessed by a client through a library that uses file access mechanisms (as opposed to an SQL querying interface).

And flat files can do anything you can do with a DBMS, except access control and all querying/updating is only enforced/executed on the client end (whether through the library or file locks). Meaning the language being spoken is usually something along the lines of SMB or NFS and not a binary transliteration of an SQL Request/Reply.

--
Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
Re:Google by Assembler · 2003-12-13 19:04 · Score: 1

Just thought this was interesting.. Here's a quote from the page that I saw:

Microsoft OLE DB Provider for ODBC Drivers error '80004005'

[Microsoft][ODBC SQL Server Driver]Timeout expired /vldb/2003_TopTen_Survey/TopTenWinners.asp, line 99

My porn database by Trigun · 2003-12-12 01:09 · Score: 3, Funny

scored a measly 17th. Oh well, time for more surfing.

Re:My porn database by real_smiff · 2003-12-12 02:03 · Score: 4, Funny

Does anyone actually have their porn in a database (of some sort)? I'm curious whether the "porn database" is just a joke or ... hmm, worth implementing! For all I know, there's already a 'porn-o-base' (tm?) collaborative project on SourceForge that you're all using - after reading Slashdot for a bit nothing would surprise me...
What are the pros and cons to databasing (sp.?) your porn? - except perhaps, reduced chance of getting a girlfriend, and chance of ridicule, obviously...
Hey, this is the right place to ask ;)

--
This is my Sig, this is my Gun. One is for Slashdot and one is for Fun.
Re:My porn database by lonb · 2003-12-12 02:16 · Score: 2, Interesting

I used to run a porn site, RezX.com (about six years ago). All the content, porn included, was served out of a db.

--
"Ain't I a stinka..." - Bugs
Re:My porn database by goosman · 2003-12-12 02:16 · Score: 2, Funny

I started to organize my pr0n with a database, but i found that I was easily distracted by the content.

Plus 'leafing' through it is half the fun.
Re:My porn database by Anonymous Coward · 2003-12-12 02:51 · Score: 1, Funny

Does anyone actually have their porn in a database (of some sort)?

Yes. I do not, but a close friend has a several-terabyte database, burned to CD, referenced, with a shnazzy web interface. If a file you are looking for is unavailable, it requests that he swap some CDs to make that content available.

Hey, it's a hobby.
Re:My porn database by leifm · 2003-12-12 03:01 · Score: 1

I talked to a guy at school that claimed to have done this, or something similar. He'd organized his collection, and then written a VB frontend to do searches for something like, blonde + shaved. I never actually saw it, but he claimed he was going to sell it on eBay.

--

"Windows Me offers tremendous reliability and stability improvements..." -- Paul Thurott
Re:My porn database by Trigun · 2003-12-12 03:21 · Score: 1

Then write a program to automatically categorize pornography pictures and sell that on e-bay.
Re:My porn database by Oopsz · 2003-12-12 03:29 · Score: 2, Informative

How about automatically categorize, find, and download?

It exists, and it's open source. Welcome to the wonderful world of porn-get.
Re:My porn database by leifm · 2003-12-12 03:33 · Score: 1

You'd almost have to have metadata to do that, or one hell of an advanced image processing thingy. Either one is going to be a massive undertaking, so I won't expect to see it on eBay anytime soon.

--

"Windows Me offers tremendous reliability and stability improvements..." -- Paul Thurott
Re:My porn database by Anonymous Coward · 2003-12-12 03:40 · Score: 1, Interesting

I keep MD5 hashes of all the porn pictures I have (about 25,000 images so far), along with the galleries they relate to, so I don't get duplicates when I add new ones... I also have another DB table for galleries, which keeps track of the number of images in each gallery and the traits of the chick/chicks in those pictures (young, hot, shaved, redhead, cartoon), although only about 60% of the galleries have been categorized.
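
The duplicate check described above (hash each file, skip it if the hash is already known) might look roughly like this. It is a sketch only: a Python dict stands in for the poster's MySQL table, and the function names and file names are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def add_new_files(paths, seen):
    """Return the paths whose content is not yet in `seen`; update `seen` in place."""
    added = []
    for p in paths:
        h = md5_of(p)
        if h not in seen:
            seen[h] = str(p)  # remember where this content was first seen
            added.append(p)
    return added

seen = {}  # stand-in for the hash table kept in the database
with tempfile.TemporaryDirectory() as d:
    first, copy, other = Path(d, "a.jpg"), Path(d, "b.jpg"), Path(d, "c.jpg")
    first.write_bytes(b"same bytes")
    copy.write_bytes(b"same bytes")        # identical content, different name
    other.write_bytes(b"different bytes")
    added = [p.name for p in add_new_files([first, copy, other], seen)]

print(added)  # ['a.jpg', 'c.jpg']
```

Matching on content rather than file name is what makes this work for files with essentially random names.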

It runs on MySQL and is controlled by a couple of legacy PHP scripts (yes, I taught myself PHP so I could create a cool database for my porn).

The pictures themselves are held in separate Blowfish-encrypted files (so my parents won't find 'em), and I also keep smaller (also encrypted) thumbnails of all images.

The funny thing is that I rarely see any of these pictures after I enter them into my database because I'm always looking for new ones (free6.com, thehun.com, ampland.com and spidering various newsgroups)... Oh well... maybe I'll just give 'em to my grandchildren some day.
Re:My porn database by ultranova · 2003-12-12 06:07 · Score: 1

As it happens, I'm currently building an application that will download pics from Usenet binary newsgroups and store them in a database (with each header stored separately, user-inputtable keywords, and a user-adjustable "score" describing overall quality). When finished, I can search and display via a web browser. This is necessary, because once you have a few thousand pics with essentially random names, it becomes practically impossible to find anything, and it really annoys me when Nautilus refuses to show a folder because there are too many images in it... And for whoever's wondering, that's for "alt.binaries.pictures.fantasy-sci-fi", not porn groups... Not that there's that much of a difference, since the favorite subject of most fantasy artists nowadays seems to be a scantily (if at all) clad woman with big breasts... Not that I'm complaining ;).

I'm also considering automatic trading. The system would automatically detect missing files (by parsing par files) and send out periodic requests. Anyone fulfilling these requests via reposting would earn credits, which could in turn be used to request automatic reposts from the system. Of course, these systems would recognise each other's requests and answer them, if possible.

Basically, there would be a limited number of possible automatic posts per day/week. The requests from the people with most credits would be fulfilled first (with upload slots being distributed in proportion of credits held), with any leftover free slots being used by the creditless masses.
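
One way the slot scheme above could work, as a rough sketch. The poster gives no rounding rule or data layout, so the function name, the floor rounding and the example users here are all assumptions.

```python
def allocate_slots(credits, pending, slots):
    """Give each requester a share of `slots` proportional to credits held.

    credits: user -> credits held; pending: user -> number of requests waiting.
    Returns (per-user allocation, slots left over for users without credits).
    """
    # Only users who actually have requests waiting compete for slots.
    total = sum(credits.get(user, 0) for user in pending)
    alloc = {}
    for user, wanted in pending.items():
        share = slots * credits.get(user, 0) // total if total else 0
        alloc[user] = min(wanted, share)  # never hand out more than was asked for
    leftover = slots - sum(alloc.values())
    return alloc, leftover

credits = {"alice": 30, "bob": 10}            # hypothetical users
pending = {"alice": 2, "bob": 5, "carol": 4}  # carol holds no credits
print(allocate_slots(credits, pending, slots=8))
# ({'alice': 2, 'bob': 2, 'carol': 0}, 4)
```

The four leftover slots are what would go to the creditless requesters, carol in this example.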

Naturally, this would require that the files be checked, to ensure someone isn't posting garbage. Par files have some kind of integrity check, but I have no idea how well it would stand up to a deliberate attack.

So, basically, peer-to-peer over Usenet :).

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:My porn database by jridley · 2003-12-12 06:33 · Score: 1

I have all the video that I own in a database, if that's what you mean. A certain (to remain unnamed) percentage is porn. When you get to the point where you have thousands of VCD/DVD discs in binders on the shelf, you need some way to know what you already have.

Heck, on a couple of occasions where I forgot to bring my palm pilot with me with a copy of the database, I've wound up buying duplicates of DVDs I already owned. At least I caught it before I opened them and was able to return them.
Re:My porn database by Tin+Foil+Hat · 2003-12-12 07:01 · Score: 1

Pro:
select * from pr0n where attribs like '%big breast%'
Con:
Having to spend hours typing descriptions of all the pics (with a hard-on, mind you) instead of just running through them left-handed in kview.

--
No matter how many of my rights are taken away, somehow I still don't feel safe. -Frigid Monkey
Re:My porn database by Slashdotnewbie · 2003-12-12 07:11 · Score: 1

There is a porno movie database: www.iafd.com
Re:My porn database by Nplugd · 2003-12-16 04:04 · Score: 1
Holy cow, I'd never have thought to search for 'porn' on SourceForge. Turns out there are dozens of projects. Here are the first three:
- gnaughty: GTK porn movie auto-downloader.
- Porn Toolkit: The Porn Toolkit is a collection of scripts or programs to download free porn videos automatically.
- C'est gratuit pour les amis!: You would like to have your own erotic-porno-sex web site but you don't want to spend time on it? This project will create it for you, it will browse other erotic-porno-sex web sites and it will put their pictures on your!
  
  Ah, the humanity in all its glory...
--
Je n'ai pas d'avenir Je n'ai qu'un destin Celui de n'être qu'un souvenir C'est pour demain

SQL Server? by B5_geek · 2003-12-12 01:10 · Score: 5, Interesting

Does the SQL Server mean MS-SQL?

I would have liked to see SQL vs non-SQL ranking too.

--
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)

Re:SQL Server? by sql*kitten · 2003-12-12 01:21 · Score: 1

Does the SQL Server mean MS-SQL?

Yes, in this case - look at the "Vendor" column. Note that in the past both MS and Sybase called their database "SQL Server"; nowadays Sybase calls it "Adaptive Server". Sybase IQ is highly optimized for DSS work, whereas AS is optimized for OLTP.
Re:SQL Server? by AndroidCat · 2003-12-12 01:32 · Score: 3, Funny

Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.

--
One line blog. I hear that they're called Twitters now.
Re:SQL Server? by azaris · 2003-12-12 01:43 · Score: 4, Informative

Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.
It was originally called Sybase SQL Server but was later picked up by MS, who adopted the name. Typical /. objectivity.
Re:SQL Server? by Ilgaz · 2003-12-12 01:55 · Score: 1

What I know is that SQL is a method not invented by MS; IBM and Compaq were actually the first in the business to adopt it.

MS did an amazing PR job of presenting it as their invention, which is totally wrong.

SQL first got attention from Oracle after being ignored by all the others, and you can see what they have become after caring about it.

If I remember correctly, SQL, relational databases, and relational queries surfaced in 1978 or somewhat later.
Re:SQL Server? by Net_Wakker · 2003-12-12 02:18 · Score: 1

Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.
It was originally called Sybase SQL Server but was later picked up by MS, who adopted the name. Typical /. objectivity.
So you're saying it's typical MS thievery, stealing another products' name?
(It's a joke. Laugh.)

--
a horrible place
Re:SQL Server? by Ilgaz · 2003-12-12 02:38 · Score: 1

--
Microsoft just called their product Microsoft SQL Server. Doesn't sound presumptuous to me. If they had called Windows Microsoft Operating System instead would you be pissed?
--

I am not getting pissed. It's as if the best/biggest nuclear power company claimed, or let's say showed via PR, that they invented E=mc^2. That's all.
If the SQL stuff isn't somehow being manipulated by MS PR (mind you, not MS, their PR) to look like theirs, why do sysadmins installing MySQL hear things like "let's use the original" from their bosses at the first minor problem?
Re:SQL Server? by Ilgaz · 2003-12-12 02:48 · Score: 1

I respect you terribly, ok you know all.

But these days everyone thinks SQL is an MS thing. That's all. In fact the same people will think video streaming is an MS invention when RealNetworks is dead (somehow) in 2 years. That's the point.

I am speaking about that.
Re:SQL Server? by AndroidCat · 2003-12-12 02:50 · Score: 1

Exactly. Sybase SQL Server. And it was Microsoft SQL Server, but once Microsoft has their product in position, they drop the company name and gradually make everyone assume that the generic term refers to them by default. If they've managed to be extra tricky, they then sue anyone not using the generic term to refer to their product. (Did Sybase really choose to rename their product, or were they forced to by legal/marketing forces?)

--
One line blog. I hear that they're called Twitters now.
Re:SQL Server? by Dr+Caleb · 2003-12-12 03:10 · Score: 2, Interesting

If you want something that complies with the relational model and relational theory, skip SQL and go directly to IBM DB2 and RPG. SQL as you say is a kludge. DB2 as a language so much reminds me of assembler I tend to liken it to opcode for databases. As you may tell, I'm a big IBM'er.

--
"History doesn't repeat itself, but it does rhyme." Mark Twain
Re:SQL Server? by sphealey · 2003-12-12 03:17 · Score: 4, Insightful

It's also this intense stupidity that has prevented us from having a major vendor that actually provides a real RDBMS to this very day. If DBMS people would actually invest a little time in learning about the Relational Model, maybe they'd stop purchasing the crap that Microsoft, Oracle, IBM, etc. keep forcing out and (flamebait here) maybe people would stop installing MySQL and Access and thinking they're going to be good for anything more important than cookie recipes.
That's exactly how Larry Ellison got his start - he saw a good idea in an IBM tech journal, hired some programmers to implement it, and the result was Oracle. Why don't you (and the others who post this stuff to database-related forums and threads) go ahead and do the same? Actually write and market a "real relational system based on theory"? Then you could stop yelling at everyone else about it.
sPh
Re:SQL Server? by jxs2151 · 2003-12-12 03:21 · Score: 1

But in these days everyone thinks SQL is a MS thing.
That's known as successful marketing. Like the product(s) or not, you have to admit that Microsoft's ability to communicate is unmatched.
Re:SQL Server? by azaris · 2003-12-12 03:22 · Score: 3, Insightful

Well, "SQL server" is a stupid way to refer to an RDBMS. That's like calling Apache "perl-server". I'm not surprised the only people choosing to name their RDBMS products as SQL-something-or-other are the open source developers and Microsoft. Also I've never heard of MS suing MySQL or PostgreSQL for use of the term SQL in relation to an RDBMS.
Besides, the product is officially called Microsoft SQL Server and has always been, just like Microsoft Windows, but everybody refers to it as SQL Server or, if there is possibility of confusion, MS SQL Server or MSSQL for short. Is it malevolence on the part of Microsoft if people can't be bothered to use the full name of each and every one of their products?
Re:SQL Server? by haystor · 2003-12-12 03:27 · Score: 1

I've read this sort of flamebait elsewhere. The one thing always missing from it seems to be the name and location of an implementation of this superior technology. I'd love to try it out.

--
t
Re:SQL Server? by OwnedByTwoCats · 2003-12-12 03:44 · Score: 1

MCC's LDL++. Unfortunately, the price was steep the last time I (encouraged my boss to) look.
Re:SQL Server? by djdavetrouble · 2003-12-12 03:49 · Score: 1

he said;
(Did Sybase really choose to rename their product, or were they forced to by legal/marketing forces?)

I doubt it. Microsoft was the main party in the last round of funding before Sybase's IPO, i.e. Microsoft owns a stake in Sybase, and no one forced anyone to do anything.

--
music lover since 1969
Re:SQL Server? by MattRog · 2003-12-12 03:59 · Score: 3, Insightful

Because it is *relatively easy* to make a mediocre (Oracle, etc.) implementation of the Relational Model. It is quite difficult to make a truly Relational Database Management System. Not only that, but because the market is so uneducated why would they want to use it in the first place?

--

Thanks,
--
Matt
Re:SQL Server? by rawgod0122 · 2003-12-12 03:59 · Score: 1

The current and past trend in databases is to make everything work in a SQL database. As far as commercial relational databases go, SQL is the standard. Check out M. Stonebraker, "Inclusion of New Types in Relational Data Base Systems", IEEE Data Engineering Conference, Los Angeles, CA, February 1986 (http://citeseer.nj.nec.com/ncontextsummary/11868/52519). This is a paper on how to extend relational databases to handle all sorts of data. It has been used to create object-relational databases, to add spatial information to databases, and to support many other types of data. The latest and greatest data type that people are trying to integrate is XML. This has not met with as much success when it comes to querying the data efficiently, but there is a great deal of current research on the subject.
Re:SQL Server? by Ed+Avis · 2003-12-12 04:24 · Score: 1

Can you give some examples of where SQL deviates from the relational model (I know there are a few, but I'd be interested to know which ones you consider important in practice), and suggest what features a replacement query language would have?

(Not just 'it follows the Relational Model' but particular examples, because otherwise it isn't clear what you mean, even to those who have studied a little relational theory.)

--
-- Ed Avis ed@membled.com
Re:SQL Server? by haystor · 2003-12-12 04:27 · Score: 1

Thanks, now I can at least read something about a concrete implementation. Last I read on dbdebunk there were tons of rants about how all the current SQL db's suck and how they had a real one coming soon.

I've been dealing with technology too long to bother thinking about any product that's not out yet.

--
t
Re:SQL Server? by the_mad_poster · 2003-12-12 04:37 · Score: 1

E.F. Codd: In a relational database all data should be represented explicitly and in only one way, as values in tables.

The very IDEA of a "Null value" is not only contradictory to this idea, it's outright silly. You cannot represent an "unknown value" in a relational database where "all data should be represented explicitly and in only one way". You cannot, logically, represent the "unknown" in only one way. Rather, it's more like saying "this could be anything, but it's only one thing, which is not necessarily the same as all the other one things, that could be anything but might be the same as this one thing".

In other words, it's ridiculous to even use a "Null value". If you don't even know what it is you're representing, why are you trying to represent it?

That's only one example that peeves me in particular. Go check out Fabian Pascal's articles and Database Debunkings for a heck of a lot more thorough and eloquent justification on the position than I could rehash here.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Re:SQL Server? by AndroidCat · 2003-12-12 04:47 · Score: 1

And Windows has always been Microsoft Windows, but they've gone after other products using windows in their name. (But not X-Window for some reason. :^)
Obviously they don't own "SQL" and can't force anyone to stop using it, but now that they've laid their cuckoo's egg of SQL Server, anyone else wanting to use "Bob's SQL Server" will hear from their lawyers because it causes confusion.
Anyway, major topic drift, oops!

--
One line blog. I hear that they're called Twitters now.
Re:SQL Server? by Ed+Avis · 2003-12-12 08:30 · Score: 1

E.F. Codd: In a relational database all data should be represented explicitly and in only one way, as values in tables.

You've just quoted the first of Codd's rules to define a relational database system. But it seems you're not aware of the others, in particular rule 3: there should be separate handling of null values, where the null value in a column is different from any of the normal values for that column's datatype.

In fact, Codd argued for two different kinds of null: one to mean 'unknown', as when entering a person into the database without knowing their date of birth, and one to mean 'not applicable', as when storing the engine size of a bicycle. One of the reasons he disliked SQL was because it combined these two into a single null value. SQL deviates from the relational model not because it supports some null values, but because it doesn't support enough of them.
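
To make the two-kinds-of-null distinction concrete, here is a small sketch. It is not Codd's formal semantics, just an illustration of why one marker loses information; the `Missing` enum and the averaging rule are assumptions made for the example.

```python
from enum import Enum

class Missing(Enum):
    UNKNOWN = "unknown"                # a value exists but was not recorded
    NOT_APPLICABLE = "not applicable"  # no value can exist for this row

def average_engine_cc(rows):
    """Average engine size over the rows where an engine size applies.

    Returns Missing.UNKNOWN if any applicable row has an unrecorded value,
    and Missing.NOT_APPLICABLE if no row could have an engine at all.
    """
    applicable = [r for r in rows if r is not Missing.NOT_APPLICABLE]
    if not applicable:
        return Missing.NOT_APPLICABLE
    if any(r is Missing.UNKNOWN for r in applicable):
        return Missing.UNKNOWN
    return sum(applicable) / len(applicable)

vehicles = [1600, 2000, Missing.NOT_APPLICABLE]     # two cars and a bicycle
print(average_engine_cc(vehicles))                  # 1800.0
print(average_engine_cc([1600, Missing.UNKNOWN]))   # Missing.UNKNOWN
```

With a single NULL, the bicycle and the car with an unrecorded engine size would be indistinguishable, so the two results above could not be told apart.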

BTW, please take Fabian Pascal's writings with a pinch of salt, at least until you have checked against other writings on relational theory and database management.

--
-- Ed Avis ed@membled.com
Re:SQL Server? by $ASANY · 2003-12-12 14:25 · Score: 2, Interesting

The actual story is that in the mid-90's Microsoft bought the source code and rights to Sybase SQL Server 4.9.2 from Sybase, and then sued Sybase claiming that the name "SQL Server" was part of the package that they paid for. Sybase settled the case and relinquished the "SQL Server" name, re-branding their OLTP RDBMS "Adaptive Server Enterprise".
Now MS has overwhelmed Sybase with a derivation of its own technology that has MS's special additional bugs included for a nominal price, largely because they know how to market and Sybase regularly fails to market its products effectively.

Spam databases by stanmann · 2003-12-12 01:12 · Score: 2, Insightful

I wonder how many of the spammers allowed their databases to be evaluated for this list.

--
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed

Re:Spam databases by SnowWolf2003 · 2003-12-12 01:33 · Score: 1

Why would a spammer be even close to making this list? They likely only need one big table containing the email addresses. The rest of the supporting tables would be relatively small.

Re:No, it's 30,000GB by Cutie+Pi · 2003-12-12 01:13 · Score: 3, Informative

You're off by 3 orders of magnitude. The largest is 30TB.

No IMS? by John+Harrison · 2003-12-12 01:13 · Score: 4, Interesting

I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.

--
Lasers Controlled Games!

Re:No IMS? by musikit · 2003-12-12 01:16 · Score: 2, Funny

I thought that 90% of the world's data was irretrievably trapped in IMS?

looks like you got a typo in your question there. let me fix it for you.

I thought that 90% of the world's data was irretrievably trapped in MS?
Re:No IMS? by holviala · 2003-12-12 01:24 · Score: 1

I thought that 90% of the world's data was irretrievably trapped in IMS?
WTF? IMS? IMNAAL!
(Uh.... my head hurts..... what's this IMS anyway?)
Re:No IMS? by John+Harrison · 2003-12-12 01:35 · Score: 5, Informative

Google is your friend.
IMS is the database that was used to keep track of things for the moonshot. It is an IBM product. It is hierarchical as opposed to relational. Because of this it can do certain things very quickly, though in general it isn't as flexible as say DB2. Because it has been around so long, applications where having a DB was really important tend to have bought IMS a long time ago and developed systems around it. If your system is old enough, large enough and still works well for you there is no need to migrate to relational. Most of the world's financial transactions pass through an IMS system at some point. It is very stable and has uptimes that measure in years if not decades by now.
Because of this I am surprised that it is not on the list. There are really big IMS databases out there that run a lot of transactions. Because it isn't relational there is some bigotry against it and it is ignored in the popular press.

--
Lasers Controlled Games!
Re:No IMS? by tiled_rainbows · 2003-12-12 01:55 · Score: 1

Because it isn't relational there is some bigotry against it and it is ignored in the popular press.

Dude, where I come from the popular press doesn't often run stories on database architecture of any description - they're more into celebrity gossip and stuff.

--

evil math within Nature's Cubic Creation!
Re:No IMS? by snoitpo · 2003-12-12 04:29 · Score: 1

CA-Datacom is a hierarchical database--like IMS. It's not relational. Hey look! There is one CA-Datacom database on the list.

I work with the Customs and Border Protection database (see Peak Workload, all, OLTP). I've seen it pump 250,000 transactions in a minute (a few times a week) so I wouldn't be surprised to see 50,000/second. This has a few comfortably cruising IBM mainframes feeding a few database instances. MQSeries messaging is pulsing through a few thousand CICS transactions (millions of lines of COBOL), with generally subsecond response. Gotta love those 3270 workstations; web-based interfaces are way too slow (with a few hundred bytes to display, the overhead is much, much too large).

telemarketers by donnyspi · 2003-12-12 01:14 · Score: 1

Based on all the hype about the national Do Not Call registry, I would have expected to see that up there somewhere. Then again, it probably consists of like one table and 3 fields. It certainly would qualify as a very popular database.

Re:telemarketers by stripmarkup · 2003-12-12 01:40 · Score: 1

A phone number is 10 digits in length. Suppose the database was extremely popular and had one phone number for each person in the US. That's 3 GB, without any sort of compression. It fits on a DVD.
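
The estimate above is easy to reproduce. A minimal sketch, assuming roughly 300 million entries (one per US resident, a round figure chosen for illustration) and decimal gigabytes; it also shows what a simple packed-integer encoding would save.

```python
# Size of a hypothetical do-not-call list with one number per US resident.
POPULATION = 300_000_000   # assumption: roughly one entry per person
DIGITS = 10                # digits in a US phone number

as_text = POPULATION * DIGITS                 # one byte per ASCII digit

# The largest 10-digit number needs 34 bits, so it fits in 5 bytes.
bits = (10**DIGITS - 1).bit_length()
packed_bytes_each = -(-bits // 8)             # ceiling division
as_packed = POPULATION * packed_bytes_each

print(as_text / 10**9, "GB as text")    # 3.0 GB as text
print(as_packed / 10**9, "GB packed")   # 1.5 GB packed
```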

--
See charts for twitter trends on Trendistic

Re:Hang on ... by lewp · 2003-12-12 01:14 · Score: 1

Leaving off a couple zeroes, my friend. Largest database in the survey is 30,000GB. Not to mention, of course, that you probably have to actually request to be included in these tallies. There could very well be much larger databases (maybe government agencies with three letters in their name?) that are unknown to the people running these numbers.

--
Game... blouses.

Hmmm by Cenuij · 2003-12-12 01:14 · Score: 2, Interesting

OK so this is obviously only vendors of databases and RDBMS systems.

In a broader sense aren't such things as the wayback machine a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN. Who's the daddy of these guys?

--
my other sig is written in brainfuck ;)

Re:Hmmm by attonitus · 2003-12-12 02:38 · Score: 1

Put: "Database Size (Hybrid)" into the Category search field and you get:
Stanford Linear Accelerator Center 828,293 [GB] Objectivity DB Cluster Objectivity Sun Sun
i.e. almost a PetaByte.
The Met office comes next with 184,076 GB.

wintercorp climbing up the ratings now.. by maharg · 2003-12-12 01:15 · Score: 1

I would imagine that the Winter Corporation's db is climbing up the peak performance ratings for online transactions right now ;o)

--

$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.

Re:wintercorp climbing up the ratings now.. by Cenuij · 2003-12-12 01:56 · Score: 2, Interesting

On that note...

"Experiments at CERN will produce hundreds of TB of data per year at data rates up to 35MB/second starting in 1999," states Jamie Shiers, Project Leader at CERN. "Experience from the use of Objectivity/DB and HPSS on these experiments will help us understand how we can cope with the staggering 100PB of data at rates up to 1.5GB/second expected at CERN's Large Hadron Collider, starting in 2005."

"The size of CERN's database is bigger than any numbers ever seen," according to Richard Winter, president of Winter Corp., a Boston-based consultancy specializing in VLDBs. "The growing use of non-traditional data types is producing a giant leap in database size. Such databases will soon be commonplace in engineering, commercial, and medical fields as well," concludes Winter.

big mama db's

--
my other sig is written in brainfuck ;)

Re:Hang on ... by mritunjai · 2003-12-12 01:15 · Score: 1

Nope. not even by a large measure.

France Telecom's Oracle database is around 30 TB in size (29,232 GB.. that's a comma, not a decimal point).

--
- mritunjai

What surprised me... by MyNameIsFred · 2003-12-12 01:15 · Score: 5, Interesting

I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

Re:What surprised me... by Davak · 2003-12-12 01:20 · Score: 1

Anybody know how many concurrent queries Slashdot gets at peak?

It would be an interesting reference point.
Re:What surprised me... by sql*kitten · 2003-12-12 01:27 · Score: 5, Informative

I have none, nada, zip experience in big databases.

S'okay, I have plenty :-)

But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

You would typically see tens of thousands (or more) of concurrent connections to a middleware layer - like Tuxedo - which would then multiplex them down to hundreds of connections to the database. This is because there is a lot of latency in establishing a connection, in fact logging in often takes an order of magnitude longer than running an actual query, yet few users submit transactions nonstop. So there is no sense in maintaining tens of thousands of expensive user contexts on the DB server, and there is no sense in requiring intermittent (relatively speaking) users to log out after a short idle period. Middleware does nothing but manage concurrent user contexts, and it can do so very efficiently. A database can't, because it tries to preallocate as much context as it can, and that doesn't match real-world usage patterns, and anyway, database vendors concentrate on their SQL engines and leave middleware vendors to manage the rest.

Of course, if you are a big database vendor, you probably also sell middleware, but there's no-one who tries to bundle the two into one, any more than you'd want a web server to have its own filesystem.
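
The multiplexing idea above can be sketched in a few lines. This is a toy model only: `FakeDbConnection` and `Multiplexer` are hypothetical names made up for illustration, not Tuxedo's API or any real database driver.

```python
import queue
import threading

class FakeDbConnection:
    """Stand-in for an expensive, already-logged-in database session."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

    def run(self, sql):
        return f"conn {self.conn_id} ran: {sql}"

class Multiplexer:
    """Hypothetical middleware: many clients share a few DB connections."""
    def __init__(self, pool_size):
        self._pool = queue.Queue()
        for i in range(pool_size):
            self._pool.put(FakeDbConnection(i))  # pay the login cost once

    def execute(self, sql):
        conn = self._pool.get()      # blocks while all connections are busy
        try:
            return conn.run(sql)
        finally:
            self._pool.put(conn)     # hand the connection back to the pool

mux = Multiplexer(pool_size=4)
used = set()
lock = threading.Lock()

def client(n):
    result = mux.execute(f"SELECT {n}")
    with lock:
        used.add(result.split()[1])  # remember which connection served us

threads = [threading.Thread(target=client, args=(n,)) for n in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"200 clients were served by {len(used)} connection(s)")
```

The database side only ever sees `pool_size` sessions, however many clients are attached to the middle layer.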
Re:What surprised me... by John_Booty · 2003-12-12 01:28 · Score: 1

Anybody know how many concurrent queries Slashdot gets at peak? It would be an interesting reference point.

I agree, it would.

I wouldn't be able to take a stab at the actual numeric value for your answer, but I believe that Slashdot (as most large, content-driven websites need to do) caches a lot of data, so that it doesn't need to be queried out of the database every single time somebody requests the page. That greatly cuts down on the actual number of queries being slung at the database.

--

OtakuBooty.com: Smart, funny, sexy nerds.
Re:What surprised me... by Quill_28 · 2003-12-12 02:51 · Score: 4, Funny

Something is wrong...

Here I find a knowledgeable person on Slashdot,
Who has given a well-written response,
Answered the question without flaming the askee,
Didn't use numbers/symbols for letters,
Never slammed MS or SCO,

And was modded up?
Re:What surprised me... by 4of12 · 2003-12-12 02:52 · Score: 1

so that it doesn't need to be queried out of the database every single time somebody requests the page.

Agreed.

I've often remarked on how quickly my threshold=-1 story grabs come back.

--
"Provided by the management for your protection."
Re:What surprised me... by Anne_Nonymous · 2003-12-12 02:57 · Score: 1

Yeah, ditto this, and add that 29,000 GB doesn't sound that large either. What's up with that?
Re:What surprised me... by theMerovingian · 2003-12-12 03:07 · Score: 1

More Tuxedo information

--
"If you think you have things under control, you're not going fast enough." --Mario Andretti
Re:What surprised me... by jxs2151 · 2003-12-12 03:29 · Score: 2, Funny

It's his user #.
None dare mod down those w/ 4 digits.
Re:What surprised me... by popeyethesailor · 2003-12-12 03:47 · Score: 2, Insightful

I haven't read their definition of peak workload, but I guess it probably means concurrent queries. Even with a persistent connection, shouldn't there be a large number of concurrent queries? With things like parallel querying etc, does the number of connections have to be the same as the number of queries?

Another factor could be caching; if intelligently used could cut down on the DB workload substantially.
Re:What surprised me... by Laetor · 2003-12-12 04:39 · Score: 1

High-larious. Excellent post.

-Laetor
Re:What surprised me... by Frank+T.+Lofaro+Jr. · 2003-12-12 05:41 · Score: 1

The cool sql*kitten username probably helped. :)

--
Just because it CAN be done, doesn't mean it should!
Re:What surprised me... by Mr.Mustard · 2003-12-12 08:20 · Score: 1

any more than you'd want a web server to have its own filesystem.

Nice comment, but I have to disagree with you on this point. I think this is not a totally bad idea. There are many exploits that stem from the fact that the web server provides access to the file system that the OS is running on.

If the web server only served files from its own native file system, some of those exploits could be avoided.

--
fnord
Re:What surprised me... by arrow · 2003-12-12 08:36 · Score: 1

I beg to differ.

--
symetrix. We are building a religion, a limited edition.
Re:What surprised me... by jxs2151 · 2003-12-12 11:54 · Score: 1

Just out of curiosity, when did you get your userid? I read /. for years before actually getting an id and posting and now I wonder what might have been....

What about the WWW? by dinnerkraft · 2003-12-12 01:16 · Score: 1

Shouldn't the World Wide Web be ranking 1st with its huge pr0n database?

--
Real geeks use acronyms.

Re:What about the WWW? by dilby · 2003-12-12 01:40 · Score: 1

How about www.archive.org?

Makes you wonder what the hell kind of data France Telecom is storing....

Yes, Jean-Pierre on the 11 December 2003 at 11:03pm you called "Chaud et Sauvage" escorts and ordered a brunette 5'6", to arrive wearing a Napoleon hat, snorkel and flippers.....

--
This post patent pending.
Re:What about the WWW? by WWWWolf · 2003-12-12 01:49 · Score: 1

'cause you can't do SELECT * FROM files WHERE category_id IN (SELECT id FROM category WHERE subtopic = 'Pornography');... this ranking is for, I think, RDBMSes and not mere data storages. It's easy to pile up data, it's harder to actually organize and query it =)

Re:No, it's 30,000GB by Walterk · 2003-12-12 01:16 · Score: 1

29TB actually. (Due to rounding; precisely 28.547 TB)
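
The 28.547 figure comes out if you take 1 TB as 1024 GB. A quick sketch, using the 29,232 GB size quoted elsewhere in this thread:

```python
size_gb = 29_232              # France Telecom figure quoted in this thread
binary_tb = size_gb / 1024    # 1 TB taken as 1024 GB
decimal_tb = size_gb / 1000   # 1 TB taken as 1000 GB

print(f"{binary_tb:.3f} TB (binary), {decimal_tb:.3f} TB (decimal)")
```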

--

"If anyone needs me, I'm in the angry dome."

29 TB is the biggest? by epiphani · 2003-12-12 01:16 · Score: 3, Interesting

I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...

I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?

--
.

Re:29 TB is the biggest? by Peridriga · 2003-12-12 01:22 · Score: 4, Informative
Well... if you actually read the article it clearly states that 29.2 is not the largest...

You can find the link to the article yourself but
1. AT&T @ 94.3TB
2. Amazon @ 34.2TB
Re:29 TB is the biggest? by leomekenkamp · 2003-12-12 01:31 · Score: 1

Let's see:

Stanford Linear Accelerator Center 828,293 - Objectivity DB - Cluster - Objectivity - Sun - Sun

You can find that under 'database size, hybrid'. Note that this is an object database and as such will never be found under one of the 'number of rows' entries, simply because rows are relational and an object base simply stores objects.

I believe that CERN has got a huge odbms also.

--
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
Re:29 TB is the biggest? by mountainhouse · 2003-12-12 01:32 · Score: 5, Interesting

I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad Intel processor system with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the results are spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some used to duplicate tables in case of node failure).

Personally, I think it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, its performance is something Oracle can't touch (no, I'm not paid by NCR...just a user).
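
A rough sketch of the hashing idea described above. The hash function (`zlib.crc32`), the unit count, and the table contents are all assumptions chosen for illustration; this is not Teradata's actual algorithm.

```python
import zlib
from collections import defaultdict

NUM_UNITS = 8  # hypothetical number of "logical machines"

def unit_for(key):
    """Pick a logical machine from a stable hash of the index value."""
    return zlib.crc32(str(key).encode()) % NUM_UNITS

def distribute(rows, key_col):
    """Spread rows across units by hashing the indexed column."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[key_col])].append(row)
    return units

customers = [{"cust_id": i, "name": f"cust{i}"} for i in range(100)]
orders = [{"cust_id": i % 100, "amount": i} for i in range(1000)]

cust_units = distribute(customers, "cust_id")
order_units = distribute(orders, "cust_id")

# Both tables hash on cust_id, so matching rows land on the same unit
# and each unit can join its own slice without talking to the others.
joined = []
for u in range(NUM_UNITS):
    local = {c["cust_id"]: c for c in cust_units.get(u, [])}
    for o in order_units.get(u, []):
        joined.append((local[o["cust_id"]]["name"], o["amount"]))

print(len(joined))  # 1000: every order found its customer locally
```

If the two tables were hashed on different columns, the local lookup would miss and rows would have to be moved between units first, which is presumably why getting the indexing right matters so much.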

http://www.teradata.com

Overview:
http://www.teradata.com/t/go.aspx/?id=84960
Re:29 TB is the biggest? by Ilgaz · 2003-12-12 01:42 · Score: 1

So, it's an unclassified database "competition", just like the top 500 supercomputers list only covers the unclassified ones?

IMHO, with decades of data, the FBI and NSA would be at the top, at petabyte levels.
Re:29 TB is the biggest? by musikit · 2003-12-12 01:57 · Score: 1

why would the FBI/NSA/CIA/any government group need a database?

with the patriot act they can just grab the commercial data without provocation. let the private industry pay for your DB storage and admin costs. so if you think about it, all those other databases combined are the FBI/NSA/CIA database.
Re:29 TB is the biggest? by Zocalo · 2003-12-12 02:00 · Score: 2, Interesting

Or if you include Hybrids, 828.3TB owned by the Stanford Linear Accelerator Center. Frankly, I was expecting to see much larger figures than these from academia and large scale research projects, Lawrence Livermore for example.
Obviously data collected from places like Arecibo wouldn't lend itself to this kind of survey, even though it must be vastly larger, but what about storage of particle vectors from nuclear event simulations? I'm guessing that they were either not nominated or declined to be listed on security grounds, rather than failing to rate high enough. Does anyone have any figures?

--
UNIX? They're not even circumcised! Savages!
Re:29 TB is the biggest? by fritz1968 · 2003-12-12 02:05 · Score: 2, Interesting

Here's a thought: How do they back up a database that is 94.3 TB? I deal with servers that have only a puny 100-150 GB. One or two LTO tapes back up these servers. What tapes do they use to back up this database?

--
It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.
Re:29 TB is the biggest? by Ilgaz · 2003-12-12 02:11 · Score: 1

echelon, promis creates huge amounts of data if we exclude fingerprint, contacts and in your way of thinking, CC data, phone data, transportation data too...

Add internet and mail/usenet too, you will see amazing amount of data collected and all have to be relative.

I understand your point too but remember its working everywhere also could be done. Just USA is kinda more open and have to do the stuff you mention by law. Its the difference, nothing else.
Re:29 TB is the biggest? by spiny · 2003-12-12 02:49 · Score: 1

offsite RAID / mirroring ?

i doubt they just have one copy and a little guy sat next to it feeding in DDS3 tapes all day and night ...

--

Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.
Re:29 TB is the biggest? by grid+geek · 2003-12-12 03:20 · Score: 1

The SLAC database is only data from the last 4 years. The BaBar project is due to last for a decade at least, so the data size is likely to get up to about 3PB by the time it's over.

Arecibo doesn't have large amounts of data. Astrophysics is only in the TB range, mainly because observatories are in out-of-the-way places, so you have to physically ship the data about by truck. Satellite download speeds suck compared to a nice fat fibre optic cable.

The really interesting databases are going to arrive when the LHC comes online at CERN in 2007, as they are expecting at least 3PB a year from the 4 main experiments, which will be stored in databases.

I have no idea why academia isn't better represented here, but do remember these academic databases usually only have 100s - low 1000s of users - not quite as many as google etc.
Re:29 TB is the biggest? by instanto · 2003-12-12 03:21 · Score: 1

What makes you so sure they do? :-)

--
// instant - "I for one welcome our new Decaff Coffee-Flavoured-Coffee Overlords"
Re:29 TB is the biggest? by Ozric · 2003-12-12 03:26 · Score: 1

One would do a 3rd mirror backup, then run it to tape once remounted. My largest DB takes about 20hrs to run to tape. I have no idea what kind and number of drives they use, DLT would be my guess.
Re:29 TB is the biggest? by Coryoth · 2003-12-12 03:34 · Score: 1

I've used some pretty big Teradata databases (though not as huge as the ones on this list) and I have to admit the performance was pretty good. Most notable was how well performance scaled - that is, compared to Oracle or Sybase IQ you could keep throwing more and more data at Teradata for minimal loss in performance.

Jedidiah

--
Craft Beer Programming T-shirts
Re:29 TB is the biggest? by lewp · 2003-12-12 03:34 · Score: 1

These systems are backed up on hopes and dreams, my friend.

--
Game... blouses.
Re:29 TB is the biggest? by jgerry · 2003-12-12 03:48 · Score: 4, Informative

How do they back up a database that is 94.3 TB?

I support very large Oracle databases for a living (very large meaning > 1TB), databases that must be up 24/7. Backups are done in a number of different ways:

1) Disk syncs, block by block, between disk subsystems at disparate locations, to retain multiple copies of a database in different locations. They can be synced to more than one location too, so you can have as many copies of the database as you want. Your main database is the only "hot" database; the others can be brought up and recovered if needed. We mainly use EMC disk subsystems to do this; the process is called BCV (can't remember what that stands for right now).

2) Real-time replication. One-to-one or one-to-many. All databases are "hot" at all times. This can be great for load balancing too, since you can have multiple systems online at the same time. Very difficult to maintain and monitor.

Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.
Re:29 TB is the biggest? by BigGerman · 2003-12-12 04:27 · Score: 2, Informative

To add to that,
Standby databases are popular: (in the Oracle scenario) the archived log files from your hot production database are constantly and automatically applied to the cold standby database in some different location, and if something happens to the primary it takes very little time to bring the standby up.
Also, Oracle hot backup is by nature incremental. You can do, say, one tablespace per night; you don't have to do the whole database at the same time (while backing up all the archived log files). I have seen sites where the last cold backup was done something like 4 or 5 years ago.
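
The standby idea can be modelled in miniature. This is a toy sketch with dictionaries standing in for databases and a list standing in for the archived logs; none of it is Oracle's actual mechanism or API.

```python
# Toy model of a standby kept current by replaying shipped logs.
primary = {}
standby = {}
archived_log = []   # changes recorded on the primary, in commit order
applied = 0         # how many log entries the standby has replayed

def commit(key, value):
    """Apply a change on the primary and record it in the log."""
    primary[key] = value
    archived_log.append((key, value))

def ship_and_apply():
    """Replay on the standby whatever it has not seen yet."""
    global applied
    for key, value in archived_log[applied:]:
        standby[key] = value
    applied = len(archived_log)

commit("acct:1", 100)
commit("acct:2", 250)
ship_and_apply()

commit("acct:1", 75)                # not shipped yet: standby lags a little
assert standby["acct:1"] == 100

ship_and_apply()                    # a failover would replay this tail first
assert standby == primary
```

Only the tail of the log has to be replayed at failover time, which is why bringing the standby up is quick compared with a full restore.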
Re:29 TB is the biggest? by nettdata · 2003-12-12 04:33 · Score: 1

I don't find this list all that accurate (or maybe representative), as I have personally worked with at least 4 DB's that would easily top the list. Mind you, I do a lot of high-end database consulting for governments and fortune 50 companies, and they probably declined to be included, or weren't even aware of this list.

Maybe they should rename it to "the top 10 list of databases that we know about and are allowed to talk about". ;)

--

$0.02 (CDN)
Re:29 TB is the biggest? by Lovepump · 2003-12-12 05:02 · Score: 2, Informative

BCV - Business Contingency Volume I think. We call it Snap backup'ing.

When we dump data, it gets dumped to a VTS (that's a Virtual Tape System, which is a whopping collection of disk, or DASD, pretending to be loads of cartridges). Once the data is on the VTS, it then makes its way to a selection of real MagStar drives which sit behind the VTS system.

Works quite nicely.
Re:29 TB is the biggest? by AnyLoveIsGoodLove · 2003-12-12 05:48 · Score: 2, Informative

As someone mentioned: Business Continuance Volumes are a local copy within the storage array. Sync times depend on the data change rate (dirty tracks). The host does not see any performance degradation. Copies are consistent from the app level down, if done right.

SRDF = Symmetrix Remote Data Facility. It is a BCV copy across a link (network, fiber, DS3/1, OCs etc...fill in the blank). Again, it only copies changed tracks....

Good stuff, this is how most of the financials recovered from 9/11 so quickly...

The databases are then put to tape using the copies. When the DB exceeds a 24-hour backup time, you use multiple copies in rotation. Usually there's a regulatory reason to go to tape, otherwise people just use disk.

--
"It's technical in a psychometric kind a way" -- C. Parish
Re:29 TB is the biggest? by nazzdeq · 2003-12-12 06:35 · Score: 1

I've been in two places where we had Teradata in DSS environments. We didn't even use indexes, as w/ their MPP architecture it's almost always faster to do full table scans, especially w/ large DSS queries.

-Nazz
Re:29 TB is the biggest? by merlin_jim · 2003-12-12 08:58 · Score: 1

Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.

We have a large database (3 or so TB)... and we have a three-tier recovery model:

First off, we have a RAID array (I forget the number; it's one of the high redundancy modes) so any single disk failure doesn't result in data loss

Secondly, we have a hot backup system with log shipping... whenever a transaction is committed, the log gets shipped to the backup system, which then applies the same changes. This way catastrophic database failure can be recovered immediately.

In case the problem is worse or one of those measures itself fails, we still do a tape backup. The notion that tape is inadequate to cover this sort of system is preposterous. We have a midrange tape robot utilizing fair sized tapes (I think the uncompressed storage space is 320GB)... We can do a full backup weekly and a differential backup nightly with no problems. The full backup completes within a few hours.

And yes, we're on the list... I don't know why we're not on the Windows Storage Space list for DSS... but we are on the Peak Workload (All Platforms) for DSS Top Ten...

--
I am disrespectful to dirt! Can you see that I am serious?!
Re:29 TB is the biggest? by Rebar · 2003-12-12 10:16 · Score: 1

Hi. For what it's worth, I benchmarked Oracle single-system (16 cpu) vs. Teradata (4 node by 4 cpu) for a database system of about 5TB total -- the benchmark was about 1/10th that size. While Teradata has an interesting approach, Oracle *blew it away* in performance on the same queries, when properly tuned. However, when IMproperly tuned, Oracle *sucked pond water* by comparison.

Part of the reason Oracle was faster on our benchmark is that Oracle's parallel hash joins are very very fast, and part of the reason is that NCR wasn't keeping up with the latest Intel CPUs - I believe the machine I benchmarked had 700MHz Pentiums (two years ago) and I suspect they bogged down on the computations.

Disclaimers: (1) YMMV, this was with my workload; I can imagine a workload in which Teradata would win. (2) When the Oracle optimizer makes the wrong decision, it REALLY sucks, as in kill-this-it-aint-never-gonna-finish. When Teradata had the wrong notion about a query, it maybe ran twice as long as when it was working optimally. Maybe that's why it seems faster to users; there are fewer instances of it just going to shit because the optimizer decided nested-loop-full-table-scans made sense.
Re:29 TB is the biggest? by afidel · 2003-12-12 10:33 · Score: 1

The reason tape isn't really reasonable is that at a max speed of 30MB/s (uncompressed) per T9940B drive and the largest StorageTek silo holding 80 of those drives you get only 8.6 TB/hr which means the larger databases would take on the order of a half day to recover under ideal circumstances. Now you could go all out and have a dense packed Powderhorn silo cluster and get 103.7 TB/hr which might be acceptable, but it would still be a LOT longer than just failing over to the remote site with an up to date copy of the DB on disk =) Btw no tape tech currently has uncompressed storage space of 320GB, there is SDLT 320 which is 160GB uncompressed, LTO 2 which is 200GB uncompressed, and T9940B which is also 200GB uncompressed. Basically tape is there for worst, worst case scenarios, historical reference, and mostly to satisfy regulatory rules.
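
The throughput arithmetic above, spelled out as a sketch. The drive speed and drive count are the figures from this post; the 94.3 TB size is the largest number quoted in the thread; the silo count of 12 is inferred from 103.7 / 8.64 and is not stated anywhere.

```python
DRIVE_MB_PER_S = 30      # T9940B, uncompressed (figure from the post)
DRIVES_PER_SILO = 80     # figure from the post
DB_SIZE_TB = 94.3        # largest size quoted in this thread
SILOS_IN_CLUSTER = 12    # assumption: inferred from 103.7 / 8.64

# MB/s across all drives, times seconds per hour, converted to TB (decimal)
silo_tb_per_hr = DRIVE_MB_PER_S * DRIVES_PER_SILO * 3600 / 1_000_000
restore_hours = DB_SIZE_TB / silo_tb_per_hr
cluster_tb_per_hr = SILOS_IN_CLUSTER * silo_tb_per_hr

print(f"{silo_tb_per_hr:.2f} TB/hr per silo")
print(f"{restore_hours:.1f} hours to stream {DB_SIZE_TB} TB from one silo")
print(f"{cluster_tb_per_hr:.1f} TB/hr for {SILOS_IN_CLUSTER} silos")
```

That is streaming time only, with every drive running flat out; mount times and the database's own recovery work would come on top.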

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:29 TB is the biggest? by afidel · 2003-12-12 10:44 · Score: 1

Yeah, I was surprised NOT to see Walmart's Teradata installation on the list. Obviously they chose not to be included, because their data warehouse was at 23TB in 1998 and they have grown considerably in size since then and been adding data all along.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:29 TB is the biggest? by sapbasisnerd · 2003-12-12 15:49 · Score: 1

Actually what it says about AT&T is that it has 94.3TB of normalized data volume but fails to define this in any more detail than "Normalized Data Volume estimates of the total volume of data managed by the DBMS in GB". It certainly looks like this is the same system that is listed as having ~26TB of data. In any case this whole thing is crap, I personally know of at least one and probably a couple more systems that would qualify for the list in the OLTP category (one DB2 on AIX, one DB2 on z/OS and one Oracle, all running SAP R/3). It's all based on 141 surveys...in other words, a minuscule amount of "data" if you could call it that. FWIW there's this church in Utah that has, last I heard, at least 7 and probably 9 petabytes of genealogical data. I have no idea how they actually organize it and doubt this would be called a single database in the terms used here, but there are certainly bigger databases out there than this "study" accounts for.
Re:29 TB is the biggest? by Brannoch · 2003-12-12 16:09 · Score: 1

I once interviewed at Trans-Union (the credit reporting agency) to work on a database that was about 300 terabytes - with 3 terabytes of turnover per day. This was several years ago, so it is probably quite a bit bigger now.

At the time, there was no database server software in existence that could handle that kind of volume, so they wrote their own, in IBM 370 assembler (as far as I remember). Their query language was hand-coded assembler as well.

They wanted me to write macros or some kind of higher-level query language to automate the writing of those queries. It sounded interesting, but I didn't want to work for a credit reporting agency, so I declined.

I think it exceedingly unlikely that they would switch to an RDBMS, so you'll never see that database mentioned in this kind of contest.
Re:29 TB is the biggest? by Pseudonym · 2003-12-13 22:12 · Score: 1

I honestly doubt that 29.2 Terabytes is the biggest database in the world.

Others have noted that there were two bigger databases in the list.

One thing to remember, though, is that you shouldn't confuse the amount of data that some organisation has to manage with the size of an individual database. Lexis-Nexis, for example, manages 40TB or so; however, it's a lot of individual databases, all running on different systems.

--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});

Switches by Davak · 2003-12-12 01:18 · Score: 2, Funny

AT&T 94,305GB Daytona SMP AT&T Sun Sun

I wonder how much of this database is records of every time users have switched to and from AT&T to get those cash bonuses!

Re:article also reports that by dinnerkraft · 2003-12-12 01:19 · Score: 1

Wait I'm confused... Is this supposed to be sarcasm?

--
Real geeks use acronyms.

94.3TB!?!?! by Peridriga · 2003-12-12 01:19 · Score: 4, Interesting

I know where I work we recently (for an IT pat on the back) calculated our total network accessible storage capacity and came in at a rough estimate of about 150TB. Now that is a giant swath of data and a decent amount is in databases (MSSQL farm) but scattered across 1000s of DBs.

It takes a truly amazing staff to maintain (back up, administer, do maintenance, sit and stare at screens) the servers and maintain the integrity of the data but, good lord...

A 94.3TB database? My utmost and highest kudos to those DBMAs and admins there. That is one gigantic task to operate. Being it's AT&T, and assuming a great deal is billing and maintenance functions, I'm sure these have to be up a good three nines if not more.

Regardless of the results of the study (which, without actually reading the entire thing, look like a short read on a geek pissing contest), I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBMAs' jobs and certainly wouldn't want to be a DBMA on a 94.3TB farm, but I know those that do and love doing it. It's a specialty skill and apparently these guys do it right...

Kudos...

Re:94.3TB!?!?! by m00nun1t · 2003-12-12 01:29 · Score: 1

I agree, this is amazing. What's even more amazing is looking at the vendor: AT&T. This is a home grown RDBMS! They not only maintain the largest database, but write the software that makes it run!!!

--
Read reviews of shopping cart software
Re:94.3TB!?!?! by swb · 2003-12-12 01:30 · Score: 1

I always wonder about large systems like that. They develop procedures and policies and a whole layer of bureaucracy to try and keep a firm grip on them, but they always seem to become an entity unto themselves that just *seems* to be under control, when in reality no two or three guys have enough access and enough experience with the thing to know exactly what's there.

Or maybe I just lack imagination...
Re:94.3TB!?!?! by AKnightCowboy · 2003-12-12 01:41 · Score: 2, Funny

they always seem to become an entity unto themselves that just *seems* to be under control, when in reality no two or three guys have enough access and enough experience with the thing to know exactly what's there.
Turns out after AT&T deleted an ex-employee's porn, mp3, and warez stash he was hiding in his own personal table they were able to optimize the database down to about 3GB of customer billing data. You just can't find good help these days.
Re:94.3TB!?!?! by milamber.net · 2003-12-12 01:44 · Score: 2, Funny

Being it's AT&T and assuming a great deal is billing and maintenance functions

Oh how naive! It may be AT&T but the DB will still be run by a bunch of nerds...

"Right, boss needs a client list"
.. login... ok..

> use bigassdb;
> show tables;
games
porn
mp3s
films
tv
other

..
"Ok clients must be in here somewhere..."
Re:94.3TB!?!?! by kilonad · 2003-12-12 02:24 · Score: 3, Insightful

This is a home grown RDBMS!

What else do you expect from the company that kinda sorta wrote Unix?
Re:94.3TB!?!?! by cygnus · 2003-12-12 04:44 · Score: 1

you wouldn't be giving them kudos if you had much experience talking to their tech support. given my success rate, i'd assume their stuff is down about 30% of the time. usually, i'll come off a long wait on hold, tell a very patient phone tech my problem, and they'll come back with, "sorry, our system is down. can you call back later?"
something tells me they need to go with an outside vendor. :)

--
Just raise the taxes on crack.
Re:94.3TB!?!?! by sapbasisnerd · 2003-12-12 15:59 · Score: 1

Re-read the article. It doesn't say they have 94.3TB of data; it says this value is an "estimate of the total volume of data managed by the DBMS in GB" and fails completely to explain what that means. Since this system appears to be the same one that's in 2nd place on database size at ~26TB, I'm guessing they intend this to somehow represent "turnover" of the data.

Archive.org not on the list? by CompWerks · 2003-12-12 01:21 · Score: 4, Interesting

They claim to have over 300 TB of data.

Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here

--
If you can read this sig - the bitch fell off.

Re:Archive.org not on the list? by CompWerks · 2003-12-12 01:25 · Score: 1

Forgot to mention, they also run Linux. :)

--
If you can read this sig - the bitch fell off.
Re:Archive.org not on the list? by Agent+00p · 2003-12-12 01:35 · Score: 2, Informative

They don't have to put all their data into one database, though ...,

--
when the shit hits the fan, it is not equally spread
Re:Archive.org not on the list? by Anonymous Coward · 2003-12-12 01:38 · Score: 1, Informative

but they're not running a big Oracle or IBM style database, they're using a content management and static file system.
Re:Archive.org not on the list? by bruthasj · 2003-12-12 02:02 · Score: 1

And what is that difference? Please enlighten us. Regexp grep over a text file with comma-delimited tables can replicate many a SQL commands.
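
To make the grep comparison concrete, here is a minimal sketch of a regexp filter over a comma-delimited file acting like a SELECT with a WHERE clause. The table contents and the `select` helper are invented for illustration.

```python
import csv
import io
import re

# Hypothetical comma-delimited "table" of archived pages.
TABLE = """url,year,bytes
http://example.com/,2001,10240
http://example.org/a,2003,20480
http://example.com/b,2003,5120
"""

def select(text, column, pattern):
    """Roughly: SELECT * FROM table WHERE column matches pattern."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if regex.search(row[column])]

rows = select(TABLE, "url", r"example\.com")
print([r["url"] for r in rows])
# ['http://example.com/', 'http://example.com/b']
```

Note that this is a full scan on every query, with no index, join, or concurrent update handling, so it covers the read side of the question only.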
Re:Archive.org not on the list? by bruthasj · 2003-12-12 02:05 · Score: 3, Interesting

All the more proving that you don't need a stupid database for everything. Actually, they should put conventional static filesystems in as part of the comparison. Because you know what, some IT people get hooked on trying to dump everything under the Sun into Oracle. This request is especially relevant for journaling/transaction based filesystems and possibly the future Longhorn thingy, which is supposed to have SQL capabilities.
Re:Archive.org not on the list? by Tim+C · 2003-12-12 03:50 · Score: 1

Well, it can't do insertions, deletions, transactional support...

--
It's official. Most of you are morons.
Re:Archive.org not on the list? by CompWerks · 2003-12-12 05:07 · Score: 1

The definition of a database as per dictionary.com
database( P )
Computer Science
n. also data base

A collection of data arranged for ease and speed of search and retrieval. Also called data bank.

--
If you can read this sig - the bitch fell off.
Re:Archive.org not on the list? by MattRog · 2003-12-12 06:45 · Score: 1

The article is talking about DataBase Management Systems -- the collection of programs that *maintain* databases.

--

Thanks,
--
Matt
Re:Archive.org not on the list? by yppiz · 2003-12-12 07:45 · Score: 1

As someone who once was a Senior Data Mining Analyst at IBM and has worked on both Amazon.com's and the Internet Archive's databases, I can say that the Archive most certainly is a database.

Yes, it's flat files, and yes, it's indexed, and yes, it can handle add, insert, delete, and query.

Just because the mechanisms are visible and simple does not mean that the system is anything less than a real database.

--Pat / zippy@cs.brandeis.edu
Re:Archive.org not on the list? by bruthasj · 2003-12-12 14:01 · Score: 1

Insertion: $ touch filename
Deletion: $ rm filename
Transactions can be emulated in a user-space library.
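
A minimal sketch of that idea in user space, assuming a POSIX filesystem where a rename within one directory is atomic. The `insert` and `delete` helpers are hypothetical, and this only makes a single record's write all-or-nothing; it is not a multi-record transaction.

```python
import os
import tempfile

def insert(root, name, data):
    """'Insert' a record: write a temp file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=root)       # same directory, same filesystem
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, os.path.join(root, name))  # atomic on POSIX

def delete(root, name):
    """'Delete' a record: remove its file."""
    os.remove(os.path.join(root, name))

root = tempfile.mkdtemp()
insert(root, "rec1", "hello")
insert(root, "rec2", "world")
delete(root, "rec1")

print(sorted(os.listdir(root)))  # ['rec2']
```

Readers never see a half-written record, because the final name only appears once the data is fully on the temp file.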
Re:Archive.org not on the list? by yppiz · 2003-12-12 17:53 · Score: 1

You don't think the Internet Archive has a suite of programs that maintain and organize and serve that data? Just because it doesn't say IBM DB2 on it doesn't mean it isn't a database.

Similarly, Google might be surprised to hear that, because they're not running Oracle, they're not maintaining, indexing, caching, serving, deleting, and updating data.

Oracle and its competitors have convinced people that databases are some sort of magic. They're not.

--Pat / zippy@cs.brandeis.edu

something missing. by kautilya · 2003-12-12 01:21 · Score: 1

CIA and RIAA are given a raw deal!

Re:Hang on ... by Mr.+Dop · 2003-12-12 01:22 · Score: 2, Informative

Nope, you don't even qualify:

In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500GB of data for Microsoft Corp.'s Windows and NT platforms and 1TB of data for all other platforms.

But it doesn't say what OS? by nick_urbanik · 2003-12-12 01:23 · Score: 2, Interesting

I cannot see what OS each DB is running on. Is that irrelevant?

Re:But it doesn't say what OS? by xneilj · 2003-12-12 02:36 · Score: 1

z/OS is basically an IBM UNIX for mainframes

No it's not. z/OS is effectively the current version of IBM's MVS operating system, which goes back to the 60s. It does have a POSIX-compliant implementation of UNIX available for applications to use if they want (USS - UNIX System Services), but it's not a UNIX platform, especially not when referring to DB2 for z/OS, which is a native MVS application.

--
rm -rf / is the evil of all root
Re:But it doesn't say what OS? by Darren.Moffat · 2003-12-12 02:47 · Score: 1

In some of the cases it is obvious from the vendor.
HP is likely HP/UX, Sun is Solaris, MS is Windows of some variant. The ambiguous one is IBM: it could be AIX, Solaris, Windows or something else.
Re:But it doesn't say what OS? by larien · 2003-12-12 02:58 · Score: 1

IBM could, of course, be linux or OS/2, but I doubt either of them is being used for any large database, although once they get linux scaling well on p690s, we'll see what happens. Oracle 10g is also based around "grid" databases using clusters of smaller servers to achieve higher throughput which bypasses the need for scaling on a server level.
IBM still have the real "big iron" in their mainframes, but AFAIK, they don't tend to do the largest databases, just ones where they are (a) running legacy code or (b) require absolute reliability which you still can't get from Unix, even in a cluster.

SMP? by zm · 2003-12-12 01:24 · Score: 1, Funny

France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.

A case of acronym confusion, I guess. :-)

--
Sig ?

Re:SMP? by Trbmxfz · 2003-12-12 01:36 · Score: 1, Funny

> > SMP (symbol manipulation program) architecture.
> A case of acronym confusion, I guess. :-)

Indeed. Surely they meant "Service Mediation Platform" or possibly "Sex, Money, Power"...
Re:SMP? by operagost · 2003-12-12 02:50 · Score: 1

They also don't seem to realize how vague a reference to an "HP system" is nowadays. Is it HP/UX, Tru64, Linux, xBSD, or OpenVMS?

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:SMP? by o'reor · 2003-12-12 03:18 · Score: 1

or possibly "Sex, Money, Power"...
Well after all, France Telecom has long been one of the main pr0n providers in France thanks to the once wonderful Minitel... and they made heaps of money on it, charging every minute online !

--
In Soviet Russia, our new overlords are belong to all your base.
Re:SMP? by RapaNui · 2003-12-12 03:32 · Score: 2, Informative

Yup.

Methinks the character who wrote the article came across the term 'SMP', went to FOLDOC or The Jargon File, and whaddya know - the first hit returns 'Symbol Manipulation Program - Stephen Wolfram's yadda yadda yadda'.

bull! ms sql should have been 1st! by agwis · 2003-12-12 01:27 · Score: 1

Especially in the peak workload category. I saw a lot of MS SQL databases working overtime when Slammer first came out!

Only on Windows platform! by MS · 2003-12-12 01:27 · Score: 5, Informative

Read all, to get the facts:

Lastly, in the Windows OTLP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.

Neither WindowsNT, nor MS SQL are generally a choice for the top databases. In fact, to make the entry in this list, a Windows-Database was required to be only half as big as databases on other platforms:

In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms

:-)
ms

Re:Only on Windows platform! by kiwimate · 2003-12-12 01:50 · Score: 2, Informative

At least they don't try to hide it in three point text -- it's right there on the main page. But, anyway...if you want to see another (MS) view, look here.

By the way, I must just grumble at the lack of knowledge some people have on SQL Server. I sat in a meeting a few weeks ago with our Oracle-centric architects who decided that, as SQL Server is being used more and more extensively in our company, they'd better understand something about it. They started asking us various questions which rather puzzled me until I thought I knew what the problem was. "You do realize that SQL Server uses transaction logs, don't you? And that it implements transactional integrity, so, for example, will roll back an incomplete transaction?". Blank stares. "Really? Huh, we just assumed it wouldn't have those features because it's not a real database". Well thanks, guys, for doing your homework and being Oracle defensive on the basis of a good solid knowledge of the issues. At least SQL Server doesn't store internal passwords in a table that I can easily run a SELECT query on. Yes, I know they're encrypted -- but SQL Plus is quite happy to allow me to copy and paste the encrypted password into the authentication dialog and accept that as a valid logon.
Re:Only on Windows platform! by Alsee · 2003-12-12 03:04 · Score: 1

The Microsoft view was pretty hysterical once you realize what it really is. It makes me want to advertize some database program as ranking:

* 7 of the world's top 10 largest databases for transaction processing on Commodore-64.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Re:Coming in second... by AndroidCat · 2003-12-12 01:27 · Score: 1

Large yes, but I'm sure that their list of who they want to sue must be huge! (Atoms in the universe huge.)

--
One line blog. I hear that they're called Twitters now.

Anonymous by suso · 2003-12-12 01:29 · Score: 4, Funny

Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.

Re:Anonymous by quantum+bit · 2003-12-12 02:14 · Score: 1

He's just embarrassed about using MS SQL Server.

Other factors? by UnknowingFool · 2003-12-12 01:30 · Score: 2

While it is nice to see the ranking in terms of size and usage, it would be nice if the survey ranked other factors like maintenance time and number of users to see how they really compare in operation. Largest number of OLTP might signify lower downtime but maybe not.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Ooops... by jarpak · 2003-12-12 01:31 · Score: 1

The guy who did the summary is going to have a bill on his way... :)

Quote: "If this is your website please contact Verve Hosting"

And Verve hosting address is billingadmin@vervehosting.com...

JP

Re:Ooops... by prostoalex · 2003-12-12 04:20 · Score: 1

I got an e-mail from their admin today saying my account has been temporarily suspended. I actually had a mirror of that single page, and was planning to redirect to it as soon as my bandwidth quota would approach the max.

The page that I linked to is not that big, has some text and table on it, no graphics or heavy items, I figured I'd survive.

However, the story was posted at 5 am Pacific, when I was seeing the sweetest dreams. Everything seems to be back to normal by now, except that hoster might ask me to move.

Re:Wow... begging for a good Slashdotting by danknight · 2003-12-12 01:35 · Score: 1

troll ?, Thought it was kinda funny

--
wanted: one clever sig,apply within

And in other news... by iapetus · 2003-12-12 01:35 · Score: 1

Winter Corp's own results database shoots to number one in the 'Peak Workload' rankings after being linked to from Slashdot...

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.

Doh! by Dilaudid · 2003-12-12 01:35 · Score: 2, Funny

I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart

Hmm - how to /. your own website in one simple step?

Only WIndows and Unix? by jackb_guppy · 2003-12-12 01:36 · Score: 1

Boy, is this slanted. I work on large IBM machines with DB2 built in... Where are those?

Someone else wrote about Google; it should be in this listing too, even if it is using an in-house developed DB.

Platforms: Windows or Unix... BAH!

Re:Only WIndows and Unix? by chriskenrick · 2003-12-12 03:47 · Score: 1

Well, the summary has some entries with DB2 running on z/OS (for IBM's biggest mainframes), doesn't that count?

SMP? by paulbd · 2003-12-12 01:36 · Score: 4, Informative

does anybody believe that the "SMP" used in reference to the French Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? how are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?

Google Glossary: God's little helper by arrogance · 2003-12-12 01:37 · Score: 1

Even better is the Google Glossary to solve your acronym hell.

Re:Google Glossary: God's little helper by Ilgaz · 2003-12-12 02:23 · Score: 1

I also wonder, what the heck is IMS? Google (thanks) gives me results, not the explanation.
Re:Google Glossary: God's little helper by rhinoX · 2003-12-12 03:04 · Score: 1

IMS - Information Management System, a hierarchical database created by IBM in the dark ages of computing. Instead of having relationships in the fashion of an RDBMS, the entire IMS database is a bunch of lists, of lists, of lists, of lists, where any list can be part of any other list at any time. We studied this in a database class in school, and even got a chance to play around with a live system (or something based on the same principle, as what we used ran under VMS).

For some types of "queries" it is really incredibly powerful, but it is not very good at building very complex relationships.
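
A toy illustration of the "lists of lists" idea, in Python. This is only a sketch of hierarchical storage in general, not IMS's real interface, and the `Segment` class and the customer/order/item names are invented for the example. Each record owns a list of child records, and a query walks down from the root.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One node in the hierarchy; it owns an ordered list of child nodes."""
    name: str
    children: List["Segment"] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def walk(self, path: Tuple[str, ...] = ()):
        """Yield the full path to this node, then to every descendant, parent first."""
        here = path + (self.name,)
        yield here
        for child in self.children:
            yield from child.walk(here)


# Hypothetical data: a customer with one order (holding one item) and an address.
root = Segment("customer")
order = root.add(Segment("order"))
order.add(Segment("item"))
root.add(Segment("address"))

for path in root.walk():
    print("/".join(path))
```

Walking top-down like this is cheap, which fits the "incredibly powerful for some queries" remark. Asking "which customers bought this item?" means visiting every branch, which is where a relational join is more comfortable.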

--
The copper bosses killed you, Joe. 'I never died', said he.

should it really have been traditional ranking? by way2trivial · 2003-12-12 01:39 · Score: 1

Some things in life are scored 1-10
Some are scored 10-1

shouldn't the overall best performer have been ranked 1984? and the rest from there?

--
every day http://en.wikipedia.org/wiki/Special:Random

Doesn't have to be relational by arrogance · 2003-12-12 01:41 · Score: 4, Interesting

From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.

Genomic databases by xplenumx · 2003-12-12 01:41 · Score: 2, Interesting

I'm absolutely shocked that the NCBI's (National Center for Biotechnology Information - part of the NIH) genomic and proteomic search engine BLAST isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of several organisms. I'm similarly shocked that MEDLINE / PubMed isn't included, as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites - not Amazon.

Re:Genomic databases by smoondog · 2003-12-12 02:03 · Score: 1

Yes, but their traffic is minuscule compared to a dot com like Amazon. Also, I don't know the details of the BLAST backend, but I'm not sure it even counts in this competition. It is a conglomeration of tools and several datasets, not incorporated as a single database.

-Sean
Re:Genomic databases by glwtta · 2003-12-12 06:08 · Score: 1

Uh dude, the BLAST databases are not relational (and RDBMSs are being ranked here) they are just a bunch of indexed sequence files. And they are tiny too, in the tens of gigs region.
As for pubmed, that's pretty tiny too - I recently needed to download it for work, in their bloated XML format it takes up about 200GB, there's only something like 30-50GB of actual data in there (that's titles, authors and keywords for about 13 million papers, and abstracts for 2/3 of them).
Now the databases that run their Entrez services are real RDBMS (they run Sybase from what I remember), and are probably quite sizeable, but nowhere near the size of the databases ranked in the article.

--
sic transit gloria mundi

Do not show this to Larry ! by dtio · 2003-12-12 01:45 · Score: 1

Oracle is 1st (France Telecom). I bet Larry Ellison is launching a *big* advertising campaign based on these data.

They are going to exploit this thing "ad nauseam". Wait and see.

Frightening by water-and-sewer · 2003-12-12 01:45 · Score: 3, Interesting

Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating you'd realize this company and its peers (Equifax etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad, and in the case of a mistake, changing it, is an expensive and time-consuming task.

--
If this were Usenet, I'd killfile the lot of you.

Sponsorship by JimR · 2003-12-12 01:46 · Score: 1

Anyone else notice if you go to wintercorp.com it states:

The TopTen Program is sponsored by Hewlett-Packard, Microsoft, Oracle, Sybase, and Teradata, a division of NCR.

Makes you wonder how definitive this survey really is.

--
#exclude <ms/windows.h>

WTF is OTLP? by arrogance · 2003-12-12 01:47 · Score: 1

Even funnier is that there's no such thing as OTLP: it's OnLine Transaction Processing. On-Transaction Line Processing???

Re:article also reports that by andyh · 2003-12-12 01:52 · Score: 2, Funny

This was tested against a live directory with the same number of users and objects each time?

How was your test environment organised?

Oh no, you were being ironic, I must pay more attention.

Re:Hang on ... by birder · 2003-12-12 02:08 · Score: 1

We have dev systems bigger than 100GB.

they are missing a few .... by GNUALMAFUERTE · 2003-12-12 02:08 · Score: 1, Funny

Slashdot, as the biggest SCO Flames database
The registry of some of my NT5 servers that has become HUGE after 2 years ..
My pr0n cd's sql database : )

--
WTF am I doing replying to an AC at 5 A.M on a Friday night?

What about the SPAMmers? by RazorJ_2000 · 2003-12-12 02:13 · Score: 1

Gee, it's too bad they couldn't get any responses from some of the big SPAMMERS. I bet their db tables and #rows are pretty PHAT too!!

--
pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n +4))-(1/ (8n+5))-(1/(8n+6))]

Re:What about the SPAMmers? by Tore+S+B · 2003-12-12 04:31 · Score: 1

I can see it already... "Can't satisfy the statistics? Enlarge your database NOW!"

--
toresbe

Not Completely True but Still Frightening by Anonymous Coward · 2003-12-12 02:16 · Score: 1, Funny

I believe you have the right to, once a year, get your credit rating, for free, on demand (usually written.)

Here in Colorado, Equifax sends me a notice every year that my credit was checked and offers me a free copy of the report they (allegedly) are sending out.

What remains scary is that, although my credit report was dead on, I have in the past had reports that were so wildly inaccurate I had to laugh out loud. But because the person whose information was included on my report had such great credit the credit reporting company (not Equifax, the other one), told me to just leave it on there and take the benefits.

So thank you, Mr. X in Texas! Without your lack of control and deep pockets we probably wouldn't have got our house. Merry Christmas!

put things in perspective by Anonymous Coward · 2003-12-12 02:18 · Score: 2, Informative

The size of the database isn't all that interesting. What is more important from a maintenance and reliability perspective is size in relation to average and peak loads. Who cares if you have 3 TB of data in MS SQL Server if it takes 10x longer to run a query there than the same query takes on Teradata or Oracle? For small databases, who cares: any of the major databases can handle several GB of data without any problems. But there is a huge difference between Teradata, Oracle, Sybase, DB2 and MS SQL Server. From first-hand experience, SQL Server can't handle concurrent queries worth shit. You have to run your queries in an async fashion and have the clients pick up the results later on. Compared to DB2, Sybase and Oracle, MS SQL Server's scalability under heavy concurrent load, without some middleware in between, blows.

Obviously, you would be crazy not to use some middleware, but things aren't as simple as any of the PR guys claim. Running queries asynchronously creates a different set of problems and complicates the entire architecture. If you look at the biggest installations, they all use middleware and most of them use Tuxedo. This includes most, if not all, MS SQL Server deployments. OLEDB can't handle that kind of load and neither can standard COM+. Just read the full disclosures for TPC. You'll see all the MS SQL Server tests wrapped Tuxedo with COM+. As much as Microsoft likes to slam EJB and Tuxedo for being too expensive, you can't scale SQL Server without using Tuxedo for really heavy deployments.

Databases not ranked by Hungus · 2003-12-12 02:19 · Score: 2, Interesting

I find it interesting that the largest database is only 2TB larger than the one I recently built. It is a medical system. 66 mysql servers bear the load but I only usually have 30 of them actually active as the rest are mirrors and logging masters. Typical connections: 4500 at any given time.

--
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK

Re:Databases not ranked by Hungus · 2003-12-12 06:00 · Score: 1

They are all net-booted Xserves with 2 GB of RAM, dual procs, no HDs, and Fibre connections to the RAID arrays: basically standard Apple Xserves connected to Xserve RAIDs.

--
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
Re:Databases not ranked by Dausha · 2003-12-12 06:20 · Score: 1

Perhaps the ranking has proprietary biases?

--
What those who want activist courts fear is rule by the people.
Re:Databases not ranked by Hungus · 2003-12-12 06:51 · Score: 1

Maybe, I just found it interesting as I had no idea that what I was working on would rank amongst the Largest databases. Though to be honest I am quite sure that there are other medical databases that are of a similar if not larger size than mine. I am also wondering what the total cost of implementation was for these databases listed. Including all of the redundant hardware mine was just over $750,000 US

--
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK

Lots of 'Anon' entries by zerosignal · 2003-12-12 02:30 · Score: 1

I wonder if any of these are large government surveillance databases?

Bah, that's nothing -- let's talk Petabytes by Aliks · 2003-12-12 02:32 · Score: 1

OK I'll be flamed for technical illiteracy, but there are a number of archival systems which go into the Petabyte (1000 Terabyte) range but are still relational databases with row level access.

One I worked on stored the output of Cray supercomputers running modelling programs 24x7. The data was output to a bank of Teradata boxes and then archived to tape. The system had a robot tape librarian at the back end but could still operate as a relational database.

The historical data should all be in there by now, which would make around 1.5 PB.

The vendor of the software that managed it all was talking about telephone companies planning similar systems to put up to 5 PB in a system.

Anyone top that?

Re:Bah, that's nothing -- let's talk Petabytes by sapbasisnerd · 2003-12-12 16:06 · Score: 1

Anyone top that? There's this church in Utah, with apparently something on the order of 7-9 petabytes of genealogical data.

Walmart by kilonad · 2003-12-12 02:38 · Score: 1

I had always heard that walmart maintained one of, if not the biggest database in the world. Kmart appears on one or two of the top ten lists here, but not walmart. Anybody know what gives?

I'd truly expect the truly largest databases to be maintained by financial institutions (banks, credit card companies, the stock market, etc) based on the sheer volume of transactions. Either them or the NSA or the FBI.

Re:Walmart by robo45h · 2003-12-12 03:46 · Score: 1

My thoughts exactly; I believe Wal*Mart has one of the largest. I suspect that's the flaw in this set of awards: perhaps the only organizations considered are those that applied. If no-one at Wal*Mart submitted an application, they're not listed. Similarly, I suspect the US SSA has one of the largest databases around, but they were not listed. The word "commercial" is sprinkled throughout the contest description, but it's not clear if the organization itself has to be commercial, or just the database software. In any event, the "Land Registry" is listed among the winners, and they appear to be a UK government entity, though I'm not sure.
Re:Walmart by Anonymous Coward · 2003-12-12 05:08 · Score: 1, Interesting

Yes, Walmart does have one of the largest, but financial and government institutions are the largest. The NSA keeps track of everything on the net (yes, everything); they just don't have the tools to analyze all that data.

I read an article in CRN(could be wrong here) about Visa's north american systems. They have two sites, one for the eastern half and one western, dividing the continent at the Mississippi river. The eastern site generates about 240 TB of data a month and could take over the whole continent if the western site went down. All this with just 4-5 IBM mainframes, probably running IMS.

How often does your visa transaction not get processed? On the shopping day after Thanksgiving? All this with just 4-5 boxes? I would like to see Sun, HP, Oracle, Microsoft try that. Even if they could do it, it would be far more expensive to build and maintain.

Re:Hang on ... by TheOnlyCoolTim · 2003-12-12 02:42 · Score: 1

But in France, don't they use commas as decimal points?

Tim

--
Omnia vestra castrorum habetur nobis.

WRONG! by blahbooboo2 · 2003-12-12 02:44 · Score: 1

Well, the results are wrong. Where I work, they were told by Microsoft that they had the largest operational (all live) MS SQL database, at 18 terabytes...

Re:WRONG! by CharterTerminal · 2003-12-12 04:42 · Score: 1

Wow, they really fell for the old "Yours is the biggest I've ever seen!" line, huh?

Me Too! by mekkab · 2003-12-12 02:44 · Score: 1

My database professor gave us the rundown of the technologies that the NIH databases employ - it's some impressive business! Researchers all over the world are indexing and adding papers... SCREW Amazon!

--
In the future, I would want to not be isolated from my friends in the Space Station.

What's all this data? by Alkonaut · 2003-12-12 02:47 · Score: 1

I can see how a web archive or some accelerator laboratory could end up having many terabytes to store, but how come telephone companies have terabyte databases? Sure, huge numbers of customers make lots of data, but say 250M customers (megacustomers) and a megabyte of info each (that's a whole lot) is still just in the hundreds of gigabytes.

Could the rest be just logs of past telephone traffic? All phone traffic ever made through the company? What portion of these databases contains actual used data (data that is likely to be used in business), rather than just stored historic data? Are companies keeping huge amounts of old data because they can? Because it gives the db administrator a stiffie to think he's got $many terabytes in his db rather than on old tapes in the basement?

Re:What's all this data? by fasaxc · 2003-12-12 03:18 · Score: 1

250 000 000 customers x 1 MB each = 250 000 000 MB = 250 TB.

-Shaun
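
The same sum as a quick Python check, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes). The variable names are just for illustration.

```python
CUSTOMERS = 250_000_000          # figure from the parent post
BYTES_PER_CUSTOMER = 10**6       # 1 MB each, decimal units assumed

total_bytes = CUSTOMERS * BYTES_PER_CUSTOMER
total_tb = total_bytes / 10**12  # decimal terabytes

print(total_tb)  # 250.0
```

So a megabyte per customer lands in the hundreds of terabytes, not the hundreds of gigabytes.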
Re:What's all this data? by Alkonaut · 2003-12-12 06:00 · Score: 1

sorry bout that, at 1mb each they would only have room for their customers and no transactions, you're right. Funny how the disks shrink. We all remember saying we'd never fill up that 350mb harddrive, and all of a sudden I can come up with a totally reasonable use for 250TB...

# of rows only in the hundreds of millions? by Anonymous Coward · 2003-12-12 02:53 · Score: 1, Informative

We have databases in our organization (Star Schema, Red Brick) where the fact tables literally have billions of rows. I'm sure there are many other organizations (especially government entities) that have huge databases not on this "list". For those interested in operating at this scale, other interesting hardware/software data mining solutions in the same vein as Teradata are Netezza Corp's database appliances.

Re:# of rows only in the hundreds of millions? by whereiswaldo · 2003-12-12 16:56 · Score: 1

We have databases in our organization (Star Schema, Red Brick) where the fact tables literally have billions of rows.

From Wintercorp:

Grand Prizes for Most Rows/Records, decision support databases, was given to AT&T, in All Environments and Unix Only, 496 billion rows. comScore Networks, Inc. received the Grand Prize for Most Rows/Records for Windows systems.

496 billion rows! That is simply amazing. Imagine how much space just a 64-bit integer primary key takes to store on that many rows (note that 32-bit unsigned integers max out at around 4 billion).

If each row only took just 64 bits of space in the index, that would amount to 3,784,179 Megabytes - almost 4 terabytes on its own! Can anyone explain how it is even possible to create a database of this size?
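
A quick Python check of that figure, assuming exactly 8 bytes per key and nothing else in the index (real indexes carry pointers and page overhead, so this is a floor, not an estimate). The names below are just for illustration.

```python
ROWS = 496_000_000_000   # row count quoted from the survey
KEY_BYTES = 8            # one 64-bit integer key per row (assumption: no overhead)

total_bytes = ROWS * KEY_BYTES
mebibytes = total_bytes // 2**20   # the "Megabytes" figure above uses 2**20 bytes
terabytes = total_bytes / 10**12   # decimal terabytes
tebibytes = total_bytes / 2**40    # binary terabytes

print(total_bytes, mebibytes, terabytes, round(tebibytes, 2))
```

The 3,784,179 "Megabytes" above is the binary figure. That is just under 4 TB in decimal units, or about 3.6 TB if you count in powers of 1024.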

MasterCard by truthsearch · 2003-12-12 02:57 · Score: 3, Interesting

I left MasterCard in 1999 after working with their data warehouse. At the time they recently bought a 3 terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised to not see them on the list. They work closely with Oracle, who have an office down the street, since they have high volume. Just the credit card transactions table alone gets 14 million new records on average every day.

I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.

--
Developers: We can use your help.

WTF? by Quixote · 2003-12-12 03:04 · Score: 1

From the article (yes, I actually read it) :
France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.

<grin>

Somebody give Mr. Fonseca a clue. With so many unemployed geeks running around, why can't eWeek find somebody who knows this stuff (even cursorily) to write?

pseudo by OgreChow · 2003-12-12 03:05 · Score: 2, Insightful

I would be surprised if some government databases, such as Social Security's, would not rank on this list if they were allowed to be analyzed.

Re:pseudo by BigGerman · 2003-12-12 04:31 · Score: 1

most of those things are in flat Cobol-processed files and on magnetic tape so they would not appear on this list.

Daytona? by wandazulu · 2003-12-12 03:06 · Score: 3, Insightful

Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google search reveals a pdf and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.

Re:Daytona? by KMnO4 · 2003-12-12 03:14 · Score: 1

Yes agree, strange. This list is clearly crap...why when you search for transaction volume (of an OLTP system) does the Korean "Internet Auction Co" come up while Ebay doesn't? Could they actually be bigger than Ebay? Have more transaction volume? No way, jose.
Also they exclude mainframes where most of the large OLTP stuff is happening...think banks and people using their ATM cards. Even the phone companies still use mainframes for the transaction processing...notice AT&T has a shockingly huge data warehouse (what could they possibly be storing in those 494 billion rows? have that many long distance calls even been placed in the history of the world between all long distance telephone co.s?) but are not anywhere on the transactional rankings.
Re:Daytona? by KMnO4 · 2003-12-12 03:25 · Score: 1

actually...now that I think about it I guess that volume seems reasonable...according to this article they can handle up to 400 million calls in one day, which is, what, every adult in the country making 2 long distance calls in a day? I guess it could happen. 494 billion rows would be a few years of that data.
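
Back-of-the-envelope version in Python, assuming every day runs at the full 400 million calls and one row per call. Both are simplifications, so the real span is at least this long.

```python
ROWS = 494_000_000_000        # row count mentioned in the parent post
CALLS_PER_DAY = 400_000_000   # peak capacity figure, assumed here as the daily average

days = ROWS / CALLS_PER_DAY
years = days / 365

print(days, round(years, 1))
```

That works out to 1235 days, or a bit over three years of call records at peak rate.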
Re:Daytona? by nettdata · 2003-12-12 04:42 · Score: 1

Also (at least here in Canada), our AT&T cell phones log and report on EVERY call made, not just long distance, so if that stuff is included somehow, that's a LOT of data.

--

$0.02 (CDN)

ReiserFS, a database-in-filesystem by bigberk · 2003-12-12 03:11 · Score: 1

I used to be really interested in what DB2, MySQL etc. could do until I was turned on to Hans Reiser's vision with respect to file systems. In his view, the storage layers above the file system (complex database software) can be replaced by a more intelligent filesystem that itself acts like a database. I'm currently trying out ReiserFS (a filesystem included in the Linux 2.4 kernel) which internally uses balanced trees to achieve much higher performance in large directories. ReiserFS also wastes much less space in the storage of small files.

Oracle by andy@petdance.com · 2003-12-12 03:11 · Score: 1

I'd say Oracle. Have you tried installing that baby? It makes MS Office look like Twiggy.

Echelon? by tommck · 2003-12-12 03:13 · Score: 1

I bet Echelon's got them all beat! Of course, I don't think the gub'ment is going to let some magazine weenies profile their database!

--
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.

Not even close(+) by Mycroft_514 · 2003-12-12 03:13 · Score: 1

If you go to the article, you will find that AT&T had the largest listed database at 94.1 TB - that's 9 times your speculation for Google.

And I used to work with some of the AT&T databases. Heck, the payroll system alone would have probably made the list in those days. (And I was the DBA for payroll for a while).

Also, some of the winners were using IDMS - a network-model implementation of a DBMS - not relational.

I want to know when I can download/eval it. by Ayanami+Rei · 2003-12-12 03:13 · Score: 1

n/t

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

It's conceivable by plopez · 2003-12-12 03:18 · Score: 1

From my forays into mainframe land, 93tb could be supported by:
2 sysprogs,
2 dbas,
2-3 operators,
1 applications programmer,
and thousands of data entry personnel.

Everything would run batch (including dumps into satellite dbs for regional or department uses) except for the online data entry, and the apps programmer would be setting up jobs for the operators to run at night.

And the first time any of them hose up the DB would be their last day of work on a mainframe.

--
putting the 'B' in LGBTQ+

Re:It's conceivable by Ralph+Spoilsport · 2003-12-12 05:44 · Score: 1

plopez wrote:
"And the first time any of them hose up the DB, would be their last day of work on a mainframe."
I truly dislike being hostile, but: WHAT? Are you stupid? Anyone who runs a DB this big is ALWAYS running it in multiples, so that if it goes down (and they DO go down periodically) the "copy" is already there, pumping identical data. Anything that comes in during a switch is held in a buffer and then dumped in / out. Once the down one is up again, it runs a data integrity test, and then it's back online.
In fact, it's usually a smart idea to do this on a continuous basis.
Anyone who would fire a DBA for the DB dying is a pig-headed ass.
RS

--
Shoes for Industry. Shoes for the Dead.

Slow news day means let's post old news.... by cjjjer · 2003-12-12 03:20 · Score: 1

This is over a month old already. Oddly enough I thought that I read it on slashpot in the first place. But maybe here.

As a DBA of Teradata(+) by Mycroft_514 · 2003-12-12 03:24 · Score: 1

I was never so happy as the day I was able to burn all my Teradata manuals, cause I ain't going back to one of those turkeys ever again.

The largest machines were about 250 nodes (Kmart, and look where they are today, and Walmart). I worked on machines up to about 135 nodes (Amps) (and 50 or so COPS). The performance never matched anything I've seen in DB2.

And even today, the performance tuning tools suck.

Oh, and as for your 1500 node limit, better check your manuals. Tucked away in the manual, and hardcoded into the operating system is a little limit - 1024 nodes - the origin of the name Teradata....

Oh, and it only takes 54 legal commands to crash one of those suckers (if you know the right commands, because of a hard-coded limitation in the OS as well).

Re:AmEx by hrieke · 2003-12-12 03:25 · Score: 2, Informative

I used to work for a company called Epsilon Data Management[1], in Burlington MA. They've been bought since I left them a while ago, but they were the keeper of the AmEx customer transaction database for data mining and direct marketing (junk mail and phone calls).
Big. 7 data silos big. Each silo holds 50k tapes, each tape was 30 GB, and it usually took 4 days to load.

[1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
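
For a sense of scale, the figures above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and that every silo is completely full, neither of which is stated in the post.

```python
# Figures quoted in the post above.
SILOS = 7
TAPES_PER_SILO = 50_000
GB_PER_TAPE = 30

# Assumption: decimal units and fully loaded silos (not stated in the post).
total_gb = SILOS * TAPES_PER_SILO * GB_PER_TAPE
total_tb = total_gb / 1000
total_pb = total_tb / 1000

print(f"{total_gb:,} GB = {total_tb:,.0f} TB = {total_pb:.1f} PB")
```

Under those assumptions the raw tape capacity is on the order of 10,000 TB, though raw tape capacity is of course not the same thing as the size of a live database.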

--
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIIIV IIVIIIIIIVIII...

France Telecom? They must be doing something wrong by bshroyer · 2003-12-12 03:31 · Score: 3, Funny

My first reaction is that, if France Telecom has the largest (non-hybrid) proprietary relational data storage, at 29 TB, ahead of AT&T and SBC, at around 26TB each, that France Telecom must have a bunch of redundant data lying around.

As of 2001-01-01, France had a population of about 59 Million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that

FTE had 2002 revenues of 49B, with 240,000 employees.
ATT had 2002 revenues of 40B, with 71,000 employees.
Finally, SBC had 2002 revenues of 43B, with 175,000 employees.

So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
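
The revenue-per-employee comparison behind that last line can be worked out from the numbers quoted above. This is a rough sketch only: the post gives no currency, so the figures are treated as plain units, and revenue per employee is a crude proxy at best.

```python
# (2002 revenue, employees) as quoted in the post above; currency is not stated.
companies = {
    "FTE": (49e9, 240_000),
    "ATT": (40e9, 71_000),
    "SBC": (43e9, 175_000),
}

# Revenue per employee for each company.
per_employee = {name: rev / staff for name, (rev, staff) in companies.items()}

for name, value in sorted(per_employee.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:,.0f} per employee")
```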

--
The cure for cancer is coming: Reovirus

Maybe you can't grasp the scale(+) by Mycroft_514 · 2003-12-12 03:32 · Score: 1

I once helped out in a study for the largest database AT&T wanted to do. To just store the data would have been 6 times the huge Walmart database's size or more. And this was just for a 3 month rolling store of the calls made on the AT&T network.

The 94.3 TB database is nowhere near what AT&T has to store. That is just one of 7 (last count I had) data centers they maintain. The total size of all the AT&T data approaches several THOUSAND terabytes. They maintain a converted bunker just to store tapes in!

Think about it, they have to keep records for YEARS about every call made on the worldwide network.

bah, meaningless by kpharmer · 2003-12-12 03:34 · Score: 3, Interesting

This is like ranking projects based on largest number of lines of code.

Without system descriptions (like in TPC) it merely shows that such a top-end is feasible.

What about total cost?
annual cost?
time to build?
software versions?
hardware?
staffing composition?

I mean really, a 500 gbyte database on a modest single CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.

Re:bah, meaningless by MeerCat · 2003-12-12 10:24 · Score: 1

This is like ranking projects based on largest number of lines of code.

Hooray, thought no-one would say it !

It's like measuring commerce sites by value - I designed and built a system that could probably claim to be the world's largest online marketplace (executed more $ value of trades in its first 2 weeks than eBay has done in 5 years, and has now executed more than $3 trillion in less than a year) - but I'm guessing the software, hardware, complexity and management is a fair bit smaller than the equivalent eBay systems.

I thought this was news for nerds, not Top Trumps

--
T

--
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best

Thanks Gator! by swanky · 2003-12-12 03:35 · Score: 1

Under the category: Database Size, All, DSS

#7 Claria Corporation 12,100 Oracle SMP Oracle Sun Hitachi

Re:Hang on ... by boy_afraid · 2003-12-12 03:41 · Score: 1

The largest in the survey is 30GB.

Is my organisation the new record holder?

Yeah, I thought the same with my 60GB databases. I did a double-take after this post and it's right, Thirty THOUSAND Gigabytes!

Highest Load = SQL Server? by 330Pilot · 2003-12-12 04:03 · Score: 1

Rank them by load, and you'll note the winners =)

eBay? by bobbabemagnet · 2003-12-12 04:03 · Score: 1

Am I the only one surprised to not find eBay on the list? I suppose on one hand it is respectable to have a large and complex database, but on the other companies with massive databases as part of their business that DON'T show up on the list impress me more.

Not an accurate list by GreenCrackBaby · 2003-12-12 04:08 · Score: 1

I work for a company that makes billing software for tier 1 telcos. My job is to tweak performance of the billing system and environment as we deploy into the client's production environment.

My team has an internal 17 TB database we use to test performance against, and every one of our clients has at least a 15 TB database. I can list four of our clients who maintain at least a 40 TB database. Not one of our clients is listed on that list (nor are we).

--

"The market alone cannot provide sufficient constraints on corporation's penchant to cause harm." -- Joel Bakan

1 billion rows by Saint+Stephen · 2003-12-12 04:37 · Score: 1

The largest DB I've done was about 1 billion rows, processing the weblogs of a large ISP into SQL Server. It was about 1.5 TB.

I wrote some queries that reduced the processing time from 6 hours to 45 minutes :-)

Me = smart

The /. effect shows how M$-$QL stacks up by BrianDeacon · 2003-12-12 04:38 · Score: 1

Microsoft OLE DB Provider for ODBC Drivers error '80004005'
[Microsoft][ODBC SQL Server Driver]Timeout expired
/vldb/2003_TopTen_Survey/TopTenWinners.asp, line 99

Bad web monkey!
1) ASP blows
2) You didn't catch your error
3) You let your error get spit out on the web page for me to start learning about your source code.
4) You should have used the OLEDB driver.
5) You should have cached those results instead of crippling your sql server fetching the same damn info 1 million times.
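
The caching point can be illustrated with a minimal sketch. It is written in Python rather than ASP, and `run_expensive_query` is a hypothetical stand-in for whatever query the page actually runs; a real site would also want the cache to expire eventually.

```python
from functools import lru_cache

query_count = 0  # counts how often the "database" is actually hit


def run_expensive_query(category):
    """Hypothetical stand-in for the slow survey query."""
    global query_count
    query_count += 1
    return f"results for {category}"


@lru_cache(maxsize=128)
def cached_results(category):
    # Only the first request per category reaches the database;
    # repeats are served from memory.
    return run_expensive_query(category)


# Simulate many visitors asking for the same page.
for _ in range(1000):
    cached_results("TopTenWinners")

print(query_count)
```

With the cache in place, a thousand identical requests cost one query instead of a thousand.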

--

I didn't pay attention to politics until my country started to scare me. Recently.

wow by rabtech · 2003-12-12 04:48 · Score: 1

That's a lot of data, folks. For comparison, Microsoft's TerraServer, built in cooperation with the USGS (the US Geological Survey), maps the entire surface of the United States with photographs from the air, satellites, and so on.

That database of pictures is around 6 TB.

Some of the databases listed on the survey are even larger - approaching tens of terabytes!

I wonder what SkyServer will be (the new successor to TerraServer, designed to collect and stitch together a map of the entire sky in 3D from all known and future telescope pictures).

--
Natural != (nontoxic || beneficial)

Open Source DBs? by Anonymous Coward · 2003-12-12 05:14 · Score: 2, Interesting

Since neither PostgreSQL nor MySQL showed up in the list (not surprisingly), does anybody know what the largest databases are running either of them?

I would guess that PostgreSQL maxes out larger than MySQL. </fuel-on-the-fire>

Who the hell is ANONYMOUS? They're #4! by Ralph+Spoilsport · 2003-12-12 05:31 · Score: 1

If you do a straight search on size, you get:

France Telecom : 29,232 : Oracle : SMP : Oracle : HP : HP

AT&T: 26,269 : Daytona : SMP : AT&T : Sun : Sun

SBC : 24,805 : Teradata : MPP : Teradata : NCR : LSI

***Anonymous*** : 16,191 : DB2 for Unix : MPP/Cluster : IBM : IBM : IBM

16 terabytes, and anonymous.... Hmmmm.... I know! It's the motherlode of all porn sites! Either that or the NSA. Same thing, really...

Mmmmm Condeeeeeeeeee!

RS

--
Shoes for Industry. Shoes for the Dead.

The "Top Ten" publicly *known* databases, maybe... by anactofgod · 2003-12-12 05:31 · Score: 1

I'd be very surprised if there aren't megalithic databases churning away in black-budget projects operated by unnamed government agencies that make these commercial ones look puny by comparison.

For that matter, I'm curious as to who "Anonymous", the operator of the #4 db in terms of size, is...

---anactofgod---

--

---anactofgod---

"Equal opportunity swindling - *that* is the true test of a sustainable democracy."

What about our favorite World's Largest Retailer by joeblakethesnake · 2003-12-12 05:55 · Score: 1

Walmart's Information Systems Division career page states

more than 240-terabyte data warehouse

As a former employee (in the store, not at ISD) I know that most of that 240 terabytes is going to be in a database, not just files. I know Walmart keeps a lot of stuff a secret, but they are rather proud of their IT stuff, and I'm surprised it didn't make the list.

How big is Slashdot's database? by teko_teko · 2003-12-12 06:03 · Score: 1

How big is Slashdot's database?

The largest databases aren't what you think by freality · 2003-12-12 06:04 · Score: 1

Stanford Linear Accelerator Center weighs in at 500TB. They run Objectivity.

Internet Archive weighs in at 300-400TB and runs Linux.

Google is probably somewhere in that range, but they don't tell. A rough guess would be 3307998701 pages * 100KB/page / 1024KB/MB / 1024MB/GB / 1024GB/TB = 308TB. They run pigeons

Re:The largest databases aren't what you think by yppiz · 2003-12-12 07:54 · Score: 1

Google is probably somewhere in that range, but they don't tell. A rough guess would be 3307998701 pages * 100KB/page / 1024KB/MB / 1024MB/GB / 1024GB/TB = 308TB.

Google counts pages as indexed whether they have the page text or just an href with text pointing to a page. In other words, they count all the URLs in their crawl as pages.

The number of crawled pages is more like 50-70% of their count, based on my experiences with the Internet Archive's crawler.

Also, the average page size is 10-20KB, not 100KB.

Assuming 50% and 15KB/page, your formula would estimate their archive as ~20TB.
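
Both estimates can be reproduced with a short sketch of the quoted formula. The function name and parameters are made up for illustration; the page count, page sizes and 50% figure are the ones given in this thread.

```python
KB_PER_TB = 1024 ** 3  # KB -> MB -> GB -> TB, as in the quoted formula


def archive_tb(pages, kb_per_page, crawled_fraction=1.0):
    """Estimated archive size in TB (illustrative helper, not a real API)."""
    return pages * crawled_fraction * kb_per_page / KB_PER_TB


INDEXED_PAGES = 3_307_998_701  # page count from the quoted post

original = archive_tb(INDEXED_PAGES, 100)
revised = archive_tb(INDEXED_PAGES, 15, crawled_fraction=0.5)

print(f"original estimate: {original:.0f} TB")
print(f"revised estimate:  {revised:.1f} TB")
```

The revised inputs come out at roughly 23 TB, which is where the ~20TB ballpark comes from.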

--Pat / zippy@cs.brandeis.edu

Re:France Telecom? They must be doing something wr by stef49 · 2003-12-12 06:27 · Score: 1

You missed one important point!
FT provides a public service in France which means that they are not expected to make as much profit as a company in the free market.
For example, FT has the obligation to maintain the telecommunications for remote parts of France (mountains, islands,...).
A private company would just refuse to do it or would charge a lot more than FT.

Re:Null Value != "unknown value" by the_mad_poster · 2003-12-12 06:31 · Score: 1

You would define it as 'none'. Lack of an eye color is a perfectly valid piece of data. It's not unknown - you know they have no eyes, therefore, they have no eye color: 'none'.

A Null value is one that does not have a value.

I'm not trolling (despite some clueless moderator's beliefs otherwise - I wish mods wouldn't moderate posts on subjects they don't understand...) or trying to belittle you or anything here; I'm just trying to point out that RDBMSs don't really exist and things like stupid NULLs are what's to blame. Oracle, SQL Server, hell, even my favorite - PostgreSQL, all are to blame for stuffing non-relational tools down people's throats while screaming about RDBMSs. Think about how illogical your statement is:

A Null value ... does not have a value.

That makes absolutely NO sense. How can a value NOT have a value? A NULL is meant to represent something YOU DON'T KNOW. However, if you regularly find you don't know how to describe something completely, you probably shouldn't be trying to describe it within a relation. If you're occasionally going to have the need to temporarily represent an "unknown" value (perhaps you haven't seen this individual yet to know what eye color they have), why not just use the string 'unknown' as a placeholder? It's logical, it's true, and it signals that it needs to be changed eventually. Simple.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!

Something missing by ccarter · 2003-12-12 06:39 · Score: 1

Wally World, errr Walmart is suspiciously absent from that list.

They have a HUGE (200+ node) Teradata install.

Re:France Telecom? They must be doing something wr by garyisabusyguy · 2003-12-12 07:06 · Score: 1

According to ATT, they are using compressed data and FT is _not_. Otherwise, ATT's Daytona database would be MUCH larger than FT's puny Oracle database.

In this survey, the AT&T data warehouse is using Daytona's data compression features to great advantage whereas France Telecom is not using Oracle's compression features in any substantial way. To compare the two on this Database Size metric is to compare apples with oranges
Read about it here: http://www.research.att.com/projects/daytona/

--
Wherever You Go, There You Are

XXX-SQL - The SQL for porn queries by rabs · 2003-12-12 07:08 · Score: 1

(insert 'insert into' joke here)

Slashdot database ? by ultranova · 2003-12-12 07:34 · Score: 1

So, what database does Slashdot use and how big is it ?

--

Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

Re:Null Value != "unknown value" by Ed+Avis · 2003-12-12 08:37 · Score: 1

Semantically, I don't see the difference between using 'unknown' as a placeholder and using an explicit null value. Except that using the placeholder is more awkward and it isn't clear how to store 'unknown' in a field that contains integers, for example. Further, databases have support for using null values, for example in outer joins and in aggregate functions. Perhaps you could modify an RDBMS to recognize the string 'unknown' so it could be used cleanly in these cases, but what would be the point?

If you object to the very idea of storing 'a value which is not known' then I don't see why this is any better when represented as the magic string 'unknown' or the magic value null. I pointed out in my post above that nulls have been part of the relational model since the beginning.

--
-- Ed Avis ed@membled.com

Re:Null Value != "unknown value" by Ed+Avis · 2003-12-12 08:39 · Score: 1

Just a note - I do speak from current experience when saying that a magic 'unknown' string or similar is painful to deal with and a source of bugs.

--
-- Ed Avis ed@membled.com

Go Stratapult! by merlin_jim · 2003-12-12 08:39 · Score: 1

Number one in Decision Support System Peak Workload for Windows!

Number eight in the same category for all platforms!

See, small guys can do big things! We're a small to midsize consulting firm (50 or so employees), and yet we're on the top ten list of largest databases in the world!

*pops the champagne*

--
I am disrespectful to dirt! Can you see that I am serious?!

Re:anonymous DB2? by ccarter · 2003-12-12 08:44 · Score: 1

I suspect a more realistic guess is that it's one of Teradata's larger installations that is preparing to defect to a DB2 EEE install and they aren't quite ready for NCR to know just yet.

Re:France Telecom? They must be doing something wr by khallow · 2003-12-12 09:18 · Score: 1

Or as in the case of the US, would be required to provide service to everyone in their service region. I've been in some rather isolated areas in the US that had cheap affordable phone service. But I got to agree, we can't expect a public service to be run efficiently.

sweet, thanks (n/m) by freality · 2003-12-12 09:51 · Score: 1

required msg

Re:Null Value != "unknown value" by the_mad_poster · 2003-12-12 10:05 · Score: 1

I'll respond to your three posts here for convenience.

Yes, you're right, of course, on Codd's assertions regarding the NULL. I'm strictly speaking in the sense of SQL, however, and the issues raised by the current implementations of that messy language and the resultant DBMSs. Of course, I realize I didn't SAY that, so my apologies for confusing the matter (perhaps I need to add myself to my last journal entry regarding people who type before they think...).

At any rate, back to NULL. It's not that storing unknown values is bad, it's that it's abused. It turns into a catch-all like the eye color issue the AC raised. NULL (in the current SQL sense - perhaps this wouldn't even be an issue if NULLs were treated properly) makes no sense with that issue because the correct eye color for someone without eyes is "none", not NULL. People assume that anything that's not a cozy little tailor-made fit with their views of things can be represented as NULL, which is simply not the case.

That goes back to my other two posts though (that have, apparently, been bitchslapped to Troll) regarding people like Ellison and companies like Oracle that develop systems that claim to be relational because they implement PORTIONS of the Relational Model, and then sell them to people who don't want to bother learning what the Relational Model really is. I may not always agree with Fabian, but I think he and Date are right when they say that a truly relational system would render the whole frenzy over "XML Databases" and "Ob-Relational Databases" and all that other garbage moot and it's a crime that these companies get away with pushing the garbage they do by claiming it's something it's not. If a truly relational system were implemented, it could represent data in the ways that those systems do simply through the proper use of data types and attribute definitions (why, pray tell, could one not simply define an XML datatype in a true RDBMS?). It is a great source of annoyance to me that people will sit and argue with me about the nature of a Relational Database based solely on the fact that a vendor says it's relational. I actually sat and argued with a professor for hours once that Access barely qualifies as a DBMS, much less a relational one. He kept arguing, however, that, because it had "tables" and keys and a handful of data types, it must be relational. I seriously wanted to clobber him with the copy of "Intro to Database Systems" that I had on hand (that's one heavy freakin' book in case you've never held a copy).

BTW - I don't condone the use of a kludgy string unless it's necessary. By necessary I mean "you happen to realize part way through the process that you don't know a value, but you will get it and replace the kludge as quickly as possible". In fact, I don't think ANY kludge like that should ever go to production because it causes problems in app development later on.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!

Re:Null Value != "unknown value" by Ed+Avis · 2003-12-12 10:30 · Score: 1

Null may be abused, however it is much the less bad option compared to most other ways of modelling unknown or not applicable values. Sometimes you have a genuine need to model data where some things are really not known. This is not the same as a kludge because you don't know what value to put. When designing your relational database you make a conscious decision that it makes sense to have rows where 'height' is not known.

Of course the correct eye colour for someone without eyes is 'none', or 'not applicable' as it is sometimes called. But you may decide as part of your data model to represent 'none' using null. This might not be ideal but it is certainly no worse than representing it using a magic string.
Of course, you are now confusing unknown eye colour with not-applicable eye colour, and this was one of Codd's criticisms of SQL, that it has only a single null value but needs two. However, I don't think this single failing is enough to disqualify a system as a relational database. And if you are using a particular RDBMS that supports a single null value it usually makes more sense to go with what the database supports natively rather than try to reinvent it with special 'none' strings. Even if you wish that the native support could be a bit different, you'd be silly not to use what is there. As I mentioned earlier, aggregate functions such as avg(height) are aware of null but not aware of any special value you might choose.
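
To make the avg(height) point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, and -1 stands in for whatever magic value one might pick for an integer column; none of it comes from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (name TEXT, height_null INTEGER, height_magic INTEGER)"
)
# carol's height is not known: NULL in one column, a magic -1 in the other.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("alice", 170, 170), ("bob", 180, 180), ("carol", None, -1)],
)

avg_null, avg_magic = conn.execute(
    "SELECT avg(height_null), avg(height_magic) FROM person"
).fetchone()
conn.close()

print(avg_null)   # NULL is skipped, so only the two known heights are averaged
print(avg_magic)  # the magic value is averaged in like any other number
```

The null column averages the two known heights, while the magic-value column drags the average down unless every query remembers to filter the placeholder out by hand.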

If you don't condone using such a value in production, but only as a kludge during development, then fair enough. If you've never needed to model unknown values or not applicable ones, you've been lucky. (By good design and normalization it's possible to reduce the number of not-applicable values you have to store, but they can't always be avoided entirely.)

You're right that Oracle comes with a lot of non-relational crud like 'XML databases'. However, don't let that distract you from the fact that Oracle, the core product, is a pretty good implementation of a relational database system. The same goes for most other RDBMSes. I believe you can choose simply not to install the XML / object-relational stuff.

I'd say Access probably is a relational database because it stores all data in relations and you use a query language based on relational algebra (even though it may have a GUI front end) to retrieve and update the data. It might not meet all 12 of Codd's rules, but in my opinion it's close enough. You might sound a bit unreasonable if you insist that a database storing relational data is not a relational database.

--
-- Ed Avis ed@membled.com

Missed the largest one. by avsed · 2003-12-12 11:24 · Score: 1

Particle physics experiments routinely collect far more data than this. The BaBar experiment that I participated in stores enough data that its database is an order of magnitude greater in size than anything in this article (current size: 895.0 TB).

See: BaBar Database for details, it uses an OO database (which in my experience was very painful for users)

Dan

correct title is "largest civilian databases" by brre · 2003-12-12 12:10 · Score: 1

Or largest publicly admitted-to databases.

Huh? by TrailerTrash · 2003-12-12 13:11 · Score: 1

Mine at work is 44TB, DB/2 for AIX, running on an RS/6000 system with 128 nodes. DSS only.

But I work for a really huge US company who doesn't talk to the media much.

It makes me wonder how many really huge ones are also flying "under the radar screen". Such as SCO's database of all Linux users, perhaps...

We are larger: 500TB by SilverSun · 2003-12-12 14:39 · Score: 2, Interesting

I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And, since we also have several computing farms totalling several thousand CPUs which process the data constantly, I doubt that they have a higher load.

Press release:

http://www.slac.stanford.edu/slac/media-info/20020412/database.html

Cheers

--

KdenLive/PIAVE - non-linear video editing

Re:We are larger: 500TB by Call+Me+Black+Cloud · 2003-12-16 15:52 · Score: 1

I was just going to point that out. Maybe they are only ranking relational databases and didn't know what to do with your object oriented database.

For those unfamiliar with BaBar the project uses Objectivity. Here's an article about it.

Symmetric Multiple Processors, you DORK by stanwirth · 2003-12-12 16:11 · Score: 1

"France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture."

The author of this article just failed my bullshit filter. SMP in this context is "symmetric multiple processors" -- yes, SMP "Symbolic Manipulation Programme" was the name of what Stephen Wolfram wrote back in the early 80's while a grad student at Caltech, and open-sourced, and got heaps of shit for, because of a nasty copyright battle with Caltech over it. He was a student, and felt he owned the code he wrote while a student. Caltech felt differently when it started giving MacSyma a heck of a run for its money -- and Maple started raising their prices.

But this has abso-fucking-lutely nothing to do with database architecture. What "geek dictionary" did this writer look up this acronym in? Doesn't know what he's writing about. At all.

Re:France Telecom? They must be doing something wr by stef49 · 2003-12-12 17:13 · Score: 1

> FT also runs Minitel, which some might scoff
> at but is not trivial to run

Minitel is also one of the largest PORN DATABASES in the world. You can find there millions of PORN images in high minitel resolution (40x25 in 16 colors).

I hope that there is no spam filter on /.! Can I say PORN two, ... three times without being moderated down?

Re:Null Value != "unknown value" by the_mad_poster · 2003-12-13 03:35 · Score: 1

You might sound a bit unreasonable if you insist that a database storing relational data is not a relational database.

You mean like... oh, say, Fabian Pascal, that guy I keep reading? :)

Everything you've said now comes down to the realm of current practical implementations. Yes, Oracle is "pretty close" as is SQL Server, PostgreSQL, and a slew of others, but they're just not there yet. I don't really have a problem with Oracle (excepting price...), SQL Server, or PostgreSQL. In fact, I love PostgreSQL. However, when I hear people arguing for OOP Database architectures or XML Databases, they always try to argue that "relational database management systems just don't meet the needs of the data being modeled in these circumstances" (note, now, we're in the realm of theory). This is total BS. There's no reason you couldn't have an XML data type in a real RDBMS or create a network model of your data within the RDBMS (although that latter would be stupid, you COULD do it). The problem is that vendors aren't offering relational systems in the true sense of Codd's works, and when something comes along like XML, people jump up and run around thinking they need to have a new DBMS and yell that the "relational system is dead".

Baloney!

And, since I've already gotten the two original posts modded to Trolls, why not go for a three peat by tossing out a totally opinionated, offtopic statement :)

XML is a stupid idea anyway.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!

Re:Null Value != "unknown value" by Ed+Avis · 2003-12-13 05:50 · Score: 1

Why do you think that relational database vendors are not offering a true RDBMS in the sense of Codd's works? You mentioned null values before, but I explained that this is definitely part of the relational model and not a deviation from it. What else do you think is missing?

--
-- Ed Avis ed@membled.com
