World's Largest Databases Ranked
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
I would've expected to see Google in there somewhere.
scored a measly 17th. Oh well, time for more surfing.
Does "SQL Server" here mean Microsoft SQL Server?
I would have liked to see a SQL vs. non-SQL ranking too.
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
I wonder how many of the spammers allowed their databases to be evaluated for this list.
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed
You're off by 3 orders of magnitude. The largest is 30TB.
I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that relational databases are the only kind around.
Lasers Controlled Games!
OK, so this obviously covers only database vendors and RDBMSes.
In a broader sense aren't such things as the wayback machine a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN. Who's the daddy of these guys?
my other sig is written in brainfuck
I have zero, nada, zip experience with big databases. But it surprised me that the peak workloads were measured in hundreds of concurrent queries. If I had to make a wild guess, I would have guessed tens of thousands. My blessed ignorance destroyed.
I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...
I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?
AT&T 94,305GB Daytona SMP AT&T Sun Sun
I wonder how much of this database is records of every time users switched to and from AT&T to get those cash bonuses!
Where I work, we recently calculated (for an IT pat on the back) our total network-accessible storage capacity and came up with a rough estimate of about 150 TB. That is a giant swath of data, and a decent amount of it is in databases (an MS SQL farm), but scattered across thousands of databases.
It takes a truly amazing staff to maintain the servers (backup, administration, maintenance, sitting and staring at screens) and preserve the integrity of the data, but good lord...
A 94.3 TB database? My utmost and highest kudos to the DBAs and admins there. That is one gigantic system to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, I'm sure it has to be up a good three nines (99.9%) of the time, if not more.
Regardless of the study's result (which, without reading the entire study, amounts to a quick read of a geek pissing contest), I find it truly amazing how much work, how many man-hours, and how many midnight pager calls go into maintaining these databases. I know I don't want our DBAs' jobs, and I certainly wouldn't want to be a DBA on a 94.3 TB farm, but I know people who do it and love it. It's a specialty skill, and apparently these guys do it right...
Kudos...
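The "three nines" guess translates into a concrete downtime budget. A minimal sketch, assuming a 365-day year and counting all unavailability, planned or not, against that budget:

```python
# Convert an availability level expressed in "nines" into the maximum
# downtime it allows per year. Assumes a 365-day (8,760-hour) year.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime in hours/year: unavailability is 10**-nines."""
    return HOURS_PER_YEAR * 10 ** (-nines)


for n in (2, 3, 4, 5):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

At three nines that is under 9 hours of downtime a year, or roughly 44 minutes a month, for a 94.3 TB system.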
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here
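The quoted Wayback Machine figures can be projected forward with simple arithmetic. This sketch assumes the 12 TB/month rate stays constant (linear growth), which is an assumption for illustration only:

```python
# Linear-growth projection from the quoted figures: 300 TB today,
# growing by a constant 12 TB per month (assumed to stay flat).
CURRENT_TB = 300
GROWTH_TB_PER_MONTH = 12


def size_after(months: int) -> int:
    """Projected archive size in TB after the given number of months."""
    return CURRENT_TB + GROWTH_TB_PER_MONTH * months


def months_until(target_tb: int) -> float:
    """Months of linear growth needed to reach target_tb."""
    return (target_tb - CURRENT_TB) / GROWTH_TB_PER_MONTH


print(size_after(12))     # 444
print(months_until(600))  # 25.0
```

At that rate the archive adds about 144 TB a year, more than the entire 94.3 TB AT&T database in the ranking.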
If you can read this sig - the bitch fell off.
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500GB of data for Microsoft Corp.'s Windows and NT platforms and 1TB of data for all other platforms.
I cannot see what OS each DB is running on. Is that irrelevant?
Lastly, in the Windows OLTP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.
Neither Windows NT nor MS SQL Server is generally the choice for the top databases. In fact, to make this list, a Windows database only had to be half as big as a database on any other platform:
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms
Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.
While it is nice to see the rankings by size and usage, it would be nice if the survey also ranked other factors, like maintenance time and number of users, to show how these systems really compare in operation. The largest OLTP workload might signify lower downtime, but maybe not.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Hmm - how to /. your own website in one simple step?
Does anybody believe that the "SMP" used in reference to the France Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? How are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms without understanding them?
From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.
I'm absolutely shocked that the NCBI's (National Center for Biotechnology Information, part of the NIH) genomic and proteomic search engine BLAST isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of many organisms. I'm similarly shocked that MEDLINE / PubMed isn't included, as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites, not Amazon.
Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating, you'd realize that this company and its peers (Equifax, etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad, and correcting it in the case of a mistake, is an expensive and time-consuming task.
If this were Usenet, I'd killfile the lot of you.
This was tested against a live directory with the same number of users and objects each time?
How was your test environment organised?
Oh no, you were being ironic, I must pay more attention.
Obviously, you would be crazy not to use some middleware, but things aren't as simple as the PR guys claim. Running queries asynchronously creates a different set of problems and complicates the entire architecture. If you look at the biggest installations, they all use middleware, and most of them use Tuxedo. This includes most, if not all, MS SQL Server deployments. OLE DB can't handle that kind of load, and neither can standard COM+. Just read the TPC full disclosure reports: you'll see that all the MS SQL Server tests wrapped Tuxedo with COM+. As much as Microsoft likes to slam EJB and Tuxedo as too expensive, you can't scale SQL Server for really heavy deployments without Tuxedo.
I find it interesting that the largest database is only 2 TB larger than the one I recently built. It is a medical system. 66 MySQL servers bear the load, but I usually have only 30 of them active; the rest are mirrors and logging masters. Typical connections: 4,500 at any given time.
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
I left MasterCard in 1999 after working with their data warehouse. At the time, they had recently bought a 3-terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised not to see them on the list. Because of their high volume, they work closely with Oracle, which has an office down the street. The credit card transactions table alone gets 14 million new records per day on average.
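To put 14 million new rows a day in perspective, here is the same figure as average per-second and per-year rates. It assumes the load is spread evenly over the day, which real card traffic certainly is not, so peak rates would be higher:

```python
# Back-of-the-envelope rates for 14 million new transaction rows per day,
# assuming an even spread across the day (real peaks are higher).
ROWS_PER_DAY = 14_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY
rows_per_year = ROWS_PER_DAY * 365

print(f"{rows_per_second:.0f} rows/second on average")  # 162
print(f"{rows_per_year:,} rows/year")                    # 5,110,000,000
```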
I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
I would be surprised if some government databases, such as Social Security's, did not rank on this list if they were allowed to be analyzed.
Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google search reveals a PDF and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.
I used to work for a company called Epsilon Data Management[1], in Burlington, MA. They've been bought since I left a while ago, but they were the keeper of the AmEx customer transaction database for data mining and direct marketing (junk mail and phone calls).
Big. Seven data silos big. Each silo held 50k tapes, each tape held 30 GB, and it usually took 4 days to load.
[1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
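The silo numbers multiply out to the raw tape capacity below. This assumes every slot is filled, counts native (uncompressed) capacity, and uses decimal units; it is capacity, not the amount of data actually stored:

```python
# Raw capacity of the tape silos, assuming every slot is filled and
# using decimal units (1 PB = 1,000,000 GB).
SILOS = 7
TAPES_PER_SILO = 50_000
GB_PER_TAPE = 30

total_tapes = SILOS * TAPES_PER_SILO
total_gb = total_tapes * GB_PER_TAPE

print(f"{total_tapes:,} tapes")                           # 350,000 tapes
print(f"{total_gb:,} GB = {total_gb / 1_000_000:.1f} PB")  # 10,500,000 GB = 10.5 PB
```

Even as a rough upper bound, 10.5 PB is two orders of magnitude beyond the largest entry in the ranking.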
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIII
My first reaction was that if France Telecom has the largest (non-hybrid) proprietary relational data store, at 29 TB, ahead of AT&T and SBC at around 26 TB each, then France Telecom must have a bunch of redundant data lying around.
As of 2001-01-01, France had a population of about 59 million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that
FTE had 2002 revenues of 49B, with 240,000 employees.
ATT had 2002 revenues of 40B, with 71,000 employees.
Finally, SBC had 2002 revenues of 43B, with 175,000 employees.
So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
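The comparison above can be made explicit by normalizing database size by revenue, and revenue by headcount. This uses the poster's figures as quoted (26 TB is the approximate value given for AT&T and SBC) and assumes all three revenues are in the same currency, which the post does not state:

```python
# Figures as quoted: 2002 revenue (billions, same currency assumed),
# employee count, and approximate database size in TB.
companies = {
    "France Telecom": {"revenue_b": 49, "employees": 240_000, "db_tb": 29},
    "AT&T":           {"revenue_b": 40, "employees": 71_000,  "db_tb": 26},
    "SBC":            {"revenue_b": 43, "employees": 175_000, "db_tb": 26},
}


def tb_per_billion(c: dict) -> float:
    """Database size relative to revenue: TB per billion of revenue."""
    return c["db_tb"] / c["revenue_b"]


def revenue_per_employee(c: dict) -> float:
    """Revenue per employee, in the same currency units as revenue_b."""
    return c["revenue_b"] * 1e9 / c["employees"]


for name, c in companies.items():
    print(f"{name:15} {tb_per_billion(c):.2f} TB/B  "
          f"{revenue_per_employee(c):,.0f} per employee")
```

The TB-per-billion figures land within roughly 10% of each other (0.59 to 0.65), which supports the "nothing unusual" conclusion; the revenue-per-employee gap is what the closing joke is about.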
The cure for cancer is coming: Reovirus
Yup.
Methinks the character who wrote the article came across the term 'SMP', went to FOLDOC or The Jargon File, and whaddya know - the first hit returns 'Symbol Manipulation Program - Stephen Wolfram's yadda yadda yadda'.
This is like ranking projects based on largest number of lines of code.
Without system descriptions (like those in TPC disclosures), it merely shows that such a top end is feasible.
What about total cost?
annual cost?
time to build?
software versions?
hardware?
staffing composition?
I mean really, a 500 GB database on a modest single-CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.
Since neither PostgreSQL nor MySQL showed up in the list (not surprisingly), does anybody know what the largest databases running either of them are?
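One crude way to see the point about the single-CPU server is data volume per CPU. This deliberately ignores memory, I/O bandwidth, and query mix, and assumes decimal units (2 TB = 2,000 GB):

```python
# Data volume per CPU for the two configurations being compared.
small_gb_per_cpu = 500 / 1      # 500 GB on one CPU
large_gb_per_cpu = 2_000 / 64   # 2 TB spread over 64 CPUs

print(small_gb_per_cpu, large_gb_per_cpu)   # 500.0 31.25
print(small_gb_per_cpu / large_gb_per_cpu)  # 16.0
```

By this measure each CPU in the small server is responsible for 16 times as much data.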
I would guess that PostgreSQL maxes out larger than MySQL. </fuel-on-the-fire>
I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And since we also have several computing farms totaling several thousand CPUs that process the data constantly, I doubt that they have a higher load.
0 20 412/database.html
Press release:
http://www.slac.stanford.edu/slac/media-info/20
Cheers
KdenLive/PIAVE - non-linear video editing