World's Largest Databases Ranked
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
I would've expected to see Google in there somewhere.
Does the SQL Server mean MS-SQL?
I would have liked to see SQL vs non-SQL ranking too.
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.
Lasers Controlled Games!
I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
I know where I work we recently (for an IT pat on the back) calculated our total network-accessible storage capacity and came in at a rough estimate of about 150TB. Now that is a giant swath of data, and a decent amount is in databases (MSSQL farm), but scattered across 1000s of DBs.
It takes a truly amazing staff to maintain the servers (backup, administration, maintenance, sitting and staring at screens) and maintain the integrity of the data, but good lord...
A 94.3TB database? My utmost and highest kudos to the DBMAs and admins there. That is one gigantic task to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up; I'm sure a good three nines, if not greater.
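To put "three nines" in perspective, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and that availability is measured over the whole year; the function name is made up for illustration and nothing here comes from the survey itself.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)

def allowed_downtime_hours(nines: int) -> float:
    """Hours of downtime per year permitted at an availability of `nines` nines.

    Three nines means 99.9% uptime, i.e. an unavailability of 10**-3.
    """
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability

for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours of downtime per year")
```

Under those assumptions, three nines leaves a bit under nine hours of downtime per year for everything, planned or not.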
Regardless of the result of the study (and without actually reading the entire study, the end results read like a short summary of a geek pissing contest), I find it truly amazing how much work, how many man-hours, and how many midnight pager calls go into maintaining these databases. I know I don't want our DBMAs' jobs and certainly wouldn't want to be a DBMA on a 94.3TB farm, but I know those that do and love doing it. It's a specialty skill and apparently these guys do it right...
Kudos...
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here
If you can read this sig - the bitch fell off.
You can find the link to the article yourself but
Lastly, in the Windows OLTP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.
Neither Windows NT nor MS SQL is generally a choice for the top databases. In fact, to make this list, a Windows database was required to be only half as big as databases on other platforms:
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms.
Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.
I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad-processor Intel system with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the results are spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some used to duplicate tables in case of node failure).
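For anyone who hasn't seen hash distribution before, here is a toy Python sketch of the idea described above. It is not Teradata's actual hashing or join code: the unit count, the CRC32 hash, and every function name are assumptions picked for illustration. The point it shows is that rows with the same index value always land on the same logical machine, so each machine can join its own slice without looking at anyone else's data.

```python
import zlib
from collections import defaultdict

NUM_UNITS = 8  # hypothetical number of logical machines

def unit_for(key) -> int:
    """Map an index value to a logical machine (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_UNITS

def distribute(rows):
    """Spread rows across units by hashing the first column (the index)."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[0])].append(row)
    return units

def colocated_join(left, right):
    """Join two tables that are hashed on the same index column.

    Matching rows are guaranteed to sit on the same unit, so each unit
    joins only its local rows; the partial results are then collected.
    """
    left_units = distribute(left)
    right_units = distribute(right)
    result = []
    for unit in range(NUM_UNITS):
        # Build a lookup of this unit's left rows, keyed by index value.
        lookup = defaultdict(list)
        for row in left_units.get(unit, []):
            lookup[row[0]].append(row)
        # Probe it with this unit's right rows only.
        for row in right_units.get(unit, []):
            for match in lookup.get(row[0], []):
                result.append(match + row[1:])
    return result

customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, "book"), (3, "lamp"), (3, "desk"), (4, "pen")]
joined = colocated_join(customers, orders)
print(sorted(joined))
```

If the two tables were hashed on different columns, one of them would have to be redistributed before the local joins could run, which is why getting the indexing right matters so much.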
Personally, it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, it's performance Oracle can't touch (no, I'm not paid by NCR... just a user).
http://www.teradata.com
Overview:
http://www.teradata.com/t/go.aspx/?id=84960
Does anybody believe that the "SMP" used in reference to the French Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? How are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?
From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.
What are the pros and cons to databasing (sp.?) your porn? - except perhaps, reduced chance of getting a girlfriend, and chance of ridicule, obviously...
Hey, this is the right place to ask ;)
This is my Sig, this is my Gun. One is for Slashdot and one is for Fun.
How do they back up a database that is 94.3 TB?
I support very large Oracle databases for a living (very large meaning > 1TB), databases that must be up 24/7. Backups are done in a number of different ways:
1) Disk syncs, block by block, between disk subsystems at disparate locations, to retain multiple copies of a database in different locations. They can be synced to more than one location too, so you can have as many copies of the database as you want. Your main database is the only "hot" database; the others can be brought up and recovered if needed. We mainly use EMC disk subsystems to do this; the process is called BCV (can't remember what that stands for right now).
2) Real-time replication. One-to-one or one-to-many. All databases are "hot" at all times. This can be great for load balancing too, since you can have multiple systems online at the same time. Very difficult to maintain and monitor.
Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.
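A quick way to sanity-check the "days or weeks" claim is to divide the database size by a sustained restore rate. The sketch below does that for the 94.3 TB figure; the throughput numbers are illustrative assumptions, not measurements from any real tape or disk system, and it uses decimal units (1 TB = 1,000,000 MB).

```python
SECONDS_PER_DAY = 86_400

def transfer_days(size_tb: float, throughput_mb_per_s: float) -> float:
    """Days needed to move `size_tb` terabytes at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal terabytes to megabytes
    seconds = size_mb / throughput_mb_per_s
    return seconds / SECONDS_PER_DAY

# Hypothetical sustained rates in MB/s, chosen only to show the scale.
for rate in (50, 200, 1000):
    print(f"94.3 TB at {rate} MB/s: {transfer_days(94.3, rate):.1f} days")
```

Even at a steady 200 MB/s, which is an optimistic assumption, the raw copy alone takes over five days, before any log replay or consistency checks.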