Slashdot Mirror


World's Largest Databases Ranked

prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."

36 of 356 comments

  1. Google by ScribeOfTheNile · · Score: 5, Interesting

    I would've expected to see Google in there somewhere.

    1. Re:Google by tinrib · · Score: 5, Informative

      Doesn't Google use 'big files' rather than a database for storing all its data?

      See http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf which describes the Google File System.

    2. Re:Google by lewp · · Score: 5, Informative

      Even if Google qualified, which it probably doesn't due to the methods it uses for its data storage, if I read the article properly the database vendors are responsible for naming the participants.

      Since Google's stuff seems to be developed in-house, they don't have a major database vendor to nominate them.

      --
      Game... blouses.
    3. Re:Google by stripmarkup · · Score: 5, Informative

      It seems that they are comparing relational databases. Search engines use proprietary databases which, among other things, do not allow for live insertion of records, SQL commands, etc. As for data volume, Google (or Yahoo or MSN, for that matter) is probably in the ballpark. The average HTML page is around 10 KB. Google probably stores at least 10^9 raw web pages in its cache (that's 10 TB alone), plus a lot of meta-information about links to and from many others.
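
      The arithmetic behind that 10 TB figure can be checked with a quick sketch. Both inputs are the rough guesses from the comment, taken here as decimal units, not measured values:

```python
# Back-of-the-envelope check of the cache-size estimate above.
# Both inputs are the commenter's rough guesses, not measurements.
PAGES = 10**9                 # "at least 10^9 raw web pages"
AVG_PAGE_BYTES = 10 * 10**3   # "around 10k" per page, read as 10 KB (decimal)

total_bytes = PAGES * AVG_PAGE_BYTES
total_tb = total_bytes / 10**12   # decimal terabytes

print(f"{total_tb:.0f} TB of raw HTML")
```

      Link metadata would come on top of this and is not estimated here.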

      --
      See charts for twitter trends on Trendistic
    4. Re:Google by Wastl · · Score: 5, Informative
      The term "database" is rather imprecise.

      One might see a database as merely a "big file" with mechanisms to access and modify it consistently (and surely, Google has some means to ensure consistency). A big file is not disqualified from the term "database" just because it is not produced by one of {Oracle, MS-SQL, ...} or cannot be queried in SQL.

      It is also possible to consider the Web to be a database (of Web sites). Or an XML, BibTeX, dbm, or whatever file.

      Sebastian

  2. My porn database by Trigun · · Score: 3, Funny

    scored a measly 17th. Oh well, time for more surfing.

    1. Re:My porn database by real_smiff · · Score: 4, Funny
      Does anyone actually have their porn in a database (of some sort)? I'm curious whether the "porn database" is just a joke or ... hmm, worth implementing! For all I know, there's already a 'porn-o-base' (tm?) collaborative project on SourceForge that you're all using - after reading Slashdot for a bit nothing would surprise me...

      What are the pros and cons to databasing (sp.?) your porn? - except perhaps, reduced chance of getting a girlfriend, and chance of ridicule, obviously...

      Hey, this is the right place to ask ;)

      --

      This is my Sig, this is my Gun. One is for Slashdot and one is for Fun.

  3. SQL Server? by B5_geek · · Score: 5, Interesting

    Does the SQL Server mean MS-SQL?

    I would have liked to see SQL vs non-SQL ranking too.

    --
    "The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
    1. Re:SQL Server? by AndroidCat · · Score: 3, Funny

      Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.

      --
      One line blog. I hear that they're called Twitters now.
    2. Re:SQL Server? by azaris · · Score: 4, Informative

      Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.

      It was originally called Sybase SQL Server but was later picked up by MS, who adopted the name. Typical /. objectivity.

    3. Re:SQL Server? by sphealey · · Score: 4, Insightful
      It's also this intense stupidity that has prevented us from having a major vendor that actually provides a real RDBMS to this very day. If DBMS people would actually invest a little time in learning about the Relational Model, maybe they'd stop purchasing the crap that Microsoft, Oracle, IBM, etc. keep forcing out and (flamebait here) maybe people would stop installing MySQL and Access and thinking they're going to be good for anything more important than cookie recipes.

      That's exactly how Larry Ellison got his start - he saw a good idea in an IBM tech journal, hired some programmers to implement it, and the result was Oracle. Why don't you (and the others who post this stuff to database-related forums and threads) go ahead and do the same? Actually write and market a "real relational system based on theory"? Then you could stop yelling at everyone else about it.

      sPh

    4. Re:SQL Server? by azaris · · Score: 3, Insightful

      Well, "SQL server" is a stupid way to refer to an RDBMS. That's like calling Apache "perl-server". I'm not surprised the only people choosing to name their RDBMS products SQL-something-or-other are the open source developers and Microsoft. Also, I've never heard of MS suing MySQL or PostgreSQL for use of the term SQL in relation to an RDBMS.

      Besides, the product is officially called Microsoft SQL Server and always has been, just like Microsoft Windows, but everybody refers to it as SQL Server or, if there is a possibility of confusion, MS SQL Server or MSSQL for short. Is it malevolence on the part of Microsoft if people can't be bothered to use the full name of each and every one of their products?

    5. Re:SQL Server? by MattRog · · Score: 3, Insightful

      Because it is *relatively easy* to make a mediocre (Oracle, etc.) implementation of the Relational Model. It is quite difficult to make a truly Relational Database Management System. Not only that, but because the market is so uneducated, why would they want to use it in the first place?

      --

      Thanks,
      --
      Matt
  4. Re:No, it's 30,000GB by Cutie Pi · · Score: 3, Informative

    You're off by 3 orders of magnitude. The largest is 30TB.

  5. No IMS? by John Harrison · · Score: 4, Interesting

    I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.

    1. Re:No IMS? by John Harrison · · Score: 5, Informative
      Google is your friend.

      IMS is the database that was used to keep track of things for the moonshot. It is an IBM product. It is hierarchical as opposed to relational. Because of this it can do certain things very quickly, though in general it isn't as flexible as say DB2. Because it has been around so long, applications where having a DB was really important tend to have bought IMS a long time ago and developed systems around it. If your system is old enough, large enough and still works well for you there is no need to migrate to relational. Most of the world's financial transactions pass through an IMS system at some point. It is very stable and has uptimes that measure in years if not decades by now.

      Because of this I am surprised that it is not on the list. There are really big IMS databases out there that run a lot of transactions. Because it isn't relational there is some bigotry against it and it is ignored in the popular press.

  6. What surprised me... by MyNameIsFred · · Score: 5, Interesting

    I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

    1. Re:What surprised me... by sql*kitten · · Score: 5, Informative

      I have none, nada, zip experience in big databases.

      S'okay, I have plenty :-)

      But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

      You would typically see tens of thousands (or more) of concurrent connections to a middleware layer - like Tuxedo - which would then multiplex them down to hundreds of connections to the database. This is because there is a lot of latency in establishing a connection; in fact, logging in often takes an order of magnitude longer than running an actual query, yet few users submit transactions nonstop. So there is no sense in maintaining tens of thousands of expensive user contexts on the DB server, and there is no sense in requiring intermittent (relatively speaking) users to log out after a short idle period. Middleware does nothing but manage concurrent user contexts, and it can do so very efficiently. A database can't, because it tries to preallocate as much context as it can, and that doesn't match real-world usage patterns. In any case, database vendors concentrate on their SQL engines and leave middleware vendors to manage the rest.

      Of course, if you are a big database vendor, you probably also sell middleware, but there's no-one who tries to bundle the two into one, any more than you'd want a web server to have its own filesystem.
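
      The multiplexing idea can be sketched in a few lines. This is only an illustration of the pattern, not how Tuxedo or any real middleware is built: `FakeDbConnection` and `Multiplexer` are made-up names, and the "database" here does no real work.

```python
import queue
import threading

class FakeDbConnection:
    """Hypothetical stand-in for an expensive, already-logged-in DB session."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

    def run(self, sql):
        return f"conn {self.conn_id} ran: {sql}"

class Multiplexer:
    """Shares a small, fixed set of database connections among many clients."""
    def __init__(self, pool_size):
        self._pool = queue.Queue()
        for i in range(pool_size):
            self._pool.put(FakeDbConnection(i))   # pay the login cost once

    def execute(self, sql):
        conn = self._pool.get()        # block until a connection is free
        try:
            return conn.run(sql)
        finally:
            self._pool.put(conn)       # hand it back for the next client

mux = Multiplexer(pool_size=4)
results = []
results_lock = threading.Lock()

def client(client_id):
    out = mux.execute(f"SELECT {client_id}")
    with results_lock:
        results.append(out)

# 200 client threads share just 4 database connections.
threads = [threading.Thread(target=client, args=(n,)) for n in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

used = {r.split()[1] for r in results}   # connection ids that served queries
print(len(results), "queries served over", len(used), "connections")
```

      The clients never log in to the database themselves; they only wait briefly for one of the pre-established sessions, which is the point made above about idle users being cheap to keep around.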

    2. Re:What surprised me... by Quill_28 · · Score: 4, Funny

      Something is wrong...

      Here I find a knowledgeable person on Slashdot,
      Who has given a well-written response,
      Answered the question without flaming the askee,
      Didn't use numbers/symbols for letters,
      Never slammed MS or SCO,

      And was modded up?

  7. 29 TB is the biggest? by epiphani · · Score: 3, Interesting

    I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...

    I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?

    --
    .
    1. Re:29 TB is the biggest? by Peridriga · · Score: 4, Informative
      Well... if you actually read the article it clearly states that 29.2 is not the largest...

      You can find the link to the article yourself but

      1. AT&T @ 94.3TB
      2. Amazon @ 34.2TB
    2. Re:29 TB is the biggest? by mountainhouse · · Score: 5, Interesting

      I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad-processor Intel system with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the result is spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some used to duplicate tables in case of node failure).

      Personally, I think it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, it's performance Oracle can't touch (no, I'm not paid by NCR...just a user).
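
      A toy version of that hash distribution shows why matching indexes make the join local. Everything here is an assumption for illustration: the unit count, the table contents, and the use of Python's built-in `hash` on integer keys, which is not the hash function the real system uses.

```python
from collections import defaultdict

NUM_UNITS = 8   # hypothetical number of "logical machines"

def unit_for(key):
    # Illustrative placement only; not the real system's hash function.
    return hash(key) % NUM_UNITS

def distribute(rows, key_index):
    """Place each row on the unit chosen by hashing its index column."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[key_index])].append(row)
    return units

customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, "book"), (3, "lamp"), (1, "pen"), (2, "mug")]

# Both tables are distributed on the same column (customer id).
cust_units = distribute(customers, 0)
order_units = distribute(orders, 0)

def local_join(unit):
    """Join using only the rows that already live on this unit."""
    names = dict(cust_units.get(unit, []))
    return [(cid, names[cid], item)
            for cid, item in order_units.get(unit, [])
            if cid in names]

# Each unit joins independently; the partial results are then gathered.
joined = []
for unit in range(NUM_UNITS):
    joined.extend(local_join(unit))

print(sorted(joined))
```

      Because a customer row and its order rows hash to the same unit, no unit ever needs rows from another one, so the per-unit joins could run in parallel.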

      http://www.teradata.com

      Overview:
      http://www.teradata.com/t/go.aspx/?id=84960

    3. Re:29 TB is the biggest? by jgerry · · Score: 4, Informative

      How do they backup a database that is 94.3 TB?

      I support very large Oracle databases for a living (very large meaning > 1TB), databases that must be up 24/7. Backups are done in a number of different ways:

      1) Disk syncs, block by block, between disk subsystems at disparate locations, to retain multiple copies of a database in different locations. They can be synced to more than one location too, so you can have as many copies of the database as you want. Your main database is the only "hot" database, the others can be brought up and recovered if needed. We mainly use EMC disk subsystems to do this, the process is called BCV (can't remember what that stands for right now)

      2) Real-time replication. One-to-one or one-to-many. All databases are "hot" at all times. This can be great for load balancing too, since you can have multiple systems online at the same time. Very difficult to maintain and monitor.

      Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.

  8. 94.3TB!?!?! by Peridriga · · Score: 4, Interesting

    I know where I work we recently (for an IT pat on the back) calculated our total network-accessible storage capacity and came in at a rough estimate of about 150TB. Now that is a giant swath of data, and a decent amount is in databases (MSSQL farm), but scattered across thousands of DBs.

    It takes a truly amazing staff to maintain the servers (backups, administration, maintenance, sitting and staring at screens) and maintain the integrity of the data but, good lord...

    A 94.3TB database? My utmost and highest kudos to those DBMAs and admins there. That is one gigantic task to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up a good three nines if not greater, I'm sure.
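
    For a sense of what "three nines" allows, here is a small sketch. It assumes the usual reading of N nines as an availability of 1 - 10^-N over a 365-day year; the helper name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years

def downtime_hours_per_year(nines):
    """Allowed downtime for an availability of 1 - 10**-nines."""
    unavailable_fraction = 10 ** (-nines)   # e.g. 0.001 for three nines
    return HOURS_PER_YEAR * unavailable_fraction

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_hours_per_year(n):.2f} hours of downtime per year")
```

    Three nines works out to a bit under nine hours of downtime a year; each extra nine cuts that by a factor of ten.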

    Regardless of the result of the study (whose end results, without actually reading the entire thing, are simply a short read of a geek pissing contest), I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBMAs' jobs and certainly wouldn't want to be a DBMA on a 94.3TB farm, but I know those that do and love doing it. It's a speciality skill and apparently these guys do it right...

    Kudos...

    1. Re:94.3TB!?!?! by kilonad · · Score: 3, Insightful

      This is a home grown RDBMS!

      What else do you expect from the company that kinda sorta wrote Unix?

  9. Archive.org not on the list? by CompWerks · · Score: 4, Interesting
    They claim to have over 300 TB of data.

    Quote:
    "The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here

    --
    If you can read this sig - the bitch fell off.
    1. Re:Archive.org not on the list? by bruthasj · · Score: 3, Interesting

      All the more proof that you don't need a stupid database for everything. Actually, they should include conventional static filesystems in the comparison. Because you know what, some IT people get hooked on trying to dump everything under the Sun into Oracle. This request is especially relevant for journaling/transaction-based filesystems and possibly the future Longhorn thingy where it's got SQL capabilities.

  10. Only on Windows platform! by MS · · Score: 5, Informative
    Read all, to get the facts:

    Lastly, in the Windows OLTP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.

    Neither Windows NT nor MS SQL is generally a choice for the top databases. In fact, to make it onto this list, a Windows database was required to be only half as big as databases on other platforms:

    In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms

    :-)
    ms

  11. Anonymous by suso · · Score: 4, Funny

    Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.

  12. SMP? by paulbd · · Score: 4, Informative

    Does anybody believe that the "SMP" used in reference to the France Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? How are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?

  13. Doesn't have to be relational by arrogance · · Score: 4, Interesting

    From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.

  14. Frightening by water-and-sewer · · Score: 3, Interesting

    Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating you'd realize this company and its peers (Equifax etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad and, in the case of a mistake, changing it is an expensive and time-consuming task.

    --
    If this were Usenet, I'd killfile the lot of you.
  15. MasterCard by truthsearch · · Score: 3, Interesting

    I left MasterCard in 1999 after working with their data warehouse. At the time they had recently bought a 3 terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised not to see them on the list. They work closely with Oracle, who have an office down the street, since they have high volume. The credit card transactions table alone gets 14 million new records on average every day.
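
    To put that daily figure in perspective, a quick sketch converts it to an average rate. The only input is the 14 million records per day quoted above; real traffic would be bursty, so the peak rate would be higher than this average.

```python
RECORDS_PER_DAY = 14_000_000          # figure quoted in the comment
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

avg_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
records_per_year = RECORDS_PER_DAY * 365

print(f"about {avg_per_second:.0f} inserts per second on average")
print(f"about {records_per_year / 1e9:.2f} billion new rows per year")
```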

    I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.

  16. Daytona? by wandazulu · · Score: 3, Insightful

    Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google search reveals a PDF and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.

  17. France Telecom? They must be doing something wrong by bshroyer · · Score: 3, Funny

    My first reaction is that, if France Telecom has the largest (non-hybrid) proprietary relational data storage, at 29 TB, ahead of AT&T and SBC, at around 26TB each, that France Telecom must have a bunch of redundant data lying around.

    As of 2001-01-01, France had a population of about 59 million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that

    FTE had 2002 revenues of 49B, with 240,000 employees.
    ATT had 2002 revenues of 40B, with 71,000 employees.
    Finally, SBC had 2002 revenues of 43B, with 175,000 employees.

    So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
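
    The numbers behind that jab are just revenue divided by headcount. A sketch using only the figures quoted above (the currency is not stated in the comment, so the results are in whatever unit "B" refers to):

```python
# (2002 revenue, employees) as quoted in the comment; currency unspecified.
companies = {
    "FTE": (49e9, 240_000),
    "ATT": (40e9, 71_000),
    "SBC": (43e9, 175_000),
}

per_employee = {
    name: revenue / staff for name, (revenue, staff) in companies.items()
}

# Print from highest to lowest revenue per employee.
for name, value in sorted(per_employee.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:,.0f} per employee")
```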

    --
    The cure for cancer is coming: Reovirus
  18. bah, meaningless by kpharmer · · Score: 3, Interesting

    This is like ranking projects based on largest number of lines of code.

    Without system descriptions (like in TPC) it merely shows that such a top-end is feasible.

    What about total cost?
    annual cost?
    time to build?
    software versions?
    hardware?
    staffing composition?

    I mean really, a 500 GB database on a modest single-CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.