Slashdot Mirror


World's Largest Databases Ranked

prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."

11 of 356 comments (clear)

  1. Google by ScribeOfTheNile · · Score: 5, Interesting

    I would've expected to see Google in there somewhere.

    1. Re:Google by tinrib · · Score: 5, Informative

      Doesn't Google use 'big files' rather than a database for storing all its data?

      see http://www.cs.rochester.edu/sosp2003/papers/p125-g hemawat.pdf which describes the Google filesystem.

    2. Re:Google by lewp · · Score: 5, Informative

      Even if Google qualified, which it probably doesn't due to the methods it uses for its data storage, if I read the article properly the database vendors are responsible for naming the participants.

      Since Google's stuff seems to be developed in-house, they don't have a major database vendor to nominate them.

      --
      Game... blouses.
    3. Re:Google by stripmarkup · · Score: 5, Informative

      It seems that they are comparing relational databases. Search engines use proprietary databases which, among other things, do not allow for live insertion of records, SQL commands, etc. As for data volume, Google (or Yahoo or MSN, for that matter) are probably in the ballpark. The average html page is around 10k. Google probably stores at least 10^9 raw web pages in their cache(that's 10 TB alone) plus a lot of meta information about links to-from many others.

      --
      See charts for twitter trends on Trendistic
    4. Re:Google by Wastl · · Score: 5, Informative
      The term "database" is rather unprecise.

      One might see a database as merely a "big file" with mechanisms to access and modify it consistently (and surely, Google has some means to ensure consistency). A big file does not disqualify for the term "database" just because it is not produced by one of {Oracle, MS-SQL, ...} or cannot be queried by the language SQL.

      It is also possible to consider the Web to be a database (of Web sites). Or an XML, BibTeX, dbm, whatsoever file.

      Sebastian

  2. SQL Server? by B5_geek · · Score: 5, Interesting

    Does the SQL Server mean MS-SQL?

    I would have liked to see SQL vs non-SQL ranking too.

    --
    "The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
  3. What surprised me... by MyNameIsFred · · Score: 5, Interesting

    I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

    1. Re:What surprised me... by sql*kitten · · Score: 5, Informative

      I have none, nada, zip experience in big databases.

      S'okay, I have plenty :-)

      But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

      You would typically see tens of thousands (or more) of concurrent connections to a middleware layer - like Tuxedo - which would then multiplex them down to hundreds of connections to the database. This is because there is a lot of latency in establishing a connection, in fact logging in often takes an order of magnitude longer than running an actual query, yet few users submit transactions nonstop. So there is no sense in maintaining tens of thousands of expensive user contexts on the DB server, and there is no sense in requiring intermittent (relatively speaking) users to log out after a short idle period. Middleware does nothing but manage concurrent user contexts, and it can do so very efficiently. A database can't, because it tries to preallocate as much context as it can, and that doesn't match real-world usage patterns, and anyway, database vendors concentrate on their SQL engines and leave middleware vendors to manage the rest.

      Of course, if you are a big database vendor, you probably also sell middleware, but there's no-one who tries to bundle the two into one, any more than you'd want a web server to have its own filesystem.

  4. Only on Windows platform! by MS · · Score: 5, Informative
    Read all, to get the facts:

    Lastly, in the Windows OTLP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.

    Neither WindowsNT, nor MS SQL are generally a choice for the top databases. In fact, to make the entry in this list, a Windows-Database was required to be only half as big as databases on other platforms:

    In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms

    :-)
    ms

  5. Re:29 TB is the biggest? by mountainhouse · · Score: 5, Interesting

    I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each quad Intel processor systems with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the systems based on the data's indexing. So if two tables have the same indexing the join takes place at the "logical machine" level, and then the result is spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 Tb of disk (some used to duplicate tables in case of node failure).

    Personally, it has it's drawbacks, but if the indexing is right, you can join hundred million row tables at amazing speed. Based on my experience in data warehousing, it's performance Oracle can't touch (no, I'm not paid by NCR...just a user).

    http://www.teradata.com

    Overview:
    http://www.teradata.com/t/go.aspx/?id =84960

  6. Re:No IMS? by John+Harrison · · Score: 5, Informative
    Google is your friend.

    IMS is the database that was used to keep track of things for the moonshot. It is an IBM product. It is hierarchical as opposed to relational. Because of this it can do certain things very quickly, though in general it isn't as flexible as say DB2. Because it has been around so long, applications where having a DB was really important tend to have bought IMS a long time ago and developed systems around it. If your system is old enough, large enough and still works well for you there is no need to migrate to relational. Most of the world's financial transactions pass through an IMS system at some point. It is very stable and has uptimes that measure in years if not decades by now.

    Because of this I am surprised that it is not on the list. There are really big IMS databases out there that run a lot of transactions. Because it isn't relational there is some bigotry against it and it is ignored in the popular press.