Slashdot Mirror


World's Largest Databases Ranked

prostoalex writes "Winter Corp. has summarized the findings of its annual TopTen competition, in which the world's largest and hardest-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."

13 of 356 comments

  1. Google by ScribeOfTheNile · · Score: 5, Interesting

    I would've expected to see Google in there somewhere.

  2. SQL Server? by B5_geek · · Score: 5, Interesting

    Does "SQL Server" here mean MS-SQL?

    I would have liked to see SQL vs non-SQL ranking too.

    --
    "The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
  3. No IMS? by John+Harrison · · Score: 4, Interesting

    I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMSs are the only DBs around.

  4. What surprised me... by MyNameIsFred · · Score: 5, Interesting

    I have zero, nada, zip experience with big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

  5. 29 TB is the biggest? by epiphani · · Score: 3, Interesting

    I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...

    I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?

    --
    .
    1. Re:29 TB is the biggest? by mountainhouse · · Score: 5, Interesting

      I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad-processor Intel system with its own memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the results are spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some of it used to duplicate tables in case of node failure).

      Personally, I think it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, it delivers performance Oracle can't touch (no, I'm not paid by NCR...just a user).

      http://www.teradata.com

      Overview:
      http://www.teradata.com/t/go.aspx/?id=84960
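      A minimal Python sketch of the idea described above: each row is assigned to a logical machine by hashing its index column, so two tables hashed on the same join key put matching rows on the same machine. Each machine then joins only its own rows, and the partial results are combined at the end. The machine count, the use of CRC32 as the hash, and the helper names `machine_for`, `distribute` and `colocated_join` are illustrative assumptions, not Teradata's actual row-hash algorithm or API.

```python
import zlib
from collections import defaultdict

NUM_MACHINES = 8  # hypothetical number of logical machines


def machine_for(key):
    # Stand-in hash (CRC32 of the key's text); Teradata's real row hash differs.
    return zlib.crc32(str(key).encode("utf-8")) % NUM_MACHINES


def distribute(rows, key_col):
    """Place each row (a dict) on a logical machine chosen by hashing key_col."""
    machines = defaultdict(list)
    for row in rows:
        machines[machine_for(row[key_col])].append(row)
    return machines


def colocated_join(left, right, key_col):
    """Join two tables hashed on the same column; each machine joins only its own rows."""
    left_parts = distribute(left, key_col)
    right_parts = distribute(right, key_col)
    result = []
    for m in range(NUM_MACHINES):
        # Build a local lookup for this machine; no rows move between machines.
        local_index = defaultdict(list)
        for rrow in right_parts.get(m, []):
            local_index[rrow[key_col]].append(rrow)
        for lrow in left_parts.get(m, []):
            for rrow in local_index[lrow[key_col]]:
                result.append({**lrow, **rrow})
    # "Spool" the per-machine results together into a single answer.
    return result
```

      Because matching keys always hash to the same machine, no row ever has to be shipped to another machine during the join; that is why getting the indexing right matters so much in this design.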

  6. 94.3TB!?!?! by Peridriga · · Score: 4, Interesting

    Where I work, we recently (as an IT pat on the back) calculated our total network-accessible storage capacity and came up with a rough estimate of about 150 TB. That is a giant swath of data, and a decent amount of it is in databases (an MSSQL farm), but scattered across thousands of DBs.

    It takes a truly amazing staff to maintain the servers (back up, administer, maintain, sit and stare at screens) and preserve the integrity of the data, but good lord...

    A 94.3 TB database? My utmost and highest kudos to the DBAs and admins there. That is one gigantic task to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, I'm sure these systems have to be up a good three nines, if not more.

    Regardless of the results of the study, which (without my reading the entire study) amount to a short read on a geek pissing contest, I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBAs' jobs, and I certainly wouldn't want to be a DBA on a 94.3 TB farm, but I know people who do it and love doing it. It's a specialized skill, and apparently these guys do it right...

    Kudos...
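    For context on the "three nines" figure: availability of 99.9% still permits several hours of downtime a year. A short Python sketch of the arithmetic, assuming a 365-day year; the helper name `allowed_downtime_hours` is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def allowed_downtime_hours(nines):
    """Maximum yearly downtime for an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return HOURS_PER_YEAR * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours/year")
```

    Three nines works out to about 8.76 hours of downtime a year; four nines cuts that to roughly 52.6 minutes, and five nines to about 5.3 minutes.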

  7. Archive.org not on the list? by CompWerks · · Score: 4, Interesting
    They claim to have over 300 TB of data.

    Quote:
    "The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here

    --
    If you can read this sig - the bitch fell off.
    1. Re:Archive.org not on the list? by bruthasj · · Score: 3, Interesting

      Which just proves that you don't need a stupid database for everything. Actually, they should include conventional static filesystems in the comparison, because, you know what, some IT people get hooked on dumping everything under the Sun into Oracle. This request is especially relevant for journaling/transaction-based filesystems, and possibly for the future Longhorn thingy with its SQL capabilities.
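    Taking the Internet Archive's quoted numbers at face value (300 TB, growing 12 TB per month), a quick Python projection shows how fast that pile grows. It assumes the growth stays linear at exactly the quoted rate, which is a simplification; the helper names are hypothetical.

```python
CURRENT_TB = 300          # size quoted from the Internet Archive
GROWTH_TB_PER_MONTH = 12  # quoted rate, assumed constant (linear growth)


def size_after(months):
    """Projected archive size in TB after the given number of months."""
    return CURRENT_TB + GROWTH_TB_PER_MONTH * months


def months_until(target_tb):
    """Whole months until the archive reaches target_tb at the quoted rate."""
    remaining = max(0, target_tb - CURRENT_TB)
    return -(-remaining // GROWTH_TB_PER_MONTH)  # ceiling division


print(size_after(12), months_until(600))  # 444 25
```

    At that rate the archive adds 144 TB a year and would double to 600 TB in about 25 months.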

  8. Doesn't have to be relational by arrogance · · Score: 4, Interesting

    From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.

  9. Frightening by water-and-sewer · · Score: 3, Interesting

    Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating, you'll have realized that this company and its peers (Equifax etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad and, in the case of a mistake, getting it changed is an expensive and time-consuming task.

    --
    If this were Usenet, I'd killfile the lot of you.
  10. MasterCard by truthsearch · · Score: 3, Interesting

    I left MasterCard in 1999 after working with their data warehouse. At the time, they had recently bought a 3 terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised not to see them on the list. Because of their high volume, they work closely with Oracle, which has an office down the street. The credit card transactions table alone gets 14 million new records a day on average.

    I agree that there are many companies who would not want to be on that list. There's a small competitive advantage in keeping the technology you use secret.
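    To put the 14-million-rows-a-day figure in perspective, here is the back-of-the-envelope arithmetic in Python. This is only the daily average spread over 24 hours; actual peak insert rates are likely higher, since card transactions are unlikely to arrive evenly around the clock.

```python
NEW_ROWS_PER_DAY = 14_000_000  # daily average quoted above
SECONDS_PER_DAY = 24 * 60 * 60

avg_rows_per_second = NEW_ROWS_PER_DAY / SECONDS_PER_DAY
rows_per_year = NEW_ROWS_PER_DAY * 365

print(f"~{avg_rows_per_second:.0f} inserts/second on average")
print(f"~{rows_per_year / 1e9:.2f} billion rows/year")
```

    That is roughly 162 inserts per second around the clock and about 5.11 billion rows a year for a single table.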

  11. bah, meaningless by kpharmer · · Score: 3, Interesting

    This is like ranking projects based on largest number of lines of code.

    Without system descriptions (like those published with TPC benchmark results), it merely shows that such a top end is feasible.

    What about total cost?
    annual cost?
    time to build?
    software versions?
    hardware?
    staffing composition?

    I mean really, a 500 GB database on a modest single-CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.
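    One way to see the point is to normalize by processor count. The sketch below treats data volume per CPU as a crude proxy for difficulty. That is an assumption: it ignores memory, disk, and the actual workload. It also takes 2 TB as 2,000 GB.

```python
def gb_per_cpu(db_size_gb, cpus):
    """Data volume each processor is responsible for, in GB."""
    return db_size_gb / cpus


small = gb_per_cpu(500, 1)     # 500 GB on a single-CPU server
large = gb_per_cpu(2_000, 64)  # 2 TB on a 64-CPU E10k
print(small, large, small / large)  # 500.0 31.25 16.0
```

    By this measure, each CPU in the smaller system carries 16 times as much data as each CPU in the E10k.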