Slashdot Mirror


Web Analytics Databases Get Even Larger

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum — user data — to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous — 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all, Yahoo has over a petabyte (on a homegrown system), and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote in about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."

2 of 62 comments (clear)

  1. Looks like grammar is getting every worse... by ActusReus · · Score: 1, Insightful

    For shame, Taco...

  2. Another win for PostgreSQL... by tcopeland · · Score: 3, Insightful

    ...since that's that database on which Greenplum is based. PostgreSQL 8.4 is coming out soon and looks like it's got a lot of improvements. Too bad replication didn't make it in... hopefully in 8.5.

    One of the improvements that looks good is the parallelized restore; RubyForge's upgrade from PostgreSQL 8.2 to 8.3 took 30 minutes to restore the db and it seems like this feature will speed that up considerably.