Web Analytics Databases Get Even Larger

Posted by CmdrTaco on Thursday April 30, 2009 @12:43AM from the who-watches-the-oh-never-mind dept.

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum (user data) to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous: 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all; Yahoo has over a petabyte (on a homegrown system); and Fox/MySpace has two different multi-hundred-terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."
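To put the eBay numbers in perspective, they can be turned into rates with a little arithmetic. This is a rough sketch under two stated assumptions: the 17 trillion rows and 150 billion new rows per day are taken as totals for both eBay systems together (the summary's "between the two databases" suggests this), and ingest is spread evenly over a 24-hour day. Real clickstream load is bursty, so peak rates would be higher. The variable names are illustrative only.

```python
# Back-of-the-envelope rates for the eBay figures quoted in the summary.
# Assumption: row counts are combined totals for both eBay systems.
# Assumption: new rows arrive evenly over 24 hours (real traffic is bursty).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

new_rows_per_day = 150_000_000_000   # "150 billion new rows per day"
total_rows = 17_000_000_000_000      # "17 trillion rows"

# Average sustained ingest rate if the daily volume were perfectly smooth
rows_per_second = new_rows_per_day / SECONDS_PER_DAY

# How many days of ingest at the current daily rate the stored total represents
days_of_ingest_in_total = total_rows / new_rows_per_day

# Combined warehouse size: 6.5 PB (Greenplum) + 2.5 PB (Teradata)
total_petabytes = 6.5 + 2.5

print(f"average ingest: {rows_per_second:,.0f} rows/s")
print(f"stored rows = about {days_of_ingest_in_total:.0f} days at today's rate")
print(f"combined storage: {total_petabytes} PB")
```

Under those assumptions, 150 billion rows a day works out to roughly 1.7 million rows per second on average, and the stored total is about 113 days' worth at the current daily rate (the actual history is longer, since volume grew over time).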

5 of 62 comments

The good news... by Yoozer · 2009-04-30 00:49 · Score: 5, Funny

At least these won't get out in the open that easily because someone copied them to a USB drive and lost it somewhere.
Re:Web Analytics Databases Get Every Larger? by jez9999 · 2009-04-30 00:49 · Score: 3, Funny

Yesy. It mighty take a whiley to get used to, but I thinky it's quite a plusy overall.

--
== Jez ==
Do you miss Firefox? Try Pale Moon.
Another win for PostgreSQL... by tcopeland · 2009-04-30 01:31 · Score: 3, Insightful

...since that's the database on which Greenplum is based. PostgreSQL 8.4 is coming out soon and looks like it's got a lot of improvements. Too bad replication didn't make it in... hopefully in 8.5.
One of the improvements that looks good is the parallelized restore; RubyForge's upgrade from PostgreSQL 8.2 to 8.3 took 30 minutes to restore the db, and it seems like this feature will speed that up considerably.

--
The Army reading list
Recursive queries too by coryking · 2009-04-30 01:38 · Score: 3, Interesting

These little puppies, i.e. recursive queries, look pretty cool too. Sounds like a good tool for threaded comment systems or finding related items in a table:
Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
  )
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part
... It will take a while to wrap my brain around this new concept, though. That doesn't look like the kind of query I'm used to reading!
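To see what a query shaped like the quoted parts example actually returns, here is a minimal sketch that runs it against an in-memory SQLite database through Python's `sqlite3` module (SQLite has supported `WITH RECURSIVE` since version 3.8.3). The `parts` rows are made-up sample data. One deliberate change from the quoted text: the recursive step multiplies each sub-part's quantity by its parent's (`p.quantity * pr.quantity`), so a sub-assembly used twice contributes twice its components. The quoted version just sums the per-link quantities. An `ORDER BY` is also added for stable output.

```python
import sqlite3

# Hypothetical bill of materials: (part, sub_part, quantity) means
# "one `part` directly contains `quantity` units of `sub_part`".
PARTS = [
    ("our_product", "frame", 1),
    ("our_product", "wheel", 2),
    ("wheel", "spoke", 32),
    ("wheel", "hub", 1),
    ("hub", "bearing", 2),
]

QUERY = """
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    -- Anchor: the immediate sub-parts of the top-level product
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    -- Recursive step: sub-parts of anything already found, scaled by
    -- how many of that parent the product needs
    SELECT p.sub_part, p.part, p.quantity * pr.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) AS total_quantity
FROM included_parts
GROUP BY sub_part
ORDER BY sub_part
"""


def explode(rows):
    """Return {sub_part: total_quantity} for everything inside 'our_product'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE parts (part TEXT, sub_part TEXT, quantity INTEGER)")
        conn.executemany(
            "INSERT INTO parts (part, sub_part, quantity) VALUES (?, ?, ?)", rows
        )
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


totals = explode(PARTS)
for name, qty in totals.items():
    print(name, qty)
```

With this sample data the result is 4 bearings, 1 frame, 2 hubs, 64 spokes and 2 wheels. Using plain `p.quantity` in the recursive step, as in the quoted query, would instead report 32 spokes and 2 bearings, because it ignores that the product has two wheels.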
They'll get replication some day soon. But there is a lot of cool, very useful stuff in every new release. I usually feel like a kid in a candy store wondering what's new that I can exploit.
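For the threaded-comment use case mentioned at the top of this comment, a recursive CTE can walk a parent/child table and carry a depth and a sort path along the way, so one query returns a whole thread in display order. This is another minimal SQLite sketch; the `comments` table, its columns, and the sample rows are all hypothetical, and zero-padding the ids with `printf` is just one way to make the text path sort like the numbers.

```python
import sqlite3

# Hypothetical comments table: parent_id is None (NULL) for top-level comments.
COMMENTS = [
    (1, None, "top-level comment A"),
    (2, 1, "reply to A"),
    (3, None, "top-level comment B"),
    (4, 3, "reply to B"),
    (5, 4, "reply to the reply to B"),
    (6, 1, "second reply to A"),
]

THREAD_QUERY = """
WITH RECURSIVE thread(id, parent_id, title, depth, path) AS (
    -- Anchor: top-level comments start at depth 0 with their own padded id
    SELECT id, parent_id, title, 0, printf('%06d', id)
    FROM comments WHERE parent_id IS NULL
  UNION ALL
    -- Recursive step: replies extend their parent's path and depth
    SELECT c.id, c.parent_id, c.title, t.depth + 1,
           t.path || '/' || printf('%06d', c.id)
    FROM comments c JOIN thread t ON c.parent_id = t.id
)
SELECT id, depth, title FROM thread ORDER BY path
"""


def render_thread(rows):
    """Return (id, depth, title) tuples in threaded display order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)"
        )
        conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
        return conn.execute(THREAD_QUERY).fetchall()
    finally:
        conn.close()


ordered = render_thread(COMMENTS)
for comment_id, depth, title in ordered:
    print("  " * depth + title)  # indent two spaces per nesting level
```

Sorting on the accumulated path keeps every reply directly under its parent, so both replies to A print before comment B even though reply 6 was inserted last.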
Still get lame recommendations by se7en11 · 2009-04-30 04:45 · Score: 4, Funny

With all that user data, you'd think they would know me better by now. But I still get these lame recommendations.

"You might be interested in action DVDs because you bought one in the past" - BRILLIANT!!