The 1-Petabyte Barrier Is Crumbling
CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."
No porn collection jokes please.
I had been a Porn Collector for a decade before I found 1-gigabyte Porn Collections to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling.
I have to find my kid. Last time I saw her, she was with her Uncle Micky while he was having his morning martini.
How many Libraries of Congress are necessary to break the 1-petabyte barrier ??
@neonux
Take a look at almost any large financial firm. The email retention system alone is much larger than a petabyte, and that's just dealing with the online media, not including what's spooled to tape. Due to deficiencies in RDBMS ssytems, each of the large firms usually develop their own systems for managing the archival system on top of the database.
Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.
This is intended as a joke, I asume, but it also brings up the fact that it will be different sort of data that is now collected.
When I look at CRM systems, they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available.
Faxes and letters used to have only a reference number and you could look them up in a file cabinet.
So even though there is not that much more data collected, (things were already available) they are now all put in the database. Where it used to be an entry 'customer was extremely angry and cursed a lot' it now saves the mp3 for all eternity (where legal).
So yes, the HD space it takes is bigger and thus the amount is bigger, yet it does not automaticaly mean that sort of data is bigger. e.g. do we suddenly have shoesize or other data available? Could be but it also could be that we just have different file formats we now save in the databse.
Don't fight for your country, if your country does not fight for you.
I remember encountering a 1+ petabyte database 10 years ago: it was the database to record and analyze particle accelerator experiment data at CERN. And it was built using a commercial object database - not relational. Oh but wait - the relational vendors have told us that OO databases don't scale....
That was ten years ago.
Google Maps' database is far bigger...
A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.
We are sorry, but we don't
have maps at this zoom
level for this region.
Try zooming out for a
broader look.
[
... but I do wonder if you've ever heard of Sarbanes-Oxley.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
... we'll need an army of Chris Hansens and a mountain of beartraps. God help us.
Any organisation that wishes to be classed in any way professional knows that the value in it's databases has to be protected. That requires them to have the means to recover the data if something bad happens. A hot-mirrored copy is simply not good enough (one corruption would get written to both copies).
As a consequence, the size of commercial databases is limited by the amount of time the organisation is willing to have it unavailable while it is restored, in the case of a disaster, or the time taken to create/update secure, offline, copies.
Not by intrinsic properties of the database or host architecture
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope. These projects aren't all that uncommon.
"Seven Deadly Sins? I thought it was a to-do list!"
My porn collection has long since achieved infinity.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
That is all.
The world will only need 5 large databases.
None of them will never need more than 640KB^H^HMB^H^HGBMB^H^HTB of RAM and 32MB^H^HGB^H^HTB^H^HPB of storage.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
http://labs.google.com/papers/bigtable.html
WalMart's data warehouse is already 4 petabytes: http://storefrontbacktalk.com/story/080307walmart.php
Just because I can hook a shark from a boat, I do no offer to wrestle it in the water.
I need measurements I can understand, like how many Keanu Reeves' brains is a petabyte? And could he hold it indefinitely, or would his head explode at some point? If the latter, can we get him started on it now?
Vincent J. Murphy
Spandex Justice
We've had petabyte databases on mainframes for a good couple of years. DB2 v9 on zSeries has two new tablespace types that make managing these humungous databases much easier.
So it may be news for the PC world but it's bordering on ancient history on IBM mainframes.
Sigs. We don't need no steenking sigs.
On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.
Yep. That's exactly it. $200 today buys a 1 TB drive. $200 a few years ago bought a 1 GB drive. As the price has fallen the value of the HDD has risen relative to its cost. Those archive directories and development junk aren't being deleted because they have value. Sure, it's enough value to justify keeping them around when a 1 GB drive costs $200, but they are worth keeping around with a 1 TB drive costs that much.
They aren't "doing nothing" - they just aren't doing enough that it's worth keeping it until the price drops enough.
All of this is making the 1 TB drive considerably more valuable than the 1 GB drive, despite their original purchase price parity. This is long-tail economics at work. As the individual bits become worth less and less, the value in of the bits in total continues to rise, resulting in a completely new set of capabilities.
My DVR is an excellent example of this - it's a thorough change in the way that I watch television. Suddenly, it's a family event that we can all share, because when I want to comment, I can just hit pause, and share my thought. Nothing's lost, if needed we can just hit rewind a bit, and suddenly, instead of being annoyed at my daughter for wanting to comment on a point during a televised debate, I'm excited and interested! No more SHUSHSTing at my family, it's now a much more shared experience.
The price of nonlinear access media has dropped so incredibly that marginal-value bits (like video) are suddenly cheap enough to make it all possible.
I have no problem with your religion until you decide it's reason to deprive others of the truth.