Slashdot Mirror


The 1-Petabyte Barrier Is Crumbling

CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."
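The summary's own numbers imply a growth rate. Going from roughly 1 gigabyte to 1 petabyte is a factor of 10^6. The sketch below computes the compound annual growth and doubling time that factor implies. It assumes decimal units and treats the 15-year span as exact; both are simplifications.

```python
import math

# Sizes in decimal bytes (1 GB = 10**9, 1 PB = 10**15); an assumption for illustration.
GB = 10**9
PB = 10**15
YEARS = 15  # interval stated in the summary, taken at face value

def annual_growth(start_bytes: float, end_bytes: float, years: float) -> float:
    """Compound annual growth factor needed to go from start to end in `years`."""
    return (end_bytes / start_bytes) ** (1 / years)

def doubling_time(start_bytes: float, end_bytes: float, years: float) -> float:
    """Years per doubling, assuming steady exponential growth over the interval."""
    return years * math.log(2) / math.log(end_bytes / start_bytes)

g = annual_growth(GB, PB, YEARS)
t = doubling_time(GB, PB, YEARS)
print(f"growth ~ {g:.2f}x per year, doubling every ~{t * 12:.0f} months")
```

Under those assumptions the implied growth is about 2.5x per year, which is a doubling roughly every nine months.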

14 of 217 comments

  1. Fixed it for you... by hyperz69 · · Score: 5, Funny

    I had been a Porn Collector for a decade before I found 1-gigabyte Porn Collections to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling.

  2. Oh s***! I'm calling my Congressman! by BitterOldGUy · · Score: 5, Funny
    We must protect the children from the petabytes! These petabytes are everywhere trying to have sex with our children!

    I have to find my kid. Last time I saw her, she was with her Uncle Micky while he was having his morning martini.

  3. I am confused !! by neonux · · Score: 5, Funny

    How many Libraries of Congress are necessary to break the 1-petabyte barrier ??

    --
    @neonux
  4. No big news here.... by edwardd · · Score: 5, Interesting

    Take a look at almost any large financial firm. The email retention system alone is much larger than a petabyte, and that's just the online media, not counting what's spooled to tape. Because of deficiencies in RDBMSs, each of the large firms usually develops its own system for managing the archive on top of the database.

  5. Re:Petabyte DBs are old news to... by houghi · · Score: 5, Interesting

    This is intended as a joke, I assume, but it also raises the point that a different sort of data is now being collected.

    Take CRM systems: they used to contain basically the address and perhaps logs of calls made to the call center. Now whole phone conversations are logged, as are scanned faxes and letters, together with any images and video that are available.
    Faxes and letters used to have only a reference number and you could look them up in a file cabinet.

    So even though not much more information is collected (it was already available), it is now all put in the database. Where there used to be an entry reading 'customer was extremely angry and cursed a lot', the system now saves the MP3 for all eternity (where legal).

    So yes, the disk space it takes is bigger and thus the amount is bigger, yet that does not automatically mean the range of data collected is broader. E.g., do we suddenly have shoe size or other new data available? Could be, but it could also be that we just save different file formats in the database.
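    To illustrate the point that the same fact can take vastly more space, here is a rough size comparison between the text note and a recording of the call. It is a minimal sketch under loudly hypothetical assumptions: a 5-minute call, a constant-bitrate 128 kbit/s MP3, 1 kbit = 1000 bits, and no allowance for file headers or metadata.

```python
# Rough comparison of two ways of recording the same call (assumptions below).
NOTE = "customer was extremely angry and cursed a lot"

# Hypothetical assumptions, for illustration only: a 5-minute call saved as a
# constant-bitrate 128 kbit/s MP3, with 1 kbit = 1000 bits and no header overhead.
CALL_SECONDS = 5 * 60
BITRATE_BPS = 128_000

note_bytes = len(NOTE.encode("utf-8"))       # ASCII text: one byte per character
mp3_bytes = CALL_SECONDS * BITRATE_BPS // 8  # total bits -> bytes

print(f"text note: {note_bytes} bytes")
print(f"mp3:       {mp3_bytes:,} bytes")
print(f"ratio:     ~{mp3_bytes // note_bytes:,}x")
```

    Under these assumptions the recording is roughly 100,000 times larger than the note, even though the basic fact it records is the same.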

    --
    Don't fight for your country, if your country does not fight for you.
  6. OO databases have done this ten years ago by cjonslashdot · · Score: 5, Interesting

    I remember encountering a 1+ petabyte database 10 years ago: it was the database to record and analyze particle accelerator experiment data at CERN. And it was built using a commercial object database - not relational. Oh but wait - the relational vendors have told us that OO databases don't scale....

    That was ten years ago.

  7. When the petafile barrier crumbles ... by cpu_fusion · · Score: 5, Funny

    ... we'll need an army of Chris Hansens and a mountain of beartraps. God help us.

  8. the only *real* barrier is backup time by petes_PoV · · Score: 5, Interesting
    or more correctly, restore time.

    Any organisation that wishes to be classed in any way professional knows that the value in its databases has to be protected. That requires them to have the means to recover the data if something bad happens. A hot-mirrored copy is simply not good enough (one corruption would get written to both copies).

    As a consequence, the size of commercial databases is limited by the amount of time the organisation is willing to have them unavailable while they are restored after a disaster, or by the time taken to create or update secure offline copies.

    Not by intrinsic properties of the database or host architecture.
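    The restore-time argument can be made concrete with simple arithmetic: if a restore streams data back at a sustained rate, the outage grows linearly with database size. The sketch below assumes a constant aggregate throughput. The 1 and 10 GB/s figures are hypothetical, and real restores add work (such as index rebuilds, log replay, and verification) that this ignores.

```python
# Hypothetical restore-window estimate: time = size / sustained restore throughput.
# The throughput figures below are assumptions, not measurements.
TB = 10**12
PB = 10**15

def restore_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to stream `size_bytes` back at a constant rate."""
    return size_bytes / throughput_bytes_per_s / 3600

for label, size in [("100 TB", 100 * TB), ("1 PB", 1 * PB)]:
    for gb_per_s in (1, 10):  # assumed aggregate restore rates, in GB/s
        hours = restore_hours(size, gb_per_s * 10**9)
        print(f"{label} at {gb_per_s} GB/s: {hours:,.1f} h ({hours / 24:.1f} days)")
```

    At an assumed 1 GB/s, 1 PB takes about 278 hours (roughly 11.6 days) to stream back; even at 10 GB/s it is more than a full day.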

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
  9. Science! by edremy · · Score: 5, Informative
    Petabytes are actually pretty common in the sciences. I visited NCAR (National Center for Atmospheric Research) in Boulder five years ago and their main database was in the 2PB region even then. I'm sure it's a lot larger today.

    The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope. These projects aren't all that uncommon.
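    To put "several PB of data per year" in perspective, the average write rate it implies can be computed directly. This is a sketch assuming data arrives evenly over the year; real instruments are burstier, so their peak rates are higher. The annual volumes used are arbitrary examples, not figures for any particular project.

```python
# Convert an annual data volume into the average sustained rate it implies.
# "Several PB per year" is not a precise figure, so a few example volumes are used.
PB = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, ~31.56 million seconds

def avg_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s (10**6 bytes) if the volume arrives evenly all year."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / 10**6

for pb in (1, 3, 5):
    print(f"{pb} PB/year ~ {avg_rate_mb_per_s(pb):.0f} MB/s, around the clock")
```

    At 1 PB per year that is about 32 MB/s sustained, every second of the year.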

    --
    "Seven Deadly Sins? I thought it was a to-do list!"
  10. The world will only ever need 5 large databases by davidwr · · Score: 5, Funny

    The world will only need 5 large databases.

    None of them will ever need more than 640KB^H^HMB^H^HGB^H^HTB of RAM and 32MB^H^HGB^H^HTB^H^HPB of storage.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  11. Re:Google Street View must be most massive db ever by Anonymous Coward · · Score: 5, Informative
  12. Re:Noob by Gilmoure · · Score: 5, Funny

    It has an event horizon and is actively acquiring porn on its own?

    --
    I drank what? -- Socrates
  13. Johnny Mnemonic by vjmurphy · · Score: 5, Funny

    I need measurements I can understand, like how many Keanu Reeves' brains is a petabyte? And could he hold it indefinitely, or would his head explode at some point? If the latter, can we get him started on it now?

    --
    Vincent J. Murphy
    Spandex Justice
  14. Re:Oh, come on. by Alpha830RulZ · · Score: 5, Interesting

    "Data mining is statistically based. The more information that's available to mine, the more accurate the results will be."

    A minor quibble. I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't improve accuracy measurably once the dataset has reached a certain critical mass. Simplistically, you don't need to flip the coin a billion times to figure out that it comes up heads 50% of the time.

    It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
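    The coin-flip remark reflects a standard statistical result: the standard error of an estimated proportion is sqrt(p(1 - p)/n), so it falls only with the square root of the sample size. The sketch below uses a simulated fair coin; the sample sizes and the seed are arbitrary choices for illustration, not anything from a real data-mining workflow.

```python
import math
import random

# The standard error of an estimated proportion shrinks like 1/sqrt(n),
# so each extra order of magnitude of data buys less accuracy.
def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion for true probability p and n draws."""
    return math.sqrt(p * (1 - p) / n)

def estimate_heads(n: int, rng: random.Random) -> float:
    """Flip a simulated fair coin n times and return the observed share of heads."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}: estimate={estimate_heads(n, rng):.4f}, "
          f"std. error={standard_error(0.5, n):.4f}")
```

    With these settings the standard error is already about 0.0016 (0.16 percentage points) at 100,000 draws, and a further tenfold increase in data only shrinks it by a factor of about 3.2 (the square root of 10).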

    --
    I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.