Slashdot Mirror


The 1-Petabyte Barrier Is Crumbling

CurtMonash writes "I had been a database industry analyst for a decade before I found 1-gigabyte databases to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling. Specifically, we are about to see data warehouses — running on commercial database management systems — that contain over 1 petabyte of actual user data. For example, Greenplum is slated to have two of them within 60 days. Given how close it was a year ago, Teradata may have crossed the 1-petabyte mark by now too. And by the way, Yahoo already has a petabyte+ database running on a home-grown system. Meanwhile, the 100-terabyte mark is almost old hat. Besides the vendors already mentioned above, others with 100+ terabyte databases deployed include Netezza, DATAllegro, Dataupia, and even SAS."

217 comments

  1. Porn collection by Anonymous Coward · · Score: 4, Funny

    No porn collection jokes please.

    1. Re:Porn collection by aywwts4 · · Score: 2, Insightful

      No porn collection jokes please.

      +1 Futile

      --
      Web Developers: Celebrate our roots! Animated GIFs and Tiled Backgrounds, don't let our history die!
    2. Re:Porn collection by Lisandro · · Score: 1

      I don't know about yours, but my porn collection is no joke.

    3. Re:Porn collection by rubycodez · · Score: 1

      and he's in 9.55 petabytes of that

    4. Re:Porn collection by giftculture · · Score: 1

      My coworker and I termed a new unit of measure for storage density - the porno-byte, equal to the size of all the porn on the internet... I wonder how many porno-bytes some of these large 100+ terabyte databases come out to be?

  2. Won't somebody think of the children.... by Anonymous Coward · · Score: 2, Funny

    Oh wait, that was petabyte...

    1. Re:Won't somebody think of the children.... by Anonymous Coward · · Score: 0

      Being a Petaphile will get you onto a different registry...

    2. Re:Won't somebody think of the children.... by StrategicIrony · · Score: 1

      One which categorizes those who dislike the eating of tasty animals?

    3. Re:Won't somebody think of the children.... by Nutria · · Score: 1

      One which categorizes those who dislike the eating of tasty animals?

      That would be a petaphobe.

      --
      "I don't know, therefore Aliens" Wafflebox1
  3. Fixed it for you... by hyperz69 · · Score: 5, Funny

    I had been a Porn Collector for a decade before I found 1-gigabyte Porn Collections to write about. Now it is 15 years later, and the 1-petabyte barrier is crumbling.

    1. Re:Fixed it for you... by Poltras · · Score: 1

      Please. Stop. Just stop.

  4. This isn't really news by Anonymous Coward · · Score: 1, Interesting

    Since 500GB drives appeared, this has been a reality. A couple of companies started selling petabyte arrays at about the time those drives became established.

    1. Re:This isn't really news by Anonymous Coward · · Score: 0
    2. Re:This isn't really news by Anonymous Coward · · Score: 0

      only need 2000 of them in that array.
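The "2000 of them" figure is one division. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, as drive vendors count) and, for contrast, a hypothetical RAID-6-style layout of 10-drive groups with 2 parity drives each, since a real array would not devote every raw byte to user data:

```python
PB = 10**15   # decimal petabyte (drive vendors' convention)
PiB = 2**50   # binary pebibyte, for comparison
GB = 10**9    # decimal gigabyte

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)

def drives_needed(target_bytes: int, drive_bytes: int) -> int:
    """Raw drive count with no redundancy: every byte of every drive holds user data."""
    return ceil_div(target_bytes, drive_bytes)

def drives_with_parity(target_bytes: int, drive_bytes: int,
                       group_size: int = 10, parity_per_group: int = 2) -> int:
    """Drive count when each group of `group_size` drives gives up
    `parity_per_group` drives to parity (a hypothetical RAID-6-style layout)."""
    data_drives_per_group = group_size - parity_per_group
    groups = ceil_div(target_bytes, drive_bytes * data_drives_per_group)
    return groups * group_size

print(drives_needed(PB, 500 * GB))        # 2000
print(drives_needed(PiB, 500 * GB))       # 2252
print(drives_with_parity(PB, 500 * GB))   # 2500
```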

    3. Re:This isn't really news by saibot834 · · Score: 1

      I think you confuse Petabyte with Terabyte.

  5. Petabyte DBs are old news to... by C_Kode · · Score: 2, Funny

    Petabyte DBs are old news to techie porn collectors. They always mix their two favorite subjects into one. Tech + Porn = Petabyte+ Porn Database

    1. Re:Petabyte DBs are old news to... by houghi · · Score: 5, Interesting

      This is intended as a joke, I assume, but it also raises the point that a different sort of data is now being collected.

      When I look at CRM systems, they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available.
      Faxes and letters used to have only a reference number and you could look them up in a file cabinet.

      So even though not that much more information is collected (it was already available), it is now all put in the database. Where there used to be an entry saying 'customer was extremely angry and cursed a lot', it now saves the MP3 for all eternity (where legal).

      So yes, the disk space it takes is bigger, and thus the amount is bigger, yet that does not automatically mean the variety of data is bigger. For example, do we suddenly have shoe size or other new data available? Could be, but it could also be that we just save different file formats in the database now.

      --
      Don't fight for your country, if your country does not fight for you.
    2. Re:Petabyte DBs are old news to... by Anonymous Coward · · Score: 0

      Hmmm... you say most places record and store the whole call now? Like, even what I say when on hold?

      Uh-oh.

      I guess let's just hope nobody listens to my recordings, lest they find out how I truly feel about their hold music.

    3. Re:Petabyte DBs are old news to... by Lodragandraoidh · · Score: 1

      You hit the nail on the head. The technology allows for a richer experience for the user -- hence the ability to collect more useful information to make the customer experience better/faster/stronger/etc.

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    4. Re:Petabyte DBs are old news to... by zippthorne · · Score: 1

      If they had the time to listen to you while on hold.. why would they put you on hold?

      --
      Can you be Even More Awesome?!
    5. Re:Petabyte DBs are old news to... by eric-x · · Score: 1

      To hear what you're saying when you think no one is listening.

    6. Re:Petabyte DBs are old news to... by blahplusplus · · Score: 3, Interesting

      "they used to contain basically the address and perhaps logs from calls they made to the call center. Now whole phone conversations are logged as well as faxes and letters that are scanned, together with images and video that is available."

      Reminds me of David Brin's The Transparent Society

      http://www.davidbrin.com/tschp1.html

      http://www.amazon.com/Transparent-Society-Technology-Between-Privacy/dp/0738201448/

    7. Re:Petabyte DBs are old news to... by houghi · · Score: 1

      Why not (politely) tell them? We had a system with a broken MP3 on it, and we had no idea until somebody told us. Also, a lot of hold music is lousy because companies do not want to pay the local equivalent of the RIAA (in Belgium you must pay for that, as you are broadcasting music), and so they go for some lousy music that is provided to them.

      --
      Don't fight for your country, if your country does not fight for you.
    8. Re:Petabyte DBs are old news to... by myth24601 · · Score: 1

      Hmmm... you say most places record and store the whole call now? Like, even what I say when on hold?

      Uh-oh.

      I guess let's just hope nobody listens to my recordings, lest they find out how I truly feel about their hold music.

      If you simply play copyrighted material while on hold and they record it, can they be sued by the record industry?

      --
      No matter where you go, there you are.
    9. Re:Petabyte DBs are old news to... by QuantumRiff · · Score: 2, Insightful

      When my unemployment was running out years ago, I took a job at a call center to pay the bills. When I had to ask a co-worker a question, I would often hit Mute instead of Hold after asking the caller to hold. It was pretty entertaining!

      --

      What are we going to do tonight Brain?
    10. Re:Petabyte DBs are old news to... by maxume · · Score: 1

      I look forward to the day when the general level of cynicism reaches the point that CRM systems make the customer experience roughly equivalent to talking to a competent, helpful person (this probably results in more expensive service, but I think I'm okay with that).

      --
      Nerd rage is the funniest rage.
    11. Re:Petabyte DBs are old news to... by Anonymous Coward · · Score: 0

      make the customer experience better/faster/stronger/etc.

      I've got a critical Olympics overdose here. I need 5ccs of Valium, stat.

    12. Re:Petabyte DBs are old news to... by davolfman · · Score: 1

      If this were text it would be creepy. We're talking two or three large novels of information for every man, woman and child on the planet.

    13. Re:Petabyte DBs are old news to... by suck_burners_rice · · Score: 1

      If they're saving all that information as mp3s, what happens when people start doing a petabyte DDOS attack by calling repeatedly and talking nonstop?

      --
      McCain/Palin '08. Now THAT's hope and change!
    14. Re:Petabyte DBs are old news to... by Nutria · · Score: 1

      If they had the time to listen to you while on hold.. why would they put you on hold?

      Ummmm.... If a computer can put you on hold, a computer can record all the audio signals it receives from you.

      --
      "I don't know, therefore Aliens" Wafflebox1
  6. Oh s***! I'm calling my Congressman! by BitterOldGUy · · Score: 5, Funny
    We must protect the children from the petabytes! These petabytes are everywhere trying to have sex with our children!

    I have to find my kid. Last time I saw her, she was with her Uncle Micky while he was having his morning martini.

    1. Re:Oh s***! I'm calling my Congressman! by houghi · · Score: 1
      --
      Don't fight for your country, if your country does not fight for you.
    2. Re:Oh s***! I'm calling my Congressman! by billcopc · · Score: 1

      Hotlinking FAIL!

      --
      -Billco, Fnarg.com
    3. Re:Oh s***! I'm calling my Congressman! by DeathGod321 · · Score: 1
  7. Google Street View must be most massive db ever? by Anonymous Coward · · Score: 3, Interesting

    They have photographed many towns of fewer than 50k people completely, every street in high res. That has to be well over 1 petabyte, though I doubt it's all in one location; it must be distributed?

  8. I am confused !! by neonux · · Score: 5, Funny

    How many Libraries of Congress are necessary to break the 1-petabyte barrier ??

    --
    @neonux
    1. Re:I am confused !! by n3xg3n · · Score: 1, Interesting

      0.009: 1 Library of Congress = 10 Terabytes = ~0.009 Petabytes

    2. Re:I am confused !! by Anonymous Coward · · Score: 0

      You did your math wrong... I'll give you another try...

    3. Re:I am confused !! by Anonymous Coward · · Score: 1, Funny

      but, if they submit it to the LoC, then what?

    4. Re:I am confused !! by Anonymous Coward · · Score: 4, Informative

      1 Petabyte = 1,000 Terabytes
      1 LoC = 10 Terabytes
      100 LoC = 1,000 Terabytes
      ======
      100 LoC = 1 Petabyte
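The thread mixes decimal and binary prefixes. A small sketch comparing both readings, taking the thread's rough 10 TB-per-Library-of-Congress figure as given (it is a folk estimate, not an official size):

```python
from fractions import Fraction

TB, PB = 10**12, 10**15    # SI (decimal) units
TiB, PiB = 2**40, 2**50    # IEC (binary) units

LOC_DECIMAL = 10 * TB      # "1 LoC = 10 Terabytes", as quoted in the thread
LOC_BINARY = 10 * TiB      # the same figure read as binary units

# Exact ratios, so no rounding or truncation sneaks in.
print(Fraction(LOC_DECIMAL, PB))   # 1/100  (0.01 PB per LoC)
print(Fraction(PB, LOC_DECIMAL))   # 100    (LoCs per PB)
print(Fraction(LOC_BINARY, PiB))   # 5/512  (~0.0098 PiB per LoC)
print(Fraction(PiB, LOC_BINARY))   # 512/5  (102.4 LoCs per PiB)
```

Either way it is roughly a hundred Libraries of Congress per petabyte; the 0.009 and 111 figures in the thread appear to come from truncating 0.0098 rather than rounding it.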

    5. Re:I am confused !! by Lachlan+Hunt · · Score: 2, Informative

      You seem to be trying to calculate in Tebibytes (TiB) and Pebibytes (PiB), which are based on the binary system, rather than Terabytes (TB) and Petabytes (PB), which are base 10.

      Although some operating systems use the decimal-based unit names with binary-based values (e.g. 1TB = 1024GB), that is technically wrong. Hard drive manufacturers actually report correctly, using the decimal-based values (e.g. 1TB = 1000GB).

      Also, you still got your maths wrong. 10TiB = ~0.0098PiB.

      --
      By reading this signature, you hereby agree with the content of the above comment.
    6. Re:I am confused !! by hattig · · Score: 1

      I speak on behalf of many people when I say this:

      Screw the SI units for data capacity.

      1PB = 1024TB
      1TB = 1024GB
      1GB = 1024MB
      1MB = 1024KB
      1KB = 1024B

      In this case, the historical, and de-facto standard, wins. Base 2 capacities are all that matter for computer data stored in base two units of capacity, such as a block on a disc, computer memory, etc. You won't catch me using the inanely stupid SI unit names for this.

    7. Re:I am confused !! by Orestesx · · Score: 1

      Don't confuse him more...

      1/0.009 = 111

      111 Libraries of Congress = 1 Petabyte

    8. Re:I am confused !! by Lachlan+Hunt · · Score: 1

      Yeah, well, like it or not, hard drive manufacturers and data transmission rates use the base 10 SI units.

      --
      By reading this signature, you hereby agree with the content of the above comment.
    9. Re:I am confused !! by suggsjc · · Score: 1

      Isn't the Library of Congress continually growing? If so, doesn't that need a dynamic algorithm that adjusts for its rate of growth? I couldn't find any documentation or historical data, but I would think it's out there somewhere... then we can start working on this algorithm.

      --
      When I have a kid, I want to put him in one of those strollers for twins and then run around the mall looking frantic.
    10. Re:I am confused !! by Anonymous Coward · · Score: 1, Informative

      Data transmission isn't done in power of two unit sizes (packets can be variable size), so they should indeed use base 10 units, and bits, not bytes. 10Mbps, no problem.

      Hard drives are formatted with block sizes that are a power of two (e.g., 512 bytes). Thus it is more useful to see how many of them you would have on a filesystem than some power of ten figure that also conveniently inflates the capacity.

      Imagine RAM being sold in base 10, it would be stupid.

    11. Re:I am confused !! by Anonymous Coward · · Score: 0

      I'm not sure what language you code in, but I've never had a line of code take up 10 terabytes...

    12. Re:I am confused !! by Lodragandraoidh · · Score: 1

      If you think you're confused now, wrap your head around this:

      1 64-bit address space = 2^64 bytes = ~18.4 quintillion bytes = ~18,000 petabytes (~18 exabytes) = ~1.8 million Libraries of Congress, directly addressable by one machine (I wouldn't want the electric bill from that machine).
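The 64-bit figures are easy to check. A minimal sketch, again assuming decimal petabytes and the thread's rough 10 TB per Library of Congress:

```python
ADDRESS_SPACE = 2**64   # bytes reachable with 64-bit byte addresses
PB = 10**15             # decimal petabyte
LOC = 10 * 10**12       # thread's rough figure: 10 TB per Library of Congress

print(f"{ADDRESS_SPACE:,} bytes")          # 18,446,744,073,709,551,616 bytes
print(f"{ADDRESS_SPACE / PB:,.0f} PB")     # 18,447 PB
print(f"{ADDRESS_SPACE / LOC:,.0f} LoC")   # 1,844,674 LoC
```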

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    13. Re:I am confused !! by zippthorne · · Score: 1

      Further, do they mean, the size of the *text* of the books when ascii-encoded, or do they mean images of every page in the books, and all the media encoded appropriately and "losslessly?"

      Even further: RGB filters only? What about reflective inks/bindings, embossed covers? lenticular "hologram" covers?

      --
      Can you be Even More Awesome?!
    14. Re:I am confused !! by dougisfunny · · Score: 1

      Well, with all the professional evidence destroying... shredding companies looking for something to do in DC, you have to be worried it might start getting smaller.

      --
      This is not the funny you're looking for.
    15. Re:I am confused !! by SlowMovingTarget · · Score: 1

      I'm not sure what language you code in, but I've never had a line of code take up 10 terabytes...

      The AC obviously codes in Lisp, all those closing parens really add up!

    16. Re:I am confused !! by neonux · · Score: 1

      Your concept of appropriately losslessly ascii-encoded lenticular holographic Library of Congress book covers confuses and infuriates me !!

      --
      @neonux
    17. Re:I am confused !! by Anonymous Coward · · Score: 0

      1 line of code = 10 terabytes? Wow, you're the ubercoder.

    18. Re:I am confused !! by novakreo · · Score: 1

      SI is the older, historical standard, dating back to the nineteenth century. And you are using the 'inanely stupid' SI names, but breaking the standard by redefining them for your own purposes. How difficult is it to write TiB instead of TB when you want to be unambiguous?

      --
      O frabjous day! Callooh! Callay!
    19. Re:I am confused !! by Lachlan+Hunt · · Score: 1

      Hard drives are formatted with block sizes that are a power of two (e.g., 512 bytes). Thus it is more useful to see how many of them you would have on a filesystem than some power of ten figure that also conveniently inflates the capacity.

      The issue being discussed isn't whether they should use base 10 or base 2 values; it's about which prefix names should be used for reporting the values.

      It is an indisputable fact that hard drive manufacturers currently use base 10 values and the base 10 prefixes. If you think they should use base 2 values, then fine, you may have a valid point, but you would have to take it up with their marketing departments. However, if they did, they would also have to switch to the base 2 prefixes to avoid any confusion. IMHO, they should just report both sizes, e.g. 1TB / 931.32GiB
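Showing both figures is straightforward in software. A minimal sketch; the function name, two-decimal rounding and prefix lists are illustrative choices, not any operating system's or vendor's actual format:

```python
def format_both(n_bytes: int) -> str:
    """Render a byte count with an SI prefix and an IEC prefix side by side."""
    si = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
    iec = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

    def scale(n: float, base: int, units: list[str]) -> str:
        i = 0
        # Keep dividing while the value is at least one of the next larger unit.
        while n >= base and i < len(units) - 1:
            n /= base
            i += 1
        return f"{n:.2f} {units[i]}"

    return f"{scale(n_bytes, 1000, si)} / {scale(n_bytes, 1024, iec)}"

print(format_both(10**12))   # 1.00 TB / 931.32 GiB  (a "1TB" drive)
print(format_both(2**30))    # 1.07 GB / 1.00 GiB    (a "1GB" RAM stick)
```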

      --
      By reading this signature, you hereby agree with the content of the above comment.
    20. Re:I am confused !! by Anonymous Coward · · Score: 0

      I now inform you that you are too far from reality.

    21. Re:I am confused !! by Albio · · Score: 1

      Something's wrong, Google isn't accepting my query.
      1 PB in LoC

    22. Re:I am confused !! by maxume · · Score: 1

      The answer to your question is 1/2 Mexico difficult.

      --
      Nerd rage is the funniest rage.
    23. Re:I am confused !! by Anonymous Coward · · Score: 0

      Making up new words and trying to force the world to use them is hopeless. Get over it.

    24. Re:I am confused !! by sydneyfong · · Score: 1

      The computer industry had been using the "byte" unit way before the SI Nazis kicked in. The fact that "kilo" means 1000 for grams, meters, etc. doesn't mean it has to apply to bytes.

      The reason for SI standardization is easy conversion between units. Working in base 2 is much more natural than base 10 in the computer industry. I hereby propose we keep the de facto standard and invent a new unit called the "glob", where a glob equals 1,000,000 bytes. For all of us who actually need a base 2 system to work with, we can keep the old metric.

      --
      Don't quote me on this.
    25. Re:I am confused !! by initialE · · Score: 1

      Is it just me, or did they stop growing the Library of Congress?

      --
      Starbucks, Harbuckle of Breath.
    26. Re:I am confused !! by novakreo · · Score: 1

      I won't speculate as to whom you refer to as 'SI Nazis', but as I said, the metric system is much, much older than the idea of a byte, which was only conceived in the last century.

      The problem with breaking the SI standard is that even if you accept byte units based on powers of 2, it doesn't make actual usage consistent. I could go out and purchase a gigabyte of RAM, and it should be 1,073,741,824 bytes, but if I dig up an old one gigabyte hard drive, I should only expect it to have 1,000,000,000 bytes. WTF? You can't even simply make an exception for data storage, because some software reports file sizes using powers of two, and some uses powers of ten.

      The sensible solution would be to let SI prefixes keep the meaning they've had from the start, and if people want a different system, use different names for it. If you don't like referring to tebibytes and mebibytes, then come up with something else and persuade people to use it, but in the meantime, it's not that difficult to write kB when you mean 1000 bytes and KiB when you mean 1024.

      --
      O frabjous day! Callooh! Callay!
  9. Yawn... by admi-systems · · Score: 1, Insightful

    Hard drives keep getting larger. Hard drive consumption keeps getting larger. How much larger it keeps getting really isn't all that impressive.

    1. Re:Yawn... by bconway · · Score: 1, Flamebait

      Database, not filesystem. Thanks for almost bothering to read the summary, though.

      --
      Interested in open source engine management for your Subaru?
    2. Re:Yawn... by Anonymous Coward · · Score: 1, Insightful

      A file system is a database...

    3. Re:Yawn... by Beale · · Score: 3, Insightful

      As soon as you have the capacity, people will fill the capacity. There's always more data to collect.

    4. Re:Yawn... by camperdave · · Score: 1

      And a database is a file system.

      --
      When our name is on the back of your car, we're behind you all the way!
    5. Re:Yawn... by Anonymous Coward · · Score: 0

      Actually, it's mostly the other way around but, yes, occasionally you store a file system in a database.

    6. Re:Yawn... by Anonymous Coward · · Score: 0

      And a database is a filing system.

      There, fixed that for you.

    7. Re:Yawn... by Aaron5367 · · Score: 1

      Hard drives keep getting larger. Hard drive consumption keeps getting larger. How much larger it keeps getting really isn't all that impressive.

      How much larger will it have to get until we run out of porn?

  10. Petabyte? by rstevens · · Score: 0, Redundant

    Petabyte? I never touched a byte!

    --
    http://www.clango.org
  11. No big news here.... by edwardd · · Score: 5, Interesting

    Take a look at almost any large financial firm. The email retention system alone is much larger than a petabyte, and that's just the online media, not including what's spooled to tape. Due to deficiencies in RDBMS systems, each of the large firms usually develops its own system for managing the archive on top of the database.

    1. Re:No big news here.... by jiayao · · Score: 1

      Email retention systems, and archival systems in general, are boring. I wouldn't be excited to see such a system manage exabytes of data. The only operation you need is search; you don't do joins or updates. Another uninteresting example is a huge database where most of the data is opaque blobs. That's more like a file system than a database. It's the live data, queried and updated constantly, that counts, e.g. Yahoo's web analytics database. I think it's amazing that they manage to keep such an amount of data, possibly distributed across data centers, consistent during updates while keeping the system responsive under constant queries.

  12. It's not... by Anonymous Coward · · Score: 0

    It's not how big it is, but how you use it. :)

  13. Oh, come on. by seven+of+five · · Score: 4, Interesting

    Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.

    1. Re:Oh, come on. by poetmatt · · Score: 4, Insightful

      So the fact that movies have gone from 780mb (dvdrips) to 4.8gb (straight up copies) to 25gig (blu ray) doesn't bear any significance to you?

      Or how about games which have gone from 1mb to installations that are upwards of 10gigs now (warhammer IIRC is 9 something).

      Not to mention MS's fiasco of their Office XML format where things take up a ridiculous amount of space in comparison to open office (10mb docx vs 2.9mb open office)...it's all about the level of tech knowledge of someone that determines their space usage.

      I wouldn't mind 3-4 TB; I'd split it into about 4 partitions or a RAID stripe and call it a day for a while.

      However consumer use is indicative of business use, so I would expect things to head towards exabyte eventually.

    2. Re:Oh, come on. by AP31R0N · · Score: 3, Insightful

      Agreed.

      And I'd also be worried about losing a PB all at once. There are TB drives at my local Best Buy, but that's a lot to lose at once. I'd rather split my files and programs between two or more smaller drives (and have a RAID).

      --
      Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
    3. Re:Oh, come on. by secondhand_Buddah · · Score: 3, Funny

      Bill, is that you???

      --
      Participatory Governance : The only feasible option for a real democracy, where everyone really does have a say.
    4. Re:Oh, come on. by seven+of+five · · Score: 4, Insightful

      However consumer use is indicative of business use, so I would expect things to head towards exabyte eventually.

      This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB.

      On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.

    5. Re:Oh, come on. by garcia · · Score: 1

      You can have only so much useful information about anything.

      If you have the space available and the tools to utilize the stored data, why not? The more data you keep, the more information you will have available when techniques or routines become available to you to utilize this data.

    6. Re:Oh, come on. by Kjella · · Score: 3, Insightful

      Call me old fashioned, but I don't see why anyone but a search engine like google would need anything like a petabyte. You can have only so much useful information about anything. Sounds to me like, fill your garage with sh1t, build a bigger garage.

      Unfortunately, you gather up a lot of digital stuff fast, and most of the time it's not useful. Take my business mail, for example: it's full of old presentations, random versions of various documents and whatnot. Is it worth cleaning up? No. Is it worth keeping? Well, from time to time clients start asking about old things, and it's very useful to have it. I figure 90% of it could be deleted, keeping only final versions and important mails. Of the 10% that remains, 90% will never be asked for again, so I keep 100% for the sake of maybe 1%. Make a company with hundreds of thousands of people all like that and you get huge, huge amounts of data. It's still cheaper than going through those huge, huge amounts of data. That goes double for many automated data collection processes: it's cheaper to keep everything until it's all guaranteed useless than to try to sort it out.

      --
      Live today, because you never know what tomorrow brings
    7. Re:Oh, come on. by Orestesx · · Score: 1

      Do you want to figure out which is the useful stuff? Better just to store it all; you don't know what is useful until you need it.

    8. Re:Oh, come on. by tekiegreg · · Score: 1

      This might be going slightly off-topic, but yeah, I've noticed that along with the increases in data size, backup awareness and redundancy have been percolating down even to home users.

      For example, recently I set up a mirrored drive system for my stepdad for his home photos (which are somewhere in the 200GB range as he is semi-professional) just in case one drive goes out. Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?

      --
      ...in bed
    9. Re:Oh, come on. by jimmux · · Score: 1

      I'm currently working on a project which has a working database of around 1.5 petabytes (at last count).

      What's more, this database is constantly ingesting more data and shuffling off old data to tape archives. If the technology was available, this DB would be even bigger so we wouldn't have to retrieve data from archives in order to query data more than a year old.

      There is an unbelievable amount of data out there. As long as there is somewhere to put it, we will find reasons to stick it in a database and analyse it.

    10. Re:Oh, come on. by abigor · · Score: 1

      a. How on earth would you know? Do you work in a data-intensive industry?

      b. Do you understand what a data warehouse even is?

      c. Data mining is statistically based. The more information that's available to mine, the more accurate the results will be. And by "information", I don't mean some kid's hard drive filled with terrible mp3s and downloaded movies.

    11. Re:Oh, come on. by MrMarket · · Score: 1

      I'm guessing most of these databases are keeping CYA information, most of which will never be used.

    12. Re:Oh, come on. by V!NCENT · · Score: 1

      If you download your ass off and you don't want to delete your porn, games, movies (Blu-ray) and music (uncompressed), and your HDDs/SSDs are in a RAID so that they back each other up, with a journaling filesystem, and they are partitioned for all your Linux/*BSD/Windows distros, and you have never thrown a single file away because you back everything up and put it all back after a cleaaan install... I think you are going to want petabyte storage capacity.

      --
      Here be signatures
    13. Re:Oh, come on. by Alpha830RulZ · · Score: 5, Interesting

      Data mining is statistically based. The more information that's available to mine, the more accurate the results will be.

      A minor quibble. I do data mining for a living. With most data sets, we end up sampling them down, because more data ramps up processing time faster than it improves accuracy. With most problems, more data doesn't improve accuracy measurably once you've reached a certain critical mass in the dataset. Simplistically, you don't need to flip the coin a billion times to figure out that it comes up heads 50% of the time.

      It's a rare problem that we use more than 100,000 records for. They exist, but they're rare.
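The coin-flip point can be made concrete: the standard error of an estimated proportion is sqrt(p(1-p)/n), so it shrinks only with the square root of the sample size. A minimal sketch, assuming a fair coin (p = 0.5) and independent, unbiased samples:

```python
import math
import random

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion for true rate p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# Each 100x increase in data buys only a 10x reduction in error.
for n in (100, 10_000, 100_000, 1_000_000_000):
    print(f"n={n:>13,}  standard error = {standard_error(0.5, n):.6f}")

# Empirical check: a seeded simulation of 100,000 fair flips.
rng = random.Random(42)
n = 100_000
heads = sum(rng.random() < 0.5 for _ in range(n))
print(f"observed heads rate from {n:,} flips: {heads / n:.4f}")
```

This is the usual argument for sampling: past a certain size, extra records mostly cost processing time. It leans on the independence assumption; rare events or heavily skewed populations can still need far more data.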

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    14. Re:Oh, come on. by thexile · · Score: 1

      Yeah, 640KB should be enough.

    15. Re:Oh, come on. by Anonymous Coward · · Score: 0

      Try financing systems instead of widget makers. You'll learn exactly why real data needs invariably outstrip hardware limits.

    16. Re:Oh, come on. by Fweeky · · Score: 1

      Also I've been looking at a cheap DVD Autoload backup option. Any ideas there from the Slashdot crowd?

      Back up 200GB+ of data to DVDs? Are you mad? That's 25-50 discs just for the initial backup, and you probably want twice that to handle discs going bad.

      Get two or three external disks (eSATA ideally; you can run SMART self-tests, get better transfer rates, etc.). Use a decent incremental backup tool to make versioned snapshots to them, rotating the drives periodically; keep one in storage, and ideally one off-site. It's faster, less hassle, more robust and more flexible than a pile of DVDs.
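The disc count is simple division. A sketch assuming nominal capacities of 4.7 GB (single-layer) and 8.5 GB (dual-layer), ignoring filesystem overhead, with an optional second copy to cover discs going bad:

```python
import math

GB = 10**9                    # disc capacities are quoted in decimal GB
DVD_SINGLE_LAYER = 4.7 * GB   # DVD-5
DVD_DUAL_LAYER = 8.5 * GB     # DVD-9

def discs_needed(data_bytes: float, disc_bytes: float, copies: int = 1) -> int:
    """Discs for one full backup, multiplied by `copies` if every disc is duplicated."""
    return math.ceil(data_bytes / disc_bytes) * copies

data = 200 * GB
print(discs_needed(data, DVD_DUAL_LAYER))              # 24
print(discs_needed(data, DVD_SINGLE_LAYER))            # 43
print(discs_needed(data, DVD_SINGLE_LAYER, copies=2))  # 86
```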

    17. Re:Oh, come on. by blahplusplus · · Score: 1

      "This is kind of my point. Do companies keep libraries of pr0n, video, music? Sure, if you're a media company you will. But say you're a plumbing distributor. You'll have the usual accounting stuff, and media for marketing, and some BS overhead, but don't tell me it adds up to a TB much less a PB."

      That's true for small companies but places like Digg and any site that gets a lot of comments would very quickly fill up that TB.

    18. Re:Oh, come on. by nasor · · Score: 1

      That's exactly what I was thinking. Okay, a hi-def movie is 25 GB, but does some company really have 40k hi-def movies to store?

    19. Re:Oh, come on. by tekiegreg · · Score: 1

      I was only looking to back up maybe a 15-20GB subset of that data when I floated the idea of DVDs, but you do have a point there. I can probably do a decent backup with more external HDs, and more cheaply in the long run. Thanks for the sanity check, bud!

      --
      ...in bed
    20. Re:Oh, come on. by lp_bugman · · Score: 1

      In the financial business, we need to keep all trading data for at least 7 years. Most client firms use automated quoting systems, so traffic is substantial: hundreds of megabytes of data are generated each day across multiple systems (quoting, trading, clearing, reporting, ...). I/O performance and millisecond acknowledgement times are also very important.

      --
      BSD licensed software can't be stolen....
    21. Re:Oh, come on. by Grishnakh · · Score: 1

      So the fact that movies have gone from 780mb (dvdrips) to 4.8gb (straight up copies) to 25gig (blu ray) doesn't bear any significance to you?

      Are people actually storing BD movies on their hard drives these days? In BitTorrent land, movies are still only a gig or so, even the ones ripped from BD, as they always use a better codec like H.264 or Xvid, rather than the ridiculously obsolete MPEG-2.

      Not to mention MS's fiasco with their Office XML format, where files take up a ridiculous amount of space compared to OpenOffice (a 10MB .docx vs. 2.9MB in OpenOffice)

      Does anyone actually use this? Last time I checked, most Fortune 500 companies (including mine) are still using older versions of Windows (XP) and Office.

    22. Re:Oh, come on. by Anonymous Coward · · Score: 0

      Of course most businesses don't use that much. The average small to medium sized company out there is happy that they can store a decade worth of their business records on storage media that costs under $100.

    23. Re:Oh, come on. by tfunk1234 · · Score: 1

      While most companies don't need to keep this level of data, there are a number who do. Think of banks and credit card companies who need to store every transaction that happens on their cards. Or supermarkets who store a record of every item anyone purchases. There are a number of businesses that need to store hundreds of billions of transactions.

    24. Re:Oh, come on. by Anonymous Coward · · Score: 0

      Yes, but you forget that the plumber is probably using M$ products and hence will need an exabyte of storage for their work-related files.

    25. Re:Oh, come on. by Anonymous Coward · · Score: 0

      Sure, if you're a media company you will. But say you're a plumbing distributor.

      Say you are a plumber. In 2018.

      In 10 years, plumbing businesses will use petabyte databases to hold HD video records of their employees actually dealing with customers, discovering problems and fixing those problems. It is something which has value even now, but is routinely ruled out (or not even considered) because everyone knows it is too expensive.

      But, then, in 1988 everyone knew that scanning file cabinets full of documents and storing them on disk was too expensive.

  14. Too Bad Most of that is Due to Poor... by eno2001 · · Score: 1, Insightful

    ... DB design and old data that should be purged. Color me unimpressed.

    --
    -"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
    1. Re:Too Bad Most of that is Due to Poor... by Anonymous Coward · · Score: 2, Interesting

      ... DB design and old data that should be purged. Color me unimpressed.

      I'm convinced now that regardless of attempted discrimination, HUMANS are pack-rats. THAT I can deal with, as people can be trained to actually throw shit away. The problem is when lawyers get involved in the matter. Yes, most of the shit we have today in the corporate world we are FORCED to keep due to some insane lawsuit and follow-up "fix-it-forever" law that calls for us to keep a copy of every damn thing that flows electronically for the next 7 - 70 years.

      Could you almost call it corruption? Yes, I can. The similarities between supply and demand feeding the corruption of oil companies can also be seen in data storage markets. Hard drives probably wouldn't be eclipsing 80GB if it were not for laws driving it that way. New personal computers with almost a terabyte of storage, yeah like Grandma is ever gonna fill that up. Give me a break.

    2. Re:Too Bad Most of that is Due to Poor... by N!k0N · · Score: 1

      Hard drives probably wouldn't be eclipsing 80GB if it were not for laws driving it that way.

      I'm not sure it's laws, at least in the consumer market; I think it's more a cyclical "failure"* in thinking.

      Let's take the hypothetical situation that Company A has made a 1GB drive, when all predecessors were making 250-500MB. Company (M)$ sees this, and instead of going through all the necessary steps to clean bloat from their new piece of software, leaves it in, because "everyone has big drives now". Software companies start to follow in company (M)$'s footsteps, and hard drive manufacturers are then forced to make increasingly larger drives, which Joe Consumer then fills with pr0n and other random junk because the space is just *there* now....

      * I say "failure" because at some level it is a flaw in the thinking regarding the whole hard drive issue that you've stated...

  15. OO databases have done this ten years ago by cjonslashdot · · Score: 5, Interesting

    I remember encountering a 1+ petabyte database 10 years ago: it was the database to record and analyze particle accelerator experiment data at CERN. And it was built using a commercial object database - not relational. Oh but wait - the relational vendors have told us that OO databases don't scale....

    That was ten years ago.

    1. Re:OO databases have done this ten years ago by Anonymous Coward · · Score: 0

      If all you are doing is reading the data, then OO is OK. If you are doing a lot of writes, then you need relational.

    2. Re:OO databases have done this ten years ago by Anonymous Coward · · Score: 0

      So, 10 years ago, when hard drives were 50 gigs max, you saw a 1 petabyte database?

      That is what? 20,000 drives, minimum? I think 50 gigs for the time would have been enormous, probably a lot closer to 5.

      So, it was probably a tape system, if it existed at all. I wouldn't call a system living on tape a database by any means; the access would be too slow to do anything. Something isn't adding up.

      There are many reasons to have DB vendors, but don't be a sore loser just because your technology of choice turned out to be a massive failure.
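
      The drive-count arithmetic above is easy to check. A minimal Python sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and ignoring RAID, filesystem and spare-capacity overhead, so the results are lower bounds:

```python
# Lower-bound drive counts for 1 PB, assuming decimal (SI) units and no
# RAID, filesystem or spare-capacity overhead.
PB = 10**15
GB = 10**9

def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Smallest number of drives whose combined raw capacity covers total_bytes."""
    # Integer ceiling division avoids any floating-point rounding.
    return (total_bytes + drive_bytes - 1) // drive_bytes

for size_gb in (50, 5):
    print(f"{size_gb} GB drives: {drives_needed(PB, size_gb * GB):,}")
```

      With 50GB drives that is 20,000 drives; with 5GB drives it is 200,000.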

    3. Re:OO databases have done this ten years ago by dfetter · · Score: 1

      Storing it is one thing. Querying is a very different thing. What happens when somebody wants to find out something not specifically envisioned in the original experiment?

      --
      What part of "A well regulated militia" do you not understand?
    4. Re:OO databases have done this ten years ago by littlewink · · Score: 3, Interesting
      You are mistaken. While certainly almost everything (right or wrong) has been said at some time by someone, nobody respectable who knew what they were doing ever claimed that object-oriented databases would not scale.

      In fact OO and similar (CODASYL, network-style, etc.) databases were used and continue to be used very heavily in applications where relational databases do not scale.

    5. Re:OO databases have done this ten years ago by cjonslashdot · · Score: 1

      You are right. For ad-hoc access, relational is superior. However, for pre-defined access, OO is superior. In particular, OO is far superior for 99% of all three-tier apps (those that use an app server), because for those kinds of apps the transactions are known ahead of time.

    6. Re:OO databases have done this ten years ago by cjonslashdot · · Score: 1

      I guess I was referring to the current community of developers that only use relational systems, as if they were the only game in town. For example, what percentage of middle-tier apps do you think use relational today? And what percent do you think would be best implemented as OO? In my own opinion, the numbers are probably something like 99% and 99%, respectively.

    7. Re:OO databases have done this ten years ago by Anonymous Coward · · Score: 0

      You mean it recorded tons of unstructured data in objects? Recording stuff is storage, not a database.

    8. Re:OO databases have done this ten years ago by TheSunborn · · Score: 2, Interesting

      Only problem is, where do you find an OO database with a good index and search implementation that doesn't cost so much that the company won't even reply when you ask for a price?

    9. Re:OO databases have done this ten years ago by cjonslashdot · · Score: 3, Interesting

      Point well taken. The problem now is the reality that OO database products were decimated by their failure to explain their value to the market. However, there is a little bit of a resurgence. See http://www.service-architecture.com/products/object-oriented_databases.html

    10. Re:OO databases have done this ten years ago by Anonymous Coward · · Score: 0

      ... 10 years ago: ... it was built using a commercial object database - not relational... told us that OO databases don't scale...

      What was the name of the commercial database?

      How long did it take to 'sort'?

      How about an insert and sort?

      How many indexes?

      What if you needed to create an index?

      Was it one table or more...if more how long would a 'join' take?

      lol, call me skeptical.....

  16. Google Maps is way bigger... by Plantain · · Score: 3, Informative

    Google Maps' database is far bigger...

    A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.

    Each tile is approx. 30kB.

    (((0.03* (8 * (4^16)))/1024)/1024) == 983.04TB right there.

    My calculator doesn't handle numbers big enough for Street View. O_O
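
    A small Python sketch reproducing the estimate above. It uses the post's own assumptions (8 base tiles, a 4-way split per level, about 30 kB per tile, deepest level 16) and its unit convention (30 kB = 0.03 MB, then dividing by 1024 twice). The post's formula counts only the deepest level in one mode; the second figure, which also counts levels 0 through 16 in both map and satellite modes, is an extension not in the original:

```python
# Back-of-envelope Google Maps tile estimate, using the post's assumptions.
BASE_TILES = 8     # tiles at the coarsest zoom level (level 0)
TILE_MB = 0.03     # ~30 kB per tile, written as 0.03 MB as in the post

def tiles_at_level(level: int) -> int:
    """Number of tiles at one zoom level; each level splits every tile into 4."""
    return BASE_TILES * 4**level

def size_tb(tile_count: int) -> float:
    """Convert a tile count to TB using the post's MB -> GB -> TB steps."""
    return tile_count * TILE_MB / 1024 / 1024

# The post's figure: level 16 only, one mode (map or satellite).
single_level = size_tb(tiles_at_level(16))

# Extension (not in the post): every level 0..16, in both modes.
all_levels_both_modes = 2 * size_tb(sum(tiles_at_level(k) for k in range(17)))

print(f"level 16 only, one mode: {single_level:.2f} TB")
print(f"levels 0-16, two modes:  {all_levels_both_modes:.2f} TB")
```

    Under these assumptions the single-level figure is roughly 983 TB, and counting every level in both modes raises it to about 2,621 TB. Both figures assume every tile exists at every level, which, as the replies point out, is not true over the oceans.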

    --
    No, but I did throw granola at a deaf person once
    1. Re:Google Maps is way bigger... by Speare · · Score: 4, Funny

      Google Maps' database is far bigger...

      A base of 8 tiles, with each becoming four more smaller tiles, in two modes (map/satellite), and 16 zoom levels.

      We are sorry, but we don't
      have maps at this zoom
      level for this region.
      Try zooming out for a
      broader look.

      --
      [ .sig file not found ]
    2. Re:Google Maps is way bigger... by Plantain · · Score: 1

      There's actually 20 zoom levels, but I'm approximating 16 as the average.

      --
      No, but I did throw granola at a deaf person once
    3. Re:Google Maps is way bigger... by Speare · · Score: 1

      My point is, two thirds of the surface of the earth is water. Oceans have maybe two or three zoom levels. Given the fractal nature of the data, your estimate of "16 levels" as the global average is waaaaaay off base. I'd be very surprised if all the unique graphics for all modes ends up being more than 1 terabyte.

      --
      [ .sig file not found ]
    4. Re:Google Maps is way bigger... by imsabbel · · Score: 1

      Then be surprised.

      The Landsat data alone comes close to 1TB.
      And that is just the whole world at the coarse 30m-or-so resolution.
      (I know, because waaay back, I mirrored part of the NASA World Wind data.)

      This data is in no way fractal in nature.

      And just do the math (just to see that your argument is bogus):

      A km^2 at level 20 has 4^4=256 times as much data as one at level 16.
      If you do the math, central Europe alone is enough to push the world to an average of level 16 (Germany, e.g., is completely covered in aerial photos, equaling about 25% of the earth's surface in level-16 equivalent).
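
      The 256x figure follows from each zoom level halving a tile's ground size in both directions, so a fixed area needs 4 times as many tiles per level. A tiny sketch, assuming roughly the same number of bytes per tile at every level:

```python
# Each zoom level halves a tile's ground width and height, so covering a
# fixed area takes 4x as many tiles (and, assuming similar bytes per tile,
# roughly 4x the data) at each step down.
def data_ratio(fine_level: int, coarse_level: int) -> int:
    """How many times more tiles cover the same area at fine_level than at coarse_level."""
    return 4 ** (fine_level - coarse_level)

print(data_ratio(20, 16))
```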

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    5. Re:Google Maps is way bigger... by TheGreatGraySkwid · · Score: 1

      That's the worst haiku I've ever seen.

      --
      The Humblest Mollusk on the Net
    6. Re:Google Maps is way bigger... by maxume · · Score: 1

      This is almost certainly because you lack ambition.

      --
      Nerd rage is the funniest rage.
    7. Re:Google Maps is way bigger... by Anonymous Coward · · Score: 0

      plus the fact that the average land tile is nowhere near 30 kB. Even the Landsat images are smaller.

    8. Re:Google Maps is way bigger... by Anonymous Coward · · Score: 0

      Google Maps' database is far bigger...

      We are sorry, but we don't

      have maps at this zoom

      level for this region.

      Try zooming out for a

      broader look.

      But I want to see that huge patch of trash accumulating in the middle of the pacific ocean.

      I also want to look for subs, UFOs and even very large fish (i.e. whales), so please let me zoom closer, please?

  17. "Barrier"? by Anonymous Coward · · Score: 1, Insightful

    Gigabyte barrier. Petabyte barrier.

    In what sense are these barriers? Does the database resist putting more data in it the closer to a petabyte you get? Is it likely to explode once it reaches 1 petabyte?

    1. Re:"Barrier"? by Daimanta · · Score: 1

      "Is it likely to explode once it reaches 1 petabyte?"

      No, but your head will.

      --
      Knowledge is power. Knowledge shared is power lost.
    2. Re:"Barrier"? by n9hmg · · Score: 1

      That is exactly why I bothered to post. I think banal idiots try to amplify the importance of a milestone, and a PB IS something of a psychological milestone, by calling it a barrier. There WAS a barrier, of sorts, at 2G or 4G depending on addressing scheme, but that was easily put away with other addressing schemes, and with 64-bit architecture, it's not even relevant any more.

      Hey, I just passed the 384-character barrier! Whoah!, breezing right on past! This is amazing!
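
      For reference, the 2G and 4G figures line up with 32-bit byte offsets, signed vs. unsigned. A minimal sketch assuming that is the cause (the exact limit varies by OS and filesystem):

```python
# Byte-addressable range for an integer offset of a given width.
def addressable_bytes(bits: int, signed: bool) -> int:
    """Number of distinct non-negative byte offsets an integer of this width can hold."""
    return 2 ** (bits - 1) if signed else 2 ** bits

print(addressable_bytes(32, signed=True) // 2**30, "GiB")   # the "2G" limit
print(addressable_bytes(32, signed=False) // 2**30, "GiB")  # the "4G" limit
print(addressable_bytes(64, signed=False) // 2**60, "EiB")  # 64-bit offsets
```

      A petabyte (10^15 bytes) is far below the 16 EiB reachable with unsigned 64-bit offsets, which is why the limit stops mattering.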

    3. Re:"Barrier"? by rdebath · · Score: 1

      The same barrier exists at 2TB, i.e. 2^32 disk sectors of 512 bytes each.

      Beyond that, MS-DOS-style (MBR) partition tables aren't good enough any more.
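
      The 2TB figure comes from the 32-bit sector fields in an MBR partition entry. A sketch of the arithmetic, assuming the traditional 512-byte sector size:

```python
# An MBR partition entry stores the starting sector and the sector count
# as 32-bit unsigned fields, so with 512-byte sectors a partition's size
# tops out at about 2**32 * 512 bytes.
SECTOR_BYTES = 512   # traditional sector size (assumption)
MAX_SECTORS = 2**32  # range of a 32-bit sector field

limit = MAX_SECTORS * SECTOR_BYTES
print(limit, "bytes =", limit // 2**40, "TiB, about", round(limit / 10**12, 2), "TB")
```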

    4. Re:"Barrier"? by Ant+P. · · Score: 1

      These used to be actual barriers, but now that we're measuring most things in 64 bits it doesn't really mean anything.

  18. I won't call you old fashioned... by VampireByte · · Score: 4, Insightful

    ... but I do wonder if you've ever heard of Sarbanes-Oxley.

    --

    Run and catch, run and catch, the lamb is caught in the blackberry patch.

  19. When the petafile barrier crumbles ... by cpu_fusion · · Score: 5, Funny

    ... we'll need an army of Chris Hansens and a mountain of beartraps. God help us.

  20. the only *real* barrier is backup time by petes_PoV · · Score: 5, Interesting
    or more correctly, restore time.

    Any organisation that wishes to be classed as in any way professional knows that the value in its databases has to be protected. That requires them to have the means to recover the data if something bad happens. A hot-mirrored copy is simply not good enough (one corruption would get written to both copies).

    As a consequence, the size of commercial databases is limited by the amount of time the organisation is willing to have it unavailable while it is restored, in the case of a disaster, or the time taken to create/update secure, offline, copies.

    Not by intrinsic properties of the database or host architecture.
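
    To put restore time in numbers: it is roughly data size divided by sustained throughput. A minimal Python sketch; the throughput values are illustrative assumptions, not benchmarks of any real backup system:

```python
# Rough restore-time estimate: time = data size / sustained throughput.
# Throughputs are illustrative assumptions, not measurements.
SECONDS_PER_DAY = 86_400

def restore_days(size_bytes: float, bytes_per_second: float) -> float:
    """Days needed to stream size_bytes back at a constant throughput."""
    return size_bytes / bytes_per_second / SECONDS_PER_DAY

PB = 10**15
for label, rate in [("100 MB/s", 100e6), ("1 GB/s", 1e9), ("10 GB/s", 10e9)]:
    print(f"1 PB at {label}: {restore_days(PB, rate):.1f} days")
```

    Even at an assumed 1 GB/s sustained, restoring a full petabyte takes over 11 days.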

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:the only *real* barrier is backup time by jdanton1 · · Score: 1

      Block-based snapshots in conjunction with database backup packages are the only way to do this. For instance, with a NetApp filer you can take a block-level image and tie it in to Oracle's RMAN (Recovery Manager). It's the only way to deal with DBs that large. BTW, I think the size limit in Oracle 10 is on the order of 10 exabytes, and Oracle 11 has no size limit.

    2. Re:the only *real* barrier is backup time by TheLink · · Score: 1

      Exactly.

      When various Important People are standing behind you making "supportive" noises, while other people are coming by every 5 minutes to ask "Is it fixed yet?", you'll start to realize that restore time is very important, and that disk I/O is pathetic, and tape is overrated.

      --
    3. Re:the only *real* barrier is backup time by Anonymous Coward · · Score: 0

      This is why there are mirror activators and replication servers: they read the transaction log and apply the changes to a remote database, hence stopping corruption from propagating.
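
      A toy Python model of the idea, with hypothetical class names (real replication products differ in detail): the replica replays logical change records from the primary's log instead of copying its storage blocks, so damage to the primary's stored data that never went through the log does not reach the replica. A logically bad change, such as an erroneous UPDATE, is logged and would still replicate.

```python
# Toy log-based replication. Primary and Replica are hypothetical names.
from dataclasses import dataclass, field

@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        # Every logical change is recorded in the log before it is applied.
        self.log.append(("set", key, value))
        self.data[key] = value

@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log records have been replayed so far

    def catch_up(self, log):
        # Replay only the records not yet applied, in order.
        for op, key, value in log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(log)

primary, replica = Primary(), Replica()
primary.write("acct:1", 100)
primary.write("acct:2", 250)

# Simulated on-disk corruption on the primary: the stored value changes
# without any log record being written.
primary.data["acct:1"] = "#garbage#"

replica.catch_up(primary.log)
print(primary.data["acct:1"], "vs replica:", replica.data["acct:1"])
```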

  21. Effect of the scale by cefek · · Score: 2, Insightful

    Imagine having tens of millions, or just millions, of users, all of them with their records, history, and targeted ads data. Or some mail provider that stores attachments in a database. Or a file sharing service like those you and I know. That's plenty of information to manage. Add overhead, and it's easy to overfill even the biggest database.

    Also I agree with you that bad design might be a concern. Of course there's no big database that couldn't get on a "purge" diet.

    Now it seems to me we might have a problem with querying such a big bucket of random data. Imagine a query taking months to complete. We're gonna be there in another ten years.

    And then we lose the capacity to make electricity. And we can use our CDs, DVDs, not to mention magnetic media, to... well, dig trenches.

    Those pesky petabytes of data are going to doom us.

    --
    Plain old sigh.
  22. Science! by edremy · · Score: 5, Informative
    Petabytes are actually pretty common in the sciences. I visited NCAR (National Center for Atmospheric Research) in Boulder five years ago and their main database was in the 2PB region even then. I'm sure it's a lot larger today.

    The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope. These projects aren't all that uncommon.

    --
    "Seven Deadly Sins? I thought it was a to-do list!"
    1. Re:Science! by dargaud · · Score: 1

      The LHC will generate several PB of data per year, as will the Large Synoptic Survey Telescope [lsst.org]. These projects aren't all that uncommon.

      Shit, I'm working on those 2 projects. I'd better ask for a bigger hard drive to management...

      --
      Non-Linux Penguins ?
    2. Re:Science! by boombaard · · Score: 1
      Don't forget projects like LOFAR (snippets from the LOFAR website):

      In the first digital processing step 256 kHz subbands are formed. Only a subset of these bands is further processed. The maximum total bandwidth selected for further processing will be 32 MHz. Each Remote Station delivers a single dual polarization beam at 32 MHz, or 8 dual polarization beams at 4 MHz or any combination in between. The resulting output data rate is 2 Gb/s. The secondary filtering stage (to 1kHz channels) is done in the Central Processing system.

      LOFAR produces large data streams, especially for the astronomy application (e.g. 6 TB of raw visibility data for an 8 beam, 4 hour synthesis observation, after integration for 1 sec and over 10kHz). One month of observing in this mode results in a PetaByte of data. (Systematic long-term storage for such data volumes thus becomes extremely expensive.)

      The project is hardly up and running yet, but still, quite a bit of raw data to process. (powered by IBM's BlueGene/L)
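
      The quoted LOFAR numbers hang together. A quick Python check in decimal units, assuming continuous observing for a 30-day month:

```python
# Sanity check of the quoted LOFAR data volumes, in decimal units.
TB = 10**12
PB = 10**15

# 6 TB per 4-hour synthesis observation, observing around the clock.
per_day = 6 * TB * (24 / 4)
per_month = per_day * 30  # assumes a 30-day month of continuous observing
print(f"{per_day / TB:.0f} TB/day, {per_month / PB:.2f} PB/month")

# One 2 Gb/s station output stream sustained for a full day.
station_bytes_per_day = 2e9 / 8 * 86_400
print(f"{station_bytes_per_day / TB:.1f} TB/day per station stream")
```

      That is 36 TB per day and about 1.08 PB per month, matching the "PetaByte per month" claim.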

    3. Re:Science! by steelfood · · Score: 1

      The LHC will generate several PB of data per year

      I know 1080p60 takes a lot of space, but I'm not sure I want to see that much hardon's colliding...

      --
      "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
  23. Noob by SmallFurryCreature · · Score: 4, Funny

    My porn collection has long since achieved infinity.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Noob by BrotherBeal · · Score: 1

      Your sig STRONGLY suggests otherwise ;)

      --
      I'm disabling ads until because I choose not to reward redesigns that are less usable than "view source".
    2. Re:Noob by Gilmoure · · Score: 5, Funny

      It has an event horizon and is actively acquiring porn on its own?

      --
      I drank what? -- Socrates
    3. Re:Noob by Anonymous Coward · · Score: 0

      It has an event horizon and is actively acquiring porn on its own?

      And surprisingly, it only needed to be seeded with "the 1000 pound woman" with a banana.

    4. Re:Noob by Rick+Bentley · · Score: 1

      that explains my lack of hard disk space...

      --
      My favorite quote doesn't fit into 120 characters. Now no one will like me.
    5. Re:Noob by Kjella · · Score: 1

      My porn collection has long since achieved infinity.

      It has an event horizon and is actively acquiring porn on its own?

      <voice series="Futurama" character="Hermes Conrad">
      That would be a singularity. Since the universe is infinite, you can have an infinitely large porn collection by using an infinitely large volume rather than create a singularity.
      </voice>

      --
      Live today, because you never know what tomorrow brings
    6. Re:Noob by Gilmoure · · Score: 1

      So... the entire universe may just be an infinite porn collection encoded in matter and energy? Damn it! Where's the key?

      --
      I drank what? -- Socrates
    7. Re:Noob by infinite9 · · Score: 2, Funny

      ...event horizon...

      Awesome! That's what I'm going to call it now! My "event horizon"!

      "Here it comes baby, the point of no return!"

      --
      Disconnect your television. Do your own research. Draw your own conclusions. They're probably lying. Don't be a sheep.
    8. Re:Noob by Poltras · · Score: 1

      Not on slashdot, for sure :)

    9. Re:Noob by Facegarden · · Score: 1

      So... the entire universe may just be an infinite porn collection encoded in matter and energy? Damn it! Where's the key?

      A bottle of Jagermeister?
      -Taylor

      --
      Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
  24. s/barrier/arbitrary round number/g by ivan256 · · Score: 4, Insightful

    That is all.

    1. Re:s/barrier/arbitrary round number/g by Spatial · · Score: 1

      I'm an arbitrary round number, you insensitive clod!

  25. The world will only ever need 5 large databases by davidwr · · Score: 5, Funny

    The world will only need 5 large databases.

    None of them will ever need more than 640KB^H^HMB^H^HGB^H^HTB of RAM and 32MB^H^HGB^H^HTB^H^HPB of storage.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  26. Re:life0cidal corepirate nazi execrable decomposin by stainlesssteelpat · · Score: 1

    Wow... I think somebody's PB database got too close to a magnet without a tinfoil hat.

    --
    War is the statesman's game, the priest's delight, the lawyer's jest, the hired assassin's trade.- Shelley
  27. Re:Google Street View must be most massive db ever by Anonymous Coward · · Score: 5, Informative
  28. Wired by zehnra · · Score: 1

    Apparently nobody caught the Wired article on this a couple of months ago?

    The Petabyte Age

    1. Re:Wired by Anonymous Coward · · Score: 0

      Comments like this make me want to shout, "Simpsons Did It!"

  29. Is this article a commercial for proprietary dbs? by GNUPublicLicense · · Score: 1

    I think that's obvious... Actually, with 1-terabyte hard disks becoming commonplace, reaching a petabyte is quite easy, even for a midsize organization. Where I work, we build our own disk arrays, and reaching 1000 terabytes is just a matter of putting together a few thousand disks; not a big deal.

  30. WalMart has a 4 petabyte database already by captaindomon · · Score: 4, Informative

    WalMart's data warehouse is already 4 petabytes: http://storefrontbacktalk.com/story/080307walmart.php

    --
    Just because I can hook a shark from a boat, I do no offer to wrestle it in the water.
    1. Re:WalMart has a 4 petabyte database already by Anonymous Coward · · Score: 2, Funny

      They only needed one petabyte, but the Chinese cut them a deal on 4.

  31. IBM Boulder by Abattoir · · Score: 2, Insightful

    Is the location of IBM's Managed Storage Services (MSS) division, which deploys SANs for customers in Boulder (including IBM internal) and at other locations (over high-speed fibre links) on IBM "Shark" (ESS) and DS6000/DS8000 devices. When I worked at IBM, their marketing materials stated they were managing over 4 petabytes of data for enterprise customers out of that location alone, and that was four years ago! That doesn't count other MSS locations either, nor all the other areas where IBM implements large amounts of storage for customers. Remember, many if not most of IBM's customers are governments and Fortune 100 companies, particularly in high finance. I think they've got some data.

    So you want to talk about high levels of storage - IBM has the game covered, considering they invented the HDD.

    1. Re:IBM Boulder by serviscope_minor · · Score: 1

      So you want to talk about high levels of storage - IBM has the game covered, considering they invented the HDD.

      Actually, this is about databases rather than disks per se. But that's OK since they invented the relational database, too.

      --
      SJW n. One who posts facts.
  32. I wonder by DragonTHC · · Score: 1

    How much of that data is marketing information?

    Seriously, is all of that data current and necessary?

    Seems to me they should prune off and back up old data.

    --
    They're using their grammar skills there.
    1. Re:I wonder by arrowrod · · Score: 0

      Wonder no more. Some jackass always puts a retention date on data. Usually 25 or 100 years. Mostly trivial backups. The rationale: "Well, you never know." What is most amusing is that backup data is usually the same over and over. Day after day, year after year.

    2. Re:I wonder by Shados · · Score: 1

      When you're doing automated data projections, using previous years of data to try and predict, from trends, the future (so to speak), having 10+ years of data isn't a luxury. And in our field, 10 years of data is often -all- of your data...so well...

  33. Johnny Mnemonic by vjmurphy · · Score: 5, Funny

    I need measurements I can understand, like how many Keanu Reeves' brains is a petabyte? And could he hold it indefinitely, or would his head explode at some point? If the latter, can we get him started on it now?

    --
    Vincent J. Murphy
    Spandex Justice
    1. Re:Johnny Mnemonic by Anonymous Coward · · Score: 0

      Well, it depends on whether or not he used a doubler.

    2. Re:Johnny Mnemonic by Lodragandraoidh · · Score: 1

      I believe 1 'Keanu' = 64 Kilobytes, but I would have to check the literature...

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    3. Re:Johnny Mnemonic by Johnny+Mnemonic · · Score: 1

      Johnny's brain could hold 80GB, or 160GB if he used a "doubler". So a PB is 12.5 times the capacity of Johnny's brain, undoubled.

      I should know. ;)

      --

      --
      $tar -xvf .sig.tar
    4. Re:Johnny Mnemonic by Leebert · · Score: 1

      I should know. ;)

      That's a bummer then, since you're off by a factor of 1000. ;)
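
      A quick check of the correction, dividing 1 PB by the 80GB undoubled capacity in both decimal and binary units:

```python
# 1 PB divided by the 80 GB undoubled capacity.
PB, GB = 10**15, 10**9      # decimal units
PiB, GiB = 2**50, 2**30     # binary units

decimal_ratio = PB / (80 * GB)
binary_ratio = PiB / (80 * GiB)
print(decimal_ratio, binary_ratio)
```

      Either way the ratio is in the thousands (12,500 in decimal units, about 13,107 in binary units), not 12.5.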

    5. Re:Johnny Mnemonic by Anonymous Coward · · Score: 0

      IIRC, a Keanu brain overloaded to 320GB would explode in 24hrs. If we assume an inverse relationship between loading and time to failure, then a PB would make a mess in about 27 seconds.

      Whoa...

      More(?) interestingly, the first PB mention I recall was around 1996 regarding Exxon's offshore seismic survey database, which was around 15PB at the time.
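
      The 27-second figure checks out. A sketch assuming, as the post does, that time to failure is inversely proportional to the amount loaded, anchored at 320GB taking 24 hours, with 1 PB taken as 1,000,000 GB:

```python
# Inverse-proportional model: load * time_to_failure is constant,
# anchored at the post's figure of 320 GB -> 24 hours.
def hours_to_failure(load_gb: float, ref_load_gb: float = 320, ref_hours: float = 24) -> float:
    return ref_hours * ref_load_gb / load_gb

seconds = hours_to_failure(1_000_000) * 3600  # 1 PB = 1,000,000 GB (decimal)
print(f"{seconds:.1f} s")
```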

    6. Re:Johnny Mnemonic by SlowMovingTarget · · Score: 1

      Put a one and two zeroes in front of that and we've got ourselves some storage!

  34. Chia Pet a Bite? by Anonymous Coward · · Score: 0

    No no, if your Chia Pet is a-bitin' you're most likely doing something wrong.

    Just add water, and sing along: Ch-ch-ch-chia!

  35. I could see practical applications by gravis777 · · Score: 2, Informative

    Okay, I know that the article is referring to databases, but the comments seem to have drifted toward disk storage, so I will take the bait and go off topic.

    Petabyte drives would not really be that impractical for people who like to archive stuff. I just filled up a 300 gig drive and a 750 gig drive with stuff off of the DVR in under a year. While National Geographic HD may be compressed so badly that it barely looks better than SD, and a one-hour show is under 2 gig, try archiving something with a higher bitrate. For example, I recorded the Olympics and saved the opening and closing ceremonies and all the gymnastics events. A single four-hour day saved is around 40 gig.

    So, let's think about a media server for HD material. Let's just stick with HDTV for a while. Let's say that I want to archive a Blu-ray disc on a media server, and for the sake of argument say the movie takes up all 50 gig of the disc. Ten movies, 500 gig; 100 movies, 5 terabytes; 1000 movies, 50 terabytes.
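
    The Blu-ray arithmetic as a short Python sketch, assuming every disc is a full 50 GB and decimal units (1 TB = 1000 GB):

```python
# Archive size for full 50 GB Blu-ray discs, in decimal units.
DISC_GB = 50

def archive_tb(movies: int, disc_gb: int = DISC_GB) -> float:
    """Total archive size in TB for a number of full discs."""
    return movies * disc_gb / 1000

for n in (10, 100, 1000):
    print(f"{n} movies: {archive_tb(n):g} TB")

# How many full discs fit in 1 PB (1,000,000 GB)?
movies_per_pb = 1_000_000 // DISC_GB
print(movies_per_pb, "movies per petabyte")
```

    At that size a petabyte holds 20,000 such movies.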

    Now let's say that we are an IMAX theater upgrading to the new IMAX Digital standard. I read not too long ago that an IMAX film is equivalent to 18K (most digital theaters project 2K, although some are now installing 4K systems). So, rather than keeping big, massive film prints of the 20-year-old science documentaries we keep in rotation, we get digital versions of them. Does anyone want to do the math?

    I am waiting for the day when neural implants can actually read the human brain, so that you can archive experiences to some type of storage medium. I am sure Wikipedia says somewhere how much information the human brain processes per second. I am sure we will find ways of compressing it; we can already do audio and video, so one day we will probably be able to compress smell, taste and touch, assuming we actually have a way of capturing them. Still, the amount of data would be massive, and will probably be a whole new avenue for the porn industry.

    Granted, these are extremes, but who would have thought 15 years ago when we first started hitting the 1 gig barrier, that in 2008 we would have discs used for storing movies that have a capacity of 50 gig, and we would even consider saving stuff at a resolution of 1920x1080 and have PCM sound at a bitrate of 4.6Mbps?
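
    The 4.6Mbps figure matches uncompressed LPCM, whose bitrate is channels times sample rate times bits per sample. The 5.1-channel, 48 kHz, 16-bit configuration below is an assumption; it is one common setup that works out to that rate:

```python
# Uncompressed LPCM bitrate = channels * sample rate * bits per sample.
# 6 channels / 48 kHz / 16-bit is an assumed configuration.
def pcm_bitrate_bps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    return channels * sample_rate_hz * bits_per_sample

rate = pcm_bitrate_bps(channels=6, sample_rate_hz=48_000, bits_per_sample=16)
print(f"{rate / 1e6:.3f} Mbps")                              # megabits per second
print(f"{rate / 8 * 3600 / 1e9:.2f} GB per hour of audio")   # bytes, decimal GB
```

    That is 4.608 Mbps, or roughly 2 GB per hour for the audio track alone.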

    Give us the storage space, and we will find a use for it.

    1. Re:I could see practical applications by N!k0N · · Score: 1

      I am waiting for the day when neural implants can actually read the human brain, and as such, you can archive experiences to some type of storage medium. I am sure wikipedia has somewhere how much information the human brain processes a second.

      I don't know how *accurate* this is, but I ran across this...

      Current estimates of brain capacity range from 1 to 1000 terabytes! "Robert Birge (Syracuse University) who studies the storage of data in proteins, estimated in 1996 that the memory capacity of the brain was between one and ten terabytes, with a most likely value of 3 terabytes. Such estimates are generally based on counting neurons and assuming each neuron holds 1 bit. Bear in mind that the brain has better algorithms for compressing certain types of information than computers do."

      and this

      the processing power of a average brain to be about 100 million MIPS

      Couldn't find anything on Wikipedia, though.

    2. Re:I could see practical applications by Anonymous Coward · · Score: 0

      But you see, the porn industry drives lots of industries ;) Like the DVD market. haha.

    3. Re:I could see practical applications by Kjella · · Score: 1

      Granted, these are extremes, but who would have thought 15 years ago when we first started hitting the 1 gig barrier, that in 2008 we would have discs used for storing movies that have a capacity of 50 gig, and we would even consider saving stuff at a resolution of 1920x1080 and have PCM sound at a bitrate of 4.6Mbps?

      Actually, very many people did. The infamous Moore's "law" was well underway and everything was growing nice and exponential. What the future needs, though, is a bandwidth revolution: it's not "We can store 50GB" but "Why should I store 50GB?". Give me a fast enough pipe and I'll download on demand, delete, and if I want to watch it again a few years later I'll download it again. There's no need for millions of people each storing a multi-TB video archive.

      --
      Live today, because you never know what tomorrow brings
    4. Re:I could see practical applications by kipman725 · · Score: 1

      I would have thought by now we would have SACD-style bitstreams that simplify the A/D => D/A process by reducing the number of converters needed.

    5. Re:I could see practical applications by Anonymous Coward · · Score: 0

      Now let's say that we are an IMAX theater upgrading to the new IMAX Digital standard. I read not too long ago that an IMAX film is equivalent to 18K (most digital theaters project 2K, although some are now installing 4K systems). So, rather than keep big, massive film prints of the 20-year-old science documentaries we keep in rotation, we get the digital versions of them. Does anyone want to do the math?

      My train will be leaving New York heading south at an average speed of 50 mph. Another train is heading northwest from Florida and departs at the same time. At what average speed does the Florida train have to travel to meet up with the New York train?

      Does anyone want to do the math?
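      Taking the IMAX question at face value, here is a rough sketch of the arithmetic for raw, uncompressed frames. Every parameter is a hypothetical assumption: "18K" read as 18,000 pixels wide at roughly IMAX's 1.43:1 aspect ratio, 8-bit RGB, 24 fps, and a 40-minute documentary. Digital cinema distribution normally uses compressed formats (DCI specifies JPEG 2000), so this is not what a theater would actually store.

```python
def frame_bytes(width_px: int, height_px: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame."""
    return width_px * height_px * bytes_per_pixel


def film_bytes(width_px: int, height_px: int, bytes_per_pixel: int,
               fps: int, minutes: int) -> int:
    """Uncompressed film size: frame size x frames per second x duration in seconds."""
    return frame_bytes(width_px, height_px, bytes_per_pixel) * fps * minutes * 60


# All hypothetical: 18,000 x 12,600 px, 3 bytes/pixel, 24 fps, 40 minutes.
total = film_bytes(18_000, 12_600, 3, 24, 40)
print(f"{total / 1e12:.2f} TB uncompressed")  # 39.19 TB uncompressed
```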

    6. Re:I could see practical applications by Nutria · · Score: 1

      Give me a fast enough pipe and I'll download on demand, delete and if I want to watch it again a few years later I'll download it again.

      Until the site(s) hosting (and that includes BitTorrent seeds) "it" take(s) "it" down.

      It's foolish to rely on others when you can easily store "it" yourself.

      --
      "I don't know, therefore Aliens" Wafflebox1
  36. I've seen porn collections like that... by GameboyRMH · · Score: 1

    ...on virus-infested Windows PCs.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
    1. Re:I've seen porn collections like that... by OeLeWaPpErKe · · Score: 1

      ... among those a download virus by your hand? Otherwise, what's the point?

  37. Silly by Anonymous Coward · · Score: 0

    My database will never reach 640K.

  38. Chuck Norris by Anonymous Coward · · Score: 0

    The hard drive was invented by Chuck Norris and he gave IBM the permission to use it. The petabyte is just a keyword Chuck Norris uses to describe the way he can take down Johnny Lawbreakers with his teeth.

  39. My only concern by Anonymous Coward · · Score: 0

    Is whether this will run on Linux.

  40. How is this news? by Dark$ide · · Score: 4, Interesting

    We've had petabyte databases on mainframes for a good couple of years. DB2 v9 on zSeries has two new tablespace types that make managing these humungous databases much easier.

    So it may be news for the PC world but it's bordering on ancient history on IBM mainframes.

    --

    Sigs. We don't need no steenking sigs.

    1. Re:How is this news? by Anonymous Coward · · Score: 0

      So it may be news for the PC world but it's bordering on ancient history on IBM mainframes.

      But I believe the biggest database of them all is the Internet. Yes, Google is just an index into the biggest DB of them all: completely distributed, dynamic in every respect, and accessible over the World Wide Web to everyone who connects.

      I defend this assertion that the web is a database, since a database is a matter of perspective. I create a query and, along with billions of other queries every hour, seek out data.

  41. Gigabyte barrier? by MaxEmerika · · Score: 1

    He meant that the terabyte barrier (not the gigabyte barrier) was broken fifteen years ago, correct?

    1. Re:Gigabyte barrier? by JPLemme · · Score: 1

      I'm assuming so, because back in 1992 I remember reading that MCI had a 1 TB (!) database. It was big enough news to make it into PC Week.

    2. Re:Gigabyte barrier? by Ant+P. · · Score: 1

      That was probably a mistake. Even 20 years ago there were multiple-gigabyte single drives available - I came across a few in the trash a few years ago, and the date of manufacture was still legible on the cover.

  42. The 1-petabyte Barrier is flattened by cjjjer · · Score: 2, Informative

    It seems that Yahoo made this claim months ago, but for a 2-petabyte database. The article goes on to list a couple of others that have more than 2 petabytes of archived data. So it's safe to say that the petabyte data barrier has been broken for some time.

  43. I think this article can be a bit misleading by Anonymous Coward · · Score: 0

    The article is a bit misleading, and the numbers (IMHO) are a bit sensational. Saying you have a 1 Petabyte database is all fine and good, but how are you measuring that? Total database size? Raw input data size?

    I'm guessing it's the former, which skews the results. Any DBA knows that when you load data, you have to index it, maybe partition it, etc -- all of which lead to additional space allocation and overhead, inflating the total size of the database anywhere from 2-8x (or more) the original size.

    And there are databases out there that actually compress the raw data, making it MUCH smaller than it was originally (and they perform WAY better than the DBs in the article).

    Nothing exciting here.
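    To make the measurement question concrete, here is a minimal sketch that backs user data out of a total on-disk size. It assumes a simple model in which on-disk size equals compressed data times an overhead factor (indexes, partitions, and so on); the 2-8x overhead range comes from the comment above, while the 3:1 compression ratio is a hypothetical figure.

```python
PB = 10**15  # decimal petabyte
TB = 10**12  # decimal terabyte


def user_data_bytes(total_bytes: int, overhead_factor: float,
                    compression_ratio: float = 1.0) -> float:
    """Estimate raw user data inside a database of a given on-disk size.

    overhead_factor: on-disk size / stored data size (indexes, partitions, etc.).
    compression_ratio: raw size / compressed size; 1.0 means no compression.
    """
    return total_bytes * compression_ratio / overhead_factor


for overhead in (2, 4, 8):
    tb = user_data_bytes(1 * PB, overhead) / TB
    print(f"1 PB on disk, {overhead}x overhead -> {tb:.0f} TB of user data")
# 500, 250 and 125 TB respectively

# Hypothetical 3:1 compression with 2x overhead: user data can exceed disk size.
print(f"{user_data_bytes(1 * PB, 2, 3.0) / TB:.0f} TB")  # 1500 TB
```

    Under this model, "1 PB database" can mean anything from 125 TB to well over a petabyte of user data, which is why the distinction matters.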

    1. Re:I think this article can be a bit misleading by CurtMonash · · Score: 1

      how are you measuring that? Total database size? Raw input data size?

      It's true user data. I make a point of that.

      Following through to the links re Teradata gives a sense of what kind of back and forth that can engender.

      CAM

      --
      To err is human. To forgive is good system design.
  44. Raw storage vs "actively analyzed" data sizes by Wooly41 · · Score: 1

    One thing to keep in mind in this whole argument is the difference between the system capacity of a given data warehouse deployment and the amount of data being actively analyzed. For example, MySpace has a ~400TB data warehouse holding roughly 120TB of user data. I'm not sure whether some of the references above are counting "active" data or "capacity". http://www.asterdata.com/
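    Using the MySpace figures quoted above, a tiny sketch of the capacity-versus-user-data ratio and what it would imply for a system quoted by raw capacity. Applying the same ratio to a 1,000 TB system is a hypothetical extrapolation, not a claim about any particular deployment.

```python
def active_fraction(user_tb: float, capacity_tb: float) -> float:
    """Share of a warehouse's capacity that actually holds user data."""
    return user_tb / capacity_tb


frac = active_fraction(120, 400)
print(f"{frac:.0%} of capacity holds user data")  # 30% of capacity holds user data

# Hypothetical: a "petabyte" system (1,000 TB of capacity) at the same ratio.
print(f"{1000 * frac:.0f} TB of user data")  # 300 TB of user data
```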

  45. Yawn by Anonymous Coward · · Score: 0

    I work in big storage. We have customers who want support for petabyte-sized _files_. I know of at least one company that was looking to buy several PB of SAN storage a month.

  46. Small potatoes by ctnp · · Score: 1

    The NOAA lab I work at is up to 16 petabytes now. Must've broken the 1 petabyte mark several years ago. :P

  47. LHC data production by SlowMovingTarget · · Score: 3, Informative

    So when active, the Large Hadron Collider will generate the equivalent of 50 Libraries of Congress worth of data every second.

  48. Girls Gone Wild by Anonymous Coward · · Score: 0

    This is kind of my point. Do companies keep libraries of pr0n, video, music?

    The video production team of Girls Gone Wild does.

  49. Please, Bill says... by Anonymous Coward · · Score: 0

    640 terabytes ought to be more than enough for anyone. There, now you know how big your HD will need to be to qualify as Vista Capable.

  50. What about Aster Data? by cecchet · · Score: 1

    The post surprisingly does not mention Aster Data Systems, which is the data warehouse behind MySpace. When web sites start to store and analyze every single user click, you quickly get into massive amounts of data. It's no surprise that the petabyte barrier is being reached, especially with storage density increasing at constant cost.

    1. Re:What about Aster Data? by CurtMonash · · Score: 1

      The post surprisingly does not mention Aster Data Systems, which is the data warehouse behind MySpace. When web sites start to store and analyze every single user click, you quickly get into massive amounts of data. It's no surprise that the petabyte barrier is being reached, especially with storage density increasing at constant cost.

      I met with Aster Data last Thursday, and will be writing about them soon. Aster's MySpace installation is a big database. But it's not petabyte-scale yet.

      CAM

      --
      To err is human. To forgive is good system design.
  51. Pet peeve: misuse of "barrier" by JoeBuck · · Score: 1

    Round numbers are not "barriers"; they are just round numbers. The term "barrier" should only be used when there is something special about the number that creates special engineering challenges to overcome.

    Example: the sound barrier. The aerodynamics of an airplane are completely different above the speed of sound than below it, so it was a real barrier that required engineering effort to overcome.

    Another barrier had to do with fabricating electronic components when the feature size became substantially smaller than the wavelength of light used to expose the masks. My old textbooks said it couldn't be done, but thanks to optical proximity correction and phase-shift masking, we can fabricate 45nm technology semiconductors with a 193nm ultraviolet light source.

    But there is no radical new technology innovation needed to make a database just a bit bigger, even if an extra zero gets added to its size.

    1. Re:Pet peeve: misuse of "barrier" by CurtMonash · · Score: 1

      You have a point.

      But the nice round numbers lead to marketing false alarms, so I think it's noteworthy when hype gives way to reality.

      This also happens to be an area that lends itself to round numbers right now, since 10 terabytes is about the level where Oracle has totally run out of gas, and 100 terabytes used to be the hard limit on Netezza configurations.

      CAM

      --
      To err is human. To forgive is good system design.
  52. Greenplum is based on Postgresql by TheNarrator · · Score: 1

    From the Greenplum article mentioned in the summary:

    Most or all of the PostgreSQL data access methods are left intact. The big changes to PostgreSQL lie in the areas of query optimization, planning, and execution. I.e., Greenplum has its own way of breaking up a query into pieces (and, of course, of seeing that data gets shipped among nodes), but the low-level operators for storage and access are from PostgreSQL.
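    As a rough illustration of what "breaking up a query into pieces" can mean in a shared-nothing system, here is a minimal scatter/gather sketch of a GROUP BY SUM: each segment aggregates only its local rows, and a coordinator merges the partial results, so only small partial aggregates travel between nodes. This is a generic textbook pattern, not Greenplum's actual planner or executor; the segment layout and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

Row = tuple[str, int]  # (group key, value), a hypothetical row shape


def partial_sum(rows: Iterable[Row]) -> Counter:
    """Runs on each segment: aggregate only the rows stored locally."""
    acc = Counter()
    for key, value in rows:
        acc[key] += value
    return acc


def merge(partials: Iterable[Counter]) -> dict[str, int]:
    """Runs on the coordinator: add up the per-segment partial sums."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Hypothetical data spread across three segments.
segments = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 1)],
    [("west", 3)],
]
print(merge(partial_sum(seg) for seg in segments))
# {'east': 17, 'west': 8, 'north': 1}
```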

  53. Pedabyte databases over a decade old by Anonymous Coward · · Score: 0

    Ten years ago I was working in a bank that was dealing with a 4 pedabyte database. Check images stored electronically have to be retained for seven years, front and back.

    It's probably a lot bigger than that now.

    1. Re:Pedabyte databases over a decade old by Anonymous Coward · · Score: 0

      I think you mean Pedobite.

  54. More long-tail economics! by mcrbids · · Score: 4, Interesting

    On the other hand, if you have the extra space, it invites the usual waste in the form of archive directories for closed-out years, development junk, etc. Spinning round and round, doing nothing.

    Yep. That's exactly it. $200 today buys a 1 TB drive. $200 a few years ago bought a 1 GB drive. As the price has fallen, the value of the HDD has risen relative to its cost. Those archive directories and development junk aren't being deleted because they have value. Sure, it's not enough value to justify keeping them around when a 1 GB drive costs $200, but they are worth keeping around when a 1 TB drive costs that much.

    They aren't "doing nothing"; they just weren't worth keeping until the price dropped far enough.

    All of this makes the 1 TB drive considerably more valuable than the 1 GB drive, despite their identical purchase prices. This is long-tail economics at work. As the individual bits become worth less and less, the value of the bits in total continues to rise, resulting in a completely new set of capabilities.
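    Using the comment's own figures ($200 for a 1 TB drive today versus $200 for a 1 GB drive a few years earlier, with decimal units assumed), the per-gigabyte arithmetic behind the argument looks like this:

```python
def dollars_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Cost of one gigabyte of raw disk."""
    return price_usd / capacity_gb


old = dollars_per_gb(200, 1)     # 1 GB drive
new = dollars_per_gb(200, 1000)  # 1 TB drive, decimal: 1,000 GB
print(f"then: ${old:.2f}/GB, now: ${new:.2f}/GB, {old / new:.0f}x cheaper")
# then: $200.00/GB, now: $0.20/GB, 1000x cheaper
```

    A file whose keep-or-delete decision hinged on a few dollars of disk at the old price costs a thousandth of that now, which is the "marginal-value bits" point in a single number.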

    My DVR is an excellent example of this - it's a thorough change in the way that I watch television. Suddenly, it's a family event that we can all share, because when I want to comment, I can just hit pause and share my thought. Nothing's lost; if needed, we can just rewind a bit. Suddenly, instead of being annoyed at my daughter for wanting to comment on a point during a televised debate, I'm excited and interested! No more SHUSHing my family; it's now a much more shared experience.

    The price of nonlinear access media has dropped so incredibly that marginal-value bits (like video) are suddenly cheap enough to make it all possible.

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
  55. Teradata by Thuktun · · Score: 1

    Teradata may have crossed the 1-petabyte mark by now too.

    Sounds like it was precipitously named, then.

  56. Xiotech is doing it by Lxy · · Score: 1

    Check out the Emprise 7000. It scales from 1 TB to 1 PB.

    If you unracked it, you could squeeze it all into a single Volkswagen, yielding 100 LoC/VW (Libraries of Congress per Volkswagen).

    --

    There is no reasonable defense against an idiot with an agenda
    :wq
    1. Re:Xiotech is doing it by XHIIHIIHX · · Score: 1

      OMG, imagine the bandwidth of that sucker screaming down the highway flat out at 71 mph.

    2. Re:Xiotech is doing it by DamienRBlack · · Score: 1

      With last-mile problems around here, that may end up being the solution.
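      For the record, the "never underestimate the bandwidth of a car full of disks" arithmetic is easy to sketch. This assumes the full 1 PB from the Xiotech comment (decimal, 10^15 bytes), the 71 mph above, and a hypothetical 100-mile trip, and it ignores the time to unrack and re-rack everything.

```python
def sneakernet_bps(payload_bytes: float, distance_miles: float, speed_mph: float) -> float:
    """Effective throughput of physically driving the data to its destination."""
    seconds = distance_miles / speed_mph * 3600
    return payload_bytes * 8 / seconds


# Hypothetical trip: 1 PB, 100 miles, 71 mph.
bps = sneakernet_bps(1e15, 100, 71)
print(f"{bps / 1e12:.2f} Tbit/s")  # 1.58 Tbit/s
```

      Throughput is enormous; the catch is latency, which here is the drive time (about 85 minutes) before the first byte arrives.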

  57. Marketing gigs, or real gigs? by newr00tic · · Score: 0

    The problem is whether the "stated carrying capacity" is given in "fake" (decimal, marketing) gigs or real (binary) gigs. That's a recipe for unfortunate events if he gets it wrong.
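    The gap between decimal (SI, what drive vendors quote) and binary (what many operating systems report) units grows with each prefix. A small sketch of how far each decimal unit falls short of its binary counterpart, using the IEC names (kibi, mebi, gibi, tebi, pebi):

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def decimal_to_binary_ratio(power: int) -> float:
    """How many binary units (1024**power bytes) one decimal unit (1000**power bytes) equals."""
    return 1000 ** power / 1024 ** power


for power, name in enumerate(PREFIXES, start=1):
    ratio = decimal_to_binary_ratio(power)
    binary_name = name[:2] + "bibytes"  # kibibytes, mebibytes, ...
    print(f"1 {name}byte = {ratio:.4f} {binary_name} ({1 - ratio:.1%} short)")
# last line: 1 petabyte = 0.8882 pebibytes (11.2% short)
```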

    --
    A horse can't be sick, you know, even if he wants to.
  58. $200? by White+Flame · · Score: 1

    $200 today buys a 1 TB drive.

    For $200 you could almost get two 1TB drives.

  59. SizeMeNow by Petja42 · · Score: 1

    ...And for all your, erm, "entertainment" collection sorting and comparison needs, you can use a little app appropriately called SizeMeNow to see how much space each folder uses:

    SizeMeNow Info
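    For anyone who would rather not install anything, here is a minimal sketch of the same idea (total bytes under each immediate subfolder) using only Python's standard library. It is an independent illustration, not how SizeMeNow works; it skips symlinks and silently ignores unreadable files.

```python
import os


def tree_size(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks and unreadable entries."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or permission denied
    return total


def folder_sizes(top: str) -> list[tuple[str, int]]:
    """(name, bytes) for each immediate subdirectory of top, largest first."""
    sizes = []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, tree_size(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

    Usage: `for name, size in folder_sizes("."): print(f"{size / 1e9:10.2f} GB  {name}")` lists the current directory's subfolders, biggest first.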