Slashdot Mirror


CERN Releases 300TB of Large Hadron Collider Data Into Open Access (techcrunch.com)

An anonymous reader writes: The European Organization for Nuclear Research, known as CERN, has released 300 terabytes of collider data to the public. "Once we've exhausted our exploration of the data, we see no reason not to make them available publicly," said Kati Lassila-Perini, a physicist who works on the Compact Muon Solenoid detector. "The benefits are numerous, from inspiring high school students to the training of the particle physicists of tomorrow. And personally, as CMS's data preservation coordinator, this is a crucial part of ensuring the long-term availability of our research data," she said in a news release accompanying the data. Much of the data is from 2011, and much of it is from protons colliding at 7 TeV (teraelectronvolts). The 300 terabytes of data includes both raw data from the detectors and "derived" datasets. CERN is providing tools to work with the data which is handy.

60 comments

  1. Finally by Anonymous Coward · · Score: 0

    A legitimate use for my seedbox!

  2. Pseudoscientists of the world, unite! by Lisandro · · Score: 5, Insightful

    I just can visualize a horde of crackpots using this data to fuel fringe theories, find messages from God and prove the existence of aliens.

    That being said, this is awfully cool from CERN. The raw data will be really useful in academic environments, and the Linux visualization tools are great.

    1. Re:Pseudoscientists of the world, unite! by Anonymous Coward · · Score: 2, Funny

      I just can visualize a horde of crackpots using this data to fuel fringe theories

      I heard from good authority that the LHC breached a planar dimension, and one of its red/white striped inhabitants escaped into the LHC data stream.
      So now they're releasing the data into the public in the hopes that someone will find this wimpy alien lifeform data object (waldo)...

    2. Re:Pseudoscientists of the world, unite! by TheRealHocusLocus · · Score: 1

      "Once we've exhausted our exploration of the data, we see no reason not to make them available publicly,"

      Actually they're just sifting and patching out the winning lottery numbers first. In these ~300TB dregs you'd be lucky to find a Pick 3. Best suggestion is to make a list of numbers absent from the data and play those.

      There's also a lot of Quantum Space Spam in it, such as embedded 3D jpgs meant to be projected into 4D space showing reproductive attachments for higher dimensional beings.

      --
      <blink>down the rabbit hole</blink>
    3. Re:Pseudoscientists of the world, unite! by starless · · Score: 3, Interesting

      Data from most NASA astronomy satellites is available after a specified amount of time.
      e.g. Hubble Space Telescope data are available after one year, and Fermi gamma-ray space telescope data are available as soon as it's processed (within one day).
      Software tools are also publicly available along with software support.

      Nice to see particle physicists catching up with astronomers on data release!

    4. Re: Pseudoscientists of the world, unite! by Anonymous Coward · · Score: 1

      CERN has been releasing a lot of data since the 90s. The hard part is not deciding to release it and copy-pasting the data to some webserver, but documenting it and making sure there are tools to work with the data. Both the particle physics and astronomy fields have put a lot of time and money into developing these tools for decades. It is now at the point where other smaller, data-heavy projects can take advantage of the same tools with a lot less investment of manpower. But it still often requires a dedicated person on smaller projects, which may not have one to spare.

    5. Re:Pseudoscientists of the world, unite! by Altrag · · Score: 1

      horde of crackpots using this data to fuel fringe theories

      To be fair, most crackpots would manage to fuel their fringe theories just as well without this data.

    6. Re:Pseudoscientists of the world, unite! by Anonymous Coward · · Score: 0

      I'm pretty sure that particle physicists in the past have been reluctant to release their data not because they were mentally stuck in the 19th century, but because there has not been such an interest in the data (mainly, i guess, because of a lack of amateur-usable tools and some advanced physics is required to make sense of the results).

      in astronomy, the tools have been there for ages (i remember the ESO Scisoft package, dpuser, qfitsview ...) and most if not all of the data could be processed with those. they may not have been the specialized tools that some astronomers i know wrote for a specific problem, but they did the job. i have a hard time trying to remember similar tools for collider experiments, since, unlike telescopes, every detector is more or less one-of-a-kind.

    7. Re:Pseudoscientists of the world, unite! by RockDoctor · · Score: 1
      But this saves them from the delays of writing a random-number generator to invent their supporting data with.

      I'm still expecting to see the crackpots doing the logical equivalent of reading from a book held upside down.

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
  3. Download cap? by Anonymous Coward · · Score: 0

    Should only take 5 million years or so to download...how much is each extra meg after 1GB again???

  4. No reason not to make them available publicly ? by x0ra · · Score: 5, Insightful

    If I'm not mistaken, the LHC has been publicly funded, so these data should have been public to start with. Anything else is bs.

    1. Re:No reason not to make them available publicly ? by religionofpeas · · Score: 2

      It wasn't publicly funded by the entire world, though, so it makes sense to restrict the data sharing to the scientists of the countries that helped fund it.

    2. Re:No reason not to make them available publicly ? by Anonymous Coward · · Score: 0

      They needed time to photoshop in the higgs boson.

    3. Re:No reason not to make them available publicly ? by BitterOak · · Score: 4, Insightful

      If I'm not mistaken, the LHC has been publicly funded, so these data should have been public to start with. Anything else is bs.

      It's standard practice in experimental particle physics to give those who put the time and effort into designing, building, and running the experiment the first chance to analyze the data and publish results. After that, it's not unusual to release the raw data publicly. Otherwise, there'd really be no incentive to do the work, since someone else could swoop in and publish results without having contributed to producing the data.

      --
      If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
    4. Re:No reason not to make them available publicly ? by NotInHere · · Score: 0

      And the miniature black hole they created, that wandered to the center of the earth since and that will eat up this planet from inside over the next few decades. Wake up, sheeple, we must leave earth before it's too late! Scientist's experiment gone mad!

    5. Re:No reason not to make them available publicly ? by 110010001000 · · Score: 2

      You mean the taxpayers of the countries funding it. After all, not only scientists were paying for it out of their taxes.

    6. Re:No reason not to make them available publicly ? by Anonymous Coward · · Score: 0

      And the miniature black hole they created, that wandered to the center of the earth since and that will eat up this planet from inside over the next few decades. Wake up, sheeple, we must leave earth before it's too late! Scientist's experiment gone mad!

      Like that bad Sci-Fi channel movie where scientists unleashed a black hole in St. Louis, Missouri from the St. Louis Science Center? Stupid shit.

    7. Re:No reason not to make them available publicly ? by joe_frisch · · Score: 1

      Unfortunately it's not simple. Scientists and the organizations that they work for are judged based on their publications. So a lab that spent a lot of money to build a new experiment needs to show a lot of publications from that experiment or it won't get future funding.

      It's not a great system, but it is what is in place and it's not obvious how to do better.

  5. 300TB by symes · · Score: 1

    I understand there are tools to work with these data, but even so, 300TB is a lot. Wouldn't it be better, assuming they want to encourage future generations of particle physicists, to open source the tools and provide better instruction on how one should manage these data? That seems like half the problem. No way will anyone in high school download 300TBs to play with. Even if they could, what would they use to play with it?

    1. Re:300TB by religionofpeas · · Score: 1

      I assume you don't need all the data if you just wanted it for education purposes.

    2. Re: 300TB by prefec2 · · Score: 1

      Nobody will use it in high school where people have problems with calculus. It might be helpful in college and university.

    3. Re:300TB by NotInHere · · Score: 1

      Yeah certainly, building the largest particle collider in the world is way easier than copying 300 TB of data. And it will be way more fun, too!

    4. Re:300TB by Jumunquo · · Score: 1

      You manage it on a thumb drive, DUH! Make sure it's USB 3.0! Load the data into Excel, and you can make pretty graphs.

    5. Re:300TB by hackertourist · · Score: 2

      Congratulations on not following TFLinks. They did open-source the tools and provide instructions.
      You also don't need to download the entire 300 TB, the data is divided into batches.

      Available on the CERN Open Data Portal - which is built in collaboration with members of CERN's IT Department and Scientific Information Service - the collision data are released into the public domain under the CC0 waiver and come in two types: The so-called 'primary datasets' are in the same format used by the CMS Collaboration to perform research. The 'derived datasets' on the other hand require a lot less computing power and can be readily analysed by university or high-school students, and CMS has provided a limited number of datasets in this format.

      Notably, CMS is also providing the simulated data generated with the same software version that should be used to analyse the primary datasets. Simulations play a crucial role in particle-physics research and CMS is also making available the protocols for generating the simulations that are provided. The data release is accompanied by analysis tools and code examples tailored to the datasets. A virtual-machine image based on CernVM, which comes preloaded with the software environment needed to analyse the CMS data, can also be downloaded from the portal.

    6. Re:300TB by Anonymous Coward · · Score: 0

      A typical capped cable connection gives you about three hundred gigabytes a month. This would take one thousand months to download, or eighty-three years.
      Yeah, you could probably build an LHC faster than that.

      Also, it would take thirty to sixty hard drives just to store it.
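      The figures above can be checked with a quick back-of-the-envelope sketch. The 300 GB monthly cap is the commenter's number; the 5 TB and 10 TB drive sizes are assumptions chosen because they reproduce the "thirty to sixty" range, and decimal units (1 TB = 1000 GB) are assumed throughout.

```python
import math

TOTAL_TB = 300            # size of the release, from the story
CAP_GB_PER_MONTH = 300    # monthly cap quoted in the comment
GB_PER_TB = 1000          # assumption: decimal units


def months_to_download(total_tb, cap_gb_per_month):
    """Months needed if the whole cap is spent on this download."""
    return total_tb * GB_PER_TB / cap_gb_per_month


def drives_needed(total_tb, drive_tb):
    """Whole drives needed to hold the data, ignoring redundancy and overhead."""
    return math.ceil(total_tb / drive_tb)


months = months_to_download(TOTAL_TB, CAP_GB_PER_MONTH)
print(f"{months:.0f} months, about {months / 12:.0f} years")

# Assumed drive sizes; larger drives mean fewer of them.
for drive_tb in (5, 10):
    print(f"{drive_tb} TB drives: {drives_needed(TOTAL_TB, drive_tb)}")
```

      Since the data is divided into batches, the same functions can be called with the size of a single batch instead of the full 300 TB.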

  6. Library of Congresses? by Macdude · · Score: 1

    300 TB?
    How many Libraries of Congress is that?

    --
    "Grab them by the pussy" -- President of the United States of America
    1. Re:Library of Congresses? by Megol · · Score: 2

      US or metric LoCs?

    2. Re:Library of Congresses? by Anonymous Coward · · Score: 0

      I think it would be metric LoC because CERN is European. That means the data trove is likely much less than 300TB.

    3. Re:Library of Congresses? by Anonymous Coward · · Score: 0

      300 TB is now defined as 1 CERN

  7. Re:Download cap? by dsmatthews9379 · · Score: 1

    Yeah you really need to upgrade from Telex to something a bit more modern.

  8. Re: No reason not to make them available publicly by Anonymous Coward · · Score: 0

    Staff was paid.

  9. Re: No reason not to make them available publicly by prefec2 · · Score: 2

    It was available to all scientists of the funding and visiting countries. Now that the scientists are through with it you can have a look too.

  10. More useful delivery by Anonymous Coward · · Score: 0

    It may be better to stick this behind an API of some form where we can call subsets of the data. No one on earth, outside of a handful of people, would have the infrastructure to play with this. It's not like we have 300TB SANs in our homes or schools.

    With an API some useful things like sampling, etc, could already be performed and made available alongside the raw data. If people really wanted more than an API could deliver, they could define a subset and have the API generate iso images of that data for download.

    1. Re:More useful delivery by armanox · · Score: 1

      It's not like we have 300TB SANs in our homes or schools.

      Not yet, anyway, but in the next five to ten years that might not be a problem any longer. Plus I would hope the data is in a more manageable form than just one giant tarball (is there any file system that allows for an individual file that big anyway?)

      --
      I'm starting to think GNU is the problem with "GNU/Linux" these days.
    2. Re:More useful delivery by Anonymous Coward · · Score: 0

      > is there any file system that allows for an individual file that big anyway?

      Pretty much any modern file system? Specific examples: BTRFS, ZFS. Even some ancient ones like JFS, and NTFS if you don't use Windows' own crippled driver for it.

    3. Re: More useful delivery by Anonymous Coward · · Score: 0

      I run a student organization that has close to 150 TB running in production, with a couple of hundred TBs in reserve. We have no budget and run only on donated hardware. Getting storage like that is not hard if you want to use it for good reasons.

    4. Re:More useful delivery by Anonymous Coward · · Score: 1

      The place I work for has a 374TB SAN that's EoL. We got some quotes for re-sale and best offer we were given was £3000.

      So a bit of a chunk of cash but not crazy.

  11. Re: No reason not to make them available publicly by x0ra · · Score: 1

    It should have been available to the whole population...

  12. Re:Download cap? by NotInHere · · Score: 2

    By the time you have downloaded the 300 TB, they'll have built another, bigger, particle collider, and released an even bigger tarball about that one.

  13. Re: No reason not to make them available publicly by Anonymous Coward · · Score: 0

    how many people have 300 TB of storage? it won't fit on your iPhone, buddy.

  14. El Psy Congaroo by SeaFox · · Score: 1

    Maybe now, we can unlock the mysteries of Steins Gate! Mwahaha!

  15. Re: No reason not to make them available publicly by prefec2 · · Score: 1

    It is now. Before that the people who developed the experiments got first access. I personally understand that perfectly. They invested decades of their lives.

  16. After reading through 300TB, you'll be at... by Jumunquo · · Score: 1

    human is dead, mismatch.

  17. Re: No reason not to make them available publicly by Anonymous Coward · · Score: 0

    I know I know, replying to AC, but you think the staff cared about being paid?!

  18. Re: No reason not to make them available publicly by Anonymous Coward · · Score: 0

    "If I'm not mistaken, the LHC has been publicly funded, so these data should have been public to start with. Anything else is bs."
    "It should have been available to the whole population..."

    Such massive ignorance about how International Science actually works means only one thing- x0ra is not very... bright.
    So the Cern Supreme Soviet Central Committee just met and decided that everybody with the means and intelligence, and who wanted access to the raw and filtered data, could have it- except for x0ra, who is a poopy-head. He even admits it; this is just from the last month:

    "...What you call "press" is an awfully one sided propaganda machine selling the UN agenda...."
    "...I don't mind being called a racist / bigot / whatever..."
    "...hummm! my dick just getting hard again !..."
    "did you mean http://goatse.info/ ?"
    "we're not only trying to fuck "stuff", but pussies, tities, mouthes, asses as well !"
    "...I *do* believe we have a population problem. Hopefully, war / disease / climate change will take care of the problem within the next decades."
    "I'd like to, but libtards fuck it up beyond recognition."
    "Oh, and by the way, while I'm a pretty selfish prick,..."
    "SJW need targets to spur their hatred..."
    "Screw millennials, they need a crash course into real life...
    "Typical SJW asshole argument..."
    "yeah, I always forgot /. is a hideout for SJW and other anti-capitalist crypto-communist anarchists..."
    "We should ban bananas !"
    "my dominant hand is busy doing something else..."

    Some may object that I'm quoting x0ra out of context. In context, he is even worse. But I still think that he should stay around as a cautious example for our Youth- One should not put a diaper on their head after already soiling it.

  19. Great! by Anonymous Coward · · Score: 0

    Can't wait to print it all out!

  20. Re: No reason not to make them available publicly by AmiMoJo · · Score: 1

    Cool. Where's the torrent? It's not in TPB yet.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  21. Raises the bar by SkyratesPlayer · · Score: 3, Funny

    Before this, the largest collection of collision data was the Russian dash-cam footage on YouTube

  22. 300 TB. How many floppies? by edxwelch · · Score: 1

    Just curious how many floppy disks would it take to store 300 TB?

    1. Re:300 TB. How many floppies? by SkyratesPlayer · · Score: 1

      Let's assume 3.5" form factor. Using HD floppies, you need about 200 million. The one on my desk is 3.25 mm thick, so they'd make a stack about 660 km tall. In the spirit of Randall Munroe's What If, it would of course collapse and kill you long before it got that high.
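      A quick sketch of the floppy arithmetic, for anyone who wants to redo it. The per-disk capacity of 1,474,560 bytes (a formatted "1.44 MB" HD disk) and the decimal terabyte are assumptions; the 3.25 mm thickness is the figure measured above. Under those assumptions it comes out near 200 million disks and a stack of roughly 660 km.

```python
TOTAL_BYTES = 300 * 10**12   # assumption: decimal terabytes
FLOPPY_BYTES = 1_474_560     # assumption: formatted 3.5" HD disk (1440 KiB)
THICKNESS_MM = 3.25          # thickness quoted in the comment


def floppies_needed(total_bytes, floppy_bytes=FLOPPY_BYTES):
    """Whole disks needed, rounding up (integer ceiling division)."""
    return -(-total_bytes // floppy_bytes)


def stack_height_km(disks, thickness_mm=THICKNESS_MM):
    """Height of a single stack of disks, in kilometres."""
    return disks * thickness_mm / 1_000_000


disks = floppies_needed(TOTAL_BYTES)
print(f"{disks:,} disks, stack about {stack_height_km(disks):.0f} km tall")
```

      Swapping in a different capacity (say, for 8-inch disks) only needs a different `floppy_bytes` argument.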

    2. Re:300 TB. How many floppies? by HiThere · · Score: 1

      Why not use the 8 inch hard sectored floppy disks? Of course there's nowhere you could either read or write them... but they were a bit thinner, and I think they stored all of 100KB.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    3. Re:300 TB. How many floppies? by SkyratesPlayer · · Score: 1

      Why not use the 8 inch hard sectored floppy disks? Of course there's nowhere you could either read or write them... but they were a bit thinner, and I think they stored all of 100KB.

      Because I am willing to bet there were more 3.5" floppies made than all other types of removable media put together. I have a DEC RX-02 somewhere, but haven't had to use it since 1998... ISTR they held about 500kB.

  23. Re: No reason not to make them available publicly by Anonymous Coward · · Score: 0

    Sure, staff cares about being paid. However, research is research... and speaking as a long-time researcher, having worked in several major universities, there is no way to know on a given day whether a colleague is onto the next great idea or is staring at the wall. Frankly, most of the time, I don't even know if I'm wasting my time or not. So some basic salary to keep people in academic research is needed, and then some extra perks are still required too. Since we judge people (for hiring, for promotions, ...) based on publications, rights to first publish results you worked on are standard throughout science.

  24. There sure are a lot... by flargleblarg · · Score: 1

    Sure are a lot of articles about the Large Hardon Collider lately.

  25. Download Link by Anonymous Coward · · Score: 1

    http://opendata.cern.ch/about/CMS

  26. Re: No reason not to make them available publicly by Baloroth · · Score: 2

    Why? What interest does the general population have in access to the LHC data? They've already released a subset of the data for educational purposes, in addition to this considerable data dump. It serves no public interest to make the whole data set available to everyone, and in fact would run contrary to the public interest: the data set is absolutely massive (the LHC produces petabytes of data per day), and the costs associated with making that data available to the public would be non-negligible.

    If a specific individual is interested in access to the data, they're certainly free to email their local (or not even necessarily local) university department associated with the LHC and ask for it, and they could probably get access to a subset of it, if they've shown genuine interest. And by "genuine interest", I mean have already downloaded, processed, examined, and understand much of the already publicly available data, to the point where they are capable of performing actual scientific research on the data, and aren't simply interested in wasting already-precious scientific research money and time in making some kind of political or philosophical point.

    --
    "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
  27. But what about the terrorists? by Anonymous Coward · · Score: 0

    CERN releases data. And the Jihad begins?

  28. interesting but by Anonymous Coward · · Score: 0

    ponder how high tech God is.

    Jesus is the way to God.