Slashdot Mirror


CERN Releases 300TB of Large Hadron Collider Data Into Open Access (techcrunch.com)

An anonymous reader writes: The European Organization for Nuclear Research, known as CERN, has released 300 terabytes of collider data to the public. "Once we've exhausted our exploration of the data, we see no reason not to make them available publicly," said Kati Lassila-Perini, a physicist who works on the Compact Muon Solenoid detector. "The benefits are numerous, from inspiring high school students to the training of the particle physicists of tomorrow. And personally, as CMS's data preservation coordinator, this is a crucial part of ensuring the long-term availability of our research data," she said in a news release accompanying the data. Much of the data is from 2011, and much of it is from protons colliding at 7 TeV (teraelectronvolts). The 300 terabytes of data include both raw data from the detectors and "derived" datasets. CERN is also providing tools to work with the data, which is handy.

13 of 60 comments

  1. Pseudoscientists of the world, unite! by Lisandro · · Score: 5, Insightful

    I just can visualize a horde of crackpots using this data to fuel fringe theories, find messages from God and prove the existence of aliens.

    That being said, this is awfully cool from CERN. The raw data will be really useful in academic environments, and the Linux visualization tools are great.

    1. Re:Pseudoscientists of the world, unite! by Anonymous Coward · · Score: 2, Funny

      I just can visualize a horde of crackpots using this data to fuel fringe theories

      I heard from good authority that the LHC breached a planar dimension, and one of its red/white striped inhabitants escaped into the LHC data stream.
      So now they're releasing the data into the public in the hopes that someone will find this wimpy alien lifeform data object (waldo)...

    2. Re:Pseudoscientists of the world, unite! by starless · · Score: 3, Interesting

      Data from most NASA astronomy satellites is available after a specified amount of time.
      E.g., Hubble Space Telescope data are available after one year, and Fermi Gamma-ray Space Telescope data are available as soon as they are processed (within one day).
      Software tools are also publicly available along with software support.

      Nice to see particle physicists catching up with astronomers on data release!

  2. No reason not to make them available publicly ? by x0ra · · Score: 5, Insightful

    If I'm not mistaken, the LHC has been publicly funded, so these data should have been public to start with. Anything else is bs.

    1. Re:No reason not to make them available publicly ? by religionofpeas · · Score: 2

      It wasn't publicly funded by the entire world, though, so it makes sense to restrict the data sharing to the scientists of the countries that helped fund it.

    2. Re:No reason not to make them available publicly ? by BitterOak · · Score: 4, Insightful

      If I'm not mistaken, the LHC has been publicly funded, so these data should have been public to start with. Anything else is bs.

      It's standard practice in experimental particle physics to give those who put the time and effort into designing, building, and running the experiment the first chance to analyze the data and publish results. After that, it's not unusual to release the raw data publicly. Otherwise, there'd really be no incentive to do the work, since someone else could swoop in and publish results without having contributed to producing the data.

      --
      If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?

    3. Re:No reason not to make them available publicly ? by 110010001000 · · Score: 2

      You mean the taxpayers of the countries funding it. After all, not only scientists were paying for it out of their taxes.

  3. Re: No reason not to make them available publicly by prefec2 · · Score: 2

    It was available to all scientists of the funding and visiting countries. Now that the scientists are through with it, you can have a look too.

  4. Re:Library of Congresses? by Megol · · Score: 2

    US or metric LoCs?

  5. Re:Download cap? by NotInHere · · Score: 2

    By the time you have downloaded the 300 TB, they'll have built another, bigger, particle collider, and released an even bigger tarball about that one.

  6. Re:300TB by hackertourist · · Score: 2

    Congratulations on not following TFLinks. They did open-source the tools and provide instructions.
    You also don't need to download the entire 300 TB; the data is divided into batches.

    Available on the CERN Open Data Portal - which is built in collaboration with members of CERN's IT Department and Scientific Information Service - the collision data are released into the public domain under the CC0 waiver and come in two types: The so-called 'primary datasets' are in the same format used by the CMS Collaboration to perform research. The 'derived datasets', on the other hand, require a lot less computing power and can be readily analysed by university or high-school students, and CMS has provided a limited number of datasets in this format.

    Notably, CMS is also providing the simulated data generated with the same software version that should be used to analyse the primary datasets. Simulations play a crucial role in particle-physics research and CMS is also making available the protocols for generating the simulations that are provided. The data release is accompanied by analysis tools and code examples tailored to the datasets. A virtual-machine image based on CernVM, which comes preloaded with the software environment needed to analyse the CMS data, can also be downloaded from the portal.

  7. Raises the bar by SkyratesPlayer · · Score: 3, Funny

    Before this, the largest collection of collision data was the Russian dash-cam footage on YouTube

  8. Re: No reason not to make them available publicly by Baloroth · · Score: 2

    Why? What interest does the general population have in access to the LHC data? They've already released a subset of the data for educational purposes, in addition to this considerable data dump. It serves no public interest to make the whole data set available to everyone, and in fact would run contrary to the public interest: the data set is absolutely massive (the LHC produces petabytes of data per day), and the costs associated with making that data available to the public would be non-negligible.

    If a specific individual is interested in access to the data, they're certainly free to email their local (or not even necessarily local) university department associated with the LHC and ask for it, and they could probably get access to a subset of it, if they've shown genuine interest. And by "genuine interest", I mean have already downloaded, processed, examined, and understood much of the already publicly available data, to the point where they are capable of performing actual scientific research on the data, and aren't simply interested in wasting already-precious scientific research money and time in making some kind of political or philosophical point.

    --
    "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton