CERN Collider To Trigger a Data Deluge
slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together hoping to catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."
Okay, the Library of Congress has been estimated to contain about 10 terabytes, so I buy the 1000 * LoC = 15 petabytes. But archive.org alone expanded its storage capacity to 1 petabyte in 2005, so CERN is not going to generate anything near "22 Internets" (whatever that might be). This estimate from 2002 calculates the size of the internet as about 530 exabytes, 440 exabytes of which are email, and 157 petabytes for the "surface web".
memomo: free web based language trainer DE-EN-ES-FR-IT
I hope they're planning on running their own fiber optic line across the Atlantic, or shipping a lot of hard drives, 'cause that's too much data to pass over the public internet.
FYI 15 petabytes per year = 120 petabits per year = 120,000,000 gigabits per year
120,000,000 gigabits per year / ~30,000,000 seconds per year = 4 Gbps of continuous transmission. They could run a fiber across the Atlantic that could handle 4 Gbps.
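The arithmetic above can be checked with a short sketch. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year; the function name is made up for illustration.

```python
# Sketch: average link rate needed to move one year of collider data.
# Assumptions: decimal petabytes (10**15 bytes) and a 365-day year.

PETABYTE = 10**15                    # bytes, decimal convention
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 seconds


def sustained_gbps(petabytes_per_year):
    """Return the average rate in gigabits per second."""
    bits_per_year = petabytes_per_year * PETABYTE * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


print(f"{sustained_gbps(15):.2f} Gbps")
```

Using the exact number of seconds in a year rather than the rounded 30,000,000 gives a figure slightly under 4 Gbps, so the round number in the comment holds up.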
Google it?
If Google is so awesome, maybe they can put their money where their mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
I'm sure Google would love to get their hands on this data; they are like one of those energy alien beings from Star Trek that feed and grow, except Google grows on data.
Really...
The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function.
And 60% of it will be porn.
-- You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
What about the backups?
So, is the above quote simply a poster who doesn't know what he is talking about (someone more interested in a catchy phrase in an article than in actually disseminating facts), or are these colliders actually capable of generating particles that haven't existed since the big bang? I tend to think the former - but I'm not a physicist, just a geek.
The real fundamental question is not about beginning of the universe, but something much much more important: Are they going to backup the data?
On the other hand, I'm sure it will be available on some torrent soon.
Revenue from advertising, as always.
Get yer hot fresh strange quark...
You know with the right sort of particle accelerator you could send messages straight through the Earth and save a heap of latency.
http://michaelsmith.id.au
1 "internet" is being used as the amount of data transferred in a given period.
Lepton dancers wearing gluons.... WHOA!
So long as it's not needed right now pretty much any amount of data can be transmitted.
Would that be 0.84 Internet per fortnight? Or 1 kiloLibrary per Congress session? How much in tubes?
Now, the scientists can be serious when they say "I'll give you 22 Internets if you can get me a shop of Einstein dancing with Paris Hilton" on a 4chan board.
How many fulltime jobs can one man have?
The main difference between the LHC data and the Internet is that all 15 PB of that data will come in a standard format, so a search is much easier to perform. In fact most of the search will consist of discarding uninteresting stuff while attempting to identify the very rare events that may show indications of new particles (the Higgs, for example). The Internet is a lot more diverse; the variety of information dwarfs the limited number of patterns the LHC is looking for, so "no search available" for LHC data sounds more like "no search needed".
Both the accident mentioned in that article, and the earlier one linked to, are the result of work by fermilab, providing equipment/calculations for cern. Just another case of sabotaging your competitors' work by providing faulty equipment... and just like when the CIA did it to Soviet energy supplies, with little concern about the potential loss of human life.
Americans don't need science, the bible tells them everything they need to know.
You are aware that the magnet that blew up was built by Fermilab and it was on their side that the mistake was made?
Physics locker room.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress.
You mean W was right all along??
I'd give my right arm to be ambidextrous.
I've got a library of congress in my pants.
Remember, this data can only get out per the size of CERN's Internet pipe(s) - so even if they have, say, five 10Gig-Ethernet connections - that's not much effect on big OC backbones. I'm just guessing, but I don't think CERN has HTTP/FTP servers right on an OC Internet backbone, or the server structure (think orders of magnitude greater than Google's) to drive the data.
-- www.globaltics.net
Political discussion for a new world
This is really bad news. By defining the amount of data in LoC's, they leave themselves open to a huge exploit... If the LoC ever includes this data, then there will be a recursive loop of definitions and the LoC will expand to fill the universe.
Okay... maybe not, but if they ever did put this data in the LoC, the effort required to re-factor all the LoC based measurements would bankrupt the world. And the confusion that goes on while this re-factoring is happening will surely crash at least one probe into Mars, where the English have used the new LoC units and the Americans will have used the old LoC units.
Perhaps Shakespeare will be generated as data as well? http://en.wikipedia.org/wiki/Infinite_monkey_theorem
FTA:"catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang."
Who was at the Big Bang to see them then? I suspect that the numbers are a lot lower than the number of people that heard that tree fall in the woods and heard the sound of one hand clapping put together.
[The Universe] has gone offline.
That line is some of the worst hyperbole ever. Here's why. First, there was (almost by definition) no one there to 'see' anything at the Big Bang. (Supernatural explanations aside, and this purports to be a science article.) Second, these subatomic particles are formed frequently in nature, as high-energy astronomy has found various natural particle accelerators that are FAR more powerful than anything we're likely to build on Earth.
One hopes the author will do better next time.
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
A black hole made earth go into neverland. http://en.wikipedia.org/wiki/Hyperion_(novel)
and also put some Library of Congress saucing on it.
Read radical news here
Americans don't need science, the bible tells them everything they need to know.
The US had its chance to build a bigger and better particle collider: the Superconducting Super Collider.
In 1993 the SSC, already partway-built, was cancelled by the Democrats who controlled Congress at that time. If the SSC had not been cancelled, we would already have discovered the Higgs boson 4 or 5 years ago.
So, yes, in 1993 the US Government reneged on international commitments to scientists from Europe and Japan, and set back the progress of science by years.
But don't blame the Bible. Blame congressmen like Tom Foley and Dick Gephardt, who preferred to lard up the farm bill with as much pork as possible.
All that space to store. "Hit, hit, miss (doesn't matter), hit, miss (doesn't matter)"...
dnuof eruc rof aixelsid
it will generate 15 petabytes of data per year...
Umm, question. Is this BEFORE or AFTER time stops?
Seven puppies were harmed during the making of this post.
But how many rumors are there going to be on those Internets?
"Let's face it, it's a good story. Accuracy would kill it."
Rule 1 and 2, asshole. GB2 gaia
ISTR reading an article on this several years ago in which Cern people said that they just accepted the fact that they were going to lose massive amounts of data every year because backing up such huge amounts of data just wasn't possible.
Not to mention this!
Americans don't need science, the bible tells them everything they need to know.
Americans don't need science, Fox News tells them everything they need to know.
we would already have discovered the Higgs boson 4 or 5 years ago.
Only if it really exists... how can you discover something that you have already discov...gurk too much recursion.
You were aware that it wasn't the magnet which blew up, but rather the supporting structure that "kicked" when the magnet quenched?
Last I heard, they'll be able to add to the structures in-place. FNAL will have to spend some money, but things will be fixed without delaying the project.
And you were aware that FNAL's work passed multiple independent review committees and CERN signed off on it? It just turned out that the same oversight was made by all.
In the end, a little egg-on-face for the US, but not a huge deal.
You Win One Internet.
.22 Internets: Rimfire serious business.
Truth: There are several news agencies that have booked flights to descend upon CERN at the "supposed" start of the LHC in November. What will they come and see? Lots of hype and not much else!
What will happen? Single beam commissioning earliest in May. Collisions probably in August. Not earlier.
I hate being an Anon Coward, but there you go... Yes, I am sitting at a CERN office right now.
It's a data storm.
The NSA will have to scan the data for potential terrorist Tachyons hiding among the Bosons. That will slow things down a bit.
There are some other benefits to building such a huge network of high powered computers. And it's not the teleportation you thought, it's more copying of metadata and re-creating the original.
Think about it, the only thing stopping us is the ability to store and transfer large amounts of data necessary to describe the precise makeup of a human being. I have a feeling this project will branch off into that area.
I suspect that 15 petabytes of data will actually be equivalent to at most 2x the information in a number of standard model journal articles and texts. They just have to figure out the right compression kernel.
They should advertise exactly when the collider is going online, so I can turn off my computer, because it is the only one I have and I don't want my hard drive full of subatomic particles around my precious data.
?
Sounds like the article was written by Senator Stevens. Nothing to fear, 22 emails can't possibly clog our tubes.
"A deadlock has been reached. One task must die. We must now choose between murder and suicide."
The data deluge from all the HD media wasn't enough...my pc's can't keep up with all this data !!!
See, that's just too bad. They spend $8B on the project, and then they don't have a few million to spend on hard drives to save the data produced by the $8B machine.
Tsunami -- You can't bring a good wave down!
All kidding aside, that does sound like some pretty cool stuff.
My quantum computer has been working on downloading the torrent for the past few weeks.
Hmmm witty sig or funny sig? Maybe elitest techy sig!
Their ISP is gonna be pissed.
-USR1
You must be new here.
Help poke pirates in the eyepatch, arr.
That's a lot of tubes.
oh marmalade.
SATA speeds are 1.5 to 3.0 Gbps...
Free beer is never free as in speech. Free speech is always free as in beer.
Wonder if they'll hire or contract some Google engineers for a data mining effort. Personally I'd work for free to get a chance to mine that much data.
That's not "High Productivity Computing" Wire... the HPC in "HPC Wire" stands for High-Performance Computing.
The real story on the ~15PB/year data store is to be found in these two sites:
This outlines the hardware environment supporting the data (IBM 3584 w/ Ultrium and IBM DS4400):
ftp://ftp.software.ibm.com/common/ssi/rep_sp/n/GRC03001USEN/GRC03001USEN.PDF
This outlines the software environment (layered Tivoli Storage Manager and dCache):
http://www.dcache.org/manuals/tsm-symposium-2005-paper.pdf
Or is it?
Here, Sun posts how Storagetek supplied the tape storage:
http://www.sun.com/customers/storage/cern.xml
The LCG
Something could certainly be said about their computing backend for going through this data. It's called the LCG (LHC Computing Grid) and is described here:
http://lcg.web.cern.ch/LCG/tdr/LCG_TDR_v1_04.pdf
Won't be stored on hard drives. At least, only a small portion of the total amount of data taken after a year or two will be stored on hard drives at any given time. The data gets archived on tape drives.
I don't know for sure that that is how it will be at CERN, but I know that that is how we do it at Fermilab, and I don't know of any change in technology between when that was set up and now that would invalidate the reasoning behind using tape at Fermilab. So, I would expect that CERN would do the same.
SIGSEGV caught, terminating
wait... not that kind of sig.
So... to hold that many Libraries of Congress worth of data, how big will the data server have to be?
Please express the answer in 'Volkswagens'.
Serving your airship needs since 1995.
> [This] would be the equivalent of all of the information in all of the university
> libraries in the United States seven times over. It would be the equivalent of
> 22 Internets, or more than 1,000 Libraries of Congress.
$349,000, though I'm sure you'd get a decent volume discount for a thousand of these.
Oh wait, it won't be needed for a year. Halve that.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
So the argument that these experiments are safe, and that they won't introduce exotic states of matter that cascade out of control with regular matter, converting it and destroying the Earth, is that far more energetic events occur in our upper atmosphere all the time (e.g. the WOW-type particles hitting so hard and fast that they mass as much as a bacterium and pack the momentum of a pitched baseball).
Yet they claim this all the time:
> The collider will smash protons together hoping to catch a glimpse of
> the subatomic particles that are thought to have last been seen at the Big Bang
So which is it? While I don't believe the experiments are dangerous, this does shoot down their "safety" argument above. Or is their claim really false (e.g. WOW particles would have introduced this via upper atmosphere collisions many times) and just advertising to sell it to politicians and the public?
They'll never be able to publish... odds are in all that data somewhere is the AACS key-of-the-week.
15 petabytes per year.
According to Gordon Bell and Jim Gray, recording one person's life as DVD video generates about 7 TB per year, so this is the same as generating life records for 2,000 people.
BTW, according to one trend line I've seen, the cost of a PB of raw storage will drop below $1,000 around 2020. This means that while it may cost ~$5,000,000 to store the first year's data, by 2020 you could store 13 years worth of the data (i.e. all of the data produced up to that date) for around $250,000. Double that if you want it mirrored.
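Both comparisons in this comment are simple ratios, sketched below. The inputs (15 PB per year, 7 TB per person-year, $1,000 per PB, 13 years) come from the comment itself; decimal units and the function names are assumptions made for illustration.

```python
# Sketch of the two back-of-envelope comparisons in the comment above.
# Assumption: decimal units, so 1 PB = 1000 TB.

TB_PER_PB = 1000


def life_records(data_pb, tb_per_person_year=7):
    """How many person-years of DVD-quality life recording fit in data_pb."""
    return data_pb * TB_PER_PB / tb_per_person_year


def archive_cost(years, pb_per_year=15, dollars_per_pb=1000):
    """Raw storage cost for `years` of accumulated data at a flat price per PB."""
    return years * pb_per_year * dollars_per_pb


print(round(life_records(15)), "people")
print(archive_cost(13), "dollars for 13 years at $1,000/PB")
```

At exactly $1,000 per PB the 13-year total comes out a little under $200,000, the same order as the figure quoted; a mirrored copy doubles it.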
Nothing for 6-digit uids?
If you look at consumer broadband, the US has about 50 million homes getting an average of 1.9 Mbps download speed - that's about 100 terabits/sec in aggregate, though obviously the network's oversubscribed enough that it couldn't actually carry that much at once, but it's still likely to be well above 1 terabit/sec of sustainable throughput if there were enough servers available to pump data that fast. At the full aggregate rate, CERN@home should be able to download the CERN collider's entire data set for the year in about twenty minutes...
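The aggregate figure is easy to multiply out. The sketch below takes the 50 million homes and 1.9 Mbps from the comment and the 15 PB per year from the article, and assumes decimal units; the function names are hypothetical.

```python
# Sketch: aggregate consumer download capacity and the time to pull one
# year of collider data at that rate. Assumes decimal units throughout.

def aggregate_bps(homes, mbps_per_home):
    """Total download capacity in bits per second if every home ran flat out."""
    return homes * mbps_per_home * 1e6


def transfer_seconds(petabytes, rate_bps):
    """Seconds needed to move `petabytes` of data at `rate_bps`."""
    return petabytes * 1e15 * 8 / rate_bps


rate = aggregate_bps(50_000_000, 1.9)
print(f"aggregate: {rate / 1e12:.0f} terabits/sec")
print(f"15 PB takes {transfer_seconds(15, rate) / 60:.0f} minutes")
```

With these inputs the aggregate works out to roughly 95 terabits per second, and a 15 PB year would take on the order of 21 minutes at that rate, ignoring oversubscription and server limits.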
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Perhaps someone with more experience than me with HPC can pipe in on the specifics, but it should be relatively trivial to create two, rather than one, data stores as the data is generated. Or if it's acceptable, just ship the original. Further, you wouldn't be stacking hard drives on a passenger plane, you'd charter an appropriately sized cargo aircraft. Or, more likely, throw it all in a standard cargo container for surface transport.
That's a *lot* less cash than running fibre, and while it has very high latency it has effectively infinite bandwidth.
This sounds like Bush's advance plan for Iraq.
Yeah -- after I wrote that, I asked someone, and was told the engineering run was "not likely" in November now. Oh well. More time to buy cheaper disks. :)
I guess the power consumption of tapes is much better :) It seems that offline storage makes it easier to overlook certain unexpected but possibly groundbreaking events, because it's much harder/more annoying to explore the data. But that's just a layman's view; I'm sure the experts have a better idea of what could be there.
Err, I thought that CMS would jump on that! (I work with your direct competitor, and let me say, we are not as happy.) Sorry that I've got to remain anonymous... truly unfair, I know.
Can anyone say "data compression"? Most likely you could reduce this to one tenth the size.
Also, you would undoubtedly want some error correction encoding (which would add some percentage increase to the size of the data) as well.
When I was at a tour of the facility a couple of years ago I asked how they could store so much data so fast. They replied that in fact most (80-90%) of the data was lost instantly at the collision but that they could selectively record certain amounts of data that they would use to validate theories.
A terabyte really is a mega-megabyte (1024*1024*1024*1024). That's all that matters.
It is a useful property of the kilobyte to be a power-of-two size (many related reasons for this). As such, it would be bad if a terabyte were assumed to be decimal and not binary, because it could not be expressed as a simple multiple of kilobytes, let alone the convenience of raising the coefficient to a power.
The only people it seems to actually bother are boorish computer enthusiasts who are trying to cobble together RAIDs on a shoestring budget and are outraged to discover that their 3-disk RAID5s of 500-marketing-GB apiece do not equal 1TB of porn storage.
Of course that is really 1-marketing-TB, so why this bothers them, I don't know. They should be bugging MS (oh, I'm sorry, Micro$haft) for a patch to shell32.dll that reports the base-10 size they expect.
Of course, when they come crying that the low-level disk-check, partitioning, and defragmenting tools actually report the "real" size to the contrary, considering their reliance on "real KB" units for sector and cluster sizing... what are you going to do?
Just deal with it. XXXXbytes are not SI units, and never will be.
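The RAID complaint above comes down to one division. A minimal sketch, assuming "marketing" gigabytes are 10^9 bytes and that RAID 5 gives up one disk's worth of capacity to parity; the names are made up for illustration.

```python
# Sketch: binary vs. decimal ("marketing") units for the 3-disk RAID 5 case.

KIB = 1024
TIB = KIB ** 4           # binary terabyte: 1024*1024*1024*1024 bytes
MARKETING_GB = 10 ** 9   # decimal gigabyte, as printed on the drive box


def raid5_usable_bytes(disks, bytes_per_disk):
    """RAID 5 usable capacity: one disk's worth of space goes to parity."""
    return (disks - 1) * bytes_per_disk


usable = raid5_usable_bytes(3, 500 * MARKETING_GB)
print(f"{usable} bytes = {usable / TIB:.3f} binary TB")
```

Three 500 GB drives give exactly 10^12 usable bytes, which is one decimal terabyte but only about 0.91 of a binary one, and that gap is what a tool reporting binary units displays.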
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
The world's largest science experiment
Who writes stuff like that? It might be the world's largest or most
expensive particle physics experiment, but I would have to count making
it to the moon as the largest overall hard science experiment, and/or
for political science maybe the reintroduction of democracy 2000 years
after the last time it failed on a large level might be bigger than
this.
Do they mean largest as in 'largest amount of data ever to come from
one single test?' Because that isn't what they said.
Nobody is looking for particular events. Everything is statistical in nature. "Do the distributions of these umpteen variables, which are calculable for each event, match what would be expected from theory?" is the basic question. So, in some sense, there is no such thing as "groundbreaking events", and certainly pretty much nobody just goes exploring through the data (by which I assume you mean eyeballing individual events one after another). Offline storage has nothing to do with this; the culprit is just the sheer amount of data combined with the excessively low probabilities of all the interesting processes. Everything with a higher probability has been seen already.
And yes, before somebody asks, people regularly do analyses that are asking the question "Do the distributions of these umpteen variables not match what would be expected from theory, in a statistically significant way?" They very very very rarely find anything, but we keep doing these broad spectrum searches for new physics because it'd just be really cool to be the one to find something utterly unexpected.
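The question "do the distributions match what theory expects?" can be illustrated with a Pearson chi-square statistic over histogram bins. This is only a toy sketch: real analyses use far more elaborate statistical machinery, the function name is hypothetical, and the counts in the example call are made up for illustration, not real data.

```python
# Toy sketch: compare an observed histogram with the counts a theory predicts.
# A large statistic relative to the number of bins hints at a mismatch.

def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over bins of (o - e)^2 / e."""
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up bin counts, for illustration only.
print(chi_square([12, 18], [10, 20]))
```

Turning the statistic into a significance requires the number of degrees of freedom and a chi-square distribution table or library, which this sketch leaves out.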
Also, unless I'm mistaken, the price per bit for tape is better, and the long-term stability of the data on tapes is much better. Hard disks degrade quite rapidly in comparison to tapes. I have no idea about the power consumption.
Thanks for sharing the point of view! I understand your environment better now.
"Well, yeah, but the probability is about the same as that of you generating a small black hole by clapping your hands together really hard."
Clark Kent could do it.
Does that mean that an unbelievable amount of data will come into being within a fraction of the first second, a phase called "Expansion"?
this is all interesting guys but let me tell ya something. i recently returned from a real trip to this place, yes CERN, and there are some great chicks there. Our group had a tour aroud the computer centre and actually underground to the biggest experiment they ahve there. its called ATLAS. one of the top chicks at ATLAS and you can find this info online easily, is called Connie Potter. Not only does she know her stuff but she is one seriously great lookin woman. man those guys sure know how to pick 'em. in any case, she says if anybody needed any info or whatever, she'd be happy to mail stuff out. could we start with her phone number ? hehe..