LHC Data Generation Expected To Scale Up To 400PB a Year
DW100 writes: Cern has said it expects its experiments with the Large Hadron Collider to generate as much as 400PB of information per year by 2023 as the scope of its work continues to expand. Currently LHC experiments have generated an archive of 100PB and this is growing by 27PB per year. Cern infrastructure manager Tim Bell, speaking at the OpenStack Summit in Paris, said the organization is using OpenStack to underpin this huge data growth, hoping it can handle such vast reams of potentially universe-altering information.
You mean how we see the universe? Because I doubt the universe cares much about the data we generate...
To put this in perspective, Facebook states that it generates 4 PB per day, about 3.6 times more per year than the LHC's projected 400PB. Does anybody know of anything that generates more data than that?
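A quick check of the 3.6 figure, assuming a 365-day year and comparing against CERN's projected 400PB/year rather than today's 27PB/year growth (both numbers are simply the ones quoted in this thread):

```python
# Figures quoted in this thread (taken as given, not independently checked).
FACEBOOK_PB_PER_DAY = 4      # Facebook's stated daily data generation
LHC_PB_PER_YEAR = 400        # CERN's projected LHC figure for 2023
DAYS_PER_YEAR = 365          # assumption: ignore leap years


def facebook_to_lhc_ratio(fb_pb_per_day=FACEBOOK_PB_PER_DAY,
                          lhc_pb_per_year=LHC_PB_PER_YEAR):
    """How many times more data Facebook generates per year than the LHC projection."""
    fb_pb_per_year = fb_pb_per_day * DAYS_PER_YEAR
    return fb_pb_per_year / lhc_pb_per_year


ratio = facebook_to_lhc_ratio()
print(f"Facebook: {FACEBOOK_PB_PER_DAY * DAYS_PER_YEAR} PB/year, ratio {ratio:.2f}")
```

Against the current growth rate instead, `facebook_to_lhc_ratio(lhc_pb_per_year=27)` gives a ratio of roughly 54, so the comparison depends heavily on which LHC figure you pick.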
I doubt that OpenStack can handle it, but if they have the $$ for it, I'm sure that it's no big deal for Oracle.
I don't respond to ACs.
Unless they are capturing pure noise, it should be compressible.
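The noise-versus-structure point is easy to see with a general-purpose compressor. This sketch uses zlib (DEFLATE) on random bytes and on a made-up, mostly-zero "readout" pattern; the structured data is purely hypothetical and says nothing about how compressible real detector output is after readout:

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, level)) / len(data)


# Pure noise: random bytes leave no redundancy for DEFLATE to exploit.
noise = os.urandom(1_000_000)

# Hypothetical structured data: most "channels" read zero and every 50th one fires
# with a repeating value. Invented for illustration only.
structured = bytes((i * 7) % 256 if i % 50 == 0 else 0 for i in range(1_000_000))

print(f"noise:      {compression_ratio(noise):.3f}")
print(f"structured: {compression_ratio(structured):.3f}")
```

Random input typically comes out slightly larger than it went in (format overhead), while the repetitive input shrinks to a small fraction of its size.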
So they used to take photos of these (forced) collisions.
Now they're converting the data to zeros and ones and "printing" them on magnetic platters.
Now the question: when do the two curves, photo printing and platter printing, converge?
When might it be feasible again to store it on media you can examine under a microscope?
Pretty sure you can forget all about difficult computer algorithms to pull and compare data if you can just overlay two or three layers of glass under the microscope and instantly SEE!
(Disclaimer: Grad student)
ATLAS generates O(PB) of raw data per second, but we only trigger on events that look interesting (e.g. ones with an isolated lepton, a sign that something more than QCD background happened in the event) and save those for offline analysis. That works out to something on the order of hundreds of MB/s being saved during a run. I assume the other experiments have similar data rates.
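To illustrate the idea of triggering, here is a toy software filter that keeps only events with an energetic, isolated lepton and drops everything else. The `Lepton`/`Event` types, the 25 GeV threshold, the 0.1 isolation cut and the 300 MB/s saved rate are all hypothetical choices for this sketch; this is not how ATLAS actually implements its trigger:

```python
from dataclasses import dataclass, field


@dataclass
class Lepton:
    pt_gev: float      # transverse momentum in GeV
    isolation: float   # hypothetical relative isolation: nearby activity / pt


@dataclass
class Event:
    event_id: int
    leptons: list = field(default_factory=list)


# Hypothetical thresholds for illustration; real trigger menus are far more complex.
PT_THRESHOLD_GEV = 25.0
ISOLATION_CUT = 0.1


def passes_trigger(event: Event) -> bool:
    """Keep the event if it has at least one energetic, isolated lepton."""
    return any(lep.pt_gev > PT_THRESHOLD_GEV and lep.isolation < ISOLATION_CUT
               for lep in event.leptons)


def run_trigger(events):
    """Yield only passing events; everything else is discarded, never stored."""
    return (e for e in events if passes_trigger(e))


events = [
    Event(1, [Lepton(40.0, 0.02)]),   # energetic and isolated: kept
    Event(2, [Lepton(40.0, 0.50)]),   # surrounded by other activity: rejected
    Event(3, []),                     # no leptons (QCD-like): rejected
    Event(4, [Lepton(10.0, 0.01)]),   # too soft: rejected
]
kept = [e.event_id for e in run_trigger(events)]
print(kept)

# Order-of-magnitude data reduction implied by the rates quoted above.
raw_bytes_per_s = 1e15      # O(PB)/s
saved_bytes_per_s = 300e6   # "hundreds of MB/s"; 300 MB/s assumed
print(f"reduction factor ~ {raw_bytes_per_s / saved_bytes_per_s:.1e}")
```

With these made-up numbers only event 1 survives, and the rates imply keeping roughly one byte in a few million.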
A few hundred of these should do: https://www.backblaze.com/blog/backblaze-storage-pod-4/
(3387 + 180000*0.0517)/0.18 = ~$70,000/PB. For the already existing 100PB it's about $7 million, easily within CERN's reach.
Of course that means very low reliability, but adding redundancy would still keep the cost around 10 megabucks.
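The cost estimate above can be reproduced as follows. The reading of the formula is an assumption: $3387 per pod excluding drives, 180,000 GB of raw disk per pod at $0.0517/GB, and 0.18 PB per pod with 1 PB = 1,000,000 GB. The `redundancy` parameter is hypothetical too; 1.4x would correspond to, say, an erasure code with 10 data and 4 parity shards:

```python
# Assumed reading of the figures in the comment (not official Backblaze pricing).
POD_CHASSIS_USD = 3387         # pod cost excluding drives
POD_CAPACITY_GB = 180_000      # raw disk per pod
DRIVE_USD_PER_GB = 0.0517      # drive cost per GB
POD_CAPACITY_PB = POD_CAPACITY_GB / 1_000_000   # decimal units: 0.18 PB


def cost_per_pb() -> float:
    """Hardware cost per raw petabyte: one full pod divided by its capacity."""
    pod_cost = POD_CHASSIS_USD + POD_CAPACITY_GB * DRIVE_USD_PER_GB
    return pod_cost / POD_CAPACITY_PB


def archive_cost(petabytes: float, redundancy: float = 1.0) -> float:
    """Cost to hold `petabytes` of user data; `redundancy` is the hypothetical
    raw-to-usable capacity ratio (1.0 means no redundancy at all)."""
    return petabytes * redundancy * cost_per_pb()


print(f"${cost_per_pb():,.0f}/PB")
print(f"100 PB, no redundancy:   ${archive_cost(100):,.0f}")
print(f"100 PB, 1.4x redundancy: ${archive_cost(100, 1.4):,.0f}")
```

This covers drive and chassis hardware only; power, space, staff and drive replacement are left out, as in the original estimate.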
But the LHC will be generating original data...
How do they archive all that data?
Seriously, with all that data being generated, how are they storing it? Standard magnetic tape backup?
The LHC is one approach. The "make it bigger and then it might get better" approach.
Another approach is to conceive of a completely different model. I have come up with a different model from the Standard Model, and Quantum Mechanics, and String Theory.
Spring-And-Loop Theory resolves issues the other three theories are stuck on. It is also simpler, unifies the four forces, and works from the very small to the very large.
But it is a different approach. Most are not ready for this.
I come here for the love
and MAYBE 0.000001 PB of useful data, massaged down to a 10kB PDF for publication in support of some hand-waving TOE argument. Welfare for engineers, physicists, and arithmeticians. ROI - zip.
As for the 4PB/day of "data" generated on Facebook: washerwomen gossiping over the back fence. Content - zero.
Actually the LHC generates more data than this. The talk is only talking about the data at CERN. The last count of all the files in the ATLAS experiment's DQ2 store (a distributed dataset access system with storage around the globe) was 161PB. This value includes all the simulated data, analysis data etc. I'm certain CMS has a comparable amount and then there are Alice and LHCb as well so the total will be well over the 300PB which Facebook stores.
While Facebook generates 4 PB of new data per day, they only store 300 PB according to that page, so most of this is either discarded or overwrites existing data. If we look at the LHC, the raw data rate is probably about 1 PB/min, but we throw away most of it (using computers on the surface, not 100m underground as the original talk says) because it's physics we already know about and we can't afford the storage for it. Then there is the new data generated by analysis and simulation, which also has to be counted.
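The arithmetic behind that comparison, using only the figures quoted in this comment (4 PB/day generated and 300 PB stored for Facebook, a rough 1 PB/min raw rate for the LHC):

```python
MINUTES_PER_DAY = 24 * 60

fb_generated_pb_per_day = 4   # Facebook's daily generation, as quoted
fb_stored_pb = 300            # Facebook's total storage, as quoted
lhc_raw_pb_per_min = 1        # rough estimate of the LHC raw rate, as quoted

# If nothing were discarded or overwritten, Facebook would fill its storage in:
days_to_fill = fb_stored_pb / fb_generated_pb_per_day

# The LHC raw rate expressed per day, for a like-for-like comparison.
lhc_raw_pb_per_day = lhc_raw_pb_per_min * MINUTES_PER_DAY
raw_vs_facebook = lhc_raw_pb_per_day / fb_generated_pb_per_day

print(f"Facebook fills {fb_stored_pb} PB in {days_to_fill:.0f} days")
print(f"LHC raw: {lhc_raw_pb_per_day} PB/day, {raw_vs_facebook:.0f}x Facebook's daily generation")
```

So Facebook's storage would hold only about 75 days of its own output, and the LHC's raw rate, before triggering, is a few hundred times Facebook's daily generation.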
So if you actually look at the whole system, not just what is at CERN, we have a larger total storage capacity and generate more data than Facebook...and we plan to scale up.
hmm... let's see
COBOL.. dead .. dead
BSD.. dead
TAPE
And yet there would be no LHC datacenter without tape.
ref:
http://information-technology....
http://www.economist.com/blogs...
http://storageservers.wordpres...
http://scribol.com/science/hal...
Have gnu, will travel.
That amount of data is something only an AI could get through.
"If any question why we died, Tell them because our fathers lied."
Sounds like they really don't know what they want, except they want it all. Using other people's money, of course!
Torrent link?