My bad for forgetting the "link connection" information, though your computation is likely to be inaccurate. The problem looks like a large sparse matrix problem. Since memory is the issue here, you do not store both "from" and "to" for each link. You use a Compressed Sparse Row (CSR) type of data structure, which only needs to store either "from" or "to" per link (you need an extra array of row offsets, but at that scale it basically does not count).
Also, you do not need 64 bits to identify a neuron; 40 bits are certainly enough (2^40 is about 1.1 trillion, well above 86 billion). Moreover, if you use a 2D partitioning of the link information, 32 bits is probably enough (you basically encode which part of the matrix you are storing within your MPI rank).
So in this computation, you are good for 8 bytes per connection (a 32-bit index plus a 32-bit float), which boils down to "only" 800 TB. Sequoia already has twice that.
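To make the storage argument concrete, here is a minimal Python sketch of the idea. It is not how any real brain simulator stores its data; the function names (`build_csr`, `out_links`, `csr_bytes`) and the tiny 4-neuron example are made up for illustration, and the 8-byte row offsets are my assumption.

```python
def build_csr(num_neurons, edges):
    """Build CSR arrays from (source, target, weight) triples.

    Only the target is stored per link; the source is implied by
    which slice of row_offsets the link falls into.
    """
    edges = sorted(edges)                    # group links by source neuron
    row_offsets = [0] * (num_neurons + 1)    # the "extra array"
    targets, weights = [], []
    for src, dst, w in edges:
        row_offsets[src + 1] += 1            # count links per source
        targets.append(dst)
        weights.append(w)
    for i in range(num_neurons):             # prefix sum turns counts into offsets
        row_offsets[i + 1] += row_offsets[i]
    return row_offsets, targets, weights


def out_links(csr, src):
    """Return the (target, weight) pairs leaving neuron src."""
    row_offsets, targets, weights = csr
    start, end = row_offsets[src], row_offsets[src + 1]
    return list(zip(targets[start:end], weights[start:end]))


def csr_bytes(num_neurons, num_links, index_bytes=4, weight_bytes=4, offset_bytes=8):
    """Memory for the CSR arrays (assumed sizes: 32-bit index, 32-bit float, 64-bit offset)."""
    return num_links * (index_bytes + weight_bytes) + (num_neurons + 1) * offset_bytes


# Hypothetical 4-neuron network, just to show the layout.
csr = build_csr(4, [(2, 0, 0.5), (0, 1, 0.1), (0, 3, 0.2)])

# The numbers from the thread: 86 billion neurons, 100 trillion links.
total = csr_bytes(86 * 10**9, 100 * 10**12)
offsets_share = (86 * 10**9 + 1) * 8 / total
```

With these sizes the link arrays take 800 TB and the offset array adds less than 0.1% on top, which is why it basically does not count.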
I do not know the details of these algorithms, and you seem to believe there is more data involved than meets the eye. (And you are certainly right; I do not know anything about that type of application.) If you need to run a genetic algorithm on each synapse, that is going to take a while; genetic algorithms are slow. That makes them suitable for out-of-core computing.
Anyway, the project is definitely a leadership-class project, but I do not see it as infeasible. Sequoia is an "old" machine (plugged in in 2011); a custom-built machine will certainly have enough memory. It might take a couple of iterations of the design to get there, but we are already in the right ballpark.
"To put it in perspective, that 86 billion neurons would be 86 "giga-neurons"; huh, conceptually not too overwhelming. Then we have the 100 trillion connections between them, or 100 "tera-connections"? Forget it."
I am not sure how much data you need to store for each neuron and for each link. But assuming a float for each (which is probably an underestimate), we are at around 400 terabytes. Sequoia (one of the supercomputers at LLNL) has 1.6 petabytes of main memory.
So actually, it is not THAT big. Especially since we can certainly perform the computation out-of-core.
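The arithmetic behind that estimate is short enough to write down. This is a back-of-the-envelope sketch in Python, assuming a 4-byte float and decimal units (1 PB = 10^15 bytes); the variable names are mine.

```python
NEURONS = 86 * 10**9             # 86 billion neurons
CONNECTIONS = 100 * 10**12       # 100 trillion connections
FLOAT_BYTES = 4                  # assumed: one 32-bit float per neuron and per link
SEQUOIA_BYTES = 1600 * 10**12    # 1.6 PB of main memory, decimal units

# One float per neuron plus one float per connection.
state_bytes = (NEURONS + CONNECTIONS) * FLOAT_BYTES

# Share of Sequoia's main memory that this state would occupy.
fraction = state_bytes / SEQUOIA_BYTES

# How many whole bytes per connection would still fit in core,
# after setting aside one float per neuron.
budget_per_link = (SEQUOIA_BYTES - NEURONS * FLOAT_BYTES) // CONNECTIONS
```

Under these assumptions the state takes about a quarter of the machine's memory, and there is room for 15 bytes per connection before going out-of-core becomes necessary.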
I am taking the Debian approach to it. Apparently, I am using Firefox 10.0.12.
And you know what, I really do not care which Firefox I am using...
Thank you Dr Dyson for sharing your views on the world with us!
a fellow engineer has a vagina between her legs.
Picture or it did not happen!
Actually I think the problem is that it is called "maternity leave". If you called it "gestation leave" and allowed it to be taken anytime within 3 months of the delivery date (before or after), then I am sure no one would have a problem with that.
That could easily be solved in a Jabber-like way: you just need to add the server at the end of the login, e.g. talk to ikanreed@effnet.
Is there a study that backs this assertion up?
I don't know... This heterogeneous computing with low latency seems interesting if it does not harm raw performance. The main advantage would be in moving data back and forth between the two. If the computation on one side is long, then the decrease in latency is not very useful. If both of them are really fast, then there is not too much to gain to begin with.
It really helps when you need fast turnaround, so for small and very synchronous computations. I am waiting to see one good use case.
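A toy model shows where lower latency does and does not pay off. This is only a sketch in Python: it assumes each step is one computation on each side plus one transfer in each direction, and all the timings are made-up numbers, not measurements of any real hardware.

```python
def step_time(side_a_s, side_b_s, latency_s):
    """Time for one synchronous step: compute on both sides plus a round trip."""
    return side_a_s + side_b_s + 2 * latency_s


def speedup(side_a_s, side_b_s, old_latency_s, new_latency_s):
    """Overall gain from cutting the transfer latency, everything else unchanged."""
    return step_time(side_a_s, side_b_s, old_latency_s) / step_time(
        side_a_s, side_b_s, new_latency_s
    )


# Hypothetical latencies: 10 microseconds before, 1 microsecond after.
OLD, NEW = 10e-6, 1e-6

# One side computes for a full second: the latency cut is invisible.
long_compute = speedup(1e-3, 1.0, OLD, NEW)

# Both sides compute for 5 microseconds per step: the latency dominates.
small_sync = speedup(5e-6, 5e-6, OLD, NEW)
```

In this model the long computation gains well under 0.01%, while the small synchronous one runs about 2.5 times faster. In absolute terms, though, the second case saves only 18 microseconds per step, so it matters only if there are a great many such steps.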
Well, that's a valid question. And depending on the case, the answer can be different. Many applications (especially the ones that will use GPU clusters) will need a good interconnect. Cray provides that on its machines. Last time I checked cloud platforms, they did not have a suitable interconnect (10 Gigabit Ethernet has high latency).
In this context, I believe legal meant "not illegal". Or "data I own" or "data that nobody will sue my ass off for moving around".
Many Slashdotters point to the extreme price of that graphics card for gamers. I am more interested in the GPGPU performance. The comparison uses OpenCL to be able to compare against NVIDIA's hardware. But I feel like OpenCL is a second-class citizen for NVIDIA. How much would the performance differ with a carefully crafted CUDA implementation on the NVIDIA hardware?
I am currently using multiarch on x86. It works fine for libraries, but you can run into problems with executables. You can install libfoo in both 64 and 32 bit, but you cannot do the same with programs. That should be fixed in the future, but is broken for now.
For armhf+armel, I do not know, but I'd assume it works just the same.
That's not entirely true. If you pay a PhD student and he contributes to 2 journal papers a year, half the cost of the student goes to each paper. A student can easily cost more than $50K a year. That's more than $25K per paper just for the student's salary. Count the professor's salary (even excluding teaching) and it adds up. If it is part of a collaboration, there are more than 2 authors.
In my lab, 3 or 4 authors is common: one student as the main workforce, one postdoc or junior professor as the main coach, and one full professor as a more distant strategic adviser. That adds up quite fast just in salary.
Indeed, our database comes from CiteSeer, DBLP, PubMed Central and arXiv. If you know where to find appropriate data in other fields, please contact us!!
Yes, it is somewhat similar to that. I believe we are using similar algorithms to do it. The twist in theadvisor is that you can select the "input papers". Google Scholar uses your publication list as input. In theadvisor you provide the input list. It allows you to perform more targeted searches.
No, there is no charge. It is supported by the research team I am part of. We are trying to be useful :)
If you have problems finding papers, I recommend you try academic search engines. At OSU, we developed theadvisor ( http://theadvisor.osu.edu/ ). It is a web service that allows you to search for papers that are similar to what you already know. You basically upload a set of papers you know are relevant and the system finds what is around them.
We are still working on improving the quality of the database, but I strongly believe that these approaches are the way to go.
Oh! Thanks for that information. I was looking for it and could not find it. That document is a real corporate document. With chunks of real corporateness in it.
This is actually a common thing in academic journals. When I publish a paper, I have the "opportunity" of making the paper "open access" by paying some amount of money. It is a fairly standard practice.
There certainly are ways to do that. But it would require the community to move away from them. As a recently hired assistant professor, my tenure will be evaluated partially based on my publication track record in "good journals". So I will publish wherever my tenure committee believes is good. Currently this happens to be where the publishers are.
That's a good question. I'd say marketing new journals. And I guess paying folks at the publisher who are doing other things (like book publishing).
It does not seem to go to shareholders as far as I can see.
The thing is that there is mostly nothing to be paid for. The editor-in-chief and the editorial board are generally not paid. The reviewers are not paid. Most readers access electronic versions, and the paper versions are almost never opened. So the actual cost is extremely low for the publisher. The only things the publisher provides nowadays are grammar checking, spell checking, and text layout. Anybody who has worked in the field will tell you that this part of the job is mostly not done properly, especially the layout. I often need multiple rounds with the publisher before I agree to their layout.
So in brief, they do not produce anything of value on the document itself. They do print it, but nobody cares. They do provide web access, but that could be done the way the physicists do it, by publishing everything on arXiv first.
I am going with T-Mobile right now, and I am obviously not going to change to something else. Go T-Mobile!! Currently my phone has been "paid off" but my wife's has not (because she changed it later). So we are still paying the subsidized price for both phones.
That would also push unlocking.
Oh that's good to know. I wish I had mod points. I always thought swiping the app away meant "I am not going to use that, you can unload it" which would be a nicer version of force stop.