It is highly unlikely that each core could directly access the full 620 TB. The largest machines on the Top500 list are all distributed-memory machines (clusters). That said, modern interconnects are adding capabilities such as remote direct memory access (RDMA). With RDMA, remote memory is not addressable with ordinary load/store instructions, but the network hardware can transfer data between the memories of different nodes. The codes commonly run on Top500 machines are most likely written in MPI or hybrid MPI/OpenMP, so they don't need to access remote memory directly.
Google "Chladni patterns." Luthiers use them to see how the tops and backs of violins vibrate while shaping those plates to resonate in particular ways. Many people have studied the frequency response of stringed instruments. Some even strap the instrument to a big speaker, which then plays music for a few days or weeks. The vibrations change the frequency response of the instrument.
Go look at the proposal. This machine exists solely to do revolutionary computational science, and they want scientific breakthroughs from it. According to the call for proposals (CFP), I think, you have to be going after problems of that kind to get any time on it.
I think supercomputers are pulling somewhat ahead of desktop computers, and more and more money is being spent on them. The big Blue Gene/L machine cost on the order of $100M. Assuming a modern desktop costs about $1K, that is a price difference of a factor of 100,000.
There are a few important things that will keep this trend going (in my opinion):
1) These big machines cost $1M or so per month to power. There is some lower bound on the cost of electricity for a given amount of computational power. Unless technology changes drastically (optical, quantum, ...), normal people won't be able to afford computation on this scale.
2) PCs keep getting cheaper, while supercomputers keep getting more expensive.
3) PCs have pretty much hit the clock-speed (GHz) barrier, and until more applications are parallelized, PC performance will stagnate somewhat. Some scientific applications on supercomputers scale well to tens of thousands of cores, so supercomputers can keep pushing core counts.
4) Power generates heat, and desktops are already having difficulty dissipating it. Old desktop processors didn't need cooling fans; now you need huge heat sinks, fans, and sometimes liquid cooling.
Those papers I listed describe Java's performance problems. Besides the commonly known ones, such as bounds checking on every array access, there is the requirement for precise exception handling, which limits how much the compiler can restructure code (for example, by loop unrolling).
I saw a 60x slowdown reported in a journal paper on this exact topic (Java vs. Fortran). Their conclusion was that it may be possible to make Java fast, but by default Java is (was) slow.
Look at "Java for Numerically Intensive Computing: from Flops to Gigaflops" or "Java for high-performance numerical computing". Both argue that better libraries (for multidimensional arrays) and relaxing Java's floating-point requirements can speed things up a lot.
That is a convenient number if you look at the Blue Gene/L configuration. Blue Gene/L at LLNL has 64 racks of 1024 nodes each, and 64*1024=65536. Or maybe this is counting processors on 32 racks, at 2048 processors per rack. The counting of these things is ambiguous because each rack has 1024 nodes and each node has two processors. A node can also be used in two modes: coprocessor mode, where one processor just handles the network communication, and virtual node mode, where both processors compute. All this information is public, so you can search for it on Google.
It is just a happy power of two that you found.
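Spelled out, both readings give the same power of two, and counting both processors on every node of all 64 racks simply doubles it:

$$
64 \times 1024 = 2^{6} \cdot 2^{10} = 2^{16} = 65536 = 2^{5} \cdot 2^{11} = 32 \times 2048,
\qquad 64 \times 1024 \times 2 = 2^{17} = 131072
$$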
I have a non-parallelizable algorithm for you. Apply a non-associative operation to the elements of an array like this:
result = (a[0] * (a[1] * (a[2] * (a[3] * (...)))))
Here * stands for some non-associative binary operator. I think this algorithm may be provably non-parallelizable. Because * is not associative, the expression cannot be regrouped, so the innermost * operation must be performed before any other. Each * then depends on the result of the one nested inside it, so no two * operations can be done at the same time, and none of them can be parallelized. Furthermore, if these are the only operations in the entire algorithm, then nothing in the algorithm can be parallelized, so it is non-parallelizable by any reasonable definition. The proof does assume that the * operator itself cannot be parallelized.
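A minimal Python sketch of the argument, using only the standard library. Exponentiation stands in for the unspecified non-associative `*` (my choice of example, not the author's), and `right_fold` and `tree_reduce` are illustrative names. The right fold is a chain in which every step waits on the previous one. The pairwise tree reduction could run each level's pairs in parallel, but it only agrees with the fold when the operator is associative.

```python
import operator


def right_fold(op, xs):
    """Evaluate xs[0] op (xs[1] op (... op xs[-1])) for a non-empty list xs.

    Each loop iteration needs the accumulator from the previous one, so the
    len(xs) - 1 applications of op form a strict chain of dependencies.
    """
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = op(x, acc)  # cannot start until the previous acc is known
    return acc


def tree_reduce(op, xs):
    """Pairwise (tree) reduction of a non-empty list, keeping left-to-right order.

    Within each level the pairs are independent and could be combined in
    parallel, but the result equals right_fold only if op is associative.
    """
    xs = list(xs)
    while len(xs) > 1:
        paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:
            paired.append(xs[-1])  # odd element carried up unchanged
        xs = paired
    return xs[0]


print(right_fold(operator.pow, [2, 3, 2]))   # 2 ** (3 ** 2) -> 512
print(tree_reduce(operator.pow, [2, 3, 2]))  # (2 ** 3) ** 2 -> 64
print(right_fold(operator.add, [1, 2, 3, 4]), tree_reduce(operator.add, [1, 2, 3, 4]))  # 10 10
```

With addition both functions agree. With exponentiation they do not, and that is why the chain cannot simply be reorganized into a tree.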
On a side note, I could also prove that NP-hard problems remain NP-hard even on a large number of processors.
As suggested by another post, you would want to parallelize your job so that it runs on several of the machines simultaneously. MPI is one way of doing this. However, say you have an application that generates terabytes of data and then processes it: a system like this, with lots of fast storage and high-bandwidth networks, would be useful.
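One common way to split a single job across several machines, with MPI or otherwise, is a block decomposition in which each worker takes one contiguous slice of the input. The helper below is a hypothetical sketch. With mpi4py, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`. Here they are plain integers so the example runs without MPI.

```python
from typing import Tuple


def block_range(n_items: int, rank: int, size: int) -> Tuple[int, int]:
    """Return the half-open range [start, stop) of items owned by worker `rank`.

    Items are split into `size` contiguous blocks; the first n_items % size
    workers get one extra item, so block sizes differ by at most one.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need size > 0 and 0 <= rank < size")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


for rank in range(3):
    print(rank, block_range(10, rank, 3))
# 0 (0, 4)
# 1 (4, 7)
# 2 (7, 10)
```

Giving the leftover items to the lowest ranks keeps every worker's share within one item of the others. That matters when all workers must finish a step before any can continue.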
One paper that might point you in the right direction is "Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures" by Grama, Gupta, and Kumar.
You pose a very interesting question. Any application with a large number of steps, where each step depends on the result of the previous one and no single step can be parallelized on its own, would probably fit your description.
I can't think of anything offhand where no portion could be parallelized, but it is much easier to think of applications that cannot scale to high levels of parallelism. Trivially scalable workloads, such as rendering movie frames or SETI@home, will scale to any cluster or set of PCs you put them on. Others, like large matrix multiplications, FFTs, or N-body problems, do not scale as well. As you subdivide these problems into smaller pieces for more machines, the computation on each processor quickly shrinks while the communication between processors becomes more significant.
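A toy model makes the computation-versus-communication point concrete. It assumes a 2D grid stencil on an n × n grid split into p square blocks. This is a simpler, swapped-in example, not one of the matrix, FFT, or N-body codes mentioned above. Each block updates about (n/√p)² cells per step but must exchange about 4·n/√p boundary cells with its neighbours, so the communication-to-computation ratio is 4√p/n and grows as p increases.

```python
import math


def comm_to_comp_ratio(n: int, p: int) -> float:
    """Toy model: n x n grid, p square blocks (p assumed to be a perfect square).

    Per time step each block updates side*side cells and exchanges roughly
    4*side boundary cells with its four neighbours, where side = n / sqrt(p).
    """
    side = n / math.sqrt(p)
    compute = side * side   # cells updated per block
    communicate = 4 * side  # halo cells exchanged per block
    return communicate / compute


for p in (1, 16, 256, 1024):
    print(p, comm_to_comp_ratio(1024, p))
# 1 0.00390625
# 16 0.015625
# 256 0.0625
# 1024 0.125
```

In this model, keeping the ratio fixed as p grows requires n to grow like √p, so total work n² has to grow linearly with p. That is the kind of relationship the isoefficiency paper formalizes. Real codes also pay per-message latency, which this sketch ignores.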
I suspect alpha-beta search will not benefit from parallelization as much as one might imagine. You could probably show that although you can evaluate more nodes of the game tree in parallel, you lose the ability to prune, so the search degrades toward a parallelized minimax search.
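A small sequential sketch shows why pruning resists parallelization. It assumes a hand-built game tree stored as nested lists with integer leaves, a representation chosen only for illustration. In the sample tree, the cutoffs in the second and third subtrees happen only because the first subtree has already raised alpha to 3. If all three subtrees were searched concurrently from the start, none would have that bound, and in this tree each would be searched in full, as plain minimax does.

```python
import math


def minimax(node, maximizing, visited):
    """Plain minimax; appends every leaf it evaluates to `visited`."""
    if isinstance(node, int):
        visited.append(node)
        return node
    values = [minimax(child, not maximizing, visited) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha, beta, visited):
    """Alpha-beta search; siblings are visited in order so bounds can prune later ones."""
    if isinstance(node, int):
        visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # remaining siblings cannot change the result
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


tree = [[3, 5], [2, 9], [0, 1]]  # root is a max node over three min nodes
mm_leaves, ab_leaves = [], []
print(minimax(tree, True, mm_leaves), mm_leaves)                      # 3 [3, 5, 2, 9, 0, 1]
print(alphabeta(tree, True, -math.inf, math.inf, ab_leaves), ab_leaves)  # 3 [3, 5, 2, 0]
```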
Speaking as a user of TeraGrid as well as other huge machines: some embarrassingly parallel tasks, like SETI@home, can easily be run on distributed systems. For other problems this is simply out of the question, and the TeraGrid clusters will be much better suited to them.
Tightly coupled problems just cannot be run efficiently even on clusters of workstations (COWs). It is the age-old matter of using the right tool for the job.
I think you are looking for old books at the wrong store(s). Personally, I would rather see lots of new books, so that when a fundamentally different new version comes out I can get up to speed on it quickly. The old books are bound to exist, since they used to be the fancy new ones. A local library, or a smaller non-chain bookstore, might have them.
Do any of these implementations take advantage of the fact that you can also send data in the payload of the "knock" packets?
It seems you could play quite a few cryptographic tricks there (timestamps, passwords, etc., encrypted and put in the payload). Even parts of the TCP or UDP headers could be used.
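Here is a minimal Python sketch of one such trick, using only the standard library. The knock packet's payload carries a timestamp plus an HMAC-SHA256 tag computed with a pre-shared key, so the server can check that the knock is both authentic and recent. This authenticates the timestamp rather than encrypting it. The payload layout, the 30-second window, and the function names are assumptions for illustration, not part of any existing port-knocking implementation. Sending the bytes over UDP or TCP is left out.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

MAX_SKEW_SECONDS = 30  # assumed freshness window


def make_knock_payload(key: bytes, now: Optional[float] = None) -> bytes:
    """Build a hypothetical knock payload: 8-byte timestamp + 32-byte HMAC-SHA256 tag."""
    ts = int(time.time() if now is None else now)
    ts_bytes = struct.pack("!Q", ts)  # unsigned 64-bit, network byte order
    tag = hmac.new(key, ts_bytes, hashlib.sha256).digest()
    return ts_bytes + tag


def verify_knock_payload(key: bytes, payload: bytes, now: Optional[float] = None) -> bool:
    """Accept only a correctly tagged payload whose timestamp is within the window."""
    if len(payload) != 8 + 32:
        return False
    ts_bytes, tag = payload[:8], payload[8:]
    expected = hmac.new(key, ts_bytes, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        return False
    (ts,) = struct.unpack("!Q", ts_bytes)
    current = time.time() if now is None else now
    return abs(current - ts) <= MAX_SKEW_SECONDS


key = b"shared-secret"  # hypothetical pre-shared key
payload = make_knock_payload(key, now=1_000_000)
print(verify_knock_payload(key, payload, now=1_000_010))          # True
print(verify_knock_payload(key, payload, now=1_000_100))          # False: too old
print(verify_knock_payload(b"wrong-key", payload, now=1_000_010))  # False: bad tag
```

An eavesdropper could still replay a captured knock within the window unless the server also remembers which timestamps it has already accepted.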
Sun: weren't they the ones who built Niagara with only a single floating-point unit shared by all eight cores? Sure seems useful to me...
The L stood for "Light," as in not quite the full, heavy Blue Gene we will build in the future. It is not just a letter.
This won't be used for that. The energy graph problem is not that difficult or that big, and it definitely doesn't need that kind of computing power.
This is the same technology used in most computer power supplies. (I use computers every day.)
I've also had some experience and training with big ETC lighting systems. They are quite impressive devices.
Actually, I meant distributive, not associative... sorry.
The 180 had a great 16 shades of greyish blue!!!