I have been using owncloud for about a year now. I must say I am not as enthusiastic about it as you are. I went through two major version upgrades, and only exporting the data to a different format and reimporting it kept my calendar safe. The application is overall fairly slow. Still, it gets the job done for me.
There is no good unit. It all depends on your application. For some applications, FLOPS is what matters. For some, bandwidth is what matters. For some, the size of the memory is what matters. There is no simple performance metric that accounts for every single application.
I don't know how familiar you are with the computations you are mentioning. But large-scale scientific (read physics/engineering) applications are essentially composed of BLAS routines. I have worked with physicists whose application spent 50% of its time solving an eigenproblem on a large sparse matrix (the other 50% went into building that sparse matrix), and with nuclear engineers whose application spent 80% of its time solving dense linear systems. If BLAS routines (or BLAS-like routines) are so popular, it is precisely because they are a significant part of some scientific applications.
FLOPS is a good measure of performance for SOME applications, namely those which are compute bound. For instance, many combinatorial optimization algorithms rely heavily on solving dense linear systems. For these applications FLOPS is the meaningful performance metric. That is why the top-500 uses linpack as its benchmark: it is meaningful for many applications. Currently the top-500 benchmark is being reassessed to take more sparsity into account, and the new version should include a conjugate gradient algorithm on stencils, because that is more representative of current scientific applications (solving the heat equation on 3D objects).
I have also been working in graph analytics, and there we measure things like edges per second (the graph500 benchmark does that, for instance). It is meaningful there because most of these applications boil down to graph traversals. But that metric is quite controversial, since the performance you get depends highly on the structure of the graph/matrix, and you need a benchmark instance that represents your target application. For instance, road graphs typically get terrible performance on the GPU: they have a high diameter, so little parallelism can be exposed, and the CPU typically does better there. On social networks, the diameter of the graph is much smaller and there is more parallelism, so the GPU can be utilized closer to its peak performance.
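To make the diameter argument concrete, here is a minimal sketch of a level-synchronous BFS that records how many vertices sit in each frontier. The frontier size is roughly the amount of work that can be done in parallel at each step. The two toy graphs are assumptions chosen for illustration only: a path stands in for an extreme high-diameter road-like graph, and a star stands in for a low-diameter graph with a hub. Neither is a real benchmark instance.

```python
def bfs_frontier_sizes(adjacency, source):
    """Return the number of vertices in each BFS level, starting at source."""
    seen = {source}
    frontier = [source]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        next_frontier = []
        for u in frontier:
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sizes


def path_graph(n):
    """Toy high-diameter graph: vertices 0..n-1 chained in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    """Toy low-diameter graph: vertex 0 is a hub connected to all others."""
    adjacency = {0: list(range(1, n))}
    for i in range(1, n):
        adjacency[i] = [0]
    return adjacency


if __name__ == "__main__":
    # Path: many levels, one vertex each, so nothing to parallelize.
    print(bfs_frontier_sizes(path_graph(8), 0))
    # Star: few levels, wide frontier, so plenty of parallel work.
    print(bfs_frontier_sizes(star_graph(8), 0))
```

On the path every level holds a single vertex, while on the star almost all vertices land in one level, which is the kind of frontier a GPU can actually keep busy.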
tl;dr, there is no one metric that represents "real applications". Each application has different requirements, and each application needs to be investigated separately. FLOPS is meaningful for many "real applications".
As I said, it really depends on the application. Dense compute-intensive kernels (typically matmul, but any O(n^3) computation on O(n^2) data typically does it) will reach 80% of peak flops on both architectures. After that the situation gets really complicated. I have been looking at sparse linear algebra kernels (spmv or graph traversal), and these kernels are quite catastrophic for both CPU and GPU. Most of the time they are memory bound, sometimes by latency. And here most conventional insight can be discarded. Depending on the shape of the matrix, the performance of the GPU can be horrible or very good.
People love to talk about "real applications", but that is mostly meaningless. In which real application? Weather forecasting can certainly exploit a GPU close to peak performance. Text compression, probably not so well. In the field of relational databases, it certainly depends heavily on the query. Though a factor of 7 seems a lot in that case.
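One common way to reason about compute-bound versus memory-bound kernels is the roofline model, which is not named in the comment above and is used here only as an illustration. The sketch below caps attainable performance at min(peak, bandwidth * arithmetic intensity). The machine numbers (1000 Gflop/s, 100 GB/s) and the 0.25 flop/byte figure for a sparse kernel are made-up assumptions, and the matmul intensity assumes ideal data reuse in single precision.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: limited either by compute peak or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def matmul_intensity(n, bytes_per_value=4):
    """Flops per byte for an n x n matmul, assuming each matrix moves once.

    2*n^3 flops over three n x n matrices of bytes_per_value bytes each.
    """
    return (2 * n**3) / (3 * n**2 * bytes_per_value)


if __name__ == "__main__":
    peak, bandwidth = 1000.0, 100.0  # hypothetical machine, not a real part

    # Dense matmul: intensity grows with n, so the kernel hits the compute roof.
    print(attainable_gflops(peak, bandwidth, matmul_intensity(600)))

    # Sparse kernel with an assumed 0.25 flop/byte: stuck on the memory roof.
    print(attainable_gflops(peak, bandwidth, 0.25))
```

The model ignores latency, which is exactly the part that makes sparse kernels hard to predict, so treat it as an upper bound rather than a prediction.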
I did not look at the actual numbers claimed nor what they are. But a factor of 7 between a GPU and a 32-core Intel system is not impossible. My BS alarm trips around a factor of 20 for a 2-processor system.
If you look at a state-of-the-art NVIDIA GPU, you pick a Tesla K10 ( http://www.nvidia.com/object/tesla-servers.html ). You get about 4.5 Tflop/s of single-precision performance and a bandwidth of 320 GB/s. The flop figure is realistic for compute-intensive kernels (read dense matmul), and the bandwidth is never reached. Probably 250 GB/s is more reasonable.
On the CPU side, if you pick a Xeon E5 such as this one ( http://ark.intel.com/products/64595/ ), you need 4 of them to get to 32 cores. You get 32 cores * 2.6 GHz * 8 floats per SIMD = 665 Gflop/s, which is actually realistic for dense kernels such as matmul, and 4 * 50 GB/s of bandwidth. But in practice you hardly reach 30 GB/s per processor, so 120 GB/s aggregated.
So the GPU is about 7 times faster floating-point wise and 2 times faster bandwidth wise. But here we are talking peak, and practical performance varies a lot from application to application and depending on whether you can use your architecture properly. But overall, for some well-chosen kernel, a factor of 10 still seems not too unreasonable.
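The back-of-the-envelope arithmetic above can be written out as a short script. All inputs are the figures quoted in the comment (32 cores, 2.6 GHz, 8 floats per SIMD operation, 4.5 Tflop/s, 250 and 120 GB/s); the function names are made up for this sketch.

```python
def cpu_peak_gflops(cores, ghz, floats_per_simd):
    """Peak single-precision rate, counting one SIMD operation per cycle."""
    return cores * ghz * floats_per_simd


def speedup(gpu_value, cpu_value):
    """Ratio of a GPU figure to the matching CPU figure."""
    return gpu_value / cpu_value


if __name__ == "__main__":
    cpu_flops = cpu_peak_gflops(32, 2.6, 8)   # about 665.6 Gflop/s
    gpu_flops = 4500.0                        # 4.5 Tflop/s, as quoted
    print(round(cpu_flops, 1))
    print(round(speedup(gpu_flops, cpu_flops), 1))  # flop ratio
    print(round(speedup(250.0, 120.0), 1))          # practical bandwidth ratio
```

With these figures the flop ratio works out to roughly 6.8 and the bandwidth ratio to roughly 2.1.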
You mean it has ever been alive? (on "Is Ruby Dying?")
Clearly I am not in "the web world" and I am seeing this question from an external viewpoint. But I never really saw anybody excited about Ruby, using Ruby, or praising Ruby, except one single PhD student who was using it to make his experiments repeatable and automatically logged. Sure, there is an occasional article on a new version of Ruby or on a flaw in Ruby on Rails. I hear people talk a lot about PHP, about Python, about JavaScript, to do pretty much anything. But quite frankly I never hear about Ruby. Actually I hear more about Lua than I hear about Ruby.
For that reason I never took Ruby for more than a hobbyist pet project. Maybe I am wrong, but seen from my chair as a low-level programming guy, no one uses Ruby.
I don't know much about firearms, but I feel like plastic-based guns are not really new. If you can enter a "high security area" with a plastic gun, then maybe it is NOT a high security area...
Historically, hackers have joined up with mafia or gangs for _physical_ protection, and in exchange, provide black-hat services to the groups providing them with protection.
While I agree with the sentiment, is there any actual evidence of that?
Another question that baffles me, how were the people on the 9/11 flights able to use their cell phones during flight? Yeah they have the in seat phones, but i still remember hearing people say, "Yeah they used their phones!" Fun fact that everyone seems to forget.
And you saw how that flight ended?? DO YOU REALLY WANT THAT AGAIN?!
I have the same question. I am all for science and if asked I would be all for it.
But an important question should be answered if possible. What did we gain from discovering the Higgs boson? I am sure there are thousands of really cool applications that specialists can think of. I think if some could be highlighted (even if they are 50 years of engineering down the road), people would be much more receptive to it.
Pretty much the same here. You leave your brain outside the theater and then stuff happens, some funny parts are funny; some not funny parts aren't so funny. Overall I had a good time.
(disclaimer: this being slashdot I did not RTFA.)
Speaking as a scholar, the main problem that I see is that communicating to the public is not my job. Writing for some wiki is not my job. My job is composed of 3 components:
1/ teaching: in class and mentoring students.
2/ research: conduct, manage and fund.
3/ service: for my university in committees, and for the community by taking part in conferences/journals by submitting/reviewing papers and helping with organization.
Moreover, writing is difficult. Especially that form of quite high-level, all-encompassing writing. Writing a good survey paper takes months. It is a significant endeavour.
As you can see, wikifying scholarly canons is not really a part of my job and takes a lot of time. It is not unrelated, but it is a more abstract thing. As such, it is not directly useful to my advancement. (In other words, my tenure committee is not going to care.) I just cannot afford to spend that time if it is not part of a clearly identified project.
Though nowadays you just click on an ftp://... link and get the right file right away. So I am not sure the file listing problem matters that much.
Maybe the TV is not IP connected, but nowadays every single gaming system comes with the ability to play videos. Most are compatible with the Netflix, Hulu, or Amazon video services. And that is ignoring all the Hulu/Netflix boxes that you find here and there. Also, cable boxes can do VOD. And I am not even counting the $50 Android stick that goes into your TV's HDMI port.
So I'd say that the market for "I absolutely need a DVD or I can't play it" is probably quite slim.
TBB is *not* implemented using the Cilk Plus runtime, either in the Intel compiler or in the Cilk Plus branch of GCC. TBB is implemented using a completely separate runtime from Cilk Plus. You can take my word for it that I know what I'm talking about, or you can confirm it by studying the sources online, since they are both publicly available. :)
Interesting. I never looked at how it is implemented by ICC. But my understanding of it is that (some parts of) TBB used a workstealing engine for execution and that it was reusing a significant portion of the cilk runtime. I might have understood wrong.
Pinning threads to cores can help on some benchmarks, but it is less useful for others. In particular, for codes implemented in TBB or Cilk Plus, which use work-stealing schedulers, the performance benefits of pinning can be modest, almost negligible, or sometimes even hurt performance.
Well, the point of pinning is to increase memory locality. Work-stealing engines typically try to keep things local to avoid that problem, so you would rather have little migration. Now that I think about it more, Cilk Plus tends to create more threads than cores/hardware threads, so pinning might actually be a catastrophe in that case. I guess what I meant was that ignoring pinning in a parallel benchmark is not the way to do it.
About the speed of Cilk Plus: it is my personal experience that Cilk Plus is slow. I forwarded a couple of performance-related problems to Intel. I remember excluding Cilk Plus results from some charts because they were embarrassing (although I might have done something wrong). Whenever you see academic publications using that kind of technology, you realize that people emphasize things like parallelism, speedup, and load balance, but rarely performance. And when you dig into what actually happened in the performance area, you realize that the numbers are not that good. (Though I agree with you that in this case it should not matter.)
You are indeed correct, I am French. And for the first AC that replied to me, I was a 2nd-year computer science student something like 10 years ago.
I am currently a CS professor in a US university and I have been doing low-level performance studies on various architectures (Intel Xeon, NVIDIA GPU, recently Xeon Phi, distributed memory machines) for the last 4 years. So I might not know everything about performance benchmarks, but clearly the methodology of the original article is flawed. I would whip (figuratively of course) students that turn in something like that to me.
Let alone the fact that on Linux a memory page is allocated (and so placed in memory) the first time it is touched. So if you do not initialize it, the allocation and placement happen during the traversal, which is probably not something you want to time.
From my own tests (measuring the performance of large scale applications using real world data sets), intel > clang > g++ (although the difference between them is shrinking).
I made lots of experiments in this area as well. And my overall conclusion was that the Intel compiler, the PGI compiler and GCC are all good compilers. But their performance varies significantly depending on what you are compiling. For some other applications you would get different results. Or, within your application, some set of functions is more efficiently compiled by the Intel compiler while others are better compiled by GCC. It is really difficult to say which is "the best" compiler in terms of performance.
I am a scholar and study parallel computing. These benchmarks are pretty much pointless. You cannot draw any conclusions from these results. Here the author times the whole execution, from the creation of the process to its destruction. That means the measurement includes lots of overhead that would count as startup time in a real application.
There is also apparently no thread pinning to computational cores. This is known to make a HUGE difference.
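As a minimal sketch of what pinning looks like in practice, the snippet below restricts the calling process to a single CPU using Python's `os.sched_setaffinity`, which is only available on Linux. The helper name `pin_to_cpu` is made up for this example, and a real parallel benchmark would pin each worker thread to its own core (for instance through the runtime's own affinity settings) rather than pin one process.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the caller to one CPU and return the resulting affinity set.

    Returns None when the platform does not support affinity calls or the
    request is refused. If cpu is None, the lowest allowed CPU is used.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not Linux, nothing to do
    allowed = os.sched_getaffinity(0)  # pid 0 means the caller itself
    target = min(allowed) if cpu is None else cpu
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None  # CPU not allowed or call blocked by the environment
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(pin_to_cpu())
```

Whether pinning helps depends on the runtime: with a work-stealing scheduler that oversubscribes cores, it can hurt rather than help.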
Then the authors compared Cilk results. Cilk is known to be slow for simple codes that do not need work stealing and do not have complex dependencies. For the record, I know they are also comparing TBB. But TBB is implemented on top of the Cilk engine in the Intel compiler (I don't know about GCC).
In these results hyperthreading is enabled. The proper use of hyperthreading is complicated. There are some problems where it helps, others where it harms, and I would not be surprised if this behavior were compiler dependent.
Finally, it is almost impossible to compare compilers. On different platforms, with the same compilers you will get different results. Some functions are better compiled by one compiler and some functions are better compiled by the other compiler. This has been reported over and over and over again.
If you care about performance, you should not rely on what your compiler is doing behind your back. You need to know what it is doing. Depending on memory alignment (and what the compiler knows about it), on how the vectorization happens, and on potential memory aliasing, you will get different results.
If you care about performance, you need to benchmark and you need to optimize and you need to know what the compiler does.
That's normal. There is hyperthreading on that machine, and it screws up that kind of measurement. You should always use wall-clock time when dealing with parallel codes. You should also repeat the test multiple times and discard the first few results, which the author did not do. That is very standard in parallel programming benchmarks. And since the author did not do that, I assume he does not know much about benchmarking. Lots of parallel middleware has high initialization overhead. This tends to be particularly true for Intel tools.
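A minimal sketch of that methodology: time only the region of interest with a wall-clock timer, repeat the run several times, and throw away the first few measurements so that runtime initialization does not pollute the numbers. The function name `benchmark` and its defaults are assumptions made for this example.

```python
import statistics
import time


def benchmark(fn, repeats=10, warmup=3):
    """Time fn() with a wall-clock timer and discard the first warmup runs."""
    if repeats <= warmup:
        raise ValueError("repeats must be larger than warmup")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # wall-clock, not CPU time
        fn()
        timings.append(time.perf_counter() - start)
    kept = timings[warmup:]  # drop runs that pay one-time startup costs
    return {
        "runs": len(kept),
        "min": min(kept),
        "median": statistics.median(kept),
    }


if __name__ == "__main__":
    print(benchmark(lambda: sum(range(100_000))))
```

Any setup, such as allocating and initializing the data, should happen before `benchmark` is called so that it stays outside the timed region.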
an analysis of the possibility of badBIOS
http://www.rootwyrm.com/2013/11/the-badbios-analysis-is-wrong/
I must +1 that. I am an academic and I do not have the time to write production-ready code. I write code to prove a point: "this problem is solvable", "this algorithm can be implemented with that performance on this machine". Once the paper that goes with the code is published, I archive the code and will only touch it again when I want to solve a similar problem.
If I was interested in production ready code, I'd pay a software engineer to release the software.
The number of GPGPU applications is tremendous. Actually, people are starting to call them accelerators rather than GPUs. From structural biology to image processing, from graph analytics to text mining, from fluid mechanics to energy minimization, there are not a lot of problems which have not been investigated using GPUs today.
Alright smartypants,
Did you look at pictures of them?
Here is sony's smartwatch: http://www.digitaltrends.com/wp-content/uploads/2013/06/Sony-SmartWatch-2.jpg
Here is samsung's smart watch: http://s1.ibtimes.com/sites/www.ibtimes.com/files/styles/v2_article_large/public/2013/09/05/samsung-galaxy-gear.jpg
Here is a moschino square (not smart) watch: http://www.the-watch-store.com/shop/3042-11914-large/moschino-cheap-and-chic-unit-square-watch-mw0275.jpg
Ok, the camera on the Samsung Gear is a little strange and I'd rather it not be there. But overall, it does not look much different. If you just google "square watch" you will find plenty of classical watches which look quite similar to the smartwatches.
For me, I think that eventually the smartwatch will replace my smartphone. I already carry a tablet everywhere. I mostly need my phone for phone calls, the occasional text message, GPS sometimes, and giving web access to my tablet when I am not at home or at work. So clearly, if a watch could place phone calls and tether phone internet, I would retire my smartphone almost instantly.
What I don't understand is that people say it is dorky. I find that it pretty much looks like a regular "digital/quartz" watch we had 10 years ago. Clearly they don't look like luxury watches, but they do not look dorky at all to me.