Mark is a good guy. Too bad he sucks great humongous dick at hiring and keeps picking fucking idiots. All of the Ubuntu bullshit you hear about is because he has fucking morons working for him that cannot tell their arse from a hole in the ground.
Mark. For the love of god. Fire EVERYONE at Canonical and hire people that have a goddamned clue.
As a friend of mine once taught me, “first class people hire first class people. Second class people hire third class people”.
There are only two rendering engines for Linux, and they are Gecko and Webkit, both of which have horrible support for a lot of advanced web standards such as SVG and MathML, because the focus today is on who makes the fanciest sliding div effect rather than on actually properly implementing existing stuff. The loss of Presto and the reduction of alternatives is a very sad day for the web.
You can still download Opera 12 for Linux. And that's actually a good thing, since Presto is still the least buggy engine when it comes to SVG and, as far as my experience is concerned, MathML.
Yeah, but with this kind of application the real bottleneck is the fact that the discrete GPU needs to access data through the high-latency, low-bandwidth PCIe bus. For this kind of application, an IGP, even with its lower core count, is often a much better solution, unless you manage to fully cover the host-device-host transfers with computations.
I'd be really curious to see this thing done in OpenCL on a recent AMD APU, exploiting all the CPU cores and the IGP cores concurrently.
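To put rough numbers on the transfer argument, here is a minimal back-of-the-envelope sketch in Python. The function name `offload_time`, the bandwidth and the compute times are all made up for illustration; they are not measurements of any real card, and the shared-memory IGP is idealised as needing no copy at all.

```python
def offload_time(data_bytes, link_bandwidth, compute_seconds, overlap=0.0):
    """Estimate wall time for one host -> device -> host round trip.

    data_bytes      : bytes moved in each direction (assumed equal both ways)
    link_bandwidth  : sustained bytes/second over the host-device link
    compute_seconds : time the device spends computing
    overlap         : fraction (0..1) of the transfer hidden behind compute
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    transfer_seconds = 2 * data_bytes / link_bandwidth
    # You cannot hide more transfer time than there is compute time.
    hidden = min(transfer_seconds * overlap, compute_seconds)
    return compute_seconds + transfer_seconds - hidden


# Hypothetical numbers, for illustration only.
DATA = 1 << 30  # 1 GiB each way

# Fast discrete GPU, but every byte crosses the bus twice.
discrete = offload_time(DATA, link_bandwidth=6e9, compute_seconds=0.05)

# Slower integrated GPU sharing host memory: modelled as zero bytes copied.
integrated = offload_time(0, link_bandwidth=6e9, compute_seconds=0.20)

print(f"discrete:   {discrete:.3f} s")
print(f"integrated: {integrated:.3f} s")
```

With full overlap the estimate collapses to whichever of transfer and compute is longer, which is the "fully cover the transfers" case mentioned above.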
Either you're trolling or you have no frigging idea what you're talking about.
It is true that often the low-end cards are just crippled versions of the high-end cards, something which —as despicable as it might be— is nothing new to the world of technology. But going from this to saying that there is no competition and no (or slow) progress is a step into ignorance (or trolling).
I've been dealing with GPUs (for the purpose of computing, not gaming) for over five years, that is to say almost since the beginning of proper hardware support for computing on GPU. And there has been a lot of progress, even with the very little competition there has been so far.
NVIDIA alone has produced three major architectures, with very significant differences between them. Compare the capabilities of Tesla (1st gen) with those of Fermi (2nd gen) or Kepler (3rd gen), for example. Fermi introduced an L1 and an L2 cache, which were not present in the Tesla arch, lifting some of the very strict algorithmic restrictions imposed on memory-bound kernels; it also introduced hardware-level support for DP. Kepler is not as big a change over Fermi, but it has introduced things such as the ability for stream processors to swizzle private variables among them, which is a rather revolutionary idea in the GPGPU paradigm. And 6 times more stream processors per compute unit over the previous generation is not exactly something I'd call "not that much different".
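The "swizzle" feature presumably refers to the warp shuffle instructions that arrived with Kepler. The following is only a plain Python simulation of the data movement, not GPU code: `shuffle_down` and `warp_sum` are illustrative names, and the list stands in for the private registers of the 32 lanes of one warp.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA hardware


def shuffle_down(values, offset):
    """Each lane reads the value held by the lane `offset` positions above it.

    Lanes whose source would fall outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i]
            for i in range(n)]


def warp_sum(values):
    """Tree reduction using only lane-to-lane exchanges.

    Assumes len(values) is a power of two. Lane 0 ends up holding the total;
    the other lanes hold partial garbage, as they would on the device.
    """
    lanes = list(values)
    offset = len(lanes) // 2
    while offset > 0:
        received = shuffle_down(lanes, offset)
        lanes = [mine + theirs for mine, theirs in zip(lanes, received)]
        offset //= 2
    return lanes[0]


print(warp_sum(range(WARP_SIZE)))
```

The point of the exercise is that no shared or global memory appears anywhere: the reduction runs entirely on values passed directly between lanes, in log2(32) = 5 steps.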
AMD has only had one major overhaul (the introduction of GCN), instead of two, but I'm not really spending more words on how much of a change it was compared to the previous VLIW architectures they had. It's a completely different beast, with the most important benefit being that its huge computing power can be harnessed much more straightforwardly. And if you ever had to hand-vectorize your code looking for the pre-GCN hotspot of workload per wavefront, you'd know what a PITN that was.
I would actually hope they stopped coming up with new archs, and spent some more time refining their software side. AMD has some of the worst drivers ever seen by a major hardware manufacturer (in fact, considering they've consistently had better, cheaper hardware, there isn't really any other explanation for their inability to gain dominance in the GPU market), but NVIDIA isn't exactly problem free: their support for OpenCL, for example, is ancient and crappy (obviously, since they'd rather have people use CUDA to do compute on their GPUs).
And hardware-wise, Intel is finally stepping up their game. With their HD4000 chipset they've finally managed to produce an IGP with decent performance (it even supports compute), although AMD's APUs are still top dog. On the HPC side, their Xeon Phi offerings are very interesting competitors to the NVIDIA Tesla (not the arch, the brand name for the HPC-dedicated devices) cards.
Nvidia hardware isn't really clearly superior to AMD.. they rotate on who has the best hardware at various price points.
Actually, if you just look at the specifications, ATI/AMD has almost always had the (theoretically) most competitive hardware (GPU-wise), both in terms of performance/price ratio and often even in terms of raw computing power/memory bandwidth. AMD was even the first to come out with hardware support for compute on GPU (the first CTM/CAL betas came out before CUDA was ever mentioned anywhere), even if it required assembly programming of the shaders (which you could often do without by using a layer such as BrookGPU).
However, their GPUs have been crippled by the most horrible software ecosystem possible. By and large the main culprit is ATI/AMD itself, who has constantly failed at producing high-quality, stable drivers and capable compilers for their shaders. A secondary culprit (which has finally been removed from the equation) is the architecture itself: up until the introduction of GCN, AMD shaders had a VLIW architecture (VLIW5 first, VLIW4 in the last releases before GCN) which was often not easily exploitable without heavy-duty restructuring and vectorization of your shader code. So you often found yourself with huge horsepower available, while only being able to exploit some 30-60% of it at best.
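A toy model shows why unvectorized code leaves VLIW hardware idle. It is a deliberate simplification written in Python: `vliw_utilization` is a made-up helper, each step issues exactly one bundle, and the op counts in the examples are invented rather than taken from any real shader.

```python
def vliw_utilization(independent_ops_per_step, slots=5):
    """Fraction of VLIW slots kept busy.

    independent_ops_per_step : for each step of the shader, how many mutually
                               independent operations the compiler can find
    slots                    : operations one bundle can issue (5 for VLIW5)

    Toy assumption: one bundle per step, surplus operations are ignored.
    """
    if slots <= 0 or not independent_ops_per_step:
        raise ValueError("need a positive slot count and at least one step")
    used = sum(min(ops, slots) for ops in independent_ops_per_step)
    return used / (slots * len(independent_ops_per_step))


# Purely scalar, dependent chain: one usable op per bundle.
print(vliw_utilization([1, 1, 1, 1]))
# Hand-vectorized float4 arithmetic: four ops per bundle.
print(vliw_utilization([4, 4, 4, 4]))
# Mixed code, somewhere in between.
print(vliw_utilization([1, 2, 4, 1]))
```

In this model a scalar dependency chain sits at one fifth of peak and only explicit four-wide vector code gets close to full use, which is in line with the 30-60% range quoted above for typical code.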
Wrong. It does mean you can copy it. That's exactly what copyright is about. You can (re)publish (and thus create copies of) works in the public domain as you see fit without paying royalties to anyone.
Re. your point 1., OpenCL can run on CPUs as well (using all the cores and the vector instructions is 'free'), and of course on the APUs, which can often access the same memory space as the CPU (especially with the upcoming hUMA designed by, guess who, AMD).
While OpenCL might not be necessary, there's no reason not to use it, since it will mean easy, cross-platform support for multicore programming and use of vector functions, which would be useful on any modern system, even just on CPUs. (Of course, if the system also has an APU with the upcoming hUMA architecture and can access the same memory space as the CPU, why not make use of it too?)
Of course the benefits will only be visible to people with huge spreadsheets. As for 1-2-3 being speedy on a 486, well, either you only had small spreadsheets or the mist of the past is obscuring your memory.
Ok, serious questions here. Are there _technical_ reasons for hating GRASS? It does have a butt-ugly UI, but it's extremely flexible, extensible and it's designed with a Unix-like philosophy in mind, with a collection of tools that do individual things but are well integrated with each other. I'm not saying it's perfect, but then again neither is ArcGIS.
My fault. I still have trouble considering Trident a serious contender when talking about rendering engines and standard compliance in the same sentence.
I'm an Opera user myself and while I agree that (one of) the main reason(s) for this preference was the functionality of the whole thing, I did like the Opera rendering engine, and often found it to be more standard-compliant than other engines, even when it had less coverage. I'm a little afraid that the Blink switch will break some of the functionalities I've been relying on (such as the ‘presentation mode’ in full-screen).
On the other hand, with the Blink/WebKit fork we are probably going to have three main engines again, and this is a good thing.
OpenCL is suboptimal on NVIDIA only because NVIDIA refuses to keep their support up to date, as it would chip away at their vendor lock-in attempt with CUDA.
I honestly think everybody doing serious manycore computing should use OpenCL. NVIDIA underperforms with that? Their problem. Ditch them.
I absolutely agree that the software support AMD has for their cards is inferior to that of NVIDIA. And this definitely pisses me off, considering their hardware is _consistently_ better than the competitor's, in terms of raw performance _and_ in terms of performance/price. OTOH, I get the impression that their software support is slowly getting better. At the very least, I haven't had any significant issues recently (at least using Debian unstable with their packaged drivers).
Problem is no browser follows the standards exactly, and as you point out with Office, every browser has bugs in it. So if you mark up your page following the standards alone it won't render properly anywhere. You end up going back and rewriting some of the styling and scripting to either not use stuff that exposes bugs or use browser-specific kludges to get around the bugs.
If all browsers use the same engine, at least we don't have to spend days testing pages with umpteen different browsers and getting around gumpteen bugs. And if one engine is used, wouldn't that become the de-facto standard? The trick is that the engine must be open-sourced (unlike MS Office), so that it's not controlled by a single commercial company and that bugs can be fixed by anyone at the RC stage.
The problem is that, with that kind of attitude, rendering issues in browsers will never be fixed. Even if the rendering engine is crap, and the standard claims a different (more sensible, more functional, whatever) behavior, with a single rendering engine used as the de facto standard, it would never get fixed. Unsurprisingly, whenever one reports a rendering bug, the first question that gets asked is: does it work in other engines? Luckily, we still have at least three major engines (the fourth, Presto, has only been recently abandoned), so we can still compare and see which engines are wrong in implementing that specific part of the standard, and which are not. Without this multitude of implementations, one of the primary motivations for fixing bugs disappears.
Monocultures are bad. Regardless of whether they're open-source or not.
One of the reasons I didn't use Opera was actually because Web developers never tended to create content with Opera's rendering engine in mind.
And that's actually the problem with Opera moving to WebKit. Developers shouldn't have any specific rendering engine in mind. They should have the W3C standard in mind. Having one less rendering engine (even if it's just a minority one) reduces the pressure on web developers to code according to standards.
It also makes it much harder to spot bugs in rendering engines: how do you know if a particular CSS+HTML combination doesn't work as you would expect according to what the standard says? You check it against multiple engines. If one of the engines does things differently, then either it is non-compliant, or the other engines are. Having one less engine means having one less external check, and less motivation for web engine developers to code standard-compliant engines. We're falling back to web monoculture, and just because it's not IE this time it doesn't make it better.
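The cross-check described above can be sketched in a few lines of Python. Everything here is hypothetical: `flag_outliers` is an illustrative helper, and the strings stand in for whatever comparable summary of a rendering you have (for instance a hash of a screenshot).

```python
from collections import Counter


def flag_outliers(renderings):
    """Group engines by the result they produced for one test case.

    renderings : dict mapping engine name to a comparable summary of its output
    Returns (majority_result, engines_that_disagree).

    Disagreement only tells you where to look; it is still the standard,
    not the majority, that decides which side is actually wrong.
    """
    if not renderings:
        raise ValueError("need at least one engine")
    counts = Counter(renderings.values())
    majority, _ = counts.most_common(1)[0]
    outliers = sorted(name for name, out in renderings.items()
                      if out != majority)
    return majority, outliers


# Made-up results for a single CSS+HTML test case.
print(flag_outliers({"Gecko": "layout-A", "WebKit": "layout-A", "Presto": "layout-B"}))
print(flag_outliers({"WebKit": "layout-A"}))
```

The second call is the monoculture case: with a single engine there is never an outlier, so this kind of check can no longer flag anything.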
Not necessarily. I know quite a few former Ubuntu users that switched to Mint or even plain Debian because of the last few horrible releases. I wouldn't be surprised to discover Ubuntu is losing ground fast to other Linux distributions.
OpenCL is supported by all major vendors, and it can be used both on CPU and on GPU. However, Intel's support for OpenCL on GPU is only available on Windows. Until the GalliumCompute framework is ready, we won't be seeing any open source OpenCL support anywhere. (Also, Intel's GPUs support OpenCL only from the HD4000 series onwards.)
I think he's referring to the hyperthreading technology itself, for which you probably can't set the HT bit unless you actually support it. Still, even though Phenoms don't have HT, they _will_ perform at closer to peak performance if you overcommit (in terms of threads). I've done some testing, and you need about 18 threads to truly saturate an X6, which is about the same number of threads that you need to saturate a dual Xeon (8 physical cores, 16 with HT).
I think you're a bit confused. This subthread is about Bulldozer and how its unusual design (where each pair of "cores" are not truly independent cores because they share a common floating point unit, instruction cache, decoder, and a couple other blocks) interacts with the Windows scheduler. Due to the superficial similarity to hyperthreading, some people maintain that if AMD had only been smart enough to make pairs of Bulldozer cores declare themselves to be one hyperthreaded core, it would have magically made Bulldozer much faster in Windows. This isn't really true, but fans looking for a reason to believe don't ever notice they're simultaneously claiming AMD was smart enough to design a great CPU and dumb enough to accidentally sabotage it in a really trivial, easy-to-fix way.
You are indeed right, I totally missed that part. And I don't know anything about the Windows scheduler, so I have no idea whether the core pairs would perform better if they were advertised as a single core with HT.
Also, your test was probably somewhat bogus. You can easily saturate a Phenom II X6 with six threads. I'd guess you ran a program where individual threads cannot individually saturate 1 CPU core. That is, they frequently go to sleep or wait on each other quite a lot. That's the only way you can continue to get significant scaling after N threads (where N equals the number of hardware threads available). Not all programs behave that way, so it's not real useful to report that one particular program happens to "scale" all the way up to 3 threads per core. (And if it does behave that way, there's no reason to believe it would behave any differently on Intel CPUs.)
The thing is, it does behave differently on Intel CPUs. Tested on both a dual-Xeon (2x4 cores, doubled by HT) and on an i7 (4 cores, doubled by HT), and in both cases peak performance was achieved with a number of CPU threads matching (or very close to) the HT-advertised core count (more specifically, 18 threads in the Xeon case and 10 threads in the i7 case).
Of course this is just a very specific application, and I'm sure that the effects I'm seeing are influenced by bottlenecks from other subsystems (memory throughput, most likely); still, I find the difference between Intel and AMD CPUs quite peculiar.
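The thread counts quoted in this exchange can be turned into a rough estimate of how busy each thread actually is. This is a crude Python sketch, assuming the cores are the only resource being saturated (which the memory-throughput caveat above already calls into question); `implied_busy_fraction` is an illustrative name.

```python
def implied_busy_fraction(hw_threads, threads_at_peak):
    """Rough share of time each software thread keeps a hardware thread busy.

    If throughput keeps rising until `threads_at_peak` threads are running,
    each of them can only be busy about hw_threads / threads_at_peak of the
    time. Capped at 1.0, since a thread cannot be more than fully busy.
    """
    if hw_threads <= 0 or threads_at_peak <= 0:
        raise ValueError("both arguments must be positive")
    return min(1.0, hw_threads / threads_at_peak)


# Figures taken from the comments above.
x6 = implied_busy_fraction(6, 18)     # Phenom II X6, no HT
xeon = implied_busy_fraction(16, 18)  # dual Xeon, 16 hardware threads with HT
i7 = implied_busy_fraction(8, 10)     # i7, 8 hardware threads with HT

print(f"X6: {x6:.2f}  dual Xeon: {xeon:.2f}  i7: {i7:.2f}")
```

Read this way, the X6 result corresponds to threads that are runnable only about a third of the time, which matches the "3 threads per core" objection, while the Intel figures sit much closer to one software thread per hardware thread.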
Oh, I don't know, you could perhaps use the hundreds of euros you charge the article authors?
Which, on topic, is often significantly based on FBP. The “UNIX way” for command-line programs is essentially FBP.
He's trolling and a ridiculous amount of /. users fell for it.
Sure. The spelling is:
H Y P O C R I S Y.
OTOH apparently you can't.
Mee 2