Reserving one core for the GUI and mouse would be pretty wasteful. And it doesn't really address the problem. The problem is that shared resources need to be protected from corruption when multiple processes/threads access them. This is usually done with locks. If you aren't careful then these locks can force serialization.
In this case the likely fix is to reduce contention for the lock, perhaps by not creating/destroying GDI objects for processes that don't need them.
I'm not sure if this answer was intended to be serious or not. Either way, just to clarify, it is entirely false. The process that updates the mouse position is not frequently recycled.
The real reason is that the same system-global lock is used for a lot of purposes. The same lock appears to protect GDI objects (destroyed when processes are destroyed, even those that never use GDI) and process message queues. Thus, mouse-movement and UI updates are fighting over the same lock which process destruction uses.
So, is it square root (100 is the square root of 10,000) or cube root (like the post said) or is it the square of the fifth root (100 compared to 100,000)?
'cause so far the more information I'm given about this story the less I know, and the apology didn't actually clarify anything.
How is this better or different from the single-step option of setting the WpadOverride registry key to "1"? And since you say this "should work for most users", what users will it not work for?
It is unfortunate that the original article didn't explain this carefully (or at all, actually).
Yeah, seriously. Telling people that you are at risk of account compromise unless you do "X" and then giving zero instructions on how to do "X" is pretty terrible.
I did Google for instructions on how to disable Wpad and found the registry setting mentioned above, but it didn't seem clear whether that was sufficient. The instructions below saying "This should work for most users" just add to the confusion.
Fair enough. I didn't realize how ambiguous that sentence would appear, especially when the Slashdot summary omitted '2015'. I fixed the sentence in my blog post.
> and is not that much important news
I thought it was interesting, which was why I wrote it. Whether it is important is up to each reader to decide.
The post says:
> Shingled Magnetic Recording drives, can not typically be used natively by the OS
but gives no reference or explanation for this claim. Searching for this claim finds it repeated verbatim on many sites, but no explanation.
The details of the recording technology rarely matter to the OS which treats the device as block-level storage. I'm not saying it's impossible for SMR to require an OS update, but I would like an explanation or reference.
It sounds like the problem is that an SMR drive can't write to a single track, so using it *efficiently* requires an OS update, much like SSDs. But some clarification would be nice.
Please stop describing this book as "using only the thousand most common English words". The word 'thousand' is not one of the thousand most common English words, which is why Randall describes the book as "using only the ten hundred most common English words". Missing that detail is practically missing the entire point.
> This is all just superstition at this point without numbers.
Yep. That. Let's see some numbers so that we can do science instead of divining.
I've been running the same Windows install on my laptop for 4.5 years and it still feels quite fast to me. I installed an SSD last year, which obviously helps a lot. Prior to that there was the predictable delay whenever I launched a program that I hadn't run for a while (that wasn't in the disk cache), and now I don't even have that. I have *lots* of programs installed, but I see none of the sluggishness which you describe.
A noticeable slowdown in four weeks is quite odd, unusual, and not normal.
The problem with your report is that it is hopelessly vague. What is slow? Launching programs? Running programs? Poor frame rate in some games?
Do you have enough memory? Do you have enough CPU cores?
Three possibilities come to mind:
1) You don't have enough RAM. If so (if there aren't many GB available at all times according to task manager) then get more.
2) Your CPU is overheating. While doing performance investigations for Valve I found that a lot of game slowdowns were caused by thermal throttling: https://randomascii.wordpress....
3) Something else is wasting CPU or memory. When I did hit sluggishness a few years ago I investigated and found the buggy device driver that was clearing the system disk cache: https://randomascii.wordpress....
So no, it's definitely not normal. To figure out what is going on you need to monitor specific details about your system in order to find and fix the root cause. "Slow/sluggish" is not an actionable bug report.
If I buy a computer with a CPU that is rated at X GHz then that CPU had better be able to maintain that frequency, always. Otherwise it's a meaningless number. CPUs can already overclock themselves (Turbo Boost) above that frequency so if they can also legitimately underclock themselves then the 'rated frequency' is completely meaningless. I don't think that is acceptable. I encourage all Slashdot readers to test their new computers under load and if they cannot maintain their rated frequency RETURN THEM! Or better yet, file a formal complaint for false advertising or fraud and then return them.
I blogged about this a while ago and I think the problem has only gotten worse. Lots of consumers are getting a crap experience because of insufficient cooling, manufacturers are selling rigs that can't do what they promise, and software developers waste time dealing with complaints about slow games/etc.
https://randomascii.wordpress....
> who've either spent thousands on astrological equipment
Well there's your problem -- you should have been focusing on telescopes instead of the Zodiac.
I've got a 6" Dobsonian telescope -- not terrifically expensive, under $1,000 I'm sure -- and I've enjoyed Jupiter moon transits before. It's no Hubble, but I enjoy it.
I've only used inline assembly in VC++ and, as you observe, it usually disoptimizes the rest of the function. I don't know if gcc/clang handle it better.
> the human eye has difficulty seeing more than 60hz.
Not true. And, a broad claim like that conflates many different concepts. Flicker fusion can require 85 Hz to not cause headaches for some people (especially with the low persistence needed for non-blurry VR), and smooth motion continues to feel smoother up to at least 120 Hz.
In addition, lower frame rates generally mean increased latency, and latency is probably the biggest cause of VR nausea.
But don't take my word for it. This blog post does a great job of summarizing the latest research on the topic:
http://home.comcast.net/~tom_f...
I have no idea what cheap CPUs and server I/O have to do with motion tracking, but tracking a single point (translation and rotation) is exactly what is needed for VR -- that point is the user's head, and tracking it with low latency is what makes VR work.
> The only difference between now and 20 years ago...
is everything. The technology is orders of magnitude cheaper and more capable.
What about the add with carry? That's the particularly hairy bit. Even if clang/gcc/VC++ recognize the pattern and turn it into optimal code, add with carry is a case where assembly language is cleaner and more elegant than the equivalent high-level language code.
I'm not a fan of inline assembler because it often gives you the worst of both worlds -- incomplete control over code generation, and worse syntactic messiness than pure assembly language. But yes, a mixture of C++ and assembly is definitely the right solution, either inline assembly or a single separate function to do the messy math.
Actually the trend is in the opposite direction -- fewer of the math functions are implemented in hardware than there used to be. There are many reasons (optimized out-of-order CPUs and old/slow transcendental implementations) but one significant reason is that the new glibc math functions are generally correctly rounded -- exactly correct. Whereas the hardware versions are often not -- as I discussed in this recent blog post:
http://randomascii.wordpress.c...
High-precision math is an excellent time to use assembly language. Assembly languages generally have a way to express ideas like a 32x32->64-bit multiply (and 64x64->128-bit multiply), and add-with-carry. High-level languages generally support neither of those options directly. To tell the compiler that you want a 32x32->64-bit multiply you generally have to have two 32-bit inputs, then cast one of them to 64-bit, and hope that the compiler doesn't actually generate a 64x64 multiply.
For 64x64->128-bit multiplies the problem is more difficult because many languages don't have a 128-bit type, and yet these multiplies are crucial for getting maximal multi-precision performance on x64.
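For what it's worth, here is a minimal C++ sketch of the two multiplies discussed above. The names `mul32x32`, `mul64x64`, and `U128` are made up for illustration. The 64x64 version builds the 128-bit product from four 32x32->64 partial products, which is the portable fallback and not what hand-written assembly or a compiler-specific intrinsic would do.

```cpp
#include <cstdint>

// 32x32->64-bit multiply: widen one operand first so the full product is kept.
// Compilers usually emit a single widening multiply here, but nothing in the
// language guarantees it.
inline std::uint64_t mul32x32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) * b;
}

// Hypothetical 128-bit result type, since standard C++ has no 128-bit integer.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Portable 64x64->128-bit multiply using four 32x32->64 partial products.
inline U128 mul64x64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t mask = 0xFFFFFFFFu;
    const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
    const std::uint64_t b_lo = b & mask, b_hi = b >> 32;

    const std::uint64_t p0 = a_lo * b_lo;  // contributes to bits 0..63
    const std::uint64_t p1 = a_lo * b_hi;  // contributes to bits 32..95
    const std::uint64_t p2 = a_hi * b_lo;  // contributes to bits 32..95
    const std::uint64_t p3 = a_hi * b_hi;  // contributes to bits 64..127

    // Sum everything that lands in bits 32..63; this cannot overflow 64 bits.
    const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    U128 result;
    result.lo = (mid << 32) | (p0 & mask);
    result.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return result;
}
```

Four multiplies, several shifts and masks, and manual carry bookkeeping, versus one instruction in assembly language.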
Without access to the carry flag a programmer in a high-level language has to do things like:
a0 += b0;
if (a0 < b0) ... https://randomascii.wordpress....
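A fuller sketch of that kind of carry handling, assuming the numbers are stored as little-endian arrays of 64-bit limbs. The function name `add_limbs` and the array layout are my own choices for illustration, not anything from a particular library.

```cpp
#include <cassert>
#include <cstdint>

// Adds b into a, where both are arrays of n 64-bit limbs with the least
// significant limb first. Returns the carry out of the top limb (0 or 1).
inline std::uint64_t add_limbs(std::uint64_t* a, const std::uint64_t* b, int n) {
    assert(a != nullptr && b != nullptr);
    std::uint64_t carry = 0;
    for (int i = 0; i < n; ++i) {
        std::uint64_t sum = a[i] + b[i];
        // Unsigned addition wrapped around if the sum is smaller than an input.
        const std::uint64_t carry_from_add = (sum < b[i]) ? 1 : 0;
        sum += carry;
        // Adding the incoming carry can wrap as well.
        const std::uint64_t carry_from_inc = (sum < carry) ? 1 : 0;
        a[i] = sum;
        // At most one of the two can be set, so OR is enough.
        carry = carry_from_add | carry_from_inc;
    }
    return carry;
}
```

That is two compares per limb to recover what add-with-carry hands you directly, and you still have to hope the compiler recognizes the pattern.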
As the article mentions, inflation adjusted gas taxes have been dropping for 21 years. That doesn't make sense. At the very least they should be returned to their 1993 levels and indexed for inflation. Roads are crowded and a gas tax would relieve that by encouraging alternatives. It would also reduce pollution, reduce carbon emissions, reduce oil imports that kill the balance of trade and finance people who use the money to try to kill us.
Gas taxes really are good.
As to the complaints that the gas taxes are being used to fund other things, such as bicycle paths and mass transit -- I'm not sure how true that is, but you would be foolish to fight it. Alternatives to driving are crucial. Paving huge amounts of land makes walking and biking very difficult so drivers *owe* the non-drivers a bit of help. And drivers benefit *greatly* from mass transit. With no mass transit the traffic congestion would be even worse.
And, given that drivers park free almost everywhere it is truly rich to hear drivers complaining about having to subsidize transit. The implicit subsidy that cars get through free parking is orders of magnitude greater (read "The High Cost of Free Parking" for all the details).
It is of course well known that, for double precision, sin(x) == x if x < 1.49e-8. They teach that in kindergarten these days.
However the article is about sin(double(pi)), and pi is actually greater than 1.49e-8. Therefore range reduction needs to be done, and that is where things go awry.
Yes, the caller could do the range reduction, but this is not trivial to do correctly and it really should be done by the sin() function. With glibc 2.19 it is.
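If you want to check the small-x claim yourself, a minimal sketch follows. The helper name `sin_equals_input` is hypothetical, and it assumes the C++ standard library's `std::sin` is accurate to within rounding for these inputs.

```cpp
#include <cmath>

// Returns true when sin(x) comes back bit-for-bit equal to x, which is what
// happens when the x^3/6 term is too small to change a double.
inline bool sin_equals_input(double x) {
    return std::sin(x) == x;
}
```

Calling it with 1e-9 should return true, while 1.0 or double(pi) should return false, because for those inputs the result differs from x and, for pi, range reduction is required.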
But the parent you refer to (this comment's great grandparent?) says "Any serious calculation requires an error calculation to go with it." Sure. I can agree with that.
And that's the whole point of the article. If somebody does an error calculation based on Intel's documentation then they will have an incorrect error calculation -- in some cases grossly incorrect. So the claim that an error calculation is needed actually *supports* the article (mine, BTW), even while arguing against it.
I think ledow is violently agreeing with my article, perhaps without realizing it.
As the other reply said, the function contract and an explanation of the implementation are different beasts. The Intel manual said that fsin is accurate across a huge range of values. Elsewhere the manual hinted at Intel's range reduction algorithm such that a numerical analyst could suspect that there was a problem. So, to a numerical analyst the documentation was contradictory. To anybody else it was clear -- guaranteed one-ulp precision. The documentation was therefore at best misleading, but for most people it was just wrong. Linus Torvalds was fooled, for example.
> If you let x = pi, then people would ordinarily expect that sin (x) = 0.
Many people would expect that, although the article (my article) most certainly did not expect that.
The calculation being done in the article takes into account the fact that double(pi) is not the same as the mathematical constant pi and it uses that and the properties of sine to measure the error in double(pi). This is an unusual but perfectly reasonable calculation to make. It failed because of the limitations of fsin. Those limitations are contradictory to the documentation. Hence the article.
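A minimal sketch of that calculation, for anyone who wants to try it. The names `kPiDouble` and `pi_error_estimate` are made up here, and this uses the library `std::sin` instead of the fsin instruction. Because sin(x) is very nearly pi - x when x is close to pi, the result estimates how far double(pi) falls short of the real pi.

```cpp
#include <cmath>

// The double closest to pi; it is slightly smaller than the real constant.
constexpr double kPiDouble = 3.141592653589793;

// For x near pi, sin(x) ~= pi - x, so sin(double(pi)) measures the gap
// between double(pi) and the mathematical constant.
inline double pi_error_estimate() {
    return std::sin(kPiDouble);
}
```

With an accurate sin() the result should be roughly 1.2246e-16, not zero. How many of the digits after that are right depends on how carefully the implementation does its range reduction.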
> A more precise approximation according to Wikipedia would have been...
You don't need to go to Wikipedia, you just have to RTA. It lists a 192-bit hexadecimal approximation of pi.
> The reality is that the correct result would have been zero
No. Zero is the correct answer if doing symbolic or infinite precision math, but I did not make that assumption because I was doing double-precision math.
Can you give an example of when the fsin instruction's accuracy would be insufficient? A few people have asked and a clear answer would be quite helpful.
There's an alternate test that works on the Windows 7 calculator to show that it is not implemented on the FPU. Calculate square root of four, and then subtract two. You should get zero but you don't.
That is an error that no FPU would make. The IEEE standard requires a correctly rounded result for square root, and the Windows 7 calculator fails to deliver that on even very simple inputs like four. The Windows 7 calculator does its calculations to more digits of precision than double precision, but it doesn't do its calculations accurately. Oops.
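The same test is easy to run against a real FPU. This is a small sketch with a made-up helper name, `sqrt_exact_for_squares`, which assumes `std::sqrt` maps to a correctly rounded square root as IEEE requires.

```cpp
#include <cmath>

// Returns true if sqrt(n*n) - n is exactly zero for every n in [1, limit].
// A correctly rounded square root must return n exactly for these inputs.
inline bool sqrt_exact_for_squares(int limit) {
    for (int n = 1; n <= limit; ++n) {
        const double square = static_cast<double>(n) * n;
        if (std::sqrt(square) - n != 0.0) {
            return false;
        }
    }
    return true;
}
```

On any IEEE-conforming implementation this should return true for every perfect square that fits exactly in a double, four included.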
How would you like that example number to be given? In hexadecimal for maximum readability? Giving it in decimal is necessary for communicating the issue.
But the reality is that that number can show up during calculations. Intel promises to calculate its tangent to a specific precision, but Intel's documentation is incorrect. That is the problem -- hugely misleading documentation.
The issue of not being able to fully specify a particular real number is actually crucial to the article, but in a different way. The example is sin(double(pi)), what it should be, what that should allow, and how that fails. You should read the article. I think it's excellent.
> Any serious calculation requires an error calculation to go with it.
Sure. That sounds good. And in order to make that error calculation you need to consult the documentation to know how accurate the instructions you are using are. Let me see -- Intel says that fsin is accurate to one ULP. That's sufficient. Error calculation done.
That's why the inaccuracies in their documentation matter.
> I'll tell you now that I wouldn't rely on a FPU instruction to be anywhere near accurate.
Really. Well that seems foolish. As required by the IEEE standard the x87 FPU supplies correctly rounded results for add, subtract, multiply, divide, and square root. At double and long-double precision you can't do better. Those can be composed into higher-level functions with well defined accuracy if you know what you are doing.
It's funny that there is one group of people asking when this tiny error could even matter (http://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/#comment-13638) and another group saying that even without this error the precision is insufficient. You guys should talk.