You obviously don't work on large projects where build times can be 30 minutes and link times can be 5-10 minutes on top of that. In the past we have tried just about everything possible to make our compiles faster because it allows more iteration and less time waiting on code building. This includes minimizing include dependencies and looking at dependency graphs, benchmarking distributed build systems (IncrediBuild), working with pre-compiled headers, examining unity builds / unified builds (think one CPP that includes many other CPP's in the same system), etc. We also buy fast hardware (8 core CPU's with 16 threads, 32 GB memory, and fast SSD's). All because minimizing build time means more productive time for developers.
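For anyone who hasn't seen a unity build, here is a minimal sketch of the idea: a tiny generator that emits one CPP which #includes the others. The helper name `make_unity_source` and the file paths are made up for illustration; real projects usually let the build system generate this file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the text of a "unity" translation unit that
// pulls in every listed .cpp file via #include, so shared headers are parsed
// once for the whole group instead of once per source file.
std::string make_unity_source(const std::vector<std::string>& cpp_files) {
    std::string out = "// Generated unity file: do not edit by hand.\n";
    for (const std::string& path : cpp_files) {
        assert(!path.empty());  // an empty path would emit a broken #include
        out += "#include \"" + path + "\"\n";
    }
    return out;
}
```

The usual caveat is that file-local names (statics, anonymous namespaces) from different CPP files now share one translation unit and can collide.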
The fast co-processors (Tom and Jerry) didn't have instruction caches (as you would think of them today anyhow). They did have a small amount (4K) of directly mapped local memory. They were originally designed to run programs either in this memory or in normal memory. However, due to bugs in the chip, you could only reliably run code from the 4K internal memory. Since this was directly mapped, that meant all your code had to run in 4K. If you wanted to run larger programs, you needed a small amount of resident code that swapped functions or chunks of larger code into memory and did fixups on them and then ran them on the GPU. Most developers didn't have the expertise to do this themselves so indeed, a lot of game code ran on the 68K with certain heavy lifting functions (graphics transforms or blitter programming) happening on TOM and then usually just Audio/DSP (software mixing) on JERRY.
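A rough sketch of what such a resident loader has to do, simulated on a host machine rather than written for the real chips. The overlay format (raw bytes plus a list of fixup offsets), the names, and the use of host byte order are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kLocalSize = 4096;  // the 4K of local memory

// Hypothetical overlay format: code bytes linked as if loaded at address 0,
// plus the byte offset of every 32-bit absolute address inside them.
struct Overlay {
    std::vector<std::uint8_t> code;
    std::vector<std::uint32_t> fixups;
};

// Copies `ov` into simulated local memory at offset `dest`, then patches each
// recorded address by adding the real load address (local_base + dest).
// Returns false, leaving memory untouched, if the overlay does not fit or a
// fixup offset points outside the overlay.
bool load_overlay(std::uint8_t* local, std::uint32_t local_base,
                  std::uint32_t dest, const Overlay& ov) {
    assert(local != nullptr);
    const std::size_t size = ov.code.size();
    if (dest > kLocalSize || size > kLocalSize - dest) return false;
    for (std::uint32_t off : ov.fixups) {
        if (size < 4 || off > size - 4) return false;  // malformed fixup table
    }
    if (size != 0) std::memcpy(local + dest, ov.code.data(), size);
    for (std::uint32_t off : ov.fixups) {
        std::uint32_t addr;
        std::memcpy(&addr, local + dest + off, sizeof addr);  // host byte order
        addr += local_base + dest;
        std::memcpy(local + dest + off, &addr, sizeof addr);
    }
    return true;
}
```

On real hardware the resident code would also have to decide which chunk to evict and then jump into the freshly loaded one; that part is left out here.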
FWIW, the hardware was quite buggy as well. I think I averaged finding around one undocumented hardware bug per week in the various coprocessing chips while working on the system.
It's worth noting you didn't have to be "one of the 5 people with an Atari Jaguar" to play the original game. High Voltage Software did a port of the game to PlayStation titled Tempest X3. I even did a very tiny amount of work on that project although I don't remember if I received a formal credit or not.
What I want to know is who they had to waterboard to get insurance companies to provide information about their policies written at a 6th-grade level...
One benefit of Obamacare is standardizing insurance policies for what they will cover, eliminating many fine print items (like pre-existing conditions, age restrictions, setting standard limits for copays and out-of-pocket expenses). The only major differences are deductible, premiums, and doctor's network within an insurance class on the exchanges. This makes it much easier to make apples-to-apples comparisons and actually makes the free market of the exchanges work better for consumers.
Also, most newspapers and magazines target around a "Sixth Grade" reading level. In-depth articles are occasionally the exception.
The average American reads at a "Seventh Grade" reading level so targeting "Sixth Grade" gives you a wider audience.
"[T]he content was written at a sixth-grade reading level so it would be as easy to understand as possible."
They really are setting the bar high in Kentucky.
FWIW, "Harry Potter" is written at a "Sixth Grade" reading level although a number of kids start reading that book in third or fourth grade.
They are using Azure to provide cloud backup (and Azure Active Directory syncing) and single sign-on services. It's not so much making Azure hard to avoid as providing useful functionality nearly seamlessly through Azure.
Is included with the purchase of PS4. Sony is probably just prioritizing what they can do in the time before launch. There's no reason they couldn't choose to eventually support Bluetooth later as well as USB (even though they're not promising Bluetooth now) and there's presumably no reason why a third party couldn't create a USB-to-Bluetooth dongle for headsets either.
It could have been a woman with her face uncovered. Or a woman, period, driving the car naked. The sentence would be death by stoning and/or beheading.
It is illegal for women to drive a car in Saudi Arabia regardless of what they are wearing and women have received sentences of being whipped with a lash for driving there as recently as 2011.
But there are worse places in the Middle East, where women have been stoned to death for owning a cell phone.
How can you be an extreme atheist?
"Ban religion -- enforce loyalty to the state". Send anyone who is vocal about their religious beliefs to "re-education camps".
The Nest Protect manual states that the lifetime of this device is 7 years and that it needs to be replaced after that.
FTA: "the final switch that prevented disaster could easily have been shorted by an electrical jolt, leading to a nuclear burst."
We need to retire the use of the phrase "theory" when used in the context of a scientific theory.
So you're LITERALLY asking to change the definition of a word or retire its proper meaning when enough stupid people use it wrong?
Sigh... Again...
While that seems to be the current spin on this, just a few days ago, everyone was reporting that it was Kerry that first mentioned this as an option -- Russia just ran with it once they had the chance. Not that it changes anything...I'm glad it seems to be working out in some sort of peaceful way.
It's worth noting that Assad is basically in Putin's pocket since Russia supplies Syria with a large number of its armaments. Syria is a good customer / proxy / puppet and it's in Putin's interest to have a peaceful resolution which leaves Assad in power.
In the case of FFMPEG, it's not idiots... it's the EXPERTS that are using the term in this way.
I'm not keen to have spinning parts in a device that I drop a couple times a day.
So you replace the tablet often??? Tablet screens will crack on a single drop if the screen lands on a hard surface, and they usually shatter if the tablet lands on an edge as well.
If there's no encoding, isn't it just a dec?
Nice joke. Although in the current vernacular of media encoding and playback, codec is commonly used to describe modules or libraries that provide EITHER encoding or decoding (or both). For example, if you get a "codec" pack to play media formats, it's often just the decoders. And when you list codecs in FFMPEG, it tells you whether it supports encoding or decoding as separate flags.
Unfortunately that link got you to a page on www.citeworld.com which carries a link to www.nytimes.com
After a wild goose chase I finally got that link ---
https://aboutthedata.com/
And that site is down :-(
I felt the choice of wording was a bit prematurely dismissive (i.e. saying it shouldn't apply to single socket CPU's or to Game Programming -- especially since that is the primary target of my concurrency research).
Also, we are not trying to write specifically to HLE. We are trying to write stuff that runs well on multicore systems and then layer HLE on top of it for an added performance benefit for when we do have lock conflicts.
I agree that well written applications don't have nearly as many locking conflicts to begin with and that's certainly our goal. We try to run most of our game using a multicore graph driven data dependency scheduler (also presented at GDC by my coworker).
But there are a number of systems (both internal and legacy) that do use locking that will benefit from HLE. It's about making the code run as fast as possible.
When we have to lock, we try to make it as fine-grained as possible (until you get diminishing returns in either performance or memory). HLE works well with existing algorithms (almost nothing to rewrite except to specify whether the CAS is acquire or release for a lightweight user space lock) and it is backwards compatible with an extremely low penalty for processors without HLE -- from the benchmarks I have seen, HLE code will run as fast as the original locking code (to within some white noise of most performance metrics on the unsupported CPU's) which should be fairly fast assuming you use RWL's, striped locks, and organize data algorithms to minimize contention.
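For readers unfamiliar with striped locking, here is a minimal sketch, not the library discussed in these posts. The class name, the counter-map payload, and the stripe count are assumptions chosen to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical striped-lock table: each key hashes to one of N stripes, and
// each stripe has its own mutex, so threads touching keys in different
// stripes never contend with each other.
template <std::size_t Stripes = 16>
class StripedCounterMap {
    static_assert(Stripes > 0, "need at least one stripe");
public:
    // Adds `delta` to the counter for `key`, locking only that key's stripe.
    void add(int key, long delta) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> guard(s.lock);
        s.values[key] += delta;
    }

    // Returns the counter for `key`, or 0 if it was never written.
    long get(int key) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> guard(s.lock);
        auto it = s.values.find(key);
        return it == s.values.end() ? 0 : it->second;
    }

private:
    struct Stripe {
        std::mutex lock;
        std::unordered_map<int, long> values;
    };
    Stripe& stripe_for(int key) {
        return stripes_[std::hash<int>{}(key) % Stripes];
    }
    std::array<Stripe, Stripes> stripes_;
};

// Usage sketch: repeated updates to one key only ever take one stripe's lock.
inline void striped_demo() {
    StripedCounterMap<8> hits;
    hits.add(3, 2);
    hits.add(3, 5);
    assert(hits.get(3) == 7);
}
```

More stripes mean less contention but more memory, which is the diminishing-returns tradeoff mentioned above.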
HLE is a "no brainer" for ease of implementation. However, a TSX RTM code path does require algorithm rewrites so perhaps that is akin to "writing in asm for a 15% improvement over C" (although some game programmers would find that tradeoff acceptable in small code functions in low level libraries anyhow!).
As far as gamers are concerned, if we can give them a noticeable performance boost by taking advantage of a specific CPU feature without slowing down the code on CPU's without that feature, they will consider that to be a big win and love us for it.
So as far as game design goes, the transaction stuff is worse than worthless.
I want to feel you're not just trolling me because apparently you've been developing software since at least the Amiga days (we have that in common). However, I feel you are quite misguided on some of your assumptions here.
Not to say I may have a more informed opinion than you because I don't know your personal experience in game development, but I certainly feel that TSX isn't worthless for games and I've been writing performance code full time for games for over 20 years.
Also, I don't see why you keep referencing global locks and spin locking as the only things that would benefit. Did you get a chance to read the presentation I linked to? Mind you, it's based on work I did 4-5 years ago and presented almost 4 years ago, but even back then we were well ahead of the starting point you seem to feel developers are using as a base.
We are already using fine-grained locking, striped locking, reader/writer locks, lock-free atomic SList, lock free allocators, etc. I am interested in having TSX speed up these advanced concurrency primitives on platforms where it is available. If AMD releases ASF, I'll look into accelerating with that as well.
Nope... it's very valuable. Basically what you do is you write code that makes use of fine-grained user space locking that has a fallback to OS locks on contention. This runs very fast on multicore systems.
Then you add HLE extensions and it runs even faster on CPU's that support TSX and you get a rather large performance bonus for free as at that point a majority of your atomic operations become free.
It also allows you to substitute simple lock-free and non-blocking algorithms that rely on multi-address DCAS or NCAS on platforms with TSX, because those operations are very simple to emulate using either HLE or, even better, RTM.
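To make the DCAS idea concrete, here is a sketch of only the lock-based fallback path of such an emulation. The TSX part is deliberately not shown: on supporting hardware, this critical section is what an RTM transaction or an HLE-elided lock would execute without actually taking the lock. The function names and the single global lock are assumptions made to keep the sketch short.

```cpp
#include <cassert>
#include <mutex>

// Fallback lock shared by every dcas() call. One global lock keeps the sketch
// small; striping by address would be closer to fine-grained locking.
inline std::mutex& dcas_fallback_lock() {
    static std::mutex m;
    return m;
}

// Double compare-and-swap: if *a == expect_a and *b == expect_b, store the new
// values into both locations and return true; otherwise change nothing.
// All other accesses to *a and *b must go through the same lock.
template <typename T>
bool dcas(T* a, T expect_a, T new_a, T* b, T expect_b, T new_b) {
    assert(a != nullptr && b != nullptr && a != b);
    std::lock_guard<std::mutex> guard(dcas_fallback_lock());
    if (*a != expect_a || *b != expect_b) return false;
    *a = new_a;
    *b = new_b;
    return true;
}
```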
If you think TSX is for spinlocks, you are generations behind in your programming of lockfree algorithms.
It's almost an oxymoron if you are talking about a single-socket Intel cpu. You don't actually need the transactional extensions to make things go fast.
Not true... I've written an entire concurrency system including a lock free library and a multicore memory manager. There are a number of places where TSX offers a large speed improvement even on a single core.
If the purpose is to test code performance then it is better to test without transaction support anyway since transaction support is not a replacement for proper algorithmic design. Or to put it another way... if you code SPECIFICALLY for one of the two intel transactional models that means you will probably wind up with very sloppy code (such as using global spinlocks more than you need to and assuming that the underlying transaction just won't conflict as much). The code might run fine on an Intel cpu but its performance value will not be portable.
Are you even familiar with how TSX works? Hardware Lock Elision is a very simple replacement for atomic locking. You can write a very simple user level mutex using atomic operations that has a fallback to an OS yielding construct. In fact that's what we do in my concurrency library. Uncontested access is a single atomic op while contested access is an actual OS mutex. With HLE, all accesses can appear to be uncontested unless there is an actual data conflict in memory read / written to during the transaction. This cuts down significantly on OS lock calls. It's not just for spinlocks.
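A minimal sketch of that kind of lightweight mutex, in the classic "benaphore" style. This is not the library described above, and the HLE hint itself is not shown; it is only the portable base that such a hint would be layered on. The class name is made up, and a std::mutex plus condition variable stands in for the OS yielding construct.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

class LightMutex {
public:
    void lock() {
        // Fast path: a single atomic add. A previous value of 0 means the
        // lock was free and is now ours.
        if (count_.fetch_add(1, std::memory_order_acquire) > 0) {
            // Contended: sleep in the OS until an unlocker hands over a token.
            std::unique_lock<std::mutex> lk(os_mutex_);
            cv_.wait(lk, [this] { return tokens_ > 0; });
            --tokens_;
        }
    }

    bool try_lock() {
        int expected = 0;
        return count_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire);
    }

    void unlock() {
        int prev = count_.fetch_sub(1, std::memory_order_release);
        assert(prev >= 1);  // unlock without a matching lock is a bug
        if (prev > 1) {
            // Someone is waiting: wake exactly one thread via the OS primitive.
            {
                std::lock_guard<std::mutex> lk(os_mutex_);
                ++tokens_;
            }
            cv_.notify_one();
        }
    }

private:
    std::atomic<int> count_{0};   // holder plus waiters
    std::mutex os_mutex_;         // only touched on the contended path
    std::condition_variable cv_;
    int tokens_ = 0;              // pending hand-overs, guarded by os_mutex_
};
```

With no contention, lock() and unlock() each cost one atomic operation and never touch the OS primitive, which is the behavior the post describes.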
And besides... 'your company' ? Use a Xeon then, right? It's not as though it costs all that much more.
-Matt
By "my company", I mean the company I work for. Disclaimer: I work at Netherrealm Studios, which is owned by Warner Brothers, but this is my personal opinion and any posts I make here do not reflect the official opinion of Warner Brothers or Netherrealm Studios. We make video games and I write low level optimized code for multicore / multithreading libraries among many other things.
For what it's worth, none of the Haswell 'K' line supports TSX. You actually have to buy a cheaper CPU to get this feature which is odd... maybe it didn't validate well with overclocking though? The new 'HQ' line seems to support it but the new 'R' line does not.
Anyhow, I'm wondering if the 'X' line supports TSX or not. I can't find docs or specs that answer one way or another right now.
That's an important question for me as I write the base level concurrency libraries for our company.
I wanted to get a 4770K but Intel disabled TSX (Transactional Synchronization Extensions) on that CPU.
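For checking a given machine rather than hunting through spec sheets, CPUID leaf 7 (sub-leaf 0) reports both TSX features in EBX: bit 4 for HLE and bit 11 for RTM. A minimal sketch, assuming a reasonably recent GCC or Clang on x86 that provides `__get_cpuid_count` in `<cpuid.h>`; the struct and function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

struct TsxSupport {
    bool hle;
    bool rtm;
};

// Decodes the EBX value returned by CPUID leaf 7, sub-leaf 0.
inline TsxSupport decode_leaf7_ebx(std::uint32_t ebx) {
    return TsxSupport{(ebx & (1u << 4)) != 0, (ebx & (1u << 11)) != 0};
}

// Queries the running CPU. Reports no support on non-x86 targets, on other
// compilers, or when leaf 7 is unavailable.
inline TsxSupport query_tsx_support() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return decode_leaf7_ebx(ebx);
    }
#endif
    return TsxSupport{false, false};
}
```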