I didn't say iPhone is "just a phone"; I said it's an appliance - it handles my phone calls, contacts, web browsing, email, and even maps (which would be better with GPS onboard - I think this is where iPhone fails).
A PDA is an appliance in the exact same way.
If it doesn't do what I want, out of the box, then it's not the PDA for me. I'm not getting into farting around *administering* a damned PDA!
iPhone is an appliance. I don't want third-party apps on my fridge; I don't really want (to need) them on my phone.
It's interesting that my Mac usage is also appliance-like. I plug it in, it works. Web, email, photos, music. For dev work, I run on either a separate box (using the Mac as a terminal), or in a VM. As far as third-party software on my Mac, it has to be seriously well vetted: I don't want my appliance messed up - I spent too many years dealing with non-appliance Linux distros and the decidedly non-appliance Windows world to want to even screw around with any sysadmin shite on my "communication appliance".
When I get my iPhone, it won't be for 3rd party apps.
But of course, the real answer to the "save" question is to *always* save. And provide an infinite undo stack that spans sessions. Even better if you can provide "keep this document while backing up a copy using the undo stack". If we journaled the file properly we wouldn't have to worry about shutting down the app, saving documents, etc, and could just provide "it just works" functionality. But this will take a pile of programmer education to design their application document protocols in ways that are compatible with a sensible user model.
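To make that concrete, here is a minimal sketch in Python of an always-saved document with an undo stack that survives a restart. Everything in it is an assumption for illustration: the `JournaledDocument` class, the one-JSON-record-per-line journal format, and storing the whole text on every edit (a real app would log deltas and compact the journal).

```python
import json
import os
import tempfile


class JournaledDocument:
    """Hypothetical document that is always saved: every edit and undo is
    appended to an on-disk journal, and opening the file replays it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.text = ""
        self._undo_stack = []  # earlier versions, oldest first
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as journal:
                for line in journal:
                    self._apply(json.loads(line))

    def _apply(self, record):
        # Shared by live edits and by replay, so both produce the same state.
        if record["op"] == "edit":
            self._undo_stack.append(self.text)
            self.text = record["text"]
        elif record["op"] == "undo" and self._undo_stack:
            self.text = self._undo_stack.pop()

    def _log(self, record):
        # Write the record to disk before changing in-memory state.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        self._apply(record)

    def edit(self, new_text):
        self._log({"op": "edit", "text": new_text})

    def undo(self):
        self._log({"op": "undo"})


path = os.path.join(tempfile.mkdtemp(), "notes.journal")
doc = JournaledDocument(path)
doc.edit("first draft")
doc.edit("second draft")
del doc  # quit without ever pressing "save"

reopened = JournaledDocument(path)
reopened.undo()  # undo reaches back into the previous session
print(reopened.text)  # prints "first draft"
```

There is no save command anywhere: closing the app loses nothing, and undo works across sessions because the journal is the document.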
The only thing in your analysis that I question is that out-of-order execution and static speculative pre-fetching will continue to be useful solutions going forward.
Memory latency only scales by the square root of transistor count per area, whereas processor performance is more linear by that measure. That means that as processors get faster (and by "faster" I include multi-core performance benefits) the linear improvement in speculative fetching and out-of-order execution just can't, in the big-O() sense, keep up with the ever-increasing memory subsystem latency. Eventually, we just give up and give the out-of-order and speculative pre-fetch hardware over to more hardware thread contexts or to more execution units, or even to more on-chip core-to-core communication hardware (though that will only scale with the square root as well).
Joe - I'm quite familiar with how the SPEs work; I led the team that did the Fight Night demo from E3 a couple of years back, and that was entirely an SPE hack; I've been working pretty tightly coupled to SPEs ever since. The comments I was replying to had to do specifically with PPE perf, which is bottlenecked in the traditional L1/L2/main fashion. Given the scale of the latencies, pipelining in the PPEs would be a pretty serious waste. In the SPEs, I'm happier to see more SPEs without out-of-order instruction issue than fewer with; the size of out-of-order cores would have lost us a couple of them, I'm sure. And given the regularity of graphics workloads that's not a big loss.
You are right, of course, that working set size matters; but at a different semantic level than the memory latency bottleneck. Getting stuff from disk is into paging-scale bottlenecks, and that's painful no matter what platform!
Out-of-order can help some, but at most that gets you 20 or so cycles of "inferred" parallelism. But L2 is already 3 times that far away, and main memory 30 times that far. Out-of-order just doesn't buy you enough relative to these *huge* stalls. At that point it becomes a 1 in 30 perf difference, which probably doesn't warrant the huge increase in silicon complexity.
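The arithmetic behind that, as a rough Python sketch. The numbers are the ones above (a 20-cycle window, L2 at 3x, main memory at 30x); the `hidden_fraction` helper and the model itself (the window overlaps the stall perfectly and nothing else hides any of it) are simplifying assumptions of mine.

```python
OOO_WINDOW_CYCLES = 20                                # "20 or so cycles"
L2_LATENCY_CYCLES = 3 * OOO_WINDOW_CYCLES             # 60 cycles
MAIN_MEMORY_LATENCY_CYCLES = 30 * OOO_WINDOW_CYCLES   # 600 cycles


def hidden_fraction(stall_cycles, window_cycles=OOO_WINDOW_CYCLES):
    """Best-case share of a stall that out-of-order execution can cover."""
    return min(window_cycles, stall_cycles) / stall_cycles


for name, latency in [("L2", L2_LATENCY_CYCLES),
                      ("main memory", MAIN_MEMORY_LATENCY_CYCLES)]:
    print(f"{name}: {hidden_fraction(latency):.1%} of the stall hidden")
# prints:
# L2: 33.3% of the stall hidden
# main memory: 3.3% of the stall hidden
```

Even in this best case, the miss to main memory leaves 29 of every 30 stall cycles exposed.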
As far as optimizing for the memory system using prefetches and streamed processing et al., that's the future of performance coding. There's no avoiding these techniques as the gap between memory speed and processor speed looks destined to only get worse. It's a space in which the compiler really can't do much to help you; your algorithm design has to take into account how much slower memory is than compute, and either be able to set up its data transfers long in advance (as in streaming computation), or have something else to do while it waits (as in context switching).
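The shape of the streaming approach, sketched in Python for readability (real code would be DMA or prefetch instructions, not threads). `fetch_chunk` is a made-up stand-in for a slow transfer, and the chunk size and sleep are arbitrary; the point is only that the fetch for chunk N+1 is issued before the compute on chunk N starts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(index, size=4):
    """Hypothetical slow transfer standing in for a DMA or memory fetch."""
    time.sleep(0.01)
    start = index * size
    return list(range(start, start + size))


def process(chunk):
    """The compute step: here, just a sum of squares."""
    return sum(x * x for x in chunk)


def stream_sum_of_squares(num_chunks):
    """Double-buffered loop: the next fetch is in flight during compute."""
    if num_chunks == 0:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)  # set up the first transfer
        for index in range(num_chunks):
            chunk = pending.result()  # wait only if the data isn't here yet
            if index + 1 < num_chunks:
                # Kick off the next transfer before touching this chunk.
                pending = pool.submit(fetch_chunk, index + 1)
            total += process(chunk)
    return total


print(stream_sum_of_squares(8))  # prints 10416
```

Note that the algorithm had to be restructured around chunks for this to work at all, which is exactly the part a compiler won't do for you.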
The amount of RAM is a different issue from being bottlenecked on the memory subsystem. Long ago a CPU running at 1 MHz had memory running at the same rate - you could effectively manage a memory access per instruction. Over time CPUs sped up faster than memory did. So caches showed up to try to mask it. On a PS2 a cache miss wound up costing 40-60 cycles. Ouch. And the trend has continued, but now it's worse: on the PS3 a cache miss is something ludicrous like 400-600 cycles. Think of it: 500 instructions possible in the time it takes to fetch from memory. Without getting clever, you wind up spending a lot of time stalled waiting for memory. And that's without piles of contention from lots of different threads and processors trying to use the same bus. That's what's meant by being bottlenecked on memory.
( A=tmp/a/b/c; cd $A || mkdir -p $A && cd $A )
But this is silly: mkdir -p succeeds even if the directory already exists, so this suffices:
mkdir -p a/b/c && cd a/b/c
I also resent the use of zillions of little command options to "be more efficient". The -C to tar is just plain silly:
( cd a/b/c && tar xf "$ELSEWHERE/foo.tar" )
does the same thing, without having to look up the fricking -C each time you want to use it. Three guesses which technique has the true Unix nature.
The reason CP is illegal isn't because it's believed the users of CP will become molesters. It's illegal largely in order to remove the market for CP, whose creation involves sexual abuse of children.
Those plugs are often behind the machine, which is moved for maintenance. There is often a rat's nest (or even a tidy bundle in some rare cases) of cables. No-one is paying attention to where yet another blue cable is running. Even with just 2 moderately long cables it's hard to visually track them. You can't count on someone "catching" this problem.
And how does the average joe know to look at both ends of the cable? It's not obvious that you're plugged into the network rather than the copier, particularly if you are "supposed" to be there. That's the joy of social engineering.
For two points, define "correct". You're using that word the same way people use "common sense". There's an awful lot of assumption behind both these usages. I dare say you might be succumbing to your own hubris.
Actually, I've just applied for a patent on a derivative work of this one in which the list can only be traversed in one direction. That allows a saving of 50% of the non-data overhead.
Just a little. Like that's an awful lot of what I'm doing when I'm not hiring. And vertex shaders. And getting a Cell and SPUs to perform. And GPGPU.
And I know that if you know machine architectures and compilers, I can explain to you the hardware and threading model of pixel & vertex shaders, and you'll be able to learn pretty quickly. I have no such confidence that if you know pixel shaders I can get you to adapt to writing data compilers targeting various DMA/push_buffer architectures to feed them efficiently. Certainly not in a quick week of poking at it.
Never mind that as a hiring manager I'd rather have the applicant who understands compilers and OS theory over anyone who just knows pixel shaders. Someone who understands compilers, hardware, & OSes will learn pixel shaders in a day. The opposite is not true.
I've been running WoW under OS X on my MBP. Used to run it on my G4 PowerBook. It's well supported.
I've been playing on my MBP and loving it.
And I wish I didn't know what that meant ;-)
Really, the thing to do to address this is some trivial googling. Use a secure password store and a decent protocol: http://plan9.bell-labs.com/sys/doc/auth.pdf
Yes, IIRC all of Altivec is implemented on PPE.
A pleasure.
Remember, a cold war with Russia is much better for the economy than a hot war with Ira[qn].
I don't steal music; I resent being branded a "pirate". I especially resent paying a tax for a sin I don't commit.
They get the "democracy" they ask for. Go figure.