The original MIPS processors are immune, because they had a 3-stage pipeline and a branch delay slot and so always had the branch destination available by the time that they needed to fetch the target instruction. Almost all later MIPS cores are vulnerable to variations of Spectre, though I believe some of the Cavium ones aren't because they have lots of hardware threads and simply pause a thread on each branch and execute another until they get the branch target.
Speculative execution was an idea that the CPU evaluates the two possible future states of itself, then discards the outcome that doesn't happen
I know of a couple of research processors that have worked that way, but nothing in production. It doesn't really scale, because typical C code has a branch (on average) every 7 instructions, but a modern processor can have almost 200 instructions in flight at a time. If you execute both paths (assuming branches are simple conditional branches and not computed jumps), you need to be able to handle both instruction streams for every speculative operation. That means doubling the resources every 7 instructions and you quickly run out of transistors.
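To put a rough number on that, here is a back-of-the-envelope sketch. It assumes every branch is a simple two-way conditional and that each path is followed until the in-flight window is full; the function name is made up for illustration.

```python
def eager_paths(in_flight: int, branch_interval: int) -> int:
    """Live instruction streams if both sides of every branch are followed.

    Assumes one two-way conditional branch every `branch_interval`
    instructions and a window of `in_flight` instructions per path.
    """
    branches_deep = in_flight // branch_interval
    return 2 ** branches_deep


# A 200-instruction window with a branch every 7 instructions is
# 28 branches deep, so 2**28 streams would need to be tracked.
print(eager_paths(200, 7))  # 268435456
```

Even if the real numbers are off by a large factor, the exponential growth is what kills the idea.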
Speculative execution is about guessing which instructions you're going to run next[1] and running each one as soon as its input operands are available, then throwing away all of the state associated with the results if you guessed wrong. This is why branch mispredicts are expensive: the pipeline spends some time executing the wrong thing, then some more time discarding the resulting state. The root cause of Meltdown and Spectre is that 'all of the state' turns out to be more than expected. In the simple case, values are loaded into (or evicted from) the cache as a result of speculatively executed instructions. This can be worked around by fetching values into some separate cache space and only writing them back to the main cache when the instructions are committed. In the more complex case, the time taken to execute and cancel the instructions varies depending on the values. That's much harder to address, because you can't simply roll back time to a little bit earlier...
[1] In most processors, anyway. The Alpha did value speculation, so it would guess the results of instructions, and guessing the address of the next instruction was just a special case of this. Fortunately, no one does that anymore - Spectre on the Alpha would have been so much worse than on anything from Intel.
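The 'simple case' above can be shown with a toy model. This is only a sketch: the class, the 64-byte line size and the `probe` method (which stands in for timing a load) are all assumptions made for illustration, and nothing here touches real hardware.

```python
LINE = 64  # assumed cache line size in bytes


class ToyCore:
    """Toy core whose rollback restores registers but not the cache."""

    def __init__(self, secret: int):
        self._secret = secret   # value the attacker cannot read directly
        self.cache = set()      # indices of cache lines currently resident
        self.register = None    # architectural register state

    def load(self, address: int) -> int:
        # Every load, speculative or not, pulls its line into the cache.
        self.cache.add(address // LINE)
        return address

    def mispredicted_branch(self) -> None:
        saved = self.register
        # Speculatively executed: a load whose address depends on the secret.
        self.register = self.load(self._secret * LINE)
        # The guess was wrong: architectural state is rolled back...
        self.register = saved
        # ...but the cache line fetched above is still resident.

    def probe(self, line: int) -> bool:
        # Stand-in for "was this load fast?", i.e. a timing measurement.
        return line in self.cache


def recover_secret(core: ToyCore) -> int:
    core.cache.clear()          # flush, so only the speculative load is visible
    core.mispredicted_branch()
    hits = [line for line in range(256) if core.probe(line)]
    return hits[0]


core = ToyCore(secret=42)
print(recover_secret(core), core.register)  # 42 None
```

The register is back to its old value, so the rollback 'worked', yet the secret is still recoverable from which cache line is warm.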
Even short in-order pipelines do speculative execution. We are using a 7-stage single-issue in-order pipeline implemented in FPGA for prototyping some ISA extensions. If we don't do speculative execution, we take around a 20-30% performance hit on most real-world code.
What exactly do you think the difference between prefetching and speculative execution is?
Not the GP, but prefetching, unlike speculative execution, is not rolled back. In speculative execution, you start executing instructions that probably will be executed, but if you shouldn't actually have executed them then you silently discard the results and reset the pipeline to the earlier state. In prefetching, you pull in data from main memory to the cache that might be needed soon, but if it isn't used then you still leave it in the cache and still evict whatever you displaced to load it.
Both are observable and provide side channels, but prefetching is simpler (speculative execution can also trigger prefetching, so they're non-orthogonal).
I don't personally do any significant JS development, but the suggestion that this is a JS-specific problem is silly. This could have just as well been in a Java or C++ framework. We all use third party libraries and frameworks all the time without doing a line-by-line code review.
There's one difference: when I use a C/C++ library, I almost always install it via my operating system's package manager. The version installed won't have been checked for backdoors, but it will at least be a released version that has gone through some minimal QA. The fact that it's included at all typically means that upstream has a half-competent release process or that some other applications are depending on it and making it worth packaging.
Newer languages all seem to feel the need to create a per-language package manager. This works fine as long as all software is written in that language, but is really painful for multi-language software (i.e. most non-trivial programs). NPM isn't quite the worst - the default way of including stuff in a Go program is to point at the upstream project's git repo and have your build environment clone the head revision and statically link it into your binary - but it's pretty bad.
There should be neither welfare nor a minimum wage.
Okay, so when a company offers a job for just above starvation wages, what do you expect to happen? The history of the industrial revolution tells us that desperate people will accept it, because the other alternative is to starve to death. You think that's good for the long-term health of the country?
The solution to inadequate pay in an environment where you have no skills is to move.
Move where? Look up the term 'race to the bottom'. And, again, look at the history of the industrial revolution. Oh, and also explain how you expect people with no welfare, no jobs, and few skills to be able to afford to move.
I'm not a low wage worker- but I moved from a place where the cost of living was higher and the people living there were wealthy.
And, no doubt, you were able to do this because you had a decent amount of savings accumulated to cover your relocation costs and you had a very high chance of employment where you moved.
It's lazy and/or stupid people who think like you who need to be educated on economics
The irony of ending a post that demonstrates a complete lack of awareness of either history or economics with that is not lost on me. I honestly can't tell if you're a parody or not.
There are a couple of reasons, but the main one is that if you're able to run code on both the client and server then there are much smarter ways of doing this. The simplest way is to store a public key on the server and store the corresponding private key on the client. Then the server sends some random data (a challenge) to the client, which signs it with the private key (loosely, 'encrypts' it) and sends the result to the server, which verifies it with the public key. If the signature checks out against the data that the server sent, this proves that the client has the private key. There are more complex protocols, such as the Secure Remote Password protocol, which do something equivalent to this but using a password rather than a stored private key. The basic idea is the same: you send something that proves that you have the password, not the password itself.
On the web, it's usually because you want to support browsers with JavaScript disabled, so you must support a simple submit-the-password-via-a-form mechanism. That may change as the WebAuthn standard becomes more widely deployed.
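The 'prove you have it without sending it' idea can be sketched as a challenge-response exchange. Note the swap: Python's standard library has no public-key signatures, so this uses an HMAC over a shared secret instead, which means the server has to hold the same secret as the client. The public-key scheme and SRP described above avoid that. The function names are made up for illustration.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: fresh random data, so an old response can't be replayed."""
    return secrets.token_bytes(32)


def client_response(secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def server_verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


shared = b"correct horse battery staple"
challenge = make_challenge()
print(server_verify(shared, challenge, client_response(shared, challenge)))
```

Only the challenge and the response cross the wire; someone who records both still can't answer the next challenge.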
The salt isn't secret, it's just used to prevent rainbow tables from being useful. If you store passwords as unsalted hashes, then an attacker can construct a large table of all of the hashes of 8-character inputs and compare each of your hashes against their table. If there's a match, then they have a password that will work. If you add a salt, then they can't use such a table, because they have to check each 8-character sequence with the salt prepended. If the salt is different for each password (as it should be), then there's no benefit from pre-calculating the table. If it takes 2 hours of GPU time to compute the table, then with unsalted passwords that's a one-time cost and you can then crack any weak password in a leaked password database almost instantly. In contrast, with a leaked salted password database you have to spend those 2 hours again for every entry you attack, because each salt needs its own table.
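A small sketch of the difference, using a three-entry candidate list as a stand-in for the 'all 8-character inputs' table. The names are hypothetical, and plain SHA-256 is used only to keep the example short: a real password store should use a deliberately slow function such as `hashlib.scrypt` or `hashlib.pbkdf2_hmac`.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes = b"") -> str:
    """Hash with the salt prepended (an empty salt means unsalted)."""
    return hashlib.sha256(salt + password.encode()).hexdigest()


# Stand-in for the attacker's candidate list; computed once, reused forever.
CANDIDATES = ["password", "letmein1", "qwerty12"]
TABLE = {hash_password(p): p for p in CANDIDATES}


def crack_with_table(stored_hash: str):
    """Unsalted case: one dictionary lookup per leaked hash."""
    return TABLE.get(stored_hash)


def crack_salted(stored_hash: str, salt: bytes):
    """Salted case: every candidate must be re-hashed for this one salt."""
    for candidate in CANDIDATES:
        if hash_password(candidate, salt) == stored_hash:
            return candidate
    return None


salt = os.urandom(16)                      # stored next to the hash, not secret
unsalted = hash_password("letmein1")
salted = hash_password("letmein1", salt)
print(crack_with_table(unsalted))          # letmein1
print(crack_with_table(salted))            # None
print(crack_salted(salted, salt))          # letmein1
```

The weak password still falls in the salted case, but only after repeating the whole candidate search for that one entry.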
If you hash on the client, then you're not really hashing, you're just using a different password and the thing in your database is now the real password, because that's what the client presents to log in. If you really want to do it properly, there are zero-knowledge protocols that allow it, but simply hashing on the client is not sufficient.
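To make that concrete, here is a minimal sketch of a server that accepts a client-computed hash. `NaiveServer` and `client_hash` are hypothetical names. The point is that an attacker holding a leaked copy of the database can log in without ever learning the original password.

```python
import hashlib
import hmac


def client_hash(password: str) -> str:
    """What the browser would compute before submitting the form."""
    return hashlib.sha256(password.encode()).hexdigest()


class NaiveServer:
    """Stores and compares whatever the client sends, i.e. the client-side hash."""

    def __init__(self):
        self.db = {}

    def register(self, user: str, client_hashed: str) -> None:
        self.db[user] = client_hashed

    def login(self, user: str, client_hashed: str) -> bool:
        stored = self.db.get(user)
        return stored is not None and hmac.compare_digest(stored, client_hashed)


server = NaiveServer()
server.register("alice", client_hash("hunter2"))

# The attacker never sees "hunter2", only the leaked database row.
leaked = server.db["alice"]
print(server.login("alice", leaked))  # True
```

The stored value is, in effect, a plaintext password: whatever the client presents to log in is the credential.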
That alone doesn't undermine collective bargaining. In an environment with collective bargaining, saying 'you're fired' without just cause would mean a strike by every other member of the collective. If this is not happening in 'at will' states, then that's a separate failure.
Housing can only sell/rent for what people can afford. Empty units make no one money
This is not true. Well, the second part, anyway. In a boom / bubble, the appreciation on an asset like a house can be significantly more than the rental income. A lot of properties are being bought with no intention of renting them (tenants may decrease the resale value by increasing wear on the interiors) so that they can be sold in a few years for a large profit. This happens in any market where a large proportion of the participants are speculators, rather than producers or consumers. The US used to have regulations that limited the number of speculators in commodities markets (you need some to improve liquidity and reduce risk), but Goldman Sachs successfully lobbied to have them removed. There were never any such regulations on real estate.
'I am set up enough to have a solid job with prospects, and I see no reason why people who are not should have a job, because they cannot earn enough to make it worthwhile in my view'
There are a couple of issues. The first is the difference in negotiating power. In a high-skill or high-demand occupation, there is a vaguely level playing field between employer and employee. If the employer doesn't treat the employee well, then the employee can leave and continue employment elsewhere (and the departure is likely to financially hurt the employer). In a low-skill occupation, there is a huge imbalance. The employee is replaceable, but probably needs the job to be able to afford to live. There is a large mass of historical evidence that, when employers are allowed to abuse this bargaining power, they will do so. The only ways that have historically worked to prevent this abuse are minimum wage and related regulations and collective bargaining. The latter works only if the majority of the workforce opts into it.
The second issue is one of indirect subsidy. If a company is not paying enough to live, then that slack has to be picked up elsewhere, typically by welfare payments. We, as taxpayers, are paying that bill. If a large chunk of my income is going to welfare payments, I'd much prefer that it were going to help people less fortunate than me and not to subsidise abusive business models from companies that couldn't compete if they paid a living wage.
Several of the polytechnics were well known for first-rate vocational engineering training. I don't know about Newcastle, but Bristol Poly's aerospace engineering course pretty much guaranteed a job at BAe or Rolls Royce. It was a tragedy when the government decided everyone should go to university and turned them from first-rate vocational institutions into third-rate universities.
(1) Does out-of-order necessarily imply speculative execution?
No, though typically you move to speculative execution before you move to superscalar. Speculative execution is required to get decent performance from any pipelined processor. The difference between the speculative execution in a superscalar and an in-order pipeline is one of degree, rather than kind. There's a fairly common heuristic that you have a branch roughly every 7 instructions in code compiled from vaguely C-like languages (code that is often quite misleadingly called 'general-purpose' code). If you have a 7-stage pipeline (fairly small by modern standards), then by the time you've got the first instruction in a basic block out of the pipeline, you'll have reached the end of the block. You now have two choices: wait until you have the branch target value (i.e. you've finished executing the instruction that computes it) or guess. If you guess and get it right even some of the time, you'll have better performance because you'll be executing useful instructions at least some of the time, whereas if you don't guess you're guaranteed to be executing nothing. This is the essence of branch prediction.
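The wait-or-guess trade-off can be put into a crude cost model. Everything numeric here is an assumption for illustration: a single-issue pipeline that otherwise retires one instruction per cycle, a 2-cycle bubble whenever the branch target isn't known in time, and the same 2-cycle cost for a wrong guess.

```python
def cycles_per_instruction(branch_interval: int, penalty_cycles: int,
                           accuracy: float) -> float:
    """Average CPI for a single-issue in-order pipeline.

    accuracy is the fraction of branches guessed correctly; 0.0 models
    a core that never guesses and always waits for the branch target.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    wasted_per_branch = (1.0 - accuracy) * penalty_cycles
    return 1.0 + wasted_per_branch / branch_interval


always_wait = cycles_per_instruction(7, 2, accuracy=0.0)
predicted = cycles_per_instruction(7, 2, accuracy=0.9)
print(f"wait: {always_wait:.3f} CPI, guess at 90%: {predicted:.3f} CPI")
```

With those assumed numbers, always waiting costs about 29% extra cycles per instruction, while a 90%-accurate guesser loses about 3%. Any accuracy above zero beats waiting in this model, which is the point made above.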
(2) Is in-order really that bad, considering all the other advances in processor design?
That depends a lot on the workload. GPUs, for example, are in-order and don't do speculative execution. They are really fast. They are, however, very slow at running typical C code.
Out of order designs result from the desire to exploit instruction-level parallelism. The basic idea is that it's easy to do things in parallel in hardware: just stamp out more logical blocks. Given sequential code, if you can convert it into sequences of data dependencies, you can then execute anything in parallel once its dependencies are satisfied. Most straight-line code has on the order of 4-way instruction-level parallelism, so if you can fully exploit this then you can run at around four times the speed.
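As a sketch of what 'convert it into sequences of data dependencies' buys you: given a dependency graph, the available parallelism is roughly the instruction count divided by the length of the longest dependency chain. This assumes every instruction takes one cycle and that there are unlimited execution units; the graph below is invented for illustration.

```python
def ilp(deps: dict) -> float:
    """Instructions divided by critical-path length (unit latency assumed).

    deps maps each instruction name to the list of instructions whose
    results it needs; the graph must be acyclic.
    """
    depth = {}

    def level(instr: str) -> int:
        # Earliest cycle (1-based) in which instr can execute.
        if instr not in depth:
            depth[instr] = 1 + max((level(d) for d in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(level(i) for i in deps)
    return len(deps) / critical_path


# Four independent loads, each feeding one add: 8 instructions, 2 cycles.
block = {
    "ld_a": [], "ld_b": [], "ld_c": [], "ld_d": [],
    "add_a": ["ld_a"], "add_b": ["ld_b"], "add_c": ["ld_c"], "add_d": ["ld_d"],
}
print(ilp(block))  # 4.0

# A pure chain has no parallelism to extract.
chain = {"i1": [], "i2": ["i1"], "i3": ["i2"], "i4": ["i3"]}
print(ilp(chain))  # 1.0
```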
The obvious alternative is to just have four single-pipeline[1] cores. This will give you better power efficiency, but at the cost of requiring some combination of the programmer and the compiler to generate code that exposes parallelism. You can take this even further and have multiple thread contexts per core, so whenever you would have to speculatively execute an instruction, you instead pause the thread and start running instructions from another thread. This approach means that all of the instructions that you're executing are ones where the result is actually used and simplifies some of the very complex parts of a processor (branch predictor, register rename engine) and so gives much better performance per Watt and per unit area than an out-of-order processor. If you have a source language that makes it easy to expose this kind of parallelism, then this will give you much better performance. Unfortunately, most Algol-derived languages (C, C++, Java, and so on) are not in this category.
(3) What about efficiency? If in-order means the CPU is doing less work in a given time, is it also consuming less power? I.e. is the 50%..75% reduction in absolute computing power, or also efficiency?
Good question. If you look in a typical Android smartphone, you will see ARM cores in a big.LITTLE configuration: a cluster of simpler in-order cores and a cluster of higher-performance out-of-order cores. The in-order cores will consume less power than the big cores executing the same sequence of instructions[2]. The peak performance for the big cores will be much higher, but at the cost of efficiency.
(2) is related to some anecdotal experience that hyperthreading works well on in-order Atom processors. In my understanding, HT helps keep the pipeline full, so if it's half empty due to in-order design, the relative improvement would be better compared to OOO processors. (3) is also about pipeline utilization
Fixing Meltdown is fairly simple - don't speculate across ring transitions. That will come with a small performance hit, but only a small one. Fixing Spectre is much harder because Spectre isn't really a vulnerability so much as a class of vulnerabilities with proofs of concept for the easiest things to attack. Fixing Spectre means making sure that no side effects of speculation, including timing, are visible. That means, among other things, no cache fills or evictions during speculative execution, all instructions in flight must be cancelled as soon as they're known not-taken, rename registers must be returned for use as soon as instructions are known to be cancelled, and so on. It might be possible to design a superscalar chip that is not vulnerable to Spectre-like attacks, but I'm sceptical (and I'm doubly sceptical that, if you could, it would perform better than an in-order processor).
First, these are Cannon Lake chips. Remember Cannon Lake, due late 2016? Delayed until late 2017? Delayed until late in the first half of 2018? Yup, that Cannon Lake. Among other things, Cannon Lake was scheduled to introduce LPDDR4, so would be the first Intel mobile chips that could manage 32GB of RAM without using a huge power budget. If you think it's bad for Intel now, wait for the Apple fanboys to notice...
Second, one of the features that people have been waiting for since it was originally announced in 2016 and was expected to debut with Cannon Lake is Intel's Control-flow Enforcement Technology. This comprises two parts. The first is a set of magic nops that indicate a valid branch target and protect forward control flow arcs (any jump that isn't to a designated landing pad will trap in code marked as supporting the feature). The second is a secure stack. Every call instruction pushes the return address onto the main stack, but also onto a second stack (which is not readable or writeable by normal instructions). Each ret instruction checks the top of both stacks and traps if they disagree. Sounds great? That's what everyone thought last year, but unfortunately it is incompatible with the retpoline Spectre mitigation that is now fairly widely deployed, so CET is now impossible to deploy in the presence of code using retpolines (e.g. Chrome) and so needs to be redesigned very late in the schedule or skipped entirely.
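The clash is easy to see in a toy model of the second stack. This is a sketch of the behaviour as described above, not of Intel's actual design; the class names and addresses are made up. A retpoline replaces an indirect jump with a call, an overwrite of the return address on the normal stack, and a ret, which is exactly the pattern the check is meant to catch.

```python
class ControlFlowTrap(Exception):
    """Raised when the two stacks disagree about the return address."""


class ToyShadowStackCPU:
    def __init__(self):
        self.stack = []    # normal stack, writable by ordinary code
        self.shadow = []   # second stack, only touched by call/ret

    def call(self, return_address: int) -> None:
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self) -> int:
        address = self.stack.pop()
        expected = self.shadow.pop()
        if address != expected:
            raise ControlFlowTrap(f"{address:#x} != {expected:#x}")
        return address


def retpoline_jump(cpu: ToyShadowStackCPU, target: int) -> int:
    """Indirect jump done the retpoline way: call, overwrite, ret."""
    cpu.call(0x1000)         # pushes a return address (hypothetical value)
    cpu.stack[-1] = target   # redirect the ret to the real jump target
    return cpu.ret()         # the second stack still holds 0x1000


cpu = ToyShadowStackCPU()
cpu.call(0x2000)
print(hex(cpu.ret()))        # ordinary call/ret pair passes the check

try:
    retpoline_jump(cpu, 0x3000)
except ControlFlowTrap as trap:
    print("trap:", trap)
```

An ordinary call/ret pair is fine, but the retpoline's deliberate return-address rewrite looks exactly like an attack to the check.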
However, I still like Apple's Routers because they are the ONLY people I trust not to slipstream-in NSA backdoors into the Firmware.
That's not true. Huawei, for example, doesn't install NSA back doors. More seriously, how do you know that the NSA hasn't injected vulnerabilities into Apple's firmware? If you've followed the story of how the Juniper backdoor was introduced, you'll know that it doesn't necessarily require anyone in the company to be aware...
I just hope Apple opens the Time Capsule "protocol".
They did. They have documented the extensions to AFP that Time Machine uses and they have been supported by netatalk for many years. I've been backing up to a ZFS-based FreeBSD machine with Time Machine for about 8 years. It mostly worked with netatalk 2.x; with netatalk 3.x it's been flawless.
Does it still do the thing that the older Windows Backup utility did of creating nonsense-named zip files of your data? This was annoying because it completely defeated block-level deduplication on the NAS.
I've set up AFP on a FreeBSD machine for Time Machine backups. Doing it with netatalk 2.x was quite painful. With 3.x, it works out of the box. It is slightly more effort to set up than a Time Capsule, but it's also more powerful. For example, my Time Machine backups go to a ZFS filesystem, which is periodically snapshotted.
Time Machine occasionally corrupts backups (colleagues have reported this on both Time Capsules and external disks, though I haven't seen it for a few years) and if it does then the only option with a Time Capsule is to delete all of the existing backups and start again. Well, that's the only official Apple-supported thing. You can also often run fsck on the backup disk image and get it working, but that falls into 'Linux-user' territory for your definition.
With ZFS, I can clone the current version, revert to an older snapshot and Time Machine will then happily run an incremental backup from a couple of weeks ago, rather than a full backup (as it does if the same thing happens with a Time Capsule).
Oh, and Time Machine does file-level deduplication, but my ZFS filesystem is doing block-level deduplication and lz4 compression, so I'm using a lot less disk space for my backups (they get very high dedup and compression ratios).
Apple entered the wireless access point market because it wasn't competitive. There were few players and there was a big premium for 802.11g parts (many of which were crap), and Apple wanted to sell support for 802.11g as a feature on the PowerBooks. This feature was largely worthless if the expensive 802.11g WiFi interface on the laptop was always running in downgraded 802.11b-compatible mode. Something similar happened with 802.11n. By the time 802.11ac came along, the market was competitive enough that there was no need for Apple to do anything: if they did nothing, people were still able to get 802.11ac working well. In addition, 802.11ac was much less of a selling point. The jump from .11b to .11g was the difference between nice toy for demos and generally useful. The jump from .11g to .11n meant that the WiFi was typically not the bottleneck for most users. The jump to .11ac means that WiFi is even less of a bottleneck, but it's well past the point where most people care.
Shouldn't that be pedants'? There are at least three of us, I believe...
Since this is the pedants thread, I think you mean homophones. As in 'I'm not a homophone, some of my best friends sound the same!'
That is why the right is dogmatically insistent that it is only liberals, minorities, and especially the "illegals" who get welfare
This is true. Only people on the left receive welfare. People on the right receive well-deserved subsidy and financial assistance.
Several of the polytechnics were well known for first-rate vocational engineering training. I don't know about Newcastle, but Bristol Poly's aerospace engineering course pretty much guaranteed a job at BAe or Rolls Royce. It was a tragedy when the government decided everyone should go to university and turned them from first-rate vocational institutions into third-rate universities.
(1) Does out-of-order necessarily imply speculative execution?
No, though typically you move to speculative execution before you move to superscalar. Speculative execution is required to get decent performance from any pipelined processor. The difference between the speculative execution in a superscalar and an in-order pipeline is one of degree, rather than kind. There's a fairly common heuristic that you have a branch roughly every 7 instructions in code compiled from vaguely C-like languages (code that is often quite misleadingly called 'general-purpose' code). If you have a 7-stage pipeline (fairly small by modern standards), then by the time the first instruction in a basic block is out of the pipeline, you'll have reached the branch at the end of the block. You now have two choices: wait until you have the branch target value (i.e. you've finished executing the instruction that computes it) or guess. If you guess and get it right even some of the time, you'll have better performance because you'll be executing useful instructions at least some of the time, whereas if you don't guess you're guaranteed to be executing nothing. This is the essence of branch prediction.
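To put rough numbers on the wait-or-guess trade-off, here is a minimal sketch of the arithmetic. It assumes a single-issue pipeline, one cycle per instruction, and that a wrong guess costs exactly the same as waiting (real pipelines also pay a flush cost). The function names and the 6-cycle resolve latency are made up for illustration.

```python
def cycles_per_block(block_len, resolve_latency, accuracy):
    """Expected cycles to get through one basic block.

    block_len:       instructions per basic block (the ~7 heuristic)
    resolve_latency: cycles between fetching the branch and knowing its target
    accuracy:        probability that the guess is right; 0.0 models
                     'never guess, always wait'
    """
    # One cycle per instruction, plus the resolve latency wasted
    # whenever the guess is wrong (or was never made).
    return block_len + (1.0 - accuracy) * resolve_latency


def speedup_from_guessing(block_len, resolve_latency, accuracy):
    """Speedup of guessing over always waiting for the branch target."""
    waiting = cycles_per_block(block_len, resolve_latency, 0.0)
    guessing = cycles_per_block(block_len, resolve_latency, accuracy)
    return waiting / guessing


if __name__ == "__main__":
    for accuracy in (0.0, 0.5, 0.9, 1.0):
        print(accuracy, round(speedup_from_guessing(7, 6, accuracy), 2))
```

Even a coin-flip predictor comes out ahead in this toy model, which is the point: any correct guess is useful work that waiting would never have done.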
(2) Is in-order really that bad, considering all the other advances in processor design?
That depends a lot on the workload. GPUs, for example, are in-order and don't do speculative execution. They are really fast. They are, however, very slow at running typical C code.
Out of order designs result from the desire to exploit instruction-level parallelism. The basic idea is that it's easy to do things in parallel in hardware: just stamp out more logical blocks. Given sequential code, if you can convert it into sequences of data dependencies, you can then execute anything in parallel once its dependencies are satisfied. Most straight-line code has on the order of 4-way instruction-level parallelism, so if you can fully exploit this then you can run at around four times the speed.
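The 'convert it into data dependencies' idea can be sketched as a toy scheduler. This is an idealised model, not how any real core works: it assumes unlimited execution units, one-cycle latency for everything, and perfect register renaming. The instruction names and the `deps` format are invented for the example.

```python
def schedule(deps):
    """Earliest issue cycle for each instruction.

    deps maps an instruction name to the list of instructions whose
    results it reads. An instruction issues one cycle after the last
    of its inputs, or in cycle 1 if it has no inputs.
    """
    cycle = {}

    def ready(name):
        if name not in cycle:
            cycle[name] = 1 + max((ready(d) for d in deps[name]), default=0)
        return cycle[name]

    for name in deps:
        ready(name)
    return cycle


def ilp(deps):
    """Average instructions per cycle if dependencies are the only limit."""
    cycle = schedule(deps)
    return len(deps) / max(cycle.values())


if __name__ == "__main__":
    # Eight instructions in program order; only the arrows constrain them.
    program = {
        "a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"],
        "f": ["c", "d"],
        "g": ["e", "f"],
        "h": ["g"],
    }
    print(schedule(program))
    print(ilp(program))  # 8 instructions over a 4-cycle critical path
```

An in-order single-issue core would take 8 cycles for the same sequence; the dependency chain only requires 4.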
The obvious alternative is to just have four single-pipeline[1] cores. This will give you better power efficiency, but at the cost of requiring some combination of the programmer and the compiler to generate code that exposes parallelism. You can take this even further and have multiple thread contexts per core, so whenever you would have to speculatively execute an instruction, you instead pause the thread and start running instructions from another thread. This approach means that all of the instructions that you're executing are ones where the result is actually used and simplifies some of the very complex parts of a processor (branch predictor, register rename engine) and so gives much better performance per Watt and per unit area than an out-of-order processor. If you have a source language that makes it easy to expose this kind of parallelism, then this will give you much better performance. Unfortunately, most Algol-derived languages (C, C++, Java, and so on) are not in this category.
(3) What about efficiency? If in-order means the CPU is doing less work in a given time, is it also consuming less power? I.e. is the 50%..75% reduction in absolute computing power, or also efficiency?
Good question. If you look in a typical Android smartphone, you will see ARM cores in a big.LITTLE configuration: a cluster of simpler in-order cores and a cluster of higher-performance out-of-order cores. The in-order cores will consume less power than the big cores executing the same sequence of instructions[2]. The peak performance for the big cores will be much higher, but at the cost of efficiency.
(2) is related to some anecdotal experience that hyperthreading works well on in-order Atom processors. In my understanding, HT helps keep the pipeline full, so if it's half empty due to in-order design, the relative improvement would be better compared to OOO processors. (3) is also about pipeline utilization
Fixing Meltdown is fairly simple - don't speculate across ring transitions. That will come with a small performance hit, but only a small one. Fixing Spectre is much harder because Spectre isn't really a vulnerability so much as a class of vulnerabilities with proofs of concept for the easiest things to attack. Fixing Spectre means making sure that no side effects of speculation, including timing, are visible. That means, among other things, no cache fills or evictions during speculative execution, all instructions in flight must be cancelled as soon as they're known not-taken, rename registers must be returned for use as soon as instructions are known to be cancelled, and so on. It might be possible to design a superscalar chip that is not vulnerable to Spectre-like attacks, but I'm sceptical (and I'm doubly sceptical that, if you could, it would perform better than an in-order processor).
First, these are Cannon Lake chips. Remember Cannon Lake, due late 2016? Delayed until late 2017? Delayed until late in the first half of 2018? Yup, that Cannon Lake. Among other things, Cannon Lake was scheduled to introduce LPDDR4, so would be the first Intel mobile chips that could manage 32GB of RAM without using a huge power budget. If you think it's bad for Intel now, wait for the Apple fanboys to notice...
Second, one of the features that people have been waiting for since it was originally announced in 2016 and was expected to debut with Cannon Lake is Intel's Control-flow Enforcement Technology. This comprises two parts. The first is a set of magic nops that indicate a valid branch target and protect forward control flow arcs (any jump that isn't to a designated landing pad will trap in code marked as supporting the feature). The second is a secure stack. Every call instruction pushes the return address onto the main stack, but also onto a second stack (which is not readable or writeable by normal instructions). Each ret instruction checks the top of both stacks and traps if they disagree. Sounds great? That's what everyone thought last year, but unfortunately it is incompatible with the retpoline Spectre mitigation that is now fairly widely deployed, so CET is now impossible to deploy in the presence of code using retpolines (e.g. Chrome) and so needs to be redesigned very late in the schedule or skipped entirely.
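The shadow-stack half, and why retpolines trip it, can be modelled in a few lines. This is only a sketch of the behaviour described above, not Intel's specification: the `Machine` class, its method names and the addresses are all invented. The retpoline step relies on the fact that a retpoline performs an indirect jump by overwriting the return address on the ordinary stack and then executing ret.

```python
class ControlFlowTrap(Exception):
    """Raised when the two stacks disagree about the return address."""


class Machine:
    def __init__(self):
        self.stack = []   # ordinary stack: normal stores can modify it
        self.shadow = []  # shadow stack: only call/ret touch it

    def call(self, return_address):
        # call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        # ret pops both stacks and traps if they disagree.
        address = self.stack.pop()
        expected = self.shadow.pop()
        if address != expected:
            raise ControlFlowTrap(
                f"return to {address:#x}, shadow stack says {expected:#x}"
            )
        return address


if __name__ == "__main__":
    m = Machine()

    # Ordinary call/return: the stacks agree.
    m.call(0x1000)
    print(hex(m.ret()))

    # Retpoline-style indirect jump: call, overwrite the return
    # address with the real target, then ret.
    m.call(0x1000)
    m.stack[-1] = 0x2000
    try:
        m.ret()
    except ControlFlowTrap as trap:
        print("trap:", trap)
```

From the shadow stack's point of view a retpoline is indistinguishable from a return-address overwrite, which is exactly what the feature exists to stop.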
However, I still like Apple's Routers because they are the ONLY people I trust not to slipstream-in NSA backdoors into the Firmware.
That's not true. Huawei, for example, doesn't install NSA back doors. More seriously, how do you know that the NSA hasn't injected vulnerabilities into Apple's firmware? If you've followed the story of how the Juniper backdoor was introduced, you'll know that it doesn't necessarily require anyone in the company to be aware...
Samba also supports the requirements for Time Machine over SMB, since Samba 4.8, so you can back up via either SMB or AFP to a non-Apple device.
I just hope Apple opens the Time Capsule "protocol".
They did. They have documented the extensions to AFP that Time Machine uses and they have been supported by netatalk for many years. I've been backing up to a ZFS-based FreeBSD machine with Time Machine for about 8 years. It mostly worked with netatalk 2.x, with netatalk 3.x it's been flawless.
Does it still do the thing that the older Windows Backup utility did of creating nonsense-named zip files of your data? This was annoying because it completely defeated block-level deduplication on the NAS.
I've set up AFP on a FreeBSD machine for Time Machine backups. Doing it with netatalk 2.x was quite painful. With 3.x, it works out of the box. It is slightly more effort to set up than a Time Capsule, but it's also more powerful. For example, my Time Machine backups go to a ZFS filesystem, which is periodically snapshotted.
Time Machine occasionally corrupts backups (colleagues have reported this on both Time Capsules and external disks, though I haven't seen it for a few years) and if it does then the only option with a Time Capsule is to delete all of the existing backups and start again. Well, that's the only official Apple-supported thing. You can also often run fsck on the backup disk image and get it working, but that falls into 'Linux-user' territory for your definition.
With ZFS, I can clone the current version, revert to an older snapshot and Time Machine will then happily run an incremental backup from a couple of weeks ago, rather than a full backup (as it does if the same thing happens with a Time Capsule).
Oh, and Time Machine does file-level deduplication, but my ZFS filesystem is doing block-level deduplication and lz4 compression, so I'm using a lot less disk space for my backups (they get very high dedup and compression ratios).
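The difference between the two kinds of deduplication is easy to see with a toy model. This is not how ZFS or Time Machine is implemented; it only counts how many bytes would need storing if identical whole files, or identical fixed-size blocks, were kept once. The 4096-byte block size and function names are arbitrary choices for the example.

```python
import hashlib


def file_level_stored(files):
    """Bytes stored if each distinct whole file is kept once."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files}
    return sum(unique.values())


def block_level_stored(files, block_size=4096):
    """Bytes stored if each distinct fixed-size block is kept once."""
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


if __name__ == "__main__":
    # Two backups of a four-block file; only the last block changed.
    v1 = b"".join(bytes([i]) * 4096 for i in range(4))
    v2 = v1[:3 * 4096] + bytes([9]) * 4096

    print(file_level_stored([v1, v2]))   # both versions stored in full
    print(block_level_stored([v1, v2]))  # three shared blocks stored once
```

Here file-level deduplication stores 32768 bytes and block-level stores 20480, because the three unchanged blocks are shared between the two versions.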
Apple entered the wireless access point market because it wasn't competitive. There were few players and there was a big premium for 802.11g parts (many of which were crap), and Apple wanted to sell support for 802.11g as a feature on the PowerBooks. This feature was largely worthless if the expensive 802.11g WiFi interface on the laptop was always running in downgraded 802.11b-compatible mode. Something similar happened with 802.11n. By the time 802.11ac came along, the market was competitive enough that there was no need for Apple to do anything: if they did nothing, people were still able to get 802.11ac working well. In addition, 802.11ac was much less of a selling point. The jump from .11b to .11g was the difference between nice toy for demos and generally useful. The jump from .11g to .11n meant that the WiFi was typically not the bottleneck for most users. The jump to .11ac means that WiFi is even less of a bottleneck, but it's well past the point where most people care.