David Patterson Says It's Time for New Computer Architectures and Software Languages (ieee.org)
Tekla S. Perry, writing for IEEE Spectrum: David Patterson -- University of California professor, Google engineer, and RISC pioneer -- says there's no better time than now to be a computer architect. That's because Moore's Law really is over, he says: "We are now a factor of 15 behind where we should be if Moore's Law were still operative. We are in the post-Moore's Law era." This means, Patterson told engineers attending the 2018 @Scale Conference held in San Jose last week, that "we're at the end of the performance scaling that we are used to. When performance doubled every 18 months, people would throw out their desktop computers that were working fine because a friend's new computer was so much faster." But last year, he said, "single program performance only grew 3 percent, so it's doubling every 20 years. If you are just sitting there waiting for chips to get faster, you are going to have to wait a long time."
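For anyone who wants to check the arithmetic in the summary, here is a minimal sketch converting between a compound annual growth rate and a doubling time. The function names are made up for illustration and are not from the article. By this math, 3 percent a year doubles in a bit over 23 years, so the quoted "20 years" reads as a round figure.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_from_doubling(months: float) -> float:
    """Compound annual growth rate implied by doubling every `months` months."""
    return 2 ** (12 / months) - 1


if __name__ == "__main__":
    # 3 percent per year, the single-program figure quoted in the summary
    print(f"3% per year doubles in {doubling_time_years(0.03):.1f} years")
    # The old 18-month doubling, expressed as a yearly rate
    print(f"Doubling every 18 months is {annual_growth_from_doubling(18):.0%} per year")
```

The second function shows the contrast: doubling every 18 months works out to somewhere near 59 percent a year, against the 3 percent quoted for last year.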
We've only had three new ones come out this week. We need M0AR! M0AR languages!! M0AR syntaxes!!
M0AR of all the things!
In fact, it should be a requirement for all CS majors to develop their own language before graduation, so everyone can be *THE* subject matter expert in a language. That would be awesome. Everyone would be able to charge $500/hr for being the ONLY expert in their language.
What could be wrong with this??
I would say that architecture is changing, at least for production systems. It's all about scaling horizontally instead of vertically. Sure, individual cores aren't much faster, but a couple years ago I launched 30,000 cores in two minutes on AWS, and about a year ago EC2 Spot announced a million-core stunt of some sort.
A million cores from COTS technology is a lot of performance.
Socialism: a lie told by totalitarians and believed by fools.
A SPECint graph shared on Quora shows this slowdown starting back in 2005.
https://qph.fs.quoracdn.net/ma...
3. Profit!
2. ???
1. On Soviet Slashdot, a Beowulf cluster of alien Natalie Portman overlords welcomes YOU!
I am pretty sure David Patterson is out there doing it. He is a professor in the field who has accomplished plenty. He is 70 now and is likely past his academic prime, so now he is doing what he should be doing at this time in his career: teaching, mentoring, and inspiring the next generation.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
Depends on what your definition of "need" is. For example, I could say I need to run Minecraft with 220 mods, at 30 FPS, with hundreds of machine blocks. (with an i7-7700k I usually get around 11-12)
3. Profit!
2. ???
1. On Soviet Slashdot, a Beowulf cluster of alien Natalie Portman overlords welcomes YOU!
Hey, I know. We should use asynchronous techniques! At both the circuit and the architecture level. (P.S. This is sarcasm, which students of Digital Logic and Computer Engineering may find amusing.)
The other half of the joke is that async I/O was the big new feature of a recent C# version, which means it will be the hot new thing in Java in another couple of years.
Socialism: a lie told by totalitarians and believed by fools.
And? What if we don't need it to keep getting endlessly faster?
I worked on the BiiN project. https://en.wikipedia.org/wiki/... A 'capability' was a specific -hardware protected- feature that was set up to be unforgeable and contain access rights. This computer architecture approach dates back to the Burroughs 6500 https://en.wikipedia.org/wiki/... and even back to some aspects of MULTICS.
They're definitely not von Neumann architectures, since a capability pointing to executable code is a very different thing than a capability pointing to data. In many respects, these would be "direct execution engines" for object-oriented languages (even C++, with some restrictions on that language's definition).
A huge part of this is getting over the illusion that you have any clue about the (set of) instructions generated by your compiler. If you're working on a PDP-8 or even PDP-11, C might well be close to 'assembly language'. But with the much more complex instruction sets and compiler optimizations to support those instruction sets, most languages are far removed from any knowledge of what the underlying hardware executes.
Various alternate architectures have been tried out over the decades. A lot of other programming models have been tried out as well. They all basically failed or live on only in niches because people could not hack coding for them.
Performance increases for most tasks are over. Deal with it and stop proposing silver bullets. It only makes you look stupid.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Moore's law predicted early exponential growth in semiconductors, but as in all fields it eventually hits an inflection point and becomes asymptotic; infinite transistor density will never happen.
Teach by example. Show me an architecture that would make me want it. With C++ as a platform language, not stupid shit like java, ruby etc.
Hey, I know. We should use asynchronous techniques! At both the circuit and the architecture level. (P.S. This is sarcasm, which students of Digital Logic and Computer Engineering may find amusing.)
The other half of the joke is that async I/O was the big new feature of a recent C# version, which means it will be the hot new thing in Java in another couple of years.
Java NIO (Non-blocking I/O) was introduced in Java 4 (2002).
Because Moore's law is more about transistor density. It's an easy nit to pick.
It's still a big deal that we aren't getting any easy gains on single core speed, or, factoring in all their new fancy branch predictions, single thread performance. But newer CPUs are fitting more cores in, newer GPUs are wildly more effective (at a fundamentally parallel task). These are the arguments for Moore's law actually being still online.
Anyone who was around for the late 90s or before knows that computers simply aren't doing what they did before: completely obliterating previous generations of computing. A machine from 2008 can run most current games, and those it can't run inherit their restrictions artificially (a motherboard that won't take a new enough GPU, for instance). It can certainly run the latest version of pretty much any OS, and many productivity programs. If you do that comparison from 1999 to 1989, it's a joke: a Pentium III at nearly a gigahertz compared to a 486 at like 50 or 66 megahertz. Look back again at 1979, and you are comparing to an 8086 or something.
You know, I thought my Amiga 500+ was pretty snappy even in 1995.
While we complain about bloat and poor programming, the reality is that most of the performance improvements we've seen over the history of microcomputing have been directed at creating entirely new applications that weren't feasible with the older technology. That 500+ could never run a modern web browser no matter how efficiently it was built. As for video games...
You are not alone. This is not normal. None of this is normal.
Boy, do I miss the old MC68000, where I could just look up how long something takes and could actually do fully synchronous coding for a significant performance boost.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
More cores are worthless for most tasks. Takes some actual knowledge to see that though.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
As the others indicated, it depends on what you mean by need. It also depends a great deal on what you're doing. Unfortunately, Gates's law of bloat continues unabated as projects add pointless bullshit without considering how to optimize for it and how much of it is really just unnecessary bloat. Just because I have 16 GB of RAM does not mean that a process should be using it just to use it.
What we really need at this point are better tools for using extra threads and better optimization. Most of the applications that I personally use don't benefit much from more processing power. I can give an entire core to my VM and the thing that really benefits from more processing power is compiling software, but I can do that while I'm asleep or out of the house, so having a faster computer isn't necessarily going to make it worth it for me personally.
We also need better security features to help guard against programming mistakes and to keep exploits as contained as possible.
Depends on what your definition of "need" is. For example, I could say I need to run Minecraft with 220 mods, at 30 FPS, with hundreds of machine blocks. (with an i7-7700k I usually get around 11-12)
The straight up implementation of the Multi MMC Predictor algorithm over 1,000,000 symbols takes 20 minutes to run on my laptop. That's a problem when you want to test every device in a production line. I'll happily take a faster CPU.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Um, all modern processors are RISC. Careful who you are calling an idiot.
This all points back to software devs. I've spent 2 decades dealing with low-level drivers and optimizations in assembly language. Now, not that I would expect developers to write assembly language, the problem I run into is that software developers of high level languages can't even write efficient code at their level. On top of that, they don't even understand how the language stack works, what code constructs give better performance in one language versus another. In addition, they can't even profile their code anymore or look at logs.
If anything needs changing, it's software developers first. They keep eating up all the computer resources and say "get more this/that for your computer." No, pull your head out of your 4th point of contact and learn to write efficient code. We were doing this shit in the 90s all the time. We even advertised for assembly programmers in NEXT Generation magazine, constantly!
While there's nothing wrong with using high-level languages, programmers today have lost the art of what it means to be lean and mean. I don't hire any developer unless they can demonstrate they know the stack for the language they use.
Me: "Oh, no assembly language experience?"
Applicant: "Oh, no. Is that required here?"
Me: "In rare cases, but I'm trying to understand if you even understand how a computer works at a fundamental level. In fact, have you ever worked with state diagrams?"
Applicant: "No."
Me: "Okay, you write an application that simply opens a file. What are the failure modes of your application and the opening of the file? Can you draw a state diagram for this?"
Applicant: "A flow chart?"
Me: "No, a state diagram. Given a set of inputs, I want you to diagram all outputs and failure modes for each state."
Applicants could answer these questions in the 90s and early 00s, but rarely anymore. I blame software devs for this problem. Hardware engineers are always having to pick up the slack and drag everyone up hill because software devs can't pull their own weight.
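For what it's worth, the state diagram asked for in that interview can be written down as a transition table. This is only a rough sketch: the state names, event names, and the `step` and `try_open` helpers are all invented for illustration, and a real answer would cover more failure modes (disk errors, locks, and so on).

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    OPEN = auto()
    FAILED_NOT_FOUND = auto()
    FAILED_PERMISSION = auto()
    FAILED_OTHER = auto()


# (current state, event) -> next state; the events are hypothetical labels
TRANSITIONS = {
    (State.CLOSED, "open_ok"): State.OPEN,
    (State.CLOSED, "not_found"): State.FAILED_NOT_FOUND,
    (State.CLOSED, "permission_denied"): State.FAILED_PERMISSION,
    (State.CLOSED, "other_error"): State.FAILED_OTHER,
    (State.OPEN, "close"): State.CLOSED,
}


def step(state: State, event: str) -> State:
    """Follow one edge of the diagram; reject edges that are not drawn."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}")


def try_open(path: str) -> State:
    """Attempt a real open() and report which state the diagram ends up in."""
    try:
        # The file is closed again when the with-block exits; we only
        # report that the OPEN state was reachable.
        with open(path):
            return step(State.CLOSED, "open_ok")
    except FileNotFoundError:
        return step(State.CLOSED, "not_found")
    except PermissionError:
        return step(State.CLOSED, "permission_denied")
    except OSError:
        return step(State.CLOSED, "other_error")
```

The difference from a flow chart is that every state lists all of its outgoing events, so a missing failure mode shows up as a missing row in the table.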
we'll just keep "upgrading" your software until you do - and we won't stop then either
This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
CLR
Sounds more to me that he cannot let go of the old, failed ideas and face the reality that there is no silver bullet. There are plenty of aging CS professors around with that problem.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I assume such a laptop would be pretty fast.
In a time of universal deceit, telling the truth is a revolutionary act. George Orwell
Yesterday at work an application team bitched because I wouldn't give them a VM with 128 CPU.
I know we've said this before, but I think we really have reached the point where the overwhelming majority of users can no longer tell, use, or appreciate an increase in processing speed. It wasn't that long ago that it was necessary to have a cutting edge CPU to do a lot of important end-user tasks. Now I do the majority of my work - which is vastly more computationally intensive than work I did not long ago - on my laptop. This isn't a cutting-edge gaming laptop or workstation replacement laptop either, it is a ThinkPad that I bought for ~$1,000 a few years ago.
Can we make processors even faster yet? Sure. Can we make code even faster too? Certainly. Will most users notice it? Almost certainly not.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
The oxymoron here is that David uses hardware performance to substantiate his claim.
The computer revolution went askew taking the hardware track leaving software to rot in 1960's state of the art computer science.
The next revolution is soft not hard.
Can't read the link so I assume it's about parallelism.
I think we welcome languages that encourage users to divide a problem into many smaller ones. But do we really need them?
What I mean to say is that the value of software lies in the APIs and libs you develop. Having it perform well in a parallel environment takes a bit of clever thinking but most of us will hack it.
There are quite a few programming models and frameworks that already allow astonishing things to happen in parallel. What is Patterson hinting at?
Moore's original law on the number of transistors in a dense integrated circuit is over. But there are sufficient derived/similar observations for parallelism. So, well designed parallel programs will still run faster or cheaper in a similar way as Moore observed more than 50 years ago.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
In his blog https://caseymuratori.com/blog... Casey Muratori advocates the move away from drivers to instruction set architectures (ISAs). Back in the day individual software could boot the entire computer in relatively few lines of code and still do its job while fitting on a single sided, single density floppy disk. Even today, you don't see game vendors making bootable Linux versions of their games that could theoretically work on both Mac and Windows, but I get his point.
You just need to be smart enough to learn how to solve your problems with more cores!
My, these assertions are easy to make.
Socialism: a lie told by totalitarians and believed by fools.
Maybe we can eliminate shared libraries, dynamic linking and other archaic constructs that came into existence to protect scarce resources like RAM and disk space. Let's put each process in its own 'jail' like existence with closely monitored mechanisms for communication between processes.
Suppose you were an idiot. And suppose you were a member of congress. But then I repeat myself. -- Mark Twain
That will in no way prevent it from being the hot new thing in Java in a couple of years. Probably with all new libraries.
Socialism: a lie told by totalitarians and believed by fools.
Transistors are doubling every 24 months or so, on par with Moore's original enunciation of the law, and slightly off the 18 months of his revision of said law.
What is not working anymore is _*"People's Interpretation"*_ of said law, which dictates that computers should be 2x faster every 18 months. Moore never said that. He only said that in a given square centimeter of silicon, the optimum number of transistors would double every 24 months. Then he later revised the number to every 18 months.
When Moore's law was in full swing, say, in the 80's and 90's, making transistors smaller every 18 months allowed computer and processor architects to make the transistors switch faster, consume less power, pack more features, and make the processors cheaper, all of that at the same time. So, we had a MHz race, on top of which we integrated things in the processor, first the paging unit, the caches (L1 and later L2) and math coprocessor, then the memory controller, and more pipelines (starting with the UV pipes) and we developed new features/functionality, like protected mode, IA32, MMX, SSE, AMD64, et al, all of this at the same time. Please notice that all of that was made possible by Moore's law, because, to implement most of those things, one needs, you know, more transistors...
At some point in the mid 00's, we hit the first roadblock, and it was not possible to make the transistors switch faster without consuming a significant amount of power, so, one dimension less.
But Moore's law did not stop. We still were able to double the number of transistors every 18 months. That's why we were still able to pack more features (SSE*, AVX*, transactional memory, virtualization support, advanced encryption support) and more cheaper things (higher core counts, a GPU, PCIe controllers, ETH controllers and more) in the processor for the same price. But not with increased GHz. And because the SW world has been slow to make use of many of those new features/things, it seems that performance has stagnated. If Moore's law had stopped, we wouldn't have been able to increase core counts (and cache, and all the other stuff) as we did, because all those things require, you know, transistors to implement...
Now, in 2018, finally, the REAL Moore's law is struggling and limping (ask Intel 10nm and GloFo 7nm people). It has a few cycles left (ask the TSMC and Samsung 7nm people), but not many more.
In the meantime, yes, new architectures are worthwhile, fine and dandy (and more publishing and headline worthy), but a more immediate thing would be to make better use of what we already have (SSE, AVX, GPGPU, multiple cores)...
JM2C, YMMV
*** Good luck to everyone and have a nice day!
Unfortunately, software developers keep introducing endless layers of abstraction so that shit still isn't running nearly as fast as it should be.
I think it's time for teleporters, holodecks, and replicators. Is everyone with me??
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
The 500 Plus in 1995 was nothing to brag about.... it was still using a 68000 processor when other Motorola machines like Macs were on 68060s (at ten times the clock rate). Now if you had said "1985" then it would be impressive (the Amiga's initial release date).
I ran a modern web browser on a PowerMac G3 (1999).... slow as snails. Ditto on a Pentium 4 at 3000 megahertz. SD video is okay, but HD video runs like molasses.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
As I posted above Moore's Law says "transistors will double every 2 years". That held true until 2015, when it slowed to 2.5 years (not a huge difference).
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Of course it's text, what else would we use. Written languages are like 5,000 years old and have always been more expressive than pointing and gesturing.
Sure there are a few visual programming suites, but they are best suited for stream processing, and not for heavily branched general purpose code.
And a brain interface is a much harder problem, and trusting AI to write code could have devastating results.
More cores are worthless for many implementations of most tasks, which is not the same as being worthless for the task itself.
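One standard way to put numbers on this is Amdahl's law (my framing; the parent didn't name it): if a fraction p of a given implementation can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n). A quick sketch, with a function name chosen for illustration:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Same task, different implementations: only the parallel fraction changes.
    for p in (0.50, 0.95, 0.99):
        print(f"p={p:.2f}: 64 cores give {amdahl_speedup(p, 64):.1f}x")
```

An implementation that is half serial never gets past 2x no matter how many cores you throw at it, which is the "worthless" case. Rework the same task so that 95 or 99 percent of it is parallel and the extra cores start to pay off.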
Java NIOng!
In the eighties we needed only 640kb of RAM, in the nineties a 1 Gb hard drive was more than enough space, in the new millennium HD was all the resolution we needed, and now we don't need faster chips?
My whole life the main factor leading to people accepting sucky software is that the hardware is always getting better, and by the time they ship it it runs "fast enough" on the new hardware.
I've been anticipating this for decades; eventually computing power is "good enough" that people start actually trying to write good software. In my view the hardware is at that point, and time is ripe for software changes. And this will lead to tool improvements, to be sure. Architectures are likely to change, because a big part of what goes into existing systems is exactly this desire for the architecture to be relevant for a predetermined amount of time! If the hardware isn't improving and you can't just sell people the new version every few years, that radically changes the design considerations for the whole system. So far, that hasn't happened at the consumer level, and I don't know which changes will be successful, but there are likely to be radical changes in system architecture as companies start to design systems for much longer life.
I will make one specific prognostication: As new hardware architectures are introduced, more of the software will be pushed back onto ROMs, and OS kernels will act more like microcontroller libraries where you use system functions that on more expensive hardware are implemented in ROM, and on cheaper hardware the same code gets copied to RAM (as is the case for most of the system now).
Right now, and historically, audio/video subsystems sometimes have "hardware acceleration," certain things like floating point hardware are not always available, some optical scanners have to have their firmware copied from the host driver, some laptops require custom drivers because of custom hardware support/acceleration of various subsystems, and there is no general mechanism to manage any of this. Each subsystem might have a locally-standardized interface, but there is nothing general for those classes of situations. Or at least, to the extent that there is, it is only within the C compilers that it exists. I predict much greater convergence in how these different subsystems are defined and how they interact. Return of the Thick Client! Except the cheap version will be a compatible thin client. And each subsystem will be able to be implemented as hardware, local software, or cloud services.
It depends; many times it is much cheaper for a company to buy more RAM and faster CPUs than to pay someone to optimize it. People want a cheap quick fix. Not a long wait and a huge price tag in the end. Hardware is cheap and software dev is expensive. But if we run out of Moore's law the scale will tip.
so.. slightly delayed then?
Right. Nowadays the CISC target is an abstraction that gives the decode unit in the chip one last chance at optimization before sending data to the computational assemblies. 80% of the out-of-order speed advantage comes from better static schedules on chip. And most of the circuitry on a modern chip is dedicated to figuring out what to do next and getting ready to do it, vs actual silicon dedicated to doing it. Something like a VLIW can get a lot more performance/watt by letting the compiler do the scheduling and set-up, at the penalty of being poor at branch-heavy code.
Making mortgage payments and profits wasn't something Linus was worried about, and things turned out pretty well for him until recently. 20+ years is a good run. If it's as amazing an idea as you say, I don't really see any obstacles other than the ones you place in your own way.
Why miss what you can still have? Starting at under $33!
https://www.mouser.com/Product...
But I'm not sure you'd have any advantage over a $5 ARM SoC.
You couldn't run a modern web browser on that, but that's because modern web browsers and web apps are just about the most bloated sacks of shit imaginable. 10 pounds of shit in a 5 pound sack with a tear in it.
But there's nothing inherently expensive in what a web browser actually accomplishes, other than playing compressed video. Flowing text to fit the screen in real time was impractical for 8-bit processors, but by the 16-bit days it was fine. Javascript is horrifying, but there's nothing impossibly slow about parsing a page description / DOM. A good binary format designed for rapid parsing, gifs instead of jpgs, images sized for "back in the day", and a scripting language designed for performance.
Socialism: a lie told by totalitarians and believed by fools.
That's the whole point of the story; with improved software architecture you might be using an algorithm that scales horizontally, in which case there would be no need at all for a faster CPU, just more CPUs. Or even, more subsystems; maybe doing too much of the work in a CPU is the problem? In a lot of tools that I use, the hard parts are done in FPGAs and embedded microcontrollers and the CPU is just managing the user interface and system buses.
And yet, it could run an old web browser that could easily display normal pages of data. It seems the only thing that it would actually choke on would be all the advertising and user tracking scripts. A few changes in CSS and things, but those have low overhead.
Study the AS/400 architecture. It does things very differently.
Only the State obtains its revenue by coercion. - Murray Rothbard
Gordon Moore's paper seems pretty clear on component count. https://drive.google.com/file/...
Only the State obtains its revenue by coercion. - Murray Rothbard
When you begin an ignorant missive with a set of requirements that are in direct opposition to your stated end goal, how do you expect someone to respond?
Seriously, your request is like someone asking for better ASCII porn in 2018.
If you want to be taken seriously in computer science, start by not sucking up to the worst language since BrainFuck.
And if your system has more than 1 user (at the same time)?
Socialism: a lie told by totalitarians and believed by fools.
Show me an architecture that would make me want it. With C++ as a platform language
"Show me a new vehicle to replace the carriage. It has to include a horse."
Ezekiel 23:20
... and more importantly, because humans can absorb information about a program faster from text than they can absorb information via other means.
Among other reasons, it's why we still have books in 2018.
Trouble was, RISC solved a problem that was temporary - lack of space on the silicon.
There is the core of your misunderstanding. The idea of RISC was not to use fewer transistors, but to have a simpler, more orthogonal instruction set with homogeneous stages all running in about the same time to enable high clock speeds and pipelining. And yes, there are excellent RISC designs out there - ARM is one, and so is RISC-V. "CISC" nowadays is 99% RISC - they copied the large register sets with x86_64, and they have broken down the CISC instructions into microops that are executed RISC style since roughly forever.
Stephan
What twisted your knickers so badly?
> "Show me a new vehicle to replace the carriage. It has to include a horse."
https://www.crossroadstrailers.com/blog/wp-content/uploads/2016/08/iStock_10259053_LARGE.jpg
That's the whole point of the story; with improved software architecture you might be using an algorithm that scales horizontally, in which case there would be no need at all for a faster CPU, just more CPUs.
That isn't a new computer architecture, it's the same kind of distributed computing that I learned ~20 years ago, and it wasn't all that new back then, either. Maybe some new programming language could make it easier to implement some distributed algorithms, but current languages have the capabilities already.
I've been saying that for 8 years. Of course whenever I say it here on Slashdot I get assailed by people who can't accept the truth.
Only because you a) don't understand Moore's law, or b) can't count to 7000000. Moore's law is alive and well and only slightly behind the trend line in the past two years. Even Intel is doubling the transistor count and changing gate size every 2.5 years now instead of every 2 years as Moore's law predicted.
8 years ago? Don't tell me you, a person who's nerdy enough to read news for nerds, don't understand that the only thing Moore's law covers is counting transistors. I mean it's not like you confused it with single core performance, right? RIGHT?
Some pedant always crawls out of the woodwork and starts talking about "transistor density".
You mean pedants like Gordon Moore, who came up with the law that carries his name? But I mean there's no need to take anyone's word for it. errr. Except this guy's: http://www.lithoguru.com/scien... he's kind of a guru when it comes to his own law.
Of course, but have we hit that inflection point? By all accounts we're only slightly behind the times in transistor count in ICs, with them doubling every 3 years now instead of every 2. Still very much a large exponential gain.
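To put rough numbers on "slightly behind", here is a sketch comparing transistor-count growth for the doubling periods thrown around in this thread (2, 2.5, and 3 years). The ten-year window and the function name are my own choices for illustration, not anything from Moore or the article.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """How many times larger the count is after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for period in (2.0, 2.5, 3.0):
        factor = growth_factor(10, period)
        print(f"doubling every {period} years: about {factor:.0f}x in a decade")
```

Over a decade that is roughly 32x, 16x, and 10x respectively, so a slower cadence still compounds, but the gap against the 2-year trend line widens every year.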
Macs never used '060s. The last straight-up Moto chip used in a Mac was the '040, then it switched to PowerPC (which was still partly Moto, but you know what I mean).
Reminds me of a computer class in high school around the time of that switch, and the introduction of the Pentium, when our teacher was saying that Macs used 68020, '030, '040, '050... while PCs used 286, 386, 486, 586... and I had to interrupt and correct him on two counts.
-Forrest Cameranesi, Geek of all Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
Why is it that there are always those people on here that think we don't need anything new or faster?
Have you never run any development system and think, OMG, why is this taking so long?
We need way faster CPUs and computers. In fact, we need quantum computers.
I would love to compile 1.3 million LOC for 10 different platforms in 3 seconds.
The faster we go, the bigger the systems we can build. Try running a neural network the size of your brain!
A NN with 10^11 (one hundred billion) neurons and 10^15 synaptic connections on your computer today... You can't...
We will need to make computers with 10 to 100 times the power.
Yes. Some tasks have got a lot faster with the standard box of chips (primarily tasks that take advantage of GPUs), and a lot of tasks are faster on a single processor, as long as they take advantage of multithreading.
Honestly I'm a little surprised that clock speed increase seems to have slowed so much but an 8th generation i7 will knock compile times down a lot from a first gen.
More cores are worthless for most tasks.
That's the whole point of the article. Getting more work out of your cores.
http://michaelsmith.id.au
We need hardware native Flash and Actionscript 3.0 as the base of all computing!
Well, I have to admit you are right. My beef with RISC isn't so much any problem with an internal microarchitecture, it's the lack of handy ASM instructions I'm used to from other platforms (most notably the VAX and the 68k). When you spend time coding on the 68k or VAX then switch to a RISC platform, you feel the suck. However, I do question your assertion that RISC was just about having a more machine-clean ISA. There *was* a lack of space on silicon in the 1990's and some of the motivation to "go RISC" did stem from that fact, too.
It is that computers are becoming more tied to The Cloud, where to do anything the computer has to be online. And then there's constant upgrades to upgrade in order to meet the next upgrade (and pay more money). For most of my stuff I don't need a faster computer, just something to do stuff without having to deal with downloading crap I don't need.
mfwright@batnet.com
I like how you took this car analogy, I posted this little thread to a friend's FB page. She really likes horses and also collects horse analogies like computer people collect car analogies.
mfwright@batnet.com
reddit
Plenty of porn then.
http://michaelsmith.id.au
"people would throw out their desktop computers that were working fine because a friend's new computer was so much faster." Just let that sink in, then think about it in the context of all the e-waste pollution we've created. Perhaps this is now a good thing that people can hold onto their computers for 4 - 8 years until the components actually wear out or become unsatisfactory.
4K with full game support. Not at 50 fps and lower game settings. Want some new ray tracing support in that too? That's a new CPU and GPU.
Domestic spying is now "Benign Information Gathering"
Patterson's argument is blatantly intellectually dishonest... he talks about single program performance as if single programs are never parallel these days. He mischaracterizes Moore's law as being about single program performance (his creative definition). It is not, it is about transistor density, which continues to increase roughly according to Moore's law, and with no end in sight. Sure, process node shrink is slowing down, but parallelism is increasing rapidly, roughly balancing that. And 3D stacking is already a thing, it will accelerate. Then there are non-silicon technologies like carbon nanostructures. I don't see Moore's law halting for at least the next 20 years.
I'm sure he has a point, but faulty rhetoric detracts from it.
When all you have is a hammer, every problem starts to look like a thumb.
How much of this slowdown is marketing driven? There's no reason to release a chip that's 50% faster if people are buying plenty of the older chip. You want to spread that out over time.
Well, last week I left a job running overnight, and in the morning it still wasn't finished. It did eventually finish, so you can argue that I don't *need* a faster computer, but I'd sure like one.
I think we've pushed this "anyone can grow up to be president" thing too far.
However, that modern web browser does a lot more work because it's doing a lot more unimportant stuff as well. Browsers now do a ton of stuff that they should not be doing: JavaScript and the like doing so much work that I could feel my computer slowing down until I killed the browser. Do browsers really need to be doing all this extra stuff? The Amiga was perfectly fine for an early web browser.
Consider word processing. The modern Word does not feel faster than the word processing I did on the Amiga. In fact often Word seems slow despite all the massive computing power thrown at such a simple problem. The Amiga felt crisp and snappy, while Office apps today are sluggish. They're doing exactly the same job, except that the modern Word has all these extra features added that aren't necessarily useful. As computers get faster and faster over time, applications lagged behind and became slower.
That's part of the problem here - with all this computing power why don't we see anything good come from it in our applications? We only buy faster computers just to stay ahead of the application slowdown. Just look at word processors, every year they're slower and there's nothing new in them since the Amiga days except for spelling/grammar checking and rendering. The rendering is fast today and not a cause of slowdowns, instead it is the application portion of Word that is the bottleneck.
1995 was actually after they stopped building Amigas. The 500 Plus was in 1991, and more of a last gasp, while the original 500 was in 1987.
The re-order is fairly static/deterministic on x86 at the instruction stream level. Yes, there's some dynamism to make up for cache misses and mispredicts, but that's about 20% of the performance gain. However, the re-order logic takes up a lot of die space and power in the critical pathway. Additionally, x86 can't be a VLIW machine; it has a really hard time dispatching 4 ops/cycle, whereas static in-order machines can dispatch many more ops. The 32nm Itaniums could do 6-12 ops/cycle. The Mill is designed to sustain 33 ops/cycle at the high end of the family. Whether a compiler can actually find that in the code is a different question. Itanium was a decent architecture even if a commercial flop between delays, compiler problems, and x86_64. I think the Mill may get its foot in the door not by competing head to head with Xeons in conventional servers, but by being lower-power HPC, and by being well-matched for the micro-kernel space. Strong hardware memory and process isolation combined with 3-cycle function calls (including interrupts, and calls across permission boundaries) could effectively solve the IPC penalty on mainstream processors.
I recently upgraded to an i7, from a CPU that was first released about 10 years prior. According to online benchmarks, my new CPU is only 5x faster at single threaded tasks. There are way more transistors in my new CPU that do more work in parallel, even within a single core. Instruction decoding, instruction decode caching, scheduling, speculating, .... That theoretical 5x speed up has more to do with throwing more transistors at the problem, not because the transistors themselves are much faster.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
You seem to know nothing about algorithm performance analysis. Please stop claiming nonsense. You are just disgracing yourself.
The fact of the matter is that for many important classes of algorithms, using more cores slows things down and that is a hard, theoretical fact, so there is no way around it. For many others, you get speed-ups per additional core that are so bad that the communication overhead kills all advantages. This has been known for something like 50 years or longer.
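One common way to put numbers on this is Amdahl's law, which the comment above does not name. The sketch below assumes a fixed serial fraction and adds a made-up linear coordination cost per extra core (`overhead_per_core` is purely hypothetical) to show how a speedup can flatten out and then turn into a slowdown.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law, with no communication cost."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def speedup_with_overhead(parallel_fraction: float, cores: int,
                          overhead_per_core: float) -> float:
    """Same model plus a hypothetical linear coordination cost per extra core."""
    serial = 1.0 - parallel_fraction
    # Run time is normalised so that one core with no overhead takes 1.0.
    time = serial + parallel_fraction / cores + overhead_per_core * (cores - 1)
    return 1.0 / time


if __name__ == "__main__":
    # Assumed workload: 90% parallelisable, 1% of run time lost per extra core.
    for n in (1, 2, 4, 16, 64, 1024):
        print(n,
              round(amdahl_speedup(0.90, n), 2),
              round(speedup_with_overhead(0.90, n, 0.01), 2))
```

With these assumed figures the ideal curve never reaches 10x, and the curve with overhead peaks at a modest core count before dropping below single-core speed. Real workloads will have different constants; only the shape is the point.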
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Then they each get less performance than if they had their own computer due to various bottle-necks. That is not scaling.
Yes, so? That is what you say and it has nothing to do with what I just said. For most tasks, it does not matter how good the implementation is, more cores do not help.
If only I could get every slashdotter to take an hour out from flaming and look over the mill architecture diagram: http://millcomputing.com/wiki/... Or burn an hour grokking some part of it they might want to understand (ivan is a trip to watch) https://millcomputing.com/docs... It would be a better world. The Mill folk think way out of the box.
Written language is a mature technology that is exceptionally well optimized for interfacing to humans.
Having an AI write code would require strong AI and that is not even on the distant horizon, despite the current demented hype where things that have been known for half a century or longer are suddenly called "AI".
You don't ... do backend software, do you.
Socialism: a lie told by totalitarians and believed by fools.
Moreover, compared to on-chip or at least on-board multiprocessing, the network is really, really slow. Seriously long-ass latency. So cloud scaling has more limits on the number of algorithms for which it is actually useful than even very simple highly parallel architectures.
Someone had to do it.
But this is the thing. I work on an asynchronous back end application written in javascript (stupid design decision, I wasn't around at the time). It uses all of one core and nothing from other cores so there is little I can do to give it more resources to work on. If the JS virtual machine could cope with working across multiple cores, then all would be good.
That's what this article is about, and it's a good question.
Just give me more cores, so I can compile faster.
We already have multi-architecture computing. Ever since MMX, SSE, 3DNow!, AVX and all that, starting back in the late 90s, SIMD architecture has been on the map. And lots of computing is nowadays offloaded to GPUs. Most real CPU cycles are wasted waiting for stuff from a disk while updating some visual doodah on a screen. We probably need a doodah coprocessor that executes CSS animations directly in hardware.
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
This attitude of expecting absolute, immediate solutions for complex problems is one of the reasons why lots of things go wrong, not just with software/hardware but in general. Currently, there are a myriad of available alternatives to account for each single step of the process which are either far from optimal or being systematically misused.
Anyone interested in doing things properly at every level (efficiency, security, scalability, etc.) has tons of available resources to do it under quite favourable conditions. But no matter how many resources you have at your disposal, complex things will always require a relevant effort, and doing things properly will always be more difficult than taking the easier path. If you focus on immediate goals and let concerns that are irrelevant from the technical perspective affect important decisions, you will get a bad product and lots of problems. And it will be your fault, your incompetence, your lack of understanding, your wrong decisions.
I am all for improving and further easing increasingly complex tasks, but none of this should ever be seen as a magical solution. There is no magical solution anywhere. The ones knowing and working harder and caring about doing things properly (and being allowed to make relevant decisions on those fronts! This apparently evident clarification doesn't seem too clear for quite a few people in the IT world) will get excellent results; everyone else will be, in the best scenario, conditioned by the circumstances, by pure chance.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Yes, but would you still get 60+ fps in 4K (numbers made up)? Here is the thing: compared to CPU and GPU, storage is relatively cheap, and with SSDs it is fast enough to not be a major PITA, so the game devs trade storage for performance. Is this right or wrong? Not sure, what do you think? I agree that the initial download of 80GB is a rather boring process (especially if you are stuck on a pipe that is narrower than it should be).
...the only part of Moore's law is counting transistors.
Moore's law was an observation stated in the 60s... what was implicitly meant is Dennard scaling, which was not explicitly defined until the 70s. Doubling _was_ a side effect of halving the transistor size and being able to fit double the transistors on the same die, but you get two things from this:
1. More transistor logic possibilities.
2. Faster clock.
Number 2 is obviously what everyone used to fixate on because it's an easy to quantify, single-dimensional number that you can slap on the front of boxes and don't have to explain to anyone.
I mean it's not like you confused it with single core performance right? RIGHT?
It pretty much used to mean that, and largely still does when interpreted as intended. We are reduced to improving single core speed through moving logic around only. The only architecture-agnostic things to quantify here are tricks of computational reduction and speculation, and not everything _can_ be computationally reduced: many operations are so fundamental that, out of a larger context, they are always going to be limited by propagation delay, which has stayed still since 2006.
Actually, that is something that was solved a few decades ago. There is no need for anything new here; the only need would be that the people that wrote this JS engine knew the state of the art. That is, if there was no good reason to make this engine single-threaded. Making it multi-threaded comes with a host of well-known problems that may or may not make it worth it. In particular, single-threaded applications get slowed down and garbage collection becomes much more difficult to do.
No, by requiring C++ there is no new software. Not reading linked articles is standard here, not reading the blurb is normal too, however not even reading the title?
I don't think you properly read the post to which you are replying. Their first sentence wasn't meant to be read literally, as their second sentence makes clear. They were simply pointing out that it's easy to make blanket assertions that aren't really true. The particular assertion-that's-not-really-true is the following:
> More cores are worthless for most tasks.
This statement would be _definitely_ true if you'd said, "more cores are worthless for _many_ tasks". Now it's far too easy to assume that everyone on the Internet is a total idiot. But the truth is that many people aren't (well we all are, some of the time). It's really best to give people the benefit of the doubt.
In particular, if someone claims that a million cores is a lot of performance, it doesn't necessarily mean they don't know what can and cannot be done with those cores. Especially if they describe spinning up lots of cores on AWS as a stunt.
Google, Amazon, Netflix and the people who build and use super-computing clusters all use lots of cores. And they know what they're doing.
In another post, you also said this:
> compared to on-chip or at least on-board multiprocessing, the network is really, really slow
This is, of course, true. But it's not the whole story. The "network" and "on-board" aren't quite as distinct concepts as they once were. Especially in a datacentre or super-computing cluster. In practice, network latencies are often surprisingly small. (And, conversely, the latency within a CPU or board is often surprisingly high - they're full of networks these days) The plain fact is that super-computing clusters (and the datacentres they resemble) get more work done than if they had a single CPU.
We'd all like faster cores. But the people scaling horizontally are not idiots.
Your comment is right on the money. I recall the time when the rapid succession of ever bigger and more complicated operating systems demanded that we all upgrade hardware, because hardware that did well under Windows 3.1 would not do well running Windows 95 or 98. The hardware that ran Windows 95 or 98 well would not run Windows 2000 or XP. The XP hardware was not sufficiently fast for Windows Vista or 7.
The reason for that was that the older OSes like 95 were junk. They lacked true multi-user multitasking features, they sucked at networking and security. But the subsequent OSes fixed that.
After Windows 7, the Microsoft operating systems (as well as others) have reached a certain degree of maturity. These were now OSes with true multitasking, multi-user, networking and security features. As a result, the trend after Windows Vista was that the subsequent versions would actually run better and better on the same hardware. A computer that comfortably ran Windows Vista can still run Windows 10 today. I have tested this hypothesis on a couple of Core 2 desktops from something like 2008-9.
What Moore's Law really means in this context is that processors aren't getting faster.
From https://en.wikipedia.org/wiki/Moore's_law: "Moore's law is the observation that the number of transistors in a dense integrated circuit doubles about every two years." Moore's law is *not* about CPU speed. From the same link: "In general, it is not logically sound to extrapolate from the historical growth rate into the indefinite future."
If you know anything about creating circuitry from silicon and its compounds, the statement on extrapolation won't be shocking news.
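The growth figures thrown around in this thread ("doubles about every two years", the summary's 3 percent per year, an earlier comment's 5x in 10 years) are easier to compare once they are all converted to the same units. A small sketch, assuming steady compound growth; the function names are made up for illustration:

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years needed to double at a steady compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate implied by growing `factor`-fold in `years`."""
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    # Doubling every two years, expressed as a yearly rate.
    print(round(annual_growth(2, 2) * 100, 1), "% per year")
    # The 3 percent per year figure, expressed as a doubling time.
    print(round(doubling_time(0.03), 1), "years to double")
    # The "5x faster after 10 years" single-thread figure from an earlier comment.
    print(round(doubling_time(annual_growth(5, 10)), 1), "years to double")
```

Under this compounding assumption, 3 percent per year works out to a doubling time somewhat longer than the round 20 years quoted in the summary.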
Circle the wagons and fire inward. Entropy increases without bounds.
> The fact of the matter is that for many important classes of algorithms, using more cores slows things down and that is a hard, theoretical fact
While you are technically correct, there are trade-offs that can be made. Many programmers limit themselves to preserving ordering and never losing data. There are cases where a lossy or non-order-preserving alternative algorithm can have near perfect linear scaling, but you need to make sure your system is designed for, and can fundamentally accept, this trade-off.
A simple example that I've done. I wrote a high concurrency library that had an estimated time to completion. The way it did that was to update the average rate of work units processed and the remaining. When I did proper locking, the fine-grained work units were causing high contention, resulting in a maximum of 80-90% utilization on a quad core HT CPU. I removed the locks and just optimistically checked a variable padded to its own cache line. If the data structure was in use, just don't update and discard the timing results of the current work unit, and if it wasn't in use, then flag it in use and update. If a race condition occurred and two updates happened at the same time, it didn't matter because the values were designed to not be accumulative and would approach correct.
In the end, there was an ever so slight increase of variability in the presented ETC, but it was still capable of predicting what time it would finish within 1% of error within the first few minutes of a multi-hour run, and now it was able to scale to 99% CPU on a 32core HT server. Not that the increased variability mattered much because there was already a natural amount of minor variability due to available computational resources. In short, the increased variability was measurably different, but within the std-dev of existing uncontrollable fluctuations.
Moral of this story: in concurrency, perfect can be the enemy of good.
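A rough sketch of the "drop the sample if busy" idea described above, not the poster's actual code. Python has no cache-line padding or bare atomic flags, so a non-blocking `Lock.acquire(blocking=False)` stands in for the "in use" flag here; the class name, the smoothing constant and the method signatures are all assumptions made for illustration.

```python
import threading


class LossyEtcEstimator:
    """Estimated time to completion where contended updates are discarded."""

    def __init__(self, total_units: int, smoothing: float = 0.1):
        self.remaining = total_units
        self.avg_seconds_per_unit = None   # running average, not a running sum
        self.smoothing = smoothing         # hypothetical smoothing factor
        self.dropped = 0                   # diagnostic only, may itself be racy
        self._busy = threading.Lock()      # stand-in for the "in use" flag

    def record(self, seconds_per_unit: float, remaining_now: int) -> bool:
        """Try to publish one timing sample; never makes the worker wait."""
        if not self._busy.acquire(blocking=False):
            self.dropped += 1
            return False                   # someone else is updating: discard
        try:
            if self.avg_seconds_per_unit is None:
                self.avg_seconds_per_unit = seconds_per_unit
            else:
                a = self.smoothing
                self.avg_seconds_per_unit = (
                    (1 - a) * self.avg_seconds_per_unit + a * seconds_per_unit)
            # Absolute snapshot rather than a decrement, so a dropped sample
            # is corrected by the next one that gets through.
            self.remaining = remaining_now
            return True
        finally:
            self._busy.release()

    def eta_seconds(self, workers: int = 1) -> float:
        """Remaining work times average cost, spread over the workers."""
        if self.avg_seconds_per_unit is None:
            return float("inf")
        return self.remaining * self.avg_seconds_per_unit / workers
```

The design choice that makes dropping safe is that every stored value is a snapshot or a moving average. If `remaining` were decremented per unit instead, each discarded update would leave a permanent error.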
Spin up several worker nodes and load balance?
The answer to the problem is to let people who can do concurrency well, have a say in the architecture and design of systems. The problem is this whole top-down management style. It needs to be bi-directional. Most of your talent is in the trenches and it's a recipe for disaster to tell your talent how to do their job when they see the flaws in the designs. No one person can both be technically fluent and have the communication skills required to be the perfect lead.
Programming is a huge industry. People saying "concurrency is hard" is akin to a typical person saying calculus is hard. Some people find it easy. But don't expect people who find calculus easy to find communication easy.
Coding is just a transcription process. There's a fundamental issue that in order to make a system that can code for you, that you need a way to unambiguously tell that system what you want it to do. Guess what coding is. Telling a system in an unambiguous way what you want it to do. What we really want is a way to further divorce programmers from the nitty-gritty details that consume most of our coding time.
Languages up to this point have done a decent job of just that by hiding ASM from the coder, but we've hit a brick wall. We keep creating new specialized languages, but each has a trade-off and most of the time they're specialized in order to deal with specific back-ends or front-ends, not so much as general purpose.
I personally think the first effective usage of AI to enhance coding is better refactoring tooling. It can be a time-consuming operation, especially in large systems. Imagine instead of refactoring bits at a time, you can do the entire system in one step and have it ready for the next release. Could be done by assigning attributes to methods and variables and specifying the guarantees required. You still have an architectural and design issue in that you have to specify the correct attributes, but even if you miss one, it could be as easy as adding the missing one or changing an incorrect one and letting the AI restructure the system.
Magically mapping naive algorithms to efficient algorithms would be a start. I'll be working on making it faster the hard way - probably parallelizing it, possibly making a more efficient algorithm.
FPGAs are handy when you are running locally - especially if you happen to be a hardware engineer. Not so much when you're expecting it to run on arbitrary machines.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
It's easy to have an idea about wanting a "better" way to develop software. Many people go farther, actually creating new languages.
Haskell anyone? It's a functional language that's supposed to be superior to classics like C++. It's used by about 0.4% of programmers, according to Stack Overflow. How about Groovy, Scala, Kotlin?
What's hard is to get people on board, excited about your new programming idea.
I recently wrote a program to compare DNA results, using C#. Basically, it has to compare two CSV files containing about a million rows of data, and spit out a list of matching rows.
On my first attempt, it took 37 seconds to match two files.
When I looked into what was taking the most time, it turned out to be my use of RegEx for parsing. Switching to good old Split() and x.Parse() functions brought the time down to 6 seconds.
Next, I found that File.ReadAllLines() was taking about 1.5 seconds per file. Switching to File.ReadAllBytes() took only 20 ms, bringing down the time per pair of files to about 2 seconds.
Next, I switched to parsing the fields character by character, using mainly switch() statements. In the end, I got the total time to match two files down to 160 ms.
No change of language was required, just tuning effort.
I suspect that this is true with much of our bloated software. It's slow, not because of the language, but because nobody does any real work to improve the performance of software.
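The first of those tuning steps (regex parsing replaced by a plain split) can be sketched as follows. The original was C#; this is a Python stand-in, and the four-column row layout is invented because the comment does not give one. The two parsers agree on well-formed rows but are not guaranteed to agree on malformed ones (for example, rows with empty fields).

```python
import re
import timeit

# Hypothetical row layout: id,chromosome,position,genotype
ROW_RE = re.compile(r"^([^,]+),([^,]+),(\d+),([^,]+)$")


def parse_regex(line: str):
    """Parse one row with a regular expression; None if it does not match."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3)), m.group(4)


def parse_split(line: str):
    """Parse one row with str.split; None if the shape is wrong."""
    parts = line.split(",")
    if len(parts) != 4 or not parts[2].isdecimal():
        return None
    return parts[0], parts[1], int(parts[2]), parts[3]


def matching_rows(lines_a, lines_b, parse=parse_split):
    """Parsed rows present in both inputs, in the order they appear in lines_a."""
    seen = {row for row in map(parse, lines_b) if row is not None}
    return [row for row in map(parse, lines_a) if row in seen]


if __name__ == "__main__":
    # Synthetic data; timings depend on the machine, so none are quoted here.
    rows = [f"rs{i},{i % 22 + 1},{i * 10},AG" for i in range(20_000)]
    for parse in (parse_regex, parse_split):
        seconds = timeit.timeit(lambda: matching_rows(rows, rows, parse), number=5)
        print(parse.__name__, round(seconds, 3))
```

Whether the split version wins, and by how much, is something to measure rather than assume, which is the comment's point about profiling before blaming the language.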
Only works if the application is amenable to doing that. In this case its not, short of a complete rewrite. The rewrite is under way so yeah some support for parallel processing at the application level can work as a last resort.
Why would running on arbitrary machines even be interesting?
Yeah, if that is what you want, by definition you're only working within the lowest common denominator and you don't need to worry about new architecture possibilities; that becomes relevant 20 years after most of the market switched.
I'd be much more interested in automatically mapping naive algorithms to efficient algorithms on a wide variety of new systems built according to one of the new architectures discussed! :)
You're saying, it isn't a new architecture because the engineering principles aren't new.
I would suggest you reconsider the difference between architecture and engineering.
It's a great time to be a battery engineer. Just create a new battery that's 10 X better than what's out there.
Woo hoo! Now what else should we do?
I am pretty sure David Patterson is out there doing it. He is a professor in the field who has accomplished plenty. He is 70 now and is likely past his academic prime, so now he is doing what he should be doing at this time in his career: teaching, mentoring, and inspiring the next generation.
How can you say, "Past your prime". I am 80 and doing hardware and software design. I am inspired and I inspire others.
For the desktop, what I foresee is migration to loosely coupled systems. Don't need 32 CPUs on a chip, instead have a half dozen sub-systems with multi-core CPUs. My next major desktop should have a system (with its own CPU) for file I/O, another for security, another for user interface and perhaps more specialized subsystems. Each system would be independent of the others and they would communicate with each other via some fibre optic interface.
Leslie Satenstein Montreal Quebec Canada
> Why would running on arbitrary machines even be interesting?
Because I don't expect every machine to have an FPGA rig running next to it.
It was out of step with the direction of OS development at the time. This is one of the reasons it didn't catch on.
Many of the people who worked on it are now retired or dead.
As a CPU design with a security focus and with specific security guarantees built in, it was indeed ahead of its time. We surely are finding we need such ideas in CPUs today.
Why in the 21st century are we still TYPING code?!
Because we are too busy complaining that we don't have jet packs or robot butlers, and we still pay for electricity.
And fail. Coding is far more than a "transcription process". If it were, the input to that process would need to have the same level of detail as the output. Guess what, in that situation it is less effort to code directly than create that input.
What actually happens is that coding transforms a somewhat to very abstract description into a concrete one in a programming language, respecting aspects of performance, reliability, security and others. It is a process that requires insight and some degree of creativity. Approaches to "program" on the level of a formal specification language have failed some 40 years ago or so, because doing this is more effort than coding directly and the translation result is not very good. The same still applies today.
And no, "AI" cannot do refactoring, because that requires coding skills and no machine in existence has those or will have those for the foreseeable future.
Is "aswell" a word?
Martley, Near Worcester UK.
Meh. Polystyrene balls and toothpicks FTW.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Does not compose, except at the highest level of the application. The curse of NodeJS
I like the BEAM, but elixir is a badly designed language. Who designs functional languages without algebraic sum types and pattern matching!?
And hardware instructions for CSS, since the machine is animating a doodah while waiting for the disk in any case.
If Erlang does stream fusion I am the pope. The concurrency is good though.
Patterson is a CPU architecture God.
It was called the Pentium instead because if you added 100 to 486 you got 585.999983605.
TY,IHAW.
NNIO
Which is why it is dying...
> It pretty much used to mean that, and largely still does when interpreted as intended.
Not even remotely. In fact Moore was quite explicit in what he defined as a component in his observation and prediction. "a component being a transistor, resistor, diode or capacitor, in an integrated structure"
Just because work is now split up into multiple cores doesn't make the single die any less of an integrated structure, and doesn't change the purpose of the law which was to count functions on an IC.
Moore's law doesn't even take into account the physical size of the IC. So even if transistors stayed the same, simply throwing double the cores at the problem and making the chip twice as large still is very much Moore's original observation which had everything to do with manufacturing cost when he defined his law.
You can make up whatever you want. It won't make it any less wrong when discussing what someone explicitly said in a paper.
Teach by example. Show me an architecture that would make me want it. With C++ as a platform language, not stupid shit like java, ruby etc.
If you want it linked with C++ (or C) as C++ is, then you have already failed.
On the C++ side, how is it that multiplying two INTs produces an INT?
On the architecture side, why doesn't each register include stateful flags?
The other half of the joke is that async I/O was the big new feature of a recent C# version, which means it will be the hot new thing in Java in another couple of years.
Java NIO (Non-blocking I/O) was introduced in Java 4 (2002).
Non-blocking I/O and asynchronous I/O are not the same.
http://www.programmr.com/blogs...
When your computer is fast enough to do everything that you need to get done in a reasonable amount of time, do you really need faster chips?
Yes, we need faster chips so we can produce slower software.
Also the lead designer of Itanium died before it was brought to market and the vision was compromised as a result.
Transistors are doubling every 24 months or so, on par with Moore's original enunciation of the law, and slightly off the 18 months of his revision of said law.
He did not even say that and even Wikipedia gets it wrong with his original quote right there.
The complexity for minimum component costs has increased at a rate of roughly a factor of two per year.
It is about economics. For a given cost, the number of transistors doubles no matter whether that is due to increased density, increased area, or increased packaging. And this happens even if transistor performance is decreased which has happened several times.
Sure, "many important classes". And the rest? You know, the stuff almost every business does?
Here's an example: let's say you want to render the next Pixar movie. At what point do more cores stop helping? For reference, you need to render 150,000 frames, each with maybe a half dozen independently rendered effects that are later composited. Could you find a way to use 1 million cores?
Here's an example: let's say you run a web service that gets more than 10,000 hits per second, but does nothing computationally intensive for each request, beyond the normal crypto. Do you care how fast your cores are, or how many cores you have?
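Both examples come down to counting independent units of work. A back-of-the-envelope sketch using the frame and request figures given above; the CPU time per request is a made-up number, and the model ignores compositing, I/O wait and scheduling overhead.

```python
def independent_tasks(frames: int, effects_per_frame: int) -> int:
    """Number of render jobs that share no state and can run on any core."""
    return frames * effects_per_frame


def waves_needed(tasks: int, cores: int) -> int:
    """Back-to-back batches required if each core runs one task at a time."""
    return -(-tasks // cores)  # ceiling division on integers


def cores_for_web_load(requests_per_second: int, cpu_micros_per_request: int) -> int:
    """Cores kept busy by a request stream, ignoring I/O wait and headroom."""
    return -(-(requests_per_second * cpu_micros_per_request) // 1_000_000)


if __name__ == "__main__":
    tasks = independent_tasks(150_000, 6)
    print("render tasks:", tasks)
    print("waves on 1,000,000 cores:", waves_needed(tasks, 1_000_000))
    print("waves on 30,000 cores:", waves_needed(tasks, 30_000))
    # 5000 microseconds of CPU per request is an assumed figure.
    print("cores for 10,000 req/s:", cores_for_web_load(10_000, 5_000))
```

Under these assumptions the render job has 900,000 independent tasks, so a million cores would be about 90 percent occupied for a single wave, and the web service cares about core count rather than per-core speed as long as each request fits comfortably on one core.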
Yeah, yeah, academic calculations are great and all, but most of the world needs to get shit done.
So, have one server for each user. Easy. Realistically, you can get away with 100 or 1000 users on each server, as most real-world tasks aren't computationally intensive, and a whole lot of users can wait idly for I/O to finish.
> How can you say, "Past your prime". I am 80 and doing hardware and software design. I am inspired and I inspire others.
Past your prime does not mean useless. Tom Brady is past his prime but still one of the best quarterbacks in the NFL. Humans peak physically earlier than we peak mentally, but both still happen. And it is long before we reach 80. Not all aspects of mental acuity peak; vocabulary for instance steadily climbs through age, while working memory peaks in our mid-30's.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
> The other half of the joke is that async I/O was the big new feature of a recent C# version, which means it will be the hot new thing in Java in another couple of years.
> Java NIO (Non-blocking I/O) was introduced in Java 4 (2002).
> Non-blocking I/O and asynchronous I/O are not the same.
> http://www.programmr.com/blogs...
Java NIO also introduces Non-blocking IO
> What actually happens is that coding transforms a somewhat to very abstract description into a concrete
That's called design. Coding is literally the act of writing code. Code is how humans currently communicate with computers and potentially other humans. "Designing" code is taking an idea and thinking of a way to express it in code. Design is something rarely done. Most people just get to work pumping out code.
> Moore's law doesn't even take into account the physical size of the IC. So even if transistors stayed the same, simply throwing double the cores at the problem and making the chip twice as large still is very much Moore's original observation which had everything to do with manufacturing cost when he defined his law.
You are completely ignoring the context it was observed in: an era where shrinking process size was the driving force in doubling transistor count, a phenomenon that improved a processor on several fronts. It increased transistor count and switching speed, reduced local propagation delay, increased power efficiency and, most importantly, had a reasonably sustainable future to _continue_ improving until it hit the fundamental limitations of the materials used.
Now look at the remaining ways transistor doubling happens without shrinking the process size. Increasing die area? That only increases transistor count. It does nothing for switching speed, propagation delay or power efficiency, it introduces logistic problems and thermal problems, and most importantly it has little future (how sustainably can you keep increasing die area before we go back to room sized computers, except that with our current transistor density those room sized computers would be consuming trillions more watts than their predecessors).
In short: your argument seems to be that Moore's observation is alive because transistor count is still increasing, in spite of it happening in an entirely different way, at a slower rate, with almost none of the benefits that the original phenomenon endowed on processors, and with no sustainable way of continuing to increase transistor count in the foreseeable future.
Moore's law is an informal observation, not a well defined law. If you interpret it as a pedant you will miss the point; look at it from the author's perspective.