Domain: millcomputing.com
Stories and comments across the archive that link to millcomputing.com.
Comments · 39
-
Re:Nvidia had RISC-V on their GPUs for years.
The controllers on their graphics cards are RISC-V. Now they're considering implementing their compute cores as well since the compilers are good enough.
Assuming this trajectory keeps up for the next couple of years, nothing short of a Mill Computing level breakthrough will stop RISC-V from replacing ARM and x86. There's just little to no value in paying for ISA IP when the fabs are doing all the real hard work anyhow.
I’m just curious how long we will remain in the dark ages because of Imaginary Property. Many are skeptical of the Mill, but suppose it pans out; how would that innovation benefit people in this lifetime? We’d now have a wonderful new proprietary architecture that no one will touch, because there isn’t a second source. So it will remain confined to niches until the patents run out and someone implements the ideas anew, which may only repeat the cycle with a minor variation. Without an open model, I fear the Mill will be doomed to obscurity.
The RISC-V ecosystem has demonstrated rapid progress with community efforts, and while the architecture is more attractive than ARM and x86, it is basically a nice yet open conventional ISA which suffers from the same fundamental drawbacks. Even before entering the nightmare of speculative execution exploits, the hardware security mechanisms have long been lacking. Current in-order RISC-V cores are extremely compact, making them attractive for embedded and many-core applications. However, OoO will reduce security even further, and the performance it offers comes at great expense in complexity, area, and power, sacrificing much of the benefit.
The Mill aims to deliver DSP efficiency and cost on general purpose workloads, and is invulnerable to those exploits and many others. One might argue that the Mill performance advantage is a luxury, but the greatly enhanced security characteristics of a Mill are not; they are basic functionality which is desperately needed by general purpose CPUs. While the Mill is a fascinating novel architecture, the most compelling aspect may be the security model, which will enable efficient microkernels and much greater isolation in applications.
-
I love the Mill
If only I could get every Slashdotter to take an hour out from flaming and look over the Mill architecture diagram: http://millcomputing.com/wiki/... Or burn an hour grokking some part of it they might want to understand (Ivan is a trip to watch) https://millcomputing.com/docs... It would be a better world. The Mill folk think way out of the box.
-
Or is it Mill Computing?
-
A corollary applies to monolithic applications
Web browsers rival operating systems in size and complexity, and are also hopelessly insecure. The main problem, shared with microkernels, is that the protection mechanisms available in common hardware don't allow efficient or convenient communication between protection domains, which are tied to address spaces. In order to cross the boundary, the address mappings must be flushed and reloaded, or at least manipulated, which are both very expensive operations. This makes any IPC very expensive, so the preferred means of communicating is by sharing memory, and for convenience and performance, nearly everything ends up in the same address space. Thus, the inevitable compromise of any part of these monolithic kernels and applications, is a compromise of the whole.
Without better hardware mechanisms for protection, ones that allow for efficient protection within the kernel and applications themselves, effective security will remain illusory. The furious and endless effort will continue in a futile attempt to hold the line against the flood of exploits. It is an intractable problem, unless we can shrink the protection domains to contain the effects of inevitable breaches. Capability-based addressing as with CHERI offers one approach, and the Mill architecture offers another (see the Memory, Security, and IPC talks specifically). Each represents a different set of trade-offs, which will limit applications. In any case, it is an area that needs work, so if there really are any nerds left on Slashdot, get to it, or at least help fund such efforts.
-
Re: Whatever happened to step changes?
Without javascript you might have missed the docs link under "Technology", but there is a wealth of information available in the videos there. (or even in this brief introductory forum post for the impatient.)
The Mill is not an Itanium; they had the opportunity to learn from Intel's failure, and offer compelling solutions to its deficiencies. The mechanisms available do support OoO performance with static scheduling, and the hardware abstraction is a near ideal compiler target, as surprising as that might be. As you suggest, it is a non-trivial problem, and solutions were not for sale, even given Intel's vast resources. A lot of clever thinking went into the Mill over a long period of time, and they produced something quite elegant, if very different.
Judging only by the Itanium, the skepticism is understandable. If one takes the time to understand the Mill though, their claims look very plausible if not overly conservative. Time will tell, but it is nevertheless fascinating.
-
Re:Whatever happened to step changes?
I'd like to see a multidisciplinary team take a look at software and hardware together and see if a bit of re-imagination can improve both. Security should be in there as well.
For instance, how about thinking about things like? [...]
I love the open source stuff. It just seems that improvements these days are incremental, and no one is really trying to take a step back, understand the big picture and perhaps take a few risks to try to get step change in say performance per watt.
I thought RISC was the way of the future when it first came out, yet Intel dominates with their fairly complex architecture. Why, and are the problems solvable? A quick review of Itanium seems to indicate that the magic compilers to make all the hype work never materialized to the extent expected. I do wonder if that will always be the case, since finding ways to reduce processor complexity is still very appealing...
You might be interested in the Mill architecture, which is a thoughtful re-imagining of general purpose architecture from scratch, addressing many of the problems plaguing existing technologies. It is not a small departure from conventional thinking, but a shift in paradigm with the potential to considerably simplify hardware and compilers. The takeaway is OoO performance at DSP area/cost and power efficiency, with an elegant hardware abstraction.
-
Re:Not many CPU designs are
Or something more modern, that addresses Itanium's (many) issues. The Mill is impervious to speculative exploits, and also others owing to hardware security mechanisms.
Securing conventional OoO hardware is a fool's errand. Patching one hole after another will not restore the confidence lost in awful hardware where security was a distant afterthought. Many if not most exploits can be traced to the lack of adequate hardware protection mechanisms, and the solution won't be a downloadable update. There are also RISC-V efforts aimed at securing hardware, but it remains to be seen if even a new OoO can avoid all speculation pitfalls, and the impervious in-order designs lack performance.
On the bright side, the Mill is expected to be at least an order of magnitude more power efficient than out of order, while using considerably less area. Along with the death of Moore's Law, at least there is now increasing motivation to reconsider architecture from the foundation. Sadly, the systems of intellectual monopoly in place will likely retard deployment.
-
Re:IA-64 is better.
Yep. Kinda like the ill-fated iAPX 432 (Intel's first stab at a 32-bit CPU). Certain supporting technologies had to mature in tandem, and they finally did four years later when the 386 came out.
If they were smart, Intel would buy up all the Mill Computing IP and base a new architecture off of that. They should think about it whilst they're still sitting on a decent pile of cash.
-
Re:Intel did not turn down Apple
This states that WASM is not SSA, though it can be decoded to a compiler's internal SSA form.
While it's not exactly a competitor, the Mill architecture similarly provides an abstraction of the physical ISA that is more amenable to compilers. However, the Mill hardware itself is fundamentally SSA in nature, and naturally extends to a generalized form, presenting a genuine SSA target for compilers.
-
Re:It's not time to reinvent the past
Slashdot largely seems to be missing the point of RISC-V. It isn't so much about having an open source processor, as an open specification that anyone can easily and freely implement and extend. The basic open designs are implemented in a high level design language and may be readily composed with a rich and growing selection of peripheral hardware in a flourishing ecosystem. The ISA itself is just a simple and elegant RISC, but the offer of escape from vendor lock-in or maintaining custom designs and toolchains is clearly very attractive to industry.
Even so, while RISC-V will be great for embedded applications and running legacy operating systems with minimal change, no conventional architecture will ever really be safe in a network facing system. We need a much better architectural foundation to enable genuinely trustworthy and secure systems, or there will be no stemming the flood of vulnerabilities.
The Mill Architecture is one prospect which promises very effective security mechanisms. Many common exploit vectors become impossible, and protection is flexible and virtually free, enabling the implementation of true micro-kernel based operating systems. There are many compelling aspects of the Mill, but it is not a trivial effort, and it will be a while before the hardware and ecosystem develop, if they do at all while encumbered by patents. Meanwhile, it will remain a fascinating and inspiring curiosity which may be explored further under docs.
-
I'm 60 and still very active...
You could claim that I've gone into management since I'm the CTO of Open iT, a multinational sw development corporation, but as long as I still get to do as much interesting programming as I want to, I will consider myself a programmer.
Besides my daytime work I'm involved with Network Time Protocol and I'm also part of Mill Computing, which is a team of mostly very mature people trying to develop a _really_ interesting CPU architecture, please take a look. That team is led by our own real-life wizard and Gandalf lookalike, Ivan Godard (do an image search...). As part of my Mill work I am also active in the IEEE 754 2018 revision, i.e. the update to the international floating point standard.
In my spare time I'm the leader of the Mapping Commission of the Norwegian Orienteering Federation, a job I got mostly due to my interest in developing sw to create much better base maps based on LiDAR point clouds.
Previously in my career I've worked on video and audio coding/optimization, including DVD, BluRay and Ogg Vorbis, as well as helping optimize the Quake assembly code. I've also worked on one of the AES candidates and at one point I doubled the speed of a research Computational Fluid Chemistry code. My Warhol moment might have been when I by accident made the first public disclosure of the FDIV bug (on usenet:comp.sys.intel) and then wrote most of the (compiler) SW workaround for that.
I have no intention to retire until I'm much closer to 70! (If I did that my wife who's a mechanical engineer and responsible for making the trains in Norway run on time, would expect me to make dinner for her every day, as well as doing all the cleaning and laundry. :-) )
Terje
-
Re:VLIW Had Other Problems Too
Where VLIW works, it uses much less power than an out of order superscalar, as evidenced in the DSP space. Multicore may be more flexible for general purpose code, but it doesn't fix the substantial power and area cost of an OOO processor. The Itanium was an attempt at a more general purpose wide architecture, but it was half-baked and very limited.
The Mill architecture incorporates some of the valuable ideas, and there is a reasonable expectation that it will deliver OOO performance with DSP level power and area. It has a number of novel innovations that make this possible, described in more depth at their own site.
The relatively small footprint will also enable more cores in a similar area, though memory will still limit throughput causing diminishing returns. The Mill does have mechanisms that allow it to reduce memory traffic though, such as (mostly) avoiding the need to write dead stack frames back to main memory.
-
Re:Because we're already close
To get a 4x or 8x improvement in size, power, or speed would imply there's a revolutionary way to do things that we just don't quite know yet. And it better be something which can be quickly turned to production because Moore's Law hasn't stopped yet. If you have a 4x improvement idea but it takes five years to release, it won't get funded. Plain CMOS silicon has too good a chance of catching up.
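The "catching up" argument above can be put in numbers. This is only a rough sketch: the two-year doubling period is an assumption, not a figure from the comment, and the names `process_gain` and `net_advantage` are made up for illustration. It also assumes the new design stays on the old process while the incumbent keeps shrinking.

```python
# Assumption: process scaling alone doubles performance every two years.
DOUBLING_PERIOD_YEARS = 2.0


def process_gain(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Speedup the incumbent gets from process scaling alone after `years`."""
    return 2.0 ** (years / doubling_period)


def net_advantage(idea_gain: float, years_to_ship: float) -> float:
    """Ratio of the new design's speed to the incumbent's at ship time.

    Values below 1.0 mean plain process scaling has already overtaken the idea.
    """
    return idea_gain / process_gain(years_to_ship)


four_x_after_five_years = net_advantage(4.0, 5.0)   # below 1.0 under this assumption
eight_x_after_five_years = net_advantage(8.0, 5.0)  # still above 1.0
```

Under these assumptions a 4x idea that takes five years to ship arrives behind the incumbent, while an 8x idea keeps a modest lead. If the new design can move to the newer process as well, the ratio stays at `idea_gain` and the argument weakens.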
However, architectural improvements also benefit from new processes. The problem is that most evolutionary improvements implemented by competitors could not overcome the gap with Intel's latest process. Now that their process advantage is narrowing, there is a growing opportunity for better architectures.
I think the bottom line is, it's really hard to produce a system which really is even 2x faster than the competition. 4x is incredible and 8x probably has never been done.
Not only that, but architectures offering revolutionary performance improvements are almost certain to be accompanied by a radical departure in design, which takes time to implement in both software and hardware.
The Itanium did have some attractive ideas, but it really was half-baked, and totally incapable of delivering on the promises. The better ideas found their way into the Mill Architecture, which promises a (very conservative) 10x improvement. The goal is DSP level performance on general purpose code. It is a fascinating architecture encompassing many novel ideas, and there is nothing to suggest it won't deliver.
-
Re:Limitations
Physics aside, some of those limitations can be relaxed by novel architectures. There was a lot of this back in the day, before architectural innovation was abandoned in favor of more predictable process innovation thanks to Moore's Law. Intel has relied upon brute force and anti-competitive practices ever since, virtually eliminating architectural innovation. In a way, process limitations are a welcome obstacle, that should motivate reflection on legacy decisions, and perhaps finally allow the x86 architecture to be put to rest. Many consider x86 "good enough", but the problems with legacy hardware run a lot deeper than performance, and are largely responsible for the horrific state of computer security today.
Have a look at the Mill Architecture for an idea of the possibilities. Out of order hardware is very expensive, both in terms of power and area, and imposes some unnecessary limitations. The Mill can do substantially better. It is fundamentally more secure, eliminating most common exploits by design, and enabling efficient implementation of a microkernel. Meanwhile, it provides a much better abstraction of hardware, which is far more friendly to compilers.
-
Mill Computing and Wintel
For a long time, Intel and Microsoft Windows have ruled the computing world. At the bottom of that platform has been Intel's instruction set architecture.
Intel leaped from 16-bit to 32-bit architecture and then from 32-bit to 64-bit, but the basic execution model remains the same. Most of the advances that Intel has made from the Pentium onwards in the early '90s have been stopgaps to get as much as possible out of the execution model, while still being limited by it.
There are other processors out there, DSPs, that are much faster than x86 at specialized tasks by making them pipelined and parallel. GPUs could be seen as massively parallel DSPs.
But raw computing power is not the problem. The problem is to run general-purpose code well, and general-purpose code has many branches between code paths that can't be parallelized.
A company called Mill Computing is working on a general-purpose CPU architecture inspired by DSPs and by what they think the Intel IA-64 (Itanium) should have been.
By being vastly different in several significant ways from x86, they claim to be able to achieve a significantly higher performance per watt and performance per clock overall than Intel and AMD's x86.
-
Re:First of many
Which is only necessary, because time and again, conventional architectures have proven to be terrible for implementing microkernels. Security is hopeless on an x86, but at least monolithic kernels are performant.
A microkernel implemented on hardware with something like the Mill security model would be a beautiful thing. When crossing a protection boundary is roughly the cost of a function call, and the system uses a single address space model, microkernels become trivial and highly attractive. Still an immense amount of work, but the effort would reap genuine gains in security across the board.
-
A secure architecture&OS would be more economi
x86 and systems based on it are hopeless from a security perspective, and that is even before considering the ticking time bomb that is Intel's Management Engine. It will be exploited eventually, and it would be surprising if the NSA wasn't already compelling Intel to backdoor it.
See the Mill security architecture, for an example of how a clever architecture can eliminate the bulk of common exploit vectors, and require little more than a recompile. It isn't the only option, but I highlight the Mill because it is a fascinating and novel architecture which also addresses many other long-standing issues with conventional systems. The security mechanisms also enable performant microkernels to be built, and protection between applications and libraries.
Operating systems will require work to take advantage of the protection features, but that will benefit everyone and be well worth the investment. This is the kind of "cyber" initiative I would like to see, rather than the focus on offensive capabilities. The latter poses a direct conflict of interest with securing systems, and ensures that adversaries will stock vulnerabilities rather than share and fix them.
-
Re:Why?
You might be interested in the Mill, which is designed to be compiler friendly and promises DSP level efficiency on general purpose code. It will require a lot of compiler work since it is so different, but much of that is a one time investment. The architecture makes software pipelining and vectorization the new normal, eliminating the need for hand-tuned assembly in a jumble of different instruction set extensions. Conventional compilers are not only complex, but very limited in what they can vectorize, and the extensive setup and teardown required for loops reduces the utility of such optimization. There is no such tradeoff on the Mill; all loops can be pipelined with minimal cost.
-
Looking past single-threaded x86...
Even on single-threaded workloads, there is a potentially substantial gain to be had with something like the Mill Architecture. Beyond that, conventional architectures also prevent efficient light-weight threading. Intel's ruthless pursuit of process technologies to keep x86 competitive, has also postponed serious efforts at improving architecture and exploiting parallelism. Some headway has been made on specific workloads amenable to GPUs, but there still remains great potential and a lot of work to be done on progressing languages and scalable architectures. Seen in this way, the physical process limits may actually encourage real progress.
-
Re:Really???
All function calls are slow on conventional architectures, and especially slow on register starved architectures, or with the mismatch caused by a stack-based VM. For contrast, see the Mill Architecture, which enables true single cycle function calls, without all of the shuffling of registers or memory-based stack nonsense.
-
Re:Really???
Using an intermediate representation was not pioneered by Apple, or even with Java. IBM has been doing it since the System/360. For a modern take on this idea, see the Mill Architecture. It provides a highly orthogonal instruction set which is then specialized to concrete implementations of the architecture. Unlike the JVM, the architecture itself is fundamentally SSA, and designed to be an ideal compiler target. The belt architecture is perfectly suited to this, without the drawbacks or impedance mismatch inherent in stack and register architectures.
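To give a feel for why a belt maps naturally onto SSA, here is a toy model. It is a sketch based on public descriptions of the belt, not the Mill's actual instruction set: the `Belt` class, its method names, and the belt length are all hypothetical. Results are dropped at the front, operands are named by how recently they were produced, and nothing is ever overwritten in place.

```python
from collections import deque


class Belt:
    """Toy belt: a fixed-length queue of results addressed by position.

    Position 0 is the newest result. Dropping a new result shifts the
    older ones back, and the oldest falls off the far end.
    """

    def __init__(self, length: int = 8):
        self._slots = deque(maxlen=length)

    def drop(self, value):
        """Place a new result at the front of the belt."""
        self._slots.appendleft(value)
        return value

    def __getitem__(self, pos: int):
        return self._slots[pos]

    def add(self, a: int, b: int):
        """Add the values at belt positions a and b; drop the sum."""
        return self.drop(self[a] + self[b])

    def mul(self, a: int, b: int):
        """Multiply the values at belt positions a and b; drop the product."""
        return self.drop(self[a] * self[b])

    def snapshot(self):
        return list(self._slots)


# Compute (2 + 3) * 4 on a four-slot belt.
belt = Belt(length=4)
belt.drop(2)
belt.drop(3)
belt.drop(4)          # belt is now [4, 3, 2]
belt.add(1, 2)        # 3 + 2, belt is now [5, 4, 3, 2]
belt.mul(0, 1)        # 5 * 4, the oldest value (2) falls off
result = belt[0]
```

Every value is created exactly once and only ever read afterwards, which is the single-assignment property. A compiler targeting this model has no destination registers to allocate, only positions to track.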
-
Re:A Shame
For an interesting architecture, try the Mill. It is so strikingly brilliant, that it is hard to retain any interest at all in conventional architectures. It also puts the RISC/x86 conflict into perspective; they are essentially identical at the core, and share many of the same problems. Peeling away legacy cruft is always welcome, but the Mill offers so much more...
-
Re:Not sure whats more impressive...
I don't have a background in microprocessor design: I've only designed a very simple one as an assignment, but I've been following the industry pretty closely.
From what I can tell, his design looks like it might be flops per watt comparable to GPUs, but with different memory abstractions that result in similar limitations. I suspect that if you write custom code for it, and have the right kind of problem, it will do significantly better than available options, but in the general case and/or non specialized code it won't do anything much better than a GPU, though it may be competitive.
I don't see the design as revolutionary: basically everyone wants to make a grid of DSPs because that's the most efficient thing you can do and we all know it. Also everyone wants to have core local memory with explicit DMA between them because we know that's the most efficient, but it sucks to use. Look how much people enjoyed writing for the PS3... (It had an IBM Cell processor with explicit DMA). It's an idealized mix of the ideas from DSPs and GPUs, and might feel a bit like a monster IBM Cell processor. The real question is what it's like to code for.
If you want an interesting general purpose processor to look into (something that will run existing code well) I recommend the Mill processor. The videos on that site provide hours of interesting insights into CPU design.
-
Mill Architecture
The Mill architecture is around the corner now, and promises immense potential. It elegantly addresses many deficiencies of conventional architectures, and enables substantially increased efficiency while also simplifying system software and compilers. It is a fascinating and compelling design, which re-abstracts the hardware and software in a fundamentally superior way.
While the Alpha is a nice RISC design, at heart it is more similar to an x86 than not. The paradigm introduced by the Mill architecture is a world apart.
-
I mostly agree with you...
The first big problem with integers is that they are really badly defined in C, so just like you I try to use unsigned as much as possible:
Any underflow turns into a big overflow, so it can be tested for at the same time as the overflow test, and the semantics of power-of-two sized wraparound is pretty solid on all platforms and implementations.
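A minimal sketch of that single-test idea, assuming 32-bit unsigned arithmetic emulated in Python with a mask. The names `MASK32` and `in_range` are illustrative only, and the sketch assumes `i`, `lo`, and `lo + n` all fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned integer


def in_range(i: int, lo: int, n: int) -> bool:
    """True when lo <= i < lo + n, using one unsigned comparison.

    If i < lo, the subtraction wraps around to a huge unsigned value,
    so the same `< n` test that rejects too-large values rejects it too.
    """
    return ((i - lo) & MASK32) < n


wrapped = (4 - 5) & MASK32  # an "underflow" becomes the largest 32-bit value
```

So `in_range(4, 5, 10)` and `in_range(15, 5, 10)` both fail on the same comparison, with no separate lower-bound check.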
OTOH I don't agree that having proper overflow handling would mostly be a new source of bugs, i.e. on the new Mill cpu architecture we have a full orthogonal set of all basic operations:
When adding two numbers (belt values) you can specify signed or unsigned, and over/underflow to be handled as saturating, wraparound or trapping, as well as automatically widening.
http://millcomputing.com/wiki/...
Look at ADDSW as an example of a Signed ADD that will widen if needed.
Since the Mill carries metadata alongside each belt slot it does not need separate byte/short/word/dword ADD instructions: The size of the operations is defined by the belt slot specified and not in the instruction encoding, so the machine code is polymorphic in data item size.
I.e. you can start with 8-bit values and an 8-bit accumulator, when the sum becomes too large then it is automatically widened to 16 bits or more. This works all the way to 128 bits for all scalar operations.
Terje
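The four overflow behaviours described above can be sketched in software as follows. This is an illustration of the semantics only, not Mill code: the function names are hypothetical, only signed addition is modelled, and returning a `(value, width)` pair is a stand-in for the width metadata the comment says travels with each belt slot.

```python
class OverflowTrap(ArithmeticError):
    """Raised by the trapping variant when the result does not fit."""


def signed_range(bits: int):
    """Smallest and largest values of a two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def add_wrap(a: int, b: int, bits: int = 8) -> int:
    """Wraparound: keep only the low `bits` bits of the sum."""
    lo, _ = signed_range(bits)
    return (a + b - lo) % (1 << bits) + lo


def add_saturate(a: int, b: int, bits: int = 8) -> int:
    """Saturating: clamp the sum to the representable range."""
    lo, hi = signed_range(bits)
    return max(lo, min(hi, a + b))


def add_trap(a: int, b: int, bits: int = 8) -> int:
    """Trapping: signal an error instead of returning a wrong value."""
    lo, hi = signed_range(bits)
    total = a + b
    if not lo <= total <= hi:
        raise OverflowTrap(f"{a} + {b} does not fit in {bits} bits")
    return total


def add_widen(a: int, b: int, bits: int = 8, max_bits: int = 128):
    """Widening: double the width until the sum fits, up to max_bits."""
    total = a + b
    lo, hi = signed_range(bits)
    while bits < max_bits and not lo <= total <= hi:
        bits *= 2
        lo, hi = signed_range(bits)
    return total, bits
```

For two 8-bit operands of 100 each, wraparound gives -56, saturation gives 127, trapping raises, and widening gives 200 carried at 16 bits.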
-
Re:Gurus like Carmack don't need agents
Thanks for remembering, that time was a lot of fun. :-)
I'm still doing low-level programming, I've been involved with the Mill for a little more than a year now, I'm working on scalar/vector FP emulation for the smallest models we intend to produce.
Take a look at http://millcomputing.com/ if you want to widen your mind a bit: A CPU with a belt instead of registers!
Terje
-
Re:Easily my favorite modern features
Yes, but instead of having a status register, you compare each item in one vector with the corresponding item in another and get the results as a vector of booleans.
Then execute a SIMD instruction, where each component scalar operation is conditional according to each corresponding boolean.
Or, you could convert that vector of booleans into something else. For instance, you could count the number of leading 1's in the vector and store it into a scalar, which would allow you to implement operations such as strlen() or strcmp() with vectors.
(It is a bit like programming in APL, if you have tried it.)
These types of operations have hitherto mostly been done by DSPs.
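A rough sketch of those three steps, with plain Python lists standing in for vector registers. All function names and the vector width are made up for illustration, and real hardware would do each step in one operation rather than a loop.

```python
def vec_eq(a, b):
    """Element-wise compare: the result is a vector of booleans."""
    return [x == y for x, y in zip(a, b)]


def vec_select(mask, if_true, if_false):
    """Per-lane conditional: take if_true where the mask is set."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def count_leading_true(mask):
    """Number of set lanes before the first clear lane."""
    n = 0
    for m in mask:
        if not m:
            break
        n += 1
    return n


def strlen_vec(buf: bytes, width: int = 8) -> int:
    """strlen over a NUL-terminated buffer, one `width`-byte vector at a time."""
    total = 0
    for start in range(0, len(buf), width):
        chunk = buf[start:start + width]
        nonzero = [c != 0 for c in chunk]   # compare each lane against NUL
        run = count_leading_true(nonzero)   # boolean vector -> scalar
        total += run
        if run < len(chunk):                # terminator found in this vector
            break
    return total


mask = vec_eq([1, 2, 3], [1, 0, 3])
picked = vec_select(mask, [10, 20, 30], [0, 0, 0])
length = strlen_vec(b"hello, world\x00junk")
```

The loop in `strlen_vec` never inspects a status flag. The only scalar it needs per vector is the leading-ones count.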
An architecture for general-purpose computing under development that would do this well is The Mill. Mind you, it is very interesting in other ways. There is a lot of stuff about it on the web site, and good talks about various features on YouTube.
-
Re:Static scheduling always performs poorly
I think your generalization of static scheduling performs poorly on a Mill. :)
The Mill architecture uses techniques which essentially eliminate stalls even with static scheduling, at least to about the same extent that an OOO can. Obviously, there will be cases where it will stall on main memory, but those are unavoidable on either. See the Memory talk in particular for how the Mill achieves this, and other improvements possible over OOO. The entire series of videos is fascinating if you have time, but there is also a short introduction. Beyond that, there is a considerable amount of detail scattered in the forums which the videos don't cover.
The Mill aims to provide OOO performance with DSP level efficiency, and offers a defensible means of doing so. Ultimately, any complex schemes aimed at keeping functional units busy are a waste of power if that result can be achieved with simple static hardware.
-
Re:Sounds smart, but is it?
There is a lot of information available on the Mill architecture at this point, and very little reason to doubt its feasibility. Essentially all of the parts have been demonstrated in existing architectures, and the genius is in how they are combined in such a simple and elegant manner. Implementation issues aside, the idea is rock solid, and has too much potential to ignore. Perhaps the layman cannot appreciate it, but the architecture has a profound ability to simplify and secure the entire stack of software on top of it. Even without silicon, that much is clear, and it is no doubt why there is so much excitement.
People look at architectures like a black box and fail to appreciate that the quality of the applications they use is heavily dependent on what systems and language programmers can provide using that box. Not many have experience with such low level code, but it is a nightmare to produce and maintain on conventional architectures. It is fragile and full of cruft, requiring tremendous effort to optimize compilers/languages/libraries/etc. The Mill wipes away the need for such effort and enables trivial yet superior compilers, systems, software optimization and security. Those costs are worth arbitraging, even if the Mill itself offered no performance advantage. However, a Mill is actually very similar to a DSP from the hardware perspective, so its performance is easy to extrapolate.
(Incidentally, the Mill is also expected to be an excellent platform for micro-kernels. The value of micro-kernels has never been in question, but there is a significant performance trade-off on conventional architectures. L4 has done well in minimizing it, but on the Mill there will be no contest.)
-
The Mill
I think NVidia tied their hands by retaining the ARM architecture. I suspect the result will be a "worst of both worlds" processor that doesn't use less power or provide better performance than competitors.
In order execution, exposed pipelines, and software scheduling are not new ideas. They sound great in theory, but never seem to work out in practice. These architectures are unbeatable for certain tasks (e.g. DSP), but success as general purpose processors has been elusive. History is littered with the corpses of dead architectures that attempted (and failed) to tame the beast.
Personally, I'm very excited about the Mill architecture. If anybody can tame the beast, it will be these guys.
-
Re:They're fools
This was their opportunity to dominate the CPU market with the Mill CPU architecture and they blew it.
So... you pitched AMD on your unknown, untested architecture that has never been implemented and only exists on paper... and they went with ARM instead? The fools.
I mean, it's not like ARM has mature compilers (multiple), OS support, strong developer mindshare, and tons of performance data from real silicon to drive CPU design tradeoffs, like the Mill already does... oh... wait a minute...
-
They're fools
This was their opportunity to dominate the CPU market with the Mill CPU architecture and they blew it.
-
Re:Right, because that worked so well
You cannot meaningfully do reordering and so on in software on a modern CPU. You do not know in advance which operands will be available from memory at which time. You have to redo that work every time you get to the code (unless it is in a tight loop, but modern x86's are REALLY good at tight loops) because circumstances will likely have changed -- and you cannot reorder in software every time, that is just too costly.
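A toy model of the problem, in Python. It assumes a strictly in-order machine where the compiler placed each use a fixed number of cycles after its load; the function name, the cycle accounting, and the latency numbers are all invented for illustration and do not describe any real CPU.

```python
def run_in_order(load_latencies, scheduled_gap):
    """Return (total cycles, stall cycles) for a sequence of load -> use pairs.

    The static schedule puts each use `scheduled_gap` cycles after its load,
    with independent work filling the gap. If the load actually takes longer
    than that, an in-order machine stalls for the difference.
    """
    cycles = 0
    stalls = 0
    for latency in load_latencies:
        wait = max(0, latency - scheduled_gap)
        stalls += wait
        # gap filled with useful work, then any stall, then one cycle for the use
        cycles += scheduled_gap + wait + 1
    return cycles, stalls


# Assumed numbers: cache hits take 3 cycles, one miss takes 200.
print(run_in_order([3, 3, 200, 3], scheduled_gap=3))
```

A schedule tuned for the cache-hit case eats the whole miss as a stall, and which load misses can change on every pass through the code, so no single fixed schedule fits all runs.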
If you want to see an architecture which looks like it has a chance of breaking the limits on single-threaded performance, look at the Mill. In theory you could software-translate x86 to Mill code and gain performance, but it would be really tricky and no Mill implementations exist yet.