iPhone 5 A6 SoC Teardown: ARM Cores Appear To Be Laid Out By Hand
MrSeb writes "Reverse engineering company Chipworks has completed its initial microscopic analysis of Apple's new A6 SoC (found in the iPhone 5), and there are some rather interesting findings. First, there's a tri-core GPU — and then there's a custom, hand-made dual-core ARM CPU. Hand-made chips are very rare nowadays, with Chipworks reporting that it hasn't seen a non-Intel hand-made chip for 'years.' The advantage of hand-drawn chips is that they can be more efficient and capable of higher clock speeds — but they take a lot longer (and cost a lot more) to design. Perhaps this is finally the answer to what PA Semi's engineers have been doing at Apple since the company was acquired back in 2008..."
Pretty picture of the chip after using an Ion Beam to remove the casing. The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.
That must be a very fine tipped resist pen...
The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.
Coding in assembly still remains a superior method of squeezing extra performance out of software. It's just that few people do it because compilers are "good enough" at guessing which optimizations to apply, and where, and usually development costs are the primary concern for software development. But when you're shipping hundreds of millions of units of hardware, and you're trying to pack as much processing power in a small and efficient form factor, you don't go with VLSI for the same reason you don't go with a compiler for realtime code: You need that extra few percent.
...companies thinking in the long run prefer an intelligent or well-trained workforce to automation and minimum wage.
And before you retort, no - Foxconn workers are far above "minimum wage" for China.
Site is already down due to the Slashdot effect.
They must have tiny hands.
There are a lot of layout methodologies that sit between the (frankly mythical) "X cache, Y FPUs, and Z cores" and fully hand layout. The top level may have more or less hand assembly, some blocks can be hand optimized, etc. Usually, there is lots of glue logic which must be designed in RTL, synthesized and only then laid out. And, for most blocks the process to create the logic design (RTL or perhaps gates) is separate from the process of laying out these blocks. So there is room for manual involvement in each of the steps.
The real "Libtards" are the Libertarians!
The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.
Coding in assembly still remains a superior method of squeezing extra performance out of software. It's just that few people do it because compilers are "good enough" at guessing which optimizations to apply, and where, and usually development costs are the primary concern for software development. But when you're shipping hundreds of millions of units of hardware, and you're trying to pack as much processing power in a small and efficient form factor, you don't go with VLSI for the same reason you don't go with a compiler for realtime code: You need that extra few percent.
I like to view things as a little more complicated than just applying optimizations. IMHO assembly gets some of its biggest wins when the human programmer has information that can't quite be expressed in the programming language. Specifically I recall such things in the bad old days when games and graphics code would use fixed point math. The programmer knew the goal was to multiply two 32-bit values, get a 64-bit result and right shift that result back down to 32 bits. The Intel assembly programmer knew this could be done in a single instruction. However there wasn't any real way to convey the bit twiddling details of this fixed point multiply to a C compiler so that it could do a comparable operation. C code could do the calculation but it needed to multiply two 64-bit operands to get the 64-bit result.
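For anyone who hasn't done fixed point, here is a minimal sketch of the operation described above, written in portable C. The Q16.16 format (16 integer bits, 16 fractional bits) and the names q16_16, q16_from_int and q16_mul are assumptions picked for illustration, not anything from an actual game codebase. The point is that the C source has to spell out a 64-bit widening, and whether that collapses to a single 32x32 multiply is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: an assumed format with 16 integer and 16 fractional bits. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)65536)   /* 1.0 in Q16.16 */

/* Convert a small integer to Q16.16 (hypothetical helper). */
q16_16 q16_from_int(int n)
{
    assert(n >= -32768 && n <= 32767);   /* must fit in the 16 integer bits */
    return (q16_16)(n * 65536);
}

/* Multiply two Q16.16 values: widen to 64 bits, multiply, shift back down.
 * Right-shifting a negative value is implementation-defined in C; the
 * common compilers do an arithmetic shift, which is what this relies on. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* full 64-bit product */
    return (q16_16)(wide >> 16);              /* drop the extra 16 fractional bits */
}
```

So q16_mul(q16_from_int(3), q16_from_int(2)) gives the Q16.16 encoding of 6, and multiplying one half by one half gives one quarter.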
Looking closely I see a bunch of RAM - probably half laid out by hand (caches) - and many, many small standard cell blocks almost certainly not laid out by hand. What I don't see is an obviously hand-laid-out datapath (the first part of your CPU you spend layout engineers on) - look for that diagonal where the barrel shifter(s) would be. There are some very regular structures (8 vertically) that I suspect are register blocks.
Still, what I see is probably someone managing timing by synthesizing small std cell blocks (not by hand), laying those blocks out by hand, then letting their router hook them up on a second pass. It's probably a great way to spend a little extra time guiding your tools into doing a better job, to squeeze that extra 20% out of your timing budget and get a greater gate density (and lower resulting wire delays).
So - a little bit of stuff being done by hand, but almost all the gates being laid out by machine.
This is not by hand.
To take a programming analogy, it's looking at what the compiler generated, and then giving it hints so the resultant code/chip is laid out as you expect.
Chips stopped being able to be laid out 'properly' by hand some time ago.
Doing this has much the same benefits as doing it with code.
You know stuff the compiler does not.
You can spot silly stuff it's doing, that is not wrong, but suboptimal, and hold its hand.
Display is LG. Flash is mostly Hynix and Toshiba.
IANAEE (I am not an electrical engineer).
I'm a mathematician. As I understand it, chip layout is usually done by solving a large-scale optimization problem to obtain a layout where things are packed as tightly as possible.
What I don't understand is the assertion in the summary that hand-drawn optimizations can be faster than a computed one. To me, hand-crafted optimizations can be passed to the optimizer, which will then output the tightest design based on the constraints given (with the hand-crafted optimizations incorporated). So it seems to me that hand-crafted optimizations can be as fast as computer-optimized circuit layouts... but not faster.
What am I missing here?
Note: I guess the compiler analogy would be writing something in assembly (with hand-crafted optimizations) vs. writing something in C and passing it through an optimizing compiler. But I don't know if circuit layout works like that. Or does it?
Good job Samsung!
For what, exactly? Reproducing what Apple gave them?
The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever.
You can teach a small kid to ride a bicycle. The same kid has no chance to program a robot into doing the same motion and balancing. It's the same order of magnitude in difference with VLSI layout, a person can lay out the circuits but it's almost impossible to describe to the computer all the reasons why he'd lay it out that way. It's not easy controlling anything well through a level of indirection, that's true for most things.
As for being "less expensive", companies don't just have expenses, they have income too. If you can increase revenue because you got a better chip that sells more, they're willing to pay a higher cost. Companies care about profits, not expenses in isolation. Those tiny improvements to the compiler, how valuable are they to Apple in 10 years? 20 years? As opposed to an optimized chip, whose worth they know right now.
Live today, because you never know what tomorrow brings
Android users usually get laid by hand.
Brilliant, this is what I love about Slashdot: I can be the biggest geek in whatever field I pick and I will still get outgeeked! I enjoyed reading the comments above mostly because I have absolutely no idea what the detail is, and I'd never even realised that hand-drawn vs machine was an issue.
:)
Can anyone supply a concise explanation of the differences and how it's all done? I'm guessing we're talking about people drawing circuits on acetate or similar and then it's scaled down photo-style to produce a mask for the actual chip?
Yes, I know I can just Google it, and I will, but as the question came up here I thought I'd add something to a real conversation, it beats a pointless click of some vague "like" button any day
That the cost of the new iphone is finally justified because of expensive handmade chips?
...companies thinking in the long run prefer an intelligent or well-trained workforce to automation and minimum wage.
In general your point does have some merit but it really does depend on the specific task at hand. My grandfather was a master welder. However for *some* of the tasks that he used to perform a robotic welding system would be a better idea.
Must mean that chip designing software is crap then. Should invest in better software?
(This post is half funny/half serious - it's 2012, haven't Intel or AMD engineers developed algorithms to do the chip design for them?)
I don't think bicycle riding is a very good analogy to this problem. How about cooking, which is a procedural step-by-step operation? Little hints the recipe can give you like "preheat oven to 350 degrees" can be a tremendous time-saver later. If you didn't know to do that, you'd get your dish ready and then look at the oven (off) and click it on and sit back and wait 20 minutes before placing it in the oven. A dish that was supposed to be 60 minutes start to serve is now going to take 80 minutes due to a lack of process optimization.
Compilers have the same problem of not knowing what the expectations are down the road, and aren't good at timing things. Good experienced cooks can manage a 4 course meal and time it so all the dishes are done at the right time, without dirtying as many dishes. Inexperienced cooks are much like compilers: they can get the job done but their timing and efficiency usually have much room for improvement.
I work for the Department of Redundancy Department.
What kind of moron are you?
If you improve the CAD software now, you get the better chip now, and any chip you design in the future.
It's called a non-recurring cost.
If you do it separately for each chip it becomes a recurring cost.
Maybe they plan on this being the last chip they ever make?
" The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding."
I've done PCB layouts, microwave chip and wire circuits, as well as RFIC/MMIC layouts. Anyone who asks the question above has never done a real layout. Many autorouter and layout tools allow complex rules to match delays, keep minimum widths, etc. You can spend as much time on each layout trying to populate these rules for critical sections of a design, but it is like trying to train a 5 year old to do brain surgery. Digital design is rather much different than the analog circuits I work on, but you only have to do a few layouts of any flavor by hand in your life to be able to see just how scary it is to hand a layout to HAL.
Clearly autorouters and autogenerated layouts have their place, and I don't mean to sound like too much of a Luddite... I've witnessed plenty of awful hand layouts to go around as well.
Not being a chip expert, the following made me think twice over whether some dextrous East Asian factory workers used tweezers to lay out the circuits of each and every chip rolling down the assembly line:
"Hand-made chips are very rare nowadays, with Chipworks reporting that it hasn't seen a non-Intel hand-made chip for 'years.'"
The phrase "hand-made chips" is misleading because it gives the impression that, similar to the way motherboards are still assembled by hand, the production of CPUs involves human fingers coming into direct contact with the silicon.
Instead of teams of wondrous elves etching the microscopic pathways of electrons, we are treated to the less remarkable, but still impressive revelation that most chip designs are automatically spit out by high level chip design software.
Or maybe with their $100 billion in cash and 10s of billions of dollars in revenue that they can easily absorb the costs?
Seeing those guys evolve is like watching an intricate ballet while everybody else is sumo wrestling.
Display is LG, Flash is Hynix, the RAM is from Elpida and their chip is their own design with Samsung just acting as a fab no different than Global Foundries or TSMC.
Display is LG. Flash is mostly Hynix and Toshiba.
Yeah, but the software is Samsung, and everyone knows that's what really counts.
The CPU is manufactured by Samsung, and that's what really counts for Fandroids.
Of course news about a fake are Fake News.
Display is LG. Flash is mostly Hynix and Toshiba.
Yeah, but the software is Samsung, and everyone knows that's what really counts.
The CPU is manufactured by Samsung, and that's what really counts for Fandroids.
Nah, I was referring to the well sourced fact that iOS is actually just a gimped version of Android.
Remember Schmidt was on the Apple board, and he provided preview copies of Android to Jobs.
When someone buys a design from ARM, they buy one of two things:
1. A hard macro block. This is like an mspaint version of a CPU. It looks just like the photos here. The CPU has been laid out partially by hand by ARM engineers. The buyer must use it exactly as supplied - changing it would be nigh-on impossible. In the software world, it's the equivalent of giving an exe file.
2. Source code. This can be compiled by the buyer. Most buyers make minor changes, like adjusting the memory controller or caches, or adding custom FPU-like things. They then compile it themselves. Most use a standard compiler rather than hand-laying out the stuff, and performance is therefore lower.
The article's assertion that hand layout hasn't been done for years outside Intel is, as far as I know, codswallop. Elements of hand layout, from gate design to designing memory cells and cache blocks, have been present in ARM hard blocks since the very first ARM processors. Go look in the lobby at ARM HQ in Cambridge UK and you can see the meticulous hand layout of their first CPU, and it's so simple you can see every wire!
Apple has probably collaborated with ARM to get a hand layout done with Apple's chosen modifications. I can't see anything new or innovative here.
Evidence: http://www.arm.com/images/A9-osprey-hres.jpg (this is a layout for an ARM Cortex A9)
The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever.
No matter how much VLSI layout software improves, its output can't match a layout done by hand by those who know what they are doing.
VLSI layout software is like a compiler. The final compiled code relies on two factors - the source-code input and the built-in "rules" of the compiler.
A similar case is in software programming - The source code from a so-so programmer compiled by a very very good compiler will result in a "good-enough" result.
It's good enough because it gets the job done.
However, a similar program by an expert assembly language programmer would leave "good enough" behind, because the assembly language programmer would know how to tweak his code using the most efficient instructions, and cut out the "fat" by optimizing the loops and flows.
Muchas Gracias, Señor Edward Snowden !
Word to the mods: this person was joking.
Ecce ARM.
The proper syntax for that is (using x64 types) something like:
int a,b,z;
z = (int)(((long long)a * b) >> 32);
I'm assuming int is 32bit and long long is 64. Even though a is promoted to a larger type and also b, good compilers know that the upper half of those promoted variables are not relevant. They will then use the 32bit multiply, shift the 64bit result and store the part you need. I still do fixed point for control systems and find using 16bit signals and 32bit products is faster in C than floating point even on some embedded PPC chips - never mind the fixed point DSPs we use where the shifts cost nothing. Anyway, this syntax also worked on a HC12 compiler back in '98 or so. It's still hit or miss, but generally works on parts where this stuff is still common.
This article about Apple (and Qualcomm) wanting to use TSMC came out a few days after the verdict.
Apple has sued Chipworks for revealing company trade secret information, violating the terms of use attached to the device, and other intellectual property crimes.
Bundled in the Apple terms of use is an agreement not to reverse-engineer. They could not have obtained an Apple to deprocess without first agreeing to those terms.
Chipworks will belong to Apple when it is all over.
But at least they can do it with anyone they want.
There are lots of important applications that are NP-complete: chip design, traffic timings, tuning the accuracy of a multi-jewel mechanical watch.
For many of them, we have algorithms that approximate an optimal solution, but they are all based on certain assumed heuristics that the programmer was able to supply. Some have tuning parameters in order to tap into human expertise somewhat.
But that is where art begins: a human can develop an intuition for a certain class of problem that goes beyond the processing power we have and the algorithms we can create. It appears that chip design is one of these.
What kind of moron are you?
If you improve the CAD software now, you get the better chip now, and any chip you design in the future.
Any chip you design? Don't you mean any chip the Indian subcontractor designs with the CAD and rules you developed?
Have gnu, will travel.
Not surprising at all, as PA Semi was founded by Daniel W. Dobberpuhl.
Daniel Dobberpuhl had his hand in StrongARM and DEC Alpha design - both hand-drawn cores which to this day command some respect in chip design circles I'm told.
Anyway,
I said no... but I missed and it came out yes.
Someone ELSE buys it, then sells it to Chipworks. The restriction only applies to the first buyer.
I wouldn't assume those companies are making all of them.
Apple usually tries hard to find at least two manufacturers for every single component. There are some that are made from a single source but that is the exception rather than the rule.
Just because the first 10,000,000 shipments have displays made by LG does not mean the next 10,000,000 will.
It's entirely possible that they use a datapath place-and-route software to achieve this layout pattern over traditional quadratic standard-cell placement.
I don't see anything in the pictures which implies "hand custom layout". I see a lot of carefully placed and floorplanned blocks, some of which are synthesized and some of which may have varying degrees of directed placement & routing. There are a lot of RAMs and register files, which look very regular but there's no way to tell whether they were generated by a bog standard RAM/RF compiler or whether there was some custom work (perhaps a combination of the two). There are a lot of unique blocks for a chip this size, I suspect there are several fixed function units to do various things (mpeg decoding or whatnot).
Hand custom layout conjures images of dozens of layout engineers drawing polygons for every transistor; I doubt they did much of that but I'm certain you can't tell from these kinds of photos.
It certainly looks "designed", and knowing how sharp the PA Semi folks are, that isn't at all surprising.
The proper syntax for that is (using x64 types) something like:
int a,b,z;
z = (int)(((long long)a * b) >> 32);
I'm assuming int is 32bit and long long is 64. Even though a is promoted to a larger type and also b, good compilers know that the upper half of those promoted variables are not relevant. They will then use the 32bit multiply, shift the 64bit result and store the part you need.
Admittedly it's been a while since I did fixed point, but back in the day when I checked popular 32-bit x86 compilers (MS and gcc) they did not generate the couple of instructions that an assembly language programmer would. My example may be dated.
Looks like gcc isn't a good compiler.
Compiling this at -O3
int mult(int a, int b)
{
    return (int)(((long long)a * b) >> 32);
}
In x86-64 mode gives
movslq %esi, %rax
movslq %edi, %rdi
imulq %rdi, %rax
sarq $32, %rax
ret
and 32-bit mode gives
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
imull 8(%ebp)
popl %ebp
movl %edx, %eax
ret
On powerpc the 64-bit version is very clean and obvious:
mulld 4,4,3
sradi 3,4,32
blr
the 32-bit version is a little bit more complicated
mulhw 9,4,3
mullw 10,4,3
srawi 11,9,31
srawi 12,9,0
mr 3,12
blr
The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.
It's simple math. At what volume will the chip be produced? A modern fab costs $X Billion, and you know pretty much exactly how many wafers you can run during the 3 years it is state-of-the-art. After that, add $Y Billion for a refit, or just continue to run old processes. Anyway, say a new fab at refit time would cost $Z Billion. Refitting the old fab instead costs $Y Billion. So you save $Z-$Y by doing a refit. So the original fab cost you $X-($Z-$Y). Divide by the number of wafers the fab can run during its life; that is the cost per wafer. Now compute die area for hand layout versus auto layout, and adjust for improved yield for the smaller die. Divide by dies per wafer. That is how much less each die costs you. Now since the die is smaller, it probably runs faster, so adjust your yield-to-frequency-spec upwards, or adjust your average selling price upwards if the speed difference is "large" (enough MHz to have marketing value). That is the value of hand layout. It isn't rocket surgery to work out a dollars-and-cents number.
Anyway, even at Intel, for at least the past 20 years only highly repetitive structures like datapath logic have been laid out by hand. Control logic is too tedious to lay out by hand, doesn't yield much area benefit, and is where the bulk of the bug fixes end up, so it's the most volatile part of the layout from stepping to stepping.
So, can hand layout have a positive return on investment? Yes, if you run enough wafers of one part to make the math work out. These days the math will only work out for higher volume parts.
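The dollars-and-cents calculation described above can be sketched in a few lines of C. This is only an illustration of the fab-amortization arithmetic as stated in this comment; the function names and any inputs you feed it are made up, and it ignores everything else that goes into a real die cost (processing, test, packaging, the speed-bin adjustment).

```c
#include <assert.h>

/* Effective cost of the original fab, as argued above: X - (Z - Y). */
double effective_fab_cost(double x_build, double y_refit, double z_new_fab)
{
    return x_build - (z_new_fab - y_refit);
}

/* Fab cost amortized over every wafer the fab runs during its life. */
double cost_per_wafer(double fab_cost, double lifetime_wafers)
{
    assert(lifetime_wafers > 0.0);
    return fab_cost / lifetime_wafers;
}

/* Wafer cost spread over the dies that actually work. */
double cost_per_good_die(double wafer_cost, double dies_per_wafer, double yield)
{
    assert(dies_per_wafer > 0.0);
    assert(yield > 0.0 && yield <= 1.0);
    return wafer_cost / (dies_per_wafer * yield);
}

/* Per-die saving of the smaller hand layout over the automatic one. */
double hand_layout_saving(double wafer_cost,
                          double auto_dies, double auto_yield,
                          double hand_dies, double hand_yield)
{
    return cost_per_good_die(wafer_cost, auto_dies, auto_yield)
         - cost_per_good_die(wafer_cost, hand_dies, hand_yield);
}
```

With invented inputs of X = 4, Y = 1 and Z = 3 (billions), one million lifetime wafers, 400 versus 500 dies per wafer and 80% yield for both layouts, the saving comes to about $1.25 per good die. Multiply that by the number of parts shipped and compare it with what the layout team costs.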
(Yes, I'm ex-Intel).
Argh, where is a good overview of the EDA industry when you need one?
The only thing I can come up with is the "EDA Industry Primer" on one of Synopsys' Investor Relations pages. That's too high level here-- it only describes the buzzwords for investors.
There used to be a better one, and I think it was Cadence that was putting it out. But I can't find it now.
Hopefully, TSMC is past their manufacturing issues at, at, uh, waitaminute.
[Types in "TSMC" into Google. Autocomplete shows "TSMC 28nm".]
Hopefully, TSMC is past their manufacturing issues at 28nm that were reported earlier this year.
Sure, it's a non-recurring cost, the question really is, is the non-recurring cost $10^96 (to get close to "solving" an impossible problem by improving the CAD software) or should you spend $10^9 several times, in the hope that you don't design 10^87 more chips.
At first I thought this was about little Chinese children hand assembling the chips at some rat infested factory.... Though that might still be the case, as they are cheaper to upkeep than a machine!
Apple usually tries hard to find at least two manufacturers for every single component.
Honestly, any hardware designer with a clue tries to minimise use of single source parts. Single source parts can be troublesome pretty much regardless of volume. For small volumes where you buy from stock, there is the risk of all the stock of a part suddenly disappearing with no new batch due for months. For large volumes where you are having parts made to fill your order, single source parts give the part manufacturers more bargaining power.
Sadly for ICs there often isn't a lot of choice since there is often no generic part that does what you want.
Note: I'm known as plugwash most places but I screwed up registering that here somehow in the past and now can't register.
And the fab is even in the US. Northeast Austin to be specific. (They've been making the A5 there, so I'm guessing the A6 would be made there too.)
I did VLSI as a postgrad researcher and designed a fairly complex DSP IC together with several layouts. Contrary to what the name implies, 'manual' layout doesn't mean manually placing each part in the layout, or manually routing. The only thing that gets done manually is pre-placement of key blocks, be it for optimising wire lengths or die size. Then you run the algorithms that fit the less critical stuff in and route the wires. You can do this at several levels of the architecture, e.g. when laying out an ALU you can place the input/output buffers and let software do the rest. Once this is done, the whole ALU can be placed "manually" as a module in a CPU design.
The process is not that much more expensive than letting software do it all.
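To make the "pre-place the key blocks, let software fit the rest" flow concrete, here is a toy sketch in C. It is emphatically not how a real place-and-route tool works: the 4x4 slot grid, the two-pin nets, the greedy one-pass fill and all of the names (Cell, Net, place_remaining and so on) are assumptions invented for this example. Cells that arrive already marked as placed stand in for the manual pre-placement and are never moved; every other cell is dropped into the free slot that minimizes Manhattan wire length to its already-placed neighbours.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define GRID_W 4
#define GRID_H 4
#define MAX_CELLS (GRID_W * GRID_H)

typedef struct { int x, y; int placed; } Cell;   /* placed != 0: position is final */
typedef struct { int a, b; } Net;                /* two-pin wire between cells a and b */

/* Wire length from `cell`, if it sat at (x, y), to its already-placed neighbours. */
int wire_cost(const Cell *cells, const Net *nets, int n_nets, int cell, int x, int y)
{
    int cost = 0;
    for (int i = 0; i < n_nets; i++) {
        int other;
        if (nets[i].a == cell) other = nets[i].b;
        else if (nets[i].b == cell) other = nets[i].a;
        else continue;                           /* net does not touch this cell */
        if (!cells[other].placed) continue;      /* other end has no position yet */
        cost += abs(cells[other].x - x) + abs(cells[other].y - y);
    }
    return cost;
}

/* Is some placed cell already occupying slot (x, y)? */
int slot_taken(const Cell *cells, int n_cells, int x, int y)
{
    for (int i = 0; i < n_cells; i++)
        if (cells[i].placed && cells[i].x == x && cells[i].y == y)
            return 1;
    return 0;
}

/* Greedy fill: pre-placed cells stay put, each remaining cell takes the
 * cheapest free slot (first one found wins ties, scanning row by row). */
void place_remaining(Cell *cells, int n_cells, const Net *nets, int n_nets)
{
    assert(n_cells <= MAX_CELLS);
    for (int c = 0; c < n_cells; c++) {
        if (cells[c].placed) continue;
        int best = INT_MAX, best_x = -1, best_y = -1;
        for (int y = 0; y < GRID_H; y++) {
            for (int x = 0; x < GRID_W; x++) {
                if (slot_taken(cells, n_cells, x, y)) continue;
                int cost = wire_cost(cells, nets, n_nets, c, x, y);
                if (cost < best) { best = cost; best_x = x; best_y = y; }
            }
        }
        assert(best_x >= 0);                     /* a free slot must exist */
        cells[c].x = best_x;
        cells[c].y = best_y;
        cells[c].placed = 1;
    }
}

/* Total Manhattan wire length over all nets, once everything is placed. */
int total_wirelength(const Cell *cells, const Net *nets, int n_nets)
{
    int total = 0;
    for (int i = 0; i < n_nets; i++)
        total += abs(cells[nets[i].a].x - cells[nets[i].b].x)
               + abs(cells[nets[i].a].y - cells[nets[i].b].y);
    return total;
}
```

For example, pin two buffers at opposite ends of the top row, connect a third cell to both, and place_remaining will put that cell in the top row between them. Real tools solve a much harder version of this with far better algorithms, but the division of labour is the same: a person fixes the few positions that matter and the software fills in around them.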