Slashdot Mirror


iPhone 5 A6 SoC Teardown: ARM Cores Appear To Be Laid Out By Hand

MrSeb writes "Reverse-engineering company Chipworks has completed its initial microscopic analysis of Apple's new A6 SoC (found in the iPhone 5), and there are some rather interesting findings. First, there's a tri-core GPU, and then there's a custom, hand-made dual-core ARM CPU. Hand-made chips are very rare nowadays, with Chipworks reporting that it hasn't seen a non-Intel hand-made chip for 'years.' The advantage of hand-drawn chips is that they can be more efficient and capable of higher clock speeds, but they take a lot longer (and cost a lot more) to design. Perhaps this is finally the answer to what PA Semi's engineers have been doing at Apple since the company was acquired back in 2008..." There's a pretty picture of the chip after an ion beam was used to remove the casing. The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.

19 of 178 comments

  1. Costs by girlintraining · · Score: 5, Informative

    The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.

    Coding in assembly is still a superior way of squeezing extra performance out of software. It's just that few people do it, because compilers are "good enough" at guessing which optimizations to apply, and where, and development cost is usually the primary concern in software. But when you're shipping hundreds of millions of units of hardware and trying to pack as much processing power as possible into a small, efficient form factor, you don't rely purely on automated VLSI layout, for the same reason you don't rely on a compiler for realtime code: you need that extra few percent.

    --
    #fuckbeta #iamslashdot #dicemustdie
  2. Chip design not black-or-white by whoever57 · · Score: 5, Informative

    Today, chips are nearly always laid out using advanced, CAD-like software: the designer says he wants X cache, Y FPUs, and Z cores, and the software automagically creates a chip. Hand-drawn processors, on the other hand, are painstakingly laid out by chip designers.

    There are a lot of layout methodologies between the (frankly mythical) "X cache, Y FPUs, and Z cores" and fully hand layout. The top level may involve more or less hand assembly, some blocks can be hand-optimized, etc. Usually there is a lot of glue logic, which must be designed in RTL, synthesized, and only then laid out. And for most blocks, the process of creating the logic design (RTL or perhaps gates) is separate from the process of laying out those blocks. So there is room for manual involvement at each step.

    --
    The real "Libtards" are the Libertarians!
  3. Looking closely by taniwha · · Score: 5, Informative

    Looking closely, I see a bunch of RAM (probably half of it laid out by hand, i.e. the caches) and many, many small standard-cell blocks that were almost certainly not laid out by hand. What I don't see is an obviously hand-laid-out datapath, which is the first part of a CPU you spend layout engineers on; look for the diagonal where the barrel shifter(s) would be. There are some very regular structures (8 stacked vertically) that I suspect are register blocks.
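    A shifter is a classic example of the regular, bit-sliced datapath structure that tends to get hand layout. One common design is the logarithmic barrel shifter: for a 32-bit word, five rows of 2:1 multiplexers, where row k shifts the word by 2^k when bit k of the shift amount is set. The sketch below is a behavioral C++ model of that general idea, assuming a logical left shift; the function name is hypothetical and it says nothing about how the A6's shifter is actually built.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of a 32-bit logarithmic barrel shifter (logical shift left).
// Each of the five stages stands for one row of 2:1 multiplexers: if bit k of
// `amount` is set, that row shifts the whole word left by 2^k; otherwise it
// passes the word through unchanged. Only bits 0..4 of `amount` are examined,
// so the effective shift is amount % 32.
std::uint32_t barrel_shift_left(std::uint32_t value, unsigned amount) {
    for (unsigned stage = 0; stage < 5; ++stage) {
        if ((amount >> stage) & 1u) {
            value <<= (1u << stage);  // shift by 1, 2, 4, 8, then 16
        }
    }
    return value;
}
```

    Because every stage is the same row of muxes repeated across all 32 bits, the hardware version is highly regular, which is what makes it a good candidate for careful manual placement.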

    Still, what I see is probably someone managing timing by synthesizing small standard-cell blocks (not by hand), laying those blocks out by hand, and then letting the router hook them up on a second pass. That's probably a great way to spend a little extra time guiding your tools into doing a better job, squeezing that extra 20% out of your timing budget and giving you greater gate density (and lower resulting wire delays).

    So: a little bit of stuff is being done by hand, but almost all the gates are being laid out by machine.

  4. Re:Site is down by sexconker · · Score: 5, Informative

    I've put the picture (which is what everyone wants) up here:
    http://i.imgur.com/vqCAu.jpg

  5. 'by hand' - not really. by queazocotal · · Score: 5, Informative

    This is not by hand.
    To take a programming analogy, it's looking at what the compiler generated, and then giving it hints so the resultant code/chip is laid out as you expect.

    It stopped being possible to lay out chips 'properly' by hand some time ago.

    Doing this has much the same benefits as doing it with code.
    You know stuff the compiler does not.
    You can spot silly stuff it's doing, that is not wrong, but suboptimal, and hold its hand.

  6. Re:What makes hand-made chips "faster"? by Hatta · · Score: 5, Informative

    I'm guessing that the search space is too large to brute force the optimization. For similar reasons we can't write a program that can beat a Go master. It's just too hard a problem without heuristics, and the heuristics in the human brain are better. Figure out why, and you've solved AI.

    --
    Give me Classic Slashdot or give me death!
  7. Re:What makes hand-made chips "faster"? by marcansoft · · Score: 5, Informative

    What you're missing is that chip layout is NP-complete. For anything beyond very trivial chips, no computer algorithm can yield the optimal solution in a reasonable time.

    As I understand it, automated layout algorithms are still, when you get down to it, largely quite dumb. I'm sure this is oversimplifying and someone who writes place-and-route software will probably want to kill me, but the algorithm is closer to "throw stuff together, measure performance, tweak things randomly, measure performance, keep the change if it got better" than to anything likely to yield an optimal solution. Eventually, you'll converge on a decent layout, sure, but not an optimal one.
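    The "tweak things randomly, keep the change if it got better" loop described here is essentially hill climbing. A toy sketch, assuming a made-up model (cells occupying slots on a small square grid, two-pin nets, cost = total Manhattan wirelength); real place-and-route tools use far richer cost functions and move sets, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Hypothetical toy model: cell c sits in slot pos[c] of a grid `width` slots wide,
// and each net is a two-pin connection between cells a and b.
struct Net { int a, b; };

// Cost = total Manhattan wirelength over all nets.
long wirelength(const std::vector<int>& pos, const std::vector<Net>& nets, int width) {
    long total = 0;
    for (const Net& n : nets) {
        int xa = pos[n.a] % width, ya = pos[n.a] / width;
        int xb = pos[n.b] % width, yb = pos[n.b] / width;
        total += std::abs(xa - xb) + std::abs(ya - yb);
    }
    return total;
}

// "Tweak things randomly, measure, keep the change if it got better":
// swap two random cells and undo the swap if the wirelength increased.
std::vector<int> hill_climb(std::vector<int> pos, const std::vector<Net>& nets,
                            int width, int iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(pos.size()) - 1);
    long cost = wirelength(pos, nets, width);
    for (int i = 0; i < iterations; ++i) {
        int c1 = pick(rng), c2 = pick(rng);
        std::swap(pos[c1], pos[c2]);
        long next = wirelength(pos, nets, width);
        if (next <= cost) {
            cost = next;                  // keep: no worse than before
        } else {
            std::swap(pos[c1], pos[c2]);  // revert: it got worse
        }
    }
    return pos;
}
```

    Because only non-worsening swaps are kept, the loop can stall in a local minimum, which is the gap between a "decent" layout and an optimal one.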

    It's pretty much guaranteed that this chip wasn't completely hand-crafted (modern chips are much too complicated to do that). Instead, most likely, engineers guided the placement of major blocks and data paths, and let the automated place-and-route software choose the rest. By constraining the design based on intelligent decisions, you can guide the automated process to converge on a better solution.

  8. Layout by HAL by Anonymous Coward · · Score: 3, Informative

    " The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding."

    I've done PCB layouts, microwave chip-and-wire circuits, and RFIC/MMIC layouts. Anyone who asks the question above has never done a real layout. Many autorouter and layout tools allow complex rules to match delays, keep minimum widths, etc. You can spend as much time on each layout trying to populate these rules for the critical sections of a design, but it is like trying to train a 5-year-old to do brain surgery. Digital design is rather different from the analog circuits I work on, but you only have to do a few layouts of any flavor by hand in your life to see just how scary it is to hand a layout to HAL.

    Clearly autorouters and autogenerated layouts have their place, and I don't mean to sound like too much of a Luddite... I've witnessed plenty of awful hand layouts as well.

  9. Re:And made by Samsung by Lunix+Nutcase · · Score: 4, Informative

    The display is LG, the flash is Hynix, the RAM is from Elpida, and the chip is Apple's own design, with Samsung just acting as a fab, no different from GlobalFoundries or TSMC.

  10. Re:What makes hand-made chips "faster"? by AK+Marc · · Score: 3, Informative

    And an incremental chip would benefit from hand-holding more than a new one. Say they tested the chip at +50% clock speed and identified locations of instability. Then they took an optimized automated layout and tweaked it "by hand" to improve those few specific areas. That would be a "by hand" design that didn't take too much work and gave a better result. Perhaps do the same thing with 50% less voltage rather than increased speed, and hand-optimize for that. Then compare the two and come up with something faster and lower-power than the automated process would ever come up with.

    Most algorithms I've messed with are very good at iteration, but bad at evolution (they win chess by calculating odds and values, not by analyzing the opponent and his moves).

  11. Re:ARM hard blocks are always laid out by hand... by Lunix+Nutcase · · Score: 5, Informative

    When someone buys a design from ARM, they buy one of two things:

    Which is not what Apple did.

    Apple has probably collaborated with ARM to get a hand layout done with Apple's chosen modifications. I can't see anything new or innovative here.

    No, they designed it themselves since they are an architectural licensee like Qualcomm. You remember how they bought PA Semi?

  12. Re:News For This Nerd by lexman098 · · Score: 5, Informative

    The headline is attention-grabbing bullshit.

    I'd believe that Intel may have, in the past, done manual placement and routing of custom-made cells in certain key parts of their CPUs, but I can almost assure you that Apple did not place all of the standard cells in their ARM cores and then route them together manually, which is what the headline implies.

    What I'm talking about here is literally placing down a hundred thousand rectangles in a CAD tool and then connecting them correctly with more rectangles which is way beyond what Apple would have considered worth the investment for a single iPhone iteration. What's more probable (and pretty standard for digital chip design) is that they placed all of the large blocks in the chip by hand (or at least by coordinates hand-placed in a script), and they probably "guided" their place and route tool as to which general areas to place the various components of the ARM cores. They might have even gone in after the tool and fixed things up here and there.

    Modern chips are almost literally impossible to "lay out by hand".

  13. Re:News For This Nerd by slew · · Score: 3, Informative

    Nobody "draws" chips by "hand" anymore. It's all done by computer (there are so many design rules these days that humans can't do it in a realistic time frame). Reticles (the photomasks) are all fractured by computer these days, because rectangles aren't really rectangles anymore at these small feature sizes (we are now past the diffraction limit, so masks must be "phase-shift" masks rather than the binary masks of the old days).

    I don't have any specific knowledge about the A6, but what is euphemistically called hand-drawn these days is often still very automated relative to the bad old days, when people drew rectangles on layers to make transistors. Those were the real hand-drawn days, but even back then you didn't actually draw them by hand; you used a computer program to enter the coordinates of the rectangles.

    Quick background: nowadays, when typical chips go to physical design, they usually go through a system called place-and-route, where pre-optimized "cells" (which have 2-4 inputs and 1-3 outputs and implement things like and-or-invert or a register flop) are placed by the computer (typically using advanced heuristic algorithms), and their inputs and outputs are connected together with many layers of wires that logically match the schematic or netlist (which captures the intent of the logical design). Of course, this is where physics starts to impose on the "logical" design, so things often need special fixups to work. Unfortunately, the fixups and the worst-case wirelengths between cells conspire to limit the performance and power of the design, but just like compiled software, it's usually good enough for most purposes. Highly leveraged, regularly structured components of normal designs (e.g., RAMs, FIFOs, or register files) might have libraries, specialized compilers, or even hand intervention, but not the bulk of the logic.
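    During placement, the wirelengths mentioned here are commonly estimated with the half-perimeter wirelength (HPWL): half the perimeter of the bounding box around each net's pins. A minimal sketch, assuming pins are plain (x, y) coordinates; the struct and function names are hypothetical, not any tool's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { double x, y; };

// HPWL of one net: width plus height of the bounding box of its pins.
// It is exact for two-pin nets and a cheap lower-bound-style estimate otherwise.
double hpwl(const std::vector<Point>& pins) {
    if (pins.empty()) return 0.0;
    double minx = pins[0].x, maxx = pins[0].x;
    double miny = pins[0].y, maxy = pins[0].y;
    for (const Point& p : pins) {
        minx = std::min(minx, p.x);
        maxx = std::max(maxx, p.x);
        miny = std::min(miny, p.y);
        maxy = std::max(maxy, p.y);
    }
    return (maxx - minx) + (maxy - miny);
}

// Placement cost: sum of HPWL over every net in the design.
double total_hpwl(const std::vector<std::vector<Point>>& nets) {
    double sum = 0.0;
    for (const auto& net : nets) sum += hpwl(net);
    return sum;
}
```

    A placer would recompute only the nets touched by each move; recomputing everything, as here, is simpler but slower.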

    As far as I can tell from the pictures, the most likely possibility is that, instead of letting the computer place the whole design out of small cells, some larger blocks (say, ALUs for the ARM SIMD path) were created by a designer and a layout engineer who probably used a lower-level tool to put down the same small cells relative to other small cells where they thought best, tweaking the relative positioning to try to minimize the maximum wire lengths between critical parts of the block. The most common flow for doing this is mostly automated but tweakable with human intervention (this is what passes for "by hand" these days). In addition to being designed to optimize critical paths, these larger blocks are generally designed to "fit" well with other parts of the design (e.g., port order, wire pitch match, etc.) to minimize wire congestion, so they can be connected with mostly straight wires instead of ones that bend. Basically, looking at the patterns of whitespace in the presumed CPU, you can see the structure of these larger blocks, rather than the big rectangles (called partitions) filled with rows of cells that you get when you let a computer do place-and-route with small cells.

    Just like optimizing a program, there are many levels of pain you can go through, and what I described above is probably the limit these days. If you wanted less pain, another, more automated way to get most of the same benefits is to develop a flow that hints where to put parts of the design inside the normal rectangular placement region, and let a placement engine use those hints. The designer can just tweak the hints to get better results. Of course, with this method the routing may still have "kinks", because it is not wire-pitch-matched, but you can often get 80-90% of the way there. The advantage of this lesser technique is that you don't need to spend a bunch of time developing big blocks, and if there is a small mistake (of course nobody ever makes mistakes), it's much, much easier to fix it without perturbing the whole design.
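    One way a placement engine could "use those hints" is as a soft constraint: each cell gets a preferred region, and the placer adds a penalty to its cost when the cell strays outside it. The sketch below assumes that penalty form (weighted Manhattan distance outside a rectangle); it is an illustration only, and the types, names, and weighting are hypothetical rather than how any particular tool implements hints.

```cpp
#include <cassert>
#include <vector>

// Hypothetical placement hint: a preferred rectangle [x0, x1] x [y0, y1] for a cell.
struct Region { double x0, y0, x1, y1; };

struct Cell { double x, y; Region hint; };

// How far v lies outside the interval [lo, hi] (0 if inside).
double distance_outside(double v, double lo, double hi) {
    if (v < lo) return lo - v;
    if (v > hi) return v - hi;
    return 0.0;
}

// Soft constraint: a cell may leave its hinted region, but pays a penalty
// proportional to its Manhattan distance outside the region.
double hint_penalty(const Cell& c, double weight) {
    return weight * (distance_outside(c.x, c.hint.x0, c.hint.x1) +
                     distance_outside(c.y, c.hint.y0, c.hint.y1));
}

// Term a placer could add to its wirelength cost; the designer "tweaks the hints"
// by editing the regions or the weight and rerunning placement.
double total_hint_penalty(const std::vector<Cell>& cells, double weight) {
    double sum = 0.0;
    for (const Cell& c : cells) sum += hint_penalty(c, weight);
    return sum;
}
```

    Making the hint soft rather than a hard fence leaves the placer free to violate it when wirelength or congestion demands, which matches the "hint, then tweak" workflow described above.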

    FWIW, it is highly unlikely that th

  14. Re:What makes hand-made chips "faster"? by Space+cowboy · · Score: 3, Informative

    As a mathematician, you ought to understand global optimization encountering local minima in a high-dimensional space. Standard tools for large-scale functional minimization are all subject to it in one form or another, and humans get to ignore all the "stuff that doesn't make sense" - machines don't have that latitude, at least with current algorithms.

    Don't get me wrong, the layout and design tools are on the bleeding edge; they're as sophisticated as they come, and there's a *huge* amount of maths in how they work, but they're still crap, compared to a moderately skilled human. What they do excel at is doing all the tedious repetitive work that is typically required, and there's a *lot* of that.

    Simon

    --
    Physicists get Hadrons!
  15. Looks like a modern semi-custom chip to me by Brannon · · Score: 3, Informative

    I don't see anything in the pictures that implies "hand custom layout". I see a lot of carefully placed and floorplanned blocks, some of which are synthesized and some of which may have varying degrees of directed placement and routing. There are a lot of RAMs and register files, which look very regular, but there's no way to tell whether they were generated by a bog-standard RAM/RF compiler or whether there was some custom work (perhaps a combination of the two). There are a lot of unique blocks for a chip this size; I suspect there are several fixed-function units doing various things (MPEG decoding or whatnot).

    Hand custom layout conjures images of dozens of layout engineers drawing polygons for every transistor; I doubt they did much of that but I'm certain you can't tell from these kinds of photos.

    It certainly looks "designed", and knowing how sharp the PA Semi folks are, that isn't at all surprising.

  16. The arithmetic is simple by dbc · · Score: 5, Informative

    The question I have is how it's less expensive (in the long run) to lay a chip out by hand once instead of improving your VLSI layout software forever. NP classification notwithstanding.

    It's simple math. At what volume will the chip be produced? A modern fab costs $X billion, and you know pretty much exactly how many wafers you can run during the 3 years it is state of the art. After that, add $Y billion for a refit, or just continue to run old processes. Say a new fab at refit time would cost $Z billion. Refitting the old fab instead costs $Y billion, so you save $Z-$Y by doing a refit, and the original fab effectively cost you $X-($Z-$Y). Divide that by the number of wafers the fab can run during its life: that is the cost per wafer. Now compute the die area for hand layout versus auto layout, and adjust for the improved yield of a smaller die. Divide the cost per wafer by the die per wafer: that is how much less each die costs you. Also, since the die is smaller, it probably runs faster, so adjust your yield-to-frequency-spec upwards, or adjust your average selling price upwards if the speed difference is "large" (enough MHz to have marketing value). That is the value of hand layout. It isn't rocket surgery to work out a dollars-and-cents number.
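    The arithmetic above can be written down directly. A sketch with every input assumed: the struct fields map to $X, $Y, $Z and the wafer count from the comment, while the die-per-wafer estimate (wafer area divided by die area, ignoring edge loss) and the Poisson yield model exp(-area * defect density) are simplifying assumptions of this example, not part of the original post. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical inputs, in dollars and wafers.
struct FabCosts {
    double new_fab;           // $X: cost of the fab when built
    double refit;             // $Y: cost of refitting it later
    double new_fab_at_refit;  // $Z: cost of a brand-new fab at refit time
    double wafers;            // wafers the fab runs during its life
};

// Refitting saves (Z - Y), so the fab effectively cost X - (Z - Y).
double cost_per_wafer(const FabCosts& f) {
    return (f.new_fab - (f.new_fab_at_refit - f.refit)) / f.wafers;
}

// Crude good-die estimate: gross die count (no edge loss) times Poisson yield.
double good_die_per_wafer(double wafer_area_mm2, double die_area_mm2,
                          double defects_per_mm2) {
    double gross = std::floor(wafer_area_mm2 / die_area_mm2);
    return gross * std::exp(-die_area_mm2 * defects_per_mm2);
}

double cost_per_good_die(const FabCosts& f, double wafer_area_mm2,
                         double die_area_mm2, double defects_per_mm2) {
    return cost_per_wafer(f) /
           good_die_per_wafer(wafer_area_mm2, die_area_mm2, defects_per_mm2);
}
```

    Comparing cost_per_good_die for the hand-laid-out die area against the auto-laid-out one, times the number of units shipped, gives the savings that must exceed the extra layout effort.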

    Anyway, even at Intel, for at least the past 20 years only highly repetitive structures like datapath logic have been laid out by hand. Control logic is too tedious to lay out by hand, doesn't yield much area benefit, and is where the bulk of the bug fixes end up, so it's the most volatile part of the layout from stepping to stepping.

    So, can hand layout have a positive return on investment? Yes, if you run enough wafers of one part to make the math work out. These days the math will only work out for higher volume parts.

    (Yes, I'm ex-Intel).

  17. Re:Automation versus human instinct by TheRaven64 · · Score: 3, Informative

    Compilers almost always do a much better job than humans if provided with the same input. The advantage that humans have is that they are often aware of extra information that is not encoded in the source language and so can apply extra invariants that the compiler is not aware of. A human is also typically more free to change data formats, for example for better cache usage, whereas a compiler for a language like C is required to take whatever layouts the programmer provided.
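    A concrete case of the data-format freedom described here: a C compiler must lay out struct members in the order they were declared, but a human can switch from an array of structs to a struct of arrays so that a loop reading one field streams through contiguous memory. A minimal sketch; the particle types and function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Array of structs: a loop that only needs x still pulls y, z and mass into the cache.
struct ParticleAoS { float x, y, z, mass; };

// Struct of arrays: all x values are contiguous.
struct ParticlesSoA { std::vector<float> x, y, z, mass; };

ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    return out;
}

float sum_x(const std::vector<ParticleAoS>& ps) {
    float s = 0.0f;
    for (const ParticleAoS& p : ps) s += p.x;  // strided: one field per record
    return s;
}

float sum_x(const ParticlesSoA& ps) {
    float s = 0.0f;
    for (float v : ps.x) s += v;  // unit stride over a contiguous array
    return s;
}
```

    Both versions compute the same result; the difference is purely in how much unused data passes through the cache, which is the kind of change the compiler is not allowed to make on its own.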

    The problem with place-and-route is that the search space is enormous and automated tools typically use purely deterministic algorithms, whereas humans use a lot more backtracking. A simulated annealing approach, for example, can often do a lot better (check the literature, there are a few research systems that do this).
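    A minimal sketch of simulated annealing applied to placement, assuming a made-up toy model (cells in slots on a small grid, two-pin nets, total Manhattan wirelength as cost). Unlike a greedy loop, it sometimes accepts a worse move, with a probability that shrinks as the "temperature" drops, which lets it climb out of local minima. The cooling schedule, moves per step, and all names are hypothetical choices, not taken from any real tool or research system.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Hypothetical toy model: cell c sits in slot pos[c] of a grid `width` slots wide.
struct Net { int a, b; };

long wirelength(const std::vector<int>& pos, const std::vector<Net>& nets, int width) {
    long total = 0;
    for (const Net& n : nets) {
        total += std::abs(pos[n.a] % width - pos[n.b] % width) +
                 std::abs(pos[n.a] / width - pos[n.b] / width);
    }
    return total;
}

// Accept a swap that worsens cost by delta with probability exp(-delta / t);
// t starts at t_start and is multiplied by `cooling` until it falls below t_end.
std::vector<int> anneal(std::vector<int> pos, const std::vector<Net>& nets, int width,
                        double t_start, double t_end, double cooling, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(pos.size()) - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    long cost = wirelength(pos, nets, width);
    std::vector<int> best = pos;
    long best_cost = cost;
    for (double t = t_start; t > t_end; t *= cooling) {
        for (int move = 0; move < 100; ++move) {  // moves per temperature step
            int c1 = pick(rng), c2 = pick(rng);
            std::swap(pos[c1], pos[c2]);
            long next = wirelength(pos, nets, width);
            long delta = next - cost;
            if (delta <= 0 || coin(rng) < std::exp(-static_cast<double>(delta) / t)) {
                cost = next;
                if (cost < best_cost) { best_cost = cost; best = pos; }
            } else {
                std::swap(pos[c1], pos[c2]);  // rejected: undo the swap
            }
        }
    }
    return best;  // best placement seen, which may beat the final one
}
```

    Returning the best placement seen, rather than the last one, guards against the search wandering uphill right before the temperature bottoms out.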

    However, a similar program by an expert assembly language programmer would have left "good enough" behind, because the assembly language programmer would know how to tweak his code using the most efficient commands and cut out the "fat" by optimizing the loops and flows.

    This is, on a modern architecture, complete bullshit. Whoever is generating the assembly needs to be aware of pipeline behaviour, the latency and dispatch timings of every instruction and equivalences between them. Even if you just compare register allocation and use the same instruction selection, humans typically do significantly worse than even mediocre compilers. Instruction selection is just applying a (very large) set of rules: it's exactly the sort of task that computers do better than humans.

    --
    I am TheRaven on Soylent News
  18. Re:Automation versus human instinct by stevew · · Score: 5, Informative

    Okay, I'm stepping in here because I actually do chip design for a living. The difference between hand-laid-out and machine-generated chips can be as much as 5X in performance. The fact is that physical design isn't the same as compiler writing. It's a harder problem to crack: first, it's a multi-dimensional problem. Next, it has to follow the laws of physics, which are themselves complicated ;-)

    Both processes DO rely on the quality of the input. When my designs don't run fast enough, the likely fix is to go back to the source and fix it there, instead of trying to come up with some fix within placement and routing. The other simple fact is that in timing a physical design, you have to consider EVERY path the logic takes in parallel. There is no such thing as an "innermost" loop of the algorithm that tells you where the performance goes. Finally, once you have a good architecture for timing, the placement of the physical gates dominates the process.
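    Static timing analysis handles "every path in parallel" by propagating worst-case arrival times through the whole gate graph instead of enumerating paths one by one. A minimal sketch, assuming a purely combinational, acyclic graph with nodes numbered in topological order and one fixed delay per edge; real timing tools also model rise/fall transitions, setup/hold checks, and much more. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One timing arc: a signal leaving node `from` reaches node `to` after `delay`
// (gate plus wire delay, in arbitrary units such as picoseconds).
struct Edge { int from, to; double delay; };

// Latest arrival time at any node, i.e. the critical-path delay.
// Assumes every edge has from < to (topological numbering) and that source
// nodes (no fan-in) launch at time 0. Requires num_nodes >= 1.
double critical_path(int num_nodes, std::vector<Edge> edges) {
    // Relaxing edges in order of their source node guarantees a node's
    // arrival time is final before any edge leaving it is processed.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& l, const Edge& r) { return l.from < r.from; });
    std::vector<double> arrival(num_nodes, 0.0);
    for (const Edge& e : edges) {
        arrival[e.to] = std::max(arrival[e.to], arrival[e.from] + e.delay);
    }
    return *std::max_element(arrival.begin(), arrival.end());
}

// Slack against a clock period: negative slack means some path is too slow.
double slack(double clock_period, int num_nodes, const std::vector<Edge>& edges) {
    return clock_period - critical_path(num_nodes, edges);
}
```

    A single pass covers every path, but it also means the slowest path anywhere in the block sets the clock, so there is no single hot spot to optimize the way there is in software.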

    A human, with their common sense, is always going to give better performance than an algorithm. I mentioned a 5X difference between hand-drawn and compiled hardware. That is about what I see on a daily basis between what my tools can do for me and what Intel gets out of their hand-drawn designs for a given technology node.

    --
    Have you compiled your kernel today??
  19. Re:Why assembly ... by swillden · · Score: 4, Informative

    I don't know what that single instruction would be (I am not an assembler expert), or how likely it is that a compiler would recognize it.

    Followup: Just for fun I decided to test it. I compiled the code with -O1 on my handy compiler (g++ 4.6.3) and what it produced was:

    imulq %rdi, %rax
    shrq $32, %rax

    So, two instructions. However, it occurred to me that perhaps the code in question was to be run on a 32-bit processor, and my compiler is compiling for 64 bits. So I changed the problem a bit, to the analogous one on a 64-bit CPU:

    uint64_t((__uint128_t(a) * b) >> 64)

    and what the compiler produced was:

    mulq %rdi

    So, it looks like gcc 4.6.3 does, in fact, recognize how to optimize this particular code. No need for inline assembler here.
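    The excerpt doesn't show the original 32-bit expression, so the 32-bit function below is an assumption reconstructed from the `imulq`/`shrq $32` output; the 64-bit function wraps the expression shown above. Both function names are hypothetical. Note that `__uint128_t` is a GCC/Clang extension rather than standard C++. Compiling with something like `g++ -O1 -S` and reading the generated `.s` file is one way to repeat the experiment; the exact instructions depend on the compiler version and target.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of a 32x32-bit product: widen to 64 bits, multiply, shift down.
std::uint32_t mulhi32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// 64-bit analogue: high 64 bits of a 64x64-bit product.
// __uint128_t is a GCC/Clang extension, not standard C++.
std::uint64_t mulhi64(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<__uint128_t>(a) * b) >> 64);
}
```

    Writing the portable widening form and letting the compiler pick the instruction keeps the code readable, and on targets that have a "multiply high" instruction the optimizer can often map it there without inline assembly.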

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.