Slashdot Mirror


Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?

dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"

64 of 365 comments (clear)

  1. Holy crap by CajunArson · · Score: 5, Insightful

    Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.

    Slashdot: Where *not* to go to get specific advice about specific technical issues.

    --
    AntiFA: An abbreviation for Anti First Amendment.
    1. Re:Holy crap by Megane · · Score: 5, Funny

      And to think, they rejected my Ask Slashdot submission on how to find a cheat code on my bank's web site for unlimited moneys

      Just walk up to any ATM and press: up up down down left right left right B A start.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Holy crap by Joce640k · · Score: 5, Insightful

      Just 'fess up and say "We don't know, we're software people, not hardware people".

      If it's really important they might offer some help.

      --
      No sig today...
    3. Re:Holy crap by Goaway · · Score: 5, Insightful

      This is the only sane answer. They probably only asked to find out if you happened to know.

      Say you don't know, and let them look at the code to figure it out.

    4. Re:Holy crap by Lehk228 · · Score: 2

      it's easier than that. just walk in with a note "PUT ALL OF THE MONEY IN A BAG AND NOBODY GETS HURT"

      might want to put on a fake mustache or a long wig and a stuffed bra if you have a girly face.

      --
      Snowden and Manning are heroes.
    5. Re:Holy crap by Austerity+Empowers · · Score: 5, Informative

      To give a more helpful, unhelpful answer, it's an ill-formed question. "How many gates" depends on the target on which you synthesize the hardware: a PCB, an FPGA, actual silicon (which fab? Which process? whose std cell library? what clock frequency?).

      If somehow the above could be narrowed down by asking the customer, then the next thing I'd advise is contracting someone who can write RTL using an HDL (verilog is most popular). The synthesizeable subset of HDL is tricky to learn for non-HW people, so unless you understand digital logic well I'd suggest finding someone else to do it for you. They can then synthesize it to the targeted device/platform. If you can do this, you should charge quite a lot of money since this form of IP is expensive, and they know it. If they're ok with that, you may also want to have this contractor also write the design verification suite, since this company will certainly want that to integrate into their own testing. Lots of contractors are out there for this due to the cyclic nature of this job, make sure you also have some support feature in place if you need them to fix/update the code later.

      Even simple software algorithms can be very big in HW, but some surpisingly complex SW algorithms are next to 1 liners in HW (like any form of bit masking or bit swizzling is free!). But generally if there are a lot of sequential steps, and those steps are different...it gets big. Also assume that for every 1 SW guy that wrote the code, you will need 1 RTL designer. If you take the verification step, it may be 1-2 verification engineers for 1 RTL, depending on your timeline.

    6. Re:Holy crap by fractoid · · Score: 4, Interesting

      It's also ill-formed (to the point of being almost meaningless) in the sense that the smallest number of gates for a given algorithm is probably going to be to implement some kind of low-end processor which then runs the algorithm as code.

      What they really wanted to ask was "what's the best price/performance option for executing this algorithm, given the following expected parameters and an initial production run size of X".

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    7. Re:Holy crap by Goaway · · Score: 3, Informative

      I work with plenty of people with that kind of degree or higher, and I doubt any of them could. Very few CS educations would teach you that. That is highly specialist knowledge, in an usual field.

      I really don't know why you would ever think that would be a common skill.

    8. Re:Holy crap by Mr+Z · · Score: 2

      I pretty much agree with all of the above, having worked in the biz awhile myself.

      Since this is a graphics algorithm (apparently), the OP might do better to try to state what the computational complexity is in terms of the operations involved for one output, in terms of basic operations such as multiplies and adds, and perhaps how much storage you need.

      Consider this example: If someone came to me and asked me "How much does an 8x8 IDCT cost?" After asking them if it needs bit exactness or not (some standards require it, others don't), I could give them some numbers and some implementation bounds. "The Chen IDCT needs around 11 multiplies and 20 adds per 8-pt IDCT. Multiply that by 16 to get the full cost for an 8x8. (176 multiplies, 320 adds) To meet video precision requirements for an 8x8, the multiplies should be greater than 16 bit precision, and you should carry greater than 16 bits of precision between horizontal and vertical passes."

      How many gates is that? Well, depends on the throughput you require, and the details of the implementation. Given the number of multiplies and adds required, you can work toward a number. Suppose you needed to have enough IDCT bandwidth to update a 1080p 4:2:2 image at 60Hz. So, that's 1920 * 1080 * 2 * 60 = approx 250M pixels/second that you need to produce. In terms of 8x8 blocks, that's a little under 4M blocks/second, with 176 multiplies and 320 adds. So, that's approx 700M multiplies a second and 1.3B adds.

      Still, that's far from enough to get to a gate count. If you put down 1 multiplier and 2 adders and ran it at 1GHz, you'd have more than enough compute throughput. You still need to add some control logic around it (especially if you only put 1 multiplier and 2 adders, because the IDCT's compute pattern is non-trivial), and some memory to store inputs, outputs and intermediate results. A more likely implementation probably has a lot more multipliers and adders in hardware, but also runs at a much slower clock rate.

      So how many gates is that? You need much more information to answer that question, despite the analysis above. You now need to pick an implementation strategy, and more than one makes sense. But, you have a much better idea of the computational cost, and can pick among multiple implementations. For example, if energy efficiency is your goal, you might implement the horizontal and vertical IDCTs in explicitly tuned multiplies and adds tuned to the exact precision necessary and connected exactly as the dataflow requires, and run the whole block at a low clock rate using slower transistors with less leakage. If flexibility is your goal, you might put in a small CPU with enough grunt to fit the computational load. with the idea that you can run other algorithms there if you need to. etc...

  2. How many by Aighearach · · Score: 2, Funny

    beowulf clusters does your algorithm desire?

  3. Verilog by tepples · · Score: 4, Informative

    If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.

    1. Re:Verilog by Anonymous Coward · · Score: 5, Interesting

      if you only need a estimation, use something like bamboo from PandA to convert your C Code to Verilog. Then synthesize this code for a FPGA. In the summery you should find how many logic cells would be used as well as how many digital gates in an asics are necessary. This value is only a estimation, but for your question, this should work.

    2. Re:Verilog by Andy+Dodd · · Score: 3, Interesting

      While there are some compilers that ATTEMPT to convert C/C++ into a hardware representation - These will usually fail unless you understand the target hardware.

      http://www.drdobbs.com/embedded-systems/c-for-fpgas/230800194

      One thing is: Even if you can successfully compile from C to Verilog or VHDL, there is no guarantee that the Verilog or VHDL will successfully synthesize on your target hardware.

      Even if it successfully synthesizes, there is no guarantee that it will be in any way an optimal implementation.

      Some C algorithms may never transfer well into a hardware implementation.

      --
      retrorocket.o not found, launch anyway?
    3. Re:Verilog by ranulf · · Score: 5, Informative

      The number of slices or logic cells or whatever else a particular synthesis program for a particular chip generates doesn't exactly correspond to a number of gates either. For instance, a single 4-in 1-out LUT on a Xilinx can be used for 1 gate or 6.

      I wouldn't have much confidence in automatic C to HDL conversion either. Good HDL design is about understanding the problem in terms of gates and parallelism. FPGAs and ASICs in general aren't particularly good at things that CPUs are good for, and inversely CPUs aren't especially good for things that FPGAs and ASICs can do well.

      The OP shows such a lack of understanding of hardware design that it's not funny! "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things. A more ball park is that an n-bit add can be done with 2n LUTs, in terms of gates it's about 5n gates, but really that depends what gates you have available. A multiplier is massively more, dividing is even more complicated still. Fortunately, many FPGAs come with a few dedicator multipliers... Unless your algorithm requires only as many multipliers as you have available, you're probably best building a state machine and multiplexing a single multiplier unit, in much the same way as a CPU multiplexes the ALU at its core.

      The whole thing is massively dependent on algorithm and experience of the person doing the porting. The best advice is to say "I don't know" or to hire someone who does or suggest them running the algorithm on an embedded CPU.

    4. Re:Verilog by Jane+Q.+Public · · Score: 2, Informative

      "Some C algorithms may never transfer well into a hardware implementation."

      This is a fundamentally silly thing to say.

      Hardware can be made to implement ANY functioning software. It might not be easy, but it is pretty much by definition possible. It's already running on hardware... it would be very rare indeed for it to not be possible to translate it into even more-efficient hardware, since the hardware it's running on now is general-purpose.

    5. Re:Verilog by harrkev · · Score: 5, Informative

      Seriously???? Asking a C++ programmer to begin to use Verilog is simply not practical. There is a VERY STEEP learning curve in trying to target real hardware. There is even a very different frame of mind that has to be learned in order to target gates.

      I speak from experience. I program Verilog and SystemVerilog for a living doing ASIC design.

      Now, to answer the OP:

      The answer is very strongly: it depends. The most optimistic answer is a couple hundred thousand. Implement an 8-bit CPU and write the thing in under 32K of code.

      On the other end of the spectrum is "many billions." Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.

      Of course, these are both ridiculous extremes. Everything depends on the TYPE of operations being done. In a CPU a simple 32-bit multiply can be done with one character ("*"). In gates, if you need the answer in a single clock cycle, it can take an EXTREME amount of logic. However, if you are willing to wait 32 clock cycles for the answer, the amount of logic is reduced to a very manageable level. This is why C++ is a bad choice of input. How time-sensitive is it? Hardware is also very parallel in nature. Different parts of the chip can indeed be working on different things at the same time. You can go for a strictly pipelined architecture where each block does one little bit of the job and passes it off to the next block. High throughput, but lots of gates. Or you could design a general-purpose block and have it to everything slowly (the most extreme example of this approach is a common CPU).

      While I have heard of magic "C to gates" compilers, after almost 15 years in the business, I have never actually seen one. The closest that I have seen are tools that can turn Matlab code into (messy-looking) gates. If your algorithm is DSP in nature, this is a very viable alternative. Otherwise, the only advice that I can give you is to consult somebody who does hardware design for a living (like me).

      Otherwise, you really need to look at where the input comes from, where the output goes, and how fast you need to do the work.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    6. Re:Verilog by SecurityTheatre · · Score: 3, Informative

      "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things.

      Looking at an old reference I have, a 16-bit ripple-carry style adder requires 576 transistors, and a 16-bit carry-lookahead style adder (faster) requires 784 transistors.

      This is not including ANY control circuitry, nor a subtract feature.

      A pure-hardware 16-bit integer DIVIDE is between 15-30 times more complicated. To do it in pure hardware, would require on the order of 23,000 transistors.

      Unless you need your division to happen wicked fast with low latency and you don't care about transistor count, it's better to build add/shift hardware and simply perform a division operation using those bits of hardware repeatedly.

      Also, we're only doing 16-bit. If you need 64-bit, multiple all of those numbers by about 50 (spitballing).

      And converting from C into VHDL is probably not going to be the best way to go about this. Hire a decent hardware engineer.

    7. Re:Verilog by harrkev · · Score: 5, Interesting

      Oh, one more thing about "C to Gates" compilers. In the industry I have not seen one in actual use, but they do supposedly exist. However, they would only work in a limited domain.

      For example, if you have C++ that does simple control or DSP-type stuff, then it might work (cannot vouch for the quality of the results). On the other hand, if you get one of these compilers and try feeding it the source code for the Apache web server or the Quake engine source code, you are completely screwed.

      If your application is, say, a novel type of network filter that inspects and does something to Ethernet packets, you have to figure out how to interface your design with a real Ethernet SerDes .. which is a *LOT* different than opening up something in the "/dev/" directory. If your application is robotics, then you also need to get data into and out of the chip. How exactly is this done? How fast does the logic need to run? Is it speech processing? If so, then this will involve a lot of straight-forward DSP. If you constrain the design to tell it how fast the data needs to flow through, you should be able to get a reasonable estimate. Does your application need a lot of memory? If so, you might need some type of RAM controller. DRAM controllers can be hairy to work with, and you also have to consider latency and throughput.

      In theory, C to gates can work quite well, ***for a limited subset of applications***.

      HOWEVER: as others have pointed out, anybody who needs to know the answer to this question should be qualified to answer it for themselves.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    8. Re:Verilog by MickLinux · · Score: 2

      I am confounded by your claim that a 16-bit hardware divide would take 24000 transistors. If nothing else, you should be able to cascade it into 4 4-bit lookups, and that would handle the job. And that would probably be overkill.

      Using shift-and-add would almost definitely seem to be better, especially since you could cue the operations. Although one 16-bit divide would then take about 120 clocks, 120 divides could take 240 clocks. (Look at me, I say clocks, I should say ops, and then let the clocks be whatever they are, be they quads or quarter clocks).

      Even better, logarithm takes only about twice that -- it's a lookup Shift-and-add, and square root is only about 140 clocks.

      Sure, you could go with the 24000 transistors, but wouldn't that end up being a cost/benefit situation? All that is in the domain of the chip design within constraints.

      Or am I wrong?

      --
      Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
    9. Re: Verilog by harrkev · · Score: 5, Insightful

      I still must disagree. Yes, the syntax is somewhat like C. However, WHAT you are coding is completely different. In particular, things that C and do with a simple "if" statement are not allowed at all in proper gate design. It is not hard to imagine a software guy coding latches all over the place, assigning the same signals from withing different always blocks, etc. Even "always @(posedge clock)" may be a fundamental paradigm shift for a software guy. And not to mention the rather arbitrary way that Verilog treats wire vs. reg.

      wire a = b & c;

      wire a;
      assign a = b & c;

      reg a
      always @(*) a = b & c;

      These three constructs do the same thing. Why is one "wire" and one "reg"?

      What is the difference between the two blocks (they are NOT the same - blocking vs. non-blocking)?

      always @(posedge clk) begin
          a = b;
          c = a & b;
      end

      always @(posedge clk) begin
          a = b;
          c = a & b;
      end

      What about race conditions? Glitches on combinatorial logic? Proper coding of state machines? Need memory? How do you drop in an encrypted 3rd party DDR controller and PHY? Interface with AHB bus? In a given process, how many levels are logic are reasonable for a given clock speed? What exactly are hold violations?

      I am not saying that any of these are insurmountable. What I am saying is that a good digital designer is worth paying for, and a software guy may have a very steep learning curve indeed.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    10. Re: Verilog by harrkev · · Score: 3, Informative

      Gaaa. On the blocking vs. non-blocking, Slashdot swallowed the "less than" sign. Apologies.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    11. Re:Verilog by SecurityTheatre · · Score: 2

      I was misunderstanding my notes.

      You would need several thousand transistors for a standard DIV circuit, and then the CPU would need to iterate through the operation many times in order to perform a division.

      A single-cycle division circuit isn't practical, so it would involve building a state-machine and having the processor stall while doing the DIV calculation. The simple 1-bit circuit I was looking at would require a number of cycles equal to the number of bits input (16, 32, 64, etc), although they can be made faster.

      looking at it, the latency for the Core2Duo chip to do a 64-bit integer DIV up to 87 cycles, and that's a pretty optimized circuit for raw speed.

    12. Re: Verilog by SecurityTheatre · · Score: 2

      I was wondering... Stared at that for too long before deciding something must have happened... :-)

    13. Re:Verilog by kesuki · · Score: 2

      "A multiplier is massively more, dividing is even more complicated still."
      which is why you multiply by .5 to get division by 2. by 3 you need to multiply by .333334 depending on your precision. all possible divisions are a subset of multiplication from .999 infinite repeating to .000near infinite zeros followed by a 1. strange that something so 'easy' is harder than regular multiplication.

    14. Re:Verilog by harrkev · · Score: 3, Interesting

      Actually, that depends on what the 24,000 transistors are doing. Let's assume that you stupidly did a divide using Verilog "/". This implies a one-cycle divide which might well take that many transistors. The problem is that you would not likely be able to get this to work in real life. With so many levels of logic, your timing would be pure crap. Plus you might have fanout and congestion issues that would further limit your timing. So you could get a divide in one clock cycle, but limit yourself to a clock speed of 10 MHz, for example.

      Once you get past about 10 or 12 levels of logic (in my opinion), it is time to re-code, no matter what your clock speed is. If you can't get the job done in 12 level, it is time to re-think your approach. Register re-timing can certainly be useful, but it is much better to do the job right in RTL, the way God intended. Register re-timing can make later steps more complicated (including formal verification).

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    15. Re:Verilog by harrkev · · Score: 2

      That is exactly my point. At one extreme, you could do a job in a a few hundred thousand gates, and at the other extreme you could do a job in a few billion gates. This is sort of an extreme upper-bound and a lower-bound on the size of the solution. Without further details, we have no idea where in the spectrum the real solution lies.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
  4. Why don't they know? by Anonymous Coward · · Score: 5, Insightful

    You'd think the "electronics manufacturer" would have some idea how to estimate this.

    1. Re:Why don't they know? by janeuner · · Score: 5, Insightful

      They do have a way. They asked if it had already been determined.

      The correct response is, "We don't know."

    2. Re:Why don't they know? by Sarten-X · · Score: 3, Funny

      Because they're robots with no AI functionality?

      --
      You do not have a moral or legal right to do absolutely anything you want.
    3. Re:Why don't they know? by Megane · · Score: 3, Informative

      A more accurate car analogy would be GM wanting to build a car using your technology and asking you how many assembly line workers it would take.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  5. Just like any other software project by mbadolato · · Score: 5, Funny

    Make up a number, then when they complain that it was way off, blame it on their management changing scope a hundred times throughout the life of the project!

    1. Re:Just like any other software project by Imrik · · Score: 3, Funny

      Made me think of this.

  6. C to HDL to netlist by Anonymous Coward · · Score: 2, Informative

    As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.

  7. Try Stackoverflow perhaps? by Anonymous Coward · · Score: 5, Insightful

    I think you may have a better chance of getting an answer if you ask this question on Stackoverflow (or one of its related sites).

    Unfortunately, I think asking on Slashdot is only likely to get you some tired and outdated memes / jokes...

    1. Re:Try Stackoverflow perhaps? by greg1104 · · Score: 2

      The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application

      Fixed that for you. Ninety percent of everything is crud.

    2. Re:Try Stackoverflow perhaps? by rasmusbr · · Score: 2

      The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application

      Fixed that for you. Ninety percent of everything is crud.

      Off topic, but it is more interesting than that...

      When are you the most excited about some new idea or concept? When is your impulse to share technical ideas the greatest? Well, usually right after you've learned it, or rather when you think you've learned it but in reality you've only got a half-decent grasp of the idea and you still have have a number of the details completely wrong. The exception to this rule is that some highly skilled and knowledgeable people take pleasure in beating less knowledgeable people in the head with their knowledge. So there you have it: the virtual world of teachers consists of a lot of well-meaning people who don't know what they're talking about and one or two jerks who do.

      Obvious question: What does this tell us about people who like to give sex advice on the internet?

  8. They shouldn't be asking you. by pavon · · Score: 5, Insightful

    If they plan on implementing this in hardware, then they should have people who are capable of answering that question. If instead, they are just a manufacturer and aren't capable of doing the actual hardware design, then you have bigger problems than answering this question. That is something you should find out about ASAP.

  9. Minecraft by nbetcher · · Score: 5, Funny

    Develop out the algorithm in Minecraft using ProjectRed (Integration module, specifically) and then you can easily count the gates! :-)

  10. Completely stupid question by dskoll · · Score: 4, Insightful

    The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"

    There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.

  11. You need a C to VHDL translator by Animats · · Score: 4, Informative

    You need a C to VHDL translator. Here's a tutorial for one.

    Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.

    1. Re:You need a C to VHDL translator by Trepidity · · Score: 4, Informative

      One caveat to going this route: if the algorithm contains well-known operations as building blocks, you probably don't want to synthesize your own VHDL versions of those standard operations, since they already have highly optimized hardware implementations. For example, if one step of the algorithm is "compute an FFT", you probably want to use an existing FFT IP core to implements it, rather than translating some FFT C code to new VHDL.

      At one extreme, where the algorithm is nothing but a chain of such cores (common in DSP applications), you could get a rough estimate just by looking up the gate counts for each operation and adding them up.

  12. VHDL by Anonymous Coward · · Score: 3, Informative

    Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...

  13. Break down your algorithm by neutrino38 · · Score: 4, Informative

    Hello,

    It is probable that you can break down your algorithm -(I do not mean code) into a pipeline of elementary processing and find implementations (IP) for each of them.

    to give out an estimate:
    - subdivise your algorithm into simpler pieces
    - find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
    - do the sum.

      Indeed an hardware designer or consultant would be of a great help here.

  14. HLS by orledrat · · Score: 4, Informative

    What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.

  15. It doesn't work like that by cjonslashdot · · Score: 5, Informative

    It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.

  16. Easy calculation by Anonymous Coward · · Score: 5, Funny

    Here is a proven method for calculation.

    If your code is:
    a) C: divide the number of lines with 7
    b) C++: divide the number of lines with 5
    c) Ruby/Python/Java: divide the number of lines with 3
    d) Perl: multiply the number of lines with 42
    e) C#: resign.

  17. Sounds like a joke by Cryacin · · Score: 4, Funny

    How many Gates will it take to implement your software project?

    One. His name is Bill, and here is yours.

    --
    Science advances one funeral at a time- Max Planck
    1. Re:Sounds like a joke by Alioth · · Score: 2

      It's such a shame that Gates McFadden from ST:TNG didn't marry Bill Gates. Then she could have been Gates Gates.

  18. It's an optimization problem by swm · · Score: 5, Insightful

    You already have your algorithm running in electronic hardware, right?
    Your current gate count is the sum of
      * the gate count of your CPU
      * the gate count of your RAM
      * the gate count of your program ROM

    So that's an upper bound on the gate count.
    If that number is too big for your manufacturing partner,
    then you have an optimization problem.

    Optimization is a hard problem...

  19. Accurate answer by Sarten-X · · Score: 3, Informative

    Write out the truth table for each output as a Karnaugh map incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.

    Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".

    --
    You do not have a moral or legal right to do absolutely anything you want.
    1. Re:Accurate answer by mrego · · Score: 2

      Since they are translating a program/algorithm into circuitry, they need only to know the maximum number of gates that are used at any one cycle time (taking into account necessary time delays), so just adding all the gates per operation way over states the answer since and, not, or, etc. gate circuitry can be reused for different operations at another cycle time. Also, as for logic operations, run it through a Quine-McClusky optimization as well to minimize them.

  20. Gate count more a matter of speed by Yoik · · Score: 2

    It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.

    The problem as stated isn't adequately constrained.

  21. The key to success. by ttucker · · Score: 5, Insightful

    Do not ask a computer scientist to be an electrical engineer.

    1. Re:The key to success. by crankyspice · · Score: 2

      Do not ask a computer scientist to be an electrical engineer.

      Except ... Wow. An early course in my computer science curriculum was:

      201. Computer Logic Design I (3)
      Prerequisite: MATH 113 or equivalent all with a grade of "C" or better.
      Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA).
      (Lecture 2 hours, lab 3 hours) Letter grade only (A-F).

      (We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...

      --
      geek. lawyer.
    2. Re:The key to success. by geoskd · · Score: 3, Insightful

      Except ... Wow. An early course in my computer science curriculum was: 201. Computer Logic Design I (3) Prerequisite: MATH 113 or equivalent all with a grade of "C" or better. Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA). (Lecture 2 hours, lab 3 hours) Letter grade only (A-F). (We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...

      There is a world of difference between an entry level college course on ASIC/FPGA design, and actually being able to do the job. Just because you can design and synthesize a projct with a few hundred gates in it does not mean you are even remotely prepared to know where to begin a project with 10^6+ gates in it. More impotantly, high level software languages allow for indescriminant serial loops which are massively difficult to deal with in pure hardware. In short, the design methodology is completely different if you are trying to build for a software path, or a hardware path. You need someone with a hardware mindset to take your algorithm back to scratch and start over. Even knowing the HDLs is not good enough, as it is relatively trivial to write "valid" VHDL or Verlilog code that cant be synthesized...

      --
      I wish I had a good sig, but all the good ones are copyrighted
  22. Re:Cost estimate by EndlessNameless · · Score: 2

    If they can design the hardware, they can ask for the source and supply the quote themselves.

    If they can't, then OP needs to understand they have no practical design capabilities and plan on paying someone else to design it---before paying these guys to manufacture it. Or he can search for a shop that can handle both the design and the manufacture.

    --

    ---
    According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
  23. I've done this before by Asmodae · · Score: 4, Informative

    There's been several people who suggested using a high-level synthesis tool to convert your software (c/c++) directly to HDL (verilog/VHDL) of some kind. This can work and I've been on this task and seen it's output before. The catch is; unless that software was expressly and purpose written to describe hardware (by someone who understands that hardware and it's limitations and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.

    Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.

    My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.

    Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.

  24. Liar! by nobuddy · · Score: 2

    I just tried this and all my money was transferred to a different account.

  25. Re:Matlab has a solution for this, but $$$ by Asmodae · · Score: 2

    They have a tool that can do this, I don't know if I'd call it a 'solution' just yet though. We've just finished ripping out all the 'solution' for our project because we wanted a device that was actually small enough (and thus cheap enough) to be able to sell.

    It takes input designed to be hardware and makes good hardware. It takes input designed to be software and makes shit hardware. It also doesn't handle version control very well, you need proprietary tools to even VIEW the design files... and the output which actually describes the hardware (vhdl) is so obfuscated as to be nearly illegible. The build times are also 4-5 times longer than they need to be, so it takes a whole day to place and route the designs output by this tool. Unless you're building something trivial I wouldn't advise depending on mathworks/simulink tools for a solution.

  26. We actually solved this.. by rayhoare · · Score: 2

    We (ConcurrentEDA.com) have developed a tool call Concurrent Analtyics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)

  27. Knowing The Algorithm Is NOT Enough by SplawnDarts · · Score: 5, Insightful

    Knowing what algorithm you want to run in hardware in not even close to enough to estimate gates. You need to know the algorithm, and the required performance, and have a sketched out HW design that meets those goals. THEN you can estimate gate count.

    For a simple example of why this is, consider processors. A 386 and a Sandy Bridge i7 implement very similar "algorithms" - it's just fetch->decode->execute->writeback all day long. If you implemented them in software emulation, it would be very similar software with some additional bits for the newer ISA features on the i7. But a 386 is about 280 THOUSAND gates, and the i7 is about 350 MILLION gates/core - three orders of magnitude different. Of course, there's at least a 2 order of magnitude performance difference too - it's not like those gates are going to waste.

    Point is, knowing the algorithm isn't enough to get even a finger in the wind guess at gate count. If you need an answer to this question, you need to get competent HW design people looking at it.

  28. There's politics involved here by cloud.pt · · Score: 2

    I believe the OP is asking the question with an underlying motive that most users aren't grasping - The manufacturer definitely has a way of estimating the gate "cost" from C++, as some experts on the matter have pointed out here, but for that he probably demanded source code, which the OP probably has no safe way of handing over without compromising his Intelectual Property. He doesn't want to lose the business contract or spend money blindly on a consultancy he doesn't even know which's name is, so the question makes FULL SENSE regardless of its child-like semantics.

    You can probably bet the manufacturer is based and/or has legal safe-haven in a dodgy country, along the lines of having properties like:

    1. An established electronics manufacturing industry;
    2. Low respect and legislation for IP and the concept of royalties

    (hint: China) ...This makes the OP think twice about passing source around.

    Now, my personal opinion regarding a possible answer is more business-focused - if such a kind of manufacturer is even remotely interested on your "product" as to ask that, then you have a very marketable piece of code on your hands and you need to do the following...

    1. Find a "safer" buyer - something based on Europe (Germany?), Japan, or maybe the US if location is pinnacle over legislation. This nets you light IP protection
    2. Spend on a good legal advisor to draft a nuke-proof NDA with special clauses like "if we give you the code for estimation of costs, you either buy it or refrain from implementing similar technology for at least N years" (N>10)
    3. Despite all this, you still need an expert on electronic device manufacturing by your side, and I mean full-time. This also ensures you don't get robbed when they don't gain leverage on a final money deal with you by stating "it's too much gates! We can't pay more than XXXXX"
    4. In alternative, find business angels, investors or waste a TON of money and do the hardware YOURSELF, under your own company's umbrella, or maybe some form of partnership. This is the stuff that makes you a millionaire, but also places a lot of risk on your side.
  29. Re:why not go to /. by sgt+scrub · · Score: 2

    It is either good for that or good for picking up girls.

    --
    Having to work for a living is the root of all evil.
  30. Since nobody else here is prividing much help... by Wierdy1024 · · Score: 2

    I shall give it a go.

    First up, most algorithms can't be directly translated to hardware without either changing them or taking a serious performance hit.

    Nearly all widespread algorithms (eg. H264 video) are designed specifically with a hardware implementation in mind, and in fact must usually have elements removed that would produce good results simply because it wouldn't be sensible to implement in hardware.

    In particular, in hardware, loops that iterate an unknown number of times are generally not allowed.

    Steps to make this estimate would probably be to take your code and 'flatten' it (IE. Rewrite it to avoid all use of pointers, except arrays).

    For every variable, figure out how many bits wide it needs to be(IE. What is the smallest and largest possible value). You probably want to convert floating point to fixed point.

    Next, to make a lower bound of how many gates would be used if you were to design for minimal gate use, take every add and subtract operation and call them 15 gates per bit. For every multiply call it 5 gates per input bit squared. Don't do division (division can be done as a multiplication by the inverse of a number).

    For the upper bound, do the same, but multiply by the number of times each loop goes round. That gives you a design with lots more gates but much higher performance.

    For the upper bound finally add on 5 gates for every bit of every variable times the number of lines of your input code. This approximates the d type flip flops for storage in a pipeline. Note that if two lines of code operate on entirely different variables, you can call them the same line as far as this metric goes.

    For the lower bound, if you got a value greater than 10000 plus 16 times the number of bytes that your program is compiled plus the ram it allocates to run, it would be more gate efficient to put in a tiny processor and keep your algorithm in a ROM. (Lots of complex algorithms are implemented this way when space is at a premium).