Slashdot Mirror


Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?

dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"

233 of 365 comments

  1. Holy crap by CajunArson · · Score: 5, Insightful

    Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.

    Slashdot: Where *not* to go to get specific advice about specific technical issues.

    --
    AntiFA: An abbreviation for Anti First Amendment.
    1. Re:Holy crap by Megane · · Score: 5, Funny

      And to think, they rejected my Ask Slashdot submission on how to find a cheat code on my bank's web site for unlimited moneys

      Just walk up to any ATM and press: up up down down left right left right B A start.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    2. Re:Holy crap by Joce640k · · Score: 5, Insightful

      Just 'fess up and say "We don't know, we're software people, not hardware people".

      If it's really important they might offer some help.

      --
      No sig today...
    3. Re:Holy crap by fisted · · Score: 1

      Doesn't sound to me like they were going to implement it themselves. But then again, you are frist post so you presumably didn't read TFS properly, in order to get your awsm frist post.

    4. Re:Holy crap by Goaway · · Score: 5, Insightful

      This is the only sane answer. They probably only asked to find out if you happened to know.

      Say you don't know, and let them look at the code to figure it out.

    5. Re:Holy crap by Anonymous Coward · · Score: 1

      Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.

      Slashdot: Where *not* to go to get specific advice about specific technical issues.

      But that's not what the customer wants. They want to pay them money for their algorithm so they can put it on hardware. This isn't the response you give to a customer who is asking for information because they want to potentially pay for your algorithm.

    6. Re:Holy crap by Anonymous Coward · · Score: 1

      This isn't even "please do my job for me", this is "this guy we're working with wants me to do his job. Please do his job for me." The hardware guy is asking the software guys to do hardware work.

    7. Re:Holy crap by Lehk228 · · Score: 2

      it's easier than that. just walk in with a note "PUT ALL OF THE MONEY IN A BAG AND NOBODY GETS HURT"

      might want to put on a fake mustache or a long wig and a stuffed bra if you have a girly face.

      --
      Snowden and Manning are heroes.
    8. Re:Holy crap by Lehk228 · · Score: 1

      it's perfectly thought out. it's a psychological trick, all any witness will describe is an ugly girl with a gross mustache.

      the same can be done with a ridiculous outfit and face paint but that will draw attention once you leave too.

      --
      Snowden and Manning are heroes.
    9. Re:Holy crap by ozmanjusri · · Score: 1

      Mustache and a girly face?

      http://www.mtv.com/news/articles/1715521/justin-bieber-believe-movie-stache-clip.jhtml

      Could work. It'd make most people look away in disgust...

      --
      "I've got more toys than Teruhisa Kitahara."
    10. Re:Holy crap by kevingnet · · Score: 1

      Just ask them how many programmers does it take to change a light bulb. He'll understand.

    11. Re:Holy crap by Austerity+Empowers · · Score: 5, Informative

      To give a more helpful, unhelpful answer, it's an ill-formed question. "How many gates" depends on the target on which you synthesize the hardware: a PCB, an FPGA, actual silicon (which fab? Which process? whose std cell library? what clock frequency?).

      If somehow the above could be narrowed down by asking the customer, then the next thing I'd advise is contracting someone who can write RTL using an HDL (Verilog is most popular). The synthesizable subset of HDL is tricky to learn for non-HW people, so unless you understand digital logic well I'd suggest finding someone else to do it for you. They can then synthesize it to the targeted device/platform. If you can do this, you should charge quite a lot of money since this form of IP is expensive, and they know it. If they're ok with that, you may also want to have this contractor write the design verification suite, since this company will certainly want that to integrate into their own testing. Lots of contractors are out there for this due to the cyclic nature of this job; make sure you also have some support feature in place if you need them to fix/update the code later.

      Even simple software algorithms can be very big in HW, but some surprisingly complex SW algorithms are next to 1 liners in HW (like any form of bit masking or bit swizzling is free!). But generally if there are a lot of sequential steps, and those steps are different...it gets big. Also assume that for every 1 SW guy that wrote the code, you will need 1 RTL designer. If you take the verification step, it may be 1-2 verification engineers for 1 RTL, depending on your timeline.

    12. Re:Holy crap by gargleblast · · Score: 1

      Mustache and a girly face?

      It was mustache or a girly face. Logical operator precedence.

    13. Re:Holy crap by Anne+Thwacks · · Score: 1

      In the UK reading the GP post probably qualifies you as "having information likely to be of use to a terrorist".

      --
      Sent from my ASR33 using ASCII
    14. Re:Holy crap by fractoid · · Score: 4, Interesting

      It's also ill-formed (to the point of being almost meaningless) in the sense that the smallest number of gates for a given algorithm is probably going to be to implement some kind of low-end processor which then runs the algorithm as code.

      What they really wanted to ask was "what's the best price/performance option for executing this algorithm, given the following expected parameters and an initial production run size of X".

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    15. Re:Holy crap by Goaway · · Score: 3, Informative

      I work with plenty of people with that kind of degree or higher, and I doubt any of them could. Very few CS educations would teach you that. That is highly specialist knowledge, in an unusual field.

      I really don't know why you would ever think that would be a common skill.

    16. Re:Holy crap by Kagetsuki · · Score: 1

      Absolutely agreed. Just the fact the author didn't mention anything about an FPGA surprised me. I imagine the chip manufacturer must have been taken aback when they couldn't even give him a ballpark range.

      dryriver, you are doing it wrong. I question your motives for this if you haven't done it on a GPU or DSP as mentioned above, and compared your current base implementation to that. If you are so convinced this absolutely needs to be done in hardware, start looking for someone who knows what they are doing. I've only done enough FPGA development to know it's something that takes experience to do well and quite a bit of knowledge just to set up properly. Verilog and the like may look simple but consider how much time you spend valgrinding - you'll be doing that in hardware using a language which does not compile to something in any way you are used to, with no real conception or grasp of what to do to make things run better or even how to gauge performance. Save money by saving time by hiring a pro.

    17. Re:Holy crap by Mr+Z · · Score: 2

      I pretty much agree with all of the above, having worked in the biz awhile myself.

      Since this is a graphics algorithm (apparently), the OP might do better to try to state what the computational complexity is in terms of the operations involved for one output, in terms of basic operations such as multiplies and adds, and perhaps how much storage you need.

      Consider this example: If someone came to me and asked me "How much does an 8x8 IDCT cost?" After asking them if it needs bit exactness or not (some standards require it, others don't), I could give them some numbers and some implementation bounds. "The Chen IDCT needs around 11 multiplies and 20 adds per 8-pt IDCT. Multiply that by 16 to get the full cost for an 8x8. (176 multiplies, 320 adds) To meet video precision requirements for an 8x8, the multiplies should be greater than 16 bit precision, and you should carry greater than 16 bits of precision between horizontal and vertical passes."

      How many gates is that? Well, depends on the throughput you require, and the details of the implementation. Given the number of multiplies and adds required, you can work toward a number. Suppose you needed to have enough IDCT bandwidth to update a 1080p 4:2:2 image at 60Hz. So, that's 1920 * 1080 * 2 * 60 = approx 250M pixels/second that you need to produce. In terms of 8x8 blocks, that's a little under 4M blocks/second, with 176 multiplies and 320 adds. So, that's approx 700M multiplies a second and 1.3B adds.
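
      The arithmetic in the paragraph above can be replayed as a short Python sketch. The variable names are illustrative; the figures are the ones quoted in the post, plus the assumption that each 8x8 block covers 64 samples.

```python
# Replays the 1080p 4:2:2 @ 60 Hz estimate quoted in the post.
WIDTH, HEIGHT = 1920, 1080
SAMPLES_PER_PIXEL = 2        # the "* 2" factor used in the post for 4:2:2
FRAMES_PER_SECOND = 60
SAMPLES_PER_BLOCK = 8 * 8    # assumption: one 8x8 IDCT covers 64 samples
MULTIPLIES_PER_BLOCK = 176   # 11 multiplies x 16 eight-point IDCTs
ADDS_PER_BLOCK = 320         # 20 adds x 16 eight-point IDCTs

samples_per_second = WIDTH * HEIGHT * SAMPLES_PER_PIXEL * FRAMES_PER_SECOND
blocks_per_second = samples_per_second // SAMPLES_PER_BLOCK
multiplies_per_second = blocks_per_second * MULTIPLIES_PER_BLOCK
adds_per_second = blocks_per_second * ADDS_PER_BLOCK

print(f"{samples_per_second / 1e6:.0f}M samples/s")
print(f"{blocks_per_second / 1e6:.2f}M blocks/s")
print(f"{multiplies_per_second / 1e6:.0f}M multiplies/s")
print(f"{adds_per_second / 1e9:.2f}B adds/s")
```

      The exact values land slightly under the rounded figures in the post, which is all a sizing exercise like this needs.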

      Still, that's far from enough to get to a gate count. If you put down 1 multiplier and 2 adders and ran it at 1GHz, you'd have more than enough compute throughput. You still need to add some control logic around it (especially if you only put 1 multiplier and 2 adders, because the IDCT's compute pattern is non-trivial), and some memory to store inputs, outputs and intermediate results. A more likely implementation probably has a lot more multipliers and adders in hardware, but also runs at a much slower clock rate.

      So how many gates is that? You need much more information to answer that question, despite the analysis above. You now need to pick an implementation strategy, and more than one makes sense. But, you have a much better idea of the computational cost, and can pick among multiple implementations. For example, if energy efficiency is your goal, you might implement the horizontal and vertical IDCTs in explicitly tuned multiplies and adds tuned to the exact precision necessary and connected exactly as the dataflow requires, and run the whole block at a low clock rate using slower transistors with less leakage. If flexibility is your goal, you might put in a small CPU with enough grunt to fit the computational load, with the idea that you can run other algorithms there if you need to. etc...

    18. Re:Holy crap by DarwinSurvivor · · Score: 1

      You can't patent an algorithm. At least you're not supposed to be able to...

    19. Re:Holy crap by fatphil · · Score: 1

      That, or having a London A-Z.

      --
      Also FatPhil on SoylentNews, id 863
    20. Re:Holy crap by meustrus · · Score: 1

      With my 4-year CS degree I could tell you the basic idea, and I could recognize software that did it, but it would take a month for me to implement something myself. So here's my stab at the problem.

      The crux of the issue is to reduce the software to specific operations for which you know how many gates are needed. To get a rough idea, I'd look at the compiled bytecode. There might then be an existing table of how many gates are needed to implement each operation in the resulting bytecode, or even more likely a number of transistors. But if not, that's where it would take a month of doing rough logical analysis to put together such a table. Then you add it up and get your result, which is kind of "it shouldn't take more than X many gates".
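
      A minimal sketch of that add-it-up approach, in Python. The per-operation costs are placeholders chosen for illustration (five 2-input gates per full adder, a plain array multiplier), not figures from any cell library, and the estimate assumes every operation gets its own dedicated hardware, so at best it bounds a fully unrolled design.

```python
def adder_gates(bits):
    # Placeholder cost: ripple-carry adder, five 2-input gates per full adder.
    return 5 * bits


def multiplier_gates(bits):
    # Placeholder cost: array multiplier with bits*bits AND gates for the
    # partial products plus roughly bits*(bits-1) full adders to sum them.
    return bits * bits + 5 * bits * (bits - 1)


# Hypothetical cost table; a real one would come from the target library.
GATE_COST = {"add": adder_gates, "mul": multiplier_gates}


def estimate_gates(op_counts, bits):
    """Sum the placeholder gate costs over a table of operation counts."""
    return sum(GATE_COST[op](bits) * count for op, count in op_counts.items())


# Example: an algorithm with two adds and one multiply on 16-bit values.
rough_upper_bound = estimate_gates({"add": 2, "mul": 1}, bits=16)
print(rough_upper_bound)
```

      The number says nothing about control logic, memory, or sharing one multiplier across many operations, which is why other replies in this thread call the raw question ill-formed.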

      But then somebody has to actually transform the program into transistors so maybe you should just hire somebody that can do that. If you have the hardware design, it's trivial to tell someone how many transistors/gates are in it.

      --
      I sometimes ask revealing, often ignorant-seeming questions. Maybe they're harder to answer than you think.
    21. Re:Holy crap by bluefoxlucid · · Score: 1

      I got an associate's degree in computer networking because I learned to configure CISCO routers. The way to handle this is to define your algorithm as a set of discrete logic and arithmetic actions (arithmetic actions can be represented as half-adders and such), and then count the number of decisions and do some on-paper optimizations. Then you know how many gates you need, roughly.

      Then again, I have the inherent ability to simulate the entire universe in my head on the cosmic or subatomic level so...

    22. Re:Holy crap by Goaway · · Score: 1

      This is quite many levels beyond little k-maps.

    23. Re:Holy crap by jeremyp · · Score: 1

      Lileth? The PDP-11 was a hardware Fortran machine

      No it wasn't.

      and C was its assembler!

      No it wasn't, its assembler was Macro 11 which doesn't look anything like Fortran or C.

      --
      All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    24. Re:Holy crap by gregor-e · · Score: 1

      No, the easy-peasy software developer estimation is to buy a bunch of progressively smaller CPUs, port your algorithm to each of them, and find the smallest CPU on which your algorithm still provides acceptable throughput. Then quote the number of gates on that CPU. If your algorithm still runs acceptably on a 4004, you can tell them it'll take about 2,300 transistors.

    25. Re:Holy crap by stevesliva · · Score: 1

      K-maps can't develop stateful logic?!? Inconceivable.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    26. Re:Holy crap by Teancum · · Score: 1

      I think he was talking about Elementary Education majors. They know how to count lots of gates.... white gates, black gates, picket fence gates, and other kinds of gates. They will even show you how to put that on a proper number line to count to more than the number of fingers on both hands.

    27. Re:Holy crap by Some_Llama · · Score: 1

      "Then again, I have the inherent ability to simulate the entire universe in my head on the cosmic or subatomic level so... "

      i thought this was a common ability?

  2. How many by Aighearach · · Score: 2, Funny

    beowulf clusters does your algorithm desire?

  3. Verilog by tepples · · Score: 4, Informative

    If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.

    1. Re:Verilog by Anonymous Coward · · Score: 5, Interesting

      If you only need an estimate, use something like Bambu from PandA to convert your C code to Verilog. Then synthesize this code for an FPGA. In the summary you should find how many logic cells would be used as well as how many digital gates would be necessary in an ASIC. This value is only an estimate, but for your question, this should work.

    2. Re:Verilog by Andy+Dodd · · Score: 3, Interesting

      While there are some compilers that ATTEMPT to convert C/C++ into a hardware representation - These will usually fail unless you understand the target hardware.

      http://www.drdobbs.com/embedded-systems/c-for-fpgas/230800194

      One thing is: Even if you can successfully compile from C to Verilog or VHDL, there is no guarantee that the Verilog or VHDL will successfully synthesize on your target hardware.

      Even if it successfully synthesizes, there is no guarantee that it will be in any way an optimal implementation.

      Some C algorithms may never transfer well into a hardware implementation.

      --
      retrorocket.o not found, launch anyway?
    3. Re:Verilog by ranulf · · Score: 5, Informative

      The number of slices or logic cells or whatever else a particular synthesis program for a particular chip generates doesn't exactly correspond to a number of gates either. For instance, a single 4-in 1-out LUT on a Xilinx can be used for 1 gate or 6.
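
      One way to see why LUT counts and gate counts diverge: a 4-input LUT is just a 16-entry truth table, so any function of four inputs costs one LUT no matter how many gates it would take to draw. The Python sketch below models that; the helper names and the bit ordering of the table are assumptions for illustration, not a vendor format.

```python
from itertools import product


def lut_init(func):
    """Return the 16-bit truth table of a 4-input boolean function."""
    init = 0
    # product() varies the last element fastest, so `a` is bit 0 of the index.
    for index, (d, c, b, a) in enumerate(product((0, 1), repeat=4)):
        if func(a, b, c, d):
            init |= 1 << index
    return init


def lut_read(init, a, b, c, d):
    """Look up one output bit, as the LUT hardware would."""
    return (init >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


# A single AND gate and a five-gate expression each fit in one LUT.
one_gate = lut_init(lambda a, b, c, d: a & b)
several_gates = lut_init(lambda a, b, c, d: ((a & b) | (c & d)) ^ (a | d))

print(f"{one_gate:#06x} {several_gates:#06x}")
```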

      I wouldn't have much confidence in automatic C to HDL conversion either. Good HDL design is about understanding the problem in terms of gates and parallelism. FPGAs and ASICs in general aren't particularly good at things that CPUs are good for, and inversely CPUs aren't especially good for things that FPGAs and ASICs can do well.

      The OP shows such a lack of understanding of hardware design that it's not funny! "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things. A better ballpark is that an n-bit add can be done with 2n LUTs, in terms of gates it's about 5n gates, but really that depends what gates you have available. A multiplier is massively more, dividing is even more complicated still. Fortunately, many FPGAs come with a few dedicated multipliers... Unless your algorithm requires only as many multipliers as you have available, you're probably best building a state machine and multiplexing a single multiplier unit, in much the same way as a CPU multiplexes the ALU at its core.
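
      The "about 5n gates" figure matches the textbook ripple-carry adder, where each full adder is two XORs, two ANDs and an OR. The Python sketch below simulates one at the gate level and counts the gates it instantiates; it assumes only 2-input gates are available and is not how a synthesis tool would actually map an adder.

```python
def full_adder(a, b, carry_in, gate_count):
    """One-bit full adder built from five 2-input gates."""
    propagate = a ^ b                  # XOR 1
    total = propagate ^ carry_in       # XOR 2
    generate = a & b                   # AND 1
    carried = propagate & carry_in     # AND 2
    carry_out = generate | carried     # OR
    gate_count[0] += 5
    return total, carry_out


def ripple_add(x, y, bits):
    """Add two unsigned values bit by bit; return (sum, carry_out, gates)."""
    gate_count = [0]
    carry = 0
    result = 0
    for i in range(bits):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry, gate_count)
        result |= bit << i
    return result, carry, gate_count[0]


print(ripple_add(40000, 30000, 16))
```

      For 16 bits this instantiates 80 gates, and the carry still has to ripple through all 16 stages, which is the speed cost the faster adder styles pay extra gates to avoid.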

      The whole thing is massively dependent on algorithm and experience of the person doing the porting. The best advice is to say "I don't know" or to hire someone who does or suggest them running the algorithm on an embedded CPU.

    4. Re:Verilog by bob_super · · Score: 1

      This.

      But there has been recent progress, and Xilinx is pushing hard to get people to compile C to gates with their Vivado HLS (guess the targets?).
      Worth having a look at, since you usually can get a 30-day eval license for FPGA tools.

    5. Re:Verilog by Jane+Q.+Public · · Score: 2, Informative

      "Some C algorithms may never transfer well into a hardware implementation."

      This is a fundamentally silly thing to say.

      Hardware can be made to implement ANY functioning software. It might not be easy, but it is pretty much by definition possible. It's already running on hardware... it would be very rare indeed for it to not be possible to translate it into even more-efficient hardware, since the hardware it's running on now is general-purpose.

    6. Re:Verilog by Asmodae · · Score: 1

      Yep. Although sub-optimal is like the understatement of the year. I've seen not just inefficient but inefficient by an order of magnitude at times.

    7. Re:Verilog by harrkev · · Score: 5, Informative

      Seriously???? Asking a C++ programmer to begin to use Verilog is simply not practical. There is a VERY STEEP learning curve in trying to target real hardware. There is even a very different frame of mind that has to be learned in order to target gates.

      I speak from experience. I program Verilog and SystemVerilog for a living doing ASIC design.

      Now, to answer the OP:

      The answer is very strongly: it depends. The most optimistic answer is a couple hundred thousand. Implement an 8-bit CPU and write the thing in under 32K of code.

      On the other end of the spectrum is "many billions." Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.

      Of course, these are both ridiculous extremes. Everything depends on the TYPE of operations being done. In a CPU a simple 32-bit multiply can be done with one character ("*"). In gates, if you need the answer in a single clock cycle, it can take an EXTREME amount of logic. However, if you are willing to wait 32 clock cycles for the answer, the amount of logic is reduced to a very manageable level. This is why C++ is a bad choice of input. How time-sensitive is it? Hardware is also very parallel in nature. Different parts of the chip can indeed be working on different things at the same time. You can go for a strictly pipelined architecture where each block does one little bit of the job and passes it off to the next block. High throughput, but lots of gates. Or you could design a general-purpose block and have it do everything slowly (the most extreme example of this approach is a common CPU).
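
      The "wait 32 clock cycles" option corresponds to a shift-and-add multiplier: one adder and two shift registers, reused once per bit of the multiplier. A minimal Python sketch of that loop, with each iteration standing in for one clock cycle (the function name and cycle counter are illustrative):

```python
def shift_add_multiply(a, b, bits=32):
    """Multiply two unsigned values with one add per simulated clock cycle."""
    product = 0
    cycles = 0
    for _ in range(bits):
        if b & 1:          # low bit of the multiplier selects an add
            product += a   # the single shared adder
        a <<= 1            # shift the multiplicand left
        b >>= 1            # shift the multiplier right
        cycles += 1
    return product, cycles


print(shift_add_multiply(123456, 654321))
```

      The single-cycle alternative effectively unrolls this loop into 32 adder stages side by side, which is where the extra logic goes.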

      While I have heard of magic "C to gates" compilers, after almost 15 years in the business, I have never actually seen one. The closest that I have seen are tools that can turn Matlab code into (messy-looking) gates. If your algorithm is DSP in nature, this is a very viable alternative. Otherwise, the only advice that I can give you is to consult somebody who does hardware design for a living (like me).

      Otherwise, you really need to look at where the input comes from, where the output goes, and how fast you need to do the work.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    8. Re:Verilog by SecurityTheatre · · Score: 3, Informative

      "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things.

      Looking at an old reference I have, a 16-bit ripple-carry style adder requires 576 transistors, and a 16-bit carry-lookahead style adder (faster) requires 784 transistors.

      This is not including ANY control circuitry, nor a subtract feature.

      A pure-hardware 16-bit integer DIVIDE is between 15-30 times more complicated. To do it in pure hardware, would require on the order of 23,000 transistors.

      Unless you need your division to happen wicked fast with low latency and you don't care about transistor count, it's better to build add/shift hardware and simply perform a division operation using those bits of hardware repeatedly.

      Also, we're only doing 16-bit. If you need 64-bit, multiply all of those numbers by about 50 (spitballing).

      And converting from C into VHDL is probably not going to be the best way to go about this. Hire a decent hardware engineer.

    9. Re:Verilog by xvan · · Score: 1

      Yes, but the best technology is currently applied to CPUs, not FPGAs or ASICs. So for certain sequential algorithms, the size of the pipeline needed would be too big to beat a processor's real speed.

    10. Re:Verilog by Darinbob · · Score: 1

      Ya, FPGA is a good start, but you often need experts to redesign the algorithm for hardware. I.e., you will be able to do much more parallelism than in software (fine and coarse grained, maybe pipelined dataflow, vector operations, etc). Software as an algorithm usually has very little parallelism unless using a language intended to show the parallelism.

      Maybe consider if part of the algorithm can be better done with a DSP chip as well.

      As for how many gates, well as many gates as it takes to have an 8 bit CPU is one answer plus the gates to hold the memory of the algorithm. It won't be fast that way but it certainly is enough. Since it's not an acceptable answer I suspect, this implies that the question of "how many gates?" is the wrong question to be asking.

    11. Re:Verilog by Asmodae · · Score: 1

      He didn't say "may not transfer at all", he said "may not transfer well". Also remember that an algorithm isn't just running on any old bit of hardware it's running on a modern CPU with lots of special instructions with a gigantic RAM attached to it and potentially some other peripherals for special functions. Hardware RNG, etc. It might very well not be reasonable to try to convert all this to a custom FPGA/ASIC for the cost involved.

    12. Re:Verilog by fisted · · Score: 1

      It really isn't feasible for even moderately complex systems.
      Or you seem to be ignoring that most 'hardware' does pretty much nothing without .... software (i.e. firmware).

    13. Re:Verilog by harrkev · · Score: 5, Interesting

      Oh, one more thing about "C to Gates" compilers. In the industry I have not seen one in actual use, but they do supposedly exist. However, they would only work in a limited domain.

      For example, if you have C++ that does simple control or DSP-type stuff, then it might work (cannot vouch for the quality of the results). On the other hand, if you get one of these compilers and try feeding it the source code for the Apache web server or the Quake engine source code, you are completely screwed.

      If your application is, say, a novel type of network filter that inspects and does something to Ethernet packets, you have to figure out how to interface your design with a real Ethernet SerDes .. which is a *LOT* different than opening up something in the "/dev/" directory. If your application is robotics, then you also need to get data into and out of the chip. How exactly is this done? How fast does the logic need to run? Is it speech processing? If so, then this will involve a lot of straight-forward DSP. If you constrain the design to tell it how fast the data needs to flow through, you should be able to get a reasonable estimate. Does your application need a lot of memory? If so, you might need some type of RAM controller. DRAM controllers can be hairy to work with, and you also have to consider latency and throughput.

      In theory, C to gates can work quite well, ***for a limited subset of applications***.

      HOWEVER: as others have pointed out, anybody who needs to know the answer to this question should be qualified to answer it for themselves.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    14. Re: Verilog by Scowler · · Score: 1

      Verilog syntax was designed specifically to make it similar to C syntax, so I have to partially disagree with you on that note. A lot of software engineers do understand basics of system design, as well as some basics of parallel processing. There is indeed a learning curve on Verilog, but I'd say the vast majority of it is learning how to create effective test benches, not writing the system logic itself.

    15. Re:Verilog by MickLinux · · Score: 2

      I am confounded by your claim that a 16-bit hardware divide would take 24000 transistors. If nothing else, you should be able to cascade it into 4 4-bit lookups, and that would handle the job. And that would probably be overkill.

      Using shift-and-add would almost definitely seem to be better, especially since you could cue the operations. Although one 16-bit divide would then take about 120 clocks, 120 divides could take 240 clocks. (Look at me, I say clocks, I should say ops, and then let the clocks be whatever they are, be they quads or quarter clocks).
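
      The "120 divides in about 240 clocks" figure is what a fully pipelined unit would give: the first result appears after the full latency and every later one follows a clock behind. A small Python sketch of that bookkeeping, assuming the unit accepts one new operation per clock (an assumption, not something the post states):

```python
def sequential_cycles(latency, operations):
    """Each operation waits for the previous one to finish completely."""
    return latency * operations


def pipelined_cycles(latency, operations):
    """One new operation enters per clock; results drain one per clock."""
    if operations == 0:
        return 0
    return latency + (operations - 1)


print(sequential_cycles(120, 120), pipelined_cycles(120, 120))
```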

      Even better, logarithm takes only about twice that -- it's a lookup Shift-and-add, and square root is only about 140 clocks.

      Sure, you could go with the 24000 transistors, but wouldn't that end up being a cost/benefit situation? All that is in the domain of the chip design within constraints.

      Or am I wrong?

      --
      Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
    16. Re: Verilog by harrkev · · Score: 5, Insightful

      I still must disagree. Yes, the syntax is somewhat like C. However, WHAT you are coding is completely different. In particular, things that C can do with a simple "if" statement are not allowed at all in proper gate design. It is not hard to imagine a software guy coding latches all over the place, assigning the same signals from within different always blocks, etc. Even "always @(posedge clock)" may be a fundamental paradigm shift for a software guy. And not to mention the rather arbitrary way that Verilog treats wire vs. reg.

      wire a = b & c;

      wire a;
      assign a = b & c;

      reg a;
      always @(*) a = b & c;

      These three constructs do the same thing. Why is one "wire" and one "reg"?

      What is the difference between the two blocks (they are NOT the same - blocking vs. non-blocking)?

      always @(posedge clk) begin
          a = b;
          c = a & b;
      end

      always @(posedge clk) begin
          a <= b;
          c <= a & b;
      end
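
      For readers coming from software, the blocking (`=`) versus non-blocking (`<=`) difference can be mimicked in Python. This is only a model of the semantics for one clock edge, with hypothetical function names: blocking assignments take effect immediately, while non-blocking ones all read the old values and update together.

```python
def blocking_edge(state):
    """a = b; c = a & b;  -> c sees the NEW value of a."""
    s = dict(state)
    s["a"] = s["b"]
    s["c"] = s["a"] & s["b"]
    return s


def nonblocking_edge(state):
    """a <= b; c <= a & b;  -> c sees the OLD value of a."""
    new_a = state["b"]
    new_c = state["a"] & state["b"]   # right-hand sides sampled before any update
    return {**state, "a": new_a, "c": new_c}


start = {"a": 0, "b": 1, "c": 0}
after_blocking = blocking_edge(start)
after_nonblocking = nonblocking_edge(start)
print(after_blocking, after_nonblocking)
```

      Starting from a=0, b=1, the blocking version ends the edge with c=1 while the non-blocking version leaves c=0 until the following edge, i.e. it behaves like two flip-flops in a row.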

      What about race conditions? Glitches on combinatorial logic? Proper coding of state machines? Need memory? How do you drop in an encrypted 3rd party DDR controller and PHY? Interface with AHB bus? In a given process, how many levels of logic are reasonable for a given clock speed? What exactly are hold violations?

      I am not saying that any of these are insurmountable. What I am saying is that a good digital designer is worth paying for, and a software guy may have a very steep learning curve indeed.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    17. Re: Verilog by harrkev · · Score: 3, Informative

      Gaaa. On the blocking vs. non-blocking, Slashdot swallowed the "less than" sign. Apologies.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    18. Re:Verilog by SecurityTheatre · · Score: 2

      I was misunderstanding my notes.

      You would need several thousand transistors for a standard DIV circuit, and then the CPU would need to iterate through the operation many times in order to perform a division.

      A single-cycle division circuit isn't practical, so it would involve building a state-machine and having the processor stall while doing the DIV calculation. The simple 1-bit circuit I was looking at would require a number of cycles equal to the number of bits input (16, 32, 64, etc), although they can be made faster.
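
      The one-bit-per-cycle circuit described here behaves like classic shift-and-subtract (restoring-style) division. A minimal Python sketch, with each loop iteration standing in for one clock cycle of the state machine; the names are illustrative and it handles unsigned values only:

```python
def shift_subtract_divide(dividend, divisor, bits=16):
    """Unsigned division producing one quotient bit per simulated cycle."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    cycles = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:   # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles


print(shift_subtract_divide(50000, 7))
```

      The hardware needed is roughly one subtractor, a comparator and a few shift registers, which is why the cycle count, not the gate count, grows with the operand width.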

      Looking at it, the latency for the Core2Duo chip to do a 64-bit integer DIV is up to 87 cycles, and that's a pretty optimized circuit for raw speed.

    19. Re:Verilog by harrkev · · Score: 1

      I think that he was talking about doing a divide the dummy way: just use the "/" character and let the compiler do it in one clock cycle. Yes, you can do divide in a LOT fewer transistors, but you have to be smart about it, and wait a few extra clock cycles.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    20. Re:Verilog by SecurityTheatre · · Score: 1

      Yeah, I misunderstood my notes. See below :-) I last did any hardware design 15 years ago. hah

    21. Re: Verilog by SecurityTheatre · · Score: 2

      I was wondering... Stared at that for too long before deciding something must have happened... :-)

    22. Re:Verilog by InvalidError · · Score: 1

      HDL is not that steep of a learning curve for people who have no problem thinking in parallel instead of serial. Personally, I have an easier time writing VHDL code than C/C++ and a large part of this is because writing HDL requires much more clearly defined goals than high-level languages.

      The biggest problem with translating C-code to gates is that even if there were "magic compilers" to do this automatically, the number of gates will vary drastically depending on how much loop unrolling, pipelining and other parallelism the algorithm allows, the available gate budget, the actual performance goals, the ASIC process itself with its primitives library, etc. It is extremely unlikely that automatic tools will manage to achieve a good balance between all factors without substantial guidance and if you are going to set compiler hints all over the C-code to tell the HDL translator how you want it to unroll loops and exactly how you want stuff to get pipelined, it may end up cheaper, faster, cleaner and far more efficient to simply ask an HDL programmer to port the algorithm. There is not much point in using a "magic compiler" if you still need HDL/FPGA/ASIC specialists to go through the whole algorithm and re-think it from a hardware point of view to put the relevant hints in the code to assist the compiler in producing at least somewhat sensible code... might as well pay them for a proper port since this is pretty much what those guys need to have in mind to put meaningful hints in the original code.

      As you said, it has been the holy grail of some software developers for a while and I have a hard time imagining a successful "magic compiler" any time soon.

    23. Re:Verilog by eric31415927 · · Score: 1

      Previous poster had the following signature:
      MR ASICs. MR not. SAR CDEDBD transistors? YLB. MR ASICs

      YLB??
      This should be replaced with Li'l B.

      Mr Ducks; Mr Knott; Czar; C. M. Wings; Li'l B.; Mr Ducks

    24. Re:Verilog by kesuki · · Score: 2

      "A multiplier is massively more, dividing is even more complicated still."
      which is why you multiply by .5 to get division by 2. by 3 you need to multiply by .333334 depending on your precision. all possible divisions are a subset of multiplication from .999 infinite repeating to .000near infinite zeros followed by a 1. strange that something so 'easy' is harder than regular multiplication.

    25. Re:Verilog by Bing+Tsher+E · · Score: 1

      Hardware can be made to implement ANY functioning software.

      Sure it can. Even if it involves huge, huge diode arrays and many pounds of solder.

    26. Re:Verilog by Bing+Tsher+E · · Score: 1

      I replaced a microcontroller with a dual op-amp and some passives in a design when they told me the OTP microcontroller was too expensive. The CPU was about 20 cents. A dual op-amp was less than one cent.

    27. Re:Verilog by cheater512 · · Score: 1

      If you want to trade transistors for time, just use a CPU.

      Anyway when making a chip, 24,000 transistors is not much. You don't want to do it everywhere sure, but a couple of times and it isn't an issue.

    28. Re: Verilog by Scowler · · Score: 1

      Your point is well made. I work with a lot of software engineers who are required to be both highly proficient in C++ and modestly proficient in Verilog, and have seen many of the mistakes you highlight. One thing I wish I had mentioned before is the FOR-GENERATE logic specified in the Verilog 2001 spec. Using this syntax made the transition for software engineers even easier, although it can be argued that this type of syntax is often abused these days and is difficult to maintain.

    29. Re: Verilog by O('_')O_Bush · · Score: 1

      He was talking most common/basic arithmetic. Floating point(the only way you'd do 2*0.33333 == 2/3) is a whole different animal, and more complicated than any of the other functions discussed.

      --
      while(1) attack(People.Sandy);
    30. Re: Verilog by Scowler · · Score: 1

      Debug and test is certainly the hard part. Tools like Modelsim don't seem very approachable and take time to learn. Creating proper test coverage is always a challenge, even for vets. As for crossing clock domains... Oy, that part does have a steep learning curve...

    31. Re:Verilog by Asmodae · · Score: 1

      Unless the algorithm requires all those special instructions and monster ram to run.... at which point your custom hardware looks very much like the CPU and system it is intended to replace, and definitely not cheaper unless you're selling a whole lot of them. Reliable hardware is expensive to build even when it's a simple design iterating on previously known good hardware. Starting from scratch on raw silicon takes millions of dollars, just for your first chip lot, not to mention all the man hours to get it there and subsequent revisions. There are lots of algorithms that don't make any sense (from a cost vs efficiency standpoint) to port to custom hardware. That's the whole reason the generic CPU exists in the first place.

      I guess I'm disagreeing with your definition of better. If it's faster but costs too much for anyone to actually buy, it isn't better.

    32. Re:Verilog by Asmodae · · Score: 1

      The original wording was "Some C algorithms may never transfer well into a hardware implementation." At least in my mind the transfer process is what might not go well... not how the final product may or may not run. Having some experience here I understood the transfer to be where the work/expense would be. And those are ultimately key factors you would use to base your decision about whether or not to go ahead and make the conversion.

      I don't think we disagree on content, just on what might have been meant by Andy's post. Especially considering exactly what you said for the reasons you said, making claims of impossibility would indeed be silly.

    33. Re: Verilog by Asmodae · · Score: 1

      Nice point - For is used for iteration in software, and For Generate in hardware is used to generate new instantiations. The similarity in words/syntax is a dangerous trap. The closest thing in software is like putting malloc or new() in a loop. It's a great convention for when you need many similar bits of hardware. Completely wrong for iteration.

    34. Re:Verilog by harrkev · · Score: 1

      At least you get it. If that sig doesn't get me geek cred, I don't know what will.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    35. Re: Verilog by harrkev · · Score: 1

      Meh. As long as you use this the way it is intended: making a lot of instances that look a lot alike, for-generate is awesome. I cannot imagine having to instantiate 128 instances of RAMs without it. Well, I could use PERL to generate Verilog, but that gets messy fast.

      For what it's worth, Emacs has some pretty rockin' Verilog features. The ability to hook things up by name, with a regexp thrown in to keep things sorted, is awesome.

      Emacs also has a completely different VHDL mode that provides a completely different set of features, with the down side that you have to use VHDL ;-)

      Seriously, Emacs has a Verilog and a VHDL mode that both provide awesome, but almost completely non-overlapping features. Verilog mode does lots of cool things that VHDL mode does not do, and vice versa.

      I wish that somebody would port the VHDL stuff over to Verilog world... that somebody NOT being me, since I already have Verilog, bash, csh, PERL, and some Java rolling around in my head. It would probably sprain my brain to get enough LISP in there to do the job. Hmm, is it "if () {" or "if () begin"??? Do I do "else if" or "elsif." That sort of thing gets tiring after a while. Plus, with a wife and five kids, not much time for learning a new language either.

      Why doesn't somebody port Emacs over to PERL instead of the bloody abortion known as LISP? Now THAT I could learn to love.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    36. Re:Verilog by harrkev · · Score: 3, Interesting

      Actually, that depends on what the 24,000 transistors are doing. Let's assume that you stupidly did a divide using Verilog "/". This implies a one-cycle divide which might well take that many transistors. The problem is that you would not likely be able to get this to work in real life. With so many levels of logic, your timing would be pure crap. Plus you might have fanout and congestion issues that would further limit your timing. So you could get a divide in one clock cycle, but limit yourself to a clock speed of 10 MHz, for example.

      Once you get past about 10 or 12 levels of logic (in my opinion), it is time to re-code, no matter what your clock speed is. If you can't get the job done in 12 levels, it is time to re-think your approach. Register re-timing can certainly be useful, but it is much better to do the job right in RTL, the way God intended. Register re-timing can make later steps more complicated (including formal verification).
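
      To illustrate the alternative to a one-cycle "/", here is a rough Python model of a shift-and-subtract (restoring) divider that produces one quotient bit per simulated clock. It is only a sketch: the function name, the 32-bit default width and the cycle counter are assumptions made for illustration, not anything taken from a real RTL design.

```python
def shift_subtract_divide(dividend, divisor, width=32):
    """Restoring division: one quotient bit per simulated clock cycle.

    Models a small sequential divider (one subtractor reused `width` times)
    instead of a large single-cycle combinational one.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not (0 <= dividend < (1 << width) and 0 < divisor < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    remainder = 0
    quotient = 0
    cycles = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep the result only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles
```

      The loop body corresponds to a shift, a compare and a subtract, which is a short logic path; the price is `width` clock cycles per result instead of one.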

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    37. Re:Verilog by harrkev · · Score: 1

      "Some C algorithms may never transfer well into a hardware implementation."

      This is a fundamentally silly thing to say

      Not silly at all. Imagine a malloc of a gigabyte of RAM. You do not want to casually just drop a gigabyte of RAM into an ASIC, since that would likely be most of your chip size. You would need to use some sort of DRAM controller, which is HIGHLY dependent upon what foundry you use.

      Also, how about opening an Ethernet port? Is this magic compiler also supposed to magically create a SerDes -- complete with a PLL, for any architecture that you choose?

      Should I even mention file opens -- how would that work on a chip with no hard drive attached? Use your imagination about keyboards, mice, and graphics cards, and sound.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    38. Re: Verilog by doctor.entheogen · · Score: 1

      Trying to see if you can escape out the less than sign. &lt= I guess not =

    39. Re:Verilog by gargleblast · · Score: 1

      Dyslexia?

    40. Re:Verilog by fractoid · · Score: 1

      Even if it successfully synthesizes, there is no guarantee that it will be in any way an optimal implementation.

      However, if it does synthesize into something runnable, then you've just proved an upper bound for the cost of the implementation. If the upper bound is in any way commercially feasible then it's definitely worth optimising.

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    41. Re:Verilog by fisted · · Score: 1

      Your point being? Where's the 'moderately complex system'? Or are you going to tell me that two opamps now do the whole job of the MCU? If so, using an MCU there was probably a failure to begin with.

    42. Re: Verilog by loufoque · · Score: 1

      Surely you're speaking of integral logarithm and division, not floating-point?

    43. Re: Verilog by makomk · · Score: 1

      You've forgotten about fixed point, which isn't really any more complicated to implement than integer arithmetic and is a perfectly reasonable way of implementing integer division by a fixed divisor. (A lot of compilers actually use this trick, because even running on a CPU it's often more efficient than using hardware division.)
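
      A minimal sketch of that trick in Python, assuming unsigned 32-bit inputs and a divisor known ahead of time. The helper names are made up for illustration; the constant selection follows the usual "round the scaled reciprocal up" approach, not any particular compiler's code.

```python
def reciprocal_constants(divisor, width=32):
    """Pick a fixed-point reciprocal for dividing `width`-bit unsigned values.

    Returns (magic, shift) such that (x * magic) >> shift == x // divisor
    for every 0 <= x < 2**width. Computed once, at design or compile time.
    """
    if not 0 < divisor < (1 << width):
        raise ValueError("divisor must be positive and fit in `width` bits")
    shift = width + (divisor - 1).bit_length()   # width + ceil(log2(divisor))
    magic = -(-(1 << shift) // divisor)          # ceil(2**shift / divisor)
    return magic, shift


def divide_by_constant(x, magic, shift, width=32):
    """Divide using only a multiply and a right shift (no divider needed)."""
    if not 0 <= x < (1 << width):
        raise ValueError("x must be unsigned and fit in `width` bits")
    return (x * magic) >> shift
```

      For example, `magic, shift = reciprocal_constants(3)` followed by `divide_by_constant(x, magic, shift)` gives the same result as `x // 3`. The one real division happens when the constant is chosen, so this only helps when the divisor is fixed.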

    44. Re:Verilog by Bing+Tsher+E · · Score: 1

      Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.

      That's utterly ridiculous. It's like if someone wanted a doughnut-making machine. So they built a city, and in one of the neighborhoods they built a factory that could make doughnut machines.

    45. Re:Verilog by Muad'Dave · · Score: 1

      The classic 74181 4 bit ALU shows how it can be done (page 5). It shows the schematic of the chip in gate form. Page 4 shows the 'opcodes' (really operation selection line combinations) that this simple chip can perform.

      The mighty (!) 32xx series of minicomputers from Concurrent Computer in the 1980s/90s used a bunch of these chained together to form a 32 bit ALU.
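
      The chaining idea can be sketched in Python as below. This is a simplified model, assuming plain ripple carry between slices and only the ADD function; it does not reproduce the real 74181 function table, pinout or active-low conventions, and the function names are illustrative.

```python
def add4(a, b, carry_in):
    """One 4-bit slice: add two nibbles plus carry-in, return (sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def add32_from_slices(a, b):
    """Chain eight 4-bit slices, carry-out to carry-in, to form a 32-bit adder."""
    result = 0
    carry = 0
    for slice_index in range(8):
        shift = 4 * slice_index
        nibble, carry = add4(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry  # carry is the overflow out of bit 31
```

      Each slice only ever sees four bits of each operand plus a carry, which is why the gate count of the wide ALU is roughly eight times that of one slice.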

      --
      Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
    46. Re: Verilog by Muad'Dave · · Score: 1

      < yes you can - the HTML escape is &lt; - note the trailing semicolon.

      --
      Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
    47. Re: Verilog by Mr+Z · · Score: 1

      You need &lt; to get it (the semicolon also): <

    48. Re:Verilog by harrkev · · Score: 2

      That is exactly my point. At one extreme, you could do a job in a few hundred thousand gates, and at the other extreme you could do a job in a few billion gates. This is sort of an extreme upper-bound and a lower-bound on the size of the solution. Without further details, we have no idea where in the spectrum the real solution lies.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    49. Re:Verilog by Vreejack · · Score: 1

      What a great "idea". Unfortunately you still have to divide first in order to determine your multiplicand.

      For example. Dividing by 34.527. How do you know what to multiply by instead? Do you have a very, very large lookup table? Now there is a solution. Get rid of multiplication AND division and just use lookup tables. That should be fast.

      --
      "Will future ages believe that such stupid bigotry ever existed!" -- Ivanhoe
    50. Re:Verilog by fatphil · · Score: 1

      Is it possible to make a quicksort run "well" in hardware? What are you going to do for the stack, and how big will you make it so that everything still works in the worst case?

      Compare that to a trivial network sort that makes use of the inherent massive parallelism possible in FPGAs.

      Is your O(n^(1+eps)) really "running well" next to an O(n^(1/2+eps))?
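
      For readers who have not met sorting networks, here is a small Python sketch of one of the simplest, odd-even transposition. It is chosen for brevity and is not necessarily the network meant above. The point it illustrates is that the compare-exchange schedule is fixed in advance and needs no stack, so each layer could be a row of parallel comparators. Function names are illustrative.

```python
def transposition_network(n):
    """Fixed schedule of compare-exchange pairs for n inputs (n layers)."""
    return [[(i, i + 1) for i in range(layer % 2, n - 1, 2)]
            for layer in range(n)]


def run_network(values):
    """Apply the network; every pair within a layer is independent of the others."""
    data = list(values)
    for layer in transposition_network(len(data)):
        for lo, hi in layer:
            # One comparator: route the smaller value to the lower index.
            if data[lo] > data[hi]:
                data[lo], data[hi] = data[hi], data[lo]
    return data
```

      The schedule depends only on `n`, never on the data, so the comparator count and the depth are known before any input arrives. That is exactly the property quicksort lacks.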

      --
      Also FatPhil on SoylentNews, id 863
    51. Re:Verilog by Jane+Q.+Public · · Score: 1

      Is it possible to make a quicksort run "well" in hardware? What are you going to do for the stack, and how big will you make it so that everything still works in the worst case?

      Yes, it is as possible to make it run as well in custom hardware as anywhere else.

      You people have kept making the same arguments, and I have kept answering them. Do you have a reading comprehension problem?

    52. Re:Verilog by fatphil · · Score: 1

      But your definition of "well" appears to include "desperately inefficiently"?

      If lots of people think you're using a term badly, then maybe, just maybe, you're at fault - did that ever cross your mind?

      --
      Also FatPhil on SoylentNews, id 863
    53. Re:Verilog by Asmodae · · Score: 1

      Not really. The biggest conversion issues I deal with (when converting algorithms to hardware) are related to how software treats RAM vs how hardware treats RAM. They are fundamentally different methods of operation. In software RAM is cheap/free, so it is preferred over CPU cycles. In hardware, the processing is cheaper (in general) and RAM is more expensive.

      Buffering and holding a megabyte of data between each stage of processing is natural and very easy for software. But in hardware this is a very inefficient way to do things. Converting from one method to the other can be quite difficult depending on the algorithm.
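
      A toy Python contrast between the two styles, using a moving sum as a stand-in for one processing stage. Both functions and the 3-sample window are assumptions made for illustration; the generator stands in for a hardware stage that holds only a short shift register.

```python
from collections import deque


def buffered_moving_sum(samples, taps=3):
    """Software style: hold the whole input, then build the whole output."""
    frame = list(samples)  # entire buffer lives in RAM
    return [sum(frame[i - taps + 1:i + 1]) for i in range(taps - 1, len(frame))]


def streaming_moving_sum(samples, taps=3):
    """Hardware style: keep only `taps` samples of state, emit results as data flows."""
    window = deque(maxlen=taps)  # acts like a `taps`-deep shift register
    for sample in samples:
        window.append(sample)
        if len(window) == taps:
            yield sum(window)
```

      Both give the same results, but the streaming version's storage is fixed at `taps` samples no matter how long the input is. Restructuring an algorithm that assumes the buffered form into the streaming form is the hard part of the conversion.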

    54. Re:Verilog by Jane+Q.+Public · · Score: 1

      "If lots of people think you're using a term badly, then maybe, just maybe, you're at fault - did that ever cross your mind?"

      And your definition of "well" means consuming X power and Y hardware?

      I will repeat this for you one more time: *I* was the one who wrote above that there could be misunderstandings about the meaning of "well". You haven't been adding ANYTHING to the conversation.

    55. Re:Verilog by Asmodae · · Score: 1

      To be fair, the definition of "well" I intend isn't an arbitrary X/Y value. There's already very well defined numbers for the hardware which currently runs the algorithm. To transfer "Well" to custom hardware would be somewhere in the vicinity of: less than the original general purpose CPU by enough that it justifies the design effort involved and doesn't cost MORE to manufacture. All engineering decisions are trade-offs, and if the trade-off isn't worth the effort and resource cost you don't do it. For a transfer effort to go "Well" means at the end of the day you come out ahead somewhere.

      If you have to spend 3 million dollars on custom hardware development just to get performance parity with a COTS general purpose CPU... you'd be hard pressed to call that "well" by any measure. This is what is implied by the setup of the original Ask Slashdot question, asking an engineering question about feasibility and cost.

    56. Re: Verilog by MickLinux · · Score: 1

      I should also note that it is possible to MULTIPLY the factors to get the target number, which is simply an incrementally-ordered shift and an addition and subtraction.

      Example: with all numbers in floating-point binary,

      Take log (1011.0111)
      T=1.0110111, Mantissa=11

      1.1>T; don't use 1.1
      1.01T, try 1.0000000001)

      So :

      log(1.0110111)=log(1.01)+log(1.001)+log(1.000001)+log(1.00000000001)+...

      And

      Log(1011.0111)=11+log(1.0110111)
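
      A Python sketch of the procedure as it can be reconstructed from the surviving lines: normalize to T in [1, 2), then greedily multiply in factors of the form 1 + 2^-k (a shift and an add) while the running product stays at or below T, summing a small table of precomputed logs. Base-2 logs, the 40-bit fixed-point format, the 30-step limit and all names are assumptions for this sketch.

```python
import math

FRAC_BITS = 40   # fixed-point fractional bits for the running product (assumed)
STEPS = 30       # number of candidate factors 1 + 2**-k to try (assumed)
# In hardware this would be a small ROM of constants.
LOG_TABLE = [math.log2(1 + 2.0 ** -k) for k in range(1, STEPS + 1)]


def shift_add_log2(x):
    """Approximate log2(x) using only shifts, adds, compares and a lookup table."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Normalize: x = t * 2**exponent with 1 <= t < 2.
    exponent = 0
    while x >= 2.0:
        x /= 2.0
        exponent += 1
    while x < 1.0:
        x *= 2.0
        exponent -= 1

    target = int(x * (1 << FRAC_BITS))
    product = 1 << FRAC_BITS      # running product of accepted factors, starts at 1.0
    fraction = 0.0
    for k in range(1, STEPS + 1):
        candidate = product + (product >> k)   # product * (1 + 2**-k)
        if candidate <= target:                # accept the factor only if we stay <= T
            product = candidate
            fraction += LOG_TABLE[k - 1]
    return exponent + fraction
```

      For 1011.0111 binary (11.4375) the loop rejects 1.1, accepts 1.01 and 1.001, rejects the next two factors and accepts 1.000001, which matches the terms in the sum above.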

      --
      Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
    57. Re:Verilog by Jane+Q.+Public · · Score: 1

      "If you have to spend 3 million dollars on custom hardware development just to get performance parity with a COTS general purpose CPU... you'd be hard pressed to call that "well" by any measure."

      Must I repeat this yet again? I was using "well" to mean it was possible to get it to run well on custom hardware. GP's comment may have had a different definition of "well". We know this, we acknowledged it a long time ago, and this is all just a rehash of what has gone before.

      Don't misunderstand me. I'm not trying to be rude. But comes a point at which I tire of repeating myself.

    58. Re:Verilog by niftymitch · · Score: 1

      If you want to trade transistors for time, just use a CPU.

      Anyway when making a chip, 24,000 transistors is not much. You don't want to do it everywhere sure, but a couple of times and it isn't an issue.

      Gark... with a MC14500 you can (http://www.linurs.org/mc14500.html)
      with an Intel 4004 you can. With a MC6800 you can.... build a system...

      If I recall the Motorola MC68000 was about 68000 transistors
      but a "C" program on the 68K runs a lot slower than on a modern
      processor with a couple billion transistors. Nvidia is beyond 7 billion transistors
      for their high end graphics.

      There is something missing in big buckets here.

      Lock the door, air gap a design lab, get some large as heck FPGA from Xilinx
      and go to work. If you have something magic you want to own it but the turf
      is well occupied so market and price points will matter.

      You can use FPGA parts for subsystems --- WP reminds
      me that Xilinx currently holds the "world-record" for an FPGA containing 6.8 billion transistors.
      so you can get a lot done on field programmable devices -- or tiled arrays of parts.

      --
      Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
    59. Re:Verilog by niftymitch · · Score: 1

      I was misunderstanding my notes.

      You would need several thousand transistors for a standard DIV circuit, and then the CPU would need to iterate through the operation many times in order to perform a division.

      ...snip....

      Trivia...
      the MC68000 took 144 clocks to finish a DIV.

      Many processors are microcode engines under the hood. Modern ALU blocks are
      big but can be purchased as a library.

      --
      Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
    60. Re: Verilog by MickLinux · · Score: 1

      Sorry, it looks like all my math got eaten up between less-than and greater than signs.

      --
      Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
    61. Re:Verilog by Bing+Tsher+E · · Score: 1

      If you don't understand how a dual op-amp and passives can be made into a 'moderately complex system' then stick to your expensive DSP processors.

    62. Re:Verilog by fisted · · Score: 1

      I'm pretty well-aware of what you can do with opamps and RLC, thanks. Not that you mentioned any RLC in the first place, you also left them out in your price comparison, so why would i assume you did anything more complicated than a buffer?

      but even both opamps in some exotic configration don't really match what i was referring to by 'moderately complex system'. two operations are not moderately complex.

    63. Re:Verilog by Andy+Dodd · · Score: 1

      That is indeed what I meant by "transfer well". If it requires 4x the cost of a general purpose CPU to get an FPGA to match the performance for a given algorithm - then the CPU wins.

      As an interesting reference point - in many cases, "custom hardware" for some algorithms is now winding up something more along the lines of a tweaked CPU with a modified instruction set than dedicated hardware (such as the video encoding/decoding blocks of most ARM CPUs, such as TI's Ducati engine and Qualcomm's vidc, both of which are running firmware that is loaded by the applications processor on ??? architectures - vidc might be Hexagon just like most of Qualcomm's basebands are) Determining the proper tradeoff of hardware vs. software requires a lot of work and whether it is worth it depends on a lot of things (cost per unit, number of units shipped, etc.)

      --
      retrorocket.o not found, launch anyway?
  4. Why don't they know? by Anonymous Coward · · Score: 5, Insightful

    You'd think the "electronics manufacturer" would have some idea how to estimate this.

    1. Re:Why don't they know? by janeuner · · Score: 5, Insightful

      They do have a way. They asked if it had already been determined.

      The correct response is, "We don't know."

    2. Re:Why don't they know? by i+kan+reed · · Score: 1

      Because manufacture doesn't necessarily mean design expertise?

      Warning: Car analogy inbound.
      Why can't the workers on the assembly line of a GM plant design a car?

    3. Re:Why don't they know? by Sarten-X · · Score: 3, Funny

      Because they're robots with no AI functionality?

      --
      You do not have a moral or legal right to do absolutely anything you want.
    4. Re:Why don't they know? by Megane · · Score: 3, Informative

      A more accurate car analogy would be GM wanting to build a car using your technology and asking you how many assembly line workers it would take.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    5. Re:Why don't they know? by AndyKron · · Score: 1

      I'm thinking the "electronics manufacturer" doesn't have an idea, or they wouldn't have asked that question.

    6. Re:Why don't they know? by Goaway · · Score: 1

      Or, you know, they asked so they wouldn't need to duplicate work that has already been done, in case they had it figured out already.

    7. Re:Why don't they know? by nebular · · Score: 1

      Why are the workers on the assembly line speaking to anyone about the design of the car? The engineers who design or maintain the plant should be speaking to the artists and engineers who designed the car.

      So the engineers who know how to design chips should be speaking to the programmers who made the algorithm. If those engineers are unable to translate an algorithm into silicon I'd be very worried about that company.

    8. Re:Why don't they know? by daboochmeister · · Score: 1

      Maybe the algorithm is proprietary, and dryriver's company doesn't want to release it to the manufacturer yet, even under NDA. Hard to estimate what you're not allowed to see. Just a thought.

      --
      "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
    9. Re:Why don't they know? by sir-gold · · Score: 1

      Because it's not their job. GM has engineers and designers to do that.
      If this hardware manufacturer doesn't have a design team (and only does post-design production) then it's time to find a different manufacturer

    10. Re:Why don't they know? by Kagetsuki · · Score: 1

      The electronics manufacturer must have assumed they had some concept of how to design ASICs if they were even calling. This is the equivalent of somebody painting a picture of a house, then calling a carpenter and saying "I've designed a house, I'd like you to build it". Both a painting and drafted design documents are images of a house, just one gives you technical information like how much wood and how many nails you will need and the other does not.

      I imagine the electronics manufacturer must have asked the question and was dumbfounded when they couldn't give any sort of answer.

    11. Re:Why don't they know? by jbo5112 · · Score: 1

      The electronics manufacturer probably hasn't seen the algorithm at this point. I assume they're still trying to figure out things like design cost, build cost, and feasibility before making a commitment to buy, and the software company doesn't want to give it away without a contract for payment. I would add up all the different operations of each type in your algorithm, along with some information about looping etc. and present this to the hardware company, but you would have to get a careful balance between giving them enough information to help and enough to build it themselves.

      A hardware implementation can vary widely for a single algorithm. For example there are many implementations for running x86 instructions. A Haswell chip should run the same code that a 286 does, but with more transistors, higher IPC and a modified algorithm. If you look at closer processor generations, you may even see a repeated algorithm at some points.

  5. Just like any other software project by mbadolato · · Score: 5, Funny

    Make up a number, then when they complain that it was way off, blame it on their management changing scope a hundred times throughout the life of the project!

    1. Re:Just like any other software project by Anonymous Coward · · Score: 1

      42!

    2. Re:Just like any other software project by Tablizer · · Score: 1

      Or just say, "Our best estimate is somewhere between 7 and 38,000,000,000".

    3. Re:Just like any other software project by captain_nifty · · Score: 1

      42 factorial... hmm.

      1.4 x 10^51... yeah that ought to be enough gates.

    4. Re:Just like any other software project by Imrik · · Score: 3, Funny

      Made me think of this.

  6. C to HDL to netlist by Anonymous Coward · · Score: 2, Informative

    As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.

  7. Try Stackoverflow perhaps? by Anonymous Coward · · Score: 5, Insightful

    I think you may have a better chance of getting an answer if you ask this question on Stackoverflow (or one of its related sites).

    Unfortunately, I think asking on Slashdot is only likely to get you some tired and outdated memes / jokes...

    1. Re:Try Stackoverflow perhaps? by greg1104 · · Score: 2

      The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application

      Fixed that for you. Ninety percent of everything is crud.

    2. Re:Try Stackoverflow perhaps? by rasmusbr · · Score: 2

      The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application

      Fixed that for you. Ninety percent of everything is crud.

      Off topic, but it is more interesting than that...

      When are you the most excited about some new idea or concept? When is your impulse to share technical ideas the greatest? Well, usually right after you've learned it, or rather when you think you've learned it but in reality you've only got a half-decent grasp of the idea and you still have a number of the details completely wrong. The exception to this rule is that some highly skilled and knowledgeable people take pleasure in beating less knowledgeable people in the head with their knowledge. So there you have it: the virtual world of teachers consists of a lot of well-meaning people who don't know what they're talking about and one or two jerks who do.

      Obvious question: What does this tell us about people who like to give sex advice on the internet?

    3. Re:Try Stackoverflow perhaps? by greg1104 · · Score: 1

      I'm probably the wrong person to comment on this, since by your classification I'm one of the jerks.

    4. Re:Try Stackoverflow perhaps? by KevReedUK · · Score: 1

      You can't be serious..? S.O. is full of monkeys...

      Then, by virtue of the infinite monkeys theorem, the OP might actually get the kind of answer he needs!?!

      --
      Just my $0.03 (At current exchange rates, my £0.02 is worth more than your $0.02)
    5. Re:Try Stackoverflow perhaps? by Nethemas+the+Great · · Score: 1

      Alas I regret I must concede a point to you on this matter. Well met sir.

      --
      Two of my imaginary friends reproduced once ... with negative results.
  8. They shouldn't be asking you. by pavon · · Score: 5, Insightful

    If they plan on implementing this in hardware, then they should have people who are capable of answering that question. If instead, they are just a manufacturer and aren't capable of doing the actual hardware design, then you have bigger problems than answering this question. That is something you should find out about ASAP.

  9. Minecraft by nbetcher · · Score: 5, Funny

    Develop out the algorithm in Minecraft using ProjectRed (Integration module, specifically) and then you can easily count the gates! :-)

    1. Re: Minecraft by DigiShaman · · Score: 1

      An ASIC of Minecraft? Brilliant!!!

      --
      Life is not for the lazy.
  10. Have you tried telling them by Anonymous Coward · · Score: 1

    "We don't know."

  11. hls design by Anonymous Coward · · Score: 1

    The most common languages for chip or FPGA design would be VHDL or Verilog. Now there is also High Level Synthesis (http://en.wikipedia.org/wiki/High-level_synthesis), in which you can use C/C++ directly. So if you're using a tool like Xilinx's Vivado (http://www.xilinx.com/products/design-tools/vivado/integration/esl-design/) then you can go directly from C/C++ to gate count. However, even in C/C++ it probably needs lots of work from where it is.

  12. Completely stupid question by dskoll · · Score: 4, Insightful

    The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"

    There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.
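
    As a toy illustration of that tradeoff, the Python sketch below adds up the same list of values two ways: with a tree of adders that all exist at once, and with a single adder reused once per value. The "adders" and "steps" counts are a simplified cost model invented for this example, not a gate estimate.

```python
def sum_with_adder_tree(values):
    """Spend area: one adder per pair per level. Returns (total, adders, levels)."""
    level = list(values)
    adders = 0
    levels = 0
    while len(level) > 1:
        # Every pair in this level gets its own adder, all working in parallel.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(next_level)
        if len(level) % 2:
            next_level.append(level[-1])  # odd element passes through unchanged
        level = next_level
        levels += 1
    return (level[0] if level else 0), adders, levels


def sum_with_one_adder(values):
    """Spend time: one accumulator adder reused every cycle. Returns (total, adders, cycles)."""
    total = 0
    cycles = 0
    for value in values:
        total += value  # the same physical adder, one use per clock
        cycles += 1
    return total, 1, cycles
```

    For eight inputs the tree uses seven adders and finishes in three levels, while the accumulator uses one adder and takes eight cycles. Same algorithm, same answer, very different gate counts.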

    1. Re:Completely stupid question by presidenteloco · · Score: 1

      Yes, theoretically, according to Turing, you could get by with enough gates to make a couple of registers, a goto/jump instruction and a branch if is-zero test, as long as you have some read-write memory somewhere else.

      --

      Where are we going and why are we in a handbasket?
  13. You need a C to VHDL translator by Animats · · Score: 4, Informative

    You need a C to VHDL translator. Here's a tutorial for one.

    Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.

    1. Re:You need a C to VHDL translator by Trepidity · · Score: 4, Informative

      One caveat to going this route: if the algorithm contains well-known operations as building blocks, you probably don't want to synthesize your own VHDL versions of those standard operations, since they already have highly optimized hardware implementations. For example, if one step of the algorithm is "compute an FFT", you probably want to use an existing FFT IP core to implement it, rather than translating some FFT C code to new VHDL.

      At one extreme, where the algorithm is nothing but a chain of such cores (common in DSP applications), you could get a rough estimate just by looking up the gate counts for each operation and adding them up.

    2. Re:You need a C to VHDL translator by iggymanz · · Score: 1

      I'm worried about dryriver's "electronics manufacturer"; that kind of skill and knowledge should be a core competency of any business that makes custom app chipsets.

    3. Re:You need a C to VHDL translator by HalWasRight · · Score: 1

      How did this get modded up to "Informative"? This is misinformation. If you believe what an FPGA vendor tells you about their tools then I have some land in Florida you might be interested in. There is NO push button path from C to hardware, unless you consider compiling the C into object code that is burned into ROM as a hardware solution. Yes, there are tools like Cynthesizer from Forte and the cited tool from Xilinx that use C as an input language, but it is gerrymandered C geared toward synthesis, not "dusty deck" C. As stated above, there are too many tradeoffs in time and space to provide a simple answer to your interested party. You should hire someone who can find a couple of points in the solution space and give your interested party an educated answer like "At xx mm^2 it runs this fast with this latency, while at yy mm^2 it runs this fast at this latency with 50% better power".

      --
      "This mission is too important to allow you to jeopardize it." -- HAL
  14. about 40 gates. by Anonymous Coward · · Score: 1, Interesting

    it would only make sense to reuse the same adder circuit for each addition, instead of making a separate adder circuit for each operation.
    then you'd add control logic to move the data to adder circuits, multiplier circuits, etc.
    then essentially what you have is a microprocessor.
    then you just turn that microprocessor into the simplest one possible. which is basically a queue and a stack, and a few elementary logic operations. you can do operations a bit at a time.
    so the number of logic gates your program needs is the number to make a queue and a stack, and a few elementary logic operations, and that's probably on the order of about 40 gates.

    1. Re:about 40 gates. by behrooz0az · · Score: 1

      40? How did you come up with that number?
      A radio from 80s probably has more gates, and they don't add or multiply.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
  15. VHDL by Anonymous Coward · · Score: 3, Informative

    Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...

  16. Cost estimate by tepples · · Score: 1

    Maybe they can do the translation, but they need a number for how many gates so that they can give a number for how many dollars.

    1. Re:Cost estimate by EndlessNameless · · Score: 2

      If they can design the hardware, they can ask for the source and supply the quote themselves.

      If they can't, then OP needs to understand they have no practical design capabilities and plan on paying someone else to design it---before paying these guys to manufacture it. Or he can search for a shop that can handle both the design and the manufacture.

      --

      ---
      According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
  17. Break down your algorithm by neutrino38 · · Score: 4, Informative

    Hello,

    It is probable that you can break down your algorithm (I do not mean the code) into a pipeline of elementary processing steps and find implementations (IP) for each of them.

    To give an estimate:
    - subdivide your algorithm into simpler pieces
    - find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
    - do the sum.

    Indeed a hardware designer or consultant would be of great help here.
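
    The steps above can be sketched as a small tally. The stage names and gate figures below are invented placeholders, not data about any real IP block; the margin for glue logic is also an assumption.

```python
# Hypothetical pipeline stages with per-stage gate estimates. Every number is a
# placeholder to be replaced by figures from IP datasheets or a hardware designer.
PIPELINE = [
    ("colour-space conversion", 4_000),
    ("3x3 convolution filter", 25_000),
    ("fixed-point divider", 6_000),
    ("output FIFO control", 1_500),
]


def total_gates(stages, margin_percent=20):
    """Sum the per-stage estimates, then add a margin for glue logic."""
    base = sum(gates for _name, gates in stages)
    with_margin = base * (100 + margin_percent) // 100
    return base, with_margin


if __name__ == "__main__":
    base, padded = total_gates(PIPELINE)
    print(f"sum of blocks: {base} gates, with margin: {padded} gates")
```

    The useful output is less the total than the list itself: it shows which piece dominates and where an expert's time is best spent.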

  18. HLS by orledrat · · Score: 4, Informative

    What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.

  19. It doesn't work like that by cjonslashdot · · Score: 5, Informative

    It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.

  20. C code synthesis tools exist, but... by Anonymous Coward · · Score: 1

    There are a number of tools on the (commercial) market that can compile (a subset of) C to hardware into a hardware description language (Verilog/VHDL), e.g. from Cadence and Synopsys. See http://en.wikipedia.org/wiki/High-level_synthesis for an overview of the approach and links to tools. There are also some open source tools that can turn C code into Verilog or VHDL, but they are not very mature in my opinion.

    However, you will not get a single number for gate complexity out of these tools. Depending on the requirements and tradeoffs (smaller chip area vs. higher speed, single cycle vs. pipelined implementation, target device as FPGA or ASIC), the number of gates (or logic blocks for FPGAs) required will differ significantly. From my experience, you definitely need a hardware design expert to obtain useful (i.e., more-or-less optimized) results from these tools - and you should expect having to invest significant effort for restructuring your C code so the high-level synthesis tools can grok it.

  21. Easy calculation by Anonymous Coward · · Score: 5, Funny

    Here is a proven method for calculation.

    If your code is:
    a) C: divide the number of lines with 7
    b) C++: divide the number of lines with 5
    c) Ruby/Python/Java: divide the number of lines with 3
    d) Perl: multiply the number of lines with 42
    e) C#: resign.

    1. Re:Easy calculation by neiras · · Score: 1

      Did you pull these numbers from your rectal database? Given these rules, it should be theoretically possible to put the Linux kernel in a chip without a general-purpose CPU...And since it completely bypasses the fetch-decode-execute pipeline of a general-purpose CPU, it should run blazingly fast! So, for fewer transistors, we can get probably 10x the performance of running the Linux core on dedicated silicon.

      WHOA THANKS FOR THE PITCH! VENTURE CAPITAL HERE I COME BABY

    2. Re:Easy calculation by Anonymous Coward · · Score: 1

      Wow. Just... wow. Hey harrkev, I think that was a joke. Maybe you've had enough - you're embarrassing yourself. Get some sleep.

  22. More details please by ttg512 · · Score: 1

    The answer, as you might imagine, is complicated and depends on how these gates are implemented. For instance, you could design a chip to do this, you could write RTL to do this in an FPGA, or you could even write the algorithm as more software on an embedded processor of some kind. Is this electronics manufacturer one that makes chips or one that makes systems (boards, cases, etc.)? If it is the former they should have people who can work with your people to figure this out. If it is the latter then why do they care? Are they really asking you to provide a chip which implements your algorithm? Ask some more questions...

  23. Troll bait? by khb · · Score: 1

    The question seems so ill-posed that one has to wonder if there's a product or service advert lurking... but assuming this is real.

    Software doesn't automatically translate directly to hardware. As others have noted, break out the algorithmic core from the setup and finish. Presumably there is some part of the code which is the most critical in steady state. Describe that to their hardware engineers in whatever depth is required. Depending on the algorithm, the ASIC library elements available (or FPGA units, etc.) you may want to make some substantial adjustments to the "code" to make it fit within the design parameters of the available device. This should be an iterative process, not a single estimate based on a pure software perspective.

    If there isn't a clearly identifiable set of "hot blocks", the chances of there being a good hw implementation fit are poor. If there is, it may still be necessary to change the algorithm details to fit but it should be "doable". Whether it is worthwhile depends on the volumes and the performance gains.

  24. Sounds like a joke by Cryacin · · Score: 4, Funny

    How many Gates will it take to implement your software project?

    One. His name is Bill, and here is yours.

    --
    Science advances one funeral at a time- Max Planck
    1. Re:Sounds like a joke by klubar · · Score: 1

      Actually you have your choice (these and many more). Probably with all of these gates you could solve almost any problem:
      Bill Gates (Chairman of Microsoft)
      Melinda Gates (American philanthropist)
      Robert Gates (Former Defense Secretary)
      Antonio Gates (San Diego Chargers Tight End)
      Brent Gates (American professional baseball player)
      Clyde Gates (New York Jets Wide Receiver)
      Lionel Gates (American professional football player)
      Artemus Gates (American financier and Undersecretary of the Navy)

    2. Re:Sounds like a joke by TheGratefulNet · · Score: 1

      wow, that reminds me of a very old sigfile:

      "my computer has AND-gates, OR-gates and NOT-gates, but no bill gates"

      and yes, it was from unix guys, probably 10 or so years ago.

      --
      "It is now safe to switch off your computer."
    3. Re:Sounds like a joke by Alioth · · Score: 2

      It's such a shame that Gates McFadden from ST:TNG didn't marry Bill Gates. Then she could have been Gates Gates.

  25. Oh crap by fiannaFailMan · · Score: 1

    Mod this offtopic if you want but now I can't see my comments, I can't see if anyone has responded to them, and it has become almost impossible to participate in discussions as a result. WTF, /.?

    --
    Drill baby drill - on Mars
  26. Difficult, but... by ciw1973 · · Score: 1

    ...this should get you started:

    http://en.wikipedia.org/wiki/C_to_HDL

    Find a suitable converter, then grab a free (or evaluation) version of an FPGA design tool, for example one of these (I only suggest these over the many other, probably equally as good alternatives, as I've used them myself):

    http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm

    And with a bit of work you should be able to produce output that will essentially be your code implemented in programmable logic, and the tools will tell you the number of gates/cells required.

    What I would say, is that you'll have a much easier ride if your algorithm is in C rather than C++.

    Despite saying that you have no experience with this sort of thing, defining logic in something like VHDL is basically programming. Sure, you'll need to develop a fair understanding of the hardware, but with the libraries of pre-built components available from the numerous companies who produce programmable hardware like FPGAs and CPLDs, you may find you could do a lot more than you think yourself.

    1. Re:Difficult, but... by Asmodae · · Score: 1

      VHDL is basically programming

      Sure, the same way software is basically just English and letters and numbers and if you understand those you can do most any software yourself! /sarc.
      VHDL is code, but after cleaning up after software people who think they can write VHDL, it's not the same thing at all. The key statement is

      Sure, you'll need to develop a fair understanding of the hardware

      This is by no means a light or trivial task. There are even entire university degrees dedicated to it. ;) But if you have all THAT, then sure writing hardware code is a snap! In all seriousness the above statement basically says it's easy if you have the skill set already.

      Just don't make the mistake of thinking that if you understand BOTH hardware and software that they are equivalent, or that everyone else shares in your expanded understanding. I've seen programs fail because people try to treat hardware like software simply because they're both captured with some text. It's a dangerous viewpoint if you want your project to succeed.

  27. Cadence C to Silicon by solidraven · · Score: 1

    Haven't tried it, but Cadence's C to Silicon might be up for the job. Also keep in mind that in hardware you have very different requirements than in software, and parallelisation has interesting effects on the number of gates. The best option is to get an EE, preferably with experience in digital design, to take a look at it. Other options are SystemC compilers, but they're not really up to production use yet as far as I know. And it is also very technology dependent; sometimes complicated logical functions that are common are implemented directly. This isn't something you can just wing!

  28. No good answer by kosh271 · · Score: 1

    There really isn't a great way to answer your question without a detailed analysis of your code.

    There are more factors to the number of gates required for a given task than just the complexity of code. Clock speed can be a major factor in determining the number of gates required for a given algorithm. Another major factor is the part you are targeting. The number of design elements in FPGAs used can change just by targeting a different device family.

    Even if your algorithm was small enough to fit into a part, there are other issues that could arise (such as not enough bandwidth or pins for your memory device(s)).

    It sounds like the electronics manufacturer doesn't have the resources to determine the number of gates for you. It looks like your only avenue is to ask a third party to review your code (under NDA) to help you determine the approximate gate requirements. This won't be cheap.

  29. software as hardware?! but but but software patent by raymorris · · Score: 1, Insightful

    Clearly it's not possible to render a software program as hardware. If everyone who explained the process (use Verilog) above is correct, that would mean that the exact same algorithm exists as both hardware and software.

    We can't have the same algorithm exist as both hardware and software, because that would mean algorithms are hardware just as much as they are software. That would mean all the people whining about "software patents" may as well be whining about unicorns. I hereby declare Verilog, ASICs, and FPGAs to be non-existent so we can continue to pretend that there is such a thing as a "software patent".

  30. Rough Approximation by excelblue · · Score: 1

    This is a horrible question to ask. Software is a tool to lower hardware requirements.

    Compile your algorithm to the simplest RISC architecture reasonable. For most, something along the lines of ARM or MIPS works. Then, take note of all variables and add up how much RAM they'll take. Consider every bit (yes, bit, not byte) as a D-flipflop and convert every instruction (post-compile, in assembly) into a respective set of logic gates. A bit of googling should get you those values.
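
    As a minimal sketch of that bookkeeping: the per-instruction costs below are illustrative assumptions for 32-bit operations (the add figure follows from a 5-gate full adder per bit, the rest are invented), and the flip-flop cost is a placeholder too.

```python
# Crude model: every stored bit is one D flip-flop, and every instruction in the
# compiled code becomes its own block of gates. All costs are assumptions.
GATES_PER_FLIPFLOP = 6          # hypothetical gate-equivalents per stored bit
GATES_PER_INSTRUCTION = {       # hypothetical costs for 32-bit operations
    "add": 160,                 # 32 full adders at 5 gates each
    "and": 32,                  # one AND gate per bit
    "mul": 3_000,               # placeholder for an array multiplier
    "load": 0,                  # treated as plain wiring in this model
}


def estimate(variable_bits, instruction_counts):
    """Return a gate estimate from storage bits and an instruction histogram."""
    storage = variable_bits * GATES_PER_FLIPFLOP
    logic = sum(GATES_PER_INSTRUCTION[op] * count
                for op, count in instruction_counts.items())
    return storage + logic


if __name__ == "__main__":
    # Two 32-bit variables, two adds and one multiply.
    print(estimate(64, {"add": 2, "mul": 1}))
```

    Because nothing is shared between instructions, the total grows with code size rather than with what the algorithm actually needs.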

    If your algorithm is reasonably complicated, chances are, you'll get a number that seems absurdly high compared to what state-of-art hardware is available.

    In practice, it's probably best to just pick an off-the-shelf CPU and run the software on it. There might be some parts that are better done in hardware than in software, but you should get someone who knows what they're doing for that.

  31. It's an optimization problem by swm · · Score: 5, Insightful

    You already have your algorithm running in electronic hardware, right?
    Your current gate count is the sum of
      * the gate count of your CPU
      * the gate count of your RAM
      * the gate count of your program ROM

    So that's an upper bound on the gate count.
    If that number is too big for your manufacturing partner,
    then you have an optimization problem.

    Optimization is a hard problem...

    1. Re:It's an optimization problem by Austerity+Empowers · · Score: 1

      RAM and ROM, not being comprised chiefly of logic "gates" would probably not be all that helpful.

  32. Accurate answer by Sarten-X · · Score: 3, Informative

    Write out the truth table for each output as a Karnaugh map incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.

    Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".
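
    A small sketch of the counting step, with one simplification: it counts gates for the canonical (unminimised) sum-of-products rather than a solved Karnaugh map, so it gives a larger number than a proper minimisation would. The function name and the gate model (multi-input AND/OR gates, one inverter per input that needs one) are my assumptions.

```python
from itertools import product


def sop_gate_count(func, n_inputs):
    """Count gates in the canonical sum-of-products form of one output bit.

    func takes n_inputs values of 0 or 1 and returns a truthy or falsy value.
    No minimisation is performed, so a Karnaugh map would usually do better.
    """
    minterms = [bits for bits in product((0, 1), repeat=n_inputs) if func(*bits)]
    if not minterms or len(minterms) == 2 ** n_inputs:
        return 0  # constant output needs no gates
    # One inverter for each input that appears complemented in some minterm.
    inverted = {i for bits in minterms for i, bit in enumerate(bits) if bit == 0}
    and_gates = len(minterms) if n_inputs > 1 else 0
    or_gates = 1 if len(minterms) > 1 else 0
    return len(inverted) + and_gates + or_gates


if __name__ == "__main__":
    print("xor     :", sop_gate_count(lambda a, b: a ^ b, 2))
    print("majority:", sop_gate_count(lambda a, b, c: a + b + c >= 2, 3))
```

    The truth table has 2^n rows, which is why this only works for a small number of input bits.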

    --
    You do not have a moral or legal right to do absolutely anything you want.
    1. Re:Accurate answer by mrego · · Score: 2

      Since they are translating a program/algorithm into circuitry, they need only know the maximum number of gates in use during any one cycle (taking into account necessary time delays). Just adding up all the gates per operation way overstates the answer, since AND, NOT, OR, etc. gate circuitry can be reused for different operations in another cycle. Also, for the logic operations, run them through a Quine-McCluskey optimization to minimize them.

  33. Gate count more a matter of speed by Yoik · · Score: 2

    It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.

    The problem as stated isn't adequately constrained.

  34. "Graphics" algorithm... right by ArcadeMan · · Score: 1

    You guys are probably trying to get a manufacturer to make Scrypt-mining ASICs.

    1. Re:"Graphics" algorithm... right by neiras · · Score: 1

      Yep, first thing I thought too.

    2. Re:"Graphics" algorithm... right by theskipper · · Score: 1

      Bingo. And the hardware guys recognized it immediately too. Mainly because they're probably getting emailed the same question 10 times a day.

      There's a reason scrypt asic has been a long time in the making, it's memory intensive. Only alpha-t.net seems to be making headway to a viable product. But even with them taking preorders now, nothing is written in stone. Commercially and technically, SHA asic was an easier cat to skin.

  35. The key to success. by ttucker · · Score: 5, Insightful

    Do not ask a computer scientist to be an electrical engineer.

    1. Re:The key to success. by multimediavt · · Score: 1

      Do not ask a computer scientist to be an electrical engineer.

      And for the sake of Pete's dragon don't hand him/her an electric screwdriver! Chaos will ensue.

    2. Re:The key to success. by crankyspice · · Score: 2

      Do not ask a computer scientist to be an electrical engineer.

      Except ... Wow. An early course in my computer science curriculum was:

      201. Computer Logic Design I (3)
      Prerequisite: MATH 113 or equivalent all with a grade of "C" or better.
      Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA).
      (Lecture 2 hours, lab 3 hours) Letter grade only (A-F).

      (We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...

      --
      geek. lawyer.
    3. Re:The key to success. by Anonymous Coward · · Score: 1

      Yeah, that's for breadth of experience. It does not make you an electrical engineer any more than your college chemistry course made you someone Dow should be asking for advice.

    4. Re:The key to success. by geoskd · · Score: 3, Insightful

      Except ... Wow. An early course in my computer science curriculum was: 201. Computer Logic Design I (3) Prerequisite: MATH 113 or equivalent all with a grade of "C" or better. Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA). (Lecture 2 hours, lab 3 hours) Letter grade only (A-F). (We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...

      There is a world of difference between an entry level college course on ASIC/FPGA design, and actually being able to do the job. Just because you can design and synthesize a project with a few hundred gates in it does not mean you are even remotely prepared to know where to begin a project with 10^6+ gates in it. More importantly, high level software languages allow for indiscriminate serial loops which are massively difficult to deal with in pure hardware. In short, the design methodology is completely different if you are trying to build for a software path, or a hardware path. You need someone with a hardware mindset to take your algorithm back to scratch and start over. Even knowing the HDLs is not good enough, as it is relatively trivial to write "valid" VHDL or Verilog code that can't be synthesized...

      --
      I wish I had a good sig, but all the good ones are copyrighted
    5. Re:The key to success. by Kagetsuki · · Score: 1

      Yeah but you took ASM too and I seriously doubt you would call yourself a capable ASM developer unless you happen to be doing a lot of embedded code. Just because you've done some labs doesn't make you a pro. I've done FPGA dev using Verilog as well, and I've done enough to understand what it is and how to do it. I've also done enough to know if I wanted to make an efficient ASIC for a production application I'd shell out some cash to hire a pro rather than just assuming I could do it well myself without any professional experience or analysis.

  36. Synthesis tools and estimation by slew · · Score: 1

    Theoretical answer:
    Recode your algorithm in SystemC (a C++ library that can be used to implement a register transfer language representation of your algorithm) and synthesize it with one of the available tools (e.g., Accelera, Synopsys, Calypto, etc.) targeting a typical library (e.g., 28nm TSMC), at a particular clock frequency.

    Practical answer:
    Ask someone with hw design experience to estimate it for you...

    FWIW, nobody wants an "exact" size in logic gates; all they want is an idea of the complexity. The big ticket items people care about are the size in bits of RAMs (and how many simultaneous read/write ports they might need), complicated math that is likely to take more than 1 clock cycle to complete (e.g., a floating point math operation), and the data-width of the main data path at the throughput that you want to have. Simply multiplying the data path width by the estimated number of pipeline cycles gives a figure that is generally proportional to the eventual area minus the RAMs and special math ops (which is why you need to identify those parts separately).
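
    A minimal sketch of how that summary might be written down. The class and field names are hypothetical; the only calculation is the width-times-stages product described above, which is a relative figure and not a gate count.

```python
from dataclasses import dataclass


@dataclass
class ComplexitySummary:
    """Back-of-envelope description to hand to a hardware team (names are hypothetical)."""
    datapath_width_bits: int   # width of the main data path
    pipeline_stages: int       # estimated number of pipeline cycles
    ram_bits: int              # total RAM, reported separately from logic
    multi_cycle_ops: int       # e.g. floating-point units, reported separately

    def datapath_score(self) -> int:
        """Unitless figure roughly proportional to logic area, RAMs and special ops excluded."""
        return self.datapath_width_bits * self.pipeline_stages


if __name__ == "__main__":
    summary = ComplexitySummary(datapath_width_bits=64, pipeline_stages=12,
                                ram_bits=2 ** 20, multi_cycle_ops=3)
    print(summary, "score:", summary.datapath_score())
```

    Two candidate designs can then be compared by score, with RAM bits and special math units listed alongside.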

    Generally, I've found that naïve "software" algorithms have not been very amenable to HW implementation without some amount of rework, and the fact that you do not have an answer to the posed question would likely lead me (and probably your potential customer) to conclude that your algorithm is half-baked from a HW implementation point of view... Just food for thought...

  37. Convert to known assembly language? by Giblet535 · · Score: 1

    Short answer: you need to contract an electronics engineer. Possible: You could dump the non-optimized assembly language (-S on most compilers) for a popular processor family, e.g., 80686, PA-RISC, etc. The manufacturer probably has resources to convert "this pile of 80686 instructions" to "an ASIC that does the same thing really well".

    1. Re:Convert to known assembly language? by Overzeetop · · Score: 1

      Funny, that was my thought. I don't know squat about it, but it seems like a starting point if you had to make an educated guess.

      --
      Is it just my observation, or are there way too many stupid people in the world?
  38. Here's my circuit for a simple problem:Good Luck! by deathcloset · · Score: 1


    For what it is worth, here is a circuit I developed to see what the gate configuration (nor only) would look like for the implementation of a condition that the input switches be:

    0
    1
    01
    11
    http://www.neuroproductions.be/logic-lab/index.php?id=3699

    you know, the counting integers 0,1,2,3 - the same code that I have on my luggage. I thought there might be a fun implementation related to security or something - a hybrid mechanical/electronic locking system.

    It turned out to be super hard for me to figure out this final result. Nonetheless, the result was most interesting and I encourage you to find a more efficient configuration.

    I did this using basic logic and a crapton of that time-honored tradition of guessing and trial and error.

    I can only begin to imagine the complexity of trying to implement and design circuits based on algorithms written in anything above assembler level.

  39. Easy... by verbatim_verbose · · Score: 1

    "It takes one gate that accepts our input and outputs a desirable answer. We would like you to design that gate."

  40. Re:Here's my circuit for a simple problem:Good Luc by deathcloset · · Score: 1

    *oopsy* that should have been
    0
    1
    10
    11

    but you knew that ;)

  41. One MILLION gates! by GodfatherofSoul · · Score: 1

    Then, stick your pinky into the corner of your mouth and do your best evil laugh!

    --
    I swear to God...I swear to God! That is NOT how you treat your human!
  42. I've done this before by Asmodae · · Score: 4, Informative

    Several people have suggested using a high-level synthesis tool to convert your software (C/C++) directly to HDL (Verilog/VHDL) of some kind. This can work; I've been on this task and seen its output before. The catch is: unless that software was expressly written to describe hardware (by someone who understands that hardware and its limitations and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.

    Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.

    My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.

    Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.

  43. Liar! by nobuddy · · Score: 2

    I just tried this and all my money was transferred to a different account.

    1. Re:Liar! by davester666 · · Score: 1

      And I thank you very much for your contribution to keeping me in the lifestyle I deserve.

      --
      Sleep your way to a whiter smile...date a dentist!
  44. Why not have them figure it out? by Punto · · Score: 1

    While an interesting question (I didn't even know hardware manufacturers were in the habit of converting software into hardware), why don't they figure it out themselves? They must have the tools/people to do it. Are you afraid they'll "steal your algorithm" if you give them the source? (that's much less interesting)

    --
    Stay tuned for some shock and awe coming right up after this messages!

  45. How many gates could a gate chuck chuck if a gate by Anonymous Coward · · Score: 1

    How many gates could a gate chuck chuck if a gate chuck could chuck gates?

  46. Matlab has a solution for this, but $$$ by AmazinglySmooth · · Score: 1

    Look at Mathworks. They have a solution for this.

    1. Re:Matlab has a solution for this, but $$$ by Asmodae · · Score: 2

      They have a tool that can do this, I don't know if I'd call it a 'solution' just yet though. We've just finished ripping out all the 'solution' for our project because we wanted a device that was actually small enough (and thus cheap enough) to be able to sell.

      It takes input designed to be hardware and makes good hardware. It takes input designed to be software and makes shit hardware. It also doesn't handle version control very well, you need proprietary tools to even VIEW the design files... and the output which actually describes the hardware (vhdl) is so obfuscated as to be nearly illegible. The build times are also 4-5 times longer than they need to be, so it takes a whole day to place and route the designs output by this tool. Unless you're building something trivial I wouldn't advise depending on mathworks/simulink tools for a solution.

  47. find a smarter partner by samantha · · Score: 1

    Computers are so cheap and low power today that turning an algorithm into gates would be a silly way to proceed. So the question is not really relevant except academically.

    1. Re:find a smarter partner by Austerity+Empowers · · Score: 1

      Unless you wanted to sell a chip that had this feature built in to it. Thus people do this operation all the time. It just takes someone with RTL experience to do.

  48. We actually solved this.. by rayhoare · · Score: 2

    We (ConcurrentEDA.com) have developed a tool called Concurrent Analytics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)

  49. Knowing The Algorithm Is NOT Enough by SplawnDarts · · Score: 5, Insightful

    Knowing what algorithm you want to run in hardware is not even close to enough to estimate gates. You need to know the algorithm, and the required performance, and have a sketched out HW design that meets those goals. THEN you can estimate gate count.

    For a simple example of why this is, consider processors. A 386 and a Sandy Bridge i7 implement very similar "algorithms" - it's just fetch->decode->execute->writeback all day long. If you implemented them in software emulation, it would be very similar software with some additional bits for the newer ISA features on the i7. But a 386 is about 280 THOUSAND gates, and the i7 is about 350 MILLION gates/core - three orders of magnitude different. Of course, there's at least a 2 order of magnitude performance difference too - it's not like those gates are going to waste.

    Point is, knowing the algorithm isn't enough to get even a finger in the wind guess at gate count. If you need an answer to this question, you need to get competent HW design people looking at it.

    1. Re:Knowing The Algorithm Is NOT Enough by Stumbles · · Score: 1

      Their best bet is (if they are still alive) to contact the old time hardware engineers predating chips. I worked on several systems in the Air Force designed and built in the early 60s and just before the first microchips were created.

      One such machine was used to calculate the range of geosynchronous satellites using TTL (can transistors only) occupying an entire equipment rack. In a nutshell it was a hardwired computer designed to do one thing and one thing only.

      I think if they could reduce their program to boolean equations it might be possible to get a ballpark idea of the needed gates.

      --
      My karma is not a Chameleon.
    2. Re:Knowing The Algorithm Is NOT Enough by Overzeetop · · Score: 1

      Why not? A 386 does (within a certain limit) exactly the same thing that an i7 does, it just does it faster because it can run more operations in parallel and at a higher clock speed (to take a simplistic view). The minimum number of gates required simply sets the baseline speed of the final product. To get more speed you add more parallel processors, up to the available parallelism of the problem to be solved.

      Knowing the algorithm would seem to allow a reasonable lower bound to be placed on the number of gates, from which a baseline speed can be determined.

      If the algorithm takes 175,000 gates and you know the processing speed you can determine the throughput. If you know the necessary/target throughput you know how many pipelines you need and, if you're an ASIC mfr you have a good idea of the % overhead required to parallelize and the in house estimating group can take it from there to evaluate feasibility.
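
      A minimal sketch of that sizing arithmetic, assuming one result per clock per pipeline. The 200 MHz clock, the 1e9 results/sec target and the 10% parallelization overhead are made-up illustration values; only the 175,000-gate baseline comes from the text above.

```python
import math

def pipelines_needed(target_ops_per_sec, clock_hz, ops_per_clock=1.0):
    """Number of parallel pipelines required to reach the target throughput."""
    per_pipeline = clock_hz * ops_per_clock
    return math.ceil(target_ops_per_sec / per_pipeline)

def total_gates(base_gates, pipelines, overhead_fraction):
    """Baseline gates replicated per pipeline, plus a flat overhead fraction."""
    return round(base_gates * pipelines * (1 + overhead_fraction))

# Hypothetical target: 1e9 results/sec from a 200 MHz design.
n = pipelines_needed(1.0e9, 200e6)
print(n, total_gates(175_000, n, 0.10))
```

      This only holds up to the available parallelism of the problem; it says nothing about whether the baseline figure itself is realistic.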

      --
      Is it just my observation, or are there way too many stupid people in the world?
    3. Re:Knowing The Algorithm Is NOT Enough by bigtreeman · · Score: 1

      if the 386 was implemented with the same geometry as the i7 the performance difference would be ???

      --
      Go well
    4. Re:Knowing The Algorithm Is NOT Enough by SplawnDarts · · Score: 1

      Substantial, certainly. The deep pipe for fetch/decode and the superscalar backend make a big difference. Maybe 10x and 2x or so respectively. They also interact with the memory system (system RAM and caches) very differently so it's hard to make a perfect comparison.

  50. best answer by swschrad · · Score: 1

    because if the hardware company is thinking "gates" instead of "cycles," they want to implement it in a FPGA. hell, if they were going to put it on a dedicated microprocessor, they'd just recast it with libraries for that processor and recompile.

    --
    if this is supposed to be a new economy, how come they still want my old fashioned money?
  51. Count operations for a rough gate estimate by erice · · Score: 1

    The manufacturer is probably asking how many gates you need to implement the algorithm exactly as it is coded, with exactly as much parallel or sequential logic as it already has, and that will have a fairly specific answer.

    While that number could be determined, it would not be very useful. Hardware implementations, especially when targeting FPGAs, get most of their performance advantage by exploiting more parallelism than is achievable by running on a processor.

    No, the manufacturer isn't making any assumptions about how the algorithm is translated. They deal in gates. Gates are the most direct measure of how much the hardware will cost to manufacture.

    Without a direct number for gates you will have to come at it in a more indirect fashion. How much memory does the algorithm use? What data structures are used and how big are they? (*all* data structures. An integer is a data structure for this purpose) What operations (adds, subtracts, etc) are needed and how many are required to go from input to result? With those you can usually come up with a ball park guess of how many gates will be required. There are always optimizations and non-obvious operations that get overlooked but it is a good start.

  52. If you are using floating point by DrFalkyn · · Score: 1

    The answer is "too many"

  53. Best answer by multimediavt · · Score: 1

    "You cannot directly interpret a software algorithm to hardware." Why? Here are the follow ups: What type of hardware, FPGA, GPU, custom ASIC? What part of the algorithm NEEDS to be in hardware to gain performance over basic system resources (CPU, GPU)? Who is going to pay for this little experiment?

    As others more qualified have already stated, you rarely if ever get a direct translation nor do you always need to interpret the entire algorithm to hardware. For a hardware manufacturer to even ask the question is suspect, unless it was a sales or marketing rep, then it might make sense. The hardware people will know best how to do this, for them to ask you ... RUN!

    My suggestion would be to say thank you and stick with software. You will probably spend enough time working this out that someone else will implement it before you, better. If you're not talking to Nvidia, AMD or Intel you're probably wasting your time.

  54. C to RTL converter by kursancew · · Score: 1

    The best path for you would be ForteDS Cynthesizer, Mentor Catapult C and C-to-Silicon from Cadence. Those are behavioral synthesis tools. I have used some of those and they are very strong for datapath oriented designs; if you follow their design guidelines they are quite good at converting C code to RTL. There's a free thing you can try called xPilot, never touched it though...

    --
    linux user #271173
  55. Re:software as hardware?! but but but software pat by sourcerror · · Score: 1

    You can't patent math. Does that mean that the world doesn't exist?

  56. Concurrent Analytics solves this by rayhoare · · Score: 1

    We (ConcurrentEDA.com) have developed a tool called Concurrent Analytics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)

  57. Handel-C by Endophage · · Score: 1

    I don't know how easy it would be to port your specific algorithm, but I did my masters thesis around a language called Handel-C. It's a super-set of C that provides a high level FPGA programming interface. That might get you some distance in determining the number of gates. Disclaimer: I was working with it a few years back and the documentation/support was appalling, I don't know if it's become any better.

  58. Why are they asking you? by davesque · · Score: 1

    It seems like, if you could describe the algorithm in a sufficiently low-level language like C, they shouldn't be asking you how many gates it would take. If they're the hardware manufacturer, they should know. Besides, there are too many factors that could influence the gate count depending on how the manufacturer decided to implement the adders, etc. None of these things seem like questions that programmers should be responsible for answering.

  59. Please don't let programmers develop hardware by niks42 · · Score: 1

    You'll end up with a hardware emulation of a software algorithm, which will necessarily be slower and less efficient than the correct answer, which is to design a hardware solution to the original problem.

  60. a common misconception. "the laws of nature" by raymorris · · Score: 1

    That is a common misconception, spread by people who like a certain type of FUD. In fact, what is not patentable is "the laws of nature, including those of science and mathematics".

    The LAWS of nature. You can't patent gravity, you can patent an elevator. You can't patent refraction, you can patent an acoustic lens. You can't patent the associative property of addition, you can patent a scoring system for detecting bogus reviews.

    If you take out "the laws of" and replace it with "anything using", THEN you would end up with "you can't patent anything using nature, including science and math", but that's not the law. The law is that you can't patent the laws of nature, including mathematical LAWS. You can patent things that are scientific, and you can patent things that are mathematical.

    If you think about it, it makes sense. You can't invent "x + 1 = 1 + x". That's always been true. However, you CAN invent a way of detecting suspicious stock trades. Since that could be a new invention, it could be patented.

  61. bad patents exist, good ones do, software don't by raymorris · · Score: 1

    > Patents (used to) cover only specific implementations of ideas, not the ideas themselves.

    That's a legitimate criticism of many patents. Of course, an implementation IS itself an idea, so we'd need to be a little more specific with our vocabulary in order to really talk about policy. Saying "you shouldn't be able to patent ideas" won't quite get us there. Certainly we can say "goals, objectives, shouldn't be patentable; only METHODS for achieving an objective should be."

    Certainly there exist bad patents that are too broad, that cover an objective rather than a method or mechanism. On the other hand, there simply is no such thing as a "software patent" per se. The problem with the patents is that they are over broad. Whether they cover something made of wood, plastic, or magnetized iron particles is irrelevant.

  62. The regression of computer science...fiction. by 3seas · · Score: 1
  63. Using labview compiler for FPGA by dsoodak · · Score: 1

    Haven't done this myself, but you can evidently run Labview programs ("virtual instruments") on some FPGA chips. You'd have a good estimate (plus an actual digital circuit) if you translated your code to labview (I believe the actual language is called "G") and found a copy of the add-on which turns this into verilog. -- Dustin

  64. Left as an excersise for the reader... by svirre · · Score: 1

    As a first pass you can estimate adders as 10 gates per bit, state as 20 gates per bit and multipliers as 10x bits squared (unless it is by a power of two, in which case it is free). If you need to do division in your algorithm you should redesign it. If you use floating point, everything gets huge. (Try not to use floating point; remember that in hardware you do not need to deal with arbitrary word size restrictions, just scale word sizes to suit the requirements.)

    Now, figuring out exactly what resources you need is where you will get into trouble. Normally you will reuse some (lots) of your arithmetic, but exactly how much depends on what performance/power/gate count target you need to hit. More reuse means fewer gates but faster clocks (which can drive you to more gates if you get into trouble on timing closure). One extreme is software, which just reuses a very limited set of ALUs; the other extreme is an unrolled design where each algorithmic operation has dedicated hardware, so one iteration takes one clock.

    Depending on performance targets the same algorithm can have a factor 1000 difference in gate count.
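
    A rough sketch of those rules of thumb, comparing a fully shared datapath with a fully unrolled one. The per-bit constants are the ones quoted above; the example workload (16-bit words, 4 adds and 2 multiplies per iteration, 64 iterations, 256 bits of state) is invented purely for illustration.

```python
def adder_gates(bits):
    return 10 * bits                      # ~10 gates per bit

def state_gates(bits):
    return 20 * bits                      # ~20 gates per bit of state

def multiplier_gates(bits, by_power_of_two=False):
    # Multiplying by a power of two is just wiring (a shift), so it is free.
    return 0 if by_power_of_two else 10 * bits * bits

def estimate(width, adds, mults, state_bits, copies=1):
    """Gate estimate for `copies` instances of the arithmetic plus shared state."""
    arithmetic = adds * adder_gates(width) + mults * multiplier_gates(width)
    return copies * arithmetic + state_gates(state_bits)

# Hypothetical loop body: 4 adds and 2 multiplies on 16-bit words, 64 iterations.
shared = estimate(16, adds=4, mults=2, state_bits=256, copies=1)     # one datapath, reused
unrolled = estimate(16, adds=4, mults=2, state_bits=256, copies=64)  # one datapath per iteration
print(shared, unrolled)
```

    Even this toy case spans more than an order of magnitude between the two extremes, before any timing-closure or pipelining overhead is counted.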

  65. General answer by marcopo · · Score: 1

    That depends a whole lot on what kind of hardware you want to use. One way is to implement a universal Turing machine, and give it the code as input. Those can be quite small, and you don't even need access to the algorithm to find the answer.
    You're probably looking for a more efficient implementation.

  66. There's politics involved here by cloud.pt · · Score: 2

    I believe the OP is asking the question with an underlying motive that most users aren't grasping - The manufacturer definitely has a way of estimating the gate "cost" from C++, as some experts on the matter have pointed out here, but for that he probably demanded source code, which the OP probably has no safe way of handing over without compromising his Intellectual Property. He doesn't want to lose the business contract or spend money blindly on a consultancy whose name he doesn't even know, so the question makes FULL SENSE regardless of its child-like semantics.

    You can probably bet the manufacturer is based and/or has legal safe-haven in a dodgy country, along the lines of having properties like:

    1. An established electronics manufacturing industry;
    2. Low respect and legislation for IP and the concept of royalties

    (hint: China) ...This makes the OP think twice about passing source around.

    Now, my personal opinion regarding a possible answer is more business-focused - if such a kind of manufacturer is even remotely interested in your "product" as to ask that, then you have a very marketable piece of code on your hands and you need to do the following...

    1. Find a "safer" buyer - something based in Europe (Germany?), Japan, or maybe the US if location is paramount over legislation. This nets you light IP protection
    2. Spend on a good legal advisor to draft a nuke-proof NDA with special clauses like "if we give you the code for estimation of costs, you either buy it or refrain from implementing similar technology for at least N years" (N>10)
    3. Despite all this, you still need an expert on electronic device manufacturing by your side, and I mean full-time. This also ensures you don't get robbed when they try to gain leverage on a final money deal with you by stating "it's too much gates! We can't pay more than XXXXX"
    4. Alternatively, find business angels, investors or waste a TON of money and do the hardware YOURSELF, under your own company's umbrella, or maybe some form of partnership. This is the stuff that makes you a millionaire, but also places a lot of risk on your side.
  67. Learn Forth. by crovira · · Score: 1

    Charles H. Moore wrote it and extended it to be able to compile directly into silicon.

    Forth is actually a TIL [Threaded Interpretive Language] but it is so easily extensible that it is possible to implement all the way to the gates.

    Moore was working with Forth to do exactly that last I heard.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  68. Fascinating problem, approximate solution? by beachdog · · Score: 1

    It is more than 30 years ago I learned digital logic from Blakeslee's Digital Design with MSI and LSI. These days I program an Arduino. I have my hands full just reinventing the spokes of a 20 year old wheel.

    I think you have a fascinating problem. Suppose you treat your computer program as a black box where you feed pages of data into one end and you get pages of output data out the other end. Suppose you say each page of data is an x,y grid of image values. You could say, your problem is for a central pixel in the image, you want to write a truth table. An initial truth table is the values and locations of the pixels from the immediate preceding image that when always present always result in the specific value of that pixel.

    Your image processing process probably uses data from several preceding images to come up with the result. If it takes five preceding images, then the truth table for a single pixel picks up five more blocks of data about the state of the surrounding pixels. No matter how wonderful the computer program may seem to be, it is still a finite state engine. The present state of the image or output (we hypothesize) should be dependent on some number of previous states of the image.

    The process is a classic series of dance steps for extracting the essential predecessor logic states. The Blakeslee book models this better than I can remember after all these years. The steps are normalize, simplify, flag all the don't-care states and gracefully conceal or wrap the data to handle the physical edges. When you have one of these cubic things, you go through a simplification process. First, you normalize the input data, which means remove the numerical clutter and have a single number. Another feature of the extraction process is you sort the truth table output column and input columns and you try to mark as many of the input columns with does-not-matter as possible. A third thing is, you set a limit on the depth of the input data, and that means the possible values for an output point are limited because the permuted possibilities are capped, and that cap is usually an exponent like 2^N.

    The resulting gadget will be a truth table that grinds out something like x=1 for a=1, b=0, c=1, d=0 and on and on.

    Unlike rewriting the software and putting it on a programmable gate array, this is an approach at writing a state table that produces an approximate pixel based on looking at a chunk of images containing that pixel.
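
    A toy sketch of the truth-table idea, assuming the black box has already been normalized down to a few one-bit inputs. The rule in `example_rule` is hypothetical and stands in for the real program; the helper names are illustrative only.

```python
from itertools import product

def truth_table(func, n_inputs):
    """Enumerate all 2**n_inputs input combinations and record the output."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}

def dont_care_inputs(table, n_inputs):
    """Indices of inputs whose value never changes the output."""
    unused = []
    for i in range(n_inputs):
        flipped = lambda b: b[:i] + (1 - b[i],) + b[i + 1:]
        if all(table[b] == table[flipped(b)] for b in table):
            unused.append(i)
    return unused

def example_rule(a, b, c, d):
    # Hypothetical stand-in for the black box: d is never looked at.
    return int(a and c and not b)

table = truth_table(example_rule, 4)
print(len(table), table[(1, 0, 1, 0)], dont_care_inputs(table, 4))
```

    The table has 2^N rows for N input bits, which is why capping the depth of the input data matters: exhaustive enumeration stops being practical after a few dozen bits.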

  69. Re:why not go to /. by sgt+scrub · · Score: 2

    It is either good for that or good for picking up girls.

    --
    Having to work for a living is the root of all evil.
  70. Re:von neuman by Austerity+Empowers · · Score: 1

    Yet software algorithms that run on these architectures are converted to straight HW implementations all the time. It's just not "turn key", it takes quite a bit of work but it often pays off.

  71. Re:Walks Like Troll... by Austerity+Empowers · · Score: 1

    Nope, you can translate most anything if you are patient, and throw enough gates and sram at it.

    And HW manufacturers are manufacturers, they don't necessarily know anything about the designs they manufacture. Foxconn is a prime example, they are clueless as all get out about any form of design. Including and perhaps especially their design centers.

  72. Re:Ditch your electronics manufacturer by Austerity+Empowers · · Score: 1

    Gates and timing closure is a physical designer's job, not the fab's. And he trades area, clock rate and power based on design intent. But for FPGAs it's fairly straightforward.

  73. How many? by Avalanche_Joe · · Score: 1

    One, two, three - crunch! Three licks to get to the center of a tootsie pop! Wait. Wrong question, never mind.

  74. dissasemble by PC_THE_GREAT · · Score: 1

    give him the code in x86 assembly :p, most probably if he is into hardware, he should get the information he seeks from this. If he complains that he doesn't understand because he doesn't code, then tell him that you equally do not design hardware, and that this is the best thing you can come up with. If all else fails and you are willing to pimp your arrogance and ego for money, then write a simple perl parser that will parse your program and replace each logical decision with its respective gates, http://www.chem.uoa.gr/applets/appletgates/Images/Image1.gif
    If i was to pimp out my arrogance and ego to gain money, and succumb to doing something outside of what i like to do, i would design the algorithm using gates. It would be no different than learning a new language: map out the if, add, sub, mul (division gets a lil bit tricky), and from that you can build anything. Once you make this map, use a perl parser to parse your program as mentioned and make it generate the gates according to the map you made.

    This might help also http://www.i-programmer.info/programming/hardware/4626-getting-started-with-digital-logic-logic-gates.html



    :p am no electronics person, but to solve any problem, a good background study is needed :p.

    who knows, if you write it out and open source it out :p it might be of some use to someone else [the perl parser to convert a language to logic gates]!

  75. Documentation by bigtreeman · · Score: 1

    Remove your C source from your documentation and comments
    ( which were written first )
    now design your hardware.

    --
    Go well
  76. Sell them the algorithm and collaborate with them by JackChang · · Score: 1

    As many pointed out, it's quite a complicated question that doesn't have a straightforward answer. I used to work on RTL implementations of graphics algorithms for years, and I can say there can be night and day between different implementations of the same algorithm. Also unspecified is their performance requirement. What kind of input is your algorithm expecting? YUV? RGB? CMYK? What's the expected throughput? How much memory and I/O bandwidth is your algorithm going to take? How many temporary registers are needed? Do they allow deep pipelining and longer latency? What's the fab process they are going to use? There are far too many variables that need to be taken into consideration. Also, some seemingly minor tweaks can bring major differences in both area and speed. I once helped a friend optimize a supposedly simple error diffusion pipeline. It took us 3 weeks to shrink the design to one third the size of the original implementation while improving its performance by 15% at the same time. I would say simply tell them you don't know because you are software people, unless they are only interested in synthesizable code. Finding someone to write it in RTL for you can be a much heavier burden than you might expect, because it's hard to manage something you don't understand. Chances are you may also need to change your algorithms a bit, because some operations aren't feasible in hardware at reasonable cost, and some operations can't be removed or simplified.

  77. Been there, done that. by treczoks · · Score: 1

    Compare this to the following situation: A graphical designer draws a fantastic new GUI for an application (on a piece of paper, even). Then you ask him how many lines or kilobytes of code this will be. And then the designer asks: "Can't I just scan the pictures I've drawn and have a software figure this out?". Sounds ridiculous? Yes, but: This is what you wanted.

    To answer the original question: The only realistic estimate is to add all the gates you've got in your computer, and take that as an upper bound. Which is still just an estimate, because implementing an algorithm for real-time in hardware can still increase the gate count by leaps and bounds.

    To be able to answer such a question you have to re-implement the algorithm in a Hardware Description Language (HDL) like Verilog or VHDL.

    I did this in one of our current systems, where I had to process a stream of data.
    First I designed an algorithm in C which took an infile, generated an outfile and measured the "quality" of the output.
    Then I re-implemented the algorithm in VHDL, which looks and "thinks" totally different than the original C source (but still DOES the same).
    Only after that one can give a realistic estimate (based on the target system/platform and timing constraints) on gate or cell counts.

  78. Since nobody else here is prividing much help... by Wierdy1024 · · Score: 2

    I shall give it a go.

    First up, most algorithms can't be directly translated to hardware without either changing them or taking a serious performance hit.

    Nearly all widespread algorithms (eg. H264 video) are designed specifically with a hardware implementation in mind, and in fact must usually have elements removed that would produce good results simply because it wouldn't be sensible to implement in hardware.

    In particular, in hardware, loops that iterate an unknown number of times are generally not allowed.

    Steps to make this estimate would probably be to take your code and 'flatten' it (i.e. rewrite it to avoid all use of pointers, except arrays).

    For every variable, figure out how many bits wide it needs to be (i.e. what is the smallest and largest possible value). You probably want to convert floating point to fixed point.
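
    A small sketch of the width calculation for integer ranges, assuming unsigned storage when the range is non-negative and two's complement otherwise. The function name `bits_needed` is illustrative only, and fractional (fixed point) bits would have to be added on top.

```python
def bits_needed(lo, hi):
    """Smallest word width that can hold every integer in [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if lo >= 0:
        # Unsigned: enough bits for the largest value (at least one bit).
        return max(1, hi.bit_length())
    # Two's complement: magnitude bits for both ends, plus a sign bit.
    return max((-lo - 1).bit_length(), max(hi, 0).bit_length()) + 1

for lo, hi in [(0, 1), (0, 7), (0, 255), (-128, 127), (-129, 127)]:
    print((lo, hi), bits_needed(lo, hi))
```

    A loop counter that only runs 0..7 needs 3 bits rather than the 32 an `int` would get in C, which is where much of the saving in hardware comes from.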

    Next, to make a lower bound of how many gates would be used if you were to design for minimal gate use, take every add and subtract operation and call them 15 gates per bit. For every multiply call it 5 gates per input bit squared. Don't do division (division can be done as a multiplication by the inverse of a number).

    For the upper bound, do the same, but multiply by the number of times each loop goes round. That gives you a design with lots more gates but much higher performance.

    For the upper bound finally add on 5 gates for every bit of every variable times the number of lines of your input code. This approximates the d type flip flops for storage in a pipeline. Note that if two lines of code operate on entirely different variables, you can call them the same line as far as this metric goes.

    For the lower bound, if you got a value greater than 10000 plus 16 times the number of bytes your program compiles to plus the RAM it allocates to run, it would be more gate efficient to put in a tiny processor and keep your algorithm in a ROM. (Lots of complex algorithms are implemented this way when space is at a premium).
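
    The recipe above can be sketched roughly as follows. This is one reading of it, not a definitive tool: "5 gates per input bit squared" is taken as 5 x bits^2, the factor of 16 is applied to code bytes plus RAM bytes together, and the example numbers (16-bit words, 4 adds and 2 multiplies inside a 64-iteration loop, 128 variable bits, 20 source lines, 2 KB of code, 1 KB of RAM) are invented.

```python
def op_gates(width, adds, mults):
    """15 gates/bit per add or subtract, 5 * bits^2 per multiply."""
    return 15 * width * adds + 5 * width * width * mults

def lower_bound(ops):
    """ops: list of (width, adds, mults, loop_iterations); each op built once."""
    return sum(op_gates(w, a, m) for w, a, m, _ in ops)

def upper_bound(ops, variable_bits, code_lines):
    """Fully unrolled loops plus pipeline flip-flops (5 gates per bit per line)."""
    unrolled = sum(op_gates(w, a, m) * n for w, a, m, n in ops)
    return unrolled + 5 * variable_bits * code_lines

def cpu_threshold(code_bytes, ram_bytes):
    """Above this, a tiny processor plus ROM is likely smaller than dedicated logic."""
    return 10_000 + 16 * (code_bytes + ram_bytes)

ops = [(16, 4, 2, 64)]  # hypothetical loop body
lo = lower_bound(ops)
hi = upper_bound(ops, variable_bits=128, code_lines=20)
print(lo, hi, lo > cpu_threshold(2048, 1024))
```

    With these made-up inputs the lower bound sits well under the processor threshold, so dedicated logic would still be the smaller option.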

  79. Re: Since nobody else here is prividing much help. by Wierdy1024 · · Score: 1

    Note that these by the way assume you have the engineering time to 'do it properly'. There are lots of ways of making a considerably bigger design, but with much less design effort.

    Check out 'Handel-C' for example. It's a one click tool that takes C code and produces horribly inefficient hardware, but it works.

  80. Unique algorithm? I recommend High Level Synth by LucienMP · · Score: 1

    If your algorithm is standard just google "FPGA/ASIC IP provider" (of which there are many, e.g. for H264 etc) and pay the price; your results will be optimal and cheaper than doing it yourself - assuming you never have before.

    If your algorithm is custom then you are either going to make a horrible job of it as you learn HW as you go (it takes years to get optimal and reach the clock speeds, area, and QoR/Test coverage numbers needed for production Si), or you could hire a HW team who will cost you a pretty penny to get it done, or outsource - also not cheap but it allows for risk reduction.

    How many gates isn't just a question of counting "*" and "/" and scaling (although back of napkin will give you a general feel). Practically any C/C++ you write is going to be very sequentially oriented, so the algorithm probably has a better parallel implementation. Whilst you might now be thinking of threads or separate processes, we are talking about HW parallelism, which is far more fine grained than SW parallelism. Further, any integers you have may be optimizable to fewer than 32 bits (e.g. 1 bit or 3 bits), thus saving a large amount of HW area. Finally, you didn't really say what sort of performance you want - if it's 1MHz and 1 word / year of I/O then I suspect you could build some very clever hardware to do it all in a few gates, but if it's 1GHz and 1 Giga Words/sec then area might expand as you will need to duplicate the circuit in parallel. The speed at which you process data will also affect latency (time from input to the first correct output), which is often a killer in real-time or other systems (e.g. if you put LCD TVs side by side you might notice some are running several seconds behind what's being broadcast - this is because some are doing more video processing to clean up the image as they receive it, which adds latency).

    So at this point I would recommend looking into SystemC (based around C++), SystemVerilog (so so) and then a raft of tools to help you do the job. These tools are called "High Level Synthesis" (HLS) tools and they aren't cheap, but they do cut down on man hours spent manually converting algorithms. You will still need to be able to think extremely low level, as bad code results in bad gate count - no matter the language.

    I don't want to come over as a shill so I am going to present the 4 main competitors for HLS tools;
    1) Calypto Systems, formerly Mentor Graphics' tools before spin-out ( http://calypto.com/en/products/catapult/overview )
    2) Forte Design Systems ( http://www.forteds.com/products/cynthesizer.asp )
    3) Cadence C2S ( http://www.cadence.com/products/sd/silicon_compiler/pages/default.aspx )
    4) Impulse C ( http://www.impulseaccelerated.com/ ) - this is very reasonably priced but has its limitations.
    5) There are some open source things out there, I wouldn't recommend them as they are quite in their infancy.

    Disclaimer: I used to work at one of the companies that provides synthesis tools, for >10 years converting C/C++/SystemC to HW quite often for a service fee. I can tell you we never had a design that cost under 30K and most were in the 100s to millions of USD.

  81. Do they know what they want? by MindStalker · · Score: 1

    It seems to me that a company asking software developers what it would take in hardware might possibly not know what they want.

    It's highly possible that a small CPU and program on flash ROM solution might be all they really want. Do they really NEED it burned into the hardware?

  82. Software Development by Murdoch5 · · Score: 1

    You can't call yourself a software developer with a solid understanding of hardware. To develop software, you should always be thinking in terms of how the hardware is going to handle that software. That being said, if you need a gate count then use VHDL or Verilog or another Hardware Description Language. You can't actually convert an abstracted software language like C and up to gates because every single compiler and linker will turn out different end code.

    1. Re:Software Development by Murdoch5 · · Score: 1

      *without

  83. One way to make it right...but it requires work by Depressive+Cyborg · · Score: 1

    Make sure that you can analyze the software properly and break it down to well defined functions/modules/whatever-your-abstraction-is

    Implement hardware modules for corresponding inputs/outputs but make sure that you do not use an automatic tool. For hardware, you can often
    do things very differently since you have different ways to implement things and may use state machines, memories and logic in different ways.

    For each module, check hardware and software implementation against each other using CBMC or some other software which can actually verify that your implementation is, if not correct, at least equally bad as the implementation in the other domain.

    (Since I'm posting as a Depressive Cyborg, you might be able to figure out what (SW/HW) made me a Cyborg and what made me Depressive....)

  84. no clear specs, no clear answer by xgeorgio · · Score: 1

    If your algorithm is purely arithmetic, then translate it to primitives (+,-,*,/) and estimate complexity based on simple full adders and flip-flops (bit level). Note that this is a very rough estimation and does not apply easily to long, data-oriented code, since in that case your interest is with the data storage, not the operations on them (imagine adding +1 to a billion-billion-cell vector of counters).
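
    As a rough illustration of that bit-level counting, here is a sketch in which every constant and function name is an assumption: 5 gates per full adder, 2 per half adder, 6 per flip-flop, ripple-carry adders, and a plain unsigned array multiplier. Division is left out on purpose, since its cost depends heavily on the divider architecture chosen.

```python
# Rough gate-count model. All constants are assumptions, not vendor data.
GATES_PER_FULL_ADDER = 5  # assumed: 2 XOR + 2 AND + 1 OR
GATES_PER_HALF_ADDER = 2  # assumed: 1 XOR + 1 AND
GATES_PER_FLIP_FLOP = 6   # assumed: 6-NAND edge-triggered D flip-flop


def adder_gates(bits: int) -> int:
    """Ripple-carry adder: one full adder per bit."""
    return bits * GATES_PER_FULL_ADDER


def subtractor_gates(bits: int) -> int:
    """Adder plus one XOR per bit to invert the second operand."""
    return adder_gates(bits) + bits


def multiplier_gates(bits: int) -> int:
    """Unsigned array multiplier: bits^2 AND gates for partial products,
    summed by bits*(bits-2) full adders and bits half adders."""
    if bits < 2:
        raise ValueError("this model needs at least 2 bits")
    return (bits * bits
            + bits * (bits - 2) * GATES_PER_FULL_ADDER
            + bits * GATES_PER_HALF_ADDER)


def register_gates(bits: int) -> int:
    """Storage for one value: one flip-flop per bit."""
    return bits * GATES_PER_FLIP_FLOP


COST = {"add": adder_gates, "sub": subtractor_gates, "mul": multiplier_gates}


def datapath_gates(op_counts: dict, bits: int) -> int:
    """Sum the cost of every operator instance, each built in parallel."""
    return sum(count * COST[op](bits) for op, count in op_counts.items())


if __name__ == "__main__":
    print("3 adds + 2 muls at 16 bits:", datapath_gates({"add": 3, "mul": 2}, 16))
    # Storage dominates for data-oriented code: 1000 counters, one shared adder.
    print("1000 x 16-bit counters:", 1000 * register_gates(16), "gates of storage")
    print("shared incrementer:", adder_gates(16), "gates of logic")
```

    Under these assumptions the counter example needs about a thousand times more gates for storage than for arithmetic, which is the point about data-oriented code above.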

    If your algorithm is mixed-form, then you must know your hardware capabilities and, preferably, its firmware. If you can transform your flowchart (low level) design to assembly code, then you can look up the necessary opcodes in some standard IC and estimate (again, roughly) the order of the required IC in your case. For example, if you want to sort some 100-element vector of 16-bit integers, then a few generic x86 opcodes are enough - therefore, even 8086 (or a fraction of it) will do the job.

    Generally speaking, the algorithm-to-gates estimation works only on very primitive or streamlined procedures, mostly arithmetic. That is, only when we are speaking about DSP (not CPU) implementations, like for GPUs. In almost any other case, most IC circuitry comes from the corresponding memory/heap modules, I/O, registers, etc, as the "algorithm" will require much more than a simple processing unit.

    --
    "Abashed the Devil stood, and felt how awful goodness is..."
  85. FPGA by echen1024 · · Score: 1

    I would advise them to first try on an FPGA (Field programmable gate array), and just write the program in Verilog, see how many gates it needs, and then simply select an FPGA from Altera or Xilinx that fits your needs. No need for a full blown ASIC.

  86. If they are serious.... by niftymitch · · Score: 1

    If they are serious get some funding to start coding this in a hardware description language.

    Note Well: this is a lot like asking how many x86 instructions a "C" program will take
    without writing a "C" program. At best this gets you a starting answer.

    If you tell the compiler to kill loop unrolling, code shrinks and might run slower.
    If loops unroll, code grows but might run faster. With SIMD instructions the code
    can shrink. Now ask if the x86 answer is the same answer you get on an ARM
    and a MIPS processor. The other thing to know is that data path widths have a large
    impact -- wider is faster but uses more gates -- too wide is slow -- too narrow is
    slow.
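
    The hardware version of the unrolling question can be sketched with a toy model. Assumptions, all hypothetical: loop iterations are independent, each copy of the loop body finishes one iteration per clock cycle, and control logic and memory bandwidth are ignored. The function name and the 1500-gate body are made up for illustration.

```python
import math


def unroll_tradeoff(iterations: int, gates_per_unit: int, unroll: int):
    """Return (gates, cycles) when `unroll` copies of the loop body
    are built side by side and run in parallel."""
    if unroll < 1:
        raise ValueError("unroll factor must be at least 1")
    gates = unroll * gates_per_unit          # area grows linearly
    cycles = math.ceil(iterations / unroll)  # latency shrinks, rounded up
    return gates, cycles


if __name__ == "__main__":
    for unroll in (1, 4, 16):
        gates, cycles = unroll_tradeoff(64, 1500, unroll)
        print(f"unroll={unroll:2d}: {gates:6d} gates, {cycles:3d} cycles")
```

    So the same algorithm has no single gate count: the answer moves with the throughput you ask for.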

    Invest a couple grand of their money on some large FPGA development kits and
    go to work. For the most part graphics hardware is tightly coupled stripped down
    common processors and state machines setup to solve specific display problems.

    One positive place to work is in the world of CUDA on graphics cards.

    CAUTION.... the field is full of patents and going fast on CUDA is dancing with
    a hungry bear... Any hardware you build to the same end will likely trip on patents
    that others have.

    Well written hardware descriptions read a lot like any programming language.
    With a second beer in hand you can read down from an X-windows program
    all the way down to gates and other hardware library stuff and hardly see a
    speed bump.

    Going fast in hardware requires clever minds....

    And if you cannot build Open-GL and WindowZ graphics on top
    of your "C" proof of concept you have a lot of work to do.

    --
    Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
  87. Too high up by AvailableNickname · · Score: 1

    I think that C/C++ is much too high a level to look at and say "This if statement requires n logic gates" because that if statement will be implemented in assembly differently on different systems. However, one can look at assembly code and say "Ahah, that is 2 logic gates, and there's three more and that's another 3.". So I think you need to compile your code on a whole bunch of systems, compare the assembly, and use some kind of average of the results to get a rough estimate of how many logic gates.

  88. Re:meh by jbo5112 · · Score: 1

    The hardware company may not have signed a contract yet. You don't want to just give something away to the customer when they haven't bought it yet. They're probably trying to establish design and build costs, so they will have an idea of profitability and feasibility before being locked into a contract to buy something they can't sell.

  89. Re:Since nobody else here is prividing much help.. by patrick.clemins · · Score: 1

    Wierdy's last suggestion is my personal favorite. It's really a sliding scale between software and hardware anyway. Does putting a Linux ROM with your algorithm set to autoload as a startup daemon in an x86 machine count as hardware or software? Embedded applications often have something resembling an OS, if not a full-blown OS, managing resources. Unless your algorithm is super simple, or this electronics manufacturer is a glutton for punishment, I'd put your algorithm on a ROM alongside some DSP or other processing core and call it a day. Another option to explore that's between the two (all gates and ROM/processor combo) is a PAL/GAL... but it will certainly take some mental gymnastics to get your genetic algorithm into a form appropriate for burning the PAL. Good Luck!

  90. Re:Cadence C to Silicon redux by solidraven · · Score: 1

    Well, it really depends on the algorithm, I'd say. Simple things are easy enough to estimate, depending on whether you wish to run them in parallel or not. But if they come to you to ask for it, that's usually not the case, I figure.