Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?
dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"
Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.
Slashdot: Where *not* to go to get specific advice about specific technical issues.
AntiFA: An abbreviation for Anti First Amendment.
beowulf clusters does your algorithm desire?
If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.
You'd think the "electronics manufacturer" would have some idea how to estimate this.
Make up a number, then when they complain that it was way off, blame it on their management changing scope a hundred times throughout the life of the project!
As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.
I think you may have a better chance of getting an answer if you ask this question on Stackoverflow (or one of its related sites).
Unfortunately, I think asking on Slashdot is only likely to get you some tired and outdated memes / jokes...
If they plan on implementing this in hardware, then they should have people who are capable of answering that question. If instead, they are just a manufacturer and aren't capable of doing the actual hardware design, then you have bigger problems than answering this question. That is something you should find out about ASAP.
Develop out the algorithm in Minecraft using ProjectRed (Integration module, specifically) and then you can easily count the gates! :-)
The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"
There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.
You need a C to VHDL translator. Here's a tutorial for one.
Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.
Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...
Hello,
It is probable that you can break down your algorithm -(I do not mean code) into a pipeline of elementary processing and find implementations (IP) for each of them.
to give out an estimate:
- subdivise your algorithm into simpler pieces
- find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
- do the sum.
Indeed an hardware designer or consultant would be of a great help here.
What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.
It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.
Here is a proven method for calculation.
If your code is:
a) C: divide the number of lines with 7
b) C++: divide the number of lines with 5
c) Ruby/Python/Java: divide the number of lines with 3
d) Perl: multiply the number of lines with 42
e) C#: resign.
How many Gates will it take to implement your software project?
One. His name is Bill, and here is yours.
Science advances one funeral at a time- Max Planck
You already have your algorithm running in electronic hardware, right?
Your current gate count is the sum of
* the gate count of your CPU
* the gate count of your RAM
* the gate count of your program ROM
So that's an upper bound on the gate count.
If that number is too big for your manufacturing partner,
then you have an optimization problem.
Optimization is a hard problem...
Write out the truth table for each output as a Karnaugh map incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.
Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".
You do not have a moral or legal right to do absolutely anything you want.
It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.
The problem as stated isn't adequately constrained.
Do not ask a computer scientist to be an electrical engineer.
If they can design the hardware, they can ask for the source and supply the quote themselves.
If they can't, then OP needs to understand they have no practical design capabilities and plan on paying someone else to design it---before paying these guys to manufacture it. Or he can search for a shop that can handle both the design and the manufacture.
---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
There's been several people who suggested using a high-level synthesis tool to convert your software (c/c++) directly to HDL (verilog/VHDL) of some kind. This can work and I've been on this task and seen it's output before. The catch is; unless that software was expressly and purpose written to describe hardware (by someone who understands that hardware and it's limitations and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.
Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.
My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.
Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.
I just tried this and all my money was transferred to a different account.
They have a tool that can do this, I don't know if I'd call it a 'solution' just yet though. We've just finished ripping out all the 'solution' for our project because we wanted a device that was actually small enough (and thus cheap enough) to be able to sell.
It takes input designed to be hardware and makes good hardware. It takes input designed to be software and makes shit hardware. It also doesn't handle version control very well, you need proprietary tools to even VIEW the design files... and the output which actually describes the hardware (vhdl) is so obfuscated as to be nearly illegible. The build times are also 4-5 times longer than they need to be, so it takes a whole day to place and route the designs output by this tool. Unless you're building something trivial I wouldn't advise depending on mathworks/simulink tools for a solution.
We (ConcurrentEDA.com) have developed a tool call Concurrent Analtyics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)
Knowing what algorithm you want to run in hardware in not even close to enough to estimate gates. You need to know the algorithm, and the required performance, and have a sketched out HW design that meets those goals. THEN you can estimate gate count.
For a simple example of why this is, consider processors. A 386 and a Sandy Bridge i7 implement very similar "algorithms" - it's just fetch->decode->execute->writeback all day long. If you implemented them in software emulation, it would be very similar software with some additional bits for the newer ISA features on the i7. But a 386 is about 280 THOUSAND gates, and the i7 is about 350 MILLION gates/core - three orders of magnitude different. Of course, there's at least a 2 order of magnitude performance difference too - it's not like those gates are going to waste.
Point is, knowing the algorithm isn't enough to get even a finger in the wind guess at gate count. If you need an answer to this question, you need to get competent HW design people looking at it.
I believe the OP is asking the question with an underlying motive that most users aren't grasping - The manufacturer definitely has a way of estimating the gate "cost" from C++, as some experts on the matter have pointed out here, but for that he probably demanded source code, which the OP probably has no safe way of handing over without compromising his Intelectual Property. He doesn't want to lose the business contract or spend money blindly on a consultancy he doesn't even know which's name is, so the question makes FULL SENSE regardless of its child-like semantics.
You can probably bet the manufacturer is based and/or has legal safe-haven in a dodgy country, along the lines of having properties like:
(hint: China) ...This makes the OP think twice about passing source around.
Now, my personal opinion regarding a possible answer is more business-focused - if such a kind of manufacturer is even remotely interested on your "product" as to ask that, then you have a very marketable piece of code on your hands and you need to do the following...
It is either good for that or good for picking up girls.
Having to work for a living is the root of all evil.
I shall give it a go.
First up, most algorithms can't be directly translated to hardware without either changing them or taking a serious performance hit.
Nearly all widespread algorithms (eg. H264 video) are designed specifically with a hardware implementation in mind, and in fact must usually have elements removed that would produce good results simply because it wouldn't be sensible to implement in hardware.
In particular, in hardware, loops that iterate an unknown number of times are generally not allowed.
Steps to make this estimate would probably be to take your code and 'flatten' it (IE. Rewrite it to avoid all use of pointers, except arrays).
For every variable, figure out how many bits wide it needs to be(IE. What is the smallest and largest possible value). You probably want to convert floating point to fixed point.
Next, to make a lower bound of how many gates would be used if you were to design for minimal gate use, take every add and subtract operation and call them 15 gates per bit. For every multiply call it 5 gates per input bit squared. Don't do division (division can be done as a multiplication by the inverse of a number).
For the upper bound, do the same, but multiply by the number of times each loop goes round. That gives you a design with lots more gates but much higher performance.
For the upper bound finally add on 5 gates for every bit of every variable times the number of lines of your input code. This approximates the d type flip flops for storage in a pipeline. Note that if two lines of code operate on entirely different variables, you can call them the same line as far as this metric goes.
For the lower bound, if you got a value greater than 10000 plus 16 times the number of bytes that your program is compiled plus the ram it allocates to run, it would be more gate efficient to put in a tiny processor and keep your algorithm in a ROM. (Lots of complex algorithms are implemented this way when space is at a premium).