Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?
dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"
Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.
Slashdot: Where *not* to go to get specific advice about specific technical issues.
AntiFA: An abbreviation for Anti First Amendment.
beowulf clusters does your algorithm desire?
If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.
You'd think the "electronics manufacturer" would have some idea how to estimate this.
Make up a number, then when they complain that it was way off, blame it on their management changing scope a hundred times throughout the life of the project!
As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.
I think you may have a better chance of getting an answer if you ask this question on Stackoverflow (or one of its related sites).
Unfortunately, I think asking on Slashdot is only likely to get you some tired and outdated memes / jokes...
If they plan on implementing this in hardware, then they should have people who are capable of answering that question. If instead, they are just a manufacturer and aren't capable of doing the actual hardware design, then you have bigger problems than answering this question. That is something you should find out about ASAP.
Develop out the algorithm in Minecraft using ProjectRed (Integration module, specifically) and then you can easily count the gates! :-)
"We don't know."
The most common languages for chip or FPGA design would be VHDL or Verilog. Now there is also High Level Synthesis (http://en.wikipedia.org/wiki/High-level_synthesis), in which you can use C/C++ directly. So if your using a tool like Xilinx's Vivado (http://www.xilinx.com/products/design-tools/vivado/integration/esl-design/) then you can go directly from C/C++ to gate count. However, even in C/C++ it probably needs lots of work from where it is.
The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"
There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.
You need a C to VHDL translator. Here's a tutorial for one.
Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.
it would only make sense to reuse the same adder circuit for each addition, instead of making a separate adder circuit for each operation.
then you'd add control logic to move the data to adder circuits, multiplier circuits, etc.
then essentially what you have is a microprocessor.
then you just turn that microprocessor into the simplest one possible. which is basically a queue and a stack, and a few elementary logic operations. you can do operations a bit at a time.
so the number of logic gates your program needs it the number to make a queue and a stack, and a few elementary logic operations, and that's probably on the order of about 40 gates.
Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...
Maybe they can do the translation, but they need a number for how many gates so that they can give a number for how many dollars.
Hello,
It is probable that you can break down your algorithm -(I do not mean code) into a pipeline of elementary processing and find implementations (IP) for each of them.
to give out an estimate:
- subdivise your algorithm into simpler pieces
- find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
- do the sum.
Indeed an hardware designer or consultant would be of a great help here.
What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.
It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.
There are a number of tools on the (commercial) market that can compile (a subset of) C to hardware into a hardware description language (Verilog/VHDL), e.g. from Cadence and Synopsys. See http://en.wikipedia.org/wiki/High-level_synthesis for an overview of the approach and links to tools. There are also some open source tools that can turn C code into Verilog or VHDL, but they are not very mature in my opinion.
However, you will not get a single number for gate complexity out of these tools. Depending on the requirements and tradeoffs (smaller chip area vs. higher speed, single cycle vs. pipelined implementation, target device as FPGA or ASIC), the number of gates (or logic blocks for FPGAs) required will differ significantly. From my experience, you definitely need a hardware design expert to obtain useful (i.e., more-or-less optimized) results from these tools - and you should expect having to invest significant effort for restructuring your C code so the high-level synthesis tools can grok it.
Here is a proven method for calculation.
If your code is:
a) C: divide the number of lines with 7
b) C++: divide the number of lines with 5
c) Ruby/Python/Java: divide the number of lines with 3
d) Perl: multiply the number of lines with 42
e) C#: resign.
The answer, as you might imagine, is complicated and depends on how these gates are implemented. Think for instance you could design a chip to do this, you could write RTL to do this in an FPGA, or you could even write the algorithm into more software on an embedded processor of some kind. Is this electronics manufacturer one that makes chips or one that makes systems (boards, cases, etc). If it is the former they should have people who can work with your people to figure this out. If it is the latter then why do they care? Are they really asking you to provide a chip which implements your algorithm? Ask some more questions...
The question seems so ill-posed that one has to wonder if there's a product or service advert lurking... but assuming this is real.
Software doesn't automatically translate directly to hardware. As others have noted, break out the algorithmic core from the setup and finish. Presumably there is some part of the code which is the most critical in steady state. Describe that to their hardware engineers in whatever depth is required. Depending on the algorithm, the ASIC library elements available (or FPGA units, etc.) you may want to make some substantial adjustments to the "code" to make it fit within the design parameters of the available device. This should be an iterative process, not a single estimate based on a pure software perspective.
If there isn't a clearly identifiable set of "hot blocks" the chances of there being a good hw implementation fit is poor. If there is, it may still be necessary to change the algorithm details to fit but it should be "doable". Whether it is worthwhile depends on the volumes and the performance gains.
How many Gates will it take to implement your software project?
One. His name is Bill, and here is yours.
Science advances one funeral at a time- Max Planck
Mod this offtopic if you want but now I can't see my comments, I can't see if anyone has responded to them, and it has become almost impossible to participate in discussions as a result. WTF, /.?
Drill baby drill - on Mars
...this should get you started:
http://en.wikipedia.org/wiki/C_to_HDL
Find a suitable converter, then grab a free (or evaluation) version of an FPGA design tool, for example one of these (I only suggest these over the many other, probably equally as good alternatives, as I've used them myself):
http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm
And with a bit of work you should be able to produce output that will essentially be your code implemented in programmable logic, and the tools will tell you the number of gates/cells required.
What I would say, is that you'll have a much easier ride if your algorithm is in C rather than C++.
Despite saying that you have no experience with this sort of thing, defining logic in something like VHDL is basically programming. Sure, you'll need to develop a fair understanding of the hardware, but with the libraries of pre-built components available from the numerous companies who produce programmable hardware like FPGAs and CPLDs, you may find you could do a lot more than you think yourself.
Haven't tried it, but Cadence's C to Silicon might be up for the job. Also keep in mind that in hardware you have very different requirements than in software, and parallellisation has interesting effects on the number of gates. The best option is to get an EE, preferably with experience in digital design, to take a look at it. Other options are SystemC compilers, but they're not really up to production use yet as far as I know. And it is also very technology dependant, sometimes complicated logical functions that are common are implemented directly. This isn't something you can just wing!
There really isn't a great way to answer your question without a detailed analysis of your code.
There are more factors to the number of gates required for a given task than just the complexity of code. Clock speed can be a major factor in determining the number of gates required for a given algorithm. Another major factor is the part you are targeting. The number of design elements in FPGAs used can change just by targeting a different device family.
Even if your algorithm was small enough to fit into a part, there are other issues that could arise (such as not enough bandwidth or pins for your memory device(s)).
It sounds like the electronics manufacturer doesn't have the resources to determine the number of gates for you. It looks like your only avenue is to ask a third party to review your code (under NDA) to help you determine the approximate gate requirements. This won't be cheap.
google: https://www.google.com/search?q=convert+c+code+to+fpga
Clearly it's not possible to render a software program as hardware. If everyone who explained the process (use Verilog) above is correct, that would mean that the exact same algorithm exists as both hardware and software.
We can't have the same algorithm exist as both hardware and software, because that would mean algorithms are hardware just as much as they are software.
that would mean all the people whining about "software patents" may as well be whining about unicorns. I hereby declare Verilog, ASICs, and FPGAs to be non-existent so we can continue to pretend that there is such a thing as a "software patent".
This is a horrible question to ask. Software is a tool to lower hardware requirements.
Compile your algorithm to the simplest RISC architecture reasonable. For most, something among the lines of ARM or MIPS works. Then, take note of all variables and add up how much RAM they'll take. Consider every bit (yes, bit, not byte) as a D-flipflop and convert every instruction (post-compile, in assembly) into a respective set of logic gates. A bit of googling should get you those values.
If your algorithm is reasonably complicated, chances are, you'll get a number that seems absurdly high compared to what state-of-art hardware is available.
In practice, it's probably best to just pick an off-the-shelf CPU and run the software on it. There might be some parts that are better done in hardware than in software, but you should get someone who knows what they're doing for that.
You already have your algorithm running in electronic hardware, right?
Your current gate count is the sum of
* the gate count of your CPU
* the gate count of your RAM
* the gate count of your program ROM
So that's an upper bound on the gate count.
If that number is too big for your manufacturing partner,
then you have an optimization problem.
Optimization is a hard problem...
Write out the truth table for each output as a Karnaugh map incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.
Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".
You do not have a moral or legal right to do absolutely anything you want.
It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.
The problem as stated isn't adequately constrained.
You guys are probably trying to get a manufacturer to make Scrypt-mining ASICs.
Get free satoshi (Bitcoin) and Dogecoins
Do not ask a computer scientist to be an electrical engineer.
Theoretical answer:
Recode your algorithm in SystemC (a c++ library that can be used to implement a register transfer language representation of your algorithm) and synthesize it with one of the available tools (e.g., Accelera, Synopsys, Calypto, etc) targeting a typical library (e.g, 28nm TSMC), at a particular clock frequency.
Practical answer:
Ask someone with hw design experience to estimate it for you...
FWIW, nobody wants an "exact" size in logic gates, all they want an idea in complexity. The big ticket items people care about are the size in bits of RAMs (and how many simultaneous read/write ports it might need) and complicated math that is likely to take more than 1 clock cycle to complete (e.g., like a floating point math operation) and the data-width of the main data path at the throughput that you want to have. Simply multiplying the data path width by the estimated number of pipeline cycles is generally proportional to the eventual area minus the RAMS and special math ops (which is why you need to identify those parts separately).
Generally, I've found that naïve "software" algorithms have not been very amenable to HW implementation without some amount of rework and the fact that you do not have an answer to the posed question would likely lead me (and probably your potential customer), that your algorithm is half-baked from a HW implementation point of view... Just food for thought...
Short answer: you need to contract an electronics engineer. Possible: You could dump the non-optimized assemby language (-S on most compilers) for a popular processor family e.g., 80686, PA-RISC, etc. The manufacturer probably has resources to convert "this pile of 80686 instructions" to "an ASIC that does the same thing really well".
For what it is worth, here is a circuit I developed to see what the gate configuration (nor only) would look like for the implementation of a condition that the input switches be:
0
1
01
11
http://www.neuroproductions.be/logic-lab/index.php?id=3699
you know, the counting integers 0,1,2,3 - the same code that I have on my luggage. I thought there might be a fun implementation related to security or something - a hybrid mechanical/electronic locking system.
It turned out to be super hard for me to figure out this final result. Nonetheless, the result was most interesting and I encourage you to find a more efficient configuration.
I did this using basic logic and a crapton of that time-honored tradition of guessing and trial and error.
I can only begin to imagine the complexity of trying to implement and design circuits based on algorithms written in anything above assembler level.
"It takes one gate that accepts our input and outputs a desirable answer. We would like you to design that gate."
*oopsy* that should have been
;)
0
1
10
11
but you knew that
Then, stick your pinky into the corner of your mouth and do your best evil laugh!
I swear to God...I swear to God! That is NOT how you treat your human!
There's been several people who suggested using a high-level synthesis tool to convert your software (c/c++) directly to HDL (verilog/VHDL) of some kind. This can work and I've been on this task and seen it's output before. The catch is; unless that software was expressly and purpose written to describe hardware (by someone who understands that hardware and it's limitations and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.
Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.
My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.
Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.
I just tried this and all my money was transferred to a different account.
http://davidmoorephoto.com/storage/cache/images/000/975/1I2B9665,large.2x.jpg?1372317810
While an interesting question (I didn't even know hardware manufacturers were in the habit of converting software into hardware), why don't they figure it out themselves? They must have the tools/people to do it. Are you afraid they'll "steal your algorithm" if you give them the source? (that's much less interesting)
--
Stay tuned for some shock and awe coming right up after this messages!
How many gates could a gate chuck chuck if a gate chuck could chuck gates?
Look at Mathworks. They have a solution for this.
Computers are so cheap and low power today that turning an algorithm into gates would be a silly way to proceed. So the question is not really relevant except academically.
We (ConcurrentEDA.com) have developed a tool call Concurrent Analtyics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)
Knowing what algorithm you want to run in hardware in not even close to enough to estimate gates. You need to know the algorithm, and the required performance, and have a sketched out HW design that meets those goals. THEN you can estimate gate count.
For a simple example of why this is, consider processors. A 386 and a Sandy Bridge i7 implement very similar "algorithms" - it's just fetch->decode->execute->writeback all day long. If you implemented them in software emulation, it would be very similar software with some additional bits for the newer ISA features on the i7. But a 386 is about 280 THOUSAND gates, and the i7 is about 350 MILLION gates/core - three orders of magnitude different. Of course, there's at least a 2 order of magnitude performance difference too - it's not like those gates are going to waste.
Point is, knowing the algorithm isn't enough to get even a finger in the wind guess at gate count. If you need an answer to this question, you need to get competent HW design people looking at it.
because if the hardware company is thinking "gates" instead of "cycles," they want to implement it in a FPGA. hell, if they were going to put it on a dedicated microprocessor, they'd just recast it with libraries for that processor and recompile.
if this is supposed to be a new economy, how come they still want my old fashioned money?
The manufacturer is probably asking how many gates you need to implement the algorithm exactly as it is coded, with exactly as much parallel or sequential logic as it already has, and that will have a fairly specific answer.
While that number could be determined, it would not be very useful. Hardware implementation, especially when targeting FPGA's, get most of their performance advantage by exploiting more parallelism than is achievable by running on a processor.
No, the manufacturer isn't make any assumptions about how the algorithm is translated. The deal in gates. Gates are the most direct measure of how much the hardware will cost to manufacture.
Without a direct number for gates you will to come about in in a more indirect fashion. How much memory does the algorithm use? What data structures are used how big are they? (*all* data structures. An integer is a data structure for this purpose) What operations (adds, subtracts, etc) are needed and how many are required to go from input to result? With those you can usually come up with a ball park guess of how many gates will be required. There are always optimizations and non-obvious operations that get overlooked but it is a good start.
The answer is "too many"
"You cannot directly interpret a software algorithm to hardware." Why? Here are the follow ups: What type of hardware, FPGA, GPU, custom ASIC? What part of the algorithm NEEDS to be in hardware to gain performance over basic system resources (CPU, GPU)? Who is going to pay for this little experiment?
As others more qualified have already stated, you rarely if ever get a direct translation nor do you always need to interpret the entire algorithm to hardware. For a hardware manufacturer to even ask the question is suspect, unless it was a sales or marketing rep, then it might make sense. The hardware people will know best how to do this, for them to ask you ... RUN!
My suggestion would be to say thank you and stick with software. You will probably spend enough time working this out that someone else will implement it before you, better. If you're not talking to Nvidia, AMD or Intel you're probably wasting your time.
The best path for you would be ForteDS Cynthesizer, Mentor Catapult C and C-toSilicon from Cadence. Those are behavioral synthesis tools. I have used some of those and they are very strong for datapath oriented designs, if you follow their design guidelines they are quite good to convert C code to RTL. There's a free thing you can try call xPilot, never touched it though...
linux user #271173
You can't patent math. Does that mean that the world doesn't exist?
We (ConcurrentEDA.com) have developed a tool call Concurrent Analtyics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)
I don't know how easy it would be to port your specific algorithm, but I did my masters thesis around a language called Handel-C. It's a super-set of C that provides a high level FPGA programming interface. That might get you some distance in determining the number of gates. Disclaimer: I was working with it a few years back and the documentation/support was appalling, I don't know if it's become any better.
It seems like, if you could describe the algorithm in a sufficiently low-level language like C, they shouldn't be asking you how many gates it would take. If they're the hardware manufacturer, they should know. Besides, there are too many factors that could influence the gate count depending on how the manufacturer decided to implement the adders, etc. None of these things seem like questions that programmers should be responsible for answering.
You'll end up with a hardware emulation of a software algorithm, which will necessarily be slower and less efficient than the correct answer, which is to design a hardware solution to the original problem.
That is a common misconception, spread by people who like a certain type of FUD. In fact, what is not patent patentable is "the laws of nature, including those of science and mathematics".
The LAWS of nature. You can't patent gravity, you can patent an elevator. You can't patent refraction, you can patent an acoustic lens. You can't patent the associative property of addition, you can patent a scoring system for detecting bogus reviews.
If you take out "the laws of" and replace it with "anything using", THEN you would end up with "you can't patent anything using nature, including science and math", but that's not the law. The law is that you can't patent the laws of nature, including mathematical LAWS. You can patent things that are scientific, and you can patent things that are mathematical.
If y
you think about it, it makes sense. You can't invent "x + 1 = 1 + x". That's always been true. However, you CAN invent a way of detecting suspicious stock trades. Since that could be a new invention, it could be patented.
> Patents (used to) cover only specific implementations of ideas, not the ideas themselves.
That's a legitimate criticism of many patents. Of course, an implementation IS itself an idea, so we'd need to be a little more specific with our vocabulary in order to really talk about policy. Saying "you shouldn't be able to patent ideas" won't quite get us there. Certainly we can say "goals, objectives, shouldn't be patentable; only METHODS for achieving an objective should be."
Certainly there exist bad patents that are too broad, that cover an objective rather than a method or mechanism. On the other hand, there simply is no such thing as a "software patent" per se. The problem with the patents is that they are over broad. Whether they cover something made of wood, plastic, or magnetized iron particles is irrelevant.
http://en.wikipedia.org/wiki/Turing_machine
nough said
Haven't done this myself, but you can evidently run Labview programs ("virtual instruments") on some FPGA chips. You'd have a good estimate (plus an actual digital circuit) if you translated your code to labview (I believe the actual language is called "G") and found a copy of the add-on which turns this into verilog. -- Dustin
As a first pass you can estimate adders as 10 gates pr. bit, state as 20 gates pr. bit and multipliers as 10x bits squared (Unless it is by a power of two in which case it is free) If you need to to division in your algorithm you should redesign it. If you use floating point, everything gets huge (Try not to use floating point, remember in hardware you do not need to deal with arbitrary word size restrictions, just scale word sizes to suit the requirements)
Now, figuring out exactly what resources you need, this is where you will get into trouble. Normally you will reuse some (lots) of your arithmetic, but exactly how much depend on what performance/power/gate count target you need to hit. More reuse means less gates but faster clocks (Which can drive you to more gates if you get into trouble on timing closure). The extreme case is software which just reuse a very limited set of ALUs, the other extreme is an unrolled design where algorithmic operation have dedicated hardware, so one iteration takes one clock.
Depending on performance targets the same algorithm can have a factor 1000 difference in gate count.
That depends a whole lot on what kind of hardware you want to use. One way is to implement a universal Turing machine, and give it the code as input. Those can be quite small, and you don't even need access to the algorithm to find the answer.
You're probably looking for a more efficient implementation.
I believe the OP is asking the question with an underlying motive that most users aren't grasping - The manufacturer definitely has a way of estimating the gate "cost" from C++, as some experts on the matter have pointed out here, but for that he probably demanded source code, which the OP probably has no safe way of handing over without compromising his Intelectual Property. He doesn't want to lose the business contract or spend money blindly on a consultancy he doesn't even know which's name is, so the question makes FULL SENSE regardless of its child-like semantics.
You can probably bet the manufacturer is based and/or has legal safe-haven in a dodgy country, along the lines of having properties like:
(hint: China) ...This makes the OP think twice about passing source around.
Now, my personal opinion regarding a possible answer is more business-focused - if such a kind of manufacturer is even remotely interested on your "product" as to ask that, then you have a very marketable piece of code on your hands and you need to do the following...
Charles H. Moore wrote it and extended it to be able to compile directly into silicon.
Forth is actually a TIL [Threaded Interpretive Language] but it is so easily extensible that it is possible to implement all the way to the gates.
Moore was working with Forth to do exactly that last I heard.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
It is more than 30 years ago I learned digital logic from Blakeslee's Digital Design with MSI and LSI. These days I program an Arduino. I have my hands full just reinventing the spokes of a 20 year old wheel.
I think you have a fascinating problem. Suppose you treat your computer program as a black box where you feed pages of data into one end and you get pages of output data out the other end. Suppose you say each page of data is an x,y grid of image values. You could say, your problem is for a central pixel in the image, you want to write a truth table. An initial truth table is the values and locations of the pixels from the immediate preceding image that when always present always result in the specific value of that pixel.
Your image processing process probably uses data from several preceding images to come up with the result. If it takes five preceding images, then the truth table for a single pixel picks up five more blocks of data about the state of the surrounding pixels. No matter how wonderful the computer program may seem to be, it is still a finite state engine. The present state of the image or output (we hypothesize) should be dependent on the some number of previous states of the image.
The process is a classic series dance steps for extracting the essential predecessor logic states. The Blakeslee book models this better than I can remember after all these years. The steps are normalize, simplify, flag all the dont't care states and gracefully conceal or wrap the data to handle the physical edges . When you have one of these cubic things, you go through a simplification process. first, you normalize the input data which means remove the numerical clutter and have a single number. Another feature of the extraction process is you sort the truth table output column and input columns and you try to mark as many of the input columns with does-not-matter as possible. A third thing is, you set a limit on the depth of the input data and that means the possible values for an output point are limited because the permuted possibilities are capped, and that cap is usually an exponent like 2^N.
The resulting gadget will be a truth table that grinds out something like x=1 for a=1, b=0, c=1, d=0 and on and on.
Unlike rewriting the software and putting it on a programmable gate array, This is an approach at writing a state table that produces an approximate pixel based on looking at a chunk of images containing that pixel.
It is either good for that or good for picking up girls.
Having to work for a living is the root of all evil.
Yet software algorithms that run on these architectures are converted to straight HW implementations all the time. It's just not "turn key", it takes quite a bit of work but it often pays off.
Nope, you can translate most anything if you are patient, and throw enough gates and sram at it.
And HW manufacturers are manufacturers, they don't necessarily know anything about the designs they manufacture. Foxconn is a prime example, they are clueless as all get out about any form of design. Including and perhaps especially their design centers.
Gates and timing closure is a physical designers job, not the fab. And he trades area, clock rate and power based on design intent. But for FPGAs it's fairly straight forward.
One, two, three - crunch! Three licks to get to the center of a tootsie pop! Wait. Wrong question, never mind.
give him the code in x86 assembly :p, most probably if he is into hardware, he should get the information he seeks from this.
If he complains that he doesn't understand because he doesn't code, then tell him that you do not design hardware equally, that is the best thing you can come up with.
If all else fails and that you are willing to pimp your arrogance and ego for money, then write a simple perl parser that will parse your program and replace specifically where there are logical decision to their respective gates,
http://www.chem.uoa.gr/applets/appletgates/Images/Image1.gif
:p am no electronic's person, but to solve any problem, a good background study is needed :p.
:p it might be of some use to someone else [the perl parser to convert a language to logic gates]!
I would also if i was to pimp out my arrogance and ego to gain money and succumb to doing something out of what i like to do to design an algorithm using gate, it would be no different than learning a new language, map out the if, add, sub, mul, (division gets a lil bit tricky), from that you can build anything, once you make this map, use a perl parser to parse your program as mentionned and make it generate it according to the map you made.
This might help also http://www.i-programmer.info/programming/hardware/4626-getting-started-with-digital-logic-logic-gates.html
who knows, if you write it out and open source it out
Remove your C source from your documentation and comments
( which were written first )
now design your hardware.
Go well
As many pointed out, it's a quite complicated question doesn't have a straightforward answer. I used to worked on RTL implementation of graphic algorithms for years and I can say there can be night and days between different implementations of the same algorithm. Also unspecified is their performance requirement. What kind of input your algorithm is expecting? YUV? RGB? CMYK? What's the expecting throughput? How much memory and I/O bandwidth your algorithm is going to take? How many temporary registers are needed? Do they allow deep pipelining and longer latency? What's the fab process they are going to use? There are far too many variables need to be taken into consideration. Also some seemly minor tweaks can bring major differences on both area and speed. I once helped a friend to optimize a supposedly simple error diffusion pipeline. Took us 3 weeks to shrink original design into one third size of original implementation while improved its performance by 15% at the same time. I would say simply tell them you don't know because you are software people unless they are only interested in synthesiz-able codes. Finding someone to write it in RTL for you can be a much heavier burden then you might expected because it's hard to manage something you don't understand. Chances are you may also need to change your algorithms a bit because some operations aren't feasible in hardware with reasonable cost, and some operations can't be removed or simplified.
Compare this to the following situation: A graphical designer draws a fantastic new GUI for an application (on a piece of paper, even). Then you ask him how many lines or kilobytes of code this will be. And then the designer asks: "Can't I just scan the pictures I've drawn and have a software figure this out?". Sounds riddiculus? Yes, but: This is what you wanted.
To answer the original question: The only realistic estimate is to add all the gates you've got in you computer, and take that as an upper bound. Which is still just an estimate, because implementing an algorithm for real-time in hardware can still increase the gate count by leaps and bounds.
To be able to answer such a question you have to re-implement the algorithm in a Hardware Abstraction Language (HAL) like Verilog or VHDL.
I did this in one of our current systems, where I had to process a stream of data.
First I designed an algorithm in C which took an infile, generated an outfile and measured the "quallity" of the output.
Then I re-implemented the algorithm in VHDL, which looks and "thinks" totally different than the original C source (but still DOES the same).
Only after that one can give a realistic estimate (based on the target system/platform and timing constraints) on gate or cell counts.
I shall give it a go.
First up, most algorithms can't be directly translated to hardware without either changing them or taking a serious performance hit.
Nearly all widespread algorithms (eg. H264 video) are designed specifically with a hardware implementation in mind, and in fact must usually have elements removed that would produce good results simply because it wouldn't be sensible to implement in hardware.
In particular, in hardware, loops that iterate an unknown number of times are generally not allowed.
Steps to make this estimate would probably be to take your code and 'flatten' it (IE. Rewrite it to avoid all use of pointers, except arrays).
For every variable, figure out how many bits wide it needs to be(IE. What is the smallest and largest possible value). You probably want to convert floating point to fixed point.
Next, to make a lower bound of how many gates would be used if you were to design for minimal gate use, take every add and subtract operation and call them 15 gates per bit. For every multiply call it 5 gates per input bit squared. Don't do division (division can be done as a multiplication by the inverse of a number).
For the upper bound, do the same, but multiply by the number of times each loop goes round. That gives you a design with lots more gates but much higher performance.
For the upper bound finally add on 5 gates for every bit of every variable times the number of lines of your input code. This approximates the d type flip flops for storage in a pipeline. Note that if two lines of code operate on entirely different variables, you can call them the same line as far as this metric goes.
For the lower bound, if you got a value greater than 10000 plus 16 times the number of bytes that your program is compiled plus the ram it allocates to run, it would be more gate efficient to put in a tiny processor and keep your algorithm in a ROM. (Lots of complex algorithms are implemented this way when space is at a premium).
Note that these by the way assume you have the engineering time to 'do it properly'. There are lots of ways of making a considerably bigger design, but with much less design effort.
Check out 'Handel c' for example. Its a one click tool that takes C code and produces horribly inefficient hardware, but it works.
If you algorithm is standard just google "FPGA/ASIC IP provider" (of which there are many, eg H264 etc) and pay the price, your results will be optimal and cheaper than doing it yourself - assuming you never have before.
If your alogirthm is custom then you are either going to make a horrible job of it as you learn HW as you go, it takes years to get optimal and reach the clock speeds, area, and QoR/Test coverage numbers needed for production Si. Alternatively you could hire a HW team who will cost you a pretty penny to get it done, or outsource - also not cheap but it allows for risk reduction.
How many gates isnt just a question of counting "*" and "/" and scaling (although back of napkin will give you a general feel). Practically any C/C++ you write is going to be very sequentially orientated, this results in algorithms that probably have better parallel implementations. Whilst you might now be thinking threads, or separate processes we are talking HW parallelism, which is far more fine grained than threads, than SW parallelism. Further any integers you have may be optimizable to less than 32 (eg 1 bit or 3 bits) thus saving a large amount of HW area. Finally you didnt really say what sort of performance you want - if its 1MHz and 1 word / year of I/O then I suspect you could build some very clever hardware to do it all a few gates but if its 1GHz and 1 Giga Words/sec then area might expand as you will need to duplicate the circuit in parallel. Finally the speed at which you process data will affect latency (time from input to the first correct output) which is often a killer in real-time or other systems (eg. If you put LCD TVs side by side you might notice some are running several seconds behind whats being broadcast - this is because as it received the image some are doing more video processing to clean up the image - latency )
So at this point I would recommend looking into SystemC (based around C++), SystemVerilog (so so ) and then a raft of tools to help you do the job. These tools are called "High Level Synthesis" (HLS) tools and they arent cheap but they do cut down on man hours manually converting algorithms, but you will still need to be able to think extremely low level as bad code results in bad gate count - no matter the language.
I dont want to come over as a shill so I am going to present the 4 main competitors for HLS tools;
1) Calypto Systems, formerly Mentor Graphics' tools before spin-out ( http://calypto.com/en/products/catapult/overview )
2) Forte Desgin systems ( http://www.forteds.com/products/cynthesizer.asp )
3) Cadence C2S ( http://www.cadence.com/products/sd/silicon_compiler/pages/default.aspx )
4) Impulse C ( http://www.impulseaccelerated.com/ ) - this is very reasonably priced but has its limitations.
5) There are some open source things out there, I wouldnt recommend them as they are quite in their infancy.
Disclaimer: I used to work at one of the companies that provides synthesis tools, for >10 years converting C/C++/SystemC to HW quite often for a service fee. I can tell you we never had a design that cost under 30K and most were in the 100s to millions of USD.
It seems to me that a company asking software developers what it would take in hardware might possibly not know what they want.
Its highly possible that a small CPU and program on flash ROM solution might be all they really want. Do they really NEED it burned into the hardware?
You can't call yourself a software developer with a solid understanding of hardware. To develop software, you should always be thinking in terms of how the hardware is going to handle that software. That being said if you need a gate count then use VHDL or Verilog or another Hardware Descriptor Language. You can't actually convert an abstracted software language like C and up to gates because every single compiler and linker will turn out different end code.
Make sure that you can analyze the software properly and break it down to well defined functions/modules/whatever-your-abstraction-is
Implement hardware modules for corresponding inputs/outputs but make sure that you do not use an automatic tool. For hardware, you can often
do things very differently since you have different ways to implement things and may use state machines, memories and logic in different ways.
For each module, check hardware and software implementation against each other using CBMC or some other software which can actually verify that your implementation is, if not correct, at least equally bad as the implementation in the other domain.
(Since I'm posting as a Depressive Cyborg, you might be able to figure out what (SW/HW) made me a Cyborg and what made me Depressive....)
If your algorithm is purely arithmetic, then translate it to primitves (+,-,*,/) and estimate complexity based on simple full adders and flip-flps (bit level). Note that this is a very rough estimation and does not apply easilty to long, data-oriented code, since in that case your interest is with the data storage, not the operations on them (imagine adding +1 to a billion-billion-cell vector of counters).
If your algorithm is mixed-form, then you must know your hardware capabilities and, preferrably, its firmware. if you can transform your flowchart (low level) design to assembly code, then you can lookup the necessary opcodes in some standard IC and estimate (again, roughly) the order of the required IC in your case. For example, if you want to sort some 100-element vector of 16-bit integers, then a few generic x86 opcodes are enough - therefore, even 8086 (or a fraction of it) will do the job.
Generally speaking, the algorithm-to-gates estimation works only on very primitive or streamlined procedures, mostly arithmetic. That is, only when we are speaking about DSP (not CPU) implementations, like for GPUs. In almost any other case, most IC circuitry comes from the corresponding memory/heap modules, I/O, registers, etc, as the "algorithm" will require much more than a simple processing unit.
"Abashed the Devil stood, and felt how awful goodness is..."
I would advise them to first try on an FPGA (Field programmable gate array), and just write the program in Verilog, see how many gates it needs, and then simply select an FPGA from Altera or Xilinx that fits your needs. No need for a full blown ASIC.
If they are serious get some funding to start coding this in a hardware description language.
Note Well: this is a lot like asking how many x86 instructions a "C" program will take
without writing a "C" program. At best this gets you a starting answer.
If you tell the compiler to kill loop unrolling code shrinks and might run slower.
If loops unroll code grows but might run faster. SIMD instructions the code
can shrink. Now ask if the x86 answer is the same answer you get on a ARM
and a MIPS processor. The other thing to know is data path widths have large
impact -- wider is faster but used more gates -- too wide is slow -- too narrow is
slow.
Invest a couple grand of their money on some large FPGA development kits and
go to work. For the most part graphics hardware is tightly coupled stripped down
common processors and state machines setup to solve specific display problems.
One positive place to work is in the world of CUDA on graphics cards.
CAUTION.... the field is full of patents and going fast on CUDA is dancing with
a hungry bear... Any hardware you build to the same end will likely trip on patents
that others have.
Well written hardware descriptions read a lot like any programming language.
With a second beer in hand you can read down from an X-windows program
all the way town to gates and other hardware library stuff and hardly see a
speed bump.
Going fast in hardware requires clever minds....
And if you cannot build Open-GL and WindowZ graphics on top
of your "C" proof of concept you have a lot of work to do.
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
I think that C/C++ is much too high a level to look at and say "This if statement requires n logic gates" because that if statement will be implemented in assembly differently on different systems. However, one can look at assembly code and say "Ahah, that is 2 logic gates, and there's three more and that's another 3.". So I think you need to compile your code on a whole bunch of systems, compare the assembly, and use some kind of average of the results to get a rough estimate of how many logic gates.
The hardware company may not have signed a contract yet. You don't want to just give something away to the customer when they haven't bought it yet. They're probably trying to establish design and build costs, so they will have an idea profitability and feasibility before locked into a contract to buy something they can't sell.
Wierdy's last suggestion is my personal favorite. It's really a sliding slope between software and hardware anyway. Does putting a Linux ROM with your algorithm set to autoload as a startup daemon in an x86 machine count as hardware or software? Embedded applications often have something resembling an OS, if not a full blown OS, managing resources. Unless your algorithm is super simple, or this electronics manufacturer is a glutton for punishment, I'd put your algorithm on a ROM alongside some DSP or other processing core and call it a day. Another option to explore that's between the two (all gates and ROM/processor combo) is a PAL/GAL... but it will certainly take some mental gymnastics to get your genetic algorithm into a form appropriate for burning the PAL. Good Luck!
Well, it really depends on the algorithm I'd say, simple things are easy enough to estimate depending on if you wish to run it in parallel or not. But if they come to you to ask for it that's usually not the case I figure.