Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?

Holy crap by CajunArson · 2014-01-08 08:35 · Score: 5, Insightful

Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.

Slashdot: Where *not* to go to get specific advice about specific technical issues.

--
AntiFA: An abbreviation for Anti First Amendment.

Re:Holy crap by AJH16 · 2014-01-08 08:50 · Score: 0, Flamebait

Getting unlimited moneys is easy, just make a big enough bank, take all the money and then tell the government you need more.

--
AJ Henderson
Re:Holy crap by Megane · 2014-01-08 09:21 · Score: 5, Funny

And to think, they rejected my Ask Slashdot submission on how to find a cheat code on my bank's web site for unlimited moneys
Just walk up to any ATM and press: up up down down left right left right B A start.

--
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Re:Holy crap by Joce640k · 2014-01-08 09:48 · Score: 5, Insightful

Just 'fess up and say "We don't know, we're software people, not hardware people".
If it's really important they might offer some help.

--
No sig today...
Re:Holy crap by fisted · 2014-01-08 10:01 · Score: 1

Doesn't sound to me like they were going to implement it themselves. But then again, you are frist post so you presumably didn't read TFS properly, in order to get your awsm frist post.

--
CLI paste? paste.pr0.tips!
Re:Holy crap by Anonymous Coward · 2014-01-08 10:05 · Score: 0

And to think, they rejected my Ask Slashdot submission on how to find a cheat code on my bank's web site for unlimited moneys
Just walk up to any ATM and press: up up down down left right left right B A start.

Do I use the braille keys or the other keys?
Re:Holy crap by Goaway · 2014-01-08 10:05 · Score: 5, Insightful

This is the only sane answer. They probably only asked to find out if you happened to know.
Say you don't know, and let them look at the code to figure it out.
Re:Holy crap by Anonymous Coward · 2014-01-08 10:07 · Score: 1

Either implement it as shaders for a GPU (or a DSP) or hire somebody who actually knows about hardware design if you are hell-bent on implementing an ASIC.
Slashdot: Where *not* to go to get specific advice about specific technical issues.
But that's not what the customer wants. They want to pay them money for their algorithm so they can put it on hardware. This isn't the response you give to a customer who is asking for information because they want to potentially pay for your algorithm.
Re:Holy crap by Anonymous Coward · 2014-01-08 10:28 · Score: 0

then pay it all back (with interest!) and have people still think you got free money!!
Re:Holy crap by Anonymous Coward · 2014-01-08 10:36 · Score: 0

Getting unlimited moneys is easy, just make a big enough bank...
That part is surprisingly hard. One weekend I tried, but the biggest bank I could make was only 8'x10' and even I didn't want to do my banking there.
Re:Holy crap by Anonymous Coward · 2014-01-08 10:50 · Score: 0, Insightful

This is the only sane answer. They probably only asked to find out if you happened to know.
Say you don't know, and let them look at the code to figure it out.
Anybody who holds an actual 4-year CS degree should be able to do at least a rudimentary analysis of this type.
Anybody who holds an actual 4-year "triple E" degree should be able to do a full blown analysis.
Obviously neither company has any employees with either of those two degrees working for them. Or if they do, they really suck.
Since submitter admits they are a software only house, my personal advice is to go find a different company to handle the hardware.
Re:Holy crap by Anonymous Coward · 2014-01-08 11:11 · Score: 1

This isn't even "please do my job for me", this is "this guy we're working with wants me to do his job. Please do his job for me." The hardware guy is asking the software guys to do hardware work.
Re:Holy crap by Anonymous Coward · 2014-01-08 12:04 · Score: 0

Here's an insane answer. How many transistors does it take to make a 2-symbol, 3-state device?
Leave it up to them to write the cross-compiler.
Re:Holy crap by Lehk228 · 2014-01-08 12:10 · Score: 2

it's easier than that. just walk in with a note "PUT ALL OF THE MONEY IN A BAG AND NOBODY GETS HURT"

might want to put on a fake mustache or a long wig and a stuffed bra if you have a girly face.

--
Snowden and Manning are heroes.
Re:Holy crap by Grog6 · 2014-01-08 12:14 · Score: 0

Be careful, you might get arrested for telling someone how to steal.

--
Truth isn't Truth - Guliani
Re:Holy crap by Anonymous Coward · 2014-01-08 12:35 · Score: 0

might want to put on a fake mustache or a long wig and a stuffed bra if you have a girly face.
I don't think you thought that one through :)
Mustache and a girly face?
Re:Holy crap by Anonymous Coward · 2014-01-08 12:48 · Score: 0

get it patented first.
Re:Holy crap by Lehk228 · 2014-01-08 13:23 · Score: 1

it's perfectly thought out. it's a psychological trick, all any witness will describe is an ugly girl with a gross mustache.

the same can be done with a ridiculous outfit and face paint but that will draw attention once you leave too.

--
Snowden and Manning are heroes.
Re:Holy crap by Anonymous Coward · 2014-01-08 13:55 · Score: 0

I agree that we are being asked to do this guy's job.

But damned if I'm not a little bit curious about the answer, and I'm sure others are as well.
Re:Holy crap by ozmanjusri · 2014-01-08 14:26 · Score: 1

Mustache and a girly face?
http://www.mtv.com/news/articles/1715521/justin-bieber-believe-movie-stache-clip.jhtml
Could work. It'd make most people look away in disgust...

--
"I've got more toys than Teruhisa Kitahara."
Re:Holy crap by kevingnet · 2014-01-08 14:56 · Score: 1

Just ask them how many programmers does it take to change a light bulb. He'll understand.
Re:Holy crap by Austerity+Empowers · 2014-01-08 16:10 · Score: 5, Informative

To give a more helpful, unhelpful answer, it's an ill-formed question. "How many gates" depends on the target on which you synthesize the hardware: a PCB, an FPGA, actual silicon (which fab? Which process? whose std cell library? what clock frequency?).
If somehow the above could be narrowed down by asking the customer, then the next thing I'd advise is contracting someone who can write RTL using an HDL (Verilog is most popular). The synthesizable subset of HDL is tricky to learn for non-HW people, so unless you understand digital logic well I'd suggest finding someone else to do it for you. They can then synthesize it to the targeted device/platform. If you can do this, you should charge quite a lot of money since this form of IP is expensive, and they know it. If they're ok with that, you may also want to have this contractor write the design verification suite, since this company will certainly want that to integrate into their own testing. Lots of contractors are out there for this due to the cyclic nature of this job. Make sure you also have some support arrangement in place if you need them to fix/update the code later.
Even simple software algorithms can be very big in HW, but some surprisingly complex SW algorithms are next to one-liners in HW (like any form of bit masking or bit swizzling is free!). But generally if there are a lot of sequential steps, and those steps are different...it gets big. Also assume that for every 1 SW guy that wrote the code, you will need 1 RTL designer. If you take the verification step, it may be 1-2 verification engineers for 1 RTL designer, depending on your timeline.
Re:Holy crap by Anonymous Coward · 2014-01-08 16:53 · Score: 0

And more than likely, any number they could come up with would be way off.
Re:Holy crap by gargleblast · 2014-01-08 17:03 · Score: 1

Mustache and a girly face?
It was mustache or a girly face. Logical operator precedence.
Re:Holy crap by Anonymous Coward · 2014-01-08 18:09 · Score: 0

Bullshit...
God. Algorithms don't get implemented into hardware,
Code DOES. Look at your assembler file, to see how much code,
Each instruction needs a different amount of gates.
Remember Lilith: the machine whose hardware ran Modula-2
Re:Holy crap by Anne+Thwacks · 2014-01-08 18:20 · Score: 0

Lilith? The PDP-11 was a hardware Fortran machine, and C was its assembler! What does that have to do with the price of fish?
Realistically, if these guys' idea of the relative complexity of logic makes them think that add is 3 gates (per bit) but divide is 6, then they should not even be designing software.
Hint: divide can be implemented in a wide variety of gates, but for words longer than about 2 bits, there is likely to be an enormous difference in speed (like 10^6) between implementations using 6 gates and ones using 10B^2 gates (where B is width of operands in bits).
(A real expert will have a more accurate figure).
I once implemented a divide in hardware (FPGA) by using look up tables to find the logs of the numbers, subtracting them, and looking up the antilog (it had to be fast).
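A rough software sketch of that log-table trick, for anyone curious. Everything specific here is an assumption made for illustration (8-bit unsigned operands, log2 stored as fixed point with 8 fractional bits, the names LOG_TABLE and ANTILOG_TABLE); it is not the actual FPGA design from the post. In hardware the two tables would be ROMs or block RAMs and the only arithmetic left is a subtractor.

```python
import math

FRAC_BITS = 8                 # assumed fixed-point fraction width for the log values
SCALE = 1 << FRAC_BITS

# "ROM" 1: log2 of every nonzero 8-bit operand, stored as a fixed-point integer.
# Entry 0 is a dummy; zero operands are handled separately.
LOG_TABLE = [0] + [round(math.log2(x) * SCALE) for x in range(1, 256)]

# "ROM" 2: antilog, indexed by the log difference shifted to be non-negative.
MAX_LOG = LOG_TABLE[255]
ANTILOG_TABLE = [2.0 ** (d / SCALE) for d in range(-MAX_LOG, MAX_LOG + 1)]

def log_divide(a, b):
    """Approximate a / b using two table lookups and one subtraction."""
    if b == 0:
        raise ZeroDivisionError("divide by zero")
    if a == 0:
        return 0.0
    diff = LOG_TABLE[a] - LOG_TABLE[b]      # the only arithmetic: a subtractor
    return ANTILOG_TABLE[diff + MAX_LOG]    # antilog lookup

approx = log_divide(200, 10)   # close to 20, not exactly 20
```

The result is approximate: the error comes from rounding the stored logs, so a wider table entry buys accuracy at the cost of more memory bits.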

--
Sent from my ASR33 using ASCII
Re:Holy crap by Anne+Thwacks · 2014-01-08 18:24 · Score: 1

In the UK reading the GP post probably qualifies you as "having information likely to be of use to a terrorist".

--
Sent from my ASR33 using ASCII
Re:Holy crap by fractoid · 2014-01-08 21:09 · Score: 4, Interesting

It's also ill-formed (to the point of being almost meaningless) in the sense that the smallest number of gates for a given algorithm is probably going to be to implement some kind of low-end processor which then runs the algorithm as code.

What they really wanted to ask was "what's the best price/performance option for executing this algorithm, given the following expected parameters and an initial production run size of X".

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Holy crap by Anonymous Coward · 2014-01-08 22:06 · Score: 0

I tried this and subzero froze my account
Re:Holy crap by Goaway · 2014-01-08 23:26 · Score: 3, Informative

I work with plenty of people with that kind of degree or higher, and I doubt any of them could. Very few CS educations would teach you that. That is highly specialist knowledge, in an unusual field.
I really don't know why you would ever think that would be a common skill.
Re:Holy crap by Anonymous Coward · 2014-01-09 00:08 · Score: 0

Obviously neither company has any employees with either of those two degrees working for them. Or if they do, they really suck.
In the case of the electronics manufacturer they generally have a higher need for inexpensive people than educated people.
They probably need one or two guys with a higher education, but the last time they built arithmetic out of gates was before C++ existed, when C was a higher-level language that they were unlikely to need to work with.
Re:Holy crap by Kagetsuki · 2014-01-09 01:46 · Score: 1

Absolutely agreed. Just the fact the author didn't mention anything about an FPGA surprised me. I imagine the chip manufacturer must have been taken aback when they couldn't even give him a ballpark range.
dryriver, you are doing it wrong. I question your motives for this if you haven't done it on a GPU or DSP as mentioned above, and compared your current base implementation to that. If you are so convinced this absolutely needs to be done in hardware, start looking for someone who knows what they are doing. I've only done enough FPGA development to know it's something that takes experience to do well and quite a bit of knowledge just to set up properly. Verilog and the like may look simple, but consider how much time you spend valgrinding - you'll be doing that in hardware, using a language which does not compile to anything you are used to, with no real conception or grasp of what to do to make things run better or even how to gauge performance. Save money by saving time by hiring a pro.
Re:Holy crap by Mr+Z · 2014-01-09 02:24 · Score: 2

I pretty much agree with all of the above, having worked in the biz awhile myself.
Since this is a graphics algorithm (apparently), the OP might do better to try to state what the computational complexity is in terms of the operations involved for one output, in terms of basic operations such as multiplies and adds, and perhaps how much storage you need.
Consider this example: If someone came to me and asked me "How much does an 8x8 IDCT cost?" After asking them if it needs bit exactness or not (some standards require it, others don't), I could give them some numbers and some implementation bounds. "The Chen IDCT needs around 11 multiplies and 20 adds per 8-pt IDCT. Multiply that by 16 to get the full cost for an 8x8. (176 multiplies, 320 adds) To meet video precision requirements for an 8x8, the multiplies should be greater than 16 bit precision, and you should carry greater than 16 bits of precision between horizontal and vertical passes."
How many gates is that? Well, depends on the throughput you require, and the details of the implementation. Given the number of multiplies and adds required, you can work toward a number. Suppose you needed to have enough IDCT bandwidth to update a 1080p 4:2:2 image at 60Hz. So, that's 1920 * 1080 * 2 * 60 = approx 250M pixels/second that you need to produce. In terms of 8x8 blocks, that's a little under 4M blocks/second, with 176 multiplies and 320 adds. So, that's approx 700M multiplies a second and 1.3B adds.
Still, that's far from enough to get to a gate count. If you put down 1 multiplier and 2 adders and ran it at 1GHz, you'd have more than enough compute throughput. You still need to add some control logic around it (especially if you only put 1 multiplier and 2 adders, because the IDCT's compute pattern is non-trivial), and some memory to store inputs, outputs and intermediate results. A more likely implementation probably has a lot more multipliers and adders in hardware, but also runs at a much slower clock rate.
So how many gates is that? You need much more information to answer that question, despite the analysis above. You now need to pick an implementation strategy, and more than one makes sense. But, you have a much better idea of the computational cost, and can pick among multiple implementations. For example, if energy efficiency is your goal, you might implement the horizontal and vertical IDCTs in explicitly tuned multiplies and adds tuned to the exact precision necessary and connected exactly as the dataflow requires, and run the whole block at a low clock rate using slower transistors with less leakage. If flexibility is your goal, you might put in a small CPU with enough grunt to fit the computational load, with the idea that you can run other algorithms there if you need to. Etc.
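The throughput arithmetic above can be written out as a short script. The figures are the ones quoted in this post (1080p 4:2:2 at 60Hz, 176 multiplies and 320 adds per 8x8 block); the helper units_needed and the 200MHz alternative clock are assumptions added for illustration. It only counts arithmetic units and says nothing about control logic or memory, which is the point being made above.

```python
WIDTH, HEIGHT = 1920, 1080
SAMPLES_PER_PIXEL = 2          # 4:2:2 averages two samples per pixel
FRAME_RATE = 60
MULS_PER_BLOCK = 176           # per 8x8 IDCT, figure quoted in the post
ADDS_PER_BLOCK = 320

samples_per_sec = WIDTH * HEIGHT * SAMPLES_PER_PIXEL * FRAME_RATE
blocks_per_sec = samples_per_sec // 64          # 64 samples per 8x8 block
muls_per_sec = blocks_per_sec * MULS_PER_BLOCK
adds_per_sec = blocks_per_sec * ADDS_PER_BLOCK

def units_needed(ops_per_sec, clock_hz):
    """Minimum number of units if each unit finishes one op per clock (ceiling division)."""
    return (ops_per_sec + clock_hz - 1) // clock_hz

for clock_hz in (1_000_000_000, 200_000_000):   # 200 MHz is a hypothetical slower design
    print(clock_hz,
          units_needed(muls_per_sec, clock_hz), "multipliers,",
          units_needed(adds_per_sec, clock_hz), "adders")
```

Dropping the clock trades frequency for more parallel units, which is one of the implementation choices the gate count ends up depending on.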

--
Program Intellivision!
Re:Holy crap by DarwinSurvivor · 2014-01-09 02:51 · Score: 1

You can't patent an algorithm. At least you're not supposed to be able to...
Re:Holy crap by fatphil · 2014-01-09 02:59 · Score: 1

That, or having a London A-Z.

--
Also FatPhil on SoylentNews, id 863
Re:Holy crap by meustrus · 2014-01-09 03:57 · Score: 1

With my 4-year CS degree I could tell you the basic idea, and I could recognize software that did it, but it would take a month for me to implement something myself. So here's my stab at the problem.
The crux of the issue is to reduce the software to specific operations for which you know how many gates are needed. To get a rough idea, I'd look at the compiled bytecode. There might then be an existing table of how many gates are needed to implement each operation in the resulting bytecode, or even more likely a number of transistors. But if not, that's where it would take a month of doing rough logical analysis to put together such a table. Then you add it up and get your result, which is kind of "it shouldn't take more than X many gates".
But then somebody has to actually transform the program into transistors so maybe you should just hire somebody that can do that. If you have the hardware design, it's trivial to tell someone how many transistors/gates are in it.
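A minimal sketch of the "table of gate costs, then add it up" idea, under loud assumptions: a full adder is taken as 5 gates and a half adder as 2, the multiplier is a plain textbook array multiplier (n*n AND gates, n half adders, n*(n-2) full adders), every operation gets its own dedicated hardware with no sharing, and the operation tally in example_ops is made up. All names are hypothetical. It ignores control logic, registers and memory, so treat the result as a crude bound, not an answer.

```python
FULL_ADDER_GATES = 5    # 2 XOR + 2 AND + 1 OR
HALF_ADDER_GATES = 2    # 1 XOR + 1 AND

def adder_gates(bits):
    """Ripple-carry adder: one full adder per bit."""
    return FULL_ADDER_GATES * bits

def array_multiplier_gates(bits):
    """Textbook array multiplier: n*n ANDs, n half adders, n*(n-2) full adders."""
    and_gates = bits * bits
    half_adders = bits
    full_adders = bits * (bits - 2)
    return (and_gates
            + HALF_ADDER_GATES * half_adders
            + FULL_ADDER_GATES * full_adders)

# Hypothetical cost table: operation name -> function of operand width.
GATE_COST = {"add": adder_gates, "mul": array_multiplier_gates}

def estimate_gates(op_counts, bits):
    """Sum the cost of every operation, assuming no hardware is shared."""
    return sum(count * GATE_COST[op](bits) for op, count in op_counts.items())

example_ops = {"add": 20, "mul": 11}          # made-up operation tally
print(estimate_gates(example_ops, bits=16))   # "it shouldn't take more than X gates"
```

A real design would share units across clock cycles, so the true figure can be far below this, and the control and storage it leaves out can push it back up.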

--
I sometimes ask revealing, often ignorant-seeming questions. Maybe they're harder to answer than you think.
Re:Holy crap by Anonymous Coward · 2014-01-09 04:09 · Score: 0

What is a "triple E" degree? Did you mean EE (double E)?
Re:Holy crap by bluefoxlucid · 2014-01-09 04:12 · Score: 1

I got an associate's degree in computer networking because I learned to configure CISCO routers. The way to handle this is to define your algorithm as a set of discrete logic and arithmetic actions (arithmetic actions can be represented as half-adders and such), and then count the number of decisions and do some on-paper optimizations. Then you know how many gates you need, roughly.
Then again, I have the inherent ability to simulate the entire universe in my head on the cosmic or subatomic level so...

--
Support my political activism on Patreon.
Re:Holy crap by Anonymous Coward · 2014-01-09 04:20 · Score: 0

k-maps are not highly specialized.
Re:Holy crap by Goaway · 2014-01-09 05:00 · Score: 1

This is quite many levels beyond little k-maps.
Re:Holy crap by jeremyp · 2014-01-09 05:17 · Score: 1

Lilith? The PDP-11 was a hardware Fortran machine
No it wasn't.

and C was its assembler!
No it wasn't, its assembler was Macro 11 which doesn't look anything like Fortran or C.

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:Holy crap by gregor-e · 2014-01-09 06:40 · Score: 1

No, the easy-peasy software developer estimation is to buy a bunch of progressively smaller CPUs, port your algorithm to each of them, and find the smallest CPU on which your algorithm still provides acceptable throughput. Then quote the number of gates on that CPU. If your algorithm still runs acceptably on a 4004, you can tell them it'll take about 2,300 transistors.
Re:Holy crap by Anonymous Coward · 2014-01-09 10:43 · Score: 0

Probably said a million times, but I don't see it on the first pages, so to give it a chance: Most software algorithms are ill-suited for hardware implementation. At the least, some changes will have to be made. As such, answering such a question will pretty much require having the implementation ready. A full adder, for example, requires 5 gates of logic per bit of precision, per addition being made. But often you may be able to share resources, say in a simple case repeating the same calculation on Red, Green and Blue on successive clock cycles. In most cases you will benefit greatly from doing in parallel the things a software algorithm would do sequentially. These trade-offs are process, implementation and constraint dependent. I'm at a loss to explain why an electronics manufacturer would even ask that question, unless they thought you already had a commercialized hardware implementation.
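To make the "5 gates per bit" figure concrete, here is a small gate-level simulation of a ripple-carry adder built from 2-input gates, counting every gate it uses. The GateCounter class and function names are invented for this sketch, and it assumes XOR, AND and OR are all available as single gates, which depends on the cell library.

```python
class GateCounter:
    """Evaluates 2-input gates on single bits and counts how many were used."""
    def __init__(self):
        self.count = 0

    def AND(self, a, b):
        self.count += 1
        return a & b

    def OR(self, a, b):
        self.count += 1
        return a | b

    def XOR(self, a, b):
        self.count += 1
        return a ^ b

def full_adder(g, a, b, carry_in):
    """One-bit full adder: 2 XOR + 2 AND + 1 OR = 5 gates."""
    partial = g.XOR(a, b)
    total = g.XOR(partial, carry_in)
    carry_out = g.OR(g.AND(a, b), g.AND(partial, carry_in))
    return total, carry_out

def ripple_add(a, b, bits):
    """Add two unsigned values bit by bit; returns (sum, carry_out, gates_used)."""
    g = GateCounter()
    carry = 0
    result = 0
    for i in range(bits):
        bit_sum, carry = full_adder(g, (a >> i) & 1, (b >> i) & 1, carry)
        result |= bit_sum << i
    return result, carry, g.count

total, carry_out, gates = ripple_add(200, 100, bits=8)
print(total, carry_out, gates)   # gates is 5 per bit of width
```

Reusing one such adder for R, G and B on successive clocks would keep the gate count at one adder's worth plus some multiplexing, which is the resource-sharing trade-off described above.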
Re: Holy crap by Anonymous Coward · 2014-01-09 13:51 · Score: 0

That really depends on the complexity of the software.
Re:Holy crap by stevesliva · 2014-01-09 16:35 · Score: 1

K-maps can't develop stateful logic?!? Inconceivable.

--
Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
Re: Holy crap by Anonymous Coward · 2014-01-09 17:48 · Score: 0

This.
You could start by disassembling your program and trying to convert functions into digital logic, but you will quickly find out that it just doesn't work that way. I would think that the amount of time it would take to figure this out is exactly as long as it takes to write your algorithm in a hardware description language and have the computer solve it for you.
Re:Holy crap by Anonymous Coward · 2014-01-10 04:22 · Score: 0

I work with plenty of people with that kind of degree or higher, and I doubt any of them could. Very few CS educations would teach you that. That is highly specialist knowledge, in an unusual field.
I really don't know why you would ever think that would be a common skill.
Try rewriting your code in SystemC, a C++ class library for hardware description. It will synthesize and can give you a direct count. There are transliterators from Aldec, but they don't work very well and even screw up the logic occasionally.
Re: Holy crap by Anonymous Coward · 2014-01-10 16:56 · Score: 0

And when it doesn't work as vaguely explained, fire someone, then double the budget. Rinse and repeat.
Re:Holy crap by Teancum · 2014-01-10 17:31 · Score: 1

I think he was talking about Elementary Education majors. They know how to count lots of gates.... white gates, black gates, picket fence gates, and other kinds of gates. They will even show you how to put that on a proper number line to count to more than the number of fingers on both hands.
Re:Holy crap by Some_Llama · 2014-01-13 13:41 · Score: 1

"Then again, I have the inherent ability to simulate the entire universe in my head on the cosmic or subatomic level so... "
i thought this was a common ability?

How many by Aighearach · 2014-01-08 08:35 · Score: 2, Funny

beowulf clusters does your algorithm desire?

Verilog by tepples · 2014-01-08 08:35 · Score: 4, Informative

If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.
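A toy illustration of that difference, written in Python rather than Verilog. It mimics how clocked assignments all sample the old values and then update together on the clock edge, versus ordinary top-to-bottom execution. The clock_edge helper and the register names are made up for the example; this is a mental model, not a simulator.

```python
def clock_edge(state, rules):
    """Apply every rule 'at once': all rules read the OLD state, then commit together."""
    next_values = {name: rule(state) for name, rule in rules.items()}
    return {**state, **next_values}

# Two "registers" that each take the other's value on every clock edge.
rules = {
    "a": lambda s: s["b"],
    "b": lambda s: s["a"],
}

def sequential(state):
    """The same two assignments executed one after the other, as software would."""
    s = dict(state)
    s["a"] = s["b"]
    s["b"] = s["a"]      # sees the already-updated a
    return s

start = {"a": 1, "b": 2}
print(clock_edge(start, rules))   # values swap
print(sequential(start))          # both end up equal
```

The hardware-style version swaps the two values; the software-style version overwrites one of them, which is the kind of surprise waiting for a serial programmer.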

Re:Verilog by Anonymous Coward · 2014-01-08 08:46 · Score: 5, Interesting

If you only need an estimate, use something like bambu from PandA to convert your C code to Verilog. Then synthesize this code for an FPGA. In the summary you should find how many logic cells would be used, as well as how many digital gates would be necessary in an ASIC. This value is only an estimate, but for your question it should work.
Re:Verilog by Andy+Dodd · 2014-01-08 09:03 · Score: 3, Interesting

While there are some compilers that ATTEMPT to convert C/C++ into a hardware representation - These will usually fail unless you understand the target hardware.
http://www.drdobbs.com/embedded-systems/c-for-fpgas/230800194
One thing is: Even if you can successfully compile from C to Verilog or VHDL, there is no guarantee that the Verilog or VHDL will successfully synthesize on your target hardware.
Even if it successfully synthesizes, there is no guarantee that it will be in any way an optimal implementation.
Some C algorithms may never transfer well into a hardware implementation.

--
retrorocket.o not found, launch anyway?
Re:Verilog by ranulf · 2014-01-08 09:23 · Score: 5, Informative

The number of slices or logic cells or whatever else a particular synthesis program for a particular chip generates doesn't exactly correspond to a number of gates either. For instance, a single 4-in 1-out LUT on a Xilinx can be used for 1 gate or 6.
I wouldn't have much confidence in automatic C to HDL conversion either. Good HDL design is about understanding the problem in terms of gates and parallelism. FPGAs and ASICs in general aren't particularly good at things that CPUs are good for, and inversely CPUs aren't especially good for things that FPGAs and ASICs can do well.
The OP shows such a lack of understanding of hardware design that it's not funny! "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things. A better ballpark is that an n-bit add can be done with 2n LUTs; in terms of gates it's about 5n, but really that depends on what gates you have available. A multiplier is massively more, and dividing is more complicated still. Fortunately, many FPGAs come with a few dedicated multipliers... Unless your algorithm requires only as many multipliers as you have available, you're probably best building a state machine and multiplexing a single multiplier unit, in much the same way as a CPU multiplexes the ALU at its core.
The whole thing is massively dependent on algorithm and experience of the person doing the porting. The best advice is to say "I don't know" or to hire someone who does or suggest them running the algorithm on an embedded CPU.
Re:Verilog by bob_super · 2014-01-08 09:37 · Score: 1

This.
But there has been recent progress, and Xilinx is pushing hard to get people to compile C to gates with their Vivado HLS (guess the targets?).
Worth having a look at, since you usually can get a 30-day eval license for FPGA tools.
Re:Verilog by Jane+Q.+Public · 2014-01-08 09:48 · Score: 2, Informative

"Some C algorithms may never transfer well into a hardware implementation."
This is a fundamentally silly thing to say.

Hardware can be made to implement ANY functioning software. It might not be easy, but it is pretty much by definition possible. It's already running on hardware... it would be very rare indeed for it to not be possible to translate it into even more-efficient hardware, since the hardware it's running on now is general-purpose.
Re:Verilog by Asmodae · 2014-01-08 09:50 · Score: 1

Yep. Although sub-optimal is like the understatement of the year. I've seen not just inefficient but inefficient by an order of magnitude at times.
Re:Verilog by harrkev · 2014-01-08 09:50 · Score: 5, Informative

Seriously???? Asking a C++ programmer to begin to use Verilog is simply not practical. There is a VERY STEEP learning curve in trying to target real hardware. There is even a very different frame of mind that has to be learned in order to target gates.
I speak from experience. I program Verilog and SystemVerilog for a living doing ASIC design.
Now, to answer the OP:
The answer is very strongly: it depends. The most optimistic answer is a couple hundred thousand. Implement an 8-bit CPU and write the thing in under 32K of code.
On the other end of the spectrum is "many billions." Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.
Of course, these are both ridiculous extremes. Everything depends on the TYPE of operations being done. In a CPU a simple 32-bit multiply can be done with one character ("*"). In gates, if you need the answer in a single clock cycle, it can take an EXTREME amount of logic. However, if you are willing to wait 32 clock cycles for the answer, the amount of logic is reduced to a very manageable level. This is why C++ is a bad choice of input. How time-sensitive is it? Hardware is also very parallel in nature. Different parts of the chip can indeed be working on different things at the same time. You can go for a strictly pipelined architecture where each block does one little bit of the job and passes it off to the next block. High throughput, but lots of gates. Or you could design a general-purpose block and have it do everything slowly (the most extreme example of this approach is a common CPU).
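A sketch of the "wait 32 clock cycles" option: a shift-and-add multiplier modelled one clock at a time. In hardware this would be roughly one adder, a couple of shift registers and a small counter, reused on every cycle. The function name and the cycle counter are assumptions made for illustration, not anyone's real RTL.

```python
def shift_add_multiply(a, b, bits=32):
    """Unsigned multiply, one multiplier bit per 'clock'; returns (product, cycles)."""
    mask = (1 << bits) - 1
    multiplicand = a & mask
    multiplier = b & mask
    product = 0
    cycles = 0
    for _ in range(bits):
        if multiplier & 1:             # low bit of the multiplier selects an add
            product += multiplicand    # the single adder, reused every cycle
        multiplicand <<= 1             # shift registers advance each clock
        multiplier >>= 1
        cycles += 1
    return product, cycles

product, cycles = shift_add_multiply(123456, 7890)
print(product, cycles)
```

A single-cycle multiplier would instead unroll all 32 of those steps into parallel logic, which is where the "EXTREME amount of logic" comes from.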
While I have heard of magic "C to gates" compilers, after almost 15 years in the business, I have never actually seen one. The closest that I have seen are tools that can turn Matlab code into (messy-looking) gates. If your algorithm is DSP in nature, this is a very viable alternative. Otherwise, the only advice that I can give you is to consult somebody who does hardware design for a living (like me).
Otherwise, you really need to look at where the input comes from, where the output goes, and how fast you need to do the work.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Verilog by SecurityTheatre · 2014-01-08 10:01 · Score: 3, Informative

"Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things.
Looking at an old reference I have, a 16-bit ripple-carry style adder requires 576 transistors, and a 16-bit carry-lookahead style adder (faster) requires 784 transistors.
This is not including ANY control circuitry, nor a subtract feature.
A pure-hardware 16-bit integer DIVIDE is between 15-30 times more complicated. To do it in pure hardware, would require on the order of 23,000 transistors.
Unless you need your division to happen wicked fast with low latency and you don't care about transistor count, it's better to build add/shift hardware and simply perform a division operation using those bits of hardware repeatedly.
Also, we're only doing 16-bit. If you need 64-bit, multiply all of those numbers by about 50 (spitballing).
And converting from C into VHDL is probably not going to be the best way to go about this. Hire a decent hardware engineer.
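For reference, the "use add/shift hardware repeatedly" approach to division mentioned above looks roughly like this when modelled in software: one shift, one compare and one conditional subtract per step, over 16 steps. This is the standard restoring-division scheme, written as an illustrative sketch; the function name is made up.

```python
def shift_subtract_divide(dividend, divisor, bits=16):
    """Unsigned restoring division, one quotient bit per step; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Compare, then conditionally subtract: this is the reused adder/subtractor.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder

print(shift_subtract_divide(50000, 123))
```

It takes one step per quotient bit, so the latency grows with the word width while the hardware stays about the size of one subtractor plus registers.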
Re:Verilog by xvan · 2014-01-08 10:01 · Score: 1

Yes, but the best technology is currently applied to CPUs, not FPGAs or ASICs. So for certain sequential algorithms, the pipeline would have to be too big to beat a processor's real-world speed.
Re:Verilog by Darinbob · 2014-01-08 10:02 · Score: 1

Ya, FPGA is a good start, but you often need experts to redesign the algorithm for hardware. I.e., you will be able to do much more parallelism than in software (fine and coarse grained, maybe pipelined dataflow, vector operations, etc). Software as an algorithm usually has very little parallelism unless using a language intended to show the parallelism.
Maybe consider if part of the algorithm can be better done with a DSP chip as well.
As for how many gates, well as many gates as it takes to have an 8 bit CPU is one answer plus the gates to hold the memory of the algorithm. It won't be fast that way but it certainly is enough. Since it's not an acceptable answer I suspect, this implies that the question of "how many gates?" is the wrong question to be asking.
Re:Verilog by Asmodae · 2014-01-08 10:08 · Score: 1

He didn't say "may not transfer at all", he said "may not transfer well". Also remember that an algorithm isn't just running on any old bit of hardware it's running on a modern CPU with lots of special instructions with a gigantic RAM attached to it and potentially some other peripherals for special functions. Hardware RNG, etc. It might very well not be reasonable to try to convert all this to a custom FPGA/ASIC for the cost involved.
Re:Verilog by fisted · 2014-01-08 10:12 · Score: 1

It really isn't feasible for even moderately complex systems.
Or you seem to be ignoring that most 'hardware' does pretty much nothing without .... software (i.e. firmware).

--
CLI paste? paste.pr0.tips!
Re:Verilog by harrkev · 2014-01-08 10:14 · Score: 5, Interesting

Oh, one more thing about "C to Gates" compilers. In the industry I have not seen one in actual use, but they do supposedly exist. However, they would only work in a limited domain.
For example, if you have C++ that does simple control or DSP-type stuff, then it might work (cannot vouch for the quality of the results). On the other hand, if you get one of these compilers and try feeding it the source code for the Apache web server or the Quake engine source code, you are completely screwed.
If your application is, say, a novel type of network filter that inspects and does something to Ethernet packets, you have to figure out how to interface your design with a real Ethernet SerDes .. which is a *LOT* different than opening up something in the "/dev/" directory. If your application is robotics, then you also need to get data into and out of the chip. How exactly is this done? How fast does the logic need to run? Is it speech processing? If so, then this will involve a lot of straight-forward DSP. If you constrain the design to tell it how fast the data needs to flow through, you should be able to get a reasonable estimate. Does your application need a lot of memory? If so, you might need some type of RAM controller. DRAM controllers can be hairy to work with, and you also have to consider latency and throughput.
In theory, C to gates can work quite well, ***for a limited subset of applications***.
HOWEVER: as others have pointed out, anybody who needs to know the answer to this question should be qualified to answer it for themselves.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re: Verilog by Scowler · 2014-01-08 10:31 · Score: 1

Verilog syntax was designed specifically to make it similar to C syntax, so I have to partially disagree with you on that note. A lot of software engineers do understand basics of system design, as well as some basics of parallel processing. There is indeed a learning curve on Verilog, but I'd say the vast majority of it is learning how to create effective test benches, not writing the system logic itself.
Re:Verilog by MickLinux · 2014-01-08 10:47 · Score: 2

I am confounded by your claim that a 16-bit hardware divide would take 24000 transistors. If nothing else, you should be able to cascade it into 4 4-bit lookups, and that would handle the job. And that would probably be overkill.
Using shift-and-add would almost definitely seem to be better, especially since you could queue the operations. Although one 16-bit divide would then take about 120 clocks, 120 divides could take 240 clocks. (Look at me, I say clocks, I should say ops, and then let the clocks be whatever they are, be they quads or quarter clocks).
Even better, logarithm takes only about twice that -- it's a lookup plus shift-and-add, and square root is only about 140 clocks.
Sure, you could go with the 24000 transistors, but wouldn't that end up being a cost/benefit situation? All that is in the domain of the chip design within constraints.
Or am I wrong?

--
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Re: Verilog by harrkev · 2014-01-08 10:48 · Score: 5, Insightful

I still must disagree. Yes, the syntax is somewhat like C. However, WHAT you are coding is completely different. In particular, things that C can do with a simple "if" statement are not allowed at all in proper gate design. It is not hard to imagine a software guy coding latches all over the place, assigning the same signals from within different always blocks, etc. Even "always @(posedge clock)" may be a fundamental paradigm shift for a software guy. And not to mention the rather arbitrary way that Verilog treats wire vs. reg.
wire a = b & c;
wire a;
assign a = b & c;
reg a;
always @(*) a = b & c;
These three constructs do the same thing. Why is one "wire" and one "reg"?
What is the difference between the two blocks (they are NOT the same - blocking vs. non-blocking)?
always @(posedge clk) begin
a = b;
c = a & b;
end
always @(posedge clk) begin
a = b;
c = a & b;
end
What about race conditions? Glitches on combinatorial logic? Proper coding of state machines? Need memory? How do you drop in an encrypted 3rd party DDR controller and PHY? Interface with AHB bus? In a given process, how many levels of logic are reasonable for a given clock speed? What exactly are hold violations?
I am not saying that any of these are insurmountable. What I am saying is that a good digital designer is worth paying for, and a software guy may have a very steep learning curve indeed.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re: Verilog by harrkev · 2014-01-08 10:49 · Score: 3, Informative

Gaaa. On the blocking vs. non-blocking, Slashdot swallowed the "less than" sign. Apologies.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re: Verilog by Anonymous Coward · 2014-01-08 10:51 · Score: 0

Hmm, no. Unless you're already the High Breed of software engineer who can actually write down the algorithm as math and simple high-parallel state machines from the point of view of input and output (something that is not useful at all to a normal software engineer, who will work at a MUCH higher level of abstraction all the time), you will need to learn how to think that way to get anything useful out of verilog.
It is not that verilog is difficult. It isn't. It's that hardware design is very very different from software design.
Want an extremely basic example? For hardware (and verilog), when you tell it that A = B, you're not assigning the value of B to variable A. You're telling it that signal A is an *alias* of signal B (you basically wired A to B).
Re:Verilog by SecurityTheatre · 2014-01-08 10:55 · Score: 2

I was misunderstanding my notes.
You would need several thousand transistors for a standard DIV circuit, and then the CPU would need to iterate through the operation many times in order to perform a division.
A single-cycle division circuit isn't practical, so it would involve building a state-machine and having the processor stall while doing the DIV calculation. The simple 1-bit circuit I was looking at would require a number of cycles equal to the number of bits input (16, 32, 64, etc), although they can be made faster.
Looking at it, the latency for the Core2Duo chip to do a 64-bit integer DIV is up to 87 cycles, and that's a pretty optimized circuit for raw speed.
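For anyone who wants to see the one-bit-per-cycle idea concretely, here is a minimal Python sketch of restoring (shift-and-subtract) division. It is a software model only, not anybody's actual circuit: the name restoring_divide and the bits parameter are made up for illustration, and each pass through the loop stands in for one clock cycle of the state machine.

```python
def restoring_divide(dividend, divisor, bits=16):
    """Model of a 1-bit-per-cycle unsigned divider (hypothetical helper).

    Returns (quotient, remainder). The loop runs exactly `bits` times,
    which mirrors a state machine that needs one cycle per input bit.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given width")

    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(1000, 7))
```

The hardware cost of this approach is roughly one subtractor, a comparator and a couple of shift registers, which is the transistors-for-cycles trade being discussed.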
Re:Verilog by harrkev · 2014-01-08 10:55 · Score: 1

I think that he was talking about doing a divide the dummy way: just use the "/" character and let the compiler do it in one clock cycle. Yes, you can do divide in a LOT fewer transistors, but you have to be smart about it, and wait a few extra clock cycles.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Verilog by SecurityTheatre · 2014-01-08 10:56 · Score: 1

Yeah, I misunderstood my notes. See below :-) I last did any hardware design 15 years ago. hah
Re: Verilog by Anonymous Coward · 2014-01-08 11:01 · Score: 0

The syntax is similar, but that's where the similarity stops, unless your C programmer is used to writing massively parallel code, and I don't mean just a few threads. HDL languages are inherently parallel just as all the hardware gates are. Also have fun debugging that code. Then there is some of the basic stuff, like synchronizing different clock domains and understanding and meeting timing constraints.
There is a steep learning curve.
Re:Verilog by Anonymous Coward · 2014-01-08 11:10 · Score: 0

Shouldn't this be exactly what the "manufacturer" had done already?
Sheesh, who you got building your product!
Re: Verilog by SecurityTheatre · 2014-01-08 11:12 · Score: 2

I was wondering... Stared at that for too long before deciding something must have happened... :-)
Re:Verilog by InvalidError · 2014-01-08 11:17 · Score: 1

HDL is not that steep of a learning curve for people who have no problem thinking in parallel instead of serial. Personally, I have an easier time writing VHDL code than C/C++ and a large part of this is because writing HDL requires much more clearly defined goals than high-level languages.
The biggest problem with translating C-code to gates is that even if there were "magic compilers" to do this automatically, the number of gates will vary drastically depending on how much loop unrolling, pipelining and other parallelism the algorithm allows, the available gate budget, the actual performance goals, the ASIC process itself with its primitives library, etc. It is extremely unlikely that automatic tools will manage to achieve a good balance between all factors without substantial guidance and if you are going to set compiler hints all over the C-code to tell the HDL translator how you want it to unroll loops and exactly how you want stuff to get pipelined, it may end up cheaper, faster, cleaner and far more efficient to simply ask an HDL programmer to port the algorithm. There is not much point in using a "magic compiler" if you still need HDL/FPGA/ASIC specialists to go through the whole algorithm and re-think it from a hardware point of view to put the relevant hints in the code to assist the compiler in producing at least somewhat sensible code... might as well pay them for a proper port since this is pretty much what those guys need to have in mind to put meaningful hints in the original code.
As you said, it has been the holy grail of some software developers for a while and I have a hard time imagining a successful "magic compiler" any time soon.
Re:Verilog by eric31415927 · 2014-01-08 11:35 · Score: 1

Previous poster had the following signature:
MR ASICs. MR not. SAR CDEDBD transistors? YLB. MR ASICs
YLB??
This should be replaced with Li'l B.
Mr Ducks; Mr Knott; Czar; C. M. Wings; Li'l B.; Mr Ducks
Re:Verilog by kesuki · 2014-01-08 11:47 · Score: 2

"A multiplier is massively more, dividing is even more complicated still."
which is why you multiply by .5 to get division by 2. by 3 you need to multiply by .333334 depending on your precision. all possible divisions are a subset of multiplication from .999 infinite repeating to .000near infinite zeros followed by a 1. strange that something so 'easy' is harder than regular multiplication.

--
https://www.gnu.org/philosophy/free-sw.html
Re:Verilog by Anonymous Coward · 2014-01-08 11:51 · Score: 0

I don't see how you can do add in 3 gates. My best was a variation of the carry look ahead adder which came out to 4 gate delays per bit with any number of bits carried out in parallel.
Software like CEDAR logic simulator helps a ton with figuring out logic before approaching circuitry design. But you do need to know how to turn a logic diagram into a functioning circuit. It helps a ton to understand what each gate actually is and how to construct it out of inversions if you do anything remotely complicated; it cuts back on the number of gates when you can factor some of the terms out.
Re:Verilog by Bing+Tsher+E · 2014-01-08 12:06 · Score: 1

Hardware can be made to implement ANY functioning software.
Sure it can. Even if it involves huge, huge diode arrays and many pounds of solder.
Re:Verilog by Bing+Tsher+E · 2014-01-08 12:08 · Score: 1

I replaced a microcontroller with a dual op-amp and some passives in a design when they told me the OTP microcontroller was too expensive. The CPU was about 20 cents. A dual op-amp was less than one cent.
Re:Verilog by cheater512 · 2014-01-08 12:13 · Score: 1

If you want to trade transistors for time, just use a CPU.
Anyway when making a chip, 24,000 transistors is not much. You don't want to do it everywhere sure, but a couple of times and it isn't an issue.
Re:Verilog by Anonymous Coward · 2014-01-08 12:15 · Score: 0

It's been awhile, but I used to use Cadence SoC Encounter. It was pretty effective in taking behavioral verilog and automating a working layout. Of course, you had to tweak it quite a bit. As mentioned, there are a few different software packages available to convert from C/C++ to verilog first.
Re: Verilog by Scowler · 2014-01-08 12:21 · Score: 1

Your point is well made. I work with a lot of software engineers who are required to be both highly proficient in C++ and modestly proficient in Verilog, and have seen many of the mistakes you highlight. One thing I wish I had mentioned before is the FOR-GENERATE logic specified in the Verilog 2001 spec. Using this syntax made the transition for software engineers even easier, although it can be argued that this type of syntax is often abused these days and is difficult to maintain.
Re: Verilog by O('_')O_Bush · 2014-01-08 12:29 · Score: 1

He was talking most common/basic arithmetic. Floating point (the only way you'd do 2*0.33333 == 2/3) is a whole different animal, and more complicated than any of the other functions discussed.

--
while(1) attack(People.Sandy);
Re:Verilog by Jane+Q.+Public · 2014-01-08 12:31 · Score: 0

"Also remember that an algorithm isn't just running on any old bit of hardware it's running on a modern CPU with lots of special instructions with a gigantic RAM attached to it and potentially some other peripherals for special functions."
You're merely reinforcing my point. There is somewhere between little and no reason to believe that custom hardware could not run the algorithm better.

As for the word "well", it is open to interpretation. It is certainly possible to make it run well. I would say that "well" is not the right word. It just might not be easy to do it.
Re: Verilog by Scowler · 2014-01-08 12:39 · Score: 1

Debug and test is certainly the hard part. Tools like Modelsim don't seem very approachable and take time to learn. Creating proper test coverage is always a challenge, even for vets. As for crossing clock domains... Oy, that part does have a steep learning curve...
Re:Verilog by Asmodae · 2014-01-08 13:10 · Score: 1

Unless the algorithm requires all those special instructions and monster ram to run.... at which point your custom hardware looks very much like the CPU and system it is intended to replace, and definitely not cheaper unless you're selling a whole lot of them. Reliable hardware is expensive to build even when it's a simple design iterating on previously known good hardware. Starting from scratch on raw silicon takes millions of dollars, just for your first chip lot, not to mention all the man hours to get it there and subsequent revisions. There are lots of algorithms that don't make any sense (from a cost vs efficiency standpoint) to port to custom hardware. That's the whole reason the generic CPU exists in the first place.
I guess I'm disagreeing with your definition of better. If it's faster but costs too much for anyone to actually buy, it isn't better.
Re:Verilog by Jane+Q.+Public · 2014-01-08 13:25 · Score: 0

It wasn't about "better". GP's comment was that it couldn't be made to run "well". It certainly can be made to run well.

*I* was the one who said it wouldn't necessarily be "easy."
Re:Verilog by Asmodae · 2014-01-08 13:40 · Score: 1

The original wording was "Some C algorithms may never transfer well into a hardware implementation." At least in my mind the transfer process is what might not go well... not how the final product may or may not run. Having some experience here I understood the transfer to be where the work/expense would be. And those are ultimately key factors you would use to base your decision about whether or not to go ahead and make the conversion.
I don't think we disagree on content, just on what might have been meant by Andy's post. Especially considering exactly what you said for the reasons you said, making claims of impossibility would indeed be silly.
Re:Verilog by Jane+Q.+Public · 2014-01-08 13:43 · Score: 0

I don't think we disagree on content, just on what might have been meant by Andy's post. Especially considering exactly what you said for the reasons you said, making claims of impossibility would indeed be silly.
But this is also what I wrote above. It depends on what you mean by "well".

All *I* meant was that it was possible to make it run well in hardware. If he meant something else, well, fine.
Re: Verilog by Asmodae · 2014-01-08 13:55 · Score: 1

Nice point - For is used for iteration in software, and For Generate in hardware is used to generate new instantiations. The similarity in words/syntax is a dangerous trap. The closest thing in software is like putting malloc or new() in a loop. It's a great convention for when you need many similar bits of hardware. Completely wrong for iteration.
Re:Verilog by harrkev · 2014-01-08 14:07 · Score: 1

At least you get it. If that sig doesn't get me geek cred, I don't know what will.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re: Verilog by harrkev · 2014-01-08 14:15 · Score: 1

Meh. As long as you use this the way it is intended: making a lot of instances that look a lot alike, for-generate is awesome. I cannot imagine having to instantiate 128 instances of RAMs without it. Well, I could use PERL to generate Verilog, but that gets messy fast.
For what it's worth, Emacs has some pretty rockin' Verilog features. The ability to hook things up by name, with a regexp thrown in to keep things sorted, is awesome.
Emacs also has a completely different VHDL mode that provides a completely different set of features, with the down side that you have to use VHDL ;-)
Seriously, Emacs has a Verilog and a VHDL mode that both provide awesome, but almost completely non-overlapping features. Verilog mode does lots of cool things that VHDL does not do, and vice versa.
I wish that somebody would port the VHDL stuff over to Verilog world... that somebody NOT being me, since I already have Verilog, bash, csh, PERL, and some Java rolling around in my head. It would probably sprain my brain to get enough LISP in there to do the job. Hmm, is it "if () {" or "if () begin"??? Do I do "else if" or "elsif." That sort of thing gets tiring after a while. Plus, with a wife and five kids, not much time for learning a new language either.
Why doesn't somebody port Emacs over to PERL instead of the bloody abortion known as LISP? Now THAT I could learn to love.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Verilog by harrkev · 2014-01-08 14:23 · Score: 3, Interesting

Actually, that depends on what the 24,000 transistors are doing. Let's assume that you stupidly did a divide using Verilog "/". This implies a one-cycle divide which might well take that many transistors. The problem is that you would not likely be able to get this to work in real life. With so many levels of logic, your timing would be pure crap. Plus you might have fanout and congestion issues that would further limit your timing. So you could get a divide in one clock cycle, but limit yourself to a clock speed of 10 MHz, for example.
Once you get past about 10 or 12 levels of logic (in my opinion), it is time to re-code, no matter what your clock speed is. If you can't get the job done in 12 levels, it is time to re-think your approach. Register re-timing can certainly be useful, but it is much better to do the job right in RTL, the way God intended. Register re-timing can make later steps more complicated (including formal verification).

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Verilog by harrkev · 2014-01-08 14:30 · Score: 1

"Some C algorithms may never transfer well into a hardware implementation."
This is a fundamentally silly thing to say
Not silly at all. Imagine a malloc of a gigabyte of RAM. You do not want to casually just drop a gigabyte of RAM into an ASIC, since that would likely be most of your chip size. You would need to use some sort of DRAM controller, which is HIGHLY dependent upon what foundry you use.
Also, how about opening an Ethernet port? Is this magic compiler also supposed to magically create a SerDes -- complete with a PLL, for any architecture that you choose?
Should I even mention file opens -- how would that work on a chip with no hard drive attached? Use your imagination about keyboards, mice, and graphics cards, and sound.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re: Verilog by doctor.entheogen · 2014-01-08 14:36 · Score: 1

Trying to see if you can escape out the less than sign. &lt= I guess not =
Re:Verilog by Anonymous Coward · 2014-01-08 14:36 · Score: 0

And if they're going to hire a proper hardware engineer, they may as well hold the entire design in house and not bother to sell the algorithm. At that point, they could then license the damn design and let someone like TSMC/Intel/GlobalFoundries or any of a rash of others make their chip.
Re:Verilog by Anonymous Coward · 2014-01-08 14:55 · Score: 0

Let's ignore for the sake of argument the fact that you've turned a (possible) integer problem into something involving fractions.
You have "simplified" your division implementation by describing it as multiplication by the inverse. This is certainly a valid way to implement division (let's ignore real-world issues such as accuracy concerns).
But then you wonder why it's "harder than regular multiplication": this is because you have forgotten to consider the difficulty of computing multiplicative inverses. To do, say, a division of 12 by 17 in this manner, the first step is to find the inverse of 17 (approx. 0.059) and the 2nd step is to perform the actual multiplication. In writing your post you just calculated some inverses in your head, but you can't skip this step in a real computer that's supposed to do division!
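One common way to get that inverse (not necessarily what any particular chip does) is Newton-Raphson iteration, which needs only multiplies and subtracts. A rough Python sketch of the 12/17 example follows; the function name, the starting guess and the iteration count are my own assumptions, chosen for illustration rather than taken from any real divider.

```python
import math


def reciprocal_newton(d, iterations=5):
    """Approximate 1/d for d > 0 using only multiply and subtract steps.

    Hypothetical helper: scales d into [0.5, 1), applies the Newton-Raphson
    update x = x * (2 - m * x), then undoes the scaling.
    """
    if d <= 0:
        raise ValueError("this sketch only handles positive divisors")
    # d == mantissa * 2**exponent with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(d)
    # Linear starting guess for 1/mantissa on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    for _ in range(iterations):
        # Each step roughly doubles the number of correct bits.
        x = x * (2.0 - mantissa * x)
    # Undo the power-of-two scaling (a pure shift in hardware).
    return math.ldexp(x, -exponent)


if __name__ == "__main__":
    inverse = reciprocal_newton(17.0)
    print(inverse, 12 * inverse)
```

Note that every iteration costs two multiplications, so the "division is just multiplication" shortcut ends up being several multiplications plus the final one.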
Re:Verilog by Anonymous Coward · 2014-01-08 15:13 · Score: 0

go ahead and try and multiply 5 by 0.3334 using your integer multiplication gates.
Re:Verilog by gargleblast · 2014-01-08 17:30 · Score: 1

Dyslexia?
Re:Verilog by Anonymous Coward · 2014-01-08 19:26 · Score: 0

use something like bamboo from PandA

Due to incredibly bad product naming, I am unable to find this product using google and searches for the company & product name, even if I narrow the scope down by including "software development" or "verilog" as keywords. Do you have a link?
Re:Verilog by Anonymous Coward · 2014-01-08 20:50 · Score: 0

If nothing else, you should be able to cascade it into 4 4-bit lookups, and that would handle the job. And that would probably be overkill.
Or am I wrong?
I don't know exactly how you are reasoning but if you think the way I think you think then you are wrong.
(A+B+C+D) / (E+F+G+H) is not the same as A/E + B/F + C/G + D/H. You can not split the denominator.
This means that you will have four cascaded 4*16 bit lookups which amounts to a million memory words each.
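To put numbers on that, here is a small Python sketch. It assumes each lookup is addressed by a 4-bit slice of the dividend plus the full 16-bit divisor (my reading of "4*16 bit", not something stated above), and the helper names are invented for illustration.

```python
from fractions import Fraction


def lookup_table_entries(address_bits):
    """Number of words in a table addressed by `address_bits` bits."""
    return 2 ** address_bits


def true_quotient(numerators, denominators):
    """(A+B+...) / (E+F+...), computed exactly."""
    return Fraction(sum(numerators), sum(denominators))


def termwise_quotient(numerators, denominators):
    """A/E + B/F + ..., the incorrect 'split the denominator' version."""
    return sum(Fraction(n, d) for n, d in zip(numerators, denominators))


if __name__ == "__main__":
    # 4 dividend bits + 16 divisor bits = 20 address bits per table.
    print(lookup_table_entries(4 + 16))
    # The two expressions disagree even for tiny inputs.
    print(true_quotient([1, 2], [3, 4]), termwise_quotient([1, 2], [3, 4]))
```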
Re:Verilog by fractoid · 2014-01-08 21:14 · Score: 1

Even if it successfully synthesizes, there is no guarantee that it will be in any way an optimal implementation.
However, if it does synthesize into something runnable, then you've just proved an upper bound for the cost of the implementation. If the upper bound is in any way commercially feasible then it's definitely worth optimising.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Verilog by Anonymous Coward · 2014-01-08 21:31 · Score: 0

You are stretching the definition of "Algorithm" a bit there.
Re:Verilog by fisted · 2014-01-08 21:43 · Score: 1

Your point being? Where's the 'moderately complex system'? Or are you going to tell me that two opamps now do the whole job of the MCU? If so, using an MCU there was probably a failure to begin with.

--
CLI paste? paste.pr0.tips!
Re: Verilog by loufoque · 2014-01-08 22:50 · Score: 1

Surely you're speaking of integral logarithm and division, not floating-point?
Re: Verilog by makomk · 2014-01-08 22:51 · Score: 1

You've forgotten about fixed point, which isn't really any more complicated to implement than integer arithmetic and is a perfectly reasonable way of implementing integer division by a fixed divisor. (A lot of compilers actually use this trick, because even running on a CPU it's often more efficient than using hardware division.)
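As a concrete illustration of the trick, here is a minimal Python sketch of unsigned 16-bit division by the constant 3 using one multiply and one shift. The constant 0xAAAB and the shift of 17 are chosen for this particular divisor and width (3 * 0xAAAB = 2**17 + 1); the function name is hypothetical, and other divisors need their own constants.

```python
def divide_by_3(n):
    """Divide an unsigned 16-bit value by 3 without a divider.

    0xAAAB / 2**17 is slightly more than 1/3; the excess is small enough
    that the truncated result is exact for every 16-bit input.
    """
    if not 0 <= n < (1 << 16):
        raise ValueError("this sketch only covers 16-bit unsigned inputs")
    return (n * 0xAAAB) >> 17


if __name__ == "__main__":
    print(divide_by_3(12), divide_by_3(65535))
```

In hardware the shift is free (it is just wiring), so the whole operation costs one constant multiplier.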
Re:Verilog by Bing+Tsher+E · 2014-01-09 00:31 · Score: 1

Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.
That's utterly ridiculous. It's like if someone wanted a doughnut-making machine. So they built a city, and in one of the neighborhoods they built a factory that could make doughnut machines.
Re:Verilog by Muad'Dave · 2014-01-09 02:24 · Score: 1

The classic 74181 4 bit ALU shows how it can be done (page 5). It shows the schematic of the chip in gate form. Page 4 shows the 'opcodes' (really operation selection line combinations) that this simple chip can perform.
The mighty (!) 32xx series of minicomputers from Concurrent Computer in the 1980s/90s used a bunch of these chained together to form a 32 bit ALU.

--
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
Re:Verilog by Anonymous Coward · 2014-01-09 02:32 · Score: 0

depending on your precision.
Not a problem at all. There is none.
Re: Verilog by Muad'Dave · 2014-01-09 02:34 · Score: 1

< yes you can - the HTML escape is &lt; - note the trailing semicolon.

--
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
Re: Verilog by Mr+Z · 2014-01-09 02:50 · Score: 1

You need &lt; to get it (the semicolon also): <

--
Program Intellivision!
Re:Verilog by harrkev · 2014-01-09 03:17 · Score: 2

That is exactly my point. At one extreme, you could do a job in a few hundred thousand gates, and at the other extreme you could do a job in a few billion gates. This is sort of an extreme upper-bound and a lower-bound on the size of the solution. Without further details, we have no idea where in the spectrum the real solution lies.

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Verilog by Vreejack · 2014-01-09 03:50 · Score: 1

What a great "idea". Unfortunately you still have to divide first in order to determine your multiplicand.
For example. Dividing by 34.527. How do you know what to multiply by instead? Do you have a very, very large lookup table? Now there is a solution. Get rid of multiplication AND division and just use lookup tables. That should be fast.

--
"Will future ages believe that such stupid bigotry ever existed!" -- Ivanhoe
Re:Verilog by Anonymous Coward · 2014-01-09 04:33 · Score: 0

64bit DIV on POWER3 had a 37 cycle latency. Newer CPUs have deeper pipelines to increase throughput at the expense of latency. C to Verilog will at least give an upper bound on the needed complexity even if it is an order of magnitude worse than a real HDL engineer would accept.
Re:Verilog by fatphil · 2014-01-09 04:36 · Score: 1

Is it possible to make a quicksort run "well" in hardware? What are you going to do for the stack, and how big will you make it so that everything still works in the worst case?

Compare that to a trivial network sort that makes use of the inherent massive parallelism possible in FPGAs.

Is your O(n^(1+eps)) really "running well" next to an O(n^(1/2+eps))?
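For illustration, here is a Python model of one very simple sorting network, odd-even transposition sort. It is only one possible network and not necessarily the one with the best depth; the function name is made up. The point is that the compare-exchange pattern is fixed in advance, needs no stack, and every pair within a round could be a separate comparator working in parallel.

```python
def odd_even_transposition_sort(values):
    """Software model of an odd-even transposition sorting network.

    The network has len(values) rounds. Within a round, all compare-exchange
    pairs are disjoint, so in hardware they could all fire simultaneously.
    """
    data = list(values)
    n = len(data)
    for round_index in range(n):
        # Even rounds compare (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        start = round_index % 2
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 2, 9, 1, 5, 6, 0]))
```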

--
Also FatPhil on SoylentNews, id 863
Re:Verilog by Jane+Q.+Public · 2014-01-09 07:49 · Score: 1

Is it possible to make a quicksort run "well" in hardware? What are you going to do for the stack, and how big will you make it so that everything still works in the worst case?
Yes, it is as possible to make it run as well in custom hardware as anywhere else.

You people have kept making the same arguments, and I have kept answering them. Do you have a reading comprehension problem?
Re:Verilog by fatphil · 2014-01-09 08:04 · Score: 1

But your definition of "well" appears to include "desperately inefficiently"?

If lots of people think you're using a term badly, then maybe, just maybe, you're at fault - did that ever cross your mind?

--
Also FatPhil on SoylentNews, id 863
Re:Verilog by Asmodae · 2014-01-09 08:34 · Score: 1

Not really. The biggest conversion issues I deal with (when converting algorithms to hardware) are related to how software treats RAM vs how hardware treats RAM. They are fundamentally different methods of operation. In software RAM is cheap/free, so it is preferred over CPU cycles. In hardware, the processing is cheaper (in general) and RAM is more expensive.

Buffering and holding a megabyte of data between each stage of processing is natural and very easy for software. But in hardware this is a very inefficient way to do things. Converting from one method to the other can be quite difficult depending on the algorithm.
Re:Verilog by Jane+Q.+Public · 2014-01-09 09:14 · Score: 1

"If lots of people think you're using a term badly, then maybe, just maybe, you're at fault - did that ever cross your mind?"
And your definition of "well" means consuming X power and Y hardware?

I will repeat this for you one more time: *I* was the one who wrote above that there could be misunderstandings about the meaning of "well". You haven't been adding ANYTHING to the conversation.
Re: Verilog by Anonymous Coward · 2014-01-09 11:37 · Score: 0

Floating point log isn't that hard. First, in picking the mantissa, you normalize it to a decimal form 1.nnnnnnnnn in binary. That shouldn't be too hard. Then, you successively divide by 1.1, 1.01, 1.001, AS APPROPRIATE, depending on if the current result is greater than or equal to that factor. For every factor used, you trigger the addition of its logarithm.
Doing all those divides might seem operation intensive, but it isn't. Each divide ends up being something like a single subtraction when they are all done concurrently.
Re:Verilog by Asmodae · 2014-01-09 11:56 · Score: 1

To be fair, the definition of "well" I intend isn't an arbitrary X/Y value. There's already very well defined numbers for the hardware which currently runs the algorithm. To transfer "Well" to custom hardware would be somewhere in the vicinity of: less than the original general purpose CPU by enough that it justifies the design effort involved and doesn't cost MORE to manufacture. All engineering decisions are trade-offs, and if the trade-off isn't worth the effort and resource cost you don't do it. For a transfer effort to go "Well" means at the end of the day you come out ahead somewhere.

If you have to spend 3 million dollars on custom hardware development just to get performance parity with a COTS general purpose CPU... you'd be hard pressed to call that "well" by any measure. This is what is implied by the setup of the original Ask Slashdot question, asking an engineering question about feasibility and cost.
Re: Verilog by MickLinux · 2014-01-09 14:29 · Score: 1

I should also note that it is possible to MULTIPLY the factors to get the target number, which is simply an incrementally-ordered shift and an addition and subtraction.
Example: with all numbers in floating-point binary,
Take log (1011.0111)
T=1.0110111, Mantissa=11
1.1>T; don't use 1.1
1.01T, try 1.0000000001)
So :
log(1.0110111)=log(1.01)+log(1.001)+log(1.000001)+log(1.0000000001)+...
And
Log(1011.0111)=11+log(1.0110111)
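The same procedure can be sketched in Python, as a rough model rather than a hardware description. Assumptions: the log is base 2, the function name and the max_k parameter are invented for illustration, and the table of log2(1 + 2**-k) values (a small ROM in hardware) is built with math.log2 here. Floats are used for readability; in fixed point, multiplying by (1 + 2**-k) is just product + (product >> k).

```python
import math


def log2_shift_add(x, max_k=20):
    """Approximate log2(x) for x > 0 by building x from factors (1 + 2**-k)."""
    if x <= 0:
        raise ValueError("log2 is only defined for positive inputs")

    # Normalize to t in [1, 2); the exponent is the integer part of the log.
    exponent = 0
    t = float(x)
    while t >= 2.0:
        t /= 2.0
        exponent += 1
    while t < 1.0:
        t *= 2.0
        exponent -= 1

    product = 1.0
    result = float(exponent)
    for k in range(1, max_k + 1):
        candidate = product * (1.0 + 2.0 ** -k)  # a shift and an add in hardware
        if candidate <= t:
            # Factor is usable: keep it and add its tabulated logarithm.
            product = candidate
            result += math.log2(1.0 + 2.0 ** -k)
    return result


if __name__ == "__main__":
    # 1011.0111 in binary is 11.4375 in decimal.
    print(log2_shift_add(11.4375), math.log2(11.4375))
```

For 1011.0111 the loop picks k = 2, 3, 6 and 10 first, which matches the factors 1.01, 1.001, 1.000001 and 1.0000000001 in the example.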

--
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Re:Verilog by Jane+Q.+Public · 2014-01-09 15:09 · Score: 1

"If you have to spend 3 million dollars on custom hardware development just to get performance parity with a COTS general purpose CPU... you'd be hard pressed to call that "well" by any measure."
Must I repeat this yet again? I was using "well" to mean it was possible to get it to run well on custom hardware. GP's comment may have had a different definition of "well". We know this, we acknowledged it a long time ago, and this is all just a rehash of what has gone before.

Don't misunderstand me. I'm not trying to be rude. But there comes a point at which I tire of repeating myself.
Re:Verilog by niftymitch · 2014-01-09 19:10 · Score: 1

If you want to trade transistors for time, just use a CPU.
Anyway when making a chip, 24,000 transistors is not much. You don't want to do it everywhere sure, but a couple of times and it isn't an issue.
Gark... with an MC14500 you can (http://www.linurs.org/mc14500.html), with an Intel 4004 you can. With an MC6800 you can.... build a system...
If I recall, the Motorola MC68000 was about 68000 transistors, but a "C" program on the 68K runs a lot slower than on a modern processor with a couple billion transistors. Nvidia is beyond 7 billion transistors for their high end graphics.
There is something missing in big buckets here.
Lock the door, air gap a design lab, get some large as heck FPGA from Xilinx and go to work. If you have something magic you want to own it, but the turf is well occupied so market and price points will matter.
You can use FPGA parts for subsystems --- WP reminds me that Xilinx currently holds the "world-record" for an FPGA containing 6.8 billion transistors, so you can get a lot done on field programmable devices -- or tiled arrays of parts.

--
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Re:Verilog by niftymitch · 2014-01-09 19:14 · Score: 1

I was misunderstanding my notes.
You would need several thousand transistors for a standard DIV circuit, and then the CPU would need to iterate through the operation many times in order to perform a division.
...snip....
Trivia...
the MC68000 took 144 clocks to finish a DIV.
Many processors are microcode engines under the hood. Modern ALU blocks are
big but can be purchased as a library.
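
The iteration mentioned in the quoted comment can be sketched in software. Below is a minimal Python model of restoring (shift-and-subtract) division that produces one quotient bit per step. It only illustrates why a small divider needs many cycles; it is not a model of the MC68000's microcode, and the function name and the 16-bit default width are assumptions made for the example.

```python
def restoring_divide(dividend, divisor, width=16):
    """Unsigned shift-and-subtract division, one quotient bit per iteration.

    Returns (quotient, remainder, steps). Each loop pass stands in for one
    trip through a single shared subtractor in hardware.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    remainder = 0
    quotient = 0
    steps = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps


if __name__ == "__main__":
    print(restoring_divide(1000, 7))
```

The step count always equals the operand width here, which is the point: a wider operand means more passes through the same small circuit.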

--
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Re: Verilog by MickLinux · 2014-01-09 23:43 · Score: 1

Sorry, it looks like all my math got eaten up between less-than and greater than signs.

--
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Re:Verilog by Bing+Tsher+E · 2014-01-12 05:34 · Score: 1

If you don't understand how a dual op-amp and passives can be made into a 'moderately complex system' then stick to your expensive DSP processors.
Re:Verilog by fisted · 2014-01-12 06:28 · Score: 1

I'm pretty well-aware of what you can do with opamps and RLC, thanks. Not that you mentioned any RLC in the first place, you also left them out in your price comparison, so why would i assume you did anything more complicated than a buffer?
but even both opamps in some exotic configuration don't really match what i was referring to by 'moderately complex system'. two operations are not moderately complex.

--
CLI paste? paste.pr0.tips!
Re:Verilog by Andy+Dodd · 2014-01-14 06:40 · Score: 1

That is indeed what I meant by "transfer well". If it requires 4x the cost of a general purpose CPU to get an FPGA to match the performance for a given algorithm - then the CPU wins.
As an interesting reference point - in many cases, "custom hardware" for some algorithms is now winding up something more along the lines of a tweaked CPU with a modified instruction set than dedicated hardware (such as the video encoding/decoding blocks of most ARM CPUs, such as TI's Ducati engine and Qualcomm's vidc, both of which are running firmware that is loaded by the applications processor on ??? architectures - vidc might be Hexagon just like most of Qualcomm's basebands are). Determining the proper tradeoff of hardware vs. software requires a lot of work, and whether it is worth it depends on a lot of things (cost per unit, number of units shipped, etc.).

--
retrorocket.o not found, launch anyway?

Why don't they know? by Anonymous Coward · 2014-01-08 08:36 · Score: 5, Insightful

You'd think the "electronics manufacturer" would have some idea how to estimate this.

Re:Why don't they know? by janeuner · 2014-01-08 08:47 · Score: 5, Insightful

They do have a way. They asked if it had already been determined.
The correct response is, "We don't know."
Re:Why don't they know? by i+kan+reed · 2014-01-08 08:47 · Score: 1

Because manufacture doesn't necessarily mean design expertise?
Warning: Car analogy inbound.
Why can't the workers on the assembly line of a GM plant design a car?
Re:Why don't they know? by Sarten-X · 2014-01-08 09:27 · Score: 3, Funny

Because they're robots with no AI functionality?

--
You do not have a moral or legal right to do absolutely anything you want.
Re:Why don't they know? by Megane · 2014-01-08 09:28 · Score: 3, Informative

A more accurate car analogy would be GM wanting to build a car using your technology and asking you how many assembly line workers it would take.

--
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Re:Why don't they know? by AndyKron · 2014-01-08 09:35 · Score: 1

I'm thinking the "electronics manufacturer" doesn't have an idea, or they wouldn't have asked that question.
Re:Why don't they know? by Goaway · 2014-01-08 10:00 · Score: 1

Or, you know, they asked so they wouldn't need to duplicate work that has already been done, in case they had it figured out already.
Re:Why don't they know? by nebular · 2014-01-08 10:23 · Score: 1

Why are the workers on the assembly line speaking to anyone about the design of the car? The engineers who design or maintain the plant should be speaking to the artists and engineers who designed the car.
So the engineers who know how to design chips should be speaking to the programmers who made the algorithm. If those engineers are unable to translate an algorithm into silicon I'd be very worried about that company.
Re:Why don't they know? by daboochmeister · 2014-01-08 10:44 · Score: 1

Maybe the algorithm is proprietary, and dryriver's company doesn't want to release it to the manufacturer yet, even under NDA. Hard to estimate what you're not allowed to see. Just a thought.

--
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
Re:Why don't they know? by sir-gold · 2014-01-08 11:44 · Score: 1

Because it's not their job. GM has engineers and designers to do that.
If this hardware manufacturer doesn't have a design team (and only does post-design production) then it's time to find a different manufacturer
Re:Why don't they know? by Kagetsuki · 2014-01-09 01:54 · Score: 1

The electronics manufacturer must have assumed they had some concept of how to design ASICs if they were even calling. This is the equivalent of somebody painting a picture of a house, then calling a carpenter and saying "I've designed a house, I'd like you to build it". Both a painting and drafted design documents are images of a house, just one gives you technical information like how much wood and how many nails you will need and the other does not.
I imagine the electronics manufacturer must have asked the question and was dumbfounded when they couldn't give any sort of answer.
Re:Why don't they know? by Anonymous Coward · 2014-01-09 10:41 · Score: 0

And then their correct counter-response will be: "Ok, give us the raw and detailed specification for what you are trying to accomplish and we'll quote your time-and-materials to design the HW design, alpha and omega, and then we'll know how much it will cost to manufacture it."
The likely cost range will be $250K-$2M depending on how complex the problem is, how well you specify requirements, the quality of the team you hire and the number of iterations you have to go through for failing the definition/selection prior items (application problem, specs, team quality).
(30 years HW design including applications similar to this)
Re:Why don't they know? by jbo5112 · 2014-01-10 05:10 · Score: 1

The electronics manufacturer probably hasn't seen the algorithm at this point. I assume they're still trying to figure out things like design cost, build cost, and feasibility before making a commitment to buy, and the software company doesn't want to give it away without a contract for payment. I would add up all the different operations of each type in your algorithm, along with some information about looping etc. and present this to the hardware company, but you would have to strike a careful balance between giving them enough information to help and not giving them enough to build it themselves.
A hardware implementation can vary widely for a single algorithm. For example there are many implementations for running x86 instructions. A Haswell chip should run the same code that a 286 does, but with more transistors, higher IPC and a modified algorithm. If you look at closer processor generations, you may even see a repeated algorithm at some points.

Just like any other software project by mbadolato · 2014-01-08 08:36 · Score: 5, Funny

Make up a number, then when they complain that it was way off, blame it on their management changing scope a hundred times throughout the life of the project!

Re:Just like any other software project by Anonymous Coward · 2014-01-08 08:54 · Score: 1

42!
Re:Just like any other software project by Anonymous Coward · 2014-01-08 09:51 · Score: 0

My wife, a particle physicist, has suggested that she wants to make a t-shirt with just the number 42 printed on it! :-)
Re:Just like any other software project by Anonymous Coward · 2014-01-08 09:57 · Score: 0

1405006117752879898543142606244511569936384000000000
Re:Just like any other software project by Tablizer · 2014-01-08 11:17 · Score: 1

Or just say, "Our best estimate is somewhere between 7 and 38,000,000,000".

--
Table-ized A.I.
Re:Just like any other software project by captain_nifty · 2014-01-08 11:32 · Score: 1

42 factorial... hmm.
1.4 x 10^51... yeah that ought to be enough gates.
Re:Just like any other software project by Imrik · 2014-01-08 15:47 · Score: 3, Funny

Made me think of this.

meh by Anonymous Coward · 2014-01-08 08:38 · Score: 0, Troll

This is the dumbest ask slashdot ever. Give them the source code and tell them to figure it out. Remind them they are the EEs. I don't know which of the two companies here is stupider.

Re:meh by jbo5112 · 2014-01-10 05:14 · Score: 1

The hardware company may not have signed a contract yet. You don't want to just give something away to the customer when they haven't bought it yet. They're probably trying to establish design and build costs, so they will have an idea of profitability and feasibility before being locked into a contract to buy something they can't sell.

C to HDL to netlist by Anonymous Coward · 2014-01-08 08:38 · Score: 2, Informative

As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.

Re:C to HDL to netlist by Anonymous Coward · 2014-01-08 19:37 · Score: 0

As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process.

Yes, but they only work on a small subset of C/C++ programs. Do you use malloc() or new? Or recursive calls that can't easily be transformed to tail recursion? If so, the process *will* fail.

Try Stackoverflow perhaps? by Anonymous Coward · 2014-01-08 08:40 · Score: 5, Insightful

I think you may have a better chance of getting an answer if you ask this question on Stackoverflow (or one of its related sites).

Unfortunately, I think asking on Slashdot is only likely to get you some tired and outdated memes / jokes...

Re:Try Stackoverflow perhaps? by Anonymous Coward · 2014-01-08 09:55 · Score: 0

Well... first you have to calculate the program in terms of hot grits per Natalie Portman.
Re:Try Stackoverflow perhaps? by Nethemas+the+Great · 2014-01-08 10:44 · Score: 0

You can't be serious..? S.O. is full of monkeys with precious little comprehension about the things they write let alone their theory of application. You might get a hint or two that could coalesce into some ideas for research inquiries but expecting something more foundational is only asking for trouble.

--
Two of my imaginary friends reproduced once ... with negative results.
Re:Try Stackoverflow perhaps? by greg1104 · 2014-01-08 11:14 · Score: 2

The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application
Fixed that for you. Ninety percent of everything is crud.
Re:Try Stackoverflow perhaps? by rasmusbr · 2014-01-08 14:07 · Score: 2

The world is full of monkeys with precious little comprehension about the things they write let alone their theory of application
Fixed that for you. Ninety percent of everything is crud.
Off topic, but it is more interesting than that...
When are you the most excited about some new idea or concept? When is your impulse to share technical ideas the greatest? Well, usually right after you've learned it, or rather when you think you've learned it but in reality you've only got a half-decent grasp of the idea and you still have a number of the details completely wrong. The exception to this rule is that some highly skilled and knowledgeable people take pleasure in beating less knowledgeable people in the head with their knowledge. So there you have it: the virtual world of teachers consists of a lot of well-meaning people who don't know what they're talking about and one or two jerks who do.
Obvious question: What does this tell us about people who like to give sex advice on the internet?
Re:Try Stackoverflow perhaps? by greg1104 · 2014-01-08 14:37 · Score: 1

I'm probably the wrong person to comment on this, since by your classification I'm one of the jerks.
Re:Try Stackoverflow perhaps? by KevReedUK · 2014-01-09 08:42 · Score: 1

You can't be serious..? S.O. is full of monkeys...
Then, by virtue of the infinite monkeys theorem, the OP might actually get the kind of answer he needs!?!

--
Just my $0.03 (At current exchange rates, my £0.02 is worth more than your $0.02)
Re:Try Stackoverflow perhaps? by Nethemas+the+Great · 2014-01-10 08:17 · Score: 1

Alas I regret I must concede a point to you on this matter. Well met sir.

--
Two of my imaginary friends reproduced once ... with negative results.

They shouldn't be asking you. by pavon · 2014-01-08 08:40 · Score: 5, Insightful

If they plan on implementing this in hardware, then they should have people who are capable of answering that question. If instead, they are just a manufacturer and aren't capable of doing the actual hardware design, then you have bigger problems than answering this question. That is something you should find out about ASAP.

Minecraft by nbetcher · 2014-01-08 08:41 · Score: 5, Funny

Develop out the algorithm in Minecraft using ProjectRed (Integration module, specifically) and then you can easily count the gates! :-)

Re: Minecraft by Anonymous Coward · 2014-01-08 09:53 · Score: 0

Yo dog! I heard you need custom ASICs to design custom ASICs...
Re: Minecraft by DigiShaman · 2014-01-08 09:54 · Score: 1

An ASIC of Minecraft? Brilliant!!!

--
Life is not for the lazy.
Re: Minecraft by Anonymous Coward · 2014-01-08 11:01 · Score: 0

Yo dawg! I herd u need a brotha to sexually satisfy your woman!
Re:Minecraft by Anonymous Coward · 2014-01-08 11:49 · Score: 0

Minecraft redstone in Creative Mode is an excellent and humbling experience for a software developer like me. In software, you can multiply, abstract and generalize your ideas so the sky's the limit of your creativity.
In hardware, you have to manufacture every bit physically and not make mistakes. No doubt real hardware manufacturers have developed numerous copying techniques, but none of them are as general-purpose and powerful as the software techniques.
Developing a minimal von Neumann computer or a programmable piano with redstone takes weeks of dedicated work, and then your brilliant creation is frustrated by redstone glitches, geometric and timing considerations as well as bugs in the implementation.
Re:Minecraft by Anonymous Coward · 2014-01-10 11:56 · Score: 0

Re: "no doubt": yes. Although it's true that software gives you far more powerful abstractions than hardware, the way I'd put it is that doing logic in redstone is a bit like writing programs in machine code using toggle switches, while in the real world we have the moral equivalent of excellent macro assemblers and debuggers. (And, in some cases, even more abstraction than that.)
There are two hardware description languages which most logic is designed with, VHDL and Verilog. These languages both have a fairly rich set of features, of which a subset are "synthesizable" (translatable to gates by automatic processes). The non-synthesizable features exist to make it easier to write testing frameworks or "testbenches" for simulation purposes. For example, both languages have facilities for reading and writing files, an abstraction which isn't available in hardware but is tremendously useful for writing a testbench.
Here's a snippet of synthesizable Verilog source code:
reg [15:0] ctr;
always @(posedge clk) ctr <= ctr + 16'd1; // non-blocking assignment, the usual form for clocked logic
That's a free running 16-bit counter which increments by 1 every time the "clk" signal transitions from logic 0 to logic 1. Changing it to do other things (like, say, pulsing an output every N clock cycles) would be pretty simple. I'm not intimately familiar with redstone logic but based on watching youtube videos I'm sure it would take a lot longer than five seconds to implement that, and it would be a really huge PITA to combine it with other basic circuits to make a complex machine. When you're working in Verilog, signal connections are by name not 3D geometry, and cut & paste and search & replace are things you can do.
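
For readers without a Verilog simulator handy, the behaviour described above (a free-running counter, plus the "pulse an output every N clock cycles" variation) can be modelled in a few lines of Python. This is a behavioural sketch only; the function name, the second `divider` register, and the choice to number clock edges from 1 are assumptions for illustration, not part of the Verilog snippet.

```python
def simulate_counter(cycles, period, width=16):
    """Model a free-running counter plus a one-cycle pulse every `period` clocks.

    Returns (final_count, pulse_edges), where pulse_edges lists the
    rising-edge numbers (starting at 1) on which the pulse output is high.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    mask = (1 << width) - 1
    ctr = 0        # the free-running counter register
    divider = 0    # a second small counter that wraps at `period`
    pulse_edges = []
    for edge in range(1, cycles + 1):
        ctr = (ctr + 1) & mask   # wraps like a fixed-width register
        divider += 1
        if divider == period:
            divider = 0
            pulse_edges.append(edge)
    return ctr, pulse_edges


if __name__ == "__main__":
    print(simulate_counter(10, 4))
```

Masking with `(1 << width) - 1` is what stands in for the fixed register width; in hardware the wraparound comes for free.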
Basically, when I watch videos of people trying to do shit with redstone my immediate reaction is "why would I want to do what I do at work, only with cement overshoes on?". Not denying that redstone logic is interesting and fun to many, and maybe somewhat useful as a learning tool (though I'd say playing with real digital logic would be an even better learning tool), but if you've got experience with the real thing redstone looks far more tedious than the 101 level projects you did back in college. ;)

Have you tried telling them by Anonymous Coward · 2014-01-08 08:41 · Score: 1

"We don't know."

hls design by Anonymous Coward · 2014-01-08 08:42 · Score: 1

The most common languages for chip or FPGA design would be VHDL or Verilog. Now there is also High Level Synthesis (http://en.wikipedia.org/wiki/High-level_synthesis), in which you can use C/C++ directly. So if you're using a tool like Xilinx's Vivado (http://www.xilinx.com/products/design-tools/vivado/integration/esl-design/) then you can go directly from C/C++ to gate count. However, even in C/C++ it probably needs lots of work from where it is.

Completely stupid question by dskoll · 2014-01-08 08:42 · Score: 4, Insightful

The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"

There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.
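
A small sketch may make the tradeoff concrete. The Python below adds two numbers two ways: with one full adder per bit (more gates, one pass) and with a single full adder reused once per bit (fewer gates, more cycles). The figure of 5 gates per full adder is the textbook two-XOR, two-AND, one-OR construction; the cycle and gate accounting is deliberately simplified (it ignores the carry flip-flop and shift registers a real bit-serial design needs), and all function names are assumptions for the example.

```python
GATES_PER_FULL_ADDER = 5  # 2 XOR + 2 AND + 1 OR


def full_adder(a, b, carry_in):
    """One-bit full adder built from 2 XOR, 2 AND and 1 OR gate."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def add_parallel(x, y, width):
    """Ripple-carry adder: `width` full adders, result available in one pass."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, {"gates": GATES_PER_FULL_ADDER * width, "cycles": 1}


def add_serial(x, y, width):
    """Bit-serial adder: one full adder reused for `width` clock cycles."""
    carry, result, cycles = 0, 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
        cycles += 1  # one clock per bit through the same adder
    return result, {"gates": GATES_PER_FULL_ADDER, "cycles": cycles}


if __name__ == "__main__":
    print(add_parallel(1234, 4321, 16))
    print(add_serial(1234, 4321, 16))
```

Both functions return the same sum; only the resource figures differ, which is why a single "gate count" for an algorithm is not well defined until the speed target is fixed.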

Re:Completely stupid question by presidenteloco · 2014-01-08 09:17 · Score: 1

Yes, theoretically, according to Turing, you could get by with enough gates to make a couple of registers, a goto/jump instruction and a branch if is-zero test, as long as you have some read-write memory somewhere else.
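
As a rough illustration of how little control logic that is, here is a sketch of such a machine in Python: a handful of registers, increment and decrement, an unconditional jump, and a branch-if-zero. The instruction names, the tuple encoding, and the example program are all made up for this sketch.

```python
def run_counter_machine(program, registers, max_steps=10_000):
    """Interpret a tiny register machine.

    Instructions are tuples:
      ("inc", r)        add 1 to register r
      ("dec", r)        subtract 1 from register r
      ("jz", r, addr)   jump to addr if register r is zero
      ("jmp", addr)     unconditional jump
      ("halt",)         stop and return the registers
    """
    regs = dict(registers)
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "halt":
            return regs
        if op[0] == "inc":
            regs[op[1]] += 1
            pc += 1
        elif op[0] == "dec":
            regs[op[1]] -= 1
            pc += 1
        elif op[0] == "jz":
            pc = op[2] if regs[op[1]] == 0 else pc + 1
        elif op[0] == "jmp":
            pc = op[1]
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    raise RuntimeError("step limit reached")


# Add register b into register a by looping until b reaches zero.
ADD_PROGRAM = [
    ("jz", "b", 4),   # 0: finished when b == 0
    ("dec", "b"),     # 1
    ("inc", "a"),     # 2
    ("jmp", 0),       # 3
    ("halt",),        # 4
]

if __name__ == "__main__":
    print(run_counter_machine(ADD_PROGRAM, {"a": 3, "b": 4}))
```

Even addition takes a loop here, which shows the other end of the tradeoff: very few gates, very many steps.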

--

Where are we going and why are we in a handbasket?
Re:Completely stupid question by Baloroth · 2014-01-08 09:23 · Score: 0

The question "How many gates does it take to implement this algorithm?" is stupid. It's like asking "How long is a piece of string?"
There will always be a time/space tradeoff, even with translating an algorithm to hardware. You can save time by throwing more gates at the problem to increase parallelism, or you can save space by reusing gates in sequential operations.
Not entirely true. While obviously the number of gates will depend on your exact implementation (you could theoretically use an infinite number of gates for any algorithm), there will be a certain minimum number of gates dependent on the algorithm itself. Even if you reuse gates by performing sequential algorithms, you still need to store data from previous operations and will need a certain number of gates for that.
The manufacturer is probably asking how many gates you need to implement the algorithm exactly as it is coded, with exactly as much parallel or sequential logic as it already has, and that will have a fairly specific answer. Again, you might be able to optimize the code to use fewer gates, but that would be a different question.

--
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
Re:Completely stupid question by Anonymous Coward · 2014-01-08 09:35 · Score: 0

The analogy "How long is a piece of string?" is a stupid example of stupid questions. WORST ANALOGY EVER

Make life easier for everyone.... by Anonymous Coward · 2014-01-08 08:42 · Score: 0

...just rewrite your software in machine code.

You need a C to VHDL translator by Animats · 2014-01-08 08:42 · Score: 4, Informative

You need a C to VHDL translator. Here's a tutorial for one.

Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.

Re:You need a C to VHDL translator by Trepidity · 2014-01-08 09:08 · Score: 4, Informative

One caveat to going this route: if the algorithm contains well-known operations as building blocks, you probably don't want to synthesize your own VHDL versions of those standard operations, since they already have highly optimized hardware implementations. For example, if one step of the algorithm is "compute an FFT", you probably want to use an existing FFT IP core to implement it, rather than translating some FFT C code to new VHDL.
At one extreme, where the algorithm is nothing but a chain of such cores (common in DSP applications), you could get a rough estimate just by looking up the gate counts for each operation and adding them up.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:You need a C to VHDL translator by iggymanz · 2014-01-08 09:15 · Score: 1

I'm worried about dryriver's "electronics manufacturer", that kind of skill and knowledge should be a core competency of any business that makes custom app chipsets
Re:You need a C to VHDL translator by HalWasRight · 2014-01-08 09:53 · Score: 1

How did this get modded up to "Informative"? This is misinformation. If you believe what an FPGA vendor tells you about their tools then I have some land in Florida you might be interested in. There is NO push button path from C to hardware, unless you consider compiling the C into object code that is burned into ROM as a hardware solution. Yes, there are tools like Cynthesizer from Forte and the cited tool from Xilinx that use C as an input language, but it is gerrymandered C geared toward synthesis, not "dusty deck" C. As stated above, there are too many tradeoffs in time and space to provide a simple answer to your interested party. You should hire someone who can find a couple of points in the solution space and give your interested party an educated answer like "At xx mm^2 it runs this fast with this latency, while at yy mm^2 it runs this fast at this latency with 50% better power".

--
"This mission is too important to allow you to jeopardize it." -- HAL
Re:You need a C to VHDL translator by Anonymous Coward · 2014-01-09 11:32 · Score: 0

Dude, OP is just looking for a quick order-of-magnitude approximation to tell their buyer. Not a final design. Relax.

about 40 gates. by Anonymous Coward · 2014-01-08 08:42 · Score: 1, Interesting

it would only make sense to reuse the same adder circuit for each addition, instead of making a separate adder circuit for each operation.
then you'd add control logic to move the data to adder circuits, multiplier circuits, etc.
then essentially what you have is a microprocessor.
then you just turn that microprocessor into the simplest one possible. which is basically a queue and a stack, and a few elementary logic operations. you can do operations a bit at a time.
so the number of logic gates your program needs is the number to make a queue and a stack, and a few elementary logic operations, and that's probably on the order of about 40 gates.

Re:about 40 gates. by behrooz0az · 2014-01-08 09:43 · Score: 1

40? How did you come up with that number?
A radio from the 80s probably has more gates, and they don't add or multiply.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)

VHDL by Anonymous Coward · 2014-01-08 08:42 · Score: 3, Informative

Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...

Cost estimate by tepples · 2014-01-08 08:43 · Score: 1

Maybe they can do the translation, but they need a number for how many gates so that they can give a number for how many dollars.

Re:Cost estimate by Anonymous Coward · 2014-01-08 09:08 · Score: 0

I'm not sure if the economics are so straightforward.
I've seen many designs where a PIC has been used, where the same job could have been done with random discrete logic. It was cheaper to use a PIC, as it simplified the design and testing process, even though the part cost was higher and many thousands more gates used.
Re:Cost estimate by EndlessNameless · 2014-01-08 09:16 · Score: 2

If they can design the hardware, they can ask for the source and supply the quote themselves.
If they can't, then OP needs to understand they have no practical design capabilities and plan on paying someone else to design it---before paying these guys to manufacture it. Or he can search for a shop that can handle both the design and the manufacture.

--

---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.

Break down your algorithm by neutrino38 · 2014-01-08 08:43 · Score: 4, Informative

Hello,

It is probable that you can break down your algorithm (I do not mean code) into a pipeline of elementary processing and find implementations (IP) for each of them.

to give out an estimate:
- subdivide your algorithm into simpler pieces
- find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
- do the sum.
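
The three steps above can be sketched as a simple tally. Every stage name and gate figure in this Python snippet is a made-up placeholder; real numbers would come from IP core datasheets or from a hardware designer, and a low/high range per stage is just one assumed way of carrying the uncertainty through to the total.

```python
# All stage names and gate counts below are hypothetical placeholders.
PIPELINE = [
    {"stage": "input buffering", "gates_low": 2_000,  "gates_high": 4_000},
    {"stage": "filter",          "gates_low": 15_000, "gates_high": 40_000},
    {"stage": "transform",       "gates_low": 30_000, "gates_high": 90_000},
    {"stage": "output control",  "gates_low": 1_000,  "gates_high": 3_000},
]


def estimate_total(stages):
    """Sum the per-stage low and high gate estimates into one range."""
    low = sum(stage["gates_low"] for stage in stages)
    high = sum(stage["gates_high"] for stage in stages)
    return low, high


if __name__ == "__main__":
    low, high = estimate_total(PIPELINE)
    print(f"estimated gates: {low:,} to {high:,}")
```

Reporting a range rather than a single number keeps the estimate honest about how much depends on the implementation chosen for each piece.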

Indeed a hardware designer or consultant would be of great help here.

find an EE by Anonymous Coward · 2014-01-08 08:45 · Score: 0

I would suggest finding someone who has a better idea of what a logic gate is. They'll know what to do.

Re:find an EE by Anonymous Coward · 2014-01-08 18:42 · Score: 0

I would suggest finding someone who has a better idea of what a logic gate is. They'll know what to do.
Pretty much sums up my first reaction to the OPs question.

HLS by orledrat · 2014-01-08 08:45 · Score: 4, Informative

What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.

Bluespec by Anonymous Coward · 2014-01-08 08:45 · Score: 0

You could write in Bluespec then compile to SystemVerilog from there to something synthesisable...FPGA, ASIC and count transistors.

But many of those transistors will be providing necessary infrastructure in addition to the "raw" algorithm

Morons by Anonymous Coward · 2014-01-08 08:46 · Score: 0, Offtopic

Only idiots write "C/C++." As soon as I see that, I stop reading. It is a clear indicator that the writer doesn't know what he's talking about.

It doesn't work like that by cjonslashdot · 2014-01-08 08:46 · Score: 5, Informative

It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.

C code synthesis tools exist, but... by Anonymous Coward · 2014-01-08 08:48 · Score: 1

There are a number of tools on the (commercial) market that can compile (a subset of) C into a hardware description language (Verilog/VHDL), e.g. from Cadence and Synopsys. See http://en.wikipedia.org/wiki/High-level_synthesis for an overview of the approach and links to tools. There are also some open source tools that can turn C code into Verilog or VHDL, but they are not very mature in my opinion.

However, you will not get a single number for gate complexity out of these tools. Depending on the requirements and tradeoffs (smaller chip area vs. higher speed, single cycle vs. pipelined implementation, target device as FPGA or ASIC), the number of gates (or logic blocks for FPGAs) required will differ significantly. From my experience, you definitely need a hardware design expert to obtain useful (i.e., more-or-less optimized) results from these tools - and you should expect having to invest significant effort for restructuring your C code so the high-level synthesis tools can grok it.

Easy calculation by Anonymous Coward · 2014-01-08 08:48 · Score: 5, Funny

Here is a proven method for calculation.

If your code is:
a) C: divide the number of lines by 7
b) C++: divide the number of lines by 5
c) Ruby/Python/Java: divide the number of lines by 3
d) Perl: multiply the number of lines by 42
e) C#: resign.

Re:Easy calculation by harrkev · 2014-01-08 11:12 · Score: 0

Did you pull these numbers from your rectal database? Given these rules, it should be theoretically possible to put the Linux kernel in a chip without a general-purpose CPU.
From Wikipedia:
As of 2013, the Linux 3.10 release had 15,803,499 lines of code.
This means that you would need around 2.2 million lines of Verilog. If you assume around 20 gates per line of code, that comes to 44 million gates. Assuming around ten transistors per gate, that comes to 440 million transistors. That is smaller than the current "Sandy Bridge M-2" die. And since it completely bypasses the fetch-decode-execute pipeline of a general-purpose CPU, it should run blazingly fast! So, for fewer transistors, we can get probably 10x the performance of running the Linux core on dedicated silicon.
Of course this is a very stilted example to show how stupid such rules of thumb can be.
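
For anyone who wants to rerun the back-of-envelope arithmetic above, here it is as a short Python sketch. The line count and the 20-gates-per-line and 10-transistors-per-gate figures are the ones given in this post, and the divide-by-7 factor is the joke rule for C from the parent comment; none of them are real engineering constants. Without the rounding used above, the results come out slightly higher: roughly 2.26 million lines, 45 million gates, and 450 million transistors.

```python
LINUX_3_10_LINES = 15_803_499   # figure quoted from Wikipedia in the post
C_LINES_PER_HDL_LINE = 7        # the joke rule of thumb for C
GATES_PER_HDL_LINE = 20         # assumption stated in the post
TRANSISTORS_PER_GATE = 10       # assumption stated in the post


def rule_of_thumb(lines_of_c):
    """Apply the (deliberately silly) rules of thumb to a C line count."""
    hdl_lines = lines_of_c / C_LINES_PER_HDL_LINE
    gates = hdl_lines * GATES_PER_HDL_LINE
    transistors = gates * TRANSISTORS_PER_GATE
    return hdl_lines, gates, transistors


if __name__ == "__main__":
    hdl_lines, gates, transistors = rule_of_thumb(LINUX_3_10_LINES)
    print(f"{hdl_lines / 1e6:.2f} M lines of HDL")
    print(f"{gates / 1e6:.1f} M gates")
    print(f"{transistors / 1e6:.0f} M transistors")
```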

--
"-1 Troll" is the apparently the same as "-1 I disagree with you."
Re:Easy calculation by neiras · 2014-01-08 12:13 · Score: 1

Did you pull these numbers from your rectal database? Given these rules, it should be theoretically possible to put the Linux kernel in a chip without a general-purpose CPU...And since it completely bypasses the fetch-decode-execute pipeline of a general-purpose CPU, it should run blazingly fast! So, for fewer transistors, we can get probably 10x the performance of running the Linux core on dedicated silicon.
WHOA THANKS FOR THE PITCH! VENTURE CAPITAL HERE I COME BABY
Re:Easy calculation by Anonymous Coward · 2014-01-08 13:07 · Score: 1

Wow. Just... wow. Hey harrkev, I think that was a joke. Maybe you've had enough - you're embarrassing yourself. Get some sleep.

It only takes one Gates by Anonymous Coward · 2014-01-08 08:49 · Score: 0

...but he's pissed that the lawmakers are focused on re-election and partisan bullshit, so it'll never get done.

More details please by ttg512 · 2014-01-08 08:50 · Score: 1

The answer, as you might imagine, is complicated and depends on how these gates are implemented. For instance, you could design a chip to do this, you could write RTL to do this in an FPGA, or you could even write the algorithm into more software on an embedded processor of some kind. Is this electronics manufacturer one that makes chips or one that makes systems (boards, cases, etc)? If it is the former they should have people who can work with your people to figure this out. If it is the latter then why do they care? Are they really asking you to provide a chip which implements your algorithm? Ask some more questions...

Troll bait? by khb · 2014-01-08 08:51 · Score: 1

The question seems so ill-posed that one has to wonder if there's a product or service advert lurking... but assuming this is real.

Software doesn't automatically translate directly to hardware. As others have noted, break out the algorithmic core from the setup and finish. Presumably there is some part of the code which is the most critical in steady state. Describe that to their hardware engineers in whatever depth is required. Depending on the algorithm, the ASIC library elements available (or FPGA units, etc.) you may want to make some substantial adjustments to the "code" to make it fit within the design parameters of the available device. This should be an iterative process, not a single estimate based on a pure software perspective.

If there isn't a clearly identifiable set of "hot blocks" the chances of there being a good hw implementation fit is poor. If there is, it may still be necessary to change the algorithm details to fit but it should be "doable". Whether it is worthwhile depends on the volumes and the performance gains.

Use SystemC to Gate flow by Anonymous Coward · 2014-01-08 08:53 · Score: 0

It's typically not that hard to convert C/C++ to SystemC and then use one of the many SystemC-to-gates flows. Various tools have come and gone in the past decade, as the C/C++-to-gates flow isn't exactly optimized and functional equivalence checking is difficult to do.

Sounds like a joke by Cryacin · 2014-01-08 08:54 · Score: 4, Funny

How many Gates will it take to implement your software project?

One. His name is Bill, and here is yours.

--
Science advances one funeral at a time- Max Planck

Re:Sounds like a joke by Anonymous Coward · 2014-01-08 10:45 · Score: 0

it takes one gate on a plane to do the FAT lady
Re:Sounds like a joke by klubar · 2014-01-08 12:52 · Score: 1

Actually you have your choice (these and many more). Probably with all of these gates you could solve almost any problem:
Bill Gates (Chairman of Microsoft)
Melinda Gates (American philanthropist)
Robert Gates (Former Defense Secretary)
Antonio Gates (San Diego Chargers Tight End)
Brent Gates (American professional baseball player)
Clyde Gates (New York Jets Wide Receiver)
Lionel Gates (American professional football player)
Artemus Gates (American financier and Undersecretary of the Navy)
Re:Sounds like a joke by Anonymous Coward · 2014-01-08 14:09 · Score: 0

I wouldn't do it. Bill will use way too many Gates. Also, will hide secret code to track what kind of socks you wear and the coffee you drink and pass that information to the NSA PRISM system.
Re:Sounds like a joke by TheGratefulNet · 2014-01-08 15:34 · Score: 1

wow, that reminds me of a very old sigfile:
"my computer has AND-gates, OR-gates and NOT-gates, but no bill gates"
and yes, it was from unix guys, probably 10 or so years ago.

--
"It is now safe to switch off your computer."
Re:Sounds like a joke by Alioth · 2014-01-08 20:47 · Score: 2

It's such a shame that Gates McFadden from ST:TNG didn't marry Bill Gates. Then she could have been Gates Gates.

--
Oolite: Elite-like game. For Mac, Linux and Windows

Oh crap by fiannaFailMan · 2014-01-08 08:55 · Score: 1

Mod this offtopic if you want but now I can't see my comments, I can't see if anyone has responded to them, and it has become almost impossible to participate in discussions as a result. WTF, /.?

--
Drill baby drill - on Mars

Difficult, but... by ciw1973 · 2014-01-08 08:58 · Score: 1

...this should get you started:

http://en.wikipedia.org/wiki/C_to_HDL

Find a suitable converter, then grab a free (or evaluation) version of an FPGA design tool, for example one of these (I only suggest these over the many other, probably equally as good alternatives, as I've used them myself):

http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm

And with a bit of work you should be able to produce output that will essentially be your code implemented in programmable logic, and the tools will tell you the number of gates/cells required.

What I would say, is that you'll have a much easier ride if your algorithm is in C rather than C++.

Despite saying that you have no experience with this sort of thing, defining logic in something like VHDL is basically programming. Sure, you'll need to develop a fair understanding of the hardware, but with the libraries of pre-built components available from the numerous companies who produce programmable hardware like FPGAs and CPLDs, you may find you could do a lot more than you think yourself.

Re:Difficult, but... by Asmodae · 2014-01-08 09:37 · Score: 1

VHDL is basically programming

Sure, the same way software is basically just English and letters and numbers and if you understand those you can do most any software yourself! /sarc.
VHDL is code, but having cleaned up after software people who think they can write VHDL, I can tell you it's not the same thing at all. The key statement is

Sure, you'll need to develop a fair understanding of the hardware
This is by no means a light or trivial task. There are even entire university degrees dedicated to it. ;) But if you have all THAT, then sure, writing hardware code is a snap! In all seriousness, the above statement basically says it's easy if you have the skill set already.

Just don't make the mistake of thinking that if you understand BOTH hardware and software that they are equivalent, or that everyone else shares in your expanded understanding. I've seen programs fail because people try to treat hardware like software simply because they're both captured with some text. It's a dangerous viewpoint if you want your project to succeed.

Cadence C to Silicon by solidraven · 2014-01-08 08:59 · Score: 1

Haven't tried it, but Cadence's C to Silicon might be up for the job. Also keep in mind that in hardware you have very different requirements than in software, and parallelisation has interesting effects on the number of gates. The best option is to get an EE, preferably with experience in digital design, to take a look at it. Other options are SystemC compilers, but they're not really up to production use yet as far as I know. And it is also very technology dependent; sometimes complicated logical functions that are common are implemented directly. This isn't something you can just wing!

Re:Cadence C to Silicon by Anonymous Coward · 2014-01-08 10:41 · Score: 0

For pure C, there is nothing like Mentor Catapult C. Cadence tool requires SystemC, with a lot of hardware intent made explicit. Mentor tool accepts pure algorithmic C/C++, allows you much more freedom for experimentation with data types, memory layouts, pipeline synthesis, etc., etc. For "software" people, Mentor tool would be a much easier entry point, compared to Cadence, or ForteDS cynthesizer tool.

They are the hardware people, they should know. by Anonymous Coward · 2014-01-08 08:59 · Score: 0

Just give the number of gates in whatever processor you're running this thing on as an upper bound.

No good answer by kosh271 · 2014-01-08 09:00 · Score: 1

There really isn't a great way to answer your question without a detailed analysis of your code.

There are more factors to the number of gates required for a given task than just the complexity of code. Clock speed can be a major factor in determining the number of gates required for a given algorithm. Another major factor is the part you are targeting. The number of design elements in FPGAs used can change just by targeting a different device family.

Even if your algorithm was small enough to fit into a part, there are other issues that could arise (such as not enough bandwidth or pins for your memory device(s)).

It sounds like the electronics manufacturer doesn't have the resources to determine the number of gates for you. It looks like your only avenue is to ask a third party to review your code (under NDA) to help you determine the approximate gate requirements. This won't be cheap.

Have you tried by DogGuts · 2014-01-08 09:00 · Score: 1

google: https://www.google.com/search?q=convert+c+code+to+fpga

software as hardware?! but but but software patent by raymorris · 2014-01-08 09:01 · Score: 1, Insightful

Clearly it's not possible to render a software program as hardware. If everyone who explained the process (use Verilog) above is correct, that would mean that the exact same algorithm exists as both hardware and software.

We can't have the same algorithm exist as both hardware and software, because that would mean algorithms are hardware just as much as they are software.
That would mean all the people whining about "software patents" may as well be whining about unicorns. I hereby declare Verilog, ASICs, and FPGAs to be non-existent so we can continue to pretend that there is such a thing as a "software patent".

Rough Approximation by excelblue · 2014-01-08 09:02 · Score: 1

This is a horrible question to ask. Software is a tool to lower hardware requirements.

Compile your algorithm to the simplest RISC architecture reasonable. For most, something along the lines of ARM or MIPS works. Then, take note of all variables and add up how much RAM they'll take. Consider every bit (yes, bit, not byte) as a D flip-flop and convert every instruction (post-compile, in assembly) into a respective set of logic gates. A bit of googling should get you those values.
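
The bookkeeping described above could be sketched roughly as follows. This is only an illustration: the per-instruction gate costs, the flip-flop cost, and the function name `estimate_gates` are all made-up placeholders, not figures from any real cell library. You would substitute whatever numbers your googling turns up.

```python
# Hypothetical gate costs per instruction type. These are placeholder
# values for illustration only, not measurements of any real design.
GATES_PER_INSTRUCTION = {
    "add": 200,
    "and": 32,
    "or": 32,
    "xor": 32,
    "load": 100,
    "store": 100,
    "branch": 150,
}

# Assumed cost of one D flip-flop, in gates (also a placeholder).
GATES_PER_FLIPFLOP = 6


def estimate_gates(instruction_counts, variable_bits, default_cost=100):
    """Sum assumed logic gates per instruction plus flip-flops per stored bit."""
    logic = sum(
        GATES_PER_INSTRUCTION.get(op, default_cost) * count
        for op, count in instruction_counts.items()
    )
    storage = variable_bits * GATES_PER_FLIPFLOP
    return logic + storage


# Example: 2 adds, 1 xor, and 64 bits of variables.
print(estimate_gates({"add": 2, "xor": 1}, variable_bits=64))  # 816
```

Note that this treats every instruction as its own block of hardware with no sharing, which is exactly why the number comes out so large.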

If your algorithm is reasonably complicated, chances are, you'll get a number that seems absurdly high compared to what state-of-art hardware is available.

In practice, it's probably best to just pick an off-the-shelf CPU and run the software on it. There might be some parts that are better done in hardware than in software, but you should get someone who knows what they're doing for that.

It's an optimization problem by swm · 2014-01-08 09:03 · Score: 5, Insightful

You already have your algorithm running in electronic hardware, right?
Your current gate count is the sum of
* the gate count of your CPU
* the gate count of your RAM
* the gate count of your program ROM

So that's an upper bound on the gate count.
If that number is too big for your manufacturing partner,
then you have an optimization problem.

Optimization is a hard problem...

Re:It's an optimization problem by Austerity+Empowers · 2014-01-08 16:17 · Score: 1

RAM and ROM, not being comprised chiefly of logic "gates" would probably not be all that helpful.
Re:It's an optimization problem by Anonymous Coward · 2014-01-09 00:59 · Score: 0

You're wrong!
Since your description assumes sequential processing (using a microprocessor), you can be quite efficient in using registers, ALU's and such. Most of the gates you count in your example are memory anyway.
If you need to go to hardware for speed reasons, you're going to be doing stuff in parallel, so your hardware requirements increase there. There's a good chance the result will be less than that of the general purpose microprocessor + memory, but there's no guarantee.
Re:It's an optimization problem by Anonymous Coward · 2014-01-09 03:22 · Score: 0

This is actually the correct answer.
The optimization steps are:
Looking at what the code is doing. (both the logic and time available to do it)
Choosing a strategy that can be coded in logic (both will fit and can be maintained)
Synthesizing it to get a gate count to see what it took.
Perhaps debugging to make sure you have what you think you have.
Repeating the process if the count is too high.
In theory, anybody with a computer engineering degree should be able to do this.
In practice, you need someone who is good at at least reading S/W, designing gates and flops, and synthesizing to a particular technology.
Depending on the project size and target tech, may not be one person, but rather a group.
Next time, you really should do your own homework.

Accurate answer by Sarten-X · 2014-01-08 09:04 · Score: 3, Informative

Write out the truth table for each output as a Karnaugh map incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.

Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".
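
For a tiny single-output function, the truth-table step can be sketched like this. It counts gates for the unminimized sum-of-products form (one AND per true row, at most one inverter per input, one OR to combine), so it is an upper bound on what a Karnaugh map would give you, not the minimized answer. The function name `naive_sop_gate_count` is just something made up for this example.

```python
from itertools import product


def naive_sop_gate_count(func, n_inputs):
    """Gate count for an unminimized sum-of-products form of one output bit.

    This is an upper bound: Karnaugh-map minimization would merge rows
    and usually needs fewer gates.
    """
    # Every row of the truth table where the output is 1 becomes one AND term.
    minterms = [bits for bits in product((0, 1), repeat=n_inputs) if func(*bits)]
    if not minterms:
        return 0  # constant 0 needs no gates at all
    inverters = n_inputs                        # at most one NOT per input, shared
    and_gates = len(minterms)                   # one n-input AND per true row
    or_gates = 1 if len(minterms) > 1 else 0    # one OR to combine the terms
    return inverters + and_gates + or_gates


# Two-input XOR: rows 01 and 10 are true.
print(naive_sop_gate_count(lambda a, b: a ^ b, 2))  # 5

# Three-input majority: rows 011, 101, 110, 111 are true.
print(naive_sop_gate_count(lambda a, b, c: int(a + b + c >= 2), 3))  # 8
```

The table has 2^n rows for n input bits, which is why the method only works when the number of input bits is small.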

--
You do not have a moral or legal right to do absolutely anything you want.

Re:Accurate answer by mrego · 2014-01-08 09:30 · Score: 2

Since they are translating a program/algorithm into circuitry, they need only to know the maximum number of gates that are used at any one cycle time (taking into account necessary time delays), so just adding all the gates per operation way overstates the answer, since AND, NOT, OR, etc. gate circuitry can be reused for different operations at another cycle time. Also, as for logic operations, run it through a Quine-McCluskey optimization as well to minimize them.
Re:Accurate answer by Anonymous Coward · 2014-01-08 12:21 · Score: 0

This also requires that your code has no internal state at all. Does your code use variables of any kind?

Gate count more a matter of speed by Yoik · 2014-01-08 09:09 · Score: 2

It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.

The problem as stated isn't adequately constrained.

Re:Gate count more a matter of speed by Anonymous Coward · 2014-01-08 19:50 · Score: 0

It doesn't take many gates for a Turing machine that will run your algorithm but it's likely to be slow. A proper hardware implementation will optimize everything and be as parallel as possible.
The problem as stated isn't adequately constrained.
+1. They're asking for something that's basically impossible to answer with the information you've given. More explicitly: Are they expecting to run everything in parallel? Or could some of the job be delegated to an embedded microprocessor? What kind of speed requirements do they have? You could almost certainly rewrite your algorithm to run on a 4-bit microprocessor; it may need to be somewhat more capable than, say, the Intel 4004 (which has severe limitations on memory addressing, for example), but as that only contains around 500 gates, it's pretty easy to figure that, say, 1000 gates would likely be enough. It'd be pretty damned slow, though.

"Graphics" algorithm... right by ArcadeMan · 2014-01-08 09:09 · Score: 1

You guys are probably trying to get a manufacturer to make Scrypt-mining ASICs.

--
Get free satoshi (Bitcoin) and Dogecoins

Re:"Graphics" algorithm... right by neiras · 2014-01-08 09:32 · Score: 1

Yep, first thing I thought too.
Re:"Graphics" algorithm... right by theskipper · 2014-01-08 10:03 · Score: 1

Bingo. And the hardware guys recognized it immediately too. Mainly because they're probably getting emailed the same question 10 times a day.
There's a reason scrypt asic has been a long time in the making, it's memory intensive. Only alpha-t.net seems to be making headway to a viable product. But even with them taking preorders now, nothing is written in stone. Commercially and technically, SHA asic was an easier cat to skin.

Not even close. by Anonymous Coward · 2014-01-08 09:10 · Score: 0

You guys really need the help of a Comp E or EE. Not to be super critical but from reading your post/question it's pretty easy to see that you don't understand hardware design at all. Nothing wrong with that but you're clearly purely software. Any tools to convert code to HDL are not going to help you at this point because you're not going to understand the hows and whys of how things work on a gate level. There's no magic program that can take a program of some complexity and turn it into logic gates.

Maybe you could get by with an embedded processor and some peripheral ip blocks in your asic... maybe the whole thing is one giant 32 stage image processing pipeline. It really depends on your algorithm.

FYI reversing a bit (1 bit) takes 2 gates or 16 gates per byte. 'Add' takes several hundred if you do anything more (look ahead) than the most basic versions. 'Divide' takes thousands of gates. Any memory you use as buffers will be on chip if you want to get the performance of an ASIC, and that costs gates (4 gates per bit), and there's no free/malloc (no memory reuse basically). Any external memory access will require a DRAM controller, so that adds gates.
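
To get a feel for how fast on-chip buffers eat gates, here is a quick back-of-the-envelope sketch using the per-bit figures quoted above. The frame size in the example is a made-up assumption, and the function names are invented for illustration.

```python
# Per-bit figures as quoted in the comment above.
GATES_PER_INVERTED_BIT = 2
GATES_PER_MEMORY_BIT = 4


def invert_gates(n_bits):
    """Gates needed to invert n_bits in parallel."""
    return n_bits * GATES_PER_INVERTED_BIT


def buffer_gates(width, height, bits_per_pixel):
    """Gates needed to hold one width x height buffer on chip."""
    return width * height * bits_per_pixel * GATES_PER_MEMORY_BIT


print(invert_gates(8))             # 16, i.e. 16 gates per byte
# Hypothetical 640x480 buffer at 8 bits per pixel.
print(buffer_gates(640, 480, 8))   # 9830400
```

So a single modest frame buffer already costs on the order of ten million gates, before any of the actual algorithm is implemented.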

Howto by Anonymous Coward · 2014-01-08 09:11 · Score: 0

1. Solicit buyers
2. Land one
3. Post on Slashdot
4. ???
5. Profit!

How fast do you need it? by Anonymous Coward · 2014-01-08 09:12 · Score: 0

Speed of result and gate count are directly related to each other, except for a single bit input and output.

Multiply by magic constant, get estimate! by Anonymous Coward · 2014-01-08 09:12 · Score: 0

Create some reasonable metric, for example 5 gates per instruction. Compile your program and somehow get the number of instructions in assembler, then multiply by 5 or more. In good case you could expect 1-2 gate(s) per instruction, in bad case you could expect 2-3 gates per instruction, and those two more gates are just "to make sure" you won't underestimate it.

The correct response is... by Anonymous Coward · 2014-01-08 09:15 · Score: 0

If you don't have anyone on your team that has done ASICs or FPGAs, the only correct response is "We don't have the expertise here to estimate that."

People have mentioned C to VHDL translators. While they exist, they aren't magic and will not give you a useful answer. They do not "know" what parts of the design are important to do in logic and they do not "know" what tradeoffs you want in order to hit the right balance between speed, gate count, and so forth.

The key to success. by ttucker · 2014-01-08 09:16 · Score: 5, Insightful

Do not ask a computer scientist to be an electrical engineer.

Re:The key to success. by multimediavt · 2014-01-08 10:08 · Score: 1

Do not ask a computer scientist to be an electrical engineer.
And for the sake of Pete's dragon don't hand him/her an electric screwdriver! Chaos will ensue.
Re:The key to success. by Anonymous Coward · 2014-01-08 11:00 · Score: 0

Unless the computer scientist does some actual work designing electronics. Well I guess he would be an electrical engineer then as well.
Never mind...
Re:The key to success. by crankyspice · 2014-01-08 11:24 · Score: 2

Do not ask a computer scientist to be an electrical engineer.
Except ... Wow. An early course in my computer science curriculum was:

201. Computer Logic Design I (3)
Prerequisite: MATH 113 or equivalent all with a grade of "C" or better.
Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA).
(Lecture 2 hours, lab 3 hours) Letter grade only (A-F).
(We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...

--
geek. lawyer.
Re:The key to success. by Anonymous Coward · 2014-01-08 11:51 · Score: 1

Yeah, that's for breadth of experience. It does not make you an electrical engineer any more than your college chemistry course made you someone Dow should be asking for advice.
Re:The key to success. by geoskd · 2014-01-08 12:18 · Score: 3, Insightful

Except ... Wow. An early course in my computer science curriculum was: 201. Computer Logic Design I (3) Prerequisite: MATH 113 or equivalent all with a grade of "C" or better. Basic topics in combinational and sequential switching circuits with applications to the design of digital devices. Introduction to Electronic Design Automation (EDA) tools. Laboratory projects with Field Programmable Gate Arrays (FPGA). (Lecture 2 hours, lab 3 hours) Letter grade only (A-F). (We used Verilog and a Xilinx FPGA board.) I'm surprised a reputable CS degree wouldn't require at least a basic course in digital logic; Cal State Long Beach is a great school, but it's certainly not a standards bearer...
There is a world of difference between an entry level college course on ASIC/FPGA design, and actually being able to do the job. Just because you can design and synthesize a project with a few hundred gates in it does not mean you are even remotely prepared to know where to begin a project with 10^6+ gates in it. More importantly, high level software languages allow for indiscriminate serial loops which are massively difficult to deal with in pure hardware. In short, the design methodology is completely different if you are trying to build for a software path, or a hardware path. You need someone with a hardware mindset to take your algorithm back to scratch and start over. Even knowing the HDLs is not good enough, as it is relatively trivial to write "valid" VHDL or Verilog code that can't be synthesized...

--
I wish I had a good sig, but all the good ones are copyrighted
Re:The key to success. by Anonymous Coward · 2014-01-08 13:38 · Score: 0

You have to be a software engineer to be able not to make that request.
Re:The key to success. by Anonymous Coward · 2014-01-08 14:12 · Score: 0

There's a rather vast difference between an introductory course and being able to use it in the real world. By your logic, somebody who has completed their HS health course is qualified to perform surgery...
Re:The key to success. by Anonymous Coward · 2014-01-08 18:46 · Score: 0

Yesterday I "learned" to program: printf("hello world.\n");
Today I'm 'qualified' to program a MMORPG from scratch.
And if you believe that, you need way more help than I can give you.
Re:The key to success. by Kagetsuki · 2014-01-09 02:06 · Score: 1

Yeah but you took ASM too and I seriously doubt you would call yourself a capable ASM developer unless you happen to be doing a lot of embedded code. Just because you've done some labs doesn't make you a pro. I've done FPGA dev using Verilog as well, and I've done enough to understand what it is and how to do it. I've also done enough to know if I wanted to make an efficient ASIC for a production application I'd shell out some cash to hire a pro rather than just assuming I could do it well myself without any professional experience or analysis.

Synthesis tools and estimation by slew · 2014-01-08 09:16 · Score: 1

Theoretical answer:
Recode your algorithm in SystemC (a C++ library that can be used to implement a register transfer language representation of your algorithm) and synthesize it with one of the available tools (e.g., Accelera, Synopsys, Calypto, etc.) targeting a typical library (e.g., 28nm TSMC), at a particular clock frequency.

Practical answer:
Ask someone with hw design experience to estimate it for you...

FWIW, nobody wants an "exact" size in logic gates; all they want is an idea of the complexity. The big ticket items people care about are the size in bits of RAMs (and how many simultaneous read/write ports they might need), complicated math that is likely to take more than 1 clock cycle to complete (e.g., a floating point math operation), and the data-width of the main data path at the throughput that you want to have. Simply multiplying the data path width by the estimated number of pipeline cycles generally gives a figure proportional to the eventual area minus the RAMs and special math ops (which is why you need to identify those parts separately).
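
One way to write those big ticket items down before talking to the HW people might look like the sketch below. The class name, field names, and example numbers are all hypothetical, and the "area proxy" is just the width-times-stages product described above, a unitless figure that excludes RAMs and special math ops.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexitySummary:
    """Rough, hypothetical checklist of the items a HW estimate needs."""
    datapath_bits: int        # width of the main data path
    pipeline_stages: int      # estimated number of pipeline cycles
    ram_bits: int = 0         # total on-chip RAM, listed separately
    ram_ports: int = 1        # simultaneous read/write ports needed
    multicycle_ops: list = field(default_factory=list)  # e.g. floating point ops

    def logic_area_proxy(self) -> int:
        # Unitless figure that should scale with logic area,
        # not counting RAMs or the special math operations.
        return self.datapath_bits * self.pipeline_stages


# Made-up example values.
est = ComplexitySummary(
    datapath_bits=64,
    pipeline_stages=12,
    ram_bits=2 * 1024 * 1024,
    ram_ports=2,
    multicycle_ops=["fp_mul", "fp_div"],
)
print(est.logic_area_proxy())  # 768
```

Handing over something like this, plus the RAM sizes and the list of multi-cycle math ops, is probably more useful to a hardware team than any single gate number.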

Generally, I've found that naïve "software" algorithms have not been very amenable to HW implementation without some amount of rework, and the fact that you do not have an answer to the posed question would likely lead me (and probably your potential customer) to conclude that your algorithm is half-baked from a HW implementation point of view... Just food for thought...

Write it in C by Anonymous Coward · 2014-01-08 09:16 · Score: 0

Write it in C. Not C++, not Perl or PHP or, gods forbid, Java. Actual C.

Then run it through some debugging programs to get a ballpark idea of how many instructions are actually being implemented, and how many library functions are being called if you can't isolate the actual process to a small enough operation.

von neuman by Anonymous Coward · 2014-01-08 09:17 · Score: 0

As far as I know, if you want to run an algorithm, you need a Von Neumann computer, with ROM, RAM and I/O, because your algorithm needs to read and write data values and read the execution code itself. This arrangement is much more efficient than single logic gates. What will you do when you find a bug in your code? Bake another bunch of chips?

Re:von neuman by Austerity+Empowers · 2014-01-08 16:20 · Score: 1

Yet software algorithms that run on these architectures are converted to straight HW implementations all the time. It's just not "turn key", it takes quite a bit of work but it often pays off.

Convert to known assemby language? by Giblet535 · 2014-01-08 09:18 · Score: 1

Short answer: you need to contract an electronics engineer. Possible: You could dump the non-optimized assembly language (-S on most compilers) for a popular processor family, e.g., 80686, PA-RISC, etc. The manufacturer probably has resources to convert "this pile of 80686 instructions" to "an ASIC that does the same thing really well".

Re:Convert to known assemby language? by Overzeetop · 2014-01-08 11:09 · Score: 1

Funny, that was my thought. I don't know squat about it, but it seems like a starting point if you had to make an educated guess.

--
Is it just my observation, or are there way too many stupid people in the world?

Here's my circuit for a simple problem:Good Luck! by deathcloset · 2014-01-08 09:18 · Score: 1

For what it is worth, here is a circuit I developed to see what the gate configuration (nor only) would look like for the implementation of a condition that the input switches be:

0
1
01
11
http://www.neuroproductions.be/logic-lab/index.php?id=3699

you know, the counting integers 0,1,2,3 - the same code that I have on my luggage. I thought there might be a fun implementation related to security or something - a hybrid mechanical/electronic locking system.

It turned out to be super hard for me to figure out this final result. Nonetheless, the result was most interesting and I encourage you to find a more efficient configuration.

I did this using basic logic and a crapton of that time-honored tradition of guessing and trial and error.

I can only begin to imagine the complexity of trying to implement and design circuits based on algorithms written in anything above assembler level.
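
For anyone who wants to play with NOR-only logic without a circuit simulator, here is a minimal sketch. It is not the circuit from the link above, just an illustration that NOT, OR and AND can each be built from NOR alone, plus a two-gate detector for the switch pattern "10". All function names are made up for this example.

```python
def nor(a, b):
    """The only primitive gate: output is 1 only when both inputs are 0."""
    return int(not (a or b))


def not_(a):
    return nor(a, a)                  # 1 NOR gate


def or_(a, b):
    return not_(nor(a, b))            # 2 NOR gates


def and_(a, b):
    return nor(not_(a), not_(b))      # 3 NOR gates


def is_two(b1, b0):
    """1 only when the two switches read binary 10 (decimal 2); 2 NOR gates."""
    return nor(not_(b1), b0)


for b1 in (0, 1):
    for b0 in (0, 1):
        print(b1, b0, is_two(b1, b0))
```

Checking one fixed pattern is cheap. Remembering that 0, 1, 10, 11 were entered in order needs state (flip-flops), which is where the gate count and the head-scratching start to grow.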

Easy... by verbatim_verbose · 2014-01-08 09:18 · Score: 1

"It takes one gate that accepts our input and outputs a desirable answer. We would like you to design that gate."

Re:Here's my circuit for a simple problem:Good Luc by deathcloset · 2014-01-08 09:20 · Score: 1

*oopsy* that should have been
0
1
10
11

but you knew that ;)

Give it up... by Anonymous Coward · 2014-01-08 09:22 · Score: 0

This Bitcoin mining thing will never go anywhere :-)

One MILLION gates! by GodfatherofSoul · 2014-01-08 09:22 · Score: 1

Then, stick your pinky into the corner of your mouth and do your best evil laugh!

--
I swear to God...I swear to God! That is NOT how you treat your human!

I've done this before by Asmodae · 2014-01-08 09:24 · Score: 4, Informative

Several people have suggested using a high-level synthesis tool to convert your software (C/C++) directly to HDL (Verilog/VHDL) of some kind. This can work, and I've been on this task and seen its output before. The catch is: unless that software was expressly written to describe hardware (by someone who understands that hardware, its limitations, and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.

Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.

My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.

Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.

Re:I've done this before by Anonymous Coward · 2014-01-08 16:26 · Score: 0

I want to work where you work. I used to convert C++ DSP algorithms (written with hardware in mind) to Verilog RTL for ASICs.

Liar! by nobuddy · 2014-01-08 09:27 · Score: 2

I just tried this and all my money was transferred to a different account.

Re:Liar! by Anonymous Coward · 2014-01-08 09:40 · Score: 0

Thank you for your (not so) generous donation.
Re:Liar! by Anonymous Coward · 2014-01-08 10:08 · Score: 0

I just tried this and all my money was transferred to a different account.

Thanks, I wondered where that came from.
Re:Liar! by Anonymous Coward · 2014-01-08 11:07 · Score: 0

Thank you for that, btw... I was really starting to run low.
Re:Liar! by davester666 · 2014-01-08 17:40 · Score: 1

And I thank you very much for your contribution to keeping me in the lifestyle I deserve.

--
Sleep your way to a whiter smile...date a dentist!

Because by nobuddy · 2014-01-08 09:28 · Score: 1

http://davidmoorephoto.com/storage/cache/images/000/975/1I2B9665,large.2x.jpg?1372317810

profiling by Anonymous Coward · 2014-01-08 09:29 · Score: 0

Everything traditional for graphics has been done for ages in software, or in traditional hardware. Algorithms are known, and have been implemented. No-one would need to ask any questions. Now there's a new kind of algorithm, and questions are being asked. Means, something unusual, used for some unusual purpose, in a context where all-purpose processing capability is too limited / too expensive / too energy-consuming. Phones have been carrying around sufficient power for quite some time, anything wired is a non-issue. Something mobile, energy inaccessible, high expected runtime. Number of gates seems to matter, a small device. Ball-pen. A dedicated silicon for this purpose, price does not seem to matter that much. Wrist-watch. The algorithm itself is unusual, else there would be no need to ask questions. Nothing really new in the 2d or 3d area. Either a different kind of 3d than what usual hardware does, voxels or similar, or rendering something holographic. Wrist watch with holographic hands?

Neat. I want one.

Please make the FPGA as big as it can fit in to keep the frequency and thus energy consumption down. Don't want to recharge my watch as often as my cellphone.

ask a stupid question: by Anonymous Coward · 2014-01-08 09:29 · Score: 0

Take all permutations of inputs, generate all possible outputs, put them in a table. How much space do you need to store the table?

Well, that would give you one boundary condition, granted, a pretty extreme measure.
On the plus side, if you implement this way, it takes constant time to generate an output.
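
A quick sketch of what that table costs, assuming the function maps a fixed number of input bits to a fixed number of output bits. The helper names are made up, and the squaring function is just a stand-in for whatever the real algorithm computes.

```python
def lookup_table_bits(input_bits, output_bits):
    """Storage needed for a full lookup table: one entry per input pattern."""
    return (2 ** input_bits) * output_bits


def build_table(func, input_bits):
    """Precompute func for every possible input value."""
    return [func(x) for x in range(2 ** input_bits)]


# 8 bits in, 8 bits out: small enough to be practical.
print(lookup_table_bits(8, 8))      # 2048 bits

# 32 bits in, 32 bits out: 2**37 bits, i.e. 16 GiB of table.
print(lookup_table_bits(32, 32))    # 137438953472 bits

# Constant-time "execution" is then just an index into the table.
table = build_table(lambda x: (x * x) & 0xFF, 8)
print(table[12])                    # 144
```

The size doubles with every extra input bit, so this only works as an implementation for very narrow inputs; as a bound it mostly tells you how extreme the worst case is.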

First step by Anonymous Coward · 2014-01-08 09:30 · Score: 0

The first step is to compile it into an architecture-neutral assembly program. Then, appropriate tools can turn it into a machine-specific set of instructions, and the answer will become apparent, or at least a reasonable approximation!

Why not have them figure it out? by Punto · 2014-01-08 09:31 · Score: 1

While an interesting question (I didn't even know hardware manufacturers were in the habit of converting software into hardware), why don't they figure it out themselves? They must have the tools/people to do it. Are you afraid they'll "steal your algorithm" if you give them the source? (that's much less interesting)

--
Stay tuned for some shock and awe coming right up after this messages!

What is the architecture that will run it? by Anonymous Coward · 2014-01-08 09:31 · Score: 0

Making electronics out of an algorithm depends mainly on the architecture that is going to run the algorithm. The only scenario in which estimating a gate count makes sense is if the electronics manufacturer wants to put it on an FPGA or make an ASIC out of it. If that is the case, a C-to-gates compiler may be the route to go.

There are also plenty of off-the-shelf embedded processors with HW acceleration, DSP and GPU that may very well tackle the problem efficiently in embedded software. And porting to embedded software is less expensive than converting it to gates. It depends strongly on your algorithm, but the bottom line is that you may not need an electronics manufacturer as a partner.

How many gates could a gate chuck chuck if a gate by Anonymous Coward · 2014-01-08 09:31 · Score: 1

How many gates could a gate chuck chuck if a gate chuck could chuck gates?

Find an engineer who is good as software/hardware by Anonymous Coward · 2014-01-08 09:31 · Score: 0

and go from there. Do NOT listen to /.. HLSs have come a long way but I'd never really trust them (especially for an ASIC design). Also, FPGA LUT/BRAM/etc usage isn't really a great estimate of how many ASIC gates will be used. If you want a decent answer you're going to have to pay for it and you're NOT going to get it from /.. Good luck though, I think you'll need it.

Re: Find an engineer who is good as software/hardw by Anonymous Coward · 2014-01-08 09:33 · Score: 0

At*

Re:software as hardware?! but but but software pat by Anonymous Coward · 2014-01-08 09:34 · Score: 0

Nice troll. Algorithms can be implemented in many different ways in either software or hardware, much in the same way ideas can, because algorithms are ideas. Patents (used to) cover only specific implementations of ideas, not the ideas themselves.

Matlab has a solution for this, but $$$ by AmazinglySmooth · 2014-01-08 09:34 · Score: 1

Look at Mathworks. They have a solution for this.

Re:Matlab has a solution for this, but $$$ by Asmodae · 2014-01-08 09:44 · Score: 2

They have a tool that can do this, I don't know if I'd call it a 'solution' just yet though. We've just finished ripping out all the 'solution' for our project because we wanted a device that was actually small enough (and thus cheap enough) to be able to sell.

It takes input designed to be hardware and makes good hardware. It takes input designed to be software and makes shit hardware. It also doesn't handle version control very well, you need proprietary tools to even VIEW the design files... and the output which actually describes the hardware (vhdl) is so obfuscated as to be nearly illegible. The build times are also 4-5 times longer than they need to be, so it takes a whole day to place and route the designs output by this tool. Unless you're building something trivial I wouldn't advise depending on mathworks/simulink tools for a solution.

one approach... machine code by Anonymous Coward · 2014-01-08 09:36 · Score: 0

Compile the project and take a look at the generated machine code. Take the unique set of those instructions and look at the circuits needed by your x86 architecture to execute those instructions. This could be a reasonable estimate.
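
A minimal sketch of that idea, assuming you already have a text disassembly with one instruction per line. The per-instruction gate costs are placeholders, not measured values: the add/sub/cmp and imul entries just apply the "10 gates per bit" and "10 x bits squared" rules of thumb quoted elsewhere in this thread to 32-bit operands, and mov is counted as free wiring.

```python
# Placeholder costs (assumptions, not data from any real x86 implementation).
GATE_COST = {"add": 320, "sub": 320, "cmp": 320, "imul": 10240, "mov": 0}


def estimate_from_listing(listing: str, cost=GATE_COST):
    """Sum the cost of each *unique* mnemonic; report the ones with no known cost."""
    mnemonics = {line.split()[0].lower() for line in listing.splitlines() if line.strip()}
    known = sum(cost[m] for m in mnemonics if m in cost)
    unknown = sorted(m for m in mnemonics if m not in cost)
    return known, unknown


listing = """mov eax, ebx
add eax, 4
imul eax, ecx
add eax, 1
jmp loop"""
gates, missing = estimate_from_listing(listing)
print(gates, missing)
```

Instructions without a cost are returned rather than guessed at, since control flow and memory access are exactly where this kind of estimate falls apart.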

Re:Bingo by Anonymous Coward · 2014-01-08 09:39 · Score: 0

Go to a hobby site about EE and learn about the tools they use. Slashdot used to be an SE site...

MAKE by Anonymous Coward · 2014-01-08 09:45 · Score: 0

Seems like you should head on over to the MAKE forums http://makezine.com/forums/

why not go to /. by Anonymous Coward · 2014-01-08 09:46 · Score: 0

Looking at the responses, I'd say slashdot is a pretty good place to get technical advice.

Re:why not go to /. by sgt+scrub · 2014-01-08 16:12 · Score: 2

It is either good for that or good for picking up girls.

--
Having to work for a living is the root of all evil.

Cadence C to Silicon redux by Anonymous Coward · 2014-01-08 09:47 · Score: 0

Cadence's C to Silicon is just part of Cadence's Digital Implementation suite (whatever name they gave to it). As with any tool new to you, it will take some time to get something out of it. It is called C to Silicon, but it also supports SystemC to Silicon. SystemC is an extension to C++, although only a subset is supported for synthesis. You are free to download SystemC from Accellera.

A quick estimation of software complexity is hard to do. One thing the OP can do is count up how much memory the algorithm needs, as memory is usually expensive and might be slow to access. If the algorithm is a typical data in / data out process, he/she should also get a list of operations that need to be done per input data unit. Let me give an example: video deinterlacer methods ( see http://www.100fps.com ). Discarding even or odd lines is the simplest solution. Blending needs some video lines in order to do the vertical rescaling. Weave needs you to store a video frame, or half image, in order to display both even and odd lines at the same time. And at the end you might consider RAM intensive algorithms like area, temporal or motion based algorithms, with added extra motion blur.

So, no, it is not easy to guess what algorithms will take. However, it should be straightforward to know RAM usage.
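
To put rough numbers on the deinterlacer example, here is a small sketch. The 720x576 frame size, 2 bytes per pixel, and the two-line window for blending are illustrative assumptions, not figures from the linked page:

```python
def buffer_bytes(width: int, lines: int, bytes_per_pixel: int = 2) -> int:
    """RAM needed to hold `lines` video lines of `width` pixels."""
    return width * lines * bytes_per_pixel


WIDTH, HEIGHT = 720, 576  # assumed PAL-sized frame

# Lines each method has to keep around before it can emit output (assumed).
lines_buffered = {
    "discard": 0,           # drop every other line, nothing to store
    "blend": 2,             # a couple of neighbouring lines
    "weave": HEIGHT // 2,   # one whole field (half image)
}

for method, lines in lines_buffered.items():
    print(f"{method:8s} {buffer_bytes(WIDTH, lines):8d} bytes")
```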

Re:Cadence C to Silicon redux by solidraven · 2014-01-11 03:28 · Score: 1

Well, it really depends on the algorithm I'd say, simple things are easy enough to estimate depending on if you wish to run it in parallel or not. But if they come to you to ask for it that's usually not the case I figure.

Use high level synthesis by Anonymous Coward · 2014-01-08 09:53 · Score: 0

I see some people mentioning high level synthesis (HLS). This is the right way to go. There are academic tools, tools given practically for free by FPGA vendors, and then tools that sell for $$$ (Disclaimer: my company sells one such commercial HLS tool). HLS has come a long way, after more than 25 years of R&D going into it. Graphics, image processing, signal processing, etc. are the sweet spots of HLS, where the quality of the generated RTL is comparable to or better than human written RTL. Looks like you'll fit into the sweet spot.

So for quick and dirty estimates, you can take an academic tool. For more serious work, I suggest you get an evaluation license from one of the HLS vendors, or FPGA companies, and try to get your synthesis flow down to the gates going. Note that there are two types of synthesis tools required in the tool chain to get gates. One is the HLS tool, and the other is the RTL synthesis tool. However, for estimation purposes, you can put some faith into the estimates that the HLS tool itself will provide.

Re: Difficult, but ... by Anonymous Coward · 2014-01-08 09:56 · Score: 0

Amen to Asmodae's teachings. I work with people that translate from Matlab, written by bright mathematicians, into Verilog, and let me tell you it is not as straightforward as many people think. Some ways of doing things do not fit into existing hardware.

True Story by Anonymous Coward · 2014-01-08 09:57 · Score: 0

x264 is the BEST video encoder in the world, and rests muchly on the shoulders of one young programmer. This guy was associated with a video rebroadcast company that wanted the best possible solution for scalable H264 encoder systems. Now, x264 is open source, and usually runs on various types of standard x86 PC equipment. The young x264 guy convinced himself that OBVIOUSLY x264 algorithms could be moved to dedicated hardware logic, and sought financial backing from the broadcast company to finance this.

Long story short- many years later, x264 is STILL a solution almost entirely used on general PC hardware. The attempt to move the decent quality modes of x264 encoding to dedicated hardware was a total failure. The dreadful quality of Intel's (very slow) H264 hardware encoder built into most current Intel CPU parts shows the problem. Hard wired logic is very good at very dumb and simple algorithms, but lousy when the algorithm becomes more sophisticated. This fact is why we have CPUs (and the modern computer architecture) in the first place.

What company would be so cretinous to seek a hard-wired solution, when an ARM SoC, capable of driving a 4K display, can be had for less than 5 dollars?

I would remind you that a few years back, when ill-informed home cinema enthusiasts were paying 1000 dollars plus just for a video box that scaled the video image, people with brains were getting VASTLY better scaling functions from their PC video cards that cost less than 200 dollars. The dedicated hardware solutions were putrid (and obviously completely non-reprogrammable).

However, most 'professional' (and I use the word very lightly) video personnel in the broadcast industry still use insanely over-priced hardware junk from dinosaur companies that date back to the birth of TV, and get business on their (long irrelevant) reputations. So, for instance, the quality of the vast amount of broadcast TV is GARBAGE, because real-time H264 (and MPEG2) encoder hardware boxes are used, that produce the worst possible compression possible for the unthinkably large file sizes each hour of TV represents.

Again, what 'video' algorithm cannot be implemented as a general shader operation on the GPU of a modern, super cheap ARM SoC part?

find a smarter partner by samantha · 2014-01-08 10:00 · Score: 1

Computers are so cheap and low power today that turning an algorithm into gates would be a silly way to proceed. So the question is not really relevant except academically.

Re:find a smarter partner by Austerity+Empowers · 2014-01-08 16:27 · Score: 1

Unless you wanted to sell a chip that had this feature built in to it. Thus people do this operation all the time. It just takes someone with RTL experience to do.

We actually solved this.. by rayhoare · 2014-01-08 10:02 · Score: 2

We (ConcurrentEDA.com) have developed a tool called Concurrent Analytics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)

Knowing The Algorithm Is NOT Enough by SplawnDarts · 2014-01-08 10:03 · Score: 5, Insightful

Knowing what algorithm you want to run in hardware is not even close to enough to estimate gates. You need to know the algorithm, and the required performance, and have a sketched out HW design that meets those goals. THEN you can estimate gate count.

For a simple example of why this is, consider processors. A 386 and a Sandy Bridge i7 implement very similar "algorithms" - it's just fetch->decode->execute->writeback all day long. If you implemented them in software emulation, it would be very similar software with some additional bits for the newer ISA features on the i7. But a 386 is about 280 THOUSAND gates, and the i7 is about 350 MILLION gates/core - three orders of magnitude different. Of course, there's at least a 2 order of magnitude performance difference too - it's not like those gates are going to waste.

Point is, knowing the algorithm isn't enough to get even a finger in the wind guess at gate count. If you need an answer to this question, you need to get competent HW design people looking at it.

Re:Knowing The Algorithm Is NOT Enough by Stumbles · 2014-01-08 10:40 · Score: 1

Their best bet is (if they are still alive) contact the old time hardware engineers predating chips. I worked on several systems in the Air Force designed and built in the early 60s and just before the first microchips were created.
One such machine was used to calculate the range of geosynchronous satellites using TTL (can transistors only) occupying an entire equipment rack. In a nutshell it was a hardwired computer designed to do one thing and one thing only.
I think if they could reduce their program to boolean equations it might be possible to get a ballpark idea of the needed gates.

--
My karma is not a Chameleon.
Re:Knowing The Algorithm Is NOT Enough by Overzeetop · 2014-01-08 11:16 · Score: 1

Why not? A 386 does (within a certain limit) exactly the same thing that an i7 does, it just does it faster because it can run more operations in parallel and at a higher clock speed (to take a simplistic view). The minimum number of gates required simply sets the baseline speed of the final product. To get more speed you add more parallel processors, up to the available parallelism of the problem to be solved.
Knowing the algorithm would seem to allow a reasonable lower bound to be placed on the number of gates, from which a baseline speed can be determined.
If the algorithm takes 175,000 gates and you know the processing speed you can determine the throughput. If you know the necessary/target throughput you know how many pipelines you need and, if you're an ASIC mfr you have a good idea of the % overhead required to parallelize and the in house estimating group can take it from there to evaluate feasibility.
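
As a back-of-the-envelope sketch of that arithmetic: the 175,000 gates is the figure from above, while the clock rate, cycles per result, target throughput and the 15% parallelization overhead are made-up assumptions for illustration.

```python
def pipelines_needed(target_per_sec: int, clock_hz: int, cycles_per_result: int) -> int:
    """Pipelines required to hit the target, rounding up (integer ceiling division)."""
    return -(-(target_per_sec * cycles_per_result) // clock_hz)


def total_gates(base_gates: int, pipelines: int, overhead_percent: int = 15) -> int:
    """Gates for all pipelines plus a flat percentage for the glue between them."""
    return -(-(base_gates * pipelines * (100 + overhead_percent)) // 100)


# Assumed target: one result per pixel of 1920x1080 video at 60 frames per second.
target = 1920 * 1080 * 60
n = pipelines_needed(target, clock_hz=100_000_000, cycles_per_result=4)
print(n, total_gates(175_000, n))
```

The overhead percentage is the part an in-house estimating group would actually know; everything else is multiplication.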

--
Is it just my observation, or are there way too many stupid people in the world?
Re:Knowing The Algorithm Is NOT Enough by bigtreeman · 2014-01-08 18:13 · Score: 1

if the 386 was implemented with the same geometry as the i7 the performance difference would be ???

--
Go well
Re:Knowing The Algorithm Is NOT Enough by SplawnDarts · 2014-01-09 01:59 · Score: 1

Substantial, certainly. The deep pipe for fetch/decode and the superscalar backend make a big difference. Maybe 10x and 2x or so respectively. They also interact with the memory system (system RAM and caches) very differently so it's hard to make a perfect comparison.

best answer by swschrad · 2014-01-08 10:08 · Score: 1

because if the hardware company is thinking "gates" instead of "cycles," they want to implement it in an FPGA. hell, if they were going to put it on a dedicated microprocessor, they'd just recast it with libraries for that processor and recompile.

--
if this is supposed to be a new economy, how come they still want my old fashioned money?

Count operations for a rough gate estimate by erice · 2014-01-08 10:12 · Score: 1

The manufacturer is probably asking how many gates you need to implement the algorithm exactly as it is coded, with exactly as much parallel or sequential logic as it already has, and that will have a fairly specific answer.

While that number could be determined, it would not be very useful. Hardware implementation, especially when targeting FPGA's, get most of their performance advantage by exploiting more parallelism than is achievable by running on a processor.

No, the manufacturer isn't making any assumptions about how the algorithm is translated. They deal in gates. Gates are the most direct measure of how much the hardware will cost to manufacture.

Without a direct number for gates you will have to come at it in a more indirect fashion. How much memory does the algorithm use? What data structures are used and how big are they? (*all* data structures. An integer is a data structure for this purpose) What operations (adds, subtracts, etc) are needed and how many are required to go from input to result? With those you can usually come up with a ball park guess of how many gates will be required. There are always optimizations and non-obvious operations that get overlooked but it is a good start.
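
One way to write that inventory down is sketched here. The 3x3 filter over 8-bit pixels on 1920-pixel lines is a hypothetical example, not the OP's algorithm, and all names are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Storage:
    name: str
    bits: int       # width of one element
    count: int = 1  # number of elements


def storage_bits(items) -> int:
    """Total bits of state the algorithm keeps, counting every variable."""
    return sum(item.bits * item.count for item in items)


# Hypothetical 3x3 filter: two line buffers, nine coefficients, one accumulator.
inventory = [
    Storage("line_buffer", bits=8, count=2 * 1920),
    Storage("coefficient", bits=8, count=9),
    Storage("accumulator", bits=20),
]

# Operations needed to go from input to one result (one output pixel).
ops_per_result = {"multiply": 9, "add": 8}

print(storage_bits(inventory), ops_per_result)
```

The two totals, bits of state and operations per result, are the inputs a hardware person needs to start turning this into a gate figure.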

If you are using floating point by DrFalkyn · 2014-01-08 10:13 · Score: 1

The answer is "too many"

Best answer by multimediavt · 2014-01-08 10:21 · Score: 1

"You cannot directly interpret a software algorithm to hardware." Why? Here are the follow ups: What type of hardware, FPGA, GPU, custom ASIC? What part of the algorithm NEEDS to be in hardware to gain performance over basic system resources (CPU, GPU)? Who is going to pay for this little experiment?

As others more qualified have already stated, you rarely if ever get a direct translation nor do you always need to interpret the entire algorithm to hardware. For a hardware manufacturer to even ask the question is suspect, unless it was a sales or marketing rep, then it might make sense. The hardware people will know best how to do this, for them to ask you ... RUN!

My suggestion would be to say thank you and stick with software. You will probably spend enough time working this out that someone else will implement it before you, better. If you're not talking to Nvidia, AMD or Intel you're probably wasting your time.

cmos logic by Anonymous Coward · 2014-01-08 10:22 · Score: 0

I seem to recall that during my hardware classes for Computer Science, there was a method for designing our own ALU and arithmetic operators by using CMOS logic. It's been a while, but the computational part of your algorithm can be designed by means of CMOS logic. It might not be the most efficient design (Heck, you can make everything with NAND's, but it might not be optimal) but it'll give you at least some indication of the required hardware for the execution of the algorithm.

Eventual memory use and storage of variables do fall out of the scope of this, however. And it tends to be quite some work.
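
A small sketch of the "everything with NANDs" approach: a full adder built from nine 2-input NAND gates (a standard textbook construction), chained into a ripple-carry adder, with a counter for the gates used. The class and function names are invented for this example.

```python
class NandNetlist:
    """Evaluates logic built only from 2-input NAND gates and counts them."""

    def __init__(self):
        self.gates = 0

    def nand(self, a: int, b: int) -> int:
        self.gates += 1
        return 1 - (a & b)


def full_adder(net, a, b, cin):
    """One-bit full adder from nine NAND gates; returns (sum, carry_out)."""
    n1 = net.nand(a, b)
    axb = net.nand(net.nand(a, n1), net.nand(b, n1))       # a XOR b
    n5 = net.nand(axb, cin)
    total = net.nand(net.nand(axb, n5), net.nand(cin, n5))  # (a XOR b) XOR cin
    cout = net.nand(n5, n1)                                 # (a AND b) OR (axb AND cin)
    return total, cout


def ripple_add(net, x, y, bits):
    """Add two unsigned integers bit by bit; returns (result, final carry)."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder(net, (x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


net = NandNetlist()
value, carry = ripple_add(net, 200, 100, 8)
print(value, carry, net.gates)  # 44 1 72
```

Nine gates per bit is not optimal for speed, but it gives the kind of indication described above.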

C to RTL converter by kursancew · 2014-01-08 10:25 · Score: 1

The best path for you would be ForteDS Cynthesizer, Mentor Catapult C and C-to-Silicon from Cadence. Those are behavioral synthesis tools. I have used some of those and they are very strong for datapath oriented designs; if you follow their design guidelines they are quite good at converting C code to RTL. There's a free thing you can try called xPilot, never touched it though...

--
linux user #271173

Re:software as hardware?! but but but software pat by sourcerror · 2014-01-08 10:29 · Score: 1

You can't patent math. Does that mean that the world doesn't exist?

Concurrent Analytics solves this by rayhoare · 2014-01-08 10:38 · Score: 1

We (ConcurrentEDA.com) have developed a tool called Concurrent Analytics that analyzes a program's x86 code and estimates the gate count. This tool works for Xilinx and Altera FPGA chips and provides an upper bound since logic optimization reduces the gate count. Essentially, we have an extensive library of all software assembly instructions and their gate count in an FPGA. Synthesizing software into a chip requires more work but we have an internal tool for that as well. We translate x86 into a hardware description language (HDL) that the vendor's tools synthesize into FPGA gates. Over 1 million lines of high-performance HDL have been generated using these tools since 2006. Both tools are internal tools that we use to offer accelerated FPGA design services. (feel free to contact me directly RayHoare _at_ concurrenteda _dot_ com)

Handel-C by Endophage · 2014-01-08 10:39 · Score: 1

I don't know how easy it would be to port your specific algorithm, but I did my masters thesis around a language called Handel-C. It's a super-set of C that provides a high level FPGA programming interface. That might get you some distance in determining the number of gates. Disclaimer: I was working with it a few years back and the documentation/support was appalling, I don't know if it's become any better.

Why are they asking you? by davesque · 2014-01-08 10:40 · Score: 1

It seems like, if you could describe the algorithm in a sufficiently low-level language like C, they shouldn't be asking you how many gates it would take. If they're the hardware manufacturer, they should know. Besides, there are too many factors that could influence the gate count depending on how the manufacturer decided to implement the adders, etc. None of these things seem like questions that programmers should be responsible for answering.

or even a company by Anonymous Coward · 2014-01-08 10:58 · Score: 0

Also consider that there are plenty of companies around that will happily do the translation from code to gates for you and have experience with these design flows. Talk to them, you'll see that translating algorithms to gates is just one of the pieces of the puzzle.

Walks Like Troll... by Anonymous Coward · 2014-01-08 11:00 · Score: 0

Truly this is a bizarre question to start with,
Software algorithms do not exactly translate to hardware, only some functions of the code can be translated...
For the Hardware Manufacturer to ask this question is even stranger if the code. In the end none of this adds up.

It smells more like some trolling for advice on converting some sweet code that fails to scale cost effectively into something that can be stamped as an asic....

Then again... what do I know

Re:Walks Like Troll... by Austerity+Empowers · 2014-01-08 16:30 · Score: 1

Nope, you can translate most anything if you are patient, and throw enough gates and sram at it.
And HW manufacturers are manufacturers, they don't necessarily know anything about the designs they manufacture. Foxconn is a prime example, they are clueless as all get out about any form of design. Including and perhaps especially their design centers.

Ditch your electronics manufacturer by Anonymous Coward · 2014-01-08 11:17 · Score: 0

Your electronics manufacturer's question shows that they don't know what they're doing. Hardware and software engineers aren't supposed to even think at the gate level anymore, that's the foundry's job. If neither you nor they have people who can translate between your algorithm and the foundry's toolchain, they should at least know a capable partner who they've worked with successfully in the past.

If they can't even offer you a capable partner then they're probably a lame middleman -- find someone else.

Now that the practical answer is out of the way, let's answer your actual question.

Generally, any algorithm you can describe in a programming language like C is turned into a custom processor when implementing it in an ASIC. The overall steps are:

1) Product requirement analysis (need vs want feature breakdown, cost constraints, timeline, human and capital resource analysis)
2) Based on 1), determine whether a simpler approach such as a FPGA or synthesizable core will work instead of a full custom design. If so, then just have your partner do that.
2) This bears repeating: you really want something like an FPGA or a synthesizable core. There hasn't been a successful business case for hardware acceleration ASICs except in extremely high end boutique markets in recent memory.
3) If you really, really have to do a full custom design and based on 1) you can afford it, break down the algorithm into steps and draw any data dependencies. The longest chain of data dependencies determines your pipeline length. This sketch is the "datapath diagram".
4) Look at all the math operations and turn them into basic logic blocks using your first semester logic design textbook and your foundry's cell library.
5) Figure out the worst case scenario for each logic operation which might be executed simultaneously and build that many of those logic blocks.
6) Now that the datapath and logic units are sketched out, design a control system which uses a hardcoded ROM to attach the various logic units to the buses and each other in the proper sequence to implement your algorithm.
7) Attach a simulated clock to all of this and do a lot of debugging.
8) Synthesize all of this junk using the foundry's toolset and do even more debugging.
9) Eventually the foundry will accept the design (DRC pass) and you can tape it out to them. They will bill you a very large amount of money at this point.
10) You will get a shipment of chips which won't work and there will be even more debugging. At some point you have to send a mask or two back to the mask house to get them to fix something. The mask house will bill you another very large amount of money at this point.
11) Eventually after repeating 9 and 10 several times you'll either run out of money or you'll have a successful chip. The amount of resources poured into the project and smart people you'll be working with will make you laugh hilariously at ever considering posting such a question to slashdot.

Note that at no point were gates ever involved.
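
Step 3 above can be sketched as a longest-path search over the data dependency graph. The expression y = (a*b + c*d) * e and the operation names are a made-up example:

```python
def longest_chain(deps: dict) -> int:
    """Length of the longest chain of dependent operations in an acyclic graph.

    `deps` maps each operation to the operations whose results it consumes.
    """
    depth = {}

    def visit(op):
        if op not in depth:
            depth[op] = 1 + max((visit(d) for d in deps[op]), default=0)
        return depth[op]

    return max(visit(op) for op in deps)


# y = (a*b + c*d) * e, broken into single operations.
datapath = {
    "m1": [],            # a * b
    "m2": [],            # c * d
    "s": ["m1", "m2"],   # m1 + m2
    "y": ["s"],          # s * e
}

print(longest_chain(datapath))  # 3
```

The two multiplies have no dependency on each other, so they can share a pipeline stage; the chain m1 -> s -> y is what sets the depth.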

Re:Ditch your electronics manufacturer by Austerity+Empowers · 2014-01-08 16:32 · Score: 1

Gates and timing closure are a physical designer's job, not the fab's. And he trades area, clock rate and power based on design intent. But for FPGAs it's fairly straightforward.

Re:software as hardware?! but but but software pat by Anonymous Coward · 2014-01-08 11:18 · Score: 0

Please buy a dictionary and learn what a patent is, and what is patentable. Then you'll understand what people mean by software patent, and why it is evil.

And btw it is only one of the many terrible trends that corrupt the patent system and transform it from an innovation incentive into the exact opposite.

Please don't let programmers develop hardware by niks42 · 2014-01-08 11:20 · Score: 1

You'll end up with a hardware emulation of a software algorithm, which will necessarily be slower and less efficient than the correct answer, which is to design a hardware solution to the original problem.

Hire a professional by Anonymous Coward · 2014-01-08 11:25 · Score: 0

Go to a company like Plexus who knows a thing or two about RTL and hardware design.

http://www.plexus.com/solutions/design

a common misconception. "the laws of nature" by raymorris · 2014-01-08 11:43 · Score: 1

That is a common misconception, spread by people who like a certain type of FUD. In fact, what is not patentable is "the laws of nature, including those of science and mathematics".

The LAWS of nature. You can't patent gravity, you can patent an elevator. You can't patent refraction, you can patent an acoustic lens. You can't patent the associative property of addition, you can patent a scoring system for detecting bogus reviews.

If you take out "the laws of" and replace it with "anything using", THEN you would end up with "you can't patent anything using nature, including science and math", but that's not the law. The law is that you can't patent the laws of nature, including mathematical LAWS. You can patent things that are scientific, and you can patent things that are mathematical.

If you think about it, it makes sense. You can't invent "x + 1 = 1 + x". That's always been true. However, you CAN invent a way of detecting suspicious stock trades. Since that could be a new invention, it could be patented.

bad patents exist, good ones do, software don't by raymorris · 2014-01-08 12:02 · Score: 1

> Patents (used to) cover only specific implementations of ideas, not the ideas themselves.

That's a legitimate criticism of many patents. Of course, an implementation IS itself an idea, so we'd need to be a little more specific with our vocabulary in order to really talk about policy. Saying "you shouldn't be able to patent ideas" won't quite get us there. Certainly we can say "goals, objectives, shouldn't be patentable; only METHODS for achieving an objective should be."

Certainly there exist bad patents that are too broad, that cover an objective rather than a method or mechanism. On the other hand, there simply is no such thing as a "software patent" per se. The problem with the patents is that they are over broad. Whether they cover something made of wood, plastic, or magnetized iron particles is irrelevant.

The regression of computer science...fiction. by 3seas · 2014-01-08 12:05 · Score: 1

http://en.wikipedia.org/wiki/Turing_machine

nough said

Bad question, apples and oranges by Anonymous Coward · 2014-01-08 12:28 · Score: 0

How many (logic) gates would be needed to turn your software algorithm into hardware

Nonsensical question... they're very different beasts, processing in very different ways (generalised vs specialised hardware). The answer really means *designing* the whole thing again in hardware. Or just replying with the number of gates in the entire PC.

Using labview compiler for FPGA by dsoodak · 2014-01-08 12:33 · Score: 1

Haven't done this myself, but you can evidently run LabVIEW programs ("virtual instruments") on some FPGA chips. You'd have a good estimate (plus an actual digital circuit) if you translated your code to LabVIEW (I believe the actual language is called "G") and found a copy of the add-on which turns this into Verilog. -- Dustin

Left as an exercise for the reader... by svirre · 2014-01-08 12:35 · Score: 1

As a first pass you can estimate adders as 10 gates per bit, state as 20 gates per bit, and multipliers as 10x bits squared (unless it is by a power of two, in which case it is free). If you need to do division in your algorithm you should redesign it. If you use floating point, everything gets huge. (Try not to use floating point; remember, in hardware you do not need to deal with arbitrary word size restrictions, just scale word sizes to suit the requirements.)

Now, figuring out exactly what resources you need, this is where you will get into trouble. Normally you will reuse some (lots) of your arithmetic, but exactly how much depends on what performance/power/gate count target you need to hit. More reuse means fewer gates but faster clocks (which can drive you to more gates if you get into trouble on timing closure). The extreme case is software, which just reuses a very limited set of ALUs; the other extreme is an unrolled design where each algorithmic operation has dedicated hardware, so one iteration takes one clock.

Depending on performance targets the same algorithm can have a factor 1000 difference in gate count.
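
The rules of thumb above, written out as a sketch. The 16-bit, 8-tap filter is a hypothetical workload, and control logic and muxing for the shared case are ignored, so the gap between the two ends is understated on the small side:

```python
def adder_gates(bits: int) -> int:
    return 10 * bits


def state_gates(bits: int) -> int:
    return 20 * bits


def multiplier_gates(bits: int, by_power_of_two: bool = False) -> int:
    # Multiplying by a power of two is just a shift, i.e. wiring.
    return 0 if by_power_of_two else 10 * bits * bits


def design_gates(bits: int, multipliers: int, adders: int, state_bits: int) -> int:
    return (multipliers * multiplier_gates(bits)
            + adders * adder_gates(bits)
            + state_gates(state_bits))


BITS, TAPS = 16, 8  # assumed word size and filter length

# Fully unrolled: one multiplier per tap, an adder tree, one result per clock.
unrolled = design_gates(BITS, multipliers=TAPS, adders=TAPS - 1, state_bits=TAPS * BITS)
# Fully shared: one multiplier and one adder reused over eight clocks.
shared = design_gates(BITS, multipliers=1, adders=1, state_bits=TAPS * BITS)

print(unrolled, shared)
```

Same algorithm, same word size, and the gate count already differs by several times before any pipelining or parallel copies are added.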

convert C to verilog by Anonymous Coward · 2014-01-08 12:43 · Score: 0

http://www.c-to-verilog.com/

Programmers == Terrible ASIC Designs by Anonymous Coward · 2014-01-08 12:48 · Score: 0

Most programmers make terrible ASIC designers.

Getting the SIZE or NUMBER of gates is one part of the equation. Computing or evaluating the performance is another.

How many CLOCK cycles does it take to do that operation. Not machine cycles, but CLOCK cycles. That is:

-FETCH, DECODE, EXECUTE, RETIRE: each one of these operations may be more than one CLOCK cycle, depending on data locality, cache hit/miss, and overall architecture.

What is a simple Java line may require hundreds, if not thousands, of gates, and many many clock cycles.

When designing hardware, in real ASICs, you have essentially NOTHING. No memories [besides bit-wide/deep], no multipliers, no dividers, no floating point, no IEEE compliance, unless you either licence it, or design it yourself.

Then we get into, how rich is the technology library, what kind of gates do you have, what transistor sizes, what speeds...

General answer by marcopo · 2014-01-08 12:56 · Score: 1

That depends a whole lot on what kind of hardware you want to use. One way is to implement a universal Turing machine, and give it the code as input. Those can be quite small, and you don't even need access to the algorithm to find the answer.
You're probably looking for a more efficient implementation.

There is a tool by Anonymous Coward · 2014-01-08 13:22 · Score: 0

There is a tool that automatically converts executables targeted for specific processors (the ATMEL AT90s8515 and AT90s8535) into custom circuits in Verilog. If your "algorithm" were compiled using AVR Studio to create a .bin, you could use that tool to generate a Verilog circuit, then synthesize it using a synthesis tool (such as the Xilinx synthesis tool) and get an estimate in a matter of minutes. See www.fromorrow.com for details.

There's politics involved here by cloud.pt · 2014-01-08 13:51 · Score: 2

I believe the OP is asking the question with an underlying motive that most users aren't grasping. The manufacturer definitely has a way of estimating the gate "cost" from C++, as some experts on the matter have pointed out here, but for that he probably demanded source code, which the OP probably has no safe way of handing over without compromising his intellectual property. He doesn't want to lose the business contract or spend money blindly on a consultancy whose name he doesn't even know, so the question makes FULL SENSE regardless of its child-like semantics.

You can probably bet the manufacturer is based and/or has legal safe-haven in a dodgy country, along the lines of having properties like:

An established electronics manufacturing industry;
Low respect and legislation for IP and the concept of royalties

(hint: China) ...This makes the OP think twice about passing source around.

Now, my personal opinion regarding a possible answer is more business-focused: if such a manufacturer is even remotely interested in your "product", enough to ask that, then you have a very marketable piece of code on your hands and you need to do the following...

Find a "safer" buyer: someone based in Europe (Germany?), Japan, or maybe the US if location is paramount over legislation. This nets you light IP protection.
Spend on a good legal advisor to draft a nuke-proof NDA with special clauses like "if we give you the code for estimation of costs, you either buy it or refrain from implementing similar technology for at least N years" (N>10).
Despite all this, you still need an expert on electronic device manufacturing by your side, and I mean full-time. This also ensures you don't get robbed when they try to gain leverage in the final money deal by stating "it's too much gates! We can't pay more than XXXXX".
Alternatively, find business angels or investors, or waste a TON of money and do the hardware YOURSELF, under your own company's umbrella or maybe in some form of partnership. This is the stuff that makes you a millionaire, but it also places a lot of risk on your side.

Learn Forth. by crovira · 2014-01-08 14:10 · Score: 1

Charles H. Moore wrote it and extended it to be able to compile directly into silicon.

Forth is actually a TIL [Threaded Interpretive Language] but it is so easily extensible that it is possible to implement all the way to the gates.

Moore was working with Forth to do exactly that last I heard.

--
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.

Re:Learn Forth. by Anonymous Coward · 2014-01-08 19:00 · Score: 0

If you boot into Open Firmware https://en.wikipedia.org/wiki/Open_Firmware on a PowerPC Power Mac you're in a Forth interpreter environment. But that's as much as I know (without doing research) - maybe Apple wrote that firmware in Forth?

Rough rule of thumb by Anonymous Coward · 2014-01-08 14:23 · Score: 0

If your application is complex, then you're SOL. You will probably need a state machine, or a complex of functional units coordinated by some kind of programmable unit.

However, for a simple loop, rough rules of thumb:
* count the upper bound of the loop (say N), and the number of arrays used in it. For each array, allocate a RAM of N rows x b bits where b = 64 bits if you need double precision otherwise 32 bits. That gives the RAM size
* count the number of operations that do not involve the loop count. Count addition+subtraction, multiply & divide operations, and integer/float/double operations separately. Ignore any operations where one of the arguments is a constant. That should be enough for a h/w guy to estimate the logic gate count
* Figure out how many iterations of the loop you need to do per second. That will give you the cycle time.
* Figure out the critical path of operations through the loop. Assume integer add/subtracts have wt 1; double precision add/subtracts and integer multiplies have wt 3; integer divides have wt 6; and double precision divides have wt 10. Find the longest cascade of operations in the loop. That will give the h/w guys an idea of whether they can meet the cycle requirement without some radical engineering (roughly speaking, if the wt of all operations in the critical path * number of operations per second > 1e9, you'll have to rethink things), and how to size the various compute units (e.g. whether they can use a ripple carry adder, or have to do a look-ahead).

Again, these are VERY rough rules of thumb.
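
The rules above can be turned into a back-of-envelope calculation. What follows is only a sketch: the function names are made up for illustration, the weights are the ones quoted in the post, and operations the post gives no weight for (e.g. double precision multiply) are left out rather than guessed. It also assumes "operations per second" means loop iterations per second.

```python
# Critical-path weights as quoted in the post. Operations without a quoted
# weight (such as double precision multiply) are deliberately not listed.
WEIGHTS = {
    "int_add": 1,   # integer add/subtract
    "fp_add": 3,    # double precision add/subtract
    "int_mul": 3,   # integer multiply
    "int_div": 6,   # integer divide
    "fp_div": 10,   # double precision divide
}

def ram_bits(loop_bound, num_arrays, double_precision):
    """One RAM of N rows per array, 64 bits wide for doubles, else 32."""
    width = 64 if double_precision else 32
    return loop_bound * num_arrays * width

def critical_path_weight(ops):
    """Sum the weights along the longest cascade of dependent operations."""
    return sum(WEIGHTS[op] for op in ops)

def needs_rethink(ops, iterations_per_second):
    """Apply the post's 1e9 threshold (assumed to be per loop iteration)."""
    return critical_path_weight(ops) * iterations_per_second > 1e9

# Hypothetical loop: 3 arrays of 1024 doubles, with a multiply feeding an
# add feeding a divide on the critical path.
path = ["int_mul", "fp_add", "fp_div"]
print(ram_bits(1024, 3, True))          # RAM estimate in bits
print(critical_path_weight(path))       # total weight of the cascade
print(needs_rethink(path, 100_000_000)) # whether the rule says "rethink"
```

The output is only as good as the weights, so treat it as a way to structure the conversation with the h/w guys, not as an answer.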

Fascinating problem, approximate solution? by beachdog · 2014-01-08 14:31 · Score: 1

It was more than 30 years ago that I learned digital logic from Blakeslee's Digital Design with MSI and LSI. These days I program an Arduino. I have my hands full just reinventing the spokes of a 20 year old wheel.

I think you have a fascinating problem. Suppose you treat your computer program as a black box where you feed pages of data into one end and you get pages of output data out the other end. Suppose you say each page of data is an x,y grid of image values. You could say, your problem is for a central pixel in the image, you want to write a truth table. An initial truth table is the values and locations of the pixels from the immediate preceding image that when always present always result in the specific value of that pixel.

Your image processing process probably uses data from several preceding images to come up with the result. If it takes five preceding images, then the truth table for a single pixel picks up five more blocks of data about the state of the surrounding pixels. No matter how wonderful the computer program may seem to be, it is still a finite state engine. The present state of the image or output (we hypothesize) should be dependent on some number of previous states of the image.

The process is a classic series of dance steps for extracting the essential predecessor logic states. The Blakeslee book models this better than I can remember after all these years. The steps are: normalize, simplify, flag all the don't-care states, and gracefully conceal or wrap the data to handle the physical edges. When you have one of these cubic things, you go through a simplification process. First, you normalize the input data, which means removing the numerical clutter and having a single number. Another feature of the extraction process is that you sort the truth table output column and input columns and try to mark as many of the input columns with does-not-matter as possible. A third thing is that you set a limit on the depth of the input data, which means the possible values for an output point are limited because the permuted possibilities are capped, and that cap is usually an exponent like 2^N.

The resulting gadget will be a truth table that grinds out something like x=1 for a=1, b=0, c=1, d=0 and on and on.

Unlike rewriting the software and putting it on a programmable gate array, this is an approach to writing a state table that produces an approximate pixel based on looking at a chunk of images containing that pixel.

How many? by Avalanche_Joe · 2014-01-08 16:49 · Score: 1

One, two, three - crunch! Three licks to get to the center of a tootsie pop! Wait. Wrong question, never mind.

and how fast does it need to go? by Anonymous Coward · 2014-01-08 17:44 · Score: 0

At one level, you say "implement a Power PC or SPARC core" (see, e.g., the LEON cores from http://www.gaisler.com/) and just store the code in ROM (for which there are well defined "bit->gate" equations).

If you're looking at implementing the algorithms in a more "hardwarey" or "data flow" method, then it's not so easy.
For instance, implementing a Costas loop in a sampled data system is VERY different when using discrete multipliers and adders than if you are doing it in C.

Example.. if you're doing a digital filter, pretty much every "sample" there's 5 multiplies and 4 adds (for a second order section IIR filter), but you can optimize things like the coefficient precision, etc. There's also decisions about precision.. how many bits do you carry after a multiply?

So, the filter might be:

y(i) = b(0) * x(i) + b(1) *x(i-1) + b(2) *x(i-2) + a(1) * y(i-1) + a(2) *y(i-2).

Practically speaking, this would usually be done something like

y = b0*x + b1*x1 + b2*x2 + a1*y1 + a2*y2
x2 = x1
x1 = x
y2 = y1
y1 = y
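
For the software folks, here is the same second order section as a runnable sketch. `make_biquad` is an illustrative name, not anything from a library, and the sign convention follows the equation above (feedback terms are added). The coefficients in the usage lines are arbitrary values chosen so the arithmetic is exact.

```python
def make_biquad(b0, b1, b2, a1, a2):
    """Return a per-sample step function for one second order IIR section."""
    # Delay registers: the two previous inputs and the two previous outputs.
    state = {"x1": 0.0, "x2": 0.0, "y1": 0.0, "y2": 0.0}

    def step(x):
        # Five multiplies and four adds per sample, as written here.
        y = (b0 * x + b1 * state["x1"] + b2 * state["x2"]
             + a1 * state["y1"] + a2 * state["y2"])
        # Shift the delay lines, oldest value first.
        state["x2"], state["x1"] = state["x1"], x
        state["y2"], state["y1"] = state["y1"], y
        return y

    return step

# Feed in a unit impulse; each output is half the previous one.
step = make_biquad(0.5, 0.0, 0.0, 0.5, 0.0)
impulse_response = [step(x) for x in (1.0, 0.0, 0.0, 0.0)]
print(impulse_response)
```

In hardware, every `*` and `+` in `step` is either its own multiplier or adder, or a time slot on a shared one, which is exactly the trade-off that decides the gate count.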

Even if you were implementing it in assembler, there's lots of options on how you address the coefficients and inputs.

But you can imagine schemes with registers and a single multiplier and adder, or with shift registers, or with multiple multipliers and adders, and then throw in multiple precision arithmetic, and whether you do ripple carry in a systolic array or synchronous operations with a Cray multiplier and fast adders. Heck, if you're clever you can probably do it in a half dozen gates with a lot of shift registers, and do the adds and multiplies one bit at a time serially. (See, e.g., the Millman and Taub textbook for examples)

Even though VHDL and Verilog resemble C (particularly Verilog), don't be deluded into thinking you can take your C code and just synthesize it as gates.

dissasemble by PC_THE_GREAT · 2014-01-08 17:51 · Score: 1

Give him the code in x86 assembly :p. Most probably, if he is into hardware, he should get the information he seeks from that. If he complains that he doesn't understand it because he doesn't code, then tell him that, equally, you do not design hardware, and that this is the best you can come up with. If all else fails and you are willing to pimp your arrogance and ego for money, then write a simple perl parser that will parse your program and replace each logical decision with its respective gates, http://www.chem.uoa.gr/applets/appletgates/Images/Image1.gif
If I were to pimp out my arrogance and ego for money and give in to designing an algorithm using gates rather than doing what I like, it would be no different from learning a new language: map out the if, add, sub, mul (division gets a lil bit tricky), and from that you can build anything. Once you make this map, use a perl parser to parse your program as mentioned and make it generate the gates according to the map you made.

This might help also http://www.i-programmer.info/programming/hardware/4626-getting-started-with-digital-logic-logic-gates.html

:p am no electronics person, but to solve any problem, a good background study is needed :p.

who knows, if you write it out and open source it out :p it might be of some use to someone else [the perl parser to convert a language to logic gates]!

That's an NxN problem by Anonymous Coward · 2014-01-08 18:02 · Score: 0

Here is an example of how you can look at this problem. A disk reader for very early computers was all built in TTL (transistor-transistor logic) hardware. It took over a dozen chips. The early Apple Computer guys (I think Steve Wozniak was one) did it with 3 chips and a bunch of software. Certainly you can store a program in a ROM, or a fusible link, or an FPGA (field programmable gate array), but if you are talking about implementing an algorithm in hardware, you are talking about computation, specifically calculation. That usually means you will need something like (at least part of) an arithmetic logic unit (ALU). If you need multiple divides, you can build one divide circuit and re-use it, or if you need multiple divides at once, you will have to build more. Ultimately, if you are calculating something, you may need intermediate steps (one result acting as input to the next step), which will mean a multi-step, multi-time clocked circuit. There are many ways of calculating anything (tautologies), many ways of implementing an algorithm in code, and likewise many ways of turning that code into circuits. It's an NxN problem.

Documentation by bigtreeman · 2014-01-08 18:07 · Score: 1

Remove your C source from your documentation and comments
( which were written first )
now design your hardware.

--
Go well

Sell them the algorithm and collaborate with them by JackChang · 2014-01-08 19:51 · Score: 1

As many have pointed out, it's quite a complicated question that doesn't have a straightforward answer. I worked on RTL implementations of graphics algorithms for years, and I can say there can be night-and-day differences between implementations of the same algorithm. Their performance requirement is also unspecified. What kind of input is your algorithm expecting? YUV? RGB? CMYK? What's the expected throughput? How much memory and I/O bandwidth is your algorithm going to take? How many temporary registers are needed? Do they allow deep pipelining and longer latency? What fab process are they going to use? There are far too many variables that need to be taken into consideration. Also, some seemingly minor tweaks can make major differences in both area and speed. I once helped a friend optimize a supposedly simple error diffusion pipeline. It took us 3 weeks to shrink the design to one third of its original size while improving its performance by 15% at the same time. I would say simply tell them you don't know because you are software people, unless they are only interested in synthesizable code. Finding someone to write it in RTL for you can be a much heavier burden than you might expect, because it's hard to manage something you don't understand. Chances are you may also need to change your algorithms a bit, because some operations aren't feasible in hardware at reasonable cost, and some operations can't be removed or simplified.

Been there, done that. by treczoks · 2014-01-08 20:28 · Score: 1

Compare this to the following situation: A graphical designer draws a fantastic new GUI for an application (on a piece of paper, even). Then you ask him how many lines or kilobytes of code this will be. And then the designer asks: "Can't I just scan the pictures I've drawn and have a software figure this out?". Sounds ridiculous? Yes, but: This is what you wanted.

To answer the original question: The only realistic estimate is to add all the gates you've got in your computer, and take that as an upper bound. Which is still just an estimate, because implementing an algorithm for real-time in hardware can still increase the gate count by leaps and bounds.

To be able to answer such a question you have to re-implement the algorithm in a Hardware Description Language (HDL) like Verilog or VHDL.

I did this in one of our current systems, where I had to process a stream of data.
First I designed an algorithm in C which took an infile, generated an outfile and measured the "quality" of the output.
Then I re-implemented the algorithm in VHDL, which looks and "thinks" totally different than the original C source (but still DOES the same).
Only after that one can give a realistic estimate (based on the target system/platform and timing constraints) on gate or cell counts.

Fingers off! by Anonymous Coward · 2014-01-08 20:54 · Score: 0

There is no way you can deliver a sane estimate. You wrote: "Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway." An operation like "add" uses about 8 gates (give or take a few) for adding a single bit with carry. Adding 32 bit takes significantly more than 32 times that since you want "carry lookahead". Division (64/32 bit, 32bit result) takes a few dozen gates in addition in order to deliver one bit per cycle. Square roots similarly. If you want division to be faster than that, you have to throw in a lot more gates, several orders of magnitude.
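
To make the per-bit cost concrete, here is a small sketch that builds a ripple carry adder out of individual gates and counts them. The names (`GateCounter`, `full_adder`, `ripple_add`) are made up for this example. It uses the textbook full adder with XOR counted as a single gate, which comes to 5 gates per bit; if XOR has to be built from simpler gates, the per-bit number lands closer to the "about 8" quoted above. There is no carry lookahead here, so this is the slow, small end of the range.

```python
class GateCounter:
    """Evaluates 2-input gates on single bits and counts each one used."""

    def __init__(self):
        self.count = 0

    def xor(self, a, b):
        self.count += 1
        return a ^ b

    def and_(self, a, b):
        self.count += 1
        return a & b

    def or_(self, a, b):
        self.count += 1
        return a | b


def full_adder(g, a, b, cin):
    """Textbook full adder: 2 XOR, 2 AND, 1 OR."""
    p = g.xor(a, b)
    s = g.xor(p, cin)
    cout = g.or_(g.and_(a, b), g.and_(p, cin))
    return s, cout


def ripple_add(g, x, y, width):
    """Chain one full adder per bit; the carry ripples from bit 0 upward."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder(g, (x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


g = GateCounter()
total, carry_out = ripple_add(g, 1234, 5678, 32)
print(total, carry_out, g.count)
```

The carry has to pass through all 32 stages before the top bit is valid, which is why faster adders spend extra gates on lookahead.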

And so forth and so on. You are totally out of your depth here. The only sane answer is "We'd have to talk to hardware people. Do you have suggestions?". There is no way you can learn enough about this to deliver an estimate that can be relied upon.

Lulz. by Anonymous Coward · 2014-01-08 21:30 · Score: 0

The quality of some of the responses on Stack Overflow would lead me to believe he'd probably get responses telling him to count the conditionals and allow one gate per conditional ;) :p

Actual answer by Anonymous Coward · 2014-01-08 21:48 · Score: 0

Somewhere in the history of this question is a conversation in which someone there has asked "Can it be done in hardware?" and someone in your business has said "Yes". You are here because that latter person said "Yes" and not "Perhaps, with work" because "Yes" means you already know the answer to this question.

So find that person who said "Yes" and ask them where the hardware design they said you had is. Then count the gates.

Since nobody else here is prividing much help... by Wierdy1024 · 2014-01-08 22:21 · Score: 2

I shall give it a go.

First up, most algorithms can't be directly translated to hardware without either changing them or taking a serious performance hit.

Nearly all widespread algorithms (eg. H264 video) are designed specifically with a hardware implementation in mind, and in fact must usually have elements removed that would produce good results simply because it wouldn't be sensible to implement in hardware.

In particular, in hardware, loops that iterate an unknown number of times are generally not allowed.

Steps to make this estimate would probably be to take your code and 'flatten' it (IE. Rewrite it to avoid all use of pointers, except arrays).

For every variable, figure out how many bits wide it needs to be (IE. What is the smallest and largest possible value). You probably want to convert floating point to fixed point.

Next, to make a lower bound of how many gates would be used if you were to design for minimal gate use, take every add and subtract operation and call them 15 gates per bit. For every multiply call it 5 gates per input bit squared. Don't do division (division can be done as a multiplication by the inverse of a number).

For the upper bound, do the same, but multiply by the number of times each loop goes round. That gives you a design with lots more gates but much higher performance.

For the upper bound finally add on 5 gates for every bit of every variable times the number of lines of your input code. This approximates the d type flip flops for storage in a pipeline. Note that if two lines of code operate on entirely different variables, you can call them the same line as far as this metric goes.

For the lower bound, if you got a value greater than 10000 plus 16 times the number of bytes your compiled program takes up plus the RAM it allocates to run, it would be more gate efficient to put in a tiny processor and keep your algorithm in a ROM. (Lots of complex algorithms are implemented this way when space is at a premium).
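
The recipe above, written out as a sketch. All names here (`Op`, `bits_needed`, `lower_bound`, and so on) are invented for illustration, and the per-bit constants are simply the ones from the post. One reading is assumed: the "16 times" factor is applied to code bytes and RAM bytes together.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str            # "add", "sub" or "mul"
    bits: int            # operand width in bits
    iterations: int = 1  # how many times the enclosing loops execute it


def bits_needed(lo, hi):
    """Smallest width that holds every integer in [lo, hi]."""
    if lo >= 0:
        return max(1, hi.bit_length())
    # Two's complement: magnitude bits plus a sign bit.
    return max((-lo - 1).bit_length(), max(hi, 0).bit_length()) + 1


def op_gates(op):
    if op.kind in ("add", "sub"):
        return 15 * op.bits          # 15 gates per bit
    if op.kind == "mul":
        return 5 * op.bits ** 2      # 5 gates per input bit squared
    raise ValueError(f"no rule for {op.kind!r}")


def lower_bound(ops):
    """Minimal-gate design: one shared unit per operation in the source."""
    return sum(op_gates(op) for op in ops)


def upper_bound(ops, variable_bits, code_lines):
    """Fully unrolled design plus pipeline flip-flops."""
    unrolled = sum(op_gates(op) * op.iterations for op in ops)
    pipeline_regs = 5 * sum(variable_bits) * code_lines
    return unrolled + pipeline_regs


def cpu_plus_rom_threshold(code_bytes, ram_bytes):
    """Above this, a tiny processor with the program in ROM is smaller."""
    return 10_000 + 16 * (code_bytes + ram_bytes)


# Hypothetical loop body: a 16-bit add and an 8-bit multiply, run 100 times.
ops = [Op("add", 16, 100), Op("mul", 8, 100)]
print(lower_bound(ops), upper_bound(ops, [16, 8], 2))
print(cpu_plus_rom_threshold(200, 64))
```

The gap between the two bounds is the point: the same code can sit almost anywhere in that range depending on how much speed you need.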

Re: Since nobody else here is prividing much help. by Wierdy1024 · 2014-01-08 22:25 · Score: 1

Note that these by the way assume you have the engineering time to 'do it properly'. There are lots of ways of making a considerably bigger design, but with much less design effort.

Check out 'Handel c' for example. It's a one click tool that takes C code and produces horribly inefficient hardware, but it works.

Unique algorithm? I recommend High Level Synth by LucienMP · 2014-01-09 00:10 · Score: 1

If your algorithm is standard just google "FPGA/ASIC IP provider" (of which there are many, eg H264 etc) and pay the price, your results will be optimal and cheaper than doing it yourself - assuming you never have before.

If your algorithm is custom, then you are either going to make a horrible job of it as you learn HW as you go (it takes years to get optimal and reach the clock speeds, area, and QoR/test coverage numbers needed for production Si), or you could hire a HW team, who will cost you a pretty penny to get it done, or outsource, which is also not cheap but allows for risk reduction.

How many gates isn't just a question of counting "*" and "/" and scaling (although back of napkin will give you a general feel). Practically any C/C++ you write is going to be very sequentially oriented, which results in algorithms that probably have better parallel implementations. Whilst you might now be thinking threads or separate processes, we are talking HW parallelism, which is far more fine grained than threads or any other SW parallelism. Further, any integers you have may be optimizable to fewer than 32 bits (e.g. 1 bit or 3 bits), thus saving a large amount of HW area. Finally, you didn't really say what sort of performance you want: if it's 1MHz and 1 word / year of I/O then I suspect you could build some very clever hardware to do it all in a few gates, but if it's 1GHz and 1 Giga Words/sec then area might expand as you will need to duplicate the circuit in parallel. Also, the speed at which you process data will affect latency (time from input to the first correct output), which is often a killer in real-time or other systems (e.g. if you put LCD TVs side by side you might notice some are running several seconds behind what's being broadcast; this is latency, because some sets do more video processing to clean up the image as they receive it).

So at this point I would recommend looking into SystemC (based around C++), SystemVerilog (so so) and then a raft of tools to help you do the job. These tools are called "High Level Synthesis" (HLS) tools and they aren't cheap, but they do cut down on man hours manually converting algorithms. You will still need to be able to think extremely low level, as bad code results in bad gate count - no matter the language.

I don't want to come over as a shill so I am going to present the 4 main competitors for HLS tools;
1) Calypto Systems, formerly Mentor Graphics' tools before spin-out ( http://calypto.com/en/products/catapult/overview )
2) Forte Design Systems ( http://www.forteds.com/products/cynthesizer.asp )
3) Cadence C2S ( http://www.cadence.com/products/sd/silicon_compiler/pages/default.aspx )
4) Impulse C ( http://www.impulseaccelerated.com/ ) - this is very reasonably priced but has its limitations.
5) There are some open source things out there, I wouldn't recommend them as they are quite in their infancy.

Disclaimer: I used to work at one of the companies that provides synthesis tools, for >10 years converting C/C++/SystemC to HW quite often for a service fee. I can tell you we never had a design that cost under 30K and most were in the 100s to millions of USD.

It depends by Anonymous Coward · 2014-01-09 02:03 · Score: 0

4004 is a Turing complete machine and has around 2300 transistors. If you code your algorithm for 4004, put it in a ROM and add a 4004 and some RAM to it you'll have your algorithm in hardware (probably working painfully slow). So the question you were asked makes no sense - you'll need more information to give a meaningful answer. In general the same functionality can be coded in a small number of gates and take more time to run or in a bigger number of gates and take less time.

Do they know what they want? by MindStalker · 2014-01-09 02:45 · Score: 1

It seems to me that a company asking software developers what it would take in hardware might possibly not know what they want.

It's highly possible that a small CPU and program on flash ROM solution might be all they really want. Do they really NEED it burned into the hardware?

Question? by Anonymous Coward · 2014-01-09 03:23 · Score: 0

Q: How many computer programmers does it take to screw in a light bulb?

A: Can't be done. It's a hardware problem.

ATM by Anonymous Coward · 2014-01-09 03:29 · Score: 0

cancel enter clear 123

111111

You should contact these guys by Anonymous Coward · 2014-01-09 03:32 · Score: 0

There's no simple way to answer your question but these guys ( http://www.spacecodesign.com/ ) are developing tools to do automated hardware/software co-design

Estimate Memory Usage First by Anonymous Coward · 2014-01-09 03:35 · Score: 0

A simple, direct answer is impossible, but a quick estimate that can often tell you if it's feasible is how much memory the algorithm uses at any given time. If you must iterate over large sets of data you already have an engineering trade on your hands between external vs internal RAM.

Questions you should ask yourself:

What is the dynamic range of the data and how accurate of a solution do you need to produce? Are two decimal places enough?
Is all the math floating point or can it be implemented as fixed point or integer math?
Do you have to perform any complex math operations like sqrts or trig? Can these be implemented by look-up tables?
Do you need external RAM chips?
What is the throughput?
Are there latency requirements?
What are the inputs and outputs? Those eat up gates too and if the I/O is a standard you can get your hands on some IP for an estimated gate count.
Will the ASIC be able to talk to a processor and leverage it for complex math operations? Then you don't have to spec, design, and test some crazy CORDIC math function to compute sqrts.

I work on implementing Matlab algorithms in HDL and it's VERY hard for us to estimate a gate count from pure Matlab. We try to make the problem a little easier by mimicking the fixed point math of our design in Matlab but that only really gives us an idea on dynamic range, noise performance, etc.
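
As a rough idea of what mimicking fixed point in software looks like, here is a sketch in Python rather than Matlab. The helper names `to_fixed` and `from_fixed` are made up, and the word sizes in the usage lines are arbitrary. It rounds to the nearest step and saturates at the limits of a signed word, which is one common choice and not necessarily what any particular design does.

```python
import math


def to_fixed(x, frac_bits, total_bits):
    """Quantize x to a signed fixed point integer, saturating on overflow."""
    scale = 1 << frac_bits
    raw = math.floor(x * scale + 0.5)        # round to nearest step
    lo = -(1 << (total_bits - 1))            # most negative code
    hi = (1 << (total_bits - 1)) - 1         # most positive code
    return max(lo, min(hi, raw))


def from_fixed(raw, frac_bits):
    """Convert the integer code back to a real value."""
    return raw / (1 << frac_bits)


# 16-bit word with 8 fractional bits: 0.1 is not representable exactly.
code = to_fixed(0.1, 8, 16)
approx = from_fixed(code, 8)
print(code, approx, abs(approx - 0.1))

# Values outside the dynamic range clip to the largest code.
print(to_fixed(1000.0, 8, 16))
```

Running the algorithm with every intermediate value passed through `to_fixed` shows how much noise a given word size adds, but as said above it tells you about precision, not gate count.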

Software Development by Murdoch5 · 2014-01-09 05:01 · Score: 1

You can't call yourself a software developer with a solid understanding of hardware. To develop software, you should always be thinking in terms of how the hardware is going to handle that software. That being said if you need a gate count then use VHDL or Verilog or another Hardware Description Language. You can't actually convert an abstracted software language like C and up to gates because every single compiler and linker will turn out different end code.

Re:Software Development by Murdoch5 · 2014-01-09 05:28 · Score: 1

*without

The best answer by Anonymous Coward · 2014-01-09 05:04 · Score: 0

"All of them."

Manager Needed by Anonymous Coward · 2014-01-09 05:25 · Score: 0

"Uh, we really rely upon doing the best software development we can. We don't do hardware implementations of our products so we're not equipped to answer your question. Was there a critical need for this information, such that we need to open a channel to a third-party provider to answer the question? We would only pass along their raw cost to provide an answer at no profit to us."

Translation: We're damn good at what we do. You're asking a question which could be a route to backdooring us and we won't allow that without your paying for the privilege. If you really want the answer, we'll hire someone to get that answer and bill you for that person's time. (Even though you're asking a question you're not entitled to ask unless you're buying the source code as well, which we'd rape you in cost for.)

One way to make it right...but it requires work by Depressive+Cyborg · 2014-01-09 06:18 · Score: 1

Make sure that you can analyze the software properly and break it down to well defined functions/modules/whatever-your-abstraction-is

Implement hardware modules for corresponding inputs/outputs but make sure that you do not use an automatic tool. For hardware, you can often
do things very differently since you have different ways to implement things and may use state machines, memories and logic in different ways.

For each module, check hardware and software implementation against each other using CBMC or some other software which can actually verify that your implementation is, if not correct, at least equally bad as the implementation in the other domain.

(Since I'm posting as a Depressive Cyborg, you might be able to figure out what (SW/HW) made me a Cyborg and what made me Depressive....)

no clear specs, no clear answer by xgeorgio · 2014-01-09 08:31 · Score: 1

If your algorithm is purely arithmetic, then translate it to primitives (+,-,*,/) and estimate complexity based on simple full adders and flip-flops (bit level). Note that this is a very rough estimation and does not apply easily to long, data-oriented code, since in that case your interest is with the data storage, not the operations on them (imagine adding +1 to a billion-billion-cell vector of counters).

If your algorithm is mixed-form, then you must know your hardware capabilities and, preferably, its firmware. If you can transform your flowchart (low level) design to assembly code, then you can look up the necessary opcodes in some standard IC and estimate (again, roughly) the order of the required IC in your case. For example, if you want to sort some 100-element vector of 16-bit integers, then a few generic x86 opcodes are enough - therefore, even 8086 (or a fraction of it) will do the job.

Generally speaking, the algorithm-to-gates estimation works only on very primitive or streamlined procedures, mostly arithmetic. That is, only when we are speaking about DSP (not CPU) implementations, like for GPUs. In almost any other case, most IC circuitry comes from the corresponding memory/heap modules, I/O, registers, etc, as the "algorithm" will require much more than a simple processing unit.

--
"Abashed the Devil stood, and felt how awful goodness is..."

Have you looked at SystemC by Anonymous Coward · 2014-01-09 10:04 · Score: 0

http://en.wikipedia.org/wiki/SystemC

Cheers,

dan@3-e.net

FPGA by echen1024 · 2014-01-09 14:55 · Score: 1

I would advise them to first try on an FPGA (Field programmable gate array), and just write the program in Verilog, see how many gates it needs, and then simply select an FPGA from Altera or Xilinx that fits your needs. No need for a full blown ASIC.

Estimate with static compilation on a simple ISA by Anonymous Coward · 2014-01-09 16:13 · Score: 0

From what I understood, they are willing to implement your algorithm on the hardware. So, they just need an estimate on the complexity of the hardware. You can cross-compile your code for a RISC-like ISA as a stand-alone executable (mind the ISA, x86 has too much in it). The number of instructions should correlate with the complexity of the implementation of the hardware. In the end, every instruction is a piece of hardware executed sequentially through a control mechanism.
E.g.: (Static MIPS compilation of your application - A) / (static MIPS compilation of X - B) =~ (hardware of your application - C) / (hardware of X - D)
You can use a few (5 min) applications (like fft, jpeg decoder, e.g.) in place of X to find A, B, C, D and see how it matches. You can use simple regression, too, if you have many data points. Getting many data points should not be very hard as there are many applications implemented in both worlds.
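
The regression idea can be sketched in a few lines. Everything below is illustrative: the function names are invented, and the sample pairs are placeholder numbers, not measurements of any real design. It fits a single slope through the origin, i.e. it assumes gates scale proportionally with static instruction count, which is the premise of the post and may well not hold.

```python
def fit_through_origin(samples):
    """Least-squares slope k for gates ~ k * instructions."""
    num = sum(instr * gates for instr, gates in samples)
    den = sum(instr * instr for instr, _ in samples)
    return num / den


def estimate_gates(instruction_count, samples):
    """Scale a new program's static instruction count by the fitted slope."""
    return fit_through_origin(samples) * instruction_count


# Placeholder (static MIPS instructions, gate count) pairs for reference
# applications X that exist in both software and hardware form.
reference = [(1000, 20000), (2000, 40000)]
print(fit_through_origin(reference))
print(estimate_gates(1500, reference))
```

With real reference points, the scatter around the fitted line would show how much trust the estimate deserves.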

If they are serious.... by niftymitch · 2014-01-09 18:38 · Score: 1

If they are serious get some funding to start coding this in a hardware description language.

Note Well: this is a lot like asking how many x86 instructions a "C" program will take
without writing a "C" program. At best this gets you a starting answer.

If you tell the compiler to kill loop unrolling, code shrinks and might run slower.
If loops unroll, code grows but might run faster. With SIMD instructions the code
can shrink. Now ask if the x86 answer is the same answer you get on an ARM
and a MIPS processor. The other thing to know is that data path widths have a large
impact -- wider is faster but uses more gates -- too wide is slow -- too narrow is
slow.

Invest a couple grand of their money on some large FPGA development kits and
go to work. For the most part graphics hardware is tightly coupled stripped down
common processors and state machines setup to solve specific display problems.

One positive place to work is in the world of CUDA on graphics cards.

CAUTION.... the field is full of patents and going fast on CUDA is dancing with
a hungry bear... Any hardware you build to the same end will likely trip on patents
that others have.

Well-written hardware descriptions read a lot like any programming language.
With a second beer in hand you can read down from an X-windows program
all the way down to gates and other hardware library stuff and hardly see a
speed bump.

Going fast in hardware requires clever minds....

And if you cannot build OpenGL and WindowZ graphics on top
of your "C" proof of concept you have a lot of work to do.

--
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.

Too high up by AvailableNickname · 2014-01-10 00:24 · Score: 1

I think that C/C++ is much too high a level to look at and say "This if statement requires n logic gates" because that if statement will be implemented in assembly differently on different systems. However, one can look at assembly code and say "Ahah, that is 2 logic gates, and there's three more and that's another 3." So I think you need to compile your code on a whole bunch of systems, compare the assembly, and use some kind of average of the results to get a rough estimate of the number of logic gates.
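
The "compile on several systems and average" step could be sketched like this in Python. It assumes GCC-style assembly listings (as produced by `gcc -S`) and uses a crude heuristic: skip blank lines, directives starting with a dot, labels ending with a colon, and comment lines. The function names are hypothetical. It only counts instructions; turning that count into gates is still an assumption you have to make separately.

```python
from statistics import mean

# Comment markers vary by assembler; this set is a guess covering common ones.
COMMENT_PREFIXES = ("#", "@", "//", ";")


def count_instructions(asm_text):
    """Count lines that look like instructions in a GCC-style listing."""
    count = 0
    for raw in asm_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank line
        if line.startswith(".") or line.endswith(":"):
            continue  # assembler directive or label
        if line.startswith(COMMENT_PREFIXES):
            continue  # comment line
        count += 1
    return count


def average_instruction_count(listings):
    """listings: dict mapping an ISA name to its assembly text."""
    if not listings:
        raise ValueError("need at least one listing")
    return mean([count_instructions(text) for text in listings.values()])
```

A label and an instruction on the same line would be miscounted by this heuristic, so check a few listings by eye first.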

You might want to first read up on what 'gates' are by Anonymous Coward · 2014-01-10 01:30 · Score: 0

Gates are quite basic (think and/or/xor/not with just a few inputs and one output).
Consider that arithmetic functions (binary add/multiply/increment/latch for numeric word-sized data) can add up to hundreds or thousands of 'gates'.
The fact that they are referring to 'gates' instead of offering already predesigned functional blocks is a bit worrying (google Programmable Array Logic).
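
To put rough numbers on "hundreds and thousands", here is a sketch in Python using textbook structures: a ripple-carry adder built from N full adders, and an unsigned N x N array multiplier built from N*N AND gates, N*(N-2) full adders and N half adders. The per-cell gate counts (5 for a full adder, 2 for a half adder) assume simple two-input gates; a real cell library would give different figures. Function names are hypothetical.

```python
FULL_ADDER_GATES = 5  # 2 XOR + 2 AND + 1 OR, two-input gates assumed
HALF_ADDER_GATES = 2  # 1 XOR + 1 AND


def ripple_carry_adder_gates(bits):
    """Gate count for an adder made of `bits` chained full adders."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return FULL_ADDER_GATES * bits


def array_multiplier_gates(bits):
    """Gate count for a textbook unsigned bits x bits array multiplier."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    and_gates = bits * bits          # one AND per partial-product bit
    full_adders = bits * (bits - 2)
    half_adders = bits
    return (and_gates
            + FULL_ADDER_GATES * full_adders
            + HALF_ADDER_GATES * half_adders)
```

Under these assumptions a 32-bit add is on the order of a couple hundred gates and a 32 x 32 multiply is several thousand, before any registers or control logic.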

Single chip microcontroller by Anonymous Coward · 2014-01-10 08:39 · Score: 0

It might be a lot easier to use a single-chip microcontroller. I used to use 8048 and 8051 chips, 40-pin cases with built-in memory, ROM and I/O. You can even get ones with EPROM to flash your program in.

It all depends on the speed you need, and if you can learn the language to program it. But it's a lot better than designing gate logic...

Re:Since nobody else here is prividing much help.. by patrick.clemins · 2014-01-10 12:46 · Score: 1

Wierdy's last suggestion is my personal favorite. It's really a sliding scale between software and hardware anyway. Does putting a Linux ROM with your algorithm set to autoload as a startup daemon in an x86 machine count as hardware or software? Embedded applications often have something resembling an OS, if not a full-blown OS, managing resources. Unless your algorithm is super simple, or this electronics manufacturer is a glutton for punishment, I'd put your algorithm on a ROM alongside some DSP or other processing core and call it a day. Another option to explore that's between the two (all gates and ROM/processor combo) is a PAL/GAL... but it will certainly take some mental gymnastics to get your genetic algorithm into a form appropriate for burning the PAL. Good luck!

365 comments