TIOBE's Language-Popularity Index Sees A New Top 10 Language: Assembly (tiobe.com)
TIOBE's "Programming Community Index" measures the popularity of languages by the number of skilled engineers, courses, and third-party vendors. Their July report indicates that Assembly has become one of the 10 most popular languages:
It might come as a surprise that the lowest level programming language that exists has re-entered the TIOBE index top 10. Why would anyone write code at such a low level, being far less productive if compared to using any other programming language and being vulnerable to all kinds of programming mistakes? The only reasonable explanation for this is that the number of very small devices that are only able to run assembly code is increasing. Even your toothbrush or coffee machine are running assembly code nowadays. Another reason for adoption is performance. If performance is key, nobody can beat assembly code.
The report also noted that CFML (ColdFusion) jumped from #102 to #66, Maple from #94 to #74, and Tcl from #65 to #48. But Java still remains the #1 most-popular language, with C and C++ still holding the #2 and #3 positions. Over the last five years, C# and Python have risen into the #4 and #5 spots (made possible by PHP's drop to the #6 position) while JavaScript now holds the #7 position (up from #9 in 2011). Visual Basic .NET came in at #8, and Perl at #9.
The only reasonable explanation for this is that the number of very small devices that are only able to run assembly code is increasing. Even your toothbrush or coffee machine are running assembly code nowadays.
Wait, my toothbrush or coffee machine is capable of running assembly code? Normally, they only run machine code that's been assembled from assembly code. I got my toothbrush for ten bucks with a charger, that's awesome! What's in there, an assembler or an interpreter? Either way, it must have a seriously overkill microcontroller.
These ratings are based on search results. Pardon my skepticism. From what I can tell, more and more microcontrollers are being programmed in C these days.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I've been involved in firmware development for implantable medical devices as well as other devices, and it's simply not true that assembly has much use in this day and age. Unless you are coding for one of the small-memory-footprint AVR or PIC devices, you are not going to get better results working in assembly.
If you want speed, assembly is the ONLY option.
But virtually all of the optimization happens in virtually none of the code. I always forget the percentages, but they are probably made up anyway. You will need some ASM. I think it's more credible that more people are writing some ASM for some parts of some projects. It's not unreasonable to imagine that the ongoing proliferation of embedded doodads would spur that on, but it's a stretch to imagine that it's for devices for which there is nothing but an assembler.
They are machine dependent.
X86, ARM, AVR, IBM360, PDP8, or what? Just saying 'assembly' is not all that interesting. Processor architecture(s) would be interesting to know.
I love assembler programming - in fact it was the second language I learnt after BASIC. If I could get a job in assembly* I would be in heaven. Unfortunately I've seen almost exactly zero evidence that this "trend" is real.
*x86 or Z80 ideally, but I'm not fussy, and new processors aren't hard to pick up.
... or are they mixing assembly up with WebAssembly?
That would explain a thing or two.
We suffer more in our imagination than in reality. - Seneca
"Why would anyone write code at such a low level, being far less productive if compared to using any other programming language and being vulnerable to all kinds of programming mistakes?"
...More productivity down the drain.
A) Why don't you ask them and learn?
B) I politely disagree that they are automatically "far less productive". I am an embedded programmer and have only used tiny amounts of assembly over the last decade. However, if I had more tiny projects, and if my bosses weren't like cats chasing flashlight spots, so that I could stick with the same processor for more than a year, I'd consider it for sure. Why? Because I never seem to get to just "code and go" anymore. As an example, last week I had to reinstall my multi-gigabyte Eclipse IDE for the SECOND time this past year (this time due to a debugger corruption). In such IDEs, all the higher-level libraries (and their paths) need to be in place, and compiled too (which doesn't always go perfectly). Whereas my former officemate would open any text editor (his was Corel WordPerfect(!)) to write his assembly, then assemble it on the command line, then upload the binary.
I don't know how many hours I've spent learning and fixing IDEs. Then, a year or two later, the IDE changes again after the chip's OEM "upgraded" it!
Yeah, right ... NOT!
"Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
There's more than one reason to want optimization. There's optimizing for speed across a full algorithm, in which case assembler isn't that important. But optimizing for speed in localized hot spots can be very important. For example, the faster your interrupt handlers are, the better the I/O throughput you can get, the faster your context switches, etc. If you're programming on a DSP, for instance, you almost always want the best speed, and that often means assembler, or assembler wrapped inside macros or special directives. There's also optimization for size, and occasionally assembler helps there as well, to cram as much as you can into the limited space.
And of course you need to *know* assembler even if you don't write it. It's how you decode core dumps and figure out what your code is doing, and it lets you treat the machine as more than a black box (I've seen people with efficient algorithms that weren't so fast because they didn't understand what was fast or slow under the hood).
Here's an electric toothbrush reference design from Texas Instruments.
Here's the MSP430G22x0 microcontroller used in the design.
Here's a list of software tools for that microcontroller. The list includes something called "GCC", which they say is an "Open Source Compiler for MSP Microcontrollers".
Here's a page from Renesas about electric toothbrush designs.
Here's a list of software tools for Renesas processors; they list C compilers for the R8C and RL78 microcontrollers, as mentioned in the previous page.
So don't assume all the code in your toothbrush was written in assembler language; some of it may have been written in C, although some of the low-level library routines might be written in assembler (or as inline asm in the C code).
That hasn't really been true for a long time, unless your hand-written assembly code will reliably outperform a good compiler's generated code.
Hint: It's a reasonably safe bet that it won't, unless you actually are a world-class expert on the subject, such as the people who write those compilers.
Modern CPUs are not the chips your grandpa programmed. They are full of caches, pipelines, predictive execution, parallel operations, and numerous other confounding factors that mean what you think will run fast and what will actually run fast may be two wildly different things, even in apparently simple cases.
Whole teams of very smart people spend years reading and understanding thousands of pages of CPU specs so they can write compilers that analyse and optimize code at the speed of those modern CPUs to generate efficient machine code from it. Even if you can beat them on implementing one simple algorithm in isolation because you can spot something the compiler missed, modern high level languages encode so much more programmer intent than raw assembly that you might still lose out overall because you're disrupting larger-scale optimisations.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
I write compilers for a living, and I came in to say something much like the above.
I have direct experience recoding moderately sized C++ functions in hand-coded assembly. I have been able to beat the C++ compiler, but... not easily, and not without major elbow grease and lots of my time. It's most often not an easy thing to do, because the compiler has sophisticated scheduling and optimization algorithms, and because of the hardware complexity the parent post is talking about. Most programmers weaned on Java don't quite grasp the performance complexity that exists in modern CPUs.
Does it ever make sense to write asm directly? Once in a while, yes, but in the vast majority of cases, optimization time is better spent at the C++ level improving the time complexity of algorithms that dominate the problem space, or perhaps recoding Java into C++. Outside of very specialized and rare circumstances, I don't find reasons to optimize below the C++ level.
There are highly constrained devices where directly coding asm is the way to go, but fewer and fewer all the time.
People suck at macro optimizations in assembly. If your assembly program is more than 25 instructions, you're probably not going to do a better job than a compiler.
If you're writing Intel code you'll probably see better speedup using C++17 parallel policies or some other parallelization framework, but sometimes it is necessary to optimize small segments of code. Premature optimization is the root of all evil, and effective use of templates and metaprogramming can go a long way without much effort. Embedded devices should be treated the same way. Write a high-level implementation in the highest-level language available to you and hand-optimize only when necessary.
I've compared C++ STL implementations of algorithms to C implementations, and my C++ almost always runs faster with significantly less code. Small programs that calculate trivial results are always faster in low-level languages, but real programs need complicated structures, and it's almost always infeasible to hand-optimize those effectively.
I wanted to take assembly language in college. The dean wanted to teach the course. But I was the only student who showed up for the class. One student doesn't prevent a class from being cancelled. Since this wasn't a required course for graduation, I couldn't take it as a special studies project.
In the day of viable superoptimizers or superoptimizer-generated peephole optimizers and viable evolutionary/exploratory/search-based compilers, you should need to know assembly even less than ever before. Remember, the machine can try out new things both faster than you and cheaper than you.
Ezekiel 23:20
One might argue that the C code was bad in the first place. Unless the templated C++ code did vast rearrangements of data structures (how?), DataDraw-like approaches are still likely to beat it. Having said that, neither C++ nor C really seems like a good vehicle for implementing the techniques that could search for the fastest machine code for a particular subtask.
There are a couple of reasons to optimize early.
Design of your program may change depending on maximum capability. It's better to change a design earlier in development.
Faster run times can improve testing and debug cycles.
Play Command HQ online
There are a lot of processors out there that aren't complicated. On an 8-bit microcontroller with an instruction set of only a few dozen instructions, it's not hard to beat compiled code.
No. Yes. No. Uh... that really depends.
On a modern desktop CPU, no chance. gcc optimizes in ways you wouldn't even dream of. Even Visual C++ produces decently optimized code. Usually, writing ASM yourself you can't come close to the optimization level that modern compilers achieve.
Except Atmel's very own compiler. That's maybe the WORST kind of rubbish I have ever seen. Here you can actually gain a LOT of performance and code size by hand-crafting ASM code. Which says something, considering that, unlike desktop PCs, AVR ICs are very limited in space and processing speed, and if there should be any compiler that takes optimization seriously, it's that one.
Fortunately gcc is quite capable of optimizing well even for AVR...
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Bullshit. A good C-compiler will do as well or better in most cases. For the rare case where it does not, embed assembly for the critical parts. You may also need embedded assembly in drivers, where hardware has to be accessed just right in order to work or work fast.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Counter-intuitive as it may seem at first, the smaller your instruction set, the higher the chance that the compiler is actually better at optimizing than you are, unless you have a LOT of experience writing and optimizing assembler code. Mostly because the smaller the set, the longer the code gets and the more you have to take into account certain side effects from instructions. I spent quite a few hours puzzling why the compiler would completely rearrange the instructions only to notice that it gained a bit of an advantage from creative pipeline restructuring and carefully manipulating certain jumps.
The only area where humans outperform compilers reliably is conditional jumps. Because humans can usually more readily tell from the intended program flow which branch is more likely to be executed and can optimize the programs in that way. Hand that information to the compiler and I hold any bet that it will outperform you any day.
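One way to "hand that information to the compiler" in C is a branch hint. The sketch below assumes GCC or Clang, where `__builtin_expect` is available as an extension; the `LIKELY`/`UNLIKELY` macro names and the `sum_valid_samples` function are made up for illustration, and whether the hint changes the generated code at all depends on the compiler, target, and optimization level.

```c
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension. On other compilers the
 * macros fall back to the plain condition, so the code still builds. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sum a buffer of samples in which negative
 * values mark rare error readings that should be skipped and counted. */
long sum_valid_samples(const int *samples, size_t count, size_t *error_count)
{
    long total = 0;
    size_t errors = 0;

    for (size_t i = 0; i < count; i++) {
        /* The programmer knows errors are rare; the hint says so. */
        if (UNLIKELY(samples[i] < 0)) {
            errors++;
            continue;
        }
        total += samples[i];
    }

    if (error_count != NULL)
        *error_count = errors;
    return total;
}
```

The hint does not change what the function computes, only which path the compiler may choose to lay out as the fall-through case.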
And there are more good reasons to not do it except in special cases. The primary one is that it makes coding much more expensive, mostly via increased effort in debugging.
There was a very good op-ed from the Usenet days which pointed out that an assembly programmer will always beat a high-level-language programmer on most performance metrics because a typical assembly programmer can use a compiler but a typical high-level-language programmer can't use an assembler.
An assembly programmer can use a profiler to work out where the problems are, inspect the output of the compiler, see what's wrong with it, and fix it using assembly. (Today we might include intrinsics as a potential fix, of course.) Assembly programmers write faster code because they can also use a compiler.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
If you do not know assembly, you cannot be a really good coder and you cannot even understand how common attacks on code work these days or why some things run much slower than others. That said, actually coding in assembly is something you only do when there are very good reasons to and mostly as assembly embedded in C.
If you are on an 8 Bit MCU, speed is not that critical, or you would use a larger and faster MCU.
They are full of caches, pipelines, predictive execution, parallel operations, and numerous other confounding factors
You are using all the same arguments C/C++ programmers used in 1995, but if this discussion weren't about assembly language and were instead a comparison of compilers from 1995 with compilers from 2015, you would be telling us how terrible those 1995 compilers were at optimization even then.
..and if you are "confounded" by caches, pipelines, superscalar techniques, and so on ("cause surprise or confusion in (someone), especially by acting against their expectations.") then it's because you are ignorant of the architecture. Your ignorance doesn't define reality.
..and notice how it's now 10% faster... or 20% faster... there is a range because it depends on which gcc compiler version is used (older versions produce better code.) If gcc has such a large variance in performance, if newer versions are slower, then what are we to think of the veracity of your claims? I'll tell you what: they aren't veracious claims. They are wishful thinking.
The fact is that it doesn't require an expert to beat a compiler at optimization. The fact is that your idea of an "average assembly language programmer" is actually a terrible assembly language programmer. When you don't consider someone that learned C/C++ a month ago an average C/C++ programmer, why do you then consider someone that's literally never actually taken assembly seriously as the benchmark average?
Recently a programmer undertook the task of converting the strongest open source chess engine (Stockfish) to 80x86 assembly language. He still has done no optimization. He has literally just done a straight, simple conversion and it's already 10%...20% faster and is now easily beating the C++ compiler version in tournaments.
"His name was James Damore."
No it's not. C is also an option, given a good optimising compiler.
All those moments will be lost in time, like tears in rain.
I write compilers for a living, and I came in to say something much like the above.
I apologise in advance for the following rant. It may not be fair to direct it at you personally, but I'm going to do it anyway.
There is a trend happening in modern C-family compilers where they have decided to go full language-lawyer on aliasing rules. It's getting much, much easier to write code which looks sane and has worked for 20 years, but compiles to code which subtly does what you don't expect on the most recent compilers.
Now this isn't entirely your fault, because C-family language standards are severely obsolete and kind of stupid in this respect. The aliasing rules date from an era when CPUs were much slower than they are today. I do a lot of scientific-type high performance code, and I don't think I've seen a performance issue in the last decade that can be blamed on pointer aliasing, even though the C-family committees still act as if this is the most insidious performance problem in the world. But I have been bitten several times by compilers generating code that may be technically correct (the best kind of correct!) but is clearly not what the programmer intended.
The standards are full of these nasal demons, where you shouldn't go there just in case the compiler decides in its wisdom to optimise that code in a certain way. Is it so much to ask that once in a while we might get a performance guarantee? Something which says "If you write the code this way, this will definitely happen". Hell, Scheme programmers get a guarantee that tail call optimisation always happens. We don't even get a guarantee on the empty base optimisation!
We may see more assembly in the years to come, not because it's faster, but because assembly has "do what I say" semantics in a way that C and C++ do not.
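For readers who haven't run into the aliasing rules, here is a minimal sketch of the kind of code the rant is about. It assumes a platform where `float` is a 32-bit IEEE 754 value; the function name `float_bits` is made up for illustration. The "looks sane" pointer cast appears only in a comment because it is undefined behaviour in standard C, while the `memcpy` version is well defined, and optimizing compilers commonly turn it into a plain register move.

```c
#include <stdint.h>
#include <string.h>

/* The version that "has worked for 20 years" reads a float object
 * through a uint32_t lvalue, which the aliasing rules do not permit:
 *
 *     uint32_t bits = *(uint32_t *)&value;   // undefined behaviour
 *
 * A compiler that assumes float and uint32_t objects never alias is
 * free to reorder or drop accesses around such a cast. */

/* Well-defined alternative: copy the object representation. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    /* Assumption: float and uint32_t have the same size on this target. */
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "float is assumed to be 32 bits wide");

    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On an IEEE 754 target, `float_bits(1.0f)` yields `0x3F800000`.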
Is that the CPU and instruction set we are all using, or is that some specious reference to a bygone era with no actual current value?
Spoken like a twat that hasn't got any metrics to back up his rhetoric.
Yes, knowing assembly and actually writing it for no good reason are two entirely different things. I was arguing against the latter.
3x faster than C, 10x faster than Java, many orders of magnitude more compact than either.
You don't use asm if you have choices, but if you have fixed hardware and still need more performance or more compact code there aren't any other good options.
Though Forth will give asm a run for its money in some circumstances - the problem with Forth is the bills from keeping the programmers in a suitable frame of mind are probably excessive ;)
You may be right for some VERY small and trivial applications, but sometime in the early 90's optimizing C compilers FAR outstripped hand-coded assembly for any larger project. These days it isn't even a contest. A good optimizing compiler like the Intel C/C++ compiler can crank out code that is anywhere from 3 to 10 times faster than what you can do by hand. I should know, I did PLENTY of Assembly, and worked with some LARGE assembly-language applications back in the 80's. ALL of them were totally rewritten in C before 1995, and I'm talking about RTOS kernels and stuff, things where one clock cycle matters. There may be some few very specific 'inner loop' interrupt handling logic and such that is still written in assembler, mostly because that sort of code is idiosyncratic and can't really be safely optimized, but we're talking a very few lines of code, maybe 500 out of 500k SLOC.
I can buy the idea that there are 'IoT' things out there with extremely minimal processing power, basically little 8-bit devices with a few K bytes of RAM and kHz CPU clocks that you really CANNOT code in an HLL. Of course the amount of code that you write for that thing is obviously also extremely minimal. We're talking "blast this fixed length 802.11 frame out there every 2 seconds with these 12 bytes holding the RAW bit values from the ADC and the 3 discretes to a broadcast address" kind of thing.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
I fully agree that writing assembler without a very good reason is not a good idea today.
Take a look at OpenSSL some day cowboy.
Generally the asm paths are at least 3x the speed of the code the best compilers can generate.
When the speedup from using asm on Itanium turned out to be 5x, that was when I realized it was doomed. The compilers were further from human performance than on the older architectures, and that wasn't going to improve.
I used to write assembler. But that was in the days of very limited machine memory, when saving a couple of bytes would matter. It was also in a day when compilers put out relatively inefficient code. Programming, say, I/O, at the assembly level would likely produce code orders of magnitude faster than compiled code that used repeated calls to I/O libraries.
But today there's plenty of memory in most computers, and compilers are really good. So I only see two reasons to write assembler. One is to deal with small "things" that may still have very limited memory and processor capability. The other is in analysis, such as analysis of malware, artful decompiling, and good old-fashioned hacking and cracking (by Good Guys (TM) only, of course). The latter may not be so much writing assembler (unless you're crafting binary patches) as in understanding assembler.
It's not unreasonable to imagine that the ongoing proliferation of embedded doodads would spur that on, but it's a stretch to imagine that it's for devices for which there is nothing but an assembler.
For systems using a 6502 family CPU, there is a C compiler. But it doesn't optimize much, and the 6502 architecture isn't well suited for efficient execution of C anyway. That's why even though a few modern-day NES games are written in C, most are written in assembly language.
What you say may be true of x86, x86-64, ARMv4-7, and AArch64, which get the most attention from compiler authors. On an 8-bit microcontroller, assembly tuned specifically for a particular ISA's available addressing modes can handily beat C++. Some ISAs, such as 6502, prefer a set of parallel arrays instead of an array of structures.
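To make the "parallel arrays instead of an array of structures" point concrete, here is a rough C sketch of the two layouts. The actor fields, `MAX_ACTORS`, and `move_actors_right` are invented for illustration. With parallel arrays, each field of element `i` lives at `base + i`, which maps directly onto the 6502's indexed addressing with an 8-bit index register (for example `LDA actor_x,X`); with an array of structures, the address is `base + i * sizeof(struct) + offset`, which costs extra arithmetic on a CPU with no multiply instruction.

```c
#include <stdint.h>

#define MAX_ACTORS 16

/* Array of structures: fields of one actor are adjacent in memory,
 * so reaching actors_aos[i].hp needs i * 3 + 2 to be computed. */
struct actor {
    uint8_t x;
    uint8_t y;
    uint8_t hp;
};
struct actor actors_aos[MAX_ACTORS];

/* Parallel arrays: one array per field, all indexed by the same i.
 * Each access is just base + i. */
uint8_t actor_x[MAX_ACTORS];
uint8_t actor_y[MAX_ACTORS];
uint8_t actor_hp[MAX_ACTORS];

/* Move the first `count` actors right by dx, wrapping at 256 as an
 * 8-bit add would on the target CPU. */
void move_actors_right(uint8_t count, uint8_t dx)
{
    for (uint8_t i = 0; i < count && i < MAX_ACTORS; i++)
        actor_x[i] = (uint8_t)(actor_x[i] + dx);
}
```

Both layouts hold the same data; the difference is only in how cheaply a small CPU can compute each address.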
If you are on an 8 Bit MCU, speed is not that critical, or you would use a larger and faster MCU.
When you want to add a new feature to hundreds of thousands or even millions of devices in the field, have fun recalling them to install "a larger and faster MCU."
the smaller your instruction set, the higher the chance that the compiler is actually better at optimizing than you are
Unless the authors of the only compiler for your ISA maintain it as a hobby and aren't interested enough in optimization to quit their day jobs. Case in point: cc65.
Say, when were we supposed to get unicode support?
As soon as it's possible to prevent all current and future directionality override control characters from fuccing with the layout.
Basically the code and vars won't fit in the chip I have at the moment without finding weird ways to save space. So there are times when changing something in assembler can save a couple hundred bytes. There's also parts of the code even more constrained in size, like a bootloader that has to fit in 4k, so optimizing there is worth it.
I wanna see a human beat any modern-day compiler at optimization.
asmfish
10% to 20% faster than the C++ version.
The authors of the C++ version have attributed the speed gains to better register usage. The author of this asm version says he hasn't actually started optimizing yet.
You C++ programmers are delusional.
FWIW, when I write assembly, I routinely beat GCC in execution speed by 30% without even trying.
And yes, it is possible to write modular, well-structured assembly language programs. Stick to calling conventions, make local variables truly local, implement objects when necessary. And document everything. Writing in assembly no more removes the need for good design than writing in C++; though I must admit, I've seen many more poorly architected C++ and Java programs than assembly programs.
Assembly teaches you to think about program structure and efficient use of resources. To write maintainable assembly programs, you MUST think about architecture. You have to be able to manage both high level and low level details *in the same context*. What hamstrings a lot of HLL programmers is they believe that HLL programs obviate the necessity of good design, and go on to write horribly unmaintainable code because HLLs encourage both laziness of thought and practice.
Assembly requires discipline that, once acquired, will make one a good programmer in any language. The same is not true of C++ or Java, where many a programmer spends years writing awful code before really learning how to use the language well. And some coast by without ever learning how the underlying mechanisms are implemented, that memory is not -GASP!- unlimited, that CPU time does matter, and that the problems of resource allocation and release do not simply "go away" because the compiler implemented new (think about what happens to open sockets, queues, and the like when the destructor is called).
I think they're talking about projects whose source code is in assembly language, where a work's "source code" is defined as "the preferred form of the work for making modifications to it" (GPLv2, GPLv3). That is, something like the Pently audio player (which uses a preprocessor written in Python but is otherwise in 6502 assembly) or the video games RHDE: Furniture Fight and Nova the Squirrel.
Spoken like a twat that hasn't got any metrics to back up his rhetoric.
You are right. I don't have metrics. I just have binaries.
C++ version of stockfish
80x86 asm version of stockfish
One of these is faster. I'll let you decide what "metric" to use.
And this guy specializes in testing the strength of stockfish as it evolves:
http://spcc.beepworld.de/
Note how he included a June build of asmfish, and that asmfish is leading even against July builds of stockfish.
Which "metric" should we use?
Nodes per second? asmfish is faster than stockfish.
Time to depth? asmfish is better than stockfish.
Playing strength? asmfish is better than stockfish.
I wonder what metric this guy is going to want to use that retains his wishful thinking of C++ supremacy. Maybe the number of lines of code?
I have a hard time understanding the benefit of restoring the stack after the return (in the caller) rather than during or before the return (in the callee). When you do it the first way, you have to write stack-resetting code at every call site instead of just once. My best guess is that it has some benefit for procedures that accept a variable number of parameters. The former was called "c" passing and the latter "pascal". Somehow the pascal term got changed to system, if I'm reading the #defs correctly. Maybe with a few underscores thrown in. Then there's the left-to-right/right-to-left mess.
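The guess about variable parameter counts is the usual explanation. A small sketch in standard C, with a made-up function name `sum_ints`: the callee below is compiled once, yet each call site passes a different number of arguments. Under a stack-based convention, a callee that cleans up its own arguments has to pop a fixed amount, so only the caller, which knows how much it pushed, can restore the stack correctly for a function like this.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * The function body has no compile-time knowledge of how many
 * arguments any particular caller supplied; it trusts `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` passes four arguments while `sum_ints(0)` passes one, and each call site is responsible for whatever cleanup its own argument list requires.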
The smallest device I have written code for is a PIC with 512 bytes of RAM and 256 bytes of ROM. It had a C compiler. It is also lacking in connectivity for making trendy IoT devices. So what are all these devices that can only run assembly code?
I think a more likely explanation is that fad languages come and go, and now that globalization has driven the value out of programming, and kids are leaving the industry, they are mostly going, leaving only the languages that have stood the test of time behind.
A lot of comments talking about the speed of assembly, but if you enjoy puzzles like sudoku, coding up a small executable in assembly can be much more challenging and rewarding. You have to work with a large number of variables in your head, keep track of memory locations, remember how the system calls work, etc. Not great if you're just trying to get things done, but it's definitely a good mental exercise.
01010011 01110100 01110010 01100001 01101001 01100111 01101000 01110100 00100000 01110101 01110000 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00100000 01101001 01110011 00100000 01110111 01101000 01100101 01110010 01100101 00100000 01110100 01101000 01100101 00100000 01100001 01100011 01110100 01101001 01101111 01101110 00100000 01101001 01110011 00101110 00100000 01000101 01110110 01100101 01110010 01111001 01110100 01101000 01101001 01101110 01100111 00100000 01100101 01101100 01110011 01100101 00100000 01101001 01110011 00100000 01110011 01100001 01101100 01100001 01100100 00100000 01100100 01110010 01100101 01110011 01110011 01101001 01101110 01100111 00101110 00101110 00101110
The Slashdot filtering system prevents posting this without ASCII fluff that you are free to ignore...
“He’s not deformed, he’s just drunk!”
Then there's also the popularity of retro computers and consoles. There is an increasing number of indie/amateur developers developing for these machines as a hobby.
During the last few months I have been uploading my Z80 assembly code to GitHub myself (a game for MSX computers), and many other developers are doing the same.
My site
GCC has never been a very useful benchmark for compiler performance in my experience. Even a decade ago, when I used to work on a lot of very performance-focused mathematical code that was compiled on many different platforms, Visual C++ would generate code that often ran twice as fast as anything g++ would produce on Windows, and if you looked at what the Intel C++ compiler could do it was much better still.
It's also worth remembering that C++ is still quite low level on the programming language spectrum, and in terms of expressive power and showing programmer intent, it isn't exactly the most promising choice; potential aliasing alone prevents all kinds of otherwise useful optimisations, for example. There are much more powerful optimisation possibilities for higher level languages with more expressive semantics, some of which will already dramatically outperform a naive assembly equivalent with today's real world compilers. As CPUs get more complicated and compilers for higher level languages get better, the scales are only going to tip further in that direction.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Counter-intuitive? Maybe, but what you are saying here is largely the argument used 50+ years ago in developing RISC (Reduced Instruction Set Computers), the technology which led to the first so-called supercomputers capable of sustained operation at megaflop+ speeds. Although I have not kept up with today's chip architectures all that carefully, I am sure that many of the characteristics of RISC continue to be present.
A modern CPU does not necessarily mean a big, complex processor. In embedded sometimes very small processors with simple instruction sets are used to save power and money. Sometimes a simple analog circuit can be done more simply in digital, but you may not need more than 6 bytes of RAM or any more instructions than or, not, shl, mul. If that is the extent of what your CPU does, why would anyone bother writing an optimizing compiler for it?
I think most compilers switch case statements from else-if chains to jump tables past about 5-6 cases.
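For anyone who hasn't looked at the generated code, this is roughly what the two shapes look like when written out by hand in C. The opcode names are hypothetical, and the actual threshold at which a compiler picks a table is compiler- and target-dependent, so treat this as a sketch of the idea rather than what any particular compiler emits.

```c
/* Hypothetical opcodes, dense and starting at zero so they can index a table. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_AND, OP_OR, OP_XOR, OP_COUNT };

/* A dense switch like this is a typical candidate for a compiler-generated
   jump table instead of a chain of compare-and-branch instructions. */
int eval_switch(enum op o, int a, int b)
{
    switch (o) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_AND: return a & b;
    case OP_OR:  return a | b;
    case OP_XOR: return a ^ b;
    default:     return 0;
    }
}

static int do_add(int a, int b) { return a + b; }
static int do_sub(int a, int b) { return a - b; }
static int do_mul(int a, int b) { return a * b; }
static int do_and(int a, int b) { return a & b; }
static int do_or(int a, int b)  { return a | b; }
static int do_xor(int a, int b) { return a ^ b; }

typedef int (*binop)(int, int);

/* The same dispatch written as an explicit table: one bounds check,
   one indexed load, one indirect call. */
static const binop op_table[OP_COUNT] = {
    do_add, do_sub, do_mul, do_and, do_or, do_xor
};

int eval_table(enum op o, int a, int b)
{
    if ((unsigned)o >= (unsigned)OP_COUNT)
        return 0;
    return op_table[o](a, b);
}
```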
In 2008, there were some 30 embedded microprocessors per person in developed countries (PDF). Most of those don't have plenty of memory. There is a whole entire world out there that you are likely unaware of.
In embedded sometimes very small processors with simple instruction sets are used to save power and money.
Sometimes, yes. As you say, in addition to assembly programming being simpler on those platforms, compilers are often much less sophisticated in their optimisations as well, so going with assembly is a double-win.
That said, in an era when you can build a complete Raspberry Pi with a decent ARM chip for a few dollars and when mainstream CPUs are a lot more power-efficient than they were even just a few years ago, using very low-end CPUs is a lot less attractive than it used to be even in the embedded space.
There are probably hundreds of different CPUs out there, many for which specs have never been published. Everyone here has a PC-class mentality, but in the real world there are dozens of very tiny and primitive processors doing simple tasks that you never see. Decoding a front panel on a TV, for instance. It may be less expensive to create your own proprietary CPU and have three million made at $0.04 each than to add the circuitry necessary for the main processor in your TV to handle it. Nobody is going to bother writing an optimizing C compiler for a processor that has six instructions and 16 bytes of memory. I've worked on many similar systems that would not support most of the C language.
There are a lot of cases where the compiler will pick it wrong, especially in the embedded world.
Maybe the compiler assumes the code runs from RAM although it actually runs from flash, where a different code sequence would be a lot faster.
Then there are cases which the compiler cannot even handle, like cache cleanup and memory barriers: there is no way the compiler can know the peculiarities of your (custom) system. Same with task swap and atomic operations. You might be able to write those with C intrinsics, but even then you must know what code the compiler will create (e.g. to keep the optimizer from reordering operations across a memory barrier, which is far from trivial).
The compiler support for those cases hasn't really improved in the last ten years or so. Actually, due to new aggressive optimizations, it could be said it is worse.
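As a point of reference for the reordering problem, here is a minimal sketch using the C11 `<stdatomic.h>` fences. The names `payload`, `ready`, `publish` and `try_consume` are invented for this example. It only covers ordering between ordinary memory accesses; cache maintenance and device-specific barriers on a custom system, as described above, are outside what the language standard offers and would still need intrinsics or assembly.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data handed to another thread or context */
static atomic_bool ready;  /* flag that publishes the payload */

void publish(int value)
{
    payload = value;
    /* Release fence: neither the compiler nor the CPU may move the
       payload store below the flag store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload load may not be hoisted above the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences the optimizer is free to reorder the plain accesses around the flag, which is the kind of surprise the parent is talking about.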
People suck at macro optimizations in assembly. If your assembly program is more than 25 instructions, you're probably not going to do a better job than a compiler.
Good assembly language programmers are not trying to out-optimize the compiler at instruction scheduling. Good assembly language programmers are leveraging information that can not be communicated to the compiler.
In the day of viable superoptimizers or superoptimizer-generated peephole optimizers and viable evolutionary/exploratory/search-based compilers, you should need to know assembly even less than ever before. Remember, the machine can try out new things both faster than you and cheaper than you.
Contrary to popular myth, C/C++ code can be written in ways that favor one architecture over another. One reason to understand the architecture and its assembly language is to write better C/C++ code for your target.
And then there is debugging.
My 300Euro coffee machine runs no code at all.
Neither does my plastic tooth brush.
And by the way: Assembly is not a language. 68k assembly might be one assembly language, x86 another one. The two do not have much in common.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Why can't hints be added to compiled languages to indicate the most common route? For example, special comments? Oracle SQL has a feature where a plus sign in comments indicates it's an optimization hint: /*+ ... */
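Something like this does exist in some compilers. A minimal sketch, assuming GCC or Clang, whose `__builtin_expect` built-in lets the programmer mark the expected outcome of a condition. The `LIKELY`/`UNLIKELY` macro names and the `sum_valid` function are made up for this example, and on other compilers the macros fall back to doing nothing.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)   /* no hint available: plain condition */
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative samples. Negative values are assumed to be rare
   error markers, so that branch is hinted as the uncommon route. */
long sum_valid(const int *samples, size_t n, size_t *errors)
{
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(samples[i] < 0)) {
            (*errors)++;
            continue;
        }
        total += samples[i];
    }
    return total;
}
```

The hint only affects code layout and branch arrangement, never the result, so a wrong guess costs speed but not correctness.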
Table-ized A.I.
Where the good assembly language programmer beats the compiler is usually not in the instruction scheduling. It is more typically leveraging knowledge that can not be communicated to the compiler about the desired implementation. High level programmers are constrained by their respective languages. The assembly language programmer is not so constrained.
Furthermore the architectural differences are sometimes a bogus complaint. For example where I last wrote non-trivial amounts of assembly was to get a few more fps out of a couple of video games. This was only necessary for low end machines, not the most recent architecture. If my optimizations were not needed on the more recent architectures that was not a problem, those systems were just fine performance wise. FYI I benchmarked this assembly code over several generations of x86 architecture over the years. While the performance benefits of assembly decreased over the generations it was always still a win, I did not have to update it. However when the original architecture that needed the assembly was no longer a supported target I turned off the assembly and let the C/C++ implementation get compiled so that maintenance would be easier (i.e. I would not be needed).
even in the embedded space.
Completely application dependent. What if your requirements are to operate 6 months on a single AA battery and the whole application is to shift out a 16-byte sequence? There are about 20-50x more systems with requirements like this in the world than those that would require an Atmel or ARM chip. I worked on a system once that had such requirements because it was cheaper than a couple of TTL parts. This was a tiny subsystem on a larger computer board with real processors and RAM.
And remember, 15 cents spread out over 30 million widgets adds up to about $4.5 million.
One benefit of hard resetting a stack, register or even a variable after each use is that it is a step towards making your code more deterministic, by setting it to a known safe state.
For some of the driver code I develop, the ability to do that in a strict manner, without a compiler overriding my design, is why I use some assembler.
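For comparison, the usual way to get part of that guarantee from C is to write through a `volatile` pointer, because an ordinary clearing loop or `memset` on a buffer that is never read again can be removed as a dead store. A minimal sketch with a made-up function name, assuming the common (and widely relied upon) compiler behaviour that volatile-qualified stores are never elided:

```c
#include <stddef.h>

/* Resets a buffer to a known state. Stores through a volatile-qualified
   pointer count as observable side effects, so the optimizer is not
   supposed to drop them even if the buffer is never read afterwards. */
void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

This still gives no control over registers or stack layout, which is where the parent's assembler comes in.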
I'm not surprised that compiler technology never caught up with Itanium. The volume was never there to allow the kind of investment that made the x86 compilers so amazing.
Assembly weeds out the less competent programmers early after they realize they simply aren't up to the task. In other languages they can stick around for longer and might eventually learn. So what is to be preferred: that they learn eventually, or simply leave the field?
As for what happens 'when the destructor is called', it will clean up the sockets and queues and the like; that's what they are for. 'delete' is not just for deallocating memory, it is for deallocating all controlled resources. It's possibly the most fundamental building block of C++; I'm a little surprised you don't know this considering how much of an opinion you seem to have on higher level languages...
Perhaps TIOBE is mistaking the interest (6000+ stars) in the full assembly code for the Apollo 11 guidance computer, which was put up on GitHub, for a revival of the language instead of interest in a historical artefact.
Check it out here and don't forget to star it. Together we can confuse TIOBE and make Assembly the #1 language of the world again! ;-)
https://github.com/chrislgarry...
First of all, many, if not most, computers run software so trivial, on microcontrollers so small, that setting up a compiler for them would just be _way_ more effort than simply coding the program in assembly. Typical examples are the microcontrollers in electric toothbrushes or other smaller embedded systems. It's hard to get your software running on a 4-bit microcontroller in C, as modern C makes assumptions like having your memory addressable at a byte level.
Then there is another, in my opinion more important, point. Assembler is a great teaching aid. It shows you what the computer actually does. Understanding things like pointers is trivial in Assembler so you can learn a lot from it. Also in Assembler every control structure hurts, as you need to keep track of it yourself. This nudges you towards writing simpler code, away from thousands of nested if statements and functions with hundreds of lines. Those are desirable traits in all programming languages.
No language is suitable for everything, but most languages have at least one area where they are really useful.
You seem to be experienced in being a twat without anything to back up his rant...
The only area where humans outperform compilers reliably is conditional jumps.
The problem here is that in software a conditional jump IS what makes it useful. Software will be full of it and this is exactly the kind of thing that can lead someone to out-optimising a compiler.
A compiler can generate perfect code for pretty much every bitwise operation. It can even be told to do it in the smallest size (prioritise looping) or fastest speed (prioritise linear execution). Beating a compiler in doing mathematics such as non binary division is also an incredible feat. But beating a compiler using a clever set of conditionals is actually quite trivial.
That said, on a simple system people don't realise how much control they have over the assembly generated. For instance, do-while and while loops generate different code. Vendors write whole manuals on how their compilers handle these scenarios, allowing you to optimise your assembly from your higher level language.
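A small illustration of the two loop forms, with hypothetical function names. On a simple compiler for a small target the `while` version typically costs an extra test (or jump) before the first iteration, while the `do-while` version has a single compare-and-branch at the bottom. What your compiler actually emits is an assumption to check in its listing output.

```c
#include <assert.h>
#include <stddef.h>

/* while: the condition is tested before the first iteration,
   so n == 0 is handled. */
unsigned sum_while(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    while (n > 0) {
        s += *p++;
        n--;
    }
    return s;
}

/* do-while: the body always runs once, so the caller must guarantee
   n >= 1. In exchange there is only one test, at the loop bottom. */
unsigned sum_do_while(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    assert(n >= 1);
    do {
        s += *p++;
    } while (--n != 0);
    return s;
}
```

The trade is the usual one: the programmer knows the buffer is never empty and says so by choosing the loop form, which is information the compiler could not have worked out on its own.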
That hasn't really been true for a long time, unless your hand-written assembly code will reliably outperform a good compiler's generated code.
Hint: It's a reasonably safe bet that it won't, unless you actually are a world-class expert on the subject, such as the people who write those compilers.
Compiler writers aren't generally skilled in assembly language programming.
Modern CPUs are not the chips your grandpa programmed. They are full of caches, pipelines, predictive execution, parallel operations, and numerous other confounding factors that mean what you think will run fast and what will actually run fast may be two wildly different things, even in apparently simple cases.
Have you ever written any assembly code? Because writing code for an OoO processor without obvious weaknesses (like the Pentium 4 had) is trivial.
Caches have to be taken into account as much as in high-level languages, pipelines aren't a problem, I don't know what you mean by predictive execution - if it's speculative execution one can generally ignore it altogether, parallel operations is another mystery - if you are referring to SIMD it's actually simpler to code in assembly.
Now if we were talking about a strongly skewed architecture like a non-interlocked VLIW your statement would be more correct _but_ not true. A skilled assembly coder would still generate better code than a compiler, albeit with a lot of effort.
Whole teams of very smart people spend years reading and understanding thousands of pages of CPU specs so they can write compilers that analyse and optimize code at the speed of those modern CPUs to generate efficient machine code from it. Even if you can beat them on implementing one simple algorithm in isolation because you can spot something the compiler missed, modern high level languages encode so much more programmer intent than raw assembly that you might still lose out overall because you're disrupting larger-scale optimisations.
The only advantages of high level languages are portability and faster programming. But the disadvantage is a worse interface to the actual execution hardware - and that is a huge disadvantage.
Yes it is. Even a skilled C/C++ programmer will not generally approach the performance of a skilled assembly language programmer, even when rewriting the C/C++ code to be closer to an optimized asm routine. Sometimes the differences are huge.
The C/C++ programmer can spend more time tweaking code, as less effort is required for those high-level languages compared to assembly language. In some cases that means the C/C++ code can be faster (by using a better algorithm) in practice.
There was a very good op-ed from the Usenet days which pointed out that an assembly programmer will always beat a high-level-language programmer on most performance metrics...
Dozens of posts so far and all missing the central point: in terms of determining popularity, TIOBE's methodology is junk. People search when they have questions. Sure, more people have questions if more people are using a given language, but this is trumped by obscurity and steep learning curve. Really, there is little in the technology world more obscure or harder to master than machine level coding. If a language is well structured then you just go straight to the web resources, the many excellent C++ sites for example. When nothing makes sense and you don't know where to start, you ask the search engine. Therefore poorly documented and really hard to use languages tend to look excessively "popular" to TIOBE. Nonsense.
Now, what really worries me... those people searching for answers about assembly language... they have no idea what they are in for, and what a hazard they are going to be if they actually try to apply the superficial level of knowledge you get from web answers to writing assembly level code of any complexity.
When all you have is a hammer, every problem starts to look like a thumb.
Depends on what you mean by 'reliable'.
I'm sure the numbers are correct. But they only track what languages people talk about.
While popularity is one reason for people to talk about a programming language, there are many others.
Newer languages will get more traffic than older ones since people have more questions about them.
More complicated languages will get more traffic because people have more problems with them.
Also, the granularity sometimes seems off. Assembler is a good example: there really isn't a single assembler language; there are many different ones. How useful is it to lump them all together?
So this index does provide some interesting data, but you have to be careful when trying to draw any conclusions.
COBAL battles the FORTRAN Godzilla in Silicon Valley..
Is COBAL a mix of COBOL and COMAL?
/. refugees on Usenet: news:comp.misc
Compiler writers aren't generally skilled in assembly language programming.
To be fair, neither are most people writing assembly language. Most people I've worked with in this area certainly didn't have an exhaustive knowledge of all the available opcodes and their detailed performance characteristics on every processor architecture we targeted. I'll be the first to admit that I didn't.
However, some of the best assembly programmers I've known have been working on compilers. It makes sense to hire those people to work on code generators used by millions rather than to hand craft assembly for individual projects.
Have you ever written any assembly code?
Yes, professionally, for some years, on several different architectures, and in performance-sensitive applications. These days I read assembly a lot more than I write it, for exactly the reasons we're discussing here.
Because writing code for an OoO processor without obvious weaknesses (like the Pentium 4 had) is trivial.
Perhaps, but Pentium 4 was hardly the same as today's major CPUs from the likes of Intel and ARM. It's not as if writing an XOR to zero a register is the most tricky thing to get right any more.
Caches have to be taken into account as much as in high-level languages, pipelines aren't a problem, I don't know what you mean by predictive execution - if it's speculative execution one can generally ignore it altogether, parallel operations is another mystery - if you are referring to SIMD it's actually simpler to code in assembly.
The thing is, though, that good compilers for high level languages can look at much larger areas of the code and systematically optimise them, thus making better use of finite resources like cache capacity.
Will your hypothetical assembly programmer consistently rewrite their functions to perform the appropriate level of inlining and fusion effects based on context? Modern compilers for some high-level languages already do.
Will your hypothetical assembly programmer rewrite their entire function a different way with more efficient register allocation if the underlying algorithm changes? (Granted this one is less relevant these days if we really are talking about a relatively high-end CPU with plenty of registers available anyway, but not all CPUs have that luxury.)
Will your hypothetical assembly programmer rewrite an entire hierarchy of functions a different way to avoid both blowing the FPU register stack and spilling unnecessarily?
Will your hypothetical assembly programmer detect implicit parallelism in their algorithms and restructure them to use parallel low-level operations (as in SIMD) and/or multiple threads for better performance? There's a lot of research going on in these kinds of areas for building better HLL compilers, too.
The only advantages of high level languages are portability and faster programming.
You missed the crucial ones that give compilers a natural advantage in the long term: they can analyse much larger parts of a program together, even a whole codebase; they can do it with extra semantic information about actual programmer intent to guide them; and they can do it consistently and quickly. Those compilers can apply optimisation and code generation strategies at much broader levels than any manual assembly coding ever could.
How can manual assembly hope to match the kinds of large-scale optimisations performed by modern HLL compilers across functions? You'd have to write a new unified assembly function for every combination you needed. Obviously that is possible in theory and anything a HLL compiler can do can be done in assembly as well, but the amount of work required is already prohibitive in some cases and will only become more so with time.
True enough, and I agree it could sometimes be relevant. But looking at it the other way around, a few million dollars spread out over 30 million widgets is only about 15 cents per widget, so unless your widgets are extremely price sensitive and your requirements extremely simple, this is unlikely to be a deciding factor.
I work with clients who develop various embedded systems, and I haven't actually seen a real world example of using a very simple, very cheap CPU in this way for probably more than a decade now. YMMV depending on your industry, of course.
Completely application dependent. What if your requirements are to operate 6 months on a single AA battery and the whole application is to shift out a 16-byte sequence? There are about 20-50x more systems with requirements like this in the world than those that would require an Atmel or ARM chip.
That's quite a strong claim to make without any supporting data. I do a fair bit of work in the embedded space, as do many of my colleagues, and the overwhelming majority of that work in recent years has been with more powerful CPUs. Obviously our experience might not be representative of the wider industry, but we'd have to be extreme outliers if your figure is correct there.
That is not to say there isn't also a need for much simpler devices. However, in those cases you're probably programming them in assembly because that's all you've got and your logic is almost trivial. You're probably not writing assembly just because it's faster, which was the original claim in this discussion.
I might believe you when I see a compiler that can do global compilation rather than rigidly adhering to some ABI spec, but until I do, I'm going to call bullshit on everything you say.
Good mainstream compilers have been doing that for well over a decade, and the best ones are much better at it than any human will ever be.
Do you mean using ambient volume as a trigger, with some means of gain control to make activation independent of overall playback volume? This would perform very poorly, as it would react to room noises and react to noises in the soundtrack that aren't in sync with the action.
Better performing options need an MCU. One is for the video to encode vibrator instructions in audio steganography, which would need a DSP to extract. Another, as used in Teddy Ruxpin toys, is to encode timing on a separately transmitted channel. For a vibrator, this would probably be RF of some sort, such as Bluetooth.
RISC made sense when silicon real estate was at a premium. Nowadays not so much, hence modern RISC chips have way more instructions than CISC CPUs. Also, complex instructions make more sense nowadays, because bandwidth is at a premium now.
"It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
The problem with that is that it makes an assumption about the processor's stack and how it works.
Not exactly. If your language needs to support cactus-stack-structured lexical closures anyway, it can be worth not implementing your activation stack in terms of the CPU's stack. Notably, Appel's original ML implementation didn't bother with a traditional stack and just allocated fragments on the heap.
ISO-C has a lot of behavior that is specified as being undefined or compiler-defined that you have to be aware of, but if you are your code will compile for more architectures than any other language out there, no matter how portable they claim that it is.
I see you haven't been bitten by this yet. Trust me, when your old code compiles on every architecture you can think of EXCEPT for x86-64, you'll get just a little bit upset too.
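The post above doesn't say what broke on x86-64. One common cause of that kind of failure (an assumption on my part, not the poster's stated bug) is code that stores a pointer in an `int` or `unsigned int`, which works on many 32-bit targets and silently truncates where pointers are 64 bits wide. A minimal sketch of the portable alternative using `uintptr_t` from `<stdint.h>`, with hypothetical helper names:

```c
#include <stdint.h>

/* Reports whether the old "pointer fits in an unsigned int" assumption
   holds on the current target. It does on many 32-bit ABIs and does not
   on typical x86-64 ABIs. */
int pointer_fits_in_uint(void)
{
    return sizeof(void *) <= sizeof(unsigned int);
}

/* uintptr_t is an integer type wide enough to hold an object pointer
   and give the same pointer back after the round trip. */
uintptr_t ptr_to_bits(const void *p)
{
    return (uintptr_t)p;
}

void *bits_to_ptr(uintptr_t bits)
{
    return (void *)bits;
}
```

`uintptr_t` is formally optional in ISO C, but mainstream hosted compilers provide it.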
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
to know it is a bullcrap index. 1986: Ada as #2. Really? Lisp as #3? Really? The world was COBOL and C back then with FORTRAN (caps then) and assembly thrown in. Was there hype around both Ada and Lisp? Yes, very much so. There wasn't even an official standard until the early 80s but Ada and Lisp were going to be "the future." Perhaps more believable is the 1991 #3 as that is when DOD briefly mandated Ada use.
To call either Ada or Lisp "popular" at any time is really a reach. Hype/astroturfing is not popularity. Even tracking "how do I learn ..." searches does not measure popularity. They are measures of interest. Popularity can only be based upon job requirements, number of projects, even (broadly) loc metrics. Popularity is reflected by use/consumption, not what is talked about.
popular: 1. liked, admired, or enjoyed by many people or by a particular person or group. 2. (of cultural activities or products) intended for or suited to the taste, understanding, or means of the general public rather than specialists or intellectuals.
Of def 2, substitute "general programmer" (ie, Java, C, C++) with 'specialists/intellectuals' being those involved with Haskell, D, etc.
You have to be able to manage both high level and low level details *in the same context*
I code in C#. While I don't code in ASM, I have respect for it: I did a lot of reading on it when I was around 7, and I have read about modern CPU architectures and the cycle latencies and throughputs of different instructions. I have also read about how C#'s GC works, how objects work (more than 128 bytes for an empty object), how interfaces work (method indirection can be O(1) when you have one implementing class, but O(N) when you have many and they're being used), how casting for inheritance works (casting child to parent is pretty much free, but casting parent to child is expensive, like hundreds of cycles), and how method parameters work (large structures or too many parameters cause an object to be allocated on the heap that holds the parameters; the same goes for method return types if they're too large).
I can many times jump into some other programmer's code and make it a few factors faster. In some cases, orders of magnitude. My pet peeve is that most people think of performance as an afterthought because "don't preemptively optimize", but they apply this to their architecture. A high performing architecture needs to be designed from the beginning, and this also requires being able to mix high and low level details, even before the low level details are fleshed out.
Why not for the pleasure of coding against the processor directly?
Joining this thread a bit late, but I thought I had to mention this: security hasn't yet been mentioned as a reason for using assembly. Let's say you need a hardened general computing device with a handful of basic apps running on it. Start with the device's BIOS, and then move on to HW components' firmware (e.g., SSD drive). Then audit the OS (e.g., minimalist version of a Linux distro), and finally begin coding the app(s) of interest with an assembler you trust (not trivial!). Huge, inefficient undertaking... realistically one within the domain of only gov't entities. But this is unfortunately what is needed in today's era of compromised BIOS, firmware, operating systems and even compilers that automatically insert "beacons" and countless other crap. In 1985 my father threw an x86 assembly book on my desk after I complained I couldn't run Sierra's King's Quest series on his rig optimized for AutoCAD with a Hercules monographics card (i.e., only CGA, EGA and VGA supported by Sierra). So I learned assembly, and it took me months to display my first pixel in "graphics" mode. Point being: going through this exercise, I was able to count and account for every byte in the .exe or .com executables.
If you do not know assembly, you cannot be a really good coder .Net.
That is bollocks.
Assembly only helps you in doing: assembly. On the machine you know how it works.
It does not help you at all to program Pascal, BASIC, Perl, Java or any
To learn why your Java program sucks you need to know big-O notation and the differences between algorithms and approaches, and: how the Java Virtual Machine works. Oh, just because it is virtual it is not real?
If you cannot grasp how a processor/computer works until you have learned how to program it in assembly, you lack serious abstraction skills. Probably the reason why you ditch high-level languages and believe knowing assembly makes you superior.
You're correct, C itself is simply a way to tell a compiler to generate SOME SORT of instructions that accomplish X. If X is possible in environment Y, then C COULD be used in that environment. Practically speaking you could take a compiler that targets your instruction set, use a libc version with that and a cstart.o with that which are tailored to the specific environment and do what you're suggesting.
Of course in some very resource constrained environments that MAY not be worth the bother, and optimizers are certainly less effective when they have very few options to choose between. It may also be that in many cases only a small subset of C constructs CAN be effectively translated to a really limited environment.
So, there's a SMALL set of cases where you may have no choice but hand assembling code. These are going to be extremely resource constrained. Things like embedded sensors that run on ambient power (thermal, vibration, etc) or the energy of a wi-fi signal they're picking up. This is where you get into nano-watt power budgets and 'processors' that have only a couple registers, a few dozen bytes of RAM, and a tiny amount of ROM. Usually their ONLY function is to do some Analog to Digital conversion and dump the results in a fixed format to a radio or other similar interface. This kind of 'code' is really just replacing a bunch of custom fixed-function electronics with a very basic programmability for design flexibility reasons. No real decisions are made in code, etc. These days anything that incorporates any significant control laws, etc is going to be on a control system that's talking to such sensors, and that will be in 99.99% of cases at least a 16-bit controller. Heck, embedded military systems all went 32-bit 20 years ago now. These things are all programmed in PL/I, PL/M, Ada, or some other HLL.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
Prominent 8-bit processors like the 8080/8086/8088 variations and the Z80 are complicated.
6502/65c816, 68008, 6809: these are easy, as they are much more orthogonal in design.
8086 is a very old processor.
A compiler can hardly do any optimizations on it.
Why you think that is also true for an i5 or i7 or SPARC or a MIPS is beyond me.
Yes, I am aware, these are the "small" things I mentioned in my post. Perhaps my wording de-emphasized them unintentionally.
For what it's worth, 30% on a modern x86 is noise. You'll get that difference from very small changes. We recently were writing some assembly code to benchmark the cost of atomic ops. A simple loop that did nothing took 2.5 times as long to execute as a loop that had an extra add in the loop - the simple version hit a pathological case in register rename, which we fixed by adding an instruction zeroing a register and marking it as dead. There was a paper at ASPLOS two years ago that showed that you get a 30% delta on most programs from randomised code layouts just from different cache interaction.
I am TheRaven on Soylent News
Resetting the stack is just an add to the stack pointer. If you're using any stack space in the callee, then you're doing that anyway and so it's just a different immediate value in the instruction. If you're calling anything else, then you're saving a frame pointer and so it's a cheap register-register move (basically free on a machine that does register renaming).
I'm sorry I wasn't clear. By "resetting the stack" I mean moving the stack pointer to a specific known point, as opposed to pushing and popping. The return has to modify the stack when it pops the return address to the calling program, so your statement about needing to adjust the stack pointer every time you call that function makes no sense: the return instruction has a parameter for adjusting the stack in addition to the amount necessary for the call. I touched on argument/parameter order, but I didn't link it strongly to the pascal keyword. I also think that the space saved by using an instruction once at the end of the procedure, as opposed to an instruction after each and every return from a procedure, is a particular reason to prefer it.
If the function is used from another compilation unit, then you have to do that anyway (or, at least, document the calling convention and make sure that every other call site uses it). If it isn't, then most compilers will mark it as not needing to follow the public ABI and will move to a different calling convention. Some (Pro64 and derivatives, and there's a GSoC project for LLVM this year) will do inter-procedural register allocation on these functions and try to define a custom calling convention where things that the callers want to keep live are callee-save. That's very hard to do manually for anything that has multiple callers (and if it only has one caller, any modern compiler will inline it and you'll have no call overhead).
They don't have plenty of memory in comparison to a modern laptop or phone. Most do have plenty of memory in comparison to desktops from the 80s that were able to run BASIC interpreters at a reasonable speed. A lot of them have plenty of memory in comparison to a '70s Xerox Alto that could run an entire GUI written in a garbage-collected pure OO language.
I'm not talking about "after each use". At least, in any sense that you aren't resetting it after each use either way. Just that some calling methods "reset" the stack on return by the called procedure, or on architectures without a return instruction that has a parameter that specifies how much to adjust the stack by, immediately before, or immediately after by the calling procedure. Having the calling procedure free the stack means that there is an extra instruction each and every time a call is made. The called procedure has to modify the stack anyways when the space of the address to return to is adjusted on the stack.
I'm not sure what degree you are agreeing with me that it makes sense for the callee to do the stack reset as part of the return as opposed to the caller doing it. I don't understand the need of a frame pointer, though I am aware of the "enter" "leave" instructions on the Intel architecture. I'm sorry if you misunderstood me. Sometimes I have difficulty being clear.
Leave the field. I have never met someone who "eventually learned". Programming seems to be bimodal. Either they can or they can't. There are those who can't and seem like they can, until you dig into their Rube Goldberg code only to realize it was working only by pure luck. These are the most dangerous "programmers".
One common characteristic that I've noticed in people I would consider "programmers" is that they can debug code without ever seeing the code or using a debugger. Computers are logical, and you can reverse-engineer their high-level designs from their operational characteristics; there are only so many reasons why certain errors can occur. Using nothing but logic should be enough to debug most problems.
You also don't need to know any assembly language if your computer doesn't have one. For example, the Oberon system is just Oberon.
Ezekiel 23:20
That is not the point.
And I'm pretty sure you can write Oberon modules in Assembler if you want.
Point is: being good or even knowledgeable in Assembly is completely irrelevant to the question of whether one is a good coder/programmer or not.
It gets more and more irrelevant when you consider massive parallel architecture etc.
The parent does not even know that old-school programmers in the 1950s (General Data comes to mind, or Honeywell and Bull, etc.) never really used Assembler. Basically all business applications were written in ad hoc invented 'business byte code'. A small virtual machine was implemented for every 'topic', and the problems and business cases of such topics were programmed using that virtual machine byte code.
Of course you used a macro assembler for that.
Think about Sweet16 but on a much higher level.
80x86 was what was written. Also known as x86 (which just removes the first two chars from 80x86). In most cases x86 is used to refer to the 32 bit extension (i386) or, like here, the 64 bit extension. That is also known as AMD64 (as AMD introduced the extension), Intel64 etc. and also x86-64 - which is used in the project in question.
https://github.com/tthsqe12/as...
Observe: "Welcome to the project of converting stockfish into x86-64".
So what are you talking about exactly?
Don't know.
What exactly are you talking about?
The original parent wrote 8086 ... at least I read it that way.
Regarding the project, it is unclear if he is converting the existing code routine by routine, or doing a complete rewrite.
Having a speed increase of 10-20 percent is impressive, but pretty meaningless for real-life software projects. You basically pay 10-100 times the price versus C/C++ to write the same code in assembly.
Compiler writers aren't generally skilled in assembly language programming.
To be fair, neither are most people writing assembly language. Most people I've worked with in this area certainly didn't have an exhaustive knowledge of all the available opcodes and their detailed performance characteristics on every processor architecture we targeted. I'll be the first to admit that I didn't.
However, some of the best assembly programmers I've known have been working on compilers. It makes sense to hire those people to work on code generators used by millions rather than to hand craft assembly for individual projects.
Have you ever written any assembly code?
Yes, professionally, for some years, on several different architectures, and in performance-sensitive applications. These days I read assembly a lot more than I write it, for exactly the reasons we're discussing here.
Because writing code for an OoO processor without obvious weaknesses (like the Pentium 4 had) is trivial.
Perhaps, but Pentium 4 was hardly the same as today's major CPUs from the likes of Intel and ARM. It's not as if writing an XOR to zero a register is the most tricky thing to get right any more.
Caches have to be taken into account as much as in high-level languages, pipelines aren't a problem, I don't know what you mean by predictive execution - if it's speculative execution one can generally ignore it altogether, parallel operations is another mystery - if you are referring to SIMD it's actually simpler to code in assembly.
The thing is, though, that good compilers for high level languages can look at much larger areas of the code and systematically optimise them, thus making better use of finite resources like cache capacity.
Will your hypothetical assembly programmer consistently rewrite their functions to perform the appropriate level of inlining and fusion effects based on context? Modern compilers for some high-level languages already do.
So we are discussing scientific code? General purpose code will not get huge advantages from advanced inlining etc. I don't know what you mean by fusion - the only thing that comes to mind is loop fusion.
And the answer is of course yes.
Will your hypothetical assembly programmer rewrite their entire function a different way with more efficient register allocation if the underlying algorithm changes? (Granted this one is less relevant these days if we really are talking about a relatively high-end CPU with plenty of registers available anyway, but not all CPUs have that luxury.)
I'd hope so, given the fact the code has to be rewritten. Register allocation is one area where a good compiler often is better than a skilled assembly programmer, as generating the perfect schedule is very work-intensive. In practice a skilled assembly programmer will not be far from optimal by simply using heuristics. Even a good compiler today (GCC, LLVM) can have problems with placing spill-fill of registers, and that interacts with the register allocator and a lot more.
Will your hypothetical assembly programmer rewrite an entire hierarchy of functions a different way to avoid both blowing the FPU register stack and spilling unnecessarily?
I don't think there would be a hierarchy in the optimized case. IME compilers are very bad at handling the register-stack hybrid, while assembly programmers are capable of handling it after a learning period. Intel even made the FXCH instruction faster so that compilers could treat the FPU stack as a kind of register file.
Or rather, that was the case: nobody who doesn't require extended-precision floats uses the x87 instruction set any more.
Will your hypothetical assembly programmer detect implicit parallelism in their algorithms and restructure them to use parallel low-level operations (as in SIMD) and/or m
"Premature optimization is the root of all evil." Said some sage. It was either Fred Brooks or Donald Knuth, I forget. Google it.
>> If you do not know assembly, you cannot be a really good coder
> That is bollocks.
Only shitty programmers are clueless about assembly, which in turn implies they lack an understanding of memory access patterns.
Hint, try *reading*: Pitfalls of Object Oriented Programming
Even Bjarne Stroustrup, the designer of C++, until 2012 was completely clueless _why_ doubly Linked Lists were so slow compared to Arrays
HINT: Managing the L1 Cache usage is critical for performance sensitive code.
Programmers concerned about speed use Data-Orientated Design. For details see CppCon 2014: Mike Acton "Data-Oriented Design and C++"
Knowing when to use, and NOT to use OOP, makes a programmer better. Using design patterns without *thinking* shows others you don't understand programming.
--
Wanted: An Apple 2 Thunderclock Plus peripheral card.
So we are discussing scientific code? General purpose code will not get huge advantages from advanced inlining etc.
I'm assuming we're talking about something where performance actually matters, for sure. If the problem doesn't require particularly efficient code given the speed of the system it's going to run on, it's probably not very useful to drop down to assembly anyway, nor to consider how well the code generator for a high level language optimises its output.
I don't know what you mean by fusion - the only thing that comes to mind is loop fusion.
It's a general category of optimisations used when you're composing multiple operations over the same stream, data structure, etc. A typical example might involve a programmer writing some list processing code as filtering with one function, composed with mapping with another function, composed with reducing using a third function to get the final answer. An optimising compiler might merge those operations into one space-efficient loop that calculates the final answer without ever generating the intermediate lists.
Put another way, fusion is similar in effect to applying some combination of inlining and loop-based optimisations in situation where you're composing multiple operations over data sets, with the goal of eliminating the storage of unnecessary intermediate values and the overheads of passing them around. It's particularly relevant with higher-level languages that describe their data crunching in functional terms, where a naive implementation is much slower than the fused version.
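To make that concrete, here is a rough Java sketch of what fusion buys you. The class and method names are made up for illustration, and this shows the transformation done by hand; it is not a claim about what any particular Java compiler or JIT does automatically. The unfused version builds two intermediate lists, while the fused version computes the same answer in one pass with no intermediate storage.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration: filter -> map -> reduce, unfused versus fused.
final class FusionSketch {
    // Unfused: every stage materialises an intermediate list.
    static int unfused(List<Integer> xs) {
        List<Integer> evens = xs.stream()
                .filter(x -> x % 2 == 0)
                .collect(Collectors.toList());
        List<Integer> squares = evens.stream()
                .map(x -> x * x)
                .collect(Collectors.toList());
        int sum = 0;
        for (int s : squares) {
            sum += s;
        }
        return sum;
    }

    // Fused: the same three operations merged into a single loop,
    // with no intermediate lists allocated.
    static int fused(List<Integer> xs) {
        int sum = 0;
        for (int x : xs) {
            if (x % 2 == 0) {
                sum += x * x;
            }
        }
        return sum;
    }
}
```

Both methods return the same value for the same input; the difference is only in how much temporary data gets created and passed around along the way.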
If you know assembly programmers who would routinely apply that degree of tight cross-function optimisation (and maintain the code well as the underlying functions evolved later) then I'll be both genuinely impressed at their diligence and somewhat disturbed at how much redundancy they must have in their code base.
I don't think there would be an hierarchy in the optimized case. IME compilers are very bad in handling the register-stack hybrid while assembly programmers are capable to handle them after a learning period.
I'm not sure that's entirely correct. Even when I did more work on these things a few years ago, compilers were already doing cross-function optimisation right down a call stack to optimise the use of the floating point register stack.
I brought this one up as another example where if you were writing the functions manually in assembly, you'd have to either devise your own custom calling conventions for every case (and so potentially reimplement the same functions multiple times) or accept less than optimal performance. As you point out, it's probably not the best example in the context of current CPUs, though.
Most code isn't compiled with whole-program optimization. In fact a huge amount of software is compiled with little optimization done.
Maybe, but then most code isn't developed with hand-tuned assembly for its hot spots either. I'm assuming that we're talking about performance-sensitive cases where that kind of effort would be justified, and that we're interested in which strategy is likely to give the best results in practice. My contention is that, in 2016 and on most modern CPUs, it is likely that using a high level language and a good optimising compiler will give better results than most people would achieve by dropping to assembly.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
It really is, in the sense that it allows all kinds of fine low-level control over the data you're working with. Pointers and aliasing, mutability by default, imperative programming style often with manually constructed control flows, global state and often shared state if concurrency is in use... These things all introduce ambiguity into the programmer's intent and so make unsafe the assumptions that would support lots of different optimisations.
In short, optimisation is usually not about the upper bound of where a language lives on the abstraction spectrum, but the lower bound.
TIOBE's results don't make much sense. They are using general web searches to rank their list. Redmonk's ranking, http://redmonk.com/sogrady/201..., which is based on GitHub and Stack Overflow, seems more accurate. I hope they also publish their new analysis soon.
Be like shadow in the light or darkness.KMZ
Sorry, ;) ;)
No idea why you are ranting like that.
In Java you have no influence in caches etc.
In C already it is limited.
And frankly: I don't need to read your links for education. Only perhaps for amusement and for correcting the errors in them
Your "lack of understanding of memory access patterns" has nothing to do with assembly or C++.
I guess I was the first one writing simple template based data structures + custom operator new - libraries that tried to exploit caching behaviour (1993 or so?). However that has nothing to do with assembly
I think I wrote somewhere else that the compiler in the Atmel Studio is one of the worst when it comes to optimization. You might want to do the final compile using gcc, that actually does some pretty good optimization.
Gonna call bullshit on that too?
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
ISO C has a lot of behavior that is specified as being undefined or implementation-defined that you have to be aware of, but if you are, your code will compile for more architectures than any other language out there, no matter how portable they claim to be.
Incidentally, the C standard contains a lot of undefined behaviour which isn't there to ease porting to obscure platforms, but to enable optimisations that only seem to give a measurable performance improvement in highly artificial benchmarks.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Only shitty programmers continue to make excuses why they are shitty.
Arrogance is assuming you THINK you already know.
> In Java you have no influence in caches etc.
FALSE.
Do you actually understand _anything_ about cache lines???
HOW you access memory determines your maximum throughput.
Proof: Here are two different ways to sum up an array:
1. In the first example we access every 0th, 16th, 32nd, 48th element. Then 1st, 17th, 33rd, etc. Then the 2nd, 18th, 34th, etc. Our stride is 16 elements.
2. In the second example we access every element in a contiguous fashion. Our stride is 1 element.
If one "had no influence in caches" like you ignorantly assume, the timing would be identical. However, one is SLOWER than the other.
Copy and paste into an online Java compiler
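The code from the original post did not survive here, so what follows is a minimal reconstruction of the two access patterns described above, not the poster's actual program. The class name, array size, and timing harness are assumptions. A stride of 16 ints is 64 bytes, which matches a typical cache line size, so the strided version touches a new line on almost every access.

```java
// Hypothetical reconstruction of the two summation patterns.
final class StrideSum {
    static final int STRIDE = 16; // 16 ints = 64 bytes, a common cache line size

    // Pattern 1: elements 0, 16, 32, 48, ... then 1, 17, 33, ... and so on.
    static long sumStrided(int[] a) {
        long sum = 0;
        for (int start = 0; start < STRIDE; start++) {
            for (int i = start; i < a.length; i += STRIDE) {
                sum += a[i];
            }
        }
        return sum;
    }

    // Pattern 2: every element in order, stride of 1.
    static long sumContiguous(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1 << 22]; // 16 MB of ints, larger than most caches
        for (int i = 0; i < a.length; i++) {
            a[i] = i & 0xFF;
        }
        // Repeat a few times so the JIT has a chance to warm up.
        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            long s1 = sumStrided(a);
            long t1 = System.nanoTime();
            long s2 = sumContiguous(a);
            long t2 = System.nanoTime();
            System.out.println("strided: " + (t1 - t0) / 1000 + " us, contiguous: "
                    + (t2 - t1) / 1000 + " us, sums equal: " + (s1 == s2));
        }
    }
}
```

Both methods compute the same sum; only the order of memory accesses differs. The actual timings depend on the machine, the JVM, and JIT warm-up, so treat any single run with suspicion.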
> Your "lack of understanding of memory access patterns" has nothing to do with assembly or C++.
If you would actually _watch_ and _learn_ you would recognize this is false.
https://youtu.be/rX0ItVEVjHc?t...
> libraries that tried to exploit caching behaviour (1993 or so?). However that has nothing to do with assembly ;)
Here's your cookie for missing the point.
Assembly language programmers tend to be more knowledgeable of when and where their code is slow. C++ and Java programmers tend to be more ignorant of high performance because they generally don't have a clue _how_ the compiler is generating the corresponding asm for their code -- and then they wonder why it is slow.
A good programmer learns assembly to better understand how to write simpler, smaller, and faster code.
A shitty programmer makes excuses for why they don't understand assembly.
Assembly language programming is still alive and in good health even for x86 processors.
And there are projects written entirely in assembly language. Get some links:
FASM assembler.
FASM clone for ARM.
Advanced RAD IDE for FASM. (and FreshLib portable library)
RWASA - High performance, scalable web server.
MiniMagAsm - Small content management system (CMS).
AsmBB - High performance lightweight web forum software.
Kolibri OS - small and very fast operating system with GUI interface
I am not counting the small exercise projects here and there. I am not counting mixed language projects where assembly language is used together with high level languages.
The above is my post. I simply forgot to log in. ;) Some of the above projects are mine, some are by other assembly programmers.
And if someone thinks assembly language is hard to code and support, simply look at the timeline of AsmBB. It was written in a month in my spare time.
The timeline contains only 89 commits:
AsmBB timeline
People suck at macro optimizations in assembly. If your assembly program is more than 25 instructions, you're probably not going to do a better job than a compiler.
Maybe, but unoptimized programs in assembly language are much faster than unoptimized programs in high-level languages. And as long as 99% of the code is not optimized, the result is predictable: assembly language programs are almost always faster.
Haven't seen it mentioned yet. Another reason to learn assembly (why I am right now) is for debugging closed source apps to earn bug bounties.
You have a few different groups of apps to target, with different rewards and challenges.
Web apps, open source apps, closed source programs.
Looking for bug bounties, the first pays less and there's tons of people looking for the same bugs. The second group is usually not going to pay unless you're looking at the Linux kernel or a specific server. You can use many source code analysis tools on these. The closed source apps are where the real money is at. Browser escapes, media player RCE, OS exploits, you'll need assembly skills to debug these and see what's going on internally. This is where you find $100,000 bug bounties.
Cwm, fjord-bank glyphs vext quiz
TL;DR: You need to learn HOW to optimize:
@37:30 -- Mike Acton: Code Clinic 2015: How to Write Code the Compiler Can Actually Optimize
- - -
> If you want speed, assembly is the ONLY option.
Total NONSENSE.
A. You keep implying this word optimization -- it doesn't mean what you think it means!
B. There are four lights, er, types of optimizations:
1. Use a lower level language
2. Micro-optimization or Bit-twiddling
3. Algorithm
4. Macro or cache-orientated, aka (Data-Orientated Design)
What do these mean?
1. Use a lower level language
With bloated languages and incorrect use of C++, Java, etc., inexperienced programmers naively think changing to a "lower level" language -- such as C or Assembly -- will help speed up their code. While it is true one has access to more programming paradigms in a low level language, e.g. in assembly you can use the carry flag as a return value instead of wasting an entire byte, this type of optimization only takes you so far before you need to look at alternatives.
2. Micro-optimizations
All good programmers should read (and understand!) these bit twiddling optimizations:
* bit-twiddling hacks
* Hacker's Delight
While compilers can generate "good enough code", sometimes hand-optimized instructions can beat the compiler. e.g. Before compilers optimized division with reciprocal multiplication, a common technique for division was to manually change division into reciprocal multiplication. i.e. `/ 3` -> `* 1/3` which means you would see something like this:
would be replaced with:
Thankfully most compilers will perform these integer division optimizations for you, but you can try this out with an online C compiler:
These types of micro-optimizations are becoming rarer and rarer as compilers (slowly) get better. However, don't assume. VERIFY the assembly output of your compiler.
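The original snippets are missing from this post, so here is a small stand-in showing the reciprocal-multiplication idea for unsigned 32-bit division by 3. It is written in Java rather than C, and the class and method names are made up. The constant 0xAAAAAAAB is ceil(2^33 / 3), so multiplying by it and shifting right by 33 gives the same result as `/ 3` for any value in [0, 2^32).

```java
// Hypothetical illustration of division by a constant via reciprocal multiplication.
final class ReciprocalDivide {
    // ceil(2^33 / 3) = 2863311531 = 0xAAAAAAAB
    static final long MAGIC = 0xAAAAAAABL;

    // Computes x / 3 for x in [0, 2^32) without a divide instruction.
    // The product can exceed Long.MAX_VALUE but always fits in 64 unsigned
    // bits, so the unsigned shift (>>>) recovers the right quotient.
    static long divideBy3(long x) {
        return (x * MAGIC) >>> 33;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 2L, 3L, 100L, 2147483647L, 4294967295L};
        for (long x : samples) {
            System.out.println(x + " / 3 = " + divideBy3(x)
                    + " (plain division: " + (x / 3) + ")");
        }
    }
}
```

This is only a demonstration of the arithmetic. Whether it is any faster than writing `x / 3` depends on the compiler or JIT, which will often emit something like this on its own.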
Floating-point optimizations still show up. The most famous is probably John Carmack's Quake 3 Inverse Square Root
This PDF provides a very good explanation:
* http://www.lomont.org/Math/Pap...
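For reference, the trick can be sketched in Java as below. This is a port of the widely circulated routine rather than the original C source; the class and method names are made up, and the only thing taken from the original is the magic constant 0x5f3759df and the single Newton-Raphson step. The result is an approximation, so don't expect exact values.

```java
// Hypothetical Java port of the fast inverse square root approximation.
final class FastInvSqrt {
    // Approximates 1 / sqrt(x) for positive, finite x.
    static float invSqrt(float x) {
        float half = 0.5f * x;
        int bits = Float.floatToIntBits(x);   // reinterpret the float as an int
        bits = 0x5f3759df - (bits >> 1);      // initial guess via the magic constant
        float y = Float.intBitsToFloat(bits); // reinterpret back to a float
        y = y * (1.5f - half * y * y);        // one Newton-Raphson refinement step
        return y;
    }
}
```

On a modern JVM and CPU this is unlikely to beat `1.0 / Math.sqrt(x)`; it is shown for the bit-level idea, not as a recommendation.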
3. Algorithm
The fastest way to optimize (from the programmer's run-time) is to replace a slower algorithm with a faster one.
i.e.
* If the common case for your data is unsorted, then replacing a dog-slow bubble sort with quick-sort will show gains.
* However, if the common case is that the data is 99% sorted, then changing algorithms may not always help.
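As a small illustration of why the shape of the data matters, here is a plain insertion sort in Java that counts how many element shifts it performs. Insertion sort is my choice of example here, not something named above, and the class name is made up. On already-sorted input it does no shifts at all, while on reversed input it does the maximum, which is why a "slow" quadratic algorithm can be perfectly fine when the common case is nearly sorted.

```java
// Hypothetical illustration: the work done by a simple sort depends on the input order.
final class NearlySortedSketch {
    // Sorts the array in place and returns the number of element shifts performed.
    static long insertionSort(int[] a) {
        long shifts = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Move larger elements one slot to the right to make room for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
                shifts++;
            }
            a[j + 1] = key;
        }
        return shifts;
    }
}
```

The shift count equals the number of out-of-order pairs in the input, so for mostly sorted data the running time stays close to linear.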
This is where most people start to optimize. BUT, notice how I said "Common Case". There is a _higher level_ of optimization we can do:
4. Macro or Cache-orientated.
The 0th rule in optimization is:
Know Thy Data
When you optimize you need to optimize for the common case. This means understanding data flow, Memoization, and transforms. You _must_ question ALL assumptions, knowns, and unknowns. What, exactly, are you trying to do? i.e. Printing Primes and Printing
A good programmer learns assembly to better understand how to write simpler, smaller, and faster code.
A shitty programmer makes excuses for why they don't understand assembly.
And that is just the point. Most of the excuses here have stayed the same for the last few decades and are just as invalid today as they were back then.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
These are decidedly special cases and you get no argument from me.
Whoops, forgot the link to Mike Acton's talk:
Code Clinic 2015: How to Write Code the Compiler Can Actually Optimize
* https://youtu.be/GPpD4BBtA1Y?t...
I too have worked on a C++ compiler (PS3).
While it sounds like you are very knowledgeable in the first 3, sadly, it sounds like you don't understand the 4th level of optimization aka Data-Oriented-Design. :-/ The levels of optimization are:
1. Use a lower level language
2. Micro-optimization or bit-twiddling
3. Algorithm
4. Macro-optimization or cache-orientated aka Data-Orientated Design
I think you'll enjoy these links:
* Pitfalls of Object Oriented Programming
and
Code Clinic 2015: How to Write Code the Compiler Can Actually Optimize
> or perhaps recoding Java into C++.
Unfortunately, this is the 1st level of optimization by the inexperienced and is the *wrong* place to start.
You need to look at the *data first*, NOT the code before you start optimizing.
Switching implementation languages is (usually) a symptom of NOT understanding the problem.
The modern vernacular paraphrase is:
Here is my post where I explain the 4 levels of optimizations in greater detail.
Yup, I use ca65 directly since cc65's code generation is so horrible.
There was a paper at ASPLOS two years ago that showed that you get a 30% delta on most programs from randomised code layouts just from different cache interaction
What's funny is when someone micro-benchmarks the heck out of something and shows me empirical "evidence" that their code is faster than mine. Then they put the code into production and their code is suddenly running really slow under heavy load. Throw in my code for S&Gs and suddenly the service is running much faster, and I don't even put much effort into the design like they do. Many people fail to understand how processes within the system interact with each other when it comes to cache and memory bandwidth. Even when I try to explain my theory as to why my code runs faster, their eyes glaze over. And that's why many of our services run like crap.
In other words, a 100% increase in performance of a micro-benchmark can result in a 10% reduction in the macro-benchmark.
If you are on an 8 bit MCU, speed is critical because you wouldn't be on an 8 bit MCU if you had a choice to use larger faster hardware.
Frequently the programmer has no choice about the hardware but is still required to make the code as fast as possible.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
How dumb are you?
You are mixing up memory access patterns with data structures.
class Part {
    int data;
};
class Composite {
    Part* somePart;
};
In C++ I can overload operator new() so that all Parts are 'close' to their Composite.
In Java I can't.
But thanks for your idiotic post about my ignorance. Idiot
And all your idiotic ranting has nothing to do with assembly anyway. A processor cache works more or less the same regardless of what language you use or what processor architecture... Idiot, as you just have shown yourself. Idiot
Chances are that I speak more assembly languages than you, idiot, anyway, and no: it does not help me write better C/C++ or Java. It is completely irrelevant
Little $2 8-bit microcontrollers are all over the place. They are cheap, low power, small. They are also slow. (20 MHz or less).
Play Command HQ online