Learning Computer Science via Assembly Language
johnnyb writes "
A new book was just released which is based on a new concept - teaching computer science through assembly language (Linux x86 assembly language, to be exact). This book teaches how the machine itself operates, rather than just the language. I've found that the key difference between mediocre and excellent programmers is whether or not they know assembly language. Those that do tend to understand computers themselves at a much deeper level.
Although unheard of today, this concept isn't really all that new -- there used to not be much choice in years past. Apple computers came with only BASIC and assembly language, and there were books available on assembly language for kids.
This is why the old-timers are often viewed as 'wizards': they had to know assembly language programming. Perhaps this current obsession with learning using 'easy' languages is the wrong way to do things. High-level languages are great, but learning them will never teach you about computers. Perhaps it's time that computer science curriculums start teaching assembly language first."
Is "Linux x86 assembly" any different to any other kind of "x86 assembly"?
Although unheard of today, this concept isn't really all that new -- there used to not be much choice in years past.
While starting Computer Science students off with assembly (without first introducing them to a high-level language) may be a relatively new concept these days, the idea of teaching low-level languages to Computer Science students is not a revolutionary technique whatsoever. Every decent Computer Science curriculum includes several semesters of courses in which assembly language is required, so that students can demonstrate their knowledge of basic computer processes.
That reminds me of a great fortune:
"The C Programming Language -- A language which combines the
flexibility of assembly language with the power of assembly language."
My Grandfather worked for IBM in the 70's and 80's. He did all his coding in assembly and machine language. His motto is "Anyone who doesn't know machine language has no business using a computer."
There has to be a happy medium IMHO, and I think this is a great start. While my Grandfather was on the cutting edge of the PC revolution, he now has trouble figuring out email, etc, because he operates at too LOW a level (and I feel that he now has no business being online!). Then you have the users who have the same problems because they operate at too HIGH a level (AOL, etc...). The majority of programmers nowadays fall about smack in the middle of these two groups, but I'd argue they should be a little closer to the lower levels than they currently are.
I learned LOGO and BASIC as a kid, then grew into Cobol and C, and learned a little assembly in the process. I now use C++, Perl, and (shudder) Visual Basic (when the need arises). My introduction to programming at a young age through very simple languages really helped to whet my appetite, but I think that my intermediate experiences with low level languages help me to write code that is a lot tighter than that of some of my peers. Let's hope this starts a trend; it would be great if more young (and current) programmers appreciated the nuts and bolts!
Sounds more like a programming book than compsci book.
writing an RB tree or an A* search in assembly would be a huge pain in the ass, if you ask me.
compsci is in large part about data structures: how to choose the right datastructure, how to get the most out of an algorithm by picking the best datastructure, etc...
but i didn't read the book, so i'll just go back to my websurfing now...
Good Idea: First teaching simple programming fundamentals through a simple to understand language. Then, confuse the hell out of a student with assembly.
Bad Idea: Teaching CS by starting with one of the most cryptic languages around, and then trying to teach basic CS fundamentals. There are already problems with people interested in CS getting turned off by intro/intermediate programming classes. Imagine the retention rates once my CS100 class is taught in assembly.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
I think the concepts of registers and memory locations and stack pointers and branching are easier to understand in assembly. You can teach a simple subset of instructions. It was the way I started back in the day. I scratched my head more later learning C, etc. I guess it's just the opposite for kids these days.
I started out learning to code in asm on my c64 and I'd have to say it was a very rewarding experience.
Anyone who disagrees with this probably doesn't have much experience coding in assembler to begin with. Asm really is fairly easy, the trick is that most who teach asm actually spend too much time on those computer concepts and not enough time on actual real coding. It's wonderful understanding how the machine works, and necessary to write good assembler but you should start with the 2 pages of understanding that is needed to "get" asm at all.
Then teach language basics and THEN teach about the machine using actual programs (text editor, other simple things) and explaining the reason they are coded the way they are in small chunks. Instead of handing a chart of bios calls and a tutorial on basic assembler, introduce bios calls in actual function in a program, most of them are simple enough that when shown in use they are quite clear and anyone can understand.
After all assembler, pretty much any assembler, is composed of VERY simple pieces, it's understanding how those pieces can be fit together to form a simple construct and how those simple constructs form together to create a simple function and how those simple functions form together to create a simple yet powerful program that teaches someone programming. Learning to program this way keeps things easy, but still yields a wealth of knowledge about the system.
It also means that when you write code for the rest of your life you'll have an understanding of what this and that form of loop do in C (insert language here) and why this one is going to be faster since simply looking at the C (insert language here) concepts doesn't show any benefit to one over the other.
The idea isn't to actually use the language but rather to learn it to help you understand other languages.
It's like learning Latin. Nobody actually uses it, but it can give you a deeper understanding of the languages that are based on it.
TW
Of all the processors out there, yes the x86 is common but it has to be one of the WORST instruction sets - one of the most difficult to work with.
Is it just me???
I DO think it's a good idea to be teaching assembly, not so sure as the core of a comp sci program however. I started playing with assembly fairly early, on 6502, z80, and then later with 68000 and IBM 370. It's good to know, but I wouldn't do major stuff in it anymore. That's what high-level languages are for. You only drop to assembly when you have to for speed or space.
Assembler-first might work with beginners if it was on an emulator where they could see exactly what was happening, and there was no way to crash it. Otherwise, I just don't see the point of making things harder.
Of course if you really want to make it hard, you hand every twelve-year-old kid a copy of Knuth and a hardware implementation of Knuth's hypothetical processor. Then our generation could be completely assured of job security.
There are a million fields in CS-- you can view them as points on a line that stretches from engineering to mathematics. The people who work in architecture are at the most extreme end of the engineering section. If you want to go into systems programming or into architecture, then I can see how you would want to base everything off of asm. But if you specialize in ai, or algorithms, or theory, you really don't encounter assembly that often... for the most part, the need isn't there to develop extremely high performance, system dependent apps. In these fields, you could do a cs curriculum (through graduate) entirely in Matlab, Prolog and ML. The emphasis is on the mathematical structures the program represents over how the computer actually deals with them.
I agree. It's rather unfortunate that one of the most ugly, ungainly, and hacked ISAs out there is also the dominant one. There are some assemblies that are a pleasure to use, though. The 68K line and almost all of the load/store ISAs are nice to use. Some of the older *really* CISC ones are OK too.
I understand C much better than I would have had I not learned assembly language first. I think of C as a somewhat-more-abstract version of assembly. It has that "down to the bare metal" aspect in much of what you can do with it, particularly pointers.
My first IBM PC job was C, but I had to learn 8086 so that I could debug since there was no source level debugging when using overlays.
Anyways, how do you find a compiler bug, if you can't read the code the compiler generates?
Well said. Computing is not a science.
By definition. Science is the application of a rigorous discipline in an attempt to understand nature. Computing has nothing to do with understanding nature and everything to do with implementing logic in physical systems.
This is not to degrade computing. In fact, computing is provably correct as it is based on logic. Science is a statistical endeavour in which nothing is proven, but theories are constructed which demonstrate usefulness and have not yet been disproven. Computing is, however, built on the results of science.
Maths is a branch of logic. Science is a branch of logic. They are of course cross-fertilising. Computing is so close to father logic as to be almost indistinguishable - it's just a way of logic happening in acceptable time scales.
Of course, you might disagree.
Cheers!
Yours Sincerely, Michael.
Furthermore Slashdot should make it a policy for people who submit their own books/publications to reveal that they are the author so that there is no conflict of interest (sort of like how News channels who report news on their parent company or subsidiary always say so explicitly). I think that's only fair to the readers of Slashdot and it won't make us feel like we're being scammed into buying someone's book.
"Injustice anywhere is a threat to justice everywhere." - Martin Luther King, Jr.
I've found that the key difference between mediocre and excellent programmers is whether or not they know assembly language.
You've got it backwards I think. The excellent programmers actually care about what they're doing, and as such have all learned assembly.
Teaching assembly to someone who doesn't care won't turn them into an excellent programmer.
I think whether this idea is a good one or not depends on what the program considers a Computer Science Degree. Where I have taken classes, the philosophy of Computer Science is more the science of algorithms and mathematics rather than practical programming experience. The idea is to research new and more efficient algorithms or data structures not tied to a specific language. This is more suited towards graduate work in the field of Mathematics and Computer Science.
Some other programs may approach the degree as a professional/vocational type of program preparing the student for eventual work in the field of programming.
Learning assembly may be more beneficial to the student learning as an eventual programmer in that understanding some of the low level work that the computer is doing could be important in programming.
I'm not sure that it would help as much with the mathematics and concept work, considering a lot of the ideas there are more general and not tied to any specific architecture, so learning the low level process may not help as much.
-- Wolfpup
"A man whose circumstances went beyond his control." -- Styx
Well-designed CISC instruction sets (like PDP11, VAX, and 68k)
Okay, you had me up to that sentence. It's my understanding from people I know who actually used VAX assembly that it was a bear, especially if you had to decode the assembly by hand. It had variable length opcodes; if I remember right, a single instruction could extend to upwards of 60 bytes. Oh, and that's not to mention the 11 different addressing modes, which could be mixed and matched on all three operands (dest, left data, right data). Because all instructions could store directly back to memory, it was a beast to create.
Call me crazy, but somehow that much mixing and matching just doesn't sound like [fun].
I didn't actually use the VAX instruction set - though I did use the PDP11's - both writing and reverse-engineering. They were both designed by the same guy (Gordon Bell) or teams he led. I'm prepared to defer to people who actually dealt with it that the VAX instruction set was difficult.
But in the case of the PDP11 (where the few-instructions, lots-of-address-modes style came into its own), the plethora of modes actually simplified the job of the assembly programmer (and the reverse engineer). I can tell you this from experience.
Multiple address modes shrank the job of understanding the instruction set by splitting it into two parts - a much smaller set of base instructions than you'd need without the symmetry, and the small set of addressing modes. Rather than having an explosion of special purpose instructions with their own addressing modes, you have an explosion of combinations of the members of two small sets. Once you learned the two sets, the proper combination to achieve the result you wanted was obvious.
The tricks were few - and brilliant.
- The indirect-increment and indirect-decrement modes let you use any register as a stack pointer or index register, for instance. The "official" stack pointer was just the one that was implied by certain other features: interrupt and subroutine calls, primarily.
- With the program counter as one of the general registers, applying indirect-auto-increment to it gave you inline constants.
- The indirect and indirect increment/decrement addressing modes made any register a pointer. (They also led directly to the ++, --, +=, and -= operators of the C language.)
- The register+offset mode and base/offset duality gets you to particular elements of an argument array, or index into a fixed-location array.
and so on.
If you understand a few things about C (Array/pointer duality, walking arrays with auto-increment, etc.), you have exactly the understanding you need to grok the modes of the PDP11 instruction set. (Which is hardly surprising: I understand that much of C was inspired by that instruction set.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I race cars, albeit not professionally. You are very incorrect.
Being able to tell your crew that you think your car is leaning out under hard acceleration or that your suspension is too stiff or unbalanced is made easier if you understand the physics and engineering involved. Most professional race car drivers know a very great deal about these things indeed. Unless they are born rich, most dedicated racers build and repair their own cars and know a great deal indeed about the tools of their trade.
I have an EE degree; I was taught how to build registers from logic gates; how to build counters and adders from those; how to form the basics of a primitive cpu and implement one in vhdl; how to program x86 assembly; I was also taught how the electrical signals interact to make those things possible; the physics of semiconductors and the things that make those logic gates possible. All of those things have made me able to more effectively program computers on a high level. Why would we expect less from a CS program? Computational engines, computers, are the things that drive the CS profession. I would expect anyone in the field to be intimately aware of their theoretical underpinnings.
Ironically, they have also made me a much better driver, as I am more intimately aware than most may be of my car's workings and of how to tune its EFI system.
Would you go to a doctor who has never taken chemistry? Didn't think so.
..don't panic
The problem is that computer scientists don't make good programmers and vice-versa. If you're good with code and hunker down to write lots of programs, then you tend to clash with the all-theory-no-code camp that delights in big-O notation and graph theory. Of course there is a lot of middle ground, but in general the PhD professor types that staff CompSci departments I've been in tend to have stopped learning about computers as soon as they finished their doctorate and instead concentrate on internecine politics, incomprehensible papers, and teaching the occasional class (leaving most of that to T.A.'s who actually teach the class and understand how to compile programs).
Meanwhile the coder types graduate with a B.S. or maybe a masters then go into commercial development shops and crank out code, forgetting as much as they can about red-black trees and other subtle CompSci concepts.
So if you want to crank out programmers, then assembly is probably a good thing. God knows I learned a lot from the assembly classes I took.
If you're trying to scare students away then assembly is also a good tactic. Nothing like a good hex dump to get some non-CompSci students' eyes to glaze over. Sort of like making people take Biology or Physics, but instead of teaching about cells and newtonian motion, jumping right into the finer points of quantum mechanics or amino acid chemistry.
On the other hand, for 2nd year CompSci students, Assembly is probably a good thing to get out of the way. It really sucks, for example to take economics for 4 years only to learn at the end "just kidding, reality is too complex to model so these are all just gross oversimplifications." Sort of like thinking programming == Java then finding out how it all _really_ works.
"But actually trying to use m4 as a general-purpose langage would be deeply perverse" --ESR
lesson 1: never listen to radio shack employees, you would probably get better advice from some random homeless people on the street, at least one of them is probably an out of work software engineer these days
lesson 2 (for the grandparent): It could just be that doing basic for 3 years prepared you to do asm, which obviously has a steeper learning curve. Also, you just did the same thing to the great grandparent that the stupid HS teacher did to you, congratulations on being a complete hypocrite.
Perhaps it's time that computer science curriculums start teaching assembly language first.
It's more critical they actually teach computer science first, instead of programming. A new CS hire, assuming their school was worth a damn, can learn a new language. I want to know if they have the math background to understand the problems that will be handed to them and that they have the ability to self-learn.
Computer science isn't "knowing computers on a deeper level." Computer science is algorithms and lots of math. Computer scientists don't care about how a computer works. They don't care about the language either. They are interested in data structures and how to work with them. What language is in use is really unimportant, be it Java or Assembly.
I have to say that trying to program in low level languages, or worrying about the details of the machine architecture, has usually been (in my experience) counterproductive in terms of efficiency.
I'm not saying that there aren't places where low level details are critical, but for the most part they just draw attention away from the thing that has the most impact on performance.
Application Architecture.
The choices of algorithms and data structures are far more important than any low level details. But low level details are more fun, and tend to make us feel more manly or guruly or something, so we tend to focus on them instead. In practice I find that using low level languages or super optimized tools makes it hard to worry about high level structure, so the structure gets ignored.
I once worked on a project in which people were seriously freaking out over the performance hit in using virtual functions while parsing the configuration file.
At the same time, the application (a firewall) was performing multiple linear searches through linked lists of several hundred items per packet. These searches were very carefully optimized, so they had to be fast... (sigh). When I switched the system to use STL dictionaries (and later hashes), total throughput jumped threefold, yet some of the developers were worried about the cost of the templates and virtual functions used.
The fact that the algorithm is more important than the details of implementation is a lesson that everyone (myself included) needs to keep getting pounded into them, because it's so easy to forget.
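To make the firewall anecdote concrete, here is a minimal sketch of the same trade-off. It is not the original C++ code: the rule table, the address strings and the names `build_list`, `list_lookup` and `table` are all made up for illustration. It counts nodes visited rather than timing anything, so the result doesn't depend on the machine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One entry in a singly linked list of (key, value) rules."""
    key: str
    value: str
    next: Optional["Node"] = None


def build_list(items):
    """Build a linked list that preserves the order of `items`."""
    head = None
    for key, value in reversed(items):
        head = Node(key, value, head)
    return head


def list_lookup(head, key):
    """Linear search. Returns (value, nodes_visited); value is None on a miss."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.key == key:
            return node.value, steps
        node = node.next
    return None, steps


# Hypothetical rule table: 500 addresses, each mapped to an action.
rules = [(f"10.0.{i // 256}.{i % 256}", "allow" if i % 2 else "deny")
         for i in range(500)]

head = build_list(rules)   # what the anecdote started with
table = dict(rules)        # hash table built from the same data

# Worst case for the list: the key sits in the last node.
value, steps = list_lookup(head, "10.0.1.243")
print(value, steps)            # the list walks every node to get here
print(table["10.0.1.243"])     # the dict hashes the key and probes directly
```

However tightly the loop inside `list_lookup` is tuned, it still visits every node for a late or missing key, and a firewall pays that cost per packet. The dict lookup cost stays roughly flat as the table grows, which is the kind of win that dwarfs the overhead people were worried about.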
There are places where assembler and hardware details matter a great deal. But they are usually places that contain a lot of repetition that can't be removed algorithmically. Graphics are the obvious example.
A recent example:
My brother in law gave me one of those boards with pegs in which you try to jump your way down to a single peg remaining. I have no idea what it's called, but anyway....
I decided to be cute, and wrote a 100 line python script over lunch to find all possible solutions. I was surprised when it hadn't found a single solution by the time I was finished eating. I was a lot more surprised when it hadn't found anything by the end of the day.
So I killed it and started in optimizing for performance and tweaking and trying different things. This kept me occupied over lunch for a couple of weeks, but didn't produce anything else. Finally I started doing some analysis of the problem. The first thing I found was that the search space (for the board I had) was roughly 10**18.
It didn't matter how much I tweaked the details of my search, it wasn't going to find very many solutions in less than a century (actually, it looks like a naive full search will take several thousand years).
So, after wasting several weeks of lunch breaks, I have redefined the problem: find A solution. I have rewritten my search to use a heuristic. I finished everything but the heuristic at lunch a couple of days ago. The new system will take 100 or even 1000 times as long to perform a jump, but I'm expecting to find a solution before I'm dead.
So, don't get bogged down in the details of an implementation. They won't usually take you very far.
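For anyone who wants to play with the "find A solution" version of the problem, here is a rough sketch. It rests on assumptions the post doesn't confirm: it uses the small 15-hole triangular board (the post never says which board it was, and the 10**18 figure suggests a bigger one), and it uses plain depth-first search with a set of known dead-end positions rather than the heuristic mentioned above, which isn't described. All names (`HOLES`, `moves`, `find_one`) are made up for this example.

```python
# Holes of a 5-row triangular board as (row, col), with 0 <= col <= row.
HOLES = frozenset((r, c) for r in range(5) for c in range(r + 1))

# The six neighbour directions on a triangular grid in these coordinates.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]


def moves(pegs):
    """Yield (start, jumped, landing) for every legal jump in this position."""
    for (r, c) in pegs:
        for dr, dc in DIRECTIONS:
            over = (r + dr, c + dc)
            land = (r + 2 * dr, c + 2 * dc)
            if over in pegs and land in HOLES and land not in pegs:
                yield (r, c), over, land


def find_one(pegs, dead=None):
    """Depth-first search. Returns the first jump sequence that leaves
    one peg, or None if this position cannot be won."""
    if dead is None:
        dead = set()
    if len(pegs) == 1:
        return []
    if pegs in dead:            # already known to be a dead end
        return None
    for start, over, land in moves(pegs):
        rest = find_one((pegs - {start, over}) | {land}, dead)
        if rest is not None:
            return [(start, over, land)] + rest
    dead.add(pegs)              # remember the failure so it is never re-searched
    return None


start = HOLES - {(0, 0)}        # full board with the top hole empty
solution = find_one(start)
for jump in solution:
    print(jump)
```

The two choices that matter here are algorithmic: stop at the first solution instead of enumerating them all, and never re-explore a position already shown to be hopeless. On a board this small either one is enough. A board with a search space as large as the one described above would need more than this, which is presumably where the heuristic comes in.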
plus-good, double-plus-good