Learning Computer Science via Assembly Language

Posted by CowboyNeal on Thursday February 5, 2004 @11:46AM from the unconventional-approaches dept.

johnnyb writes "A new book was just released which is based on a new concept - teaching computer science through assembly language (Linux x86 assembly language, to be exact). This book teaches how the machine itself operates, rather than just the language. I've found that the key difference between mediocre and excellent programmers is whether or not they know assembly language. Those that do tend to understand computers themselves at a much deeper level. Although unheard of today, this concept isn't really all that new -- there used to not be much choice in years past. Apple computers came with only BASIC and assembly language, and there were books available on assembly language for kids. This is why the old-timers are often viewed as 'wizards': they had to know assembly language programming. Perhaps this current obsession with learning using 'easy' languages is the wrong way to do things. High-level languages are great, but learning them will never teach you about computers. Perhaps it's time that computer science curriculums start teaching assembly language first."

Knuth by red+floyd · 2004-02-05 11:48 · Score: 4, Informative

Isn't that what Knuth did with MIX? I believe it was a synthetic assembly language for a hypothetical machine.

--
The only reason we have the rights we have is that people just like us died to gain those rights. -- Cheerio Boy
Re:Linux x86 assembly? by shaitand · 2004-02-05 11:48 · Score: 5, Informative

It is in the same fashion that win32 asm is different from linux asm. The core is the same but knowing the core of x86 assembler is going to get you far if what you are wanting to do is talk to the kernel.
Your book? by Tet · 2004-02-05 11:51 · Score: 5, Informative

A new book was just released
What you meant to say was that your new book has just been released. If you're going to pimp your wares on Slashdot, at least put an appropriate disclaimer on. That said, I completely agree with the premise of the book. I've met a lot of mediocre programmers, and a few good ones. But I've never yet met a real star that didn't have some background in assembly language programming. Personally, I haven't written anything in assembly in well over a decade. But the fact that I can do so if needed makes me a better programmer, and I'd recommend it to any aspiring coder as a key skill to learn. I wouldn't say IA32 is a particularly nice introduction (I'd start with a cleaner, simpler architecture, such as 6502), but it is at least widely available to anyone that wants to study it...

--
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Available under GNU FDL by JoshuaDFranklin · 2004-02-05 11:53 · Score: 5, Informative

I don't know why he didn't mention that this is a free documentation project:
http://savannah.nongnu.org/projects/pgubook/
It's also being used at Princeton
Disclosure: The submitter is the Author. by Wise+Dragon · 2004-02-05 11:57 · Score: 4, Informative

I think the article should have disclosed that the submitter (johnnyb) is also the author of the book, Jonathan Bartlett. So rather than saying "A new book was just released", I would rather see something like "I wrote this new book." Here is johnnyb's website. http://www.eskimo.com/~johnnyb/
Re:Not So New Concept by tealover · 2004-02-05 12:05 · Score: 5, Informative

Since this submission is nothing more than an attempt to hawk his own book, on principle I refuse to buy it. I don't like dishonesty in the submission process. He should have come out and directly admitted that it was his book.

--
-- You see, there would be these conclusions that you could jump to
Re:No, but... by Alan+Shutko · 2004-02-05 12:06 · Score: 4, Informative

Except that things like "i = i + 1" vs. "i++" vs "i+=1" are mostly irrelevant today, since that's a very easy thing for compilers to optimize. And they've been optimizing stuff like that for years.
Try looking at the asm output from GCC at -O2 on those two statements.

Knuth had reasons for using ASM that were a lot better than that. It does give you a better idea of how things are laid out in memory, because you have to do it yourself. It's easier to do detailed performance analysis of algorithms, because you can get exact cycle counts. (Which in turn helps train your intuition, and tell you how to find out from a CPU's instruction set how it does at various things to tune algorithms.) You can look at how cache affects things.

Take a look at his reasons.
Re:Linux x86 assembly? by shaitand · 2004-02-05 12:08 · Score: 4, Informative

Your post makes no sense unless you were confused by my mistype. I meant to say "the x86 core ISN'T going to get you far if what you're wanting to do is talk to the kernel". Parts of the kernel ARE in assembler, and the bootloader is largely in ASM.

So in truth, the kernel is the car. Asm can be the road, it can be the engine, it can be the passengers, it can be the wind resistance, it can be virtually any component. But nonetheless, if you're writing an application sitting on top of the kernel you are going to need to speak to the kernel's API at some point (or the API of a layer sitting on top of it), just as if you're writing a Windows application in asm or C or VB, you need to be speaking to the win32 API.

Asm is no different from any other language: knowing the language is great and all, but it's worthless without learning the proper APIs you'll need to actually write a program that does something. That's a major flaw in most programming tutorials. They'll teach C or another language and not mention a single word about the APIs one needs to know to actually write a program that does more than calculate pi.
Implementation specific vs. generic... by Cryptnotic · 2004-02-05 12:11 · Score: 4, Informative

A real computer science program will teach generic principles of programming and systems development, with projects that delve into a variety of actual implementations of systems.

For example, a b-tree data structure is fundamentally the same thing whether you implement it in 32-bit ARM assembly language or 16-bit x86 assembly language or C or Java.

To understand how assembly language works, you need to understand how a processor works, how instruction decoding works, how register transfer language works, how clocking a processor makes it accomplish things. To understand how registers hold values electrically and transfer values between registers, you need to understand some physics and electronics.

To understand how a compiler takes a source language and translates it into a target language, you need to understand a little about the kinds of languages computers can understand (Context-Free Languages) and how they can parse them (Context-Free Grammars). Delving into that field will lead to the core theory of computer science, what is possible with machines in general and what is impossible.
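To make the grammar-to-parser connection concrete, here is a minimal sketch of a recursive-descent evaluator for a tiny arithmetic language. The grammar, the `tokenize`, `Parser` and `evaluate` names, and the choice of integer division are all assumptions made for illustration; a real compiler front end is considerably more involved.

```python
import re

# A small context-free grammar, written in EBNF (illustrative only):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """One method per grammar rule; each method consumes the tokens it matches."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text):
    """Parse and evaluate one expression, rejecting trailing input."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return value
```

Calling `evaluate("2 + 3 * (4 - 1)")` walks the three rules in turn; operator precedence falls out of the way `expr` calls `term` and `term` calls `factor`, not out of any special-case code.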

A real computer science program at a university will take you through all of these subjects over several years, allowing for some excursions into other things like databases and cryptography. A real computer science program is always theory with projects that are applied to actual implementations.

--
My other first post is car post.
Re:Syntax, OS interfaces... by Anonymous Coward · 2004-02-05 12:14 · Score: 5, Informative

There are two standards, the AT&T ... and the other one

Incorrect. There are at least four different assemblers and standards:

GAS - GNU Assembler. AT&T standard, as commonly used on Linux. The syntax hasn't changed since the 60's - which is both very good and very bad. I personally think it should be retired.

MASM - Microsoft Assembler. Intel standard assembly. The syntax is nice, but there are some ambiguous operators (is [] address of or address by value? - the meaning changes depending on the context). This is typically what the commercial Windows world uses. MASM itself is mostly obsolete - the Visual C compiler can now do everything that it could and supports all modern CPU instructions (even on Visual C++ 6 if you install the latest CPU pack).

NASM - Netwide Assembler. An assembler that set out to put right all the things that were wrong with MASM. The syntax is excellent, ambiguous operators are cleared up, documentation is also excellent, it interoperates beautifully with Visual C on Windows and GNU C on Linux. Ideally NASM would replace AS as the standard now that it's open source.

TASM - Borland Turbo Assembler. Based around the Intel standards, but does things slightly differently. Has extensions which allow for easy object-oriented assembly programming - which can make for some very nice code. Had a MASM compatibility mode, but nobody in their right mind used that if they could help it. I had version 5, but I don't believe they've kept it up to date, so it's obsolete now.

There are a couple of others as well, most notably AS86 (which was the leading independent solution for writing assembler back in the DOS days).
Re:It is not the language, it is the paradigm. by pclminion · 2004-02-05 12:48 · Score: 4, Informative

I would organize those differently:
1. Imperative
-- 1a. Procedural (Pascal/C/BASIC)
-- 1b. Object-Oriented (Eiffel/Smalltalk/Java/C++)
-- 1c. Assembly language
2. Functional-Type
-- 2a. Pseudo-functional (Scheme/Lisp)
-- 2b. Pure functional (Haskell/ML/Pure lambda calculus)
3. Declarative (Prolog)
Imperative languages are based on the execution of individual commands. Fundamentally they are based on the concept of assignment -- moving data from one place to another. Functional languages are based on the evaluation of expressions and the absence of side-effects. Pseudo-functional languages have variables, loops, and side-effects but are mainly based on functional concepts. Declarative languages are based on the concept of goals, and the recursive description of how those goals should be achieved, or the definition of what constitutes achievement of the goals.
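A rough illustration of the imperative versus functional distinction, written in one language so the two styles can be compared side by side. The function names are made up for this sketch, and Python is not a pure functional language, so the second version only imitates the style.

```python
from functools import reduce

def sum_of_squares_imperative(values):
    """Imperative style: a sequence of commands that keep reassigning `total`."""
    total = 0
    for v in values:
        total = total + v * v  # assignment: move a new value into `total`
    return total

def sum_of_squares_functional(values):
    """Functional style: a single expression, with no variable reassigned."""
    return reduce(lambda acc, v: acc + v * v, values, 0)
```

Both functions compute the same result for the same input; the difference is whether the computation is described as steps that change state or as an expression to evaluate.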
I'm not sure why you consider Forth a declarative language. To me it seems more like an imperative language with an unusual syntax.
Re:Syntax, OS interfaces... by larry+bagina · 2004-02-05 12:50 · Score: 5, Informative

those are implementations, not standards.
x86 instructions that take two operands can be written 2 ways:
instr src,dest
instr dest,src
The intel standard (used by nasm, tasm, masm) is dest,src. The ATT standard (used by gas) is src,dest.
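As a toy illustration of the operand-order difference, the sketch below rewrites a two-operand instruction from Intel order to AT&T order. It is an assumption-laden example, not a real converter: the `intel_to_att` and `att_operand` helpers are hypothetical, only a handful of 32-bit register names and decimal immediates are handled, and memory operands are rejected.

```python
# Small, deliberately incomplete set of 32-bit register names.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def att_operand(operand):
    """Decorate an operand the AT&T way: % for registers, $ for immediates."""
    if operand in REGISTERS:
        return "%" + operand
    if operand.isdigit():
        return "$" + operand
    raise ValueError(f"operand not handled by this sketch: {operand!r}")

def intel_to_att(line):
    """Turn 'instr dest, src' (Intel order) into 'instr src, dest' (AT&T order)."""
    mnemonic, operands = line.split(None, 1)
    dest, src = [part.strip() for part in operands.split(",")]
    return f"{mnemonic} {att_operand(src)}, {att_operand(dest)}"
```

For example, `intel_to_att("mov eax, ebx")` gives `mov %ebx, %eax`: the destination moves from first position to last.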

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:Linux x86 assembly? by Endive4Ever · 2004-02-05 12:53 · Score: 5, Informative

Well, some of us code assembly on bare hardware. We have to roll our own 'api' and include it in there with the rest of the code.

I've worked before with programmers who had little experience in programming 'bare hardware' - they do really foolish things like not initializing timers, not setting up stack pointers, and the like.

Writing bare ASM code for a processor (where it boots up out of your own EPROM or on an emulator) is good experience in minimalism. It can give you a good feeling when the project is all done and you can say you did it all yourself.

For those interested in getting into this kind of thing, start with a PIC embedded controller and a cheap programmer. You can get PIC assembly language tools for free, and build a programmer, or buy a kit for a programmer, that plugs into your serial or parallel port. Your first PIC machine can be the CPU, a clock crystal, a few resistors and capacitors, and the LED you want to blink, or whatever else intrigues you. If you're not into complex soldering, and/or layout and complex schematics, you can buy pre-etched boards you just plug the PIC into.

Another easy-start processor would be the 68HC11. It has a bootstrap built into ROM. Basically, you can jumper the chip so it wakes up listening on the serial port for code you send down the wire at it, and burns it into the EEPROM memory in the 'HC11 chip itself. Move the jumper and reboot the chip, and it's running your code.

I think this is far more interesting than just writing apps that run on an Operating System you didn't roll yourself.

--
---
ASM is not the place to start. by John+Whitley · 2004-02-05 12:57 · Score: 4, Informative

Perhaps it's time that computer science curriculums start teaching assembly language first.

Having taught an assembly/intro computer arch class, I agree with the sentiment that students who get "under the hood" gain valuable knowledge and working skills. Not just pounding ASM, but in learning how the machine works. Point agreed.

Also having taught first year computer science students, and seen how some of academia's transitions in pedagogy affected students... I have to say that the idea of teaching first year students in assembly is friggin' daft.

My reasoning is the same as why I strongly advocated an objects-first teaching model. It is increasingly critical for students to build a strong sense of software design and abstraction early on. This foundation makes students much better prepared to solve problems of many different scales (asm to component-systems) in the long run.

There's evidence from a paper in one of the Empirical Studies of Programmers workshops that this approach does trade off design skills for purely algorithmic reasoning for students at the end of their first year. But my own experience, as well as that of some prominent Comp Sci Education (CSE) folks seems to indicate that this is far more than compensated for as a student's skills grow.

Here's my theory as to why this is the case:
The details of debugging, algorithmic thinking, and problem solving are very much skill building exercises that really require time of exposure to improve. But it is much more difficult in my experience for students to build good design sense on their own. Once the framework for thinking in terms of good abstractions is laid down, it provides much stronger support for later filling in all of those gory low-level details.

Historical perspective: Ironically, this same reasoning is much of why I believe that academia's switch to C++ from languages like Pascal, Modula-2, etc. was an educational disaster for many years. The astute reader is now thinking: "hey, you just said you like objects-first; what up?" In the Procedural Era, many schools wouldn't expose students to C in the first year, as it had too many pitfalls that distracted from learning the basics of algorithmic thinking and important abstraction skills. Once the foundation was put in place, it was okay to switch 'em to C for the rest of the program.

When C++ and the early object boom really hit, this put on big pressure to teach first year students using C++. At one point in the mid-90's, upwards of 75% of 4-year institutions were teaching their first year in C++. Thus a language that had plenty more pitfalls than C, previously shunned for its pedagogical failings, entered the classroom. Combined with a lack of proper OO mental retooling on the part of first year instructors and faculty, this made for something of a skills disaster on a broad scale. At best, students learned "Modula-C" instead of good OO style. At worst, they were so confused by this melange of one-instance classes and sloppy hybrid typing that they didn't get a cohesive foundation whatsoever.
Re:Not the point! by tomstdenis · 2004-02-05 12:58 · Score: 4, Informative

Correct me if I'm wrong but isn't the most primitive CMOS gate a NAND gate? So I highly doubt you would make AND out of an XOR gate [XOR being the more costly of the three].

Tom

--
Someday, I'll have a real sig.
This book by voodoo1man · 2004-02-05 13:43 · Score: 4, Informative

has been available for some time under the GNU Free Documentation License. I tried to use it a while back when I decided to learn assembler, but I found Paul Carter's PC Assembly Language to be a much better introduction.

--
In the great CONS chain of life, you can either be the CAR or be in the CDR.
Re:Somewhere in the middle... by Lord+Ender · 2004-02-05 13:47 · Score: 5, Informative

This was posted to USENET by its author, Ed Nather, on May 21, 1983.

A recent article devoted to the *macho* side of programming made the bald and unvarnished statement: Real Programmers write in FORTRAN.

Maybe they do now, in this decadent era of Lite beer, hand calculators, and "user-friendly" software but back in the Good Old Days, when the term "software" sounded funny and Real Computers were made out of drums and vacuum tubes, Real Programmers wrote in machine code. Not FORTRAN. Not RATFOR. Not, even, assembly language. Machine Code. Raw, unadorned, inscrutable hexadecimal numbers.

Lest a whole new generation of programmers grow up in ignorance of this glorious past, I feel duty-bound to describe, as best I can through the generation gap, how a Real Programmer wrote code. I'll call him Mel, because that was his name.

I first met Mel when I went to work for Royal McBee Computer Corp., a now-defunct subsidiary of the typewriter company. The firm manufactured the LGP-30, a small, cheap (by the standards of the day) drum-memory computer, and had just started to manufacture the RPC-4000, a much-improved, bigger, better, faster --- drum-memory computer. Cores cost too much, and weren't here to stay, anyway. (That's why you haven't heard of the company, or the computer.)

I had been hired to write a FORTRAN compiler. Mel didn't approve of compilers. "If a program can't rewrite its own code", he asked, "what good is it?" Mel had written, in hexadecimal, the most popular computer program the company owned. It ran on the LGP-30 and played blackjack with potential customers at computer shows. Its effect was always dramatic. The LGP-30 booth was packed at every show, and the IBM salesmen stood around talking to each other. Whether or not this actually sold computers was a question we never discussed.

Mel's job was to re-write the blackjack program for the RPC-4000. (Port? What does that mean?) The new computer had a one-plus-one addressing scheme, in which each machine instruction, in addition to the operation code and the address of the needed operand, had a second address that indicated where, on the revolving drum, the next instruction was located. In modern parlance, every single instruction was followed by a GO TO! Put *that* in Pascal's pipe and smoke it.

Mel loved the RPC-4000 because he could optimize his code: that is, locate instructions on the drum so that just as one finished its job, the next would be just arriving at the "read head" and available for immediate execution. There was a program to do that job, an "optimizing assembler", but Mel refused to use it. "You never know where it's going to put things", he explained, "so you'd have to use separate constants". It was a long time before I understood that remark. Since Mel knew the numerical value of every operation code, and assigned his own drum addresses, every instruction he wrote could also be considered a numerical constant. He could pick up an earlier "add" instruction, say, and multiply by it, if it had the right numeric value. His code was not easy for someone else to modify.

I compared Mel's hand-optimized programs with the same code massaged by the optimizing assembler program, and Mel's always ran faster. That was because the "top-down" method of program design hadn't been invented yet, and Mel wouldn't have used it anyway. He wrote the innermost parts of his program loops first, so they would get first choice of the optimum address locations on the drum.
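The drum-placement trick in the story can be sketched with a toy timing model. Everything here is an assumption for illustration: the drum size, the fixed execution time, and the `run_time` and `optimum_layout` helpers are invented, and none of the numbers are real RPC-4000 figures. The model only captures the idea that a badly placed next instruction costs most of a drum revolution.

```python
def run_time(addresses, exec_time, slots):
    """Word-times needed to run the instructions at `addresses`, in execution order.

    Toy model: the drum has `slots` word positions and moves one position per
    word-time; reading an instruction takes one word-time and executing it
    takes `exec_time` more, during which the drum keeps turning.
    """
    head = addresses[0]  # assumption: the head starts at the first instruction
    elapsed = 0
    for addr in addresses:
        wait = (addr - head) % slots      # idle rotation until `addr` reaches the head
        elapsed += wait + 1 + exec_time   # wait, read one word, then execute
        head = (addr + 1 + exec_time) % slots
    return elapsed

def optimum_layout(count, exec_time, slots, start=0):
    """Put each instruction where the head will be when the previous one finishes.

    Assumes the program is short enough that the chosen addresses never collide.
    """
    return [(start + i * (1 + exec_time)) % slots for i in range(count)]
```

With a hypothetical 64-slot drum and 3 word-times per instruction, eight instructions stored at consecutive addresses take `run_time(list(range(8)), 3, 64)` = 459 word-times, while `run_time(optimum_layout(8, 3, 64), 3, 64)` takes 32, since no instruction ever waits for the drum.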

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Re:Linux x86 assembly? by Daytona955i · 2004-02-05 14:13 · Score: 4, Informative

That all depends on what you are doing... if you are doing it for fun then yes, I agree with you... however, if you are a programmer who picked up learn c++ in 24 hours, and now call yourself a coder, you have a lot to learn, and x86 asm might be the place to start.
Just to clarify by Raul654 · 2004-02-05 15:41 · Score: 5, Informative

The correct answers are down there, but just to collect them and clarify - you can build anything using nothing but NANDs. Alternatively, you can build anything using nothing but NORs. You can prove this easily using De Morgan's theorem.

However, in the real world, NANDS are cheap (2-3 transistors), so that's what everyone uses.
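A quick way to convince yourself of the NAND claim is to build the other gates from it and check every input combination. This is a logic-level sketch only; the function names are made up, and it says nothing about transistor counts or timing.

```python
def nand(a, b):
    """The only primitive used below."""
    return not (a and b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))

def xor_(a, b):
    # Classic four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def verify():
    """Compare each derived gate against Python's own operators on all inputs."""
    for a in (False, True):
        assert not_(a) == (not a)
        for b in (False, True):
            assert and_(a, b) == (a and b)
            assert or_(a, b) == (a or b)
            assert xor_(a, b) == (a != b)
    return True
```

Since NOT, AND and OR are enough to express any truth table, having all three built from NAND is what makes NAND sufficient on its own.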

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
in reality... by slew · 2004-02-05 19:51 · Score: 4, Informative

For what it's worth, they don't use just NANDs in cmos chip design in the real world. The primary primitive is the AND-OR-INVERT (AOI) structure.

In the cmos world, pass-gates are much cheaper than amplifying gates (in the size vs speed vs power tradeoff), although you can't put too many pass gates in a row (signal degradation). So in fact MUXes (multiplexers, which pass one of two inputs under the control of a third) and XORs (use input A to pass either !B or B) are used quite a bit.
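At the logic level, the MUX and pass-gate XOR described above look like this. It is only a behavioral sketch with invented names; it ignores the electrical side (signal degradation, drive strength) that the paragraph is really about.

```python
def mux(select, when_low, when_high):
    """Pass one of two inputs through, chosen by the control input."""
    return when_high if select else when_low

def xor_from_mux(a, b):
    """XOR as a MUX: input A selects whether B or !B is passed to the output."""
    return mux(a, b, not b)
```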

Some background might be helpful to think about the more complicated AOI structure, though...

In a cmos NAND-gate, the pull-up side is two p-type pass gates in parallel from the output to Vdd (the positive rail) so that if either of the two inputs is low, the output is pulled high. For the pull-down side, two n-type pass gates are in series to ground so both inputs have to be high before the output is pulled to ground. This gives us a total of 4 transistors for a cmos-nand where the longest pass gate depth is 2 (the pull-down). The pull-up is restricted to be the complement function of the pull-down in CMOS (otherwise either the pull-up and pull-down will fight or nobody will pull, causing the output to float and/or oscillate).

A 2-input NOR gate has the p-type in series and the n-type in parallel (for the same # of transistors).
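The series/parallel arrangement can be modeled as two switch networks. This is a simplified switch-level sketch with hypothetical helper names: a p-type transistor is treated as a switch that conducts when its input is low, an n-type as one that conducts when its input is high, and analog behavior is ignored entirely.

```python
def pmos_conducts(gate):
    """p-type switch: conducts when its gate input is low."""
    return not gate

def nmos_conducts(gate):
    """n-type switch: conducts when its gate input is high."""
    return gate

def cmos_output(pull_up, pull_down):
    """Resolve the output, rejecting the two states a correct CMOS gate never reaches."""
    if pull_up and pull_down:
        raise ValueError("both networks conduct: pull-up and pull-down fight")
    if not pull_up and not pull_down:
        raise ValueError("neither network conducts: output floats")
    return pull_up

def cmos_nand(a, b):
    pull_up = pmos_conducts(a) or pmos_conducts(b)     # p-types in parallel
    pull_down = nmos_conducts(a) and nmos_conducts(b)  # n-types in series
    return cmos_output(pull_up, pull_down)

def cmos_nor(a, b):
    pull_up = pmos_conducts(a) and pmos_conducts(b)    # p-types in series
    pull_down = nmos_conducts(a) or nmos_conducts(b)   # n-types in parallel
    return cmos_output(pull_up, pull_down)
```

Running either gate over all four input combinations never raises, which is the complement property in action: exactly one of the two networks conducts for every input.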

Due to a quirk of semiconductor technology, n-type transistors are easier to make more powerful than p-type, so a NAND is usually slightly faster than a NOR (the two series n-types in a NAND gate are better at pulling down than the two series p-types are at pulling up in a NOR gate). However, this isn't the end of the story...

Notice that you can build a 3-input NAND by just adding more p-type transistors in parallel to the pull-up and more n-type in series to the pull-down. You can make even more complicated logic by putting the pull-up and pull-down transistor in combinations of series and parallel configurations. The most interesting cmos configurations are called AOI (and-or-invert) since they are the ones you can make with simple parallel chains of pass transistors in series for pull-up and pull-down.
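As one concrete case of mixing series and parallel chains, here is a switch-level sketch of a small AND-OR-INVERT gate computing NOT((a AND b) OR c), often called AOI21. The function name and the modeling approach are assumptions for illustration; the sketch only checks logic, not noise margins or speed.

```python
def aoi21(a, b, c):
    """Switch-level AND-OR-INVERT: output = NOT((a AND b) OR c), six transistors."""
    # Pull-down (n-type, conducts on a high input): a and b in series, in parallel with c.
    pull_down = (a and b) or c
    # Pull-up (p-type, conducts on a low input): a and b in parallel, in series with c.
    pull_up = ((not a) or (not b)) and (not c)
    # The two networks must be complementary: exactly one conducts for any input.
    assert pull_up != pull_down
    return pull_up
```

The whole function is one inverting stage, whereas building the same logic from separate AND, OR and NOT gates would take several.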

For most cmos semi-conductor technologies, you are limited to about 4 pass gates in series or parallel before the noise margin starts to kill you and you need to stop using pass gates and just start a new amplifying "gate". Thus most chips are designed to use 4 input AOI gates where possible and smaller gates to finish out the logic implementation.

Thus "everyone" really uses lots of different types of gates (including simple NAND and XORS as well as more complicated AOI).