Learning Computer Science via Assembly Language
johnnyb writes "
A new book was just released which is based on a new concept - teaching computer science through assembly language (Linux x86 assembly language, to be exact). This book teaches how the machine itself operates, rather than just the language. I've found that the key difference between mediocre and excellent programmers is whether or not they know assembly language. Those that do tend to understand computers themselves at a much deeper level.
Although unheard of today, this concept isn't really all that new -- there used to not be much choice in years past. Apple computers came with only BASIC and assembly language, and there were books available on assembly language for kids.
This is why the old-timers are often viewed as 'wizards': they had to know assembly language programming. Perhaps this current obsession with learning using 'easy' languages is the wrong way to do things. High-level languages are great, but learning them will never teach you about computers. Perhaps it's time that computer science curriculums start teaching assembly language first."
It is in the same fashion that win32 asm is different from linux asm. The core is the same but knowing the core of x86 assembler is going to get you far if what you are wanting to do is talk to the kernel.
What you meant to say was that your new book has just been released. If you're going to pimp your wares on Slashdot, at least put an appropriate disclaimer on. That said, I completely agree with the premise of the book. I've met a lot of mediocre programmers, and a few good ones. But I've never yet met a real star that didn't have some background in assembly language programming. Personally, I haven't written anything in assembly in well over a decade. But that fact that I can do so if needed makes me a better programmer, and I'd recommend it to any aspiring coder as a key skill to learn. I wouldn't say IA32 is a particularly nice introduction (I'd start with a cleaner, simpler architecture, such as 6502), but it is at least widely available to anyone that wants to study it...
"The invisible and the non-existent look very much alike." -- Delos B. McKown
http://savannah.nongnu.org/projects/pgubook/
It's also being used at Princeton
Since this submission is nothing more than an attempt to hawk his one book, on principle I refuse to buy it. I don't like dishonesty in the submission process. He should have come out and directly admitted that it was his book.
-- You see, there would be these conclusions that you could jump to
There are two standards, the AT&T ... and the other one
Incorrect. There are at least four different assemblers and standards:
ASM - GNU Assembler. AT&T standard, as commonly used on Linux. The syntax hasn't changed since the 60's - which is both very good and very bad. I personally think it should be retired.
MASM - Microsoft Assembler. Intel standard assembly. The syntax is nice, but there are some ambiguous operators (is [] address of or address by value? - the meaning changes depending on the context). This is typically what the commercial Windows world uses. MASM itself is mostly obsolete - the Visual C compiler can now do everything that it could and supports all modern CPU instructions (even on Visual C++ 6 if you install the latest CPU pack).
NASM - Netwide Assembler. An assembler that set out to put right all the things that were wrong with MASM. The syntax is excellent, ambiguous operators are cleared up, documentation is also excellent, it interoperates beautifully with Visual C on Windows and GNU C on Linux. Ideally NASM would replace AS as the standard now that it's open source.
TASM - Borland Turbo Assembler. Based around the Intel standards, but does things slightly differently. Has extensions which allow for easy object-oriented assembly programming - which can make for some very nice code. Had a MASM compatibility mode, but nobody in their right mind used that if they could help it. I had version 5, but I don't believe they've kept it up to date, so it's obsolete now.
There are a couple of others as well, most notably AS86 (which was the leading independent solution for writing assembler back in the DOS days).
x86 instructions that deal with 2 data points can be written 2 ways:
instr src,dest
instr dest,src
The intel standard (used by nasm, tasm, masm) is dest,src. The ATT standard (used by gas) is src,dest
Do you even lift?
These aren't the 'roids you're looking for.
Well, some of us code assembly on bare hardware. We have to roll our own 'api' and include it in there with the rest of the code.
I've worked before with programmers who had little experience in programming 'bare hardware'- they do really foolish things like not initing timers, setting up stack pointers, and the like.
Writing bare ASM code for a processor (where it boots up out of your own EPROM or on an emulator) is good experience in minimalism. It can give you a good feeling when the project is all done and you can say you did it all yourself.
For those interested in getting into this kind of thing, start with a PIC embedded controller and a cheap programmer. You can get PIC assembly language tools for free, and build a programmer, or buy a kit for a programmer, that plugs into your serial or parallel port. Your first PIC machine can be the CPU, a clock crystal, a few resistors and capacitors, and the LED you want to blink, or whatever other intrigues you. If you're not into complex soldering, and/or layout and complex schematics, you can buy pre-etched boards you just plug the PIC into.
Another easy-start processor would be the 68HC11. It has a bootstrap built into ROM. Basically, you can jumper the chip so it wakes up listening on the serial port for code you send down the wire at it, and burns it into the EEPROM memory in the 'HC11 chip itself. Move the jumper and reboot the chip, and it's running your code.
I think this is far more interesting that just writing apps that run on an Operating System you didn't roll yourself.
---
This was posted to USENET by its author, Ed Nather, on May 21, 1983.
A recent article devoted to the *macho* side of programming
made the bald and unvarnished statement:
Real Programmers write in FORTRAN.
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Lest a whole new generation of programmers
grow up in ignorance of this glorious past,
I feel duty-bound to describe,
as best I can through the generation gap,
how a Real Programmer wrote code.
I'll call him Mel,
because that was his name.
I first met Mel when I went to work for Royal McBee Computer Corp.,
a now-defunct subsidiary of the typewriter company.
The firm manufactured the LGP-30,
a small, cheap (by the standards of the day)
drum-memory computer,
and had just started to manufacture
the RPC-4000, a much-improved,
bigger, better, faster --- drum-memory computer.
Cores cost too much,
and weren't here to stay, anyway.
(That's why you haven't heard of the company, or the computer.)
I had been hired to write a FORTRAN compiler
Mel didn't approve of compilers.
"If a program can't rewrite its own code",
he asked, "what good is it?"
Mel had written,
in hexadecimal,
the most popular computer program the company owned.
It ran on the LGP-30
and played blackjack with potential customers
at computer shows.
Its effect was always dramatic.
The LGP-30 booth was packed at every show,
and the IBM salesmen stood around
talking to each other.
Whether or not this actually sold computers
was a question we never discussed.
Mel's job was to re-write
the blackjack program for the RPC-4000.
(Port? What does that mean?)
The new computer had a one-plus-one
addressing scheme,
in which each machine instruction,
in addition to the operation code
and the address of the needed operand,
had a second address that indicated where, on the revolving drum,
the next instruction was located.
In modern parlance,
every single instruction was followed by a GO TO!
Put *that* in Pascal's pipe and smoke it.
Mel loved the RPC-4000
because he could optimize his code:
that is, locate instructions on the drum
so that just as one finished its job,
the next would be just arriving at the "read head"
and available for immediate execution.
There was a program to do that job,
an "optimizing assembler",
but Mel refused to use it.
"You never know where it's going to put things",
he explained, "so you'd have to use separate constants".
It was a long time before I understood that remark.
Since Mel knew the numerical value
of every operation code,
and assigned his own drum addresses,
every instruction he wrote could also be considered
a numerical constant.
He could pick up an earlier "add" instruction, say,
and multiply by it,
if it had the right numeric value.
His code was not easy for someone else to modify.
I compared Mel's hand-optimized programs
with the same code massaged by the optimizing assembler program,
and Mel's always ran faster.
That was because the "top-down" method of program design
hadn't been invented yet,
and Mel wouldn't have used it anyway.
He wrote the innermost parts of his program loops first,
so they would get first choice
of the optimum address locations on the drum.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
The correct answers are down there, but just to collect them and clarify - you can build anything using nothing but NANDS. Alternatively, you can build anything using nothing but XORS. You can prove this easily using demorgan's theorem.
However, in the real world, NANDS are cheap (2-3 transistors), so that's what everyone uses.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton