Domain: iecc.com
Stories and comments across the archive that link to iecc.com.
Comments · 40
-
Re:That's called a compiler. Fortran 1957
With C, we got optimizing compilers that totally rewrite the specification, doing things in a different order, entirely skipping steps that don't end up affecting the result, etc.
We didn't. FORTRAN I was specifically designed with optimization in mind and in fact the first compiler was an optimizing compiler:
https://compilers.iecc.com/com...
But yes, your point is otherwise sound. What is run-of-the-mill compiler optimization today would have been AI in the days of FORTRAN I. Modern code looks nothing like the early machine-level descriptions. I also agree that languages are (and will increasingly become) precise specifications of what we want with the details left up to the compiler.
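To make "entirely skipping steps that don't end up affecting the result" concrete, here is a minimal sketch of two textbook optimizations, constant folding and dead-code elimination, over a toy straight-line program. The instruction format and the function names are invented for this illustration; this is not how FORTRAN I or any particular C compiler does it.

```python
# Toy straight-line program: each instruction is (dest, op, left, right).
# Operands are ints (constants) or strings (variable names).
# This instruction format is hypothetical, made up for the example.

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold_constants(program):
    """Replace operations whose operands are known constants with the result."""
    known = {}  # variable name -> constant value
    out = []
    for dest, op, left, right in program:
        # Substitute any operand whose value is already known.
        left = known.get(left, left)
        right = known.get(right, right)
        if op == "const":
            known[dest] = left
            out.append((dest, "const", left, None))
        elif isinstance(left, int) and isinstance(right, int):
            value = OPS[op](left, right)
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, left, right))
    return out


def eliminate_dead_code(program, outputs):
    """Drop instructions whose results never reach the requested outputs."""
    live = set(outputs)
    kept = []
    for dest, op, left, right in reversed(program):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (left, right) if isinstance(x, str))
            kept.append((dest, op, left, right))
    kept.reverse()
    return kept


program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "*", "a", "b"),        # foldable: 2 * 3
    ("unused", "+", "x", "a"),   # never contributes to the result
    ("d", "+", "x", "c"),        # depends on the run-time input x
]
optimized = eliminate_dead_code(fold_constants(program), outputs=["d"])
print(optimized)
```

Five instructions collapse to one: the multiplication is done at compile time and everything that does not feed `d` is skipped. It assumes instructions have no side effects, which is what makes the reordering and deletion safe.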
-
Re:Some non-Knuth suggestions
Regarding the dragon book, I'd like to add that if that's all you know it makes you prone to massive over-engineering. To the point that the likes of XML will seem like a good shortcut to avoid going through all the hassle.
It's not a book, but this little gem explains a useful skill: how to build relatively simple input parsers that are remarkably powerful for their size, suitable not just for toy compilers but also for configuration file parsing and general input parsing. It's Jack Crenshaw's Let's Build a Compiler (when I checked, the site was slow but functional; there are other sources of the same material available). If nothing else, it's also a good fun exercise to work through.
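Small hand-written input parsers of this kind are commonly written as recursive descent: one function per grammar rule, each calling the others. The following is a minimal sketch for integer arithmetic, not code from the tutorial; the grammar, the class name `Parser`, and the helper `evaluate` are all assumptions made for the example.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    # expression := term (('+' | '-') term)*
    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division, to keep it simple
        return value

    # factor := NUMBER | '(' expression ')' | '-' factor
    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expression()
            self.take(")")
            return value
        if token == "-":
            return -self.factor()
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

The same shape (a token stream plus one function per rule) carries over to configuration files: replace the arithmetic rules with rules for sections, keys, and values.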
-
Re:except for garbage collection
"Interesting", said a person who hasn't followed C++ so much recently. Does anyone know how easy is it to mix different garbage collection policies in different threads in the same application? Looking at the FAQ it seems this would be pretty nasty? Is it easy to have a real time thread that uses only manual allocate/free and another which might get blocked during garbage collection?
-
Re:My interpretation
While it's true that nothing is stopping someone from making a hardware Java bytecode running solution, the fact that it's possible doesn't make Java a non-interpreted language, any more than the fact that it's theoretically possible to create a hardware BASIC solution makes BASIC a non-interpreted language.
Hardware CPUs that run Java bytecode directly already exist and have done for years.
So Java is no more "interpreted" than C, which was designed for a specific architecture (the PDP), and is also available in interpreted forms and used to compile code which is run as bytecode. You can even compile C to Java bytecode.
Whether something is compiled or interpreted is not a function of the language, nor is it defined based on which CPU instruction set happens to be most popular at the moment.
-
Re:A clear statement about GPL virality
This book might clear up some of your misconceptions.
-
C for yourself, PostScript for NeWS!
The NeWS window system was programmed in PostScript, and was the original "AJAXian" window system, except that it used PostScript instead of JavaScript, PostScript instead of XML, and PostScript instead of DHTML, so it was much more consistent and vastly better designed than JavaScript and AJAX.
Some people preferred not to program directly in PostScript, so there were several projects to compile high level languages into PostScript code for NeWS:
In 1987, Dave Singer at Schlumberger wrote LispScript, a Lisp to PostScript compiler for NeWS.
In 1988, Rehmi Post at UniPress wrote C2PS, a C to PostScript compiler based on the Amsterdam compiler kit.
In 1992, Arthur van Hoff at the Turing Institute (the same guy who later wrote the Java compiler in Java) wrote PdB, a C++ to PostScript compiler.
OpenLaszlo compiles the high level Laszlo programming language (which is a combination of JavaScript embedded in XML) into Flash byte codes, as well as JavaScript that runs in web browsers.
This idea has been around for a LOT longer than the term "AJAX", or the JavaScript language.
-Don
-
Re:Linux file & memory management shines
I was only considering functions, not global data structures, but this page appears to indicate that PIC-enabled ELF programs (the default/prevalent configuration on Linux) use a Global Offset Table (GOT) for static data, which is itself referenced through a relative address from the code. Thus, relative addresses are indeed used for both functions and data in shared libraries. According to the page, Windows(tm) actually does something similar for data symbols marked with dllimport, at least in recent versions of Microsoft(tm)'s C compiler, although programmers did formerly have to take care of data relocation manually.
On the other hand, Linux does use copy-on-write for initialized data; I only meant to say that at some point in the application's execution the data would have to be copied (when it is written to), and thus the initialized data section cannot be permanently shared like the executable and constant data sections. However, that wasn't precisely what I said, so thanks for pointing out the discrepancy.
-
Re:What other pre-web services are out there?
Without thinking too much about it, the mailing list sf-lovers (which later morphed into USENET's rec.arts.sf.written) stems from about 1972 or so. When I checked a few days ago, there were still quite a few posters there: http://w3.aces.uiuc.edu/AIM/scale/nethistory.html
The RISKS list dates from 1985 or so: http://catless.ncl.ac.uk/Risks/
The comp.compilers group goes back to 1986 or so: http://compilers.iecc.com/
-
Re:Thin Clients, Fat Pockets
NeWS was developed a long time before Java, by the same person working for the same company: James Gosling at Sun.
NeWS used PostScript throughout, as the imaging model (like DHTML), the scripting language (like JavaScript) and the data model (like XML). It was like AJAX in that it sent asynchronous messages over the network and used a dynamic scripting language on the client side (called the NeWS server), so it could implement local graphical user input feedback, and efficient application specific network protocols (using a binary encoding for PS data).
NeWS was much more consistent and better designed than AJAX's amalgamation of accidental technologies (DHTML, JavaScript, XML). NeWS also has many other advantages over AJAX, such as an excellent imaging model, WYSIWYG printer compatibility, shared modules, multithreading, synchronization, a programmable event distribution system, a fully developed Open Look GUI toolkit, and a graphical interface builder (HyperLook).
Writing NeWS PostScript is a lot like directly programming byte code for the Java or Flash virtual machines, which are both object oriented stack machines a lot like PostScript and Forth. At the time, we were well aware that many people had a hard time programming in PostScript directly (although I love it), so several interesting compilers were developed. Rehmi Post wrote a back-end to the Amsterdam Compiler Kit (CScript: C for yourself, PostScript for NeWS). Arthur van Hoff (who later wrote the Java compiler in Java) wrote PdB (Pretty darn Brilliant), a compiler that translated object oriented C into PostScript, which supported subclassing PostScript NeWS toolkit classes. Dave Singer at Schlumberger wrote LispScript, a Lisp to PostScript compiler, which allowed you to take full advantage of Common Lisp macros to develop PostScript programs!
OpenLaszlo is a high level XML/JavaScript based programming language, which compiles into Flash byte code that runs in the Flash player, and works exactly the same across all platforms. The inner loops and hot-spots of Laszlo are hand written in "flasm" (Flash Assembler), as hand optimized alternatives to the compiled JavaScript code. (Laszlo is a JavaScript compiler that currently outputs SWF code, but will support other virtual machines in the future.) Flasm looks a lot like NeWS PostScript code, with all the stack comments. Laszlo is open source, so you can grab a copy of the LPS sources and look at "LaszloView.as" to see what I mean.
-Don
-
Re:You don't need new standards
C is one of the simplest syntaxes among the popular compiled languages. A proper, complete recursive descent parser could be implemented in a day by a person versed in writing parsers. I'm curious as to what you think is so complex or hard about it.
Here are a few examples of the difficulties in parsing C. Obviously it's not impossible to write a parser, but there are a lot of tricky cases that can bite you if you're not careful. One such example (described here) is this:
(b)-(c)
Does that say "subtract c from b", or does it say "negate c and cast it to b"? Depends on whether b is a typedef.
-
Compacting conservative GC
See here.
Anything above writing your own heap allocator over raw OS calls is "a crutch". It's merely a matter of choosing the right crutch.
File handles and network connections? That's none of the GC's business. The algorithm deals with that stuff - GC is there to save you the hassle of asking "who owns this object to delete it" and "is this pointer still good". GC lets you nest function calls in the knowledge that intermediate objects won't leak. It lets you grab an arbitrary object and point to it from a data structure without worrying it will become invalid. It's not there to do shutdown housekeeping; that's a C++ism.
-
Re:Dependency Hell vs DLL Hell: Call for submissio
Oops. Sorry. After writing my last reply, I noticed that ELF does standardize a SONAME for linking which includes some versioning information, and which can be different from the actual filename it resolves to.
Of course, I suppose it could be argued that the SONAME resolution is not part of Linux since it is up to each executable to implement it. However, since most execs implement it by linking to the standard ld.so, I'm not sure the distinction is that significant.
So if versioning conflicts are not the problem with Linux, why are there still so many incompatibilities and why are packages often restricted to a single version of a single distro?
-
Re:my $0.02 after a couple compiler classes
apparently antlr does python (first entry) too. i stick by my claim that lex/bison are really antiquated (you have to do some majorly crufty and poorly documented stuff to make bison output thread safe, for example). lalr is a powerful approach to parsing. it's by no means the only approach, or, for that matter, necessarily the best approach. antlr is capable of compiling a superset of the languages that lalr tools (specifically yacc/bison) are capable of (see here). does the average joe care about the difference between ll(k) and lalr(k)? probably not. still, theoretically, antlr is a more powerful tool than yacc/bison
-
Don't extend. It's overrated.
Honestly, it's easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making it fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, it's just stupid to care unless you notice it's insanely slow.
First off, if you've not found this link: http://compilers.iecc.com/crenshaw/, then I recommend you start with it. While it's about writing a compiler, it really helps make parsing much clearer.
Scheme is a good language to check out if you want to start with another design (a Scheme interpreter can be written in a few hours, even in C, if you're slick; even if you're not, it would be a short project to get 90%).
Some other reference material: Parsing Techniques (free online). Also: Modern Compiler Design by the same guys and well worth the investment. Concepts, Techniques, and Models of Computer Programming teaches kernel theory of language design, and may open your mind to some other techniques you may not be aware of.
Checking out the archives on Lambda The Ultimate would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop at MIT.
-
And it's easy to implement
If you want to build your first compiler, Pascal-style languages are a good place to start. They are amenable to recursive descent parsing.
I strongly recommend Jack Crenshaw's (free) introduction.
I seem to remember that the compiler is written in Pascal. I translated it to C as I went along. You could always use GNU Pascal (That's a google link, because the site seems to be refusing connections. Could that be related to this FPP?)
-
Re:full C compatability?
we won't (realistically) be able to turn off the garbage collector, which means that we won't be able to write real-time programs, and it'll even be touchy writing programs, such as, oh, audio or video players, that require near real-time performance. (Not to mention the disappointment we all felt with the various java window-widget APIs (AWT, Swing) that looked great but couldn't run fast enough to respond to the mouse.)
Yeah, nice FUD. Java is slow because it's bytecode, not because it's garbage collected. (Incidentally, all the Swing applications I've used recently have been every bit as responsive as I could desire, so it isn't even necessarily slow.)
So why complicate things with garbage collector and tracking down circular references...
BOOM! And you reveal that you don't know what you're talking about. Circular references are irrelevant to any GC scheme more sophisticated than reference counting. They simply have nothing to do with it.
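A toy illustration of that point, with a made-up object model (the `Obj` class and both helper functions are assumptions for this sketch, not any real collector's API): reference counts never drop to zero for an isolated cycle, while a tracing collector never visits the cycle at all because it starts from the roots.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing references to other Obj instances


def reference_counts(heap, roots):
    """Count incoming references, the way a refcounting scheme would."""
    counts = {obj.name: 0 for obj in heap}
    for obj in roots:
        counts[obj.name] += 1
    for obj in heap:
        for target in obj.refs:
            counts[target.name] += 1
    return counts


def reachable(roots):
    """Mark phase of a tracing collector: follow references from the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.name in seen:
            continue
        seen.add(obj.name)
        stack.extend(obj.refs)
    return seen


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)  # a -> b, reachable from the root
c.refs.append(d)  # c and d point at each other...
d.refs.append(c)  # ...but nothing else points at them
heap, roots = [a, b, c, d], [a]

counts = reference_counts(heap, roots)
live = reachable(roots)
garbage = {obj.name for obj in heap} - live
print(counts, garbage)
```

Both `c` and `d` keep a count of 1, so pure reference counting would never free them; the trace from the root simply never reaches them, so they are garbage with no special cycle handling.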
I suggest you read this. Pay particular attention to the bits that explain things like "Modern garbage collectors appear to run as quickly as manual storage allocators (malloc/free or new/delete)," and "for very many applications modern garbage collectors provide pause times that are completely compatible with human interaction. Pause times below 1/10th of a second are often the case," and "Does garbage collection cause my program's execution to pause? Not necessarily.".
Then come back and make some informed comments, instead of spouting nonsense. Thank you.
-
Everything old is new again...
An interesting trick, but it has been thought of before...
A more interesting question is why you wouldn't rather just use C on these various devices, which by their nature are constrained and lend themselves to code that squeezes all you can get out of them.
A C to .Net bridge won't help you if there's some native feature of the device with no Compact.Net library support.
And then of course there are the number of devices that support Compact.Net... wouldn't you be better off finishing up that C->Java compiler so you could write bytecoded C for things like the blackberry or sidekick or Treo?
Seems kinda astroturfy to me.
-
Re:Not true.
You challenge me to find an architecture that has a non-power-of-2 word size? You haven't been around very long, have you?
Quick scan of google: PDP-10 emulator. The PDP-11 also had some interesting word size limitations: PDP-11 addresses were 16 bits, limiting program space to 64K, though an MMU could be used to expand total address space (18-bits and 22-bits in different PDP-11 versions). I see that an early design by Seymour Cray was 60-bit. You probably also know that the Itanium has a variable instruction bundle size.
Here is a link on porting gcc, including a warning on the word size. Not the best evidence, but it will have to do.
-
Re:MOD PARENT FLAMEBAIT, as well
I see you don't know the difference, yourself.
http://www.answerbag.com/q_view.php/948 http://compilers.iecc.com/comparch/article/93-08-096
Essentially there is no difference. Most so-called scripting languages are Turing-complete. Generally, scripting languages are interpreted and more application-specific, while programming languages are compiled and general-purpose. But considering the power of most scripting languages, they can be used for general-purpose programming, so the difference is more in how you use them than in anything else.
-
Re:I'd like to take this oppertunity..
I think the point is often made, at least in C++ circles, that a mandated GC in standard C++ would be non-real-time, because a real-time GC has worse average performance (like you said) and is very OS-dependent.
Were such a thing to be mandated in C++, I suspect the standard would simply specify the APIs and semantics without touching on performance, just as they do with new/delete and malloc/free. However, I don't think making GC part of standard C++ would be a good idea, anyway. The language isn't well suited to GC, mainly because the programmer can play all kinds of pointer tricks, which means that any GC for C++ must be conservative (has to assume that any pattern of bits that might be a pointer *is* a pointer) and non-copying (meaning it's not allowed to move stuff around in memory, because tracking down and updating all of the references would be slow and error-prone).
I think GC is a useful add-on to C++, though, and it's not really necessary to make it part of the standard for it to work. The relatively small performance degradation you're likely to get from a conservative, non-copying collector vs. malloc/free is often well worth the savings in programmer effort and the increase in reliability.
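To show what "conservative" means in practice, here is a minimal sketch under simplifying assumptions: memory is modelled as integer addresses, the class `ConservativeHeap` and its methods are invented for the example, and only root words are scanned (a real collector would also scan the contents of every block it marks).

```python
import bisect


class ConservativeHeap:
    def __init__(self):
        self.starts = []  # sorted block start addresses
        self.sizes = {}   # start address -> block size in bytes

    def add_block(self, start, size):
        bisect.insort(self.starts, start)
        self.sizes[start] = size

    def block_containing(self, word):
        """Return the start of the block that `word` points into, else None."""
        i = bisect.bisect_right(self.starts, word) - 1
        if i >= 0:
            start = self.starts[i]
            if start <= word < start + self.sizes[start]:
                return start
        return None

    def mark_from(self, words):
        """Treat every word that could be a pointer as if it were one."""
        marked = set()
        for word in words:
            start = self.block_containing(word)
            if start is not None:
                marked.add(start)
        return marked


heap = ConservativeHeap()
heap.add_block(0x1000, 64)
heap.add_block(0x2000, 64)
heap.add_block(0x3000, 64)

stack_words = [
    0x1010,  # a genuine interior pointer into the first block
    0x2000,  # an integer that merely happens to equal a block address
    42,      # plainly not a pointer
]
retained = heap.mark_from(stack_words)
print(sorted(hex(addr) for addr in retained))
```

The second block is kept alive by a plain integer, which is the cost of being conservative. It also shows why such a collector cannot move blocks: it has no way to know whether `0x2000` is a pointer that should be updated or a number that must be left alone.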
I'm not sure you could implement a real-time GC for current Linux 2.4 for example. It would require kernel support, and so it is fair to compare malloc/free vs. non-real-time GC because that is realistically what developers would face.
I don't think there's any difference between GC and malloc/free in that regard. Kernel support isn't necessary at all for GC, although I have to wonder if there's anything that could usefully be done... Normally, though, a GC-based system requests large chunks of memory from the OS (via sbrk(), etc.) and then parcels it out as needed, just like malloc does. This is a good thing performance-wise in both cases -- switching to kernel mode for every memory allocation would be tremendously expensive.
Anyway, if you need hard real-time guarantees on Linux, you'd better get a Linux patched up for real-time, and you'd better use a real-time memory manager, whether malloc/free or GC.
Even on 64-bit architectures? Even with lots of allocations (hand-written lists, etc)? I guess it's possible but non-trivial, and a search is always at least O(log N), right?
I can't really see why a 64-bit architecture would matter at all. In fact, I'd think a conservative GC would end up being less conservative, since the space of potentially valid pointers would be such a small fraction of the total 64-bit space.
As to the cost of a search... in the worst case I think it's linear with respect to stack size + size of reachable heap objects. In practice, however, people have come up with all sorts of really clever ways to reduce the effort required. I'm no expert on this (though I have a casual interest that used to motivate me to read papers), so I can't really describe the techniques. There's some good information here.
One that I think is cool, though, is generational copying collection. Basically, it exploits the fact that most objects are allocated and then die very quickly, with only a few being retained "long-term". So, what they do is allocate all new objects in a "young object" pool. When a GC happens, they move all of the objects that are still live into another pool and then just mark the entire "young object" pool as free. You can see how this makes allocation and deallocation of short-lived objects very, very cheap, since the young object pool has no real structure -- just a pointer that indicates where the free space begins; allocation always grabs memory from there, with no worries about finding a block large enough, or updating lists or flag bits or anything. The copying is a bit of overhead on the deallocation side, of course, but it's usually dominated by the savings of not having to examine all of those individual chunks, update free l
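A rough sketch of the "young object" pool idea, with heavy simplifications: offsets stand in for addresses, the class name `Nursery` is made up, and the caller supplies the list of live objects directly, whereas a real collector would find them by tracing from the roots.

```python
class Nursery:
    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0        # bump pointer: where free space begins
        self.objects = {}   # offset -> (size, payload)

    def allocate(self, size, payload):
        if self.top + size > self.capacity:
            return None     # full: the caller should run a minor collection
        offset = self.top
        self.objects[offset] = (size, payload)
        self.top += size    # allocation is just a pointer bump
        return offset

    def collect(self, live_offsets, old_generation):
        """Copy survivors out, then free everything else in one step."""
        for offset in live_offsets:
            old_generation.append(self.objects[offset][1])
        self.objects.clear()
        self.top = 0        # the whole pool is free again


old = []
nursery = Nursery(capacity=64)
keep = nursery.allocate(16, "long-lived")
for i in range(3):
    nursery.allocate(16, f"temp-{i}")
overflow = nursery.allocate(16, "overflow")  # pool is full at this point
nursery.collect([keep], old)
print(old, nursery.top)
```

The three temporaries are never touched individually: the work done by `collect` is proportional to the number of survivors, not to the number of dead objects.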
-
Richard Stallman contributed tons of code
[...] I can't recall any software he's written other than GNU Emacs.
I believe that Richard Stallman wrote most of the original GNU C compiler, although it was derived partly from a portable optimizer from a 1978 University of Arizona research project.
"GNU `diff' was written by Mike Haertel, David Hayes, Richard Stallman, Len Tower, and Paul Eggert."
"GNU Make was written by Richard Stallman and Roland McGrath."
"Richard Stallman was the original author of GDB, and of many other GNU programs."
-
Also wrote "Let's Build A Compiler" series...
Mr. Crenshaw also wrote the popular Let's Build a Compiler series of articles a while back.
These articles don't go into a lot of the complicated stuff that's involved in modern compiler design-- Crenshaw keeps it simple, keeps it straightforward, and still produces a working (if not optimizing) compiler by the end of the second or third article.
No, it won't let you code a C compiler that will beat the pants off of gcc or Borland's latest offering, but the end result is pretty useful.
-
Link & More
That's it. For those who want the quick link for Let's Build a Compiler, it's right here (http://compilers.iecc.com/crenshaw/). I really hope that Crenshaw might write again about compilers. I agree with the Pascal and 68k part -- they're old, and even some of the approach taken by the tutorial is probably not up to speed with modern practices. But hey, at least it gives a good historical account.
-
compiler design books and resources.
Well...the Dragon book for starters, as mentioned earlier. That's probably the ur-source for most of the theory behind the magic. Makes my head hurt, though.
Terence Parr's book, Practical Computer Language Recognition and Translation (out of print). His doctoral dissertation is a useful thing too (try the Purdue University library).
comp.compilers is another useful resource. It's archived at http://compilers.iecc.com.
Alan Holub's Compiler Design in C is a classic.
The ACM's SIGPLAN ("Special Interest Group On Programming Languages") and its journal SIGPLAN Notices of the ACM are both fine resources. So is ACM Transactions on Programming Languages and Systems.
Don't forget the IEEE as well.
Not to mention Abelson and Sussman: Structure and Interpretation of Computer Programs.
The garbage collection page is a good source for information on memory management and garbage collection.
Your university's library is another good resource.
Well. That should keep you out of trouble.
-
Compilers and Parsing...
Compilers and parsing seem to go hand-in-hand. With that in mind, here's a link to Jack Crenshaw's Let's Build a Compiler.
-
Re:Missing From The List
So I guess that Word 97 uses a copy collector?
-
Re:Not a beta, -1 wrong
Announced on the garbage collection list on the 28th of March, which shows you that at least if /. isn't paying attention some people are...
* 2002-03-28 14:29:32 Microsoft release Shared Source CLI (for .NET) (articles,microsoft) (rejected)
-
Re:Self compiling and newbie Slashdot readers
It's a kind of proof that the compiler actually works
It's also a necessary step if you're creating your language from scratch. For more details, try comp.compilers.
For your abbreviation problem dry your tears and try http://www.everything2.com
-
POP to relay
There are mail systems that can open up a relay for a specific IP after a successful POP login from said machine. It stays open for a little while, and then closes.
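The bookkeeping behind this is small. Here is a minimal sketch, assuming a ten-minute window and an in-memory table (both are assumptions; the class `PopBeforeSmtp` and its method names are made up, and this is not how sendmail or any particular mail server implements it):

```python
import time


class PopBeforeSmtp:
    def __init__(self, window_seconds=600, clock=time.monotonic):
        self.window = window_seconds  # how long the relay stays open (assumed)
        self.clock = clock            # injectable so the logic can be tested
        self.expires = {}             # client IP -> time at which access closes

    def record_pop_login(self, ip):
        """Call after a successful POP authentication from `ip`."""
        self.expires[ip] = self.clock() + self.window

    def may_relay(self, ip):
        """Call from the SMTP side before accepting mail for relaying."""
        deadline = self.expires.get(ip)
        if deadline is None:
            return False
        if self.clock() >= deadline:
            del self.expires[ip]      # window has closed; forget the entry
            return False
        return True


now = [0.0]  # fake clock for the demonstration
gate = PopBeforeSmtp(window_seconds=300, clock=lambda: now[0])
before_login = gate.may_relay("192.0.2.10")
gate.record_pop_login("192.0.2.10")
during_window = gate.may_relay("192.0.2.10")
now[0] = 301.0
after_window = gate.may_relay("192.0.2.10")
print(before_login, during_window, after_window)
```

Because mail clients typically poll POP every few minutes, each poll calls `record_pop_login` again and keeps the window open for legitimate users.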
A quick Google search for "pop temporary relay" finds us this page which will make sendmail work this way.
Problem solved? I doubt it.
-
Living in the 70's my friend
That isn't to say garbage collection is necessarily a bad thing -- it's good for security and portability (the two things Java really aims for) since it eliminates the need for all those nasty pointers. But it's the main reason C++ code can run circles around similar Java code. And doing native compilation won't help the situation any.
Garbage collection is not by definition slow and generally performs about the same as new/malloc, although lots of people seem to think it is a lot slower. For non-GUI code, and if you're doing pure Java code (i.e. no JNI), the performance is roughly comparable to C++ (assuming you have a good JVM).
-
Re:The end of gcc 'cause intel's compiler is faste
Actually, GCC's speed sucks, at least in part, because software patents suck. See for example Compiler Patents, or register allocation patent, or graph-coloring algorithm a nonoption. Then complain to your congressperson (or non-US equivalent).
-
Re:Slashdot Boggles Me Again...
No wonder the software industry is such a mess. I've seen CS *GRADUATE* students who couldn't use malloc(). Note that I did not say "who use malloc() wrong" - no, these students could not even figure out how to call malloc() nor explain what it did. There's something strange happening (I call it cheating) when someone can graduate with a CS degree yet never use dynamic memory
Maybe it's because a lot of schools are favoring Java as a language to learn? Not that it's all bad; there are a lot of reasons garbage collection isn't all evil. Still, I agree with you; they should be at least aware of what malloc() is, and what it does.
-
Re:From a small isp perspective..
The whole reason we were placed on these lists was not due to anyone complaining about spam originating or being relayed from our server, but just because it had an open relay.
Shoulda used POP before SMTP: only allow SMTP access to IP addresses that have successfully authenticated with POP in the last few minutes. Since most email programs automatically check for new messages the moment they start, and repeatedly check every few minutes thereafter, legitimate users don't even know you are filtering. To the best of my knowledge, lots of ISPs have good success with this technique.
Don't blame the MAPS/ORBS because you can't deploy a trivial and obvious technical solution that thousands of other people use.
Most of these Blackhole lists do send a message back to the person trying to send the mail, and they often portray admins who run open relays as evil spammers or complete morons. Neither of these is true.
Anyone who operates a high-gain publicly-accessible network data amplifier *IS* either evil or a moron. Smurf amplifier, 0WN3D unix box, open mail relay, who cares. LART 'em till they glow then shoot 'em in the dark.
(Not that I'm not sympathetic to the difficulties of being a small ISP: I just don't think there's any excuse for operating an abusable data amplifier.)
-
Early BASIC *was* compiled
Fortran is a compiler, which turns commands into machine code once, rather than an interpreter like early BASIC, which has to interpret commands into machine code every time the program is run. These days BASIC (such as the dreaded M$ Visual Basic) can be compiled as well.
The first BASIC system, the one written by Kemeny and Kurtz at Dartmouth, was a compiler. It was felt (rightfully so) that this was needed in order to make the system fast enough to be usable. For whatever reason, this is often overlooked nowadays, and many people assume that BASIC compilers started with VB 5.
References:
A History of BASIC (Jones Telecommunication & Multimedia Encyclopedia)
BASIC (Wikipedia)
Re: Scripting vs. Programming language vs. 4GL? (comp.compilers article by David Wright)
-
Re:So?
One more thing, for the benefit of other readers: If you search for "AboveNet" or "Above.Net" in the spamtools archive at http://wx.iecc.com/cgi-bin/spamtoolsearch, I think it's pretty clear that the only defense offered for AboveNet's activities is that they have written their Acceptable Use Policy to allow it (again, it's just their fig leaf excuse).
-
Why a new VM?
-
Re:Profoundly counterintuitive?
First off, there are profound performance differences that can be achieved simply by optimizing for the Pentium Pro and not the Pentium; see The Twofish Implementations. This is mainly just instruction ordering issues, which are possible to handle in a JIT. There's more to optimization these days than simple cycle counting.
Second, memory management is a different beast under object oriented programming than under procedural programming. The holy grail of object oriented programming (as it were) is black box reuse: allowing a class maintainer to modify how a class works without those using the class needing to change their code. Not having garbage collection in an OO language either disallows true black-box reuse (which is a maintainability issue), forces the object to implement its _own_ garbage collection (such as the C++ STL template basic_string does, and this is generally the worst sort of garbage collection, such as basic_string's reference counting), or forces the program to leak memory like a sieve. If those are the options, then garbage collection is not too high a price to pay.
Especially when you consider that GC isn't that high of a cost. _Every_ memory management scheme that isn't static in nature that I've seen is O(n). Either on the allocation or on the free. Sooner or later you have to search for a hole. GC doesn't remove this penalty, but instead it allows you to _defer_ it until a better time. Since most programs spend most of their time waiting for input, there are lots of places where garbage collection can be free because the program isn't doing anything else.
Last, but not least, I'd like to point you at the GC FAQ, a good resource for all things GC related. Highly recommended reading. After all, it was once "intuitively obvious" that the world was flat, that the sun went around the earth, and that time didn't slow down as you went faster.
-
Memory leaks
Given the ubiquity of memory leaks, why don't more programs use garbage collection? The idea being not to rely on garbage collection, but to let it clean up after your mistakes, because you can't find them ALL. The Boehm-Demers-Weiser conservative garbage collector is such a package for C/C++. (And before everyone jumps all over GC, please read this portion of the GC FAQ.)