Extensible Programming for the 21st Century
Anonymous Cowardly Lion writes "An interesting article written by a professor at the University of Toronto argues that next-generation programming systems will combine compilers, linkers, debuggers, and that other tools will be plugin frameworks [mirror], rather than monolithic applications. Programmers will be able to extend the syntax of programming languages, and programs will be stored as XML documents so that programmers can represent and process data and meta-data uniformly. It's a very insightful and thought-provoking read. Is this going to be the next generation of extensible programming?"
XML for everything? Sure! All you need is a super-duper Athlon64 5.0ghz to process all the overhead and we can have a very descriptive, very rich programming language with little regard for efficiency.
I don't mean to be a troll, but you have to sacrifice speed and size to allow for this type of flexibility. We're already seeing this with Microsoft's upcoming GUI wrapper Avalon.
A good example is code like this in C++:
    Vector a, b, c;
    ...
    c = a + 2*b;
Written naively, the overloaded '+' operator returns a temporary vector object. But I don't want any object returned. I want the code to be expanded in place as
c[0] = a[0]+2*b[0]
c[1] = a[1]+2*b[1]
c[2] = a[2]+2*b[2]
Now you can do this in C++, but look at what you need to implement to do it. The code is a hideous nightmare of template metaprogramming. Of course you can do it in a language like C, but then you lose the ability to express yourself cleanly through code like 'c=a+2*b'.
It would be great if, instead, I could hook into the compiler and tell it exactly how it should handle vectors.
Of course you'd clearly need to document your code well as people reading your code would also be forced to understand the plugins.
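To make the idea above concrete, here is a minimal sketch of the same trick in Python rather than C++. The operators build a small expression tree instead of computing a vector, and the assignment walks that tree once per element, so no intermediate vector is ever allocated. The class names (`Expr`, `Vec`, `Add`, `Scale`) and the `assign` method are made up for illustration. A C++ expression-template library resolves all of this at compile time, which this runtime sketch does not attempt.

```python
class Expr:
    """Base class: arithmetic builds an expression tree instead of a result."""

    def __add__(self, other):
        return Add(self, other)

    def __rmul__(self, scalar):
        # Handles `2 * b`, where the left operand is a plain number.
        return Scale(scalar, self)


class Vec(Expr):
    """A concrete vector that owns its storage."""

    def __init__(self, *values):
        self.values = list(values)

    def at(self, i):
        return self.values[i]

    def assign(self, expr):
        # A single loop over the elements: c[i] = a[i] + 2*b[i].
        for i in range(len(self.values)):
            self.values[i] = expr.at(i)
        return self


class Add(Expr):
    """Deferred element-wise sum of two expressions."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def at(self, i):
        return self.left.at(i) + self.right.at(i)


class Scale(Expr):
    """Deferred product of a scalar and an expression."""

    def __init__(self, scalar, operand):
        self.scalar, self.operand = scalar, operand

    def at(self, i):
        return self.scalar * self.operand.at(i)


a = Vec(1.0, 2.0, 3.0)
b = Vec(10.0, 20.0, 30.0)
c = Vec(0.0, 0.0, 0.0)
c.assign(a + 2 * b)
print(c.values)  # [21.0, 42.0, 63.0]
```

The calling code still reads like 'c = a+2*b', but the work happens element by element inside `assign`. That is roughly what a compiler hook for vectors would have to arrange.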
Instead of treating each new idea as a special case, they allow programmers to say what they want to, when they want to, as they want to.
Is this not the Ultimate goal of programming? The Holy Grail of programming perhaps?
So here I am, coding away merrily, when I run into a *STICKY* problem.
I'm running applications as user X, and need to access data as user Y. I have all the routines and everything written (in PHP) to access the data, but they need to run as user Y while the application itself keeps running as user X.
There's just no easy way to do this. You have to use some kind of glue (such as XML), along with parsers, socket connections, pipes, shared memory, and all that jazz just to be able to access data remotely.
Ouch.
What I'd like to see is the concept of a "remote object". Imagine standard OOP, except that a particular object doesn't have to exist in the same memory/process space as the parent.
For example, instantiate an object on a remote server, or as another user on the same server, or at least in a different memory space as the same user & server.
The biggest problem with XML is that it's heavy, very heavy, and requires specialized scripting in order to work.
If you have a class already written that does what you need, you should be able to simply instantiate that object in the context you need it to run in, and then begin using it, COM style.
Obviously, some calls (such as GLOBAL) would be affected or even disabled with such functionality - but can you imagine the benefits?
Ah well. That world doesn't exist, yet.
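For what it's worth, the "remote object" idea above can be sketched in a few lines of Python. This is a toy under loud assumptions: `Account`, `Host` and `RemoteProxy` are hypothetical names, and the "transport" is a plain function call passing JSON text, standing in for whatever socket or pipe would connect the two contexts. The point is only that the caller uses ordinary method-call syntax while the real object lives elsewhere.

```python
import json


class Account:
    """Hypothetical class that should run in another context (e.g. as user Y)."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class Host:
    """Stands in for the remote side: owns the real objects and runs calls."""

    def __init__(self):
        self._objects = {}

    def register(self, name, obj):
        self._objects[name] = obj

    def handle(self, request_text):
        # Decode the request, dispatch to the real object, encode the reply.
        request = json.loads(request_text)
        target = self._objects[request["object"]]
        result = getattr(target, request["method"])(*request["args"])
        return json.dumps({"result": result})


class RemoteProxy:
    """Local stand-in: turns attribute access into serialized requests."""

    def __init__(self, send, name):
        self._send = send   # callable taking request text, returning reply text
        self._name = name

    def __getattr__(self, method):
        def call(*args):
            request = json.dumps(
                {"object": self._name, "method": method, "args": list(args)}
            )
            return json.loads(self._send(request))["result"]
        return call


host = Host()
host.register("savings", Account(100))

# In a real setup `host.handle` would sit behind a socket or pipe.
savings = RemoteProxy(host.handle, "savings")
print(savings.deposit(25))  # 125
```

Python's standard library ships a real version of this pattern in `multiprocessing.managers`. Object brokers such as COM and CORBA work along similar lines.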
If I recall correctly, fourth-generation languages were going to be the future of programming back in the early '80s.
(Machine code, Fortran/Basic-type languages and Pascal/C-type languages being the supposed first, second and third generations, IIRC)
Then in the early '90s, OOP was going to save the world. Not that it hasn't had an impact, but it certainly hasn't fundamentally changed things.
And now it's XML that's going to save the programmers, while the old-timers whine that we should all really be using Lisp.
Not that I'm a computer-language conservative myself, but it's worth pointing out that historically, there has been quite a big discrepancy between which languages the Comp-Sci researchers feel everyone should be using, and the ones which actually are used.
that next-generation programming systems will combine compilers, linkers, debuggers,
...THINK Pascal (for the Mac) was doing this almost 20 years ago: the editor served as the front end to the compiler --- so the syntax highlighting in the THINK Pascal editor was driven by the lexer (really was the lexer): you knew about syntax errors immediately. The debugger was fully integrated into the environment. It was really sweet, and probably one of the best programming environments ever written.
and that other tools will be plugin frameworks
Like Unix pipes and Eclipse?
Tomorrow arrived yesterday and appears today.
And IMHO lisp's syntax has always had a nicer structure than XML's repetitive redundancy.
is nothing but a set of s-expressions that read much nicer in a lisp-like syntax:
IMveryHO the big failure of the lisp guys of old was that they were so proud of how many ')' they could put next to each other that it made their code harder to read than necessary. I bet XML would have failed too if it were commonly written
(and yes, the _ are just there for /.'s formatting)
This is incredibly stupid. How come XML helps in dealing with data and metadata? Metadata *is* data.
Metadata is, of course, data. Sometimes, a finer-grained taxonomy method is helpful. After all, sausages and uranium are both matter, but calling them matter doesn't help me with my dinner selection.
Mmmmmm - sausage.
Gawd, here we go again. I think someone needs to make a new version of Greenspun's Tenth Rule of Programming (all complex systems eventually approach an imperfect LISP runtime), something like "all programming languages and environments eventually look like sloppy LISP".
combine compilers, linkers, debuggers
Yup, LISP usually involves an integrated environment for editing, debugging, execution, compiling. But since it's all LISP, it can all be extended, swapped, or componentized at will.
Programmers will be able to extend the syntax of programming languages
Wow, what a *concept*. Not only is this easy in LISP, it's easy in a lot of dynamic languages. In Ruby for instance, you can use blocks to factor out common code sequences, example:
Before:
After:
You LISP hackers are laughing of course, this level of refactoring and more is common with macros. Those of you who don't practice "extreme refactoring" probably don't see the difference.
programs will be stored as XML documents
Uhm, if you treat XML tags as "named parentheses" you have S-expressions (LISP again). XML and S-expressions map onto each other completely. But S-expressions are a lot simpler to program with, so you might as well use them. Sure, they aren't as hip as XML or optimal for document storage, but they are better suited to programming.
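The "named parentheses" reading above is easy to try out. Below is a rough Python sketch that uses the standard `xml.etree.ElementTree` parser to turn an XML fragment into S-expression text. The `to_sexpr` helper and the `(@name "value")` notation for attributes are conventions invented for this example. It also ignores text that follows a child element (mixed content), so it illustrates the mapping rather than providing a lossless converter.

```python
import xml.etree.ElementTree as ET


def to_sexpr(element):
    """Render an element as (tag attributes... text children...)."""
    parts = [element.tag]
    # Attributes become small tagged lists; the '@' prefix is our own convention.
    for name, value in element.attrib.items():
        parts.append(f'(@{name} "{value}")')
    # Leading text of the element, if any, becomes a string literal.
    text = (element.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    # Child elements are rendered recursively.
    for child in element:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    '<html><head><title>bla</title></head>'
    '<body bgcolor="blue"><b>Search</b></body></html>'
)
print(to_sexpr(doc))
# (html (head (title "bla")) (body (@bgcolor "blue") (b "Search")))
```

Every closing tag collapses into a single ')', which is the whole difference in notation. The tree underneath is the same.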
programmers can represent and process data and meta-data uniformly
LISP treats everything as a list, if you want.
I'm no "smug lisp weenie". I have never used LISP for production code. I usually use Ruby and Perl. But it's a trade-off: Ruby has better libraries and a single implementation. Lisp has more raw power but isn't as practical.
The point I'm trying to make is that what the author describes ALREADY EXISTS. But I guess it's "cooler" to invent something new than to just perfect something that already exists.
Could somebody please come up with a LISP that has modern stuff like threads, CGI, and process control and is optimized for Unix-like systems, and please get a uniform version pre-installed on Linux and Mac systems so all the LISP folks could finally get some cred?
In my opinion the author tries to do too much. As a result, it is difficult to form an opinion on the article or start a discussion about it.
As for using XML as source code, I hope the author is kidding. If you use PHP or JSP code in (x)html, you should always keep most functionality outside the page (in separate objects) unless it is directly related to the representation of the page. Otherwise the source will become unreadable very quickly.
Obviously XML can be used to transport the parse tree from one tool to the other. Various proposals have been made to do this already. Using a well-defined, easy-to-handle (and verify) data transport can really help here. This is really useful, but has little to do with the representation of a language itself.
I think a big change will be doing away with textual source code altogether. A parse tree is actually all that is really needed. This will be edited with a (graphical) editor and accompanying GUI tools (e.g. Agile code development and GUI design tools). This parse tree will be compiled to an intermediate language and then translated, at some point, to machine language. These last two steps are already happening, obviously.
Now for the graphical representation: leave out those silly restrictions on where to put braces, line delimiters, etc. The program shall be shown according to the defaults of the company (overruled by the defaults of the department and then the developer). There is nothing against using graphical editors for literals (e.g. a 2x2 table that can be edited as a table), comments, etc. Evenly spaced ASCII has had its 50 years of glory, and I wish it goodbye.
To have all these features, however, the language should be easy to parse. Putting in macro definitions is stupid, as is putting in operator overloading. They make the language complex to use for both humans and parsers. Templates as used in Eclipse are much better behaved. Code assists can take care of a lot as well. Emacs etc. have features for those too, but they have no knowledge of what the code means, making it much more difficult to do things right.
I have the feeling I am still missing some important points, but this terminal-like screen is running out of whitespace. I would strongly suggest that everybody look at Eclipse.org to see what modularization/plugins can do for an IDE. It's free, very easy to install (Unix and Windows) and will give you an insight into what tomorrow (today) will hold. Even if you don't use Java.
But when it comes to data it just confuses, because taxonomies imply a hierarchy, and hierarchies are hard to agree upon in the first place, are quite arbitrary, and tend to change quite fast and radically.
The relational model already provides a better alternative to taxonomies: attributes. And then, metadata becomes just data.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
I very much agree. I think the reason that Java has risen to its status today is that it constrains the programmer to be explicit about what methods are being called upon what data, which makes it easier for someone other than the original writer to understand what's going on. In Java A=B will do the same thing every time (as far as the programmer is concerned), but in C++, A=B may mean any number of things depending on operator overloading. I think XML has the same advantage over S-expressions that a strongly typed programming language (e.g. Java) has over an untyped language: transparency. Imagine if HTML was based on S-expressions:
(html (head (title "bla")) (body (bgcolor blue)(b (a (href "http://osdn.com/osdnsearch.pl") (style "text-decoration: none") (fontsize "-2") (face "verdana") Search)))(i Here))))
Sure, if you have an editor that will show you matching (), then the above s-expression makes sense, but otherwise it would be a nightmare trying to figure out why Here was not in bold. HTML and XML are much easier for humans to read (and thus debug) because it's clear where in the tree you are editing, just as it's easier to program in Java because you are forced to be explicit about what data types you are manipulating and what calls you are making.
- Low level developers. People programming in C; the ones writing Linux and KDE.
- Quasi-low level developers. People programming in Java; the ones writing much of the business software right now.
- High level developers. People programming in scripting languages, like Ruby, Python, PHP, JSP, Javascript.
The second group is the most visible, because business loves them. The first group is the second most visible, because -- while it isn't as "hot" a technology in Monster -- most of the software we use is written at this level. I suspect that the third group is the one that will goose the business community in the future, and will probably eclipse the second group. I'd guess that this is a submarine technology; you don't see many job postings for Ruby programmers, but a heck of a lot of software is being written in it. Even more is being written in PHP, JSP, and Python.
I imagine something like Python or Ruby, or some other high-level language that's easy to write software with, coupled with a decent compiler will be the real winner in the near future. Get some type inference for one of these languages, and the ability to compile it (as with Parrot), and group two will mostly go away. Java claims to be a more productive language than C because of higher level features; modern scripting languages are even better at increasing productivity, and their only real limitation is their speed, or lack of it. Just as Java eventually overcame the speed issue, so, too, do I expect some future version of a scripting language to overcome it.
But, maybe Java will hang in there. If you look at Java 1.5, you see a lot of increased syntactic sugar that has usually been only available in languages like Ruby -- I've heard that this was motivated by similar constructions in C#. Perhaps Java or C# will evolve enough syntactic sugar that hacking out code will be as easy as doing so in Ruby. IMO, it'll take a more radical language change than that provided by 1.5; my biggest complaint about Java these days is that it gets in your way; a large chunk of the code you write for any application is infrastructure, and you write it over, and over, and over (anybody else sick of ActionListeners yet?). I'd like to see the typing system changed to type inference... but it is possible.
I doubt, however, that software development is going to evolve into choosing black boxes from a set of tools and plugging them into each other, mostly because, to cover all possible jobs, the framework would have to have access to a huge number of fine-grained tools, and by that point, you might as well just write the code yourself. Look at the size of the Java APIs. How many packages are there? How many classes? How many methods? This is making our lives, as programmers, easier... how?
* compilers, linkers, debuggers, and other tools will be plugin frameworks, rather than monolithic applications;
For example, see the .NET Microsoft.CSharp namespace, the System.CodeDom namespace to represent code as objects, etc. in the framework class library.
* programmers will be able to extend the syntax of programming languages; and
Don't know about extension of languages yet, but the next one is interesting ....
* programs will be stored as XML documents, so that programmers can represent and process data and meta-data uniformly.
Take a look at Microsoft's XAML technology -- describing code by using XML. That's the general direction.
I'm sure other technology frameworks have similar things, but I'm not as familiar with those technologies.
Metrowerks CodeWarrior is an IDE (and I believe it has a command-line tool for processing the project file, à la Make) that uses plugin-based preprocessors, compilers, prelinkers, linkers, postlinkers, and other tools, whose execution the master project controls (and, through a nice GUI, it allows easy association of file extensions with their tools and build information). It's been doing this since at least '97.
Where Wilson goes wrong is in assuming that this kind of environment will be built based on plug-ins. The interrelationships needed between the components to get the required level of functionality are too great. What many people have already noted is that the current Unix environment is in fact based on plug-in development. Editors, debuggers and compilers are modularized as programs, with clean lines of communication between them in the forms of files and streams (which Unix again abstracts to one concept). The limitation of this system lies in the fact that the modules all use their own separate address spaces, and hence each one has to have a private representation of the program. This can't be mitigated by having the separate tools communicate to a central database (this is the most that Wilson's proposal of using XML as the underlying format can accomplish), because then the method of communication would be the limiting factor. Of course, you can use the neutral code-data representation to make the communications between the modules and the database be in terms of sending closures (from reading the paper, I don't think Wilson even considers this), but then you've just designed a single distributed address space, and in the process removed all the encapsulation and modularity advantages of the communication links (not to mention introducing a whole slew of concurrency issues)!
One such integrated system has been built in the past, called Interlisp. Barstow, Shrobe, and Sandewall's book (mentioned above) has a few papers that describe the system, but briefly a few lessons can be distilled from it. First of all, the system itself was an integrated development environment for a dialect of Lisp, where everything was done in one in-core address space: source code (including comments) was represented by data structures in memory, upon which the structure editor (residing in the same address space) operated directly. Code could either be interpreted from the data structure or compiled by the (yes, in-core) compiler. There were several extended packages (besides a Lisp macro-like facility), notably the structure editor and "Conversational LISP," a pseudo-natural language command-prompt parsing system. Although source code (and data) could be serialized to files (there was a sophisticated change-tracking facility that took care of this), the usual way of working was by saving the core image to disk and loading it next session, so the whole environment was persistent. There were hooks for everything from the parser to the compiler to error handling down to the most basic frame-handling code of the stack-based VM, and in order to implement the facilities mentioned above (and some other ones I left out, like the ever-present DWIM automatic error-correction facility) the code took full advantage of them. This caused some trouble when it came to portability of the components and the Interlisp itself (the heavy interdependence caused many problems in bootstrapping the system). Some of these incidents are documented in Barstow et al.'s book, but the Interlisp bootstrapping difficulty has been mentioned in all of the Interlisp porting papers I've read. 
Unfortunately, I don't think a system with those capabilities can be built with the restrictions of modularization, since all of the things it did are applicable to programming in any language, and to do them required precisely the
In the great CONS chain of life, you can either be the CAR or be in the CDR.
I've been working on a pet project very similar to this for a couple of years now off and on.
Currently, I'm constructing the editor as a javascript/xul/xbl based application under mozilla (not yet publicly released) and tossing the documents over jabber to a code repository which connects as another client. Other pieces in the suite, such as the compiler, talk over jabber to the repository, helping to ensure modularity.
Why mozilla? It gives me a cross platform editing environment and I can take advantage of the built in xhtml/mathml rendering. (Although, I admit I'm largely hamstrung by the faulty mathml rendering on Mac OS X at the moment)
Why jabber? It serves as a glorified RPC mechanism for exchanging XML document fragments for me. Its primary advantage compared to SOAP, XML-RPC, etc., is that I can allow the repository or execution environment to send out updates to the clients, rather than rely on client-based polling. After all, in this day and age of everything lying behind NAT, you usually can't open sockets to clients directly. It also has the advantage that it makes evolving the platform into a collaboration environment a simple logical progression, rather than something grafted on as an afterthought.
My main interest is in what advantages you derive from allowing a rich text markup language and extensible grammar, and the ability to tag information and retain markup across versions.
A smarter editor allows you to move towards dynamically defined operators, whose precedence can be defined in terms of a partial ordering with respect to one or more existing operators; that way the editor can raise a red flag during editing when something is ambiguous. Superscripts, subscripts, radicals, and Riemann sums are allowed by defining small extensions to the grammar in the language and loading them into the editor.
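As a rough illustration of precedence as a partial ordering (not the actual editor code, which isn't shown here), the Python sketch below stores "binds tighter than" declarations as a graph and treats two operators as comparable only if one can be reached from the other. Anything else is reported as ambiguous, which is the case an editor would flag. `PrecedenceTable`, `declare` and `compare` are hypothetical names, and associativity and cycle detection are left out.

```python
class PrecedenceTable:
    """Operator precedence as a partial order rather than numeric levels."""

    def __init__(self):
        # Maps each operator to the set of operators it directly binds tighter than.
        self._tighter = {}

    def declare(self, tighter, looser):
        """Record that `tighter` binds more tightly than `looser`."""
        self._tighter.setdefault(tighter, set()).add(looser)
        self._tighter.setdefault(looser, set())

    def _binds_tighter(self, a, b):
        # Depth-first search: is b reachable from a through declared edges?
        stack, seen = [a], set()
        while stack:
            op = stack.pop()
            for nxt in self._tighter.get(op, ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def compare(self, a, b):
        """Return 'tighter', 'looser', or 'ambiguous' for a relative to b."""
        if self._binds_tighter(a, b):
            return "tighter"
        if self._binds_tighter(b, a):
            return "looser"
        return "ambiguous"


table = PrecedenceTable()
table.declare("*", "+")
table.declare("^", "*")
table.declare("dot", "+")   # a user-defined operator, ordered only against '+'

print(table.compare("^", "+"))    # tighter (follows from ^ > * > +)
print(table.compare("dot", "*"))  # ambiguous: the editor should ask for parentheses
```

With numeric precedence levels every pair of operators is forced to be comparable. The partial order lets an extension say only what it actually knows.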
The potential for language tagging comments or method labels for internationalization is nifty, but more than a bit of a Pandora's box.
An XML namespace for version control means the repository can store one document much like a cvs system. By having the editor submit a series of change requests to the repository rather than edit the document directly, integrity is ensured.
Since you have a fairly stable set of tags, you can now embed more information: statistics such as loop counts from debugging compiles, links to hand- or auto-generated proofs of algorithmic correctness, big-O information, etc.
So, yes, there is a value to storing the data in XML and making the editor smarter.
However, one primary problem is that any such project has a rather high bar to clear to become even marginally useful.
There are also a number of interesting problems regarding how to handle certain types of code refactoring and traditional text editing operations in this sort of environment.
So why not take the Python approach and use the indentation as the structure?
compare
(whatever _(you
want to
do
in (xml xml)
_)
)
(indentation doesn't follow structure)
with
whatever
you
want to do in : xml
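To show that indentation really can carry the same structure as the parentheses, here is a small Python sketch that reads an indented outline and prints the equivalent S-expression. The `indent_to_sexpr` function is a made-up helper. It assumes indentation uses spaces only and treats any deeper-indented line as a child of the nearest shallower line above it.

```python
def indent_to_sexpr(text):
    """Convert an indentation-structured outline into S-expression text."""
    lines = [line for line in text.splitlines() if line.strip()]

    def parse(index, depth):
        # Collect nodes indented at least `depth`, starting at lines[index].
        nodes = []
        while index < len(lines):
            line = lines[index]
            indent = len(line) - len(line.lstrip(" "))
            if indent < depth:
                break  # this line belongs to an enclosing level
            # Everything indented deeper than this line is its child.
            children, index = parse(index + 1, indent + 1)
            nodes.append((line.strip(), children))
        return nodes, index

    def render(node):
        head, children = node
        if not children:
            return head
        return "(" + " ".join([head] + [render(c) for c in children]) + ")"

    nodes, _ = parse(0, 0)
    return " ".join(render(n) for n in nodes)


outline = """\
html
  head
    title
  body
    b
"""
print(indent_to_sexpr(outline))  # (html (head title) (body b))
```

The converse holds too: a pretty-printer can always emit the indented form from the parenthesized one, so the choice between them is about readability, not expressive power.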
If the goal is to express programs more briefly, presumably at a "higher level", then I don't believe an extensible language alone can solve the problem. After all, we already have languages which are extensible to varying degrees. In my experience, code that maximizes the use of language extensibility tends to be the most cryptic to read, at least until you thoroughly understand all the extensions used. Witness C++ code that makes heavy use of overloading and templates.
A large part of the problem (IMHO) is the slowness of developers in standardizing common data types and programming idioms. I'm not a linguist, but there must be some interesting parallels here with the development of human language. However, there is one important difference: programmers are usually more concerned about getting a computer to understand their programs than getting other programmers to understand them. Thus, there isn't the same motivation that Tak and Nog would have had to agree on the linguistic expressions and social conventions that would allow them to hunt the woolly mammoth and divide up the women.
In fact, experienced programmers do tend to standardize their own use of language extensions, libraries, and program development tools, as this does enable them to program more efficiently. Many software companies do the same, which is probably how they manage to accomplish anything at all. But most software companies place very little priority on creating industry standards (unless, of course, they think they can own them).
What most people don't realize is that most of the developments in programming languages knowingly or unknowingly follow this trend. For example, object oriented programming (a la C++, Java etc etc) is closely related to how our brain sees the world as a hierarchy of objects. Type inference (as in ML) is closely related to how our brain derives useful information from incomplete data.
Computer languages are moving away from small languages to bigger more expressive languages. The evolution of Perl I think reflects this trend. XML etc are just syntactic sugar. They don't reflect any fundamental steps. The difficult parts have yet to be done.
In many ways we are at the beginning of the evolution of computer languages. I think the next 10-20 years are going to be very exciting.
While agreeing with Paul Graham that programming languages represent notation, I don't quite agree that they must evolve as slowly as changes in notation. I think there is a real difference between how notation is used to represent meaning in programming languages and how it is used in human languages. Different domains demand different notations of differing precision.
In conclusion, computer or programming languages will evolve to be closer to human languages. There will be specializations of the language for different domains. This is akin to using mathematical notation to describe mathematics rather than writing each relationship out in wordy English (or another language).