Are Extensible Programming Languages Coming?
gManZboy writes "Programming writer and instructor Greg Wilson is proposing that the next generation of programming languages will use XML to store not only such things as formatting (so you can see indentation your way, and I can see it my way, via XSLT) but even programmatic entities -- like: <invoke-expr method="myMethod"><evaluate>record</evaluate></invoke-expr>. Wacky, but perhaps wacky enough to be possible?"
It's called XAML. It is not a programming language, it is a declarative way to control the user interface of a client application. It's nothing new conceptually, just jumping on the XML bandwagon. You can read more about it on this MSDN Blog.
Oh, and there are already commercial clones of it out, even though it won't be released until the Avalon/Longhorn timeframe.
Three and a bit years ago, as a satire on the absurd over-enthusiasm for all things XML that was then taking over the world, I invented a parody language, XMC. Guess what? The over-enthusiasm for XML has continued unabated and now has taken over the world. And so life imitates art.
Herewith, a sample XMC program:
Exercises for the reader:
1. What does this do?
2. Is it easier to read than the corresponding C program?
--
What short sigs we have -
One hundred and twenty chars!
Too short for haiku.
Actually, this is a potentially good use of XML.
Bear with me.
The primary benefit of XML is that as a standardized language, standardized parsers can be made available that are reasonably easy to use.
The primary "oversell" of XML is to extend that claim to cover semantics. Human readability is great and should not be underestimated in some uses, but should not be oversold either.
So let's say you want to write a tool to validate that some Java code conforms to something or other. What's the hardest part of that task, that shouldn't be hard? Parsing the Java code. I mean, it's not like it's impossible for someone skilled enough to even consider writing such a tool in the first place, but it is a legitimate challenge; it's very easy to get 90% correct but that last 10% can be a bitch.
And let's not even talk about trying to parse Perl.
Exposing the AST in XML is not a bad idea; it's a perfect match of the two technologies. You still have semantics to deal with, but we're a ways from being able to standardize those.
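To make the "expose the AST in XML" idea concrete, here is a rough sketch using Python rather than Java, only because Python ships a parser for itself in the standard `ast` module. The XML layout (one element per node type, one wrapper element per field) is my own assumption and not any standard; `ast_to_xml`, `source_to_xml` and `names_called` are made-up helper names.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node):
    """Recursively convert a Python ast node into an ElementTree element.

    Assumed layout (not a standard): element tag = node class name,
    child nodes wrapped in an element named after the field,
    plain values stored as XML attributes.
    """
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(ast_to_xml(item))
                else:
                    ET.SubElement(wrapper, "value").text = repr(item)
        elif value is not None:
            elem.set(field, str(value))
    return elem


def source_to_xml(source):
    """Parse Python source and return its AST serialized as XML text."""
    return ET.tostring(ast_to_xml(ast.parse(source)), encoding="unicode")


def names_called(xml_text):
    """A toy 'validation tool': list plain function names that are called,
    using nothing but a stock XML parser and a path query."""
    root = ET.fromstring(xml_text)
    return [name.get("id") for name in root.findall(".//Call/func/Name")]
```

The point is the last function: once the tree is in XML, a checker never has to touch the language's grammar. Something like `names_called(source_to_xml("print(len(x))"))` walks the calls with a path query instead of a hand-written parser. The semantics are still your problem, as noted above.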
Now, as LISP has shown, programming straight to the AST, while it has its advantages, is a historical loser. But that doesn't make exposing it in a standard fashion a Bad Idea. On top of it, you layer a more conventional language... and as strongly as I am defending an XML-exposed AST, I even more strongly believe it would be a stupid, fatal mistake to make that the human-exposed language that one actually programs in. I've worked with a couple of things that only come close (XBL and XUL Templates, where the primary flaw of the latter is that it should be a language and isn't) and a thing that is an XML language (XSLT) and I think it was a grave error on both counts.
Even better if multiple languages could share one standard or at least semi-standard AST; sharing semantics has proven to be a non-starter I think but there's still hope at the syntax level. (.Net shows that trying to share semantics means you end up with five languages that are just different spellings of the same thing, only without the convenient AST in XML to work with.)
This isn't a pointless abstraction, but it is something that will probably only be useful on big projects. I say "probably" because if a language with this capability ever got off the ground, there might be interesting things that could be done that we've not even thought of.
> I love C syntax, but it's stale and there are many things we can improve upon.
The problem with C is not the syntax; it's the semantics of the language that
are stuck in the twentieth century. Sure, there are improvements that could
be made to the syntax, but good syntax doesn't make a language good. C needs
semantic things: automatic memory management, dynamic length strings, garbage
collection, numeric types that automatically promote as needed, context-aware
functions, mixins, unicode strings that automatically keep track of their
encoding and know what a grapheme is, that sort of thing.
Cut that out, or I will ship you to Norilsk in a box.
I personally do not understand what the entire hype about XML is, or even specifically what problem it is supposed to solve. My understanding is that there was a big push for XML because of a perceived need for open document formats. The idea being that binary formats were proprietary, closed and non-portable.
If this is the problem XML intends to solve, then I feel it is a misguided effort. Binary formats are "closed" only in so far as we do not have access to the source of the program that created them. Once that source is available, binary file formats are open, portable, and a hell of a lot more space efficient than XML. JPEG is a binary file format, yet we have open standards and the committee who designed it released open source reference implementations of the decoder and encoder. Hence, JPEG is an open format and nobody goes around trying to stuff pixels in XML files.
I really think XML is a solution to the wrong problem. The problem is closed source software, not binary files.
-- Marcio
Your editor would make the code easily legible and comprehensible to humans.
Such an editor introduces an unnecessary layer of abstraction which only makes coding more complex and error-prone. The real experts will need to know how to at least read, if not write, the underlying XML code, because there will be cases where just using your abstract higher-level editor will be insufficient to track down a bug. There is no need for this: parsers are not going to be more efficient because of it, and humans will not be able to do their jobs better because of it.
I didn't really want to post anything about it so soon, given that it's not quite ready for prime time (i.e., lack of documentation among other things), but XPP might be of interest to some of you. It does some of what this guy proposes, although it's not quite a "next generation programming language", more like a pre-processor on top of C++. It allows you to use external XML templates (to describe automatically written pieces of code based on the rest of your program) and inline XML comments in the cpp source (to perform higher level macros, like, for instance, calls that morph depending on the rest of the code).
It's been used in a pretty big project from a well known company we all like to hate, though unfortunately the project itself has been cancelled. Hopefully it does mean that it's useable and could be useful to others as well. I had been waiting on a rework of the site w/documentation before drawing any more attention to it, but given this article, this is as good a time as any.
Cheers,
lone.
It does not mean that programmers will be typing in straight XML, it means that they'll be using smart IDEs of one sort or another to interact with their code, and probably that XML to (language) and (language) to XML filters will be built.
This greatly facilitates a number of things that are currently difficult - here are a random few (not all I can think of and in no particular order) :
And more.
To some extent this stuff is available today in one place or another. Using an XML based markup for the language would make such things much easier to implement and much more under the control of programmers.
You would never edit the raw XML of your source code...
Call me an old fart, but why couldn't I edit the raw source code? My PHB can't understand why I don't use MSWord to write C++ code. He can't grok the concept of plain text. "Puh-lane tekst" I keep telling him, but he keeps complaining that he wants my local variables in a different font. Now you come along validating his insanity!
The days of 4-space vs 8-space tab debates are over.
Here's a clue: those days were over decades ago. Some people still argue over it, but they're the type of people who argue over nothing. Just ignore them and move on. If you cater to their quirks and foibles you only encourage them. Here's the answer to that debate: it doesn't matter because it's too trivial to bother with.
Suddenly, parsing all that code becomes much easier, because we have a well-established XML validation mechanism.
Thank goodness! I don't know how many more years I could have put up with stupid compilers not being able to validate my code.
Drag an if-then block into your source code. Drag a for loop block into your source code. Your editor can collapse or expand blocks.
Since when did you need XML for this? Correct me if I'm mistaken, but doesn't Kate/Kdevelop do this already? I understand that many of you use that leprous shit of an editor that comes with Visual Studio. But that's no excuse to eliminate plain text source code. Get a real editor and stop dragging the rest of us down to your level.
Don't blame me, I didn't vote for either of them!
Don't most programming languages already have formal grammars? Why would you need to mark them up?
It's 10 PM. Do you know if you're un-American?
Then there are the 100 (it's probably closer to 250 formats, of which only 100 are still being used) other formats we accept in. Most of them had to be manually reverse engineered. We are exporting data out of a third party software. The people on the other end of the phone didn't write the software, they don't know how the software works. So we get to deduce all of that from them.
The 10th time you come across a variable length record file, where the sample didn't have all of the record types that exist, you'll be more than a touch peeved about it. Okay, so *K records are 432 bytes, *M records are 345 bytes. Aggg! I've never seen a *I record before. Whoopie, I get to go add one line of code, recompile, and deploy the software to overcome the lack of file documentation. You'll think to yourself, you know, if this was in XML, I could just skip this whole sub-tree. All the data I need, I've already found, so just clue the parser in to read to the end of the data, and I'll start on the next record.
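For what it's worth, the "just skip the sub-tree" case looks roughly like this with a stock parser. This is only a sketch: the `<export>`/`<record>` layout, the `type` attribute and the `id`/`amount` fields are invented for illustration, not taken from any real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: K and M records are known, the I record is a surprise.
SAMPLE = """<export>
  <record type="K"><id>1</id><amount>10.50</amount></record>
  <record type="I"><mystery><nested>never seen before</nested></mystery></record>
  <record type="M"><id>2</id><amount>3.25</amount></record>
</export>"""


def extract_amounts(xml_text, wanted=("K", "M")):
    """Collect id -> amount for the record types we understand."""
    root = ET.fromstring(xml_text)
    amounts = {}
    for record in root.iter("record"):
        if record.get("type") not in wanted:
            # Unknown record type: ignore its entire subtree, no matter
            # how long it is or what is nested inside it.
            continue
        amounts[record.findtext("id")] = float(record.findtext("amount"))
    return amounts
```

The unknown record costs nothing here: no byte length to guess, no recompile. With the fixed-length binary version, the parser loses its place the moment it hits a record whose size it doesn't know.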
In the end, I'd much rather throw a bit of RAM and some CPU at the problem and teach the software the semantics it needs to know to read XML DTDs to figure out how to extract data, instead of trying to reverse engineer a binary file format (gotta love people who use file formats for export but forget that endianness exists, so the same file format isn't portable across platforms). Figuring out the parsing of each file type is the part that is time consuming. Once you get the actual state machine to parse the data built, it's cake to actually extract the information I need.
There's a reason everyone documents a file format. Why not just use a general-case one until you see that there is in fact a speed problem? 90% of the time, the speed of parsing the damn data and the storage overhead it incurs are minimal relative to the amount of resources wasted writing efficient parsers for each file type. The documentation and validation of the data is built in to XML (yeah, no more hunting for errors in systems that are newline oriented while the input system accepts carriage returns in the fields, or for the one corrupted row of a huge CSV file). The file format is well maintained and can express all data desired (I've seen more than my fair share of file formats that, if you put the delimiter in a data file, won't bother to escape it; I love those).
Just like I've learned to love the STL. The STL is better, faster, more memory efficient, and better organized than any code I'll probably ever write in my life. Your arguments sound very similar to the idiots I see espousing how they can outdo the STL, right up until I make them benchmark their code against the most naive STL implementation you could imagine of their code.
I'm fairly sure the same is true of the XML parsers. Sure, the files themselves are overly verbose. However, barring just ludicrous constraints (either enormous files, or really limited resources on the device), does it make any difference?
I'd never use XML internally to transport data around (in that case your arguments make some sense). My problem is that there are thousands of people who might want data from me. I'd much rather use XML and tell them to use a stock XML parser than give them the EBNF to read my format and some sample Lex/Yacc on how to process it. I would provide an XML export to any third party who wanted one. It'd be extremely simple, and it'd solve a ton of problems for me if I could just send and receive data in XML. I'd have saved several man years here at my job if everyone who handed me data did it in an XML format. Sure, I'd have to have spent $5-10K extra on CPU and drive space (given that I've spent on the order of $250K-$500K, it would have been no big deal).
Kirby
Yeah, I think it comes back to the original idea of using XML for this very purpose.
Using XML, we will have all the goodies of having a language-specific format and also a common ground that is Unicode text, for generating Diffgram, for example.
The language can be found in the bowels of Sun Identity Manager, formerly known as Waveset Lighthouse, and it's just as wacky as you would expect from an XML-based language. It's used for implementing workflows for the user provisioning and what have you not.
I thought this was a stupid idea, until XPRESS confirmed my beliefs.
OK, when you're storing data files you might as well use a database vs XML. But if you're storing configuration files, I like to use (int; name length) (name as an ASCII str) (int; element id) (int; data type) (int; data size) (data)
...
It's low overhead and easy to parse, so you can store a ton of data, simple or complex, with ease. It's also easy to check the syntax so you know you did not mess up when writing or reading the data. It's also easy to store lists or nest data. OK, now if you tell me how XML is so much better than this, fine, but other than being a little more readable by hand I don't see what the advantage is. As to data types:
0 = new tree
1 = string ASCII
2 = String Unicode
10 = signed int
20 = unsigned int
30 = floating point number (format 1)
31 = floating point number (format 2)
I basically keep anything under 10,000 as universal formats and use stuff over that for odd ad hoc stuff which I don't have time to deconstruct.
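A minimal sketch of a reader and writer for a layout like the one described above, to show how little code it takes. Several things here are assumptions the description doesn't settle: I picked 32-bit little-endian ints for every "int" field, a 4-byte payload for type 10, and a "new tree" (type 0) payload that is simply a run of nested records. `encode` and `decode` are made-up names.

```python
import struct

# Type codes taken from the list above.
TREE, ASCII, SIGNED_INT = 0, 1, 10


def encode(name, element_id, data_type, payload):
    """Pack one record: name length, name, element id, data type, data size, data.

    Assumption: every int is 32-bit little-endian ("<i").
    """
    raw_name = name.encode("ascii")
    return (struct.pack("<i", len(raw_name)) + raw_name
            + struct.pack("<iii", element_id, data_type, len(payload))
            + payload)


def decode(buf):
    """Decode a run of records into a list of (name, id, type, value) tuples."""
    records = []
    offset = 0
    while offset < len(buf):
        (name_len,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        name = buf[offset:offset + name_len].decode("ascii")
        offset += name_len
        element_id, data_type, size = struct.unpack_from("<iii", buf, offset)
        offset += 12
        payload = buf[offset:offset + size]
        offset += size
        if data_type == TREE:
            value = decode(payload)          # assumed: nested records
        elif data_type == ASCII:
            value = payload.decode("ascii")
        elif data_type == SIGNED_INT:
            value = struct.unpack("<i", payload)[0]  # assumed: 4 bytes
        else:
            value = payload                  # unknown type: keep raw bytes
        records.append((name, element_id, data_type, value))
    return records
```

Because every record carries its own size, an unknown type can be skipped without understanding it. The flip side is that the byte order and int width chosen above have to be written down somewhere, or the next person is back to reverse engineering.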
Lisp failed in the early eighties. That is twenty years ago. Things were different back then, and it is important not to learn the wrong lessons. Here are some of the reasons LISP failed.
Virtual Memory Mysticism
Four megabytes was a lot of memory back then. The junior manager could be persuaded to sign off on the purchase of a workstation with that much, but the middle manager would cut it back to two megabytes and the senior manager would cut it back to one megabyte. The fancy Lisp application would have a working set of three megabytes, so the machine would page thrash and run like cold treacle. Eventually management would be persuaded to buy another megabyte, but it would still be too slow. The basic problem was that people treated virtual memory as magic and were reluctant to accept that there were still hard limits on the amount of physical memory required.
VLSI killed bit slice
Back then, everybody and their dog was starting new mini-computer companies, using four AMD 2901 four-bit slices to make the ALU and a 2910 to sequence the micro-code. The big Lisp vendors had their own hardware. Symbolics had Lisp machines, Xerox had D-machines. As the decade wore on, technology progressed, and VLSI CPUs such as the 68020 and 80386 slaughtered the LSI CPUs. DEC's VAX was the biggest loser, but the Lisp vendors suffered grievously from misreading the way the hardware market was going.
Naive garbage collectors
Garbage collectors are hard to write. In the 80's Lisp vendors were proud that their garbage collectors were free of bugs. Unfortunately the old mark and sweep collectors tended to walk swapped out data structures, thrashing the virtual memory system. Modern generational collectors are enormously better.
Computer science was young
Many programmers were self taught. (I read mathematics at University.) Others had studied computer science, but were learning from academics who were new to computing themselves and thought PASCAL was a good language.
Why do you want access to the AST? The idea is that you can avoid manual maintenance of repetitious code and boilerplate code by automatically generating it.
Well, it is a nice idea but it is also a sophisticated idea.
Some of the application is written in the base language. Some of the application is automatically generated. The generated code doesn't appear by magic. There have to be source files, in an application specific language. There have to be code generators, written in a meta language, that take the application specific language and expand it to the base language.
It is a huge win if the meta language is transforming AST's into AST's rather than transforming text files into text files. To obtain this benefit, the programmer must be thinking in terms of the abstract syntax, rather than the surface syntax.
At this point it becomes natural to give up on surface syntax and program straight to the AST. It is also natural to insist that the base language is suitable for writing AST to AST transformers, so that a separate meta language is not required. This is what Lispniks are getting at when they say that Lisp is its own meta language.
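For readers who have not seen an AST to AST transformer outside Lisp, here is a small sketch in Python, whose standard `ast` module lets the base language act as its own meta language in roughly this sense. The `square` pseudo-function and the `ExpandSquare` and `expand` names are invented for the example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


class ExpandSquare(ast.NodeTransformer):
    """Rewrite calls to a hypothetical square(x) into the expression x * x.

    The rewrite works on tree nodes, so there is no string splicing and
    no worrying about operator precedence in the argument.
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if (isinstance(node.func, ast.Name) and node.func.id == "square"
                and len(node.args) == 1 and not node.keywords):
            arg = node.args[0]
            product = ast.BinOp(left=arg, op=ast.Mult(), right=arg)
            return ast.copy_location(product, node)
        return node


def expand(source):
    """Parse source, apply the transformer, and return the generated source."""
    tree = ExpandSquare().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Calling `expand("y = square(a + 1)")` yields ordinary Python with the call replaced by a multiplication. A text-to-text version of the same macro would have to remember to parenthesize `a + 1` itself. Note that this naive expansion evaluates the argument twice, which is the usual macro pitfall when the argument has side effects.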
This way of thinking about programming went right over the heads of most programmers in the 80s. I think that the most important reason why Lisp failed in the early eighties is that it was 20 years ahead of its time.