Parrot 0.1.1 'Poicephalus' Released
Pan T. Hose writes "The long awaited release of Parrot 0.1.1 "Poicephalus" has been finally announced on perl.perl6.internals newsgroup and perl6-internals mailing list simultaneously by Leopold Toetsch followed by an announcement on use Perl by Will Coleda and now also on Slashdot." (Read on for a list of changes since the last release, as well as a number of useful links.)
Pan T. Hose continues "The most important changes since the previous version 0.1.0 (code-named 'Leaping Kakapo' and released in February) are:
- Python support: Parrot runs 4/7 of the pie-thon test suite
- Better OS support: more platforms, compilers, OS functions
- Improved PIR syntax for method calls and <op>= assignment
- Dynamic loading reworked including a "make install" target
- MMD - multi method dispatch for binary vtable methods
- Library improvement and cleanup
- BigInt, Complex, *Array, Slice, Enumerate, None PMC classes
- IA64 and hppa JIT support
- Tons of fixes, improvements, new tests, and documentation updates
from the faq:
Parrot is the new interpreter being designed from scratch to support the upcoming Perl6 language. It is being designed as a standalone virtual machine that can be used to execute bytecode compiled dynamic languages such as Perl6, but also Perl5. Ideally, Parrot can be used to support other dynamic, bytecode-compiled languages such as Python, Ruby and Tcl.
There are 10 kinds of people: Those that understand ternary; those that don't; and those that don't care.
Performance.
.NET VM didn't even exist when we started development, or at least we didn't know about it when we were working on the design. We do now, though it's still not suitable.
.NET back ends.
From the FAQ:
Why your own virtual machine? Why not compile to JVM/.NET?
Those VMs are designed for statically typed languages. That's fine, since Java, C#, and lots of other languages are statically typed. Perl isn't. For a variety of reasons, it means that Perl would run more slowly there than on an interpreter geared towards dynamic languages.
The
So you won't run on JVM/.NET?
Sure we will. They're just not our first target. We build our own interpreter/VM, then when that's working we start in on the JVM and/or
My UID is the product of 2 primes.
For one, the JVM is stack-based, which makes it hard to implement in hardware, while Parrot is stack-based like CPUs. This also has performance advantages for various reasons.
thisnukes4u.net
It says in the "read more" section that it unites perl and python.
When you get to hell -- tell 'em Itchy sent ya!
Parrot is register based
;-)
Use the god-damn preview button
That's not what it is. Perl and Python (and most "major" interpreted languages, I would suspect), first compile the script to byte-code, then execute the byte-code in a VM.
.Net VM. .Net for two reasons: .Net didn't exist when they began :) .Net is designed for compiled languages (C++, C#, VB, etc), and supposedly that would cause a performance hit over a VM designed for interpreted languages. (Plus, it's not designed to be portable, like Parrot is)
At some point along the line, someone said "Hey! Why are we writing VMs for every single language? They all do pretty much the same thing!".
So Parrot is that, a common VM. Perl6 compiles to Parrot bytecode, instead of the perl-only-bytecode it was using for Perl5.
Since that bytecode format is open, and the VM free, any other interpreted (or compiled, I guess) language (Python, Ruby, TCL, LUA, whatever) can make their compilers output Parrot bytecode.
That way they don't have to build and maintain their own VM, and they get the benefits of future optimizations to Parrot.
For example, I could write a just-in-time compiler for Parrot for my favorite platform, and every Parrot-enabled language could take advantage of that.
Doing that now, I'd have to pick a language to optimize. Do I want to make Python faster? Or Perl?
Python is going to use it in the future, last I heard. (Probably as another backend like Jython or IronPython, not as a replacement for CPython.)
I think they were waiting on the byte-code format stabilizing.
All in all, it's a very cool idea. Makes it a hell of a lot easier for people to make new interpreted languages, since they only have to target Parrot, and they've got a mature, debugged, VM that runs on multiple platforms. (in theory, at least. I don't think it's there yet)
It's not that similar to the JavaVM (which is only designed to run Java, not a pile of different languages), but it is kinda like the
The developers of Parrot created Parrot instead of targeting
1.
2.
I respond to your sigs
As it said, Python, Perl, Ruby, Smalltalk, Lisp, and most of the languages targeted by Parrot are dynamically typed and have dynamic message passing (method calls). This means that type checking is done by the run-time environment, not by the compiler. Likewise, it is not known until run time whether an object has a given method. Therefore, the runtime has to do a fair amount of checking (mostly symbol table lookups). If you were to do this with a VM designed for static languages (JVM, .NET) that does not do this checking for you, then you would have to implement all of it as byte-code in the VM - in effect you would be writing a big chunk of a Perl interpreter in Java.
This approach would inevitably be slower than the existing Perl 5 interpreter, while the Parrot approach has managed to be significantly faster than the current Perl 5 interpreter. The reasons are that 1) all of the runtime checking is highly optimized native code, and 2) after the complex Perl code is translated into a simpler form, the traditional compiler optimizations can be applied to the code.
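To make the "symbol table lookup" point concrete, here is a minimal Python sketch of what a dynamic runtime has to do on every method call. The names `Parrot`, `Rock` and `call_method` are made up for illustration; this is not how Parrot or Perl actually implement dispatch, only the general shape of the check.

```python
# Hypothetical illustration: resolving a method by name at call time.

class Parrot:
    def speak(self):
        return "Resting"


class Rock:
    pass  # deliberately has no speak() method


def call_method(obj, name, *args):
    """Look the method up when the call happens, not when the code is compiled."""
    method = getattr(obj, name, None)  # the run-time symbol lookup
    if not callable(method):
        raise AttributeError(f"{type(obj).__name__} has no method {name!r}")
    return method(*args)


print(call_method(Parrot(), "speak"))
```

On a VM built for static languages, a helper like `call_method` would itself have to be written in that VM's bytecode, which is the overhead the comment above is describing.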
I think he means register based.
From one of the examples, we get:
set I1, 1
set N1, 4.2
set S1, "Resting"
print I1
print ", "
print N1
print ", "
print S1
print "\n"
end
Which seems to indicate a heavy use of register type functionality. This will map to hardware (thusly faster) better, much more so than a stack based (java) VM implementation. Especially for dynamically typed languages (perl, python, ruby, etc).
"The Genus Poicephalus are small to medium sized, stocky birds with short, squarish tails and proportionately substantial bills." I guess it's just your basic African parrot, then. Funny, with the word "phalus" in it, I thought it would be something else...
"Freedom means freedom for everybody" -- Dick Cheney
It's not just possible, it's a goal.
Any existing language running on Parrot right now can use the SDL bindings, if the language has syntactic support for loading the appropriate library and calling class and instance methods.
As well, Tim Bunce's plans for DBI 2, Perl's almost ubiquitous database module family, includes porting it to Parrot to make it available to any language running on Parrot.
Everything compiles down to Parrot bytecode, so if your language has the syntax for interoperability, you'll have it.
how to invest, a novice's guide
The point of that is also that you have only one debugged runtime and don't have to write your own for each language.
A very important fact hasn't been mentioned here: When you compile multiple languages to the same VM, you don't automatically get interoperability between those languages. Interoperability is a different concept that must be separately achieved, mostly by defining additional rules and standards on top of the VM specification.
For example, look at different C or C++ compilers on Linux/x86. They all compile to the same machine code, but Intel machine code has no concept of "symbol names", "class names", "type names" etc., so, for the compiled codes to be interoperable, they must comply with additional specifications like ELF or some C++ ABI ("name mangling").
Parrot bytecode may define some or all of those concepts, but some scripting languages will probably define additional features (macros, MI, continuations etc.) that have no "native" representation in Parrot, so additional conventions to represent those things in Parrot must be agreed upon. The .NET CLS ("common language specification") is another example of such a set of additional rules to provide for language interoperability.
This is NOT a strong vs weak typing thing. This is a static vs dynamic typing thing. The strength of a type system has no relation to when it's enforced.
You can have all combinations:
The meme of `register byte code will map nicely
to hardware' is also rubbish.
For one, the types of the registers in Parrot do
not map to the underlying hardware types (ints
and floats are all CPUs can do), and second of all
not every CPU has all the registers Parrot has.
So if you generate code that uses 32 registers,
you would still need to map to 6 registers on
Intel.
To make things worse, register allocation is
one of the hardest problems in a compiler, and
the one that probably has the most impact on the
performance of generated code.
So now every compiler author is forced to write
a register allocator, only to find out that
parrot will throw away all that information and
redo its own register allocation.
That is why Medium Level IRs do not use registers,
they are too high level to have any real effect.
As for lower-level IRs, most of them assume
infinite registers anyways as a simple way of
"labeling data".
---
Parrot comes with a system that will let
compiler developers not worry about register
allocation: you emit some kind of infinite
register, and a tool produces the IR registers
(which will later be discarded anyways).
Miguel.
I would be very interested to see an example of a real-world application where unpredictable user inputs can demonstrably render static type-checking useless, or where static type-checking makes a real-world application significantly harder to write. Because I sure can't think of one.
Static type checking is still useful for those portions of the program that are, well, static. Any part of the program that deals only with data that is known at compile time, can benefit from static type checking. For example, if your menus are not dynamic (their content does not change) then that portion of your code can benefit from static type checking, etc.
But by far the more difficult bit is dealing with data that isn't known at compile time. Let's take the same example. What if I have a menu item that lists the names of recently opened files with their full path - say in a File menu. However, the menu drawing code I use assumes that all menu items are less than a certain number of characters. There is absolutely no way that I as a programmer can know at compile time whether the type of that file path will be stringLessThan64Characters, or whatever the limit is. I must do a run time check of the data, and respond accordingly. The benefits of compile time type checking are lost for this portion of the code, because it is interactive. The nature of the data that my algorithms will be dealing with cannot be known at compile time because it is determined by user input.
This issue is multiplied in the sequences of user mouse and keyboard actions in, for example, a graphics program. The user can cut/paste/drag/drop/copy-drag etc. At each juncture, my code must check the nature of the data to assure that it can be handled by my algorithms. Static type checking here becomes of little utility. I end up with much of my data being of type "Untyped" and so, it must be type checked repeatedly at run time.
Note that this need to do run time type checking is unavoidable not only in practice but in principle. There is simply no way to know what the type of the data generated by a complex series of user actions will be. Remember that type in this sense is strict, and includes such things as data size and dimensions, not just the underlying primitive type of which it is made. It is not enough to know that I'm getting a slew of bytes, I must know that they are well formed wrt my application's algorithms, and that they are of the right range of size.
Now add to this multiple user interactions across the network, and realize that by far your most important task with regard to type checking is not the ivory tower theoretical exercise of type inference, but the very real world task of checking each and every piece of data that the users throw at your code, and the results of combinations of such user inputs that you may not have originally predicted ("Oh, it never occurred to me that the user would select that menu item, click this radio button, type in such-and-such and then click this other unrelated button").
The more interactive a piece of software is, the more choices the user has, the greater the use of network connectivity, the more the need for run time type checking. Since I'm doing all of these run time type checks, not because I want to, but because I have to, I might as well use a language that is designed on the basis of run time type checking. I simply define the necessary constrained types I need, and throw an exception/condition when one of my functions/methods is called with the wrong type, as determined at run time.
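As a rough sketch of the "constrained type plus exception" idea, using the menu example from earlier in this comment: the 64-character limit comes from the hypothetical `stringLessThan64Characters` type, and `MAX_MENU_CHARS`, `MenuLabelError` and `check_menu_label` are invented names for illustration only.

```python
# Hypothetical run-time constrained type: a menu label shorter than 64 characters.

MAX_MENU_CHARS = 64  # assumed limit, taken from the example in the comment


class MenuLabelError(ValueError):
    """Raised when a value does not satisfy the menu-label constraint."""


def check_menu_label(text):
    """Return text unchanged if it is a valid label, otherwise raise."""
    if not isinstance(text, str):
        raise MenuLabelError(f"expected str, got {type(text).__name__}")
    if len(text) >= MAX_MENU_CHARS:
        raise MenuLabelError(f"label has {len(text)} characters, limit is {MAX_MENU_CHARS - 1}")
    return text


# A recently opened file path only becomes known while the program runs.
print(check_menu_label("/home/user/notes.txt"))
```

The check can only run once the path actually exists, which is the point being made: no compile-time declaration can say how long a user's file path will turn out to be.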
The future belongs to interactive software, and interactive software demands run time type checking. It follows that the future belongs to languages that are dynamically typed.
I'm no expert on all the political nuances, but there are at least two groups of people that can't use Unicode.
The first is a subset of CJK (Chinese, Japanese, and Korean) speakers. CJK support has been a political minefield for years, and Unicode just made it worse. If you're not aware, each of those languages uses Chinese ideograms in its written language (Korean less so today), even though the actual pronunciations are completely different. The first mine is the PRC's creation of "Simplified Chinese" in the 1950's, and whether a character set favors Simplified or Traditional characters determines a broad swath of political implications. A second mine is how many literary characters a character set supports; to properly record classic Chinese literature, you need somewhere near 50,000 code points, but if you're just recording modern communication, you can get by with around 10,000 (Traditional) or 2,000 (Simplified). The latest mine is the Unicode Consortium's attempt at Han unification, which merged all 4 languages (counting Simplified and Traditional separately) into a single list of code points, without regard for the fact that the actual glyphs sometimes varied dramatically between languages; the expectation was that users would have the appropriate font for their local language installed, but it makes using multiple CJK languages (e.g. a list containing both Chinese and Japanese names) difficult at best and sometimes outright impossible.
The second group of people who can't use Unicode are people using languages that aren't in Unicode. Most of these are historical languages being studied by linguists, but there are even a handful of modern languages.
There are also some situations where, although Unicode works, using a different encoding is much more space-efficient. Ignoring the obvious example of the ISO-8859-N family, CJK text is very, very bulky when encoded in UTF-8 (3 to 4 bytes), but is easily represented by fixed-width 2 byte encodings, and sometimes Japanese can even be shoehorned into 1 byte per character (if it's all kana).
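The size difference is easy to check with Python's built-in codecs. This is only a small sketch: the sample strings are arbitrary, `encoded_size` is a throwaway helper, and Shift_JIS stands in here for "a legacy Japanese encoding" (the comment above does not name one).

```python
# Compare encoded sizes of the same Japanese text under different encodings.

def encoded_size(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


kanji = "日本語"            # three ideographs
halfwidth_kana = "ｶﾅ"      # two half-width katakana characters

for label, text in (("kanji", kanji), ("half-width kana", halfwidth_kana)):
    for encoding in ("utf-8", "utf-16-le", "shift_jis"):
        print(label, encoding, encoded_size(text, encoding))
```

The three ideographs take 9 bytes in UTF-8 but 6 in either two-byte form, and half-width kana drop to one byte each in Shift_JIS.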
Range Voting: preference intensity matters
Notice that stack-based operations are nothing
but a linear representation of a tree.
It is in effect the output that you would get from
serializing a tree, so you turn internal compiler
representation, like for example the following
tree:
(assign var (add 1 2))
into a series of stack operations:
push 1
push 2
add
var_address
store
You can certainly "interpret" those bytecodes,
and for an interpreter it is debatable if there
is a performance improvement or not.
But for any self respecting JIT, the above is
only an MIR (Medium-Level IR) that must be
processed into something else.
The tree is reconstructed from the stack
operations (every JIT does that) and then you run
your standard optimizations, with register
allocation to the target architecture being the
last step of a long chain of operations.
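A toy version of both directions, in Python, assuming a tree of nested tuples with binary operators only (the store/assign step from the example above is left out for brevity). The function names `to_stack_ops` and `to_tree` are made up for this sketch and do not come from any real JIT.

```python
# Serialize an expression tree into stack operations, then rebuild the tree.

def to_stack_ops(tree):
    """Post-order walk: operands first, then the operator."""
    if not isinstance(tree, tuple):
        return [("push", tree)]
    op, left, right = tree
    return to_stack_ops(left) + to_stack_ops(right) + [(op,)]


def to_tree(ops):
    """Replay the stack operations symbolically to recover the tree."""
    stack = []
    for instr in ops:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((instr[0], left, right))
    (tree,) = stack  # a well-formed sequence leaves exactly one value
    return tree


expr = ("add", 1, ("mul", 2, 3))
ops = to_stack_ops(expr)
print(ops)
print(to_tree(ops) == expr)
```

The round trip is lossless, which is why stack bytecode works as an intermediate form: the JIT gets the original tree back and can optimize from there.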