Microsoft Roslyn: Reinventing the Compiler As We Know It
snydeq writes "Fatal Exception's Neil McAllister sees Microsoft's Project Roslyn potentially reinventing how we view compilers and compiled languages. 'Roslyn is a complete reengineering of Microsoft's .NET compiler toolchain in a new way, such that each phase of the code compilation process is exposed as a service that can be consumed by other applications,' McAllister writes. 'The most obvious advantage of this kind of "deconstructed" compiler is that it allows the entire compile-execute process to be invoked from within .NET applications. With the Roslyn technology, C# may still be a compiled language, but it effectively gains all the flexibility and expressiveness that dynamic languages such as Python and Ruby have to offer.'"
yeah I know I must be new here http://developers.slashdot.org/story/11/09/16/0253202/microsoft-previews-compiler-as-a-service-software
I'm wondering if my programming skills have fallen away so much through lack of use that I don't understand this as well anymore, or if the summary/article is just full of buzzwords and impressive sounding jargon.
It seems that Neil McAllister has never heard of LLVM and Clang, while Microsoft obviously has.
What exactly do they mean by "flexibility and expressiveness of other dynamic languages"?
I remember a demo at a Microsoft developer conference where C# was able to execute and rebuild itself dynamically.
At the time it got me really excited (I've bumped into many problems that would have a much more beautiful solution if I could compile at runtime), but this seems to be yet another technology?
I don't think of it as a new technology, but Microsoft is finally getting around to it. They are such a big dog now that some people don't recognize change until Microsoft rolls it out, years after others have already been mucking about in it.
If they roll it out in a good package, that's a good thing. If they price it above most developers' budgets, then they're going to be bypassed.
A feeling of having made the same mistake before: Deja Foobar
Roslyn is a complete reengineering of Microsoft's .NET compiler toolchain in a new way, such that each phase of the code compilation process is exposed as a service that can be consumed by other applications,
Sounds like LLVM.
This isn't exactly new. LISP had it from the early days. It's an idea that's been tried before, now available with more modern buzzwords, like "the compiler as a service".
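For anyone wondering what "each phase exposed as a service" looks like in practice, here is a minimal sketch in Python, whose standard library already exposes parse, compile and execute as separate calls. This is only an analogy: it is not Roslyn's API, and the source string and variable names are made up for illustration.

```python
import ast

# Hypothetical snippet that a host application received at runtime.
source = "result = sum(n * n for n in range(4))"

# Phase 1: parse the source text into a syntax tree.
tree = ast.parse(source, filename="<runtime>", mode="exec")

# Phase 2: inspect the tree, the way an IDE or analysis tool might.
# Here we just collect the names that are assigned to.
assigned_names = [
    target.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Assign)
    for target in node.targets
    if isinstance(target, ast.Name)
]

# Phase 3: compile the tree into a code object.
code = compile(tree, filename="<runtime>", mode="exec")

# Phase 4: execute the code object in a namespace the host controls.
namespace = {}
exec(code, namespace)
```

After this runs, `assigned_names` holds the names the snippet assigns to and `namespace["result"]` holds the computed value. The point is that a tool can stop after any phase and use just that piece.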
Tiny C Compiler has been doing this for years:
http://bellard.org/tcc/
Features
SMALL! You can compile and execute C code everywhere, for example on rescue disks (about 100KB for x86 TCC executable, including C preprocessor, C compiler, assembler and linker).
FAST! tcc generates x86 code. No byte code overhead. Compile, assemble and link several times faster than GCC.
UNLIMITED! Any C dynamic library can be used directly. TCC is heading toward full ISO C99 compliance. TCC can of course compile itself.
SAFE! tcc includes an optional memory and bound checker. Bound checked code can be mixed freely with standard code.
Compile and execute C source directly. No linking or assembly necessary. Full C preprocessor and GNU-like assembler included.
C script supported : just add '#!/usr/local/bin/tcc -run' at the first line of your C source, and execute it directly from the command line.
With libtcc, you can use TCC as a backend for dynamic code generation.
platform-independence for malware just got easier!
"I've got more toys than Teruhisa Kitahara."
You might want to read this wrt tinycc: http://www.landley.net/code/tinycc/
Also, no x86-64.
Filling in a few blanks and tweaks, that can be done on ANY unix system. It allows ANY software to feed source code to a compiler. Nobody has complained of this as a security risk before.
Now, it might be a *slight* security risk if it is running as a background process that is always on, so that corrupting it once could potentially corrupt all future output, but I doubt MS means it as that type of service. As long as it doesn't run that way, it's no worse than having the GCC binary callable as I showed above.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
So? My code could be put in an Apache module. Use WSGI and it is available in Python. PHP has the ability to do it straight away.
It's still not adding any vulnerabilities to the ecosystem that haven't existed before. Yes, they used it as a demo, but that's probably because it's a quick, visible demo where everyone can easily see what it is doing. Only an idiot would use it like that on a production system, just like only an idiot would use C, PHP or Python to do the same thing, and those have had that feature for almost as long as they've been around.
This is the reason why gcc (or any other compiler) should never be installed on any production Linux machine.
Having a compiler installed permits the ad hoc creation of code, with all the resultant security risks. Self-modifying code, including compiled self-modifying code, is an elegant solution in certain environments; however, it is a huge security and reliability risk in any production application.
The problem with the Windows NT 4.0 security model was that security was present almost everywhere, except if an application could be tricked into loading a DLL which then permitted uncontrolled code execution. Microsoft developed Internet Explorer and ActiveX, and Microsoft Windows platform security has been weak ever since. If you want a secure system, it is necessary to block all methods of running unapproved and unverified code.
LLVM is a better example. It's a set of libraries for generating code in an intermediate representation, transforming that representation (usually for optimisation, but also for instrumentation and other things) and then emitting it as object code, assembly, or JIT'd executable code in memory. I've written compilers for Smalltalk and for a toy JavaScript-like language using it, and they share the same set of optimisations that I wrote for Objective-C and the same object model. The total amount of Smalltalk-specific code is about 15KLoC (including comments).
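The generate / transform / emit pipeline described here can be sketched with a toy example. To be clear, this is not LLVM and uses none of its APIs: it is a rough analogy using Python's `ast` module (Python 3.9+ for `ast.unparse`), where the "intermediate representation" is a syntax tree and the "optimisation pass" is a small constant folder. The class and function names are invented for this sketch.

```python
import ast
import operator

# Operators this toy pass knows how to fold (an arbitrary small subset).
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with the computed literal."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def compile_with_folding(source):
    """Parse an expression, run the folding pass, and emit a code object."""
    tree = ast.parse(source, mode="eval")           # generate the representation
    tree = ConstantFolder().visit(tree)             # transform it
    tree = ast.fix_missing_locations(tree)
    return tree, compile(tree, "<folded>", "eval")  # emit executable code


tree, code = compile_with_folding("x * (2 + 3 * 4)")
folded_source = ast.unparse(tree)  # the expression after the pass
value = eval(code, {"x": 10})
```

Because the pass works on the tree rather than on source text, any front end that produces the same tree shape gets the transformation for free, which is the kind of sharing between languages the comment describes.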
I am TheRaven on Soylent News
Uh, clang is 'libraryized'. The clang binary is a tiny wrapper around the various libraries. It's pretty simple to write a replacement or to embed the libraries in something else. Look at Cling, for example, which implements a C++ REPL system using the libraries, or LLDB, which uses clang to parse [Objective-]C[++] expressions in the debugger.
If the grandparent thinks any of this is easy with gcc, then he's never tried hacking on gcc - even using it for syntax highlighting is almost impossible because the gcc team intentionally avoids clean layering in case someone uses their code in evil proprietary programs.
The code is here. The AST / back end are in LanguageKit, the Smalltalk front end is in Smalltalk (this also contains a few support things that make OpenStep classes look a bit more like Smalltalk-80 ones). The JavaScript-like language is in EScript, but it may not be working at the moment. It currently requires a trunk build of GNUstep libobjc, but I plan on releasing 1.6 of the runtime Real Soon Now.
I periodically write things about it on the Étoilé blog. You can also read some slightly out of date slides from a talk I gave about it at FOSDEM in 2009, and some more current ones from ESUG this year. Drop me an email if you've got any more questions.