Microsoft Roslyn: Reinventing the Compiler As We Know It
snydeq writes "Fatal Exception's Neil McAllister sees Microsoft's Project Roslyn potentially reinventing how we view compilers and compiled languages. 'Roslyn is a complete reengineering of Microsoft's .NET compiler toolchain in a new way, such that each phase of the code compilation process is exposed as a service that can be consumed by other applications,' McAllister writes. 'The most obvious advantage of this kind of "deconstructed" compiler is that it allows the entire compile-execute process to be invoked from within .NET applications. With the Roslyn technology, C# may still be a compiled language, but it effectively gains all the flexibility and expressiveness that dynamic languages such as Python and Ruby have to offer.'"
What exactly do they mean by "flexibility and expressiveness of other dynamic languages"?
I remember a demo at a Microsoft developer conference where C# was able to execute and rebuild itself dynamically.
At the time it got me really excited (I've bumped into many problems that would have had a much more beautiful solution had I been able to compile at runtime), but this seems to be yet another technology?
I think we can keep recursing like this until someone returns 1
yeah I know I must be new here http://developers.slashdot.org/story/11/09/16/0253202/microsoft-previews-compiler-as-a-service-software
If I wanted to, I could rig GCC and the like to do that too: That's the wonderful thing about command-line tools and piping, you can munge things together any way you want. And of course you can always tell gcc to stop partway through the compilation if you need assembler code or a parse tree or something. This sort of thing is common in open-source compilers, because they need these features for debugging purposes and have no reason to leave them out of the released version.
Of course, I probably don't want to include a feature like this dynamic code execution, because if I screw up, it would be a fantastic way to get a machine to execute code that it's not supposed to.
I am officially gone from
It seems that Neil McAllister has never heard of LLVM and Clang, while Microsoft obviously has.
doesn't this allow for malicious programs to get even more malicious?
If some weasel can figure out a way to insert malicious code between source and executable, there's always the possibility (and there always has been).
A feeling of having made the same mistake before: Deja Foobar
That's something that I haven't seen a language really get right since FORTH. I'd love to be able to use C# in a similar way, entering small function definitions from the command line, compiling them as they're entered, interactively testing functions as they're written. It's a great way to speed development.
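For comparison, here is a minimal sketch of that define-compile-test loop using Python's standard `code` module rather than C# or FORTH. The function names `square` and `cube` are made up for illustration; each entry is compiled and run as soon as it is submitted, and later entries can use earlier ones.

```python
import code

# One interpreter instance keeps definitions alive between entries,
# the way an interactive FORTH or REPL session does.
session = code.InteractiveInterpreter()

# Each entry is compiled and executed as soon as it is submitted.
session.runsource("def square(x):\n    return x * x\n", symbol="exec")
session.runsource("def cube(x):\n    return x * square(x)\n", symbol="exec")

# Test the new functions immediately; results land in the session namespace.
session.runsource("result = cube(3)", symbol="exec")
print(session.locals["result"])
```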
This sounds great if you're doing stuff like autotuning, but for the vast (vast, vast, vast) majority of programmers out there I don't really see how opening up the internals of the compiler is useful. Who cares if that loop gets fused or that function gets unrolled?
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Roslyn is a complete reengineering of Microsoft's .NET compiler toolchain in a new way, such that each phase of the code compilation process is exposed as a service that can be consumed by other applications,
Sounds like LLVM.
Compile and execute code from within an application? That's exactly what Krita (http://www.krita.org) does with OpenGTL (http://opengtl.org) -- we have code written in special languages for filters and so on which gets compiled by Krita and then executed as native code. It's pretty safe as well.
This isn't exactly new. LISP had it from the early days. It's an idea that's been tried before, now available with more modern buzzwords, like "the compiler as a service".
Can't wait to get my hands on a FOSS clone of it.
"When information is power, privacy is freedom" - Jah-Wren Ryel
Isn't the point of having the compiler as a service so that your executable can feed source code to a compiler?
Now malware can be shipped in various partially-compiled steps and in different packaging (one, two, three modules, arriving from different vectors, etc.), making detection harder, and can then be compiled targeting the CPU it lands on. Oh, what a fricken great IDEA! Platform-independence for malware just got easier! It's really getting hard to distinguish between the bad guys and producers of ideas like this.
Pavlov wouldn't be so famous if he'd used a can opener instead of a bell.
It allows for something similar to eval in .NET. From the article:
If your program is doing what the demo code does, then sure, you're asking for code injection attacks.
Like the Scala compiler? It has an API, plugin support, and more. The Scala shell serves as an example of how to use it.
They may not necessarily mean 'service' as a runtime service (the Windows world equivalent of a daemon), but rather in the sense of a software library (something that provides a service to other applications without being standalone itself).
In this case, it's not much different from Python/Ruby on any system. I could execute arbitrary Python code from a C executable without much difficulty. That doesn't mean that there is a problem with how C, Python, or the underlying OS is written.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
platform-independence for malware just got easier!
"I've got more toys than Teruhisa Kitahara."
If I get a dime for each time someone "reinvents" Common Lisp, I would be rich.
Please continue innovating, Microsoft. Hint: I think whoever invents the bicycle again will make the headlines too.
http://de.wikipedia.org/wiki/Klein_Zaches,_genannt_Zinnober
Progressive compilation that allows access to parse tree, AST, symbol tables, and other such artifacts is a great help in IDE and other "introspective" applications.
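As a rough illustration of what such access looks like in a language that already exposes it, here is a sketch using Python's standard `ast` and `symtable` modules (not Roslyn). The sample function `area` is invented for the example.

```python
import ast
import symtable

SOURCE = "def area(w, h):\n    scale = 2\n    return w * h * scale\n"

# Syntax tree: the first top-level statement is the function definition.
tree = ast.parse(SOURCE)
func = tree.body[0]
node_kinds = [type(n).__name__ for n in ast.walk(func)]

# Symbol table: ask the compiler which names are parameters and which are locals.
table = symtable.symtable(SOURCE, "<example>", "exec")
func_table = table.get_children()[0]
params = sorted(s.get_name() for s in func_table.get_symbols() if s.is_parameter())
local_names = sorted(s.get_name() for s in func_table.get_symbols() if s.is_local())

print(func.name, params, local_names)
```

An IDE feature such as "rename local variable" or "list parameters" needs exactly this kind of information, without re-implementing the parser.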
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
I was thinking cpu-specific, not OS-independent. Sorry for ambiguity. CPU-specific compilation may allow for use of idiosyncratic features/bugs in the production of invasive code, something a little more difficult if the target hardware is unknown.
Pavlov wouldn't be so famous if he'd used a can opener instead of a bell.
Filling in a few blanks and tweaks, that can be done on ANY unix system. It allows ANY software to feed source code to a compiler. Nobody has complained of this as a security risk before.
Now, it might be a *slight* security risk if it is running as a background process that is always on, and therefore corrupting it once could potentially corrupt all future output, but I doubt MS means it as that type of service. As long as it doesn't run in that method, it's no worse than having the GCC binary callable as I showed above.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
C#, Ruby, and Python are all (in their main implementations) compiled languages. Where they differ is that C# is mostly statically typed, and Ruby and Python are dynamically typed. The .NET compiler toolchain being exposed as a runtime service doesn't really make C# much more like Ruby or Python, since it doesn't change the main area of difference between the languages. It does mean that you can implement the equivalent of eval for .NET languages that don't already have it (like C#), which makes it a little bit more like Ruby or Python, but I don't think "C# doesn't have eval" is really the main reason people would think Ruby or Python is better for certain tasks than C#.
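For readers who haven't used it, this is roughly what "the equivalent of eval" means in Python: a sketch in which a string is compiled once to a code object and then evaluated against different variables. The expression and the variable names are made up, and the empty `__builtins__` only trims the namespace; it is not a security boundary.

```python
# Compile the expression text once; the result is a reusable code object.
expression = "price * quantity * (1 - discount)"
compiled = compile(expression, "<user-expression>", "eval")

def evaluate(variables):
    # Evaluate the precompiled expression against caller-supplied variables.
    # The empty __builtins__ mapping is NOT a sandbox, just a smaller namespace.
    return eval(compiled, {"__builtins__": {}}, dict(variables))

total = evaluate({"price": 10.0, "quantity": 3, "discount": 0.5})
print(total)
```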
Oh?
I run it on Windows, Linux, FreeBSD and MacOS.
I run it on x86 and ARM.
Seems pretty damn independent to me.
Regarding what the GP stated though, with the right libraries and a little clever coding, a similar independent 'partially compiled' method could be used with C as well. Of course, the partial compile of the Windows version would have to check for a C compiler, and download/install one if it isn't available. Java and Flash could conceivably be used to do the same. So, it's really not adding a whole lot of new threats to the ecosystem.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Consider cases like the console interpreter for Python or Perl, you could do malicious things with them, but I wouldn't call them code injection attacks. Mind you, using a REPL in a more complex piece of code that has other functions is probably a bad idea, but I think the REPL is more of an easy demo to show what it can do, not the intended use.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
"each phase of the code compilation process is exposed as a service that can be consumed by other applications."
How about if the 'other app' is a web browser window? TFA suggests this will be possible with MS's product.
Pavlov wouldn't be so famous if he'd used a can opener instead of a bell.
So? My code could be put in an Apache module. Use WSGI and it is available in Python. PHP has the ability to do it straight away.
It's still not adding any vulnerabilities to the ecosystem that haven't existed before. Yes they used it as a demo, but that's probably because it's a quickly visible demo that everyone can easily see what it is doing. Only an idiot would use it like that on a production system, just like only an idiot would use C, PHP or Python to do the same thing, and those have had that feature for almost as long as they've been around.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
This is the reason why gcc (or any other compiler) should never be installed on any production Linux machine.
Having a compiler installed permits the ad hoc creation of code, with all the resultant security risks. Self-modifying code, including compiled self-modifying code, is an elegant solution in certain environments; however, it is a huge security and reliability risk in any production application.
The problem with the Windows NT 4.0 security model was that security was present almost everywhere, except if an application could be tricked into loading a DLL which then permitted uncontrolled code execution. Microsoft developed Internet Explorer and ActiveX, and Microsoft Windows platform security has been weak ever since. If you want a secure system, it is necessary to block all methods of running unapproved and unverified code.
But this feature will be encouraging developers to write applications that accept source at run time. If you don't write that into your app, you only have to protect it from malicious code before compilation, in most cases. Which is much, much easier.
Of course there are always other security threats after compilation as well, and those will still be there in addition to the ones this opens up.
Well.. maybe. Or Maybe not. But Definitely not sort of.
They found a way to shove XML into the compiler! Kudos to MS!
(see sig)
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog
Yes, that's true. When I talked about injection attacks I was thinking more about using this to run JSON-like strings of code when you don't trust the source.
By that logic, Python, PHP, Perl and any other scripting language that can popen() or eval() should not be allowed on a production server as well.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
We used that approach for PBL some years ago. It is wasteful to have to rewrite parsers and lexers for languages in order to build IDEs and other tooling.
For example, code indentation can be done by walking the AST (you need to be careful to preserve hidden tokens, such as comments).
You can also allow code completion by changing the compiler to accept a "COMPLETION" token in some places in the grammar. Then, from the editor, when someone presses "Ctrl+SPACE" (or whatever), you mark the location in the lexer and send the code to the compiler. When you build the AST, you insert a completion node into it; you now have contextual information about that position and can produce a list of the things that could go there.
Also, syntax highlighting can use the lexer for basic coloring and then use some type information to add more detail (such as which identifiers are fields, functions, etc.).
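The lexer half of that can be sketched with Python's standard `tokenize` module; this is an illustration in Python, not the PBL tooling described above, and the category names are made up. Note that the token stream still contains comments, which is why the earlier caveat about preserving hidden tokens matters once you work from the AST instead.

```python
import io
import keyword
import tokenize

def classify(source):
    """Return (text, category) pairs for the tokens an editor would color."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            category = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        elif tok.type == tokenize.NUMBER:
            category = "number"
        elif tok.type == tokenize.STRING:
            category = "string"
        elif tok.type == tokenize.COMMENT:
            category = "comment"
        elif tok.type == tokenize.OP:
            category = "operator"
        else:
            continue  # layout tokens (NEWLINE, INDENT, ENDMARKER) get no color
        result.append((tok.string, category))
    return result

tokens = classify("total = price * 2  # doubled\n")
for text, category in tokens:
    print(category, repr(text))
```

Distinguishing a field from a function among the "identifier" tokens is the part that needs type information from later compiler phases.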
What's new is exposing these phases in a standardized manner in the language. That's a bold move, since backward compatibility will be tricky to maintain. Maybe they're thinking of finally stabilizing C#.
There are already APIs to emit IL or to invoke a C# compiler built into .NET and the security systems built into .NET give you a way to prohibit them. There's no additional risk exposed by Roslyn. Rather, it's a way of getting at the juicy knowledge about the code that the compiler builds up before it exits and that libraries have been written to poorly piece together. That's a good idea that I'd like to see accompany more official language compilers, static or not.
Ruby and JavaScript were interpreted languages. The kicker isn't the eval function, but rather the def/prototype functions. In Ruby, you can instantiate a String object named str, add a method to String, and then immediately call that method on str. Upshot? Imagine for a moment replacing (or removing) an object's toString method on the fly.
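The same idea can be sketched in Python, with one difference from Ruby: Python's built-in `str` cannot be patched this way, so the sketch uses a small user-defined class. `Label` and `shout` are hypothetical names for illustration.

```python
class Label:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

label = Label("hello")  # the instance exists before the class is changed

# Add a method to the class at runtime; the existing instance picks it up.
def shout(self):
    return self.text.upper() + "!"

Label.shout = shout
shouted = label.shout()

# Replace the string conversion (the toString equivalent) on the fly.
Label.__str__ = lambda self: "<" + self.text + ">"
rendered = str(label)

print(shouted, rendered)
```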
Ever used JSP before? You know that JSP pages are compiled (either on the fly or precompiled), and (if you're smart) you store off the generated .java files so you can debug when your page goes belly-up.
(You have to store the pages, because the line numbers match the .java classes, not the JSP pages themselves)
Now, we're removing the compiling mess, moving it to .NET as a service, and standardizing how the compilation of those pages is invoked.
This premise, a managed AST you can manipulate programmatically (a SOM, Source Object Model), plus a managed compiler pipeline to compile, is nothing new. The Boo language was doing this on .NET, and I'm sure there are many examples before it: Boo was started in 2003.
Seems to me Microsoft is now attempting to do with compilers what they attempted to do with the mobile phone.
Join the Slashcott! Feb 10 thru Feb 17!
Really? I like the REPL, but I wasn't aware that they had fixed the entanglement issues.
Thanks to Roslyn being designed explicitly for these kinds of scenarios, it can give you helpful information from nearly every stage of the compilation process. You can get syntax trees! Not only that, you can feed it an invalid program and you'll get back a syntax tree that says that it's invalid, but knows when it stopped parsing, what kind of token it expected and can be stringified to the exact text you fed it. You can do flow analysis. There's a solidified model for how C# works in a REPL or scripting environment outside of everything-is-in-a-class mode, which admittedly it was never up to Mono to define.
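To show the kind of diagnostic information a parser can hand back, here is a small Python sketch. It is only a loose analogue: Python's `ast.parse` is not error-tolerant, so it reports where parsing stopped instead of returning a partial tree the way the comment describes for Roslyn. The broken snippet is made up.

```python
import ast

BROKEN = "x = 1\ny = 2 +\n"  # the second line is missing its right operand

try:
    ast.parse(BROKEN)
    error = None
except SyntaxError as exc:
    error = exc

# The parser reports the line and column where it gave up, plus the source line.
if error is not None:
    print("line", error.lineno, "column", error.offset)
    print("message:", error.msg)
```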
csharp-repl is a very good REPL and the mcs family (which now seems to be merging into a single compiler) are very good compilers with source readily available, but I think it takes something that's designed from the start for reusability and being a library as much as a tool to get you these things.
How is this any different at all from Javassist?
Arbitrary CPU instructions aren't a problem. Arbitrary holding of the CPU and arbitrary API calls are a problem. The OS shouldn't award too big a slice to just any arbitrary sub-process. The parent process should check for unauthorized API calls. Something like this can be sandboxed. The question is, what kind of box are they putting it in?
Given the current ability of scripts to lock up IE and perform drive-by attacks, I'm not too optimistic about how they've secured it either. I'm just saying that it's not impossible.
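One way to picture "the parent process should check for unauthorized API calls" is to inspect the syntax tree before anything runs. The following is a minimal Python sketch of that idea, assuming an arithmetic-only allow list; the name `safe_arithmetic` is hypothetical, and the check does nothing about CPU time or memory, so it is not a complete sandbox.

```python
import ast

# Only plain arithmetic is allowed: no names, calls, or attribute access.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)

def safe_arithmetic(source):
    """Parse first, reject anything off the allow list, then evaluate."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
    return eval(compile(tree, "<untrusted>", "eval"), {"__builtins__": {}})

value = safe_arithmetic("2 * (3 + 4)")

try:
    safe_arithmetic("__import__('os').system('echo hi')")
    rejected = False
except ValueError:
    rejected = True  # the Call node is refused before any code runs

print(value, rejected)
```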
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Yeah, you could write a C++ compiler in PHP and use that instead of gcc.
No sig today...
The Csharp REPL is really just an example usage of Mono.CSharp, which I should have linked to instead.
// file: mice.h
#include "frickin_lasers.h"
http://www.occupytheboardroom.org/
Ah, you are so right. Shame you'll just get modded down for telling the Truth (shame there are so many folk out there who think Windows' desktop dominance means that Windows == computing).
Sure, but I meant the compiler when I said that, which I thought was clear from the third paragraph.
The Mono C# compiler can't do all this stuff, and that's completely fine because it's almost impossible to do by accident. I don't fault them for not having done it by accident. If anything, they should be commended for being able to whip up the REPL so easily; that shows commitment to solid design principles.
But that also means that just staying where they are thinking that they have parity, or someone proposing that they should do that, is unfortunate. They demonstrably don't "do this already" and they shouldn't settle for what they've got.
I've been around a long time, and I've never heard that. It has the kind of plausible ring that usually sends me to Snopes, where two thirds of the time I come away chastised for loaning the idea five seconds of credence.
What I know about GCC is that it had a rough adolescence and that over-arching design hardly entered into it for long stretches of time.
There's some truth to this aphorism. GPL is designed around what Stallman doesn't want people to do. It builds from a negative. Stallman doesn't want others to take away his freedom by building something he can't have.
I admire what Stallman's dogmatism enabled him to achieve. We're probably better off on both sides of the license fence because of it. At the same time, his repurposing of the word "freedom" is one of the most toxic subversions in the history of language. No, he couldn't just come up with his own word, he had to take someone else's word away. I wonder what Marshall McLuhan could have come up with given the starting point "gift culture Lebensraum".
I think closer to the truth of the matter is that gcc gained far too many extremely important use cases to start dabbling in architectural modernism. You'll note over the same time period, that Linux remained fairly far to the monolithic end of the spectrum. When a project reaches that scale, specific success factors put the stomp on architectural ideology.
Furthermore, on the C++ side, the rapid evolution of the C++ language wasn't doing anyone any favours in IDE integration.
The time is ripe for a new approach. The king is dead. Long live the king.
Here you go! Microsoft Roslyn: Reinventing The Wheel As We Already Know It
Only an idiot would use it like that on a production system
Would you call a JIT compiler idiotic, then? Because this is exactly how I foresee this stuff being used in enterprise apps, particularly ones that rely heavily on dynamic entities. We could have the app generate code on the fly that is then reused as needed, rather than reinterpreted every time with a hundred DB calls and long-winded generic form-generating code.
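A rough Python sketch of that generate-once, reuse-many pattern follows. The names `make_row_formatter` and `_cache` are hypothetical, and the field list is assumed to come from the application's own schema, never from a user request.

```python
_cache = {}

def make_row_formatter(fields):
    """Generate, compile, and cache a formatter specialized for these fields."""
    key = tuple(fields)
    if key not in _cache:
        # Build source for exactly these fields; repr() quotes each name safely.
        parts = " + ', ' + ".join(
            "{!r} + '=' + str(row[{!r}])".format(name, name) for name in key
        )
        source = "def fmt(row):\n    return " + parts + "\n"
        namespace = {}
        exec(compile(source, "<generated>", "exec"), namespace)
        _cache[key] = namespace["fmt"]
    return _cache[key]

fmt = make_row_formatter(["id", "name"])
line = fmt({"id": 7, "name": "Ada"})
print(line)
```

The second request for the same field list returns the already compiled function, so the generation cost is paid once per entity shape.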
-Billco, Fnarg.com
I see a lot of other tools that do this, but since C# mostly started off as a ripped-off Java, it's also worth pointing out that since Java 1.6, that language has also provided public interfaces to compile code at runtime.
These are nice features. Sometimes, they are even useful (as opposed to just another hammer developers can abuse). But the announcement makes it seem, wrongly, that MSFT is doing something really unique here.
I'm simply talking about popen() and exec() like features. Loading of dynamic libraries, etc.
You can have pretty much the same effect without having to resort to native compiled code.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Please read the previous comments; a JIT compiler is NOT what we are discussing. Those are used to compile static, pre-existing code.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Addendum, didn't read your whole comment, still not what we were discussing. We were discussing the use of dynamic code entered by the user making a request via the web, and compiled/executed by the application. Not a specific set of pre-defined templated code that is modified without taking code directly or indirectly from the user.
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
It is not possible to reinvent. That's an oxymoron. They are simply building upon what they and others have developed. Radical technology change, yes, but you can't reinvent anything, especially a compiler.
You can lead a man with reason but you can't make him think.
Bogus response. Compilers are on programmers' computers, and those are used in production and carry the same potential risk you describe.
You can lead a man with reason but you can't make him think.
C# has always supported compiling additional code at runtime.
I've had it in projects since the 1.0 release.
They may be redoing the structure and making it easier to do, but doing it isn't new.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
The .NET framework includes the C# compiler for free, csc.exe.
The ability to compile C# code in .NET apps has been available since before 1.0 release. This isn't new, just a different way of doing it.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Having a compiler installed permits the ad hoc creation of code
wait, if gcc is a security risk, then so is chmod + (any program capable of writing binary data to a file). the only 'vulnerability' that gcc exposes is the ability to create cross-architecture exploits.
It's always good to see someone proud enough to stand up and say "im a cock"
The new right fascists are bilingual. They speak English and Bullshit.
.NET has had "something similar to eval" for a long time. You can easily compile C# code and instantiate the compiled objects today. This is quite different.
This is the reason why gcc (or any other compiler) should never be installed on any production Linux machine.
So anything that uses JIT is out then?