Help crack the Java 1.6 Classfile Verifier
pdoubleya writes "As part of the development of Mustang (Java 1.6), Sun is developing a new, smaller and faster classfile verifier which they want your help in trying to break. As Sun VP Graham Hamilton puts it in his blog entry, "As part of Mustang we will be delivering a whole new classfile verifier implementation based on an entirely new verification approach. The classfile verifier is the very heart of the whole Java sandbox model, so replacing both the implementation and the basic verification model is a Really Big Deal.... The new verifier is faster and smaller than the classic verifier, but at the same time it doesn't have the ten years of reassuring shakedown history that we have with the classic verifier." You can read about the new verifier on Gilad Bracha's blog, and join the new Crack the Verifier initiative to if you can break it. Read all about the Crack the Verifier - Challenge."
and join the new Crack the Verifier initiative to if you can break it.
I'm going to if I can break it right now!
No cold hard cash or equivalent for "cracking the verifier?"
I guess it could lead to more pay in some cases.
Before those who go on to dismiss Java for various reasons (no matter how ignorant they are), take a look at the presentation given by Google at this year's JavaZone conference on how Google is using Java internally at extreme scales. Among them are AdWords and GMail.
www.rexguo.com - Technologist + Designer
I'm not sure how the MS beta process works, but I get the impression that it's not just a straightforward download but you need to sign up or something (passport?).
I wonder what would happen if they junked the whole exclusive beta thing (which might get some of the more privacy-concerned, tech-savvy people on board? dunno - just a guess), and then actively encourage people to try and break the security? Surely that would produce better results than product x coming out, and then massive security problems follow for days, months and years afterwards.
I'm not pretending that this would cure the world of buggy ms software, but it can't hurt, can it? They should be doing it with vista right now.
Craig
If his name doesn't ring a bell, he's a Java guru who works for Sun and wrote the 2nd and 3rd editions of the Java Language Spec. A bunch of his papers are listed here.
It's a relief that JDK 1.6 won't include any language changes (as far as I know?). Updating various parsers and whatnot to work with all the JDK 1.5 language changes was a big job, although some of the new features certainly are quite handy.
The Army reading list
That link doesn't work.
h tml. Java scores just a bit better on this front than C and C++, because of the big standard library and GC, but it's a long way from perfect, and not as good, IMO, as scripting languages.
In any case, the tools that impress me most are those that scale up, but scale down as well. Google has the money and the brains to make anything work, more or less. What's more impressive to me is technology that lets Joe Schmoe get things up and running easily, and then still scale up reasonably well. I wrote a bit about it here: http://www.dedasys.com/articles/scalable_systems.
http://www.welton.it/davidw/
You mean like this; http://www.microsoft.com/athome/security/spyware/software/default.mspx ?
I am a bit tired of the approach "Let's just see if it works!". That approach works well on an old car, but it does not work well on the linchpin of one of the most important technologies today!
Why not do what it takes: Prove that it will work, and prove that it cannot be broken!
Could Java/SUN afford a major flaw in the Java sandbox/class loader...? I think not!
Sounds like a desperate attempt to save a few bucks by not hiring testers: release the software and "challenge" people to break it.
I challenge Sun to hire a full development team including quality assurance and not put the onus on the community to find their bugs.
Why doesn't Slashdot ever get slashdotted?
Here you go:
https://mustang.dev.java.net/
I can safely say that Java doesn't suck, but on the other hand, .NET is an extremely nice technology. Java is certainly lagging behind (and has been for some time) in terms of suitability to desktop applications, but it's a solid base and is extremely useful for just about everything else.
It allows you to work faster and create more in a short while. It allows you to create abnormally slow programs that you can't even speed up with the willpower to do so, because of Windows internals. Those exact internals that Java won't touch with a stick.
Java doesn't look like win32 because it isn't even trying to. It's trying to look platform-independent and the same on all platforms, with the option to skin it to any GUI you want. dotNET IS windows. There's no wonder that it looks a lot more like windows.
I must strongly disagree on the OO implementation however; aside from it not supporting multiple inheritance, it's just good. Microsoft's methods are plain stupid, because everything you want to do has to be specified so explicitly that my fingers still hurt from the last time I tried it.
Compare:
Java:
public class xyz {
    int function() { return 0; }  // some function
}
public class abc extends xyz {
    int function() { return 1; }  // different, automatic polymorphism
}
C++:
class xyz {
public:
    virtual int function() { return 0; }  // some function
    int functiontoo() { return 0; }       // some function too
};
class abc : public xyz {
public:
    virtual int function() { return 1; }  // some other function! again automatic!
    int functiontoo() { return 1; }       // some function too, not polymorphic!
};
C#: (might contain errors, been a while)
public class xyz {
    public virtual int function() { return 0; }  // function
    public int functiontoo() { return 0; }       // functiontoo
}
public class abc : xyz {
    public override int function() { return 1; }  // override, kind of pointless...
    public new int functiontoo() { return 1; }    // ... why new? that's reserved for memory allocation...
}
My point is, .NET (in C#) requires you to make everything you want so explicit that I'm inclined to say that you're wasting more time doing that than you're gaining due to other factors.
Plus, I just don't like their idea of a good library. Rape the C++ STL, why don't ya. Either support c++ (and the STL), or don't support it at all.
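For anyone who wants to run the Java half of that comparison, here is a minimal sketch. The class and method names (Base, Derived, DispatchDemo, callThroughBase) are made up for illustration and are not from the post. It only shows that Java instance methods are overridable unless marked final, which is the "automatic polymorphism" being described.

```java
// Hypothetical names, chosen only for illustration.
class Base {
    int function() { return 1; }            // overridable by default
    final int functionToo() { return 10; }  // 'final' opts out of overriding
}

class Derived extends Base {
    @Override
    int function() { return 2; }            // replaces Base.function()
}

class DispatchDemo {
    // The static type is Base, but the call is dispatched on the runtime type.
    static int callThroughBase(Base b) {
        return b.function();
    }

    public static void main(String[] args) {
        System.out.println(callThroughBase(new Base()));     // prints 1
        System.out.println(callThroughBase(new Derived()));  // prints 2
        System.out.println(new Derived().functionToo());     // prints 10
    }
}
```

In C# the same behaviour needs virtual on the base method and override on the derived one, which is the extra explicitness being complained about.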
Toplevel ISPs have issues, not google. It would probably require the whole intarweb to break down to down all of google's various servers.
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
> C is portable, fast, very complex and since 35+ years the leading standard for professional OS and APP development.
.Net Framework is huge, there's no way that's simpler than the C standard library. Then you've got to think about reflection, inheritance, dozens of things that C just doesn't have.
I agree that C is portable and fast, however I don't think it can be called very complex.
The smallest programming language manual I have ever owned (and I've owned quite a number), has to be "The C Programming Language", often hailed as the One True Reference to the language. How can it be that complex if the manual is less than half the size of most of my other manuals? I think languages (in general) have got more complex since then. The size of the
If what you mean is that C programs end up looking more complex, that's probably because C is used for systems programming. If you mean that you have to write more code to do it in C, then you may have a point, but I think C is actually one of the simpler languages. The closer to assembly you get, the simpler the language has to be.
Craig
Another nice thing about the new classfile specification is that it's going to make certain new kinds of optimization possible. The more you can prove about what's on the stack at any given point, the more you can inline.
Not only does inlining eliminate method call overhead, but it allows you to re-run the peephole optimizer, which can eliminate range checks, reduce redundant type checks, etc.
The ultimate performance promise of Java is that it can do optimization very, very late in the process. Native libraries are basically black boxes in C/C++, and it's very hard to do that sort of inlining because most of the type information has been lost. Java may, someday, with sufficient ingenuity, rival or even beat C++ in performance, and it already does in certain limited areas.
Of course C# has all of the same advantages, and even though it's more recent there are some areas where its performance beats Java. I'd love to see all the Microsoft researchers vs. all the Sun researchers coming up with increasingly brilliant ways to take advantage of the late binding to turn a performance hindrance into a benefit.
Java 1.5 introduced the two things that make me willing to consider Java as a practical language for real work (as opposed to a "safe to let untrained programmers run rampant, too bad about the 10000k LoC required to do anything" language). Those two things are collections and generics.
I was forced to use Java 1.2 some time ago, and found it a horrific experience with my background in dynamic languages. Since then, I've learned C++ and got used to the pluses and minuses of static languages (both in the sense of "compiled" and in the sense of "statically typed"). Java also largely ceased to suck, so having to work on it again and finding that sort code that would've been hundreds and hundreds of repetitive lines can now be expressed using a short set of comparators and a collections-based sort was ... refreshing.
After Java 1.5, I can understand why they'd want to let things settle down for a while. It seems to me that they finally got all the really important stuff into the language.
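A rough sketch of the kind of comparator-plus-collections sort described above, written in Java 1.5 style with generics. The comparator and the sample words are invented for illustration; only Collections.sort and Comparator are the real library pieces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    // Hypothetical ordering: shorter strings first, ties broken alphabetically.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA = new Comparator<String>() {
        public int compare(String a, String b) {
            if (a.length() != b.length()) {
                return a.length() < b.length() ? -1 : 1;
            }
            return a.compareTo(b);
        }
    };

    // Sorts a copy so the caller's list is left untouched.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<String>(input);
        Collections.sort(copy, BY_LENGTH_THEN_ALPHA);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("verifier");
        words.add("jvm");
        words.add("sun");
        words.add("mustang");
        System.out.println(sorted(words)); // [jvm, sun, mustang, verifier]
    }
}
```

Changing the sort order means swapping the comparator, not rewriting the sort.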
I think that one of the big advantages C has is that it is a more simple language. Whereas if you were to write a book on PHP and it had "just the language" you would disappoint quite a few people.
Though it always amazed me that they had time to go into things like binary trees in "The C Programming Language". They write about things like that as if they are perfectly natural to beginning programmers (In truth they are not that hard). I love that book, but most books are so big because readers want documentation on the standard library of the language as well as the language.
Jeremy
Do not underestimate Brainfuck !
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Look, I wish people who keep banging on about how Java is nearly as fast as C would get their heads out of their asses and realise that the biggest defect in Java is not raw execution speed but the "business processing" holiday that the system takes every "completely unpredictable once in a while". If I have a throughput capacity system, I can control the rate of throughput in a number of ways (eg selling less than my total capacity and then throttling at times of peak use) but when the system goes and does something like a garbage collection and the whole pipe goes "fnark" for some seconds I am quite pissed since my users who want the service level in their SLA aren't getting it.
Predictability and execution control are why I use C (and to a lesser extent c++) not Java. That cannot change for languages that give me no control over the raw execution path.
"The first thing to do when you find yourself in a hole is stop digging."
If I remember correctly, C has 47 keywords. Everything else is built from them. printf and scanf are not keywords - they're functions built from some character-handling routines (no strings in C, remember?) and the read and write keywords... which is also why you need to put that pesky #include at the top if you want to use them.
I'd rather be flying
Proving any non-trivial software rapidly hits too much complexity. Consider the halting problem alone, never mind output, and absolute proofs are of (O^N) complexity; it is only realistic to prove trivially simple functions.
While the "automatic" aspect is oversold, I would not want to give up the bug prevention aspect: you can't dereference stale pointers in Java. That and mandatory bounds checking makes me prefer VM based servers (not necessarily Java) for security reasons. IMHO, GC and mandatory bounds checking should be requirements for all public internet servers.
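To make the bounds-checking point concrete, a small sketch (the class and method names are hypothetical): an off-by-one write that would silently corrupt adjacent memory in C is turned into an exception by the Java runtime.

```java
class BoundsDemo {
    // Returns true if the runtime rejected the write as out of range.
    static boolean writeIsRejected(int[] buffer, int index) {
        try {
            buffer[index] = 42;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] buffer = new int[4];
        System.out.println(writeIsRejected(buffer, 3)); // prints false: last valid slot
        System.out.println(writeIsRejected(buffer, 4)); // prints true: one past the end
    }
}
```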
This conversation was old two or three years ago.
And only NOW you're getting around to it?
Only now?
And how about staying on topic! Yeee - HAWWWW!
No sir, none of you really comprehend what the article is talking about, and those that do, sure as fuck don't post on Slashdot.
So this is what passes for up to date informed thinking here.
I believe that there is a concurrent garbage collector in Sun's JVM that, while not as efficient overall, runs continuously, preventing the pauses and bubbles associated with traditional garbage collectors.
In my Java in a Nutshell 4th Edition (p. 246) one of the java (the interpreter) arguments is:
As this book was written for Java 1.4, I'd bet it was fine-tuned and enhanced for 1.5 and even more so for the upcoming 1.6. Might be worth trying out.
Your CPU is not doing anything else, at least do something.
C... Complex?! Yeah right. It's one of the most basic languages out there.
Function pointers are slower than calling a function directly. They are basically what you call virtual functions in C++. C++ provides an advantage by allowing non-virtual direct method calls. It's not that you can't do the same thing in C but it is a lot harder to use. You can't use function pointers in C and expect the same performance as C++.
The language may be more simple, but that doesn't mean writing code in it is. I say that because you must write a lot of libraries from scratch that would be readily available in a "more complex language".
There's no place like ~/
Right, it's Sun's "responsibility", and of course if the verifier falls down they will get all the bad press -- but they are giving the rest of us the *chance*, if we want it, to contribute to the security of an essential part of the JVM (because we're the ones who'll be vulnerable if there's a flaw!), PLUS to find out if their new verifier is going to play nicely with the myriad 3rd party tools that manipulate and generate Java bytecode.
There are tons of alternative compilers out there that turn various languages into Java classfiles. Then there are all of the tools that alter classfiles to add in AOP, or to optimize filesize, or to obfuscate the code against decompilers, etc. etc..
Obviously Sun isn't going to test all of these, though, for example, some of the obfuscators have historically depended on the specific implementation of the verifier (and NOT the classfile spec) to make tweaks that would break popular decompilers.
See the reasoning now? Personally I think they should put a bounty on it to recognize the value they're getting out of it (and to help folks justify the time they'd spend cracking it...) but it's silly to say they shouldn't even have made the offer.
Pardon my ignorance, but isn't this a violation of the anti cracking clauses of the DMCA?
What's really scary is that anyone would think that binary trees might pose even a minor challenge to a programmer. This speaks to the quality of code monkeys being turned out these days.
The post mentions "verifier" about ten times without definition... so for those of us who are lowly lab administrators and not full-on code monkeys.... eh? What is this and what does its success bode for the adoption of Java?
that isn't bad at all.
Stoned4Life
gen = new Random
There is a large ISP in the US called AOL, which runs on a web server called AOLserver, which is now open source software. The web server makes extensive use of Tcl. Many other large sites use PHP. So, you can make it scale if you want to (although this may involve mixing in some C code).
Does it scale as well as Java? Perhaps not. The point is that it's easier to get started with the scripting language, though. Your customers probably had an easier time of it getting started with scripting rather than diving into some complex, difficult to implement Java solution. In some cases, I know I wouldn't have had any customers at all if they had had to start out with C or Java rather than a scripting language, which was easy enough that they could at least implement their idea to the point where it made money - and then they could pay me to fix up their code and make it faster, more elegant, and so on.
In any case, what's interesting to me is the range of scaling - Java does pretty well if you consider the high end, but not so well at the low end. Some scripting language systems don't scale up well enough. The best things start out pretty easy and will grow with you a long way before you have to go implement the big hairy memory intensive Java thing though.
I do agree with you WRT the speed thing... JITing stuff seems to be a pretty good solution.
http://www.welton.it/davidw/
And the Algol manual/language definition is what, 15 pages long?
Sit down.
AC misunderstands the issue. The class verifier checks that a program is unable to break out of the "sandbox" and infect the system. This is not related to DRM. Hate DRM all you want, but at least RTFA before you post.
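For those asking what gets checked at class-load time, here is a hedged sketch, not the Mustang verifier itself. It hands the JVM four bytes that are not a classfile and observes that the class is refused. This only exercises the structural format check that runs before bytecode verification; breaking the actual verifier would need a well-formed class containing ill-typed bytecode, which is beyond a short example. RejectingLoader and rejects are made-up names; ClassLoader.defineClass and ClassFormatError are the real API.

```java
class RejectingLoader extends ClassLoader {
    // Returns true if the JVM refused to define a class from these bytes.
    boolean rejects(byte[] classBytes) {
        try {
            defineClass(null, classBytes, 0, classBytes.length);
            return false;
        } catch (ClassFormatError e) {
            // Thrown when the bytes are not a structurally valid classfile.
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] garbage = {0x00, 0x01, 0x02, 0x03}; // lacks the 0xCAFEBABE magic number
        System.out.println(new RejectingLoader().rejects(garbage)); // prints true
    }
}
```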
What is going to be tested is the implementation, not the theory behind the algorithm, however. It doesn't really matter how well proven the algorithm is if the implementation has a bug (which could be caused by a typo for example).
Code proving is a nice theoretical exercise, but don't for a second believe that just because some code has been put through a formal proof that it cannot contain bugs.
Cripes. You might want to do a tiny bit of fact checking before posting such nonsense. The JVM can and does return memory to the operating system, subject to parameters that you can easily set on the command line. There's a pair of thresholds (low and high) on the ratio of free space to total heap size. When free space drops below the low threshold, the VM will get more memory from the operating system (up to the MX limit). Conversely, when free memory rises _above_ the high threshold, the overall heap size will _shrink_, and memory is _returned_ to the operating system. Look up -XX:MaxHeapFreeRatio.
The default high limit used to be 70% -- memory would only be returned to the OS after the heap was measured to be 70% free. Many people will observe their app and say "it doesn't return the memory". It does, but you have to be aware of how this is done.
You can tighten this up and have your application run much closer to its "true" size. But do you really want that? You have to study the app in its live environment and understand where the time and memory are going. It can be a good idea, and sometimes it's not.
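If you want to watch the ratio described above from inside the app, something like the following sketch works. HeapReport and freeRatio are hypothetical names; the Runtime methods are standard. The number is only an approximation of what the collector itself measures after a GC cycle.

```java
class HeapReport {
    // Fraction of the currently committed heap that is free, between 0 and 1.
    static double freeRatio() {
        Runtime rt = Runtime.getRuntime();
        return (double) rt.freeMemory() / (double) rt.totalMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("committed heap (bytes): " + rt.totalMemory());
        System.out.println("heap ceiling (bytes):   " + rt.maxMemory());
        System.out.println("free ratio:             " + freeRatio());
    }
}
```

Run it with different settings, for example java -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 HeapReport on a HotSpot VM, and compare the committed heap size over time. How aggressively the heap actually shrinks depends on the collector in use.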
To make the contest even more attractive, we have to sign a legal agreement to review the source code:
Thanks Sun, but no thanks. If you want me to do your work for you, I'd better be getting paid in a cash equivalent.
* java is nearly as fast as C++ according to all the benchmarks I've seen. Yes, really. The perception of java as being "slow" is simply the legacy of the old awt apps. Yes, the awt gui was (and is) slow. Server-side java applications are not. The "much better performance" is simply not there, particularly for typical enterprise apps.
* *All* the enterprise apps (which is the area where java is particularly successful) store stuff in a database and/or talk to remote apps. Newsflash: a database query or a remote procedure call is *orders of magnitude slower* than an in-process procedure call. Once you include DB/RPC into the equation, whatever little speed advantage C++ has is wiped out completely.
* This is CS 101: performance of a program is largely determined by the algorithm used. You can write a linear search in assembly, and it will be very fast for small lists. But for large lists, a binary search written in shell script will beat it.
* In an enterprise application scalability is much more important than raw speed. So what if I can write a C++ app that's 20% faster than an equivalent java app? Java has frameworks that make it easy to write an app that you can scale horizontally (i.e. by adding more boxes). Easy being the keyword.
* Developer's time is much more expensive than runtime. It is *much faster* to write an app in java than in C++. And for all but the smallest/simplest apps it is faster to write the app in java than in PHP/perl/whatever.
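The CS 101 point in the list above (the algorithm matters more than the language) is easy to check by counting comparisons instead of timing anything. This is just an illustrative sketch; SearchSteps and its methods are made-up names, and the array of one million sorted ints is an arbitrary choice.

```java
class SearchSteps {
    // Builds the sorted array 0, 1, ..., n-1.
    static int[] range(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    // Number of elements a linear scan inspects before it stops.
    static int linearSteps(int[] sorted, int target) {
        int steps = 0;
        for (int value : sorted) {
            steps++;
            if (value == target) break;
        }
        return steps;
    }

    // Number of elements a binary search inspects before it stops.
    static int binarySteps(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == target) break;
            if (sorted[mid] < target) lo = mid + 1; else hi = mid - 1;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] data = range(1000000);
        int target = data.length - 1; // worst case for the linear scan
        System.out.println("linear: " + linearSteps(data, target)); // prints linear: 1000000
        System.out.println("binary: " + binarySteps(data, target)); // at most 20 probes
    }
}
```

A constant-factor speedup from a faster language cannot close a gap of one million inspections versus about twenty.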
If it's a safety/security issue then again you could build the same thing in a native compiled language, sandbox and all.
Uhhhm, yes. Safety and security are *big* issues in enterprise apps. Show me *one* native language and platform that does it. You are saying it like one can just wave a magic wand and have it built in no time. "You could build the same thing" is not "it's already built".
I mean really, is it just because Java provides a lot of easy to use API's?
Yes. among all the other things I've mentioned.
These are just a few reasons why java is so popular in enterprise apps. Sure, I wouldn't write a game in java, but for enterprise apps, it's perfect. Why java and not PHP/perl/? Simply because java is better. It has all the advantages of compiled languages (type safety, variable declaration checking, syntax checking, etc.) without some of the disadvantages (manual memory management). Think of java compiler as a sanity checker for your code. It will catch common mistakes like typos, missing return statements, invalid function parameters, etc. A scripting language will not complain about that, but force you to spend hours tracking down the bug. That's why java is faster to develop in than any scripting language for large apps.
___
If you think big enough, you'll never have to do it.
What I don't understand is exactly what advantage is Java providing on the server-side. Do you really need cross-platform bytecode at that level?
Actually, yes -- the cross-platform ability is extremely useful. Speaking personally: the two biggest projects I have worked on, both for one client, are deployed (production) on a IBM iSeries server (these used to be called AS/400s -- using the OS/400 operating system), and a Solaris server respectively. Both web apps are built on the same code base, and we developed and tested them on Windows 2000 workstations (XP, now, plus I am starting to do more and more development in RedHat Fedora).
Can you imagine if I needed my own iSeries at home to run a test server here? Those things aren't cheap. Also, because the client has more in-house iSeries experience, we're going to be moving the Solaris webapp to an iSeries as well at some point -- and guess what? The Java code doesn't need any changes whatsoever; it's only the database SQL that will need to be migrated (DB2 UDB to DB2400 SQL isn't consistent).
When I'm starting new projects, I can get people started on architecture and writing code in most cases *without* finalizing the eventual platform, and without getting access to the big hardware yet. You aren't locking yourself into anything from the beginning -- this is actually a pretty serious power to have. It also allows me to run side by side performance testing on servers to see the *real* differences in capabilities; this is HUGE because the folks selling the big iron suddenly are a commodity, not an unquestioned master in a domain with benefits we can't actually measure usefully.
Just my 2 cents -- I'm sure some people wouldn't actually care (e.g., "my webhost only runs RedHat, so that's all I need to care about"), but gotcha-free cross-platform code is a big deal.
Actually, I'm more interested in making sure that the Java applets my browser executes from Web pages have their shackles on nice and tight, so they can't empty my home directory.
My phone lets me download MID files from the Net; so I can just upload them to my home page and get them from there. Of course the data transfer costs money, but hacking the phone's Java interpreter doesn't change this.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
I'd like them to add the ability to check the returned value from a function in the Java Debugging Interface API. Like they said they would 5 years ago.
Java Pros:
1. Zero memory fragmentation. The GC compacts the memory at runtime. This means indefinite uptimes. A server written in a refcounted script language might lack that.
2. Zero chance of a buffer overflow attack anywhere. Maybe if there is a bug in the VM, however, this might become possible.
3. All libraries in the standard distribution have been tested for almost a decade now.
4. Incredibly powerful multithreading and synchronization.
5. Rapid development of fast programs. Only someone well versed in Java can do that, but java is well worth it when you have the people. This can be done in other languages but at insane costs in security.
6. Performance costs for all of the above is within the 20% margin, which is great for a server app that does not do anything computationally expensive. Most of the work is offloaded to a fully optimized DB server anyway.
7. With the right framework, you can easily load and unload modules at runtime. Not easily done though.
Java Cons:
1. Incredibly slow startup time. It may take up to a minute for a large app to get fully loaded and JITed. This is a non-issue in a server environment, however.
2. Extreme memory usage. Up to 10 x the equivalent C++ app. However, the GC makes sure that memory usage remains almost constant under similar loads for months and years of uptime, because there is no memory fragmentation.
3. Due to 2, sometimes most of the memory gets swapped. This shouldn't happen in a server environment, but on a desktop running server apps (dev machines for instance) this is a great nuisance. It might take running a full GC manually to force your redmond-developed OS to re-load all the memory for the app. Again, a non-issue for servers.
4. The default Sun Java VM configuration makes Java run any program with a 64 megs of mem usage limit. This is ridiculous for a serious Java app. It takes passing a command-line param to fix that. People can get frustrated because of this.
If only I could hack the Gibson, I'd have the power I needed to crack the verifier.
/hack the planet!
I am scientifically inaccurate.
Simply set -Xverify:none and you'll have no class verifier running in your vm. This is a known, old trick to speed up applications.
One app that bears the lie to lagging behind on desktop applications.
If you know what you're doing, you can write pretty decent Java desktop apps.
The cesspool just got a check and balance.
I don't like .NET, but I do feel I have to ask this...
When's the last time you've seen an application written in C that only used the standard C library?
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
I have helped mentor some people with programming. I have taught them just enough of the language to build a more complicated data structure like a binary tree or a hash table. Explain the math, it is all relatively easy, and the logic and most people have no trouble building these data structures. If you teach people these data structures and algorithms for efficiently working with the data structures those things become a part of their programming vocabulary and they can generally solve programming problems more effectively. In general you only need a data structure every once in a while. But when you need it and it is the only thing that really solves the problem nicely it makes all of the difference in the world to have that knowledge.
Jeremy
You forget that accessing all that extra RAM has a cost in time. As CPUs get faster, RAM does not improve at nearly the same speed. To bridge this widening gap, you need L1 and L2 caches (some processors have even introduced L3 caches). But caches only work under the assumption of locality of reference. But as you've said above, GC violates this assumption (they "roam all over the heap"). That makes GC cache-hostile. Hence, as CPUs get faster, the gap in performance between programs using GC versus non-GC will continue to widen.
That makes GC cache-hostile.
I agree, except that a good generational copier shouldn't traverse beyond the most recently allocated memory. Even as freshly allocated-and-not-yet-released memory exceeds the nursery size it merely gets lumped into the next generation. This means that it won't even be looked at for many dozens or hundreds of garbage collects.
It is only when that particular generation gets refreshed (because that generation is too full to accept the previously younger generation's over-spill) that its memory pages will be touched.
By putting all the reference handles in a tight contiguous space, you also reduce the "touching" of too many pages for the purposes of GC analysis.
Compared to an allocator slab alloc/free, you have two possibilities: external referencing, where a tight contiguous space of free-pointer/size data-structures (memory handles) lives apart from the untouched pages of heap-space, or inlined linked-lists of free and allocated allocation chunks. In the inlined case, the process of walking a map on a single alloc and/or free is likely to do similar degrees of damage to the CPU cache as a minor GC. If, for example, you use a "best-fit" algorithm, or even a buddy-system, you're going to dance around a non-trivial number of pages that are highly likely to not be in the cache (because they weren't directly in use by the application, and have statistical randomness as to where the traversals take them).
And don't forget the extreme overhead of compactification. Buddy systems aren't too bad (though I haven't worked closely w/ them), but fitting algorithms can be pretty bad.
I would agree however that in practice, I've seen more swaps of death from competing java processes than from any other architecture system.
-Michael
> When's the last time you've seen an application written in C that only used the standard C library?
Every single day for the last 4 years, at a guess. There are a heck of a lot of awfully useful unix utilities that don't use anything other than the standard libraries.
Craig
My web browser lets me download files from the web and my mac lets me send them to my phone using bluetooth. I've never been able to understand why anyone pays for ringtones or phone sized images since most phones seem to play mp3s anyway these days. mine does and it was a free phone that came with my bottom-of-the barrel vodafone account.
I used to have a better sig than this, but I got tired of it
Toplevel ISPs have issues, not google. It would probably require the whole intarweb to break down to down all of google's various servers
Umm, you know, google had a big outage a year or two ago...