Help crack the Java 1.6 Classfile Verifier

Posted by ryuzaki0 on Monday October 31, 2005 @01:40AM from the crack-it-break-it-open-jump-on-it dept.

pdoubleya writes "As part of the development of Mustang (Java 1.6), Sun is developing a new, smaller and faster classfile verifier which they want your help in trying to break. As Sun VP Graham Hamilton puts it in his blog entry, "As part of Mustang we will be delivering a whole new classfile verifier implementation based on an entirely new verification approach. The classfile verifier is the very heart of the whole Java sandbox model, so replacing both the implementation and the basic verification model is a Really Big Deal.... The new verifier is faster and smaller than the classic verifier, but at the same time it doesn't have the ten years of reassuring shakedown history that we have with the classic verifier." You can read about the new verifier on Gilad Bracha's blog, and join the new Crack the Verifier initiative to see if you can break it. Read all about the Crack the Verifier - Challenge."

More on Gilad Bracha by tcopeland · 2005-10-31 02:02 · Score: 4, Informative

If his name doesn't ring a bell, he's a Java guru who works for Sun and wrote the 2nd and 3rd editions of the Java Language Spec. A bunch of his papers are listed here.

It's a relief that JDK 1.6 won't include any language changes (as far as I know?). Updating various parsers and whatnot to work with all the JDK 1.5 language changes was a big job, although some of the new features certainly are quite handy.

--
The Army reading list
1. Re:More on Gilad Bracha by Naikrovek · 2005-10-31 02:10 · Score: 2, Informative
  
  I believe Java 6 introduces java.io.Console, which provides several convenience methods when using the console, but other than that I don't know of any language changes.
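For anyone who hasn't seen java.io.Console yet, here is a minimal sketch of how it might be used. The class name ConsoleDemo and the helper promptOrDefault are made up for illustration; the only real API calls are System.console() and Console.readLine(). Note that System.console() can return null when no interactive console is attached, so the sketch falls back to a default.

```java
import java.io.Console;

class ConsoleDemo {
    // Reads one line through java.io.Console, or returns the fallback when
    // no console is attached or the user enters nothing.
    static String promptOrDefault(Console console, String prompt, String fallback) {
        if (console == null) {
            return fallback;
        }
        String line = console.readLine("%s", prompt);
        return (line == null || line.isEmpty()) ? fallback : line;
    }

    public static void main(String[] args) {
        String name = promptOrDefault(System.console(), "Name: ", "guest");
        System.out.println("Hello, " + name);
    }
}
```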
MS Anti spyware beta by Anonymous Coward · 2005-10-31 02:12 · Score: 2, Informative

You mean like this: http://www.microsoft.com/athome/security/spyware/software/default.mspx ?
Re:Condition by Anonymous Coward · 2005-10-31 02:18 · Score: 2, Informative

Here you go:
https://mustang.dev.java.net/
dotNET is overrated by dascandy · 2005-10-31 02:25 · Score: 3, Informative

It allows you to work faster and create more in a short while. It also allows you to create abnormally slow programs that you can't speed up even if you have the willpower to do so, because of Windows internals. Those exact internals that Java won't touch with a stick.

Java doesn't look like win32 because it isn't even trying to. It's trying to look platform-independent and the same on all platforms, with the option to skin it to any GUI you want. dotNET IS Windows. It's no wonder that it looks a lot more like Windows.

I must strongly disagree on the OO implementation, however: aside from not supporting multiple inheritance, it's just good. Microsoft's approach is plain stupid, because everything you want to do has to be specified so explicitly that my fingers still hurt from the last time I tried it.

Compare:

Java:
class xyz {
    int function() { return 1; } // some function
}

class abc extends xyz {
    int function() { return 2; } // different, automatic polymorphism
}

C++:
class xyz {
public:
    virtual int function() { return 1; } // some function
    int functiontoo() { return 2; }      // some function too
};

class abc : public xyz {
public:
    virtual int function() { return 3; } // some other function! again automatic!
    int functiontoo() { return 4; }      // some function too, not polymorphic!
};

C#: (might contain errors, been a while)
public class xyz {
    public virtual int function() { return 1; } // function

    public int functiontoo() { return 2; } // functiontoo
}

public class abc : xyz {
    public override int function() { return 3; } // override, kind of pointless...

    public new int functiontoo() { return 4; } // ... why new? that's reserved for memory allocation...
}

My point is, .NET (in C#) requires you to make everything you want so explicit that I'm inclined to say that you're wasting time doing that more than you're gaining time due to other factors.

Plus, I just don't like their idea of a good library. Rape the C++ STL, why don't ya. Either support c++ (and the STL), or don't support it at all.
Re:Take Java seriously by bbn · 2005-10-31 02:48 · Score: 3, Informative

You realise that the C malloc/free calls are just the same, right? They never release memory back to the OS.

The only difference in this regard is that Java might allocate extra memory from the OS when it could have run the garbage collector and reused some memory instead.

Very few programs actually ever release memory back to the OS.
Re:Take Java seriously by jaywee · 2005-10-31 02:55 · Score: 3, Informative

That depends on libc implementation. glibc for example uses mmap() for bigger allocations - therefore is able to return the freed memory to the OS.
Garbage Collection by CaptainPinko · 2005-10-31 04:01 · Score: 2, Informative

I believe that there is a concurrent garbage collector in Sun's JVM that, while not as efficient overall, runs continuously, preventing the pauses and bubbles associated with traditional garbage collectors.

In my Java in a Nutshell 4th Edition (p. 246), one of the arguments to java (the interpreter) is:
-Xincgc Uses incremental garbage collection. In this mode the garbage collector runs continuously in the background, and a running program is rarely, if ever, subject to noticeable pauses while garbage collection occurs. Using this option typically results in a 10% decrease in performance overall, however. Java 1.3 and later.
As this book was written for Java 1.4, I'd bet it was fine-tuned and enhanced for 1.5 and even more so for the upcoming 1.6. Might be worth trying out.

--
Your CPU is not doing anything else, at least do something.
Re:Take Java seriously by icebattle · 2005-10-31 04:03 · Score: 5, Informative

Have you tried running your app with -XX:+UseParNewGC -XX:+UseConcMarkSweepGC?
This will allow the VM to do small amounts of GC whenever it has a chance, as opposed to waiting until an allocation request would fail and then running through the entire heap.
Our app runs 24/7 and while the markets are open and 10meg+ of raw data is coming down the line every second, we can't allow the app to take a timeout for a gc run. The app runs in 256meg, too.
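If you want to check which collectors a given set of flags actually selected, one option is to ask the JVM itself. The following is a rough sketch, assuming a JVM that ships the java.lang.management API (Java 5 and later); the class name GcReport and the method describeCollectors are made up for illustration, and the collector names printed depend entirely on the JVM and flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcReport {
    // Returns one line per garbage collector the running JVM has registered,
    // with its collection count and accumulated collection time so far.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it once with and once without the flags above should show whether the set of collectors changed.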
Re:Take Java seriously by Anonymous Coward · 2005-10-31 04:15 · Score: 1, Informative

Java /is/ a natively compiled language, or rather it can be. I suppose the politically-correct-for-Slashdot compiler to mention is GCJ, but personally I use the excellent Excelsior JET. Great performance, reasonable footprint, much faster development than C++, more scalable than Perl/Python/etc.
Re:Take Java seriously by 21mhz · 2005-10-31 04:16 · Score: 2, Informative

if only Sun would do something like writing a new faster classfile verifier.

They do: since (1.)5.0, JRE classes come preloaded in a shareable file.

--
My exception safety is -fno-exceptions.
Re:Take Java seriously by swillden · 2005-10-31 05:00 · Score: 5, Informative

Java will ALWAYS keep the 64MB heap. It will NEVER shrink it.
Who cares? This is completely irrelevant, as long as the JVM marks the pages it's not using as discardable, which modern JVMs on modern OSes do.
You have to remember that all memory is virtual. I can allocate a 1GB array, but as long as I never actually touch the pages (making them "dirty"), no storage, whether RAM or disk, is ever used. When the JVM allocates its 64MB, that virtual memory is initially all "clean" and therefore consumes no RAM. As it's used, it gets dirtied and physical RAM gets mapped to it... but when a garbage collection occurs, both the Sun and IBM JVMs mark the now-unused pages as clean, allowing the OS to reuse them at will. Effectively, they no longer consume any space.
Even if the JVMs didn't mark the pages as clean, the impact of the JVM holding onto the 64MB wouldn't be that significant. The allocated, dirtied but now-unused memory will simply get swapped out to disk, allowing those pages of RAM to be used by other applications.
With a decent OS, it really doesn't matter if an app holds onto some memory that it's not using, especially if it has the decency to tell the OS that it's not actually using it.
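To see the gap between what the JVM has reserved and what the program is using, something like the sketch below can help. It only reports the heap as the JVM accounts for it through java.lang.Runtime; it says nothing about which pages the OS currently keeps resident. The class name HeapSnapshot is made up for illustration.

```java
class HeapSnapshot {
    // Returns {used, committed, max} heap sizes in bytes, as reported by the JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();       // heap currently reserved by the JVM
        long used = committed - rt.freeMemory(); // portion occupied by objects (live or garbage)
        return new long[] { used, committed, rt.maxMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("used=" + s[0] + " committed=" + s[1] + " max=" + s[2]);
    }
}
```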
That said, there is a "problem" here, it's just not the one you're pointing out. The problem, if you want to call it that, arises from the generational, copying, compacting garbage collector used by modern VMs. Now, don't get me wrong, this GC is very cool. It's significantly more efficient than typical malloc/free memory management, *and* it eliminates some major classes of memory leaks (programmers can still leak memory with GC, but it's harder).
Without getting into the details, although GC is safer and more efficient in terms of CPU cycles, and although on average it doesn't use a great deal more memory than manual allocation would use (particularly since many performance-sensitive manual allocation apps end up managing their own memory pools in order to avoid the run-time cost of malloc() and free()), GC does tend to increase the number of memory pages that get touched over time.
Why? Two reasons. First, suppose the application allocates a small chunk of memory, uses it, frees it, allocates another small chunk (about the same size), and then uses and frees it. Most malloc()/free() implementations will tend to return the same chunk of memory for both allocations. Repeating the process a thousand times (which isn't all that uncommon) will probably only dirty a single page of memory. With the sort of garbage collector used by current JVMs, every one of those allocations will return a different chunk of memory, and many pages will therefore be dirtied. In terms of CPU cycles, GC wins big, because, rather than a thousand small frees, there is one big one. And allocation is up to two orders of magnitude faster. It may sound like the GC approach is going to use a lot more memory, on average, but it's not that bad because of the tendency of malloc/free-managed heaps to become fragmented. On average, GC implementations don't have many more pages in use than malloc/free implementations, and may actually have less, but the GC allocators tend to "roam" across the pages.
Second, the "copying" nature of the GC. The main thing that makes generational, copying GC-based memory management so fast is that it never has to deal with fragmentation. To describe it in an oversimplified way: Every now and then, the GC relocates all of the in-use objects into a nice, compact block. That makes allocation fast, and tends to reduce the number of pages in active use, but it increases the number of pages that get dirtied and therefore require actual RAM to be provided by the OS. The copying also has a cost in CPU cycles, but it's small relative to the cost of managing and searching free lists, which is what malloc/free implementations have to do.
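The "roaming" effect from the first reason can be illustrated with a toy simulation. This is only a sketch under stated assumptions: a 4096-byte page, 64-byte chunks, a LIFO free list standing in for malloc/free, and a bump pointer standing in for the GC allocator with no collection happening during the run. Nothing here touches real memory; it just counts which page numbers the addresses fall on. All names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class PageTouchDemo {
    static final int PAGE = 4096; // assumed page size in bytes
    static final int CHUNK = 64;  // assumed allocation size in bytes

    // Marks every page overlapped by [addr, addr + CHUNK) as touched.
    static void touch(Set<Integer> pages, int addr) {
        for (int p = addr / PAGE; p <= (addr + CHUNK - 1) / PAGE; p++) {
            pages.add(p);
        }
    }

    // malloc/free-like: a freed chunk goes on a LIFO free list and is reused first.
    static int pagesWithFreeList(int cycles) {
        Set<Integer> pages = new HashSet<Integer>();
        Deque<Integer> freeList = new ArrayDeque<Integer>();
        int top = 0;
        for (int i = 0; i < cycles; i++) {
            int addr;
            if (freeList.isEmpty()) {
                addr = top;
                top += CHUNK;
            } else {
                addr = freeList.pop();
            }
            touch(pages, addr);
            freeList.push(addr); // the chunk is freed right after use
        }
        return pages.size();
    }

    // Bump-pointer: every allocation advances the pointer; nothing is reused.
    static int pagesWithBumpPointer(int cycles) {
        Set<Integer> pages = new HashSet<Integer>();
        int top = 0;
        for (int i = 0; i < cycles; i++) {
            touch(pages, top);
            top += CHUNK;
        }
        return pages.size();
    }

    public static void main(String[] args) {
        System.out.println("free list:    " + pagesWithFreeList(1000) + " page(s)");
        System.out.println("bump pointer: " + pagesWithBumpPointer(1000) + " page(s)");
    }
}
```

With a thousand allocate-use-free cycles, the free-list version keeps handing back the same chunk and stays on one page, while the bump-pointer version walks across 16 pages (1000 x 64 bytes = 64000 bytes).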
Theoretically, as the GC copies objects and marks the pages where they used to live as discardable, the OS coul

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Take Java seriously by Hard_Code · 2005-10-31 06:21 · Score: 3, Informative

And given the escape analysis that is apparently going to be implemented in the next version of Java, it looks like a large chunk of the class of problem you propose (many small allocations/deallocations causing "roaming") may be able to be eliminated. The VM will know when some references never "escape" a certain context/scope, and adjust memory management accordingly. One common scenario, and source of many "temporary" objects, will be automatic/local variables in methods, which can be stack-allocated, eliminating this problem.
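As an illustration of the kind of code this is about, here is a sketch of a method whose temporary object never escapes. The names EscapeDemo, Point and sumOfSquaredLengths are made up, and whether any particular VM actually avoids the heap allocation is up to its JIT compiler; the code only shows the shape of a non-escaping temporary.

```java
class EscapeDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Each Point is created, read, and dropped inside the loop body. It is
    // never stored in a field, returned, or passed to another method, so it
    // does not escape this method.
    static long sumOfSquaredLengths(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // candidate for stack allocation
            total += (long) p.x * p.x + (long) p.y * p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaredLengths(3)); // (0+1) + (1+4) + (4+9) = 19
    }
}
```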

--

It's 10 PM. Do you know if you're un-American?
Re:Take Java seriously by hr+raattgift · 2005-10-31 07:36 · Score: 1, Informative

GC systems in a similar steady state tend to roam all over their heap, touching lots of pages. This can cause the OS to keep more physical pages mapped to the process's virtual memory space, even though the process isn't actually using more memory.
Not necessarily. Modern GCs tend to take advantage of the generational hypothesis, which suggests that most objects die young. One can write code with a generational GC in mind, relying on fast handling of deaths in the "nursery" of recently created objects.

A typical generational GC with two generations of old objects and a nursery works like this.

You maintain four pointers:

G1, the top of the oldest generation.
G0, the top of the 2nd oldest generation.
FREE, the top of the nursery.
CEIL, which limits the size of the heap.

At initialization, the first two of these both point to the bottom of the heap.
CEIL will point to a memory location halfway between the bottom and the top of the heap, or higher.

Set FREE to CEIL, so we begin allocating in the top part of the heap.

When we allocate an object, we simply increment FREE.

When FREE reaches the top of the heap, we collect the nursery.

We do this by sliding down all live objects in the nursery (between CEIL and the top of the heap) to the area between G0 and CEIL.

In a tracing GC system, live objects are those pointed to by the roots (registers, for example, or live stack frames) as well as objects in the older generations. We can optimize the identification of the latter by maintaining a write barrier on the older generations, which tells us whether a section (perhaps a page) in the older generation was touched since the last garbage collection. A page which hasn't been touched since the last gc cannot point to live objects in the nursery. A page which has been touched might point to live objects in the nursery, so we have to trace the pointers in the page as if they were roots.

The slide-down relocation is usually done by copying the object and leaving behind a forwarding pointer. In the case of linked lists and the like, the lists are walked recursively, often breadth-first, although depth-first can give better locality of reference.

During the migration of the live objects, we can increment G0 as we go, reusing the allocation mechanism, or we can adjust G0 after the slide-down relocations are finished. Either way, when we are done, we have expanded the area between G1 and G0, and no longer care what's above CEIL. If there is sufficient space between G0 and CEIL, we set FREE to CEIL and operate as usual.

If we are tight on space between G0 and CEIL we can either expand the heap (and increase CEIL) or we can collect one or both of the older generations. Again, we identify live objects and slide them down to the earlier generation.

In the case where we are creating lots of short-lived objects we are doing very little copying during the nursery collection -- we can end up effectively doing nothing but writing objects into the top of the heap then resetting the FREE pointer when we get there. If there are many more pages between CEIL and the top of the heap then yes we will be touching more pages than if we had a very disciplined explicit allocate-and-free mechanism. If our nursery is small, however, that is not the case.
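The pointer juggling above can be sketched in a few lines. This is a toy model under several assumptions that are not in the description: every object occupies exactly one slot, a set of "root" ids stands in for real tracing and the write barrier, the older generations are never collected (so G1 never moves), and running out of room below CEIL is simply reported as an error. The class name NurserySketch and its members are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class NurserySketch {
    final int[] heap;  // one object id per slot (assumption: all objects are one slot)
    int g1 = 0;        // top of the oldest generation (never moves in this sketch)
    int g0 = 0;        // top of the second-oldest generation
    final int ceil;    // bottom of the nursery
    int free;          // next free nursery slot
    final Set<Integer> roots = new HashSet<Integer>(); // live ids; stands in for tracing
    int collections = 0;

    NurserySketch(int slots) {
        heap = new int[slots];
        ceil = slots / 2; // CEIL halfway up the heap
        free = ceil;      // FREE starts at CEIL
    }

    // Bump allocation: collect the nursery first if FREE has reached the top.
    void allocate(int id, boolean keepAlive) {
        if (free == heap.length) {
            collectNursery();
        }
        heap[free++] = id;
        if (keepAlive) {
            roots.add(id);
        }
    }

    // Slides live nursery objects down to G0, bumping G0 as it goes,
    // then resets FREE to CEIL. Dead objects are simply overwritten later.
    void collectNursery() {
        for (int i = ceil; i < free; i++) {
            if (roots.contains(heap[i])) {
                if (g0 == ceil) {
                    throw new IllegalStateException("no room below CEIL; heap growth not modeled");
                }
                heap[g0++] = heap[i];
            }
        }
        free = ceil;
        collections++;
    }

    public static void main(String[] args) {
        NurserySketch h = new NurserySketch(16);
        for (int id = 1; id <= 20; id++) {
            h.allocate(id, id % 5 == 0); // every fifth object stays live
        }
        System.out.println("G0=" + h.g0 + " FREE=" + h.free + " collections=" + h.collections);
    }
}
```

In the usage above, with a 16-slot heap and 20 allocations, two nursery collections happen and only the three survivors seen so far (ids 5, 10 and 15) get copied; the other dead objects cost nothing beyond resetting FREE.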

There are other ways of handling the nursery, including the Cheney on the MTA approach of using C's runtime stack (alloca() and friends) to store young objects. We still have to find and copy out live objects when our nursery gets too large, but the C "return" reclaims all the dead ones. The implementation is straightforward for a continuation-passing-style language compiled into C, works well with small nursery sizes, and has been implemented in the Chicken implementation of Scheme.

Optimizing the speed of nursery handling, and partitioning the