Slashdot Mirror


Help crack the Java 1.6 Classfile Verifier

pdoubleya writes "As part of the development of Mustang (Java 1.6), Sun is developing a new, smaller and faster classfile verifier which they want your help in trying to break. As Sun VP Graham Hamilton puts it in his blog entry, "As part of Mustang we will be delivering a whole new classfile verifier implementation based on an entirely new verification approach. The classfile verifier is the very heart of the whole Java sandbox model, so replacing both the implementation and the basic verification model is a Really Big Deal.... The new verifier is faster and smaller than the classic verifier, but at the same time it doesn't have the ten years of reassuring shakedown history that we have with the classic verifier." You can read about the new verifier on Gilad Bracha's blog, and join the new Crack the Verifier initiative to see if you can break it. Read all about the Crack the Verifier - Challenge."

12 of 276 comments

  1. Take Java seriously by rexguo · · Score: 4, Interesting

    Before you go on to dismiss Java for various reasons (no matter how ignorant they are), take a look at the presentation given by Google at this year's JavaZone conference on how Google is using Java internally at extreme scales. Among the examples are AdWords and GMail.

    --
    www.rexguo.com - Technologist + Designer
    1. Re:Take Java seriously by Naikrovek · · Score: 5, Insightful

      I don't think Java is as slow as you think it is. It is very fast lately, and it is actually giving C a run for its money in some respects. It is *definitely* not the slug everyone thinks it is.

      They're probably using Java because it's not as slow as its reputation, and it's becoming a commodity language in the enterprise lately. My corporate overlords have dictated the use of Java (IBM's WebSphere) for all current and future enterprise development, and most of us developers couldn't be happier. Everywhere I've done contracting work lately also uses Java. Java Is A Great Language(TM), especially since 5.0.

      There used to be a time when I believed that all techies had agreed that Java was slow and bloated, but once I stopped reading Slashdot comments so religiously I began to see some truth. It isn't slow, it isn't bloated, and it isn't something I expect the Slashdot crowd (that I'm a founding member of) to understand anytime soon.

    2. Re:Take Java seriously by jiushao · · Score: 4, Funny
      It depends on what you mean by "slow." If you're talking about long running processes, then no, it isn't slow at all; in fact, it is quite fast. If you're talking about short-running processes, then the JVM startup time overshadows any commendable performance.

      Yeah, it is a shame, if only Sun would do something like writing a new faster classfile verifier.

    3. Re:Take Java seriously by AKAImBatman · · Score: 5, Interesting

      The reasons to use Java on the server are quite simple. The combination of factors that attracted developers to Java in the first place make them want to use it on the server. Those factors are:

      1. Cross-platform capability - Many companies still prefer to deploy applications on large Sun, IBM, or Linux (name your brand) servers. However, these companies would also like to give their developers Windows desktops so they can interact with the rest of the company. (Who most likely uses MS Office/Outlook.) As long as you avoid explicit path names, it is quite easy (and common!) to develop on a Windows machine but deploy on a Unix or Unix-like machine.

      2. Automatic Memory Management - So your server is running along, and suddenly someone generates an unexpected error. In Java you can sleep soundly because even the worst programmer would have a hard time doing anything to completely take down the application. If you use a language that allows direct memory management, you have a good chance of that new guy coding a General Protection Fault/Segfault. The result is that your entire system coredumps when you least expect it.

      3. Security - While Java is able to control the Security of the ENTIRE JVM through its security framework, most companies are happy with the lack of buffer overruns, code injection techniques, and other common attacks. That's not to say that a poor programmer can't put a security hole in the application wide enough to drive a Mack truck through, but at least you can rely on the underlying system not to betray you.

      4. Flexibility - The Java server side frameworks are exceedingly flexible in their designs. For example, the servlet framework allows you to plug in your own custom server page technology. I have seen many a programmer (including myself) implement something like Reports by simply linking the ".rpt" extension to a custom servlet. The servlet then loads the requested configuration file and executes it. Very nice.

      Another example is servlet filters. Need a security framework added in a hurry? Just add a servlet filter! It will execute before the rest of the code, allowing you to check the variables and security permissions to ensure that the client isn't trying any funny business.

      5. API - When Java was first introduced, it absolutely creamed all the competing languages in the richness of its bundled API. As time has worn on, this has changed. However, Java still enjoys a sizable lead over even C/C++ with features such as Type IV (tested cross-platform, pure Java) JDBC database drivers. Unlike ODBC, many of these drivers have been tuned for excellent performance. Similarly, there are free APIs for handling Office Documents, PDF Creation/Editing, SOAP/XML-RPC communications, Object-Relational mapping, Image Management/Creation/Editing, CORBA, XML Databases, XSL-T, etc. While these APIs are all available for C/C++, there are significant cross-platform issues with many of them, as well as a lack of common "pluggable" APIs that allow for one API to many implementations.

      Other languages have a hit/miss score with these sorts of features, often not providing these features, providing only a small subset, or only being available in an expensive commercial package.

      6. Dynamic Loading - While C/C++ can manage dynamic loading of shared objects, it's a very difficult thing to implement. Java does it out of the box, with a full reflection API and interface support, thus allowing such wonderful code as Beans, Servlets, Pluggable Drivers, self-organizing code, and a host of other features that other systems can't compete with.

      (If you don't believe me, try adding support for a feature in PHP sometime. "It's so simple! Just install the SO and recompile PHP!" Meh.)

      7. Performance - This may sound like an odd thing to say, but the performance of Java is a key selling feature. Java server applications may execute more slowly than one written in C/C++ (just as C/C++ may execute more slowly than

    4. Re:Take Java seriously by icebattle · · Score: 5, Informative
      Have you tried running your app with -XX:+UseParNewGC -XX:+UseConcMarkSweepGC?

      This will allow the VM to do small amounts of GC whenever it has a chance, as opposed to waiting until an allocation request will fail and then running through the entire heap.

      Our app runs 24/7 and while the markets are open and 10meg+ of raw data is coming down the line every second, we can't allow the app to take a timeout for a gc run. The app runs in 256meg, too.
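
      To check which collectors a JVM actually ended up with after being started with flags like these, the standard java.lang.management API can list them. This is only a minimal sketch: the class name GcReport is made up for illustration, and the collector names printed depend on the JVM version and flags (newer JDKs have dropped the CMS collector, so the flags above may be rejected there).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the app described above.
class GcReport {
    // Returns one summary line per collector registered in the running JVM.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are -1 when the collector does not report them.
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

      Running it with the same flags as the application (on a JVM that still accepts them) should show whether the concurrent collector was really selected.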

    5. Re:Take Java seriously by swillden · · Score: 5, Informative

      Java will ALWAYS keep the 64MB heap. It will NEVER shrink it.

      Who cares? This is completely irrelevant, as long as the JVM marks the pages it's not using as discardable, which modern JVMs on modern OSes do.

      You have to remember that all memory is virtual. I can allocate a 1GB array, but as long as I never actually touch the pages (making them "dirty"), no storage, whether RAM or disk, is ever used. When the JVM allocates its 64MB, that virtual memory is initially all "clean" and therefore consumes no RAM. As it's used, it gets dirtied and physical RAM gets mapped to it... but when a garbage collection occurs, both the Sun and IBM JVMs mark the now-unused pages as clean, allowing the OS to reuse them at will. Effectively, they no longer consume any space.

      Even if the JVMs didn't mark the pages as clean, the impact of the JVM holding onto the 64MB wouldn't be that significant. The allocated, dirtied but now-unused memory will simply get swapped out to disk, allowing those pages of RAM to be used by other applications.

      With a decent OS, it really doesn't matter if an app holds onto some memory that it's not using, especially if it has the decency to tell the OS that it's not actually using it.
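
      The gap between what the JVM has reserved and what it is actually using can be seen from inside the process with java.lang.Runtime. A minimal sketch, with a made-up class name (HeapSnapshot); note that it only shows the JVM's own bookkeeping and says nothing about which pages the OS currently keeps in RAM.

```java
// Hypothetical helper for illustration only.
class HeapSnapshot {
    // Heap bytes currently occupied by objects, live or not yet collected.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory: the ceiling the heap may grow to (e.g. set with -Xmx).
        System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");
        // totalMemory: what the JVM has reserved for the heap right now.
        System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
        // used: the part of that reservation holding objects.
        System.out.println("used heap:  " + usedBytes() / mb + " MB");
    }
}
```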

      That said, there is a "problem" here, it's just not the one you're pointing out. The problem, if you want to call it that, arises from the generational, copying, compacting garbage collector used by modern VMs. Now, don't get me wrong, this GC is very cool. It's significantly more efficient than typical malloc/free memory management, *and* it eliminates some major classes of memory leaks (programmers can still leak memory with GC, but it's harder).

      Without getting into the details, although GC is safer and more efficient in terms of CPU cycles, and although on average it doesn't use a great deal more memory than manual allocation would use (particularly since many performance-sensitive manual allocation apps end up managing their own memory pools in order to avoid the run-time cost of malloc() and free()), GC does tend to increase the number of memory pages that get touched over time.

      Why? Two reasons. First, suppose the application allocates a small chunk of memory, uses it, frees it, allocates another small chunk (about the same size), and then uses and frees it. Most malloc()/free() implementations will tend to return the same chunk of memory for both allocations. Repeating the process a thousand times (which isn't all that uncommon) will probably only dirty a single page of memory. With the sort of garbage collector used by current JVMs, every one of those allocations will return a different chunk of memory, and many pages will therefore be dirtied. In terms of CPU cycles, GC wins big, because, rather than a thousand small frees, there is one big one. And allocation is up to two orders of magnitude faster. It may sound like the GC approach is going to use a lot more memory, on average, but it's not that bad because of the tendency of malloc/free-managed heaps to become fragmented. On average, GC implementations don't have many more pages in use than malloc/free implementations, and may actually have less, but the GC allocators tend to "roam" across the pages.
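
      The "roaming" effect can be shown with a toy model. This is a sketch under heavy assumptions, not how any real malloc or JVM is implemented: a 4096-byte page, fixed-size 64-byte chunks, a LIFO free list standing in for malloc/free, and a bump pointer that never wraps standing in for the GC allocator between collections. All names are made up.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

// Toy model: counts how many distinct pages each allocation style writes to.
class PageTouchDemo {
    static final int PAGE_SIZE = 4096; // assumed page size

    // Marks every page overlapped by [addr, addr + size) as dirty.
    static void touch(BitSet dirty, long addr, int size) {
        for (long p = addr / PAGE_SIZE; p <= (addr + size - 1) / PAGE_SIZE; p++) {
            dirty.set((int) p);
        }
    }

    // malloc/free style: a freed chunk goes on a LIFO free list and is reused first.
    static int pagesDirtiedByFreeList(int cycles, int size) {
        BitSet dirty = new BitSet();
        Deque<Long> freeList = new ArrayDeque<>();
        long top = 0;
        for (int i = 0; i < cycles; i++) {
            long addr;
            if (freeList.isEmpty()) {
                addr = top;          // nothing to reuse, carve a new chunk
                top += size;
            } else {
                addr = freeList.pop(); // reuse the most recently freed chunk
            }
            touch(dirty, addr, size);  // the program uses the chunk
            freeList.push(addr);       // and frees it straight away
        }
        return dirty.cardinality();
    }

    // Bump-pointer style: every allocation advances the pointer; nothing is
    // reused until a collection, which this toy model never performs.
    static int pagesDirtiedByBumpPointer(int cycles, int size) {
        BitSet dirty = new BitSet();
        long top = 0;
        for (int i = 0; i < cycles; i++) {
            touch(dirty, top, size);
            top += size;
        }
        return dirty.cardinality();
    }

    public static void main(String[] args) {
        System.out.println("free list:    " + pagesDirtiedByFreeList(1000, 64) + " page(s)");   // 1
        System.out.println("bump pointer: " + pagesDirtiedByBumpPointer(1000, 64) + " page(s)"); // 16
    }
}
```

      With a thousand 64-byte allocate/use/free cycles, the free-list model keeps hitting the same page while the bump pointer walks across 16 of them, even though only one chunk is ever live at a time in either case.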

      Second, the "copying" nature of the GC. The main thing that makes generational, copying GC-based memory management so fast is that it never has to deal with fragmentation. To describe it in an oversimplified way: Every now and then, the GC relocates all of the in-use objects into a nice, compact block. That makes allocation fast, and tends to reduce the number of pages in active use, but it increases the number of pages that get dirtied and therefore require actual RAM to be provided by the OS. The copying also has a cost in CPU cycles, but it's small relative to the cost of managing and searching free lists, which is what malloc/free implementations have to do.

      Theoretically, as the GC copies objects and marks the pages where they used to live as discardable, the OS coul

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    6. Re:Take Java seriously by TheRaven64 · · Score: 4, Interesting
      I would refer you to some research done around a decade ago, which involved running a MIPS emulator on a MIPS machine. The emulator was doing dynamic optimisation, and got around a 10% speed increase over the same code running directly on the hardware.

      A Java VM does some things that are simply not possible with C. To inline a C function, you need the source for both - this leads to some really ugly things like putting simple functions in headers, which should be reserved for interface definitions, not implementation details. The Java VM will inline functions on the fly. This can potentially give a huge performance boost - I got almost a 50% speed increase on some C code I was recently writing by shuffling things around to allow the compiler to inline some common functions.

      The other advantage of higher-level languages is that they provide more semantic information to the optimiser. Consider the trivial example of autovectorisation. In C, if you want to do an operation on a vector, you will usually iterate over every element and perform the operation. The compiler then needs to check that there are no dependencies between loop iterations, which can be non-trivial. In a language like FORTRAN or Smalltalk, you can simply perform an operation on a vector type. The compiler then just needs to check if the operation you are trying to perform corresponds to one or more vector unit instructions, and substitute these into your code. This is much easier to do.

      C is a fairly easy language to write optimised code in for any CPU up to and including a 386. For anything more modern, you will find yourself fighting a language which is simply not designed to deal with parallelism - and compiler writers find themselves fighting even harder.

      --
      I am TheRaven on Soylent News
    7. Re:Take Java seriously by bbn · · Score: 4, Insightful

      You can say the same thing about the parent comment about Java memory management. The J2ME for mobile phones does indeed free the memory. Funny how Java for embedded systems uses the same strategy eh? J2ME works with very little memory. A Nokia 3410 only has 64 KB memory for the Java games.

      The parent post was obviously talking about ordinary client applications running on a PC under either Windows, Linux, Mac or something similar.

      On such a system, malloc cannot map directly to the OS API, because the OS will only allocate full pages at a time. So if you want 40 bytes of memory, the OS would give you 1 KB (or whatever the page size is).

      This also means that if you allocate 2x40 bytes, then free the first 40-byte block, malloc cannot free the page. It is simply not possible, since it would have to free the whole page.
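
      The arithmetic behind this can be sketched in a few lines. The page size of 4096 bytes and all names here are assumptions for illustration; the model just counts live blocks per page and treats a page as releasable only when that count reaches zero.

```java
// Toy bookkeeping for page-granular allocation; not a real allocator.
class PageGranularity {
    static final int PAGE_SIZE = 4096; // assumed page size in bytes

    // Whole pages the OS must hand over to satisfy a request of this size.
    static long pagesNeeded(long bytes) {
        return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Pages that hold no live block and could therefore go back to the OS.
    static int releasablePages(int[] liveBlocksPerPage) {
        int count = 0;
        for (int live : liveBlocksPerPage) {
            if (live == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(pagesNeeded(40));      // 1: a 40-byte request still costs a page
        System.out.println(pagesNeeded(2 * 40));  // 1: both blocks fit on that same page

        int[] live = {2};                          // one page holding two 40-byte blocks
        live[0]--;                                 // free the first block
        System.out.println(releasablePages(live)); // 0: the second block pins the page
        live[0]--;                                 // free the second block
        System.out.println(releasablePages(live)); // 1: now the page can be released
    }
}
```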

      Some J2ME implementations can defragment the memory in this situation, to be able to release memory back to the OS. That is impossible with a C program, where direct pointers to memory are allowed.

      For large blocks you can of course go directly to the OS.

  2. More on Gilad Bracha by tcopeland · · Score: 4, Informative

    If his name doesn't ring a bell, he's a Java guru who works for Sun and wrote the 2nd and 3rd editions of the Java Language Spec. A bunch of his papers are listed here.

    It's a relief that JDK 1.6 won't include any language changes (as far as I know?). Updating various parsers and whatnot to work with all the JDK 1.5 language changes was a big job, although some of the new features certainly are quite handy.

  3. Re:Aren't QA people supposed to get paid? by Naikrovek · · Score: 4, Insightful

    The onus isn't on the community, the onus is on the developers and their QA team. This is just an attempt to get a few more eyeballs on the verifier in case something falls through. There's nothing wrong with that.

    Also it is an opportunity for someone to get recognition for breaking a new piece of software.

    It is important to get extra scrutiny on newly designed pieces of software, for it is the new designs that usually break in the least expected ways.

  4. Optimization and late binding by jfengel · · Score: 4, Insightful

    Another nice thing about the new classfile specification is that it's going to make certain new kinds of optimization possible. The more you can prove about what's on the stack at any given point, the more you can inline.

    Not only does inlining eliminate method call overhead, but it allows you to re-run the peephole optimizer, which can eliminate range checks, reduce redundant type checks, etc.

    The ultimate performance promise of Java is that it can do optimization very, very late in the process. Native libraries are basically black boxes in C/C++, and it's very hard to do that sort of inlining because most of the type information has been lost. Java may, someday, with sufficient ingenuity, rival or even beat C++ in performance, and it already does in certain limited areas.

    Of course C# has all of the same advantages, and even though it's more recent there are some areas where its performance beats Java. I'd love to see all the Microsoft researchers vs. all the Sun researchers coming up with increasingly brilliant ways to take advantage of the late binding to turn a performance hindrance into a benefit.

  5. Re:Why not prove it? by TrappedByMyself · · Score: 4, Insightful

    Why not do what it takes: Prove that it will work, and prove that it cannot be broken!

    Did you just walk out of an undergrad Computer Science class? ;)
    Popping in pre/postconditions and doing line-by-line proofs doesn't cut it for an application of this complexity. While that is an important part of a real process, it doesn't guarantee coverage. You still have to make assumptions about the environment, which is the gotcha. Testing and QA is all about the assumptions you make and the boundaries you set. With a complex application the number of factors grows so large that you cannot have the resources to cover every possible test. You can grab the most common stuff, but you really need to dump it to the community to get the real 'out of the box' thinking hitting it.

    --

    Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?