Slashdot Mirror


Oracle Calls Java Serialization 'A Horrible Mistake', Plans to Dump It (infoworld.com)

An anonymous reader quotes InfoWorld: Oracle plans to drop from Java its serialization feature that has been a thorn in the side when it comes to security. Also known as Java object serialization, the feature is used for encoding objects into streams of bytes... Removing serialization is a long-term goal and is part of Project Amber, which is focused on productivity-oriented Java language features, says Mark Reinhold, chief architect of the Java platform group at Oracle.

To replace the current serialization technology, a small serialization framework would be placed in the platform once records, the Java version of data classes, are supported. The framework could support a graph of records, and developers could plug in a serialization engine of their choice, supporting formats such as JSON or XML, enabling serialization of records in a safe way. But Reinhold cannot yet say which release of Java will have the records capability. Serialization was a "horrible mistake" made in 1997, Reinhold says. He estimates that at least a third -- maybe even half -- of Java vulnerabilities have involved serialization. Serialization overall is brittle but holds the appeal of being easy to use in simple use cases, Reinhold says.

6 of 198 comments

  1. Re:Was very obvious back then by Anonymous Coward · · Score: 3, Informative

    But the Java fanatics just put in more and more features, regardless of whether sane languages had them or not.

    Obvious?

    Well, given the abstraction from actual hardware that is Java's goal, how would you create a way to pass data from machine to machine without worrying about things like word size and endianness?

    Got any objective reasons? Because what you've posted is just an opinion. And just like that other thing everyone else has, frankly it stinks.

  2. Re:Was very obvious back then by Anonymous Coward · · Score: 0, Informative

    But the Java fanatics just put in more and more features, regardless of whether sane languages had them or not.

    Obvious?

    Well, given the abstraction from actual hardware that is Java's goal, how would you create a way to pass data from machine to machine without worrying about things like word size and endianness?

    Are you for real? Maybe I'm not getting your sarcasm, but this is a solved problem and was a solved problem back in 1997.

    See here for one example of object serialization of binary fields that was doable back in 1997. Serializing 15 or so fields in a single statement; not exactly what I would call rocket science.
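
    As a rough illustration of the "solved problem" point (this is not the example linked above, just a sketch), Java's own DataOutputStream writes fixed-width, big-endian values regardless of the host CPU, so word size and endianness never reach the wire. The class name WireFormatDemo and the three fields are assumptions made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: explicit field-by-field encoding with fixed widths.
class WireFormatDemo {
    // Writes a 4-byte int, an 8-byte long and a 2-byte short, all big-endian.
    static byte[] encode(int id, long timestamp, short flags) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(id);
                out.writeLong(timestamp);
                out.writeShort(flags);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back only the leading int field.
    static int decodeId(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    The receiver decides which fields it reads and in which order, which is the opposite of letting the runtime rebuild arbitrary objects.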

  3. Re:Was very obvious back then by h8sg8s · · Score: 3, Informative

    It was solved even earlier with XDR (R.I.P. Dr Bruce Nelson..)

    --
    Organization? You must be joking..
  4. Re:The reason why it is dangerous by Anonymous Coward · · Score: 5, Informative

    If I'm following you correctly, the problem isn't serialization per se but rather the fact that the deserialization is being done by the Java runtime (which has no way to validate the resulting objects against the application's requirements, since its deserialization code is application-independent, and also has the power to instantiate any kind of object, even those that are totally irrelevant to the task at hand), rather than by the application itself.

    Java deserialization is magic. By which I mean it behaves in several ways that user code pretty much can't.

    The default system effectively loads a binary blob off the input stream and then creates each object without calling a constructor*. You can't just not call a constructor in Java, but Java deserialization does. All the fields are set by magic, by which I mean it ignores getters and setters and whatever access level might be on the fields. Any field marked as "not serialized" (transient) is left with default values - but those may not be the default values you think! If you write private transient int foo = 3; then foo won't be serialized, and when the object is deserialized, it will instead be ... 0. Because 0 is the default for ints.
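
    A minimal sketch of the transient-field surprise described above. The class names Widget and TransientDemo are made up for illustration; the point is that the field initializer is part of the constructor, and the constructor never runs on deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: one transient field with an initializer, one ordinary field.
class Widget implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int foo = 3; // initializer only runs as part of a constructor
    private int bar = 7;           // restored from the stream

    int foo() { return foo; }
    int bar() { return bar; }
}

class TransientDemo {
    // Serializes an object to a byte array and deserializes it again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Widget copy = (Widget) roundTrip(new Widget());
        System.out.println(copy.foo() + " " + copy.bar()); // prints "0 7"
    }
}
```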

    How does Java deserialization know if it's loading the right fields for a given object? Well, it's magic, but not that magic - you're supposed to let it know by setting the serialization ID for the class. And how do you do that? By declaring a static long serialVersionUID, and making sure you update it whenever your class structure changes. Don't do that and the deserialization logic might not notice that the structure doesn't quite match. No, you can't just have it autogenerate one - if not set, the serialization/deserialization code will create one, but it may be dependent on compiler and randomly break across identical code bases. Surprise!
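
    For reference, a sketch of what declaring the ID looks like and how to inspect the value the runtime will use. The class names here are hypothetical; ObjectStreamClass.lookup() and getSerialVersionUID() are standard java.io calls. For the class without a declared ID, the printed value is whatever the runtime computes from the class shape, so no particular number is assumed.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class with an explicitly declared ID.
class Versioned implements Serializable {
    private static final long serialVersionUID = 42L;
    String name = "x";
}

// Hypothetical class that leaves the ID to be computed by the runtime.
class Unversioned implements Serializable {
    String name = "x";
}

class VersionIdDemo {
    // Returns the serialVersionUID the serialization runtime associates with a class.
    static long idOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(idOf(Versioned.class));   // the declared value, 42
        System.out.println(idOf(Unversioned.class)); // a computed hash of the class shape
    }
}
```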

    But in any case, the serialization system is magic. How do you write a custom serializer/deserializer? By creating the private methods writeObject(ObjectOutputStream) and readObject(ObjectInputStream). Because the serializer is magic, it can access these private methods. (Note that readObject(ObjectInputStream) gets called on a magically created object that has never had a constructor called on it, so all fields will have their default values! How does that work with final fields? Well... the short answer is "like shit." The longer answer is that the default deserializer just ignores the final modifier (which you can't do in generic code), and that if you want to do the same, there's some reflection magic or non-standard APIs you can do.)
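
    A hedged sketch of the two private hooks, using the signatures documented for java.io.Serializable: writeObject takes an ObjectOutputStream and readObject takes an ObjectInputStream. The Range class and its invariant are invented for this example; the fields are deliberately non-final to sidestep the final-field problem mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with an invariant (lo <= hi) and a derived transient field.
class Range implements Serializable {
    private static final long serialVersionUID = 1L;
    private int lo;
    private int hi;
    private transient int length; // derived, not written to the stream

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
        this.length = hi - lo;
    }

    int length() { return length; }

    // Called reflectively by the serialization runtime despite being private.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes lo and hi
    }

    // No constructor has run at this point, so re-check the invariant by hand.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores lo and hi
        if (lo > hi) throw new InvalidObjectException("lo > hi");
        length = hi - lo;       // rebuild the transient state
    }
}

class RangeDemo {
    // Serializes a Range to bytes and reads it back.
    static Range roundTrip(Range range) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(range);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Range) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Everything the constructor enforces has to be enforced a second time in readObject, which is exactly the duplication that is easy to forget.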

    So anyway, there's a basic overview of how Java serialization defies expectations and basically guarantees that anyone writing code that involves serialization will do it wrong.

    * This is false. What it really does is go up the object hierarchy and look for the first parent class that does not declare itself serializable and calls its default no-args constructor. But that means that your class that you declared serializable therefore, by definition, does not get its constructor called. Surprise!
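
    The footnote can be checked with a small sketch that counts constructor calls. Base, Child and InheritanceDemo are made-up names; Base is deliberately not Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable parent: its no-arg constructor runs on deserialization.
class Base {
    static int baseCalls = 0;
    Base() { baseCalls++; }
}

// Hypothetical serializable child: its constructor is skipped on deserialization.
class Child extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    static int childCalls = 0;
    Child() { childCalls++; }
}

class InheritanceDemo {
    // Returns {Base constructor calls, Child constructor calls} after one
    // normal construction followed by one serialize/deserialize round trip.
    static int[] constructorCallsAfterRoundTrip() {
        Base.baseCalls = 0;
        Child.childCalls = 0;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Child()); // both constructors run once here
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject(); // only Base() runs here
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { Base.baseCalls, Child.childCalls };
    }
}
```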

  5. Re:I don't get it by angel'o'sphere · · Score: 4, Informative

    Java is unique insofar as, when you use built-in serialization, you also serialize the class files.
    There are two "marker interfaces" to make a Java class serializable: Serializable and Externalizable.

    In case of the first one, the Java Framework/VM uses reflection to serialize and deserialize objects.
    In case of the second one, you are required to implement the methods writeExternal() and readExternal().

    As the class files are in the serialized data stream, a program reading "untrusted" serialized data might also load classes aka code from that stream. If that code implements Externalizable and thus has an "unknown foreign" method readExternal(), the deserialization framework will call that unknown/untrusted method readExternal(), which means: you run code coming from outside, which can do whatever it wants besides reading the object from the object stream.
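
    A minimal sketch of the Externalizable path, with the hypothetical class names Point and ExternalizableDemo. One caveat worth hedging: as far as I know, the standard ObjectOutputStream writes class names and descriptors rather than bytecode, so in this sketch readExternal() comes from a class already available to the reader. What the sketch does show is that the runtime calls the public no-arg constructor and then hands control to readExternal(), which is free to run any code it likes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class that takes full control of its own wire format.
class Point implements Externalizable {
    static int noArgCalls = 0;
    private int x;
    private int y;

    public Point() { noArgCalls++; } // Externalizable needs a public no-arg constructor
    Point(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Invoked by the deserialization runtime; arbitrary code could run here.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableDemo {
    // Serializes a Point to bytes and reads it back.
    static Point roundTrip(Point point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```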

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  6. Re: Was very obvious back then by peppepz · · Score: 3, Informative

    Java is one of the fastest, if not the fastest, non-native language.