Oracle Calls Java Serialization 'A Horrible Mistake', Plans to Dump It (infoworld.com)

Posted by EditorDavid on Saturday May 26, 2018 @07:34AM from the just-in-time dept.

An anonymous reader quotes InfoWorld: Oracle plans to drop from Java its serialization feature that has been a thorn in the side when it comes to security. Also known as Java object serialization, the feature is used for encoding objects into streams of bytes... Removing serialization is a long-term goal and is part of Project Amber, which is focused on productivity-oriented Java language features, says Mark Reinhold, chief architect of the Java platform group at Oracle.

To replace the current serialization technology, a small serialization framework would be placed in the platform once records, the Java version of data classes, are supported. The framework could support a graph of records, and developers could plug in a serialization engine of their choice, supporting formats such as JSON or XML, enabling serialization of records in a safe way. But Reinhold cannot yet say which release of Java will have the records capability. Serialization was a "horrible mistake" made in 1997, Reinhold says. He estimates that at least a third -- maybe even half -- of Java vulnerabilities have involved serialization. Serialization overall is brittle but holds the appeal of being easy to use in simple use cases, Reinhold says.

36 of 198 comments

Was very obvious back then by gweihir · 2018-05-26 07:40 · Score: 4, Insightful

But the Java fanatics just put in more and more features, regardless of whether sane languages had them or not.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
1. Re:Was very obvious back then by Anonymous Coward · 2018-05-26 07:53 · Score: 3, Informative
  
  But the Java fanatics just put in more and more features, regardless of whether sane languages had them or not.
  Obvious?
  Well, given the abstraction from actual hardware that is Java's goal, how would you create a way to pass data from machine to machine without worrying about things like word size and endianness?
  Got any objective reasons? Because what you've posted is just an opinion. And just like that other thing everyone else has, frankly it stinks.
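A minimal sketch of the word-size and endianness point, assuming nothing beyond the JDK: `DataOutputStream.writeInt` always emits four bytes, high byte first, whatever the host CPU does. The class name `BigEndianDemo` and the helper `encodeInt` are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; the names here are assumptions for illustration.
class BigEndianDemo {
    // Encodes an int as DataOutputStream does: exactly 4 bytes, high byte first.
    static byte[] encodeInt(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // prints "[1, 2, 3, 4]" on any platform
        System.out.println(Arrays.toString(encodeInt(0x01020304)));
    }
}
```

This only covers primitive values; it says nothing about object graphs, which is where the argument in this thread actually lies.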
2. Re:Was very obvious back then by greenwow · 2018-05-26 08:30 · Score: 2
  
  Oh please. This wasn't a failure with their implementation. It's an issue with the concept which is still a good thing because the positives still outweigh the negatives.
  It just sucks though going through 100+ projects to add jaxb to their pom files to prepare for Java 11 LTS that's coming in September.
3. Re:Was very obvious back then by h8sg8s · 2018-05-26 08:31 · Score: 3, Informative
  
  It was solved even earlier with XDR (R.I.P. Dr Bruce Nelson..)
  
  --
  Organization? You must be joking..
4. Re:Was very obvious back then by TheRealMindChild · 2018-05-26 09:06 · Score: 4, Funny
  
  If XML isn't the solution to your problem, you aren't using enough
  
  --
  
  "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
5. Re: Was very obvious back then by Z00L00K · 2018-05-26 09:10 · Score: 4, Insightful
  
  The disadvantage with xml is that it creates a lot of overhead, which could be a problem in embedded applications and large scale solutions.
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
6. Re:Was very obvious back then by gweihir · 2018-05-26 11:02 · Score: 2
  
  You think older mistakes should not be corrected?
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
7. Re:Was very obvious back then by TechyImmigrant · 2018-05-26 11:16 · Score: 2
  
  No. The problem is with the concept. Persistent objects are dangerous.
  Insanity.
  I like files. They are objects that are persistent.
  Just because people are too weak to sanitize inputs doesn't mean that storing bytes persistently is a bad idea.
  
  --
  I should use this sig to advertise my book ISBN-13 : 978-1501515132.
8. Re: Was very obvious back then by TechyImmigrant · 2018-05-26 11:18 · Score: 2
  
  Ada is a subset of a HDL language used to design the CPU you are running your web browser on right now.
  Just because software engineers find it hard, it doesn't stop hardware engineers managing just fine with it.
  
  --
  I should use this sig to advertise my book ISBN-13 : 978-1501515132.
9. Re:Was very obvious back then by angel'o'sphere · 2018-05-26 13:31 · Score: 2
  
  The link is about structs, not about objects.
  So when you deserialize them, you have no vtable.
  And if you had read (and comprehended) your link, you would have realized: the author shows all the problems in serialization. He does not really propose portable solutions.
  So: what exactly was your point?
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
10. Re: Was very obvious back then by angel'o'sphere · 2018-05-26 13:37 · Score: 2
  
  Ada did not take off because when it came into the industry compilers were absurdly expensive and every "Ada vendor" wanted your leg and your first born.
  Besides that, Ada is a nice language, very well designed. I would love to program in Ada, but because of the idiots who made it unaffordably expensive, most Ada projects switched to C++.
  It is barely usable.
  If you can not use Ada effectively you likely can not use any other programming language either.
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
11. Re:Was very obvious back then by Junta · 2018-05-26 13:57 · Score: 2
  
  The issue is that in practice, 99% of the time the programmer intends this feature (and features like it in other languages) to be used for persisting "boring old data" in the laziest way possible. The feature of having data be evaluated with executable instructions being honored is just a huge liability.
  The 1% of the time when the programmer explicitly does use the "data can have code-to-eval" capability, it has, in my experience, always been done better another way (such code is generally a pain to debug because an unintuitive path is used for some of the code that executes, and I've yet to see a situation where they really *needed* to accomplish their goals that specific way).
  Basically, it's encouraging a bad practice of mixing and matching data and executable code. Ideally you want your .jpg to be "just an image" and only have to worry about arbitrary executable data when dealing with an executable file.
  
  --
  XML is like violence. If it doesn't solve the problem, use more.
12. Re: Was very obvious back then by Jeremi · 2018-05-26 16:13 · Score: 2
  
  The number of bugs in CPUs is an order of magnitude less than in most software. It has to be, because recalling a million CPUs is economically unfeasible. "Recalling" a million software installs (via auto-update), OTOH, is so commonplace as to be unremarkable.
  
  --
  
  I don't care if it's 90,000 hectares. That lake was not my doing.
13. Re: Was very obvious back then by arglebargle_xiv · 2018-05-26 19:18 · Score: 2
  
  The disadvantage with xml is that it creates a lot of overhead
  They're already using Java, obviously they're not concerned about overhead.
14. Re: Was very obvious back then by Pseudonym · 2018-05-26 20:44 · Score: 2
  
  Ada is a subset of a HDL language used to design the CPU you are running your web browser on right now.
  Both VHDL and Verilog are like the mafia. Hardware designers don't do business with those languages because they want to.
  
  --
  sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
15. Re: Was very obvious back then by gweihir · 2018-05-26 21:50 · Score: 2
  
  Well, just another aspect of why so many coders are so bad: They cannot recognize whether a tool is good or bad.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
16. Re: Was very obvious back then by peppepz · 2018-05-26 23:58 · Score: 3, Informative
  
  Java is one of the fastest, if not the fastest, non-native language.
Re:Records? Is that a thing? by hazem · 2018-05-26 08:09 · Score: 2

Cobol anyone?
I thought I was going to old-school school people by mentioning QBasic's "type" structures, but you punked me with Cobol.
But then again, not even Python does this well if you need a structure with specific data types to match a binary stream that you need to read or write.
Object serialization is dangerous. by Gravis+Zero · 2018-05-26 08:28 · Score: 2, Insightful

Regardless of language, object serialization is a dangerous idea. While it may seem like a nice idea at first, loading objects from unverified mutable data is an invitation for someone to tinker with that data. The situation only gets worse when your object structure changes because now your object data is invalid or incomplete.
Much like goto, I'm not arguing that it's not useful but rather that its use is inherently dangerous.

--
Anons need not reply. Questions end with a question mark.
1. Re:Object serialization is dangerous. by goose-incarnated · 2018-05-26 08:38 · Score: 4, Interesting
  
  Regardless of language, object serialization is a dangerous idea. While it may seem like a nice idea at first, loading objects from unverified mutable data is an invitation for someone to tinker with that data.
  
  Okay then, smartypants, what do you propose for persisting fields of an object? Anything you propose is, by definition, "serialisation". The only alternative to serialisation is non-persistent objects.
  (TBH, I kinda like the thought of signed serialised blobs)
  
  --
  I'm a minority race. Save your vitriol for white people.
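The "signed serialised blobs" idea could look roughly like the sketch below. It swaps in an HMAC (a shared-secret tag via `javax.crypto.Mac`) rather than a true public-key signature, and it checks the tag before any byte reaches `ObjectInputStream`. The class `SignedBlob`, the `seal`/`open` helpers, and the tag-then-payload layout are all assumptions for illustration; key storage is out of scope.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; blob layout (32-byte tag, then payload) is an assumption.
class SignedBlob {
    private static final String ALGORITHM = "HmacSHA256";
    private static final int TAG_LENGTH = 32; // HMAC-SHA256 output size in bytes

    static byte[] hmac(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(key, ALGORITHM));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the value and prepends the HMAC of the serialized bytes.
    static byte[] seal(byte[] key, Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] payload = buffer.toByteArray();
        byte[] tag = hmac(key, payload);
        byte[] blob = new byte[tag.length + payload.length];
        System.arraycopy(tag, 0, blob, 0, tag.length);
        System.arraycopy(payload, 0, blob, tag.length, payload.length);
        return blob;
    }

    // Verifies the tag first; deserialization only happens if it matches.
    static Object open(byte[] key, byte[] blob) {
        if (blob.length < TAG_LENGTH) {
            throw new SecurityException("blob too short");
        }
        byte[] tag = Arrays.copyOfRange(blob, 0, TAG_LENGTH);
        byte[] payload = Arrays.copyOfRange(blob, TAG_LENGTH, blob.length);
        if (!MessageDigest.isEqual(tag, hmac(key, payload))) { // constant-time compare
            throw new SecurityException("tag mismatch, refusing to deserialize");
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A valid tag only shows the blob came from someone holding the key; it does nothing for data that a key holder produced badly, so it complements validation rather than replacing it.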
2. Re:Object serialization is dangerous. by HornWumpus · 2018-05-26 09:55 · Score: 3, Interesting
  
  Fortran had (has?) calculated goto. Not goto pointerVar, goto intVar where intVar contains _LINE_NUMBER_.
  I've seen it used. Integer NextIter. Then you use the middle bits of that Int as binary option flags. At least that's what you do if you have an applied math PhD and a case of cranial rectosis...
  On point: You're not supposed to deserialize from untrusted sources, in any language. Might as well execute SQL right from a web form.
  
  --
  John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
3. Re:Object serialization is dangerous. by Gravis+Zero · 2018-05-26 10:28 · Score: 2, Insightful
  
  Regardless of language, object serialization is a dangerous idea.
  Okay then, smartypants, what do you propose for persisting fields of an object?
  I was speaking specifically about object serialization. There's nothing wrong with data serialization but using it for object serialization is asking for trouble. If you don't understand the difference then you should excuse yourself.
  
  --
  Anons need not reply. Questions end with a question mark.
4. Re: Object serialization is dangerous. by K.+S.+Kyosuke · 2018-05-26 13:31 · Score: 5, Funny
  
  Thank Go weaning me off ruby's eval().
  That's because Google's motto is "Do no eval".
  
  --
  Ezekiel 23:20
That is nonsense ... by angel'o'sphere · 2018-05-26 09:06 · Score: 3, Insightful

Why would serialization be a security risk?
Huh? It can't ... you write to a disk or to a socket and that's it.
Sure, I'm nitpicking, because deserialization might be a security risk.
However, only if you actually do it and, e.g., leave open paths by which bad files can end up on your disk, which you then read, or open a socket and accept incoming serialized objects.
A typical Java program is absolutely not vulnerable to anything regarding serialization unless the programmer (intentionally?) made it so.
Articles about this (and basically every post here in the story while I type this) are simply wrong.
Java Serialization was once its strongest point of success. Many GUI builders let you edit "beans" and simply serialize the GUI as a graph of objects that simply gets read in again when the application starts and you call the setVisible(true) method to show your window.
Not needing to write any boilerplate code for writing and reading objects is a huge time saver and simplification.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
1. Re:That is nonsense ... by HornWumpus · 2018-05-26 09:36 · Score: 2
  
  Deserialization is a risk, it's a risk in every language.
  Are Java coders just letting users browse for objects in the file system? Accepting objects in web form inputs or unvalidated webmethod parameters? Turning around and running those objects as root?
  The problem isn't Java per se...it's coders who only know one language. They really do need a language that 'bubble wraps' the OS. But at some point, they have to get things done.
  
  --
  John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
The reason why it is dangerous by Wookie+Monster · 2018-05-26 09:24 · Score: 5, Interesting

I'm concerned that someone might hear "object serialization is bad, but JSON is good" and make the same mistakes that were made with Java object serialization. Java object serialization is bad for the following reasons:
1. No validation. You might have a nicely designed object, one that is well tested and has all sorts of validation checks to ensure that the internal state is never broken. Java object serialization bypasses all validation, permitting an attacker to construct a malformed object. Exactly how that would cause a problem requires a bit more work on the attacker's part, by studying how the application reacts to the malformed object. Adding validation is supported with Java serialization, but it's not used by default. The designers favored simplicity over safety. Does switching to JSON magically fix the validation problem? Nope.
2. Loading of classes that you didn't expect to load. If I expect to receive a serialized list of strings, there's nothing to prevent an attacker from providing a list of any kind of object instead, due to type erasure. The application might fail to process the list because of a ClassCastException, but the potential damage is done. Java serialization /does/ support filtering out classes that aren't expected, but this is off by default. You need to define the blacklist yourself. Why is loading other classes a problem? See the next reasons:
3. Custom code during deserialization, which is actually necessary for performing your own validation. You can define your own code which runs when the object is deserialized, and the code can do pretty much anything. An attacker might be able to trick the code (using malformed input) into doing something harmful.
4. Additional classes on the classpath. Even if all of your code is well behaved, and has proper validation checks, and proper custom code, you're still vulnerable because additional classes exist that you're not aware of. You had no idea that there's this class 'Q' which has broken custom code, because Q was sucked in as a dependency of something else. That popular open source library you're using might be exposing your application to attack, and you didn't even know it.
For anyone designing an object serialization mechanism, always consider the tradeoffs when trying to make the system easier to use. Always use whitelists for trusted code instead of blacklists. Always construct objects using the object's public API. Favor the use of standard representations (maps, lists, tuples) instead of supporting full-blown customization. A little bit of friction can be a good thing.
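The "whitelists instead of blacklists" advice can be sketched with the JDK's `ObjectInputFilter` (added in Java 9, so this postdates much of the code being discussed). The example assumes the application only ever expects an `Integer`; the class `AllowListDemo` and its helpers are hypothetical names, and the filter is written as a lambda so its behaviour is fully visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Date;
import java.util.Set;

// Hypothetical demo class; requires Java 9+ for ObjectInputFilter and Set.of.
class AllowListDemo {
    // Integer's serializable superclass Number also appears in the stream.
    private static final Set<Class<?>> ALLOWED = Set.of(Integer.class, Number.class);

    // Allow-list filter: any class not named above is rejected.
    private static final ObjectInputFilter FILTER = info -> {
        Class<?> type = info.serialClass();
        if (type == null) {
            return ObjectInputFilter.Status.UNDECIDED; // this check involves no class
        }
        return ALLOWED.contains(type)
                ? ObjectInputFilter.Status.ALLOWED
                : ObjectInputFilter.Status.REJECTED;
    };

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads one object with the filter installed before anything is read.
    static Object readFiltered(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(FILTER);
            return in.readObject();
        } catch (InvalidClassException e) {
            throw new SecurityException("rejected by filter: " + e.getMessage());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFiltered(serialize(Integer.valueOf(42))));
        try {
            readFiltered(serialize(new Date()));
        } catch (SecurityException e) {
            System.out.println("unexpected class refused");
        }
    }
}
```

The JDK also provides a pattern-string factory, `ObjectInputFilter.Config.createFilter`, for the same purpose; the lambda form is used here only to keep the matching rules explicit.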
1. Re:The reason why it is dangerous by Anonymous Coward · 2018-05-26 09:45 · Score: 2, Insightful
  
  To answer your points with the obvious:
  1) Use the validation supported by java, just like you would in XML, JSON, etc. Problem solved. Serialization isn't the issue here, the app dev is. The app dev can be lazy on XML or any other serialization classes.
  2) Same as point 1. The facility is there, use it. 'off by default' isn't an excuse for it being 'bad'
  3) Unit tests and write proper code. Again, this problem isn't different to any other XML/JSON serialisation mechanism.
  4) You have the same issue with any XML classes if you're using opensource code, JSON, etc, etc.
  So, basically, the issues are the same for any serialization classes, the java ones aren't any different to be honest.
2. Re:The reason why it is dangerous by Jeremi · 2018-05-26 10:25 · Score: 4, Insightful
  
  If I'm following you correctly, the problem isn't serialization per se but rather the fact that the deserialization is being done by the Java runtime (which has no way to validate the resulting objects against the application's requirements, since its deserialization code is application-independent, and also has the power to instantiate any kind of object, even those that are totally irrelevant to the task at hand), rather than by the application itself.
  A user-supplied deserialization-routine, OTOH, has at least a chance of being secure in the face of invalid source data, since it can check to make sure that its constraints are correctly satisfied and reject the data if they aren't.
  Of course, avoiding making every application developer write his own application-specific serialization/deserialization routines was largely the point of this Java feature, but in hindsight it appears that was a bad decision.
  
  --
  
  I don't care if it's 90,000 hectares. That lake was not my doing.
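A rough sketch of the "application checks its own constraints" idea, using the validation hook the default mechanism does offer: a private `readObject` that re-checks the invariant and throws `InvalidObjectException`. The `Percentage` class, the 0..100 rule, and the byte-patching used to forge a bad stream are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical driver class for the demo.
class ValidationDemo {
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (InvalidObjectException e) {
            throw new SecurityException("rejected: " + e.getMessage());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Forges a stream by overwriting the trailing 4 bytes, which hold the
    // single int field of Percentage, with the out-of-range value 1000.
    static byte[] forgeOutOfRange() {
        byte[] bytes = serialize(new Percentage(50));
        int n = bytes.length;
        bytes[n - 4] = 0x00;
        bytes[n - 3] = 0x00;
        bytes[n - 2] = 0x03;
        bytes[n - 1] = (byte) 0xE8;
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(((Percentage) deserialize(serialize(new Percentage(50)))).value());
        try {
            deserialize(forgeOutOfRange());
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Hypothetical value class whose invariant is 0 <= value <= 100.
class Percentage implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int value;

    Percentage(int value) {
        if (!isValid(value)) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        this.value = value;
    }

    private static boolean isValid(int v) {
        return v >= 0 && v <= 100;
    }

    int value() {
        return value;
    }

    // Called by ObjectInputStream instead of the constructor; repeats the check.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (!isValid(value)) {
            throw new InvalidObjectException("out of range: " + value);
        }
    }
}
```

Without the `readObject` method, the forged stream would produce a `Percentage` holding 1000, which the constructor could never have created.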
3. Re:The reason why it is dangerous by drinkypoo · 2018-05-26 10:30 · Score: 2
  
  Of course, avoiding making every application developer write his own application-specific serialization/deserialization routines was largely the point of this Java feature, but in hindsight it appears that was a bad decision.
  And this decision is just further evidence of Oracle's incompetence. Instead of keeping it but requiring every application developer to write his own object verifier, they're simply removing it because doing it right is hard.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
4. Re:The reason why it is dangerous by Anonymous Coward · 2018-05-26 11:05 · Score: 5, Informative
  
  If I'm following you correctly, the problem isn't serialization per se but rather the fact that the deserialization is being done by the Java runtime (which has no way to validate the resulting objects against the application's requirements, since its deserialization code is application-independent, and also has the power to instantiate any kind of object, even those that are totally irrelevant to the task at hand), rather than by the application itself.
  Java deserialization is magic. By which I mean it behaves in several ways that user code pretty much can't.
  The default system effectively loads a binary blob off the input stream and then creates each object without calling a constructor*. You can't just not call a constructor in Java, but Java deserialization does. All the fields are set by magic, by which I mean it ignores getters and setters and whatever access level might be on the fields. Any field marked as "not serialized" (transient) is left with default values - but those may not be the default values you think! If you write private transient int foo = 3; then foo won't be serialized, and when the object is deserialized, it will instead be ... 0. Because 0 is the default for ints.
  How does Java deserialization know if it's loading the right fields for a given object? Well, it's magic, but not that magic - you're supposed to let it know by setting the serialization ID for the class. And how do you do that? By declaring a static final long serialVersionUID, and making sure you update it whenever your class structure changes. Don't do that and the deserialization logic might not notice that the structure doesn't quite match. No, you can't just have it autogenerate one - if not set, the serialization/deserialization code will create one, but it may be dependent on compiler and randomly break across identical code bases. Surprise!
  But in any case, the serialization system is magic. How do you write a custom serializer/deserializer? By creating the private methods writeObject(ObjectOutputStream) and readObject(ObjectInputStream). Because the serializer is magic, it can access these private methods. (Note that readObject(ObjectInputStream) gets called on a magically created object that has never had a constructor called on it, so all fields will have their default values! How does that work with final fields? Well... the short answer is "like shit." The longer answer is that the default deserializer just ignores the final modifier (which you can't do in generic code), and that if you want to do the same, there's some reflection magic or non-standard APIs you can do.)
  So anyway, there's a basic overview of how Java serialization defies expectations and basically guarantees that anyone writing code that involves serialization will do it wrong.
  * This is false. What it really does is go up the object hierarchy and look for the first parent class that does not declare itself serializable and calls its default no-args constructor. But that means that your class that you declared serializable therefore, by definition, does not get its constructor called. Surprise!
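The two surprises described above (the skipped constructor and the transient field coming back as 0 rather than 3) can be reproduced with a short round trip. The `Gadget` class and the `constructorCalls` counter are hypothetical, added only to make the behaviour observable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical driver class for the demo.
class TransientDemo {
    // Serializes the object to memory and reads it straight back.
    static Gadget roundTrip(Gadget original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Gadget) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Gadget copy = roundTrip(new Gadget("widget"));
        // prints "foo=0 name=widget constructorCalls=1"
        System.out.println("foo=" + copy.foo() + " name=" + copy.name()
                + " constructorCalls=" + Gadget.constructorCalls);
    }
}

// Hypothetical serializable class with a transient field and a counting constructor.
class Gadget implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;   // static, so never part of the stream

    private transient int foo = 3;     // initializer runs only via a constructor
    private final String name;         // final, yet still set by deserialization

    Gadget(String name) {
        constructorCalls++;
        this.name = name;
    }

    int foo() {
        return foo;
    }

    String name() {
        return name;
    }
}
```

The counter stays at 1 because only the no-args constructor of the first non-serializable superclass (here `Object`) runs during deserialization.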
Re: RMI and serialization was useful by isj · 2018-05-26 10:20 · Score: 2

The Java serialization feature was fine. It just wasn't meant to be robust against adversaries. Java RMI uses the serialization feature and thus has the same problem. It is fine in trusted environments.
It is a bad idea to use them with untrusted sources.
I didn't RTFA so I don't know if there are other reasons than security for removing the feature.
Re: Records? Is that a thing? by TechyImmigrant · 2018-05-26 11:20 · Score: 2

Better than Joy Division!!
It will return your data in a new order.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
It won't change anything by pestilence669 · 2018-05-26 11:56 · Score: 4, Insightful

Serialization isn't inherently bad. It's bad practices and misuse, which won't change. It'll just be replaced by many developers with XML, JSON, Protobuf, YAML, or other. Then, someone will inevitably sprinkle on some reflection or code generation, and you've almost done a 360... but with a lot more code and even more that could go wrong. I don't agree that adding more training wheels and/or removing features is always the best way to fix bad developer habits.
Re:I don't get it by angel'o'sphere · 2018-05-26 13:53 · Score: 4, Informative

Java is unique insofar as, when you use built-in serialization, you also serialize the class files.
There are two "marker interfaces" to make a Java class serializable: Serializable and Externalizable.
In case of the first one, the Java Framework/VM uses reflection to serialize and deserialize objects.
In case of the second one, you are required to implement the methods writeExternal() and readExternal().
As the class files are in the serialized data stream, a program reading "untrusted" serialized data might also load classes aka code from that stream. If that code implements Externalizable and thus has an "unknown foreign" method readExternal(), the deserialization framework will call that unknown/untrusted method readExternal() which means: you run code coming from outside, which can do whatever it wants besides reading the object from the object stream.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
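For reference, a minimal sketch of the `Externalizable` contract mentioned above: the class supplies `writeExternal` and `readExternal`, needs a public no-args constructor, and the framework calls `readExternal` during deserialization. The class `ExtPoint` and the call counter are hypothetical. As far as the standard stream format goes, the stream carries class descriptors (name, serialVersionUID, field descriptions) and the reader resolves those classes through its own class loader, so this sketch only demonstrates that class-supplied code runs during the read, not that bytecode travels in the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical driver class for the demo.
class ExternalizableDemo {
    static ExtPoint roundTrip(ExtPoint original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ExtPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExtPoint copy = roundTrip(new ExtPoint(3, 4));
        System.out.println(copy.x() + "," + copy.y()
                + " readExternalCalls=" + ExtPoint.readExternalCalls);
    }
}

// Hypothetical class that controls its own wire format.
class ExtPoint implements Externalizable {
    static int readExternalCalls = 0; // counts framework callbacks, for the demo only
    private int x;
    private int y;

    public ExtPoint() {
        // Externalizable requires a public no-args constructor.
    }

    ExtPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Arbitrary class-supplied code: invoked by ObjectInputStream during readObject().
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readExternalCalls++;
        x = in.readInt();
        y = in.readInt();
    }

    int x() {
        return x;
    }

    int y() {
        return y;
    }
}
```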
Programming languages behaving paternalistically by CustomSolvers2 · 2018-05-26 20:14 · Score: 2

I have never used this serialisation (under potentially dangerous conditions) or any other language feature not allowing me to have a reasonably good understanding of what is going on in the code. I also support any effort to remove unnecessary functionality from programming languages and, in general, to let them efficiently accomplish their expected goal. On the other hand, some people might prefer to rely on certain features about which I don't care and their code might be as good as mine.

What I certainly don't support are the deflecting-responsibility-paternalistic-reaction attitudes which seem so common lately. Descriptive example:
- (developer wannabe) "My application has lots of security problems. The language is responsible for allowing me to do what I shouldn't.".
- (paternalistic programming language) "You are right. We are aware that you might do really stupid things in case of being able to do so."
(further ridiculous, childish, amateur, tons-of-problems-prone, etc. nonsense...)

Don't you think that certain functionality in your programming language is worth keeping? Remove it! Do you usually have problems when relying on a certain feature of a programming language for whatever reason? Don't use it! Or learn to use it properly! But what is with all these generic paternalistic behaviours lately? Making the overall programming experience more comfortable is certainly a good thing. Arbitrarily restricting the capabilities of programming languages (expected to be used by experts!) to somehow account for incompetence or lack of care is ridiculous. Over-protection is usually a very bad idea which provokes lower quality outputs (worse programmers) and other problems (false sense of security, disproportionate trust in what might be faulty, etc.).

--
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Re:Records? Is that a thing? by Pseudonym · 2018-05-26 20:37 · Score: 2

This isn't about having bit-perfect layout, it's just about having a way to build an array of structs that doesn't require each individual struct to be individually allocated.

--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});