Several Languages Java™ 8
More than 3 years ago, on March 18, 2014, Java™ 8 was released, and on the same day Eclipse released support for this new version. I have repeatedly made the point that the Eclipse compiler for Java (ecj) as a second implementation of JLS (after javac) serves the entire Java community as a premier means for quality assurance.
Today, in April 2017, I can report that this effort is still far from complete. Still, JLS, javac and ecj do not define the exact same language. Time to take stock what these differences are about.
My own work on ecj focuses on an aspect that tries hard to remain invisible to casual users: type inference. It’s the heavy machinery behind generic methods, diamond expressions and lambdas, allowing users to omit explicit type information in many places, leaving it to the compiler to figure out the fine print.
To be honest, when we first shipped support for Java 8, I was well expecting lots of bug reports to come in, which would point out corner cases of JLS that we hadn’t implemented correctly. There was one area, that I felt particularly uneasy about: how type inference blends with overload resolution. During the Mars cycle of development Srikanth thankfully performed a major rework and clean up of this exact area.
(I can’t pass the opportunity to report what I learned from this exercise: Overloading is a huge contributor of complexity in the Java language which in (not only) my opinion doesn’t carry its own weight — not a fraction of it).
We are not done with Java 8
The fact that still 2 years after that rework we constantly receive bug reports against Java 8 type inference is unsettling in a way.
To give some numbers to it: during every 6-week milestone we fixed between 1 and 6 bugs in type inference. None of these bugs is solved in a coffee break, some compete for the title “toughest challenge I faced in my career”.
We have a total of 103 bugs explicitly marked as 1.8 inference bugs. Of these
- 17 were resolved before Java 8 GA
- 52 have been resolved in the three years since Java 8 GA
- 34 are still unresolved today.
This will likely keep me busy for at least one more year.
In the early days of Java 8 we could identify two clusters where behavioral differences between javac and ecj could be observed:
- wildcard capture
- raw types
(I’ll have a note about the latter at the end of this post).
In these areas we could comfort ourselves by pointing to known bugs in javac. We even implemented code to conditionally mimic some of these javac bugs, but frankly, establishing bug compatibility is even more difficult than truthfully implementing a specification.
Meanwhile, in the area of wildcard capture, javac has been significantly improved. Even though some of these fixes appear only in Java 9 early access builds, not in Java 8, we can observe both compilers converging, and given that the major bugs have been fixed, it is getting easier to focus on remaining corner cases. Good.
One event almost went under our radar: In February 2015 a revised version of JLS 8 was published. As part of this update, a few sentences have been added on behalf of JDK-8038747. While the spec may be hard to grok by outsiders, the goal can be explained as enabling a compiler to include more type hints from the bodies of lambda expressions that are nested inside a generic method invocation.
In fact, collecting type information from different levels in the AST was a big goal of the type inference rewrite in Java 8, but without the 2015 spec update, compilers weren’t even allowed to look into the body of a lambda, if the lambda does not specify types for its arguments.
|m(a -> a.b())|
|What do we know about b, while types for m and a are still unknown?|
Conceptually, this is immensely tricky, because generally speaking, the code in the bodies of such type-elided lambdas can mean just about anything, while the signature of the lambda is not yet known. So we were happy about the spec update, as it promised to resolve a bunch of test cases, where javac accepts programs that ecj – without the update – was not able to resolve.
Ever since, each improved version of ecj created a regression for one or more of our dear users. We debugged no end, read between the lines of JLS, but couldn’t find a solution that would satisfy users in all cases. And they kept complaining that javac had no problems with their programs, even earlier versions of ecj accepted their program, so rejecting it now must be a regression.
“Switching the alliance”
Up-to that point, I saw our main ally in the authors of JLS, Dan Smith and Alex Buckley from Oracle. In particular Dan Smith has been a tremendous help in understanding JLS 8 and analyzing where our reading of it deviated from the authors’ intention. Together we identified not only bugs in my interpretation and implementation of JLS, but also several bugs in javac.
When we iterated bugs relating to JDK-8038747 time and again, this approach was less effective, coming to no conclusion in several cases. I slowly realized, that we were reaching a level of detail that’s actually easier to figure out when working with an implementation, than at the specification level.
This is when I started to seek advice from javac developers. Again, I received very valuable help, now mostly from Maurizio Cimadamore. Previously, my perception was, that JLS is the gold standard, and any deviation or even just liberal interpretation of it is bad. During the discussion with Maurizio I learned, that in some sense javac is actually “better” than JLS, not only in accepting more type-correct programs, but also in terms of better reflecting the intention of the JSR 335 experts.
So I started to deliberately deviate from JLS, too. Instead of “blaming” javac for deviating from JLS, I now “blame” JLS for being incomplete wrt the intended semantics.
To put this effort into proportion, please consider the figure of 103 bugs mentioned above. From these, 17 bugs have a reference to JDK-8038747. Coincidentally, this is the exact same number as those great bug reports prior to Java 8 GA, that gave us the huge boost, enabling us to indeed deliver a high quality implementation right on GA. In other words, this is a huge engineering effort, and we have no idea, how close to done we are. Will we face the next round of regressions on every new release we publish?
If you work from a specification, there is a point where you feel confident that you did all that is required. Knowing that fulfilling the spec is not enough, it’s impossible to say, what is “enough”.
What is “better”?
With wildcard captures and raw types, it was easy to argue, that certain programs must be rejected by a compiler, because they are not type safe and can blow up at runtime in unexpected locations. In the area around JDK-8038747 javac tends to accept more programs than JLS, but here it would be unreasonable to expect javac to change and start rejecting these “good” programs.
Still, calling out a competition of who accepts more “good” programs would be a bad idea, too, because this would completely abandon the goal of equivalence between compilers. After compiling with one compiler, one could never be sure that another compiler would also accept the same program. The term “Java” would loose its precise meaning.
This implies, every attempt to better align ecj with javac, based on knowledge about the implementation and not based on JLS, should be seen as a temporary workaround. To resume its role of leadership, JLS must catch up with any improvements done in the implementation(s).
To comfort the reader, I should say that in all cases discussed here, there’s always a safe fallback: when inference fails to find a solution, it is always possibly to help the compiler by adding some explicit type arguments (or argument types for a lambda). More importantly, such additions, which may be required for one compiler, should never cause problems for another compiler.
Also note, that explicit type arguments are always to be preferred over type casts (which some people tend to use as a workaround): type arguments will help for effective type checking, whereas type casts bypass type checking and can blow up at runtime.
Thanks and Sorry!
I wrote this post in the desire to reach out to our users.
First: Each reproducible bug report is highly valuable; this is what drives JDT code towards higher and higher quality. By accumulating test cases from all these reports we gradually create a test suite that provides the best available safety net.
Second: I am sorry about every regression introduced by any of our fixes, but as this post should explain, we are traveling uncharted territory: some of the corner cases we are currently addressing are not sufficiently covered by JLS. Additionally, type inference is inherently sensitive to the slightest of changes. Predicting, which programs will be affected by a given change in the implementation of type inference is near impossible.
Yet, it’s certainly not a game of “them” vs “us”: JLS, javac, and ecj, we’re all in this together, and only by continuing to talk to each other, eventually we will all speak the same language, when we say “Java 8”. Please bear with us as the saga continues …
PS: Another pet peeve
I am a type system enthusiast, mostly, because I like how type checkers can completely eliminate entire categories of bugs from your programs. I like to give a guarantee that no code that is accepted by the compiler will ever fail at runtime with an exception like attempting to invoke a method that is not present on the receiver, or class cast exceptions in source code that doesn’t mention any class cast. Type inference is the tool that alleviates the verbosity of explicitly typed programs, while at the same time maintaining the same guarantees about type safety.
Unfortunately, there is a class of Java programs for which such guarantees can not be given: if a program uses raw types, the compiler needs to generate lots of checkcast instructions, to make the code acceptable for the JVM. Each of these instructions can cause the program to blow up at runtime in totally unsuspicious locations.
There are situations where javac silently assumes that a raw type List is a subtype of its parameterized form List<String>. This is wrong. Still I cannot just ignore this problem, because lots of “bugs” are reported against ecj, based on the observation that javac and ecj accept different programs, where in many cases the difference concerns the handling of raw types during type inference.
Economically speaking, investigating in the subtleties of how Java 8 compilers handle raw types is a huge waste of efforts. Any one reading this: if you want to do me a favor, and thus help me to focus on the most relevant aspects of compiler development, please clean up your code. If you keep your code private, nobody will suffer except from yourself, but please, before posting a bug report against JDT, if your code example contains raw types, think thrice before submitting the report. Adding proper type arguments will certainly improve the quality of your code. Likely, after that exercise also ecj will be a lot happier with your code and give you correct answers.
Do I need to repeat that raw types were a workaround for migrating towards Java 5? … that raw types were discouraged starting from day 1 of Java 5? If that doesn’t convince you, search on StackOverflow for questions mentioning raw types and type inference, and you will see that by the use of raw types you are basically disabling much of the power of type inference. Let’s please shed the legacy of raw types.