The Object Teams Blog

Adding team spirit to your objects.


Several Languages Java™ 8


More than 3 years ago, on March 18, 2014, Java™ 8 was released, and on the same day Eclipse released support for this new version. I have repeatedly made the point that the Eclipse compiler for Java (ecj) as a second implementation of JLS (after javac) serves the entire Java community as a premier means for quality assurance.

Today, in April 2017, I can report that this effort is still far from complete: JLS, javac, and ecj still do not define exactly the same language. Time to take stock of what these differences are about.

My own work on ecj focuses on an aspect that tries hard to remain invisible to casual users: type inference. It’s the heavy machinery behind generic methods, diamond expressions and lambdas, allowing users to omit explicit type information in many places, leaving it to the compiler to figure out the fine print.

To be honest, when we first shipped support for Java 8, I fully expected lots of bug reports to come in pointing out corner cases of JLS that we hadn’t implemented correctly. There was one area that I felt particularly uneasy about: how type inference blends with overload resolution. During the Mars cycle of development, Srikanth thankfully performed a major rework and clean-up of exactly this area.
(I can’t pass up the opportunity to report what I learned from this exercise: overloading is a huge contributor of complexity in the Java language which, in (not only) my opinion, doesn’t carry its own weight — not a fraction of it).

We are not done with Java 8

The fact that, two years after that rework, we still constantly receive bug reports against Java 8 type inference is unsettling in a way.

To put some numbers to it: in every 6-week milestone we fixed between 1 and 6 bugs in type inference. None of these bugs was solved in a coffee break; some compete for the title “toughest challenge I faced in my career”.

We have a total of 103 bugs explicitly marked as 1.8 inference bugs. Of these

  • 17 were resolved before Java 8 GA
  • 52 have been resolved in the three years since Java 8 GA
  • 34 are still unresolved today.

This will likely keep me busy for at least one more year.

In the early days of Java 8 we could identify two clusters where behavioral differences between javac and ecj could be observed:

  • wildcard capture
  • raw types

(I’ll have a note about the latter at the end of this post).

In these areas we could comfort ourselves by pointing to known bugs in javac. We even implemented code to conditionally mimic some of these javac bugs, but frankly, establishing bug compatibility is even more difficult than truthfully implementing a specification.

Meanwhile, in the area of wildcard capture, javac has been significantly improved. Even though some of these fixes appear only in Java 9 early access builds, not in Java 8, we can observe both compilers converging, and given that the major bugs have been fixed, it is getting easier to focus on remaining corner cases. Good.

Java “8.1”

One event almost went under our radar: in February 2015 a revised version of JLS 8 was published. As part of this update, a few sentences were added on behalf of JDK-8038747. While the spec may be hard for outsiders to grok, the goal can be explained as enabling a compiler to include more type hints from the bodies of lambda expressions that are nested inside a generic method invocation.

In fact, collecting type information from different levels in the AST was a big goal of the type inference rewrite in Java 8, but without the 2015 spec update, compilers weren’t even allowed to look into the body of a lambda if the lambda does not specify types for its arguments.

m(a -> a.b())
What do we know about b, while types for m and a are still unknown?

Conceptually, this is immensely tricky, because generally speaking, the code in the bodies of such type-elided lambdas can mean just about anything, while the signature of the lambda is not yet known. So we were happy about the spec update, as it promised to resolve a bunch of test cases, where javac accepts programs that ecj – without the update – was not able to resolve.
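A schematic sketch of the kind of code in question (the class and method names here are invented for illustration; the actual JDK-8038747 cases involve deeper nesting of generic invocations):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LambdaHints {
    // map() receives a type-elided lambda: the type of `s` is written
    // nowhere, yet s.length() must be type-checked while the type
    // arguments of map() and collect() are still being inferred.
    static List<Integer> lengths() {
        return Stream.of("a", "bb", "ccc")
                .map(s -> s.length())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths());
    }
}
```

Here inference succeeds only because information flows in both directions: from the argument of Stream.of() into the lambda, and from the lambda body back out into the enclosing invocations.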

Ever since, each improved version of ecj has created a regression for one or more of our dear users. We debugged no end and read between the lines of JLS, but couldn’t find a solution that would satisfy users in all cases. And they kept complaining that javac had no problems with their programs, and that even earlier versions of ecj accepted them, so rejecting them now must be a regression.

“Switching the alliance”

Up to that point, I saw our main allies in the authors of JLS, Dan Smith and Alex Buckley from Oracle. In particular, Dan Smith has been a tremendous help in understanding JLS 8 and analyzing where our reading of it deviated from the authors’ intention. Together we identified not only bugs in my interpretation and implementation of JLS, but also several bugs in javac.

When we iterated bugs relating to JDK-8038747 time and again, this approach was less effective, coming to no conclusion in several cases. I slowly realized that we had reached a level of detail that is actually easier to figure out when working with an implementation than at the specification level.

This is when I started to seek advice from javac developers. Again, I received very valuable help, now mostly from Maurizio Cimadamore. Previously, my perception was that JLS is the gold standard, and any deviation from it, or even just a liberal interpretation of it, is bad. During the discussion with Maurizio I learned that in some sense javac is actually “better” than JLS, not only in accepting more type-correct programs, but also in better reflecting the intention of the JSR 335 experts.

So I started to deliberately deviate from JLS, too. Instead of “blaming” javac for deviating from JLS, I now “blame” JLS for being incomplete wrt the intended semantics.

To put this effort into proportion, please consider the figure of 103 bugs mentioned above. Of these, 17 bugs have a reference to JDK-8038747. Coincidentally, this is exactly the number of those great bug reports prior to Java 8 GA that gave us the huge boost enabling us to deliver a high-quality implementation right at GA. In other words, this is a huge engineering effort, and we have no idea how close to done we are. Will we face the next round of regressions with every new release we publish?

If you work from a specification, there is a point where you feel confident that you have done all that is required. Knowing that fulfilling the spec is not enough, it’s impossible to say what is “enough”.

What is “better”?

With wildcard captures and raw types, it was easy to argue that certain programs must be rejected by a compiler, because they are not type safe and can blow up at runtime in unexpected locations. In the area around JDK-8038747, javac tends to accept more programs than JLS allows, but here it would be unreasonable to expect javac to change and start rejecting these “good” programs.

Still, calling out a competition over who accepts more “good” programs would be a bad idea, too, because it would completely abandon the goal of equivalence between compilers. After compiling with one compiler, one could never be sure that another compiler would also accept the same program. The term “Java” would lose its precise meaning.

This implies that every attempt to better align ecj with javac based on knowledge about the implementation rather than on JLS should be seen as a temporary workaround. To resume its leadership role, JLS must catch up with any improvements made in the implementation(s).

To comfort the reader, I should say that in all cases discussed here there’s always a safe fallback: when inference fails to find a solution, it is always possible to help the compiler by adding some explicit type arguments (or argument types for a lambda). More importantly, such additions, which may be required for one compiler, should never cause problems for another compiler.

Also note that explicit type arguments are always to be preferred over type casts (which some people tend to use as a workaround): type arguments aid effective type checking, whereas type casts bypass type checking and can blow up at runtime.
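As a sketch of both fallbacks (the class and method names are invented for illustration):

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class ExplicitTypes {
    // If inference fails (or two compilers disagree), an explicit type
    // argument pins down the solution without weakening type checking:
    static List<String> empty() {
        return Collections.<String>emptyList();   // explicit type argument
    }

    // Likewise, a lambda can state its argument types explicitly:
    static Function<String, Integer> length() {
        return (String s) -> s.length();          // instead of s -> s.length()
    }

    public static void main(String[] args) {
        System.out.println(empty().size() + " " + length().apply("abc"));
    }
}
```

Both forms are fully checked by the compiler, unlike a cast, which merely defers the check to runtime.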

Thanks and Sorry!

I wrote this post in the desire to reach out to our users.

First: Each reproducible bug report is highly valuable; this is what drives JDT code towards higher and higher quality. By accumulating test cases from all these reports we gradually create a test suite that provides the best available safety net.

Second: I am sorry about every regression introduced by any of our fixes, but as this post should explain, we are traveling uncharted territory: some of the corner cases we are currently addressing are not sufficiently covered by JLS. Additionally, type inference is inherently sensitive to the slightest of changes. Predicting which programs will be affected by a given change in the implementation of type inference is near impossible.

Yet, it’s certainly not a game of “them” vs “us”: JLS, javac, and ecj are all in this together, and only by continuing to talk to each other will we eventually all speak the same language when we say “Java 8”. Please bear with us as the saga continues …


PS: Another pet peeve

I am a type system enthusiast, mostly because I like how type checkers can completely eliminate entire categories of bugs from your programs. I like to give the guarantee that no code accepted by the compiler will ever fail at runtime with an exception like attempting to invoke a method that is not present on the receiver, or a class cast exception in source code that doesn’t mention any cast. Type inference is the tool that alleviates the verbosity of explicitly typed programs while maintaining the same guarantees about type safety.

Unfortunately, there is a class of Java programs for which such guarantees cannot be given: if a program uses raw types, the compiler needs to generate lots of checkcast instructions to make the code acceptable to the JVM. Each of these instructions can cause the program to blow up at runtime in totally unsuspicious locations.
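A small sketch of how such a blow-up looks (the class is invented for illustration; the suppressed warnings are exactly the ones that should have been heeded):

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstAsString() {
        List raw = new ArrayList();          // raw type: generic checking is off
        raw.add(Integer.valueOf(42));        // the actual mistake happens here
        List<String> strings = raw;          // only an "unchecked" warning here
        try {
            return strings.get(0);           // compiler-inserted checkcast fails HERE
        } catch (ClassCastException e) {
            return "ClassCastException far from the actual mistake";
        }
    }

    public static void main(String[] args) {
        System.out.println(firstAsString());
    }
}
```

Note that the failing line contains no cast in the source code: the checkcast was inserted by the compiler, so the exception surfaces far away from the line that put the wrong element into the list.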

There are situations where javac silently assumes that a raw type List is a subtype of its parameterized form List&lt;String&gt;. This is wrong. Still, I cannot just ignore the problem, because lots of “bugs” are reported against ecj based on the observation that javac and ecj accept different programs, where in many cases the difference concerns the handling of raw types during type inference.

Economically speaking, investigating the subtleties of how Java 8 compilers handle raw types is a huge waste of effort. Anyone reading this: if you want to do me a favor, and thus help me focus on the most relevant aspects of compiler development, please clean up your code. If you keep your code private, nobody will suffer but yourself; but before posting a bug report against JDT, if your code example contains raw types, please think thrice before submitting it. Adding proper type arguments will certainly improve the quality of your code. Likely, after that exercise, ecj will also be a lot happier with your code and give you correct answers.

Do I need to repeat that raw types were a workaround for migrating towards Java 5? That raw types have been discouraged since day 1 of Java 5? If that doesn’t convince you, search StackOverflow for questions mentioning raw types and type inference, and you will see that by using raw types you are basically disabling much of the power of type inference. Let’s please shed the legacy of raw types.


Written by Stephan Herrmann

April 2, 2017 at 21:43

Posted in Eclipse, Uncategorized


Eclipse Neon.2 is on Maven Central


It’s done, finally!

Bidding farewell to my pet peeve

In my job at GK Software I have the pleasure of developing technology based on Eclipse. But the colleagues consuming my technology work on software that has no direct connection to Eclipse or OSGi. Their build technology of choice is Maven (without Tycho, that is). So whenever their build touches my technology, we face a “challenge”. It doesn’t make a big difference whether they are just invoking a code generator built using Xtext etc. or whether some Eclipse technology should actually be included in their application runtime.

Among many troubles, I recall one situation that really opened my eyes: one particular build had been running successfully for some time, until one day it was fubar: one Eclipse artifact could no longer be resolved. Long nights of searching followed into why that artifact might have disappeared, but we reassured ourselves: nothing had disappeared. Quite the contrary, somewhere on the wide internet (Maven Central, to be precise) a new artifact had appeared. So what? Well, that artifact was the same one we also had on our internal servers. Well, if it’s the same, what’s the buzz? It turned out it had a one-character difference in its version: instead of 1.2.3.v20140815 its version was 1.2.3-v20140815. Yes, take a close look, there is a difference. Bottom line: with both almost-identical versions available, Maven couldn’t figure out what to do; maybe each was considered worse than the other, to the effect that Maven simply failed to use either. Go figure.

After more stories like this, I realized that relying on Eclipse artifacts in Maven builds meant always being at the mercy of volunteers, typically without a long-term relationship to Eclipse, who had filled a major gap by uploading individual Eclipse artifacts to Maven Central (thanks to you volunteers, and please don’t take it personally: I’m happy that your work is no longer needed). Anybody who has ever studied the differences between Maven and OSGi (wrt dependencies and building, that is) will immediately see that there are many possible ways to represent Eclipse artifacts (OSGi bundles) in a Maven pom. The resulting “diversity” was one of my pet peeves in my job.

At this point I decided to be the next volunteer: not one who would screw up other people’s builds, but one who would collaborate with the powers that be at Eclipse.org to produce the official uploads to Maven Central.

As of today, I can report that this dream has become reality: all relevant artifacts of Neon.2 produced by the Eclipse Project are now “officially” available from Maven Central.

Bridging between universes

I should like to report some details of how our artifacts are mapped into the Maven world:

The main tool in this endeavour is the CBI aggregator, a model-based tool for transforming p2 repositories in various ways. One of its capabilities is to create a Maven repository (a dual-use repo actually, but the p2 side of this is immaterial to this story). The tool does a great job of extracting metadata from the p2 repo in order to create “meaningful” pom files, the key feature being that it copies all dependency information, originally authored in MANIFEST.MF, into corresponding declarations in the pom file.

Still, a few things had to be settled, either by improving the tool, by fine-tuning its input, or by post-processing the resulting Maven repo.

  • Group IDs
    While OSGi artifacts only have a single qualified Bundle-SymbolicName, Maven requires a two-part name: groupId x artifactId. It was easy to agree on using the full symbolic name for the artifactId, but what should the groups be? We settled on these three groups for the Eclipse Project:

    • org.eclipse.platform
    • org.eclipse.jdt
    • org.eclipse.pde
  • Version numbers
    In Maven land, release versions have three segments; in OSGi we maintain a fourth segment (the qualifier) also for releases. To play by Maven rules, we decided to use three-part versions for our uploads to Maven Central. This emphasizes the strategy of publishing only releases, for which the first three parts of the version are required to be unique.
  • 3rd party dependencies
    All non-Eclipse artifacts that we depend on should be referenced by their proper coordinates in Maven land. By default, the CBI aggregator assigns all artifacts to the synthetic group p2.osgi.bundle, but if someone depends on p2.osgi.bundle:org.junit, this doesn’t make much sense. In particular, it must be avoided that projects consuming Eclipse artifacts get the same 3rd-party library under two different names (perhaps even in different versions). We identified 16 such libraries and their proper coordinates.
  • Source artifacts
    Eclipse plug-ins have their source code in corresponding .source plug-ins. Maven has a similar convention, just using a “classifier” instead of appending to the artifact name. In Maven we conform to their convention, so that tools like m2e can correctly pick up the source code from any dependencies.
  • Other meta data
    Then followed a hunt for project URLs, SCM coordinates, artifact descriptions, and related data. Much of this could be retrieved from our MANIFEST.MF files; some information is currently mapped using a static, manually maintained mapping. Other information, like licenses and organization, is fully static during this process. In the end, everything was approved by the validation on the OSSRH servers.
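Putting the first two conventions together, a consumer’s pom would reference an Eclipse bundle roughly like this (the exact version shown is illustrative):

```xml
<dependency>
  <!-- groupId: one of the three agreed groups -->
  <groupId>org.eclipse.jdt</groupId>
  <!-- artifactId: the full Bundle-SymbolicName -->
  <artifactId>org.eclipse.jdt.core</artifactId>
  <!-- three-part release version (the OSGi qualifier is dropped) -->
  <version>3.12.2</version>
</dependency>
```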

If you want to browse the resulting wealth, you may start at

Everything with fully qualified artifact names in these groups (and date of 2017-01-07 or newer) should be from the new, “official” upload.

This is just the beginning

The bug on which all this has been booked is Bug 484004: Start publishing Eclipse platform artifacts to Maven central. See the word “Start”?

Two follow-up tasks are already on the board:

(1) Migrate all the various scripts, tools, and models to the proper git repo of our releng project. At the end of the day, this process of transformation and upload should become a routine operation to be invoked by our favourite build meisters.

(2) Fix any quirks in the generated pom files. E.g., we already know that the process did not handle fragments in an optimal way. As a result, consuming SWT from the new upload is not straightforward.

Both issues should be handled in or off bug 510072, in the hope that when we publish Neon.3, the new “official” Maven coordinates of Eclipse artifacts will fit all real-world uses. So: please test, and report any problems you find in the bug.

(3) I was careful to say “Eclipse Project”. We don’t yet have the magic wand to apply this to literally all artifacts produced in the Eclipse community. Perhaps someone will volunteer to apply the approach to everything in the Simultaneous Release? If we can publish 300+ artifacts, we can also publish 7000+, can’t we? 🙂

happy building!

Written by Stephan Herrmann

January 9, 2017 at 23:21

Posted in Eclipse, Uncategorized


Help the JDT Compiler helping you! – 1: Resource Leaks


During the Juno cycle a lot of work in the JDT has gone into more sophisticated static analysis, and more is still in the pipeline. I truly hope that once Juno ships, this will help all JDT users find more bugs immediately, while still typing. However, early feedback regarding these features shows that users are starting to expect miracles from the analysis 🙂

On the one hand this is flattering, but on the other hand it makes me think we should perhaps explain what exactly the analysis can see and what is beyond its vision. If you take a few minutes to learn about the concepts behind the analysis, you’ll not only understand its limitations; more importantly, you will learn how to write code that’s more readable – in this case, readable by the compiler. That is: with only slight rephrasing of your programs you can help the compiler better understand what’s going on, to the effect that the compiler can answer with much more useful error and warning messages.

Since there’s a lot of analysis in this JDT compiler I will address just one topic per blog post. This post goes to improvements in the detection of resource leaks.

Resource leaks – the basics

Right when everybody believed that Eclipse Indigo RC 4 was ready for the great release, another blocker bug was detected: a simple resource leak basically prevented Eclipse from launching on a typical Linux box if more than 1000 bundles were installed. Coincidentally, at the same time the JDT team was finishing up work on the new try-with-resources statement introduced in Java 7. So I was thinking: shouldn’t the compiler help users migrate from notoriously brittle handling of resources to the new construct that was designed specifically to facilitate a safe style of working with resources?

What’s a resource?

So, how can the compiler know about resources? Following the try-with-resources concept, any instance of type java.lang.AutoCloseable is a resource. Simple, huh? To extend the analysis to pre-Java-7 code, we also consider java.io.Closeable (available since 1.5).

Resource life cycle

The expected life cycle of any resource is: allocate – use – close. Simple again.

From this we conclude the code pattern we have to look for: where does the code allocate a closeable with no call to close() afterwards? Or perhaps a call is seen, but not all execution paths will reach it, etc.
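To illustrate the pattern, here is a sketch of a method the analysis would flag, next to its try-with-resources rewrite (the class and file names are invented for illustration):

```java
import java.io.FileInputStream;
import java.io.IOException;

class Leaky {
    // The analysis flags this version: "input" is never closed on the
    // exceptional path, because read() may throw before close() is reached.
    static int firstByte(String path) throws IOException {
        FileInputStream input = new FileInputStream(path);
        int b = input.read();   // if this throws, input is never closed
        input.close();
        return b;
    }

    // The safe rewrite using the Java 7 try-with-resources statement:
    // input is closed on every path, normal or exceptional.
    static int firstByteSafe(String path) throws IOException {
        try (FileInputStream input = new FileInputStream(path)) {
            return input.read();
        }
    }
}
```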

Basic warnings

With Juno M3 we released a first analysis that could now tell you things like:

  • Resource leak: “input” is never closed
  • Resource leak: “input” is never closed at this location (if a method exit happens before reaching close())

If the problem occurs only on some execution paths the warnings are softened (saying “potential leak” etc.).

Good, but

Signal to noise – part 1

It turned out that the analysis was causing significant noise. How come? The concepts are so clear, and all code that doesn’t exhibit the simple allocate – use – close life cycle should indeed be revised, shouldn’t it?

In fact, we found several patterns where these warnings were indeed useless.

Resource-less resources

We learned that not every subtype of Closeable really represents a resource that needs leak prevention. How many times have you invoked close() on a StringWriter, for example? Just have a look at its implementation and you’ll see why this isn’t worth the effort. Are there more classes in this category?

Indeed we found a total of 7 classes in java.io that purely operate on Java objects without allocating any resources from the operating system:

  • StringReader
  • StringWriter
  • ByteArrayInputStream
  • ByteArrayOutputStream
  • CharArrayReader
  • CharArrayWriter
  • StringBufferInputStream

For none of these does it make sense to warn about missing close().

To account for these classes we simply added a white list: if a class is in the list, any warnings/errors are suppressed. This white list consists of exactly the 7 classes listed above; sub-classes of these classes are not considered.

Wrapper resources

Another group of classes implementing Closeable showed up, that are not strictly resources themselves. Think of BufferedInputStream! Does it need to be closed?

Well? What’s your answer? The correct answer is: it depends. A few examples:

void wrappers(String content) throws IOException {
    Reader r1, r2, r3, r4;
    r1 = new BufferedReader(new FileReader("someFile"));
    r2 = new BufferedReader(new StringReader(content));
    r3 = new FileReader("somefile");
    r4 = new BufferedReader(r3);
    r3.close();
}
 

How many leaks? With some added smartness the compiler will signal only one resource leak: on r1. All others are safe:

  • r2 is a wrapper for a resource-less closeable: no OS resources are ever allocated here.
  • r3 is explicitly closed
  • r4 is just a wrapper around r3 and since that is properly closed, r4 does not hold onto any OS resources at the end.
  • returning to r1: why is that a leak? It’s a wrapper, too, but now the underlying resource (a FileReader) is not directly closed, so closing it is the responsibility of the wrapper and can only be triggered by calling close() on the wrapper r1.

EDIT: We are not recommending closing a wrapped resource directly as done with r3; closing the wrapper (r4) is definitely cleaner, and when wrapping a FileOutputStream with a BufferedOutputStream, closing the former is actually wrong, because it may lose buffered content that hasn’t been flushed. However, the analysis is strictly focused on resource leaks, and for analysing wrappers we narrow that notion to leaks of OS resources. For the given example, reporting a warning against r4 would be pure noise.

Summarizing: wrappers don’t directly hold an OS resource, but delegate to a next closeable. Depending on the nature and state of the nested closeable the wrapper may or may not be responsible for closing. In arbitrary chains of wrappers with a relevant resource at the bottom, closing any closeable in the chain (including the bottom) will suffice to release the single resource. If a wrapper chain is not properly closed the problem will be flagged against the outer-most wrapper, since calling close() at the wrapper will be delegated along all elements of the chain, which is the cleanest way of closing.
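A sketch of that cleanest way of closing, combining wrapper chains with try-with-resources so that only the outer-most wrappers are closed (the class is invented for illustration):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class WrapperChains {
    // Closing the outer-most wrapper is enough: BufferedWriter.close()
    // flushes its buffer and then delegates down the chain to FileWriter,
    // releasing the single OS resource at the bottom.
    static void copy(String in, String out) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(in));
             BufferedWriter w = new BufferedWriter(new FileWriter(out))) {
            int c;
            while ((c = r.read()) != -1)
                w.write(c);
        }
    }
}
```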

Also for wrappers the question arises: how does the compiler know? Again we set up a white list with all wrapper classes we found in the JRE: 20 classes in java.io, 12 in java.util.zip and 5 in other packages (the full lists are in TypeConstants.java, search for “_CLOSEABLES”).

Status and outlook

Yes, a leak can be a stop-ship problem.

Starting with Juno M3 we have basic analysis of resource leaks; starting with Juno M5 the analysis uses the two white lists mentioned above: resource-less closeables and resource wrappers. In real code this significantly reduces the number of false positives, which means: for the remaining warnings the signal-to-noise ratio is significantly better.

M5 will actually bring more improvements in this analysis, but that will be the subject of a next post.

Written by Stephan Herrmann

January 26, 2012 at 15:25

Posted in Eclipse, Uncategorized
