The Object Teams Blog

Adding team spirit to your objects.

IDE for your own language embedded in Java? (part 2)

In the first part I demonstrated how Object Teams can be used to extend the JDT compiler for a custom language to be embedded in Java. I concluded by saying that more substantial features like Refactoring might need more rocket science which I wanted to show next.

The “bad news” is: before I started on any heavy adaptations of the DOM AST etc. to make Refactorings work, I ran a few experiments to see how Refactorings actually behaved in my hybrid language. To my own surprise, a lot of things already worked OK: I could extract a custom syntax expression into a local variable, inline the variable again, and more stuff of that kind. Just look at this example:

Actually this reflects an experience I’ve had more than once: if you reuse some module and adapt it only in terms of provided API, extension points etc., more often than not one adaptation entails the next, adding tweaks to workarounds, because you keep scratching at the surface. If, OTOH, you succeed in making your adaptation right at the core where the decisions are made, just one or two cuts and stitches may suffice to get your job done. Clean, effective and consistent. That’s what we see when cleanly inserting a custom AST node into the JDT: if our CustomIntLiteral behaves well, a lot of JDT functionality can just work with this thing without knowing it’s not a genuine Java thing.

Now this means for my next example I had to look for an extra challenge. I decided to enhance the example in two ways:

  • The custom syntax should be a bit more realistic, so I chose to create a syntax for money, consisting of a number and the name of a currency
  • I wanted source formatting to work for the whole hybrid language

A word of warning: this post uses some bells and whistles of OT/J and applies them to the non-trivial JDT. This might be a bit overwhelming for the novice. If you prefer a lower dosage first, you may want to check out our example section in the wiki. It’s still far from complete but I’m working on it.

A syntax for money

The new syntax should allow me to write this:

int getMoney() {
    return <% 13 euro %>;
}
 

and the stuff should internally be stored as a structured AST node. This is how class CurrencyExpression starts:

public class CurrencyExpression extends Expression {
 
    public IntLiteral value;
    public String currency;
 
    final static String[] CURRENCIES = { "euro", "dollar" };
 
    public CurrencyExpression(int sourceStart, int sourceEnd) { ...
 
    public boolean setCurrency(String string) { ...
 
    @Override
    public StringBuffer printExpression(int indent, StringBuffer output) { ...
    ....
}
 

For creating a CurrencyExpression from source I wrote a little CustomParser. It is normal, boring stuff: about 40% just reads individual chars and manipulates character positions, another 45% does error reporting, and only 3 lines are really relevant: those that create a new CurrencyExpression, create an IntLiteral for the value part and invoke setCurrency with the currency string.
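The CustomParser itself is not shown in this post. To give a feel for the kind of code described above, here is a minimal stand-alone sketch in plain Java. It is an assumption-laden illustration, not the actual parser: MoneyLiteral and MoneyParser are hypothetical names, no JDT types are used, and errors are thrown as exceptions instead of going through a problem reporter.

```java
// Hypothetical stand-in for the compiler's CurrencyExpression; not a JDT class.
final class MoneyLiteral {
    final int value;
    final String currency;
    final int sourceStart; // offset of the '<' in "<%"
    final int sourceEnd;   // offset of the '>' in "%>" (inclusive)

    MoneyLiteral(int value, String currency, int sourceStart, int sourceEnd) {
        this.value = value;
        this.currency = currency;
        this.sourceStart = sourceStart;
        this.sourceEnd = sourceEnd;
    }
}

// Hypothetical parser for text of the form "<% 13 euro %>".
final class MoneyParser {
    static final String[] CURRENCIES = { "euro", "dollar" };

    /** Parses source[start..end] (both inclusive), which must span "<%" through "%>". */
    static MoneyLiteral parse(char[] source, int start, int end) {
        int pos = start + 2;   // first char after "<%"
        int limit = end - 2;   // last char before "%>"

        // the number part
        pos = skipWhitespace(source, pos, limit);
        int digitsStart = pos;
        while (pos <= limit && Character.isDigit(source[pos])) pos++;
        if (pos == digitsStart)
            throw new IllegalArgumentException("number expected at offset " + pos);
        int value = Integer.parseInt(new String(source, digitsStart, pos - digitsStart));

        // the currency name
        pos = skipWhitespace(source, pos, limit);
        int nameStart = pos;
        while (pos <= limit && Character.isLetter(source[pos])) pos++;
        String currency = new String(source, nameStart, pos - nameStart);
        if (!isKnownCurrency(currency))
            throw new IllegalArgumentException(
                    "unknown currency '" + currency + "' at offset " + nameStart);

        // nothing but whitespace may follow before "%>"
        pos = skipWhitespace(source, pos, limit);
        if (pos <= limit)
            throw new IllegalArgumentException("unexpected text at offset " + pos);

        // the only "relevant" line: build the structured node
        return new MoneyLiteral(value, currency, start, end);
    }

    private static int skipWhitespace(char[] source, int pos, int limit) {
        while (pos <= limit && Character.isWhitespace(source[pos])) pos++;
        return pos;
    }

    private static boolean isKnownCurrency(String name) {
        for (String known : CURRENCIES)
            if (known.equals(name)) return true;
        return false;
    }
}
```

Even in this toy version most lines deal with character positions and error cases, and only the final return statement builds the node, which matches the proportions mentioned above.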

In the ScannerAdaptor from the previous post I simply replaced this

Expression replacement = new CustomIntLiteral(source, start, end, start+2, end-2);

with this:

Expression replacement = customParser.parseCurrencyExpression(source, start, end, this.getProblemReporter());

That suffices to make the above little method compile and run just as expected.

Interlude: DOM AST

Well, with this slightly more realistic syntax you’d actually see a number of exceptions in the IDE that can all be fixed by letting the DOM AST know about our addition. For those who don’t regularly program against the JDT API: the DOM AST is the public data structure by which tools outside the JDT core manipulate Java programs. Inside the JDT, extending the DOM AST would mean subclassing either org.eclipse.jdt.core.dom.ASTNode or one of its subclasses. Unfortunately, all constructors in this hierarchy are package private, and even with OT/J we respect what the javadoc says: “clients are unable to declare additional subclasses”.

But we can do something similar: instead of subclassing we can use instances of a regular DOM class and attach a role instance to them. As the base I chose org.eclipse.jdt.core.dom.SimpleName which inside the JDT could mean a lot of different things, so for most parts a node of this kind is regarded as a black box, just what we need. This is the role I added to the team SyntaxAdaptor from the previous post:

protected class DomCurrencyLiteral playedBy SimpleName {
    protected String currency;
 
    void setSourceRange(int sourceStart, int length) -> void setSourceRange(int sourceStart, int length);
 
    @SuppressWarnings("decapsulation")
    public DomCurrencyLiteral(AST ast, CurrencyExpression expression) {
        base(ast);
        this.currency = expression.currency;
        setSourceRange(expression.sourceStart, expression.sourceEnd-expression.sourceStart+1);
    }
}
 

So this almost looks like subclassing, except we use playedBy instead of extends and base() instead of super(). And yes, when creating an instance with “new DomCurrencyLiteral(ast, expr)”, inside the constructor we create a SimpleName from DOM using the package private constructor. But by using role playing instead of subclassing this has become part of the aspectBinding relationship, which makes analysis of the state of encapsulation much easier.

So, who actually creates these nodes? Inside the JDT this is the responsibility of the ASTConverter, which takes an AST from the compiler and converts it to the public variant. In order to tell the ASTConverter how to handle our currency nodes I added this role to the existing team SyntaxAdaptor:

@SuppressWarnings("decapsulation")
protected class DomConverterAdaptor playedBy ASTConverter {
 
    // whenever convert(Expression) is called ...
    org.eclipse.jdt.core.dom.Expression convertCurrencyExpression(CurrencyExpression expression)
    <- replace org.eclipse.jdt.core.dom.Expression convert(Expression expression)
        // ... and when the literal is actually a CurrencyExpression ...
        base when (expression instanceof CurrencyExpression)
        // ... perform the cast we just checked for and feed it into the callin method below.
        with { expression <- (CurrencyExpression)expression }
 
    /**
     * Convert a CurrencyExpression from the compiler to its DOM counterpart.
     * This method uses inferred callouts (OTJLD §3.1(j))
     * which need to be enabled in the OT/J compiler preferences.
     */
    @SuppressWarnings({ "basecall", "inferredcallout" })
    callin org.eclipse.jdt.core.dom.Expression convertCurrencyExpression(CurrencyExpression expression){
        final DomCurrencyLiteral name = new DomCurrencyLiteral(this.ast, expression);
        if (this.resolveBindings) {
            recordNodes(name, expression);
        }
        return name;
    }
}
 

I deliberately used some special OT/J syntax worth explaining:

  • Lines 5ff. define a callin binding like we’ve seen before.
  • Line 8 adds a guard predicate to the binding, saying that this binding should only fire when the argument expression is actually of type CurrencyExpression.
  • After passing the guard we know that we can safely cast to CurrencyExpression, so I added a parameter mapping (line 10) which feeds the cast value into the role method.
  • Inside the role method convertCurrencyExpression everything looks normal, but at a closer look this.ast and this.resolveBindings seem to be undefined in the scope of the current class. In fact these fields are defined in the base class ASTConverter and we could use explicit callout accessors like in the previous post. However, this time I chose to let the compiler infer these callouts so that the method would look exactly like existing methods in ASTConverter do (this option has to be enabled in the OT/J compiler preferences).
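For readers more at home in plain Java, the combined effect of the replace binding, the guard predicate and the parameter mapping can be approximated by hand-written delegation. This is only an analogy under stated assumptions: Node, MoneyNode, Converter and AdaptedConverter are hypothetical stand-ins, not JDT or OT/J API, and a subclass plays the part that the role plays in OT/J (where the base class stays untouched and no subclass is needed).

```java
// Hypothetical stand-ins for the compiler AST; not JDT classes.
class Node {
    final String text;
    Node(String text) { this.text = text; }
}

class MoneyNode extends Node {
    final int value;
    final String currency;
    MoneyNode(int value, String currency) {
        super("<% " + value + " " + currency + " %>");
        this.value = value;
        this.currency = currency;
    }
}

// Plays the part of the unmodified base class (ASTConverter in the post).
class Converter {
    String convert(Node node) {
        return "expr(" + node.text + ")";
    }
}

// Plain-Java approximation of the callin binding.
class AdaptedConverter extends Converter {
    @Override
    String convert(Node node) {
        // guard predicate: only intercept when the argument has the special type
        if (node instanceof MoneyNode) {
            // parameter mapping: pass the already-checked cast to the specialized method
            return convertMoney((MoneyNode) node);
        }
        // guard failed: the original behaviour runs unchanged
        return super.convert(node);
    }

    // counterpart of the callin method; it replaces the original and makes no base call
    private String convertMoney(MoneyNode node) {
        return "money(" + node.value + ", " + node.currency + ")";
    }
}
```

With these classes, new AdaptedConverter().convert(new Node("42")) falls through to the original conversion, while passing a MoneyNode takes the specialized path. The OT/J version expresses the same decision declaratively in the binding, so the role method itself contains no type test and no cast.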

OK, with this little addition our CurrencyExpressions are converted to something that the JDT can handle and we’re already prepared for doing real AST manipulation including our syntax.

Source Formatting

Inside the JDT, source formatting (Ctrl-Shift-F) is essentially performed by class CodeFormatterVisitor. This class is one of many subclasses of the general ASTVisitor. If one wanted to make these visitors aware of our CurrencyExpression, we would have to add one visit method to ASTVisitor and each of its subclasses! That’s certainly not viable, so with plain Java we’re pretty much out of luck.

The situation that needs adaptation can be described as follows:

  • A visitor will be created and invoked in order to descend into the AST
  • At the point when traversal finds a CurrencyExpression it will invoke its traverse(ASTVisitor) method.

Of course we could manually inspect the type of visitor within the traverse method, but that would defeat the whole purpose of having visitors: keeping all those add-on functions out of your data structures. Instead I only gave a default implementation to CurrencyExpression.traverse and used OT/J for the cleanest implementation of double dispatch (which is what the visitor pattern painstakingly emulates): we need dispatch that considers both the visitor type and the node type for finding the suitable method implementation.

In green-field development this would be still easier, but even on top of an existing visitor infrastructure it gets pretty concise.
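As a reminder of what “painstakingly emulates” means, here is the classic visitor pattern in plain Java, reduced to a toy AST. All names (AstNode, NumberNode, CurrencyNode, Visitor, Printer, TypeNamer) are hypothetical and unrelated to the JDT. Note how every new node type requires a new method in the visitor interface and in each of its implementations, which is exactly the cost we want to avoid.

```java
// Hypothetical miniature AST and visitors; none of these are JDT types.
interface Visitor {
    String visitNumber(NumberNode node);
    String visitCurrency(CurrencyNode node);
}

interface AstNode {
    // first dispatch: the JVM selects accept() by the runtime type of the node
    String accept(Visitor visitor);
}

final class NumberNode implements AstNode {
    final int value;
    NumberNode(int value) { this.value = value; }

    @Override
    public String accept(Visitor visitor) {
        // second dispatch: the JVM selects visitNumber() by the runtime type of the visitor
        return visitor.visitNumber(this);
    }
}

final class CurrencyNode implements AstNode {
    final int value;
    final String currency;
    CurrencyNode(int value, String currency) {
        this.value = value;
        this.currency = currency;
    }

    @Override
    public String accept(Visitor visitor) {
        return visitor.visitCurrency(this);
    }
}

// One add-on function: pretty-printing.
final class Printer implements Visitor {
    @Override
    public String visitNumber(NumberNode node) {
        return Integer.toString(node.value);
    }
    @Override
    public String visitCurrency(CurrencyNode node) {
        return "<% " + node.value + " " + node.currency + " %>";
    }
}

// Another add-on function: naming the kind of node.
final class TypeNamer implements Visitor {
    @Override
    public String visitNumber(NumberNode node) { return "number"; }
    @Override
    public String visitCurrency(CurrencyNode node) { return "currency"; }
}
```

Calling node.accept(visitor) picks one of four method bodies depending on both runtime types, but only because each node class forwards to a method named after itself. The OT/J solution below gets the same two-dimensional selection without touching ASTVisitor or any of its subclasses.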

Visitor adaptation – version 1

My first version looks like this (explanations follow below):

public team class VisitorsAdaptor {
 
    protected team class AstFormatting playedBy CodeFormatterVisitor {
        // whenever visiting something that could contain an expression
        // activate this team to enable callins of the inner role
        callin void visiting() {
            within(this) {
                base.visiting();
            }
        }
        @SuppressWarnings("decapsulation")
        void visiting()
            <- replace
                boolean visit(Block block, BlockScope scope),
                boolean visit(FieldDeclaration fieldDeclaration, MethodScope scope),
                void formatStatements(BlockScope scope, final Statement[] statements, boolean insertNewLineAfterLastStatement);
 
        Scribe getScribe() -> get Scribe scribe;
 
        /** This role implements formatting of our custom AST: */
        protected class CustomAst playedBy CurrencyExpression
        {
            void traverse() <- replace void traverse(ASTVisitor visitor, BlockScope scope);
 
            @SuppressWarnings({ "inferredcallout", "basecall" })
            callin void traverse() {
                Scribe scribe = getScribe();
                Scanner scanner = scribe.scanner;
 
                // format this AST node into a StringBuffer:
                StringBuffer replacement = new StringBuffer();
                replacement.append("<% ");
                this.value.printExpression(0, replacement);
                replacement.append(' ');
                replacement.append(this.currency);
                replacement.append(" %>");
 
                // feed the formatted string into the Scribe:
                int start = this.sourceStart();
                int end = this.sourceEnd();
                scribe.addReplaceEdit(start, end, replacement.toString());
 
                // advance the scanner:
                scanner.resetTo(end+1, scribe.scannerEndPosition - 1);
                scribe.pendingSpace = false;
            }
        }
    }
}
 

The key trick in this example is nesting:

  • Role AstFormatting is responsible for detecting when a CodeFormatterVisitor is visiting any subtree that may contain expressions. This is done using a callin binding that lists three relevant base methods which all should be intercepted by the same role method (lines 12-16).
  • Inside role AstFormatting (which is also marked as a team) an inner role CustomAst will only be triggered if a CodeFormatterVisitor calls the traverse method of a CurrencyExpression (see callin binding in line 23).
  • The connection between both levels is wired in method AstFormatting.visiting: the block statement within() { } temporarily and locally activates the given team instance, here denoted by this. Only during this block the nested team AstFormatting is active – meaning that only during this block the callin binding in role CustomAst will fire.
  • Within role CustomAst we can naturally access the CodeFormatterVisitor via the enclosing instance of AstFormatting. No instanceof and casting needed, because all this only happens in the context of a CodeFormatterVisitor.

The body of method traverse contains only domain logic: pretty-printing the current node into a string buffer and interacting with the underlying infrastructure (Scanner, Scribe) that drives the formatting.
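The effect of within(this) can be imitated in plain Java with a flag and try/finally. The following is a rough sketch under simplifying assumptions, not how OT/J implements team activation (which does considerably more): FormattingScope is a hypothetical class, the base call is passed in as a Runnable, and the node is reduced to its two fields.

```java
// Hypothetical plain-Java imitation of "within (this) { base.visiting(); }".
final class FormattingScope {
    private boolean active; // true only while visiting(...) is on the call stack

    /** Runs the original visit with this scope activated, then restores the previous state. */
    void visiting(Runnable baseCall) {
        boolean previous = active;
        active = true;
        try {
            baseCall.run();
        } finally {
            active = previous; // deactivate even if the base call throws
        }
    }

    /** Counterpart of the inner role's callin: custom behaviour only while the scope is active. */
    String traverse(int value, String currency, String originalSource) {
        if (!active) {
            return originalSource; // binding does not fire: leave the text alone
        }
        // same pretty-printing as in the role method above
        return "<% " + value + " " + currency + " %>";
    }
}
```

Calling traverse outside of visiting returns the original text untouched, whereas the same call made from inside the Runnable passed to visiting yields the normalized form. Restoring the previous flag value rather than simply resetting it to false keeps nested visits working, since one visit method typically triggers others further down the tree.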

That’s it, with these classes in place, we can write this method:

int getMoney() {
   int myMoney = <% 
                  3
          euro %>
        ; 
                System
        .out.println("myMoney ="+myMoney);
                                        return myMoney;
}
 

then hit Ctrl-Shift-F et voilà:

int getMoney() {
    int myMoney = <% 3 euro %>;
    System.out.println("myMoney =" + myMoney);
    return myMoney;
}
 

How’s that? 🙂
The formatter smoothly operates on the full hybrid language, not just skipping over our nodes but handling them as well.

Generalizing visitor adaptations

After success with respect to both challenges I’d like to clean up even more and prepare for further adaptations of other visitors. Given how many subclasses of ASTVisitor are used within the JDT, we wouldn’t want to write the infrastructure for double dispatch over and over again. So let’s generalize, that is: extract a common superclass by moving everything reusable out of class AstFormatting:

public team class VisitorsAdaptor
{
    protected abstract team class AstVisiting playedBy ASTVisitor {
        // whenever visiting something that could contain an expression
        // activate this team to enable callins of the inner role
        callin void visiting() {
            within(this)
                base.visiting();
        }
        void visiting()
            <- replace
                boolean visit(Block block, BlockScope scope),
                boolean visit(FieldDeclaration fieldDeclaration, MethodScope scope);
 
        protected abstract class CustomAst playedBy CurrencyExpression {
            // variant of traversal that should be used when the enclosing team is active:
            // (implement in subclasses)
            abstract callin void traverse();
            void traverse() <- replace void traverse(ASTVisitor visitor, BlockScope scope);
        }
        // Insert more roles for binding more AST nodes...
    }
 
    protected team class AstFormatting extends AstVisiting playedBy CodeFormatterVisitor
    {
        // one more trigger that should activate the team:
        @SuppressWarnings("decapsulation")
        visiting <- replace formatStatements;
 
        Scribe getScribe() -> get Scribe scribe;
 
        /** This role implements formatting of our custom AST: */
        @Override
        protected class CustomAst {
            @SuppressWarnings({ "inferredcallout", "basecall" })
            callin void traverse() {
                // method body as before
            }
        }
    }
    protected team class OtherVisitorAdaptor extends AstVisiting playedBy XYVisitor
    {
        @Override
        protected class CustomAst {
            callin void traverse() {
                // domain logic
            }
        }
        // Insert more roles for actually handling more AST nodes ...
    }
}
 

Now team class AstVisiting contains the part that is common for all visitors. At this level several things are still abstract: method traverse, role class CustomAst and even the whole team AstVisiting.

Team class AstFormatting extends the abstract team and defines everything specific to formatting. We have one more trigger for visiting, one callout binding to a field of class CodeFormatterVisitor and then we only refine the previously abstract role class CustomAst. At this level it is no longer abstract because we give an implementation for traverse.

I’ve also sketched another nested team showing a minimal specialization of AstVisiting for adapting some other visitor and adding another implementation for CustomAst.traverse plus potentially more roles for more node types.

Conclusion

For those who don’t work in the compiler business on a day-to-day basis this is probably pretty tough stuff, but let me summarize what we’ve just achieved:

  • Embedded a custom syntax into Java, showing how a custom parser can be plugged in to create custom AST from a region of the Java source.
  • Adapted the conversion between two different AST structures (internal -> DOM) to also handle custom nodes.
  • Adapted the code formatter so that hybrid sources can be formatted with a single command.
  • Prepared the infrastructure for adapting other visitors, too, so that new visitor adaptations only need to add their specific implementation with close to zero scaffolding.
  • Cleanly separated each implemented concern into one module.
  • Kept each module on the scale of only tens of lines of code.
  • Yet took significant steps towards a production quality IDE for our custom hybrid language.

Maybe I shouldn’t have told you how easy these things can be – if your tools are sharp – maybe.
But professional carvers know: if your knife is sharp, it’s actually easy to handle. Only if it is blunt are you in real danger of hurting yourself – because you need to apply disproportionate force to cut your wood. So:

Spare your fingers, sharpen your knife!

PS: Here’s the archive of all sources, ready to be imported into the OTDT.

Written by Stephan Herrmann

February 26, 2010 at 13:18

Posted in Eclipse, Examples, Object Teams, OTEquinox

4 Responses

  1. Hey,
    sorry for being off topic but I really like your theme and just couldn’t resist. It seems to go really well with Eclipse. What is its name?

    Best regards,
    Piotr

    piter

    March 2, 2010 at 15:29

  2. @Piotr:
    The theme is called Journalist, but that’s just the clean page layout. You might be more interested in the syntax highlighting, which I adapted to match Eclipse. It’s basically the wp-syntax plugin using Geshi. The adjusted java.php can be found in this bug (it’s now used in the Eclipse wiki, too):

    https://bugs.eclipse.org/236811

    HTH
    Stephan

    stephan

    March 2, 2010 at 16:01

  3. Hey Stephan,
    I was not being obvious 😦 but I was asking about your OS theme. On the screenshot in this entry and also former entries it goes really well with Eclipse layout.

    Best regards,
    Piotr

    piter

    March 3, 2010 at 07:32

  4. Obviously “theme” can mean many things these days 🙂

    As for the windowing scheme I use KDE, the GTK+ style is “QtCurve”, window decorations “Plastik” and custom colors for some things like window title, selection background.

    stephan

    March 3, 2010 at 12:31

