The Object Teams Blog

Adding team spirit to your objects.

IDE for your own language embedded in Java? (part 1)


Have you ever thought of making Java a bit smarter? Perhaps for some task you would prefer a custom syntax, with snippets in that syntax embedded into Java? Sure, many never seriously think about this because of the prohibitively high effort of creating the compiler for such a hybrid language. And even if you are a compiler guru who knows the toolkits so well that translation wouldn’t be a problem, you’ll probably surrender at the mere thought of creating a mature IDE that would allow productive work with your hybrid language.

You shouldn’t give up. Think: If you build your own IDE you’ll never be able to really compete with the JDT, right? Still, anything falling behind the quality of the JDT won’t raise your productivity but will stand in your way at the most common tasks during development, right?
What does this tell you? Give up? No. If you can’t beat us, join us. Don’t write a new IDE for any Java-based language. Join the JDT. Well, but the JDT doesn’t provide an extension point for embedding a different syntax, does it? Sure it doesn’t, but it’s actually not the JDT team’s job to do so: every embedded language will probably have slightly different requirements, so designing such an extension point would be a battle you can never win.

I have developed a tiny extension to Java and integrated it into the JDT with a mere 204 lines of code, including comments and a plugin.xml. As some may guess, the only trick needed is to use Object Teams. In this post I will explain how Object Teams can be used for extending the JDT in this way. And I will also argue against the most common fear in this context: “Is that solution maintainable?” From my very own experience this design is not just barely manageable; from all I’ve seen it is the most maintainable solution for this kind of task. But I’m getting ahead of myself.

In order not to distract from the interesting design issues I’ll be using the simplest possible language extension: I want to be able to write integer constants in natural language, and while I’m at it, I want it to work in a multilingual setting. So this, for example, should be a legal program:

public class EmbeddingTest {
    private static int foo() {
        return <% one %>;
    }
    public static void main(String[] args) {
        System.out.println(foo());
    }
}
 

I’m using <% and %> tokens to switch between Java syntax and custom syntax.

The first step can be achieved in plain Java: creating a class for AST nodes that represent my custom int literals within the compiler. If you really want to, you may inspect class CustomIntLiteral, but it’s actually pretty boring old Java. Its main job is to look up a given string in an array of known number words and thus translate the word into an int. It even detects the language used and remembers it for later use. The behaviour is hooked into the JDT compiler by overriding method TypeBinding resolveType(BlockScope scope), which is just normal Java practice.
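The source of CustomIntLiteral is not reproduced in this post, so the following is only a minimal plain-Java sketch of the lookup idea described above, not the actual class. The class name NumberWords, the word tables and the choice to return null for an unknown word are all assumptions made for illustration; the real AST node would report a compile error from resolveType instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the word lookup done by a custom int literal. */
public final class NumberWords {
    // Assumed word tables: the index of a word in its array is its int value.
    private static final Map<String, String[]> TABLES = new LinkedHashMap<>();
    static {
        TABLES.put("en", new String[] { "zero", "one", "two", "three" });
        TABLES.put("de", new String[] { "null", "eins", "zwei", "drei" }); // "null" is German for zero
    }

    public final int value;        // the translated int constant
    public final String language;  // key of the table in which the word was found

    private NumberWords(int value, String language) {
        this.value = value;
        this.language = language;
    }

    /** Looks up a number word in all tables; returns null if the word is unknown. */
    public static NumberWords parse(String text) {
        String word = text.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String[]> table : TABLES.entrySet()) {
            String[] words = table.getValue();
            for (int i = 0; i < words.length; i++) {
                if (words[i].equals(word)) {
                    return new NumberWords(i, table.getKey());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NumberWords literal = NumberWords.parse(" one ");
        System.out.println(literal.value + " (" + literal.language + ")");
    }
}
```

The surrounding whitespace is trimmed here because the fragment between <% and %> in the example program is " one ", blanks included.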

Drilling down into the example

Here’s an overview of the module that does all the rest:

  1. package embedding.jdt;
  2.  
  3. import org.eclipse.jdt.core.compiler.CharOperation;
  4. import org.eclipse.jdt.core.compiler.InvalidInputException;
  5. import org.eclipse.jdt.core.dom.AST;
  6. import org.eclipse.jdt.core.dom.ASTNode;
  7. import org.eclipse.jdt.internal.compiler.ast.Expression;
  8. import org.eclipse.jdt.internal.compiler.ast.IntLiteral;
  9. import org.eclipse.jdt.internal.compiler.parser.TerminalTokens;
  10.  
  11. import embedding.custom.ast.CustomIntLiteral;
  12.  
  13. import base org.eclipse.jdt.core.dom.ASTConverter;
  14. import base org.eclipse.jdt.core.dom.NumberLiteral;
  15. import base org.eclipse.jdt.internal.compiler.parser.Parser;
  16. import base org.eclipse.jdt.internal.compiler.parser.Scanner;
  17.  
  18. public team class SyntaxAdaptor {
  19.  
  20. /**
  21.   * <h3>Part 1 of the adaptation:</h3>
  22.   * Wait until '<' is seen and check if it actually is a special string enclosed in '<%' and '%>'.
  23.   */
  24. protected class ScannerAdaptor playedBy Scanner { ... }
  25.  
  26. /**
  27.   * <h3>Part 2 of the adaptation:</h3>
  28.   * If the ScannerAdaptor found a match intercept creation of the faked null expression
  29.   * and replace it with a custom AST.
  30.   *
  31.   * This is a team with a nested role so that we can control activation separately.
  32.   *
  33.   * This team should be activated for the current thread only to ensure that
  34.   * concurrent compilations don't interfere: By using thread activation any state of
  35.   * this team is automatically local to that thread.
  36.   */
  37. protected team class InnerCompilerAdaptor {
  38. /** This inner role does the real work of the InnerCompilerAdaptor. */
  39. protected class ParserAdaptor playedBy Parser { ... }
  40. }
  41.  
  42. /**
  43.   * Dom representation of CustomIntLiteral.
  44.   * Since the constructor of NumberLiteral is package private we cannot subclass, so use a role instead.
  45.   */
  46. protected class DomCustomIntLiteral playedBy NumberLiteral
  47. // don't adapt plain NumberLiterals, just those that already have a DomCustomIntLiteral role:
  48. base when (SyntaxAdaptor.this.hasRole(base, DomCustomIntLiteral.class))
  49. { ... }
  50.  
  51. /**
  52.   * <h3>Part 3 of the adaptation:</h3>
  53.   * This adaptor role helps the ASTConverter to convert CustomIntLiterals, too.
  54.   */
  55. @SuppressWarnings("decapsulation")
  56. protected class DomConverterAdaptor playedBy ASTConverter { ... }
  57. }
  58.  

Imports

Why am I showing you boring import declarations to begin with? Well, with OT/J there’s a fine distinction that is worth looking at: all imports starting with import base indicate that these classes are imported for attaching a role to them. So just from these lines you see that the given module adds roles to classes from org.eclipse.jdt.internal.compiler.parser and org.eclipse.jdt.core.dom (2 classes each). All other imports are plain Java imports and won’t let you apply any OT/J tricks.

Teams and Roles

Line 18 above tells you that the class SyntaxAdaptor is actually a team. Teams are used for grouping a set of roles, which are nested classes of a team. Using the playedBy keyword, a role declares that it adapts the specified base class (these are the same classes we base-imported above). The purpose of these roles should be roughly clear from the doc comments.

So, role ScannerAdaptor will be responsible for switching between both syntaxes.

Role ParserAdaptor (line 39) will be responsible for creating our AST node (CustomIntLiteral). But wait, what’s that: the role is nested within an intermediate team, InnerCompilerAdaptor. This team will show you how to define a role that is only effective in specific situations: here, the ParserAdaptor should only be effective after the ScannerAdaptor has detected a syntax switch. Details follow below.

The other two roles will do advanced stuff so I’ll discuss them later.

Role implementation (1)

Here is the full(!) code of role ScannerAdaptor:

protected class ScannerAdaptor playedBy Scanner {
 
    // access fields from Scanner ("callout bindings"):
    int getCurrentPosition()                     -> get int currentPosition;
    void setCurrentPosition(int currentPosition) -> set int currentPosition;
    char[] getSource()                           -> get char[] source;
 
    // intercept this method from Scanner ("callin binding"):
    int getNextToken() <- replace int getNextToken();
 
    callin int getNextToken() throws InvalidInputException {
        // invoke the original method:
        int token = base.getNextToken();
        if (token == TerminalTokens.TokenNameLESS) {
            char[] source = getSource();
            int pos = getCurrentPosition();
            if (source[pos++] == '%') {                                        // detecting the opening "<%" ?
                int start = pos;                                               // inner start, just behind "<%"
                try {
                    while (source[pos++] != '%' || source[pos++] != '>')       // detecting the closing "%>" ?
                        ;                                                      // empty body
                } catch (ArrayIndexOutOfBoundsException aioobe) {              // not found, proceed as normal
                        return token;
                }
                setCurrentPosition(pos);                                       // tell the scanner what we have consumed (pointing one past '>')
                int end = pos-2;                                               // position of "%>"
                char[] fragment = CharOperation.subarray(source, start, end);  // extract the custom string (excluding <% and %>)
                // prepare an inner adaptor to intercept the expected parser action
                new InnerCompilerAdaptor(fragment, start-2, end+1).activate(); // positions include <% and %>
                return TerminalTokens.TokenNamenull;                           // pretend we saw a valid expression token ('null')
            }
        }
        return token;
    }
}
 

Comments describing the logic are in the right column; the comments on their own lines describe the usage of OT/J. Counting lines from the role declaration as line 1:

  • Lines 3-6 define accessors for two fields from the base class Scanner.
  • Line 9 defines that calls to method getNextToken() should be intercepted by our version of this method.
  • Line 11 marks the role method as callin, which is a prerequisite for line 13.
  • Line 13 invokes the original method from Scanner.
  • In line 29 we have detected a region delimited by <% and %>. We have extracted the text fragment between the delimiters, and we know the start and end positions within the source file. Only now do we create an instance of InnerCompilerAdaptor and immediately activate it for the current thread (activate()).

At this point the ScannerAdaptor is done and now an InnerCompilerAdaptor is watching what comes next.
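To make the position arithmetic of the role easier to follow, here is a plain-Java sketch of just the delimiter search, detached from the Scanner. The class EmbeddedRegionScan and the method findRegion are hypothetical names, not part of the JDT or of the example plug-in. The sketch uses explicit bounds checks instead of catching ArrayIndexOutOfBoundsException, and, unlike the compact loop in the listing, it also recognises a terminator that is directly preceded by another '%' (as in "%%>").

```java
/** Hypothetical helper: finds a region delimited by "<%" and "%>" in a char array. */
public class EmbeddedRegionScan {

    /**
     * @param source the complete source text
     * @param pos    index just behind a '<' that the scanner has already consumed
     * @return {start, end, resume}: start is the first index behind "<%",
     *         end is the index of the closing '%', resume points one past '>';
     *         null if no region starts here or the closing "%>" is missing
     */
    public static int[] findRegion(char[] source, int pos) {
        if (pos >= source.length || source[pos] != '%') {
            return null;                          // plain '<', not the opening "<%"
        }
        int start = pos + 1;                      // inner start, just behind "<%"
        for (int i = start; i + 1 < source.length; i++) {
            if (source[i] == '%' && source[i + 1] == '>') {
                return new int[] { start, i, i + 2 };
            }
        }
        return null;                              // closing "%>" not found
    }

    public static void main(String[] args) {
        char[] source = "return <% one %>;".toCharArray();
        int[] region = findRegion(source, 8);     // '<' sits at index 7
        String fragment = new String(source, region[0], region[1] - region[0]);
        System.out.println("[" + fragment + "] resume at " + region[2]);
        // prints "[ one ] resume at 16"
    }
}
```

For this input, start is 9 and end is 14, so the values start-2 and end+1 that the role passes on are 7 and 15: the positions of '<' and '>'.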

Here’s the nested team InnerCompilerAdaptor with its role ParserAdaptor:

  1. protected team class InnerCompilerAdaptor {
  2.  
  3. char[] source;
  4. int start, end;
  5. protected InnerCompilerAdaptor(char[] source, int start, int end) {
  6. this.source = source;
  7. this.start = start;
  8. this.end = end;
  9. }
  10.  
  11. /** This inner role does the real work of the InnerCompilerAdaptor. */
  12. protected class ParserAdaptor playedBy Parser {
  13. // import methods from Parser ("callout bindings"):
  14. @SuppressWarnings("decapsulation")
  15. void pushOnExpressionStack(Expression expr) -> void pushOnExpressionStack(Expression expr);
  16.  
  17. // intercept this method from Parser ("callin binding"):
  18. void consumeToken(int type) <- replace void consumeToken(int type);
  19.  
  20. @SuppressWarnings("basecall")
  21. callin void consumeToken(int type) {
  22. if (type == TerminalTokens.TokenNamenull) { // 'null' token is the faked element pushed by the SyntaxAdaptor
  23. // this inner adaptor has done its job, no longer intercept
  24. InnerCompilerAdaptor.this.deactivate();
  25. // TODO analyse source to find what AST should be created
  26. Expression replacement = new CustomIntLiteral(source, start, end, start+2, end-2);
  27. this.pushOnExpressionStack(replacement); // feed custom AST into the parser:
  28. return;
  29. }
  30. // shouldn't happen: only activated when scanner returns TokenNamenull
  31. base.consumeToken(type);
  32. }
  33. }
  34. }
  35.  
  • Lines 3-4 define the state of the nested team, which is used for passing the information collected by the ScannerAdaptor down the pipe.
  • Line 15 provides access to a protected method from Parser. By @SuppressWarnings("decapsulation") we document that this access inserts a tiny little hole into the encapsulation of Parser.
  • Line 18 defines a callin binding as we have seen it before.
  • Line 24 already deactivates the enclosing InnerCompilerAdaptor, ensuring that this is a one-shot adaptation only.
  • Lines 26 and 27 deliver the payload: they feed a CustomIntLiteral node into the parser.
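For readers who don't know OT/J yet, the one-shot, per-thread behaviour can be approximated in plain Java with a ThreadLocal. This is only an analogy under loudly stated assumptions: it does not show how Object Teams implements team activation, and the class OneShotInterceptSketch, its token constants and its string results are invented for illustration.

```java
/** Plain-Java analogy of a thread-activated, one-shot interception (not OT/J). */
public class OneShotInterceptSketch {
    public static final int TOKEN_NULL = 0;        // hypothetical token ids
    public static final int TOKEN_IDENTIFIER = 1;

    // Non-null means: an adaptor is active on this thread and holds this fragment.
    private static final ThreadLocal<String> PENDING = new ThreadLocal<>();

    /** Counterpart of "new InnerCompilerAdaptor(fragment, ...).activate()". */
    public static void activate(String fragment) {
        PENDING.set(fragment);
    }

    /** Counterpart of the intercepted consumeToken(): replaces one 'null' token. */
    public static String consumeToken(int type) {
        String fragment = PENDING.get();
        if (fragment != null && type == TOKEN_NULL) {
            PENDING.remove();                      // deactivate: one shot only
            return "CustomIntLiteral(" + fragment.trim() + ")";
        }
        return "default handling of token " + type;
    }

    public static void main(String[] args) throws InterruptedException {
        activate(" one ");
        // A concurrent "compilation" on another thread is not affected:
        Thread other = new Thread(
                () -> System.out.println("other thread: " + consumeToken(TOKEN_NULL)));
        other.start();
        other.join();
        System.out.println(consumeToken(TOKEN_NULL));  // intercepted once
        System.out.println(consumeToken(TOKEN_NULL));  // back to normal
    }
}
```

In OT/J none of this bookkeeping is written by hand: the state lives in the team instance, and activate() and deactivate() limit its effect to the current thread.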

Coming to life

Wow, if you’ve read this far, you’ve seen a lot of OT/J in just a few lines of code. Let’s wire things together by throwing the code into an Object Teams Plug-in Project and declaring one extension:
[Screenshot: aspect bindings for the SyntaxAdaptor]
I have defined one aspectBinding between the existing plugin org.eclipse.jdt.core and my team classes SyntaxAdaptor and InnerCompilerAdaptor (there’s a man behind the curtain pushing an ugly __OT__ prefix into the declaration, please ignore him – he’ll be gone in the next release of the tool).
Please note that for team SyntaxAdaptor I have set the activation to ALL_THREADS which means that at application launch an instance of this team will be created and activated globally. Without this flag the whole thing would actually have no effect at all.

That’s all the wiring needed, so kick up a runtime workbench, create a Java project and class, insert the code for class EmbeddingTest from the top of this post and boldly select Run As > Java Application. In the console we see a result:

1

Oops, the compiler for our little language extension already works? Did you see me writing a compiler?

Well, beginner’s luck, let’s assume. But, oops, watch this: When I mistype the return type of foo and ask the JDT for help, this is what I see:

The problem view tells me it knows that <% one %> has type int, which doesn’t match the declared return type boolean. Next I positioned the cursor on “one” (the element that’s definitely not Java) and hit Ctrl-1, and the standard JDT quickfix knows that I should change the return type of foo to int.
Did you watch me implementing a quickfix??

Summary so far

Here are the stats:

  • 204 lines of code including plugin.xml
  • roles adapting two base classes from org.eclipse.jdt.core.
  • callout bindings to two fields and one method
  • callin bindings to two methods
  • all adaptation is cleanly encapsulated in one team class. If you wish, you could even deactivate this one team in a running workbench and thus disable all our adaptations with a single click.
  • one plain Java class to implement the semantics of our extension

As for maintainability: The only dependencies are the items mentioned above: two classes, two fields and three methods. Only if one of these is modified as the JDT evolves does my adaptation have to be updated accordingly. And if this happens I will definitely be told by the compiler, because one of the bindings will break. If it doesn’t break there’s no need to worry.

With this implementation the compiler seamlessly works with our new syntax and even UI features that operate on the compiler AST can handle our extension, too.

What’s next?

I’m sure some think that the above is probably a forged example. You might challenge me to do something real, like refactoring. If you do so, you actually got me (mumble, mumble) – with the above implementation refactoring does not work with our custom syntax. Now that you’ve seen the start, what do you expect, how much additional rocket science does it take to add minimal refactoring support? (to be continued)


Written by Stephan Herrmann

February 22, 2010 at 22:19

Posted in Eclipse, Examples, Object Teams, OTEquinox


8 Responses


  1. Very cool, Stephen. I’m looking forward to chatting with you again at EclipseCon.

    Andrew E

    February 25, 2010 at 04:31

  2. Thanks, Andrew. Me too, I’d love to continue our chat, but after all four of my submissions were rejected I can’t come to EclipseCon. Too bad really. But yes, let’s stay in touch anyway!

    Whom would we kidnap this time to ensure your patches are committed? Pascal again? Olivier? 😉

    stephan

    February 25, 2010 at 14:57

  3. […] the first part I demonstrated how Object Teams can be used to extend the JDT compiler for a custom language to be […]

  4. Hi,

    Your project is existing and should help anybody who wants to develop java like language.

    But I have some question:
    – Your sample uses as delimiters. Is your solution still easy if I do not used delimiters (e.g. becomes $one).
    – I tried you code and I made some tests:
    The syntax ‘if ( 1 <=) {}’ is valid but ‘if ( 1 <)’ is not. I imagine that there is some conflicts between << and <<%. Is this the problem ?

    Thanks for your answer.

    lgnord

    March 2, 2010 at 16:39

  5. Regarding delimiters: the only hard requirement is that the Scanner must detect the start and end of the custom syntax during getNextToken(). You could use something like the CustomParser from the part 2 post [1], call it already during getNextToken() and ask the parser how many chars it has consumed before calling setCurrentPosition() which tells the JDT parser where it should resume normal Java parsing.

    I’m not sure I understand your syntax examples but yes, << and < are different tokens which could create a conflict (try asking for TokenNameLEFT_SHIFT etc.).

    Is it any clearer now?
    Stephan

    [1] full sources are at http://www.objectteams.org/examples/embedded-expression-language-v2.zip

    stephan

    March 2, 2010 at 17:46

  6. Hi Stephan,

    Thanks for answer and sorry for the delay of my new comments.

    The first part of your explanation is very clear. I will now try to clarify my question on the ‘<<‘ token.

    First I write to write a conditional expression. Of course, It uses a expression:
    ‘if ( 1 <=) {}’
    The compiler does not complain. Then I modify my code:
    ‘if ( 1 <) {}’
    And the compiler complains. I do not understand why. I only replace a binary operator ‘<=’ by a new one ‘<‘. Now if I add some layouts after the first ‘<‘, it works…
    ‘if ( 1 < ) {}’

    PS. I read again my first post and my (whitout spaces) disappeared. This perhaps explains why my question was difficult to understand

    lgnord

    March 10, 2010 at 17:48

  7. Hi Stephan,

    Thanks for answer and sorry for the delay of my new comments.

    The first part of your explanation is very clear. I will now try to clarify my question on the ‘<<’ token.

    First I write to write a conditional expression.
    ‘if ( 1 <=<%one%>) {}’
    The compiler does not complain. Then I modify my code:
    ‘if ( 1 <<%one%>) {}’
    And the compiler complains. I do not understand why. I only replace a binary operator ‘<=’ by a new one ‘<’. Now if I add some layouts after the first ‘<’, it works…
    ‘if ( 1 < <%one%>) {}’

    PS. I read again my first post and my <%one%> disappeared. This perhaps explains why my question was difficult to understand

    lgnord

    March 15, 2010 at 15:53

  8. Hi lgnord,
    now I see your problem 🙂

    When the scanner sees "<<%" it interprets this as a LEFT_SHIFT ("<<") plus trailing garbage.

    If you really want to re-interpret this you'd just have to add a few lines to the ScannerAdaptor (let's see if the code gets through):

    } else if (token == TerminalTokens.TokenNameLEFT_SHIFT) {
        char[] source = getSource();
        int pos = getCurrentPosition();
        if (pos < source.length && source[pos] == '%') { // detecting the opening "<<%" ?
            setCurrentPosition(pos-1); // put back the second '<'
            return TerminalTokens.TokenNameLESS; // return the first '<'
        }

    So it is possible, though I think it would actually be confusing if "<<" is broken into two parts.

    best
    Stephan

    stephan

    March 17, 2010 at 14:10

