[DROOLS-7577] drools-lsp : Add comments and explanations for drools-p… (

#43) * [DROOLS-7577] drools-lsp : Add comments and explanations for drools-parser * removed syntax examples
kiegroup · Oct 24, 2024 · cdd9d3c · cdd9d3c
1 parent e56f2a7
commit cdd9d3c
Show file tree

Hide file tree

Showing 7 changed files with 142 additions and 25 deletions.
diff --git a/drools-drl/drools-drl10-parser/Developer_Notes.md b/drools-drl/drools-drl10-parser/Developer_Notes.md
@@ -0,0 +1,38 @@
+## drools-parser
+
+This module is a reimplementation of the DRL (Drools Rule Language) parser based on ANTLR4.
+
+The current [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java) is based on ANTLR3 and contains a lot of custom modifications, which is hard to maintain. This new module should keep the separation between the parser syntax (`DRLParser.g4`) and the Descr generation (`DRLVisitorImpl.java`).
+
+This module started with a part of LSP to develop DRL editors, but it is not limited to that. This module will also replace DRL6Parser in the drools code base.
+
+### How is this developed?
+
+1. The starting point is [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java). While it contains lots of customizations, we can map its javadoc (e.g. `packageStatement := PACKAGE qualifiedIdentifier SEMICOLON?`) to `DRLParser.g4` (e.g. `packagedef : PACKAGE name=drlQualifiedName SEMI? ;`).
+2. `DRLLexer.g4` is written to define tokens for DRL.
+3. `DRLLexer.g4` imports `JavaLexer.g4` to reuse Java tokens. `DRLParser.g4` imports `JavaParser.g4` to reuse Java grammar. These Java parser files are distributed by ANTLR4 under BSD license.
+4. In `DRLLexer.g4`, basically define tokens with a prefix "DRL_" to clarify they are DRL keywords.
+5. In `DRLParser.g4`, define parser rules with a prefix "drl" if the rule name conflicts with `JavaParser.g4`. Sometimes we need to do that, because such a rule may contain DRL keywords.
+6. (As of 2023/10/31) this parser doesn't deeply parse rule RHS (just multiple `RHS_CHUNK`s), because Drools passes RHS text to drools-compiler as-is. In case of developing DRL editors, we may need to integrate another Java LSP to support RHS code completion, etc.
+7. LHS constraint (e.g. `age > 30`) is also handled as text. Further processing will be done in the later compiler phase.
+8. `DRLParser` processes a DRL text and produces an AST(abstract syntax tree). Then apply `DRLVisitorImpl` to generate PackageDescr following the visitor pattern. So the main work would be implementing `DRLParser.g4` and `DRLVisitorImpl`.
+9. Errors are handled by `DRLErrorListener`
+10. (As of 2023/10/31) We have 2 test classes. `DRLParserTest` is a very basic test to check if the parser can parse DRL. `MiscDRLParserTest` contains various DRL syntax to check if the parser generates correct Descr objects. `MiscDRLParserTest` was ported from [RuleParserTest](https://github.com/apache/incubator-kie-drools/blob/main/drools-test-coverage/test-compiler-integration/src/test/java/org/drools/mvel/compiler/lang/RuleParserTest.java) so that we can ensure the compatibility of generated Descr objects between the current implementation and the new one.
+11. As `DRL6Parser` contains hard-coded customizations, sometimes we need to read and understand the `DRL6Parser` source codes to meet the compatibility.
+12. (As of 2023/10/31) `MiscDRLParserTest` still has several test cases with `@Disabled` which are relatively lower priority or edge cases. They need to be resolved at some point in the future. To fix the issues, file a JIRA, remove the `@Disabled` annotation, and fix the implementation to pass the test case.
+
+### Next steps
+
+1. Create a feature branch in drools repo and replace `DRL6Parser` with this new parser.
+2. We will detect issues in the new parser by running the existing tests in drools repo. If we find any issues, we will fix them in the new parser and add new tests to cover them. Such tests would be more or less Descr comparison tests, so we would add a new test class which is similar to `MiscDRLParserTest`.
+
+### Refactoring candidates
+- `DRLParserHelper` and `DRLParserWrapper` have some duplicated code and purpose. We can merge them into one class.
+- `MiscDRLParserTest` can be cleaner and fixed to align with SonarLint suggestions.
+- Constraint related parser rules after `conditionalOrExpression` are written in antlr3 style. They could be refactored to antlr4 style (like `lhsExpression`).
+
+### Development tips
+- IntelliJ IDEA has an ANTLR4 plugin, which "ANTLR Preview" window displays a parse tree. It is very useful to debug the parser rules.
+
+### Resources
+[The Definitive ANTLR 4 Reference](https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-reference/)
diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java
@@ -7,6 +7,9 @@
 import org.antlr.v4.runtime.RecognitionException;
 import org.antlr.v4.runtime.Recognizer;
 
+/**
+ * Collect errors while parsing DRL
+ */
 public class DRLErrorListener extends BaseErrorListener {
 
     private final List<DRLParserError> errors = new ArrayList<>();

diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java
@@ -1,5 +1,8 @@
 package org.drools.parser;
 
+/**
+ * Error information while parsing DRL
+ */
 public class DRLParserError {
 
     private int lineNumber;

diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java
@@ -10,11 +10,18 @@
 import org.antlr.v4.runtime.tree.TerminalNode;
 import org.drools.drl.ast.descr.PackageDescr;
 
+/**
+ * Collection of static helper methods for DRLParser
+ */
 public class DRLParserHelper {
 
     private DRLParserHelper() {
     }
 
+    /**
+     * Entry point for parsing DRL.
+     * Unlike DRLParserWrapper.parse(), this method does not collect errors.
+     */
     public static PackageDescr parse(String drl) {
         DRLParser drlParser = createDrlParser(drl);
         return compilationUnitContext2PackageDescr(drlParser.compilationUnit(), drlParser.getTokenStream());
@@ -27,6 +34,9 @@ public static DRLParser createDrlParser(String drl) {
         return new DRLParser(commonTokenStream);
     }
 
+    /**
+     * DRLVisitorImpl visits a parse tree and creates a PackageDescr
+     */
     public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.CompilationUnitContext ctx, TokenStream tokenStream) {
         DRLVisitorImpl visitor = new DRLVisitorImpl(tokenStream);
         Object descr = visitor.visit(ctx);
@@ -37,6 +47,9 @@ public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.Compila
         }
     }
 
+    /**
+     * Given a row and column of the input DRL, return the index of the matched token
+     */
     public static Integer computeTokenIndex(DRLParser parser, int row, int col) {
         for (int i = 0; i < parser.getInputStream().size(); i++) {
             Token token = parser.getInputStream().get(i);

diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java
@@ -10,12 +10,18 @@
 
 import static org.drools.parser.DRLParserHelper.compilationUnitContext2PackageDescr;
 
+/**
+ * Wrapper for DRLParser. Somewhat duplicated from DRLParserHelper, but this class is instantiated and holds errors.
+ */
 public class DRLParserWrapper {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(DRLParserWrapper.class);
 
     private final List<DRLParserError> errors = new ArrayList<>();
 
+    /**
+     * Main entry point for parsing DRL
+     */
     public PackageDescr parse(String drl) {
         DRLParser drlParser = DRLParserHelper.createDrlParser(drl);
         DRLErrorListener errorListener = new DRLErrorListener();

diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java
@@ -48,6 +48,12 @@
 import static org.drools.parser.ParserStringUtils.trimThen;
 import static org.drools.util.StringUtils.unescapeJava;
 
+/**
+ * Visitor implementation for DRLParser.
+ * Basically, each visit method creates and returns a Descr object traversing the parse tree.
+ * Finally, visitCompilationUnit() returns a PackageDescr object.
+ * Try not to depend on DRLVisitorImpl's internal state for clean maintainability
+ */
 public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> {
 
     private final TokenStream tokenStream;
@@ -56,6 +62,9 @@ public DRLVisitorImpl(TokenStream tokenStream) {
         this.tokenStream = tokenStream;
     }
 
+    /**
+     * Main entry point for creating PackageDescr from a parser tree.
+     */
     @Override
     public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) {
         PackageDescr packageDescr = new PackageDescr();
@@ -67,7 +76,11 @@ public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) {
         return packageDescr;
     }
 
+    /**
+     * Add all children Descr to PackageDescr
+     */
     private void applyChildrenDescrs(PackageDescr packageDescr, List<BaseDescr> descrList) {
+        // This bunch of if-blocks will be refactored by DROOLS-7564
         descrList.forEach(descr -> {
             if (descr instanceof UnitDescr) {
                 packageDescr.setUnit((UnitDescr) descr);
@@ -147,6 +160,8 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) {
             functionDescr.setReturnType("void");
         }
         functionDescr.setName(ctx.IDENTIFIER().getText());
+
+        // add function parameters
         DRLParser.FormalParametersContext formalParametersContext = ctx.formalParameters();
         DRLParser.FormalParameterListContext formalParameterListContext = formalParametersContext.formalParameterList();
         if (formalParameterListContext != null) {
@@ -163,7 +178,7 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) {
 
     @Override
     public BaseDescr visitDeclaredef(DRLParser.DeclaredefContext ctx) {
-        return visitDescrChildren(ctx).get(0);
+        return visitDescrChildren(ctx).get(0); // only one child
     }
 
     @Override
@@ -202,6 +217,9 @@ public WindowDeclarationDescr visitWindowDeclaration(DRLParser.WindowDeclaration
         return windowDeclarationDescr;
     }
 
+    /**
+     * entry point for one rule
+     */
     @Override
     public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) {
         RuleDescr ruleDescr = new RuleDescr(safeStripStringDelimiters(ctx.name.getText()));
@@ -228,16 +246,27 @@ public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) {
 
         if (ctx.rhs() != null) {
             ruleDescr.setConsequenceLocation(ctx.rhs().getStart().getLine(), ctx.rhs().getStart().getCharPositionInLine()); // location of "then"
-            ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs())));
+            ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); // RHS is just a text
         }
 
         return ruleDescr;
     }
 
     private void slimLhsRootDescr(AndDescr root) {
+        // Root Descr is always AndDescr.
+        // For example, if there are nested AndDescr like
+        //  AndDescr
+        //  /\
+        // P  AndDescr
+        //     /\
+        //    P  P
+        // is slimmed down to
+        //  AndDescr
+        //  / | \
+        // P  P  P
         List<BaseDescr> descrList = new ArrayList<>(root.getDescrs());
         root.getDescrs().clear();
-        descrList.forEach(root::addOrMerge); // This slims down nested AndDescr
+        descrList.forEach(root::addOrMerge);
     }
 
     @Override
@@ -373,6 +402,9 @@ public AttributeDescr visitDurationAttribute(DRLParser.DurationAttributeContext
         return attributeDescr;
     }
 
+    /**
+     * entry point for LHS
+     */
     @Override
     public List<BaseDescr> visitLhs(DRLParser.LhsContext ctx) {
         if (ctx.lhsExpression() != null) {
@@ -394,10 +426,12 @@ public BaseDescr visitLhsPatternBind(DRLParser.LhsPatternBindContext ctx) {
     }
 
     private PatternDescr getSinglePatternDescr(DRLParser.LhsPatternBindContext ctx) {
-        Optional<BaseDescr> optPatternDescr = visitFirstDescrChild(ctx);
-        PatternDescr patternDescr = optPatternDescr.filter(PatternDescr.class::isInstance)
-                .map(PatternDescr.class::cast)
-                .orElseThrow(() -> new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText()));
+        List<BaseDescr> patternDescrList = visitDescrChildren(ctx);
+        if (patternDescrList.isEmpty() || !(patternDescrList.get(0) instanceof PatternDescr)) {
+            throw new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText());
+        }
+        PatternDescr patternDescr = (PatternDescr)patternDescrList.get(0);
+
         if (ctx.label() != null) {
             patternDescr.setIdentifier(ctx.label().IDENTIFIER().getText());
         } else if (ctx.unif() != null) {
@@ -423,6 +457,9 @@ private OrDescr getOrDescrWithMultiplePatternDescr(DRLParser.LhsPatternBindConte
         return orDescr;
     }
 
+    /**
+     * entry point for a Pattern
+     */
     @Override
     public PatternDescr visitLhsPattern(DRLParser.LhsPatternContext ctx) {
         PatternDescr patternDescr = new PatternDescr(ctx.objectType.getText());
@@ -528,13 +565,19 @@ public EntryPointDescr visitFromEntryPoint(DRLParser.FromEntryPointContext ctx)
         return new EntryPointDescr(safeStripStringDelimiters(ctx.stringId().getText()));
     }
 
+    /**
+     * Collect constraints in a Pattern
+     */
     @Override
     public List<ExprConstraintDescr> visitConstraints(DRLParser.ConstraintsContext ctx) {
         List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>();
         populateExprConstraintDescrList(ctx, exprConstraintDescrList);
         return exprConstraintDescrList;
     }
 
+    /**
+     * Collect constraints in a Pattern. Positional constraints comes first with semicolon.
+     */
     private List<ExprConstraintDescr> visitConstraints(DRLParser.PositionalConstraintsContext positionalCtx, DRLParser.ConstraintsContext ctx) {
         List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>();
         populateExprConstraintDescrList(positionalCtx, exprConstraintDescrList);
@@ -557,6 +600,9 @@ private void populateExprConstraintDescrList(ParserRuleContext ctx, List<ExprCon
         }
     }
 
+    /**
+     * Takes one constraint as String and create ExprConstraintDescr
+     */
     @Override
     public ExprConstraintDescr visitConstraint(DRLParser.ConstraintContext ctx) {
         String constraint = visitConstraintChildren(ctx);
@@ -618,6 +664,7 @@ public EvalDescr visitLhsEval(DRLParser.LhsEvalContext ctx) {
 
     @Override
     public BaseDescr visitLhsExpressionEnclosed(DRLParser.LhsExpressionEnclosedContext ctx) {
+        // enclosed expression is simply stripped because Descr itself is encapsulated
         return (BaseDescr) visit(ctx.lhsExpression());
     }
 
@@ -708,7 +755,7 @@ public BaseDescr visitLhsAndDef(DRLParser.LhsAndDefContext ctx) {
 
     @Override
     public BaseDescr visitLhsUnary(DRLParser.LhsUnaryContext ctx) {
-        return visitDescrChildren(ctx).get(0);
+        return visitDescrChildren(ctx).get(0); // lhsUnary has only one child
     }
 
     private void populateStartEnd(BaseDescr descr, ParserRuleContext ctx) {
@@ -718,6 +765,10 @@ private void populateStartEnd(BaseDescr descr, ParserRuleContext ctx) {
         descr.setEndCharacter(ctx.getStop().getStopIndex());
     }
 
+    /**
+     * This is a special version of visitChildren().
+     * This collects children BaseDescr objects and returns them as a list.
+     */
     private List<BaseDescr> visitDescrChildren(RuleNode node) {
         List<BaseDescr> aggregator = new ArrayList<>();
         int n = node.getChildCount();
@@ -768,21 +819,10 @@ public ParseTree getNode() {
         }
     }
 
-    // leaves of constraint concatenate return Strings
+    /**
+     * Return the text of constraint as-is
+     */
     private String visitConstraintChildren(ParserRuleContext ctx) {
         return getTokenTextPreservingWhitespace(ctx, tokenStream);
     }
-
-    private Optional<BaseDescr> visitFirstDescrChild(RuleNode node) {
-        int n = node.getChildCount();
-
-        for (int i = 0; i < n && this.shouldVisitNextChild(node, null); ++i) {
-            ParseTree c = node.getChild(i);
-            Object childResult = c.accept(this);
-            if (childResult instanceof BaseDescr) {
-                return Optional.of((BaseDescr) childResult);
-            }
-        }
-        return Optional.empty();
-    }
-}
+}
diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java
@@ -5,13 +5,17 @@
 import org.antlr.v4.runtime.misc.Interval;
 
 /**
- * will be merged in drools-util
+ * Collection of String utilities used by DRLParser.
+ * This may be merged in drools-util
  */
 public class ParserStringUtils {
 
     private ParserStringUtils() {
     }
 
+    /**
+     * Strip string delimiters (e.g. "foo" -> foo)
+     */
     public static String safeStripStringDelimiters(String value) {
         if (value != null) {
             value = value.trim();
@@ -22,6 +26,9 @@ public static String safeStripStringDelimiters(String value) {
         return value;
     }
 
+    /**
+     * Get text from ParserRuleContext's CharStream without trimming whitespace
+     */
     public static String getTextPreservingWhitespace(ParserRuleContext ctx) {
         // Using raw CharStream
         int startIndex = ctx.start.getStartIndex();
@@ -34,11 +41,18 @@ public static String getTextPreservingWhitespace(ParserRuleContext ctx) {
         return ctx.start.getTokenSource().getInputStream().getText(interval);
     }
 
+    /**
+     * Get text from ParserRuleContext's CharStream without trimming whitespace
+     * tokenStream is required to get hidden channel token (e.g. whitespace).
+     * Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString
+     */
     public static String getTokenTextPreservingWhitespace(ParserRuleContext ctx, TokenStream tokenStream) {
-        // tokenStream is required to get hidden channel token (e.g. whitespace). Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString
         return tokenStream.getText(ctx.start, ctx.stop);
     }
 
+    /**
+     * Just remove leading "then"
+     */
     public static String trimThen(String rhs) {
         if (rhs.startsWith("then")) {
             return rhs.substring("then".length());