Skip to content

Commit

Permalink
[DROOLS-7577] drools-lsp : Add comments and explanations for drools-p… (
Browse files Browse the repository at this point in the history
#43)

* [DROOLS-7577] drools-lsp : Add comments and explanations for drools-parser

* removed syntax examples
  • Loading branch information
tkobayas authored and rgdoliveira committed Oct 24, 2024
1 parent e56f2a7 commit cdd9d3c
Show file tree
Hide file tree
Showing 7 changed files with 142 additions and 25 deletions.
38 changes: 38 additions & 0 deletions drools-drl/drools-drl10-parser/Developer_Notes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
## drools-parser

This module is a reimplementation of the DRL (Drools Rule Language) parser based on ANTLR4.

The current [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java) is based on ANTLR3 and contains a lot of custom modifications, which is hard to maintain. This new module should keep the separation between the parser syntax (`DRLParser.g4`) and the Descr generation (`DRLVisitorImpl.java`).

This module started with a part of LSP to develop DRL editors, but it is not limited to that. This module will also replace DRL6Parser in the drools code base.

### How is this developed?

1. The starting point is [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java). While it contains lots of customizations, we can map its javadoc (e.g. `packageStatement := PACKAGE qualifiedIdentifier SEMICOLON?`) to `DRLParser.g4` (e.g. `packagedef : PACKAGE name=drlQualifiedName SEMI? ;`).
2. `DRLLexer.g4` is written to define tokens for DRL.
3. `DRLLexer.g4` imports `JavaLexer.g4` to reuse Java tokens. `DRLParser.g4` imports `JavaParser.g4` to reuse Java grammar. These Java parser files are distributed by ANTLR4 under BSD license.
4. In `DRLLexer.g4`, basically define tokens with a prefix "DRL_" to clarify they are DRL keywords.
5. In `DRLParser.g4`, define parser rules with a prefix "drl" if the rule name conflicts with `JavaParser.g4`. Sometimes we need to do that, because such a rule may contain DRL keywords.
6. (As of 2023/10/31) this parser doesn't deeply parse rule RHS (just multiple `RHS_CHUNK`s), because Drools passes RHS text to drools-compiler as-is. In case of developing DRL editors, we may need to integrate another Java LSP to support RHS code completion, etc.
7. LHS constraint (e.g. `age > 30`) is also handled as text. Further processing will be done in the later compiler phase.
8. `DRLParser` processes a DRL text and produces an AST(abstract syntax tree). Then apply `DRLVisitorImpl` to generate PackageDescr following the visitor pattern. So the main work would be implementing `DRLParser.g4` and `DRLVisitorImpl`.
9. Errors are handled by `DRLErrorListener`
10. (As of 2023/10/31) We have 2 test classes. `DRLParserTest` is a very basic test to check if the parser can parse DRL. `MiscDRLParserTest` contains various DRL syntax to check if the parser generates correct Descr objects. `MiscDRLParserTest` was ported from [RuleParserTest](https://github.com/apache/incubator-kie-drools/blob/main/drools-test-coverage/test-compiler-integration/src/test/java/org/drools/mvel/compiler/lang/RuleParserTest.java) so that we can ensure the compatibility of generated Descr objects between the current implementation and the new one.
11. As `DRL6Parser` contains hard-coded customizations, sometimes we need to read and understand the `DRL6Parser` source codes to meet the compatibility.
12. (As of 2023/10/31) `MiscDRLParserTest` still has several test cases with `@Disabled` which are relatively lower priority or edge cases. They need to be resolved at some point in the future. To fix the issues, file a JIRA, remove the `@Disabled` annotation, and fix the implementation to pass the test case.

### Next steps

1. Create a feature branch in drools repo and replace `DRL6Parser` with this new parser.
2. We will detect issues in the new parser by running the existing tests in drools repo. If we find any issues, we will fix them in the new parser and add new tests to cover them. Such tests would be more or less Descr comparison tests, so we would add a new test class which is similar to `MiscDRLParserTest`.

### Refactoring candidates
- `DRLParserHelper` and `DRLParserWrapper` have some duplicated code and purpose. We can merge them into one class.
- `MiscDRLParserTest` can be cleaner and fixed to align with SonarLint suggestions.
- Constraint related parser rules after `conditionalOrExpression` are written in antlr3 style. They could be refactored to antlr4 style (like `lhsExpression`).

### Development tips
- IntelliJ IDEA has an ANTLR4 plugin, which "ANTLR Preview" window displays a parse tree. It is very useful to debug the parser rules.

### Resources
[The Definitive ANTLR 4 Reference](https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-reference/)
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
* Collect errors while parsing DRL
*/
public class DRLErrorListener extends BaseErrorListener {

private final List<DRLParserError> errors = new ArrayList<>();
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
package org.drools.parser;

/**
* Error information while parsing DRL
*/
public class DRLParserError {

private int lineNumber;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,18 @@
import org.antlr.v4.runtime.tree.TerminalNode;
import org.drools.drl.ast.descr.PackageDescr;

/**
* Collection of static helper methods for DRLParser
*/
public class DRLParserHelper {

private DRLParserHelper() {
}

/**
* Entry point for parsing DRL.
* Unlike DRLParserWrapper.parse(), this method does not collect errors.
*/
public static PackageDescr parse(String drl) {
DRLParser drlParser = createDrlParser(drl);
return compilationUnitContext2PackageDescr(drlParser.compilationUnit(), drlParser.getTokenStream());
Expand All @@ -27,6 +34,9 @@ public static DRLParser createDrlParser(String drl) {
return new DRLParser(commonTokenStream);
}

/**
* DRLVisitorImpl visits a parse tree and creates a PackageDescr
*/
public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.CompilationUnitContext ctx, TokenStream tokenStream) {
DRLVisitorImpl visitor = new DRLVisitorImpl(tokenStream);
Object descr = visitor.visit(ctx);
Expand All @@ -37,6 +47,9 @@ public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.Compila
}
}

/**
* Given a row and column of the input DRL, return the index of the matched token
*/
public static Integer computeTokenIndex(DRLParser parser, int row, int col) {
for (int i = 0; i < parser.getInputStream().size(); i++) {
Token token = parser.getInputStream().get(i);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,18 @@

import static org.drools.parser.DRLParserHelper.compilationUnitContext2PackageDescr;

/**
* Wrapper for DRLParser. Somewhat duplicated from DRLParserHelper, but this class is instantiated and holds errors.
*/
public class DRLParserWrapper {

private static final Logger LOGGER = LoggerFactory.getLogger(DRLParserWrapper.class);

private final List<DRLParserError> errors = new ArrayList<>();

/**
* Main entry point for parsing DRL
*/
public PackageDescr parse(String drl) {
DRLParser drlParser = DRLParserHelper.createDrlParser(drl);
DRLErrorListener errorListener = new DRLErrorListener();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,12 @@
import static org.drools.parser.ParserStringUtils.trimThen;
import static org.drools.util.StringUtils.unescapeJava;

/**
* Visitor implementation for DRLParser.
* Basically, each visit method creates and returns a Descr object traversing the parse tree.
* Finally, visitCompilationUnit() returns a PackageDescr object.
* Try not to depend on DRLVisitorImpl's internal state for clean maintainability
*/
public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> {

private final TokenStream tokenStream;
Expand All @@ -56,6 +62,9 @@ public DRLVisitorImpl(TokenStream tokenStream) {
this.tokenStream = tokenStream;
}

/**
* Main entry point for creating PackageDescr from a parser tree.
*/
@Override
public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) {
PackageDescr packageDescr = new PackageDescr();
Expand All @@ -67,7 +76,11 @@ public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) {
return packageDescr;
}

/**
* Add all children Descr to PackageDescr
*/
private void applyChildrenDescrs(PackageDescr packageDescr, List<BaseDescr> descrList) {
// This bunch of if-blocks will be refactored by DROOLS-7564
descrList.forEach(descr -> {
if (descr instanceof UnitDescr) {
packageDescr.setUnit((UnitDescr) descr);
Expand Down Expand Up @@ -147,6 +160,8 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) {
functionDescr.setReturnType("void");
}
functionDescr.setName(ctx.IDENTIFIER().getText());

// add function parameters
DRLParser.FormalParametersContext formalParametersContext = ctx.formalParameters();
DRLParser.FormalParameterListContext formalParameterListContext = formalParametersContext.formalParameterList();
if (formalParameterListContext != null) {
Expand All @@ -163,7 +178,7 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) {

@Override
public BaseDescr visitDeclaredef(DRLParser.DeclaredefContext ctx) {
return visitDescrChildren(ctx).get(0);
return visitDescrChildren(ctx).get(0); // only one child
}

@Override
Expand Down Expand Up @@ -202,6 +217,9 @@ public WindowDeclarationDescr visitWindowDeclaration(DRLParser.WindowDeclaration
return windowDeclarationDescr;
}

/**
* entry point for one rule
*/
@Override
public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) {
RuleDescr ruleDescr = new RuleDescr(safeStripStringDelimiters(ctx.name.getText()));
Expand All @@ -228,16 +246,27 @@ public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) {

if (ctx.rhs() != null) {
ruleDescr.setConsequenceLocation(ctx.rhs().getStart().getLine(), ctx.rhs().getStart().getCharPositionInLine()); // location of "then"
ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs())));
ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); // RHS is just a text
}

return ruleDescr;
}

private void slimLhsRootDescr(AndDescr root) {
// Root Descr is always AndDescr.
// For example, if there are nested AndDescr like
// AndDescr
// /\
// P AndDescr
// /\
// P P
// is slimmed down to
// AndDescr
// / | \
// P P P
List<BaseDescr> descrList = new ArrayList<>(root.getDescrs());
root.getDescrs().clear();
descrList.forEach(root::addOrMerge); // This slims down nested AndDescr
descrList.forEach(root::addOrMerge);
}

@Override
Expand Down Expand Up @@ -373,6 +402,9 @@ public AttributeDescr visitDurationAttribute(DRLParser.DurationAttributeContext
return attributeDescr;
}

/**
* entry point for LHS
*/
@Override
public List<BaseDescr> visitLhs(DRLParser.LhsContext ctx) {
if (ctx.lhsExpression() != null) {
Expand All @@ -394,10 +426,12 @@ public BaseDescr visitLhsPatternBind(DRLParser.LhsPatternBindContext ctx) {
}

private PatternDescr getSinglePatternDescr(DRLParser.LhsPatternBindContext ctx) {
Optional<BaseDescr> optPatternDescr = visitFirstDescrChild(ctx);
PatternDescr patternDescr = optPatternDescr.filter(PatternDescr.class::isInstance)
.map(PatternDescr.class::cast)
.orElseThrow(() -> new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText()));
List<BaseDescr> patternDescrList = visitDescrChildren(ctx);
if (patternDescrList.isEmpty() || !(patternDescrList.get(0) instanceof PatternDescr)) {
throw new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText());
}
PatternDescr patternDescr = (PatternDescr)patternDescrList.get(0);

if (ctx.label() != null) {
patternDescr.setIdentifier(ctx.label().IDENTIFIER().getText());
} else if (ctx.unif() != null) {
Expand All @@ -423,6 +457,9 @@ private OrDescr getOrDescrWithMultiplePatternDescr(DRLParser.LhsPatternBindConte
return orDescr;
}

/**
* entry point for a Pattern
*/
@Override
public PatternDescr visitLhsPattern(DRLParser.LhsPatternContext ctx) {
PatternDescr patternDescr = new PatternDescr(ctx.objectType.getText());
Expand Down Expand Up @@ -528,13 +565,19 @@ public EntryPointDescr visitFromEntryPoint(DRLParser.FromEntryPointContext ctx)
return new EntryPointDescr(safeStripStringDelimiters(ctx.stringId().getText()));
}

/**
* Collect constraints in a Pattern
*/
@Override
public List<ExprConstraintDescr> visitConstraints(DRLParser.ConstraintsContext ctx) {
List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>();
populateExprConstraintDescrList(ctx, exprConstraintDescrList);
return exprConstraintDescrList;
}

/**
* Collect constraints in a Pattern. Positional constraints comes first with semicolon.
*/
private List<ExprConstraintDescr> visitConstraints(DRLParser.PositionalConstraintsContext positionalCtx, DRLParser.ConstraintsContext ctx) {
List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>();
populateExprConstraintDescrList(positionalCtx, exprConstraintDescrList);
Expand All @@ -557,6 +600,9 @@ private void populateExprConstraintDescrList(ParserRuleContext ctx, List<ExprCon
}
}

/**
* Takes one constraint as String and create ExprConstraintDescr
*/
@Override
public ExprConstraintDescr visitConstraint(DRLParser.ConstraintContext ctx) {
String constraint = visitConstraintChildren(ctx);
Expand Down Expand Up @@ -618,6 +664,7 @@ public EvalDescr visitLhsEval(DRLParser.LhsEvalContext ctx) {

@Override
public BaseDescr visitLhsExpressionEnclosed(DRLParser.LhsExpressionEnclosedContext ctx) {
// enclosed expression is simply stripped because Descr itself is encapsulated
return (BaseDescr) visit(ctx.lhsExpression());
}

Expand Down Expand Up @@ -708,7 +755,7 @@ public BaseDescr visitLhsAndDef(DRLParser.LhsAndDefContext ctx) {

@Override
public BaseDescr visitLhsUnary(DRLParser.LhsUnaryContext ctx) {
return visitDescrChildren(ctx).get(0);
return visitDescrChildren(ctx).get(0); // lhsUnary has only one child
}

private void populateStartEnd(BaseDescr descr, ParserRuleContext ctx) {
Expand All @@ -718,6 +765,10 @@ private void populateStartEnd(BaseDescr descr, ParserRuleContext ctx) {
descr.setEndCharacter(ctx.getStop().getStopIndex());
}

/**
* This is a special version of visitChildren().
* This collects children BaseDescr objects and returns them as a list.
*/
private List<BaseDescr> visitDescrChildren(RuleNode node) {
List<BaseDescr> aggregator = new ArrayList<>();
int n = node.getChildCount();
Expand Down Expand Up @@ -768,21 +819,10 @@ public ParseTree getNode() {
}
}

// leaves of constraint concatenate return Strings
/**
* Return the text of constraint as-is
*/
private String visitConstraintChildren(ParserRuleContext ctx) {
return getTokenTextPreservingWhitespace(ctx, tokenStream);
}

private Optional<BaseDescr> visitFirstDescrChild(RuleNode node) {
int n = node.getChildCount();

for (int i = 0; i < n && this.shouldVisitNextChild(node, null); ++i) {
ParseTree c = node.getChild(i);
Object childResult = c.accept(this);
if (childResult instanceof BaseDescr) {
return Optional.of((BaseDescr) childResult);
}
}
return Optional.empty();
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,17 @@
import org.antlr.v4.runtime.misc.Interval;

/**
* will be merged in drools-util
* Collection of String utilities used by DRLParser.
* This may be merged in drools-util
*/
public class ParserStringUtils {

private ParserStringUtils() {
}

/**
* Strip string delimiters (e.g. "foo" -> foo)
*/
public static String safeStripStringDelimiters(String value) {
if (value != null) {
value = value.trim();
Expand All @@ -22,6 +26,9 @@ public static String safeStripStringDelimiters(String value) {
return value;
}

/**
* Get text from ParserRuleContext's CharStream without trimming whitespace
*/
public static String getTextPreservingWhitespace(ParserRuleContext ctx) {
// Using raw CharStream
int startIndex = ctx.start.getStartIndex();
Expand All @@ -34,11 +41,18 @@ public static String getTextPreservingWhitespace(ParserRuleContext ctx) {
return ctx.start.getTokenSource().getInputStream().getText(interval);
}

/**
* Get text from ParserRuleContext's CharStream without trimming whitespace
* tokenStream is required to get hidden channel token (e.g. whitespace).
* Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString
*/
public static String getTokenTextPreservingWhitespace(ParserRuleContext ctx, TokenStream tokenStream) {
// tokenStream is required to get hidden channel token (e.g. whitespace). Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString
return tokenStream.getText(ctx.start, ctx.stop);
}

/**
* Just remove leading "then"
*/
public static String trimThen(String rhs) {
if (rhs.startsWith("then")) {
return rhs.substring("then".length());
Expand Down

0 comments on commit cdd9d3c

Please sign in to comment.