From 70ac76146ce6b1562ba2de4ff50033529bae2890 Mon Sep 17 00:00:00 2001 From: yui-knk Date: Fri, 20 Sep 2024 00:09:20 +0900 Subject: [PATCH] Document for compressed state table --- .../compressed_state_table/main.md | 637 ++++++++++++++++++ .../compressed_state_table/parse.output | 174 +++++ .../compressed_state_table/parse.y | 22 + .../compressed_state_table/parser.rb | 282 ++++++++ 4 files changed, 1115 insertions(+) create mode 100644 doc/development/compressed_state_table/main.md create mode 100644 doc/development/compressed_state_table/parse.output create mode 100644 doc/development/compressed_state_table/parse.y create mode 100644 doc/development/compressed_state_table/parser.rb diff --git a/doc/development/compressed_state_table/main.md b/doc/development/compressed_state_table/main.md new file mode 100644 index 00000000..243ac6b4 --- /dev/null +++ b/doc/development/compressed_state_table/main.md @@ -0,0 +1,637 @@ +# Compressed State Table + +LR parser generates two large tables, action table and GOTO table. +Action table is a matrix of states and tokens. Each cell of action table indicates next action (shift, reduce, accept and error). +GOTO table is a matrix of states and nonterminal symbols. Each cell of GOTO table indicates next state. 
+ +Action table of "parse.y": + +| |EOF| LF|NUM|'+'|'*'|'('|')'| +|--------|--:|--:|--:|--:|--:|--:|--:| +|State 0| r1| | s1| | | s2| | +|State 1| r3| r3| r3| r3| r3| r3| r3| +|State 2| | | s1| | | s2| | +|State 3| s6| | | | | | | +|State 4| | s7| | s8| s9| | | +|State 5| | | | s8| s9| |s10| +|State 6|acc|acc|acc|acc|acc|acc|acc| +|State 7| r2| r2| r2| r2| r2| r2| r2| +|State 8| | | s1| | | s2| | +|State 9| | | s1| | | s2| | +|State 10| r6| r6| r6| r6| r6| r6| r6| +|State 11| | r4| | r4| s9| | r4| +|State 12| | r5| | r5| r5| | r5| + +GOTO table of "parse.y": + +| |$accept|program|expr| +|--------|------:|------:|---:| +|State 0| | g3| g4| +|State 1| | | | +|State 2| | | g5| +|State 3| | | | +|State 4| | | | +|State 5| | | | +|State 6| | | | +|State 7| | | | +|State 8| | | g11| +|State 9| | | g12| +|State 10| | | | +|State 11| | | | +|State 12| | | | + + +Both action table and GOTO table are sparse. Therefore LR parser generator compresses both tables and creates these tables. + +* `yypact` & `yypgoto` +* `yytable` +* `yycheck` +* `yydefact` & `yydefgoto` + +See also: https://speakerdeck.com/yui_knk/what-is-expected?slide=52 + +## Introduction to major tables + +### `yypact` & `yypgoto` + +`yypact` specifies offset on `yytable` for the current state. +As an optimization, `yypact` also specifies default reduce action for some states. +Accessing the value by `state`. For example, + +```ruby +offset = yypact[state] +``` + +If the value is `YYPACT_NINF` (Negative INFinity), it means execution of default reduce action. +Otherwise the value is an offset in `yytable`. + +`yypgoto` plays the same role as `yypact`. +But `yypgoto` is used for GOTO table. +Then its index is nonterminal symbol id. +Especially `yypgoto` is used when reduce happens. + +```ruby +rule_for_reduce = rules[rule_id] + +# lhs_id holds LHS nonterminal id of the rule used for reduce. 
+lhs_id = rule_for_reduce.lhs.id + +offset = yypgoto[lhs_id] + +# Validate access to yytable +if yycheck[offset + state] == state + next_state = yytable[offset + state] +end +``` + +### `yytable` + +`yytable` is a mixture of action table and GOTO table. + +#### For action table + +For action table, `yytable` specifies what to do in the current state. + +A positive number means shift and specifies the next state. +For example, `yytable[yyn] == 1` means shift and the next state is State 1. + +`YYTABLE_NINF` (Negative INFinity) means syntax error. +For example, `yytable[yyn] == YYTABLE_NINF` means syntax error. + +Any other negative number, or zero, means reduce with the rule whose number is the negation of the value. +For example, `yytable[yyn] == -1` means reduce with Rule 1. + +#### For GOTO table + +For GOTO table, `yytable` specifies the next state for the given LHS nonterminal. + +The value is always a positive number, which is the next state id. +It never becomes `YYTABLE_NINF`. + +### `yycheck` + +`yycheck` validates accesses to `yytable`. + +Each row of the action table and GOTO table is placed into a single array, `yytable`. +Consider the case where the action table has only two states. +In this case, if the second array is shifted to the right, they can be merged into one array without conflict. + +```ruby +[ + [ 'a', 'b', , , 'e'], # State 0 + [ , 'B', 'C', , 'E'], # State 1 +] + +# => Shift the second array to the right + +[ + [ 'a', 'b', , , 'e'], # State 0 + [ , 'B', 'C', , 'E'], # State 1 +] + +# => Merge them into single array + +yytable = [ + 'a', 'b', 'B', 'C', 'e', 'E' +] +``` + +`yypact` is an array holding the offset of each state. + +```ruby +yypact = [ + 0, # State 0 is not shifted + 1 # State 1 is shifted one to right +] +``` + +We can access the value of `state1[2]` by consulting `yypact`. + +```ruby +yytable[yypact[1] + 2] +# => yytable[1 + 2] +# => 'C' +``` + +However, this approach does not work when accessing an empty (nil) cell such as `state1[3]`, +because the lookup lands on the value of `state0[4]` instead. 
+ +```ruby +yytable[yypact[1] + 3] +# => yytable[1 + 3] +# => 'e' +``` + +This is why `yycheck` is needed. +`yycheck` stores valid indexes of the original table. +In the current example: + +* 0, 1 and 4 are valid index of State 0 +* 1, 2 and 4 are valid index of State 1 + +`yycheck` stores these indexes with same offset with `yytable`. + +```ruby +# yytable +[ + [ 'a', 'b', , , 'e'], # State 0 + [ , 'B', 'C', , 'E'], # State 1 +] + +yytable = [ + 'a', 'b', 'B', 'C', 'e', 'E' +] + +# yycheck +[ + [ 0, 1, , , 4], # State 0 + [ , 1, 2, , 4], # State 1 +] + +yycheck = [ + 0, 1, 1, 2, 4, 4 +] +``` + +We can validate accesses to `yytable` by consulting `yycheck`. +`yycheck` stores valid indexes in the original arrays then validation is comparing `yycheck[index_for_yytable]` and `index_for_the_state`. +The access is valid if both values are same. + +```ruby +# Validate an access to state1[2] +yycheck[yypact[1] + 2] == 2 +# => yycheck[1 + 2] == 2 +# => 2 == 2 +# => true (valid) + +# Validate an access to state1[3] +yycheck[yypact[1] + 3] == 3 +# => yycheck[1 + 3] == 3 +# => 4 == 3 +# => false (invalid) +``` + +### `yydefact` & `yydefgoto` + +`yydefact` stores rule id of default actions for each state. +`0` means syntax error, other number means reduce using Rule N. + +```ruby +rule_id = yydefact[state] +# => 0 means syntax error, other number means reduce using Rule whose id is `rule_id` +``` + +`yydefgoto` stores default GOTOs for each nonterminal. +The number means next state. + +```ruby +next_state = yydefgoto[lhs_id] +# => Next state id is `next_state` +``` + +## Example + +Take a look at compressed tables of "parse.y". +See "parse.output" for detailed information of symbols and states. 
+ +### `yytable` + +Original action table and GOTO table look like: + +```ruby +# Action table is a matrix of terminals * states +[ +# [ EOF, error, undef, LF, NUM, '+', '*', '(', ')'] (default reduce) + [ , , , , s1, , , s2, ], # State 0 (r1) + [ , , , , , , , , ], # State 1 (r3) + [ , , , , s1, , , s2, ], # State 2 () + [ s6, , , , , , , , ], # State 3 () + [ , , , s7, , s8, s9, , ], # State 4 () + [ , , , , , s8, s9, , s10], # State 5 () + [ , , , , , , , , ], # State 6 (accept) + [ , , , , , , , , ], # State 7 (r2) + [ , , , , s1, , , s2, ], # State 8 () + [ , , , , s1, , , s2, ], # State 9 () + [ , , , , , , , , ], # State 10 (r6) + [ , , , , , , s9, , ], # State 11 (r4) + [ , , , , , , , , ], # State 12 (r5) +] + +# GOTO table is a matrix of states * nonterminals +[ +# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) + [ , , , , , , , , , , , , ], # $accept (g0) + [ g3, , , , , , , , , , , , ], # program (g3) + [ g4, , g5, , , , , , g11, g12, , , ], # expr (g4) +] + +# => Remove default goto + +[ +# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto) + [ , , , , , , , , , , , , ], # $accept (g0) + [ , , , , , , , , , , , , ], # program (g3) + [ , , g5, , , , , , g11, g12, , , ], # expr (g4) +] +``` + +These are compressed to `yytable` like below. +If offset equals to `YYPACT_NINF`, the line has only default value then the line can be ignored (commented out in this example). 
+ +```ruby +[ +# Action table +# (offset, YYPACT_NINF = -4) + [ , , , , s1, , , s2, ], # State 0 ( 6) +# [ , , , , , , , , ], # State 1 (-4) + [ , , , , s1, , , s2, ], # State 2 ( 6) + [ s6, , , , , , , , ], # State 3 ( 1) + [ , , , s7, , s8, s9, , ], # State 4 (-1) + [ , , , , , s8, s9, , s10], # State 5 ( 3) +# [ , , , , , , , , ], # State 6 (-4) +# [ , , , , , , , , ], # State 7 (-4) + [ , , , , s1, , , s2, ], # State 8 ( 6) + [ , , , , s1, , , s2, ], # State 9 ( 6) +# [ , , , , , , , , ], # State 10 (-4) +[ , , , , , , s9, , ], # State 11 (-3) +# [ , , , , , , , , ], # State 12 (-4) + +# GOTO table +# [ , , , , , , , , , , , , ], # $accept (-4) +# [ , , , , , , , , , , , , ], # program (-4) + [ , , g5, , , , , , g11, g12, , , ], # expr (-2) +] + +# => compressed into single array +[ , , , g5, s6, s7, s9, s8, s9, g11, g12, s8, s9, s1, s10, , s2, ] + +# => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0 +# This is `yytable` + [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2] +``` + +`YYTABLE_NINF` is the minimum negative number. +In this case, `0` is the minimum offset number then `YYTABLE_NINF` is `-1`. 
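The packing shown above can be reproduced with a short script. This is only a minimal sketch, not Lrama's or Bison's actual implementation: `ROWS` and `pack` are hypothetical names, the offsets are copied by hand from this example rather than computed, and identical rows (States 0, 2, 8 and 9) are listed once because they share the same offset.

```ruby
# Sketch: place sparse rows into one array at the offsets used in this example.
# ROWS and pack are hypothetical; each entry is [offset, { column => target }].
YYLAST = 13

ROWS = [
  [ 6, { 4 => 1, 7 => 2 }],           # States 0, 2, 8 and 9 (identical rows)
  [ 1, { 0 => 6 }],                   # State 3
  [-1, { 3 => 7, 5 => 8, 6 => 9 }],   # State 4
  [ 3, { 5 => 8, 6 => 9, 8 => 10 }],  # State 5
  [-3, { 6 => 9 }],                   # State 11
  [-2, { 2 => 5, 8 => 11, 9 => 12 }], # expr (GOTO row, columns are state ids)
]

def pack(rows, last)
  table = Array.new(last + 1, 0)   # blank cells are filled with 0
  check = Array.new(last + 1, -1)  # -1 never matches a real column index

  rows.each do |offset, cells|
    cells.each do |column, target|
      idx = offset + column
      raise "index #{idx} is out of range" if idx < 0 || last < idx
      raise "conflict at index #{idx}" unless check[idx] == -1

      table[idx] = target  # the value stored in yytable
      check[idx] = column  # the original column index, stored in yycheck
    end
  end

  [table, check]
end

YYTABLE, YYCHECK = pack(ROWS, YYLAST)
p YYTABLE # => [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
p YYCHECK # => [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
```

The `conflict` check is what makes the chosen offsets valid: no two rows may claim the same index of the packed array.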
+ +### `yycheck` + +```ruby +[ +# Action table valid indexes +# (offset, YYPACT_NINF = -4) + [ , , , , 4, , , 7, ], # State 0 ( 6) +# [ , , , , , , , , ], # State 1 (-4) + [ , , , , 4, , , 7, ], # State 2 ( 6) + [ 0, , , , , , , , ], # State 3 ( 1) + [ , , , 3, , 5, 6, , ], # State 4 (-1) + [ , , , , , 5, 6, , 8], # State 5 ( 3) +# [ , , , , , , , , ], # State 6 (-4) +# [ , , , , , , , , ], # State 7 (-4) + [ , , , , 4, , , 7, ], # State 8 ( 6) + [ , , , , 4, , , 7, ], # State 9 ( 6) +# [ , , , , , , , , ], # State 10 (-4) +[ , , , , , , 6, , ], # State 11 (-3) +# [ , , , , , , , , ], # State 12 (-4) + +# GOTO table valid indexes +# [ , , , , , , , , , , , , ], # $accept (-4) +# [ , , , , , , , , , , , , ], # program (-4) + [ , , 2, , , , , , 8, 9, , , ], # expr (-2) +] + +# => compressed into single array +[ , , , 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, , 7, ] + +# => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails +# This is `yycheck` + [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7] +``` + +### `yypact` & `yypgoto` + +`yypact` & `yypgoto` are mixture of offset in `yytable` and `YYPACT_NINF` (default reduce action). +Index in `yypact` is state id and index in `yypgoto` is nonterminal symbol id. +`YYPACT_NINF` is the minimum negative number. +In this case, `-3` is the minimum offset number then `YYPACT_NINF` is `-4`. + +```ruby +YYPACT_NINF = -4 + +yypact = [ +# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) + 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4 +] + +yypgoto = [ +# $accept, program, expr + -4, -4, -2 +] +``` + +### `yydefact` & `yydefgoto` + +`yydefact` & `yydefgoto` store default value. + +`yydefact` specifies rule id of default actions of the state. +Because `0` is reserved for syntax error, Rule id starts with 1. 
+ +``` +# In "parse.output" +Grammar + + 0 $accept: program "end of file" + + 1 program: ε + 2 | expr LF + + 3 expr: NUM + 4 | expr '+' expr + 5 | expr '*' expr + 6 | '(' expr ')' + +# => + +# In `yydefact` +Grammar + + 0 Syntax Error + + 1 $accept: program "end of file" + + 2 program: ε + 3 | expr LF + + 4 expr: NUM + 5 | expr '+' expr + 6 | expr '*' expr + 7 | '(' expr ')' +``` + +For example, the default action for State 1 is 4 (`yydefact[1] == 4`). +This corresponds to Rule 3 (`3 expr: NUM`) in the "parse.output" file. + +`yydefgoto` specifies the default next state id for each nonterminal. + +```ruby +yydefact = [ +# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No) + 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6 +] + +yydefgoto = [ +# $accept, program, expr + 0, 3, 4 +] +``` + +### `yyr1` & `yyr2` + +Both of them are tables for rules. +`yyr1` specifies the nonterminal symbol id of the rule's Left-Hand-Side. +`yyr2` specifies the length of the rule, that is, the number of symbols on the rule's Right-Hand-Side. +Index 0 is not used because Rule id starts with 1. + +```ruby +yyr1 = [ +# 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) +# no rule, $accept, program, program, expr, expr, expr, expr (LHS symbol id) + 0, 9, 10, 10, 11, 11, 11, 11 +] + +yyr2 = [ +# 0, 1, 2, 3, 4, 5, 6, 7 (Rule id) + 0, 2, 0, 2, 1, 3, 3, 3 +] +``` + +## How to use tables + +See also "parser.rb", which implements an LALR parser based on the "parse.y" file. 
+ +First, define the important constants and arrays: + +```ruby +YYNTOKENS = 9 + +# The last index of yytable and yycheck +# The lengths of yytable and yycheck are always the same +YYLAST = 13 +YYTABLE_NINF = -1 +yytable = [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2] +yycheck = [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7] + +YYPACT_NINF = -4 +yypact = [ 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] +yypgoto = [ -4, -4, -2] + +yydefact = [ 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6] +yydefgoto = [ 0, 3, 4] + +yyr1 = [ 0, 9, 10, 10, 11, 11, 11, 11] +yyr2 = [ 0, 2, 0, 2, 1, 3, 3, 3] +``` + +### Determine what to do next + +Determine what to do next based on the current state (`state`) and the next token (`yytoken`). + +The first step in deciding the action is to look up the `yypact` table by the current state. +If only a default reduce exists for the current state, `yypact` returns `YYPACT_NINF`. + +```ruby +# Case 1: Only default reduce exists for the state +# +# State 7 +# +# 2 program: expr LF • +# +# $default reduce using rule 2 (program) + +state = 7 +yytoken = nil # Do not use yytoken in this case + +offset = yypact[state] # -4 +if offset == YYPACT_NINF # true + next_action = :yydefault + return +end +``` + +If both shift and default reduce exist for the current state, `yypact` returns an offset in `yytable`. +The index is the sum of `offset` and `yytoken`. +The index must be validated by consulting `yycheck` before accessing `yytable`. +The index can be out of range because blank cells at the head and tail are cut off; see how `yycheck` is constructed in the example above. +Therefore we also need to check that the index is not less than 0 and not greater than `YYLAST`. 
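The checks described above can be collected into one lookup function. The following is a minimal sketch under stated assumptions: `next_action` and `default_action` are hypothetical helper names that do not appear in "parser.rb", and the lookahead token is passed in as an argument instead of being read lazily from the lexer.

```ruby
# Tables copied from this document.
YYLAST       = 13
YYTABLE_NINF = -1
YYTABLE      = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYCHECK      = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
YYPACT_NINF  = -4
YYPACT       = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4]
YYDEFACT     = [2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6]

# Hypothetical helper: the default action of a state, taken from yydefact.
def default_action(state)
  rule = YYDEFACT[state]
  rule == 0 ? [:syntax_error] : [:reduce, rule]
end

# Hypothetical helper: the action for `state` when the lookahead is `yytoken`.
def next_action(state, yytoken)
  offset = YYPACT[state]
  return default_action(state) if offset == YYPACT_NINF

  idx = offset + yytoken
  return default_action(state) if idx < 0 || YYLAST < idx
  return default_action(state) if YYCHECK[idx] != yytoken

  act = YYTABLE[idx]
  return [:syntax_error] if act == YYTABLE_NINF

  act > 0 ? [:shift, act] : [:reduce, -act]
end

p next_action(11, 6) # '*' in State 11 => [:shift, 9]
p next_action(11, 5) # '+' in State 11 => [:reduce, 5]
p next_action(2, 3)  # LF  in State 2  => [:syntax_error]
```

Note that the rule ids returned here use the `yydefact` numbering, so `[:reduce, 5]` is Rule 4 (`expr: expr '+' expr`) in "parse.output".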
+ +```ruby +# Case 2: Both shift and default reduce exist for the state +# +# State 11 +# +# 4 expr: expr • '+' expr +# 4 | expr '+' expr • [LF, '+', ')'] +# 5 | expr • '*' expr +# +# '*' shift, and go to state 9 +# +# $default reduce using rule 4 (expr) + +# Next token is '*', so shift it +state = 11 +yytoken = nil + +offset = yypact[state] # -3 +if offset == YYPACT_NINF # false + next_action = :yydefault + break +end + +unless yytoken + yytoken = yylex() # yylex returns 6 ('*') +end + +idx = offset + yytoken # 3 +if idx < 0 || YYLAST < idx # false + next_action = :yydefault + break +end +if yycheck[idx] != yytoken # false + next_action = :yydefault + break +end + +act = yytable[idx] # 9 +if act == YYTABLE_NINF # false + next_action = :syntax_error + break +end +if act > 0 # true + # Shift + next_action = :yyshift + break +else + # Reduce + next_action = :yyreduce + break +end +``` + +### Execute (default) reduce + +Once the next action is decided to be the default reduce, we need to determine + +1. the rule to be applied +2. the next state from GOTO table + +The rule id for the default reduce is stored in `yydefact`. +`0` in `yydefact` means syntax error, so we need to check that the value is not `0` before continuing. + +Once the rule is determined, the length of the rule can be obtained from `yyr2` and the LHS nonterminal can be obtained from `yyr1`. + +The next state is determined by the LHS nonterminal and the state on top of the stack after the reduce. +The GOTO table is also compressed into `yytable`, so the process to decide the next state is similar to the one for `yypact`. + +1. Look up `yypgoto` by LHS nonterminal. Note `yypact` is indexed by state but `yypgoto` is indexed by nonterminal. +2. Check whether the value in `yypgoto` is `YYPACT_NINF` or not. +3. Check whether the index, the sum of offset and state, is out of range or not. +4. Check `yycheck` table before accessing `yytable`. + +Finally push the state to the stack. 
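The four lookup steps above can be sketched as a single function and checked against every GOTO entry listed in "parse.output". This is an illustration only: `goto_state`, `PROGRAM` and `EXPR` are hypothetical names introduced here, and the ids are nonterminal ids after subtracting `YYNTOKENS`.

```ruby
# Tables copied from this document.
YYLAST      = 13
YYTABLE     = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYCHECK     = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
YYPACT_NINF = -4
YYPGOTO     = [-4, -4, -2]
YYDEFGOTO   = [0, 3, 4]

# Hypothetical nonterminal ids (symbol id - YYNTOKENS).
PROGRAM = 1
EXPR    = 2

# Hypothetical helper: next state after reducing to `lhs_nterm_id`
# while `state` is on top of the stack.
def goto_state(lhs_nterm_id, state)
  offset = YYPGOTO[lhs_nterm_id]
  # Step 2: only the default GOTO exists for this nonterminal
  return YYDEFGOTO[lhs_nterm_id] if offset == YYPACT_NINF

  idx = offset + state
  # Step 3: the index falls outside of yytable
  return YYDEFGOTO[lhs_nterm_id] if idx < 0 || YYLAST < idx
  # Step 4: the cell belongs to another row
  return YYDEFGOTO[lhs_nterm_id] if YYCHECK[idx] != state

  YYTABLE[idx]
end

p goto_state(PROGRAM, 0) # => 3
p goto_state(EXPR, 0)    # => 4 (default GOTO)
p goto_state(EXPR, 2)    # => 5
p goto_state(EXPR, 8)    # => 11
p goto_state(EXPR, 9)    # => 12
```

Only `goto_state(EXPR, 0)` and `goto_state(PROGRAM, 0)` fall back to `yydefgoto`; the other three transitions are read from `yytable`.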
+ +```ruby +# State 11 +# +# 4 expr: expr • '+' expr +# 4 | expr '+' expr • [LF, '+', ')'] +# 5 | expr • '*' expr +# +# '*' shift, and go to state 9 +# +# $default reduce using rule 4 (expr) + +# Input is "1 + 2 + 3 LF" and next token is the second '+'. +# Current state stack is `[0, 4, 8, 11]`. +# What to do next is reduce with default action. +state = 11 +yytoken = 5 # '+' + +rule = yydefact[state] # 5 +if rule == 0 # false + next_action = :syntax_error + break +end + +rhs_length = yyr2[rule] # 3. Because rule 5 here (rule 4 in "parse.output") is "expr: expr '+' expr" +lhs_nterm = yyr1[rule] # 11 (expr) +lhs_nterm_id = lhs_nterm - YYNTOKENS # 11 - 9 = 2 + +case rule +when 1 + # Execute Rule 1 action +when 2 + # Execute Rule 2 action +#... +when 7 + # Execute Rule 7 action +end + +stack.pop(rhs_length) # state stack: `[0, 4, 8, 11]` -> `[0]` +state = stack[-1] # state = 0 + +offset = yypgoto[lhs_nterm_id] # -2 +if offset == YYPACT_NINF # false + state = yydefgoto[lhs_nterm_id] +else + idx = offset + state # -2 + if idx < 0 || YYLAST < idx # true + state = yydefgoto[lhs_nterm_id] # 4 + elsif yycheck[idx] != state + state = yydefgoto[lhs_nterm_id] + else + state = yytable[idx] + end +end + +# yyval = $$, yyloc = @$ +push_state(state, yyval, yyloc) # state stack: [0, 4] +``` diff --git a/doc/development/compressed_state_table/parse.output b/doc/development/compressed_state_table/parse.output new file mode 100644 index 00000000..02e8a2ef --- /dev/null +++ b/doc/development/compressed_state_table/parse.output @@ -0,0 +1,174 @@ +Symbol + + -2 EMPTY + 0 "end of file" + 1 error + 2 "invalid token" (undef) + 3 LF + 4 NUM + 5 '+' + 6 '*' + 7 '(' + 8 ')' + 9 $accept # Start of nonterminal + 10 program + 11 expr + + +Grammar + + 0 $accept: program "end of file" + + 1 program: ε + 2 | expr LF + + 3 expr: NUM + 4 | expr '+' expr + 5 | expr '*' expr + 6 | '(' expr ')' + + +State 0 + + 0 $accept: • program "end of file" + 1 program: ε • ["end of file"] + 2 | • expr LF + 3 expr: • NUM + 4 | • expr '+' expr + 5 | 
• expr '*' expr + 6 | • '(' expr ')' + + NUM shift, and go to state 1 + '(' shift, and go to state 2 + + $default reduce using rule 1 (program) + + program go to state 3 + expr go to state 4 + + +State 1 + + 3 expr: NUM • + + $default reduce using rule 3 (expr) + + +State 2 + + 3 expr: • NUM + 4 | • expr '+' expr + 5 | • expr '*' expr + 6 | • '(' expr ')' + 6 | '(' • expr ')' + + NUM shift, and go to state 1 + '(' shift, and go to state 2 + + expr go to state 5 + + +State 3 + + 0 $accept: program • "end of file" + + "end of file" shift, and go to state 6 + + +State 4 + + 2 program: expr • LF + 4 expr: expr • '+' expr + 5 | expr • '*' expr + + LF shift, and go to state 7 + '+' shift, and go to state 8 + '*' shift, and go to state 9 + + +State 5 + + 4 expr: expr • '+' expr + 5 | expr • '*' expr + 6 | '(' expr • ')' + + '+' shift, and go to state 8 + '*' shift, and go to state 9 + ')' shift, and go to state 10 + + +State 6 + + 0 $accept: program "end of file" • + + $default accept + + +State 7 + + 2 program: expr LF • + + $default reduce using rule 2 (program) + + +State 8 + + 3 expr: • NUM + 4 | • expr '+' expr + 4 | expr '+' • expr + 5 | • expr '*' expr + 6 | • '(' expr ')' + + NUM shift, and go to state 1 + '(' shift, and go to state 2 + + expr go to state 11 + + +State 9 + + 3 expr: • NUM + 4 | • expr '+' expr + 5 | • expr '*' expr + 5 | expr '*' • expr + 6 | • '(' expr ')' + + NUM shift, and go to state 1 + '(' shift, and go to state 2 + + expr go to state 12 + + +State 10 + + 6 expr: '(' expr ')' • + + $default reduce using rule 6 (expr) + + +State 11 + + 4 expr: expr • '+' expr + 4 | expr '+' expr • [LF, '+', ')'] + 5 | expr • '*' expr + + '*' shift, and go to state 9 + + $default reduce using rule 4 (expr) + + Conflict between rule 4 and token '+' resolved as reduce (%left '+'). + Conflict between rule 4 and token '*' resolved as shift ('+' < '*'). 
+ + +State 12 + + 4 expr: expr • '+' expr + 5 | expr • '*' expr + 5 | expr '*' expr • [LF, '+', '*', ')'] + + $default reduce using rule 5 (expr) + + Conflict between rule 5 and token '+' resolved as reduce ('+' < '*'). + Conflict between rule 5 and token '*' resolved as reduce (%left '*'). + + diff --git a/doc/development/compressed_state_table/parse.y b/doc/development/compressed_state_table/parse.y new file mode 100644 index 00000000..9ed0d71f --- /dev/null +++ b/doc/development/compressed_state_table/parse.y @@ -0,0 +1,22 @@ +%union { + int val; +} +%token LF +%token NUM +%type expr +%left '+' +%left '*' + +%% + +program : /* empty */ + | expr LF { printf("=> %d\n", $1); } + ; + +expr : NUM + | expr '+' expr { $$ = $1 + $3; } + | expr '*' expr { $$ = $1 * $3; } + | '(' expr ')' { $$ = $2; } + ; + +%% diff --git a/doc/development/compressed_state_table/parser.rb b/doc/development/compressed_state_table/parser.rb new file mode 100644 index 00000000..5f7b274c --- /dev/null +++ b/doc/development/compressed_state_table/parser.rb @@ -0,0 +1,282 @@ +class Parser + YYNTOKENS = 9 + YYLAST = 13 + YYTABLE_NINF = -1 + YYTABLE = [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2] + YYCHECK = [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7] + + YYPACT_NINF = -4 + YYPACT = [ 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4] + YYPGOTO = [ -4, -4, -2] + + YYDEFACT = [ 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6] + YYDEFGOTO = [ 0, 3, 4] + + YYR1 = [ 0, 9, 10, 10, 11, 11, 11, 11] + YYR2 = [ 0, 2, 0, 2, 1, 3, 3, 3] + + YYFINAL = 6 + + # Symbols + SYM_EMPTY = -2 + SYM_EOF = 0 # "end of file" + SYM_ERROR = 1 # error + SYM_UNDEF = 2 # Invalid Token + SYM_LF = 3 # LF + SYM_NUM = 4 # NUM + SYM_PLUS = 5 # '+' + SYM_ASTER = 6 # '*' + SYM_LPAREN = 7 # '(' + SYM_RPAREN = 8 # ')' + # Start of nonterminal + SYM_ACCEPT = 9 # $accept + SYM_PROGRAM = 10 # program + SYM_EXPR = 11 # expr + + def initialize(debug = false) + @debug = debug + end + + def parse(lexer) + state = 0 + stack = [] + yytoken = SYM_EMPTY + 
parser_action = :push_state + next_state = nil + rule = nil + + while true + _parser_action = parser_action + parser_action = nil + + case _parser_action + when :syntax_error + debug_print("Entering :syntax_error") + + return 1 + when :accept + debug_print("Entering :accept") + + return 0 + when :push_state + # Precondition: `state` is set to new state + debug_print("Entering :push_state") + + debug_print("Push state #{state}") + stack.push(state) + debug_print("Current stack #{stack}") + + if state == YYFINAL + parser_action = :accept + next + end + + parser_action = :decide_parser_action + next + when :decide_parser_action + debug_print("Entering :decide_parser_action") + + offset = yypact[state] + if offset == YYPACT_NINF + parser_action = :yydefault + next + end + + # Ensure next token + if yytoken == SYM_EMPTY + debug_print("Reading a token") + + yytoken = lexer.next_token + end + + case yytoken + when SYM_EOF + debug_print("Now at end of input.") + when SYM_ERROR + parser_action = :syntax_error + next + else + debug_print("Next token is #{yytoken}") + end + + idx = offset + yytoken + if idx < 0 || YYLAST < idx + debug_print("Decide next parser action as :yydefault") + + parser_action = :yydefault + next + end + if yycheck[idx] != yytoken + debug_print("Decide next parser action as :yydefault") + + parser_action = :yydefault + next + end + + action = yytable[idx] + if action == YYTABLE_NINF + parser_action = :syntax_error + next + end + if action > 0 + # Shift + debug_print("Decide next parser action as :yyshift") + + next_state = action + parser_action = :yyshift + next + else + # Reduce + debug_print("Decide next parser action as :yyreduce") + + rule = -action + parser_action = :yyreduce + next + end + when :yyshift + # Precondition: `next_state` is set + debug_print("Entering :yyshift") + raise "next_state is not set" unless next_state + + yytoken = SYM_EMPTY + state = next_state + next_state = nil + parser_action = :push_state + next + when :yydefault + 
debug_print("Entering :yydefault") + + rule = yydefact[state] + if rule == 0 + parser_action = :syntax_error + next + end + + parser_action = :yyreduce + next + when :yyreduce + # Precondition: `rule`, used for reduce, is set + debug_print("Entering :yyreduce") + raise "rule is not set" unless rule + + rhs_length = yyr2[rule] + lhs_nterm = yyr1[rule] + lhs_nterm_id = lhs_nterm - YYNTOKENS + + text = "Execute action for Rule (#{rule}) " + case rule + when 1 + text << "$accept: program \"end of file\"" + when 2 + text << "program: ε" + when 3 + text << "program: expr LF" + when 4 + text << "expr: NUM" + when 5 + text << "expr: expr '+' expr" + when 6 + text << "expr: expr '*' expr" + when 7 + text << "expr: '(' expr ')'" + end + debug_print(text) + + debug_print("Pop #{rhs_length} elements") + debug_print("Stack before pop: #{stack}") + stack.pop(rhs_length) + debug_print("Stack after pop: #{stack}") + state = stack[-1] + + # "Shift" LHS nonterminal + offset = yypgoto[lhs_nterm_id] + if offset == YYPACT_NINF + state = yydefgoto[lhs_nterm_id] + else + idx = offset + state + if idx < 0 || YYLAST < idx + state = yydefgoto[lhs_nterm_id] + elsif yycheck[idx] != state + state = yydefgoto[lhs_nterm_id] + else + state = yytable[idx] + end + end + + rule = nil + parser_action = :push_state + next + else + raise "Unknown parser_action: #{parser_action}" + end + end + end + + private + + def debug_print(str) + if @debug + $stderr.puts str + end + end + + def yytable + YYTABLE + end + + def yycheck + YYCHECK + end + + def yypact + YYPACT + end + + def yypgoto + YYPGOTO + end + + def yydefact + YYDEFACT + end + + def yydefgoto + YYDEFGOTO + end + + def yyr1 + YYR1 + end + + def yyr2 + YYR2 + end +end + +class Lexer + def initialize(tokens) + @tokens = tokens + @index = 0 + end + + def next_token + if @tokens.length > @index + token = @tokens[@index] + @index += 1 + return token + else + return Parser::SYM_EOF + end + end +end + +lexer = Lexer.new([ + # 1 + 2 + 3 LF + 
Parser::SYM_NUM, + Parser::SYM_PLUS, + Parser::SYM_NUM, + Parser::SYM_PLUS, + Parser::SYM_NUM, + Parser::SYM_LF, +]) +Parser.new(true).parse(lexer)