Fix for detecting left recursion in rules like: #2

Open
wants to merge 95 commits into
base: master
95 commits
d4094c8
Fix for detecting left recursion in rules like:
mingodad May 28, 2021
d890d78
Fix segfault due to bad format string parameters
mingodad Jun 1, 2021
93fc7f9
Replace instantiations using 'new' with RAII stack instances
mingodad Jun 3, 2021
375e702
Avoid unnecessary string copy/leak in 'Comment' creation
mingodad Jun 3, 2021
8bcf918
Replace instantiation with 'new' by RAII, also remove unnecessary stri…
mingodad Jun 3, 2021
35a18bf
Replace instantiation with 'new' by RAII
mingodad Jun 3, 2021
0712997
Replace multiple copies of bitwise expression by a macro
mingodad Jun 3, 2021
94860db
Remove unnecessary string copy/delete from Scanner
mingodad Jun 3, 2021
0df8d32
Hash table now makes a copy of the key to avoid dangling pointers, al…
mingodad Jun 3, 2021
8009b1c
add missing cleanup
mingodad Jun 3, 2021
2fe0330
Replace instantiation with 'new' by RAII
mingodad Jun 3, 2021
ed2ecca
Add destructor for cleanup
mingodad Jun 3, 2021
13107cc
Added cleanup
mingodad Jun 3, 2021
a2cfa19
Added cleanup
mingodad Jun 3, 2021
7ee5eda
Add cleanup
mingodad Jun 3, 2021
d608f8d
Fix several memory leaks
mingodad Jun 3, 2021
5e9f36a
Fix several memory leaks
mingodad Jun 3, 2021
fa03b86
Fix memory leaks, and change function 'DetachAction' to return an ind…
mingodad Jun 3, 2021
8f68e61
Fix memory leaks
mingodad Jun 3, 2021
2b93528
Fix several memory leaks
mingodad Jun 3, 2021
839e7de
Fix memory leaks
mingodad Jun 3, 2021
170138f
Fix several memory leaks
mingodad Jun 3, 2021
dd477f2
Cleanup and fix several memory leaks
mingodad Jun 3, 2021
eac5f1e
Convert ArrayList to a templated one for future simplifications
mingodad Jun 3, 2021
4f22dea
Add a basic AST generator based on https://github.com/rochus-keller/E…
mingodad Jun 3, 2021
8fe04c0
Convert ArrayList to TArrayList<T>
mingodad Jun 3, 2021
1750865
Add 'const' qualifier in several places
mingodad Jun 4, 2021
076d923
Replace recursive calls to 'Scanner::NextToken()' with iteration
mingodad Jun 4, 2021
2c69a1d
Allow up to 8 characters for multiline comment delimiters
mingodad Jun 4, 2021
e0a955f
Add a limited semantic action to TokenDecl to allow for example parsi…
mingodad Jun 4, 2021
42922cc
Add column info to Node and Symbol to create better diagnostics, also…
mingodad Jun 4, 2021
6b258ca
Small code reformat
mingodad Jun 4, 2021
f2e7af5
Replace constants for node kinds by enum
mingodad Jun 4, 2021
3b4c868
Initial implementation of a kind of TreeView for LL1 errors
mingodad Jun 4, 2021
a658cca
Add the token names between comments in several places to make easier…
mingodad Jun 4, 2021
d13715b
Start the refactoring to allow compile with and without wchar_t
mingodad Jun 4, 2021
28d9809
Remove several unneeded calls to 'printf' family functions
mingodad Jun 5, 2021
d795f69
Move 'ArrayList' to Scanner.frame to use in the AST (parser tree) gen…
mingodad Jun 5, 2021
88dc67f
Replace STRL and CHL by the unified _SC macro
mingodad Jun 5, 2021
079cfa1
Close to achieve build with and without wchar
mingodad Jun 5, 2021
5184aad
Fix the scanner generation to work without wchar_t
mingodad Jun 5, 2021
f1d4df6
Fix other places that can cause trouble when compiling without wchar_t
mingodad Jun 5, 2021
a262593
Fix AST generation to work with and without wchar_t
mingodad Jun 5, 2021
1fa17eb
Add the Taste example with memory leaks fixed
mingodad Jun 5, 2021
268e32c
Minor code layout fix
mingodad Jun 5, 2021
4ea34e7
Another memory leak fixed
mingodad Jun 5, 2021
31c62f3
Replace some magic numbers
mingodad Jun 6, 2021
b9359ff
Remove unnecessary function and its usages
mingodad Jun 6, 2021
ae044ac
Remove unnecessary string allocation/deallocation
mingodad Jun 6, 2021
ef40822
Remove unnecessary string allocation/deallocation
mingodad Jun 6, 2021
9dc7b76
Remove unnecessary string allocation/deallocation
mingodad Jun 6, 2021
0c739db
Remove unnecessary string allocation/deallocation
mingodad Jun 6, 2021
17e2ab3
Refactor code removing unnecessary layer that could leak memory
mingodad Jun 6, 2021
a05edfe
Fix memory leak
mingodad Jun 6, 2021
92f46df
Fix memory leak
mingodad Jun 6, 2021
87aed46
Add filename to error messages based on https://github.com/cviehb/Coc…
mingodad Jun 6, 2021
987595c
Put braces around token declaration semantic actions
mingodad Jun 6, 2021
ea0ff02
Add stub code to allow build CocoR parsers without dependency on libs…
mingodad Jun 6, 2021
110c390
Fix to cross compile on linux with mingw64 compiler
mingodad Jun 6, 2021
9e5a932
Start playing with compiling CocoR-CPP to wasm
mingodad Jun 6, 2021
72d6035
Implement the generation of an EBNF grammar understood by https://www…
mingodad Jun 8, 2021
b8c95fe
Small code change without functionality change
mingodad Jun 8, 2021
672e3c2
Add missing Taste.cpp and fixes for latest changes
mingodad Jun 9, 2021
f6cb7b2
Remove unnecessary 'while' loop because it's using 'goto' to loop ins…
mingodad Jun 9, 2021
25ec536
Add 'ANY' when generating RREBNF
mingodad Jun 9, 2021
60beabb
Reorganize the code removing duplication
mingodad Jun 10, 2021
f16cbd1
Remove unused include
mingodad Jun 10, 2021
d028315
Finally the last known memory leak is fixed
mingodad Jun 10, 2021
8e9f19f
Fix narrow signed char conversion when 'wchar_t' == 'char'
mingodad Jun 10, 2021
010462e
Add the TestSuite
mingodad Jun 10, 2021
5f1d5d3
Add an overview of my main changes
mingodad Jun 10, 2021
07c3244
Fix for possible narrow conversion when wchar_t == char
mingodad Jun 10, 2021
c9e56bf
Fix my mistake by forget to wrap a literal string used as wchar_t *
mingodad Jun 10, 2021
0c151cb
Add reference to the Java and CSharp versions
mingodad Jun 10, 2021
e6a2b21
Fix typo
mingodad Jun 10, 2021
5a04a9c
My last fix for left recursion detection didn't work for any depth,…
mingodad Jun 11, 2021
2f2beee
Fix SynTree.dump2 that is supposed to show a pruned tree
mingodad Jun 12, 2021
3ecb057
Rename SynTree::dump to SynTree::dump_all and SynTree::dump to SynTre…
mingodad Jun 14, 2021
b80f2e0
Fix to make it behave the same as the Java/CSharp version
mingodad Jul 1, 2021
ec65db3
Fix for endless loop with some ill grammars
mingodad Jul 1, 2021
530714c
Fix for when 'wchar_t' is 'char'
mingodad Jul 6, 2021
0efd1ec
Remove unused variable
mingodad Jul 6, 2021
01b226c
Add examples folder and an initial bison grammar
mingodad Jul 9, 2021
223c079
Add the suffix "_NT" to non terminal generated functions to minimize …
mingodad Aug 14, 2021
8fd041a
Add token inheritance from https://github.com/Lercher/CocoR
mingodad Aug 14, 2021
1e5715c
Add the extra features description from last commits
mingodad Aug 14, 2021
9182e47
Add column info to non terminals
mingodad Sep 4, 2021
f3b3e15
Add missing code for proper handling token inheritance
mingodad Sep 4, 2021
f3f29f5
Fix railroad EBNF generation and other fixes
mingodad Dec 25, 2021
b8b387e
Fix genRREBNF when outputting ANY
mingodad Dec 27, 2021
8bc8539
Fix trace output
mingodad Jul 11, 2022
b36a884
Change Node/Symbol type/kind to an independent header
mingodad Jul 14, 2022
84e4333
Fix memory leak
mingodad Jul 14, 2022
cde7538
Fix my mistake of calling "First" before testing, introduced here d60…
mingodad Jul 14, 2022
cd146b6
Fixes to build with https://github.com/jart/cosmopolitan
mingodad Sep 27, 2022
26 changes: 26 additions & 0 deletions README.md
@@ -3,3 +3,29 @@
Coco/R is a compiler generator, which takes an attributed grammar of a source language and generates a scanner and a parser for this language. The scanner works as a deterministic finite automaton. The parser uses recursive descent. LL(1) conflicts can be resolved by a multi-symbol lookahead or by semantic checks. Thus the class of accepted grammars is LL(k) for an arbitrary k.

http://ssw.jku.at/coco/

These are my main modifications to the original:

- Fix all known memory leaks

- Enhance left recursion detection

- Allow semantic actions on token declarations, similar to `pragmas`, but the code executes in the Scanner

- Allow up to 8 characters as comment delimiters

- Add option `-genRREBNF` to generate an EBNF grammar for creating railroad diagrams at https://www.bottlecaps.de/rr/ui

- Add option `-geAST` to generate code that builds a parser syntax tree, based on https://github.com/rochus-keller/EbnfStudio

- Add option `-ignoreGammarErrors` to make grammars easier to develop, for example by commenting out one non-terminal and still generating the parser and scanner even when several non-terminals become unreachable

- Add a `TERMINALS` section to generate user-defined tokens not managed by the Scanner (from cocoxml)

- Refactor the code to allow compiling with and without wchar_t depending on the definition of the `PARSER_WITH_AST` compiler macro

- Emit, in comments, the symbolic names corresponding to several magic numbers (mainly tokens)
- Add the `_NT` suffix to non-terminal functions to prevent name collisions
- Add token inheritance from https://github.com/Lercher/CocoR

See also https://github.com/mingodad/CocoR-Java and https://github.com/mingodad/CocoR-CSharp
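
For readers unfamiliar with what "left recursion detection for any depth" involves, here is a minimal, self-contained sketch. It is not the code from this PR: the `Grammar` and `Alternative` types and the functions `nullableSet` and `leftRecursive` are hypothetical names made up for illustration, and the grammar model is deliberately simplified (plain symbol names, no EBNF operators). The idea is to record which non-terminals can appear in leftmost position, looking past nullable symbols, and then take the transitive closure so indirect cycles such as `A -> B -> C -> A` are reported as well.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified grammar model (not Coco/R's internal one):
// each non-terminal maps to its alternatives, each a list of symbol names.
using Alternative = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Alternative>>;

// Non-terminals that can derive the empty string (fixed-point iteration).
std::set<std::string> nullableSet(const Grammar& g) {
    std::set<std::string> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : g) {
            if (nullable.count(rule.first)) continue;
            for (const auto& alt : rule.second) {
                bool allNullable = true;
                for (const auto& sym : alt)
                    if (!nullable.count(sym)) { allNullable = false; break; }
                if (allNullable) { nullable.insert(rule.first); changed = true; break; }
            }
        }
    }
    return nullable;
}

// Returns every non-terminal that can reach itself in leftmost position.
std::set<std::string> leftRecursive(const Grammar& g) {
    const std::set<std::string> nullable = nullableSet(g);
    // starts[A] = non-terminals that may appear leftmost when expanding A.
    std::map<std::string, std::set<std::string>> starts;
    for (const auto& rule : g)
        for (const auto& alt : rule.second)
            for (const auto& sym : alt) {
                if (g.count(sym)) starts[rule.first].insert(sym);
                if (!nullable.count(sym)) break;  // later symbols are not leftmost
            }
    // Transitive closure, so cycles of any depth are found.
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto& entry : starts) {
            std::set<std::string> extra;
            for (const auto& mid : entry.second) {
                auto it = starts.find(mid);
                if (it == starts.end()) continue;
                for (const auto& s : it->second)
                    if (!entry.second.count(s)) extra.insert(s);
            }
            if (!extra.empty()) {
                entry.second.insert(extra.begin(), extra.end());
                changed = true;
            }
        }
    }
    std::set<std::string> result;
    for (const auto& entry : starts)
        if (entry.second.count(entry.first)) result.insert(entry.first);
    return result;
}
```

With a grammar such as `A = B "x"`, `B = C | "y"`, `C = A "z" | (empty)`, the sketch reports `A`, `B` and `C`, while a right-recursive rule like `D = "x" D` is not reported.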
346 changes: 346 additions & 0 deletions examples/bison.atg
@@ -0,0 +1,346 @@
$namespace=CocoBison

COMPILER Bison

TERMINALS
T_SYMBOL

CHARACTERS
letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".
digit = "0123456789".
cr = '\r'.
lf = '\n'.
tab = '\t'.
ff = '\f'.
stringCh = ANY - '"' - '\\' - cr - lf.
charCh = ANY - '\'' - '\\' - cr - lf.
printable = '\u0020' .. '\u007e'.
hex = "0123456789abcdef".

TOKENS
ID = (letter | '.') { letter | digit | '.' | '-'}.
INT_LITERAL = digit { digit }.
STRING = '"' { stringCh | '\\' printable } '"'.
badString = '"' { stringCh | '\\' printable } (cr | lf).
CHAR_LITERAL = '\'' ( charCh | '\\' printable { hex } ) '\''.

PERCENT_TOKEN = "%token".
PERCENT_NTERM = "%nterm".

PERCENT_TYPE = "%type".
PERCENT_DESTRUCTOR = "%destructor".
PERCENT_PRINTER = "%printer".

PERCENT_LEFT = "%left".
PERCENT_RIGHT = "%right".
PERCENT_NONASSOC = "%nonassoc".
PERCENT_PRECEDENCE = "%precedence".

PERCENT_PREC = "%prec".
PERCENT_DPREC = "%dprec".
PERCENT_MERGE = "%merge".

PERCENT_CODE = "%code".
PERCENT_DEFAULT_PREC = "%default-prec".
PERCENT_DEFINE = "%define".
PERCENT_DEFINES = "%defines".
PERCENT_ERROR_VERBOSE = "%error-verbose".
PERCENT_EXPECT = "%expect".
PERCENT_EXPECT_RR = "%expect-rr".
PERCENT_FLAG = "%<flag>".
PERCENT_FILE_PREFIX = "%file-prefix".
PERCENT_GLR_PARSER = "%glr-parser".
PERCENT_INITIAL_ACTION = "%initial-action".
PERCENT_LANGUAGE = "%language".
PERCENT_NAME_PREFIX = "%name-prefix".
PERCENT_NO_DEFAULT_PREC = "%no-default-prec".
PERCENT_NO_LINES = "%no-lines".
PERCENT_NONDETERMINISTIC_PARSER = "%nondeterministic-parser".
PERCENT_OUTPUT = "%output".
PERCENT_PURE_PARSER = "%pure-parser".
PERCENT_REQUIRE = "%require".
PERCENT_SKELETON = "%skeleton".
PERCENT_START = "%start".
PERCENT_TOKEN_TABLE = "%token-table".
PERCENT_VERBOSE = "%verbose".
PERCENT_YACC = "%yacc".

//BRACED_CODE = "{...}".
//BRACED_PREDICATE = "%?{...}".
//BRACKETED_ID = "[identifier]".
//CHAR_LITERAL = "character literal".
COLON = ":".
EPILOGUE = "epilogue".
EQUAL = "=".
//ID = "identifier".
//ID_COLON "identifier:".
PERCENT_PERCENT = "%%".
PIPE = "|".
PROLOGUE = "%{...%}".
SEMICOLON = ";".
//TAG = "<tag>".
//TAG_ANY = "<*>".
//TAG_NONE = "<>".
LEFT_BRACE = '{'.
RIGHT_BRACE = '}'.
LEFT_ANGLE_BRACK = '<'.
RIGHT_ANGLE_BRACK = '>'.

PRAGMAS

COMMENTS FROM "/*" TO "*/" NESTED
COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab + ff

/*-------------------------------------------------------------------------*/

PRODUCTIONS

Bison =
prologue_declarations "%%" grammar [epilogue]
EOF
.

prologue_declarations =
prologue_declaration {prologue_declaration}
.

prologue_declaration =
grammar_declaration
| "%{" {ANY} "%}"
| "%<flag>"
| "%define" variable [value]
| "%defines" [STRING]
| "%error-verbose"
| "%expect" INT_LITERAL
| "%expect-rr" INT_LITERAL
| "%file-prefix" STRING
| "%glr-parser"
| "%pure_parser"
| "%initial-action" params
| "%language" STRING
| "%name" ID
| "%name-prefix" ['='] STRING
| "%no-lines"
| "%nondeterministic-parser"
| "%output" STRING
| ("%param" | "%lex-param" | "%parse-param") params
| "%pure-parser"
| "%require" STRING
| "%skeleton" STRING
| "%token-table"
| "%verbose"
| "%yacc"
//| "%include-enum" STRING ID
| "%debug"
| "%locations"
//| error ";"
| /*FIXME: Err? What is this horror doing here? */ ";"
//| "BISONPRE_VERSION" '(' ANY {ANY} ')'
.

params =
    '{' (. // manage nested braces
        if(la->kind != _RIGHT_BRACE) {
            //print("==", la->line, la->kind, la->val);
            for (int nested = 1; nested > 0;) {
                //print("==1", la->line, la->kind, la->val, nested);
                //print("==", la->line, nested, la->kind, la->val);
                if(la->kind == _LEFT_BRACE) ++nested;
                Get();
                if(la->kind == _RIGHT_BRACE) --nested;
                else if(la->kind == _EOF) break;
                //print("==2", la->line, la->kind, la->val, nested);
            }
        }
    .)
    {ANY} '}'
    .

grammar_declaration =
symbol_declaration
| "%union" [union_name] params
| "%start" symbol
| code_props_type params generic_symlist
| "%default-prec"
| "%no-default-prec"
| "%code" [ID] params
.

code_props_type =
"%destructor"
| "%printer"
.

generic_symlist =
generic_symlist_item {generic_symlist_item}
.

generic_symlist_item =
symbol
| tag
.

union_name =
ID | tag
.

symbol_declaration =
"%nterm" nterm_decls
| "%token" token_decls
| "%term" symbol_decls
| "%type" symbol_decls
| precedence_declarator token_decls_for_prec
.

nterm_decls =
token_decls
.

token_decls =
[tag] token_decl_1 {token_decl_1}
.

token_decl_1 =
token_decl
.

token_decl =
id [int_opt] [alias]
.

int_opt =
INT_LITERAL
.

alias =
string_as_id
| "_(" STRING ')' //TSTRING
.

symbol_decls =
[tag] symbol_decl_1 {symbol_decl_1}
.

symbol_decl_1 =
symbol
.

precedence_declarator =
"%left"
| "%right"
| "%nonassoc"
| "%precedence"
| "%binary"
.

token_decls_for_prec =
[tag] token_decl_for_prec_1 {token_decl_for_prec_1}
.

// One or more token declarations for precedence declaration.
token_decl_for_prec_1 =
token_decl_for_prec
.

token_decl_for_prec =
id [int_opt]
| string_as_id
.

grammar =
rules_or_grammar_declaration {rules_or_grammar_declaration}
.

rules_or_grammar_declaration =
rules
| grammar_declaration ";"
//| error ";"
.

rules =
id_colon (. printf("%s ::= ", t->val); .)
[named_ref_opt | tag ] ":" rhses_1 (. printf("\n"); .)
.

rhses_1 =
rhs {
'|' (. printf("| "); .)
rhs
} ';'
.

rhs =
/*empty*/ (. printf("/*empty*/ "); .)
| "%empty" [params]
| rhs_symbol {rhs_symbol}
.

rhs_symbol =
symbol (. printf("%s ", t->val); .) [named_ref_opt | tag]
| params
//| [tag] params //named_ref_opt
| "%?{" {ANY} '}'
| "%prec" symbol
| "%dprec" INT_LITERAL
| "%merge" tag
| "%expect" INT_LITERAL
| "%expect-rr" INT_LITERAL
.

named_ref_opt =
'[' ID ']' //BRACKETED_ID
.

epilogue =
"%%" {ANY}
.

variable =
ID
.

value =
ID
| STRING
| params
| INT_LITERAL
.

id =
ID
| CHAR_LITERAL
.

id_colon =
ID //':'
.


symbol =
id
| string_as_id
.

string_as_id =
STRING
.

tag =
    '<' (. // manage nested angle brackets
        if(la->kind != _RIGHT_ANGLE_BRACK) {
            for (int nested = 1; nested > 0;) {
                //print("==", la->line, nested, la->kind, la->val);
                if(la->kind == _LEFT_ANGLE_BRACK) ++nested;
                Get();
                if(la->kind == _RIGHT_ANGLE_BRACK) --nested;
                else if(la->kind == _EOF) break;
            }
        }
    .)
    {ANY} '>'
    .


END Bison.
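
The semantic actions in `params` and `tag` both skip a balanced block with a depth counter driven by the lookahead token `la`. The standalone sketch below mirrors that loop outside of a generated parser so it can be traced by hand. The `Kind` enum, the token vector, and the function `skipToMatchingBrace` are hypothetical stand-ins for the generated `_LEFT_BRACE` / `_RIGHT_BRACE` / `_EOF` kinds and for `Get()`, and the sketch assumes the token list always ends with an `Eof` entry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds standing in for the generated constants.
enum class Kind { LeftBrace, RightBrace, Other, Eof };

// `la` is the index of the lookahead token, i.e. the token right after the
// opening brace that the grammar has already consumed.
// Returns the index of the closing brace that matches that opening brace,
// or the index of the Eof token if the input ends first.
// Precondition (assumption of this sketch): `tokens` ends with Kind::Eof.
std::size_t skipToMatchingBrace(const std::vector<Kind>& tokens, std::size_t la) {
    if (tokens[la] == Kind::RightBrace || tokens[la] == Kind::Eof)
        return la;                                     // "{}" or truncated input
    for (int nested = 1; nested > 0;) {
        if (tokens[la] == Kind::LeftBrace) ++nested;   // entering an inner block
        ++la;                                          // plays the role of Get()
        if (tokens[la] == Kind::RightBrace) --nested;  // leaving a block
        else if (tokens[la] == Kind::Eof) break;       // unbalanced input
    }
    return la;
}
```

As in the grammar, the function stops with the lookahead on the matching closing delimiter rather than past it, which is why the productions can still end with `{ANY} '}'` (or `{ANY} '>'`): the `{ANY}` part matches nothing and the closing terminal is consumed by the parser itself.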
4 changes: 4 additions & 0 deletions examples/build-cocobison.sh
@@ -0,0 +1,4 @@
../src/Coco -frames ../src bison.atg
g++ -g -Wall -o cocobison Parser.cpp Scanner.cpp cocobison.cpp
#./cocobison "postgresql-13.3/src/backend/parser/gram.y"
