Merge branch 'dev'

antlr · May 21, 2023 · a9639d0
2 parents 8188dc5 + 134eda9
Showing 208 changed files with 6,931 additions and 20,018 deletions.
diff --git a/.github/workflows/hosted.yml b/.github/workflows/hosted.yml
@@ -14,7 +14,7 @@ permissions:
   contents: read
 
 jobs:
-  cpp-builds:
+  cpp-lib-build:
     runs-on: ${{ matrix.os }}
 
     strategy:
@@ -26,6 +26,7 @@ jobs:
           windows-2022
         ]
         compiler: [ clang, gcc ]
+        unity_build: [ ON, OFF ]
         exclude:
           - os: windows-2022
             compiler: gcc
@@ -95,7 +96,7 @@ jobs:
 
         cd runtime/Cpp
 
-        cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DANTLR_BUILD_CPP_TESTS=OFF -S . -B out/Debug
+        cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DANTLR_BUILD_CPP_TESTS=OFF -DCMAKE_UNITY_BUILD=${{ matrix.unity_build }} -DCMAKE_UNITY_BUILD_BATCH_SIZE=20 -S . -B out/Debug
         if %errorlevel% neq 0 exit /b %errorlevel%
 
         cmake --build out/Debug -j %NUMBER_OF_PROCESSORS%
@@ -130,7 +131,7 @@ jobs:
 
         cd runtime/Cpp
 
-        cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DANTLR_BUILD_CPP_TESTS=OFF -S . -B out/Debug
+        cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DANTLR_BUILD_CPP_TESTS=OFF -DCMAKE_UNITY_BUILD=${{ matrix.unity_build }} -DCMAKE_UNITY_BUILD_BATCH_SIZE=20 -S . -B out/Debug
         cmake --build out/Debug --parallel
 
         cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DANTLR_BUILD_CPP_TESTS=OFF -S . -B out/Release
@@ -140,11 +141,12 @@ jobs:
       if: always()
       run: |
         cd ${{ github.workspace }}/..
-        tar czfp antlr_${{ matrix.os }}_${{ matrix.compiler }}.tgz antlr4
+        tar czfp antlr_${{ matrix.os }}_${{ matrix.compiler }}.tgz --exclude='.git' antlr4
         mv antlr_${{ matrix.os }}_${{ matrix.compiler }}.tgz ${{ github.workspace }}/.
 
     - name: Archive artifacts
       if: always()
+      continue-on-error: true
       uses: actions/upload-artifact@v3
       with:
         name: antlr_${{ matrix.os }}_${{ matrix.compiler }}
@@ -338,11 +340,12 @@ jobs:
       if: always()
       run: |
         cd ${{ github.workspace }}/..
-        tar czfp antlr_${{ matrix.os }}_${{ matrix.target }}.tgz antlr4
+        tar czfp antlr_${{ matrix.os }}_${{ matrix.target }}.tgz --exclude='.git' antlr4
         mv antlr_${{ matrix.os }}_${{ matrix.target }}.tgz ${{ github.workspace }}/.
 
     - name: Archive artifacts
       if: always()
+      continue-on-error: true
       uses: actions/upload-artifact@v3
       with:
         name: antlr_${{ matrix.os }}_${{ matrix.target }}

diff --git a/.gitignore b/.gitignore
@@ -83,6 +83,9 @@ nbactions*.xml
 /gen4/
 /tool/playground/
 tmp/
+**/generatedCode/*.interp
+**/generatedCode/*.tokens
+**/generatedCode/*.bak
 
 # Configurable build files
 bilder.py
@@ -107,6 +110,9 @@ runtime/PHP
 # Swift binaries
 .build/
 
+# Code coverage reports
+coverage/
+
 # Cpp generated build files
 runtime/Cpp/CMakeCache.txt
 runtime/Cpp/CMakeFiles/
@@ -126,3 +132,7 @@ runtime/Cpp/runtime/cmake_install.cmake
 runtime/Cpp/runtime/libantlr4-runtime.4.10.1.dylib
 runtime/Cpp/runtime/libantlr4-runtime.a
 runtime/Cpp/runtime/libantlr4-runtime.dylib
+/runtime/Cpp/runtime/libantlr4-runtime.4.12.0.dylib
+
+# Go test and performance trace files
+**/*.pprof
diff --git a/README.md b/README.md
@@ -3,20 +3,6 @@
 [![Java 11+](https://img.shields.io/badge/java-11+-4c7e9f.svg)](http://java.oracle.com)
 [![License](https://img.shields.io/badge/license-BSD-blue.svg)](https://raw.githubusercontent.com/antlr/antlr4/master/LICENSE.txt)
 
-
-## Versioning
-
-ANTLR 4 supports 10 target languages, and ensuring consistency across these targets is a unique and highly valuable feature.
-To ensure proper support of this feature, each release of ANTLR is a complete release of the tool and the 10 runtimes, all with the same version.
-As such, ANTLR versioning does not strictly follow semver semantics:
-
-* a component may be released with the latest version number even though nothing has changed within that component since the previous release
-* major version is bumped only when ANTLR is rewritten for a totally new "generation", such as ANTLR3 -> ANTLR4 (LL(\*) -> ALL(\*) parsing)
-* minor version updates may include minor breaking changes, the policy is to regenerate parsers with every release (4.11 -> 4.12)
-* backwards compatibility is only guaranteed for patch version bumps (4.11.1 -> 4.11.2)
-
-If you use a semver verifier in your CI, you probably want to apply special rules for ANTLR, such as treating minor change as a major change.
-
 **ANTLR** (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.
 
 **Dev branch build status**
@@ -32,17 +18,33 @@ If you use a semver verifier in your CI, you probably want to apply special rule
 [![Travis-CI Build Status (Swift-Linux)](https://img.shields.io/travis/antlr/antlr4.svg?label=Linux-Swift&branch=master)](https://travis-ci.com/github/antlr/antlr4)
 -->
 
+
+## Versioning
+
+ANTLR 4 supports 10 target languages, and ensuring consistency across these targets is a unique and highly valuable feature.
+To ensure proper support of this feature, each release of ANTLR is a complete release of the tool and the 10 runtimes, all with the same version.
+As such, ANTLR versioning does not strictly follow semver semantics:
+
+* a component may be released with the latest version number even though nothing has changed within that component since the previous release
+* major version is bumped only when ANTLR is rewritten for a totally new "generation", such as ANTLR3 -> ANTLR4 (LL(\*) -> ALL(\*) parsing)
+* minor version updates may include minor breaking changes, the policy is to regenerate parsers with every release (4.11 -> 4.12)
+* backwards compatibility is only guaranteed for patch version bumps (4.11.1 -> 4.11.2)
+
+If you use a semver verifier in your CI, you probably want to apply special rules for ANTLR, such as treating minor change as a major change.
+
 ## Repo branch structure
 
 The default branch for this repo is [`master`](https://github.com/antlr/antlr4/tree/master), which is the latest stable release and has tags for the various releases; e.g., see release tag [4.9.3](https://github.com/antlr/antlr4/tree/4.9.3).  Branch [`dev`](https://github.com/antlr/antlr4/tree/dev) is where development occurs between releases and all pull requests should be derived from that branch. The `dev` branch is merged back into `master` to cut a release and the release state is tagged (e.g., with `4.10-rc1` or `4.10`.) Visually our process looks roughly like this:
 
 <img src="doc/images/new-antlr-branches.png" width="500">
 
-Targets such as Go that pull directly from the repository can use the default `master` branch but can also pull from the active `dev` branch:
+The Go target now has its own dedicated repo:
 
 ```bash
-$ go get github.com/antlr/antlr4/runtime/Go/antlr@dev
+$ go get github.com/antlr4-go/antlr
 ```
+**Note**
+The dedicated Go repo is for `go get` and `import` only. Go runtime development is still performed in the main `antlr/antlr4` repo. 
 
 ## Authors and major contributors
 

diff --git a/THIRD-PARTY-NOTICES.txt b/THIRD-PARTY-NOTICES.txt
diff --git a/antlr4-maven-plugin/pom.xml b/antlr4-maven-plugin/pom.xml
@@ -8,7 +8,7 @@
   <parent>
     <groupId>org.antlr</groupId>
     <artifactId>antlr4-master</artifactId>
-    <version>4.12.1-SNAPSHOT</version>
+    <version>4.13.0-SNAPSHOT</version>
   </parent>
   <artifactId>antlr4-maven-plugin</artifactId>
   <packaging>maven-plugin</packaging>

diff --git a/doc/cpp-target.md b/doc/cpp-target.md
@@ -78,7 +78,7 @@ This example assumes your grammar contains a parser rule named `key` for which t
 
 There are a couple of things that only the C++ ANTLR target has to deal with. They are described here.
 
-### Build Aspects
+### Code Generation Aspects
 The code generation (by running the ANTLR4 jar) allows to specify 2 values you might find useful for better integration of the generated files into your application (both are optional):
 
 * A **namespace**: use the **`-package`** parameter to specify the namespace you want.
@@ -102,6 +102,16 @@ In order to create a static lib in Visual Studio define the `ANTLR4CPP_STATIC` m
 
 For gcc and clang it is possible to use the `-fvisibility=hidden` setting to hide all symbols except those that are made default-visible (which has been defined for all public classes in the runtime).
 
+### Compile Aspects
+
+When compiling generated files, you can configure a compile option according to your needs (also optional):
+
+* A **thread local DFA macro**: Adding `-DANTLR4_USE_THREAD_LOCAL_CACHE=1` to the compilation options
+enables the thread local DFA cache (disabled by default), after which each thread uses its own DFA.
+This increases memory usage, because each thread stores its own DFAs, and adds some redundant computation to build them (not too much).
+The benefit is better concurrent performance when running with multiple threads.
+In other words, if your concurrent throughput is not high enough, consider turning on this option.
+
 ### Memory Management
 Since C++ has no built-in memory management we need to take extra care. For that we rely mostly on smart pointers, which however might cause time penalties or memory side effects (like cyclic references) if not used with care. Currently however the memory household looks very stable. Generally, when you see a raw pointer in code consider this as being managed elsewhere. You should never try to manage such a pointer (delete, assign to smart pointer etc.).
 

diff --git a/doc/go-changes.md b/doc/go-changes.md
@@ -0,0 +1,179 @@
+# Changes to the Go Runtime over time
+
+## v4.12.0 to v4.12.1
+
+Strictly speaking, if ANTLR were a Go-only project following [SemVer](https://semver.org/), release v4.12.1 would be
+at least a minor version change and arguably a bump to v5. However, we must follow the ANTLR conventions here or the
+release numbers would quickly become confusing. I apologize for being unable to follow the Go release rules absolutely 
+to the letter.
+
+There are a lot of changes and improvements in this release, but only the change of repo holding the runtime code,
+and possibly the removal of interfaces will cause any code changes. There are no breaking changes to the runtime
+interfaces.
+
+ANTLR Go Maintainer: [Jim Idle](https://github.com/jimidle)
+
+### Code Relocation
+
+For complicated reasons, including not breaking the builds of some users who use a monorepo and eschew modules, as well
+as not making substantial changes to the internal test suite, the Go runtime code will continue to be maintained in
+the main ANTLR4 repo `antlr/antlr4`. If you wish to contribute changes to the Go runtime code, please continue to submit 
+PRs to this main repo, against the `dev` branch.
+
+The code is located in the main repo at about the depth of the Mariana Trench, which means that the Go tools cannot reconcile
+the module correctly. After some debate, it was decided that we would create a dedicated release repo for the Go runtime
+so that it will behave exactly as the Go tooling expects. This repo is auto-maintained and keeps both the dev and master
+branches up to date.
+
+Henceforth, all future projects using the ANTLR Go runtime should import as follows:
+
+```go
+import (
+    "github.com/antlr4-go/antlr/v4"
+    )
+```
+
+And use the command:
+
+```shell
+go get github.com/antlr4-go/antlr
+```
+
+To get the module - `go mod tidy` is probably the best way once imports have been changed. 
+
+Please note that there is no longer any source code kept in the ANTLR repo under `github.com/antlr/antlr4/runtime/Go/antlr`.
+If you are using the code without modules, then sync the code from the new release repo.
+
+### Documentation
+
+Prior to this release, the godocs were essentially unusable, as the Go doc comments were copied almost without
+change from the Java runtime. The godocs are now properly formatted for Go and pkg.go.dev.
+
+Please feel free to raise an issue if you find any remaining mistakes. Or submit a PR (remember - not to the new repo).
+It is expected that it might take a few iterations to get the docs 100% squeaky clean.
+
+### Removal of Unnecessary Interfaces
+
+The Go runtime was originally produced as almost a copy of the Java runtime but with Go syntax. This meant that everything
+had an interface. There is no need to use interfaces in Go if there is only ever going to be one implementation of
+some struct and its methods. Interfaces cause an extra dereference at runtime and are detrimental to performance if you
+are trying to squeeze out every last nanosecond, which some users will be trying to do.
+
+This is 99% an internal refactoring of the runtime with no outside effects to the user.
+
+### Generated Recognizers Return *struct and not Interfaces
+
+The generated recognizer code generated an interface for the parsers and lexers. As they can only be implemented by the
+generated code, the interfaces were removed. This is possibly the only place you may need to make a code change to
+your driver code.
+
+If your code looked like this:
+
+```go
+var lexer = parser.NewMySqlLexer(nil)
+var p = parser.NewMySqlParser(nil)
+```
+
+Or this:
+
+```go
+lexer := parser.NewMySqlLexer(nil)
+p := parser.NewMySqlParser(nil)
+```
+
+Then no changes need to be made. However, if you predeclared the parser and lexer variables with their type, such as
+this:
+
+```go
+var lexer parser.MySqlLexer
+var p parser.MySqlParser
+// ...
+lexer = parser.NewMySqlLexer(nil)
+p = parser.NewMySqlParser(nil)
+```
+
+You will need to change your variable declarations to pointers (note the introduction of the `*` below):
+
+```go
+var lexer *parser.MySqlLexer
+var p *parser.MySqlParser
+// ...
+lexer = parser.NewMySqlLexer(nil)
+p = parser.NewMySqlParser(nil)
+```
+
+This is the only user facing change that I can see. This change though has a very beneficial side effect in that you
+no longer need to cast the interface into a struct so that you can access methods and data within it. Any code you
+had that needed to do that, will be cleaner and faster.
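A minimal sketch of why the pointer return removes the cast, using hypothetical names (`Lexer`, `MyLexer`, `newLexerOld`, `newLexerNew`) that stand in for generated code and are not taken from the ANTLR runtime. It only illustrates the general Go pattern described above: with an interface return you need a type assertion to reach struct fields, while with a `*struct` return you access them directly.

```go
package main

import "fmt"

// Lexer is a hypothetical interface, standing in for what older
// generated code is described as exposing.
type Lexer interface {
	NextToken() string
}

// MyLexer is a hypothetical generated lexer struct.
type MyLexer struct {
	Mode string
}

// NextToken satisfies the Lexer interface.
func (l *MyLexer) NextToken() string { return "tok" }

// newLexerOld mimics the old style: the constructor returns the interface.
func newLexerOld() Lexer { return &MyLexer{Mode: "default"} }

// newLexerNew mimics the new style: the constructor returns *MyLexer.
func newLexerNew() *MyLexer { return &MyLexer{Mode: "default"} }

// modeOld needs a type assertion before it can read a struct field.
func modeOld() string {
	l := newLexerOld()
	concrete, ok := l.(*MyLexer)
	if !ok {
		return ""
	}
	return concrete.Mode
}

// modeNew reads the field directly; no assertion is required.
func modeNew() string {
	return newLexerNew().Mode
}

func main() {
	fmt.Println(modeOld(), modeNew())
}
```

Both helpers return the same value; the difference is only in how much ceremony the caller needs to reach the struct.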
+
+The performance improvement is worth the change and there was no tidy way for me to avoid it.
+
+### Parser Error Recovery Does Not Use Panic
+
+The generated parser code was again essentially trying to be Java code in disguise. This meant that every parser rule
+executed a `defer {}` and a `recover()`, even if there were no outstanding parser errors. Parser errors were issued by
+issuing a `panic()`! 
+
+While some major work has been performed in the go compiler and runtime to make `defer {}` as fast as possible, 
+`recover()` is (relatively) slow as it is not meant to be used as a general error mechanism, but to recover from say
+an internal library problem if that problem can be recovered to a known state. 
+
+The generated code now stores a recognition error and a flag in the main parser struct and uses `goto` to exit the
+rule instead of a `panic()`. As might be imagined, this is significantly faster through the happy path. It is also 
+faster at generating errors.
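The two styles can be contrasted with a small sketch. Everything here (`parser`, `match`, `rule`, `parseWithPanic`, `parseWithGoto`, `errMismatch`) is hypothetical and written for illustration; it is not the code ANTLR generates, only an assumed shape of "panic plus recover" versus "error flag plus `goto`".

```go
package main

import (
	"errors"
	"fmt"
)

// errMismatch is a hypothetical recognition error.
var errMismatch = errors.New("mismatched token")

// parseWithPanic mimics the old style: every call pays for defer/recover,
// and an error is raised with panic.
func parseWithPanic(tokens []string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = r.(error)
		}
	}()
	for i, want := range []string{"a", "b"} {
		if i >= len(tokens) || tokens[i] != want {
			panic(errMismatch)
		}
	}
	return nil
}

// parser is a hypothetical stand-in for a generated parser struct that
// stores the recognition error and a flag.
type parser struct {
	tokens []string
	pos    int
	err    error
	hasErr bool
}

// match consumes one token if it equals want; otherwise it records the error.
func (p *parser) match(want string) {
	if p.pos < len(p.tokens) && p.tokens[p.pos] == want {
		p.pos++
		return
	}
	p.err = errMismatch
	p.hasErr = true
}

// rule recognizes "a b" and leaves through a label when the flag is set.
func (p *parser) rule() {
	p.match("a")
	if p.hasErr {
		goto errorExit
	}
	p.match("b")
	if p.hasErr {
		goto errorExit
	}
	return

errorExit:
	// A real parser would report and resynchronize here; this sketch
	// just stops consuming input and keeps the recorded error.
	p.pos = len(p.tokens)
}

// parseWithGoto runs the flag-based rule and returns the stored error.
func parseWithGoto(tokens []string) error {
	p := &parser{tokens: tokens}
	p.rule()
	return p.err
}

func main() {
	fmt.Println(parseWithPanic([]string{"a", "x"}), parseWithGoto([]string{"a", "x"}))
}
```

In the flag-based version the happy path executes no `defer` and no `recover()`, which is the property the paragraph above attributes to the new generated code.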
+
+The ANTLR runtime tests do check error raising and recovery, but if you find any differences in the error handling
+behavior of your parsers, please raise an issue. 
+
+### Reduction in use of Pointers
+
+Certain internal structs, such as interval sets are small and immutable, but were being passed around as pointers
+anyway. These have been changed to use copies, and resulted in significant performance increases in some cases. 
+There is more work to come in this regard.
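As a rough illustration of passing a small immutable struct by value, here is a sketch with a hypothetical `interval` type. It is loosely inspired by the "interval" structs mentioned above but is not the runtime's actual definition; field names and the half-open convention are assumptions.

```go
package main

import "fmt"

// interval is a hypothetical small immutable value covering the
// half-open range [start, stop).
type interval struct {
	start, stop int
}

// contains uses a value receiver, so callers pass a copy, not a pointer.
func (iv interval) contains(x int) bool {
	return x >= iv.start && x < iv.stop
}

// length also works on a copy of the two-int struct.
func (iv interval) length() int {
	return iv.stop - iv.start
}

// shift returns a new interval; the argument is a copy and is never mutated.
func shift(iv interval, by int) interval {
	return interval{start: iv.start + by, stop: iv.stop + by}
}

func main() {
	a := interval{start: 10, stop: 20}
	b := shift(a, 5)
	fmt.Println(a.contains(12), b.contains(12), a.length(), b.length())
}
```

For a struct this small, copying two integers is cheap and can avoid a pointer indirection; whether it also avoids a heap allocation depends on the compiler's escape analysis in each case.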
+
+### ATN Deserialization
+
+When the ATN and associated structures are deserialized for the first time, there was a bug that caused a needed
+optimization to fail to be executed. This could have a significant performance effect on recognizers that were written
+in a suboptimal way (as in poorly formed grammars). This is now fixed.
+
+### Prediction Context Caching was not Working
+
+This has a massive effect when reusing a parser for a second and subsequent run. The PredictionContextCache merely
+used memory but did not speed up subsequent executions. This is now fixed, and you should see a big difference in 
+performance when reusing a parser. This single paragraph does not do this fix justice ;) 
+
+### Cumulative Performance Improvements
+
+Though too numerous to mention, there are a lot of small performance improvements, that add up in accumulation. Everything
+from improvements in collection performance to slightly better algorithms or specific non-generic algorithms. 
+
+### Cumulative Memory Improvements
+
+The real improvements in memory usage, allocation and garbage collection are saved for the next major release. However,
+if your grammar is well-formed and does not require almost infinite passes using ALL(*), then both memory and performance
+will be improved with this release.
+
+### Bug Fixes
+
+Other small bug fixes have been addressed, such as potential panics in funcs that did not check input parameters. There
+are a lot of bug fixes in this release that most people were probably not aware of. All known bugs are fixed at the 
+time of release preparation.
+
+### A Note on Poorly Constructed Grammars
+
+Though I have made some significant strides on improving the performance of poorly formed grammars, those that are
+particularly bad will see much less of an incremental improvement compared to those that are fairly well-formed.
+
+This is deliberately so in this release, as I felt that those people who have put in effort to optimize the form of their
+grammar are looking for performance, whereas those who have grammars that parse in seconds, tens of seconds or even
+minutes are presumed not to care about performance. 
+
+A particularly good (or bad) example is the MySQL grammar in the ANTLR grammar repository (apologies to the Author 
+if you read this note - this isn't an attack). Although I have improved its runtime performance
+drastically in the Go runtime, it still takes about a minute to parse complex select statements. As it is constructed, 
+there are no magic answers. I will look in more detail at improvements for such parsers, such as not freeing any
+memory until the parse is finished (improved 100x in experiments).
+
+The best advice I can give is to put some effort into the actual grammar itself. Well-formed grammars will potentially
+see some huge improvements with this release. Badly formed grammars, not so much.