Add Rust frontend #585

Merged on Aug 24, 2022

25 commits
ccc04b4
Initial commit for Rust frontend
robinmaisch Jul 20, 2022
f07e39e
Merge branch 'master-remote' into rust
robinmaisch Jul 25, 2022
366333f
Merge branch 'master-remote' into rust
robinmaisch Aug 7, 2022
0f1f4d2
Integrate Rust frontend into JPlag CLI, add more token types
robinmaisch Aug 8, 2022
7e3da70
Merge branch 'master-remote' into rust
robinmaisch Aug 8, 2022
1becc60
Reformat code
robinmaisch Aug 8, 2022
95ada54
Extend Token set, move ParserState to own file
robinmaisch Aug 10, 2022
d280b64
Merge branch 'master-remote' into rust
robinmaisch Aug 10, 2022
9f64955
Reformat code
robinmaisch Aug 10, 2022
f786dc8
Fix wrong case in filename
robinmaisch Aug 10, 2022
21bac61
Add TYPE_ARGUMENT TokenConstant, fix APPLY/ARG order of tokens
robinmaisch Aug 11, 2022
46c4e51
Fix Sonarcloud code smells, fix tuples, handle "if let"
robinmaisch Aug 15, 2022
abbc8cf
Reformat code
robinmaisch Aug 15, 2022
7ffb23a
Apply review feedback, fix struct initialization token
robinmaisch Aug 16, 2022
04a16ac
Reformat
robinmaisch Aug 16, 2022
746c8d5
Add annotated demo files
robinmaisch Aug 16, 2022
285aa62
Merge branch 'master' into rust
robinmaisch Aug 16, 2022
35bcc87
Fix more code smells
robinmaisch Aug 16, 2022
1a93231
Merge remote-tracking branch 'origin/rust' into rust
robinmaisch Aug 16, 2022
138ce5e
Extend README: problem with grammar
robinmaisch Aug 16, 2022
d586838
Clean up RustLexerBase
robinmaisch Aug 19, 2022
c1057d2
Merge branch 'master-remote' into rust
robinmaisch Aug 24, 2022
af2dfa1
Update annotated files
robinmaisch Aug 24, 2022
f4d7147
Remove unused import
robinmaisch Aug 24, 2022
f07c887
Update main README
robinmaisch Aug 24, 2022
31 changes: 16 additions & 15 deletions README.md
@@ -15,20 +15,21 @@ JPlag is a system that finds similarities among multiple sets of source code fil

In the following, a list of all supported languages with their supported language version is provided. A language can be selected from the command line using the `-l <cli argument name>` argument.

| Language | Version | CLI Argument Name | [state](https://github.com/jplag/JPlag/wiki/3.-Language-Modules) | parser
| --- | ---: | --- | :---: | :---: |
| [Java](https://www.java.com) | 17 | java | mature | JavaC |
| [C++](https://isocpp.org) | 11 | cpp | legacy | JavaCC |
| [C#](https://docs.microsoft.com/en-us/dotnet/csharp/) | 8 | csharp | beta | ANTLR 4 |
| [Go](https://go.dev) | 1.17 | golang | beta | ANTLR 4 |
| [Kotlin](https://kotlinlang.org) | 1.3 | kotlin | beta | ANTLR 4 |
| [Python](https://www.python.org) | 3.6 | python3 | legacy | ANTLR 4 |
| [R](https://www.r-project.org/) | 3.5.0 | rlang | beta | ANTLR 4 |
| [Scala](https://www.scala-lang.org) | 2.13.8 | scala | beta | Scalameta |
| [Scheme](http://www.scheme-reports.org) | ? | scheme | unknown | JavaCC |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-metamodel | alpha | EMF |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) (dynamic) | 2.25.0 | emf-metamodel-dynamic | alpha | EMF |
| Text (naive) | - | text | legacy | ANTLR |
| Language | Version | CLI Argument Name | [state](https://github.com/jplag/JPlag/wiki/3.-Language-Modules) | parser |
|------------------------------------------------------------------|--------:|-----------------------| :---: | :---: |
| [Java](https://www.java.com) | 17 | java | mature | JavaC |
| [C++](https://isocpp.org) | 11 | cpp | legacy | JavaCC |
| [C#](https://docs.microsoft.com/en-us/dotnet/csharp/) | 8 | csharp | beta | ANTLR 4 |
| [Go](https://go.dev) | 1.17 | golang | beta | ANTLR 4 |
| [Kotlin](https://kotlinlang.org) | 1.3 | kotlin | beta | ANTLR 4 |
| [Python](https://www.python.org) | 3.6 | python3 | legacy | ANTLR 4 |
| [R](https://www.r-project.org/) | 3.5.0 | rlang | beta | ANTLR 4 |
| [Rust](https://www.rust-lang.org/) | 1.60.0 | rust | beta | ANTLR 4 |
| [Scala](https://www.scala-lang.org) | 2.13.8 | scala | beta | Scalameta |
| [Scheme](http://www.scheme-reports.org) | ? | scheme | unknown | JavaCC |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-metamodel | alpha | EMF |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) (dynamic) | 2.25.0 | emf-metamodel-dynamic | alpha | EMF |
| Text (naive) | - | text | legacy | ANTLR |

## Download and Installation

@@ -67,7 +68,7 @@ Usage: JPlag [ options ] [ <root-dir> ... ] [ -new <new-dir> ... ] [ -old <old-d

named arguments:
-h, --help show this help message and exit
-l {java,python3,cpp,csharp,golang,kotlin,rlang,scala,text,scheme,emf-metamodel,emf-metamodel-dynamic} Select the language to parse the submissions (default: java)
-l {java,python3,cpp,csharp,golang,kotlin,rlang,rust,scala,text,scheme,emf-metamodel,emf-metamodel-dynamic} Select the language to parse the submissions (default: java)
-bc BC Path of the directory containing the base code (common framework used in all submissions)
-v {quiet,long} Verbosity of the logging (default: quiet)
-d Debug parser. Non-parsable files will be stored (default: false)
4 changes: 4 additions & 0 deletions jplag.cli/pom.xml
@@ -48,6 +48,10 @@
<groupId>de.jplag</groupId>
<artifactId>rlang</artifactId>
</dependency>
<dependency>
<groupId>de.jplag</groupId>
<artifactId>rust</artifactId>
</dependency>
<dependency>
<groupId>de.jplag</groupId>
<artifactId>scala</artifactId>
93 changes: 93 additions & 0 deletions jplag.frontend.rust/README.md
@@ -0,0 +1,93 @@
# JPlag Rust language frontend

The JPlag Rust frontend allows the use of JPlag with submissions in Rust. <br>
It is based on the [Rust ANTLR4 grammar](https://github.com/antlr/grammars-v4/tree/master/rust), licensed under MIT.

### Rust specification compatibility

According to the grammar's documentation, it was updated to Rust 1.60.0 (April 2022).

### Token Extraction

#### General

The choice of tokens is intended to be similar to the Java or C# frontends. Specifically, among others, it includes a
range of nesting structures (class and method declarations, control flow expressions) as well as variable declaration,
object creation, assignment, and control flow altering keywords. <br>
Blocks are distinguished by their context, i.e. there are separate `TokenConstants` for `if` blocks, `for` blocks, class
bodies, method bodies, array constructors, and the like.

More syntactic elements of Rust may turn out to be helpful to include in the future, especially those that are newly
introduced.

#### Problem in Rust (1): Grammar formulation

In contrast to other grammars used in frontends, the underlying Rust ANTLR4 grammar uses very general syntactic categories
that provide little _semantic_ information. For example, the `ifExpression` rule features a `blockExpression` as
its body instead of a separate `ifBody` rule. This makes it hard to distinguish between the different uses of those `blockExpression`s.

It should be possible to refactor the grammar to include more specific rules. While not hard, this will still be tedious. Most of the
`ParserState` mechanism should become obsolete if this is done.
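
To illustrate the point, here is a minimal sketch of a Rust function in which blocks appear in several roles. Assuming the grammar behaves as described above, each of these blocks would be parsed via the same general `blockExpression` rule, so the role has to be recovered from context. The function name `block_contexts` is made up for this illustration.

```rust
// Hypothetical example: every `{ ... }` below plays a different role,
// yet all of them are blocks of the same syntactic kind.
fn block_contexts(flag: bool) -> i32 {
    // function body
    let standalone = {
        // plain block used as an expression
        let x = 40;
        x + 2
    };
    let mut total = 0;
    if flag {
        // `if` body
        total += standalone;
    }
    for i in 0..3 {
        // `for` body
        total += i;
    }
    total
}
```

A frontend that wants separate tokens for function bodies, `if` bodies and `for` bodies therefore needs to remember which construct it entered last, which is presumably what the `ParserState` mechanism is for.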

#### Problem in Rust (2): Pattern resolution

Rust allows destructuring complex values using pattern matching.

```rust
// assigns a = 1; b = 2; c = 5;
let (a, b, .., c) = (1, 2, 3, 4, 5);

// assigns d = tuple[0]; f = tuple[n-1]
let (d, .., f) = tuple;
```

The _patterns_ on the left-hand side as well as the elements on the right-hand side can be nested freely. The _rest_
or _etcetera_ pattern `..` is used to skip a number of elements, so that the elements following it match the end part of
the assigned object.

These `let` pattern assignments can be replaced with a sequence of more basic assignments. This is a possible
weakness of this frontend, because the two equivalent forms may not be mapped to the same token sequence.
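
The equivalence mentioned above can be sketched as follows. Both functions return the same values, one with a rest pattern and one with plain field accesses. The function names are hypothetical and only serve this illustration; how the frontend tokenizes each variant is not shown here.

```rust
// Hypothetical helper: destructures a 5-tuple with a rest pattern.
fn with_pattern(t: (i32, i32, i32, i32, i32)) -> (i32, i32, i32) {
    // `..` skips the elements between the second and the last one.
    let (a, b, .., c) = t;
    (a, b, c)
}

// Hypothetical helper: the same result using basic assignments only.
fn with_basic_assignments(t: (i32, i32, i32, i32, i32)) -> (i32, i32, i32) {
    let a = t.0;
    let b = t.1;
    let c = t.4;
    (a, b, c)
}
```

Both variants are semantically identical, but the second one contains three variable declarations instead of one.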

#### Problem in Rust (3): `return` is optional

In Rust, the `return` keyword is optional. If omitted, the last expression evaluated in the function body is used as the
return value.

```rust
fn power(base: i32, exponent: i32) -> i32 {
    if exponent == 0 { 1 }                      // mark this return value?
    else if exponent == 1 { base }              // and this one?
    else if exponent % 2 == 0 {
        let square = |i: i32| { i * i };
        square(power(base, exponent / 2))       // and this one?
    } else {
        base * power(base, exponent - 1)        // and this one?
    }
}
```

That raises the question whether to try and mark these more implicit return values, so that the output of this frontend
would be consistent with others.

To determine all possible return values, semantic information about control structures is necessary, which may be tedious
to extract from the AST, but is possible (e.g. by means of a stack mechanism).
On the other hand, "the last expression evaluated in a block" does not carry the same _syntactical_ weight as a
`return` statement.

For the moment, implicit block values get no special tokens.
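
As a small sketch of the consequence, the two hypothetical functions below compute the same result, once with implicit block values and once with explicit `return` statements. Assuming that explicit `return` statements are tokenized like the other control flow altering keywords mentioned above, only the second variant would contain return tokens.

```rust
// Hypothetical example: the value of the `if`/`else` expression is the
// function's return value, no `return` keyword is involved.
fn power_implicit(base: i32, exponent: u32) -> i32 {
    if exponent == 0 {
        1
    } else {
        base * power_implicit(base, exponent - 1)
    }
}

// Hypothetical example: same logic, written with explicit `return`s.
fn power_explicit(base: i32, exponent: u32) -> i32 {
    if exponent == 0 {
        return 1;
    }
    return base * power_explicit(base, exponent - 1);
}
```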

#### Problem in Rust (4): Macros

Macros are a vital part of Rust. They expand brief statements into more complex, repetitive code at compile time.

Macro arguments and macro bodies are handled as arbitrary token trees and are only interpreted during expansion, so a Rust parser does not parse their syntax (apart from the bracket structure). This makes it hard to generate meaningful tokens for them.

Currently, macro rule definition bodies and macro invocation arguments/bodies get no tokens.
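
To make this concrete, here is a minimal sketch of a macro definition and an invocation. The macro `square_all` and the function `squares` are invented for this illustration. Under the rule stated above, neither the body of the macro rule nor the arguments of the invocation would contribute tokens.

```rust
// Hypothetical macro: squares every argument and collects the results.
macro_rules! square_all {
    ($($x:expr),*) => {
        vec![$($x * $x),*]
    };
}

fn squares() -> Vec<i32> {
    // The argument list `1, 2, 1 + 2` reaches the macro as a token tree;
    // it is only interpreted as expressions during expansion.
    square_all!(1, 2, 1 + 2)
}
```

Without knowledge of the macro definition, a parser only sees the balanced brackets of the invocation, which is why the arguments are skipped.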

### Usage

To use the Rust frontend, add the `-l rust` flag in the CLI, or use a `JPlagOption` object set
to `LanguageOption.RUST` in the Java API as described in the usage information in
the [readme of the main project](https://github.com/jplag/JPlag#usage)
and [in the wiki](https://github.com/jplag/JPlag/wiki/1.-How-to-Use-JPlag).
45 changes: 45 additions & 0 deletions jplag.frontend.rust/pom.xml
@@ -0,0 +1,45 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>de.jplag</groupId>
<artifactId>aggregator</artifactId>
<version>${revision}</version>
</parent>
<artifactId>rust</artifactId>

<dependencies>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
</dependency>
<dependency>
<groupId>de.jplag</groupId>
<artifactId>frontend-utils</artifactId>
</dependency>
<dependency>
<groupId>de.jplag</groupId>
<artifactId>frontend-testutils</artifactId>
<version>${revision}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>antlr4</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
@@ -0,0 +1,18 @@
# Rust ANTLR 4 grammar

This grammar is based on the official language reference.

Licensed under MIT

Entry rule is `crate`.

Last updated for rust v1.60.0

## Maven build

Install the parser into the local Maven repository with `mvn install`.

## Known limitation

- Only v2018+ stable features are implemented.
- Checks for isolated `\r` are not implemented.