Merge pull request #585 from robinmaisch/rust
Add Rust frontend
tsaglam authored Aug 24, 2022
2 parents cc19b33 + f07c887 commit 4109691
Showing 22 changed files with 18,335 additions and 15 deletions.
31 changes: 16 additions & 15 deletions README.md
@@ -15,20 +15,21 @@ JPlag is a system that finds similarities among multiple sets of source code fil

In the following, a list of all supported languages with their supported language version is provided. A language can be selected from the command line using the `-l <cli argument name>` argument.

| Language | Version | CLI Argument Name | [state](https://github.com/jplag/JPlag/wiki/3.-Language-Modules) | parser
| --- | ---: | --- | :---: | :---: |
| [Java](https://www.java.com) | 17 | java | mature | JavaC |
| [C++](https://isocpp.org) | 11 | cpp | legacy | JavaCC |
| [C#](https://docs.microsoft.com/en-us/dotnet/csharp/) | 8 | csharp | beta | ANTLR 4 |
| [Go](https://go.dev) | 1.17 | golang | beta | ANTLR 4 |
| [Kotlin](https://kotlinlang.org) | 1.3 | kotlin | beta | ANTLR 4 |
| [Python](https://www.python.org) | 3.6 | python3 | legacy | ANTLR 4 |
| [R](https://www.r-project.org/) | 3.5.0 | rlang | beta | ANTLR 4 |
| [Scala](https://www.scala-lang.org) | 2.13.8 | scala | beta | Scalameta |
| [Scheme](http://www.scheme-reports.org) | ? | scheme | unknown | JavaCC |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-metamodel | alpha | EMF |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) (dynamic) | 2.25.0 | emf-metamodel-dynamic | alpha | EMF |
| Text (naive) | - | text | legacy | ANTLR |
| Language | Version | CLI Argument Name | [state](https://github.com/jplag/JPlag/wiki/3.-Language-Modules) | parser |
|------------------------------------------------------------------|--------:|-----------------------| :---: | :---: |
| [Java](https://www.java.com) | 17 | java | mature | JavaC |
| [C++](https://isocpp.org) | 11 | cpp | legacy | JavaCC |
| [C#](https://docs.microsoft.com/en-us/dotnet/csharp/) | 8 | csharp | beta | ANTLR 4 |
| [Go](https://go.dev) | 1.17 | golang | beta | ANTLR 4 |
| [Kotlin](https://kotlinlang.org) | 1.3 | kotlin | beta | ANTLR 4 |
| [Python](https://www.python.org) | 3.6 | python3 | legacy | ANTLR 4 |
| [R](https://www.r-project.org/) | 3.5.0 | rlang | beta | ANTLR 4 |
| [Rust](https://www.rust-lang.org/) | 1.60.0 | rust | beta | ANTLR 4 |
| [Scala](https://www.scala-lang.org) | 2.13.8 | scala | beta | Scalameta |
| [Scheme](http://www.scheme-reports.org) | ? | scheme | unknown | JavaCC |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-metamodel | alpha | EMF |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) (dynamic) | 2.25.0 | emf-metamodel-dynamic | alpha | EMF |
| Text (naive) | - | text | legacy | ANTLR |

## Download and Installation

@@ -67,7 +68,7 @@ Usage: JPlag [ options ] [ <root-dir> ... ] [ -new <new-dir> ... ] [ -old <old-d
named arguments:
-h, --help show this help message and exit
-l {java,python3,cpp,csharp,golang,kotlin,rlang,scala,text,scheme,emf-metamodel,emf-metamodel-dynamic} Select the language to parse the submissions (default: java)
-l {java,python3,cpp,csharp,golang,kotlin,rlang,rust,scala,text,scheme,emf-metamodel,emf-metamodel-dynamic} Select the language to parse the submissions (default: java)
-bc BC Path of the directory containing the base code (common framework used in all submissions)
-v {quiet,long} Verbosity of the logging (default: quiet)
-d Debug parser. Non-parsable files will be stored (default: false)
4 changes: 4 additions & 0 deletions jplag.cli/pom.xml
@@ -48,6 +48,10 @@
        <groupId>de.jplag</groupId>
        <artifactId>rlang</artifactId>
    </dependency>
    <dependency>
        <groupId>de.jplag</groupId>
        <artifactId>rust</artifactId>
    </dependency>
    <dependency>
        <groupId>de.jplag</groupId>
        <artifactId>scala</artifactId>
93 changes: 93 additions & 0 deletions jplag.frontend.rust/README.md
@@ -0,0 +1,93 @@
# JPlag Rust language frontend

The JPlag Rust frontend allows the use of JPlag with submissions in Rust. <br>
It is based on the [Rust ANTLR4 grammar](https://github.com/antlr/grammars-v4/tree/master/rust), licensed under MIT.

### Rust specification compatibility

According to the grammar's documentation, it was updated to Rust 1.60.0 (April 2022).

### Token Extraction

#### General

The choice of tokens is intended to be similar to the Java or C# frontends. Specifically, it includes, among others, a
range of nesting structures (class and method declarations, control flow expressions) as well as variable declarations,
object creation, assignments, and control-flow-altering keywords. <br>
Blocks are distinguished by their context, i.e. there are separate `TokenConstants` for `if` blocks, `for` blocks, class
bodies, method bodies, array constructors, and the like.
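
The following annotated snippet (a hypothetical example, not taken from the frontend's tests) sketches which Rust
constructs fall into the categories named above. Treating a `struct` declaration as the counterpart of a class
declaration and an `impl` block as a class body is an assumption made for illustration; the comments name categories,
not the frontend's actual `TokenConstants`.

```rust
// Hypothetical example: comments name the token categories described in this README,
// not the frontend's actual `TokenConstants`.

// Nesting structure: a struct declaration (assumed counterpart of a class declaration).
struct Counter {
    value: u32,
}

// Nesting structure: an impl block (assumed counterpart of a class body).
impl Counter {
    // Nesting structure: a method declaration with a method body.
    fn new() -> Counter {
        // Object creation.
        Counter { value: 0 }
    }

    fn count_until(&mut self, limit: u32) -> u32 {
        // Variable declaration.
        let step = 1;
        // Control flow expressions: a loop whose body contains an `if`.
        loop {
            if self.value >= limit {
                // Control-flow-altering keyword.
                break;
            }
            // Assignment.
            self.value += step;
        }
        self.value
    }
}
```

For example, `Counter::new().count_until(3)` counts up from 0 and returns 3.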

More syntactic elements of Rust may turn out to be helpful to include in the future, especially those that are newly
introduced.

#### Problem in Rust (1): Grammar formulation

In contrast to the grammars used in other frontends, the underlying Rust ANTLR4 grammar uses very general syntactic categories
that provide little _semantic_ information. For example, the `ifExpression` rule has a generic `blockExpression` as
its body instead of a separate `ifBody` rule. This makes it hard to distinguish the different uses of `blockExpression`.

It should be possible to refactor the grammar to include more specific rules. While not hard, this will still be tedious. Most of the
`ParserState` mechanism should become obsolete if this is done.
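
The stack-based idea behind such a context mechanism can be sketched as follows. This is a language-neutral
illustration, written in Rust to match the other examples; `Context`, `BlockToken`, and `ContextStack` are hypothetical
names and do not mirror the frontend's actual `ParserState` class. When a tree walker enters a rule such as
`ifExpression`, it pushes a context; when it reaches a generic `blockExpression`, it peeks at the top of the stack to
choose a block token.

```rust
/// Hypothetical enclosing constructs that a tree walker might track.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Context {
    If,
    Loop,
    Function,
}

/// Hypothetical block tokens, one per context.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlockToken {
    IfBodyBegin,
    LoopBodyBegin,
    FunctionBodyBegin,
    /// Fallback when no enclosing construct is open.
    BlockBegin,
}

/// Contexts are pushed when a rule is entered and popped when it is exited.
#[derive(Default)]
struct ContextStack {
    stack: Vec<Context>,
}

impl ContextStack {
    fn enter(&mut self, context: Context) {
        self.stack.push(context);
    }

    fn exit(&mut self) {
        self.stack.pop();
    }

    /// Chooses the token for a generic `blockExpression` from the innermost context.
    /// Note: a bare `{ ... }` nested in a function body would also map to
    /// `FunctionBodyBegin` here; a real walker needs extra state to tell them apart.
    fn block_token(&self) -> BlockToken {
        match self.stack.last() {
            Some(Context::If) => BlockToken::IfBodyBegin,
            Some(Context::Loop) => BlockToken::LoopBodyBegin,
            Some(Context::Function) => BlockToken::FunctionBodyBegin,
            None => BlockToken::BlockBegin,
        }
    }
}
```

With more specific grammar rules such as `ifBody`, the choice would be made by the rule itself, and the stack would no
longer be needed for this purpose.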

#### Problem in Rust (2): Pattern resolution

Rust allows destructuring complex values using pattern matching.

```rust
fn main() {
    // assigns a = 1; b = 2; c = 5
    let (a, b, .., c) = (1, 2, 3, 4, 5);

    // assigns d = tuple.0 = 6; f = tuple.3 = 9 (the last element)
    let tuple = (6, 7, 8, 9);
    let (d, .., f) = tuple;
    println!("{a} {b} {c} {d} {f}");
}
```

The _patterns_ on the left-hand side, as well as the elements on the right-hand side, can be nested freely. The _rest_
or _etcetera_ pattern `..` skips a number of elements, so that the elements following it match the end of
the assigned value.
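
For illustration, the hypothetical function below (not taken from the frontend's test data) nests tuple, array, and
struct patterns in a single `let` and uses `..` in each of them.

```rust
struct Point {
    x: i32,
    y: i32,
}

/// Destructures a nested value in a single `let` statement.
fn destructure() -> (i32, i32, i32, i32) {
    let value = ((1, 2, 3), [10, 20, 30, 40], Point { x: 7, y: 8 });

    // `(first, ..)` binds the first tuple element and skips the rest;
    // `[head, .., last]` skips the middle of the array, so `last` binds the final element;
    // `Point { x, .. }` ignores the remaining field `y`.
    let ((first, ..), [head, .., last], Point { x, .. }) = value;

    (first, head, last, x)
}
```

Here `destructure()` returns `(1, 10, 40, 7)`.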

These `let` pattern assignments can be replaced with an equivalent sequence of more basic assignments, which yields a
different token sequence for the same semantics. This is a possible weakness of this frontend.
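
To make the concern concrete, the two hypothetical functions below bind the same values, once with a single pattern
assignment and once with basic assignments using tuple indexing. A token-based comparison may treat them as less
similar than their identical behavior suggests.

```rust
/// Binds the first and last element with a single tuple pattern.
fn with_pattern(tuple: (i32, i32, i32, i32)) -> i32 {
    let (d, .., f) = tuple;
    d + f
}

/// Binds the same elements with a sequence of basic assignments.
fn with_basic_assignments(tuple: (i32, i32, i32, i32)) -> i32 {
    let d = tuple.0;
    let f = tuple.3;
    d + f
}
```

Both functions return 5 for the input `(1, 2, 3, 4)`.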

#### Problem in Rust (3): `return` is optional

In Rust, the `return` keyword is optional. If omitted, the last expression evaluated in the function body is used as the
return value.

```rust
fn power(base: i32, exponent: i32) -> i32 {
    if exponent == 0 {
        1 // mark this return value?
    } else if exponent == 1 {
        base // and this one?
    } else if exponent % 2 == 0 {
        let square = |i: i32| { i * i };
        square(power(base, exponent / 2)) // and this one?
    } else {
        base * power(base, exponent - 1) // and this one?
    }
}
```
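
For comparison, the same function can be written with explicit `return` statements, one for each of the commented
values above. This rewrite is only an illustration of the form other frontends would mark; it is deliberately less
idiomatic Rust than the original.

```rust
fn power(base: i32, exponent: i32) -> i32 {
    if exponent == 0 {
        return 1;
    }
    if exponent == 1 {
        return base;
    }
    if exponent % 2 == 0 {
        let square = |i: i32| i * i;
        return square(power(base, exponent / 2));
    }
    // Explicit `return` kept on purpose for comparison, although Rust style would omit it.
    return base * power(base, exponent - 1);
}
```

Both versions compute the same values, e.g. `power(2, 10) == 1024`.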

This raises the question of whether to mark these implicit return values, so that the output of this frontend
is consistent with that of the others.

Determining all possible return values requires semantic information about control structures, which may be tedious,
but possible, to extract from the AST (e.g., by means of a stack mechanism).
On the other hand, "the last expression evaluated in a block" does not carry the same _syntactic_ weight as a
`return` statement.
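
A stack-based search for implicit return values could look like the sketch below. It runs on a tiny, hypothetical
expression tree (`Expr` is an assumption for illustration, not the frontend's ANTLR parse tree): the value of a block is
its tail expression, and the value of an `if`/`else` is the value of either branch, with `else if` chains modeled as a
nested `If` in the `else` branch.

```rust
/// A tiny, hypothetical expression tree; not the frontend's parse tree.
enum Expr {
    /// Any leaf expression, labelled for identification.
    Leaf(&'static str),
    /// `if cond { then } else { otherwise }`.
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    /// A block: statements followed by an optional tail expression.
    Block(Vec<Expr>, Option<Box<Expr>>),
}

/// Collects the labels of all leaf expressions that can become the value of `body`,
/// using an explicit stack instead of recursion.
fn implicit_values(body: &Expr) -> Vec<&'static str> {
    let mut values = Vec::new();
    let mut stack = vec![body];
    while let Some(expr) = stack.pop() {
        match expr {
            Expr::Leaf(label) => values.push(*label),
            // The condition is never the value of the `if`; both branches can be.
            // Push `else` first so that the `then` branch is visited first.
            Expr::If(_, then_branch, else_branch) => {
                stack.push(else_branch);
                stack.push(then_branch);
            }
            // Statements are skipped; only the tail expression is the block's value.
            Expr::Block(_, Some(tail)) => stack.push(tail),
            Expr::Block(_, None) => {}
        }
    }
    values
}
```

Applied to a tree mirroring the body of `power` above, this returns the four commented values in source order.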

For the moment, implicit block values get no special tokens.

#### Problem in Rust (4): Macros

Macros are a vital part of Rust. They expand brief invocations into more complex, repetitive code at compile time.

Substituting the macro arguments into the macro code and expanding the macro code itself both operate on unstructured token trees rather than on parsed syntax, so a Rust parser does not parse their syntax (apart from the bracket structure). This makes it hard to generate meaningful tokens for them.

Currently, macro rule definition bodies and macro invocation arguments/bodies get no tokens.
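
As an illustration, consider the hypothetical macro below. Its rule body `if $a > $b { $a } else { $b }` and the
invocation arguments `x, 3 * x` are kept as token trees by the parser; the `if` expression only comes into existence
when the compiler expands `max!`. Under the approach described above, neither the rule body nor the invocation
arguments produce tokens.

```rust
/// Returns the larger of two expressions; each argument may be evaluated twice.
macro_rules! max {
    ($a:expr, $b:expr) => {
        if $a > $b { $a } else { $b }
    };
}

fn larger_of(x: i32) -> i32 {
    // Expands at compile time to `if x > 3 * x { x } else { 3 * x }`.
    max!(x, 3 * x)
}
```

For example, `larger_of(2)` returns 6, while `larger_of(-2)` returns -2.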

### Usage

To use the Rust frontend, add the `-l rust` flag in the CLI, or use a `JPlagOptions` object set
to `LanguageOption.RUST` in the Java API as described in the usage information in
the [readme of the main project](https://github.com/jplag/JPlag#usage)
and [in the wiki](https://github.com/jplag/JPlag/wiki/1.-How-to-Use-JPlag).
45 changes: 45 additions & 0 deletions jplag.frontend.rust/pom.xml
@@ -0,0 +1,45 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>de.jplag</groupId>
        <artifactId>aggregator</artifactId>
        <version>${revision}</version>
    </parent>
    <artifactId>rust</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-runtime</artifactId>
        </dependency>
        <dependency>
            <groupId>de.jplag</groupId>
            <artifactId>frontend-utils</artifactId>
        </dependency>
        <dependency>
            <groupId>de.jplag</groupId>
            <artifactId>frontend-testutils</artifactId>
            <version>${revision}</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>antlr4</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
@@ -0,0 +1,18 @@
# Rust ANTLR 4 grammar

This grammar is based on the official language reference.

Licensed under MIT.

The entry rule is `crate`.

Last updated for Rust v1.60.0.

## Maven build

Install the parser into the local Maven repository with `mvn install`.

## Known limitations

- Only stable features of Rust 2018 and later are implemented.
- Checks for isolated `\r` are not implemented.