Merge pull request #289 from Alberth289346/old-new-20220209
Extend JPlag with checking against prior submissions
tsaglam authored Apr 12, 2022
2 parents 0109467 + 4ffde7e commit fbb668a
Showing 16 changed files with 271 additions and 83 deletions.
36 changes: 20 additions & 16 deletions README.md
@@ -39,31 +39,35 @@ JPlag can either be used via the CLI or directly via its Java API. For more info
```
JPlag - Detecting Software Plagiarism
Usage: JPlag [ options ] [<root-dir> ...]
<root-dir> Root-directory that contains submissions
Usage: JPlag [ options ] [ <root-dir> ... ] [ -new <new-dir> ... ] [ -old <old-dir> ... ]
<root-dir> Root-directory with submissions to check for plagiarism
<new-dir> Root-directory with submissions to check for plagiarism
<old-dir> Root-directory with prior submissions to compare against
named arguments:
-h, --help show this help message and exit
-l {java,python3,cpp,csharp,char,text,scheme} Select the language to parse the submissions (default: java)
-bc BC Path of the directory containing the base code (common framework used in all submissions)
-v {quiet,long} Verbosity of the logging (default: quiet)
-d Debug parser. Non-parsable files will be stored (default false)
-S S Look in directories <root-dir>/*/<dir> for programs
-p P comma-separated list of all filename suffixes that are included
-x X All files named in this file will be ignored in the comparison (line-separated list)
-t T Tunes the comparison sensitivity by adjusting the minimum token required to be counted as a matching section. A smaller <n> increases the sensitivity but might lead to more false-positives
-m M Comparison similarity threshold [0-100]: All comparisons above this threshold will be saved (default: 0.0)
-n N The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown (default: 30)
-r R Name of the directory in which the comparison results will be stored (default: result)
-c {normal,parallel} Comparison mode used to compare the programs (default: normal)
-h, --help show this help message and exit
-l {java,python3,cpp,csharp,char,text,scheme} Select the language to parse the submissions (default: java)
-bc BC Path of the directory containing the base code (common framework used in all submissions)
-v {quiet,long} Verbosity of the logging (default: quiet)
-d Debug parser. Non-parsable files will be stored (default: false)
-S S Look in directories <root-dir>/*/<dir> for programs
-p P comma-separated list of all filename suffixes that are included
-x X All files named in this file will be ignored in the comparison (line-separated list)
-t T Tunes the comparison sensitivity by adjusting the minimum token required to be counted as a matching section. A smaller
<n> increases the sensitivity but might lead to more false-positives
-m M Comparison similarity threshold [0-100]: All comparisons above this threshold will be saved (default: 0.0)
-n N The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown
(default: 30)
-r R Name of the directory in which the comparison results will be stored (default: result)
-c {normal,parallel} Comparison mode used to compare the programs (default: normal)
```

### Java API

The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:

```java
JPlagOptions options = new JPlagOptions(List.of("/path/to/rootDir"), LanguageOption.JAVA);
JPlagOptions options = new JPlagOptions(List.of("/path/to/rootDir"), List.of(), LanguageOption.JAVA);
options.setBaseCodeSubmissionName("template");

JPlag jplag = new JPlag(options);
20 changes: 19 additions & 1 deletion jplag/src/main/java/de/jplag/CLI.java
@@ -2,6 +2,8 @@

import static de.jplag.CommandLineArgument.*;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

@@ -98,8 +100,16 @@ public JPlagOptions buildOptionsFromArguments(Namespace namespace) {
if (fileSuffixString != null) {
fileSuffixes = fileSuffixString.replaceAll("\\s+", "").split(",");
}

// Collect the root directories.
List<String> submissionDirectories = new ArrayList<>();
List<String> oldSubmissionDirectories = new ArrayList<>();
addAllMultiValueArgument(ROOT_DIRECTORY.getListFrom(namespace), submissionDirectories);
addAllMultiValueArgument(NEW_DIRECTORY.getListFrom(namespace), submissionDirectories);
addAllMultiValueArgument(OLD_DIRECTORY.getListFrom(namespace), oldSubmissionDirectories);

LanguageOption language = LanguageOption.fromDisplayName(LANGUAGE.getFrom(namespace));
JPlagOptions options = new JPlagOptions(ROOT_DIRECTORY.getListFrom(namespace), language);
JPlagOptions options = new JPlagOptions(submissionDirectories, oldSubmissionDirectories, language);
options.setBaseCodeSubmissionName(BASE_CODE.getFrom(namespace));
options.setVerbosity(Verbosity.fromOption(VERBOSITY.getFrom(namespace)));
options.setDebugParser(DEBUG.getFrom(namespace));
@@ -152,4 +162,12 @@ private String generateDescription() {
var randomDescription = DESCRIPTIONS[new Random().nextInt(DESCRIPTIONS.length)];
return String.format("JPlag - %s" + System.lineSeparator() + CREDITS, randomDescription);
}

private void addAllMultiValueArgument(List<List<String>> argumentValues, List<String> destinationRootDirectories) {
if (argumentValues == null) {
return;
}

argumentValues.stream().forEach(value -> destinationRootDirectories.addAll(value));
}
}
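
Review note: according to the comment in `CommandLineArgument.parseWith`, combining `nargs` with the `append` action makes each multi-value argument arrive as a `List<List<String>>`, one inner list per occurrence of the flag, or `null` when the flag is absent. The following is a minimal stand-alone sketch of the flattening that `addAllMultiValueArgument` performs. It does not depend on argparse4j, and the class name `MultiValueArguments` and method name `flatten` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JPlag: flattens values collected per flag occurrence. */
final class MultiValueArguments {

    private MultiValueArguments() {
    }

    /**
     * Flattens the per-occurrence value lists into one list, preserving order.
     * A null input models an argument that was not given on the command line.
     */
    static List<String> flatten(List<List<String>> argumentValues) {
        List<String> result = new ArrayList<>();
        if (argumentValues == null) {
            return result;
        }
        for (List<String> occurrence : argumentValues) {
            result.addAll(occurrence);
        }
        return result;
    }

    public static void main(String[] args) {
        // Models a command line such as "-new a b -new c": two occurrences of one flag.
        List<List<String>> parsed = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(flatten(parsed));
        System.out.println(flatten(null));
    }
}
```

Returning a fresh list instead of mutating a destination list is a design choice of this sketch only; the commit appends into a shared list so that `<root-dir>` and `-new` values end up in the same collection.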
12 changes: 9 additions & 3 deletions jplag/src/main/java/de/jplag/CommandLineArgument.java
@@ -5,6 +5,7 @@
import static de.jplag.options.JPlagOptions.DEFAULT_COMPARISON_MODE;
import static de.jplag.options.JPlagOptions.DEFAULT_SHOWN_COMPARISONS;
import static de.jplag.options.JPlagOptions.DEFAULT_SIMILARITY_THRESHOLD;
import static net.sourceforge.argparse4j.impl.Arguments.append;
import static net.sourceforge.argparse4j.impl.Arguments.storeTrue;

import java.util.Collection;
@@ -30,8 +31,9 @@
* @author Timur Saglam
*/
public enum CommandLineArgument {

ROOT_DIRECTORY(new Builder("rootDir", String.class).nargs(NumberOfArgumentValues.ONE_OR_MORE_VALUES)),
ROOT_DIRECTORY(new Builder("rootDir", String.class).nargs(NumberOfArgumentValues.ZERO_OR_MORE_VALUES)),
NEW_DIRECTORY(new Builder("-new", String.class).nargs(NumberOfArgumentValues.ONE_OR_MORE_VALUES)),
OLD_DIRECTORY(new Builder("-old", String.class).nargs(NumberOfArgumentValues.ONE_OR_MORE_VALUES)),
LANGUAGE(new Builder("-l", String.class).defaultsTo(LanguageOption.getDefault().getDisplayName()).choices(LanguageOption.getAllDisplayNames())),
BASE_CODE("-bc", String.class),
VERBOSITY(new Builder("-v", String.class).defaultsTo("quiet").choices(List.of("quiet", "long"))), // TODO SH: Replace verbosity when integrating a
@@ -167,8 +169,12 @@ public void parseWith(ArgumentParser parser, CliGroupHelper groupHelper) {
if (type == Boolean.class) {
argument.action(storeTrue());
}
if (numberOfValues == NumberOfArgumentValues.ONE_OR_MORE_VALUES) {
if (!numberOfValues.toString().isEmpty()) {
// For multi-value arguments keep all invocations.
// This causes the argument value to change its type to 'List<List<String>>'.
// Also, when the retrieved value after parsing the CLI is 'null', the argument is not used.
argument.nargs(numberOfValues.toString());
argument.action(append());
}
}

3 changes: 2 additions & 1 deletion jplag/src/main/java/de/jplag/NumberOfArgumentValues.java
@@ -5,7 +5,8 @@
*/
public enum NumberOfArgumentValues {
SINGLE_VALUE(""),
ONE_OR_MORE_VALUES("+"),;
ONE_OR_MORE_VALUES("+"),
ZERO_OR_MORE_VALUES("*");

private final String representation;
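
Review note: `parseWith` now treats every constant with a non-empty string form as a multi-value argument, so adding `ZERO_OR_MORE_VALUES("*")` is enough to enable `append` for the positional `rootDir`. A minimal sketch of that idea follows. It is a stand-alone copy for illustration only: the enum name `ArgumentCount` and the method `isMultiValue` are hypothetical, and it assumes that the real enum's `toString()` returns the `representation` field, which the visible hunk does not show.

```java
/** Hypothetical stand-in for NumberOfArgumentValues, for illustration only. */
enum ArgumentCount {
    SINGLE_VALUE(""),
    ONE_OR_MORE_VALUES("+"),
    ZERO_OR_MORE_VALUES("*");

    private final String representation;

    ArgumentCount(String representation) {
        this.representation = representation;
    }

    /** True for every constant that accepts a list of values. */
    boolean isMultiValue() {
        return !representation.isEmpty();
    }

    /** Assumption: the string form is the nargs pattern handed to the parser. */
    @Override
    public String toString() {
        return representation;
    }
}
```

A named predicate such as `isMultiValue()` might read more clearly at the call site than `!numberOfValues.toString().isEmpty()`, though that is a matter of taste.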

16 changes: 15 additions & 1 deletion jplag/src/main/java/de/jplag/Submission.java
@@ -35,6 +35,11 @@ public class Submission implements Comparable<Submission> {
*/
private final File submissionRootFile;

/**
* Whether the submission is new. That is, must be checked for plagiarism.
*/
private final boolean isNew;

/**
* Files of the submission.
*/
@@ -62,13 +67,15 @@ public class Submission implements Comparable<Submission> {
* Creates a submission.
* @param name Identification of the submission (directory or filename).
* @param submissionRootFile is the submission file, or the root of the submission itself.
* @param isNew states whether the submission must be checked for plagiarism.
* @param files are the files of the submissions, if the root is a single file it should just contain one file.
* @param language is the language of the submission.
* @param errorCollector is the interface for error reporting.
*/
public Submission(String name, File submissionRootFile, Collection<File> files, Language language, ErrorCollector errorCollector) {
public Submission(String name, File submissionRootFile, boolean isNew, Collection<File> files, Language language, ErrorCollector errorCollector) {
this.name = name;
this.submissionRootFile = submissionRootFile;
this.isNew = isNew;
this.files = files;
this.language = language;
this.errorCollector = errorCollector;
@@ -96,6 +103,13 @@ public File getRoot() {
return submissionRootFile;
}

/**
* @return whether the submission is new. That is, must be checked for plagiarism.
*/
public boolean isNew() {
return isNew;
}

/**
* @return Number of tokens in the parse result.
*/
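
Review note: the part of the commit that consumes `isNew()` is not visible in this view. The sketch below shows one plausible way such a flag could drive pair selection, under the assumption that a pair is worth comparing only when at least one side is a new submission. Everything in it (`PairSelection`, `Entry`, `pairsToCompare`) is hypothetical and is not JPlag's comparison code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not JPlag code. */
final class PairSelection {

    private PairSelection() {
    }

    /** Minimal stand-in for a submission: a name plus the isNew flag. */
    static final class Entry {
        final String name;
        final boolean isNew;

        Entry(String name, boolean isNew) {
            this.name = name;
            this.isNew = isNew;
        }
    }

    /** Lists every unordered pair in which at least one entry is new. */
    static List<String> pairsToCompare(List<Entry> entries) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            for (int j = i + 1; j < entries.size(); j++) {
                Entry first = entries.get(i);
                Entry second = entries.get(j);
                // Assumption: two prior submissions are never compared with each other.
                if (first.isNew || second.isNew) {
                    pairs.add(first.name + " <-> " + second.name);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(new Entry("new1", true), new Entry("old1", false), new Entry("old2", false));
        pairsToCompare(entries).forEach(System.out::println);
    }
}
```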
67 changes: 53 additions & 14 deletions jplag/src/main/java/de/jplag/SubmissionSetBuilder.java
@@ -51,19 +51,25 @@ public SubmissionSetBuilder(Language language, JPlagOptions options, ErrorCollec
* @throws ExitException if the directory cannot be read.
*/
public SubmissionSet buildSubmissionSet() throws ExitException {
Set<File> rootDirectoryNames = verifyRootDirectories(options.getRootDirectoryNames());
Set<File> submissionDirectories = verifyRootDirectories(options.getSubmissionDirectories(), true);
Set<File> oldSubmissionDirectories = verifyRootDirectories(options.getOldSubmissionDirectories(), false);
checkForNonOverlappingRootDirectories(submissionDirectories, oldSubmissionDirectories);

// For backward compatibility, don't prefix submission names with their root directory
// if there is only one root directory.
boolean multipleRoots = (rootDirectoryNames.size() > 1);
int numberOfRootDirectories = submissionDirectories.size() + oldSubmissionDirectories.size();
boolean multipleRoots = (numberOfRootDirectories > 1);

// Collect valid looking entries from the root directories.
Map<File, Submission> foundSubmissions = new HashMap<>();
for (File rootDirectory : rootDirectoryNames) {
processRootDirectoryEntries(rootDirectory, multipleRoots, foundSubmissions);
for (File directory : submissionDirectories) {
processRootDirectoryEntries(directory, multipleRoots, foundSubmissions, true);
}
for (File oldDirectory : oldSubmissionDirectories) {
processRootDirectoryEntries(oldDirectory, multipleRoots, foundSubmissions, false);
}

Optional<Submission> baseCodeSubmission = loadBaseCode(rootDirectoryNames, foundSubmissions);
Optional<Submission> baseCodeSubmission = loadBaseCode(submissionDirectories, oldSubmissionDirectories, foundSubmissions);

// Merge everything in a submission set.
List<Submission> submissions = new ArrayList<>(foundSubmissions.values());
@@ -73,7 +79,11 @@ public SubmissionSet buildSubmissionSet() throws ExitException {
/**
* Verify that the given root directories exist and have no duplicate entries.
*/
private Set<File> verifyRootDirectories(List<String> rootDirectoryNames) throws ExitException {
private Set<File> verifyRootDirectories(List<String> rootDirectoryNames, boolean areNewDirectories) throws ExitException {
if (areNewDirectories && rootDirectoryNames.isEmpty()) {
throw new RootDirectoryException("No root directories specified with submissions to check for plagiarism!");
}

Set<File> canonicalRootDirectories = new HashSet<>(rootDirectoryNames.size());
for (String rootDirectoryName : rootDirectoryNames) {
File rootDirectory = new File(rootDirectoryName);
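
Review note: the rest of `verifyRootDirectories` is cut off here, so how it reports missing or duplicate directories is not visible. As a rough illustration of why the method works on canonical `File` objects rather than raw strings, the sketch below collapses different spellings of the same path into one entry. The class `CanonicalRoots` and its `canonicalize` method are hypothetical, and silently collapsing duplicates is an assumption of this sketch, not necessarily what JPlag does.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration, not JPlag code. */
final class CanonicalRoots {

    private CanonicalRoots() {
    }

    /** Maps each name to its canonical file; equal canonical paths collapse into one set entry. */
    static Set<File> canonicalize(List<String> rootDirectoryNames) {
        Set<File> canonicalRootDirectories = new LinkedHashSet<>();
        for (String rootDirectoryName : rootDirectoryNames) {
            try {
                canonicalRootDirectories.add(new File(rootDirectoryName).getCanonicalFile());
            } catch (IOException exception) {
                // Wrapped so that callers of this sketch need no checked-exception handling.
                throw new UncheckedIOException(exception);
            }
        }
        return canonicalRootDirectories;
    }

    public static void main(String[] args) {
        // "reports" and "./reports" name the same directory.
        System.out.println(canonicalize(List.of("reports", "./reports")).size());
    }
}
```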
@@ -94,17 +104,43 @@ private Set<File> verifyRootDirectories(List<String> rootDirectoryNames) throws
return canonicalRootDirectories;
}

private Optional<Submission> loadBaseCode(Set<File> rootDirectories, Map<File, Submission> foundSubmissions) throws ExitException {
/**
* Verify that the new and old directory sets are disjoint and modify the old submissions set if necessary.
*/
private void checkForNonOverlappingRootDirectories(Set<File> submissionDirectories, Set<File> oldSubmissionDirectories) {

Set<File> commonRootdirectories = new HashSet<>(submissionDirectories);
commonRootdirectories.retainAll(oldSubmissionDirectories);
if (commonRootdirectories.isEmpty()) {
return;
}

// As old submission directories are only read while new submission directories are both read and checked, the
// former use can be removed without affecting the result of the checks.
oldSubmissionDirectories.removeAll(commonRootdirectories);
for (File rootDirectory : commonRootdirectories) {
System.out.println("Warning: Root directory \"" + rootDirectory.toString()
+ "\" is specified both for plagiarism checking and for prior submissions, will perform plagiarism checking only.");
}
}

private Optional<Submission> loadBaseCode(Set<File> submissionDirectories, Set<File> oldSubmissionDirectories,
Map<File, Submission> foundSubmissions) throws ExitException {
// Extract the basecode submission if necessary.
Optional<Submission> baseCodeSubmission = Optional.empty();
if (options.hasBaseCode()) {
String baseCodeName = options.getBaseCodeSubmissionName().get();
Submission baseCode = loadBaseCodeAsPath(baseCodeName);
if (baseCode == null) {
if (rootDirectories.size() > 1) {
int numberOfRootDirectories = submissionDirectories.size() + oldSubmissionDirectories.size();
if (numberOfRootDirectories > 1) {
throw new BasecodeException("The base code submission needs to be specified by path instead of by name!");
}
File rootDirectory = rootDirectories.iterator().next();

// There is one root directory, and the submissionDirectories variable has been checked to be non-empty.
// That set thus contains the one and only root directory.
File rootDirectory = submissionDirectories.iterator().next();

// Single root-directory, try the legacy way of specifying basecode.
baseCode = loadBaseCodeViaName(baseCodeName, rootDirectory, foundSubmissions);
}
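
Review note: `checkForNonOverlappingRootDirectories` relies on two standard `Set` operations, `retainAll` to compute the intersection and `removeAll` to drop it from the old set. A minimal stand-alone sketch of that step follows. The class `RootDirectoryOverlap` and the method `dropOverlap` are hypothetical names, and returning the removed entries (instead of printing a warning per entry) is a choice made only for this example.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration, not JPlag code. */
final class RootDirectoryOverlap {

    private RootDirectoryOverlap() {
    }

    /**
     * Removes from oldDirectories every entry that also appears in newDirectories.
     * The new set is left untouched; the old set is modified in place.
     * @return the directories that were present in both sets.
     */
    static Set<File> dropOverlap(Set<File> newDirectories, Set<File> oldDirectories) {
        Set<File> common = new HashSet<>(newDirectories);
        common.retainAll(oldDirectories); // intersection of both sets
        oldDirectories.removeAll(common); // keep overlapping roots as "new" only
        return common;
    }

    public static void main(String[] args) {
        Set<File> newDirectories = new HashSet<>(Set.of(new File("a"), new File("b")));
        Set<File> oldDirectories = new HashSet<>(Set.of(new File("b"), new File("c")));
        System.out.println("Overlap: " + dropOverlap(newDirectories, oldDirectories));
        System.out.println("Remaining old: " + oldDirectories);
    }
}
```

Because the old set is mutated, callers must pass a modifiable set; `Set.of(...)` directly would throw `UnsupportedOperationException` on `removeAll` when there is an overlap.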
@@ -140,7 +176,7 @@ private Submission loadBaseCodeAsPath(String baseCodeName) throws ExitException
try {
// Use an unlikely short name for the base code. If all is well, this name should not appear
// in the output since basecode matches are removed from it
return processSubmission(basecodeSubmission.getName(), basecodeSubmission);
return processSubmission(basecodeSubmission.getName(), basecodeSubmission, false);
} catch (SubmissionException exception) {
throw new BasecodeException(exception.getMessage(), exception); // Change thrown exception to basecode exception.
}
@@ -228,10 +264,11 @@ private String isExcludedEntry(File submissionEntry) {
/**
* Process the given directory entry as a submission, the path MUST not be excluded.
* @param submissionFile the file for the submission.
* @param isNew states whether submissions found in the root directory must be checked for plagiarism.
* @return The entry converted to a submission.
* @throws ExitException when an error has been found with the entry.
*/
private Submission processSubmission(String submissionName, File submissionFile) throws ExitException {
private Submission processSubmission(String submissionName, File submissionFile, boolean isNew) throws ExitException {

if (submissionFile.isDirectory() && options.getSubdirectoryName() != null) {
// Use subdirectory instead
@@ -248,23 +285,25 @@ private Submission processSubmission(String submissionName, File submissionFile)
}

submissionFile = makeCanonical(submissionFile, it -> new SubmissionException("Cannot create submission: " + submissionName, it));
return new Submission(submissionName, submissionFile, parseFilesRecursively(submissionFile), language, errorCollector);
return new Submission(submissionName, submissionFile, isNew, parseFilesRecursively(submissionFile), language, errorCollector);
}

/**
* Process entries in the root directory to check whether they qualify as submissions.
* @param rootDirectory is the root directory being examined.
* @param foundSubmissions Submissions found so far, is updated in-place.
* @param isNew states whether submissions found in the root directory must be checked for plagiarism.
*/
private void processRootDirectoryEntries(File rootDirectory, boolean multipleRoots, Map<File, Submission> foundSubmissions) throws ExitException {
private void processRootDirectoryEntries(File rootDirectory, boolean multipleRoots, Map<File, Submission> foundSubmissions, boolean isNew)
throws ExitException {
for (String fileName : listSubmissionFiles(rootDirectory)) {
File submissionFile = new File(rootDirectory, fileName);

String errorMessage = isExcludedEntry(submissionFile);
if (errorMessage == null) {
String rootDirectoryPrefix = multipleRoots ? (rootDirectory.getName() + File.separator) : "";
String submissionName = rootDirectoryPrefix + fileName;
Submission submission = processSubmission(submissionName, submissionFile);
Submission submission = processSubmission(submissionName, submissionFile, isNew);
foundSubmissions.put(submission.getRoot(), submission);
} else {
System.out.println(errorMessage);
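
Review note: with old and new roots counted together, `multipleRoots` becomes true as soon as a single `-old` directory is added, which changes submission names from `alice` to `<root>/alice`. The naming rule from `processRootDirectoryEntries` can be sketched in isolation as follows; the class `SubmissionNames` and the method `submissionName` are hypothetical wrappers around the two lines shown in the diff.

```java
import java.io.File;

/** Hypothetical illustration of the naming rule, not JPlag code. */
final class SubmissionNames {

    private SubmissionNames() {
    }

    /** Prefixes the entry name with the last segment of its root directory when several roots are in use. */
    static String submissionName(File rootDirectory, String fileName, boolean multipleRoots) {
        String rootDirectoryPrefix = multipleRoots ? (rootDirectory.getName() + File.separator) : "";
        return rootDirectoryPrefix + fileName;
    }

    public static void main(String[] args) {
        File root = new File("/data/2021");
        System.out.println(submissionName(root, "alice", false));
        System.out.println(submissionName(root, "alice", true));
    }
}
```

Since `File.getName()` returns only the final path segment, two roots such as `/a/submissions` and `/b/submissions` would produce the same prefix, so displayed names are not guaranteed to be unique across roots.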