Extend JPlag with checking against prior submissions #289
Rebased. Note I have no way to test anymore.
Please give me a heads-up when you reach this PR and want a rebase for a review or merge. I don't get notifications when a conflict happens, nor is it much use to rebase the patch only to find more conflicts a few weeks later. I'd rather handle all conflicts at the same time.
@Alberth289346 now would be the time for conflict solving. Then we can review this PR and integrate it.
- New collection cannot be used from the interface yet.
Nice!
I can change to newer Java versions, the OS package manager gives me various javas. They just all don't work due to the pasted mvn crash above. The Internet suggests it's a bug in guice: https://stackoverflow.com/a/62809287 (trying to use 0 processors for computing doesn't quite work in Unix). As for the patch: I rebased, force-pushed (as my maven isn't fixed yet, I just tried both with java 16 and 17), found a modified name in the new report code due to failing the CI build, and fixed that in d367380 . One thing that could be wrong now is that reporting about comparison results may now pull in submissions from the |
…iarism-check folders.
After a night of sleep, I think the better solution is to merge all the root directories before giving them to the reporter. That should just work, while the user still has the benefit of a reduced number of plagiarism checks. Future work is perhaps to make the distinction between "new" and "old" work more clear in the report.
Yes, that does sound like a good solution.
Of course, that is something we need to do, but that is not your responsibility. The new report view has a few things we need to optimize anyway. If we can provide the information required to distinguish between old and new in the result data structure (maybe a field in the submission class or similar), it would be ideal. However, we can always implement that later as well.
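To make the two ideas above concrete (merging the root directories before reporting, and carrying an old/new marker on each submission), here is a minimal sketch. The names TaggedSubmission, RootMerger, and isNew are hypothetical and chosen for illustration only; they are not taken from the JPlag code base.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a submission that remembers which set it came from. */
final class TaggedSubmission {
    final String name;
    final boolean isNew; // true: from a plagiarism-check root, false: from a prior-submissions root

    TaggedSubmission(String name, boolean isNew) {
        this.name = name;
        this.isNew = isNew;
    }
}

/** Hypothetical helper that merges both sets into one list for the reporter. */
final class RootMerger {
    static List<TaggedSubmission> merge(List<String> newNames, List<String> oldNames) {
        List<TaggedSubmission> merged = new ArrayList<>();
        for (String name : newNames) {
            merged.add(new TaggedSubmission(name, true));
        }
        for (String name : oldNames) {
            merged.add(new TaggedSubmission(name, false));
        }
        return merged;
    }
}
```

With a marker like this, the reporter could receive a single merged list and still tell old from new work when rendering results.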
Btw @Alberth289346 what is the output of |
In case you're not familiar with Linuxes,
#!/bin/sh
set -e -u -x
mvn clean package assembly:single
# Available javas, didn't install 8 and 13.
$ apt list openjdk-\*-jdk | grep amd64
openjdk-11-jdk/focal-updates,focal-security,now 11.0.14.1+1-0ubuntu1~20.04 amd64 [installed]
openjdk-13-jdk/focal-updates 13.0.7+5-0ubuntu1~20.04 amd64
openjdk-16-jdk/focal-updates,focal-security,now 16.0.1+9-1~20.04 amd64 [installed]
openjdk-17-jdk/focal-updates,focal-security,now 17.0.2+8-1~20.04 amd64 [installed]
openjdk-8-jdk/focal-updates,focal-security 8u312-b07-0ubuntu1~20.04 amd64
# Available mavens.
$ apt list maven
maven/focal,focal,now 3.6.3-1 all [installed]
# and it's there.
$ mvn --version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 17.0.2, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-107-generic", arch: "amd64", family: "unix"
# Trying java 11:
$ sudo update-java-alternatives -s /usr/lib/jvm/java-1.11.0-openjdk-amd64
$ java -version
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment (build 11.0.14.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.14.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
$ ../build.sh
+ mvn clean package assembly:single
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Scanning for projects...
# Lots of log deleted here
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for JPlag Plagiarism Detector 4.0.0-SNAPSHOT:
[INFO]
[INFO] JPlag Plagiarism Detector .......................... SUCCESS [ 4.945 s]
[INFO] frontend-utils ..................................... FAILURE [ 1.155 s]
[INFO] chars .............................................. SKIPPED
[INFO] cpp ................................................ SKIPPED
[INFO] csharp-1.2 ......................................... SKIPPED
[INFO] java ............................................... SKIPPED
[INFO] python-3 ........................................... SKIPPED
[INFO] scheme ............................................. SKIPPED
[INFO] text ............................................... SKIPPED
[INFO] jplag .............................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.342 s
[INFO] Finished at: 2022-04-07T09:12:38+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project frontend-utils: Fatal error compiling: error: invalid target release: 14 -> [Help 1]
[ERROR]
So Java 11 works, but the JPlag project doesn't like it. I worked around this for some time by taking out the However, as more and more 12+ java code is being added to the project, this doesn't work any more.
# Trying java 16.
$ sudo update-java-alternatives -s /usr/lib/jvm/java-1.16.0-openjdk-amd64
$ + mvn clean package assembly:single
[ERROR] Error executing Maven.
[ERROR] java.lang.IllegalStateException: Unable to load cache item
[ERROR] Caused by: Unable to load cache item
[ERROR] Caused by: Could not initialize class com.google.inject.internal.cglib.core.$MethodWrapper
# Trying java 17.
$ sudo update-java-alternatives -s /usr/lib/jvm/java-1.17.0-openjdk-amd64
# Forgot that above, but java version does change.
$ java -version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 17.0.2+8-Ubuntu-120.04, mixed mode, sharing)
$ ./build.sh
+ mvn clean package assembly:single
[ERROR] Error executing Maven.
[ERROR] java.lang.IllegalStateException: Unable to load cache item
[ERROR] Caused by: Unable to load cache item
[ERROR] Caused by: Could not initialize class com.google.inject.internal.cglib.core.$MethodWrapper
I am using an LTS distribution for my OS (I don't want to do re-installs every 6 months), and it makes sense to me that they then also pick an LTS Java version (i.e. 11). Unfortunately, the JPlag project decided to use a newer Java. But the CI does run the tests afaik, and GitHub claims they are OK, so the code should be correct, shouldn't it?
I do not use Linux frequently but I am vaguely familiar with it.
This is what I suspected. After a short google search I found some people claiming that the
That makes total sense. However, as the Java frontend of JPlag is built on the JavaC API, it can only handle newer Java code if it is built with a newer Java version. Thus, we are required to move to newer Java versions more often than usual, as it is the only way to allow users to check student code written in these newer Java versions. Regarding LTS versions, we are already planning to move to Java 17 soon, which is again an LTS version.
Yes, but knowing that you cannot run your code locally means you cannot build a jar of your feature version of JPlag and play around with it locally, which would allow some more manual tests beyond the unit tests. This means I need to do this after the code review, which takes more time on my side (of which I have very little right now) and prolongs the process of merging the PR. This is why I was also interested in fixing your mvn issues.
Two things that popped up in my mind:
Likely none of this will fly right now, but it could be a path into the (far?) future.
Regarding 1: Yes, this might be possible, but I am not sure if it will work. Moreover, project restructurings like that do not come for free, so this is something that would need to be planned carefully (e.g. think of the CI pipeline to ensure consistency among the repos).
Yes, unfortunately, it does! For example, the tree scanner class receives additional methods for new language features. If you compile that with an older java version it will tell you that it cannot find this method in the super class tree scanner, as it was added with a newer java version. And that is only one example.
Yes, those are things we can certainly consider for long-term planning. Regarding this PR now, just tell me when it is ready to review. If you want to make some more changes that's fine too.
You can review it when you have time. It just doesn't fly since you cannot specify more than 1 root directory currently due to #354 . EDIT: Basically it adds a |
We need to find clear, concise, and consistent naming of concepts and use it throughout the code. See the comments for details.
Yep, that was the still open question that I discussed in my very first comment. As I already stated there, I agree we need consistent naming. I just didn't know the direction. In my view, the disadvantage of So it's a matter of opinion how good the |
Changed the names as you suggested. I purposely didn't reformat the code to avoid complicated diffs (it's an extensive change, so I used a plain editor to get the same names changed in the same way across all files for different variables). The CI apparently insists on reformatting, but I cannot get spotless to run. It needs maven, so I switched to java 11; then it started barking about unsupported input in the frontend sources, so I eventually just deleted all frontend sources locally so it could process just the jplag main code (where the changes are). Then it barked about unsupported input in the jplag code and I was out of options. I either have a running maven and no source code support, or a working spotless but not maven.
Thank you for applying formatting.
To be clear, I am done processing review comments. Please review the changes.
Looks good! I know the multi-root/archival features are currently not really compatible with the new report, but we will have a student working on fixing the new report viewer starting at the end of April. See #357, and feel free to comment there if something is missing. Thus I would say we merge this PR. This will prevent it from going out of date. Further improvements can always be made in separate PRs. I have not yet had time to manually play around with the feature, but I will do that in the future.
@Alberth289346 are you okay with merging?
Yes, please merge so the feature is consolidated. I am done for now I think. I'll discuss internally how to proceed from here. I'll keep an eye on the project. For now, thanks for your time, assistance, and kind words and behavior, it was a pleasure working with you.
Albert
JPlag currently performs plagiarism checks for all combinations of submissions. However, as described in #49 and #91, there is a wish to extend checks with an additional set of "old" submissions. These old submissions are also used in the plagiarism checks, but checks between two old submissions are skipped.
This improves speed of the checking process, and reduces clutter in the result, as plagiarism between old submissions is not reported.
This patch implements adding such an old submission set.
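As an illustration of the pairing rule described above (not the code in this patch), the sketch below enumerates which comparisons would run: every new submission against every other new submission and against every old submission, while old-old pairs are never produced. The class name ComparisonPlanner and the use of plain strings for submissions are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not JPlag's actual code: lists the comparisons to run. */
final class ComparisonPlanner {
    /** Returns every unordered pair that involves at least one new submission. */
    static List<String[]> pairsToCompare(List<String> newSubmissions, List<String> oldSubmissions) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < newSubmissions.size(); i++) {
            // new vs. new: each unordered pair exactly once
            for (int j = i + 1; j < newSubmissions.size(); j++) {
                pairs.add(new String[] {newSubmissions.get(i), newSubmissions.get(j)});
            }
            // new vs. old: every combination
            for (String old : oldSubmissions) {
                pairs.add(new String[] {newSubmissions.get(i), old});
            }
        }
        // old vs. old pairs are deliberately never generated
        return pairs;
    }
}
```

For n new and m old submissions this yields n(n-1)/2 + n*m comparisons instead of (n+m)(n+m-1)/2. With 3 new and 100 old submissions, that is 303 comparisons rather than 5253.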
Notes:
In the code I opted for descriptive names plagiarismCheckRootDirectories and priorSubmssionsRootDirectories to distinguish both sets.
At the command-line, I opted for -new and -old, as that is reasonably short and clearly different from each other.
For completeness the variants we considered were
There are also -archive and -prior, from Archival submissions #49 and Added a -prior option. #91.
I somewhat added -check and -prior as notions in the code, but since most users don't read source code I am not sure how relevant that is.
We should decide about these names, and adapt the code and/or options to that.
Clearly, code names and option names are not equal, not sure if that's desired. Programmers and program users are mostly disjoint groups of people.
#91 only added the -prior option, and adapted processing, so this PR can be closed I think.
#49 does additional HTML output things, such changes may need considering for the new HTML report backend.
#206 is implemented except for globbing support of root-directory paths.
EDIT: Not sure how badly we want globbing, but I think we can live without it for a while as the shell provides this functionality already.
#238 is my old discussion PR which is fully covered I think.
Closes #238
Addresses #49 #91 #206