
Replace the outdated HTML report generation #192

Closed
tsaglam opened this issue Oct 12, 2021 · 25 comments · Fixed by #287
Assignees
Labels
enhancement Issue/PR that involves features, improvements and other changes major Major issue/feature/contribution/change PISE-WS21/22 Tasks for the PISE practical course (WS21/22)
Milestone

Comments

@tsaglam
Member

tsaglam commented Oct 12, 2021

  • Replace legacy report generation with a modern system
  • Generate reports that look cleaner and more modern
  • Should roughly cover the functionality of the legacy web report [internal link] (including clustering and max similarity)
@tsaglam tsaglam added enhancement Issue/PR that involves features, improvements and other changes major Major issue/feature/contribution/change PISE-WS21/22 Tasks for the PISE practical course (WS21/22) labels Oct 12, 2021
@tsaglam tsaglam added this to the v3.1.0 milestone Oct 12, 2021
@sebinside
Member

sebinside commented Oct 19, 2021

The current report generation builds HTML files by concatenating strings. That is Java 1.0 style and has been considered bad practice for many years. I would therefore propose a strict separation between generating machine-readable report data and the GUI code that views it. 21b9bd3 prepares this by adding a Vue 3 boilerplate for a web-based report viewer. By updating the report generation, e.g., to a JSON-based format, we could separate generation and viewing and minimize dependencies.

To do so, I would propose the following steps:

  • Getting started with the report generation of JPlag
  • Identify all information from the plagiarism detection that is currently used in the HTML report generation
  • Identify which of this information is already accessible using the new Java API
  • Define new functionality to create JSON files with all required information on plagiarism (e.g., following the current format of one overview file and one file per match, loaded dynamically; don't forget to include a JPlag version flag)
  • Set up an npm + Vue 3 environment, e.g., using VS Code + Vetur
  • Develop an overview page with the joined information of all plagiarisms (including legacy views, e.g., the histogram, and also a possibility to anonymize)
  • Develop detailed viewer for different matches
  • Profit!
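To make the proposed split concrete, the overview-plus-per-match JSON files could be typed roughly as below. This is only a sketch: every field and type name here is a hypothetical placeholder, not the format that was eventually adopted.

```typescript
// Hypothetical sketch of the proposed JSON report format.
// All names are illustrative only; the actual format is defined later in this issue.

interface OverviewReport {
  jplagVersion: string;      // the version flag proposed above
  submissionIds: string[];
  comparisonFiles: string[]; // one file per match, loaded dynamically
}

interface ComparisonReport {
  firstSubmissionId: string;
  secondSubmissionId: string;
  similarity: number; // average similarity in percent
  matches: MatchReport[];
}

interface MatchReport {
  firstFile: string;
  secondFile: string;
  startInFirst: number;  // first matched line in the first file
  startInSecond: number; // first matched line in the second file
  tokens: number;        // length of the match in tokens
}

// A minimal example instance of the overview file:
const overview: OverviewReport = {
  jplagVersion: "3.0.0",
  submissionIds: ["A", "B"],
  comparisonFiles: ["A-B.json"],
};
```

Typing the format up front would let the viewer and the Java-side generation evolve independently, which is the point of the separation proposed above.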

An alternative approach would be to create a new REST interface and query a (also not yet developed) JPlag server for details on demand. This would remove the need for JSON-file-based generation but would also require new interfaces. Also, JPlag is currently not used in a way that it runs in the background; it generates reports once and then stops again. This should be discussed in advance. We use JSON files because generation and viewing should be two independent processes.

@tsaglam
Member Author

tsaglam commented Nov 9, 2021

The max similarity should already be provided in the current result object. Thus, it can be used in the new report generation to additionally display the top n matches according to max similarity.
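Listing the top n matches by max similarity in the viewer could then be a simple sort over the comparison data. A sketch with hypothetical field names (the result object provides the value, but the JSON field name was not fixed at this point):

```typescript
// Sketch: select the top n comparisons by maximum similarity.
// The field names are illustrative, not the actual report format.

interface ComparisonEntry {
  firstSubmission: string;
  secondSubmission: string;
  maxSimilarity: number; // in percent
}

function topByMaxSimilarity(comparisons: ComparisonEntry[], n: number): ComparisonEntry[] {
  // Copy before sorting so the original comparison list stays untouched.
  return [...comparisons]
    .sort((a, b) => b.maxSimilarity - a.maxSimilarity)
    .slice(0, n);
}

const sample: ComparisonEntry[] = [
  { firstSubmission: "A", secondSubmission: "B", maxSimilarity: 95.0 },
  { firstSubmission: "C", secondSubmission: "D", maxSimilarity: 40.0 },
  { firstSubmission: "A", secondSubmission: "C", maxSimilarity: 70.0 },
];
// topByMaxSimilarity(sample, 2) yields the A-B pair, then the A-C pair.
```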

@cholakov11
Contributor

cholakov11 commented Nov 9, 2021

I have some questions regarding the old report and the API, since it lacks JavaDoc (edit: JavaDoc found).

  1. What is the difference between program and submission? Both are mentioned in the old HTML report.
  2. What is the difference between getComparisons and getAllComparisons functions of a JPlagResult?
  3. What is the difference between the language and language option classes? I found several name representations of the language: java9, JAVA_1_9, Javac 1.9+ based AST plugin, java19. Which one is relevant for the report?
  4. What is the language's number of tokens? Is it relevant for the end report?
  5. Is the max number of matches, from the options class, all matches, including those under the threshold?
  6. In the options we can set a similarity metric with its threshold. In the old report, the matches section in the general info shows two similarity metrics, avg and max. Does this mean the report is formed from the combined info of two JPlagResult objects?
  7. What is an exclusion file? There is a getExclusionFileName function in the options.
  8. Is comparison mode (normal, parallel) relevant information for the end report?
  9. If a submission contains more than one file (for example, submission 1 has file1 and file2, and submission 2 has file1 and file2), how can we map a match to the correct pair of matching files? A match object only has a length and the starting line of the match. There is a file(integer) function in the comparison class. Is it related, and what exactly is it there for?

@tsaglam
Member Author

tsaglam commented Nov 9, 2021

  • What is the difference between program and submission? Both are mentioned in the old HTML report.

Program was the class for the main execution functionality in the legacy version of JPlag. In the current version, this class is called JPlag. Submission is still the same thing as it was: A single code submission of a student, containing one or more files (see the class diagram here).

  • What is the difference between getComparisons and getAllComparisons functions of a JPlagResult?

Please refer to the JavaDoc comments here.

  • What is the difference between the language and language option classes? I found several name representations of the language: java9, JAVA_1_9, Javac 1.9+ based AST plugin, java19. Which one is relevant for the report?

For the report, use the name of the Language. The language options tell JPlag which frontends exist without requiring a dependency on every frontend. The frontends are instantiated via reflection. Also note that the frontend names changed during the major overhaul, so they might differ in the legacy version.

  • What is the language's number of tokens? Is it relevant for the end report?

I am not sure what you are referring to here, do you mean the minimum token match?

  • Is the max number of matches, from the options class, all matches, including those under the threshold?

No, that just specifies the maximum number of comparisons (matching submission pairs) that are stored. For now, just use JPlagResult.getComparisons(). But you are right, we should rename the option to maximumNumberOfComparisons or something like that, as that would describe it better.

  • In the options we can set a similarity metric with its threshold. In the old report, the matches section in the general info shows two similarity metrics, avg and max. Does this mean the report is formed from the combined info of two JPlagResult objects?

I am not sure whether the similarityMetric option is used at all, as the JPlagResult already provides the avg and max similarity via the comparisons. I think the option can be removed. A report is generated for a single JPlagResult; however, it should allow listing submissions sorted by both max and avg similarity (see the legacy report).

EDIT: The similarityMetric option is used here and determines which comparisons are actually stored at all. Comparisons below that threshold never make it into the result object.
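In effect, the threshold acts as a simple filter applied before comparisons reach the result object. A sketch of that behavior (names are illustrative, not the actual JPlag implementation):

```typescript
// Sketch: comparisons below the configured similarity threshold are
// dropped and never make it into the stored result.

interface StoredComparison {
  similarity: number; // average similarity in percent
}

function aboveThreshold(
  comparisons: StoredComparison[],
  threshold: number
): StoredComparison[] {
  // Keep only comparisons at or above the threshold.
  return comparisons.filter((comparison) => comparison.similarity >= threshold);
}

// aboveThreshold([{ similarity: 80 }, { similarity: 10 }], 50) keeps only the first entry.
```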

  • What is an exclusion file? There is a getExclusionFileName function in the options.

See here and -x here.

  • Is comparison mode (normal, parallel) relevant information for the end report?

No, but the report could include which comparison strategy was used, similar to how it shows how long the comparison took and other metadata.
However, that is not a high-priority feature.

  • If a submission contains more than one file (for example, submission 1 has file1 and file2, and submission 2 has file1 and file2), how can we map a match to the correct pair of matching files? A match object only has a length and the starting line of the match. There is a file(integer) function in the comparison class. Is it related, and what exactly is it there for?

That is a good question. I would recommend looking at the current web report generation; maybe you can find out there how this is resolved.

@tsaglam tsaglam pinned this issue Nov 14, 2021
@cholakov11
Contributor

I inspected the old Web Report pages as well as the JPlag API. I am posting a file which contains a quick overview of what was included in the old report as well as what information can be obtained from a JPlagResult through the JPlag API. I have also defined some functional requirements for the new Web Report.
WebReportRequirementsV1.pdf

@cholakov11
Contributor

Posting a first prototype of the new web report overview page. The design is not final and will undergo changes. To be used as a starting point for discussions, ideas, etc.
Screen 1@1x


@cholakov11
Contributor

cholakov11 commented Nov 21, 2021

@tsaglam @sebinside I have some questions. I took a second look at the old report from the report example, a report I generated with the CLI, and a report I generated with the API, and there are some differences.
I am attaching screenshots from all of them so it is easier to observe and discuss. The following information, which can be seen in the report example, cannot be accessed through the current API:

  • The names of failed submissions, as well as their number, cannot be obtained. In the report example, their count and names were displayed.

  • In the old report, results of both AVG and MAX comparisons are shown. The CLI report also has results for both metrics. However, when I run the JPlag API, I can only generate a report for one metric. How is a JPlagResult for two or more metrics generated? In the JPlagResult object I can only access a single metric, one threshold, and a single distribution.

  • Suggestion: Can we store the excluded file names directly in the JPlagResult as a list of strings? Currently, the names are obtained by reading the exclusion file. This is done once when executing JPlag. In the report, the file needs to be read again to obtain the names. If no files are excluded, the JPlagResult could simply return an empty list.

Old Report Example:
oldreportexample

CLI Report:
CLI_Report

API Report
API_Report

@cholakov11
Contributor

Teaser: the first JSONs generated by the new reporting:
overview.txt
comparison1.txt

@cholakov11
Contributor

Uploading fixed diagrams. Changes in class diagram:

  1. Fixed class names in structure diagram
  2. Introduced JPlagReport object which encapsulates the OverviewReport and ComparisonsReport

Changes in overview.json

  1. Metrics are now objects and are stored in the overview as an array of metrics rather than as separate fields for each metric attribute

Changes in comparison.json

  1. File code is now stored as an array of lines where each line of the code is an element in the array
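Storing the file code as an array of lines makes line-based match highlighting in the viewer straightforward. A sketch under that assumption (type and function names are hypothetical, not from the actual comparison.json):

```typescript
// Sketch: with code stored as an array of lines, marking a match's
// line range becomes a simple index check per line.

interface FileInComparison {
  fileName: string;
  lines: string[]; // each source line is one array element
}

/** Returns, per line, whether it falls inside the (1-based, inclusive) match range. */
function markMatchedLines(
  file: FileInComparison,
  start: number,
  end: number
): boolean[] {
  return file.lines.map((_, index) => index + 1 >= start && index + 1 <= end);
}

const exampleFile: FileInComparison = {
  fileName: "Main.java",
  lines: ["class Main {", "  void run() {}", "}"],
};
// markMatchedLines(exampleFile, 2, 2) -> [false, true, false]
```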

Structure
scratch_diagramV2

Overview.json
overview json

Comparison.json
comparisonv2_json

@sebinside
Member

sebinside commented Nov 23, 2021

@cholakov11
First of all, thank you for the comprehensive documentation of current and future functionality. The collection of functional requirements looks fine; I would only suggest one more feature: a button to toggle the visibility of names in the overview. This should help teaching assistants show the results to plagiarizing students without voiding the anonymity of all others.
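Such a toggle could be implemented purely client-side by swapping the real names for pseudonyms in the view layer. A minimal sketch (the function name and pseudonym scheme are my own illustration, not part of this issue):

```typescript
// Sketch: toggleable pseudonymization of submission names, done entirely
// in the viewer so the underlying report data stays untouched.

function displayNames(names: string[], anonymize: boolean): string[] {
  if (!anonymize) {
    return names;
  }
  // Stable pseudonyms: "Submission 1", "Submission 2", ... in input order,
  // so toggling back and forth keeps rows identifiable within a session.
  return names.map((_, index) => `Submission ${index + 1}`);
}

// displayNames(["alice", "bob"], true)  -> ["Submission 1", "Submission 2"]
// displayNames(["alice", "bob"], false) -> ["alice", "bob"]
```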

Regarding your questions:

  1. Yes, you're right, this information is currently not stored, but it would probably be useful in the report. We should consider adding it to the result object.
  2. This is due to the major overhaul of the Java API for JPlag (#89), which does not consider metrics other than AVG in the JPlagResult. We should also discuss how this can be added in the next version.
  3. The suggestion looks fine to us. It's not only more useful but also better code style to read the file once and store the information explicitly. However, we have to discuss the best location; it could also fit into the options.

The JSON file teasers and diagrams look promising, looking forward to seeing them in action!

@tsaglam
Member Author

tsaglam commented Nov 23, 2021

Suggestion: Currently, the names are obtained by reading the exclusion file. This is done once when executing JPlag. In the report, the file needs to be read again to obtain the names.

Multiple comments from me after reading it a second time:

  • Do you mean file names or submission names? A submission can contain multiple files. The report shows valid/invalid submissions by their submission name.
  • The exclude file specifies files to exclude, not necessarily submissions. However, if all files of a single submission are excluded, so is the submission itself.
  • Submissions can be invalid for multiple reasons; currently, the information about which submissions are invalid is not stored. However, we want to enable that in the future (see Streamline submission filtering and submission error handling #232), so you can expect to get a list of valid and invalid submissions in the future.

EDIT: I might have been wrong about the second point. This exclusion list is meant for folders and files in the root directory, thus excluding submissions directly. It is implemented in a confusing way.

@sebinside
Member

@cholakov11 I added a (private) repo with pseudonymized reports, which might help you think about large result sets (200-500 submissions).

@cholakov11
Contributor

@sebinside Thanks for the update. @tsaglam

Do you mean file names or submission names? A submission can contain multiple files. The report shows valid/invalid submissions by their submission name.

I meant the file names. The exclusion file is read, but what is read is not stored, so I need to read it again when generating the report file in order to obtain the names of the excluded files/folders.

Submissions are invalid for multiple reasons, currently, the information which submissions are invalid is not stored. However, we want to enable that in the future (see Streamline submission filtering and submission error handling #232) so you can expect to get a list of valid and invalid submissions in the future.

So for now I do not need to display the number of failed submissions and their names, is that right?

Sorry that I took so long to respond. I used the last week to get used to TypeScript and Vue. Quick update on the project so far (screenshots are from the working application, not prototype images):
screnshot_start_page
screnshot_overview_page

@tsaglam
Member Author

tsaglam commented Nov 30, 2021

So for now I do not need to display the number of failed submissions and their names, is that right?

Yes, but you could add a placeholder for that information. Moreover, I think it would be helpful if you created another issue (e.g., named "missing information in result object") where you document what you need for your report.
Then we can either add it to the result object or discuss how to obtain that information another way.

@sebinside
Member

@cholakov11 The initial version looks good, thank you. We should discuss when and how the JSON files are loaded by the GUI, which is also related to the deployment of the GUI files. I would propose delivering them together with the JSON result files in a defined file/folder structure so that the reports can be loaded on startup. However, the drag and drop from the first picture is still a good idea as a fallback.

Where can I find the current version of your code? And did the vue 3 boilerplate I added to https://github.com/jplag/JPlag/tree/192-webreport help you?

Last, I would second the comment from @tsaglam, collecting these in a separate issue sounds good.

@cholakov11
Contributor

cholakov11 commented Dec 2, 2021

@sebinside Here is the repo where I am working: https://github.com/cholakov11/JPlag/tree/192-webreport.

And did the vue 3 boilerplate I added to https://github.com/jplag/JPlag/tree/192-webreport help you?

Yes, I used the already created vue + typescript project in the repo.

@sebinside @tsaglam I submitted an issue about missing information.

@cholakov11
Contributor

cholakov11 commented Dec 9, 2021

@tsaglam @sebinside Project update:
After some struggles with Vue I was able to produce a comparison page and update the overview page. The following screenshots show the new state of the report viewer.

Overview

Whole space on the right is reserved for comparisons.
overview2

Comparison View

In the comparison view I am grouping the matches into file pairs. In the list at the bottom left, the user can then select which pair of files (files containing matches with each other) should be displayed. The matches for this file pair are then shown in the list, and the files are loaded with colored matches.
comparisonview2
If the view is too tight the user can hide the sidebar.
comparisonview2-sidebarhidden
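The grouping of matches into file pairs described above could be sketched like this (the types and the pair-key format are hypothetical, not the actual viewer code):

```typescript
// Sketch: group the matches of one comparison by the pair of files they
// connect, so the sidebar can list one entry per file pair.

interface ViewerMatch {
  firstFile: string;
  secondFile: string;
  startLine: number;
  endLine: number;
}

function groupByFilePair(matches: ViewerMatch[]): Map<string, ViewerMatch[]> {
  const groups = new Map<string, ViewerMatch[]>();
  for (const match of matches) {
    const key = `${match.firstFile} - ${match.secondFile}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(match);
    groups.set(key, bucket);
  }
  return groups;
}

const viewerMatches: ViewerMatch[] = [
  { firstFile: "A.java", secondFile: "B.java", startLine: 1, endLine: 5 },
  { firstFile: "A.java", secondFile: "B.java", startLine: 10, endLine: 12 },
  { firstFile: "C.java", secondFile: "B.java", startLine: 3, endLine: 4 },
];
// groupByFilePair(viewerMatches) has two entries: "A.java - B.java"
// (2 matches) and "C.java - B.java" (1 match).
```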

@sebinside I saw the repo with larger datasets, but I cannot display these in the report viewer since they are CSV and HTML files. Is it possible to obtain folders with Java files somewhere that I can feed to JPlag to produce the JSON files for the report viewer?

@csidirop
Copy link

From an "end user" perspective, this looks way cleaner and more modern than what we have now. I have two questions:

  1. Does your implementation allow the comparison of multiple submissions?
  2. In the case of multiple files in one submission, are they aligned so that they do not start in different places, making comparison easier?
    grafik

Apart from that, consider changing the color gradient of the report viewer. It looks really outdated, like 2000s style. Maybe a clean white or that reddish tone. Also, change the logo to have a transparent background. Here is a quick-and-dirty edit:
grafik


@cholakov11
Contributor

@csidirop Thanks for the feedback. It was really helpful. @sebinside @tsaglam Project update:
overview3
overview3-additional-info
comparisonview3-all-files
comparisonview3-opened-files
comparisonview3-sidebar
All files of the submissions are now displayed in the comparison view, as we discussed. I have added the option of rearranging the file order. This means that the file boxes are draggable and the user can move them, so they can, for example, put two files next to each other. The comparison page now takes parameters from the URL before finding and reading the comparison file.
I also played with the color palette.
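Taking the parameters from the URL can be done with the standard URLSearchParams API. A minimal sketch; the parameter names and the comparison-file naming scheme are assumptions for illustration, not the actual ones used:

```typescript
// Sketch: extract the two submission ids from a query string like
// "?firstId=A&secondId=B" to locate the matching comparison JSON file.

function comparisonFileFromQuery(query: string): string | null {
  const params = new URLSearchParams(query);
  const first = params.get("firstId");
  const second = params.get("secondId");
  // Both ids are required to identify a comparison file.
  return first && second ? `${first}-${second}.json` : null;
}

// comparisonFileFromQuery("?firstId=A&secondId=B") -> "A-B.json"
// comparisonFileFromQuery("?firstId=A")            -> null
```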

@sebinside
Member

@cholakov11 Looks good, looking forward to seeing it in action in our next meeting.

Two minor things:

  • I would suggest using some kind of gray as the general color for the boxes (e.g., in the comparisons, but also as the base color for the table in the overview). This should enhance overall readability and give you the possibility to color-code other things, e.g., the percentages (green, yellow, red, etc.)
  • I prepared and attached a real transparent PNG of the JPlag logo for you to use, see below. If you need other resolutions, I will look into creating an SVG version 😄

jplag-transparent

@sebinside
Member

@cholakov11 The highlighting lib: https://highlightjs.org/

@sebinside
Member

@cholakov11 Just one note on the drag-and-drop feature for the hosted web viewer: please ensure that everything is handled client-side in the browser. An actual data flow to our servers would probably violate some policies. See, e.g., https://developer.mozilla.org/en-US/docs/Web/API/HTML_Drag_and_Drop_API/File_drag_and_drop
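To keep everything client-side, the dropped file's text would be read with the browser's FileReader and then parsed entirely in memory, with no network request. The parsing step might look like this (a sketch; the minimal validation and the jplagVersion field name are assumptions):

```typescript
// Sketch: parse a dropped report file's text entirely in memory.
// The browser-side FileReader (not shown) would supply `text` after
// reading the dropped File object; nothing is sent to a server.

function parseOverview(text: string): { jplagVersion: string } | null {
  try {
    const data = JSON.parse(text);
    // Minimal sanity check: the hypothetical version flag must be present.
    return data && typeof data.jplagVersion === "string" ? data : null;
  } catch {
    return null; // not valid JSON: reject the dropped file
  }
}

// parseOverview('{"jplagVersion":"3.0.0"}') -> { jplagVersion: "3.0.0" }
// parseOverview("not json")                 -> null
```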

@cholakov11
Contributor

@sebinside @tsaglam Initial pull request opened.

@tsaglam tsaglam linked a pull request Feb 9, 2022 that will close this issue
sebinside added a commit that referenced this issue Mar 8, 2022
Replace the outdated HTML report generation #192
@sebinside sebinside unpinned this issue Mar 18, 2022