
Replace the outdated HTML report generation #192

Closed
tsaglam opened this issue Oct 12, 2021 · 25 comments · Fixed by #287
Assignees
Labels
enhancement Issue/PR that involves features, improvements and other changes major Major issue/feature/contribution/change PISE-WS21/22 Tasks for the PISE practical course (WS21/22)
Milestone

Comments

@tsaglam
Member

tsaglam commented Oct 12, 2021

  • Replace legacy report generation with a modern system
  • Generate reports that look cleaner and more modern
  • Should roughly cover the functionality of the legacy web report [internal link] (including clustering and max similarity)
@tsaglam tsaglam added enhancement Issue/PR that involves features, improvements and other changes major Major issue/feature/contribution/change PISE-WS21/22 Tasks for the PISE practical course (WS21/22) labels Oct 12, 2021
@tsaglam tsaglam added this to the v3.1.0 milestone Oct 12, 2021
@sebinside
Member

sebinside commented Oct 19, 2021

The current report generation builds HTML files by concatenating strings. That is Java 1.0 style and has been considered bad practice for many years. I would therefore propose a strict separation between generating machine-readable report data and the GUI code that views it. 21b9bd3 prepares this by adding a Vue 3 boilerplate for a web-based report viewer. By updating the report generation, e.g., to a JSON-based format, we could separate generation and viewing and minimize dependencies.

To do so, I would propose the following steps:

  • Getting started with the report generation of JPlag
  • Identify all information from the plagiarism detection that is currently used in the HTML report generation
  • Identify which of this information is already accessible using the new Java API
  • Define new functionality to create JSON files with all required information on plagiarism (e.g., following the current format of one overview file and one file per match, loaded dynamically; don't forget to include a JPlag version flag)
  • Set up an npm + Vue 3 environment, e.g., using VS Code + Vetur
  • Develop an overview page with the joined information of all plagiarisms (including legacy views, e.g., the histogram, and also a possibility to anonymize)
  • Develop detailed viewer for different matches
  • Profit!
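To make the proposed split concrete, the overview-plus-per-match JSON files could be typed roughly as below. This is only a sketch: every field and type name here is a hypothetical placeholder, not the format that was eventually adopted.

```typescript
// Hypothetical sketch of the proposed JSON report format.
// All names are illustrative only; the actual format is defined later in this issue.

interface OverviewReport {
  jplagVersion: string;      // the version flag proposed above
  submissionIds: string[];
  comparisonFiles: string[]; // one file per match, loaded dynamically
}

interface ComparisonReport {
  firstSubmissionId: string;
  secondSubmissionId: string;
  similarity: number; // average similarity in percent
  matches: MatchReport[];
}

interface MatchReport {
  firstFile: string;
  secondFile: string;
  startInFirst: number;  // first matched line in the first file
  startInSecond: number; // first matched line in the second file
  tokens: number;        // length of the match in tokens
}

// A minimal example instance of the overview file:
const overview: OverviewReport = {
  jplagVersion: "3.0.0",
  submissionIds: ["A", "B"],
  comparisonFiles: ["A-B.json"],
};
```

Typing the format up front would let the viewer and the Java-side generation evolve independently, which is the point of the separation proposed above.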

An alternative approach would be to create a new REST interface and query a (also not yet developed) JPlag server for details on demand. This would remove the need for JSON-file-based generation but would also require new interfaces. Also, JPlag is currently not used in a way that it runs in the background; it generates reports once and then stops again. This should be discussed in advance. We use JSON files because generation and viewing should be two independent processes.

@tsaglam
Member Author

tsaglam commented Nov 9, 2021

The max similarity should already be provided in the current result object. Thus, it can be used in the new report generation to additionally display the top n matches according to max similarity.
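Listing the top n matches by max similarity in the viewer could then be a simple sort over the comparison data. A sketch with hypothetical field names (the result object provides the value, but the JSON field name was not fixed at this point):

```typescript
// Sketch: select the top n comparisons by maximum similarity.
// The field names are illustrative, not the actual report format.

interface ComparisonEntry {
  firstSubmission: string;
  secondSubmission: string;
  maxSimilarity: number; // in percent
}

function topByMaxSimilarity(comparisons: ComparisonEntry[], n: number): ComparisonEntry[] {
  // Copy before sorting so the original comparison list stays untouched.
  return [...comparisons]
    .sort((a, b) => b.maxSimilarity - a.maxSimilarity)
    .slice(0, n);
}

const sample: ComparisonEntry[] = [
  { firstSubmission: "A", secondSubmission: "B", maxSimilarity: 95.0 },
  { firstSubmission: "C", secondSubmission: "D", maxSimilarity: 40.0 },
  { firstSubmission: "A", secondSubmission: "C", maxSimilarity: 70.0 },
];
// topByMaxSimilarity(sample, 2) yields the A-B pair, then the A-C pair.
```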

@cholakov11
Contributor

cholakov11 commented Nov 9, 2021

I have some questions regarding the old report and the API, since it lacks JavaDoc (edit: JavaDoc found).

  1. What is the difference between program and submission? Both are mentioned in the old HTML report.
  2. What is the difference between getComparisons and getAllComparisons functions of a JPlagResult?
  3. What is the difference between the language and language option classes? I found several name representations of the language: java9, JAVA_1_9, Javac 1.9+ based AST plugin, java19. Which one is relevant for the report?
  4. What is the language's number of tokens? Is it relevant for the end report?
  5. Is the max number of matches, from the options class, all matches, including those under the threshold?
  6. In the options we can set a similarity metric with its threshold. In the old report, the matches section in the general info shows two similarity metrics, avg and max. Does this mean the report is formed from the combined info of two JPlagResult objects?
  7. What is an exclusion file? There is a getExclusionFileName function in the options.
  8. Is comparison mode (normal, parallel) relevant information for the end report?
  9. If a submission contains more than one file (for example, submission 1 has file1 and file2, and submission 2 has file1 and file2), how can we map a match to the correct pair of matching files? A match object only has a length and the starting line of the match. There is a file(integer) function in the comparison class. Is it related, and what exactly is it there for?

@tsaglam
Member Author

tsaglam commented Nov 9, 2021

  • What is the difference between program and submission? Both are mentioned in the old HTML report.

Program was the class for the main execution functionality in the legacy version of JPlag. In the current version, this class is called JPlag. Submission is still the same thing as it was: A single code submission of a student, containing one or more files (see the class diagram here).

  • What is the difference between getComparisons and getAllComparisons functions of a JPlagResult?

Please refer to the JavaDoc comments here.

  • What is the difference between the language and language option classes? I found several name representations of the language: java9, JAVA_1_9, Javac 1.9+ based AST plugin, java19. Which one is relevant for the report?

For the report, use the name of the Language. The language options tell JPlag which frontends exist without requiring a dependency on every frontend. The frontends are instantiated via reflection. Also note that the frontend names changed during the major overhaul, so they might differ in the legacy version.

  • What is the language's number of tokens? Is it relevant for the end report?

I am not sure what you are referring to here, do you mean the minimum token match?

  • Is the max number of matches, from the options class, all matches, including those under the threshold?

No, that just specifies the maximum number of comparisons (matching submission pairs) that are stored. For now, just use JPlagResult.getComparisons(). But you are right, we should rename the option to maximumNumberOfComparisons or something like that, as that would describe it better.

  • In the options we can set a similarity metric with its threshold. In the old report, the matches section in the general info shows two similarity metrics, avg and max. Does this mean the report is formed from the combined info of two JPlagResult objects?

I am not sure whether the similarityMetric option is used at all, as the JPlagResult already provides the avg and max similarity via the comparisons. I think the option can be removed. A report is generated for a single JPlagResult; however, it should allow listing submissions sorted by both max and avg similarity (see the legacy report).

EDIT: The similarityMetric option is used here and determines which comparisons are actually stored at all. Comparisons below that threshold never make it into the result object.
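In effect, the threshold acts as a simple filter applied before comparisons reach the result object. A sketch of that behavior (names are illustrative, not the actual JPlag implementation):

```typescript
// Sketch: comparisons below the configured similarity threshold are
// dropped and never make it into the stored result.

interface StoredComparison {
  similarity: number; // average similarity in percent
}

function aboveThreshold(
  comparisons: StoredComparison[],
  threshold: number
): StoredComparison[] {
  // Keep only comparisons at or above the threshold.
  return comparisons.filter((comparison) => comparison.similarity >= threshold);
}

// aboveThreshold([{ similarity: 80 }, { similarity: 10 }], 50) keeps only the first entry.
```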

  • What is an exclusion file? There is a getExclusionFileName function in the options.

See here and -x here.

  • Is comparison mode (normal, parallel) relevant information for the end report?

No, but the report could include which comparison strategy was used, similar to how it shows how long the comparison took and other metadata.
However, that is not a high-priority feature.

  • If a submission contains more than one file (for example, submission 1 has file1 and file2, and submission 2 has file1 and file2), how can we map a match to the correct pair of matching files? A match object only has a length and the starting line of the match. There is a file(integer) function in the comparison class. Is it related, and what exactly is it there for?

That is a good question. I would recommend looking at the current web report generation; maybe you can find out there how this is resolved.

@tsaglam tsaglam pinned this issue Nov 14, 2021
@cholakov11
Contributor

I inspected the old Web Report pages as well as the JPlag API. I am posting a file which contains a quick overview of what was included in the old report as well as what information can be obtained from a JPlagResult through the JPlag API. I have also defined some functional requirements for the new Web Report.
WebReportRequirementsV1.pdf

@cholakov11
Contributor

Posting a first prototype of the new web report overview page. The design is not final and will undergo changes. To be used as a starting point for discussions, ideas, etc.
Screen 1@1x


@cholakov11
Contributor

cholakov11 commented Nov 21, 2021

@tsaglam @sebinside I have some questions. I took a second look at the old report from the report example, a report I generated with the CLI, and a report I generated with the API, and there are some differences.
I am attaching screenshots from all of them so it is easier to observe and discuss. The following information, which can be seen in the report example, cannot be accessed through the current API:

  • The names of failed submissions, as well as their number, cannot be obtained. In the report example, their count and names were displayed.

  • In the old report, results of both AVG and MAX comparisons are shown. The CLI report also has results for both metrics. However, when I run the JPlag API, I can only generate a report for one metric. How is a JPlagResult for two or more metrics generated? In the JPlagResult object I can only access a single metric, one threshold, and a single distribution.

  • Suggestion: Can we store the excluded file names directly in the JPlagResult as a list of strings? Currently, the names are obtained by reading the exclusion file. This is done once when executing JPlag. In the report, the file needs to be read again to obtain the names. If no files are excluded, the JPlagResult could simply return an empty list.

Old Report Example:
oldreportexample

CLI Report:
CLI_Report

API Report
API_Report

@cholakov11
Contributor

Teaser: the first JSONs generated by the new reporting:
overview.txt
comparison1.txt

@cholakov11
Contributor

Uploading fixed diagrams. Changes in class diagram:

  1. Fixed class names in structure diagram
  2. Introduced JPlagReport object which encapsulates the OverviewReport and ComparisonsReport

Changes in overview.json

  1. Metrics are now objects and are stored in the overview as an array of metrics rather than as separate fields for each metric attribute

Changes in comparison.json

  1. File code is now stored as an array of lines where each line of the code is an element in the array
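Storing the file code as an array of lines makes line-based match highlighting in the viewer straightforward. A sketch under that assumption (type and function names are hypothetical, not from the actual comparison.json):

```typescript
// Sketch: with code stored as an array of lines, marking a match's
// line range becomes a simple index check per line.

interface FileInComparison {
  fileName: string;
  lines: string[]; // each source line is one array element
}

/** Returns, per line, whether it falls inside the (1-based, inclusive) match range. */
function markMatchedLines(
  file: FileInComparison,
  start: number,
  end: number
): boolean[] {
  return file.lines.map((_, index) => index + 1 >= start && index + 1 <= end);
}

const exampleFile: FileInComparison = {
  fileName: "Main.java",
  lines: ["class Main {", "  void run() {}", "}"],
};
// markMatchedLines(exampleFile, 2, 2) -> [false, true, false]
```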

Structure
scratch_diagramV2

Overview.json
overview json

Comparison.json
comparisonv2_json

@sebinside
Member

sebinside commented Nov 23, 2021

@cholakov11
First of all, thank you for the comprehensive documentation of current and future functionality. The collection of functional requirements looks fine; I would only suggest one more feature: a button to toggle the visibility of names in the overview. This should help teaching assistants show the results to plagiarizing students without voiding the anonymity of all others.
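Such a toggle could be implemented purely client-side by swapping the real names for pseudonyms in the view layer. A minimal sketch (the function name and pseudonym scheme are my own illustration, not part of this issue):

```typescript
// Sketch: toggleable pseudonymization of submission names, done entirely
// in the viewer so the underlying report data stays untouched.

function displayNames(names: string[], anonymize: boolean): string[] {
  if (!anonymize) {
    return names;
  }
  // Stable pseudonyms: "Submission 1", "Submission 2", ... in input order,
  // so toggling back and forth keeps rows identifiable within a session.
  return names.map((_, index) => `Submission ${index + 1}`);
}

// displayNames(["alice", "bob"], true)  -> ["Submission 1", "Submission 2"]
// displayNames(["alice", "bob"], false) -> ["alice", "bob"]
```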

Regarding your questions:

  1. Yes, you're right, this information is currently not stored, but it would probably be useful in the report. We should consider adding it to the result object.
  2. This is due to the major overhaul of the Java API for JPlag (#89), which does not consider metrics other than AVG in the JPlagResult. We should also discuss how this can be added in the next version.
  3. The suggestion looks fine to us. It's not only more useful but also better code style to read the file once and store the information explicitly. However, we have to discuss the best location; it could also fit into the options.

The JSON file teasers and diagrams look promising, looking forward to seeing them in action!

@tsaglam
Member Author

tsaglam commented Nov 23, 2021

Suggestion: Currently, the names are obtained by reading the exclusion file. This is done once when executing JPlag. In the report, the file needs to be read again to obtain the names.

Multiple comments from me after reading it a second time:

  • Do you mean file names or submission names? A submission can contain multiple files. The report shows valid/invalid submissions by their submission name.
  • The exclude file specifies files to exclude, not necessarily submissions. However, if all files of a single submission are excluded, so is the submission itself.
  • Submissions can be invalid for multiple reasons; currently, the information about which submissions are invalid is not stored. However, we want to enable that in the future (see Streamline submission filtering and submission error handling #232), so you can expect to get a list of valid and invalid submissions in the future.

EDIT: I might have been wrong about the second point. This exclusion list is meant for folders and files in the root directory, thus excluding submissions directly. It is implemented in a confusing way.

@sebinside
Member

@cholakov11 I added a (private) repo with pseudonymized reports, which might help you think about large result sets (200-500 submissions).

@cholakov11
Contributor

@sebinside Thanks for the update. @tsaglam

Do you mean file names or submission names? A submission can contain multiple files. The report shows valid/invalid submissions by their submission name.

I meant the file names. The exclusion file is read, but what is read is not stored, so I need to read it again when generating the report file in order to obtain the names of the excluded files/folders.

Submissions are invalid for multiple reasons, currently, the information which submissions are invalid is not stored. However, we want to enable that in the future (see Streamline submission filtering and submission error handling #232) so you can expect to get a list of valid and invalid submissions in the future.

So for now I do not need to display the number of failed submissions and their names, is that right?

Sorry that I took so long to respond. I used the last week to get used to TypeScript and Vue. Quick update on the project so far (screenshots are from the working application, not prototype images):
screnshot_start_page
screnshot_overview_page

@tsaglam
Member Author

tsaglam commented Nov 30, 2021

So for now I do not need to display the number of failed submissions and their names, is that right?

Yes, but you could add a placeholder for that information. Moreover, I think it would be helpful if you created another issue (e.g., named "missing information in result object") where you document what you need for your report.
Then we can either add it to the result object or discuss how to obtain that information another way.

@sebinside
Member

@cholakov11 The initial version looks good, thank you. We should discuss when and how the JSON files are loaded by the GUI, which is also related to the deployment of the GUI files. I would propose delivering them together with the JSON result files in a defined file/folder structure so that the reports can be loaded on startup. However, the drag and drop from the first picture is still a good idea as a fallback.

Where can I find the current version of your code? And did the vue 3 boilerplate I added to https://github.com/jplag/JPlag/tree/192-webreport help you?

Last, I would second the comment from @tsaglam, collecting these in a separate issue sounds good.

@cholakov11
Contributor

cholakov11 commented Dec 2, 2021

@sebinside Here is the repo where I am working: https://github.com/cholakov11/JPlag/tree/192-webreport.

And did the vue 3 boilerplate I added to https://github.com/jplag/JPlag/tree/192-webreport help you?

Yes, I used the already created vue + typescript project in the repo.

@sebinside @tsaglam I submitted an issue about missing information.

@cholakov11
Contributor

cholakov11 commented Dec 9, 2021

@tsaglam @sebinside Project update:
After some struggles with Vue I was able to produce a comparison page and update the overview page. The following screenshots show the new state of the report viewer.

Overview

Whole space on the right is reserved for comparisons.
overview2

Comparison View

In the comparison view I am grouping the matches into file pairs. In the list at the bottom left, the user can then select which pair of files (files containing matches with each other) should be displayed. The matches for this file pair are then shown in the list, and the files are loaded with colored matches.
comparisonview2
If the view is too tight the user can hide the sidebar.
comparisonview2-sidebarhidden
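The grouping of matches into file pairs described above could be sketched like this (the types and the pair-key format are hypothetical, not the actual viewer code):

```typescript
// Sketch: group the matches of one comparison by the pair of files they
// connect, so the sidebar can list one entry per file pair.

interface ViewerMatch {
  firstFile: string;
  secondFile: string;
  startLine: number;
  endLine: number;
}

function groupByFilePair(matches: ViewerMatch[]): Map<string, ViewerMatch[]> {
  const groups = new Map<string, ViewerMatch[]>();
  for (const match of matches) {
    const key = `${match.firstFile} - ${match.secondFile}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(match);
    groups.set(key, bucket);
  }
  return groups;
}

const viewerMatches: ViewerMatch[] = [
  { firstFile: "A.java", secondFile: "B.java", startLine: 1, endLine: 5 },
  { firstFile: "A.java", secondFile: "B.java", startLine: 10, endLine: 12 },
  { firstFile: "C.java", secondFile: "B.java", startLine: 3, endLine: 4 },
];
// groupByFilePair(viewerMatches) has two entries: "A.java - B.java"
// (2 matches) and "C.java - B.java" (1 match).
```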

@sebinside I saw the repo with larger datasets, but I cannot display these in the report viewer since they are CSV and HTML files. Is it possible to obtain folders with Java files somewhere that I can feed to JPlag to produce the JSON files for the report viewer?

@csidirop
Copy link

From an "end user" perspective, this looks way cleaner and more modern than what we have now. I have two questions:

  1. Does your implementation allow the comparison of multiple submissions?
  2. In the case of multiple files in one submission, are they aligned so that they do not start in different places, making comparison easier?
    grafik

Apart from that, consider changing the color gradient of the report viewer. It looks really outdated, like 2000s style. Maybe a clean white or that reddish tone. Also, change the logo to have a transparent background. Here is a quick-and-dirty edit:
grafik


@cholakov11
Contributor

@csidirop Thanks for the feedback. It was really helpful. @sebinside @tsaglam Project update:
overview3
overview3-additional-info
comparisonview3-all-files
comparisonview3-opened-files
comparisonview3-sidebar
All files of the submissions are now displayed in the comparison view, as we discussed. I have added the option of rearranging the file order. This means that the file boxes are draggable and the user can move them, so they can, for example, put two files next to each other. The comparison page now takes parameters from the URL before finding and reading the comparison file.
I also played with the color palette.
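Taking the parameters from the URL can be done with the standard URLSearchParams API. A minimal sketch; the parameter names and the comparison-file naming scheme are assumptions for illustration, not the actual ones used:

```typescript
// Sketch: extract the two submission ids from a query string like
// "?firstId=A&secondId=B" to locate the matching comparison JSON file.

function comparisonFileFromQuery(query: string): string | null {
  const params = new URLSearchParams(query);
  const first = params.get("firstId");
  const second = params.get("secondId");
  // Both ids are required to identify a comparison file.
  return first && second ? `${first}-${second}.json` : null;
}

// comparisonFileFromQuery("?firstId=A&secondId=B") -> "A-B.json"
// comparisonFileFromQuery("?firstId=A")            -> null
```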

@sebinside
Member

@cholakov11 Looks good, looking forward to seeing it in action in our next meeting.

Two minor things:

  • I would suggest using some kind of gray as the general color for the boxes (e.g., in the comparisons, but also as the base color for the table in the overview). This should enhance overall readability and give you the possibility to color-code other things, e.g., the percentages (green, yellow, red, etc.)
  • I prepared and attached a real transparent PNG of the JPlag logo for you to use, see below. If you need other resolutions, I will look into creating an SVG version 😄

jplag-transparent

@sebinside
Member

@cholakov11 The highlighting lib: https://highlightjs.org/

@sebinside
Member

@cholakov11 Just one note on the drag-and-drop feature for the hosted web viewer: please ensure that everything is handled client-side in the browser. An actual data flow to our servers would probably violate some policies. See, e.g., https://developer.mozilla.org/en-US/docs/Web/API/HTML_Drag_and_Drop_API/File_drag_and_drop
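To keep everything client-side, the dropped file's text would be read with the browser's FileReader and then parsed entirely in memory, with no network request. The parsing step might look like this (a sketch; the minimal validation and the jplagVersion field name are assumptions):

```typescript
// Sketch: parse a dropped report file's text entirely in memory.
// The browser-side FileReader (not shown) would supply `text` after
// reading the dropped File object; nothing is sent to a server.

function parseOverview(text: string): { jplagVersion: string } | null {
  try {
    const data = JSON.parse(text);
    // Minimal sanity check: the hypothetical version flag must be present.
    return data && typeof data.jplagVersion === "string" ? data : null;
  } catch {
    return null; // not valid JSON: reject the dropped file
  }
}

// parseOverview('{"jplagVersion":"3.0.0"}') -> { jplagVersion: "3.0.0" }
// parseOverview("not json")                 -> null
```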

@cholakov11
Contributor

@sebinside @tsaglam Initial pull request opened.

@tsaglam tsaglam linked a pull request Feb 9, 2022 that will close this issue
sebinside added a commit that referenced this issue Mar 8, 2022
Replace the outdated HTML report generation #192
@sebinside sebinside unpinned this issue Mar 18, 2022