Development: Provide documentation on how to use automatic assessment suggestions for modeling exercises #8704

Merged
merged 11 commits into from Jun 4, 2024
2 changes: 2 additions & 0 deletions docs/admin/setup/athena.rst
@@ -1,3 +1,5 @@
.. _athena_service:

Athena Service
--------------

61 changes: 61 additions & 0 deletions docs/user/exercises/modeling-exercise-automatic-assessment.rst
@@ -0,0 +1,61 @@
.. _generation_of_assessment_suggestions_for_modeling_exercises:

:orphan:

Generation of Assessment Suggestions for Modeling Exercises
===========================================================

Suggestion Generation Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section provides insight into how automated feedback suggestions are generated for modeling exercises using Athena.
While Athena supports multiple evaluation modules per exercise type, at the moment only one module exists for the evaluation of modeling exercises (``module_modeling_llm``).
This module uses a Large Language Model (LLM) internally and generates feedback through the following process (two illustrative sketches follow the process description):

1. **Feedback Request Reception:** Upon receiving a feedback request, the corresponding modeling submission is serialized into an appropriate exchange format depending on the diagram type.
For BPMN diagrams, BPMN 2.0 XML is used as it is a commonly used exchange format for process models and proved to be well-understood by LLMs.
IDs of diagram elements are shortened during serialization to minimize the token count of the input provided to the language model.

2. **Prompt Input Collection:** The module gathers all required input to query the connected language model. This includes:

- Number of points and bonus points achievable
- Grading instructions
- Problem statement
- Explanation of the submission format
- Optional example solution
- Serialized submission

3. **Prompt Template Filling:** The collected input is used to fill in the prompt template. If the prompt exceeds the language model's token limit, omittable features are removed in the following order: example solution, grading instructions, and problem statement.
The system can still provide improvement suggestions without detailed grading instructions.

4. **Token Limit Check:** Feedback generation is aborted if the prompt is still too long after removing omittable features.
Otherwise, the prompt is executed on the connected language model.

5. **Response Parsing:** The model's response is parsed into a dictionary representation.
Feedback items are mapped back to their original element IDs, ensuring that the feedback suggestions can be attached to referenced elements in the original diagram.
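
The ID handling in steps 1 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (``shorten_ids``, ``restore_ids``, ``id_map``) are hypothetical and do not reflect the actual ``module_modeling_llm`` API.

.. code-block:: python

    # Minimal sketch of the ID handling in steps 1 and 5 (hypothetical names,
    # not the actual module_modeling_llm API).

    def shorten_ids(elements: dict) -> tuple:
        """Replace long element IDs with short tokens to save prompt tokens."""
        id_map, shortened = {}, {}
        for index, (original_id, element) in enumerate(elements.items()):
            short_id = f"E{index}"          # e.g. "E0" instead of a 36-char UUID
            id_map[short_id] = original_id  # remembered for step 5
            shortened[short_id] = element
        return shortened, id_map

    def restore_ids(feedback_items: list, id_map: dict) -> list:
        """Map feedback items returned by the LLM back to original element IDs."""
        for item in feedback_items:
            short_id = item.get("element_id")
            if short_id in id_map:
                item["element_id"] = id_map[short_id]
        return feedback_items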

.. figure:: modeling/modeling-llm-activity.svg
:align: center
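
The token-limit handling in steps 3 and 4 boils down to a trimming loop over the omittable features. The following sketch is again illustrative: ``count_tokens`` stands in for whatever tokenizer the module uses, and the prompt template is simplified.

.. code-block:: python

    # Illustrative sketch of steps 3 and 4; count_tokens is an assumed
    # helper (e.g. backed by a tokenizer library), not an actual API.

    OMISSION_ORDER = ["example_solution", "grading_instructions", "problem_statement"]

    def build_prompt(inputs: dict, token_limit: int, count_tokens):
        """Fill the template; drop omittable features until the prompt fits.

        Returns None if the prompt is still too long after all omissions,
        in which case feedback generation is aborted (step 4).
        """
        inputs = dict(inputs)  # copy so features can be dropped locally
        omissions = list(OMISSION_ORDER)
        while True:
            prompt = "\n\n".join(f"{key}:\n{value}" for key, value in inputs.items())
            if count_tokens(prompt) <= token_limit:
                return prompt
            if not omissions:
                return None  # abort: too long even after all omissions
            inputs.pop(omissions.pop(0), None)  # drop the next omittable feature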

Optimizing Exercises for Automated Assessment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A few best practices should be considered to get the best possible assessment suggestions for a modeling exercise.
Since the current module for generating suggestions for modeling exercises is based on a large language model, it is advisable, when composing grading instructions for an exercise, to follow similar strategies as for prompt engineering an LLM: https://platform.openai.com/docs/guides/prompt-engineering

One of these strategies is to instruct the model as clearly as possible about the expected output of the task at hand.
The following listing shows grading instructions for an exemplary BPMN process modeling exercise optimized for automatic assessment.
The instructions explicitly list all aspects Athena should assess and how credits should be assigned accordingly, ensuring consistent suggestions across all submissions.

.. code-block:: text

Evaluate the following 10 criteria:

1. Give 1 point if all elements described in the problem statement are present in the submission, 0 otherwise.
2. Give 1 point if the outgoing flows from an exclusive gateway are also labeled if there is more than one outgoing flow from the exclusive gateway, 0 otherwise.
3. Give 1 point if a start-event is present in the student's submission, 0 otherwise.
4. Give 1 point if an end-event is present in the student's submission, 0 otherwise.
5. Give 1 point if the activities in the diagram are in the correct order according to the problem statement, 0 otherwise.
6. Give 1 point if all pools and swimlanes are labeled, 0 otherwise.
7. Give 1 point if the submission does not contain elements that are not described in the problem statement, 0 otherwise.
8. Give 1 point if all diagram elements are connected, 0 otherwise.
9. Give 1 point if all tasks are named in the "Verb Object"-format where a name consists of a verb followed by the object, 0 otherwise.
10. Give 1 point if no sequence flows connect elements in two different pools, 0 otherwise.

.. note::
    While the exact number of points per criterion depends on the exercise, matching the number of points assigned for the various grading instructions with the number of points achievable in the exercise helps to decrease the variance in the number of points assigned.
34 changes: 34 additions & 0 deletions docs/user/exercises/modeling.rst
@@ -74,6 +74,7 @@ The following screenshot illustrates the first section of the form. It consists of
The following screenshot illustrates the second section of the form. It consists of:

- **Enable automatic assessment suggestions**: When enabled, Artemis tries to automatically suggest assessments for diagram elements based on previously graded submissions for this exercise.
- **Enable feedback suggestions from Athena**: When enabled, Artemis tries to automatically suggest assessments for diagram elements using the Athena service.
- **Problem Statement**: The task description of the exercise as seen by students.
- **Assessment Instructions**: Instructions for instructors while assessing the submission.

@@ -210,7 +211,40 @@ Once you're done assessing the solution, you can either:

- Click on |exercise-dashboard-button| to navigate to exercise dashboard page.

Automatic Assessment Suggestions
--------------------------------
If the checkbox ``Enable feedback suggestions from Athena`` is checked for a modeling exercise, Artemis generates assessment suggestions for submissions using the Athena service.
This section provides insights into how suggestions are retrieved in Artemis and how to apply them in the exercise grading process.

.. note::
To learn how to set up an instance of the Athena service and configure your Artemis installation accordingly, please refer to the section :ref:`Athena Service <athena_service>`.

After clicking on |assess-submission| on one of the submission entries on the Submissions and Assessments Page, assessment suggestions are loaded automatically as indicated by the following loading indicator:

.. figure:: modeling/assessment-suggestions-loading-indicator.png
:align: center
:scale: 50%

Once assessment suggestions have been retrieved, a notice on top of the page indicates that the current submission contains assessment suggestions created via generative AI.

.. figure:: modeling/assessment-suggestions-notice.png
:align: center

The suggestions themselves are shown as follows. If a suggestion directly references a diagram element, a dialog is attached to the corresponding element, showing the suggested grading score together with a remark on what could be improved.
In this example, a remark is made that an element is present in the evaluated BPMN diagram without being mentioned in the problem statement.

.. figure:: modeling/referenced-assessment-suggestion.png
:align: center
:scale: 50%

If a suggestion addresses a more general aspect of the diagram, multiple diagram elements at once, or elements that are missing from the diagram, the suggestion is shown in a card overview below the diagram.
These unreferenced suggestions can be accepted or discarded via buttons on the individual suggestion cards.

.. figure:: modeling/unreferenced-assessment-suggestion.png
:align: center
:scale: 50%

To learn how automatic suggestions are generated and how exercises can be optimized for automatic evaluation, please refer to :ref:`Generation of Assessment Suggestions for Modeling Exercises<generation_of_assessment_suggestions_for_modeling_exercises>`.

.. |edit| image:: modeling/edit.png
:scale: 75
4 changes: 4 additions & 0 deletions docs/user/exercises/modeling/modeling-llm-activity.svg
Contributor comment: I think it would be good to have a background to this model. If dark-mode users are reading the documentation, the model would be very hard to read.