Development: Provide documentation on how to use automatic assessment suggestions for modeling exercises #8704

Merged
merged 11 commits into from Jun 4, 2024
2 changes: 2 additions & 0 deletions docs/admin/setup/athena.rst
@@ -1,3 +1,5 @@
.. _athena_service:

Athena Service
--------------

61 changes: 61 additions & 0 deletions docs/user/exercises/modeling-exercise-automatic-assessment.rst
@@ -0,0 +1,61 @@
.. _generation_of_assessment_suggestions_for_modeling_exercises:

:orphan:

Generation of Assessment Suggestions for Modeling Exercises
===========================================================

Suggestion Generation Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section provides insight into how automated feedback suggestions are generated for modeling exercises using Athena.
While Athena supports multiple evaluation modules per exercise type, at the moment only one module exists for the evaluation of modeling exercises (``module_modeling_llm``).
This module uses a Large Language Model (LLM) internally and generates feedback through the following process (two illustrative sketches follow the process description):

1. **Feedback Request Reception:** Upon receiving a feedback request, the corresponding modeling submission is serialized into an appropriate exchange format depending on the diagram type.
For BPMN diagrams, BPMN 2.0 XML is used as it is a commonly used exchange format for process models and proved to be well-understood by LLMs.
IDs of diagram elements are shortened during serialization to minimize the token count of the input provided to the language model.

2. **Prompt Input Collection:** The module gathers all required input to query the connected language model. This includes:

- Number of points and bonus points achievable
- Grading instructions
- Problem statement
- Explanation of the submission format
- Optional example solution
- Serialized submission

3. **Prompt Template Filling:** The collected input is used to fill in the prompt template. If the prompt exceeds the language model's token limit, omittable features are removed in the following order: example solution, grading instructions, and problem statement.
The system can still provide improvement suggestions without detailed grading instructions.

4. **Token Limit Check:** Feedback generation is aborted if the prompt is still too long after removing omittable features.
Otherwise, the prompt is executed on the connected language model.

5. **Response Parsing:** The model's response is parsed into a dictionary representation.
Feedback items are mapped back to their original element IDs, ensuring that the feedback suggestions can be attached to referenced elements in the original diagram.
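
The ID handling in steps 1 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (``shorten_ids``, ``restore_ids``, ``id_map``) are hypothetical and do not reflect the actual ``module_modeling_llm`` API.

.. code-block:: python

    # Minimal sketch of the ID handling in steps 1 and 5 (hypothetical names,
    # not the actual module_modeling_llm API).

    def shorten_ids(elements: dict) -> tuple:
        """Replace long element IDs with short tokens to save prompt tokens."""
        id_map, shortened = {}, {}
        for index, (original_id, element) in enumerate(elements.items()):
            short_id = f"E{index}"          # e.g. "E0" instead of a 36-char UUID
            id_map[short_id] = original_id  # remembered for step 5
            shortened[short_id] = element
        return shortened, id_map

    def restore_ids(feedback_items: list, id_map: dict) -> list:
        """Map feedback items returned by the LLM back to original element IDs."""
        for item in feedback_items:
            short_id = item.get("element_id")
            if short_id in id_map:
                item["element_id"] = id_map[short_id]
        return feedback_items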

.. figure:: modeling/modeling-llm-activity.svg
:align: center
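
The token-limit handling in steps 3 and 4 boils down to a trimming loop over the omittable features. The following sketch is again illustrative: ``count_tokens`` stands in for whatever tokenizer the module uses, and the prompt template is simplified.

.. code-block:: python

    # Illustrative sketch of steps 3 and 4; count_tokens is an assumed
    # helper (e.g. backed by a tokenizer library), not an actual API.

    OMISSION_ORDER = ["example_solution", "grading_instructions", "problem_statement"]

    def build_prompt(inputs: dict, token_limit: int, count_tokens):
        """Fill the template; drop omittable features until the prompt fits.

        Returns None if the prompt is still too long after all omissions,
        in which case feedback generation is aborted (step 4).
        """
        inputs = dict(inputs)  # copy so features can be dropped locally
        omissions = list(OMISSION_ORDER)
        while True:
            prompt = "\n\n".join(f"{key}:\n{value}" for key, value in inputs.items())
            if count_tokens(prompt) <= token_limit:
                return prompt
            if not omissions:
                return None  # abort: too long even after all omissions
            inputs.pop(omissions.pop(0), None)  # drop the next omittable feature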

Optimizing Exercises for Automated Assessment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A few best practices should be considered to get the best possible assessment suggestions for a modeling exercise.
Since the current module for generating suggestions for modeling exercises is based on a large language model, it is advisable, when composing grading instructions for an exercise, to follow similar strategies as for prompt engineering an LLM: https://platform.openai.com/docs/guides/prompt-engineering

One of these strategies is to instruct the model as clearly as possible about the expected output of the task at hand.
The following listing shows grading instructions for an exemplary BPMN process modeling exercise optimized for automatic assessment.
The instructions explicitly list all aspects Athena should assess and how credits should be assigned accordingly, ensuring consistent suggestions across all submissions.

.. code-block:: text

Evaluate the following 10 criteria:

1. Give 1 point if all elements described in the problem statement are present in the submission, 0 otherwise.
2. Give 1 point if the outgoing flows from an exclusive gateway are also labeled if there is more than one outgoing flow from the exclusive gateway, 0 otherwise.
3. Give 1 point if a start-event is present in the student's submission, 0 otherwise.
4. Give 1 point if an end-event is present in the student's submission, 0 otherwise.
5. Give 1 point if the activities in the diagram are in the correct order according to the problem statement, 0 otherwise.
6. Give 1 point if all pools and swimlanes are labeled, 0 otherwise.
7. Give 1 point if the submission does not contain elements that are not described in the problem statement, 0 otherwise.
8. Give 1 point if all diagram elements are connected, 0 otherwise.
9. Give 1 point if all tasks are named in the "Verb Object"-format where a name consists of a verb followed by the object, 0 otherwise.
10. Give 1 point if no sequence flows connect elements in two different pools, 0 otherwise.

.. note::
    While the exact number of points per criterion depends on the exercise, matching the number of points assigned for the various grading instructions with the number of points achievable in the exercise helps to decrease the variance in the number of points assigned.
34 changes: 34 additions & 0 deletions docs/user/exercises/modeling.rst
@@ -74,6 +74,7 @@ The following screenshot illustrates the first section of the form. It consists of
The following screenshot illustrates the second section of the form. It consists of:

- **Enable automatic assessment suggestions**: When enabled, Artemis tries to automatically suggest assessments for diagram elements based on previously graded submissions for this exercise.
- **Enable feedback suggestions from Athena**: When enabled, Artemis tries to automatically suggest assessments for diagram elements using the Athena service.
- **Problem Statement**: The task description of the exercise as seen by students.
- **Assessment Instructions**: Instructions for instructors while assessing the submission.

@@ -210,7 +211,40 @@ Once you're done assessing the solution, you can either:

- Click on |exercise-dashboard-button| to navigate to exercise dashboard page.

Automatic Assessment Suggestions
--------------------------------
If the checkbox ``Enable feedback suggestions from Athena`` is checked for a modeling exercise, Artemis generates assessment suggestions for submissions using the Athena service.
This section provides insights into how suggestions are retrieved in Artemis and how to apply them in the exercise grading process.

.. note::
To learn how to set up an instance of the Athena service and configure your Artemis installation accordingly, please refer to the section :ref:`Athena Service <athena_service>`.

After clicking on |assess-submission| on one of the submission entries on the Submissions and Assessments Page, assessment suggestions are loaded automatically as indicated by the following loading indicator:

.. figure:: modeling/assessment-suggestions-loading-indicator.png
:align: center
:scale: 50%

Once assessment suggestions have been retrieved, a notice on top of the page indicates that the current submission contains assessment suggestions created via generative AI.

.. figure:: modeling/assessment-suggestions-notice.png
:align: center

The suggestions themselves are shown as follows. If a suggestion directly references a diagram element, a dialog is attached to the corresponding element, showing the suggested grading score together with a remark on what could be improved.
In this example, a remark is made that an element is present in the evaluated BPMN diagram without being mentioned in the problem statement.

.. figure:: modeling/referenced-assessment-suggestion.png
:align: center
:scale: 50%

If a suggestion addresses a more general aspect of the diagram, multiple diagram elements at once, or elements that are missing from the diagram, the suggestion is shown in a card overview below the diagram.
These unreferenced suggestions can be accepted or discarded via buttons on the individual suggestion cards.

.. figure:: modeling/unreferenced-assessment-suggestion.png
:align: center
:scale: 50%

To learn how automatic suggestions are generated and how exercises can be optimized for automatic evaluation, please refer to :ref:`Generation of Assessment Suggestions for Modeling Exercises<generation_of_assessment_suggestions_for_modeling_exercises>`.

.. |edit| image:: modeling/edit.png
:scale: 75
4 changes: 4 additions & 0 deletions docs/user/exercises/modeling/modeling-llm-activity.svg
Contributor comment: I think it would be good to have a background to this model. If dark-mode users are reading the documentation, the model would be very hard to read.