Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a reusable app for task review #1058

Merged
merged 33 commits into from
Nov 30, 2023
Merged

Conversation

meta-paul
Copy link
Contributor

@meta-paul meta-paul commented Aug 25, 2023

This review app supports all functionality of the old command-line script (and adds some convenience), while providing a simple easy-to-use UI.

For guidelines on how to run the new Review App and make any Task code work with its interface, see mephisto/client/review_app/README.md.

Main features:

  • Intuitive UI that includes:
    • rendered display of the performed Task unit page
    • grouping unit results by worker, with an option of batch review
    • statistics of reviewed units, for better decision-making
  • Integration with any existing/future Task (see guidelines)
  • Recording of reviewer actions in Mephisto DB

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 25, 2023
@codecov-commenter
Copy link

codecov-commenter commented Aug 25, 2023

Codecov Report

Attention: 94 lines in your changes are missing coverage. Please review.

Comparison is base (ac52ad5) 60.36% compared to head (650326a) 62.62%.

Files Patch % Lines
...review_app/server/api/views/qualifications_view.py 69.04% 13 Missing ⚠️
...o/abstractions/providers/prolific/prolific_unit.py 0.00% 11 Missing ⚠️
.../abstractions/providers/prolific/prolific_utils.py 0.00% 10 Missing ⚠️
...o/client/review_app/server/api/views/stats_view.py 87.17% 10 Missing ⚠️
.../abstractions/providers/prolific/prolific_agent.py 25.00% 9 Missing ⚠️
...phisto/abstractions/providers/mturk/mturk_agent.py 25.00% 6 Missing ⚠️
mephisto/client/review_app/server/__init__.py 90.56% 5 Missing ⚠️
...review_app/server/api/views/qualify_worker_view.py 90.47% 4 Missing ⚠️
...o/client/review_app/server/api/views/units_view.py 90.24% 4 Missing ⚠️
...w_app/server/api/views/task_export_results_view.py 92.68% 3 Missing ⚠️
... and 11 more
Additional details and impacted files
@@             Coverage Diff              @@
##           v1.2-dev    #1058      +/-   ##
============================================
+ Coverage     60.36%   62.62%   +2.25%     
============================================
  Files           155      179      +24     
  Lines         11828    12594     +766     
============================================
+ Hits           7140     7887     +747     
- Misses         4688     4707      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@meta-paul meta-paul force-pushed the make-task-review-server branch from 9af5d0f to 5a0fc23 Compare September 13, 2023 14:28
mephisto/client/review_app/README.md Outdated Show resolved Hide resolved
mephisto/client/review_app/server/api/views/stats_view.py Outdated Show resolved Hide resolved
from mephisto.data_model.unit import Unit


class TasksWorkerUnitsView(MethodView):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty confusing name, unsure what it's getting at by reading it.

@JackUrb JackUrb changed the base branch from main to v1.2-dev September 14, 2023 20:26
@meta-paul meta-paul force-pushed the make-task-review-server branch from 9297ed2 to 4c291e3 Compare September 25, 2023 15:21
@meta-paul meta-paul force-pushed the make-task-review-server branch 4 times, most recently from 1f2621a to 25e5ea1 Compare September 26, 2023 00:23
@meta-paul meta-paul force-pushed the make-task-review-server branch from 25e5ea1 to 75bf0e3 Compare September 26, 2023 00:25
Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I think a lot of the wiring and direction for the overall implementation makes sense and should be on-target for the final application.

The missing step is getting the custom review frontends working in a fairly low-effort and reusable way. I think iframes make sense as the underlying approach, but the way these are getting built and served at the moment is not what we're looking for.

At the heart, people writing Mephisto tasks should only really ever need to be making a TaskFrontend component. Following this PR, the signature for this should be:

function TaskFrontend({
  taskData, // contains either get_init_data() contents for a live task, or the ['inputs'] data field from a completed task. These should be equivalent.
  finalResults = null, // contains null for annotation, and the ['outputs'] data field from a completed task
  handleSubmit, // contains the function to trigger to submit final data during a task, null when reviewing a completed task.
  ... // other props, like remote procedure calls, error handling, etc.
}) {/* code that can show both an annotate or review mode based on props */}

(one could ask why we don't implement two separate views at this stage, and have the file export both, but ultimately some tasks are easy enough that the difference of having an immutable state for review and an editable state for the task could all be dealt with in the same code block, and this method doesn't preclude someone from doing if (finalResults !== null) {return <ReviewFrontend inputs={taskData} outputs={finalResults} />})

Now, with this TaskFrontend implemented, Mephisto should have two primary build paths in the package's webpack configuration. At the moment we have App.jsx (and main.js) which end up building an application for executing the task and collecting data. This should not need to change at all. To create a review bundle, instead we should have a ReviewApp.jsx (and review.js) file that get compiled/built separately. The basic flow for that file should be:

// Reviewapp.jsx
import React from "react";
import ReactDOM from "react-dom";
import {
  TaskFrontend,
} from "./components/core_components.jsx";

function ReviewApp() {
  ... code required to receive data from the containing iframe ...
  if (reviewData === null) {
    return <loading indicator>
  }
  return <TaskFrontend initialTaskData={reviewData['inputs']} finalResults={reviewData['outputs']} />;
}
ReactDOM.render(<RemoteProcedureApp />, document.getElementById("app"));

Now all that remains is, upon launch (or somewhere inside a user's mephisto config files) we can associate /path/to/review/bundle.js to the task-name used to launch a task, and then in the iframe we can render a simple html page (like static/index.html files used in Mephisto task webapps) that pulls the script from <review-app>/custom-review-bundles/<task-name>/bundle.js, which the review server can route to the appropriate bundle.

@@ -141,7 +141,13 @@ function Instructions({ taskData }) {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the end we should be moving any of the mnist_for_review content into the mnist dir when it's fully done, as this should be establishing a new standard.

@meta-paul meta-paul force-pushed the make-task-review-server branch 26 times, most recently from a71256a to f938fbd Compare November 25, 2023 00:51
@meta-paul meta-paul force-pushed the make-task-review-server branch from f938fbd to 650326a Compare November 25, 2023 01:06
Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nearly ready, just a few last comments/notes.

Great to see all the tests as well here, especially being fixed. 1.2 is close!

@@ -1,11 +1,9 @@
<!---
Copyright (c) Meta Platforms and its affiliates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are licenses not required for .md?

@@ -1,9 +1,3 @@
/*
* Copyright (c) Meta Platforms and its affiliates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird license drops

@@ -1,9 +1,3 @@
/*
* Copyright (c) Meta Platforms and its affiliates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📜

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated copyright headers (re-ran updated script, and it added a few more actually)

@@ -1,7 +1,2 @@
#!/bin/sh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📜

annotation = data["outputs"]["final_submission"]["annotations"][0]
annotation = data["final_submission"]["annotations"][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional? I thought the standardization to having agent state's data contain 'inputs' and 'outputs' was part of the overall plan.

Copy link
Contributor Author

@meta-paul meta-paul Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had to fix that to get the Task Review front-end to work for mnist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should update the front-end to index into outputs rather than this, such that we can have a more standard format/expectations on where to find data across different AgentState types: #1065

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I've updated this as well. I've missed screening flow when updating the code. As for the front-end, no changes needed - we either send the entire "task_data", or (as in TaskReviewApp) the UI is aware of "inputs" and "outputs".

Comment on lines +186 to +187
This is a sample YAML configuration to run your Task on **AWS EC2** architect with **Prolific** provider

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still remains a combination that is only really relevant for FAIR. This should be an internal document on the wiki, as the advice is really quite good. For external, we'd probably still suggest heroku.

Copy link
Contributor Author

@meta-paul meta-paul Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can surely move this to internal docs. The goal here was to provide a simple guide supporting a typical workflow for our FAIR team members. Will look for a good landing place for this instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maryam likely has good advice on where would be appropriate to store this documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved that part out of the readme

Comment on lines 4 to -7
LICENSE file in the root directory of this source tree.
-->

# Mephisto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new contents in the readme overall are great, thanks for extending this with so much detail. (hopefully enough to get people over the docker hump).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without Docker it would quite painful to troubleshoot everyone's local environment :D

Copy link
Contributor

@JackUrb JackUrb Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - it's good to get people to adopt the most reproducible route, but it's hard to get people to change! In trying to get someone to adopt a new framework (Mephisto) over whatever they're doing before, it's important for adoption to also support something closer to a workflow your users are already using (notebooks, local environment, generally not docker).

@@ -610,10 +610,10 @@ def approve_work(client: MTurkClient, assignment_id: str, override_rejection: bo
)


def reject_work(client: MTurkClient, assignment_id: str, reason: str) -> None:
def reject_work(client: MTurkClient, assignment_id: str, review_note: Optional[str] = None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this review_note is a bit strange, as a note doesn't necessarily imply it will be given as feedback to the worker. In retrospect, reason didn't really do this well either. Perhaps feedback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, I renamed it because it may or may not be sent to worker (and "feedback" implies that it does get sent).

In current Mturk implementation, review_note is sent to worker if HIT was rejected; but reviewer may want to leave a note for their own records for an accepted HIT as well. In that case, review_note will only be kept in datastore, and not sent to the worker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the issue is that you'd rather imply that it's sent than not. Imagine someone starts using this for local notes like "this task was garbage" and that ends up pushed to the worker.

Even better would be to have explicit labelling for whether the note is pushed or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better would be to have explicit labelling for whether the note is pushed or not.

That's a good idea, I'll add that option in the UI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that option, and disabled it (along with bonusing) right away, because back-end is not ready for that yet. Once back-end logic is added (could be in v1.2.1), I'll re-enable the UI for it. So for now users can leave comment, and it will only be saved in the database.

def approve_work(
self,
review_note: Optional[str] = None,
bonus: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having bonus here is a bit strange, as it implies that adding a bonus on this call will trigger bonusing the worker. (which I don't believe it does.

Copy link
Contributor Author

@meta-paul meta-paul Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jack, I did face a dilemma here. By design, TaskReviewApp should allow user to bonus workers during task review. But by implementation, bonusing so far has been done via launching a script.

For now I picked a middle approach: I'm saving user-provided bonuses in the datastore, so that later a script could pick them up and issue these bonuses if needed. The bonusing script will need to be adjusted then (I didn't want to touch it as it's not really part of a review app).

We can go one step further, and properly grant bonuses with the provider right away inside the "approve_work" method. I can add that code for Prolific, and probably could do that for mturk as well.

IMO the best approach would be to support both scripted or "live" bonusing (that choice will be in TaskRun args). Bonus could be saved in the datastore, and we'd need to add an extra column "is_paid" so that live and script bonusing can work side-by-side as needed for a partuclar TaskRun.

Let me know which route makes most sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I've disabled bonusing, so we can re-enable it with the next patch release (As per #1058 (comment))

@@ -0,0 +1,84 @@
## Run TaskReview app


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good opportunity for a screenshot - demonstrates value of a UI more easily when it can be seen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can attach something there

@meta-paul meta-paul merged commit f26decb into v1.2-dev Nov 30, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants