Create a reusable app for task review #1058
```python
from mephisto.data_model.unit import Unit

class TasksWorkerUnitsView(MethodView):
```
This is a pretty confusing name, unsure what it's getting at by reading it.
Alright, I think a lot of the wiring and direction for the overall implementation makes sense and should be on-target for the final application.
The missing step is getting the custom review frontends working in a fairly low-effort and reusable way. I think iframes make sense as the underlying approach, but the way these are getting built and served at the moment is not what we're looking for.
At the heart, people writing Mephisto tasks should only really ever need to be making a `TaskFrontend` component. Following this PR, the signature for this should be:
```jsx
function TaskFrontend({
  taskData, // contains either get_init_data() contents for a live task, or the ['inputs'] data field from a completed task. These should be equivalent.
  finalResults = null, // contains null for annotation, and the ['outputs'] data field from a completed task
  handleSubmit, // contains the function to trigger to submit final data during a task; null when reviewing a completed task.
  ... // other props, like remote procedure calls, error handling, etc.
}) {/* code that can show both an annotate or review mode based on props */}
```
(One could ask why we don't implement two separate views at this stage and have the file export both, but ultimately some tasks are simple enough that the difference between an immutable state for review and an editable state for the task can be handled in the same code block, and this method doesn't preclude someone from doing `if (finalResults !== null) { return <ReviewFrontend inputs={taskData} outputs={finalResults} />; }`.)
Now, with this `TaskFrontend` implemented, Mephisto should have two primary build paths in the package's `webpack` configuration. At the moment we have `App.jsx` (and `main.js`), which end up building an application for executing the task and collecting data. This should not need to change at all. To create a review bundle, we should instead have a `ReviewApp.jsx` (and `review.js`) file that gets compiled/built separately. The basic flow for that file should be:
```jsx
// ReviewApp.jsx
import React from "react";
import ReactDOM from "react-dom";
import { TaskFrontend } from "./components/core_components.jsx";

function ReviewApp() {
  // ... code required to receive data from the containing iframe ...
  if (reviewData === null) {
    return <div>Loading...</div>;
  }
  return <TaskFrontend taskData={reviewData["inputs"]} finalResults={reviewData["outputs"]} />;
}

ReactDOM.render(<ReviewApp />, document.getElementById("app"));
```
Now all that remains is, upon launch (or somewhere inside a user's Mephisto config files), to associate `/path/to/review/bundle.js` with the `task-name` used to launch a task. Then, in the iframe, we can render a simple HTML page (like the `static/index.html` files used in Mephisto task webapps) that pulls the script from `<review-app>/custom-review-bundles/<task-name>/bundle.js`, which the review server can route to the appropriate bundle.
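A minimal sketch of the routing step described above, assuming a hypothetical `REVIEW_BUNDLES` mapping that gets populated from the user's Mephisto config at launch (the names here are illustrative, not the actual Mephisto API):

```python
from pathlib import Path

# Hypothetical mapping from task-name to a built review bundle,
# populated from the user's Mephisto config at launch time.
REVIEW_BUNDLES = {
    "mnist-task": "/path/to/review/bundle.js",
}

def resolve_review_bundle(task_name: str) -> Path:
    """Map the task-name from the iframe's bundle URL to its file on disk."""
    try:
        return Path(REVIEW_BUNDLES[task_name])
    except KeyError:
        raise FileNotFoundError(f"No review bundle registered for task '{task_name}'")
```

The review server's `custom-review-bundles/<task-name>/bundle.js` route would then just call this resolver and serve the file it returns.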
```jsx
@@ -141,7 +141,13 @@ function Instructions({ taskData }) {
  );
```
I think in the end we should be moving any of the `mnist_for_review` content into the `mnist` dir when it's fully done, as this should be establishing a new standard.
This is nearly ready, just a few last comments/notes.
Great to see all the tests here as well, especially the fixed ones. 1.2 is close!
```
@@ -1,11 +1,9 @@
<!---
Copyright (c) Meta Platforms and its affiliates.
```
Are licenses not required for `.md`?
```
@@ -1,9 +1,3 @@
/*
 * Copyright (c) Meta Platforms and its affiliates.
```
Weird license drops
```
@@ -1,9 +1,3 @@
/*
 * Copyright (c) Meta Platforms and its affiliates.
```
📜
Updated copyright headers (re-ran the updated script, and it actually added a few more).
```
@@ -1,7 +1,2 @@
#!/bin/sh
```
📜
```diff
- annotation = data["outputs"]["final_submission"]["annotations"][0]
+ annotation = data["final_submission"]["annotations"][0]
```
Is this change intentional? I thought the standardization to having agent state's data contain `'inputs'` and `'outputs'` was part of the overall plan.
Yes, I had to fix that to get the Task Review front-end to work for mnist.
You should update the front-end to index into outputs rather than this, such that we can have a more standard format/expectations on where to find data across different `AgentState` types: #1065
Thanks for pointing this out, I've updated this as well - I had missed the screening flow when updating the code. As for the front-end, no changes are needed: we either send the entire "task_data", or (as in TaskReviewApp) the UI is aware of "inputs" and "outputs".
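The standardized access pattern under discussion can be sketched as follows, assuming submitted fields live under the agent state's `'outputs'` key (the helper name and sample data are illustrative):

```python
def get_first_annotation(data: dict):
    """Read the first annotation from an agent state's data dict,
    assuming the standardized layout where inputs live under
    data['inputs'] and submitted fields under data['outputs']."""
    return data["outputs"]["final_submission"]["annotations"][0]

# Illustrative agent state data in the standardized layout.
sample = {
    "inputs": {"image_id": 42},
    "outputs": {"final_submission": {"annotations": ["seven"]}},
}
```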
This is a sample YAML configuration to run your Task on the **AWS EC2** architect with the **Prolific** provider.
This still remains a combination that is only really relevant for FAIR. This should be an internal document on the wiki, as the advice is really quite good. For external users, we'd probably still suggest Heroku.
I can surely move this to internal docs. The goal here was to provide a simple guide supporting a typical workflow for our FAIR team members. I'll look for a good landing place for this instruction.
Maryam likely has good advice on where would be appropriate to store this documentation.
Moved that part out of the readme
```
LICENSE file in the root directory of this source tree.
-->

# Mephisto
```
The new contents in the readme overall are great, thanks for extending this with so much detail (hopefully enough to get people over the Docker hump).
Without Docker it would be quite painful to troubleshoot everyone's local environment :D
Agreed - it's good to get people to adopt the most reproducible route, but it's hard to get people to change! When trying to get someone to adopt a new framework (Mephisto) over whatever they were using before, it's important for adoption to also support something closer to the workflow your users already have (notebooks, local environment, generally not Docker).
```diff
@@ -610,10 +610,10 @@ def approve_work(client: MTurkClient, assignment_id: str, override_rejection: bo
     )

-def reject_work(client: MTurkClient, assignment_id: str, reason: str) -> None:
+def reject_work(client: MTurkClient, assignment_id: str, review_note: Optional[str] = None) -> None:
```
Calling this `review_note` is a bit strange, as a note doesn't necessarily imply it will be given as feedback to the worker. In retrospect, `reason` didn't really do this well either. Perhaps `feedback`?
That is correct, I renamed it because it may or may not be sent to the worker (and "feedback" implies that it does get sent).
In the current MTurk implementation, `review_note` is sent to the worker if the HIT was rejected; but a reviewer may want to leave a note for their own records on an accepted HIT as well. In that case, `review_note` will only be kept in the datastore, and not sent to the worker.
Right, the issue is that you'd rather imply that it's sent than not. Imagine someone starts using this for local notes like "this task was garbage" and that ends up pushed to the worker.
Even better would be to have explicit labelling for whether the note is pushed or not.
> Even better would be to have explicit labelling for whether the note is pushed or not.

That's a good idea, I'll add that option in the UI.
I've added that option, and disabled it (along with bonusing) right away, because the back-end is not ready for it yet. Once the back-end logic is added (could be in v1.2.1), I'll re-enable the UI for it. So for now users can leave a comment, and it will only be saved in the database.
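The explicit-labelling idea can be sketched like this, assuming a hypothetical `send_to_worker` flag (this is not the actual Mephisto signature, just an illustration of the safe default):

```python
from typing import Optional

def reject_work(
    assignment_id: str,
    review_note: Optional[str] = None,
    send_to_worker: bool = False,
) -> dict:
    """Hypothetical sketch: the reviewer must state explicitly whether the
    note is pushed to the worker; by default it stays a private record."""
    return {
        "assignment_id": assignment_id,
        "review_note": review_note,
        # Only notes explicitly marked for the worker ever leave the datastore.
        "note_sent_to_worker": send_to_worker and review_note is not None,
    }
```

With this shape, a local note like "this task was garbage" can never be pushed to the worker by accident.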
```python
def approve_work(
    self,
    review_note: Optional[str] = None,
    bonus: Optional[str] = None,
```
Having `bonus` here is a bit strange, as it implies that adding a `bonus` on this call will trigger bonusing the worker (which I don't believe it does).
Hi Jack, I did face a dilemma here. By design, TaskReviewApp should allow the user to bonus workers during task review. But by implementation, bonusing so far has been done via launching a script.
For now I picked a middle approach: I'm saving user-provided bonuses in the datastore, so that later a script could pick them up and issue these bonuses if needed. The bonusing script will need to be adjusted then (I didn't want to touch it, as it's not really part of a review app).
We could go one step further and properly grant bonuses with the provider right away inside the "approve_work" method. I can add that code for Prolific, and probably could do that for MTurk as well.
IMO the best approach would be to support both scripted and "live" bonusing (that choice would be in the TaskRun args). The bonus could be saved in the datastore, and we'd need to add an extra column "is_paid" so that live and scripted bonusing can work side-by-side as needed for a particular TaskRun.
Let me know which route makes the most sense.
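The side-by-side scheme could look roughly like this, assuming a hypothetical bonuses table with an `is_paid` column (illustrative schema, not Mephisto's actual datastore):

```python
import sqlite3

# Hypothetical schema: bonuses are recorded at review time, then paid
# either "live" inside approve_work or later by a bonusing script.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bonuses (
        unit_id TEXT,
        amount TEXT,
        is_paid INTEGER DEFAULT 0
    )"""
)

def record_bonus(unit_id: str, amount: str) -> None:
    """Save a reviewer-granted bonus; payment happens separately."""
    conn.execute(
        "INSERT INTO bonuses (unit_id, amount) VALUES (?, ?)", (unit_id, amount)
    )

def unpaid_bonuses() -> list:
    """What a later bonusing script (or a live payout) would pick up."""
    return conn.execute(
        "SELECT unit_id, amount FROM bonuses WHERE is_paid = 0"
    ).fetchall()

def mark_paid(unit_id: str) -> None:
    """Flip is_paid once the provider confirms the payout."""
    conn.execute("UPDATE bonuses SET is_paid = 1 WHERE unit_id = ?", (unit_id,))
```

Because both paths share the `is_paid` flag, live and scripted bonusing cannot double-pay the same unit.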
For now I've disabled bonusing, so we can re-enable it with the next patch release (As per #1058 (comment))
```
@@ -0,0 +1,84 @@
## Run TaskReview app
```
Good opportunity for a screenshot - it demonstrates the value of a UI much more easily when it can be seen.
Sure, I can attach something there
This review app supports all the functionality of the old command-line script (and adds some conveniences), while providing a simple, easy-to-use UI.
For guidelines on how to run the new Review App and make any Task code work with its interface, see `mephisto/client/review_app/README.md`.
Main features: