Full extended docs for every core directory in Mephisto (#316)
* moving around the stuff from data_model and webapp/client

* Migrating crowd providers

* Moving script utils

* Migrating architects

* Forgot to update channel callsites

* Moving blueprints

* Migrating core

* Mass delete redirection files

* Accidentally deleted too much in the cli

* weird merge output

* import fix

* Update core directory readme

* Documentation for tools

* script docs

* Operations read me

* Moving testing folders

* data model docs

* Abstraction readmes

* Curse of not pressing ctrl-s

* address comments
JackUrb authored Dec 1, 2020
1 parent c10cdc4 commit b23d5b5
Showing 30 changed files with 201 additions and 81 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -15,7 +15,7 @@ We actively welcome your pull requests.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

### Task Contributions
TODO TODO TODO
Generally we encourage people to provide their own blueprints as part of the repo in which they release their code, though if someone creates a strong case for an abstract `Blueprint` that is generally applicable we'd be happy to review it.

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
21 changes: 7 additions & 14 deletions mephisto/README.md
@@ -1,16 +1,9 @@
# Mephisto
This is the main package directory, containing all of the core workings of Mephisto. The breakdown is as follows:
This is the main package directory, containing all of the core workings of Mephisto. They roughly follow the divisions noted in the [architecture overview doc](https://github.com/facebookresearch/Mephisto/blob/master/docs/architecture_overview.md#agent). The breakdown is as follows:

- `client`: Contains interfaces for using Mephisto at a very high level. Primarily comprised of the python code for the cli and
- `core`: Contains components that operate on top of the data_model layer
- `data_model`: Contains the data model components as described in the architecture document, as well as the base classes for all the core abstractions.
- `providers`: contains implementations of the `CrowdProvider` abstraction
- `scripts`: contains commonly executed convenience scripts for Mephisto users
- `server`: contains implementations of the `Architect` and `Blueprint` abstractions.
- `tasks`: an empty default directory to work on your own tasks
- `utils`: unorganized utility classes that are useful in scripts and other places
- `webapp`: contains the frontend that is deployed by the main client

## Discussions

Changes to this structure for clarity are being discussed in [#285](https://github.com/facebookresearch/Mephisto/issues/285).
- `abstractions`: Contains the interface classes for the core abstractions in Mephisto, as well as implementations of those interfaces. These are the Architects, Blueprints, Crowd Providers, and Databases.
- `client`: Contains user interfaces for using Mephisto at a very high level. Composed primarily of the Python code for the CLI and the web views.
- `data_model`: Contains the data model components as described in the architecture document. These are the relevant data structures that build upon the underlying MephistoDB, and are utilized throughout the Mephisto codebase.
- `operations`: Contains low-level operational code that performs more complex functionality on top of the Mephisto data model.
- `scripts`: Contains commonly executed convenience scripts for Mephisto users.
- `tools`: Contains helper methods and modules that allow for lower-level access to the Mephisto data model than the clients provide. Useful for creating custom workflows and scripts that are built on Mephisto.
18 changes: 18 additions & 0 deletions mephisto/abstractions/README.md
@@ -0,0 +1,18 @@
# Mephisto Core Abstractions
This directory contains the interfaces for the four core Mephisto abstractions (as well as subcomponents of those abstractions). Those abstractions are discussed at a high level in the [architecture overview doc](https://github.com/facebookresearch/Mephisto/blob/master/docs/architecture_overview.md).

Specific implementations can be made to extend the Mephisto data model to work with new crowd providers, new task types, and new backend server architectures. These four primary abstractions are summarized below, but other sections go more in-depth.

### `Architect`
An [`Architect`](https://github.com/facebookresearch/Mephisto/blob/master/mephisto/abstractions/architects/README.md#architect) is an abstraction that allows Mephisto to manage setup and maintenance of task servers for you. When launching a task, Mephisto uses an `Architect` to build required server files, launch that server, deploy the task files, and then later shut it down when the task is complete. More details are found in the `abstractions/architects` folder, along with the existing `Architects`.

Architects also require a `Channel` to allow the `Supervisor` to communicate with the server, and are expected to define their own or select a compatible one from the ones already present.

### `Blueprint`
A [`Blueprint`](https://github.com/facebookresearch/Mephisto/blob/master/mephisto/abstractions/blueprints/README.md#overview) is the essential formula for running a task on Mephisto. It accepts some number of parameters and input data, which should be sufficient to display a frontend to the crowdworker, process their responses, and then save them somewhere. It comprises extensions of the `AgentState` (data storage), `TaskRunner` (actual steps to complete the task), and `TaskBuilder` (resources to display a frontend) classes. More details are provided in the `abstractions/blueprints` folder, where all the existing `Blueprint`s live.

### `CrowdProvider`
A [`CrowdProvider`](https://github.com/facebookresearch/Mephisto/blob/master/mephisto/abstractions/providers/README.md#implementation-details) is a wrapper around the functionality that Mephisto needs in order to accept work from workers on a specific service. Ultimately it comprises an extension of each of `Worker`, `Agent`, `Unit`, and `Requester`. More details can be found in the `abstractions/providers` folder, where all of the existing `CrowdProvider`s live.

### `MephistoDB`
The [`MephistoDB`](https://github.com/facebookresearch/Mephisto/blob/master/mephisto/abstractions/databases/README.md) is an abstraction around the storage for the Mephisto data model, so that alternate methods for storing and loading the kind of data that Mephisto requires can be created without breaking functionality.
16 changes: 4 additions & 12 deletions mephisto/abstractions/blueprints/README.md
@@ -8,36 +8,28 @@ The agent state is responsible for defining the data that is important to store
- `set_init_state(data)`: given data provided by the `get_init_data_for_agent` method, initialize this agent state to whatever starting state is relevant for this `Unit`.
- `get_init_state()`: Return the initial state to be sent to the agent for use in the frontend.
- `load_data()`: Load data that is saved to file to re-initialize the state for this `AgentState`. Generally data should be stored in `self.agent.get_data_dir()`, however any storage solution will work as long as it remains consistent.
- `get_data()`: Return the stored data for this task in the format expected to render a completed task in the frontend.
- `get_data()`: Return the stored data for this task in the format containing everything the frontend needs to render and run the task.
- `get_parsed_data()`: Return the stored data for this task in the format that is relevant for review or packaging the data.
- `save_data()`: Save data to a file such that it can be re-initialized later. Generally data should be stored in `self.agent.get_data_dir()`, however any storage solution will work as long as it remains consistent, and `load_data()` will be able to find it.
- `update_data()`: Update the local state stored in this `AgentState` given the data sent from the frontend. Given your frontend is what packages data to send, this is entirely customizable by the task creator.
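As a rough illustration of the `AgentState` methods listed above, here is a minimal, self-contained sketch. It does not import Mephisto: `SketchAgentState`, the `data_dir` constructor argument, the `agent_data.json` file name, and the shape of the stored dict are all assumptions made for this example rather than the library's actual implementation.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class SketchAgentState:
    """Hypothetical stand-in for an AgentState; not Mephisto's real base class."""

    def __init__(self, data_dir: str) -> None:
        # Assumption: in Mephisto this directory would come from
        # self.agent.get_data_dir(); here it is passed in directly.
        self.data_file = os.path.join(data_dir, "agent_data.json")
        self.state: Dict[str, Any] = {"init": None, "updates": []}

    def set_init_state(self, data: Any) -> None:
        self.state["init"] = data

    def get_init_state(self) -> Optional[Any]:
        return self.state["init"]

    def update_data(self, data: Dict[str, Any]) -> None:
        # The frontend decides what it sends, so updates are stored as-is.
        self.state["updates"].append(data)

    def get_data(self) -> Dict[str, Any]:
        return self.state

    def get_parsed_data(self) -> Dict[str, Any]:
        # Assumed review format: the inputs plus only the last update.
        updates = self.state["updates"]
        return {
            "inputs": self.state["init"],
            "final_response": updates[-1] if updates else None,
        }

    def save_data(self) -> None:
        with open(self.data_file, "w") as f:
            json.dump(self.state, f)

    def load_data(self) -> None:
        with open(self.data_file) as f:
            self.state = json.load(f)


def round_trip_example() -> Dict[str, Any]:
    """Save state with one object, then restore it with a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        first = SketchAgentState(tmp)
        first.set_init_state({"prompt": "Label this image"})
        first.update_data({"label": "cat"})
        first.save_data()

        restored = SketchAgentState(tmp)
        restored.load_data()
        return restored.get_parsed_data()
```

The point of the round trip is that `save_data()` and `load_data()` only need to agree with each other on where and how the state is stored.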

(TODO) Specify a format for data to be sent to the frontend for review.

### `TaskBuilder`
`TaskBuilder`s abstract away how a frontend is built, allowing Mephisto users to design tasks however they'd like. They can also take build options to customize what ends up built. They must implement the following:
- `build_in_dir(build_dir)`: Take any important source files and put them into the given build dir. This directory will be deployed to the frontend and will become the static target for completing the task.
- `get_extra_options()`: Return the specific task options that are relevant to customize the frontend when `build_in_dir` is called.
(TODO) Remove all references to the below function
- `task_dir_is_valid(task_dir)`: Originally this was intended to specify whether the task directory supplied outside of the task for this task to use was properly formatted, however when `Blueprint`s were finalized, the gallery no longer existed and this route of customization is no longer supported.

### `TaskRunner`
The `TaskRunner` component of a blueprint is responsible for actually stepping `Agent`s through the task when it is live. It is, in short, able to set up task control. A `TaskRunner` needs to implement the following:
- `get_init_data_for_agent`: Provide initial data for an assignment. If this agent is reconnecting (and as such attached to an existing task), update that task to point to the new agent (as the old agent object will no longer receive data from the frontend).
- `run_assignment`: Handle setup for any resources required to get this assignment running. It will be launched in a background thread, and should tolerate being interrupted by `cleanup_assignment`.
- `cleanup_assignment`: Send any signals to the required thread for the given assignment to tell it to terminate, then clean up any resources that were set within it.
- `get_data_for_assignment` (optional): Get the data that an assignment is going to use when run. By default, this pulls from `assignment.get_assignment_data()` however if a task has a special storage mechanism or data type, the assignment data can be fetched here. (TODO) make this optional by having the base class use the `StaticTaskRunner`'s implementation.
(TODO) task launching management at the moment is really sloppy, and the API for it is unclear. Something better needs to be picked, as at the moment `get_init_data_for_assignment` is responsible for ensuring that `run_assignment` is set up in a thread. Perhaps this responsibility should be consolidated into the `TaskLauncher` class.
- `get_data_for_assignment` (optional): Get the data that an assignment is going to use when run. By default, this pulls from `assignment.get_assignment_data()` however if a task has a special storage mechanism or data type, the assignment data can be fetched here.
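The relationship between `run_assignment` and `cleanup_assignment` described above can be sketched with a background thread and a stop signal. This is a hedged, standalone example: `SketchTaskRunner`, its `launch` helper, and the use of `threading.Event` are assumptions chosen for illustration, not how Mephisto's `TaskRunner` is actually implemented.

```python
import threading
import time
from typing import Dict, List, Tuple


class SketchTaskRunner:
    """Hypothetical runner; method names mirror the list above only loosely."""

    def __init__(self) -> None:
        self.stop_events: Dict[str, threading.Event] = {}
        self.threads: Dict[str, threading.Thread] = {}
        self.completed_steps: Dict[str, List[int]] = {}

    def launch(self, assignment_id: str) -> None:
        # Assumed helper: register the stop signal before the thread starts,
        # so cleanup_assignment can never miss it.
        self.stop_events[assignment_id] = threading.Event()
        self.completed_steps[assignment_id] = []
        thread = threading.Thread(
            target=self.run_assignment, args=(assignment_id,), daemon=True
        )
        self.threads[assignment_id] = thread
        thread.start()

    def run_assignment(self, assignment_id: str, num_steps: int = 1000) -> None:
        stop = self.stop_events[assignment_id]
        for step in range(num_steps):
            if stop.is_set():
                return  # interrupted by cleanup_assignment
            self.completed_steps[assignment_id].append(step)
            stop.wait(0.01)  # stands in for waiting on an agent's response

    def cleanup_assignment(self, assignment_id: str) -> None:
        # Signal the thread to terminate, then wait for it to exit.
        self.stop_events[assignment_id].set()
        self.threads[assignment_id].join(timeout=2)


def demo() -> Tuple[bool, int]:
    runner = SketchTaskRunner()
    runner.launch("assignment-1")
    time.sleep(0.05)
    runner.cleanup_assignment("assignment-1")
    still_alive = runner.threads["assignment-1"].is_alive()
    return still_alive, len(runner.completed_steps["assignment-1"])
```

Waiting on the event instead of sleeping lets the loop wake up as soon as cleanup is requested, which is one way to make a long-running assignment tolerant of interruption.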

## Implementations
### `StaticBlueprint`
The `StaticBlueprint` class allows a replication of the interface that MTurk provides, being able to take a snippet of `HTML` and a `.csv` file and deploy tasks that fill templates of the `HTML` with values from the `.csv`.

(TODO) support other sources than a .csv
You can also specify the task data in a `.json` file, or by passing the data array or a generator to `SharedStaticTaskState.static_task_data`.
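To make the template-filling idea concrete, the sketch below renders one HTML snippet per `.csv` row using only the Python standard library. It assumes MTurk-style `${column}` placeholders and uses `string.Template`; the helper names `load_rows` and `render_tasks` are hypothetical, and this is not the code `StaticBlueprint` itself runs.

```python
import csv
import io
from string import Template
from typing import Dict, Iterable, List


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def render_tasks(html_template: str, rows: Iterable[Dict[str, str]]) -> List[str]:
    """Fill the template once per row; unknown placeholders are left untouched."""
    template = Template(html_template)
    return [template.safe_substitute(row) for row in rows]


# Assumed example inputs, not taken from the Mephisto repository.
HTML = '<p>Is "${word}" a ${category}?</p>'
CSV_TEXT = "word,category\napple,fruit\ncarrot,vegetable\n"

pages = render_tasks(HTML, load_rows(CSV_TEXT))
```

Because `render_tasks` accepts any iterable of dicts, the same function would work for rows loaded from a `.json` file or produced by a generator.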

### `MockBlueprint`
The `MockBlueprint` exists to test other parts of the Mephisto architecture, and doesn't actually provide a real task.

## Future work
(TODO) - Clean up the notion of galleries and parent task ids, as we're consolidating into blueprints
(TODO) - Allow for using user blueprints
5 changes: 5 additions & 0 deletions mephisto/abstractions/databases/README.md
@@ -0,0 +1,5 @@
# MephistoDB implementations
This folder contains implementations of the `MephistoDB` abstraction.

## `LocalMephistoDB`
An implementation of the Mephisto Data Model outlined in `MephistoDB`. This database stores all of the information locally via SQLite. Some helper functions are included to make the implementation cleaner by abstracting away SQLite error parsing and string formatting; otherwise it follows directly from the requirements of `MephistoDB`.
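As an illustration of what "abstracting away SQLite error parsing" can look like, the sketch below translates a low-level `sqlite3.IntegrityError` into a domain-specific exception. The `workers` table, the `create_worker` helper, and the `DuplicateEntryError` class are hypothetical names invented for this example; they are not taken from `LocalMephistoDB`.

```python
import sqlite3
from typing import Optional


class DuplicateEntryError(Exception):
    """Hypothetical error raised when a unique entry already exists."""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workers ("
        "worker_id INTEGER PRIMARY KEY, "
        "worker_name TEXT UNIQUE NOT NULL)"
    )
    return conn


def create_worker(conn: sqlite3.Connection, worker_name: str) -> Optional[int]:
    """Insert a worker, hiding SQLite's error details from the caller."""
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute(
                "INSERT INTO workers (worker_name) VALUES (?)", (worker_name,)
            )
            return cursor.lastrowid
    except sqlite3.IntegrityError as e:
        # Only uniqueness violations are translated; anything else propagates.
        if "UNIQUE constraint failed" in str(e):
            raise DuplicateEntryError(f"worker {worker_name!r} already exists") from e
        raise
```

Callers can then catch `DuplicateEntryError` without knowing that SQLite is the storage backend, which is the property an alternate database implementation would need to preserve.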
6 changes: 2 additions & 4 deletions mephisto/abstractions/providers/README.md
@@ -29,11 +29,9 @@ A specific interface for launching tasks on the MTurk sandbox

(TODO) Can we bundle this into the `MTurkProvider` and make it so that providers have a TEST/SANDBOX mode bundled in? This would clarify how the testing utilities work, without needing to publish real tasks.

### LocalProvider
### LocalProvider (TODO)
An interface that allows for launching tasks on your local machine, allowing for ip-address based workers to submit work.

(TODO) IMPLEMENT THIS

### MockProvider
An implementation of a provider that allows for robust testing by exposing all of the underlying state to a user.

@@ -71,7 +69,7 @@ The `<Crowd>Unit` implementation needs to be able to handle the following intera
### `<Crowd>Requester`
The `<Crowd>Requester` mostly just needs to abstract the registration process, but the full list of functions are below:
- `register`: Given arguments, register this requester
- `get_register_args`: Return the arguments required to register one of these requesters. (TODO) can we turn this into an argparse group somehow? And then later extract from the argparse group to send to the frontend.
- `get_register_args`: Return the arguments required to register one of these requesters.
- `is_registered`: Determine if the current credentials for a `Requester` are valid.
- `get_available_budget` (Optional): return the available budget for this requester.

5 changes: 2 additions & 3 deletions mephisto/abstractions/providers/mturk/utils/script_utils.py
@@ -9,6 +9,7 @@
from mephisto.abstractions.providers.mturk.mturk_utils import give_worker_qualification
from mephisto.data_model.requester import Requester
from mephisto.data_model.unit import Unit
from tqdm import tqdm

if TYPE_CHECKING:
    from mephisto.abstractions.database import MephistoDB
Expand Down Expand Up @@ -42,9 +43,7 @@ def direct_soft_block_mturk_workers(
)

mturk_client = requester._get_client(requester._requester_name)
for idx, worker_id in enumerate(worker_list):
if idx % 50 == 0:
print(f"Blocked {idx + 1} workers so far.")
for worker_id in tqdm(worker_list):
try:
give_worker_qualification(
mturk_client, worker_id, qualification_id, value=1
@@ -1,6 +1,7 @@
# data_model/test
## Testers
# Abstraction testers
This folder contains a number of Mephisto Data Model "test benches", which serve as the standard tests that Mephisto Abstractions need to pass in order for the system to be able to use them. Each bench defines a number of tests, and new classes can be tested against it by making a subclass that implements the required setup functions. See the `test/server/architects/test_heroku_architect` implementation for an example.

Implementations can add their own additional test methods after extending the baseline test benches in order to ensure that they have a common place to test their complete functionality.
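The test-bench pattern described above can be sketched with plain `unittest`. Everything here is a hypothetical stand-in: `StoreTestBench`, `get_store`, and `DictStoreTests` are invented for illustration and do not correspond to the benches in this folder, which test Mephisto abstractions rather than dictionaries.

```python
import unittest
from typing import Dict


class StoreTestBench(unittest.TestCase):
    """Hypothetical bench: shared tests that any store implementation must pass."""

    @classmethod
    def setUpClass(cls) -> None:
        # The bench itself has no implementation to test, so skip it.
        if cls is StoreTestBench:
            raise unittest.SkipTest("Abstract test bench, not run directly")
        super().setUpClass()

    def get_store(self) -> Dict[str, int]:
        """Required setup function; subclasses return the object under test."""
        raise NotImplementedError

    def setUp(self) -> None:
        self.store = self.get_store()

    def test_set_then_get(self) -> None:
        self.store["a"] = 1
        self.assertEqual(self.store["a"], 1)

    def test_missing_key_raises(self) -> None:
        with self.assertRaises(KeyError):
            self.store["missing"]


class DictStoreTests(StoreTestBench):
    """Tests a concrete implementation against the bench."""

    def get_store(self) -> Dict[str, int]:
        return {}

    def test_starts_empty(self) -> None:
        # An implementation-specific test added on top of the bench.
        self.assertEqual(len(self.store), 0)


def run_bench() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DictStoreTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The subclass inherits both bench tests and adds one of its own, so a single test class covers the shared contract and any implementation-specific behavior.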

## Utils
Any utility functions that can be used for creating useful mocks, DB setups, or other such prerequisites for a test.
The `utils.py` module is set up with utility functions that can be used for creating useful mocks, DB setups, or other such prerequisites for a test.
File renamed without changes.
@@ -13,7 +13,7 @@
import requests
from mephisto.abstractions.architect import Architect
from mephisto.data_model.task_run import TaskRun
from mephisto.data_model.test.utils import get_test_task_run
from mephisto.abstractions.test.utils import get_test_task_run
from mephisto.abstractions.database import MephistoDB
from mephisto.abstractions.blueprint import SharedTaskState
from mephisto.abstractions.blueprints.mock.mock_task_builder import MockTaskBuilder
@@ -21,7 +21,7 @@
from mephisto.abstractions.databases.local_database import LocalMephistoDB
from mephisto.data_model.assignment import Assignment
from mephisto.data_model.task_run import TaskRun
from mephisto.data_model.test.utils import get_test_task_run
from mephisto.abstractions.test.utils import get_test_task_run
from mephisto.abstractions.providers.mock.mock_agent import MockAgent
from mephisto.data_model.agent import Agent
from mephisto.operations.hydra_config import MephistoConfig
@@ -7,7 +7,7 @@

import unittest
from typing import Optional, Tuple
from mephisto.data_model.test.utils import (
from mephisto.abstractions.test.utils import (
    get_test_assignment,
    get_test_project,
    get_test_requester,
File renamed without changes.