DAML User Guide
Joe K Qiao edited this page May 11, 2023 · 8 revisions
Data Annotator for Machine Learning (DAML) is designed to enable an end-to-end data annotation process for common data types. This page is a high-level user guide to the key features of DAML:
- From the Projects tab, click Create New Annotation Project and choose the project type.
- Supported project types are:
- text classification
- tabular
- named entity recognition (NER)
- log classification
- image classification
- question answer
- Depending on the annotation project type, you will be asked different project setup questions. In general, you provide a project name, upload data, define label values, configure active learning, and assign annotators by email. Here, we show the project setup for an NER project:
- INSERT_NER_SETUP_IMAGE
- Click Create to complete the project set up.
- You will receive an email notification confirming the project creation and this project will show up in the Projects tab
- Annotators will receive an email link to join the project and start annotating
- From the Annotate tab, click START on the project of your choice. This example will use an NER project:
- INSERT_NER_PICTURE
- On the left hand side menu (which can be toggled to hide), you have the following:
- Projects selector to switch between projects
- Project info including annotation instructions from the Project Owner
- Your Progress on the current annotation project
- A history of your labels in this session
- On the right-hand side, you are presented with the Original Ticket, which is one entry from the overall project
- The flag icon next to this entry lets the annotator send the entry to the Project Owner to review for fit (e.g., the entry might not fit the current set of labels or might be bad data)
- In an NER project, the annotator selects the entity (one of the buttons) and then clicks text in the entry to highlight it
- Note: a single click annotates the clicked word; alternatively, select a span of text to annotate the whole span as this entity
- INSERT_IMAGE_NER_ANNOTATION
- At any time, you may skip the current entry or return to a previous entry
- Click EXIT at any time to stop annotating. Your progress is automatically saved for resumption later
In the Projects tab, click the name of the project to view the overall progress:
- On the top you will see overall project details in addition to two charts:
- # Annotations Per User
- # Annotations Per Category
- Underneath the charts, you will see two tabs:
- Annotations tab which presents all currently annotated examples in a table format for your review
- Flag tab which presents all examples flagged by users for review. For a flagged ticket, you have two options:
- Delete the example from the project. This will permanently remove this example from the dataset.
- Silence the flag. This returns the example to the pool, where it will be shown to annotators again.
- For projects with Active Learning support, you will see an additional Active Learning tab showing the computed accuracy over time of models which are used to query annotators
- On the Projects tab, under the actions column, choose "Edit Project" to see the following options:
- Project Name
- Project Owners
- Add using comma separated emails
- Delete by clicking "X" next to a user's email
- Annotators
- Add using comma separated emails
- Delete by clicking "X" next to a user's email
- Labels
- Add new individual labels
- Delete by clicking "X" next to a label
- Note: if a label is already in-use for any entries in your project, it cannot be deleted
- Assignment Logic: choose from Random or Sequential
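The difference between the two Assignment Logic options can be illustrated with a minimal Python sketch. This is not DAML's implementation: the function `next_entry`, its parameters, and the idea of tracking finished entries in a set are all hypothetical names chosen for illustration.

```python
import random
from typing import Optional, Sequence, Set


def next_entry(
    entry_ids: Sequence[int],
    done: Set[int],
    logic: str = "sequential",
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the next unannotated entry id, or None when nothing is left.

    Hypothetical helper: "sequential" serves entries in dataset order,
    "random" picks uniformly among the entries not yet annotated.
    """
    remaining = [e for e in entry_ids if e not in done]
    if not remaining:
        return None
    if logic == "sequential":
        return remaining[0]
    if logic == "random":
        # Fall back to the module-level generator when no rng is supplied.
        return (rng or random).choice(remaining)
    raise ValueError(f"unknown assignment logic: {logic!r}")
```

With Sequential, annotators move through the data in upload order; with Random, each request draws from whatever remains, which can reduce ordering bias in the labeled subset.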
- On the Projects tab, under the actions column, click the Append New Entries icon to see the following options:
- Quick Add:
- Add individual entries matching the headers of your data
- Note: for Logs or computer vision, you may add individual files matching the required format
- Bulk Add:
- Upload a CSV or a zip file depending on your project setup
- Note: for CSVs, you must match the same column headers or the file will be rejected
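Because a CSV with mismatched column headers is rejected, it can help to check the file locally before uploading. The following Python sketch shows one way to do that. It assumes that "matching" means the same header names in the same order, which the guide does not specify; `headers_match` is a hypothetical helper, not part of DAML.

```python
import csv
import io
from typing import List


def headers_match(csv_text: str, expected: List[str]) -> bool:
    """True when the CSV's first row equals the expected column headers.

    Assumption: headers must match by name and order. Surrounding
    whitespace in each header cell is ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    first_row = next(reader, None)
    if first_row is None:
        # An empty file has no header row at all.
        return False
    return [h.strip() for h in first_row] == expected


# Example: compare new entries against the headers of the original upload.
original_headers = ["id", "text"]
new_entries = "id,text\n101,Printer is offline\n"
print(headers_match(new_entries, original_headers))
```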
- On the Projects tab, under the actions column, click the "Download Project" icon to see the following options:
- Choose an Export format and, if desired, select the option to remove all unlabeled entries:
- Standard: a DAML format suitable for ML tasks
- Note: for NER, Log annotation, and image classification projects, Standard is the only available option
- Top: adds a "top" column to the Standard data export based on the number of labels for a specific entry
- Probabilistic: adds the ratio of each label to the overall label count for a specific entry
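To make the Top and Probabilistic formats more concrete, here is a small Python sketch of the underlying arithmetic for a single entry labeled by several annotators. It assumes that "top" means the most frequently assigned label, which is an interpretation of the description above rather than a confirmed detail; `summarize_labels` and the sample labels are hypothetical, and the actual export column layout may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_labels(labels: List[str]) -> Tuple[str, Dict[str, float]]:
    """Return (top label, per-label ratio) for one entry's annotations.

    The ratio is each label's count divided by the entry's total label
    count. In this sketch, ties for the top label go to the label that
    was seen first.
    """
    if not labels:
        raise ValueError("entry has no labels")
    counts = Counter(labels)
    total = sum(counts.values())
    ratios = {label: n / total for label, n in counts.items()}
    top = counts.most_common(1)[0][0]
    return top, ratios


# Four annotators labeled the same entry.
top, ratios = summarize_labels(["bug", "bug", "feature", "bug"])
print(top)     # bug
print(ratios)  # {'bug': 0.75, 'feature': 0.25}
```

The probabilistic ratios keep annotator disagreement visible, which can be useful when training on soft labels instead of a single majority label.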
- On the Projects tab, under the actions column, click the "Share Dataset" icon
- Provide a description of the dataset and click OK. The dataset will then be available in the Community Dataset tab
- On the Projects tab, under the actions column, click the Delete icon
- WARNING: Deletes are permanent and you must confirm before a deletion takes place.
To set up admin users, add their email addresses to the "adminDefault" field in the app-os.js file.
To change a user's role, click the Admin tab:
- Select the user from the table and click "Edit"
- Reassign the roles based on the following definitions:
- Annotator: default role for all logged in users
- Project Owner: in addition to Annotator functionality, can create, manage, and delete projects that are self-originated
- Admin: Access to all Project Owner functionality and User management functionality
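The role definitions above are cumulative: each role includes everything the previous one can do. A minimal Python sketch of that layering follows. The capability names (`annotate`, `create_project`, and so on) are hypothetical labels invented for this illustration and do not correspond to identifiers in DAML.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    ANNOTATOR = "Annotator"
    PROJECT_OWNER = "Project Owner"
    ADMIN = "Admin"


# Hypothetical capability names; each role extends the one before it.
_ANNOTATOR = frozenset({"annotate"})
_PROJECT_OWNER = _ANNOTATOR | {
    "create_project",
    "manage_own_projects",
    "delete_own_projects",
}
_ADMIN = _PROJECT_OWNER | {"manage_users"}

PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.ANNOTATOR: _ANNOTATOR,
    Role.PROJECT_OWNER: _PROJECT_OWNER,
    Role.ADMIN: _ADMIN,
}


def is_allowed(role: Role, capability: str) -> bool:
    """True when the given role includes the named capability."""
    return capability in PERMISSIONS[role]


print(is_allowed(Role.ANNOTATOR, "create_project"))  # False
print(is_allowed(Role.ADMIN, "manage_users"))        # True
```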
- In the My Datasets tab, click Upload New Dataset
- Provide the following and click Okay:
- Name
- Browse to the file on your local system
- File Format
- Has Header: only displayed if CSV or Tabular is chosen as File Format
All common functionality is supported via the API for CRUD operations with the exception of annotation of logs and CV.