DAML User Guide

Joe K Qiao edited this page May 11, 2023 · 8 revisions

Data Annotator for Machine Learning (DAML) is designed to enable an end-to-end data annotation process for common data types. Here we provide a high-level user guide to the key features of DAML:

Create a new data annotation project:

  • From the Projects tab, click Create New Annotation Project and choose the project type.
    • Supported project types are:
      • text classification
      • tabular
      • named entity recognition (NER)
      • log classification
      • image classification
      • question answer
  • Depending on the annotation project type, you will be asked different project setup questions. In general, you need to provide a project name, upload data, define label values, configure active learning, and assign annotators by email. Here, we show the project setup for an NER project:
    • INSERT_NER_SETUP_IMAGE
  • Click Create to complete the project setup.
  • You will receive an email notification confirming the project creation, and the project will appear in the Projects tab.
    • Annotators will receive an email link to join the project and start annotating.

Annotate a Project:

The DAML annotation interface is designed to keep you focused on a single task so that you can annotate quickly. To start annotating, you must be assigned to the project.
  • From the Annotate tab, click START on the project of your choice. This example will use an NER project:
    • INSERT_NER_PICTURE
  • The left-hand menu (which can be toggled to hide) contains the following:
    • Projects selector to switch between projects
    • Project info including annotation instructions from the Project Owner
    • Your Progress on the current annotation project
    • A history of your labels in this session
  • On the right-hand side, you are presented with the Original Ticket, which is one entry from the overall project
    • The flag icon next to this entry allows the annotator to send this entry to the Project Owner to review for fit (e.g., the entry might not fit the current set of labels or might be bad data)
    • In an NER project, the annotator can select the entity (one of the buttons) and then click the text from the entry to highlight
      • Note: a single click annotates the clicked word; alternatively, select a span of text to annotate it as this entity
      • INSERT_IMAGE_NER_ANNOTATION
    • At any time, you may skip the current entry or return to a previous entry
  • Click EXIT at any time to stop annotating. Your progress is saved automatically so you can resume later

Tracking the Project Progress

DAML aggregates all annotation actions in real time so you can see the progress of all annotators.

In the Projects tab, click the name of the project to view the overall progress:

  • On the top you will see overall project details in addition to two charts:
    • # Annotations Per User
    • # Annotations Per Category
  • Underneath the charts, you will see two tabs:
    • Annotations tab which presents all currently annotated examples in a table format for your review
    • Flag tab which presents all examples flagged by users for review. For a flagged ticket, you have two options:
      • Delete the example from the project. This will permanently remove this example from the dataset.
      • Silence the flag. This returns the example to the pool, where it will be shown to annotators again.
    • For projects with Active Learning support, you will see an additional Active Learning tab showing the computed accuracy over time of models which are used to query annotators

Manage a Project

Project owners can modify existing projects (add or remove data, edit project owners and annotators, and more), export and share data with service users, view the full annotation progress, and resolve any data conflicts.
Update project details
  • On the Projects tab, under the actions column, choose "Edit Project" to see the following options:
  • Project Name
  • Project Owners
    • Add using comma separated emails
    • Delete by clicking "X" next to a user's email
  • Annotators
    • Add using comma separated emails
    • Delete by clicking "X" next to a user's email
  • Labels
    • Add new individual labels
    • Delete by clicking "X" next to a label
      • Note: if a label is already in use for any entries in your project, it cannot be deleted
  • Assignment Logic: choose from Random or Sequential
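This guide does not describe how the two assignment modes behave internally. As a rough mental model only, the sketch below assumes that Sequential serves pending entries in their stored order and that Random picks any pending entry. The function `next_entry` and its parameters are hypothetical and are not part of DAML.

```python
import random
from typing import Optional, Sequence


def next_entry(
    pending: Sequence[str],
    logic: str,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick the next entry to show an annotator (hypothetical helper).

    Assumption: "sequential" means first pending entry in stored order,
    "random" means a uniformly random pending entry.
    """
    if not pending:
        return None  # nothing left to annotate
    if logic == "sequential":
        return pending[0]
    if logic == "random":
        # Fall back to the module-level generator when no rng is supplied.
        return (rng or random).choice(pending)
    raise ValueError(f"unknown assignment logic: {logic!r}")
```

Under this model, Sequential suits data where order carries context (such as log lines), while Random avoids every annotator starting on the same entries.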
Add Data
  • On the Projects tab, under the actions column, click the Append New Entries icon to see the following options:
  • Quick Add:
    • Add individual entries matching the headers of your data
      • Note: for Logs or computer vision, you may add individual files matching the required format
  • Bulk Add:
    • Upload a CSV or a zip file depending on your project setup
      • Note: for CSVs, you must match the same column headers or the file will be rejected
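Because a CSV with mismatched headers is rejected, it can help to check the header row locally before a bulk upload. The following is a minimal sketch, not DAML's own validation: it assumes the headers must match exactly and in the same order, which this guide does not specify, and `headers_match` is a hypothetical helper name.

```python
import csv
import io
from typing import List


def headers_match(csv_text: str, expected: List[str]) -> bool:
    """Return True if the first CSV row equals the expected headers.

    Assumption: exact, order-sensitive comparison after trimming
    surrounding whitespace. DAML's actual rule may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [h.strip() for h in header] == expected
```

For example, `headers_match(open("new_entries.csv").read(), ["id", "text"])` would check a local file (the file name and column names here are placeholders) against the columns used when the project was created.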
Export labeled data
  • On the Projects tab, under the actions column, click the "Download Project" icon to see the following options:
  • Choose an export format and select whether to remove all unlabeled entries:
    • Standard: a DAML format suitable for ML tasks
      • Note: for NER, Log annotation, and image classification projects, Standard is the only available option
    • Top: adds a "top" column to the Standard data export based on the number of labels for a specific entry
    • Probabilistic: adds the ratio of each label to the overall label count for a specific entry
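To make the Top and Probabilistic options concrete, the sketch below computes both quantities for the labels given to one entry. It is an illustration under assumptions: it takes "top" to mean the most frequent label, the function names are hypothetical, and how DAML breaks ties or lays out the exported columns is not stated in this guide.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_label(labels: List[str]) -> Optional[str]:
    """Most frequent label for one entry, or None if it has no labels.

    Assumption: "top" means the label with the highest count.
    """
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


def label_ratios(labels: List[str]) -> Dict[str, float]:
    """Ratio of each label to the total label count for one entry."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

For an entry labeled `["spam", "ham", "spam", "spam"]`, `top_label` gives `"spam"` and `label_ratios` gives `{"spam": 0.75, "ham": 0.25}`.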
Share Dataset
You have the option to share any annotated dataset within the service.
  • On the Projects tab, under the actions column, click the "Share Dataset" icon
  • Provide a description of the dataset and click OK. The dataset will now be available in the Community Dataset tab
Delete Project
You have the option to delete any unused projects.
  • On the Projects tab, under the actions column, click the Delete icon
  • WARNING: Deletes are permanent and you must confirm before a deletion takes place.

Project and User Administration

Setup Admin Role
Admin-level users within DAML can manage all projects just like a Project Owner (this is covered in the Manage a Project section) and, in addition, have control over role-based access control (RBAC). Note that the Admin tab is only available if you have been assigned an Admin role.

To set up admin users, add their email addresses to the "adminDefault" field in the app-os.js file.

Setup User Role
To change a user's role, click the Admin tab:
  • Select the user from the table and click "Edit"
  • Reassign the roles based on the following definitions:
    • Annotator: default role for all logged in users
    • Project Owner: in addition to being Annotator, provides functionality to create, manage, and delete projects that are self-originated
    • Admin: Access to all Project Owner functionality and User management functionality
To delete a user permanently from DAML, click the "Delete" icon next to their username under the actions column.
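The role definitions above are cumulative: each role includes everything the previous one can do. One way to picture this is as an ordered set of roles, sketched below. This is not DAML's implementation; the action names are hypothetical, and the sketch does not model the restriction that a Project Owner manages only self-originated projects.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value includes the lower ones."""
    ANNOTATOR = 1
    PROJECT_OWNER = 2
    ADMIN = 3


# Minimum role needed for each action (action names are hypothetical).
REQUIRED_ROLE = {
    "annotate": Role.ANNOTATOR,
    "create_project": Role.PROJECT_OWNER,
    "manage_users": Role.ADMIN,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role meets the minimum for the action."""
    return role >= REQUIRED_ROLE[action]
```

With this ordering, `can(Role.PROJECT_OWNER, "annotate")` is true while `can(Role.PROJECT_OWNER, "manage_users")` is false.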

Manage Datasets

In addition to uploads during project creation, DAML provides a dedicated My Datasets tab. This is useful for uploads larger than 500 MB and for deleting legacy data from the system.
To upload a dataset:
  • In the My Datasets tab, click Upload New Dataset
  • Provide the following and click Okay:
    • Name
    • Browse to the file on your local system
    • File Format
    • Has Header: only displayed if CSV or Tabular is chosen as File Format
To delete a dataset permanently from DAML, click the "Delete" icon next to the dataset name under the actions column.

Using the DAML REST API

DAML provides a set of common APIs to manage your data annotation projects. A Swagger UI is available for interactive exploration at your-service.com/api-docs/.

The API supports CRUD operations for all common functionality, with the exception of annotating logs and computer vision (CV) data.