Commit

Merge pull request #137 from TeselaGen/feature/ap/plugin-example
Plugin example
eabeliuk authored Nov 25, 2022
2 parents 630104e + 6570652 commit 13965e4
Showing 10 changed files with 422 additions and 64 deletions.
2 changes: 1 addition & 1 deletion packages/standard/src/manifest/data/supplemental-data.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ export const SupplementalData = Type.Object({
utilityScore: Type.Number(), // normalized to be between 0 and 1 inclusive
// categoryScores: UtilityScoreDetails, // we need to design these categories
data: Type.Array(Type.Union([FileData, TabularData])),
// suplemental data sections.
// supplemental data sections.

// NOTE: This could be moved up a level.
provenance: Type.Optional(Provenance),
2 changes: 1 addition & 1 deletion packages/standard/src/manifest/data/types.ts
Expand Up @@ -22,7 +22,7 @@ export enum ColumnClassEnum {
*/
CALL = 'CALL',
/**
* Computed values are a computation of other values, (e.g., the divisin of two other observation columns).
* Computed values are a computation of other values, (e.g., the division of two other observation columns).
*/
COMPUTED_VALUE = 'COMPUTED_VALUE',
/**
26 changes: 3 additions & 23 deletions website/docs/examples/microbyre-example.md
@@ -1,7 +1,8 @@
---
title: 'MicroByre Manifest'
metaTitle: 'FSML Manifest for MicroByre'
metaDescription: 'FSML PDF Protocol with a YAML generated manifest'
metaDescription: 'MicroByre FSML Manifest Example'
sidebar_position: 1
---

# MicroByre: FSML Manifest Generation
Expand All @@ -17,28 +18,7 @@ Three files are necessary to carry on with this example and those are available

## Installing the FSML CLI Tool

Double-click on the `fsml-v1.1.0-239859b-x64.pkg` installer file and follow through the MacOS installer steps.

## Trying out the FSML CLI Tool

Open a terminal window and type the following command _(note that the “$>” symbol is not to be typed, it's just used here to represent your CLI prompt)_

```
$> fsml
```

You should see the FSML CLI helper docs.

```
fsml <command>
Commands:
fsml defaults <subcommand> Configures default values for CLI flags
fsml manifest <subcommand> Operates with the FSML manifest
fsml plugin <subcommand> Handles external plugin modules
```

Feel free to navigate into the CLI commands docs by typing any of the described commands (e.g., `$> fsml defaults`).
Follow the steps in [Installing CLI Tool](/software/tools/cli#installing-cli-tool). To get familiar with the CLI Tool commands, see [CLI Tool](/software/tools/cli).

## Installing the Microbyre plugin

292 changes: 292 additions & 0 deletions website/docs/examples/phycus-parser-example.md
@@ -0,0 +1,292 @@
---
title: 'Phycus Parser Example'
metaTitle: 'FSML Parser Plugin for Phycus'
metaDescription: 'Phycus Parser Example'
sidebar_position: 2
---

# Phycus: FSML Parser Plugin

This is a real-world example from FSML partner **Phycus**.

This example shows how to build an FSML Parser Plugin that parses experimental data into an FSML Manifest. The plugin is then installed
with the FSML CLI tool and used to convert the experimental data file into an FSML Manifest.

Three files are needed to follow this example; they are available at the following link:

<!-- Not sure how to reference the file in the static folder -->
- [**phycusExample.zip**](/)


## Installing the FSML CLI Tool

Follow the steps in [Installing CLI Tool](/software/tools/cli#installing-cli-tool). To get familiar with the CLI Tool commands, see [CLI Tool](/software/tools/cli).

## Phycus Applikon Bioreactor Data

Bioreactors provide a controlled environment for experiments involving growth or biological reactions under specific conditions. For example, the Applikon bioreactor allows a user to set and measure parameters such as temperature, dissolved oxygen, pH, and stirring speed. After a run, the bioreactor exports a CSV describing the experimental design. An example of such a CSV is included in the [**phycusExample.zip**](/) file.

The columns of the Applikon Bioreactor’s output file denote the independent variables of time, dissolved oxygen, pH, stirring speed, and volume.

## Phycus Plugin Parser

Leverage the FSML [Parser Plugin Template](/software/plugins/parser/#template) as the starting point for implementing the Phycus Plugin Parser for Applikon Bioreactor data.

The template generates a very generic FSML manifest file. To convert an equipment-specific export, in this case a CSV exported from the Applikon Bioreactor, into a better-defined and more descriptive FSML data schema, we need to describe the CSV columns by filling out the FSML [Columns](/model/manifest/supplemental-sections/data/tabular-data#columns) type object.

### Column Definitions

The first CSV column with the header of "Time [h]" is converted to an object with an index of 0 (column 0). The valueType object has a type of [Numeric](/model/manifest/supplemental-sections/data/data-types/values#numeric), as the timepoints are displayed as numbers. The [Column Kind](/model/manifest/supplemental-sections/data/data-types/columns#kind) object for the "Time [h]" column has a type of **Reference**, and the [Column Class](/model/manifest/supplemental-sections/data/data-types/columns#class) object has a type of **Reference_Dimension**, as the timepoints correspond to a series of observations. The dimension object has a dimension type of [TIME](/model/manifest/supplemental-sections/data/data-types/dimensions/), and the unit object has a value of **Hours**.

Finally, the primary schema for the FSML Column object for the first column ends up as:

<details>
<summary>Time column definition</summary>

```json
{
"name": "Time [h]",
"description": "Time",
"valueType": { "type": "NUMERIC" },
"kind": {
"type": "REFERENCE",
"class": {
"type": "REFERENCE_DIMENSION",
"name": "Time",
"dimension": { "type": "DIMENSION", "dimensionType": "TIME" },
"unit": {
"type": "UNIT",
"value": "HOURS",
"dimension": { "type": "DIMENSION", "dimensionType": "TIME" }
}
}
}
}
```
</details>

Converting the CSV to a structured object makes column parsing consistent: the "Time [h]" column carries its unit in the header, whereas the following columns specify the unit in the second row (e.g., "%" for the "cal_ls_opt_do" column).
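
The two unit conventions described above can be reconciled with a small helper. The following is only a sketch: `resolveColumnUnit` is a hypothetical function, not part of the FSML SDK or of the Phycus plugin, and the sample rows are illustrative values modeled on the description above.

```javascript
// Hypothetical helper (not part of the FSML SDK): returns the column name and
// unit, whether the unit is embedded in the header ("Time [h]") or supplied
// separately from the CSV's second row ("%").
function resolveColumnUnit(header, unitCell) {
  // Matches headers of the form "<name> [<unit>]".
  const match = /^(.*?)\s*\[(.+)\]\s*$/.exec(header);
  if (match) {
    return { name: match[1], unit: match[2] };
  }
  // No bracketed unit in the header: fall back to the unit row, if any.
  const unit = unitCell === undefined || unitCell === '' ? null : unitCell;
  return { name: header.trim(), unit };
}

// Illustrative header and unit rows (assumed shape of the first two CSV rows).
const headerRow = ['Time [h]', 'cal_ls_opt_do'];
const unitRow = ['', '%'];
const resolved = headerRow.map((header, i) => resolveColumnUnit(header, unitRow[i]));
console.log(resolved);
```

A helper like this would only be used while parsing; the column definitions themselves still carry the unit explicitly in their `unit` object.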

<br/>
The second CSV column with the header of "cal_ls_opt_do" is converted to an object with an index of 1 (column 1). The description key allows for a more human-readable explanation of the column as **Dissolved Oxygen**, compared to the machine-generated name of "cal_ls_opt_do" used as the column header. The valueType object has a type of [Numeric](/model/manifest/supplemental-sections/data/data-types/values#numeric), as the dissolved oxygen measurements are numbers. Because dissolved oxygen is measured as a percentage, a range of `[0, 100]` can be specified.

The [Column Kind](/model/manifest/supplemental-sections/data/data-types/columns#kind) object for the "cal_ls_opt_do" column has a type of **FACTOR**, as dissolved oxygen is a controlled input variable used to trigger responses to be analyzed in the experiment. Similarly, the [Column Class](/model/manifest/supplemental-sections/data/data-types/columns#class) object has a type of **Descriptor**, as dissolved oxygen is an independent variable in the experiment. The dimension object has a dimension type of [CONCENTRATION](/model/manifest/supplemental-sections/data/data-types/dimensions/), corresponding to concentration of oxygen, and the unit object has a value of **Percent** (%).

Finally, the primary schema for the FSML Column object for the second column ends up as:

<details>
<summary>Dissolved Oxygen column definition</summary>

```json
{
"name": "cal_ls_opt_do",
"description": "Dissolved Oxygen",
"valueType": { "type": "NUMERIC", "range": [0, 100] },
"kind": {
"type": "FACTOR",
"class": {
"type": "DESCRIPTOR",
"name": "Dissolved Oxygen",
"dimension": { "type": "DIMENSION", "dimensionType": "CONCENTRATION" },
"unit": {
"type": "UNIT",
"value": "PERCENT",
"dimension": { "type": "DIMENSION", "dimensionType": "CONCENTRATION" }
}
}
}
}
```
</details>

<br/>
Subsequent columns indicating other parameters of the experimental design are similarly converted to the structured FSML data schema format.
<br/><br/>

<details>
<summary>Other column definitions</summary>

```json
[
{
"name": "m_ph",
"description": "pH",
"valueType": { "type": "NUMERIC", "range": [0, 14] },
"kind": {
"type": "FACTOR",
"class": {
"type": "DESCRIPTOR",
"name": "pH",
"dimension": { "type": "DIMENSION", "dimensionType": "CONCENTRATION" }
}
}
},
{
"name": "m_stirrer",
"description": "Stirring Speed",
"valueType": { "type": "NUMERIC" },
"kind": {
"type": "FACTOR",
"class": {
"type": "DESCRIPTOR",
"name": "Stirring Speed",
"dimension": { "type": "DIMENSION", "dimensionType": "SPEED" },
"unit": {
"type": "UNIT",
"value": "RPM",
"dimension": { "type": "DIMENSION", "dimensionType": "SPEED" }
}
}
}
},
{
"name": "dm_spump1",
"description": "Volume",
"valueType": { "type": "NUMERIC" },
"kind": {
"type": "FACTOR",
"class": {
"type": "DESCRIPTOR",
"name": "Volume",
"dimension": { "type": "DIMENSION", "dimensionType": "VOLUME" },
"unit": {
"type": "UNIT",
"value": "MILLILITER",
"dimension": { "type": "DIMENSION", "dimensionType": "VOLUME" }
}
}
}
}
]
```

</details>


### Main Implementation

A parser only needs to be implemented once; after that, the plugin can be used routinely to parse files.

Following the FSML [Parser Plugin Template](/software/plugins/parser/#template), we'll describe how the main **run** function is implemented. Note that the parsing logic in this implementation is scoped to CSV files; other file types may require different implementations. Many popular npm packages are commonly used for parsing other file types, such as .xlsx, .yaml, and .json.

#### CSV Parsing

The first step in the implementation is converting the CSV data into a JavaScript object so that it can be handled within the program. A popular CSV parser on npm is [papaparse](https://www.papaparse.com/). It is easy to use, and a short snippet is shown here (the complete implementation is in [**phycusExample.zip**](/)).


<details>
<summary>CSV Snippet</summary>

```javascript
import * as fs from 'fs';
import papaparse from 'papaparse';

// Main function of the FSML Parser Plugin.
const run = (file) => {
  // Reads the file in case it's a file path and converts the buffer into a string,
  // which papaparse accepts as input to return the CSV data as an array of rows.
  let buffer = file;
  if (typeof file === 'string') {
    buffer = fs.readFileSync(file);
  }
  const dataString = buffer.toString('utf-8');
  // The 'data' array will contain the rows of the CSV file.
  const { data } = papaparse.parse(dataString);
};
```

</details>


#### FSML Schema Objects
The second step is to use the FSML SDK utils package to generate empty FSML Manifest schema objects, which are then filled with the data. These utility functions ease the developer's work by auto-generating the manifest objects rather than requiring them to be built by hand.


Here, we generate the [TabularData](/model/manifest/supplemental-sections/data/tabular-data/) schema object, whose type is exported by the FSML standard package.

<details>
<summary>Schema Objects</summary>

```javascript
import fsml_standard from "fsml-standard";
import fsml_utils from "fsml-utils";
import lodash from "lodash";

// Imports the JSON of column definitions
import columnDefinitions from './columnDefinitions.json';

// Main function of the FSML Parser Plugin.
const run = (file) => {
  // The 'createTemplateForType' utility function generates an empty template
  // object for the provided FSML standard type.
  const TabularData = fsml_utils.createTemplateForType(fsml_standard.TabularData);

  /**
   * The generated 'TabularData' object should look something like
   *
   * {
   *   "type": "TABULAR",
   *   "index": 0,
   *   "name": "",
   *   "rows": [],
   *   "columns": {},
   *   "fileReference": { "type": "FILE", "index": 0, "reference": "" }
   * }
   */

  // Then we can stitch everything together by first adding the column definitions
  // to the 'columns' property of the 'TabularData'.
  lodash.set(TabularData, 'columns', columnDefinitions);
};
```

</details>

Finally, all that is left to do is to populate the **rows** property of the **TabularData** object with the CSV data and return our FSML Manifest.

<details>
<summary>Populate Data</summary>

```javascript
import fsml_standard from "fsml-standard";
import fsml_utils from "fsml-utils";
import lodash from "lodash";

// Main function of the FSML Parser Plugin.
const run = async (file) => {
  /**
   * Recall that the 'data' array holding our parsed rows came from:
   *   const { data } = papaparse.parse(dataString);
   *
   * And that the 'TabularData' object came from:
   *   const TabularData = fsml_utils.createTemplateForType(fsml_standard.TabularData);
   *
   * Both are assumed to be defined earlier in this same function.
   */

  // Loops through the data array and populates the rows.
  data.forEach((dataRow, rowIndex) => {
    const values = [];
    // Loops through the CSV row to collect each cell's value.
    dataRow.forEach((value, columnIndex) => {
      values.push({
        index: columnIndex,
        value,
      });
    });
    // Generates the Row schema, then populates it with the rowIndex and row values.
    const Row = fsml_utils.createTemplateForType(fsml_standard.Row);
    lodash.set(Row, "index", rowIndex);
    lodash.set(Row, "values", values);
    // Pushes the new Row to the 'TabularData' object.
    TabularData.rows.push(Row);
  });

  // Returns the FSML data schema (an async function wraps it in a Promise).
  return { data: TabularData };
};
```
</details>
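
For readers who want to try the row-population step without installing the FSML SDK, here is a minimal sketch using plain objects. It makes several assumptions: `buildTabularData` is a hypothetical helper, the object literals only imitate the shape of the SDK-generated templates shown above, and the `headerRowCount` parameter (for skipping the header and unit rows) is an addition that the snippets above do not include.

```javascript
// Hypothetical stand-in for the SDK-based flow: builds a TabularData-like
// plain object from already-parsed CSV rows. The object shape imitates the
// template shown earlier and is an assumption, not the SDK's output.
function buildTabularData(parsedRows, columnDefinitions, headerRowCount = 0) {
  const tabularData = {
    type: 'TABULAR',
    index: 0,
    name: '',
    rows: [],
    columns: columnDefinitions,
  };
  // Skips the leading header rows (assumption), then converts each remaining
  // CSV row into an indexed row whose values are indexed by column.
  parsedRows.slice(headerRowCount).forEach((dataRow, rowIndex) => {
    tabularData.rows.push({
      index: rowIndex,
      values: dataRow.map((value, columnIndex) => ({ index: columnIndex, value })),
    });
  });
  return tabularData;
}

// Illustrative input: a header row, a unit row, and two data rows.
const exampleRows = [
  ['Time [h]', 'cal_ls_opt_do'],
  ['', '%'],
  ['0', '98.5'],
  ['1', '97.2'],
];
const manifestData = buildTabularData(exampleRows, [], 2);
console.log(JSON.stringify(manifestData, null, 2));
```

Whether header rows should be excluded from **rows** depends on how the plugin and its consumers treat them; the sketch makes that choice explicit through `headerRowCount`.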

## Phycus FSML Manifest

The Phycus Applikon Plugin Parser is complete. Next, we can use the FSML CLI tool to install it and generate the FSML Manifest. To install this plugin, we can either publish it to the npm registry and install it via its public HTTPS URL, or use a local copy.
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
---
title: 'Column'
---

# Column Types


## Column Kind

Column kinds increase the degree of completeness of the FSML Manifest by describing a column's "Kind". There are 4 different kinds defined
in the FSML data schema standard:

- **IDENTIFIER**: identifier columns are the experimental units or subjects under evaluation.

- **REFERENCE**: reference columns relate to the reference axis values. These correspond to the reference over which observations are taken.

- **FACTOR**: factors are the inputs or controlled variables used to manipulate the experiment with the aim of triggering responses to be studied and analyzed.

- **OBSERVATION**: observations are the experiment's responses to the manipulated factors; they explain how the subjects behave under the experimental conditions applied. Also known as the dependent variables.


## Column Class

Column Classes are a more detailed structure than Column Kinds. They include additional properties that further explain the nature of each column,
such as the units, physical dimension, and the type of the values that each column holds.

- **SUBJECT**: corresponds to the subjects under evaluation or the experimental units subjected to experimental conditions.

- **REFERENCE_DIMENSION**: the reference dimension of the experiment's observations or measurements.

- **MEASUREMENT**: the measurements performed during the experiment. These are restricted to numeric values and are associated with dimensional units.

- **CALL**: observations restricted to categorical values.

- **COMPUTED_VALUE**: computed values are a computation of other values, (e.g., the division of two other observation columns).

- **DESCRIPTOR**: descriptors are columns that hold generic descriptor or feature values for each row. These can be independent variables that are controlled throughout the experiment or defined as initial conditions.

- **UNIT**: units are usually the same across an entire column, but passing in a unit column allows you to specify the unit of each value in another column.
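
The kinds and classes listed above can be used for a quick sanity check of a column definition before it goes into a manifest. This is a sketch under stated assumptions: `checkColumnDefinition` is a hypothetical helper, not part of the FSML SDK; the nesting of `class` inside `kind` is assumed from the JSON column definitions in the Phycus example; and the `MEASUREMENT` spelling is assumed to match the enum in the standard package.

```javascript
// Kind and class names as listed in this page (spelling of MEASUREMENT is an
// assumption about the enum in the FSML standard package).
const COLUMN_KINDS = new Set(['IDENTIFIER', 'REFERENCE', 'FACTOR', 'OBSERVATION']);
const COLUMN_CLASSES = new Set([
  'SUBJECT',
  'REFERENCE_DIMENSION',
  'MEASUREMENT',
  'CALL',
  'COMPUTED_VALUE',
  'DESCRIPTOR',
  'UNIT',
]);

// Hypothetical helper: returns a list of problems found in a column
// definition shaped like { kind: { type, class: { type } } }.
function checkColumnDefinition(column) {
  const problems = [];
  const kindType = column?.kind?.type;
  const classType = column?.kind?.class?.type;
  if (!COLUMN_KINDS.has(kindType)) {
    problems.push(`unknown kind: ${kindType}`);
  }
  if (!COLUMN_CLASSES.has(classType)) {
    problems.push(`unknown class: ${classType}`);
  }
  return problems;
}

// Example: a column tagged like the dissolved oxygen column passes the check.
console.log(checkColumnDefinition({
  name: 'cal_ls_opt_do',
  kind: { type: 'FACTOR', class: { type: 'DESCRIPTOR' } },
}));
```

The check only validates names; it does not enforce which classes are permitted under which kind, since this page does not define such a mapping.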

