removing second person from paper
Signed-off-by: vsoch <[email protected]>
vsoch committed Jul 5, 2020
1 parent 3613c40 commit b6e2eed
Showing 1 changed file with 33 additions and 35 deletions.
68 changes: 33 additions & 35 deletions paper/paper.md
@@ -60,8 +60,8 @@ realized that she might want to define just sets of parameters that expand
into matrices that can be useful across many use cases. This makes
the grids "first-class citizens." For example, instead of a top-to-bottom
script that loops over some set of datasets, parameters, and algorithms,
the user could define grids to generate each in a modular fashion. This
approach is explained in detail in the [clustering grids](https://github.com/vsoch/gridtest/tree/master/examples/clustering-grids) example, which is derived from scikit-learn. As another example, the user might
just want to parameterize some set of inputs to randomly generate a cohort.
This example is detailed [in another tutorial here](https://vsoch.github.io/gridtest/tutorials/samplegrid/). The overall idea is simple. The current practice is generally to write parameterizations alongside code, whether that means nested for loops or decorators for testing.
GridTest allows for this same kind of functionality, but storing the parameterization
@@ -81,21 +81,20 @@ For an example of command line usage, the reader is directed to the ["How does i

### Parameterization

A **grid** is a global definition of a parameter matrix. A user can define arguments
and, optionally, functions to run that generate values for those arguments. Grids are generated
on demand: combinations are produced only as the user iterates over a Grid object, so no large lists are stored in memory.
Grids can be put to many uses. The user might share a repository that only defines grids that people
can use across many different kinds of machine learning models, possibly to collect metrics
and compare different analysis strategies being used. An introduction to grids
is available [here](https://vsoch.github.io/gridtest/getting-started/grids/).
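To make the on-demand behavior concrete, here is a minimal Python sketch of a grid that yields one combination of arguments at a time. The `LazyGrid` class and the argument names are hypothetical illustrations, not GridTest's own Grid API.

```python
import itertools
from typing import Any, Dict, Iterator, List


class LazyGrid:
    """Hypothetical parameter grid (not GridTest's Grid class).

    Combinations are yielded one at a time, so the full Cartesian
    product is never materialized as a list in memory.
    """

    def __init__(self, args: Dict[str, List[Any]]):
        self.args = args

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        names = list(self.args)
        # itertools.product builds each combination only when it is requested
        for values in itertools.product(*(self.args[name] for name in names)):
            yield dict(zip(names, values))

    def __len__(self) -> int:
        # Number of combinations, computed without generating any of them
        total = 1
        for values in self.args.values():
            total *= len(values)
        return total


grid = LazyGrid({"n_clusters": [2, 3], "dataset": ["blobs", "moons"]})
for params in grid:
    print(params)
```

Because `__iter__` is a generator, only the current combination exists at each step. `len()` is computed arithmetically rather than by expanding the grid.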

### Metrics

A **metric** is a Python decorator that is paired with a test to measure some
attribute of the tested function's execution or output. For example:
- the user might run a function across a grid of arguments, measure the time that each combination takes (the metric), and generate a report for inspection.
- the user might be doing text processing with functions that parse text. Each function might be run over a grid of sentences and counts, and for each result, the number of unique and total words is counted (metrics).

Metrics are fully described in the [metrics](https://vsoch.github.io/gridtest/getting-started/metrics/) section
of the documentation.
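The following minimal sketch reproduces the word-count example. It assumes only that a metric is a decorator that inspects the wrapped function's return value. The names `metric`, `word_counts`, `tokenize`, and the `results` dictionary are hypothetical and are not GridTest's decorators.

```python
import functools
import itertools
from typing import Any, Callable, Dict, List

# Hypothetical store of metric values, keyed by the metric function's name
results: Dict[str, List[Any]] = {}


def metric(measure: Callable[[Any], Any]):
    """Return a decorator that applies `measure` to every result of the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            results.setdefault(measure.__name__, []).append(measure(result))
            return result  # the tested function's return value is unchanged
        return wrapper
    return decorator


def word_counts(words: List[str]) -> Dict[str, int]:
    # The two metrics from the text: unique words and total words
    return {"unique": len(set(words)), "total": len(words)}


@metric(word_counts)
def tokenize(sentence: str, count: int) -> List[str]:
    # Toy text-processing function: repeat the sentence `count` times, then split
    return ((sentence + " ") * count).split()


# Run the function over a small grid of sentences and counts
for sentence, count in itertools.product(["the cat sat", "a dog ran far"], [1, 2]):
    tokenize(sentence, count)

print(results["word_counts"])
# [{'unique': 3, 'total': 3}, {'unique': 3, 'total': 6}, {'unique': 4, 'total': 4}, {'unique': 4, 'total': 8}]
```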
@@ -114,31 +113,30 @@ GridTest extends the traditional definition [@sklearn-tutorial] of a grid to inc

- [generating random samples](https://vsoch.github.io/gridtest/tutorials/samplegrid/)
- [loading grids via a GridRunner](https://vsoch.github.io/gridtest/getting-started/grids/index.html) class separate from the application's Python code.
- generating grids as they are needed (i.e., as an iterator)
- previewing grids on the command line before using them
- generating content of grids via external functions, and optionally unwrapping list values

Grids are generated on demand for more efficient memory use, and can be extended to any use case that requires some combination of arguments and, optionally, functions that generate argument values. See the section on the concept of a grid for more detail.
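The sketch below shows how function-generated values and random sampling might combine into one grid. It assumes that "unwrapping" means a function that returns a list contributes each element as a separate grid value, rather than contributing the whole list as one value. `resolve`, `expand`, and `grid_args` are hypothetical names, not GridTest's API.

```python
import itertools
import random
from typing import Any, Callable, Dict, Iterator, List, Union

# An argument is either a literal list of values or a zero-argument function
ArgSpec = Union[List[Any], Callable[[], Any]]


def resolve(spec: ArgSpec, unwrap: bool) -> List[Any]:
    """Turn one argument spec into the list of values it contributes to the grid."""
    if not callable(spec):
        return list(spec)
    value = spec()
    # With unwrap=True, a returned list contributes each element separately;
    # otherwise the whole return value is a single grid value.
    if unwrap and isinstance(value, list):
        return value
    return [value]


def expand(args: Dict[str, ArgSpec], unwrap: bool = True) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the resolved argument values."""
    names = list(args)
    pools = [resolve(args[name], unwrap) for name in names]
    for combo in itertools.product(*pools):
        yield dict(zip(names, combo))


rng = random.Random(42)  # fixed seed so the sampled cohort is reproducible
grid_args = {
    "age": lambda: rng.sample(range(18, 90), 3),  # three distinct random ages
    "group": ["control", "treatment"],
}
cohort = list(expand(grid_args))
print(len(cohort))  # prints 6
```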


### 2. Capturing Metrics

How long does a function take to run when parameter X is set to one value versus
another? Because the user can specify one or more metrics alongside tests,
they can easily capture metrics (Python decorators applied to the functions under test)
and output them in an interactive report. For example, if the user writes a test that runs
a machine learning algorithm across a grid of datasets and algorithms, they can easily
add a metric to record the time that each takes, and save this result to a file.
GridTest provides a standard set of [decorators](https://vsoch.github.io/gridtest/tutorials/decorators/index.html) for
ease of use, and the user is also free to write their own functions to collect
metrics.
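Here is a rough sketch of a timing metric, assuming only that a metric wraps the function under test. The `record_time` decorator, the `timings` list, and the stand-in `fit` function are hypothetical and are not among GridTest's provided decorators.

```python
import functools
import time
from typing import Any, Dict, List

# Hypothetical log of timing records, one per call
timings: List[Dict[str, Any]] = []


def record_time(func):
    """Hypothetical metric decorator: log the wall-clock duration of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, so failing combinations are timed too
            timings.append({"function": func.__name__, "kwargs": kwargs,
                            "seconds": time.perf_counter() - start})
    return wrapper


@record_time
def fit(n_samples: int, algorithm: str) -> int:
    # Stand-in for a model fit; `algorithm` is only a grid label here
    return sum(i * i for i in range(n_samples))


# Time every combination in a small grid of sizes and algorithms
for n_samples in (1_000, 10_000):
    for algorithm in ("kmeans", "dbscan"):
        fit(n_samples=n_samples, algorithm=algorithm)
```

Using `try`/`finally` is a design choice: a combination that raises still produces a timing record instead of silently disappearing from the report.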

### 3. Generating Reports

It can be handy to save results to a data file (e.g., results.json) or generate
an interactive report for GitHub Pages. GridTest supports this by way of the
`--save` and `--save-web` flags. An example web report is shown in Figure 1. Any grid can also be exported
to JSON for archiving in a repository or for use in other custom visualizations.
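The JSON-archiving idea can be sketched with Python's standard `json` module. This is not GridTest's `--save` implementation. `save_results` is a hypothetical helper, and the records below are illustrative placeholders rather than real results.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_results(results: List[Dict[str, Any]], path: str) -> Path:
    """Write per-combination results to a JSON file and return its path."""
    out = Path(path)
    # default=str converts values json can't encode (e.g. datetimes) to strings
    out.write_text(json.dumps(results, indent=2, default=str))
    return out


# Illustrative records only: one entry per grid combination
results = [
    {"params": {"n_clusters": 2, "dataset": "blobs"}, "passed": True},
    {"params": {"n_clusters": 3, "dataset": "blobs"}, "passed": False},
]

with tempfile.TemporaryDirectory() as tmp:
    path = save_results(results, os.path.join(tmp, "results.json"))
    # Round-trip check: the archived file loads back to the same records
    assert json.loads(path.read_text()) == results
```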

![Figure 1. An example GridTest web report](report.png)
@@ -149,40 +147,40 @@ and more information about reports and export formats is provided [here](https:/
### 4. Debugging

What programmer hasn't been in the scenario of running a group of tests
and then having some fail? What can be done? The user might start an interactive
shell, import what is needed, and try to reproduce the failure, or they can turn up verbosity
and add a bunch of print statements to figure out what is going on. GridTest makes
this much easier with its `--interactive` mode, which drops the user into
an interactive session right before the function is run, to allow for debugging.
A detailed walkthrough of debugging is provided [here](https://vsoch.github.io/gridtest/getting-started/debugging/).
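Pausing right before a function runs can be sketched with the standard library's `code.interact`. This is an assumption about how such a mode could work, not GridTest's implementation. `run_case` and its `hook` parameter are hypothetical; the hook exists only so the pause can be swapped out, for example in automated runs.

```python
import code
from typing import Any, Callable, Dict, Optional


def run_case(func: Callable[..., Any], kwargs: Dict[str, Any],
             interactive: bool = False,
             hook: Optional[Callable[[Dict[str, Any]], None]] = None) -> Any:
    """Run one grid case, optionally pausing in a shell just before the call."""
    if interactive:
        namespace = {"func": func, "kwargs": kwargs}
        if hook is None:
            # Opens a Python REPL with `func` and `kwargs` in scope;
            # exiting it (Ctrl-D) continues on to the real call below.
            code.interact(banner="debug: try func(**kwargs) to reproduce",
                          local=namespace)
        else:
            hook(namespace)
    return func(**kwargs)
```

For example, `run_case(lambda a, b: a / b, {"a": 1, "b": 0}, interactive=True)` would open a shell in which `func(**kwargs)` reproduces the `ZeroDivisionError` before the case is actually run.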

### 5. Running Reproducible Tests

When the user writes tests for a file, local, or system module, they store them in
a YAML file that is kept alongside the code and can be run in continuous integration (CI).
The YAML file can define grids of parameters so the user can easily test many
different combinations.
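This sketch shows how a declarative spec could drive a grid of test cases. It assumes the YAML has already been parsed into a Python dict (for example by a YAML loader). The `SPEC` layout, including the `grid` and `returns_type` keys, is hypothetical and is not GridTest's actual YAML schema. `run_spec` is likewise illustrative.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical test spec, shaped like the dict a YAML loader might return.
# This is NOT GridTest's actual YAML schema.
SPEC = {
    "add": {"grid": {"a": [1, 2, 3], "b": [10, 20]}, "returns_type": "int"},
}


def add(a: int, b: int) -> int:
    return a + b


def run_spec(spec: Dict[str, Any],
             functions: Dict[str, Callable[..., Any]]) -> List[Tuple[str, Dict[str, Any], bool]]:
    """Run every grid combination of every test and record whether the return type matched."""
    outcomes = []
    for name, test in spec.items():
        grid = test["grid"]
        keys = list(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            kwargs = dict(zip(keys, values))
            result = functions[name](**kwargs)
            passed = type(result).__name__ == test["returns_type"]
            outcomes.append((name, kwargs, passed))
    return outcomes


outcomes = run_spec(SPEC, {"add": add})
print(sum(passed for _, _, passed in outcomes), "of", len(outcomes), "passed")
# prints: 6 of 6 passed
```

Keeping the spec as data, separate from the code under test, is what lets the same file be version-controlled and rerun unchanged in CI.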

### 6. Knowing the Tests to Write

Whether tests are written during development or at the end of it, the user typically needs to look through files
to know the function names and arguments that need to be tested. GridTest solves
this problem by way of discovery: the user can give it a module name, a file name, or
an entire directory with Python files, and it will generate a template
to fill in that already includes arguments and functions.
Once tests are written, the user can run GridTest with the `--check` feature
to find newly added functions in the code. For more details about creating, checking, and updating
tests, see the [testing](https://vsoch.github.io/gridtest/getting-started/testing/index.html)
documentation.
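Discovery of this kind can be approximated with Python's `inspect` module. In the hypothetical sketch below, `discover` builds a template entry for each public function with its arguments, and `check` reports functions that have no test entry yet, loosely mirroring the idea behind `--check`. The template layout is an assumption, not GridTest's format.

```python
import inspect
import types
from typing import Any, Dict, Iterable, List


def discover(module: types.ModuleType) -> Dict[str, Dict[str, Any]]:
    """Build a test template: one entry per public function, listing its arguments.

    Arguments with defaults are pre-filled; the rest are None, to be filled in.
    """
    template = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported into the module
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        args = {}
        for param in inspect.signature(func).parameters.values():
            empty = param.default is inspect.Parameter.empty
            args[param.name] = None if empty else param.default
        template[f"{module.__name__}.{name}"] = {"args": args}
    return template


def check(module: types.ModuleType, tested: Iterable[str]) -> List[str]:
    """Return functions found in the module that have no test entry yet."""
    return sorted(set(discover(module)) - set(tested))


# Demo module built in memory; a real tool would use importlib.import_module
demo = types.ModuleType("demo")
exec(
    "def add(a, b=1):\n"
    "    return a + b\n"
    "def scale(x, factor=2):\n"
    "    return x * factor\n",
    demo.__dict__,
)
print(discover(demo))
# {'demo.add': {'args': {'a': None, 'b': 1}}, 'demo.scale': {'args': {'x': None, 'factor': 2}}}
print(check(demo, tested=["demo.add"]))  # ['demo.scale']
```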

In summary, GridTest:

1. Lets the user define grids to be generated programmatically, kept under version control, and used for multiple purposes
2. Allows measuring metrics alongside tests
3. Stores tests in a YAML file that can be kept in version control
4. Generates data exports and interactive reports for results
5. Provides an easy way to interactively debug
6. Helps to discover the tests that need to be written, and creates a template to fill in
7. Makes it easy to define and interact with expanded parameter grids

## Conclusion