Merge pull request #191 from NLPatVCU/development
Development
swfarnsworth authored May 25, 2020
2 parents 40d4eb1 + e53b048 commit d3f5591
Showing 86 changed files with 2,382 additions and 2,398 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -6,8 +6,8 @@ dist: trusty
group: edge

python:
- "3.6"
- "3.7"
- "3.8"

os:
- linux
67 changes: 44 additions & 23 deletions CONTRIBUTING.md
@@ -1,58 +1,79 @@
# Contributing to medaCy
MedaCy seeks to create a unified platform to streamline research efforts in medical text mining while also providing an interface to easily apply models to real world problems.
Due to this, contributions to medaCy are often consequences and direct by-products of active research projects.
However, if not for the contributions, bug fixes/reports, and suggestions of practitioners - medaCy could not grow and thrive.
MedaCy seeks to create a unified platform to streamline research efforts in medical text mining while also providing an
interface to easily apply models to real world problems. Due to this, contributions to medaCy are often consequences
and direct by-products of active research projects. However, if not for the contributions, bug fixes/reports,
and suggestions of practitioners - medaCy could not grow and thrive.

This contribution guide is designed to inform:

1. **Researchers** in how they can efficiently utilize medaCy to make their work more reachable by practitioners.
2. **Practitioners** in how they can tune medaCy's cutting-edge functionalities to their specific application.

## Table of contents
1. [Issues and Bug Reports](#issues-and-bug-reports)
2. [Development Set-up](#development-environment-setup)
3. [Running Unit Tests](#running-unit-tests)

## Issues And Bug Reports
Please do a search before posting an issue/bug report - your problem may already be solved! If your search comes up empty - congratulations, you may have something to contribute!
Please do a search before posting an issue/bug report - your problem may already be solved! If your search comes up
empty - congratulations, you may have something to contribute!

## Development Environment Setup
At its most basic, one can fork medaCy, clone down their fork, and use their favorite text editor to develop. However, some up-front set-up effort goes a long way towards streamlining the contribution process and keeping organized.
This section details a suggested set-up for efficient development, testing, and experimentation with medaCy utilizing [PyCharm](https://www.jetbrains.com/pycharm/).
At its most basic, one can fork medaCy, clone down their fork, and use their favorite text editor to develop.
However, some up-front set-up effort goes a long way towards streamlining the contribution process and keeping organized.
This section details a suggested set-up for efficient development, testing, and experimentation with medaCy utilizing
[PyCharm](https://www.jetbrains.com/pycharm/).

**Assumptions of this section:**
- You are working in a UNIX based operating system.
- Part 2 assumes you have PyCharm Professional installed - PyCharm Professional is provided with the JetBrains University License. (This isn't strictly necessary, but the useful Remote Host feature is disabled in the Community Edition.)
- Part 2 assumes you have PyCharm Professional installed - PyCharm Professional is provided with the JetBrains
University License. (This isn't strictly necessary, but the useful Remote Host feature is disabled in the Community Edition.)

**Part 1: Development Installation**

1. If you are shaky with git - [this link](https://nvie.com/posts/a-successful-git-branching-model/) provides an excellent description of the branching model medaCy follows to organize contributions. Read it.
1. If you are shaky with git - [this link](https://nvie.com/posts/a-successful-git-branching-model/) provides an
excellent description of the branching model medaCy follows to organize contributions.
2. Fork medaCy and copy the clone link.
3. On your machine, ensure you have Python 3 installed. Set up a [virtual environment](https://docs.python.org/3/library/venv.html) and activate it.
4. Run the bash commands: `python --version` and `pip list`. Upgrade pip to the latest version as suggested. Your Python version should be above 3.4 and your installed packages should be few in number - if either of these conditions does not hold, return to *Step 3*.
3. On your machine, ensure you have Python 3 installed. Set up a [virtual environment](https://docs.python.org/3/library/venv.html)
and activate it.
4. Run the bash commands: `python --version` and `pip list`. Upgrade pip to the latest version as suggested.
Your Python version should be above 3.4 and your installed packages should be few in number - if either of these
conditions does not hold, return to *Step 3*.
5. In a directory separate from the one created by the virtual environment set-up command, clone down your fork of medaCy.
6. Whilst inside your cloned fork, ensure you are in at least the *development* branch or a branch of the *development* branch. This can be verified by running `git status` and branching can be done with `git checkout <branch-name>`
7. Run `pip install -e .` This will install medaCy in editable mode inside of your virtual environment and will take several minutes to install dependencies - medaCy stands on the shoulders of giants! Errors one is likely to encounter here involve the installation of SciPy and NumPy. Search for the error messages, as they are usually fixable by installing some extra dependencies. Most likely, your Python installation is missing the C headers required by SciPy.
6. Whilst inside your cloned fork, ensure you are in at least the *development* branch or a branch of the *development* branch.
This can be verified by running `git status` and branching can be done with `git checkout <branch-name>`
7. Run `pip install -e .` This will install medaCy in editable mode inside of your virtual environment and will take
several minutes to install dependencies - medaCy stands on the shoulders of giants! Errors one is likely to encounter
here involve the installation of SciPy and NumPy. Search for the error messages, as they are usually fixable by installing
some extra dependencies. Most likely, your Python installation is missing the C headers required by SciPy.
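
The checks in step 4 can also be done from inside Python. The following is a minimal sketch, not part of medaCy: the helper names `in_virtual_env` and `version_ok` and the `MIN_VERSION` constant are hypothetical, and the threshold simply encodes "above 3.4" from the step above.

```python
import sys

# Hypothetical threshold: "above 3.4" is read here as 3.5 or newer.
MIN_VERSION = (3, 5)


def in_virtual_env() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_ok(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Version ok:", version_ok())
    print("Inside a virtual environment:", in_virtual_env())
```

If either check prints `False`, that corresponds to the "return to *Step 3*" case.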

**Part 2: Developing with PyCharm**

PyCharm can streamline development efforts - especially if you are developing locally and running medaCy on a remote machine for model building.
PyCharm can streamline development efforts - especially if you are developing locally and running medaCy on a remote
machine for model building.

**Part 3: Logging**

MedaCy uses the [logging](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial) module to allow users insight into how medaCy is handling their data. Ensure you are logging critical steps in any functionality you implement at the appropriate logging levels to make it easy for users to debug.
MedaCy uses the [logging](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial) module to allow users
insight into how medaCy is handling their data. Ensure you are logging critical steps in any functionality you implement
at the appropriate logging levels to make it easy for users to debug.
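
As an illustration of logging at different levels, here is a minimal sketch using only the standard `logging` module. The function `load_documents` and the file names are hypothetical and do not correspond to any medaCy API; they only show one plausible way to split messages between `INFO`, `WARNING`, and `DEBUG`.

```python
import logging

# Module-level logger named after the module, the usual convention.
logger = logging.getLogger(__name__)


def load_documents(paths):
    """Hypothetical helper: select .txt files, logging each critical step."""
    logger.info("Loading %d document(s)", len(paths))
    loaded = []
    for path in paths:
        if not path.endswith(".txt"):
            # Something the user probably wants to know about.
            logger.warning("Skipping %s: not a .txt file", path)
            continue
        # Fine-grained detail, only useful when debugging.
        logger.debug("Reading %s", path)
        loaded.append(path)
    logger.info("Loaded %d of %d document(s)", len(loaded), len(paths))
    return loaded


if __name__ == "__main__":
    # The application, not the library code, decides what is shown.
    logging.basicConfig(level=logging.INFO)
    load_documents(["note_1.txt", "note_2.ann"])
```

Using lazy `%s` formatting rather than f-strings means the message is only built when a handler will actually emit it.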

## Running Unit Tests
All components of medaCy have associated unit tests. Please ensure these all pass before submitting pull requests. When medaCy runs unit tests, it first automatically installs the [END dataset](https://github.com/NanoNLP/medaCy_dataset_end) then uses it to test various functionalities of the package. Some tests involve building a model over the dataset - these may take some time to complete.
All components of medaCy have associated unit tests. Please ensure these all pass before submitting pull requests.
When medaCy runs unit tests, it first automatically installs the [END dataset](https://github.com/NanoNLP/medaCy_dataset_end)
then uses it to test various functionalities of the package. Some tests involve building a model over the dataset - these
may take some time to complete.

After installing medaCy for development, make sure that `pytest` is installed. Then:

1) For quick testing of the whole framework, run:

1) For quick testing of the whole framework, run: \
`python setup.py test`.
1) For more fine-grained testing on individual files with colorful log output run:

1) For more fine-grained testing on individual files with colorful log output run: \
`pytest -s tests/tools/test_data_manager.py -o log_cli=True --log-cli-level=INFO`.

This will show log output during tests and allow you to adjust the logging level for the test file being run. Read the pytest documentation for details.

This will show log output during tests and allow you to adjust the logging level for the test file being run.
Read the pytest documentation for details.

Note that some of the unit tests require knowledge about the configuration of your machine, and that those tests will
be skipped if those configuration settings are not specified in the config.json file. These settings include
the location of a MetaMap binary file on your machine, which GPU core to use for certain tests, and the location
of a word embeddings file. It may be that your contributions will not affect functionality that depends on these features;
however, all pull requests will be tested against the full unit test suite.
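
One way such conditional skipping could work is sketched here with the standard `unittest` module (which pytest can also run). This is not medaCy's actual test code: the helper `is_configured` is hypothetical, and it assumes that the values shipped in `config.json` (`0` for the two paths, a negative number for `cuda_device`) are placeholders meaning "not specified".

```python
import json
import unittest

# Contents of config.json as shipped in this commit. Treating these values
# as "not specified" placeholders is an assumption of this sketch.
DEFAULT_CONFIG = '{"metamap_path": 0, "cuda_device": -2, "word_embeddings": 0}'

config = json.loads(DEFAULT_CONFIG)


def is_configured(settings, key):
    """Hypothetical check: True when `key` holds a usable value."""
    value = settings.get(key, 0)
    if key == "cuda_device":
        # Assumed convention: a non-negative integer names a GPU.
        return isinstance(value, int) and value >= 0
    # Path-like settings are assumed to be non-empty strings when set.
    return isinstance(value, str) and value != ""


class TestMetaMapFeature(unittest.TestCase):
    """Skipped unless a MetaMap location has been provided."""

    @unittest.skipUnless(
        is_configured(config, "metamap_path"),
        "metamap_path is not specified in config.json",
    )
    def test_metamap_path_is_set(self):
        self.assertTrue(config["metamap_path"])
```

With the default configuration the test is reported as skipped rather than failed, which matches the behavior described above.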
19 changes: 13 additions & 6 deletions README.md
@@ -1,12 +1,14 @@
[![spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)

# medaCy
:hospital: Medical Text Mining and Information Extraction with spaCy :hospital:

MedaCy is a text processing and learning framework built over [spaCy](https://spacy.io/) to support the lightning fast prototyping, training, and application of highly predictive medical NLP models. It is designed to streamline researcher workflow by providing utilities for model training, prediction and organization while ensuring the replicability of systems.
MedaCy is a text processing and learning framework built over [spaCy](https://spacy.io/) to support the lightning fast
prototyping, training, and application of highly predictive medical NLP models. It is designed to streamline researcher
workflow by providing utilities for model training, prediction and organization while ensuring the replicability of systems.

![alt text](https://nlp.cs.vcu.edu/images/Edit_NanomedicineDatabase.png "Nanoinformatics")


# :star2: Features
- Highly predictive, shared-task dominating out-of-the-box trained models for medical named entity recognition.
- Customizable pipelines with detailed development instructions and documentation.
@@ -31,7 +33,7 @@ MedaCy can be installed for general use or for pipeline development / research p


# :books: Power of medaCy
After installing medaCy and [medaCy's clinical model](examples/models/clinical_notes_model.md), simply run:
After installing medaCy and [medaCy's clinical model](guide/models/clinical_notes_model.md), simply run:

```python
from medacy.model.model import Model
@@ -49,7 +51,10 @@ and receive instant predictions:
('Duration', 46, 56, 'for 5 days')
]
```
To explore medaCy's other models or train your own, visit the [examples section](examples).

MedaCy can also be used through its command line interface, documented [here](./guide/command_line_interface.md).

To explore medaCy's other models or train your own, visit the [examples section](guide).

Reference
=========
@@ -69,9 +74,11 @@ This package is licensed under the GNU General Public License.

Authors
=======
Andriy Mulyar, Jorge Vargas, Corey Sutphin, Steele Farnsworth, Bobby Best, and Bridget T. McInnes
Current contributors: Steele Farnsworth, Anna Conte, Gabby Gurdin, Aidan Kierans, Aidan Myers, and Bridget T. McInnes

Former contributors: Andriy Mulyar, Jorge Vargas, Corey Sutphin, and Bobby Best

Acknowledgments
===============
- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/) ![alt text](https://nlp.cs.vcu.edu/images/vcu_head_logo "VCU")
- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/) ![alt text](https://nlp.cs.vcu.edu/images/vcu_head_logo "VCU")
- [Nanoinformatics Vertically Integrated Projects](https://rampages.us/nanoinformatics/)
2 changes: 1 addition & 1 deletion config.json
@@ -1 +1 @@
{"metamap_path": 0}
{"metamap_path": 0, "cuda_device": -2, "word_embeddings": 0}
56 changes: 0 additions & 56 deletions examples/guide/creating_an_external_dataset.md

This file was deleted.

49 changes: 0 additions & 49 deletions examples/release_notes.md

This file was deleted.
