Marcus D. R. Klarqvist edited this page Jul 17, 2020 · 6 revisions

ml4h --- Fast Experimentation Doing Cool Stuff

Why ml4h?

  • Bootstrap model building: some action point.

Quickstart

pip install git+https://github.com/broadinstitute/ml.git # Install the package

and then in Python

>>> import ml4cvd
>>> # solve disease

Introduction

Machine Learning for Health (ML4H) is an open-source initiative begun in 2018 that aims to bridge the gap between clinical methods and machine learning. Our goal is to accelerate the principled use of ML for clinical and genomic research. The code was created by a group of ML researchers and clinical research scientists with support from the Broad Institute of Harvard and MIT’s Data Science Platform. ML4H began as a set of tools to make it easy to work with the UK Biobank on the Google Cloud Platform and has since expanded to include other data sources and functionality.

ML4H is aimed at two audiences:

  • Experienced ML researchers who want to incorporate best-practice clinical methods into their models
  • Clinician-researchers familiar with code who want to incorporate ML into research plans

Contact

Open a ticket on GitHub Issues to report a bug or ask a question. This lets the community help you, and your experience in turn helps other users.

Features

The ML4H repository has three modular components:

  1. Data Harmonization: pre-process multimodal clinical data for model training.
    • Ingest, standardize, and quality-control clinical data.
    • Tools for labeling data at scale.
    • Works with diverse data types (EHR, MRI, ECG, genetics, ...).
    • Ingests data into high-performance HDF5 containers.
    • Standard feature generation from image modalities (abdominal MRIs, ECGs, cardiac MRIs, brain MRIs, EHR), clinical notes, and genetics data.
    • Introduces a cross-platform metadata abstraction (TensorMap) that consistently generates numpy input/output tensors for training and evaluation.
  2. Model Training: train ML algorithms on input/output tensors.
    • Includes collections of both standard and clinically focused loss functions to optimize models for classification, regression, and survival analysis.
    • Prepackaged with flexible multi-task, multimodal deep learning architectures (CNNs, VAEs, U-Nets) abstracted from Keras and TensorFlow, with sensible defaults for phenotypes, clinical time series, wearables, image data, and notes.
    • Integrated hyperparameter optimization framework.
    • Automatic generation of training curves and performance-metric plots, including receiver operating characteristic (ROC) curves, precision-recall curves, Kaplan-Meier survival curves, and discrimination and calibration plots.
  3. Evaluation / Inference: generate predictions on new data and create outputs ready for clinician interrogation.
    • Data visualization tools for overlaying model predictions for clinician review (for example, ECG and cardiac MRI overlays).
    • Predict outcomes on input data from any model and data that conform to TensorMaps.
    • Interpret model learning through saliency plots and low-dimensional representations (t-SNE) of the deep neural network's internal states.
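
As a rough illustration of the TensorMap idea listed above (one piece of metadata that says where a value lives in a sample and what fixed shape it must have), here is a minimal sketch. Everything in it is an assumption made for the example and not the ml4cvd API: the `ToyTensorMap` class, its fields, the `ecg/lead_I` style paths, the sample values, and the plain dict that stands in for an HDF5 file. Plain Python lists stand in for numpy arrays so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for one sample's HDF5 file: path-like key -> raw values.
Sample = Dict[str, List[float]]


@dataclass(frozen=True)
class ToyTensorMap:
    """Illustrative only; NOT the ml4cvd TensorMap class or its signature."""

    name: str   # label used to wire the tensor to a model input or output
    path: str   # where the raw values live inside the sample
    shape: int  # fixed length that every generated tensor must have
    normalize: Optional[Callable[[List[float]], List[float]]] = None

    def tensor_from_sample(self, sample: Sample) -> List[float]:
        values = list(sample[self.path])[: self.shape]   # crop if too long
        values += [0.0] * (self.shape - len(values))     # zero-pad if too short
        return self.normalize(values) if self.normalize else values


def scale_by_max(values: List[float]) -> List[float]:
    """Divide by the largest absolute value (or by 1.0 if all values are zero)."""
    peak = max(abs(v) for v in values) or 1.0
    return [v / peak for v in values]


# Names mirror the tensors used in the quick-start commands; paths are made up.
ecg_rest = ToyTensorMap("ecg_rest", "ecg/lead_I", shape=4, normalize=scale_by_max)
ventricular_rate = ToyTensorMap("ventricular_rate", "ecg/ventricular_rate", shape=1)

sample = {"ecg/lead_I": [0.5, -2.0, 1.0], "ecg/ventricular_rate": [62.0]}
x = ecg_rest.tensor_from_sample(sample)          # model input
y = ventricular_rate.tensor_from_sample(sample)  # model output (target)
print(x, y)
```

The point of the design is that the same description can be reused for training, evaluation, and inference, so every stage sees tensors of the same shape and normalization.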

Quick start

Using bash script indirection

./scripts/tf.sh -r /home/mklarqvi/ml/ml4cvd/recipes.py --mode train --input_tensors ecg_rest --output_tensors ventricular_rate --tensors /home/mklarqvi/google-ecg-rest-38k-tensors/2020-03-14/ --batch_size 8 --training_steps 72 --epochs 3 --inspect_model --tensormap_prefix tensormap.ukb.ecg

or by calling Python directly

python ml4cvd/recipes.py --mode train --input_tensors ukb.ecg.ecg_rest --output_tensors ukb.ecg.ventricular_rate --tensors /app/google-ecg-rest-38k-tensors/2020-03-14/ --batch_size 8 --training_steps 72 --epochs 3 --inspect_model
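
When launching several runs that differ in only a few flags, it can help to assemble the command from a dictionary. The following is a minimal sketch that uses only the flags and values from the Python command above. `build_recipe_command` is a hypothetical helper, not part of ml4cvd, and the sketch only builds and prints the command without running it.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, int, bool]


def build_recipe_command(
    options: Dict[str, OptionValue],
    script: str = "ml4cvd/recipes.py",
) -> List[str]:
    """Hypothetical helper: turn {flag: value} pairs into an argv list."""
    argv = ["python", script]
    for flag, value in options.items():
        if value is True:
            argv.append(f"--{flag}")                 # bare switch, e.g. --inspect_model
        elif value is not False:
            argv.extend([f"--{flag}", str(value)])   # flag followed by its value
        # a value of False means the switch is simply left out
    return argv


options: Dict[str, OptionValue] = {
    "mode": "train",
    "input_tensors": "ukb.ecg.ecg_rest",
    "output_tensors": "ukb.ecg.ventricular_rate",
    "tensors": "/app/google-ecg-rest-38k-tensors/2020-03-14/",
    "batch_size": 8,
    "training_steps": 72,
    "epochs": 3,
    "inspect_model": True,
}

argv = build_recipe_command(options)
command_line = shlex.join(argv)  # shell-safe string form (Python 3.8+)
print(command_line)
```

The resulting `argv` list could be handed to `subprocess.run(argv, check=True)` from the repository root; that step is left out here so the sketch has no side effects.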

Advanced approaches

For advanced approaches, see the Wiki.

Bugs and Issues

All bug reports, documentation improvements, enhancements and ideas are appreciated. Just let us know via GitHub.

Bug reports must:

  1. Include a short, self-contained Python snippet reproducing the problem.
  2. Add a minimal data sample for us to reproduce the problem.
  3. Explain why the current behavior is wrong or undesired and what you expect instead.
  4. If the issue is about visualizations, please attach a picture to the issue. Otherwise we will not be able to reproduce and fix the bug.

Citation

ML4H Team. (2020). super cool title. Zenodo. http://doi.org/10.1234/zenodo.1337

BibTeX:

@misc{ml4h_team_2020_1337,
  author       = {{ML4H Team}},
  title        = {super cool title},
  month        = aug,
  year         = 2020,
  doi          = {10.1234/zenodo.1337},
  url          = {http://doi.org/10.1234/zenodo.1337}
}

License

The package is freely distributed under the BSD 3-Clause license.


Previous

Welcome to the ml4cvd wiki!

  1. TensorMaps overview
  2. TensorMap defaults
    1. UK Biobank
    2. Partners ECG
  3. to be determined