Home
- Bootstrap model building: some action point.
pip install git+https://github.com/broadinstitute/ml.git # Install the package
and then in Python
import ml4cvd
>>> solve disease
Machine Learning for Health (ML4H) is an open-source initiative begun in 2018 that aims to bridge the gap between clinical methods and machine learning. Our goal is to accelerate the principled use of ML for clinical and genomic research. The code was created by a group of ML researchers and clinical research scientists with support from the Broad Institute of Harvard and MIT’s Data Science Platform. ML4H began as a set of tools to make it easy to work with the UK Biobank on the Google Cloud Platform and has since expanded to include other data sources and functionality.
ML4H is aimed at two audiences:
- Experienced ML researchers who want to incorporate best-practice clinical methods into their models
- Clinician-researchers familiar with code who want to incorporate ML into research plans
If you find a bug or have a question, open a ticket on GitHub Issues so the community can help you and benefit from your experience.
The ML4H repository has three modular components:
- Data Harmonization: pre-processes multimodal clinical data for model training.
  - Ingest, standardize, and quality control clinical data.
  - Tools for labeling data at scale.
  - Works with diverse data types (EHR, MRI, ECG, genetics, ...).
  - Ingests data into high-performance HDF5 containers.
  - Standard feature generation from image modalities (abdominal MRIs, ECGs, cardiac MRIs, brain MRIs, EHR), clinical notes, and genetics data.
  - Introduces a cross-platform metadata abstraction (TensorMap) to consistently generate numpy input/output tensors for training and evaluation.
- Model Training: trains ML algorithms on input/output tensors.
  - Includes collections of both standard and clinically focused loss functions to optimize models for classification, regression, and survival analyses.
  - Prepackaged with flexible multi-task, multimodal deep learning architectures (CNNs, VAEs, U-Nets) abstracted from Keras and TensorFlow, with sensible defaults for phenotypes, clinical time series, wearables, image data, and notes.
  - Integrated hyperparameter optimization framework.
  - Automatic generation of training curves and performance metric plots, including receiver operating characteristic curves, precision-recall curves, Kaplan-Meier survival curves, and discrimination and calibration plots.
- Evaluation / Inference: generates predictions on new data and creates outputs ready for clinician interrogation.
  - Data visualization tools for overlaying model predictions for clinician review (for example, ECG and cardiac MRI overlays).
  - Predict outcomes on input data, given a model and data that conform to TensorMaps.
  - Interpret model learning through saliency plots and low-dimensional representations (t-SNE) of the deep neural network's internal states.
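To make the TensorMap idea more concrete, here is a minimal sketch of a metadata object that names a tensor and describes how to build it from a stored sample. It is an illustration only and is not the ml4cvd API: the class `SketchTensorMap`, its fields (`length`, `mean`, `std`), and the method `tensor_from_sample` are all hypothetical. A plain dictionary stands in for an HDF5 container, and Python lists stand in for numpy arrays. The keys `ecg_rest` and `ventricular_rate` are borrowed from the training commands on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SketchTensorMap:
    """Hypothetical stand-in for a TensorMap (not the real ml4cvd class)."""
    name: str           # key under which the raw values are stored
    length: int         # expected number of values in the tensor
    mean: float = 0.0   # assumed normalization offset
    std: float = 1.0    # assumed normalization scale

    def tensor_from_sample(self, sample: Dict[str, Sequence[float]]) -> List[float]:
        """Look up the raw values, check their length, and normalize them."""
        raw = sample[self.name]
        if len(raw) != self.length:
            raise ValueError(
                f"{self.name}: expected {self.length} values, got {len(raw)}"
            )
        return [(float(value) - self.mean) / self.std for value in raw]


# A dictionary standing in for one sample stored in an HDF5 container.
ecg_sample = {"ecg_rest": [0.0, 2.0, 4.0], "ventricular_rate": [60.0]}

# The same kind of metadata object describes both model inputs and outputs.
input_map = SketchTensorMap("ecg_rest", length=3, mean=2.0, std=2.0)
output_map = SketchTensorMap("ventricular_rate", length=1)

x = input_map.tensor_from_sample(ecg_sample)
y = output_map.tensor_from_sample(ecg_sample)
print(x, y)
```

The point of the design is that the description of a tensor (its name, shape, and normalization) lives in one place, so training and evaluation code can request tensors in the same way regardless of where the data is stored.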
Using bash script indirection:
./scripts/tf.sh -r /home/mklarqvi/ml/ml4cvd/recipes.py --mode train --input_tensors ecg_rest --output_tensors ventricular_rate --tensors /home/mklarqvi/google-ecg-rest-38k-tensors/2020-03-14/ --batch_size 8 --training_steps 72 --epochs 3 --inspect_model --tensormap_prefix tensormap.ukb.ecg
or Python indirection:
python ml4cvd/recipes.py --mode train --input_tensors ukb.ecg.ecg_rest --output_tensors ukb.ecg.ventricular_rate --tensors /app/google-ecg-rest-38k-tensors/2020-03-14/ --batch_size 8 --training_steps 72 --epochs 3 --inspect_model
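When launching many training runs, it can help to assemble the long command line programmatically. The sketch below builds the argument list for the Python invocation above from a dictionary of options. The helper `build_recipes_command` is hypothetical and not part of ml4cvd. The flags and values are copied from the command above, and the script path is assumed to be relative to the repository root.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, int, bool]


def build_recipes_command(
    options: Dict[str, OptionValue], script: str = "ml4cvd/recipes.py"
) -> List[str]:
    """Turn an options dictionary into an argument list (hypothetical helper)."""
    command = ["python", script]
    for flag, value in options.items():
        if value is True:
            # Boolean switches such as --inspect_model take no value.
            command.append(f"--{flag}")
        elif value is False:
            # A switch that is turned off is simply omitted.
            continue
        else:
            command.extend([f"--{flag}", str(value)])
    return command


train_options = {
    "mode": "train",
    "input_tensors": "ukb.ecg.ecg_rest",
    "output_tensors": "ukb.ecg.ventricular_rate",
    "tensors": "/app/google-ecg-rest-38k-tensors/2020-03-14/",
    "batch_size": 8,
    "training_steps": 72,
    "epochs": 3,
    "inspect_model": True,
}

command = build_recipes_command(train_options)
print(shlex.join(command))  # shell-safe string, handy for logging
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting problems with paths.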
For advanced approaches, see the Wiki.
All bug reports, documentation improvements, enhancements and ideas are appreciated. Just let us know via GitHub.
Bug reports must:
- Include a short, self-contained Python snippet reproducing the problem.
- Add a minimal data sample for us to reproduce the problem.
- Explain why the current behavior is wrong/not desired and what you expect instead.
- If the issue is about visualizations, please attach a picture to the issue. Otherwise, we will not be able to reproduce and fix the bug.
ML4H Team. (2020). super cool title. Zenodo. http://doi.org/10.1234/zenodo.1337
BibTeX:
@misc{ml4h_team_2020_1337,
author = {{ML4H Team}},
title = {super cool title},
month = aug,
year = 2020,
doi = {10.1234/zenodo.1337},
url = {http://doi.org/10.1234/zenodo.1337}
}
The package is freely distributed under the BSD-3 license.