Add batch correction methods, Scanpy 1.6-specific things #86

Merged
merged 32 commits into from
Aug 23, 2020
Changes from 30 commits
Commits
83adae3
Correct scatter function usage and normalisation step
pinin4fjords Aug 4, 2020
c3f4d64
Relax most pins
pinin4fjords Aug 4, 2020
38701f1
Add some missing options
pinin4fjords Aug 5, 2020
10171a5
Add method argument for umap
pinin4fjords Aug 5, 2020
f4a0b6b
Fix fdg call to use scanpy-native functionality for slot specification
pinin4fjords Aug 5, 2020
155e356
Use Scanpy-native option for graph specification in umap, move associ…
pinin4fjords Aug 6, 2020
a36a7e5
Remove legacy snippet
pinin4fjords Aug 6, 2020
45f75c0
Switch louvain to native neighbors_key/obsp for slot selection
pinin4fjords Aug 6, 2020
5f05115
Switch leiden to native neighbors_key/obsp for slot selection, remove…
pinin4fjords Aug 6, 2020
feabc73
Add layer specification to rank_genes_groups call
pinin4fjords Aug 6, 2020
8a34634
Update paga call to allow RNA velocity, use Scanpy-native graph slot …
pinin4fjords Aug 6, 2020
a161abf
Set diffmap, dpt to use neighbors_key, add other missing option.
pinin4fjords Aug 7, 2020
129483b
Fixes for embeddings, plot_paga to use new sc.pl.embeddings.
pinin4fjords Aug 10, 2020
1d2ee55
Fix plotting test params
pinin4fjords Aug 10, 2020
50a4ea5
Merge branch 'develop' into feature/update_for_1.5.1
pinin4fjords Aug 10, 2020
4b738de
Some plot files now have underscores
pinin4fjords Aug 10, 2020
85826e0
Add wrapper for harmony_integrate
pinin4fjords Aug 11, 2020
bbe1f52
Add bbknn CLI layer
pinin4fjords Aug 11, 2020
8e6e860
Add batch correction methods to dependencies for testing
pinin4fjords Aug 11, 2020
2a564e7
Add more harmonypy options
pinin4fjords Aug 11, 2020
0faa369
Added MNN batch correction
pinin4fjords Aug 13, 2020
4fb0b96
add mnnpy to deps
pinin4fjords Aug 13, 2020
924a745
Minor fixes, add ComBat, unify args
pinin4fjords Aug 14, 2020
79ad11c
Set batch handling to scanpy-native method for hvg, to properly respo…
pinin4fjords Aug 14, 2020
8dde170
Add new rank_genes_groups options
pinin4fjords Aug 14, 2020
494ff24
Bump Scanpy
pinin4fjords Aug 18, 2020
d50cb62
Whoops- forgot to add the combat file
pinin4fjords Aug 18, 2020
83d07f3
Merge branch 'develop' into feature/update_for_1.5.2
pinin4fjords Aug 18, 2020
c414693
Add help text with integration methods
pinin4fjords Aug 18, 2020
c37a378
Merge branch 'feature/update_for_1.5.2' of github.com:ebi-gene-expres…
pinin4fjords Aug 18, 2020
987d655
Simplify commands
pinin4fjords Aug 23, 2020
dc613d9
Finish: simplify commands
pinin4fjords Aug 23, 2020
29 changes: 15 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
@@ -37,18 +37,19 @@ Options:
--help Show this message and exit.
Commands:
-  read      Read 10x data and save in specified format.
-  filter    Filter data based on specified conditions.
-  norm      Normalise data per cell.
-  hvg       Find highly variable genes.
-  scale     Scale data per gene.
-  regress   Regress-out observation variables.
-  pca       Dimensionality reduction by PCA.
-  neighbor  Compute a neighbourhood graph of observations.
-  embed     Embed cells into two-dimensional space.
-  cluster   Cluster cells into sub-populations.
-  diffexp   Find markers for each clusters.
-  paga      Trajectory inference by abstract graph analysis.
-  dpt       Calculate diffusion pseudotime relative to the root cells.
-  plot      Visualise data.
+  read       Read 10x data and save in specified format.
+  filter     Filter data based on specified conditions.
+  norm       Normalise data per cell.
+  hvg        Find highly variable genes.
+  scale      Scale data per gene.
+  regress    Regress-out observation variables.
+  pca        Dimensionality reduction by PCA.
+  neighbor   Compute a neighbourhood graph of observations.
+  embed      Embed cells into two-dimensional space.
+  cluster    Cluster cells into sub-populations.
+  diffexp    Find markers for each clusters.
+  paga       Trajectory inference by abstract graph analysis.
+  dpt        Calculate diffusion pseudotime relative to the root cells.
+  integrate  Integrate cells from different experimental batches.
+  plot       Visualise data.
```
92 changes: 92 additions & 0 deletions scanpy-scripts-tests.bats
Original file line number Diff line number Diff line change
@@ -69,6 +69,19 @@ setup() {
plt_rank_genes_groups_matrix_pdf="${output_dir}/rggmatrix_${test_clustering}.pdf"
plt_rank_genes_groups_dot_pdf="${output_dir}/rggdot_${test_clustering}.pdf"
plt_rank_genes_groups_heatmap_pdf="${output_dir}/rggheatmap_${test_clustering}.pdf"
harmony_integrate_obj="${output_dir}/harmony_integrate.h5ad"
harmony_integrate_opt="--batch-key ${test_clustering}"
harmony_plt_embed_opt="--projection 2d --color ${test_clustering} --title 'PCA embeddings after harmony' --basis 'X_pca_harmony'"
noharmony_plt_embed_opt="--projection 2d --color ${test_clustering} --title 'PCA embeddings before harmony' --basis 'X_pca'"
harmony_integrated_pca_pdf="${output_dir}/harmony_pca_${test_clustering}.pdf"
noharmony_integrated_pca_pdf="${output_dir}/pca_${test_clustering}.pdf"
bbknn_obj="${output_dir}/bbknn.h5ad"
bbknn_opt="--batch-key ${test_clustering} --key-added bbknn"
mnn_obj="${output_dir}/mnn.h5ad"
mnn_opt="--batch-key ${test_clustering}"
combat_obj="${output_dir}/combat.h5ad"
combat_opt="--batch-key ${test_clustering}"


if [ ! -d "$data_dir" ]; then
mkdir -p $data_dir
@@ -442,6 +455,85 @@ setup() {
[ -f "$plt_rank_genes_groups_matrix_pdf" ]
}

# Do harmony batch correction, using clustering as batch (just for test purposes)

@test "Run Harmony batch integration using clustering as batch" {
if [ "$resume" = 'true' ] && [ -f "$harmony_integrate_obj" ]; then
skip "$harmony_integrate_obj exists and resume is set to 'true'"
fi

run rm -f $harmony_integrate_obj && eval "$scanpy integrate harmony_integrate $harmony_integrate_opt $louvain_obj $harmony_integrate_obj"

[ "$status" -eq 0 ]
[ -f "$harmony_integrate_obj" ]

}

# Run Plot PCA embedding before harmony

@test "Run Plot PCA embedding before Harmony" {
if [ "$resume" = 'true' ] && [ -f "$noharmony_integrated_pca_pdf" ]; then
skip "$noharmony_integrated_pca_pdf exists and resume is set to 'true'"
fi

run rm -f $noharmony_integrated_pca_pdf && eval "$scanpy plot embed $noharmony_plt_embed_opt $louvain_obj $noharmony_integrated_pca_pdf"

[ "$status" -eq 0 ]
[ -f "$noharmony_integrated_pca_pdf" ]
}

# Run Plot PCA embedding after harmony

@test "Run Plot PCA embedding after Harmony" {
if [ "$resume" = 'true' ] && [ -f "$harmony_integrated_pca_pdf" ]; then
skip "$harmony_integrated_pca_pdf exists and resume is set to 'true'"
fi

run rm -f $harmony_integrated_pca_pdf && eval "$scanpy plot embed $harmony_plt_embed_opt $harmony_integrate_obj $harmony_integrated_pca_pdf"

[ "$status" -eq 0 ]
[ -f "$harmony_integrated_pca_pdf" ]
}

# Do bbknn batch correction, using clustering as batch (just for test purposes)

@test "Run BBKNN batch integration using clustering as batch" {
if [ "$resume" = 'true' ] && [ -f "$bbknn_obj" ]; then
skip "$bbknn_obj exists and resume is set to 'true'"
fi

run rm -f $bbknn_obj && eval "$scanpy integrate bbknn $bbknn_opt $louvain_obj $bbknn_obj"

[ "$status" -eq 0 ]
[ -f "$bbknn_obj" ]
}

# Do MNN batch correction, using clustering as batch (just for test purposes)

@test "Run MNN batch integration using clustering as batch" {
if [ "$resume" = 'true' ] && [ -f "$mnn_obj" ]; then
skip "$mnn_obj exists and resume is set to 'true'"
fi

run rm -f $mnn_obj && eval "$scanpy integrate mnn_correct $mnn_opt $louvain_obj $mnn_obj"

[ "$status" -eq 0 ]
[ -f "$mnn_obj" ]
}

# Do ComBat batch correction, using clustering as batch (just for test purposes)

@test "Run ComBat batch integration using clustering as batch" {
if [ "$resume" = 'true' ] && [ -f "$combat_obj" ]; then
skip "$combat_obj exists and resume is set to 'true'"
fi

run rm -f $combat_obj && eval "$scanpy integrate combat $combat_opt $louvain_obj $combat_obj"

[ "$status" -eq 0 ]
[ -f "$combat_obj" ]
}

# Local Variables:
# mode: sh
# End:
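The integration tests above all follow one pattern: skip when `resume` is set and the output already exists, otherwise remove any stale output, run `integrate <method>`, and check that the expected file was written. A minimal Python sketch of that pattern follows. The executable name `scanpy-cli`, the batch key `louvain`, and the helper names `build_integrate_cmd` and `run_step` are assumptions for illustration; the tests only refer to the executable as `$scanpy` and to the batch key as `${test_clustering}`.

```python
import os
import shlex
import subprocess

# Sub-command names as registered under the `integrate` group in this PR.
INTEGRATE_METHODS = ("harmony_integrate", "bbknn", "mnn_correct", "combat")


def build_integrate_cmd(method, batch_key, input_obj, output_obj,
                        extra_opts="", executable="scanpy-cli"):
    """Return the argument list for one `integrate` call.

    `executable` is an assumption: the tests only call it `$scanpy`.
    """
    if method not in INTEGRATE_METHODS:
        raise ValueError(f"unknown integration method: {method}")
    return [executable, "integrate", method, "--batch-key", batch_key,
            *shlex.split(extra_opts), input_obj, output_obj]


def run_step(cmd, output_path, resume=False):
    """Run `cmd` unless `resume` is set and the output already exists.

    Returns "skipped" or "ok". Raises if the command exits non-zero or
    the expected output file is missing afterwards.
    """
    if resume and os.path.isfile(output_path):
        return "skipped"
    if os.path.exists(output_path):
        os.remove(output_path)  # same effect as `rm -f` on a stale file
    subprocess.run(cmd, check=True)
    if not os.path.isfile(output_path):
        raise FileNotFoundError(output_path)
    return "ok"


# Hypothetical file names; mirrors the BBKNN test's options.
print(build_integrate_cmd("bbknn", "louvain", "louvain.h5ad", "bbknn.h5ad",
                          extra_opts="--key-added bbknn"))
```

Checking each step against its own output file, rather than a file produced by an earlier step, is what makes the existence assertion meaningful.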
14 changes: 14 additions & 0 deletions scanpy_scripts/cli.py
Original file line number Diff line number Diff line change
@@ -30,6 +30,10 @@
    PLOT_DOT_CMD,
    PLOT_MATRIX_CMD,
    PLOT_HEATMAP_CMD,
    HARMONY_INTEGRATE_CMD,
    BBKNN_CMD,
    MNN_CORRECT_CMD,
    COMBAT_CMD,
)


@@ -101,6 +105,16 @@ def cluster():
cli.add_command(DPT_CMD)


@cli.group(cls=NaturalOrderGroup)
def integrate():
    """Integrate cells from different experimental batches."""

integrate.add_command(HARMONY_INTEGRATE_CMD)
integrate.add_command(BBKNN_CMD)
integrate.add_command(MNN_CORRECT_CMD)
integrate.add_command(COMBAT_CMD)


@cli.group(cls=NaturalOrderGroup)
def plot():
    """Visualise data."""
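The `integrate` group added above nests four sub-commands under one top-level command. As a rough illustration of the resulting command shape, here is a stand-alone sketch that uses the standard-library `argparse` instead of click, so it is not the project's implementation. The program name `scanpy-cli`, the positional argument names, and the single `--batch-key` option are assumptions based on how the tests invoke the commands; the real sub-commands accept more options.

```python
import argparse

# Sub-command names taken from the add_command calls in the diff.
INTEGRATE_SUBCOMMANDS = ["harmony_integrate", "bbknn", "mnn_correct", "combat"]


def build_parser():
    """Build a two-level parser: `<prog> integrate <method> [options] IN OUT`."""
    parser = argparse.ArgumentParser(prog="scanpy-cli")  # assumed name
    groups = parser.add_subparsers(dest="group", required=True)

    integrate = groups.add_parser(
        "integrate",
        help="Integrate cells from different experimental batches.",
    )
    methods = integrate.add_subparsers(dest="method", required=True)

    # Sub-commands are registered in the same order as in the diff.
    for name in INTEGRATE_SUBCOMMANDS:
        sub = methods.add_parser(name)
        sub.add_argument("--batch-key")  # stored as `batch_key`
        sub.add_argument("input_obj")
        sub.add_argument("output_obj")
    return parser


# Hypothetical file names and batch key, for illustration only.
args = build_parser().parse_args(
    ["integrate", "combat", "--batch-key", "louvain", "in.h5ad", "out.h5ad"]
)
print(args.group, args.method, args.batch_key)
```

Grouping the methods under one parent keeps the top-level command list short while leaving room for further integration methods later.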