Update documentation to reflect new API (v2.3.0) #122

Merged on Dec 31, 2020 (30 commits)

Commits
33f02a2
Restructure docs
johnlees Oct 22, 2020
9042a82
Correct mock import for graph-tool
johnlees Oct 22, 2020
e8a8447
Write most of assign query page
johnlees Oct 22, 2020
fee2d00
Add update db
johnlees Oct 23, 2020
7714ac2
Start sketching doc
johnlees Oct 23, 2020
4e55983
Finish sketching doc
johnlees Oct 23, 2020
3bd6c5f
Tweak contents appearance
johnlees Oct 26, 2020
c5c3477
Add 0 threads note to troubleshooting
johnlees Oct 28, 2020
6186c82
Add note on visualisation of query results
johnlees Nov 16, 2020
13cf994
Merge branch 'master' into new_docs
johnlees Nov 17, 2020
f001cea
Online documentation and web in api.rst
Danderson123 Dec 1, 2020
9d6d1be
online.rst population phylogeny rephrase
Danderson123 Dec 1, 2020
16868fd
online.rst rephrase and errors
Danderson123 Dec 1, 2020
b006977
Add links to old doc versions
johnlees Dec 18, 2020
d6b9935
Merge branch 'master' into new_docs
johnlees Dec 21, 2020
fe03775
Added QC docs
johnlees Dec 21, 2020
59aa17a
Run the distance plot at create-db time
johnlees Dec 22, 2020
5cfa2ce
Add docs for BGMM
johnlees Dec 23, 2020
702093d
Add docs for DBSCAN
johnlees Dec 23, 2020
b1439cf
Add refine model docs
johnlees Dec 23, 2020
890f485
Finish model fitting doc page
johnlees Dec 28, 2020
12c5881
full db option mentioned
johnlees Dec 29, 2020
c8bb040
Merge branch 'master' into new_docs
johnlees Dec 29, 2020
b4cfe35
Add model distribution page
johnlees Dec 30, 2020
89b2eb6
Start on viz docs
johnlees Dec 30, 2020
f7b04b3
FInish viz docs
johnlees Dec 31, 2020
18e0c3a
Add subclustering docs
johnlees Dec 31, 2020
4bceb12
Add assign and viz to options docs
johnlees Dec 31, 2020
3b12c96
Update subclustering
johnlees Dec 31, 2020
2786b3e
Remove old docs images
johnlees Dec 31, 2020
4 changes: 2 additions & 2 deletions PopPUNK/__init__.py
@@ -7,5 +7,5 @@

# Minimum sketchlib version
SKETCHLIB_MAJOR = 1
SKETCHLIB_MINOR = 5
SKETCHLIB_PATCH = 3
SKETCHLIB_MINOR = 6
SKETCHLIB_PATCH = 0
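A minimum-version constant like this is usually compared against the installed library's version string. How PopPUNK performs that check is not shown in this diff, so the following is only a sketch under assumptions: `meets_minimum` is a hypothetical helper, and the version string is assumed to be plain `major.minor.patch`.

```python
# Minimum sketchlib version (values taken from the diff above)
SKETCHLIB_MAJOR = 1
SKETCHLIB_MINOR = 6
SKETCHLIB_PATCH = 0


def meets_minimum(version_string):
    """Return True if a 'major.minor.patch' string is at least the minimum.

    Hypothetical helper for illustration; not part of PopPUNK.
    """
    installed = tuple(int(part) for part in version_string.split(".")[:3])
    minimum = (SKETCHLIB_MAJOR, SKETCHLIB_MINOR, SKETCHLIB_PATCH)
    # Tuples compare element by element, so 1.10.0 correctly beats 1.6.0
    return installed >= minimum
```

Comparing tuples avoids the pitfall of testing each component independently, which would wrongly reject 2.0.0 because its minor version is below 6.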
16 changes: 11 additions & 5 deletions PopPUNK/__main__.py
@@ -111,12 +111,13 @@ def get_options():
type=float, default = None)
refinementGroup.add_argument('--manual-start', help='A file containing information for a start point. '
'See documentation for help.', default=None)
refinementGroup.add_argument('--indiv-refine', help='Also run refinement for core and accessory individually', default=False,
action='store_true')
refinementGroup.add_argument('--indiv-refine', help='Also run refinement for core and accessory individually',
choices=['both', 'core', 'accessory'], default = False)
refinementGroup.add_argument('--no-local', help='Do not perform the local optimization step (speed up on very large datasets)',
default=False, action='store_true')
refinementGroup.add_argument('--model-dir', help='Directory containing model to use for assigning queries '
'to clusters [default = reference database directory]', type = str)
refinementGroup.add_argument('--core-only', help='Save the core distance fit (with ')

# lineage clustering within strains
lineagesGroup = parser.add_argument_group('Lineage analysis options')
@@ -175,6 +176,7 @@ def main():
from .network import printClusters

from .plot import writeClusterCsv
from .plot import plot_scatter

from .prune_db import prune_distance_matrix

@@ -287,6 +289,11 @@ def main():
dists_out = args.output + "/" + os.path.basename(args.output) + ".dists"
storePickle(refList, queryList, True, distMat, dists_out)

# Plot results
plot_scatter(distMat,
args.output + "/" + os.path.basename(args.output) + "_distanceDistribution",
args.output + " distances")

#******************************#
#* *#
#* model fit and network *#
@@ -434,7 +441,6 @@ def main():
overall_lineage,
output_format = 'phandango',
epiCsv = None,
queryNames = refList,
suffix = '_Lineage')
genomeNetwork = indivNetworks[min(rank_list)]

@@ -469,10 +475,10 @@ def main():
output + "/" + os.path.basename(output) + \
"_" + dist_type + '_graph.gt', fmt = 'gt')

if args.core_only:
if args.indiv_refine == 'core':
fit_type = 'core'
genomeNetwork = indivNetworks['core']
elif args.accessory_only:
elif args.indiv_refine == 'accessory':
fit_type = 'accessory'
genomeNetwork = indivNetworks['accessory']

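The `--indiv-refine` change above replaces a boolean flag with a three-way choice, and a later hunk tests `args.indiv_refine` against `'core'` and `'accessory'`. A minimal sketch of how such an option parses and dispatches, assuming a stand-alone parser; `build_parser`, `select_fit_type` and the `'combined'` label are hypothetical names used only for illustration:

```python
import argparse


def build_parser():
    """Stand-alone parser holding only the option under discussion."""
    parser = argparse.ArgumentParser(prog="refine_sketch")
    parser.add_argument('--indiv-refine',
                        help='Also run refinement for core and accessory individually',
                        choices=['both', 'core', 'accessory'], default=False)
    return parser


def select_fit_type(indiv_refine):
    """Map the option value to the fit used downstream.

    'combined' is a placeholder label assumed for this sketch.
    """
    if indiv_refine == 'core':
        return 'core'
    elif indiv_refine == 'accessory':
        return 'accessory'
    return 'combined'
```

Omitting the flag leaves the value as `False`, since argparse does not check the default against `choices` here; a value outside the three choices is rejected with a usage error.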
11 changes: 1 addition & 10 deletions PopPUNK/models.py
@@ -21,8 +21,6 @@

import pp_sketchlib

from .plot import plot_scatter

# BGMM
from .bgmm import fit2dMultiGaussian
from .bgmm import assign_samples
@@ -126,7 +124,6 @@ def fit(self, X = None):
'''Initial steps for all fit functions.

Creates output directory. If preprocess is set then subsamples passed X
and draws a scatter plot from result using :func:`~PopPUNK.plot.plot_scatter`.

Args:
X (numpy.array)
@@ -159,12 +156,6 @@ def fit(self, X = None):
self.scale = np.amax(self.subsampled_X, axis = 0)
self.subsampled_X /= self.scale

# Show clustering
plot_scatter(self.subsampled_X,
self.scale,
self.outPrefix + "/" + os.path.basename(self.outPrefix) + "_distanceDistribution",
self.outPrefix + " distances")

def plot(self, X=None):
'''Initial steps for all plot functions.

@@ -474,7 +465,7 @@ def plot(self, X=None, y=None):
if not hasattr(self, 'subsampled_X'):
self.subsampled_X = utils.shuffle(X, random_state=random.randint(1,10000))[0:self.max_samples,]

non_noise = np.sum(np.where(self.labels != -1))
non_noise = np.sum(self.labels != -1)
sys.stderr.write("Fit summary:\n" + "\n".join(["\tNumber of clusters\t" + str(self.n_clusters),
"\tNumber of datapoints\t" + str(self.subsampled_X.shape[0]),
"\tNumber of assignments\t" + str(non_noise)]) + "\n\n")
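The one-line change to `non_noise` fixes the count of assigned points. `np.where` with a single argument returns a tuple of index arrays, so summing it adds up the positions of the non-noise labels rather than counting them, whereas summing the boolean mask counts its `True` entries. A small illustration (the label array below is invented for the example):

```python
import numpy as np

# Invented DBSCAN-style labels; -1 marks noise
labels = np.array([-1, 0, 0, 1, -1, 1])

# Old expression: sums the indices 1 + 2 + 3 + 5 of the non-noise points
index_sum = int(np.sum(np.where(labels != -1)))

# New expression: counts the True values in the boolean mask
non_noise = int(np.sum(labels != -1))

print(index_sum)  # 11
print(non_noise)  # 4
```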
18 changes: 14 additions & 4 deletions PopPUNK/plot.py
@@ -6,6 +6,7 @@
import sys
import os
import subprocess
import random
import numpy as np
import matplotlib as mpl
mpl.use('Agg')
@@ -18,7 +19,7 @@
import pandas as pd
from collections import defaultdict
from scipy import spatial
from sklearn import manifold
from sklearn import manifold, utils
try: # sklearn >= 0.22
from sklearn.neighbors import KernelDensity
except ImportError:
@@ -27,16 +28,14 @@

from .utils import isolateNameToLabel

def plot_scatter(X, scale, out_prefix, title, kde = True):
def plot_scatter(X, out_prefix, title, kde = True):
"""Draws a 2D scatter plot (png) of the core and accessory distances

Also draws contours of kernel density estimate

Args:
X (numpy.array)
n x 2 array of core and accessory distances for n samples.
scale (numpy.array)
Scaling factor from :class:`~PopPUNK.models.BGMMFit`
out_prefix (str)
Prefix for output plot file (.png will be appended)
title (str)
@@ -46,6 +45,15 @@ def plot_scatter(X, scale, out_prefix, title, kde = True):

(default = True)
"""
# Plot results - max 1M for speed
max_plot_samples = 1000000
if X.shape[0] > max_plot_samples:
X = utils.shuffle(X, random_state=random.randint(1,10000))[0:max_plot_samples,]

# Kernel estimate uses scaled data 0-1 on each axis
scale = np.amax(X, axis = 0)
X /= scale

plt.figure(figsize=(11, 8), dpi= 160, facecolor='w', edgecolor='k')
if kde:
xx, yy, xy = get_grid(0, 1, 100)
@@ -58,11 +66,13 @@ def plot_scatter(X, scale, out_prefix, title, kde = True):
z = z.reshape(xx.shape).T

levels = np.linspace(z.min(), z.max(), 10)
# Rescale contours
plt.contour(xx*scale[0], yy*scale[1], z, levels=levels[1:], cmap='plasma')
scatter_alpha = 1
else:
scatter_alpha = 0.1

# Plot on correct scale
plt.scatter(X[:,0]*scale[0].flat, X[:,1]*scale[1].flat, s=1, alpha=scatter_alpha)

plt.title(title)
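`plot_scatter` now subsamples and rescales its input itself instead of receiving `scale` from the model. One thing to watch is that `X /= scale` divides in place, so when no subsampling happens the caller's array appears to be modified. The sketch below shows the same two steps without touching the input. It is an illustration under assumptions rather than the PopPUNK code: `subsample_and_scale` is a hypothetical helper, and NumPy's `Generator.choice` stands in for `sklearn.utils.shuffle`.

```python
import numpy as np


def subsample_and_scale(X, max_samples=1000000, seed=None):
    """Return (scaled copy of X, per-axis scale), leaving X unchanged.

    Hypothetical helper: draws at most max_samples rows without
    replacement, then scales each column to a maximum of 1.
    """
    rng = np.random.default_rng(seed)
    if X.shape[0] > max_samples:
        keep = rng.choice(X.shape[0], size=max_samples, replace=False)
        X = X[keep]

    # Kernel estimate uses data scaled to 0-1 on each axis
    scale = np.amax(X, axis=0)
    scaled = X / scale  # new array; the input is not modified
    return scaled, scale
```

Multiplying the scaled values by `scale` recovers the original distances, which is what the contour and scatter calls above do before plotting. The sketch assumes each column has a non-zero maximum.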
5 changes: 4 additions & 1 deletion PopPUNK/visualise.py
@@ -108,7 +108,7 @@ def get_options():
faGroup.add_argument('--cytoscape', help='Generate network output files for Cytoscape', default=False, action='store_true')
faGroup.add_argument('--phandango', help='Generate phylogeny and TSV for Phandango visualisation', default=False, action='store_true')
faGroup.add_argument('--grapetree', help='Generate phylogeny and CSV for grapetree visualisation', default=False, action='store_true')
faGroup.add_argument('--rapidnj', help='Path to rapidNJ binary to build NJ tree for Microreact', default=None)
faGroup.add_argument('--rapidnj', help='Path to rapidNJ binary to build NJ tree for Microreact', default='rapidnj')
faGroup.add_argument('--perplexity',
type=float, default = 20.0,
help='Perplexity used to calculate t-SNE projection (with --microreact) [default=20.0]')
@@ -135,6 +135,9 @@ def get_options():
if arg is not None:
arg = arg.rstrip('\\')

if args.rapidnj == "":
args.rapidnj = None

return args

def generate_visualisations(query_db,
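With this change `--rapidnj` defaults to the string `rapidnj`, presumably so that a binary on the `PATH` is picked up, and an explicit empty string switches it off by mapping to `None`. A minimal sketch of that behaviour, assuming a stand-alone parser; `parse_rapidnj` is a hypothetical wrapper used only for illustration:

```python
import argparse


def parse_rapidnj(argv):
    """Parse only --rapidnj and apply the empty-string rule from the diff."""
    parser = argparse.ArgumentParser(prog="visualise_sketch")
    parser.add_argument('--rapidnj',
                        help='Path to rapidNJ binary to build NJ tree for Microreact',
                        default='rapidnj')
    args = parser.parse_args(argv)

    # An explicitly empty path means "do not use rapidNJ"
    if args.rapidnj == "":
        args.rapidnj = None
    return args.rapidnj
```

Calling `parse_rapidnj([])` gives `'rapidnj'`, while `parse_rapidnj(['--rapidnj', ''])` gives `None`.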
20 changes: 14 additions & 6 deletions docs/api.rst
@@ -6,6 +6,13 @@ Documentation for module functions (for developers)
.. contents::
:local:

assign.py
---------
``poppunk_assign`` main function

.. automodule:: PopPUNK.assign
:members:

bgmm.py
--------

@@ -24,12 +31,6 @@ Functions used to fit DBSCAN to a database. Access using
.. automodule:: PopPUNK.dbscan
:members:

mash.py
--------

.. automodule:: PopPUNK.mash
:members:

models.py
---------

@@ -80,6 +81,13 @@ utils.py
.. automodule:: PopPUNK.utils
:members:

visualise.py
------------
``poppunk_visualise`` main function

.. automodule:: PopPUNK.visualise
:members:

web.py
--------

72 changes: 72 additions & 0 deletions docs/best_practises.rst
@@ -0,0 +1,72 @@
Best practises guide
====================
This page describes how we would advise you to use and run PopPUNK, where
possible.

.. image:: images/poppunk_flowchart.png
:alt: Flowchart for choosing how to use PopPUNK
:align: center

Use an online interface
-----------------------
If available, you may want to use one of the browser-based interfaces to
PopPUNK. These include `PopPUNK-web <https://web.poppunk.net/>`__ and
`pathogen.watch <https://pathogen.watch/genomes/all?genusId=1301&speciesId=1313>`__
(*S. pneumoniae* only). See the :doc:`online` page for full details.

Using these interfaces requires nothing to be installed or set up, doesn't require any
genome data to be shared with us, and will return interactive visualisations. If your
species isn't available, or you have large batches of genomes to cluster, you will
likely want to use the command line interface instead.

Use the command line interface
------------------------------

Installation and version
^^^^^^^^^^^^^^^^^^^^^^^^
Install via conda if possible. Please use at least v2.3.0 of PopPUNK
and v1.6.0 of ``pp-sketchlib``.

Use query assignment mode
^^^^^^^^^^^^^^^^^^^^^^^^^
If a database is available for your species (see https://poppunk.net/pages/databases.html),
we strongly recommend downloading it and using it to cluster your genomes. This
has many advantages:

- No need to run through the potentially complex model fitting.
- Assured model performance.
- Considerably faster run times.
- Use existing cluster definitions.
- Use the context of large, high quality reference populations to interpret your
genomes' clusters.

See :doc:`query_assignment` for instructions on how to use this mode.

You can think of this as being similar to using an existing MLST/cgMLST/wgMLST scheme
to define your samples' strains.

Fit your own model
^^^^^^^^^^^^^^^^^^
If a database isn't available for your species, you can fit your own. Details
on how to do this can be found on :doc:`model_fitting`.

After getting a good fit, you may want to share it with others so that they can
use it to assign queries. See :doc:`model_distribution` for advice. We would also
be interested to hear from you if you'd like to add your new model to the
pre-fit databases above -- please contact [email protected].

Create visualisations
^^^^^^^^^^^^^^^^^^^^^
A number of plots are created by default. You can also
create files for further visualisation in `microreact <https://microreact.org/>`__,
`cytoscape <http://www.cytoscape.org/>`__,
`grapetree <http://dx.doi.org/10.1101/gr.232397.117>`__ and
`phandango <http://jameshadfield.github.io/phandango/>`_. We have found that
looking at the appearance of clusters on a tree is always very helpful, and would
recommend this for any fit.

Older versions of PopPUNK required visualisation outputs to be requested as part
of the main analysis, and later through the ``--generate-viz`` mode. Visualisation
is now run separately, after the main analysis, with ``poppunk_visualise``.

See :doc:`visualisation` for details on options.
9 changes: 5 additions & 4 deletions docs/conf.py
@@ -38,7 +38,8 @@
# Causes a problem with rtd: https://github.com/pypa/setuptools/issues/1694
autodoc_mock_imports = ["hdbscan",
"numpy",
"graph-tool",
"graph_tool.all",
"graph_tool",
"pandas",
"scipy",
"sklearn",
@@ -65,16 +66,16 @@
# General information about the project.
project = 'PopPUNK'
copyright = '2018-2020, John Lees and Nicholas Croucher'
author = 'John Lees and Nicholas Croucher'
author = 'John Lees, Daniel Anderson and Nicholas Croucher'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '2.2.0'
version = '2.3.0'
# The full version, including alpha/beta/rc tags.
release = '2.2.0'
release = '2.3.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
Binary file added docs/images/13mer_hist.png
Binary file removed docs/images/DPGMM_fit_K2.png
Binary file not shown.
Binary file removed docs/images/DPGMM_fit_K3.png
Binary file not shown.
Binary file added docs/images/assign_network.png
File renamed without changes
Binary file added docs/images/bgmm_k2_fit.png
Binary file added docs/images/bgmm_k4_boundary.png
Binary file added docs/images/bgmm_k4_fit.png
Binary file modified docs/images/cytoscape.png
Binary file added docs/images/cytoscape_gpsc.png
Binary file modified docs/images/dbscan_fit.png
Binary file added docs/images/dbscan_fit_min_prop.png
Binary file removed docs/images/fit_example_fixed.png
Binary file not shown.
Binary file removed docs/images/fit_example_wrong.png
Binary file not shown.
Binary file added docs/images/flu_phased.png
Binary file added docs/images/flu_unphased.png
Binary file added docs/images/grapetree.png
Binary file added docs/images/grapetree_collapse.png
Binary file modified docs/images/indiv_refine.png
Binary file added docs/images/kmer_fit.png
Binary file added docs/images/listeria_dists.png
Binary file added docs/images/listeria_lineage_rank_1.png
Binary file added docs/images/listeria_lineage_rank_1_histogram.png
Binary file added docs/images/listeria_lineage_rank_3.png
Binary file added docs/images/listeria_lineage_rank_3_histogram.png
Binary file added docs/images/listeria_microreact.png
Binary file added docs/images/listeria_refined.png
Binary file added docs/images/listeria_threshold.png
Binary file removed docs/images/lm_GMM_K2.png
Diff not rendered.
Binary file removed docs/images/lm_GMM_K4.png
Diff not rendered.
Binary file removed docs/images/lm_dbscan.png
Diff not rendered.
Binary file removed docs/images/lm_distance_dist.png
Diff not rendered.
Binary file removed docs/images/lm_fit.png
Diff not rendered.
Binary file removed docs/images/lm_microreact.png
Diff not rendered.
Binary file added docs/images/phandango.png
Binary file added docs/images/poppipe_dag.png
Binary file added docs/images/poppunk_flowchart.png
Binary file added docs/images/web_cyto.png
Binary file added docs/images/web_home.png
Binary file added docs/images/web_micro.png
Binary file added docs/images/web_micro_assigned.png
Binary file added docs/images/web_micro_change.png
Binary file added docs/images/web_phylo.png
Binary file added docs/images/web_prevs.png
Binary file added docs/images/web_prevs_zoomed.png
Binary file added docs/images/web_stats.png