Merge changes from Github.
Aleksey Jironkin authored and Aleksey Jironkin committed Oct 21, 2016
2 parents aec17d9 + 7384464 commit ff57414
Showing 3 changed files with 42 additions and 2 deletions.
21 changes: 20 additions & 1 deletion docs/Filters.rst
@@ -23,4 +23,23 @@ One of the key parts of the VCF processing is to filter quality calls. To do thi

- **uncall_gt** - Filter records with uncallable genotypes in VCF (**GT** is ./.).

All filters are applied at each position, and only positions that pass ALL filters are kept as quality calls. Positions that fail a filter are retained for future reference and for creating fasta files when needed. To specify which filters to use, list them as key:threshold pairs separated by commas (,). For filters that do not require a threshold, leave the value after ':' blank.
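
As an illustration of the key:threshold syntax described above, the following minimal Python sketch splits such a string into a dictionary. The function name ``parse_filters`` is hypothetical and this is not Phenix's own parser; it only assumes that thresholds are numeric and that a blank threshold means "no threshold".

```python
def parse_filters(spec):
    """Parse 'key:threshold,key:threshold' into a dict.

    Hypothetical helper for illustration only (not part of Phenix).
    A blank threshold after ':' is stored as None.
    """
    filters = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        key, _, value = item.partition(":")
        value = value.strip()
        # Assumption: every non-blank threshold is a number.
        filters[key.strip()] = float(value) if value else None
    return filters


if __name__ == "__main__":
    # uncall_gt takes no threshold, so nothing follows its ':'.
    print(parse_filters("min_depth:5,mq_score:30,uncall_gt:"))
```

Storing ``None`` for a blank threshold keeps filters such as **uncall_gt** distinguishable from filters with a threshold of zero.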

Not all filters are available for all variant callers. Which filters can be used with your data depends on the variant caller that was used to create your VCF file.

=============  ==========================================
Filter         Remark
=============  ==========================================
Quality score  GATK and samtools mpileup
AD ratio       GATK and samtools mpileup version >=1.3
DP4 ratio      samtools mpileup version <=1.2
MQ score       GATK and samtools mpileup
MQ0 ratio      GATK only
MQ0F ratio     samtools mpileup only
GQ score       GATK and samtools mpileup
Minimum depth  GATK and samtools mpileup
Uncall GT      GATK and samtools mpileup
=============  ==========================================


If you want to filter a VCF file that was not created with either GATK or samtools, please refer to the documentation of the variant caller you used to check which data is available in your VCF files.
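
The availability table can also be expressed as a small lookup, which may help when checking a filter list against a given variant caller. This is a sketch only: the caller labels and the function ``available_filters`` are made up for illustration, the filter names are the display names from the table rather than Phenix filter keys, and samtools versions between 1.2 and 1.3 are not covered because the table does not mention them.

```python
# Caller labels are illustrative, chosen to mirror the table's wording.
GATK = "GATK"
MPILEUP_OLD = "samtools mpileup <=1.2"
MPILEUP_NEW = "samtools mpileup >=1.3"
ANY_MPILEUP = {MPILEUP_OLD, MPILEUP_NEW}

# One entry per table row: filter name -> callers that provide the data.
FILTER_SUPPORT = {
    "Quality score": {GATK} | ANY_MPILEUP,
    "AD ratio": {GATK, MPILEUP_NEW},
    "DP4 ratio": {MPILEUP_OLD},
    "MQ score": {GATK} | ANY_MPILEUP,
    "MQ0 ratio": {GATK},
    "MQ0F ratio": set(ANY_MPILEUP),
    "GQ score": {GATK} | ANY_MPILEUP,
    "Minimum depth": {GATK} | ANY_MPILEUP,
    "Uncall GT": {GATK} | ANY_MPILEUP,
}


def available_filters(caller):
    """Return the sorted filter names usable with the given caller label."""
    return sorted(name for name, callers in FILTER_SUPPORT.items() if caller in callers)


if __name__ == "__main__":
    for caller in (GATK, MPILEUP_OLD, MPILEUP_NEW):
        print(caller, "->", ", ".join(available_filters(caller)))
```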
23 changes: 22 additions & 1 deletion docs/Galaxy.rst
@@ -116,7 +116,7 @@ How to use Phenix on Galaxy:

The Phenix workflow is now ready to use. To use it, you need to upload your data to a Galaxy history. There are multiple options, depending on your local Galaxy configuration. If you followed the instructions above under 'Get your own Galaxy', the only available option is uploading from your local hard drive. When doing this, please make sure your fastq files are in the 'fastqsanger' format and your reference genome is in 'fasta' format. The basename of your R1 fastq files will be used as the name for the resulting bam and vcf files, with appropriate file endings, and will also appear as the sequence header in your resulting SNP alignment. Once your data is ready, follow these instructions to run the workflow.

Remark: Processing a single sample with the Phenix workflow can use up to 1.5GB of RAM. It is recommended that you run no more samples at once than your total system memory divided by 1.5 (2 for 4GB, 5 for 8GB, ...) or than you have processor cores, whichever is lower.
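
The rule of thumb in the remark above can be written as a short calculation. This is a minimal sketch; the function name ``max_concurrent_samples`` is hypothetical, and the 1.5 GB figure is simply the per-sample upper bound quoted in the remark.

```python
import math

# Upper bound on memory used by one sample, as quoted in the remark.
RAM_PER_SAMPLE_GB = 1.5


def max_concurrent_samples(total_ram_gb, cpu_cores):
    """Return the lower of the memory-based and core-based limits."""
    by_memory = math.floor(total_ram_gb / RAM_PER_SAMPLE_GB)
    return max(0, min(by_memory, cpu_cores))


if __name__ == "__main__":
    # Matches the examples in the remark: 2 for 4GB, 5 for 8GB.
    print(max_concurrent_samples(4, 8))
    print(max_concurrent_samples(8, 8))
    # With only 4 cores, the core count becomes the limit.
    print(max_concurrent_samples(8, 4))
```
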
.. NOTE:: We have chosen reasonable defaults for filtering your VCF files for high-confidence SNPs (min_depth:5,mq_score:30,qual_score:30,dp4_ratio:0.9,mq0f_ratio:0.1). If you would like to change these settings please see the instructions below.

- Click on workflow in the top main menu and select 'run' from the 'Phenix workflow' context menu.
- Select your reference genome in the 'Reference fasta' selection box.
@@ -132,3 +132,24 @@ Remark: Processing a single sample with the Phenix workflow can use up to 1.5GB
- Select all files ending with filtered.vcf in the top file selection box under 'Input VCF files(s)', by holding down the Ctrl key.
- Click 'Execute'. Once everything is completed, the "VCFs to fasta" tool will have produced your SNP alignment, which can now be used for further processing.

.. topic:: How to change the number of jobs running simultaneously?

   Galaxy runs 5 jobs at the same time by default. This is appropriate for machines that have at least 5 GB of RAM and at least 5 processor cores. If you have more or fewer compute resources at your disposal, you can change the number of concurrent jobs by renaming the file *config/job_conf.xml.sample_basic* to *config/job_conf.xml* and changing the "workers" setting in that file to the desired number. This requires a restart of Galaxy.
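
If you prefer to script the "workers" change rather than edit the file by hand, something along the following lines may work. This is a sketch under assumptions: the XML fragment below is a made-up stand-in and is not copied from Galaxy, so check your own *config/job_conf.xml* for the actual element names and values. The function ``set_workers`` is hypothetical and only rewrites elements named ``plugin`` that already carry a ``workers`` attribute.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for a job configuration file.
# The real file shipped with Galaxy may look different.
SAMPLE_JOB_CONF = """<job_conf>
    <plugins>
        <plugin id="local" type="runner" workers="5"/>
    </plugins>
</job_conf>"""


def set_workers(job_conf_xml, workers):
    """Return the XML text with every plugin 'workers' attribute updated."""
    root = ET.fromstring(job_conf_xml)
    changed = 0
    for plugin in root.iter("plugin"):
        if "workers" in plugin.attrib:
            plugin.set("workers", str(workers))
            changed += 1
    if changed == 0:
        raise ValueError("no plugin element with a 'workers' attribute found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_workers(SAMPLE_JOB_CONF, 2))
```

As with a manual edit, Galaxy needs to be restarted before the new value takes effect.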


Advanced - Changing the Phenix vcf filtering settings:
------------------------------------------------------

- Click on workflow in the top main menu and select 'edit' from the 'Phenix workflow' context menu.
- Click on the 'Filter VCF' tool box to highlight it.
- The parameters for this tool can now be edited in the panel on the right hand side of the browser window.
- Use the little trash can in the top right corner of each individual filter to remove it.
- Use the button labelled '+ Insert SNP filter' to add a new one.
- Use the drop-down menu to select a new filter type and the corresponding text box to set the threshold value.

.. figure:: workflow_editor.png
   :align: center

   The Galaxy workflow editor.

.. WARNING:: Not all filters are suitable for all variant callers. Please refer to the table under :doc:`/Filters`.
Binary file added docs/workflow_editor.png