
1D Functions

oluwatosin oluwadare edited this page Feb 20, 2019 · 45 revisions

Create an index for a reference genome

A. Purpose

To build an index for the reference genome. Indexing the reference genome makes alignment queries fast and can also store the genome data in a compressed form.

B. Input

A reference genome FASTA file (usually with the extension .fa, .mfa, .fna, or similar).

C. Test Data

The human hg19 genome data (hg19.fa) can be downloaded from here: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/hg19_genome/hg19_genome_FASTA/

D. Output

A set of index files, which depends on the tool selected for indexing. BWA outputs five files (NAME.amb, NAME.ann, NAME.bwt, NAME.pac, and NAME.sa), where NAME is a prefix string. Bowtie2 outputs six files (NAME.1.bt2, NAME.2.bt2, NAME.3.bt2, NAME.4.bt2, NAME.rev.1.bt2, and NAME.rev.2.bt2), where NAME is the <bt2_base> prefix.
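A quick way to confirm that indexing finished is to check that every file listed above exists and is non-empty. The Bash function below is a hypothetical helper (`check_index` is not part of GenomeFlow) that takes the tool name and the index prefix NAME.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GenomeFlow): verify that all index files
# listed in the Output section exist and are non-empty for a given prefix.
# Usage: check_index <bwa|bowtie2> <prefix>
check_index() {
  local tool=$1 prefix=$2
  local -a suffixes
  case "$tool" in
    bwa)     suffixes=(amb ann bwt pac sa) ;;
    bowtie2) suffixes=(1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2) ;;
    *)       echo "unknown tool: $tool" >&2; return 2 ;;
  esac
  local missing=0 s
  for s in "${suffixes[@]}"; do
    if [[ ! -s "$prefix.$s" ]]; then
      echo "missing: $prefix.$s" >&2
      missing=1
    fi
  done
  if (( missing )); then
    return 1
  fi
  echo "ok"
}
```

For example, `check_index bwa /path/to/hg19` prints `ok` only when all five bwa files are present; missing files are reported on standard error.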

E. Test Data Output

The hg19 human genome indexes generated by bowtie2 and bwa can be downloaded from here: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/hg19_genome/

F. Running

  • Access the function from the menu toolbar: 1D-Functions/Build index for reference genome

  • Select the appropriate input following the instructions in the table below.

  • Click on the Execute button

    • It will generate a shell script file, Indexer_script.sh
  • Linux/Mac OS User: The indexing operation will start automatically in the background.

  • Cygwin/MinGW User: Manually execute the script

    • Open the Unix terminal
    • Change directory to the directory containing the shell script file. For example:
      • cd path/to/directory
    • Give executable permission to the script file. For example:
      • chmod +x Indexer_script.sh
    • Execute the shell script: For example:
      • ./Indexer_script.sh
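The contents of Indexer_script.sh are not shown on this page. As an assumption, a script matching the GUI choices would call the indexers roughly as below. `bwa index -p` and `bowtie2-build --threads` are real options of those tools, while the function name `build_index_cmd` and the print-only (dry-run) approach are hypothetical; the function prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the indexing command that corresponds to the GUI
# fields (tool, binary file, reference FASTA, output prefix, threads).
# This is NOT the verbatim Indexer_script.sh; paths are assumed to have no spaces.
# Usage: build_index_cmd <bwa|bowtie2> <binary> <fasta> <prefix> [threads]
build_index_cmd() {
  local tool=$1 binary=$2 fasta=$3 prefix=$4 threads=${5:-3}
  case "$tool" in
    bwa)
      # bwa index writes PREFIX.amb/.ann/.bwt/.pac/.sa and has no thread option
      printf '%s index -p %s %s\n' "$binary" "$prefix" "$fasta" ;;
    bowtie2)
      # bowtie2-build writes PREFIX.1.bt2 ... PREFIX.rev.2.bt2
      printf '%s --threads %s %s %s\n' "$binary" "$threads" "$fasta" "$prefix" ;;
    *)
      echo "unknown tool: $tool" >&2; return 2 ;;
  esac
}
```

For example, `build_index_cmd bowtie2 ./bowtie2-build hg19.fa hg19` prints a bowtie2-build command with the default of 3 threads, mirroring the GUI default.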

G. Creating an index for a reference genome GUI

Field | Description | Default
Input Reference Genome file | A reference genome file with the extension .fa, .mfa, .fna, or similar, for example the human genome (GRCh37/hg19). | NA
Output Directory | The output directory where the script is written. | NA
Choose tool to use | Two indexing tools are available: bwa (Burrows-Wheeler Aligner) and Bowtie2. | bwa
Binary file | Browse and select the binary file for the chosen tool. bwa: select the bwa binary you compiled in the bwa-* directory. Bowtie2: select the bowtie2-build indexer in the bowtie2-* directory. | NA
Number of threads | The number of threads (cores) to use. This option is available only for the bowtie2-build indexer; more threads generally reduce processing time. | 3
Execute | This button generates a shell script (.sh), which runs automatically for Linux/Mac OS users. |

Mapping the raw single-end or paired-end FASTQ files

A. Purpose

To align a set of sequencing read files against the reference genome index.

B. Input

One FASTQ file (single-end) or two FASTQ files (paired-end), usually with the extension .fq or .fastq.

C. Test Data

Test datasets can be found here:

  1. MiSeq GM12878 in-situ files: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/MiSeq_GM12878/MiSeq_GM12878_Data/
  2. A karyotypically normal human lymphoblastoid cell line (GM06990) from Aiden et al: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/GM06990_Data/

D. Output

The output is written to a folder named bowtie2_align for bowtie2 or bwa_align for bwa. By default, the output BAM file is named bwa_mapped.bam for bwa and bowtie2_mapped.bam for bowtie2.

E. Test data Output

The bowtie2 and bwa alignment BAM files generated for each test dataset can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

F. Running

  • Access the function from the menu toolbar: 1D-Functions/Map the raw FASTQ files
  • Select the appropriate input following the instructions in the table below.
  • Click on the Execute button
    • It will generate a shell script file.
      • bowtie2 binary: Mapper_script_bowtie2.sh
      • bwa binary: Mapper_script_bwa.sh
  • Linux/Mac OS User: The mapping operation will start automatically in the background.
  • Cygwin/MinGW User: Manually execute the script
    • Open the Unix terminal
    • Change directory to the directory containing the shell script file. For example:
      • cd path/to/directory
    • Give executable permission to the script file. For example:
      • chmod +x Mapper_script_bowtie2.sh
    • Execute the shell script: For example:
      • ./Mapper_script_bowtie2.sh
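The generated mapping scripts are not reproduced on this page. As a sketch under stated assumptions (bwa's `mem` algorithm, an index prefix shared by the index files, and paths without spaces), the pipeline for each tool could look like the output of the hypothetical `map_cmd` function below: the aligner writes SAM to standard output and `samtools view -b` converts it to BAM.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the verbatim Mapper_script_*.sh): print the mapping
# pipeline for the GUI fields. Assumes bwa's "mem" algorithm, paths without
# spaces, and that INDEX is the prefix the index files share.
# Usage: map_cmd <bwa|bowtie2> <binary> <index-prefix> <samtools> <R1> [R2] [threads]
map_cmd() {
  local tool=$1 binary=$2 index=$3 samtools=$4 r1=$5 r2=${6:-} threads=${7:-3}
  local out reads
  case "$tool" in
    bwa)
      out=bwa_align/bwa_mapped.bam
      reads="$r1${r2:+ $r2}"          # one file = single-end, two = paired-end
      printf '%s mem %s %s | %s view -b -o %s -\n' \
        "$binary" "$index" "$reads" "$samtools" "$out" ;;
    bowtie2)
      out=bowtie2_align/bowtie2_mapped.bam
      # bowtie2 takes mates with -1/-2 and single-end reads with -U
      if [[ -n $r2 ]]; then reads="-1 $r1 -2 $r2"; else reads="-U $r1"; fi
      printf '%s -p %s -x %s %s | %s view -b -o %s -\n' \
        "$binary" "$threads" "$index" "$reads" "$samtools" "$out" ;;
    *)
      echo "unknown tool: $tool" >&2; return 2 ;;
  esac
}
```

The printed command can be run in a terminal after creating the output folder (for example `mkdir -p bwa_align`).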

G. Mapping the raw FASTQ files GUI

Field | Description | Default
Index Directory | A path to the index created using bwa or bowtie2. | NA
Output Directory | The output directory where the script is written. | NA
Load Read-1 (.fastq) | The file containing mate 1, or the file for single-end reads, e.g. HIC003_S2_L001_R1_001.fastq | NA
Load Read-2 (.fastq) | The file containing mate 2, e.g. HIC003_S2_L001_R2_001.fastq | NA
Is Pair-End Read | Check this box if the data are paired-end reads. | unchecked
Choose tool to use | Two mapping tools are available: bwa and Bowtie2. Important: select the tool that was used to generate the reference genome index. bwa can only map against a bwa index, and bowtie2 can only map against a bowtie2 index. | bwa
Binary file | Browse and select the binary file for the chosen tool. bwa: select the bwa binary you compiled in the bwa-* directory. Bowtie2: select the bowtie2 aligner binary in the bowtie2-* directory. | NA
Number of threads | The number of threads (cores) to use. This option is available only for Bowtie2; more threads generally reduce processing time. | 3
Samtools binary file | SAMtools is a collection of tools for manipulating and analyzing SAM and BAM alignment files. This tool allows you to get alignments in SAM format. Browse and select the samtools binary file in the samtools-* directory. | NA
Execute | This button generates a shell script (.sh), which runs automatically for Linux/Mac OS users. |

Filter a BAM alignment file

A. Purpose

To filter a BAM file, removing unmapped reads and low-quality mapped reads, among others.

B. Input

The BAM file generated by the mapping step above. By default, the bwa BAM file is named bwa_mapped.bam and the bowtie2 BAM file is named bowtie2_mapped.bam.

C. Test Data

The bowtie2 alignment BAM files generated for each test dataset can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

D. Output

A BAM file (.bam) named bowtie2_mapped.filtered.bam for bowtie2 or bwa_mapped.filtered.bam for bwa.

E. Test Data Output

The filtered bowtie2 and bwa alignment BAM files generated for the MiSeq GM12878 in-situ and GM06990 cell line test datasets can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

F. Running

  • Access the function from the menu toolbar: 1D-Functions/Filter a BAM alignment file
  • Select the appropriate input following the instructions in the table below.
  • Click on the Execute button
    • It will generate a shell script file, Filter_script_samtools.sh
  • Linux/Mac OS User: The filtering operation will start automatically in the background.
  • Cygwin/MinGW User: Manually execute the script
    • Open the Unix terminal
    • Change directory to the directory containing the shell script file. For example:
      • cd path/to/directory
    • Give executable permission to the script file. For example:
      • chmod +x Filter_script_samtools.sh
    • Execute the shell script: For example:
      • ./Filter_script_samtools.sh
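Filter_script_samtools.sh itself is not shown here. Assuming it applies the two GUI settings with `samtools view`, where `-F` and `-q` are the real options named in the table below, the core command might be printed by a hypothetical helper like `filter_cmd` (paths assumed to contain no spaces):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a samtools filtering command using the GUI
# defaults (-F 0x4 drops unmapped reads, -q 1 drops MAPQ 0 reads).
# The output name follows the documented pattern NAME.filtered.bam.
# Usage: filter_cmd <samtools> <input.bam> [flag] [min_mapq]
filter_cmd() {
  local samtools=$1 in_bam=$2 flag=${3:-0x4} mapq=${4:-1}
  local out_bam="${in_bam%.bam}.filtered.bam"
  printf '%s view -b -F %s -q %s -o %s %s\n' \
    "$samtools" "$flag" "$mapq" "$out_bam" "$in_bam"
}
```

For example, `filter_cmd samtools bwa_align/bwa_mapped.bam` prints a command that writes bwa_align/bwa_mapped.filtered.bam.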

G. Filter a BAM alignment file GUI

Field | Description | Default
Created .bam file | Select the BAM file generated using either bwa or Bowtie2: bwa_mapped.bam for bwa, or bowtie2_mapped.bam for bowtie2. | NA
Output Directory | The output directory where the script is written. | NA
Samtools binary file | SAMtools is a collection of tools for manipulating and analyzing SAM and BAM alignment files. This tool allows you to get alignments in SAM format. Browse and select the samtools binary file in the samtools-* directory. | NA
Samtools Flag (-F) | samtools excludes reads that have any of the given FLAG bits set. The flags are specified on page 5 of the SAM format specification; the default 0x4 removes unmapped reads. | 0x4
Samtools MAPQ (-q) | An integer INT: alignments with MAPQ smaller than INT are skipped. The lowest score is a mapping quality of zero, or mq0 for short. Such reads map to multiple places on the genome, so we cannot be sure where they originated; removing these low-quality reads improves the quality of the data. Generally, we keep reads with MAPQ >= 1, which is what the default of 1 does. | 1
Execute | This button generates a shell script (.sh), which runs automatically for Linux/Mac OS users. The script contains the basic parameters required by each tool for filtering. |
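The `-F` test is bitwise: a read is dropped when its FLAG shares any bit with the mask, and `-q` then drops reads whose MAPQ is below the threshold. The hypothetical Bash function `passes_filter` below mimics that rule for a single FLAG/MAPQ pair, which may help when choosing other flag values from the SAM specification.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the samtools view -F/-q keep/drop rule for one read.
# Returns 0 (kept) or 1 (dropped). FLAG is the decimal SAM FLAG field.
# Usage: passes_filter <flag> <mapq> [mask] [min_mapq]
passes_filter() {
  local flag=$1 mapq=$2 mask=${3:-0x4} min_mapq=${4:-1}
  # -F: drop if any bit of the mask is set in FLAG (0x4 = read unmapped)
  if (( flag & mask )); then
    return 1
  fi
  # -q: drop if MAPQ is smaller than the threshold (mq0 reads fail -q 1)
  if (( mapq < min_mapq )); then
    return 1
  fi
  return 0
}
```

For instance, FLAG 77 (paired, unmapped, mate unmapped, first in pair) contains bit 0x4 and is dropped, while FLAG 99 with MAPQ 60 is kept.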

Convert a BAM file to a HiC input file format

A. Purpose

To generate a Hi-C input file in the medium format: a text file describing mapped Hi-C reads that can be used as input to create a .hic file. A .hic file is a binary file containing contact matrices at different resolutions, normalized by different methods.

B. Input

A filtered BAM alignment file, e.g. bwa_mapped.filtered.bam

C. Test Data

The filtered bowtie2 and bwa alignment BAM files generated for the MiSeq GM12878 in-situ and GM06990 cell line test datasets can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

D. Output

A text file in the medium format, with 11 columns, that can be used to create a .hic file. This file format is explained in detail here.
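To our understanding of the Juicer Tools `pre` documentation, each medium-format line holds, separated by whitespace: readname, str1, chr1, pos1, frag1, str2, chr2, pos2, frag2, mapq1, mapq2. GenomeFlow's own output remains the reference. A quick sanity check that every line has 11 columns could look like this hypothetical `check_medium` function:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines that do not have exactly 11
# whitespace-separated columns; exit status 1 if any are found.
# Usage: check_medium <medium_file>
check_medium() {
  awk 'NF != 11 { print "line " NR ": " NF " columns"; bad = 1 }
       END { if (bad) exit 1; print NR " lines ok" }' "$1"
}
```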

E. Test Data Output

The generated medium-format files for the input test datasets can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

F. Running

  • Access the function from the menu toolbar: 1D-Functions/Convert a BAM file to a HiC input file format
  • Select the appropriate input following the instructions in the table below.
  • Click on the Execute button
    • It will generate a shell script file, Format_script_samtools.sh
  • Linux/Mac OS User: The formatting operation will start automatically in the background.
  • Cygwin/MinGW User: Manually execute the script
    • Open the Unix terminal
    • Change directory to the directory containing the shell script file. For example:
      • cd path/to/directory
    • Give executable permission to the script file. For example:
      • chmod +x Format_script_samtools.sh
    • Execute the shell script. For example:
      • ./Format_script_samtools.sh
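This page does not show how Format_script_samtools.sh builds the medium file. A minimal sketch, assuming the Juicer-style column order and that the two mates of each pair appear on consecutive SAM lines, is the hypothetical awk filter `sam_to_medium` below. It codes strand as 0 (forward) or 16 (reverse) from FLAG bit 0x10 and writes dummy frag values 0 and 1 because no restriction-site information is used; both choices are assumptions, not GenomeFlow's documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: convert SAM text (no header) into medium format.
# Assumes both mates of a pair are adjacent (e.g. after samtools sort -n) and
# one record per mate; a read whose mate is missing is skipped.
sam_to_medium() {
  awk 'BEGIN { OFS = " " }
       {
         # SAM columns used: 1 QNAME, 2 FLAG, 3 RNAME, 4 POS, 5 MAPQ
         str = (int($2 / 16) % 2) ? 16 : 0   # FLAG bit 0x10 = reverse strand
         if (have && $1 == name) {
           # frag1/frag2 are dummy values (0 and 1): no restriction-site data
           print name, str1, chr1, pos1, 0, str, $3, $4, 1, mapq1, $5
           have = 0
         } else {
           name = $1; str1 = str; chr1 = $3; pos1 = $4; mapq1 = $5; have = 1
         }
       }'
}
```

A possible usage, with hypothetical file names: `samtools sort -n -o byname.bam bwa_mapped.filtered.bam && samtools view -F 0x904 byname.bam | sam_to_medium > merged.txt`. The `-F 0x904` mask additionally removes secondary (0x100) and supplementary (0x800) records so that each mate contributes one line.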

G. Convert to HiC Input File Format GUI

Field | Description | Default
Created .bam file | Select the BAM file generated by the filtering step. By default, the bwa filtered BAM file is named bwa_mapped.filtered.bam and the bowtie2 filtered BAM file is named bowtie2_mapped.filtered.bam. | NA
Output Directory | The output directory where the script is written. | NA
Samtools binary file | SAMtools is a collection of tools for manipulating and analyzing SAM and BAM alignment files. This tool allows you to get alignments in SAM format. Browse and select the samtools binary file in the samtools-* directory. | NA
Generate Scripts | This button generates a shell script (.sh), which runs automatically for Linux/Mac OS users. |

HiC-Express

A. Purpose

To generate a Hi-C input file in the medium format (a text file describing mapped Hi-C reads that can be used as input to create a .hic file) directly from the raw FASTQ files of a Hi-C experiment.

B. Input

One FASTQ file (single-end) or two FASTQ files (paired-end), usually with the extension .fq or .fastq.

C. Test Data

Test datasets can be found here:

  1. MiSeq GM12878 in-situ files: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/MiSeq_GM12878/MiSeq_GM12878_Data/
  2. A karyotypically normal human lymphoblastoid cell line (GM06990) from Aiden et al: http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/GM06990_Data/

D. Output

A text file in the medium format, with 11 columns, that can be used to create a .hic file.

E. Test Data Output

The generated medium-format files for the input test datasets can be downloaded from the links below:

  1. MiSeq GM12878 in-situ files:
  2. GM06990 Cell line:

F. Running

  • Access the function from the menu toolbar: 1D-Functions/HiC-Express
  • Select the appropriate input following the instructions in the table below.
  • Click on the Execute button
    • It will generate a shell script file, HiC-Express.sh
  • Linux/Mac OS User: The HiC-Express operation will start automatically in the background.
  • Cygwin/MinGW User: Manually execute the script
    • Open the Unix terminal
    • Change directory to the directory containing the shell script file. For example:
      • cd path/to/directory
    • Give executable permission to the script file. For example:
      • chmod +x HiC-Express.sh
    • Execute the shell script. For example:
      • ./HiC-Express.sh
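HiC-Express combines the mapping, filtering, and conversion functions described above. The hypothetical `hic_express_plan` function below only prints the assumed order of those stages and the intermediate file each one hands to the next; the intermediate names are borrowed from the Mapping and Filtering sections and may differ from what HiC-Express.sh actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the stage order HiC-Express is assumed to follow.
# Intermediate names follow the Mapping and Filtering sections (assumption).
# Usage: hic_express_plan <bwa|bowtie2> <index> <R1> [R2] [flag] [min_mapq]
hic_express_plan() {
  local tool=$1 index=$2 r1=$3 r2=${4:-} flag=${5:-0x4} mapq=${6:-1}
  case "$tool" in
    bwa|bowtie2) ;;
    *) echo "unknown tool: $tool" >&2; return 2 ;;
  esac
  local bam="${tool}_align/${tool}_mapped.bam"
  local filtered="${bam%.bam}.filtered.bam"
  echo "1. map: $tool, index $index, reads $r1${r2:+ $r2} -> $bam"
  echo "2. filter: samtools view -b -F $flag -q $mapq -o $filtered $bam"
  echo "3. convert: $filtered -> medium format (11 columns)"
}
```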

G. HiC-Express GUI

Field | Description | Default
Created Index Directory | A path to the index created using bwa or bowtie2. | NA
Output Directory | The output directory where the script is written. | NA
Load Read-1 (.fastq) | The file containing mate 1, or the file for single-end reads, e.g. HIC003_S2_L001_R1_001.fastq | NA
Load Read-2 (.fastq) | The file containing mate 2, e.g. HIC003_S2_L001_R2_001.fastq | NA
Is Pair-End Read | Check this box if the data are paired-end reads. | unchecked
Choose tool to use | Two mapping tools are available: bwa and Bowtie2. Important: select the tool that was used to create the reference genome index. bwa can only map against a bwa index, and bowtie2 can only map against a bowtie2 index. | bwa
Binary file | Browse and select the binary file for the chosen tool. bwa: select the bwa binary you compiled in the bwa-* directory. Bowtie2: select the bowtie2 aligner binary in the bowtie2-* directory. | NA
Number of threads | The number of threads (cores) to use. This option is available only for Bowtie2; more threads generally reduce processing time. | 3
Samtools binary file | SAMtools is a collection of tools for manipulating and analyzing SAM and BAM alignment files. This tool allows you to get alignments in SAM format. Browse and select the samtools binary file. | NA
Samtools Flag (-F) | samtools excludes reads that have any of the given FLAG bits set. The flags are specified on page 5 of the SAM format specification; the default 0x4 removes unmapped reads. | 0x4
Samtools MAPQ (-q) | An integer INT: alignments with MAPQ smaller than INT are skipped. The lowest score is a mapping quality of zero, or mq0 for short. Such reads map to multiple places on the genome, so we cannot be sure where they originated; removing these low-quality reads improves the quality of the data. Generally, we keep reads with MAPQ >= 1, which is what the default of 1 does. | 1
Generate Scripts | This button generates a shell script (.sh), which runs automatically for Linux/Mac OS users. |