Skip to content

Generates synthetic cancer genomes with different contamination level and intra-tumor heterogeneity and devoid of any synthetic element

License

Notifications You must be signed in to change notification settings

rsemeraro/XomeBlender

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Xome-Blender 3.1

  • New version for Python3
  • New CNV engine
  • Long reads support
  • Dockerized

Xome-Blender is a collection of python and R scripts based on SAMtools functions that allows to generate synthetic cancer genomes with user defined features such as the number of subclones, the number of somatic variants and the presence of CNV, without the addition of any synthetic element. It is composed of two modules: InXalizer and Xome-Blender. The first module is devoted to the blending process initialization. It takes as input a single BAM file, a set of user-defined parameters and returns the coverage of the sample and the input-files for the second module (Xome-Blender). Optionally, it creates a file containing the coordinates to insert CNV in the final product. The second module generates the synthetic heterogeneous sample.

Supported on Linux.

Requirements

Installation

  • Clone the Xome-Blender repository

      git clone git://github.com/rsemeraro/Xome-Blender.git
    
  • Compile

    The project can be compiled by calling setup.py in the top-level directory:

      python3 setup.py install
    

Docker image

  • Download

      docker pull r08d0ck/xomeblender_docker:3.1
    
  • Run image

     docker run -v /your/input_data/folder:/mnt/mydata -it --name XB3 r08d0ck/xomeblender_docker:3.1
    

Usage

First run InXalizer. It requires a BAM file, a label for it and a reference genome. By means of four parameters it is possible to tune the initialization process:

  1. Subclone number = the number of subclones that will compose the final product (-scn).
  2. Variants number = the number of somatic variants that will appear in the final product (-vn).
  3. Subclonal architecture = the evolution model for the sample synthesis, it can be Linear or Branched (-sa).
  4. CNV = it's an option that allows for the generation of a CNV file, defining their number and length (-cnv).

Optionally, it is possible to use a target file, in bed format, to edit only defined portions of the BAM file (whole-exome or target sequencing experiments).

Examples:

  • Running InXalizer with minimum requirements:

      python3 inxalizer -i file.bam -la my_label -r MyRef.fasta -scn 2 -vn 50 -sa Branched
    
  • Running InXalizer with target file:

      python3 inxalizer -i file.bam -la my_label -r MyRef.fasta -scn 2 -vn 50 -sa Branched -b MyTargetfile.bed
    
  • Running InXalizer with CNVs:

      python3 inxalizer -i file.bam -la my_label -r MyRef.fasta -scn 2 -vn 50 -sa Branched -cnv 3 1000000
    
    • Custom CNV

      With Xome Blender 3.1 you can customize the CNV events. The main function of InXalizer, which creates events basing on the number and the size defined by the user, is preserved but now you can pass to Xome Blender a hand made CNV file. The CNV file is made of 7 fields: chr, start, end, event (Del-Dup), copy number, subclones, reference.

         chr4    64680053    65680053    Del 2   1000_variants_S1    /ext-HD2/GRCh38.fa
         chr13   55771838    56771838    Del 2   1000_variants_S1    /ext-HD2/GRCh38.fa
         chr14   64913211    65913211    Dup 2   1000_variants_S1    /ext-HD2/GRCh38.fa
         chr10   24042890    25042890    Dup 1   1000_variants_S2    /ext-HD2/GRCh38.fa  
         chr17   8389846     9389846     Dup 4   1000_variants_S1-1000_variants_S2   /ext-HD2/GRCh38.fa
         chr2    100120027   101120027   Del 2   1000_variants_S1-1000_variants_S2   /ext-HD2/GRCh38.fa
         chr8    139185592   140185592   Del 2   1000_variants_S2    /ext-HD2/GRCh38.fa
         chr8    128351385   129351385   Del 1   1000_variants_S2    /ext-HD2/GRCh38.fa
         chr9    119828043   120828043   Dup 2   1000_variants_S2    /ext-HD2/GRCh38.fa
      

      For duplications, the maximum copy number is 5, on the other hand, for deletions you can set a single deletion (1) or a double deletion (2).

    InXalizer can be run to generate CNV files only by using the "CNV list" function, invoked as -l, and a reference sequence.

    python3 inxalizer -r ref.fa -l cnv_list.txt
    

    Each line of the cnv_list file must contain a label for the cnv file to be generated, the number and the size of CNV events and, optionally, the path to a target file.

    List_file example:
    My_label_1	23 10000000
    My_label_2    3 100000000 path/to/target.bed
    

The data generated by InXalizer are then used to proceed with the blending process, performed by Xome-Blender. The main module takes as input the BAM files produced by InXalizer, the sample's label (the same used in InXalizer), the starting coverage (-sc) (stored into the .cov file), the percentage (-p) of each subclone in order to create the mixed sample, the desired coverage (-fc) of the output file and the subclone variant file (-v) generated in the previous step. If provided, it can use a cnv file generated with InXalizer to add CNV events to the final product.
Xome-Blender can be run in "single mode" by typing all the options in a shell, as in the examples below.

Examples:

  • Running Xome-Blender with minimum requirements:

      python3 xome_blender -i Control.bam Subclone1.bam Subclone2.bam -la my_label -sc 127 -p 30 40 30 -fc 110 -v Subclone1.vcf Subclone2.vcf
    
  • Running Xome-Blender with CNVs:

      python3 xome_blender -i Control.bam Subclone1.bam Subclone2.bam -la my_label -sc 127 -p 30 40 30 -fc 110 -v Subclone1.vcf Subclone2.vcf -cnv my_label_CNV.txt
    
    *Multiple elements must be space separated.

Alternatively, it can be run in "automated mode" by using the --list (-l) option. The activation of this parameter requires a list file, containing the info to run multiple consecutive analyses.

  python3 xome_blender -l list_file.txt

The list_file is a tab separated file containing different anlayses (one per line). Each row must contain all the parameters above.

List_file example:
  NA18501_Control.bam NA18501_Subclone1.bam	NA18501	145	20 80   120 Subclone1.vcf
  NA18501_Control.bam NA18501_Subclone1.bam	NA18501	145	30 70	90  Subclone1.vcf
  NA18501_Control.bam NA18501_Subclone1.bam	NA18501	145	40 60	140 Subclone1.vcf   my_label_CNV.txt
  NA18501_Control.bam NA18501_Subclone1.bam	NA18501	145	50 50	50  Subclone1.vcf   my_label_CNV.txt

Contacts

This program has been developed by Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence

About

Generates synthetic cancer genomes with different contamination level and intra-tumor heterogeneity and devoid of any synthetic element

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published