Re Dev : To Do #20

Open
11 of 35 tasks
snewhouse opened this issue May 8, 2016 · 13 comments
Comments

snewhouse commented May 8, 2016

New Branch in Git repo

  • make a new branch

f1000_dev
on image
/home/ubuntu/scratch/ngseasy
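
A minimal sketch of the branch step, assuming the clone already exists at the path above. The helper name `make_dev_branch` is hypothetical and only wraps plain `git checkout -b`.

```shell
#!/usr/bin/env bash
# Sketch only: create and switch to a new branch in an existing clone.
make_dev_branch() {
    local repo_dir="$1" branch="$2"
    # -C runs git inside repo_dir without changing the current directory.
    git -C "$repo_dir" checkout -b "$branch"
}

# Example call (path and branch name taken from the notes above):
# make_dev_branch /home/ubuntu/scratch/ngseasy f1000_dev
```

The call fails if the branch already exists, which is probably what you want on a shared image.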

Openstack VM

  • space
  • send key to amos
  • 30+ CPU
  • max RAM
  • Volume : 4TB

Images

  • build images
  • build tool set
  • build one image with all tools

Get Genomes

  • hg19.fasta
  • hs37d5.fasta
  • GRCh38.p7.fasta
  • hs38DH.fasta
  • gatk resources bundles
17.05.2016
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz
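
A rough sketch of how the download could be scripted, assuming `wget` and `gunzip` are on the image and the FTP URL above still resolves. The function names and the destination directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: fetch a gzipped FASTA once and decompress it.

genome_fasta_name() {
    # Local file name for a gzipped FASTA URL: drop the directory part and ".gz".
    basename "$1" .gz
}

fetch_genome() {
    # Download url into dest_dir and decompress it; skip if already present.
    local url="$1" dest_dir="$2" fasta
    fasta="$dest_dir/$(genome_fasta_name "$url")"
    if [ -s "$fasta" ]; then
        echo "already have $fasta" >&2
        return 0
    fi
    mkdir -p "$dest_dir"
    wget -q -O "$fasta.gz" "$url" && gunzip -f "$fasta.gz"
}

# Example call (destination directory is an assumption):
# fetch_genome "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz" ./genomes
```

Skipping files that are already present keeps reruns cheap on a multi-gigabyte reference.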

Get test data

  • small 30-150x data set

Index Genomes

  • bwa
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • snap
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • novoalign
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • bowtie2
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta

bwa

├── hs37d5.fasta
├── hs37d5.fasta.amb
├── hs37d5.fasta.ann
├── hs37d5.fasta.bwt
├── hs37d5.fasta.pac
├── hs37d5.fasta.sa
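
A small sketch for the bwa case only, assuming `bwa` is on the PATH. It checks for the five index files listed above before running `bwa index`; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: index a reference with bwa unless the index is already complete.

bwa_index_complete() {
    # Succeed only if every bwa index file exists and is non-empty.
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -s "$ref.$ext" ] || return 1
    done
}

index_with_bwa() {
    local ref="$1"
    if bwa_index_complete "$ref"; then
        echo "bwa index already present for $ref" >&2
        return 0
    fi
    bwa index "$ref"
}

# Example call:
# index_with_bwa hs37d5.fasta
```

The same check-then-build pattern would apply to the snap, novoalign and bowtie2 indexes, with their own file lists.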

PLAN BY MONDAY 23rd

giab_data_indexes

https://github.com/genome-in-a-bottle/giab_data_indexes

Test Data

  • 30x Exome
  • 150x Exome
  • 1x WGS at 30x min. (source a better WGS data set, as the X10 data is poor and messy)

GATK Gold Standard Run

  • run bwa-realign-bqsr-haplotypecaller on all 3 data sets

This is the "Gold Standard". This will take a week if there are no bugs.

The Glue

Open :-

  1. BASH done better than before
  • logging
  • read a user supplied config file (spreadsheet like)
  • user specifies the pipeline
  • SJN TO ADD CONFIG PARAMETER LIST
  • consider converting to .yaml behind the scenes
  • self checks : does the input exist? If so, move on
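
One possible shape for the glue, as a sketch only. It assumes a tab-separated config with three columns (sample, fastq1, fastq2); that column layout and the function names are placeholders until the real config parameter list is added.

```shell
#!/usr/bin/env bash
# Sketch only: logging, config reading and input self checks.
# The column layout (sample, fastq1, fastq2) is an assumption.

log() {
    # Timestamped message to stderr: log LEVEL MESSAGE...
    local level="$1"; shift
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >&2
}

input_exists() {
    # Self check: succeed only if the file exists and is non-empty.
    [ -s "$1" ]
}

check_config() {
    # Read the config line by line, skipping blank lines and '#' comments.
    # Returns non-zero if any listed input file is missing.
    local config="$1" missing=0 sample fq1 fq2 f
    while IFS=$'\t' read -r sample fq1 fq2 || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue
        case "$sample" in '#'*) continue ;; esac
        for f in "$fq1" "$fq2"; do
            if input_exists "$f"; then
                log INFO "$sample: found $f"
            else
                log ERROR "$sample: missing input $f"
                missing=1
            fi
        done
    done < "$config"
    return "$missing"
}

# Example call (file name is an assumption):
# check_config samples.tsv && log INFO "all inputs present, moving on"
```

Logging to stderr keeps stdout free for piping data between pipeline steps.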

RECON BY MONDAY NEXT WEEK

@snewhouse

cloned into /home/ubuntu/scratch/ngseasy

snewhouse commented May 17, 2016

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/README.txt

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz

snewhouse commented May 17, 2016

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

the readme

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/README.20150309.GRCh38_full_analysis_set_plus_decoy_hla

@snewhouse

SJN dev in /mnt/data1/scratch/ngseasy

@snewhouse

git lfs : https://git-lfs.github.com/

@snewhouse

cp -v GRCh38_full_analysis_set_plus_decoy_hla.fa GRCh38dH.fasta
'GRCh38_full_analysis_set_plus_decoy_hla.fa' -> 'GRCh38dH.fasta'
