Genome Additions Master Ticket #242

Open
8 of 19 tasks
jennaj opened this issue Jan 11, 2016 · 29 comments

@jennaj
Member

jennaj commented Jan 11, 2016

Genome and indexes for CVMFS and http://usegalaxy.org

CONVERT THIS ISSUE TO A PROJECT @jennaj

This list changes over time as new data sources are targeted for indexing and user requests are considered. See posts below for genome batches completed and in progress.

Current plans are to bring http://usegalaxy.org up to date with UCSC's released genomes, indexed for all tools, so those do not need to be requested by users at this time.

Main:

Other reference data:

Admin/Local Data and DM usage enhancements are included here.

Resolved data issues:

New genomes and indexes will be installed at https://test.galaxyproject.org/ first for testing. If your genome is listed and checked as complete, community testing and feedback can be posted to https://biostar.usegalaxy.org or through a bug report from the error dataset (from a mapping tool, etc).

All data will later be promoted to http://usegalaxy.org. Timeline is not firm.

Making a Reference genome request

  • Create reply below
  • Be specific
    • Name of organism, including common "key" used
    • Exact source. Ex: UCSC (dbkey), NCBI project ID, other URL
    • Build details: include mito, chloro, plasmids, etc.
    • All indexes are now generated for brand new genomes by default. Or pick one of: "All" "Bowtie2" "MyFavTool" if your genome is at http://usegalaxy.org, but not available in your tool of interest
    • Don't forget anyone can add and use a custom genome right now with most tools - no waiting! https://wiki.galaxyproject.org/Learn/CustomGenomes
@jennaj jennaj self-assigned this Jan 11, 2016
@jennaj
Member Author

jennaj commented Jan 12, 2016

For reference,

Master spreadsheet of dbkeys and indexes done and to-do. Older genomes removed.
https://docs.google.com/spreadsheets/d/1jtDC-2STroUINP6KVrfhZwGQgpP5y-HhkRMONZtD1W4/edit?usp=sharing

dbkeys with fasta loaded.
https://gist.github.com/jennaj/aeb8d6af4e4722a89f62d15af8ce3452

@jennaj
Member Author

jennaj commented Jan 14, 2016

Issues detected

Meets goal of consistency in nomenclature, permitting DMs to function

change genome label in all_fasta (kill "full" in all descriptions/dbkeys). Other locs may need mods.

  • ci2full to ci2 (dbkey & description)
  • cb3full to cb3 (dbkey & description)
  • "panTro3 Full" to "panTro3" (description only, no "Canonical" exists)
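
The renames above could be applied to a tab-separated loc file along these lines. This is only a minimal sketch, assuming the four-column all_fasta.loc layout (unique ID, dbkey, display name, path). The helper name `strip_full_suffix` and the sample row in the usage note are hypothetical, and other loc files may need different handling.

```python
import re

# Renames taken from the list above; any other key is left untouched.
DBKEY_RENAMES = {"ci2full": "ci2", "cb3full": "cb3"}


def strip_full_suffix(line: str) -> str:
    """Rewrite one all_fasta.loc line (assumed columns: id, dbkey, name, path)."""
    if not line.strip() or line.startswith("#"):
        return line  # keep comments and blank lines as they are
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return line  # not a data row in the assumed layout
    uid, dbkey, name, path = fields[:4]
    uid = DBKEY_RENAMES.get(uid, uid)
    dbkey = DBKEY_RENAMES.get(dbkey, dbkey)
    # Update the description: swap old keys, then drop a trailing "Full" label.
    for old, new in DBKEY_RENAMES.items():
        name = name.replace(old, new)
    name = re.sub(r"\s+Full$", "", name)
    # The path is deliberately not changed: files on disk have not moved.
    return "\t".join([uid, dbkey, name, path] + fields[4:]) + "\n"
```

For example, a row such as `ci2full<TAB>ci2full<TAB>C. intestinalis (ci2full)<TAB>/data/ci2/seq/ci2full.fa` would come back with `ci2` in the first three columns and the path unchanged.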

@jennaj
Member Author

jennaj commented Mar 10, 2016

Genomes that need followup:

Not at http://genome.ucsc.edu (browser or download). Are at http://genome-test.cse.ucsc.edu/. Pending release?

  • rheMac7 Rhesus (in builds list, but that does not contain genome-test anymore. odd)
  • rheMac8 Rhesus (not in builds list, can be captured next update)

@jennaj
Member Author

jennaj commented Apr 11, 2016

Completed and indexes promoted to http://usegalaxy.org (Galaxy Main) April 2016

Fasta

New genomes (confirmed, to be indexed for all)

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish (not in builds list: dbkey was created but did not populate to http://usegalaxy.org. Impact: all data, including indexes, does not populate on mapping tool forms, etc. See "Add latestest UCSC genomes to builds.txt plus .len data to http://usegalaxy.org", galaxy#2530)
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi (not in builds list: dbkey was created. Same issue as danRer10.)
  • felCat8 Cat

2bit

Note: Lastz indexes created by same DM

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • felCat8 Cat

Existing

  • melUnd1 Budgerigar
  • bosTau7 Cow

Sam

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • felCat8 Cat

Existing

Picard

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • felCat8 Cat

Existing

Bowtie2/Tophat2

Issue about Bowtie2 DM creating duplicate indexes: galaxyproject/tools-devteam#319

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • felCat8 Cat

Existing

  • galGal3 Chicken Full & Canonical
  • galGal4 Chicken
  • melGal1 Turkey
  • equCab1 Equus caballus
  • equCab2 Equus caballus
  • loxAfr1 African Elephant (duplicate in Bowtie2 - DM does not allow Tophat2 only creation - ticket)
  • loxAfr3 African Elephant
  • sacCer2 S. cerevisiae
  • sacCer3 S. cerevisiae
  • Schizosaccharomyces_pombe_1.1
  • rheMac2 Rhesus
  • rheMac3 Rhesus
  • eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
  • melUnd1 Budgerigar

BWA/BWA-MEM

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi (built using BWT-SW)
  • felCat8 Cat

Existing

  • equCab1 Equus caballus
  • equCab2 Equus caballus
  • sacCer2 S. cerevisiae
  • sacCer3 S. cerevisiae (created dup, needs cleanup)
  • Schizosaccharomyces_pombe_1.1
  • rheMac2 Rhesus
  • rheMac3 Rhesus
  • galGal3 Chicken (full) (built using BWT-SW)
  • galGal3 Canonical
  • galGal4 Chicken
  • melGal1 Turkey
  • loxAfr1 African Elephant
  • loxAfr3 African Elephant
  • bosTauMd3 Cow
  • ce9 C. elegans
  • susScr2 Pig
  • canFam2 Dog
  • canFam3 Dog
  • eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
  • papHam1 Baboon
  • melUnd1 Budgerigar (built using BWT-SW)
  • otoGar1 Bushbaby
  • otoGar3 Bushbaby
  • felCat5 Cat
  • panTro3 Chimpanzee Full & Canonical
  • panTro4 Chimpanzee
  • turTru2 Dolphin

HISAT2

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • felCat8 Cat

Existing

  • hg38
  • hg38canon
  • hg38female
  • hg19
  • hg19canon
  • hg19female
  • hg19_rCRS_pUC18_phiX174
  • hg_g1k_v37 1000Genomes
  • mm10
  • mm9
  • dm3
  • equCab1 Equus caballus
  • equCab2 Equus caballus
  • sacCer2 S. cerevisiae
  • sacCer3 S. cerevisiae
  • Schizosaccharomyces_pombe_1.1
  • rheMac2 Rhesus
  • rheMac3 Rhesus
  • galGal3 Chicken Full & Canonical
  • galGal4 Chicken
  • melGal1 Turkey
  • loxAfr1 African Elephant
  • loxAfr3 African Elephant
  • ce9 C. elegans
  • ce10 C. elegans
  • susScr2 Pig
  • susScr3 Pig
  • bosTauMd3 Cow
  • bosTau7 Cow
  • canFam2 Dog
  • canFam3 Dog
  • eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
  • papHam1 Baboon
  • melUnd1 Budgerigar
  • otoGar1 Bushbaby
  • otoGar3 Bushbaby
  • felCat5 Cat
  • panTro3 Chimpanzee Full & Canonical
  • panTro4 Chimpanzee
  • turTru2 Dolphin

Liftover

See distinct tracking checklist, below

@jennaj
Member Author

jennaj commented Apr 11, 2016

2018

Fasta

New genomes (confirmed, to be indexed for all)

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

2bit

Note: Lastz indexes created by same DM

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • Arabidopsis_thaliana_TAIR10 (check if exists)
  • Add here

Sam

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • Add here

Picard

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • Add here

Bowtie2/Tophat2

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • danRer9 Zebrafish (double check if needed)

BWA/BWA-MEM

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • danRer9 Zebrafish
  • danRer10 Zebrafish

HISAT2

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • danRer9 Zebrafish
  • danRer10 Zebrafish

Liftover

See distinct tracking checklist, below

RNA STAR

Fast tracked genomes https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-307517254

New genomes

  • Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
  • hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
  • anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
  • ce11 C. elegans Feb. 2013 (WBcel235/ce11)
  • criGri1 Chinese hamster
  • fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
  • galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
  • latCha1 Coelacanth
  • oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
  • rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
  • susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

  • danRer9 Zebrafish
  • danRer10 Zebrafish

@jennaj
Member Author

jennaj commented Apr 11, 2016

New genomes under review (source/licence)

@jennaj
Member Author

jennaj commented Apr 25, 2016

Liftover

Needs DM: galaxyproject/galaxy#1904

Workaround: Use the LiftOver tool at UCSC (the source for the wrapped version in Galaxy) and upload the results to Galaxy to use with other analysis. http://genome.ucsc.edu/cgi-bin/hgLiftOver

New genomes

  • rn6 Rat
  • dm6 Fruit Fly
  • musFur1 Ferret
  • cerSim1 White Rhino
  • nomLeu3 Gibbon
  • danRer10 Zebrafish
  • danRer9 Zebrafish (update)
  • bosTau8 Cow
  • papAnu2 Baboon
  • vicPac1 Alpaca
  • vicPac2 Alpaca
  • allMis1 American alligator
  • dasNov3 Armadillo
  • gadMor1 Atlantic cod
  • panPan1 Bonobo
  • aptMan1 Brown Kiwi
  • melUnd1 Budgerigar (only had .fa, sam, picard originally - odd)
  • felCat8 Cat
  • criGri1 Chinese hamster
  • latCha1 Coelacanth

Existing (update)

  • Probably all - need automated retrieval of new, ignore existing
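
The "retrieve new, ignore existing" step could be sketched as a simple set difference. This is only an illustration: it assumes UCSC-style chain file names of the form `<from>To<To>.over.chain.gz`, and the function name `chains_to_fetch` and the sample names in the usage note are hypothetical. It does no downloading.

```python
def chains_to_fetch(available, installed, dbkeys):
    """Return chain files that are available upstream, not yet installed,
    and whose source build is one of the requested dbkeys.

    Assumes names like 'rn6ToRn5.over.chain.gz', where the text before the
    first 'To' is the source dbkey (an assumption, not checked upstream).
    """
    wanted = []
    for name in sorted(set(available) - set(installed)):
        source = name.split("To", 1)[0]
        if source in dbkeys:
            wanted.append(name)
    return wanted
```

Given available files `rn6ToRn5.over.chain.gz` and `rn6ToMm10.over.chain.gz`, with only the first one installed, a call with `dbkeys={"rn6"}` would return just the second name.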

@natefoo
Member

natefoo commented May 11, 2016

@jennaj I updated the April 2016 comment to include the missing BWA indexes that I was able to build with the BWT-SW algorithm.

Some (like galGal3 and panTro3) with full/canonical variants I rebuilt. The only difference from the original DM run is that after selecting the correct build from "Source FASTA Sequence", I put the build variant name (e.g. galGal3canon) in the "ID for sequence" field. Otherwise the builds clobber each other in the index dir on disk. The "ID for sequence" field is used for naming the index subdirectory and defaults to the dbkey, which for both full and canonical builds is still just e.g. galGal3. This could be a bit more intuitive in the DM: I had no idea what "ID for sequence" was for until I noticed that two loc file entries pointed to the same directory/indexes on disk and then dug into the DM code to understand it. I rebuilt these for any indexes which had the variants built originally, and cleaned up the old directories and their entries in the location files.
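
One way to spot this kind of clobbering is to look for loc entries that point at the same path. A minimal sketch, assuming tab-separated loc rows where the first column is the entry ID and the last column is the index path (real loc layouts vary by tool). The function name and the sample paths in the usage note are hypothetical and not part of the DM code.

```python
from collections import defaultdict


def find_shared_index_paths(loc_lines, path_column=-1):
    """Return {path: [entry ids]} for paths referenced by more than one entry."""
    by_path = defaultdict(list)
    for line in loc_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        by_path[fields[path_column]].append(fields[0])
    # Only paths shared by two or more entries indicate a collision.
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}
```

If both a `galGal3` and a `galGal3canon` row end in the same path, that path is returned with both IDs, which is the symptom described above.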

These BWA indexes and the rest of the indexes in that comment are now in the process of being published to CVMFS and once done (this may take a long time) will be available on usegalaxy.org (after a restart, I'll comment again when it's all ready).

@natefoo
Member

natefoo commented May 11, 2016

@jennaj The publishing is finished and Main has been restarted.

@jennaj
Member Author

jennaj commented May 18, 2016

Add hg38 MAF alignments. Request: https://biostar.usegalaxy.org/p/17690

@jennaj jennaj changed the title from "Early 2016 Genome Additions" to "2016 Genome Additions" Jul 14, 2016
@massaali

massaali commented Oct 8, 2016

Hello,

I saw that 7 weeks ago another user made this same request for a newer version of the sheep reference genome. You currently have OviAri1, which is 6 years old, and there are two newer versions (about to be three). Could we get a newer version? Sheep are an amazing agricultural species, important for meat, milk, and wool production, and more researchers should study them! I request the current version on NCBI/ENSEMBL, Ovis aries Oar_v4.0 (it's from late 2015), for all tools: Bowtie and other mapping tools, plus ChIP-seq and RNA-seq tools too.

Thank you for considering!

@sayalih

sayalih commented Oct 9, 2016

Hi

I think the best way is to use your genome of interest: take the fasta
format and upload it to Galaxy using FileZilla. There is then an option to
align against your uploaded sequence instead of the reference genome.
Links to how to do this:
https://wiki.galaxyproject.org/Support#Custom_reference_genome

I don't think they are uploading any more reference genomes on their
default list.

Sayali.


Update by @jennaj: Yes, use a custom reference genome for now. I will add in sheep and other requests to the next list of updates https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-208444904

@jennaj jennaj changed the title from "2016 Genome Additions" to "Genome Additions Master Ticket" Oct 25, 2016
@vebaev

vebaev commented Jan 25, 2017

It would be great if you included the tomato 2.40 genome from:
ftp://ftp.solgenomics.net/tomato_genome/
And pepper C.annuum_cvCM334 from:
ftp://ftp.solgenomics.net/genomes/Capsicum_annuum/C.annuum_cvCM334/

@jennaj Yes, they are in NCBI (tomato and pepper):
https://www.ncbi.nlm.nih.gov/genome/7
https://www.ncbi.nlm.nih.gov/genome/10896

@jennaj
Member Author

jennaj commented Jun 9, 2017

Priority indexes

RNA STAR



Indexes

  • hg38
  • hg19
  • mm10
  • mm9
  • rn6
  • rn5
  • dm6
  • dm3
  • sacCer3
  • sacCer2
  • ce9
  • ce10

Future requests (may be moved to a new post in this same issue)

  • bosTau8

@iraplee

iraplee commented Jun 12, 2017

We're looking for X. tropicalus index to be uploaded to HiSat2

xenTro1 xenTro1 Frog (Xenopus tropicalis): xenTro1 /galaxy/data/xenTro1/seq/xenTro1.fa
xenTro2 xenTro2 Frog (Xenopus tropicalis): xenTro2 /galaxy/data/xenTro2/seq/xenTro2.fa
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa

@bimbam23

bimbam23 commented Jun 27, 2017

New Pig genome:
Sus Scrofa 11.1, susscr4
NCBI GCF_000003025.6
all: (chr and chrUn plus chrMT)

genome:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/025/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.fna.gz
gff3:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/025/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz

lookup table nice names: https://test.galaxyproject.org/u/bickj/h/pig-genome-lookup-table

@galaxyproject galaxyproject deleted a comment from vebaev Jun 28, 2017
@PseudomonasP

Dear Galaxy Team,
I hope this is still the right place to request genome additions.

If we could get Brassica napus (Bna) as a built-in genome, that would be amazing:
http://www.genoscope.cns.fr/brassicanapus/data/

Please note that although the annotation is titled v5 while the genome itself is v4.1, it should work just fine, as we have had no problems with it.

@jennaj
Member Author

jennaj commented May 1, 2018

Add NCBI's Xenopus laevis and Xenopus tropicalis genomes (indexed for all tools).

The genome is at https://usegalaxy.eu -- so when we get the data synced between all mirrors that might be the best solution.

Request: https://biostar.usegalaxy.org/p/27778

@jennaj
Member Author

jennaj commented Nov 26, 2018

Request: add Medicago truncatula https://biostar.usegalaxy.org/p/5916/#30132

To-do: Check if present in ELIXIR plant genomes already indexed (to be added in cvmfs): https://www.elixir-europe.org/about/groups/galaxy-wg

@jennaj
Copy link
Member Author

jennaj commented Jan 29, 2019

@JulienLeclercq

Dear Galaxy Team,

Thanks for your amazing work.
Please kindly consider adding the following genome to Galaxy Main:
Mexican tetra (Astyanax mexicanus)
The genome is available at NCBI : https://www.ncbi.nlm.nih.gov/genome/?term=astyanax+mexicanus
and the annotation too: https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Astyanax_mexicanus/102/

Please note that the genome is version 2.0 and made from the surface eco-morphotype (unlike the previous version 1.02 from cave eco-morphotype).

In the meantime, I am working with a custom genome.

Best,
Julien

@hexylena
Member

Migrate to usegalaxy-playbook?

@martenson martenson transferred this issue from galaxyproject/galaxy Aug 12, 2019
@jennaj
Member Author

jennaj commented Nov 27, 2019

Request:

Genome: Citrus sinensis v1.1

Source: https://www.citrusgenomedb.org/bio_data/79

@jennaj
Member Author

jennaj commented Dec 3, 2019

Request:

Human herpesvirus 1 with ref accession number NC_001806

@jennaj
Member Author

jennaj commented Dec 3, 2019

Request:

Genome: Tribolium castaneum genome assembly (Tcas5.2)

Source: https://www.ncbi.nlm.nih.gov/genome?term=tribolium%20castaneum

@jennaj
Member Author

jennaj commented Dec 4, 2019

Request:

Dada tools: #273

@psyi

psyi commented Dec 11, 2019

Dear Galaxy Team,

It would be great if the genome and annotation release of Physcomitrella patens can be added to Galaxy Main.

They are available at NCBI:
https://www.ncbi.nlm.nih.gov/genome/383
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Physcomitrella_patens/100

Best,
Peishan

@jennaj
Member Author

jennaj commented Dec 12, 2019

@echoyps & everyone else with genome requests:

We will now be adding new genomes over the next few months and throughout the upcoming year. Please continue to post requests here. UCSC and NCBI are the preferred data sources. Others are possible. Whatever the source, be specific.

Reminders:

Anyone can use genome (or transcriptome/exome) fasta data as a custom genome "from the history" now -- you do not need to wait for us to index server-side. Annotation is supplied by the end-user from the history by default (even for built-in indexed genomes) -- with just a few tool exceptions, but those also accept annotation data from the history. Genomes (fasta) are the data that is currently indexed server-side. Annotation may be indexed in the future. Custom genomes (fasta) can be promoted to a custom build (User > Custom Builds) in order to create a custom "database" metadata key that can be assigned to datasets (some tools wrapped for Galaxy require that the "database" is assigned to inputs).

Be sure to format the genome fasta correctly (remove description content on the ">" title line) and make sure the genome build/version and chromosome identifiers are an exact match between the custom reference genome (fasta) and any reference annotation (gtf or gff3) you plan to use in your analysis, before starting any analysis that uses it or promoting the fasta to a custom build. This will avoid problems later on. If there is a formatting problem (example: headers on a gtf dataset) or a chromosome mismatch between inputs, you will usually need to fix the inputs and start the analysis over from the very beginning, which can be frustrating. If you have a choice about annotation formats, choose the gtf version instead of the gff3 version -- a gtf formatted annotation dataset is accepted by more tools, and using the same exact annotation data throughout an analysis workflow is very important.
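
As a rough illustration of the fasta cleanup described above (done outside Galaxy; the same result can be reached with text manipulation tools inside Galaxy), here is a minimal sketch that trims each ">" title line down to its identifier. The function name `strip_fasta_descriptions` is hypothetical, and the sketch assumes the identifier is everything up to the first whitespace.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with each '>' title line cut at the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            identifier = parts[0] if parts else ""
            yield ">" + identifier + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

A title line such as `>chr1 Homo sapiens chromosome 1` becomes `>chr1`. Sequence lines are not modified.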

Mapping jobs will usually not "fail" due to chromosome identifier mismatch issues. Instead, if the annotation is input during the mapping step, the annotation will not really be used, creating problematic scientific results that may not be obvious to detect. Tools used downstream with a mismatched genome+annotation can also produce problematic scientific results that are not obvious, or may fail outright with errors that are difficult to interpret. Problematic annotation formatting itself will also lead to problems. Try to avoid issues by preparing your inputs correctly at the start :)
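
Because these mismatches tend to be silent, it can help to compare identifiers before starting. The sketch below is one hedged way to do that outside Galaxy. It assumes a standard tab-separated gtf whose first column is the sequence name, and fasta identifiers taken up to the first whitespace. The function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect identifiers from '>' title lines (text up to first whitespace)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_seqnames(lines):
    """Collect column-1 sequence names, skipping '#' header/comment lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(fasta_lines, gtf_lines):
    """Return gtf sequence names that have no exact match in the fasta."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines))
```

An empty result means every annotation sequence name has an exact counterpart in the genome. A result such as `["1"]` against a fasta that uses `chr1` is the kind of mismatch described above.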

Finally, when loading these data with the Upload tool, allow the datatype to be detected instead of assigning it. This triggers basic format checks and a Galaxy-assigned datatype. If you do not get the expected datatype assigned, this almost always means that there is a formatting issue that needs to be addressed. Most format issues can be resolved within Galaxy. Once fixed, the correct datatype can be assigned: Click on the dataset's pencil icon > Edit Attributes forms > Datatypes tab > "detect datatype" (best choice) or assign it directly (be careful if choosing this option). If Galaxy cannot "detect" the format correctly, there is likely still a data content or format problem.

If you ever have a problem that you cannot figure out how to resolve, know that the vast majority of tool errors or unexpected results are due to input issues that can be fixed to achieve a successful and correct scientific/technical result. First, review the tool form help -- most have examples of the expected input's content and format. Next, review our Troubleshooting and other FAQs. If those do not resolve the issue, the Galaxy Help forum is a great place to review prior Q&A or to ask a novel question. The Galaxy Training Network (GTN) tutorials are also a very useful resource -- compare your methods to the examples.

I'm only posting this advice here now since it hasn't been covered for a while at Github, and there are newer related FAQs plus prior Q&A available. Any followup/clarification should be asked about at Galaxy Help (not here).

The FAQs/links below will help with all of the above.

All FAQs: https://galaxyproject.org/support/

Start with these to learn how to use a custom genome and the associated annotation:

Error or unexpected result FAQ:

Galaxy Help forum:

GTN Tutorials:

Thanks! Jen

@dram26

dram26 commented Aug 31, 2020

Hi!

Could you kindly add the Macaca fascicularis genome for BWA? It's now at NCBI: https://www.ncbi.nlm.nih.gov/genome/776

[There is a 2015 petition for the same here https://trello.com/c/mJWnAuuQ/1511-reference-genome-requests-for-http-usegalaxyorg by AmyK, Feb 4, 2015 at 6:47 PM, and another in 2016, but I guess the source wasn't validated then?]

Best! David
