- Requirements: having successfully run the step insert species and taxonomy AND database creation AND insert genes.
- Goal: insert EST data used in Bgee.
- This step will insert normal EST libraries and their stage/organ.
- It will insert UniGene information from Ensembl, or from a mapping file, for EST-annotated species in Bgee. Currently only human, mouse, zebrafish and xenopus are concerned. Drosophila melanogaster is also concerned through a mapping file done by blast in a special target of the Makefile.
- Download of UniGene information requires about 3GB of disk space. The Makefile.common
$(TMPDIR)
variable points to a directory with enough free disk space. - Insertion of ESTs for Drosophila melanogaster requires an extra mapping step because the mapping between FlyBase genes and UniGene clusters is not available in Ensembl/BioMart. So need to create a mapping file by BLAST (
dmel_mapping
step in the Makefile)
- Download of UniGene information requires about 3GB of disk space. The Makefile.common
- It will insert miRNA EST information from smirnadb. Only human, mouse, zebrafish and Drosophila melanogaster are currently concerned.
- Please note that data from smirnadb are not updated anymore. So they are frozen in the svn and not re-downloaded & re-treated by the
Makefile
. - The file S.xls, once downloaded, has to be saved as tsv (change the name to
S.csv
: use openOffice to generate the tsv file, in UTF-8, with Tab delimiter, without quotes as field delimiter) and placed inPIPELINEROOT/EST
. Remove from this file the whole section concerning Rat libraries (some have the same name than human libraries and the file is a mess to parse).
- Please note that data from smirnadb are not updated anymore. So they are frozen in the svn and not re-downloaded & re-treated by the
- It will fill
expression
table and update the fieldestData
(quality) inexpressedSequenceTag
table.
- If it is the first time you execute this step in this pipeline run:
make clean
- Run Makefile:
make
WARNING: actually, before running expression_est
, you should check the generated file check_conditions
, to detect invalid conditions not supposed to exist in the related species. See 'Details' section of pipeline/post_processing/README.md for an explanation on how to fix such issues (in case the annotations were not incorrect).
- insert_miRNA_est.pl can return warnings (mainly for human and mouse) due to MySQL uppercase/lowercase un-sensitiveness. E.g.:
- DBD::mysql::st execute failed: Duplicate entry 'hsa-mi8-15' for key 'PRIMARY' at insert_miRNA_est.pl line 119.
make smiRNAdb
will download files from smirnadb. They are NOT updated anymore. So this target is NOT done by default!- The target will download the latest version of
S.xls
. Save it as a tsv (keep the name asS.csv
, use openOffice to generate the tsv file, in UTF-8, with Tab delimiter, without quotes as field delimiter). The target will place it in pipeline/ESTs. Remove from this file the whole section concerning Rat libraries (some have the same name than human libraries and the file is a mess to parse). - The target will download the latest version of the files
Report_x.csv
and place them in pipeline/ESTs.
- The target will download the latest version of
make dmel_mapping
will download D. melanogaster cdna in fasta from Ensembl/BioMart and create a BLAST db from it. Then it will download D. melanogaster UniGene clusters, and blast them against the previously created BLAST db to build the mapping file.