SNPeff ENSEMBL 102 database construction , vep-102 installation and separate vep annotation by species to avoid human only option issues #10

Open
wants to merge 2 commits into
base: human-pipeline

Just08

@Just08 Just08 commented Feb 2, 2022

The new SnpEff 102 and VEP 102 annotations work.
Note that my custom SnpEff database, built from Ensembl 102 data, does not contain the regulation and motif databases because of some issues (for the motif part, see the commented-out section of my Dockerfile changes).

@Just08
Author

Just08 commented Feb 2, 2022

For the motif part (commented out in my Dockerfile changes), 0 motifs are loaded:

[Optional] Reading motifs: GFF
#51 1605.7 00:02:30             Loading PWMs from : /opt/snpEff-4.3T/./data/GRCm38.102/pwms.bin
#51 1605.7 00:02:30             Loading motifs from : /opt/snpEff-4.3T/./data/GRCm38.102/motif.gff
#51 1633.2 00:02:58             Loadded motifs: 0
#51 1633.2 00:02:58             Saving motifs to: /opt/snpEff-4.3T/./data/GRCm38.102/motif.bin

For the regulation part, I tested:

gunzip ${PACKAGE_DIR}/snpEff-4.3T/data/GRCm38.102/*.gz && \
mkdir ${PACKAGE_DIR}/snpEff-4.3T/data/GRCm38.102/regulation.bed && \
wget -nv -r -np -nd -A "*.bed.gz" -e robots=off  http://ftp.ensembl.org/pub/release-102/regulation/mus_musculus/Peaks/ && \
ls *.bed.gz | awk -F"." -v mvCmd='mv "%s" "%s"\n' '{printf mvCmd,$0,"regulation."$3"."$4".bed.gz"}' | sh && \
mv regulation.*.bed.gz ${PACKAGE_DIR}/snpEff-4.3T/data/GRCm38.102/regulation.bed/ && \
gunzip ${PACKAGE_DIR}/snpEff-4.3T/data/GRCm38.102/regulation.bed/*.bed.gz

But I hit the same issue that was already reported, without any solution, in pcingola/SnpEff#304.

This is why I can't complete the "Building databases: Regulatory and Non-coding" part of the SnpEff documentation.

@Just08 changed the title from "SNPeff ENSEMBL 102 database construction and vep-102 installation" to "SNPeff ENSEMBL 102 database construction , vep-102 installation and separate vep annotation by species to avoid human only option issues" on Feb 2, 2022
@NikdAK
Member

NikdAK commented Feb 3, 2022

I can confirm the bug regarding the regulatory database build.
Anyway, I found a workaround: convert the BED files to GFF.
If the format looks like the example below, it just works. Only columns 1, 4, 5, and 9 need valid entries. Among the attributes, only Cell_type seems to be mandatory, but setting name, alias, etc. could be useful at some point.

chr1 source feature 4426826 4427337 . . . Cell_type=CHD2_CH12_LX__Enriched_Site

All BED files should be combined into a single GFF file, which can be gzip-compressed (.gz) to save space.
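
The BED-to-GFF conversion described above could be sketched roughly as follows. This is only an illustration under some assumptions, not the exact commands used here: the helper name `bed_to_gff` is hypothetical, the Cell_type label is derived from the file name (assuming the `regulation.<cell>.<feature>.bed` naming produced by the rename step earlier in this thread), columns 2 and 3 are filled with the placeholder words from the example line, and the start coordinate is shifted by one on the assumption that the Ensembl peak files follow the usual 0-based BED convention.

```shell
# bed_to_gff FILE
# Hypothetical helper: print one tab-separated GFF line per BED record.
# Only columns 1, 4, 5 and 9 carry real data; the others are placeholders.
bed_to_gff() {
    bed_file="$1"

    # Derive the Cell_type label from the file name (assumption):
    # regulation.CH12.CHD2.bed.gz -> CH12.CHD2
    bed_label=$(basename "$bed_file")
    bed_label=${bed_label%.gz}
    bed_label=${bed_label%.bed}
    bed_label=${bed_label#regulation.}

    # Read gzip-compressed or plain input
    case "$bed_file" in
        *.gz) gzip -cd "$bed_file" ;;
        *)    cat "$bed_file" ;;
    esac |
    awk -F'\t' -v OFS='\t' -v label="$bed_label" '
        # Skip comments and anything without chrom/start/end (e.g. track lines)
        /^#/ || NF < 3 { next }
        # BED start is 0-based, GFF start is 1-based, hence the + 1
        { print $1, "source", "feature", $2 + 1, $3, ".", ".", ".", "Cell_type=" label }'
}
```

All files could then be merged with something like `for f in regulation.bed/*.bed; do bed_to_gff "$f"; done | gzip > regulation.gff.gz`, where the output name `regulation.gff.gz` is an assumption to check against the SnpEff documentation.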
