Dependencies

Most dependencies of the pipeline are met by the two conda environment files conda_protmap.yml and conda_protmap_R.yml.

The dependencies were tested in an Ubuntu Docker environment. The script ./setup_docker.sh sets up this Docker environment, and all further dependencies are shown there. To meet dependencies that are not covered by Docker, adapt this script accordingly.

Transcriptome

For the publication we used all raw reads from BioProject PRJNA655119 for the expression analysis: https://www.ncbi.nlm.nih.gov/bioproject/?term=prjna655119

Genomes

For the publication, the genomes of the bacteria listed below are downloaded by the script ./build_db/download_all_genomes.sh.

The short names are used throughout the scripts and should not be changed. The corresponding full names are:

short name	scientific name	taxid	assembly id
anaero	Anaerostipes caccae DSM 14662	411490	GCA_014131675.1
bact	Bacteroides thetaiotaomicron VPI5482	226186	GCA_014131755.1
bifi	Bifidobacterium longum NCC2705	206672	GCF_000007525.1
blautia	Blautia producta ATCC 27340 DSM 2950	1121114	GCA_014131715.1
clostri	Clostridium butyricum DSM 10702	1316931	GCA_014131795.1
ecoli	Escherichia coli str K12 substr MG1655	511145	GCF_000005845.2
ery	Erysipelatoclostridium ramosum DSM 1402	445974	GCA_014131695.1
lacto	Lactobacillus plantarum subsp plantarum ATCC 14917 JCM 1149 CGMCC 12437	525338	GCA_014131735.1
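
For ad-hoc scripts it can be handy to have the table above as a lookup structure. The following is a minimal sketch and not part of the repository: the names GENOME_TABLE, Genome, and load_genomes are hypothetical, and the table is simply embedded as tab-separated text.

```python
import csv
import io
from typing import NamedTuple

# Tab-separated copy of the genome table above (embedded for illustration only).
GENOME_TABLE = (
    "short name\tscientific name\ttaxid\tassembly id\n"
    "anaero\tAnaerostipes caccae DSM 14662\t411490\tGCA_014131675.1\n"
    "bact\tBacteroides thetaiotaomicron VPI5482\t226186\tGCA_014131755.1\n"
    "bifi\tBifidobacterium longum NCC2705\t206672\tGCF_000007525.1\n"
    "blautia\tBlautia producta ATCC 27340 DSM 2950\t1121114\tGCA_014131715.1\n"
    "clostri\tClostridium butyricum DSM 10702\t1316931\tGCA_014131795.1\n"
    "ecoli\tEscherichia coli str K12 substr MG1655\t511145\tGCF_000005845.2\n"
    "ery\tErysipelatoclostridium ramosum DSM 1402\t445974\tGCA_014131695.1\n"
    "lacto\tLactobacillus plantarum subsp plantarum ATCC 14917 JCM 1149 "
    "CGMCC 12437\t525338\tGCA_014131735.1\n"
)


class Genome(NamedTuple):
    """One row of the genome table (hypothetical helper type)."""
    scientific_name: str
    taxid: int
    assembly_id: str


def load_genomes(text: str = GENOME_TABLE) -> dict:
    """Parse the tab-separated table into a dict keyed by short name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["short name"]: Genome(
            row["scientific name"], int(row["taxid"]), row["assembly id"]
        )
        for row in reader
    }
```

With this, load_genomes()["ecoli"].assembly_id gives the assembly id for the short name ecoli.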

Structure of scripts

Each major step is covered in its own subdirectory.

build_db

Builds the databases for Comet and downloads the genomes.

comet

MS data is gathered and PSMs are generated.

transcriptom

All scripts for the mapping of the transcriptomic reads are here.

data_accumulation

Most of the final analyses are done here.

candidates

The candidate selection and evaluation are done here.

UCSC_track_tools

The UCSC track hub is generated here.

figure_plotting

All scripts for plots that were generated automatically from data are here.

start_anno_html

The results on the evidence for early annotation start sites are generated here.

Parameters

The file parameters.json holds the parameters that the scripts need to run and must be adapted for each system.

session_id

Track hub session ID.

hub_id

Hub ID for the UCSC Genome Browser.

data_dir

Directory to store all data. At least 1.5 TB is needed.

tmp_dir

Directory for temporary files.

publication_dir

Directory to which figures and further information are written.

ms_dir

Directory of the MS data. Not needed if PRIDE is available.

chrome_bin

Path to the Chrome binary. It must be the correct version.

bin_path

Directory where Comet and the other binaries are expected.

blastdb_dir

Directory of the BLAST nt database.
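
Before starting a long run it may help to check that parameters.json contains every key described above. The following is a minimal sketch and not part of the repository: it assumes that parameters.json is a flat JSON object with these keys at the top level, and the names EXPECTED_KEYS, missing_keys, and load_parameters are hypothetical.

```python
import json
from pathlib import Path

# Keys documented in the Parameters section above.
# Assumption: they all sit at the top level of parameters.json.
EXPECTED_KEYS = [
    "session_id",
    "hub_id",
    "data_dir",
    "tmp_dir",
    "publication_dir",
    "ms_dir",  # documented as not needed if PRIDE is available
    "chrome_bin",
    "bin_path",
    "blastdb_dir",
]


def missing_keys(params: dict) -> list:
    """Return the documented keys that are absent from params, in order."""
    return [key for key in EXPECTED_KEYS if key not in params]


def load_parameters(path: str = "parameters.json") -> dict:
    """Load the parameter file and fail early if a documented key is missing."""
    params = json.loads(Path(path).read_text())
    missing = missing_keys(params)
    if missing:
        raise KeyError("parameters.json is missing: " + ", ".join(missing))
    return params
```

Calling load_parameters() from the repository root would then raise a KeyError that names any absent key. Whether the scripts themselves tolerate a missing ms_dir is not stated here, so the sketch treats every key as required.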

FantasticMrFux/PROTMAP_scripts
