Home
This tutorial was tested on Ubuntu 20.04.6 LTS. If you get stuck, or it becomes too confusing, it's because it is. Please go ahead and create an issue and we will help you out 👍
- First, download Miniconda https://docs.conda.io/en/main/miniconda.html
- Clone this repository by either clicking on the green "Code" button on the top right of this page and clicking "Download ZIP," or by
$ git clone https://github.com/ucrbioinfo/Fugue.git
in your desired directory.
- These installation instructions are for Ubuntu 20.04.6 LTS. Create a conda environment and activate it:
cd Fugue
conda create -n fugue python=3.10 -y
conda activate fugue
- Install the required FUGUE Python packages.
pip install pandas
pip install biopython
pip install PyYAML
pip install requests
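As a quick sanity check after the four `pip install` commands above, the following minimal sketch reports which packages cannot be imported from the active environment. The function name `missing_packages` is hypothetical and not part of FUGUE; the mapping reflects the fact that `biopython` is imported as `Bio` and `PyYAML` as `yaml`.

```python
import importlib.util

# pip package name -> module name used in an `import` statement
REQUIRED = {
    "pandas": "pandas",
    "biopython": "Bio",
    "PyYAML": "yaml",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]
```

Calling `missing_packages()` inside the activated `fugue` environment should return an empty list if all four installs succeeded.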
- Create the required DIAMOND conda environment, and install DIAMOND. FUGUE and DIAMOND run on different Python versions. Hence, you need a separate environment for DIAMOND.
conda create -n diamond python=2.7 bioconda::diamond -y
Do not switch to this environment. FUGUE will use it on its own at a later stage.
In case you already have an environment called `diamond`, name this new environment anything you want, then modify the file `FUGUE/src/utils/diamond/0_just_run_this.sh` and change the name `diamond` on lines 4, 5, and 6.
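To confirm that both environments exist before moving on, one option is to parse the output of `conda env list --json`. The sketch below is not part of FUGUE; the function names are hypothetical, and it assumes the default names `fugue` and `diamond` used in this tutorial.

```python
import json
import os
import subprocess


def env_names(conda_json):
    """Extract environment names from the text printed by `conda env list --json`."""
    paths = json.loads(conda_json)["envs"]
    # Each entry is the full path of an environment; its last component is the name.
    return {os.path.basename(path.rstrip("/")) for path in paths}


def missing_envs(conda_json, wanted=("fugue", "diamond")):
    """Return the wanted environment names that conda does not list."""
    names = env_names(conda_json)
    return [name for name in wanted if name not in names]


def current_conda_envs_json():
    """Run conda and return its JSON output (requires conda on the PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With conda available, `missing_envs(current_conda_envs_json())` returns an empty list when both environments are present. If you renamed the DIAMOND environment, pass your own names through `wanted`.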
FUGUE downloads fungi data from the following databases:
1. NCBI
2. FungiDB
3. EnsemblFungi
4. MycoCosm
For steps 1 and 4 (NCBI and MycoCosm), visit this Google Drive shared folder and perform the mandatory steps in the documents "[Data Source] NCBI Datasets" and "[Data Source] MycoCosm." These two databases require authentication with your own credentials. The other two data source documents are for your reference only and require no further action from you.
We recommend opening a tmux session at this point, as downloading and processing this data may take a couple of hours. When you open a tmux session by typing `tmux`, you may have to switch to the fugue conda environment again. After doing so, run `python src/main.py` and continue with the instructions below. While FUGUE is running, you may detach from the tmux session by pressing CTRL + B, then D (doing so will not stop the program; it keeps running in the background). To switch back to your session and continue working with FUGUE, type `tmux attach -t 0`, where 0 is your tmux session. To see a list of all sessions, type `tmux list-sessions`. To kill session 0, type `tmux kill-session -t 0`.
1. To get the data used in ALLEGRO, perform steps 6 and 8. Then run `python src/main.py` and enter 13. You may now leave the processes to run in the background and skip everything below. When finished, read the last item at the bottom of this page. If you are using FUGUE for other purposes, continue reading.
2. Run `python src/main.py` and download 1 through 4 (or simply choose option 5).
3. If you are preparing data for ALLEGRO, once all files are downloaded you may create CDS files from GFF using option 7 (otherwise, skip 6 and 7). In these CDS files, the intron/exon boundaries in each gene are separated by a pipe `|` character. ALLEGRO ignores gRNAs with pipes in their sequence.
4. Select option 8 and then 9 to merge the downloaded files and remove duplicates. Run option 8 regardless of whether you downloaded from a single database or multiple databases in options 1-5.
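To illustrate why the pipe delimiter in the GFF-derived CDS files matters, here is a minimal sketch of how a downstream tool could skip any candidate subsequence that spans an intron/exon boundary. This is not ALLEGRO's actual gRNA search (which is not described on this page); the function name and the window length `k` are hypothetical.

```python
def windows_without_pipe(sequence, k):
    """Yield every length-k window of `sequence` that does not contain '|'."""
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # A window containing '|' straddles an intron/exon boundary, so skip it.
        if "|" not in window:
            yield window
```

For example, `list(windows_without_pipe("ACGT|ACGTA", 4))` keeps only the windows that lie entirely within one exon.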
5. In ALLEGRO, we run DIAMOND to find orthogroups of S. cerevisiae genes of interest across all other species. Navigate to `src/utils/diamond`. Ensure DIAMOND is installed in a conda environment called `diamond` (alternatively, change the environment name in the shell script `0_just_run_this.sh`). Place the `.faa` amino acid file for S. cerevisiae (or your reference species of interest) in `inputs/reference`. You may find `saccharomyces_cerevisiae.faa` under `data/fourdbs_concat/proteomes/saccharomyces_cerevisiae.faa`. Simply copy this file and place it under `src/utils/diamond/inputs/reference`.
6. Modify `src/utils/diamond/config.yaml` to point to the correct FULL PATH of the CDS and proteome files for your reference species, as in the provided example configuration file that I used for myself:
cds_path: '/point/to/your/own/path/FUGUE/my_data/saccharomyces_cerevisiae_cds.fna'
proteome_path: '/point/to/your/own/path/FUGUE/my_data/proteomes/saccharomyces_cerevisiae.faa'
query_proteins_dir: '/point/to/your/own/path/FUGUE/data/fourdbs_concat/proteomes/' # Do not change the path after /FUGUE/data/...
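Since the DIAMOND configuration requires full paths, a small check before running option 10 can catch relative or mistyped paths early. The sketch below is an illustration only, not part of FUGUE: the key names come from the example configuration above, while `config_problems` and `load_config` are hypothetical helpers. `load_config` relies on PyYAML, which was installed earlier.

```python
import os

# Keys taken from the example config.yaml shown in this tutorial.
REQUIRED_KEYS = ("cds_path", "proteome_path", "query_proteins_dir")


def config_problems(config):
    """Return human-readable problems found in a loaded DIAMOND config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            problems.append(f"{key}: missing")
        elif not os.path.isabs(value):
            problems.append(f"{key}: not a full path ({value})")
        elif not os.path.exists(value):
            problems.append(f"{key}: path does not exist ({value})")
    return problems


def load_config(path="src/utils/diamond/config.yaml"):
    """Load the YAML file into a dict (imports PyYAML only when called)."""
    import yaml
    with open(path) as handle:
        return yaml.safe_load(handle)
```

From the FUGUE root directory, `config_problems(load_config())` returns an empty list when all three paths are absolute and exist.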
7. Navigate to the root directory and run FUGUE again with `python src/main.py`. Select option 10 to run DIAMOND. This will generate `Orthogroups.tsv` and remove orthogroups with < 30% protein identity. It will also generate a species list called `fourdbs_hi_input_species.csv` under `data/fourdbs_concat/`.
8. Navigate to `src/utils/ortholog_finder` and set the `reference_species` name and `gene_names` in the `config.yaml` file.
9. Navigate to the root directory and run FUGUE again with `python src/main.py`. Select option 11 to generate fasta files for each species with only the genes of interest and their orthologs under `orthogroups/`.
10. Run FUGUE one last time and select option 12 if you previously selected option 7. This will create orthologous gene files with pipe delimiters derived from GFF files.
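The 30% protein-identity cutoff mentioned for option 10 above can be illustrated on DIAMOND's default tabular output, where the third column is the percent identity. This is a rough sketch of the idea and not FUGUE's actual implementation, which may apply the cutoff differently; the function name and the sequence identifiers in the usage example are hypothetical.

```python
import csv
import io

MIN_IDENTITY = 30.0  # percent; the threshold stated in this tutorial


def filter_hits(tabular_text, min_identity=MIN_IDENTITY):
    """Return (query, subject, pident) for hits at or above min_identity.

    Expects BLAST-style tabular text: tab-separated columns whose first
    three fields are query ID, subject ID, and percent identity.
    """
    kept = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, subject, pident = row[0], row[1], float(row[2])
        if pident >= min_identity:
            kept.append((query, subject, pident))
    return kept
```

For instance, given two hits for the same query with identities 45.2 and 22.0, only the first is kept.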
11. The files we used for all ALLEGRO experiments are under `data/fourdbs_concat/ortho_from_gff` and `data/fourdbs_concat/fourdbs_input_species.csv`. We copied the directory `data/fourdbs_concat/ortho_from_gff` and placed it under `ALLEGRO/input`, and copied the CSV file and placed it in the same place.
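The copy step described above can be scripted. The following is a minimal sketch, assuming you pass the locations of your own FUGUE and ALLEGRO checkouts; the function name `copy_allegro_inputs` is hypothetical, and the relative paths are the ones named in this tutorial.

```python
import shutil
from pathlib import Path


def copy_allegro_inputs(fugue_root, allegro_root):
    """Copy the ortholog directory and species CSV into ALLEGRO/input."""
    concat_dir = Path(fugue_root) / "data" / "fourdbs_concat"
    src_dir = concat_dir / "ortho_from_gff"
    src_csv = concat_dir / "fourdbs_input_species.csv"

    dest = Path(allegro_root) / "input"
    dest.mkdir(parents=True, exist_ok=True)

    # Copy the whole directory (overwriting files from earlier runs) and the CSV.
    shutil.copytree(src_dir, dest / src_dir.name, dirs_exist_ok=True)
    shutil.copy2(src_csv, dest / src_csv.name)
    return dest
```

For example, `copy_allegro_inputs("/path/to/FUGUE", "/path/to/ALLEGRO")` leaves `ortho_from_gff/` and the CSV side by side under `ALLEGRO/input`.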