Section 2 CLI Config Files
Installing EnsemblLite creates an elt command. In the terminal, type
elt
This initialises the package and then displays the list of available subcommands. Try typing elt exportrc.
Selecting the data you want from Ensembl is currently done with a plain text config file. The command line tool can generate a template for you, so let's do that now and then discuss the files and the config options.
Just for kicks, let's use the TUI (terminal user interface), which provides a point-and-click interface to a command-line application. It makes the commands easier to explore, but it also has some shortcomings.
elt tui
You can navigate with your mouse, or with the tab and arrow keys. The main thing to look for now is the exportrc subcommand. Enter sample as the value for the --outpath option. Note that as you type, elt builds the equivalent terminal command for you (here, elt exportrc --outpath sample). Click the "Close & Run" button.
(The caveat with this interface is that the command is not recorded in your shell's history, so pressing the up arrow will not recover it.)
The generated config file, sample.cfg, has distinct sections for the different types of data.
One section is where you specify the FTP address of the Ensembl server that hosts the genomes you are interested in. You can pick any server you like, as long as it exactly matches the one already in this file. (Sorry, that's a lame joke: at present we don't support any other Ensembl FTP servers.)
In the section for local paths, staging_path is the directory where the downloaded data will be placed, and install_path is where the installation will go.
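Here is a minimal Python sketch, using the standard configparser module, of how these two options might be read. It assumes the config file uses INI-style syntax, which the square-bracket sections and key = value lines suggest. The section name "local path" and the example directories are placeholders, so the helper searches every section for each option rather than relying on one name. This is an illustration, not how elt itself reads the file.

```python
import configparser
from pathlib import Path
from typing import Optional, Tuple

# Illustrative config text. The section name "local path" is an assumption;
# only the option names staging_path and install_path come from the tutorial.
SAMPLE_CFG = """
[local path]
staging_path = /data/ensembl/staging
install_path = /data/ensembl/install
"""


def find_option(parser: configparser.ConfigParser, key: str) -> Optional[str]:
    """Return the value of `key` from the first section that defines it."""
    for section in parser.sections():
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None


def local_paths(cfg_text: str) -> Tuple[Path, Path]:
    """Return (staging, install) directories, expanding a leading ~ if present."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    staging = find_option(parser, "staging_path")
    install = find_option(parser, "install_path")
    if staging is None or install is None:
        raise ValueError("config must set both staging_path and install_path")
    return Path(staging).expanduser(), Path(install).expanduser()


staging, install = local_paths(SAMPLE_CFG)
print(staging, install)
```

Searching all sections keeps the sketch independent of the exact section names in your generated sample.cfg.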
You also specify the version (release) of Ensembl that you want data from.
At present, the compara section accepts more options than we truly support, so let me focus on the two that really matter.
The align_names option is the name of a directory on the Ensembl FTP server that contains the alignments you are interested in. It can also be a comma-separated list of names. Each name must exactly match one of the names listed at the following location on the FTP site: https://ftp.ensembl.org/pub/release-112/maf/ensembl-compara/multiple_alignments/
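The Python sketch below splits a comma-separated align_names value and builds the directory URL each name would correspond to under the release 112 location quoted above. It is illustrative only. The names alignment_a and alignment_b are hypothetical placeholders, and stripping stray spaces is an assumption about what elt tolerates. The code also does not check the names against the actual FTP listing.

```python
from typing import List

# Location quoted in the tutorial, with the release number made a parameter.
ALIGNMENT_BASE = (
    "https://ftp.ensembl.org/pub/release-{release}/"
    "maf/ensembl-compara/multiple_alignments/"
)


def parse_align_names(value: str) -> List[str]:
    """Split a comma-separated align_names value, ignoring spaces and empty items."""
    return [name.strip() for name in value.split(",") if name.strip()]


def alignment_urls(value: str, release: int) -> List[str]:
    """Build the directory URL that each alignment name should correspond to."""
    base = ALIGNMENT_BASE.format(release=release)
    return [f"{base}{name}/" for name in parse_align_names(value)]


# "alignment_a" and "alignment_b" are placeholders, not real Ensembl directory names.
for url in alignment_urls("alignment_a, alignment_b", release=112):
    print(url)
```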
The homology option indicates whether you want homology data for the genomes you select. It takes no value; the line only needs to be syntactically correct, which means the word homology followed by an equals sign (homology =). If you have specified an alignment, you do not need to add the homology option, because it is added automatically.
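The rule just described can be written as a small predicate: an explicit homology line, or any alignment, means homology data is wanted. This is a hypothetical Python sketch, not elt's own code. The section name compara is an assumption, and the function accepts either a configparser section or a plain dict.

```python
import configparser
from typing import Mapping


def wants_homology(options: Mapping[str, str]) -> bool:
    """True if homology is requested explicitly or implied by an alignment."""
    # "homology =" parses to an empty string, so the key's presence is what counts.
    if "homology" in options:
        return True
    # Per the tutorial, specifying an alignment adds homology automatically.
    return bool(options.get("align_names", "").strip())


# The section name "compara" is an assumption made for this example.
parser = configparser.ConfigParser()
parser.read_string("[compara]\nhomology =\n")
print(wants_homology(parser["compara"]))  # True
```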
Note: We currently don't support pairwise alignments. If this feature is important to you, create a discussion topic and tell us. If somebody else has already created that topic, make sure to vote for it.
At present, you indicate which species you want by putting either its Latin name or its common name inside square brackets, followed by the line db = core. To see which names are supported, look at the species.tsv file, which was also created by the exportrc subcommand (see below).
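To make the species syntax concrete, this Python sketch writes one stanza per species and then reads the species back out with configparser. The species names are illustrative, so check species.tsv for the spellings elt accepts. Treating "has db = core" as the marker of a species section is also an assumption made for the example.

```python
import configparser
from typing import Iterable, List


def species_stanzas(names: Iterable[str]) -> str:
    """Render one [name] section followed by db = core for each species."""
    return "\n".join(f"[{name}]\ndb = core\n" for name in names)


def species_sections(cfg_text: str) -> List[str]:
    """List the sections whose db option is core, i.e. the requested species."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return [s for s in parser.sections() if parser.get(s, "db", fallback=None) == "core"]


# Illustrative names only: one Latin name and one common name.
text = species_stanzas(["Homo sapiens", "mouse"])
print(text)
print(species_sections(text))  # ['Homo sapiens', 'mouse']
```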
There is also a shortcut for naming species: if you want all of the species included in one of the whole-genome alignment sets, you only need to specify that alignment.
The species.tsv file is a tab-delimited file containing the Latin and common names of the species available at ensembl.org. (At the moment it also includes a column for the species prefix of Ensembl identifiers; however, this will be discontinued.)
elt uses the contents of this file to validate the species names entered in the config file described above, as well as in other operations.
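The validation step can be pictured with a Python sketch using the standard csv module. Several details are assumptions for illustration: the column headers species and common_name, the case-insensitive matching, and the example rows. Check the header row of your own species.tsv, and note that elt's actual validation may behave differently.

```python
import csv
import io
from typing import Dict, TextIO

# Made-up example in the shape described above; the headers are assumptions.
EXAMPLE_TSV = (
    "species\tcommon_name\tprefix\n"
    "Homo sapiens\thuman\tENS\n"
    "Mus musculus\tmouse\tENSMUS\n"
)


def load_species_names(
    handle: TextIO, latin_col: str = "species", common_col: str = "common_name"
) -> Dict[str, str]:
    """Map each Latin and common name (lower-cased) to its Latin name."""
    lookup: Dict[str, str] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        latin = row[latin_col].strip()
        lookup[latin.lower()] = latin
        common = (row.get(common_col) or "").strip()
        if common:
            lookup[common.lower()] = latin
    return lookup


def validate_species(name: str, lookup: Dict[str, str]) -> str:
    """Return the Latin name for `name`, or raise ValueError if it is unknown."""
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown species name: {name!r}") from None


names = load_species_names(io.StringIO(EXAMPLE_TSV))
print(validate_species("mouse", names))  # Mus musculus
```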
WARNING: At present this file is included in the repository, but we will change this so that it is downloaded and always up to date.
Edit the sample.cfg file so that it downloads the genomes for yeast and C. elegans, along with homology data.
Make sure you specify a sensible destination for the staging data and for the installation.
When you have done this, we will proceed to the next step.