Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites.
Main documentation: https://functionlab.github.io/sleipnir-docs/
The Sleipnir wiki and bug reporting system are at: (TBD)
The file README.developer has notes for Sleipnir developers.
Sleipnir also includes the code to compile SEEK (the human coexpression search engine). See http://seek.princeton.edu/installation.jsp for installation instructions.
The latest version of Sleipnir software can be obtained by issuing the following command:
git clone https://github.com/FunctionLab/sleipnir.git
-
Install g++, cmake
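A quick way to confirm that the build tools are present before continuing is sketched below. The helper name `missing_tools` is hypothetical and not part of Sleipnir; it only checks that each command can be resolved on the PATH, not that the installed version is recent enough.

```shell
# Report which of the given commands are not available on PATH.
# `missing_tools` is a hypothetical helper name, not part of Sleipnir.
missing_tools() {
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Usage: prints nothing when both build tools are installed.
# missing_tools g++ cmake
```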
-
Install libraries
- On Mac:
brew install libsvm
brew install libomp
brew install thrift
brew install gsl
brew install boost
- On CentOS Linux:
sudo yum install libsvm
sudo yum install libgomp
sudo yum install thrift-devel
sudo yum install gsl
sudo yum install boost
- On Ubuntu Linux:
apt-get update
apt-get install build-essential
apt-get install libsvm-dev
apt-get install libomp-dev
apt-get install libthrift-dev
apt-get install libgsl-dev
apt-get install libboost-dev
apt-get install libboost-graph-dev
apt-get install libboost-regex-dev
apt-get install libreadline-dev
-
Clone repository
git clone https://github.com/FunctionLab/sleipnir.git
cd sleipnir
git submodule init
git submodule update
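To check that the submodule steps above took effect, the output of `git submodule status` can be inspected: git prefixes a submodule line with `-` when that submodule has not been initialized. The sketch below counts such lines; `count_uninitialized` is a hypothetical helper name.

```shell
# Count submodules that `git submodule status` reports as uninitialized.
# Reads status text on stdin; `count_uninitialized` is a hypothetical helper name.
count_uninitialized() {
    # A leading '-' marks a submodule that has not been initialized.
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^-' || true
}

# Usage, from inside the cloned sleipnir directory:
# git submodule status | count_uninitialized
```

A result of 0 suggests every submodule has been initialized; any other value means `git submodule init` and `git submodule update` should be run again.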
-
Prep make files with cmake
mkdir Debug
cd Debug/
cmake -DCMAKE_BUILD_TYPE=Debug ..
- Alternatively, replace 'Debug' with 'Release' in all of the above commands to make a release build
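The Debug/Release substitution can be captured in a small function so the same commands serve both build types. This is only a sketch: `configure_build` is a hypothetical helper name, and it assumes it is run from the repository root with cmake installed.

```shell
# Create a build directory named after the build type and run cmake in it.
# `configure_build` is a hypothetical helper name; run it from the
# repository root. Defaults to Debug when no argument is given.
configure_build() {
    build_type="${1:-Debug}"
    mkdir -p "$build_type" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$build_type" && cmake -DCMAKE_BUILD_TYPE="$build_type" .. )
}

# Usage:
# configure_build Release
```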
-
Build the code
- (On Mac) - Edit sleipnir/src/libsvm.h
- Replace: #include <libsvm/svm.h>
- With: #include <svm.h>
cd Debug/
make
- In case of errors:
make clean
make VERBOSE=1
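The macOS header edit in this step can also be scripted instead of done by hand. The sketch below assumes the header lives at src/libsvm.h relative to the repository root; `patch_libsvm_include` is a hypothetical helper name, and it should be run before `make`.

```shell
# Rewrite the libsvm include in the given header, keeping a .bak backup.
# `patch_libsvm_include` is a hypothetical helper name.
patch_libsvm_include() {
    header="$1"
    # `-i.bak` (suffix attached) is accepted by both BSD sed on macOS and GNU sed.
    sed -i.bak 's|#include <libsvm/svm.h>|#include <svm.h>|' "$header"
}

# Usage, from the repository root on macOS:
# patch_libsvm_include src/libsvm.h
```

The backup copy (libsvm.h.bak) makes it easy to restore the original include if the build is later moved to Linux.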
-
[Optional] Install SVM_PERF libraries to build: Data2SVM, SVMperfer, SVMperfing, SVMfe, SVMer
wget http://download.joachims.org/svm_perf/current/svm_perf.tar.gz
mkdir svm_perf; cd svm_perf; tar xzvf ../svm_perf.tar.gz
make
ar rcs libsvmperf.a *.o */*.o
cd ..; cp -a svm_perf /usr/local/lib/
ln -s /usr/local/lib/svm_perf/libsvmperf.a /usr/local/lib
ln -s /usr/local/lib/svm_perf /usr/local/include
-
One-time prep: create the conda environment (by default this will create the 'genomics' conda env)
conda env create --file scripts/seek/conda_environment.yml
-
Run the C++ unit tests
Debug/tests/unit_tests
-
Test the scripts for building and merging SEEK database compendiums
conda activate genomics
python -m pytest -s -v scripts/seek/tests
-
Run the SEEK system tests (test SeekMiner and SeekRPC)
conda activate genomics
python -m pytest -s -v tests/
-
Run Seek DB tests (test that the database gives expected bio-informative results). These tests can only be run where the full SEEK database is installed.
cd tests/bioinform_tests
- PREP: Install and init Git LFS (Large File Storage)
- On Mac:
brew install git-lfs
- On CentOS:
yum install git-lfs
- On Ubuntu:
apt-get install git-lfs
- Initialize git-lfs:
git lfs install
- Refresh the gold standard tgz files (should be multiple MB in size)
rm gold_standard_results/*
git restore gold_standard_results/*
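If the restored files are only a few hundred bytes rather than multiple MB, they are probably still Git LFS pointer stubs. A pointer file is a short text file whose first line names the LFS spec, so it can be detected as sketched below; `is_lfs_pointer` is a hypothetical helper name.

```shell
# Succeed if the file looks like a Git LFS pointer stub rather than real content.
# `is_lfs_pointer` is a hypothetical helper name.
is_lfs_pointer() {
    # Pointer files are small text files whose first line names the LFS spec.
    head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Usage, from tests/bioinform_tests:
# for f in gold_standard_results/*; do
#     is_lfs_pointer "$f" && echo "still a pointer: $f"
# done
```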
- Run the tests:
(The bioinform test has an option for different lengths of test, i.e. how many queries are run)
bash run_paramtest.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries>
bash run_querysize.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries>
bash run_bioinform.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries> -t [tiny,short,medium,long]
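The three test scripts can be chained so that a failure in one stops the rest. The wrapper below is a sketch: `run_all_bioinform` is a hypothetical name, the default length of `tiny` is an assumption made here, and the flags are the ones shown in the commands above.

```shell
# Run the three bioinform test scripts in sequence, stopping at the first failure.
# `run_all_bioinform` is a hypothetical wrapper; run it from tests/bioinform_tests.
run_all_bioinform() {
    seek_db="$1"
    seek_bin="$2"
    length="${3:-tiny}"   # assumed default; not specified by the scripts' docs
    case "$length" in
        tiny|short|medium|long) ;;
        *) printf 'unknown test length: %s\n' "$length" >&2; return 1 ;;
    esac
    bash run_paramtest.sh -v -s "$seek_db" -b "$seek_bin" &&
    bash run_querysize.sh -v -s "$seek_db" -b "$seek_bin" &&
    bash run_bioinform.sh -v -s "$seek_db" -b "$seek_bin" -t "$length"
}

# Usage:
# run_all_bioinform <path_to_seek_db> <path_to_seek_binaries> short
```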