brew install openssl
export C_INCLUDE_PATH=${C_INCLUDE_PATH}:/usr/local/Cellar/openssl/your_version/include
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/Cellar/openssl/your_version/lib/"
sudo apt-get install openssl
sudo python setup.py install
# or
python setup.py install --user
Follow the instructions at https://conda.io/miniconda.html to install Python 3.6:
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
bash install.sh
Activate the environment
source activate lisa
The gene list should contain at least 10 genes and no more than 3000; anything above that is too many.
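The size limits above can be checked before launching a run. The following is a minimal sketch, assuming the gene list is a plain text file with one gene name per line (the file layout and both function names are illustrative assumptions, not part of LISA):

```python
from pathlib import Path

MIN_GENES = 10     # lower bound stated above
MAX_GENES = 3000   # upper bound stated above


def read_gene_list(path):
    """Return the unique, non-empty gene names from a one-gene-per-line file.

    The one-name-per-line layout is an assumption made for this sketch.
    """
    names = (line.strip() for line in Path(path).read_text().splitlines())
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(name for name in names if name))


def check_gene_count(genes):
    """Return the gene count, or raise ValueError if it is outside 10 to 3000."""
    n = len(genes)
    if n < MIN_GENES:
        raise ValueError(f"only {n} genes; at least {MIN_GENES} are needed")
    if n > MAX_GENES:
        raise ValueError(f"{n} genes; no more than {MAX_GENES} are allowed")
    return n
```

Calling check_gene_count(read_gene_list("AR.symbol")) before the lisa command catches a gene list of the wrong size early.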
Without using the DNase-seq sample peak regions selected by L1 regularization to limit the motif regions:
time THEANO_FLAGS='mode=FAST_RUN,device=cpu,floatX=float32,openmp=True' OMP_NUM_THREADS=8 lisa logit --gene AR.symbol --tf no --name AR --species hg38 -O ${HOME}/public_html/lisa/ARfolder
Using the DNase-seq sample peak regions selected by L1 regularization to limit the motif regions (--DNase):
time THEANO_FLAGS='mode=FAST_RUN,device=cpu,floatX=float32,openmp=True' OMP_NUM_THREADS=8 lisa logit --gene AR.symbol --tf no --name AR --species hg38 -O ${HOME}/public_html/lisa/ARfolder --DNase
With ChIP-seq peaks to validate the cis-element prediction. AR_ChIP-seq_peak is made by intersecting the genome-wide 1 kb windows with the AR peak BED file and returning the indices of the overlapping genome windows; use np.save to write them as a NumPy binary array file (np.load reads it back).
time THEANO_FLAGS='mode=FAST_RUN,device=cpu,floatX=float32,openmp=True' OMP_NUM_THREADS=8 lisa logit --gene AR.symbol --tf AR_ChIP-seq_peak --name AR --species hg38 -O ${HOME}/public_html/lisa/ARfolder --DNase
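The window-index file passed to --tf above could be built along the following lines. This is a sketch under stated assumptions, not LISA's own procedure: windows are numbered chromosome by chromosome in the order given by chrom_sizes, which may not match the window numbering LISA uses internally, and the function name and arguments are hypothetical.

```python
import numpy as np

WINDOW = 1000  # 1 kb windows, as described above


def windows_overlapping_peaks(peaks, chrom_sizes, window=WINDOW):
    """Return sorted indices of genome-wide windows that overlap any peak.

    peaks: iterable of (chrom, start, end) BED-style half-open intervals.
    chrom_sizes: ordered mapping of chromosome name to length. Windows are
    numbered chromosome by chromosome in this order (an assumption).
    """
    offsets = {}
    counts = {}
    total = 0
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = total
        counts[chrom] = -(-size // window)  # ceiling division
        total += counts[chrom]

    hits = set()
    for chrom, start, end in peaks:
        if chrom not in offsets or end <= start:
            continue  # skip unknown chromosomes and empty intervals
        first = start // window
        # end is exclusive, so the last covered base is end - 1
        last = min((end - 1) // window, counts[chrom] - 1)
        hits.update(range(offsets[chrom] + first, offsets[chrom] + last + 1))
    return np.array(sorted(hits), dtype=np.int64)
```

The result can be written with np.save("AR_ChIP-seq_peak", indices), which appends the .npy suffix, and read back with np.load. Whether lisa expects the file name with or without that suffix is not stated here.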
By default, LISA runs all 10 marks. To select specific marks, use --histone:
time THEANO_FLAGS='mode=FAST_RUN,device=cpu,floatX=float32,openmp=True' OMP_NUM_THREADS=8 lisa logit --gene AR.symbol --tf AR_ChIP-seq_peak --name AR --species hg38 -O ${HOME}/public_html/lisa/ARfolder --DNase --histone "H3K27ac,DNase,H3K4me3,H3K27me3,ATAC-seq"
time THEANO_FLAGS='mode=FAST_RUN,device=cpu,floatX=float32,openmp=True' OMP_NUM_THREADS=8 lisa logit --gene AR.symbol --tf AR_ChIP-seq_peak --name AR --species hg38 -O ${HOME}/public_html/lisa/ARfolder --DNase --histone DNase
lisa fastq mode requires the configuration files to be filled in with the bwa genome index, bwa_index.
Remember that --name is the prefix for the newly generated HDF5 sample IDs. Avoid using numbers and make it as unique as possible, such as "LnCaP_Study_2017".
mkdir -p input_fastq_folder
mkdir -p input_fastq_folder/H3K4me3
mkdir -p input_fastq_folder/H3K27me3
mkdir -p input_fastq_folder/H3K27ac
# then cp the corresponding fastq files with suffix .fastq, .fastq.gz, .fq or .fq.gz into the sub-directory
The input_fastq_folder contains sub-directories such as H3K27me3, H3K4me3 and H3K27ac. Each sub-directory holds single-end ChIP-seq fastq files. Then specify "--histone H3K27me3,H3K4me3,H3K27ac". The folder names and the --histone option must be taken from H3K27ac,DNase,H3K4me1,H3K4me3,H3K4me2,H3K27me3,H3K36me3,H3K9me3,ATAC-seq,H3K9ac or a subset of it. Other factors are not supported yet.
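The folder layout described above can be checked before starting a run. The following is a minimal sketch that uses only the mark names and fastq suffixes listed in this document; the function name and the shape of its return value are illustrative assumptions.

```python
from pathlib import Path

# Mark names accepted for folder names and --histone, as listed above
SUPPORTED_MARKS = {
    "H3K27ac", "DNase", "H3K4me1", "H3K4me3", "H3K4me2",
    "H3K27me3", "H3K36me3", "H3K9me3", "ATAC-seq", "H3K9ac",
}
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def check_fastq_folder(root):
    """Return (histone_option, problems) for an input fastq folder.

    histone_option is a comma-separated string of the usable sub-directory
    names; problems lists sub-directories that cannot be used.
    """
    marks = []
    problems = []
    for sub in sorted(Path(root).iterdir(), key=lambda p: p.name):
        if not sub.is_dir():
            continue  # only sub-directories describe marks
        if sub.name not in SUPPORTED_MARKS:
            problems.append(f"unsupported folder name: {sub.name}")
            continue
        has_fastq = any(
            f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
            for f in sub.iterdir()
        )
        if not has_fastq:
            problems.append(f"no fastq files in {sub.name}")
            continue
        marks.append(sub.name)
    return ",".join(marks), problems
```

The first returned value can be passed to --histone once the problems list is empty.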
Then, run the fastq mode:
lisa fastq --fastq input_fastq_folder --name LnCaP_Study_2017 --gene genes.txt --tf no -O output_html_folder --histone H3K4me3,H3K27me3,H3K27ac
snakemake -j 4 --use-conda
lisa logit --histone H3K4me3 -O output_html --name test --gene gene.list --tf no --species hg38 --additional_h5 H3K4me3.h5 2>>{log}
For fastq and logit modes, optionally specify a region to filter ChIP-seq peak or motif regions:
--EXPANDDNase new_regions.bed
Sometimes pandas throws an HDFStore error. If so, try:
conda update pandas pytables h5py
or
pip install --upgrade pandas tables h5py
One possible cause is a mismatch between the HDF5 headers and the HDF5 library when using pandas.read_hdf, which calls PyTables. Set HDF5_DIR and LD_LIBRARY_PATH to the same HDF5 installation in .bashrc as follows, or leave both unset and let the system decide.
export HDF5_DIR=hdf5-x.x.x/hdf5
export LD_LIBRARY_PATH=hdf5-x.x.x/local/lib:$LD_LIBRARY_PATH
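To see which HDF5 version each Python package reports, something like the following may help. It is a diagnostic sketch only: it reads h5py.version.hdf5_version and tables.hdf5_version, the helper names are hypothetical, and differing versions are a hint rather than proof of the problem described above. A severe header and library mismatch can abort the interpreter at import time, in which case the abort message itself usually names both versions.

```python
import importlib

# Package name -> attribute path holding the HDF5 version string
HDF5_VERSION_ATTRS = (
    ("h5py", ("version", "hdf5_version")),
    ("tables", ("hdf5_version",)),
)


def hdf5_versions():
    """Map each HDF5-backed package to the HDF5 version it reports.

    A value of None means the package (or the attribute) was not available.
    """
    found = {}
    for module_name, attr_path in HDF5_VERSION_ATTRS:
        try:
            obj = importlib.import_module(module_name)
            for attr in attr_path:
                obj = getattr(obj, attr)
            found[module_name] = str(obj)
        except (ImportError, AttributeError):
            found[module_name] = None
    return found


def versions_agree(found):
    """Return True when all available packages report the same HDF5 version."""
    seen = {version for version in found.values() if version is not None}
    return len(seen) <= 1
```

Running print(hdf5_versions()) inside the lisa environment shows at a glance whether h5py and PyTables see the same HDF5 library.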