fastSTRUCTURE

This script runs the fastStructure software for genetic clustering analysis. It automates running the program once for each value of K, where K is the number of clusters to infer.

Description

This repository contains a script (run_fastStructure.sh) that automates running fastStructure for genetic clustering analysis.

The script uses Docker so that fastStructure and the Python installation it needs run in a consistent environment. It runs the program once for each value of K in the chosen range. fastStructure reads a dataset of genetic markers and infers population structure using a Bayesian model-based clustering algorithm.

The script expects the input file in the data/ directory and saves the output files to the output/ directory. To set the input file name, the output file prefix, the range of K values, and the random seed, edit the variables in the run_fastStructure.sh file.
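The per-K loop described above could look roughly like the sketch below. It is not the contents of run_fastStructure.sh: the variable values, the "first last" format of seq, and the helper names k_values and structure_cmd are assumptions made for illustration. structure.py and its -K, --input, --output, --full and --seed options come from fastStructure itself. The sketch only prints the commands and does not run them.

```bash
# Illustrative values only; edit them to match your dataset.
input="genotypes"        # input file name in data/, without extension (hypothetical)
output="genotypes_out"   # prefix for the output files (hypothetical)
seq="1 5"                # range of K values as "first last" (assumed format)
seed=100                 # random seed (hypothetical)

# Expand the K range into one value per line.
k_values() {
    # Word splitting of $seq into "first" and "last" is intended here.
    set -- $seq
    command seq "$1" "$2"
}

# Print the fastStructure command for a single K (dry run, nothing is executed).
structure_cmd() {
    printf 'python structure.py -K %s --input=data/%s --output=output/%s --full --seed=%s\n' \
        "$1" "$input" "$output" "$seed"
}

# One command per value of K.
for k in $(k_values); do
    structure_cmd "$k"
done
```

Printing the commands first is a cheap way to confirm the K range and file names before starting a long run.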

The output files generated by the program will contain the inferred cluster membership probabilities for each sample in the dataset. These files can be further analyzed and visualized using other software tools.
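As one example of such downstream analysis, the sketch below assigns each sample to the cluster with the highest membership probability. It assumes the membership file is whitespace-separated with one row per sample and one column per cluster, as in fastStructure's <prefix>.<K>.meanQ output. The function name assign_clusters and the path in the usage comment are hypothetical.

```bash
# Print "<sample index> <best cluster> <probability>" for each row of a
# membership-probability file (one row per sample, one column per cluster).
assign_clusters() {
    awk '{
        best = 1
        for (i = 2; i <= NF; i++) {
            # Adding 0 forces a numeric comparison of the two fields.
            if ($i + 0 > $best + 0) best = i
        }
        print NR, best, $best
    }' "$1"
}

# Usage (hypothetical path):
#   assign_clusters output/genotypes_out.3.meanQ
```

Ties go to the lower-numbered cluster, and samples with mixed ancestry are still forced into a single cluster, so treat the result as a rough summary only.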

This repository also includes a README file that provides instructions on how to use the script and a license file that specifies the terms and conditions for using and modifying the code.

Requirements

Docker installed on your system
Python installed inside the Docker container
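
A quick way to confirm the first requirement before running the script is sketched below. The helper name require_cmd is hypothetical and is not part of the repository.

```bash
# Return success if the named command is available on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd docker; then
    docker --version
else
    echo "Docker not found on PATH; install it before running the script." >&2
fi
```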

Usage

Clone the repository to your local machine:

git clone <repository_url>

Open a terminal and navigate to the cloned repository directory.
Add the input files to the data/ directory.
Open the run_fastStructure.sh file and edit the following variables according to your needs:
input: The name of the input file (without extension) located in the data/ directory.
output: The prefix to be used for the output files.
seq: The range of K values to be used for the analysis.
seed: The random seed to be used for the analysis.
Save and close the run_fastStructure.sh file.
In the terminal, run the following command:

bash run_fastStructure.sh

The program will generate output files for each value of K in the output/ directory.

Note

The docker run command in the script mounts the data/ directory at /fastStructure/data inside the Docker container. The input file path passed to the command must therefore be given relative to that path.
The --full option in the command tells fastStructure to output all variational parameters. It is optional, so you can remove it if you only need the default output files.
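
The mount described in this note can be sketched as follows. The image name, the input and output names, the second mount for output/, and the helper names are all assumptions made for illustration; only the /fastStructure/data mount point comes from the script. The command is printed, not executed.

```bash
image="faststructure-image"   # hypothetical Docker image name
input="genotypes"             # hypothetical input name, without extension
output="genotypes_out"        # hypothetical output prefix

# Map an input name in the host data/ directory to its path inside the container.
container_input_path() {
    printf '/fastStructure/data/%s\n' "$1"
}

# Print (do not execute) a docker run command for one value of K.
# Mounting output/ at /fastStructure/output is an assumption of this sketch.
docker_cmd() {
    printf 'docker run --rm -v %s/data:/fastStructure/data -v %s/output:/fastStructure/output %s python structure.py -K %s --input=%s --output=/fastStructure/output/%s --full\n' \
        "$PWD" "$PWD" "$image" "$1" "$(container_input_path "$input")" "$output"
}

docker_cmd 2
```

Because the host data/ directory appears inside the container as /fastStructure/data, a file stored as data/genotypes.bed on the host is addressed as /fastStructure/data/genotypes in the command.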
