paulocecco/fastSTRUCTURE

fastSTRUCTURE

This repository provides a script that runs the fastStructure software for genetic clustering analysis. It automates running the program over a range of K values, where K is the number of clusters to be inferred.

Description

The script (run_fastStructure.sh) uses Docker so that fastStructure runs in a consistent Python environment. It runs the program once for each value of K. On each run, fastStructure reads a dataset of genetic markers and infers population structure using a Bayesian model-based clustering algorithm.

The script assumes that the input file is in the data/ directory and the output files will be saved in the output/ directory. The user can specify the input file name, output file prefix, range of K values to be used, and the random seed to be used for the analysis by editing the variables in the run_fastStructure.sh file.
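
To make the description above concrete, here is a minimal sketch of how such a script could be organized. It is not the repository's actual run_fastStructure.sh. The image name, the example variable values, the location of structure.py inside the container, and the mount of output/ are all assumptions made for illustration; only the data/ mount at /fastStructure/data and the --full option come from this README. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: an assumed layout, not the repository's real script.
set -euo pipefail

input="genotypes"              # hypothetical input file name (no extension) in data/
output="run1"                  # hypothetical output prefix
seq="1 2 3"                    # K values to try
seed=100                       # random seed
image="example/faststructure"  # hypothetical Docker image name

# Build the docker command for one value of K.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
run_k() {
  local k="$1"
  local cmd=(docker run --rm
    -v "$PWD/data:/fastStructure/data"
    -v "$PWD/output:/fastStructure/output"
    "$image" python /fastStructure/structure.py
    -K "$k"
    --input="/fastStructure/data/$input"
    --output="/fastStructure/output/$output"
    --full --seed="$seed")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# One fastStructure run per K value.
for k in $seq; do
  run_k "$k"
done
```

Running the sketch with DRY_RUN=0 would execute the commands instead of printing them, provided the assumed image and paths match your setup.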

The output files generated by the program will contain the inferred cluster membership probabilities for each sample in the dataset. These files can be further analyzed and visualized using other software tools.
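
As one example of further analysis, the sketch below assigns each sample to its most probable cluster. It assumes the membership file has one row per sample and one whitespace-separated column per cluster, as in fastStructure's .meanQ output; the function name assign_clusters is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "sample_index best_cluster proportion" for each row of a
# membership file (one row per sample, one column per cluster).
assign_clusters() {
  awk '{
    best = 1
    for (i = 2; i <= NF; i++)
      if ($i + 0 > $best + 0) best = i   # force numeric comparison
    print NR, best, $best
  }' "$1"
}

# Run only when a file name is passed to the script.
if [ "$#" -gt 0 ]; then
  assign_clusters "$1"
fi
```

For instance, a file whose first row is `0.9 0.1` yields `1 1 0.9` for that sample. Samples with strongly mixed membership are still assigned to a single cluster here, so the printed proportion is worth inspecting.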

This repository also includes a README file that provides instructions on how to use the script and a license file that specifies the terms and conditions for using and modifying the code.

Requirements

  • Docker installed on your system
  • Python installed inside the Docker container

Usage

  1. Clone the repository on your local machine:
     git clone <repository_url>
  2. Open the terminal and navigate to the directory where the repository is cloned.
  3. Add the input files to the data/ directory.
  4. Open the run_fastStructure.sh file and edit the following variables according to your needs:
       • input: The name of the input file (without extension) located in the data/ directory.
       • output: The prefix to be used for the output files.
       • seq: The range of K values to be used for the analysis.
       • seed: The random seed to be used for the analysis.
  5. Save and close the run_fastStructure.sh file.
  6. In the terminal, run the following command:
     bash run_fastStructure.sh
  7. The program will generate output files for each value of K in the output/ directory.
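
After the steps above, it can be useful to confirm that every K produced output. The sketch below assumes the files follow fastStructure's `<prefix>.<K>.meanQ` naming; the function name check_outputs and the example prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report, for each K, whether <dir>/<prefix>.<K>.meanQ exists.
# Returns non-zero if any expected file is missing.
check_outputs() {
  local dir="$1" prefix="$2"
  shift 2
  local k missing=0
  for k in "$@"; do
    if [ -f "$dir/$prefix.$k.meanQ" ]; then
      echo "K=$k ok"
    else
      echo "K=$k missing"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (hypothetical prefix and K values):
# check_outputs output run1 1 2 3
```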

Note

  • The script assumes that the input file is in the data/ directory and the output files will be saved in the output/ directory.
  • The docker run command in the script mounts the data/ directory inside the Docker container at the /fastStructure/data path. Therefore, the input file name and path in the command should be relative to this path.
  • The --full option in the command tells fastStructure to output all variational parameters. It is optional, so you can remove it if you do not need that extra output.
