GENEVIC

GENetic data Exploration and Visualization Intelligent interactive Console

GENEVIC is a smart chat assistant designed to facilitate research in Biomedical Informatics for both beginner and intermediate-level researchers.

Introduction

GENEVIC is augmented by generative AI models accessed through the Azure OpenAI platform.
It supports Python's built-in SQLite as well as your own Microsoft SQL Server.
It can be run from your local host or Streamlit cloud.
Tasks that can be performed with GENEVIC:
- PGS Chat: Retrieve information from and visualize any custom database.
- GeneAPI Chat: Explore Bioinformatics websites via automated API calls.
- Literature Search: Search for relevant literature evidence in well-known portals for a given search query.

Project Structure

Features

PGS Chat

Retrieve information from and visualize a custom database.
Demo database: Polygenic Score (PGS) Rank Database. See Supplementary Materials for more information.
Code Writer: Auto-translate prompts/questions in natural language (e.g., English (US)) to SQL queries or Python code.
- Steps to use this section:
  - Use a question from the FAQ or enter your own question.
  - You can select show code and/or show prompt to display the generated SQL and Python code and the prompt behind the scenes.
  - Click on submit to execute and see the result.
  - For advanced questions such as forecasting, you can use GPT-4 (if available) as the engine.
- Example prompts/questions:
  - Show the top 10 ranked genes for Alzheimer.
  - Plot distribution of ranks for the top 100 SNPs for Schizophrenia.
Download the query results as CSV for retrospective analysis and interpretation.
Query ChatGPT directly to generate more information or novel research hypotheses.
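
As a rough illustration of the query-then-download flow described above, the sketch below runs a hand-written SQL query against an in-memory SQLite database and serializes the result as CSV. The table name pgs_rank, its columns, and the toy rows are assumptions made up for this example; the real PGS Rank Database schema is described in the Supplementary Materials, and in the app the SQL is generated by the model rather than written by hand.

```python
import csv
import io
import sqlite3

# Hypothetical schema and toy rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pgs_rank (trait TEXT, gene TEXT, rank_position INTEGER)")
conn.executemany(
    "INSERT INTO pgs_rank VALUES (?, ?, ?)",
    [("Alzheimer", "APOE", 1), ("Alzheimer", "TREM2", 2), ("Schizophrenia", "TREM2", 1)],
)


def top_genes(connection, trait, limit=10):
    """Return (gene, rank_position) rows for a trait, best rank first."""
    sql = (
        "SELECT gene, rank_position FROM pgs_rank "
        "WHERE trait = ? ORDER BY rank_position LIMIT ?"
    )
    return connection.execute(sql, (trait, limit)).fetchall()


def rows_to_csv(rows, header):
    """Serialize query rows to CSV text, as a download button might."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = top_genes(conn, "Alzheimer")
print(rows_to_csv(rows, ["gene", "rank"]))
```

The query uses bound parameters (?) rather than string formatting, which is a sensible default whenever part of the SQL comes from user input.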

Gene API Chat

Explore external Bioinformatics websites via automated web API calls.
Demo APIs explored: STRING and ENRICHR.
Generate a gene-gene interaction network, given one or more gene names as input.
- The entire functionality of the STRING API is replicated as is.
- Interactive in-app display of the network.
Perform gene enrichment analysis with reference gene set libraries, given a gene list as input.
- Visualize the network graph.
- Download the enrichment results as CSV and/or the visualizations in known image formats.
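
To give a sense of what an automated STRING call looks like, the sketch below only builds the request URL for STRING's network method and does not send it. The helper name string_network_url is hypothetical, and the base URL and parameter names reflect the public STRING REST API as commonly documented; check the STRING API documentation listed under Major Reference Websites before relying on them. This is not necessarily how GENEVIC constructs its calls.

```python
from urllib.parse import urlencode

# Public STRING REST endpoint (assumed to be current; verify against the STRING docs).
STRING_API_BASE = "https://string-db.org/api"


def string_network_url(genes, species=9606, output_format="tsv"):
    """Build a STRING 'network' request URL for one or more gene names."""
    if not genes:
        raise ValueError("at least one gene name is required")
    params = {
        # STRING expects multiple identifiers separated by a carriage return.
        "identifiers": "\r".join(genes),
        # 9606 is the NCBI taxonomy ID for human.
        "species": species,
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


print(string_network_url(["APOE", "TREM2"]))
```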

Literature Search

Search for literature evidence in PubMed, Google Scholar, or arXiv.
Search one, two, or all three of these sources at the same time.
- Example search queries:
  - Search for articles with gene APOE and Alzheimer in Pubmed
  - Search for articles with Schizophrenia in Google Scholar
  - articles with gene TREM2 and Schizophrenia in Arxiv
  - Search for articles with APOE gene name and trait Alzheimer
Displays the names and links of the articles for any given search query.
Displays the abstract of an article, given its link as the search query.
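
The example queries above name a source in some cases and none in others. One simple way to handle that is sketched below: look for source names in the query and fall back to all three when none is mentioned. The function select_sources and its keyword matching are hypothetical and are not taken from the GENEVIC code, which may route queries differently (for example, by letting the language model choose the tool).

```python
# Hypothetical mapping from a lowercase keyword to a display name.
SOURCES = {
    "pubmed": "PubMed",
    "google scholar": "Google Scholar",
    "arxiv": "arXiv",
}


def select_sources(query):
    """Return the sources named in the query, or all sources if none is named."""
    lowered = query.lower()
    chosen = [label for keyword, label in SOURCES.items() if keyword in lowered]
    return chosen or list(SOURCES.values())


print(select_sources("Search for articles with Schizophrenia in Google Scholar"))
print(select_sources("Search for articles with APOE gene name and trait Alzheimer"))
```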

Local Installation

Pre-requisites

Python 3.10+
Important: On Windows, Python and the pip package manager must be on the PATH for the setup scripts to work.
Ensure you can run python --version from the console.
On Ubuntu, you might need to run sudo apt install python-is-python3 to link python to python3.
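
If you prefer to verify the prerequisites from Python itself, a small check along these lines may help. It is only a convenience sketch; the function names are made up for this example and are not part of the repository.

```python
import shutil
import sys

MIN_VERSION = (3, 10)  # minimum version stated in the prerequisites


def python_ok(version=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def on_path(tool):
    """Return True if an executable with this name is found on the PATH."""
    return shutil.which(tool) is not None


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    for tool in ("python", "pip"):
        print(f"{tool} on PATH:", on_path(tool))
```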

Step-wise Instructions

Step 1. Clone this repository

Clone this repository: git clone https://github.com/anath2110/GENEVIC.git
From the terminal, navigate to the project root: cd [path-to-project-root-folder]

Step 2. Set up environmental variables

Provide settings for Azure OpenAI and the database. You can either create a file named secrets.env in the root of this project folder on your PC, as shown below, or enter the settings later through the app's GUI.

- Option 1: use built-in SQLITE. Then you don't need to install SQL Server.

    AZURE_OPENAI_API_KEY="9999999999999999999999999"
    AZURE_OPENAI_GPT4_DEPLOYMENT="NAME_OF_GPT_4_DEPLOYMENT"
    AZURE_OPENAI_CHATGPT_DEPLOYMENT="NAME_OF_CHATGPT_4_DEPLOYMENT"
    AZURE_OPENAI_ENDPOINT="https://openairesourcename.openai.azure.com/"
    SQL_ENGINE="sqlite"


- Option 2: use your own SQL Server

    AZURE_OPENAI_API_KEY="9999999999999999999999999"
    AZURE_OPENAI_ENDPOINT="https://openairesourcename.openai.azure.com/"
    AZURE_OPENAI_GPT4_DEPLOYMENT="NAME_OF_GPT_4_DEPLOYMENT"
    AZURE_OPENAI_CHATGPT_DEPLOYMENT="NAME_OF_CHATGPT_4_DEPLOYMENT"
    SQL_USER="sqluserid"
    SQL_PASSWORD="sqlpassword"
    SQL_DATABASE="WideWorldImportersDW"
    SQL_SERVER="sqlservername.database.windows.net"

IMPORTANT: If you are a Mac user, please follow this to install ODBC for PYODBC.
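
Before launching the app, it can be useful to confirm that secrets.env parses the way you expect. The sketch below is a minimal stand-alone parser for the KEY="value" format shown in the options above; it is not the loader GENEVIC uses, and the choice of which keys count as required is an assumption for illustration.

```python
def parse_env(text):
    """Parse KEY=value lines, tolerating spaces around '=' and optional quotes."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Assumed minimum set of keys; adjust to the option you chose.
REQUIRED = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_keys(settings, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


sample = '''
AZURE_OPENAI_API_KEY="9999999999999999999999999"
AZURE_OPENAI_ENDPOINT=https://openairesourcename.openai.azure.com/
SQL_ENGINE = "sqlite"
'''
settings = parse_env(sample)
print(settings["SQL_ENGINE"], missing_keys(settings))
```

To check a real file, read it first, for example with pathlib.Path("secrets.env").read_text(), and pass the text to parse_env.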

Step 3. Configure development environment

NOTE: All activities in this step will be performed using the command line.

Step 3.1 Navigate to the root directory of this project

Run: cd [path-to-project-root-folder]

Step 3.2 Create a python environment

This step is required ONLY if you did not perform it earlier as part of the pre-requisites.
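
This step does not prescribe a particular tool. One option, sketched here under the assumption that a standard virtual environment is acceptable, is Python's built-in venv module; the directory name .venv and the helper create_env are illustrative choices, not project requirements.

```python
import venv
from pathlib import Path


def create_env(target=".venv", with_pip=True):
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    path = Path(target)
    # with_pip=True bootstraps pip so requirements can be installed afterwards.
    venv.create(path, with_pip=with_pip)
    return path / "pyvenv.cfg"


if __name__ == "__main__":
    print("Created:", create_env())
```

After creation, activate the environment with source .venv/bin/activate on Linux or macOS, or .venv\Scripts\activate on Windows, before installing the requirements.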

Step 3.3 Import the requirements.txt

Run the command: pip install -r requirements.txt

Step 3.4 Run the application locally

To run the application from the command line: streamlit run Home.py
You will see the application load in your browser.

Note: For troubleshooting, see here.
Note: For Azure OpenAI subscription and setup, see here.

Docker Installation

Prerequisites:

Install Docker on your local system or create an account in Docker Cloud. Help resources: https://docs.docker.com/engine/install/

Download Docker Image for GENEVIC:

Click here to download the zipped docker image file

Commands:

Run the following commands from the directory where you downloaded the above image (the example shown is for the Windows CMD prompt):
docker load -i genevic-v1.tar
This command loads the Docker image from the tar file into your local Docker repository.
docker run -p 8501:8501 genevic-v1
This command runs the container, mapping port 8501 on your local machine to port 8501 in the container.
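
Once the container is running, the app should be reachable on port 8501 of your machine. A quick way to confirm that something is listening there is sketched below; port_is_open is a hypothetical helper, and a successful connection only shows that the port is open, not that the app has finished loading.

```python
import socket


def port_is_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Port 8501 open:", port_is_open())
```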

Web Usage

Access the web application at: https://genevic-anath2024.streamlit.app/

Credits

This project was made possible by the dedicated efforts of our research team and the comprehensive support provided by the Bioinformatics and Systems Medicine Laboratory and the Department of Health Data Science and Artificial Intelligence at McWilliams School of Biomedical Informatics at UTHealth Houston.

Team Members

Anindita Nath: First Author, AI Programmer, Web Application Developer and Maintainer, Database Designer and Manager
Ushijima Mwesigwa, Goh Savannah: Co-Author, PGS Rank database curator, application evaluator
Yulin Dai, PhD: Co-Author, Guided the database development and evaluation of the application

Supervision and Guidance

Xiaoqian Jiang, PhD: Co-Author, Co-supervisor
Zhongming Zhao, PhD, MS: Co-Author, Principal Investigator

Major Reference Websites

STRING API Documentation:
Documentation for the web API of the STRING website. This is the backbone web API used as one of the demos in the Gene API Chat module. It is primarily used to generate and visualize the gene-gene interaction network graph.
ENRICHR API Documentation:
Documentation for the web API of the ENRICHR website. This is the backbone web API used as one of the demos in the Gene API Chat module. It is primarily used to perform gene enrichment analysis for a set of genes using the reference gene set libraries.
Langchain's PubMed API wrapper source code: Source code for langchain_community.utilities.pubmed.
Langchain's PubMed API wrapper documentation: Documentation for langchain_community.utilities.pubmed.
Q&A with RAG: Question and Answering use case of Langchain.
Google Scholar (SERP) API documentation: Google Scholar API, which allows scraping SERP results from a Google Scholar search query.
Langchain SERP API wrapper: This page covers how to use the SerpAPI search APIs within LangChain.
Langchain Arxiv API wrapper: Entire documentation for Arxiv API wrapper and Arxiv tool of Langchain.

GitHub Repositories and Resources

Our heartfelt thanks go to each team member, department, and external contributor for their indispensable roles in the fruition of this project.

Cite Us

Please cite us as

Anindita Nath, Savannah Mwesigwa, Yulin Dai, Xiaoqian Jiang, Zhongming Zhao, GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console, Bioinformatics, Volume 40, Issue 10, October 2024, btae500, https://doi.org/10.1093/bioinformatics/btae500

Contact

Anindita Nath– [email protected], [email protected]

Project Link: https://github.com/anath2110/GENEVIC.git and https://github.com/bsml320/GENEVIC

Supplementary Materials: https://github.com/anath2110/GENEVIC_Supplimentary.git and https://github.com/bsml320/GENEVIC_Supplementary
