GENEVIC is a smart chat assistant designed to support research in Biomedical Informatics for both beginner and intermediate-level researchers.
- GENEVIC is augmented by generative AI models accessed through the Azure OpenAI platform.
- It supports Python's built-in SQLite as well as your own Microsoft SQL Server.
- It can be run from your local host or Streamlit cloud.
- Tasks that can be performed with GENEVIC:
- PGS Chat: Retrieve information from and visualize any custom database.
- GeneAPI Chat: Explore Bioinformatics websites via automated API calls.
- Literature Search: Search for relevant literature evidence in well-known portals for a given search query.
- Retrieve information from and visualize a custom database.
- Demo database: Polygenic Score (PGS) Rank Database. See Supplementary Materials for more information.
- Code Writer: Auto-translate prompts/questions in natural language (e.g., English (US)) to SQL queries or Python code.
- Steps to use this section:
- Use a question from the FAQ or enter your own question.
- You can select "show code" and/or "show prompt" to display the SQL and Python code and the prompt used behind the scenes.
- Click on submit to execute and see the result.
- For advanced questions such as forecasting, you can use GPT-4 (if available) as the engine.
- Example prompts/questions:
- Show the top 10 ranked genes for Alzheimer.
- Plot distribution of ranks for the top 100 SNPs for Schizophrenia.
- Download the query results as CSV for retrospective analysis and interpretation.
- Query ChatGPT directly to generate more information or novel research hypothesis.
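To make the Code Writer idea concrete, here is a minimal sketch of the kind of SQL a prompt such as "Show the top 10 ranked genes for Alzheimer" could translate to. The table name `pgs_rank`, its columns, and the toy rows are hypothetical placeholders; the actual schema of the PGS Rank Database is described in the Supplementary Materials, not here.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pgs_rank (gene TEXT, trait TEXT, gene_rank INTEGER)")
conn.executemany(
    "INSERT INTO pgs_rank VALUES (?, ?, ?)",
    [
        ("GENE_B", "Alzheimer", 2),
        ("GENE_A", "Alzheimer", 1),
        ("GENE_C", "Alzheimer", 3),
        ("GENE_D", "Schizophrenia", 1),
    ],
)


def top_ranked_genes(connection, trait, limit=10):
    """Return (gene, rank) pairs for a trait, best rank first."""
    query = (
        "SELECT gene, gene_rank FROM pgs_rank "
        "WHERE trait = ? ORDER BY gene_rank ASC LIMIT ?"
    )
    # Parameter binding keeps user-supplied trait names out of the SQL text.
    return connection.execute(query, (trait, limit)).fetchall()


print(top_ranked_genes(conn, "Alzheimer"))
```

In the app the query text is generated by the language model; the sketch only shows what running such a query against SQLite looks like.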
- Explore external Bioinformatics websites via automated web API calls.
- Demo APIs explored: STRING and ENRICHR.
- Generate a gene-gene interaction network, given one or more gene names as input.
- Entire functionality of STRING API replicated as is.
- Interactive in-app display of the network.
- Perform gene enrichment analysis with reference gene set libraries, given a gene list as input.
- Visualize the network graph.
- Download the enrichment results as CSV and/or the visualizations in known image formats.
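As a rough illustration of what an automated STRING call involves, the sketch below builds a request URL for the STRING `network` endpoint from a list of gene names. The helper name `build_string_network_url` is hypothetical and this is not GENEVIC's own code; it only assembles the URL and does not contact the server.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"


def build_string_network_url(genes, species=9606, output_format="tsv"):
    """Build a STRING network request URL (9606 is the NCBI taxon ID for human)."""
    if not genes:
        raise ValueError("at least one gene name is required")
    params = {
        # STRING expects multiple identifiers separated by carriage returns,
        # which urlencode turns into %0D.
        "identifiers": "\r".join(genes),
        "species": species,
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


print(build_string_network_url(["APOE", "TREM2"]))
```

The resulting URL can be fetched with any HTTP client; the response lists interaction partners and scores in the requested output format.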
- Search for literature evidence in PubMed, Google Scholar, or arXiv.
- Search one, two, or all three of these websites at the same time.
- Example search queries:
- Search for articles with gene APOE and Alzheimer in Pubmed
- Search for articles with Schizophrenia in Google Scholar
- articles with gene TREM2 and Schizophrenia in Arxiv
- Search for articles with APOE gene name and trait Alzheimer
- Displays the names and links of the articles for any given search query.
- Displays the abstract of the article, given its link as search query.
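For readers curious what a PubMed search looks like at the HTTP level, the sketch below builds an NCBI E-utilities `esearch` URL for a query such as "gene APOE and Alzheimer". GENEVIC itself relies on the LangChain wrappers listed in the resources section, so this is only an illustration of the underlying request; the helper name `build_pubmed_search_url` is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(terms, max_results=5):
    """Combine search terms with AND and build an esearch URL for PubMed."""
    if not terms:
        raise ValueError("at least one search term is required")
    params = {
        "db": "pubmed",
        "term": " AND ".join(terms),
        "retmax": max_results,   # maximum number of article IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_search_url(["APOE", "Alzheimer"]))
```

The response to such a request contains PubMed article IDs, which a second request can resolve into titles and abstracts.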
- Python 3.10+
Important: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
Ensure you can run `python --version` from console.
On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.
Clone this repository: `git clone https://github.com/anath2110/GENEVIC.git`
From the terminal, navigate to the project root: `cd [path-to-project-root-folder]`
Provide settings for Azure OpenAI and the database. You can either create a file named secrets.env in the root of this project folder on your PC, as shown below, or enter the settings later through the app's GUI.
- Option 1: use built-in SQLITE. Then you don't need to install SQL Server.
AZURE_OPENAI_API_KEY="9999999999999999999999999"
AZURE_OPENAI_GPT4_DEPLOYMENT="NAME_OF_GPT_4_DEPLOYMENT"
AZURE_OPENAI_CHATGPT_DEPLOYMENT="NAME_OF_CHATGPT_4_DEPLOYMENT"
AZURE_OPENAI_ENDPOINT=https://openairesourcename.openai.azure.com/
SQL_ENGINE = "sqlite"
- Option 2: use your own SQL Server
AZURE_OPENAI_API_KEY="9999999999999999999999999"
AZURE_OPENAI_ENDPOINT="https://openairesourcename.openai.azure.com/"
AZURE_OPENAI_GPT4_DEPLOYMENT="NAME_OF_GPT_4_DEPLOYMENT"
AZURE_OPENAI_CHATGPT_DEPLOYMENT="NAME_OF_CHATGPT_4_DEPLOYMENT"
SQL_USER="sqluserid"
SQL_PASSWORD="sqlpassword"
SQL_DATABASE="WideWorldImportersDW"
SQL_SERVER="sqlservername.database.windows.net"
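Before launching the app, it can help to confirm that secrets.env contains the keys shown above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the rule that the SQL Server keys are needed whenever `SQL_ENGINE` is not `sqlite` is an assumption drawn from the two options above, not from the application code.

```python
from pathlib import Path

# Keys that appear in both options above.
REQUIRED_KEYS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_GPT4_DEPLOYMENT",
    "AZURE_OPENAI_CHATGPT_DEPLOYMENT",
)
# Keys that appear only in Option 2 (your own SQL Server).
SQL_SERVER_KEYS = ("SQL_USER", "SQL_PASSWORD", "SQL_DATABASE", "SQL_SERVER")


def parse_env_file(path):
    """Read KEY=VALUE lines, tolerating spaces around '=' and optional quotes."""
    settings = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(settings):
    """Return the expected keys that are absent or empty."""
    required = list(REQUIRED_KEYS)
    # Assumption: anything other than SQL_ENGINE="sqlite" means SQL Server.
    if settings.get("SQL_ENGINE", "").lower() != "sqlite":
        required.extend(SQL_SERVER_KEYS)
    return [key for key in required if not settings.get(key)]
```

For example, `missing_settings(parse_env_file("secrets.env"))` returns an empty list when every expected key is present.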
IMPORTANT: If you are a Mac user, please follow this to install ODBC for PYODBC.
NOTE: All activities in this step will be performed using the command line.
Navigate to the project root: `cd [path-to-project-root-folder]`
This step is required ONLY if you did not perform it earlier as part of the prerequisites.
Run the command: `pip install -r requirements.txt`
To run the application from the command line: `streamlit run Home.py`
You will see the application load in your browser.
Note: For troubleshooting, see here.
Note: For Azure OpenAI subscription and setup, see here.
Install Docker on your local system or create an account in Docker Cloud. Help resources: https://docs.docker.com/engine/install/
Click here to download the zipped Docker image file.
Run the following commands from the directory where you saved the above image (the example shown is for the Windows CMD prompt):
docker load -i genevic-v1.tar
This command loads the Docker image from the tar file into your local Docker repository.
docker run -p 8501:8501 genevic-v1
This command runs the container, mapping port 8501 on your local machine to port 8501 in the container.
Access the web application at: https://genevic-anath2024.streamlit.app/
This project was made possible by the dedicated efforts of our research team and the comprehensive support provided by the Bioinformatics and Systems Medicine Laboratory and the Department of Health Data Science and Artificial Intelligence at the McWilliams School of Biomedical Informatics at UTHealth Houston.
- Anindita Nath: First Author, AI Programmer, Web Application Developer and Maintainer, Database Designer and Manager
- Savannah Mwesigwa: Co-Author, PGS Rank database curator, application evaluator
- Yulin Dai, PhD: Co-Author, Guided the database development and evaluation of the application
- Xiaoqian Jiang, PhD: Co-Author, Co-supervisor
- Zhongming Zhao, PhD, MS: Co-Author, Principal Investigator
- STRING API Documentation: Documentation for the web API of the STRING website. This is the backbone web API used as one of the demos in the GeneAPI Chat module. It is primarily used to generate and visualize the gene-gene interaction network graph.
- ENRICHR API Documentation: Documentation for the web API of the ENRICHR website. This is the backbone web API used as one of the demos in the GeneAPI Chat module. It is primarily used to perform gene enrichment analysis for a set of genes using the reference gene set libraries.
- Langchain's PubMed API wrapper source code: Source code for langchain_community.utilities.pubmed.
- Langchain's PubMed API wrapper documentation: Documentation for langchain_community.utilities.pubmed.
- Q&A with RAG: Question and Answering use case of Langchain.
- Google Scholar (SERP) API documentation: Google Scholar API, which allows scraping SERP results from a Google Scholar search query.
- Langchain SERP API wrapper: This page covers how to use the SerpAPI search APIs within LangChain.
- Langchain Arxiv API wrapper: Entire documentation for Arxiv API wrapper and Arxiv tool of Langchain.
Our heartfelt thanks go to each team member, department, and external contributor for their indispensable roles in the fruition of this project.
Please cite us as
Anindita Nath, Savannah Mwesigwa, Yulin Dai, Xiaoqian Jiang, Zhongming Zhao, GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console, Bioinformatics, Volume 40, Issue 10, October 2024, btae500, https://doi.org/10.1093/bioinformatics/btae500
Anindita Nath– [email protected], [email protected]
Project Link: https://github.com/anath2110/GENEVIC.git and https://github.com/bsml320/GENEVIC
Supplementary Materials: https://github.com/anath2110/GENEVIC_Supplimentary.git and https://github.com/bsml320/GENEVIC_Supplementary