A Model Context Protocol (MCP) server implementation that provides access to Google BigQuery biomedical datasets. While other bigquery MCP servers exist, we decided to build a dedicated MCP server for specific datasets to help the MCP client find the right information faster and provide the right context for biopharma specific questions.
Note that this is work in progress and that the MCP itself is still in its very early days, so you can expect changes over the next weeks.
You will need a Google Cloud account and set up a service account with access to the BigQuery datasets.
The server exposes the following resources:
-
memo://insights
: Insights on the Analysis
A memo for the LLM to store information on the analysis. Of note, Claude does not seem to use this at the moment unless we explicitly define tools for it (which can make it overuse this capacity and therefore we removed explicit tools for resource access). We assume that future MCP clients will have better tool use (and are also working on our own client) but it does not have much impact for the moment. -
schema://database
: OpenTargets Database Schema
Detailed structural information about the OpenTargets database, including column names and a short table description. This helps the network to plan queries without the need of exploring the database itself.
The server offers several core tools:
-
list-datasets
- List all available BigQuery public datasets that can be queried
- Input: None required
- Returns: List of available datasets
-
read-query
- Execute
SELECT
queries on the specified BigQuery public dataset - Input:
dataset
(string): Name of the BigQuery dataset to queryquery
(string): TheSELECT
SQL query to execute
- Returns: Query results as JSON array of objects
- Execute
-
list-tables
- Get a list of all tables in the specified BigQuery dataset
- Input:
dataset
(string): Name of the BigQuery dataset to explore
- Returns: List of table names
-
describe-table
- View schema information for a specific table
- Input:
dataset
(string): Name of the BigQuery dataset containing the tabletable_name
(string): Name of table to describe
- Returns: Column definitions with names, types, and nullability
-
append-insight
- Add new findings to the analysis memo
- Input:
finding
(string): Analysis finding about patterns or trends
- Returns: Confirmation of finding addition
-
get-insights
- Retrieve all recorded insights from the current session
- Input: None required
- Returns: List of all recorded insights
The server requires the following environment variables:
BIGQUERY_CREDENTIALS
: Path to your Google Cloud service account key fileALLOWED_DATASETS
: Comma-separated list of allowed BigQuery datasets, e.g.:ALLOWED_DATASETS=open_targets_platform,open_targets_genetics,human_genome_variants,gnomad
BIGQUERY_MAX_COST_USD
: Maximum allowed cost per query in USD (default: 0.50) - this might seem exaggerated, but Claude can easily come up with queries that require a lot of data processing. I spent 25 USD in one day, and in most cases these costly queries fail anyway.
Add the following to your claude_desktop_config.json
:
From local source repository
"mcpServers": {
"BIGQUERY-BIOMEDICAL-MCP": {
"command": "uv",
"args": [
"--directory",
"PATH TO mcp-bigquery-biomedical REPOSITORY",
"run",
"mcp_bigquery_biomedical"
],
"env": {
"BIGQUERY_CREDENTIALS": "PATH_TO_YOUR_SERVICE_ACCOUNT_KEY.json",
"ALLOWED_DATASETS": "open_targets_platform,open_targets_genetics,human_genome_variants,gnomad" # or any other dataset name you want to work with
}
}
}
Compiled package
"mcpServers": {
"BIGQUERY-BIOMEDICAL-MCP": {
"command": "uvx",
"args": [
"mcp-bigquery-biomedical"
],
"env": {
"BIGQUERY_CREDENTIALS": "PATH_TO_YOUR_SERVICE_ACCOUNT_KEY.json",
"ALLOWED_DATASETS": "open_targets_platform,open_targets_genetics,human_genome_variants,gnomad" # or any other dataset name you want to work with
}
}
}
You can check the public datasets in BigQuery for a comprehensive list, but some of the biomedical datasets are:
- open_targets_platform
- open_targets_genetics
- ebi_chembl
- ebi_surechembl
- fda_drug
- deepmind_alphafold
- human_genome_variants
- human_variant_annotation
- immune_epitope_db
- cms_medicare
- patents
- patents_view
- patents_cpc
- google_patents_research
- uspto_oce_cancer
- patents_dsep
- ebi_mgnify
- human_genome_variants
- gnomad
Note, that many of these have not been updated for a year or two in the public bigquery database. We are building our own databases for AI supported translational research, so if you need more up to data information, please reach out to us at [email protected]
The server supports access to all BigQuery public datasets. The database resource has only been created for the OpenTargets datasets, but Claude also works well if no database description is provided. You can easily add one for another dataset and create a pull request.
Happy to hear from you if you have suggestions, questions, or feedback [email protected]
This MCP server is licensed under the GNU General Public License v3.0 (GPL-3.0). This means you have the freedom to run, study, share, and modify the software. Any modifications or derivative works must also be distributed under the same GPL-3.0 terms. For more details, please see the LICENSE file in the project repository.