GitHub - ottogroup/bquest: Effortlessly validate and test your Google BigQuery queries with the power of pandas DataFrames in Python.

BQuest

Effortlessly validate and test your Google BigQuery queries with the power of pandas DataFrames in Python.

We would like to thank Mike Czech, the original inventor of bquest!

Warning

This library is a work in progress!

Breaking changes should be expected until a 1.0 release, so version pinning is recommended.

Overview

Use BQuest in combination with your favorite testing framework (e.g. pytest).
Create temporary test tables from JSON or pandas DataFrames.
Run BQ configurations and plain SQL queries on your test tables and check the results.

Installation

Via PyPI (standard):

pip install bquest

Via GitHub (most recent):

pip install git+https://github.com/ottogroup/bquest

BQuest also requires a dedicated BigQuery dataset for storing test tables, e.g. defined with Terraform:

resource "google_bigquery_dataset" "bquest" {
  dataset_id                  = "bquest"
  friendly_name               = "bquest"
  description                 = "Source tables for bquest tests"
  location                    = "EU"
  default_table_expiration_ms = 3600000
}

We recommend setting an expiration time for tables in the bquest dataset (3600000 ms, i.e. one hour, in the definition above) to ensure that test tables are removed after the tests have run.
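BigQuery sets the expiration of each newly created table to its creation time plus the dataset's default_table_expiration_ms, so with the value above a test table created at 12:00 expires at 13:00. The helpers below are a small hypothetical sketch in plain Python (not part of bquest) for choosing such a value and checking when a test table will disappear:

```python
from datetime import datetime, timedelta, timezone


def expiration_ms(lifetime: timedelta) -> int:
    """Convert a desired table lifetime into a millisecond expiration value."""
    return int(lifetime.total_seconds() * 1000)


def expires_at(created: datetime, default_table_expiration_ms: int) -> datetime:
    """Expiration time of a table created at `created` under the dataset default."""
    return created + timedelta(milliseconds=default_table_expiration_ms)


one_hour_ms = expiration_ms(timedelta(hours=1))  # the value used in the dataset above
created = datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc)
expiry = expires_at(created, one_hour_ms)
```

Pick a lifetime comfortably longer than your slowest test run, so that tables are not removed while a test still reads them.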

Example

Given a pandas DataFrame

foo	weight	prediction_date
bar	23	20190301
my	42	20190301

and its table definition

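import pandas as pd
from bquest.tables import BQTableDefinitionBuilder

GOOGLE_PROJECT_ID = "your-gcp-project"  # placeholder: your own Google Cloud project ID
# prediction_date is a string so that PARSE_DATE('%Y%m%d', ...) in the query can parse it
df = pd.DataFrame({"foo": ["bar", "my"], "weight": [23, 42], "prediction_date": ["20190301", "20190301"]})
table_def_builder = BQTableDefinitionBuilder(GOOGLE_PROJECT_ID, dataset="bquest", location="EU")
table_definition = table_def_builder.from_df("abc.feed_latest", df)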

you can use the config file ./abc/config.py

{
    "query": """
        SELECT
            foo,
            PARSE_DATE('%Y%m%d', prediction_date)
        FROM
            `{source_table}`
        WHERE
            weight > {THRESHOLD}
    """,
    "start_date": "prediction_date",
    "end_date": "prediction_date",
    "source_tables": {"source_table": "abc.feed_latest"},
    "feature_table_name": "abc.myid",
}

and the runner

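from google.cloud import bigquery

from bquest.runner import BQConfigFileRunner, BQConfigRunner

bq_client = bigquery.Client(project=GOOGLE_PROJECT_ID)
# bq_executor_func is a placeholder for the function that executes a BigQuery config
# (see the bquest documentation for what it receives)
runner = BQConfigFileRunner(
    BQConfigRunner(bq_client, bq_executor_func),
    "config/bq_config",
)

result_df = runner.run_config(
    "20190301",
    "20190308",
    [table_definition],
    "abc/config.py",
    templating_vars={"THRESHOLD": "30"},
)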

to make assertions on the result table

assert result_df.shape == (1, 2)
assert result_df.iloc[0]["foo"] == "my"
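Both the query and the expected values can be traced by hand. `{source_table}` is an alias from `source_tables` that presumably resolves to the test table built from the table definition, and `{THRESHOLD}` comes from `templating_vars`; with a threshold of 30 only the `my` row passes the filter, and the query selects two columns, hence the `(1, 2)` shape. The plain-Python sketch below (not part of bquest) assumes `str.format`-style substitution, which may differ from how bquest actually renders queries; `render_query`, `expected_result` and the table name `my-project.bquest.abc_feed_latest_test` are hypothetical.

```python
from datetime import date, datetime

# Copy of the example query; assumes str.format-style placeholders (bquest may differ).
QUERY = """
    SELECT
        foo,
        PARSE_DATE('%Y%m%d', prediction_date)
    FROM
        `{source_table}`
    WHERE
        weight > {THRESHOLD}
"""

# The example input rows, as in the DataFrame above.
ROWS = [
    {"foo": "bar", "weight": 23, "prediction_date": "20190301"},
    {"foo": "my", "weight": 42, "prediction_date": "20190301"},
]


def render_query(query: str, source_tables: dict, templating_vars: dict) -> str:
    """Fill table aliases and templating variables into the query text."""
    return query.format(**source_tables, **templating_vars)


def expected_result(rows: list, threshold: int) -> list:
    """Pure-Python mirror of the query: keep weight > threshold, parse the date."""
    return [
        (row["foo"], datetime.strptime(row["prediction_date"], "%Y%m%d").date())
        for row in rows
        if row["weight"] > threshold
    ]


# Hypothetical fully qualified name of the test table created from the DataFrame.
sql = render_query(
    QUERY,
    {"source_table": "my-project.bquest.abc_feed_latest_test"},
    {"THRESHOLD": "30"},
)
result = expected_result(ROWS, threshold=30)
assert result == [("my", date(2019, 3, 1))]  # one row, two columns, foo == "my"
```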

Testing

For the actual testing, bquest relies on an accessible BigQuery project, which can be configured with the gcloud client. The corresponding GOOGLE_PROJECT_ID is taken from this project and used with pandas-gbq to write temporary tables to the bquest dataset, which has to be configured on that project before testing.
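When running tests locally, the project ID has to come from somewhere. One possible approach is sketched below, assuming that a `GOOGLE_PROJECT_ID` environment variable may override the active gcloud configuration; bquest's own test setup may resolve the project differently, and `resolve_project_id` is a hypothetical helper. It falls back to `gcloud config get-value project`:

```python
import os
import subprocess


def resolve_project_id(env_var: str = "GOOGLE_PROJECT_ID") -> str:
    """Return the project ID from an environment variable, else from the gcloud config."""
    project_id = os.environ.get(env_var)
    if project_id:
        return project_id
    # Fallback: the project set via `gcloud config set project <PROJECT_ID>`.
    completed = subprocess.run(
        ["gcloud", "config", "get-value", "project"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Checking the environment first keeps CI runs explicit, while local runs pick up whatever project `gcloud` is currently pointed at.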

For GitHub CI, we have configured an identity provider in our testing project that allows only core members of this repository to access the testing project's resources.

Important Links

Full documentation: https://ottogroup.github.io/bquest/
