Issues with Pydantic 2.x and BigQuery offline store Class Improvements #4795

Open
boumelhaa opened this issue Nov 27, 2024 · 0 comments

Title: Migration Issues with Pydantic 2.x and BigQuery Class Improvements

Description

I encountered an issue when upgrading to Pydantic 2.x. Below is a code snippet that can be used as a patch:

def project_id_exists(cls, v, values, **kwargs):
    if v and not values.data.get("project_id"):
        raise ValueError("Please specify project_id if billing_project_id is specified")
    return v
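
For context, here is a minimal, self-contained sketch of how a validator like this might be wired into a Pydantic 2.x model. The class name `BigQueryConfigSketch` is hypothetical and stands in for the real offline store config; only the two fields discussed in this issue are included. The sketch assumes a Pydantic version where `ValidationInfo` exposes `.data`.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class BigQueryConfigSketch(BaseModel):
    """Hypothetical stand-in for the BigQuery offline store config."""

    # project_id is declared first so that it has already been processed
    # (and is present in info.data) when billing_project_id is validated.
    project_id: Optional[str] = None
    billing_project_id: Optional[str] = None

    @field_validator("billing_project_id")
    @classmethod
    def project_id_exists(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
        # In Pydantic 2.x the second argument is a ValidationInfo object;
        # previously validated fields live in info.data, and .get() avoids
        # a KeyError if project_id is missing from that mapping.
        if v and not info.data.get("project_id"):
            raise ValueError(
                "Please specify project_id if billing_project_id is specified"
            )
        return v


if __name__ == "__main__":
    print(BigQueryConfigSketch(project_id="data-proj", billing_project_id="billing-proj"))
    try:
        BigQueryConfigSketch(billing_project_id="billing-proj")
    except ValidationError as exc:
        print("rejected:", exc.errors()[0]["msg"])
```

Note that the check depends on field declaration order: if `billing_project_id` were declared before `project_id`, the latter would not yet be in `info.data` when the validator runs.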

Problem Details

  1. Pydantic Validator Update:

    • In Pydantic 2.x, a field validator no longer receives a plain dict of values. The second argument is a ValidationInfo object, and previously validated fields must be read from its data attribute. The project_id_exists method needs to be updated to follow this convention.
  2. BigQuery Class Refactoring:

    • Default Timeout Issue: The query timeout can currently be set only inside the _execute_query methods. As a result, any job that runs longer than 30 minutes fails with a "job canceled" exception. The timeout should be configurable at the class or method level.

    • Billing Project Misuse: The billing project configuration is not used correctly. The billing project ID should not replace the project_id. The project_id identifies where the data is stored, while the billing project ID selects which project provides the compute resources. Generating the queries for the point-in-time joins and executing them should use different clients, one initialized from project_id and one from billing_project_id.
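
To illustrate the distinction between the two project IDs, here is a rough sketch of how their roles could be resolved separately. All names here (`ProjectRoles`, `resolve_project_roles`, `qualified_table`) are hypothetical and not part of the existing code base. The idea is that table references always use the storage project, while the compute project is what would be passed as `project=` when constructing the `google.cloud.bigquery.Client` that runs the jobs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectRoles:
    """Hypothetical holder that keeps the two project roles apart."""

    storage_project: str  # where datasets and tables live (project_id)
    compute_project: str  # where query jobs run and are billed


def resolve_project_roles(
    project_id: str, billing_project_id: Optional[str] = None
) -> ProjectRoles:
    # The billing project only changes where jobs run; it never
    # replaces the storage project. Without a billing project,
    # both roles fall back to project_id.
    return ProjectRoles(
        storage_project=project_id,
        compute_project=billing_project_id or project_id,
    )


def qualified_table(roles: ProjectRoles, dataset: str, table: str) -> str:
    # Table references are always built from the storage project.
    return f"{roles.storage_project}.{dataset}.{table}"


if __name__ == "__main__":
    roles = resolve_project_roles("data-proj", "billing-proj")
    print(qualified_table(roles, "feast", "driver_stats"))
    print("jobs run in:", roles.compute_project)
```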

Proposed Solution

  • Pydantic Validator Adjustment:

    • Ensure compatibility with Pydantic 2.x by updating the validator function to handle values appropriately according to the new guidelines.
  • BigQuery Class Improvement:

    • Implement a mechanism to allow specifying a default timeout at a more granular level than just at _execute_query.
    • Separate the responsibilities of project_id and billing_project_id to align with their intended purposes without interchanging their roles.
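
The timeout proposal could look roughly like the sketch below. `QueryTimeoutSettings`, `resolve_timeout`, and `execute_query` are hypothetical names, and the 1800-second default is an assumption that mirrors the 30-minute limit described above. The `client` argument is expected to behave like a `google.cloud.bigquery.Client`, whose `query()` returns a job with a `result(timeout=...)` method.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QueryTimeoutSettings:
    """Hypothetical class-level timeout configuration."""

    # Assumed default: 1800 s, matching the 30-minute limit in this issue.
    default_timeout_seconds: Optional[float] = 1800.0


def resolve_timeout(
    settings: QueryTimeoutSettings, override: Optional[float] = None
) -> Optional[float]:
    # A per-call override wins; otherwise use the class-level default.
    return override if override is not None else settings.default_timeout_seconds


def execute_query(
    client: Any,
    sql: str,
    settings: QueryTimeoutSettings,
    timeout: Optional[float] = None,
) -> Any:
    # Submit the query, then wait for it using the resolved timeout.
    job = client.query(sql)
    return job.result(timeout=resolve_timeout(settings, timeout))
```

With this shape, a long-running materialization could raise the limit once in the settings object, while individual calls can still pass a shorter or longer `timeout`.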

Environment

  • Python version: 3.12
  • Pydantic version: 2.x

Similar issues raised

Pydantic issue in validator: #4678

FYI @breno-costa
