Fides Pydantic V2 Upgrade #5020

Merged
merged 181 commits into from
Aug 20, 2024

Conversation

pattisdr
Contributor

@pattisdr pattisdr commented Jun 25, 2024

Closes #PROD-2263

❗ Dependent on ethyca/fideslang#11
ℹ️ Replaces #4442

Description Of Changes

Pydantic V2/Fast API Upgrade

Code Changes

  • Removes support for Python 3.8-
  • Upgrades Pydantic to V2, upgrades FastAPI, and other supporting dependencies
  • class Config -> model_config=ConfigDict(...)
  • dict() -> model_dump(), often with mode="json" since model_dump() is not always serializable as json, depending on model contents
  • parse_obj() -> model_validate()
  • from_orm() -> model_validate()
  • conlist -> Annotated[List[], Field(max_length=50)]
  • root_validators -> model_validators. mode="after" is often the better way to go because before validators can be called in surprising places, where sometimes the incoming data is a dict, or other times, the schema itself
  • __init_subclass__ -> __pydantic_init_subclass__
  • __fields__ -> model_fields (a class attribute, not a method)
  • customise_sources -> settings_customise_sources
  • update_forward_refs -> model_rebuild
  • always=True -> validate_default=True on the field directly
  • ConstrainedStr types now need to be defined using StringConstraints
  • Sometimes our str(errors) needed to be wrapped with jsonable_encoder, since errors are not always JSON serializable
  • Lots of default values of None added to keep Optional fields optional
  • @app.on_event("startup") is deprecated, using Lifespan instead
  • Updated definitions of custom types: SafeStr, HtmlStr, PhoneNumber, GPPMechanismConsentValue, URLOrigin, and CssStr
  • Pydantic V2 can't serialize unknown types. Adding FieldSerializers on some of our Python classes that we're using as types for DSR purposes since Pydantic V2 can't serialize them otherwise.
  • Using duck typing to serialize deeply nested collections to match original behavior for DSR processing. A customer's collection might not be fully defined with a Pydantic schema, which was causing some of the collection representation to get dropped in V2.
  • Updates to NoValidationSchema so it can still be used as a Pydantic schema that skips validation allowing a schema to be used for docs-only
  • You can't just stash random things on schemas, use json_schema_extra
  • We return a lot of Pydantic error messages in failed API calls, and the text has changed
  • Models are no longer equal to the dicts containing their data, resulting in a lot of test changes
  • Middleware can no longer be updated after the application has started - hack to get this working since we allow cors origins to be updated after the fact
  • Exposes admin-ui settings in the Config API
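Several of the renames above can be seen together in a minimal sketch. The class and field names here (`ExampleORMRow`, `ExampleSchema`) are hypothetical and are not Fides models; only the Pydantic V2 calls are real.

```python
from datetime import datetime
from typing import Annotated, List, Optional

from pydantic import BaseModel, ConfigDict, Field


class ExampleORMRow:
    """Stand-in for an ORM object; a hypothetical class, not a Fides model."""

    def __init__(self, name: str, tags: list, created_at: datetime):
        self.name = name
        self.tags = tags
        self.created_at = created_at


class ExampleSchema(BaseModel):
    # V1: class Config: orm_mode = True
    model_config = ConfigDict(from_attributes=True)

    name: str
    # V1: conlist(str, max_items=3)
    tags: Annotated[List[str], Field(max_length=3)]
    created_at: datetime
    # An explicit default keeps an Optional field optional in V2
    description: Optional[str] = None


row = ExampleORMRow("a", ["x"], datetime(2024, 8, 20))
schema = ExampleSchema.model_validate(row)  # V1: ExampleSchema.from_orm(row)
python_dump = schema.model_dump()  # V1: schema.dict(); datetime stays a datetime
json_dump = schema.model_dump(mode="json")  # datetime becomes an ISO 8601 string
```

The `mode="json"` dump is the one that is safe to hand to `json.dumps`, and `schema == python_dump` is now `False`, which is what drove many of the test changes.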

Steps to Confirm

This is something that really needs a full regression. These are features that needed more attention:

  • Checking adding multiple types of integrations, postgres, bigquery, attentive, mailchimp
  • Verify connection secret schemas show up in docs
  • Running DSR's with Postgres and Mongo and verifying deeply nested results
  • Verifying saving messaging secrets
  • Verifying saving storage secrets
  • General verification that secrets aren't leaked in error messages/logs. Pydantic error messages are more verbose.
  • CORS origins, updating CORS origins after the application is running
  • Setting env vars in the toml and as env vars
  • CLI commands

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

…r occurred while loading CORS domains: Cannot add middleware after an application has started"

- Pin fideslang, bump Pydantic to 2.7.1, bump FastAPI to 0.89.1, and temporarily add bump-pydantic to aid in the upgrade
- This caused us to have to update typing_extensions and fastapi-pagination as well. Also necessitated installing the new requirement pydantic-settings.
- FidesKey.validate has been replaced with validate_fides_key for validating FidesKeys outside of a model
- conlist max_items/min_items -> max_length/min_length
- root_validators -> model_validators, largely using mode=before as classmethods, some as after validators
- Need to define __pydantic_init_subclass__ instead of __init_subclass__
- ConstrainedStr -> constr
- EmailStr takes no arguments
- Coercing string ports into integer ports since Pydantic no longer does this for us and I don't want this to be a breaking change
- Advanced settings was being overridden by a non-annotated attribute
- smart_union supported by default now, var has been removed
- ConnectorTemplate.validate_config has a config param name that conflicts
- SettingsSourceCallable is updated to PydanticBaseSettingsSource and has been moved to pydantic-settings
- Field validators should be accessing values through info.data
- Deleted _FUNCS.clear under the assumption that allow_reuse behavior is no longer necessary
- URLOrigin custom type can no longer be defined as a subclass. Redefine with URL Origin
- PostgresDSR user argument -> username
…he application has already started.

- Replace fast api's app.on_event(startup) with a lifespan that defines logic before the app starts up.
- Instead, load cors middleware early, not using ConfigProxy. This is temporary, and I will address this more later.
- This gets the server up and running and the shell is loaded.
…p-pydantic on the src directory

- Update custom types from being defined as subclasses with __get_validators__ class methods to Annotated types.
- Replaced conlist(T, *args) with Annotated[List[T], Field(*args)].
- Add default of None to Optional fields to keep them Optional
- Swap class Config for model_config
- Swap validators for field validators (pre=True -> mode=before)
- Field validators that need access to "values" should be refactored to be "before" validators accessing info.data
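The custom type and Optional changes might look roughly like the following. `ExampleOrigin` and its trailing-slash rule are assumptions made for the example and are not the real `URLOrigin` definition.

```python
from typing import Annotated, Optional

from pydantic import AfterValidator, BaseModel


def strip_trailing_slash(value: str) -> str:
    """Hypothetical normalisation step used only for this sketch."""
    return value.rstrip("/")


# V1: a str subclass with a __get_validators__ classmethod
ExampleOrigin = Annotated[str, AfterValidator(strip_trailing_slash)]


class ExampleSettings(BaseModel):
    origin: ExampleOrigin
    label: Optional[str] = None  # optional: has a default


class ExampleStrict(BaseModel):
    label: Optional[str]  # required in V2, although None is an accepted value


settings = ExampleSettings(origin="https://example.com/")
```

`ExampleStrict()` raises a `ValidationError` in V2 because `label` has no default, which is why so many fields gained `= None`.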
…, not an AnyUrl, with the trailing slash removed. Our CORS urls can't have trailing slashes.
…"before", so they are always SaaSRequests.

- If I'm using the before model validator for SaaSRequests, sometimes "values" are a list, but other times they are a dict.  This surfaced in the Salesforce config where read requests are a list of SaaS Requests.
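To make the difference concrete, here is a small sketch of a "before" validator that has to guard against the shape of its input, next to an "after" validator that always receives a model instance. `ExampleRequest` is a hypothetical schema, not the Fides `SaaSRequest`.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class ExampleRequest(BaseModel):
    path: str
    method: str = "GET"

    @model_validator(mode="before")
    @classmethod
    def upper_case_method(cls, data: Any) -> Any:
        # Raw input: it may be a dict, a list, or an existing model instance,
        # so check the type before touching it
        if isinstance(data, dict) and isinstance(data.get("method"), str):
            data = {**data, "method": data["method"].upper()}
        return data

    @model_validator(mode="after")
    def check_path(self) -> "ExampleRequest":
        # Runs on the validated model, so attributes are always available
        if not self.path.startswith("/"):
            raise ValueError("path must start with '/'")
        return self


request = ExampleRequest(path="/users", method="post")
```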
…PrivacyRequestResponse.

When creating a `PrivacyRequestCreate` object, custom_privacy_request_fields are now being coerced so the value is a CustomPrivacyRequestField.  Will need a follow-up to ensure this still gets saved properly.
… are moving them from the meta object to top-level attributes for convenience. Converting this back into a literal to/from to prevent a ValidationError.
…rly compare with the collections coming out of the database.
…lack of type coercion. Rather than push this to customers updating existing rollbar config, updating internally.
…ds optional.

- The validate_details_validator field_validator's second param is a ValidationInfo object not a dictionary.
- Use the EdgeDirection value
…at's coming out of the database with the pydantic model. In Pydantic V2, models are no longer equal to the dicts containing their data.
…s() doesn't include extra fields so Identity.labeled_dict is throwing out custom identities.
…t converted into a private attribute which can't be accessed as intended in the validators.
…ore validator - so I'm updating to use the validator in "after" mode.
…m fields in the SaaSSchemaFactory.

- Also, need to use is_required() function on the FieldInfo object.
- Update some error messages where the default text thrown by pydantic has changed slightly, but the intent is the same.
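For reference, checking whether a field is required in V2 goes through `model_fields` and `FieldInfo.is_required()`. The schema below is a hypothetical example rather than one produced by the SaaSSchemaFactory.

```python
from typing import Optional

from pydantic import BaseModel


class ExampleSecrets(BaseModel):
    api_key: str
    domain: Optional[str] = None


# model_fields maps field names to FieldInfo objects
required_fields = [
    name
    for name, field_info in ExampleSecrets.model_fields.items()
    if field_info.is_required()
]
```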
…e values.

- Adjust the error message in a test - the specifics of the error thrown by pydantic have changed but the essence is the same.
# Conflicts:
#	src/fides/api/api/v1/endpoints/privacy_request_endpoints.py
#	src/fides/api/schemas/messaging/messaging.py
#	src/fides/api/schemas/privacy_request.py
…ng thrown by pydantic have changed but the essence is the same.
…Converters, FieldAddresses, and CollectionAddresses. These are not Pydantic types.
…ject fields using duck-typing behavior. Nested object fields were not being included with json() when serializing a collection.

> "Duck-typing serialization is the behavior of serializing an object based on the fields present in the object itself, rather than the fields present in the schema of the object."
- Fides keys can no longer be integers
- Models are no longer equal to the dicts containing their data
- Pydantic error message has changed
- Newly added connection type needs to have _required_components be a ClassVar
- SaaS secrets were not making it into the database; they were being thrown out in serialization because they don't match fields on the generic SaaS Schema. Using duck typing serialization by adding serialize_as_any=True
- Allow extra fields to come through on the SaaS schema so the secrets can come in through
- Don't save secrets to the database that were not officially set to match original behavior with exclude_unset=True
- New ValidationError has the input_value, which is a SaaS Schema that isn't JSON serializable. Wrapping it in a jsonable encoder
- Use json_schema_extra to add extra data to the json schema since arbitrary arguments are no longer supported
- parse_obj -> model_validate
- Model validators with mode="after" are a lot easier to work with. Before validators are passed raw input, so it might be a dict, or it could be an instance of the model itself
- Fix incorrect advanced settings type I introduced which is causing cookie_ids to get dropped
- Behavior change: fields passed in for secrets that are not defined on the schema are no longer saved to the db
- Adjust tests that were looking at error messages thrown by pydantic where the generated error messages have changed
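The serialization items in this list can be sketched as follows, assuming Pydantic 2.7 or later (where `serialize_as_any` is available as a `model_dump` argument). All class names and the `sensitive` key are hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, Field


class ExampleBaseSecrets(BaseModel):
    domain: Optional[str] = None


class ExampleApiSecrets(ExampleBaseSecrets):
    # Extra schema data goes through json_schema_extra rather than
    # arbitrary keyword arguments on Field
    api_key: str = Field(json_schema_extra={"sensitive": True})


class ExampleConnection(BaseModel):
    secrets: ExampleBaseSecrets  # annotated with the generic schema


conn = ExampleConnection(secrets=ExampleApiSecrets(api_key="abc"))

# Default: serialized according to the annotated type, so api_key is dropped
by_schema = conn.model_dump()

# Duck typing: serialized according to the runtime object; exclude_unset
# additionally skips fields that were never explicitly set
by_instance = conn.model_dump(serialize_as_any=True, exclude_unset=True)

api_key_schema = ExampleApiSecrets.model_json_schema()["properties"]["api_key"]
```

Here `by_instance` keeps `api_key` and omits the unset `domain`, while `api_key_schema` carries the extra `sensitive` entry.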
…s arbitrary keyword arguments. This was preventing secrets from being masked.
…"null" as acceptable options to many of the fields generated from the pydantic schemas.
@pattisdr pattisdr removed the "do not merge" label (Please don't merge yet, bad things will happen if you do) Aug 20, 2024
@pattisdr pattisdr merged commit 9dd7d7b into main Aug 20, 2024
46 of 48 checks passed
@pattisdr pattisdr deleted the fides_pydantic_v2_upgrade branch August 20, 2024 16:29

cypress bot commented Aug 20, 2024



Test summary

4 0 0 0


Run details

Project fides
Status Passed
Commit 9dd7d7b
Started Aug 20, 2024 4:42 PM
Ended Aug 20, 2024 4:43 PM
Duration 00:37 💡
OS Linux Ubuntu -
Browser Electron 106


Roger-Ethyca pushed a commit that referenced this pull request Aug 20, 2024
Pydantic v1 -> Pydantic v2 upgrade

Co-authored-by: Adrian Galvan <[email protected]>
Co-authored-by: eastandwestwind <[email protected]>
@ThomasLaPiana
Contributor

@pattisdr bravo!

@pattisdr
Contributor Author

Thank you @ThomasLaPiana!! 🙌

Labels
run unsafe ci checks Runs fides-related CI checks that require sensitive credentials

4 participants