Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use New Fideslang DSR-updated models [#1761] #2009

Merged
merged 21 commits into from
Dec 14, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Dec 8, 2022

❗ Blocked by https://github.com/ethyca/fideslang/pull/95
❗ Contains data migration

Closes #1761

Code Changes

Installed Requirements

  • Temporarily point fideslang to the unmerged branch here https://github.com/ethyca/fideslang/pull/95 while that PR remains unmerged. Revert before merge.
  • Temporarily install git in the Dockerfiles prior to installing requirements. Revert before merge.

Dataset yaml files

  • Updated redis_dataset.yml (which is a dataset describing our redis cache) to not have data_categories at the object level when there are data_categories at the nested field level. This is a constraint introduced from the fidesops-side to keep there from being data category conflicts in nested data structures when we're filtering data on data category before building the DSR package
  • Updated the dataset yamls (for Database datasets, not saas datasets to have fides_meta.) Fideslang supports both in the request, but any fidesops_meta supplied is converted to fides_meta. We have both in our yaml files right now.
  • Update the db_dataset.yml which maps our fides database to have a fides_meta instead of a fidesctl_meta field on the ctl_datasets table

Replacing Fidesops Pydantic schemas with new Fideslang ones

  • Use fideslang FidesKey in favor of FidesOpsKey - FidesKey now allows '>' '<' characters (useful for saas templates) and also bad validation throws a ValueError which is more easily picked up by FastAPI and turned into a ValidationError.
  • Remove FidesopsDataset in favor of the new Fideslang Dataset which has the important attributes from FidesopsDataset
  • Rename Dataset to be GraphDataset to not get confused with the Fideslang Dataset.
  • At the code level - adjust all fidesops_meta variables (at the Dataset, DatasetCollection, and DatasetField levels) to use fides_meta instead. However, leave fidesops_meta keys in the saas-related yaml files for now. If fidesops_meta is supplied, Fideslang converts it to fides_meta for backwards compatibility.
  • Remove FidesopsDatasetReference in favor of identical Fideslang FidesDatasetReference

Validation

  • Add new data category validation for the PATCH /datasetconfig endpoints where we validate against data_categories in the database instead of just the static default taxonomy as users could have added other data categories. When running a DSR, the dataset_config.dataset (which is a Fideslang Dataset) is converted into a GraphDataset, and all the GraphDatasets are combined to make a DatasetGraph
  • Because FidesKey validation errors now throw a FidesValidationError that inherits from ValueError instead of exception, adjust a couple of locations -ctl side where model validation is now throwing a ValidationError.

Database migration

  • Update the ctl_datasets.fidesctl_meta to be named fides_meta

Steps to Confirm

  • Same steps in https://github.com/ethyca/fideslang/pull/95. General QA around CTL Datasets and Ops Connections is handy. Loading ops datasets into the db with the old fidesops_meta key and verifying that it is retrieved and serialized with fides_meta is a good check. Main thing, both Ops DatasetConfig.datasets and Ctl ctl_datasets are being serialized with the same model now.

Pre-Merge Checklist

Description Of Changes

Update Fides to use updated Fideslang modes (mostly changes on the -ops side) now that Fideslang has been updated so it's base models can support ops code.

We used to have two repositories: fidesctl and fidesops that were both using Fideslang Dataset concepts. Fidesops added more items to subclassed Datasets to support DSRs. We're now moving some of these fields to Fideslang, so we can share the same schemas between both -ctl and -ops usages.

TheAndrewJackson and others added 11 commits November 22, 2022 09:27
Note to self: The current issue with server startup is related to `fides_meta` being missing from dataset yaml files. The validation error is eaten and isn't surfaced.
- Replace instances of "fidesops_meta" with fides_meta in the code.  The fideslang Dataset validation will take care of converting fidesops_meta to fides_meta.  Don't update the yaml files however.
- Remove FidesopsDataset in favor of Fideslang Dataset  (DSR concepts have been moved to this model)
- Remove FidesopsDatasetCollection in favor of Fideslang DatasetCollection  (DSR concepts have been moved to this model)
- Remove FidesopsDatasetField in favor of Fideslang DatasetField (DSR concepts have been moved to this model)
- Remove FidesopsDatasetReference in favor of Fideslang FidesDatasetReference
- Remove Fidesops FidesCollectionKey in favor of Fideslang FidesCollectionKey
- Adjust Dataset data category validation to validate against data categories saved in the database instead of just the static default taxonomy.
…atasets prevents data categories from being set at object level and nested field level.
…rove that both fidesops_meta and fides_meta fields are currently supported.
…updated to have the important validation from FidesOpsKey.
# Conflicts:
#	requirements.txt
#	src/fides/api/ops/api/v1/endpoints/connection_endpoints.py
@pattisdr pattisdr changed the title Use New Fideslang DSR-updated models [#1761] [WIP] Use New Fideslang DSR-updated models [#1761] Dec 8, 2022
@@ -192,7 +192,7 @@ def test_put_validate_dataset_invalid_length(
assert response.status_code == 422
assert (
json.loads(response.text)["detail"][0]["msg"]
== "Illegal length (-1). Only positive non-zero values are allowed."
== "ensure this value is greater than 0"
Copy link
Contributor Author

@pattisdr pattisdr Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation of field length was moved to Fideslang. It was simultaneously updated to use a built-in Pydantic validator instead of our own custom validation to enforce positive non-zero models, so the error message has changed.

@@ -288,7 +288,7 @@ def test_put_validate_dataset_invalid_category(
)
assert response.status_code == 422
details = json.loads(response.text)["detail"]
assert ["body", "collections", 0, "fields", 0, "data_categories"] in [
assert ["collections", 0, "fields", 0, "data_categories"] in [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated validation was added to check data_categories against those in the database, not those in the static default data categories list. To do this, I couldn't use a regular pydantic validator, but needed to do this after the fact, when I could access a database session. So the response has adjusted as the request body itself isn't being validated here.

@pattisdr
Copy link
Contributor Author

pattisdr commented Dec 9, 2022

My last failing tests are in tests/ops/service/saas_request/test_saas_request_override_factory.py::TestSaasRequestOverrideFactory.

It's odd because that file passes if you run it directly by itself, but if you run nox -s "pytest_ops(unit)" it fails.

EDIT:
This seemed to just be related to the equality comparison. To address, I'm going to switch the equality comparison to be "==" instead of "is" which is the safer way to check this - it checks to be sure the values are equal, rather than where they are stored in memory.

Main branch

(Pdb) signature(f).return_annotation
typing.List[typing.Dict[str, typing.Any]]
(Pdb) signature(f).return_annotation is List[Row]
True
(Pdb) signature(f).return_annotation == List[Row]
True

This branch:

(Pdb) signature(override_function).return_annotation
typing.List[typing.Dict[str, typing.Any]]
(Pdb) signature(override_function).return_annotation is List[Row]
False
(Pdb) signature(override_function).return_annotation == List[Row]
True

@pattisdr pattisdr self-assigned this Dec 9, 2022
@pattisdr pattisdr added the do not merge Please don't merge yet, bad things will happen if you do label Dec 9, 2022
…er they are stored at the same address in memory - this is error prone.

- Revert change from bad merge.
.fides/db_dataset.yml Show resolved Hide resolved
Comment on lines -10 to -18
data_categories:
- system.operations
data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
fidesops_meta:
data_type: object # Stores an encrypted representation of the fidesops graph that executes the privacy requests.
fields:
- name: <dataset_name>:<collection_name> # The current collection
data_categories:
- system.operations
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dataset file reflects what we store in Redis. Now that we're sharing fideslang models between ops and ctl there's validation that doesn't allow data categories at the object field level - they should be on the nested fields instead.

@@ -26,6 +26,7 @@ RUN apt-get update && \
g++ \
gnupg \
gcc \
git \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary, will remove when fideslang has been updated.

@@ -11,7 +11,7 @@ dataset:
data_categories: [user.contact.address.street]
- name: id
data_categories: [system.operations]
fidesops_meta:
fides_meta:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the fidesops_meta fields in the database datasets, but not the saas datasets to demonstrate both still work. Fideslang dataset schemas can take in both fides_meta and fidesops_meta but they are converted to fides_meta. We also don't want to break backwards compatibility with existing saas datasets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Should/can the saas templates be converted over to use fides_meta in a future PR down the line in the unified fides resources work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this comment, it sounded like we wanted to preserve the backwards compatibility of this for now https://github.com/ethyca/fideslang/issues/94#issuecomment-1341175140

@@ -5,6 +5,7 @@ RUN apt-get update && \
g++ \
gnupg \
gcc \
git \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary. Will remove when fideslang version is bumped.

Comment on lines 23 to 29
If no data categories in the database, default to using data categories from the default taxonomy.
"""
defined_data_categories: List[FidesKey] = [
cat[0] for cat in db.query(DataCategory.fides_key).all()
]
if not defined_data_categories:
defined_data_categories = list(DefaultTaxonomyDataCategories.__members__.keys())
Copy link
Contributor Author

@pattisdr pattisdr Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding new validation on data category that looks in the database to see if data categories match ones defined there, instead of matching against the default data categories.

As a backup, if there's nothing in the db, which shouldn't be the case, matching against the static list. Related issue added here: #2016

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add an INFO log saying what DataCategory didn't have any categories in the database? If it's a situation that shouldn't happen it might be worth making a note of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

Comment on lines 97 to 122

class FidesopsMeta(BaseModel):
"""Fidesops-specific annotations used for query traversal"""

references: Optional[List[FidesopsDatasetReference]]
identity: Optional[str]
primary_key: Optional[bool]
data_type: Optional[str]
"""Optionally specify the data type. Fidesops will attempt to cast values to this type when querying."""
length: Optional[int]
"""Optionally specify the allowable field length. Fidesops will not generate values that exceed this size."""
return_all_elements: Optional[bool]
"""Optionally specify to query for the entire array if the array is an entrypoint into the node. Default is False."""
read_only: Optional[bool]
"""Optionally specify if a field is read-only, meaning it can't be updated or deleted."""

@validator("data_type")
def valid_data_type(cls, v: Optional[str]) -> Optional[str]:
"""Validate that all annotated data categories exist in the taxonomy"""
return _valid_data_type(v)

@validator("length")
def valid_length(cls, v: Optional[int]) -> Optional[int]:
"""Validate that the provided length is valid"""
return _valid_data_length(v)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these removed concepts have moved to Fideslang.

Comment on lines -6 to -24
class FidesOpsKey(FidesKey):
"""
Overrides fideslang FidesKey validation to throw ValueError
"""

@classmethod
def validate(cls, value: Optional[str]) -> Optional[str]:
"""Throws ValueError if val is not a valid FidesKey"""
if value == "<instance_fides_key>":
# Ignore <instance_fides_key> in saas templates. This value will be replaced with a
# user-specified value.
return value

if value is not None and not cls.regex.match(value):
raise ValueError(
"FidesKey must only contain alphanumeric characters, '.', '_' or '-'."
)

return value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of FidesOpsKey in favor of an updated FidesKey.

I've updated Fides Key to allow ">" "<" characters pending approval. Also the error that FidesKey throws has been updated to be a subclass of a ValueError so can be converted into a pydantic ValidationError.

The only behavior I didn't move to FidesKey was allowing the value argument to be Optional in FidesKey.value. I think on our end we can just use Optional[FidesKey] for this.

@@ -133,7 +133,7 @@ def validate_read_override_function(f: Callable) -> None:
and that it declares at least 5 parameters.
"""
sig: Signature = signature(f)
if sig.return_annotation is not List[Row]:
if sig.return_annotation != List[Row]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing some failing tests on this branch. I think a '!=' comparison is recommended here anyway. We just want to see if the values are equal, not if they are stored in the same place in memory.

class Dataset(BaseModel):
class GraphDataset(BaseModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to not confuse with Fideslang Dataset, which we're using now in "ops" in favor of "FidesopsDataset"

To run a DSR, a Fideslang Dataset (stored in datasetconfig.dataset) is converted into a GraphDataset. Many GraphDatasets make up a DatasetGraph. The DatasetGraph is what we pass to run_access_request or run_erasure_request

@pattisdr pattisdr changed the title [WIP] Use New Fideslang DSR-updated models [#1761] Use New Fideslang DSR-updated models [#1761] Dec 9, 2022
@pattisdr pattisdr marked this pull request as ready for review December 9, 2022 20:40
@pattisdr pattisdr requested a review from a team December 9, 2022 20:40
@pattisdr
Copy link
Contributor Author

pattisdr commented Dec 9, 2022

@ethyca/docs-authors This work is changing fidesops_meta across the codebase to be fides_meta instead, although fidesops_meta is still supported for backwards compatibility. I know this attribute is scattered throughout the docs -

@TheAndrewJackson
Copy link
Contributor

Does it make sense to merge this into a feature branch instead of directly into main? This epic won't be finished until a few more PRs are merge in. Another fides release may happen before that point and release some of the Unified Resources epic early.

@pattisdr pattisdr changed the base branch from main to unified-fides-resources December 13, 2022 19:05
@pattisdr
Copy link
Contributor Author

Hi @TheAndrewJackson thanks for the feature branch suggestion. I'd tried to make this PR standalone but we've discussed some other breaking changes in subsequent PR's. So I think a feature branch makes sense and we can really check to make sure it's been solidly QA'd on the FE and BE before merge.

@pattisdr pattisdr removed the do not merge Please don't merge yet, bad things will happen if you do label Dec 13, 2022
… GraphDataset rename.

- Revert try/except.
- Temporarily point fideslang to main branch now that related work has been merged but not yet released.
@pattisdr
Copy link
Contributor Author

Thanks for your review @TheAndrewJackson! Merging into the feature branch -

@pattisdr pattisdr merged commit 2313b7a into unified-fides-resources Dec 14, 2022
@pattisdr pattisdr deleted the fides_1761_use_new_fideslang_models branch December 14, 2022 20:24
@pattisdr pattisdr mentioned this pull request Jan 17, 2023
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Fides to use the Fideslang models
5 participants