-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use New Fideslang DSR-updated models [#1761] #2009
Use New Fideslang DSR-updated models [#1761] #2009
Conversation
Note to self: The current issue with server startup is related to `fides_meta` being missing from dataset yaml files. The validation error is eaten and isn't surfaced.
# Conflicts: # src/fides/api/main.py
- Replace instances of "fidesops_meta" with fides_meta in the code. The fideslang Dataset validation will take care of converting fidesops_meta to fides_meta. Don't update the yaml files however. - Remove FidesopsDataset in favor of Fideslang Dataset (DSR concepts have been moved to this model) - Remove FidesopsDatasetCollection in favor of Fideslang DatasetCollection (DSR concepts have been moved to this model) - Remove FidesopsDatasetField in favor of Fideslang DatasetField (DSR concepts have been moved to this model) - Remove FidesopsDatasetReference in favor of Fideslang FidesDatasetReference - Remove Fidesops FidesCollectionKey in favor of Fideslang FidesCollectionKey - Adjust Dataset data category validation to validate against data categories saved in the database instead of just the static default taxonomy.
…atasets prevents data categories from being set at object level and nested field level.
…rove that both fidesops_meta and fides_meta fields are currently supported.
…updated to have the important validation from FidesOpsKey.
# Conflicts: # requirements.txt # src/fides/api/ops/api/v1/endpoints/connection_endpoints.py
Fix an instance where Fides Dataset resource was being used instead of Fideslang Dataset. That's why I renamed!
@@ -192,7 +192,7 @@ def test_put_validate_dataset_invalid_length( | |||
assert response.status_code == 422 | |||
assert ( | |||
json.loads(response.text)["detail"][0]["msg"] | |||
== "Illegal length (-1). Only positive non-zero values are allowed." | |||
== "ensure this value is greater than 0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation of field length was moved to Fideslang. It was simultaneously updated to use a built-in Pydantic validator instead of our own custom validation to enforce positive non-zero models, so the error message has changed.
@@ -288,7 +288,7 @@ def test_put_validate_dataset_invalid_category( | |||
) | |||
assert response.status_code == 422 | |||
details = json.loads(response.text)["detail"] | |||
assert ["body", "collections", 0, "fields", 0, "data_categories"] in [ | |||
assert ["collections", 0, "fields", 0, "data_categories"] in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated validation was added to check data_categories against those in the database, not those in the static default data categories list. To do this, I couldn't use a regular pydantic validator, but needed to do this after the fact, when I could access a database session. So the response has adjusted as the request body itself isn't being validated here.
My last failing tests are in tests/ops/service/saas_request/test_saas_request_override_factory.py::TestSaasRequestOverrideFactory. It's odd because that file passes if you run it directly by itself, but if you run EDIT: Main branch (Pdb) signature(f).return_annotation
typing.List[typing.Dict[str, typing.Any]]
(Pdb) signature(f).return_annotation is List[Row]
True
(Pdb) signature(f).return_annotation == List[Row]
True This branch: (Pdb) signature(override_function).return_annotation
typing.List[typing.Dict[str, typing.Any]]
(Pdb) signature(override_function).return_annotation is List[Row]
False
(Pdb) signature(override_function).return_annotation == List[Row]
True |
…er they are stored at the same address in memory - this is error prone. - Revert change from bad merge.
data_categories: | ||
- system.operations | ||
data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified | ||
fidesops_meta: | ||
data_type: object # Stores an encrypted representation of the fidesops graph that executes the privacy requests. | ||
fields: | ||
- name: <dataset_name>:<collection_name> # The current collection | ||
data_categories: | ||
- system.operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dataset file reflects what we store in Redis. Now that we're sharing fideslang models between ops and ctl there's validation that doesn't allow data categories at the object field level - they should be on the nested fields instead.
@@ -26,6 +26,7 @@ RUN apt-get update && \ | |||
g++ \ | |||
gnupg \ | |||
gcc \ | |||
git \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary, will remove when fideslang has been updated.
@@ -11,7 +11,7 @@ dataset: | |||
data_categories: [user.contact.address.street] | |||
- name: id | |||
data_categories: [system.operations] | |||
fidesops_meta: | |||
fides_meta: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted the fidesops_meta fields in the database datasets, but not the saas datasets to demonstrate both still work. Fideslang dataset schemas can take in both fides_meta
and fidesops_meta
but they are converted to fides_meta
. We also don't want to break backwards compatibility with existing saas datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Should/can the saas templates be converted over to use fides_meta
in a future PR down the line in the unified fides resources work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this comment, it sounded like we wanted to preserve the backwards compatibility of this for now https://github.com/ethyca/fideslang/issues/94#issuecomment-1341175140
@@ -5,6 +5,7 @@ RUN apt-get update && \ | |||
g++ \ | |||
gnupg \ | |||
gcc \ | |||
git \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary. Will remove when fideslang version is bumped.
src/fides/api/ops/schemas/dataset.py
Outdated
If no data categories in the database, default to using data categories from the default taxonomy. | ||
""" | ||
defined_data_categories: List[FidesKey] = [ | ||
cat[0] for cat in db.query(DataCategory.fides_key).all() | ||
] | ||
if not defined_data_categories: | ||
defined_data_categories = list(DefaultTaxonomyDataCategories.__members__.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding new validation on data category that looks in the database to see if data categories match ones defined there, instead of matching against the default data categories.
As a backup, if there's nothing in the db, which shouldn't be the case, matching against the static list. Related issue added here: #2016
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add an INFO
log saying what DataCategory
didn't have any categories in the database? If it's a situation that shouldn't happen it might be worth making a note of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
src/fides/api/ops/schemas/dataset.py
Outdated
|
||
class FidesopsMeta(BaseModel): | ||
"""Fidesops-specific annotations used for query traversal""" | ||
|
||
references: Optional[List[FidesopsDatasetReference]] | ||
identity: Optional[str] | ||
primary_key: Optional[bool] | ||
data_type: Optional[str] | ||
"""Optionally specify the data type. Fidesops will attempt to cast values to this type when querying.""" | ||
length: Optional[int] | ||
"""Optionally specify the allowable field length. Fidesops will not generate values that exceed this size.""" | ||
return_all_elements: Optional[bool] | ||
"""Optionally specify to query for the entire array if the array is an entrypoint into the node. Default is False.""" | ||
read_only: Optional[bool] | ||
"""Optionally specify if a field is read-only, meaning it can't be updated or deleted.""" | ||
|
||
@validator("data_type") | ||
def valid_data_type(cls, v: Optional[str]) -> Optional[str]: | ||
"""Validate that all annotated data categories exist in the taxonomy""" | ||
return _valid_data_type(v) | ||
|
||
@validator("length") | ||
def valid_length(cls, v: Optional[int]) -> Optional[int]: | ||
"""Validate that the provided length is valid""" | ||
return _valid_data_length(v) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these removed concepts have moved to Fideslang.
class FidesOpsKey(FidesKey): | ||
""" | ||
Overrides fideslang FidesKey validation to throw ValueError | ||
""" | ||
|
||
@classmethod | ||
def validate(cls, value: Optional[str]) -> Optional[str]: | ||
"""Throws ValueError if val is not a valid FidesKey""" | ||
if value == "<instance_fides_key>": | ||
# Ignore <instance_fides_key> in saas templates. This value will be replaced with a | ||
# user-specified value. | ||
return value | ||
|
||
if value is not None and not cls.regex.match(value): | ||
raise ValueError( | ||
"FidesKey must only contain alphanumeric characters, '.', '_' or '-'." | ||
) | ||
|
||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting rid of FidesOpsKey in favor of an updated FidesKey.
I've updated Fides Key to allow ">" "<" characters pending approval. Also the error that FidesKey throws has been updated to be a subclass of a ValueError so can be converted into a pydantic ValidationError.
The only behavior I didn't move to FidesKey was allowing the value argument to be Optional in FidesKey.value. I think on our end we can just use Optional[FidesKey] for this.
@@ -133,7 +133,7 @@ def validate_read_override_function(f: Callable) -> None: | |||
and that it declares at least 5 parameters. | |||
""" | |||
sig: Signature = signature(f) | |||
if sig.return_annotation is not List[Row]: | |||
if sig.return_annotation != List[Row]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing some failing tests on this branch. I think a '!=' comparison is recommended here anyway. We just want to see if the values are equal, not if they are stored in the same place in memory.
class Dataset(BaseModel): | ||
class GraphDataset(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to not confuse with Fideslang Dataset, which we're using now in "ops" in favor of "FidesopsDataset"
To run a DSR, a Fideslang Dataset (stored in datasetconfig.dataset) is converted into a GraphDataset. Many GraphDatasets make up a DatasetGraph. The DatasetGraph is what we pass to run_access_request
or run_erasure_request
…Fides Dataset to be GraphDataset.
@ethyca/docs-authors This work is changing |
src/fides/api/ops/service/connectors/saas/connector_registry_service.py
Outdated
Show resolved
Hide resolved
Does it make sense to merge this into a feature branch instead of directly into main? This epic won't be finished until a few more PRs are merge in. Another fides release may happen before that point and release some of the Unified Resources epic early. |
Hi @TheAndrewJackson thanks for the feature branch suggestion. I'd tried to make this PR standalone but we've discussed some other breaking changes in subsequent PR's. So I think a feature branch makes sense and we can really check to make sure it's been solidly QA'd on the FE and BE before merge. |
… GraphDataset rename. - Revert try/except. - Temporarily point fideslang to main branch now that related work has been merged but not yet released.
Thanks for your review @TheAndrewJackson! Merging into the feature branch - |
❗ Blocked by https://github.com/ethyca/fideslang/pull/95
❗ Contains data migration
Closes #1761
Code Changes
Installed Requirements
Dataset yaml files
redis_dataset.yml
(which is a dataset describing our redis cache) to not have data_categories at the object level when there are data_categories at the nested field level. This is a constraint introduced from the fidesops-side to keep there from being data category conflicts in nested data structures when we're filtering data on data category before building the DSR packagefides_meta
.) Fideslang supports both in the request, but any fidesops_meta supplied is converted to fides_meta. We have both in our yaml files right now.db_dataset.yml
which maps our fides database to have afides_meta
instead of afidesctl_meta
field on the ctl_datasets tableReplacing Fidesops Pydantic schemas with new Fideslang ones
FidesKey
in favor ofFidesOpsKey
- FidesKey now allows '>' '<' characters (useful for saas templates) and also bad validation throws a ValueError which is more easily picked up by FastAPI and turned into a ValidationError.Dataset
to beGraphDataset
to not get confused with the Fideslang Dataset.fides_meta
instead. However, leavefidesops_meta
keys in the saas-related yaml files for now. If fidesops_meta is supplied, Fideslang converts it to fides_meta for backwards compatibility.Validation
dataset_config.dataset
(which is a Fideslang Dataset) is converted into a GraphDataset, and all the GraphDatasets are combined to make aDatasetGraph
Database migration
ctl_datasets.fidesctl_meta
to be namedfides_meta
Steps to Confirm
Pre-Merge Checklist
CHANGELOG.md
Description Of Changes
Update Fides to use updated Fideslang modes (mostly changes on the -ops side) now that Fideslang has been updated so it's base models can support ops code.
We used to have two repositories:
fidesctl
andfidesops
that were both using FideslangDataset
concepts. Fidesops added more items to subclassed Datasets to support DSRs. We're now moving some of these fields to Fideslang, so we can share the same schemas between both -ctl and -ops usages.