
Remove DatasetConfig.dataset field + New Get DatasetConfig Endpoints [#1763] #2096

Conversation

@pattisdr pattisdr commented Dec 20, 2022

❗ Contains migration
👉 This PR is against the unified-fides-resources branch, not main.

Closes #1763

Code Changes

  • Add a data migration to delete the DatasetConfig.dataset column. It should have been previously migrated to a ctl dataset resource
  • Remove the "dataset" key from the data sent to DatasetConfig.create_or_update call in the new patch_dataset_configs endpoint
    • Leave the dataset key in the data sent to DatasetConfig.upsert_with_ctl_dataset because this method currently upserts both the CTL Dataset first and then the DatasetConfig. This method is currently used in the old patch dataset config endpoints (json and yaml, soon to be deprecated) as well as when you're creating a connection config from a template.
  • Update fixtures to not pass in a "dataset" key to data passed to DatasetConfig.create
  • Throw a 404 if the ctl_dataset_id does not exist when creating a DatasetConfig through the new patch_dataset_configs endpoint
  • Also add validation when linking an existing CTL Dataset to a DatasetConfig: the existing dataset's data categories must exist in the db.
  • Add a fix to prevent the existing upsert from attempting to update the id of an existing resource. Datasets are referenced by DatasetConfigs now, so their ids need to stay the same.
  • Add new GET datasetconfig list and detail endpoints that include both the fides_key and nested ctl_dataset. The DatasetConfig and CTLDataset are different resources and their fides_keys can differ, so both keys need to be in the response.
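
For context, a response model along these lines would carry both keys. This is only a rough sketch; the class name, comments, and config are assumptions rather than the code in this PR:

```python
from fideslang.models import Dataset
from fideslang.validation import FidesKey
from pydantic import BaseModel


class DatasetConfigSchemaWithDataset(BaseModel):
    """Hypothetical response model: exposes the DatasetConfig's own fides_key
    plus the entire nested CTL Dataset, whose fides_key may differ."""

    fides_key: FidesKey   # fides_key of the DatasetConfig
    ctl_dataset: Dataset  # the linked CTL Dataset resource, with its own fides_key

    class Config:
        orm_mode = True   # allow building the response directly from the ORM object
```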

Steps to Confirm

  • list any manual steps for reviewers to confirm the changes

Pre-Merge Checklist

Description Of Changes

Get rid of the DatasetConfig.dataset field.

The previous PR stopped reading from DatasetConfig.dataset field and started writing to the new DatasetConfig.ctl_dataset resource.

Nothing should be using this field so let's get rid of it.

- Add a data migration to delete the column (see the sketch after this list)
- Remove the "dataset" key from the data sent to DatasetConfig.create_or_update in the patch_dataset_configs endpoint
- Leave the "dataset" key in the data sent to DatasetConfig.upsert_with_ctl_dataset because this method currently upserts both the CTL Dataset first and then the DatasetConfig.  This method is currently used in the old patch dataset config endpoints (soon to be deprecated) as well as when you're creating a connection config from a template.
- Update a lot of fixtures to not pass in a "dataset" key to DatasetConfig.create
- Throw a 404 if the ctl_dataset_id does not exist when creating a DatasetConfig through the new patch_dataset_configs endpoint
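
For reference, the migration mentioned in the first bullet would look roughly like the sketch below (revision ids, the column type, and the downgrade behavior are placeholders, not the actual migration):

```python
"""Sketch of a migration dropping the legacy DatasetConfig.dataset column."""
import sqlalchemy as sa
from alembic import op

# Placeholder revision identifiers, not the real ones.
revision = "xxxxxxxxxxxx"
down_revision = "yyyyyyyyyyyy"


def upgrade():
    # The contents were already migrated to ctl_datasets in the previous PR,
    # so the column can simply be dropped.
    op.drop_column("datasetconfig", "dataset")


def downgrade():
    # Recreate the column as nullable JSON (type assumed); the original data
    # is not restored.
    op.add_column("datasetconfig", sa.Column("dataset", sa.JSON(), nullable=True))
```
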
@pattisdr pattisdr self-assigned this Dec 20, 2022
@pattisdr pattisdr added the "Unified Fides Resources" and "run unsafe ci checks" (runs fides-related CI checks that require sensitive credentials) labels Dec 20, 2022
@pattisdr pattisdr force-pushed the fidesops_1763_stop_storing_datasetconfig_dataset branch from 8dfc7e4 to 3b093f2 on December 21, 2022 at 00:53
Commits (messages truncated):
  • …instead of a Datasets tag to not conflict with CTL Datasets. Dataset Configs link a Connection Config to a CTL Dataset.
  • …a datasetconfig. Data categories must exist in the db.
  • …e fides_key of the DatasetConfig as well as the entire nested CTL Dataset which has the fides_key of the separate CTL Dataset resource.
@pattisdr pattisdr changed the title from "Remove DatasetConfig.dataset field [#1763]" to "Remove DatasetConfig.dataset field + New Get DatasetConfig Endpoints [#1763]" Dec 21, 2022
```python
)

try:
    fetched_dataset: Dataset = Dataset.from_orm(ctl_dataset)
```

@pattisdr pattisdr commented Dec 22, 2022

CTL Datasets can currently be created without an organization_fides_key or a data_qualifier - so customers may have existing database records where this is the case. However, these are not optional fields on the Fideslang Dataset schema.

Trying to parse an existing CTL dataset with these issues would throw a 500, so I catch it and throw a 422 instead. The issue would need to be corrected through the CTL dataset endpoint.


We should separately look into using the Fideslang Dataset schema to validate when CTL datasets are created, or alternatively allow these fields to be optional on the Fideslang Dataset. Allison's noticed this too #2113 (comment)
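
Pieced together from the snippets in this thread, the validation in the new patch_dataset_configs endpoint is roughly the following shape (a sketch only; the helper name and signature are hypothetical, and the lookup of the CtlDataset row by ctl_dataset_id happens before this):

```python
from typing import Any, Optional

from fastapi import HTTPException
from fideslang.models import Dataset
from pydantic import ValidationError
from starlette.status import HTTP_404_NOT_FOUND, HTTP_422_UNPROCESSABLE_ENTITY


def validate_ctl_dataset(ctl_dataset: Optional[Any], ctl_dataset_id: str) -> Dataset:
    """Hypothetical helper: 404 if the referenced CTL dataset doesn't exist,
    422 if its stored data can't be parsed into the Fideslang Dataset schema."""
    if ctl_dataset is None:
        # ctl_dataset_id did not match any existing CTL Dataset
        raise HTTPException(
            status_code=HTTP_404_NOT_FOUND,
            detail=f"No CTL dataset found with id '{ctl_dataset_id}'",
        )
    try:
        fetched_dataset: Dataset = Dataset.from_orm(ctl_dataset)
    except ValidationError as e:
        # Existing records may be missing organization_fides_key or
        # data_qualifier; surface that as a 422 instead of a 500.
        raise HTTPException(
            status_code=HTTP_422_UNPROCESSABLE_ENTITY, detail=e.errors()
        )
    return fetched_dataset
```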

A contributor commented:
should we make a 🎫 for this issue?

@pattisdr (author) replied:
yep will do!


```python
        raise HTTPException(
            status_code=HTTP_422_UNPROCESSABLE_ENTITY, detail=e.errors()
        )
    validate_data_categories(fetched_dataset, db)
```

@pattisdr pattisdr commented Dec 22, 2022

Since we have to fetch the dataset anyway to validate for SaaS connectors, I'm going ahead and checking that its data categories are also in the database, like we do on the existing dataset config endpoints. If data categories are invalid, they need to be addressed on the ctl dataset.

It's easier to add this check here than to a generic crud endpoint on the ctl side, but I'm not sure how I feel about it. It's more of a meta question about where the best place is to validate that data categories exist: on resource creation? When we use the resource, like in a DSR? In both places?

The reason this isn't part of a standard validator is that it needs access to a database session.
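
For context, the check is conceptually something like the sketch below; the DataCategory model name is a stand-in for whatever SQLAlchemy model stores the defined categories, and the traversal of nested fields is simplified:

```python
from fastapi import HTTPException
from fideslang.models import Dataset
from sqlalchemy.orm import Session
from starlette.status import HTTP_422_UNPROCESSABLE_ENTITY


def validate_data_categories(dataset: Dataset, db: Session) -> None:
    """Sketch: every data category referenced by the dataset must already be
    defined in the database. DataCategory is a hypothetical stand-in model."""
    referenced = {
        category
        for collection in dataset.collections
        for field in collection.fields
        for category in (field.data_categories or [])
    }
    defined = {key for (key,) in db.query(DataCategory.fides_key).all()}
    unknown = referenced - defined
    if unknown:
        raise HTTPException(
            status_code=HTTP_422_UNPROCESSABLE_ENTITY,
            detail=f"Unknown data categories referenced by the dataset: {sorted(unknown)}",
        )
```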

A contributor replied:
my instinct would be on resource creation since that's the closest layer to the db so it'd catch the most things. but yes, the generic CRUD endpoint is nice code cleanliness wise, but makes changes like this more difficult. maybe something to discuss with the team and make a follow up ticket for?

@pattisdr (author) replied:
It's tricky though because the data categories could change later! They may be valid at the time of creation and not at the time of use!

@pattisdr (author) replied:
will make a ticket!


```python
        db=db, conditions=(DatasetConfig.connection_config_id == connection_config.id)
    ).order_by(DatasetConfig.created_at.desc())

    return paginate(dataset_configs, params)
```

@pattisdr (author) commented:

This is identical to get_datasets except the response has both the DatasetConfig fides_key and the CTL Dataset fides_key, which can differ. The other parallel endpoint will eventually be deprecated.
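
To illustrate, one item in the new list response might look like this (values are made up; only the relevant fields are shown):

```python
# Illustrative shape of one item in the paginated response:
example_item = {
    "fides_key": "app_postgres_config",        # fides_key of the DatasetConfig
    "ctl_dataset": {
        "fides_key": "app_postgres_dataset",   # fides_key of the CTL Dataset; can differ
        "name": "App Postgres Dataset",
        "collections": [],                     # remaining fideslang Dataset fields omitted
    },
}
```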

Comment on lines +514 to +519
```python
def get_dataset_config(
    fides_key: FidesKey,
    db: Session = Depends(deps.get_db),
    connection_config: ConnectionConfig = Depends(_get_connection_config),
) -> DatasetConfig:
    """Returns the specific Dataset Config linked to the Connection Config."""
```

@pattisdr (author) commented:
This is identical to get_dataset except the response has both the DatasetConfig fides_key and the CTL Dataset fides_key, which can differ. The other parallel endpoint will eventually be deprecated.

```diff
@@ -117,6 +113,7 @@ def upsert_ctl_dataset(ctl_dataset_obj: Optional[CtlDataset]) -> CtlDataset:
         fetched_ctl_dataset
     )  # Create/update existing ctl_dataset first
     data["ctl_dataset_id"] = ctl_dataset.id
+    data.pop("dataset", None)
```

@pattisdr (author) commented:
This method allows both the CTL Dataset and the DatasetConfig to be upserted. The dataset is passed in for the CTL Dataset, but we don't want this field when we update the DatasetConfig.

@pattisdr pattisdr marked this pull request as ready for review December 22, 2022 03:43
Comment on lines +166 to +172
```diff
+        excluded = dict(insert_stmt.excluded.items())  # type: ignore[attr-defined]
+        excluded.pop("id", None)  # If updating, don't update the "id"

         result = await session.execute(
             insert_stmt.on_conflict_do_update(
                 index_elements=["fides_key"],
-                set_=insert_stmt.excluded,
+                set_=excluded,
```

@pattisdr (author) commented:
Allison tried to upsert a CTL Dataset that was already linked to a DatasetConfig and hit:

```
fides-fides-1  | 2022-12-22 17:00:45.731 [DEBUG] (crud:upsert_resources:182): (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.ForeignKeyViolationError'>: update or delete on table "ctl_datasets" violates foreign key constraint "datasetconfig_ctl_dataset_id_fkey" on table "datasetconfig"
fides-fides-1  | DETAIL:  Key (id)=(ctl_8ca1f08e-53de-4954-a56a-73123145946e) is still referenced from table "datasetconfig".
```

I believe the upsert was also trying to update the id of an existing resource, but we need it to stay consistent, especially since DatasetConfigs now have a FK to this table.

@allisonking allisonking left a comment
such thorough tests! 🤩

@pattisdr pattisdr merged commit e6ad6b3 into unified-fides-resources Dec 23, 2022
@pattisdr pattisdr deleted the fidesops_1763_stop_storing_datasetconfig_dataset branch December 23, 2022 21:31
@pattisdr pattisdr linked an issue Dec 23, 2022 that may be closed by this pull request
@pattisdr pattisdr mentioned this pull request Jan 17, 2023

Successfully merging this pull request may close these issues.

Remove dataset yaml storing code from datasetconfig endpoint