Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream DSR and connector related metadata from Fides #94

Closed
TheAndrewJackson opened this issue Nov 14, 2022 · 15 comments · Fixed by #95
Closed

Upstream DSR and connector related metadata from Fides #94

TheAndrewJackson opened this issue Nov 14, 2022 · 15 comments · Fixed by #95

Comments

@TheAndrewJackson
Copy link
Contributor

TheAndrewJackson commented Nov 14, 2022

The fides project has extended the fideslang models to included extra metadata for processing DSRs. Most notably the FidesopsMeta model. All of the changes made to fideslang models in the dataset.py should be upstreamed into the fideslang project. This is to standardize on what DSR metadata should look like.

Acceptance Criteria

  • Upstreamed all of the DSR fields from fides into fideslang
  • Update the Taxonomy model to not strip out extra fields that are included in JSON payloads
  • Update fidesops_meta to fides_meta
@TheAndrewJackson TheAndrewJackson added the enhancement New feature or request label Nov 14, 2022
@TheAndrewJackson
Copy link
Contributor Author

TheAndrewJackson commented Nov 14, 2022

@ThomasLaPiana I have a couple spec related questions:

  • Should fidesctl_meta be deprecated and or removed?
    • I'm not sure if this is used at all. If not, now would be a great time to delete it.
  • Does it make sense to rename fidesops_meta to dsr_meta instead of fides_meta? (@pattisdr I'm curious of what you think of this idea as well)
    • I was wondering if this might be a little more self documenting. From my undestanding fidesops_meta was used for DSRs so we could just skip straight to that.

@pattisdr
Copy link
Contributor

hey @TheAndrewJackson my vote would be for this to be as generic as possible and I wonder if dsr_meta is too specific because the information contained there could also be used for data maps too.

In other words, some of the information in fidesops_meta includes how one table is linked to another. We use that to build a graph when we're running DSR's, to know that we need to visit table A and table B before we visit table C, but those relationships also describe the existing system in a sense.

But I'm pretty new to the -ctl side of things, so @ThomasLaPiana can better weigh in on whether fidesops_meta information can be shared between -ops and -ctl features.

@TheAndrewJackson
Copy link
Contributor Author

@pattisdr That makes sense! Thanks for the input. I think fides_meta makes more sense if it contains other info in addition what's related to DSRs.

@ThomasLaPiana
Copy link
Contributor

Yep, I think fides_meta is a good catch-all! With the projects merged, it is much easier (neigh, impossible?) for us to accidentally overwrite each other's keys in that field, so I don't think we need to have separate _meta fields

also @SteveDMurphy can confirm whether or not we use fidesctl_meta, I believe we do for some system scanning and maybe classification?

@seanpreston
Copy link
Contributor

Hey @pattisdr I've re-assigned this from @TheAndrewJackson. It looks like some progress has already been made — when fidesplus is released we should do a quickhandover :).

@SteveDMurphy
Copy link
Contributor

also @SteveDMurphy can confirm whether or not we use fidesctl_meta, I believe we do for some system scanning and maybe classification?

oof, sorry I missed this! But yes the fidesctl_meta can currently be used by the scan system aws functionality. I believe there is also a generic meta field used that may be general purpose enough for our uses but open to other ideas as well 👍🏽

@pattisdr
Copy link
Contributor

pattisdr commented Dec 2, 2022

To simplify matters, we could keep our focus on combining just Dataset concepts here. scan system aws functionality writes to Systems > fidesctl_meta but not Datasets > fidesctl_meta. And Fidesops never did anything related to Systems, so we don't necessarily need to touch Systems just yet.

So focusing on Datasets, we could combine fidesctl_meta from fideslang and fidesops_meta from fidesops into one field:

  • Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.
    • Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
  • Downstream in fides, rename ctl_datasets.fidesctl_meta column to fides_meta

I think we should leave the generic meta field alone for customer use, per this docstring:

"An optional object that provides additional information about the Dataset. You can structure the object however you like. It can be a simple set of key: value properties or a deeply nested hierarchy of objects. How you use the object is up to you: Fides ignores it."

@ThomasLaPiana
Copy link
Contributor

To simplify matters, we could keep our focus on combining just Dataset concepts here. scan system aws functionality writes to Systems > fidesctl_meta but not Datasets > fidesctl_meta. And Fidesops never did anything related to Systems, so we don't necessarily need to touch Systems just yet.

So focusing on Datasets, we could combine fidesctl_meta from fideslang and fidesops_meta from fidesops into one field:

  • Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.

    • Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
  • Downstream in fides, rename ctl_datasets.fidesctl_meta column to fides_meta

I think we should leave the generic meta field alone for customer use, per this docstring:

"An optional object that provides additional information about the Dataset. You can structure the object however you like. It can be a simple set of key: value properties or a deeply nested hierarchy of objects. How you use the object is up to you: Fides ignores it."

Huge agree here!

@SteveDMurphy
Copy link
Contributor

  • Rename fideslang > Datasets.fidesctl_meta to be Datasets.fides_meta.
    • Update fideslang > DatasetMetadata class to also have fields from fides > FidesopsDatasetMeta
  • Downstream in fides, rename ctl_datasets.fidesctl_meta column to fides_meta

For the fideslang change only, would it make sense to introduce fides_meta across System and Organization (and anywhere else we need) as well so we can plan to deprecate the use of both fidesctl_meta and fidesops_meta without needing updates in both repo's again?

@ThomasLaPiana
Copy link
Contributor

Looking at the code, maybe we can directly add meta and fides_meta to the FidesBase class? That way we don't have to worry about this in the future, and I don't think it hurts for every object to have it? (every top-level object at least)

@NevilleS
Copy link
Contributor

NevilleS commented Dec 7, 2022

Quick backwards compat question: is it possible to make this field work if it's called fidesops_meta or fides_meta? I'd like to stay compatible with existing datasets, especially all the SaaS connector ones.

@pattisdr
Copy link
Contributor

pattisdr commented Dec 7, 2022

hey @NevilleS that's the very thing I'm mulling over now! What I'm considering is a validator that takes in fidesops_meta if it exists and replaces fides_meta with that value, and perhaps adds a deprecation warning there.

We also could move it over as fidesops_meta too, that is simpler.

@pattisdr
Copy link
Contributor

pattisdr commented Dec 8, 2022

Looking at the code, maybe we can directly add meta and fides_meta to the FidesBase class? That way we don't have to worry about this in the future, and I don't think it hurts for every object to have it? (every top-level object at least)

I think scope is starting to creep here, the PR's are getting large, so I'd prefer to keep this specific issue constrained to datasets. Happy to ticket a followup to address this if that's alright.

  • My current work is renaming fidesctl_meta to be fides_meta on the fideslang Dataset model and also doing a data migration downstream in fides ctl_datasets to match.
  • Adding this same thing to Systems, Organizations would mean matching migrations downstream, and adding updates to the systems-related code that was writing to the fidesctl_meta field.

@ThomasLaPiana
Copy link
Contributor

@pattisdr I can't argue with keeping a small scope 🙂 thanks for calling out the creep!

Agreed we should focus on datasets here

@pattisdr
Copy link
Contributor

pattisdr commented Dec 9, 2022

Great, reticketed here, we could do this last: https://github.com/ethyca/fideslang/issues/96

pattisdr added a commit that referenced this issue Dec 13, 2022
- Add < and > as allowed errors to FidesKey
- Override FidesValidationError to be a subclass of ValueError not Exception so Pydantic can pick up and raise as a ValidationError.
- Add fides_meta fields to Dataset, Collection, and DatasetField. These largely contain concepts for handling DSR's in fides.
- Dataset.fidesctl_meta has been renamed to Dataset.fides_meta.
- Add type validation.
- Allow both fidesops_meta and fides_meta to be specified on Dataset/Collection/DatasetField for backwards-compatibility but have instantiation move fidesops_meta to fides_meta.

Co-authored-by: Andrew Jackson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants