Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move privacy declarations to their own table #3098

Conversation

adamsachs
Copy link
Contributor

@adamsachs adamsachs commented Apr 18, 2023

❗ Contains migration; check downrev before merge

Closes #3036

Moves the objects stored in the JSON blob (array) column of System.privacy_declarations into their own table, privacydeclarations. All fields previously on the privacy_declaration objects are now columns on this new table. Additionally each record in the new table points via FK to their associated System record (identified by its fides_key); we also are able to make a proper FK link to the associated DataUse records, which are also identified by fides_key - previously, this was just a string field on the JSON object.

Maintains backward compatibility with existing "system" API by manually handling the privacy declarations defined inline on system create/update requests. There were already special-cased endpoints for System objects outside of the generic crud.py API due to special handling for system manager permissions - we build out those special-cased endpoints a bit further here. We're able to leverage the existing pydantic models (stored in fideslang) just about as-is - there was only a trivial fideslang update needed (v1.3.4) to set orm_mode = True on the PrivacyDeclaration model class.

Note that at this point, there's no API to interact with PrivacyDeclaration records directly. We may want to build that out in the future, but for now, keeping the API interactions through the System will support the existing UI just fine, and it allows us to limit our surface area of these changes a bit.

Code Changes

  • create a new PrivacyDeclaration table and associated SQLAlchemy model
    • include a data migration to populate the table based on System.privacy_declaration data
    • FK links to System (which will cascade deletes, i.e. System is the "parent" object) and to DataUse, which means that the declaration MUST point to an existing DataUse
  • update System APIs to create and maintain PrivacyDeclaration records
    • update the special-cased PUT/update function to upsert PrivacyDeclaration records associated with the System, as defined by their DataUse
    • make a new special-cased POST/create function for systems to create PrivacyDeclaration records associated with the System
    • make a new special-cased /upsert function to perform the update logic above for bulk upserts. note that we did not already have a special-cased /upsert function for System, i'm not exactly sure on the permissions implications
  • some other minor code updates for compatibility. generally, though, not much needs to change - we're still able to access privacy_declarations off of a System record via the back-populated relationship, so things mostly look the same for other parts of the codebase interacting with these records.

Steps to Confirm

  • from what I can tell, the UI seems to still be able to manage privacy declarations just fine

Pre-Merge Checklist

Description Of Changes

As privacy declarations (i.e. the data uses associated with a system) become more central to our application, it will benefit us to have them in their own table, with proper FK links to System records and DataUses. This will allow us to manage and query them much more easily moving forward.

The specific motivation for this change was to allow custom fields to apply to a given PrivacyDeclaration record, which wouldn't have been possible if they were defined as embedded JSON.

@adamsachs adamsachs force-pushed the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch from 0565173 to 3295b50 Compare April 19, 2023 14:29
@cypress
Copy link

cypress bot commented Apr 19, 2023

Passing run #1544 ↗︎

0 3 0 0 Flakiness 0
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Details:

Merge 5503bee into e5cc20e...
Project: fides Commit: bf3c9d8541 ℹ️
Status: Passed Duration: 00:33 💡
Started: Apr 25, 2023 12:18 AM Ended: Apr 25, 2023 12:18 AM

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@adamsachs adamsachs force-pushed the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch 2 times, most recently from f392170 to d3be6a1 Compare April 19, 2023 18:21
@codecov
Copy link

codecov bot commented Apr 19, 2023

Codecov Report

Patch coverage: 83.33% and project coverage change: -0.05 ⚠️

Comparison is base (e5cc20e) 87.52% compared to head (5503bee) 87.47%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3098      +/-   ##
==========================================
- Coverage   87.52%   87.47%   -0.05%     
==========================================
  Files         309      309              
  Lines       17840    17924      +84     
  Branches     2310     2325      +15     
==========================================
+ Hits        15614    15679      +65     
- Misses       1802     1820      +18     
- Partials      424      425       +1     
Impacted Files Coverage Δ
src/fides/connectors/aws.py 41.66% <ø> (ø)
src/fides/api/ctl/routes/system.py 83.47% <76.82%> (-12.45%) ⬇️
src/fides/api/ctl/routes/crud.py 88.00% <89.47%> (+0.32%) ⬆️
src/fides/api/ctl/sql_models.py 98.26% <95.65%> (+0.05%) ⬆️
src/fides/api/ctl/database/seed.py 91.87% <100.00%> (+0.12%) ⬆️
...i/ops/api/v1/endpoints/privacy_notice_endpoints.py 100.00% <100.00%> (ø)
src/fides/api/ops/models/privacy_notice.py 100.00% <100.00%> (ø)
src/fides/core/export.py 81.41% <100.00%> (ø)

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@adamsachs adamsachs force-pushed the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch 2 times, most recently from ef2427e to 47aa334 Compare April 20, 2023 07:54
@adamsachs
Copy link
Contributor Author

This is still a bit rough around the edges, so i'll look to finalize some testing and polish, but in the meantime i wanted to get some preliminary eyes on it since i'm hoping this is how things will look generally, and want to know if there are any red flags.

@ThomasLaPiana - i think you're most familiar with Systems and the generic crud API, so would you mind looking over this change?
@pattisdr - tagging you too since it's always good to have your eyes on DB movements :) also, this interacts pretty heavily with a lot of the recent consent plumbing
@allisonking - you'd mentioned in the linked issue that you may be able to evaluate things on the FE. things looked good when i used the UI in testing, but it would be great to get a more discerning set of eyes on how the FE code is interacting with things. afaik, the API should be completely backward compatible, but i want to make sure nothing snuck in!

@adamsachs
Copy link
Contributor Author

codecov is complaining about a lot of places that i'm just about positive are covered by tests i've added (mainly in tests/ctl/core/test_api.py). perhaps @ThomasLaPiana you may have some ideas?

@pattisdr
Copy link
Contributor

Starting a review @adamsachs ...

Copy link
Contributor

@allisonking allisonking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there appears to be something bugging out in the frontend when there are two declarations with the same data use

Screen.Recording.2023-04-20.at.12.51.49.PM.mov

the FE was doing some logic to make sure we didn't get duplicate privacy declarations (declarations with the same data use AND the same name). this was because there was never a unique constraint on the BE for privacy declarations back when they were a json blob. since declarations also didn't have IDs, the frontend had to do some other weird stuff to work around that.

I suspect those workarounds combined with something in this change is creating this bug. I can look into it, but curious if you have any immediate thoughts first @adamsachs ?

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't spend as much time as I normally would here as it's in draft, but generally agree with the approach. Nice work especially on the upserting privacy declaration logic.

src/fides/api/ctl/sql_models.py Outdated Show resolved Hide resolved
src/fides/api/ctl/sql_models.py Outdated Show resolved Hide resolved
src/fides/api/ctl/sql_models.py Outdated Show resolved Hide resolved
src/fides/api/ctl/sql_models.py Outdated Show resolved Hide resolved
src/fides/api/ctl/routes/system.py Show resolved Hide resolved
src/fides/api/ctl/routes/system.py Outdated Show resolved Hide resolved
tests/ctl/conftest.py Outdated Show resolved Hide resolved
@adamsachs
Copy link
Contributor Author

I suspect those workarounds combined with something in this change is creating this bug. I can look into it, but curious if you have any immediate thoughts first @adamsachs ?

ah yes thanks for pointing this out @allisonking! i'd assumed that a given System can only have one privacy declaration with a given DataUse. Some of the BE API logic I implemented assumes that to be the case. We could certainly adjust that, though, if it's not desired behavior - which I think is more of a product question.

So tagging in @mfbrown and @rsilvery - do we want to allow for more than one 'privacy declaration' pointing to a particular DataUse on a given System?

@rsilvery
Copy link
Contributor

I suspect those workarounds combined with something in this change is creating this bug. I can look into it, but curious if you have any immediate thoughts first @adamsachs ?

ah yes thanks for pointing this out @allisonking! i'd assumed that a given System can only have one privacy declaration with a given DataUse. Some of the BE API logic I implemented assumes that to be the case. We could certainly adjust that, though, if it's not desired behavior - which I think is more of a product question.

So tagging in @mfbrown and @rsilvery - do we want to allow for more than one 'privacy declaration' pointing to a particular DataUse on a given System?

@mfbrown didn't this come up with Momentive and we changed it so you could have the same data use so long as the purpose for processing was different?

@mfbrown
Copy link

mfbrown commented Apr 20, 2023

I suspect those workarounds combined with something in this change is creating this bug. I can look into it, but curious if you have any immediate thoughts first @adamsachs ?

ah yes thanks for pointing this out @allisonking! i'd assumed that a given System can only have one privacy declaration with a given DataUse. Some of the BE API logic I implemented assumes that to be the case. We could certainly adjust that, though, if it's not desired behavior - which I think is more of a product question.
So tagging in @mfbrown and @rsilvery - do we want to allow for more than one 'privacy declaration' pointing to a particular DataUse on a given System?

@mfbrown didn't this come up with Momentive and we changed it so you could have the same data use so long as the purpose for processing was different?

That is correct. There is a need to have the same data use more than once on a system and the check for uniqueness is against both the Use + Activity (which is the declaration name in the BE, i believe)

@allisonking
Copy link
Contributor

thanks @adamsachs ! let me know if you want me to take a look again. if you update the BE to have that behavior and the FE still isn't behaving, let me know, since it could definitely be that the FE logic is no longer relevant. regardless, once this PR is in, I'll make a follow up ticket to update some of the FE, since I don't think it'll need to do as many checks as it is doing right now :)

one other thing I noticed—I tried to autogenerate some typescript types based off of the openapi spec and noticed the spec for a PrivacyDeclaration didn't have any changes. I'd expect it to have an id now that it has its own table? I just poked around a bit, and I'm guessing it's because the model in fideslang doesn't have an id? anyway, it's not urgent or anything, and I wouldn't want it to break anything with how systems can be updated right now, but just want to make sure I'm understanding things right

@adamsachs adamsachs force-pushed the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch 3 times, most recently from 1e7fff7 to 4193756 Compare April 20, 2023 23:29
@adamsachs adamsachs marked this pull request as ready for review April 20, 2023 23:33
@adamsachs
Copy link
Contributor Author

marking this as ready for review as i think i've addressed all the major concerns/decisions from the initial review!

@allisonking i've pushed a new tag 2.11.1a4 that can be used for some testing if you'd like. i ran through things (though not in plus) on my end in the UI, and they looked good, including the edge-ish case you'd identified.

@adamsachs
Copy link
Contributor Author

hi @pattisdr @ThomasLaPiana, a friendly nudge on this for another round of BE review as i'd like to get this merged ASAP since there are a few subsequent pieces of functionality due this sprint that depend on this. i'll be out for most of the earlier part of the day on monday but should be able to circle back to address any feedback monday evening, if either of you are able to take a look earlier in the day. i would really appreciate it!

@pattisdr
Copy link
Contributor

thanks @adamsachs don't worry I haven't forgotten!

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works well @adamsachs just a couple of cleanup things, the main refactoring to FK to system id instead of system fides_key looks good, and also the soft references to the data use.

src/fides/api/ctl/sql_models.py Outdated Show resolved Hide resolved
src/fides/api/ctl/sql_models.py Show resolved Hide resolved
@ThomasLaPiana
Copy link
Contributor

@adamsachs Great work here, lots of places to trip-up but it looks like they all got solved 😄

As for the codecov, I looked through the new API tests and the codecov "missed lines". It is possible that some of those error states (mostly what it says is missing) aren't getting caught, but I'm not majorly concerned here either. I am unsure why the loss in coverage is so large though

@adamsachs adamsachs force-pushed the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch from 56c6f9f to a578fbe Compare April 24, 2023 23:23
@adamsachs adamsachs merged commit d984bd6 into main Apr 25, 2023
@adamsachs adamsachs deleted the 3036-move-systemprivacy_declarations-embedded-objects-into-their-own-db-table branch April 25, 2023 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move System.privacy_declarations embedded objects into their own DB table
7 participants