fideslang 3.0 upgrades (language changes only, no pydantic updates!) #4502

Merged
adamsachs merged 10 commits into main from asachs/PROD-1490-fides on Dec 15, 2023

Conversation

Contributor

@adamsachs adamsachs commented Dec 8, 2023

Closes PROD-1490

⚠️ depends on https://github.com/ethyca/fideslang/pull/186 being merged and release of fideslang 3.0 ⚠️

Description Of Changes

Removes references to constructs that are being removed in fideslang 3.0.0.

Removals:

  • DataQualifier
  • Registry
  • System fields (had already been deprecated):
    • joint_controller
    • third_country_transfers
    • data_responsibility_title
    • data_protection_impact_assessment
  • Dataset fields (had already been deprecated):
    • joint_controller
    • data_qualifier
    • retention
    • third_country_transfers
  • DataUse fields (had already been deprecated):
    • legal_basis
    • special_category
    • recipients
    • legitimate_interest
    • legitimate_interest_impact_assessment

Code Changes

  • update dataset annotation
  • sqlalchemy model updates
  • migrate to remove tables and columns
  • remove relevant API routers
  • misc. BE references
  • remove a bunch of FE references ...

Steps to Confirm

  • tests passing
  • ensure relevant database artifacts are removed after migration
  • spin up admin UI
    • ensure system management works as expected - adding and editing systems, comprehensive test of different fields, etc.
    • ensure taxonomy editor still works as expected
  • perform some manual regression testing on fides CLI
  • will need comprehensive testing over in fidesplus too: PR here https://github.com/ethyca/fidesplus/pull/1267
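
The "database artifacts are removed" step could be scripted roughly as below. This is a minimal sketch, not part of the PR: `leftover_artifacts` and the `schema` mapping are hypothetical, and the table and column names are only the ones visible in the migration snippets quoted later in this thread. How the mapping gets populated (for example from a database inspector) is left open.

```python
from typing import Dict, List, Set

# Names taken from the migration snippets in this thread; the real migration
# may touch more tables and columns than these.
REMOVED_TABLES = {"ctl_data_qualifiers"}
REMOVED_COLUMNS = {
    "ctl_datasets": {"data_qualifier"},
    "privacydeclaration": {"data_qualifier"},
}


def leftover_artifacts(schema: Dict[str, Set[str]]) -> List[str]:
    """Return removed tables/columns that are still present in `schema`.

    `schema` maps table name -> set of column names (hypothetical shape).
    """
    leftovers = [table for table in sorted(REMOVED_TABLES) if table in schema]
    for table, columns in sorted(REMOVED_COLUMNS.items()):
        for column in sorted(columns & schema.get(table, set())):
            leftovers.append(f"{table}.{column}")
    return leftovers
```

An empty list would mean none of the listed artifacts survived the migration.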

Pre-Merge Checklist

@adamsachs adamsachs force-pushed the asachs/PROD-1490-fides branch 2 times, most recently from cbbe934 to 121363b Compare December 8, 2023 22:44

codecov bot commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a70ea22) 87.09% compared to head (1fd7cf8) 87.06%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4502      +/-   ##
==========================================
- Coverage   87.09%   87.06%   -0.04%     
==========================================
  Files         332      333       +1     
  Lines       20514    20465      -49     
  Branches     2642     2641       -1     
==========================================
- Hits        17867    17817      -50     
- Misses       2180     2181       +1     
  Partials      467      467              

Contributor

@pattisdr pattisdr left a comment

looking great Adam.

The main thing I'd want to check - do API requests ignore when deprecated values are supplied (which is what I'd expect) or do we get failures? I think "ignore" is Pydantic's default but I'm not sure every schema has that set.
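
For context on the question above, here is a small sketch of the two behaviours being contrasted. It assumes `pydantic` is installed; `LenientDataUse` and `StrictDataUse` are hypothetical models, not fides schemas. A model with default settings drops unknown fields silently, while one configured with `extra="forbid"` rejects them.

```python
from pydantic import BaseModel, ValidationError


class LenientDataUse(BaseModel):
    """Hypothetical schema relying on Pydantic's default handling of extras."""

    fides_key: str


class StrictDataUse(BaseModel, extra="forbid"):
    """Hypothetical schema that rejects unknown fields."""

    fides_key: str


payload = {"fides_key": "analytics", "legal_basis": "Consent"}

# Default behaviour: the unknown field is dropped without an error.
lenient = LenientDataUse(**payload)
lenient_has_legal_basis = hasattr(lenient, "legal_basis")

# With extra="forbid" the same payload fails validation.
try:
    StrictDataUse(**payload)
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```

Any schema that opts into the stricter setting would turn a deprecated field into a request failure, which is why the per-endpoint checks further down are worth doing.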

Comment on lines -578 to -588
- name: legal_basis
  data_categories:
  - system.operations
  data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
- name: legitimate_interest
  description: Boolean value denoting whether or not the data use is marked as
    a legitimate interest
  data_categories:
  - system.operations
  data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
- name: legitimate_interest_impact_assessment
Contributor

Nice, did you remember to go into this file and remove this or was it automatically flagged somewhere? I've felt this file only gets flagged if it's missing values not if there are too many

Contributor Author

i think you're right on that behavior. pretty sure i found the references just via a global string search on the codebase 🤷

Contributor Author

so, definitely error prone, if i've missed something!
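
The "global string search" approach described above could be approximated with a short script like this one. It is only a sketch: `find_references` is a hypothetical helper, the file suffixes are an assumption, and the search terms are a subset of the removal list in the PR description.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# A few of the names removed in fideslang 3.0, per the PR description.
DEPRECATED_NAMES = ("data_qualifier", "legal_basis", "legitimate_interest")


def find_references(
    root: Path,
    names: Iterable[str] = DEPRECATED_NAMES,
    suffixes: Tuple[str, ...] = (".py", ".yml", ".yaml", ".ts", ".tsx"),
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, matched name) for each hit under `root`."""
    # Word boundaries keep longer identifiers from matching by accident.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path.relative_to(root).as_posix(), lineno, match.group(1)))
    return hits
```

Because underscores count as word characters, a still-valid field such as `legal_basis_for_processing` does not match the `legal_basis` term.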

@@ -1434,12 +1434,6 @@ def test_update_system_manager_existing_system_not_in_request_which_removes_syst
"description": "fixture-made-system",
"organization_fides_key": "default_organization",
"system_type": "Service",
"data_responsibility_title": "Processor",
Contributor

We could also remove this whole test file: tests/ops/migration_tests/test_system_dictionary_data_migration.py

Contributor Author

ah i did consider removing that but i felt that the migration itself will still technically run, so in some senses, it's still valid to have those tests in place?? anyway, didn't see a whole lot of harm in keeping it in place because it was all passing without needing any updates 🤷

op.drop_column("ctl_datasets", "data_qualifier")
op.drop_index("ix_ctl_systems_name", table_name="ctl_systems")
op.drop_column("privacydeclaration", "data_qualifier")
op.drop_constraint("purpose_constraint", "tcf_purpose_overrides", type_="unique")
Contributor

ah this is my fault: it's not properly defined on the model. fixed in the overhaul PR

Contributor Author

ah right i hadn't even noticed that here, shows you how closely i inspected the autogenerated migrations 😬

i'll remove from here, thanks for catching!

)
op.create_table(
"ctl_data_qualifiers",
sa.Column("id", sa.VARCHAR(length=255), autoincrement=False, nullable=False),
Contributor

in the downrevs, I'd update all these to be nullable=True so it can actually be downgraded

Contributor Author

thank you, great catch!
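
The thread doesn't spell out why `nullable=False` gets in the way of a downgrade. One common failure mode, offered here as an assumption rather than the reviewer's stated reasoning, is that a column re-added without a default cannot be `NOT NULL`, because existing rows have no value for it. The sketch below reproduces that with the standard-library `sqlite3` module and a made-up `demo_datasets` table; it does not use Alembic or the fides database, so it only illustrates the general idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_datasets (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO demo_datasets (id) VALUES ('demo_users_dataset')")

# Re-adding a dropped column as NOT NULL, with no default, is rejected.
try:
    conn.execute("ALTER TABLE demo_datasets ADD COLUMN data_qualifier TEXT NOT NULL")
    not_null_failed = False
except sqlite3.OperationalError:
    not_null_failed = True

# The nullable version succeeds; existing rows simply get NULL.
conn.execute("ALTER TABLE demo_datasets ADD COLUMN data_qualifier TEXT")
row = conn.execute("SELECT id, data_qualifier FROM demo_datasets").fetchone()
conn.close()
```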

@adamsachs
Contributor Author

> looking great Adam.
>
> The main thing I'd want to check - do API requests ignore when deprecated values are supplied (which is what I'd expect) or do we get failures? I think "ignore" is Pydantic's default but I'm not sure every schema has that set.

yup, it looks like it! here's some manual testing i've done, let me know if this looks to cover things decently:

  • Data qualifier: PUT /api/v1/policy with a payload that has a data_qualifier field - the API call succeeds, but data_qualifier is not saved, as expected:
curl -X 'PUT' \
  'http://localhost:8080/api/v1/policy' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '
  {
    "fides_key": "Gb3FAj1oKEI4DIrkUcINbyEyLBPOUMH8GKfO8<-pVIgzkIk3Tg-ToCqPVXUOKctdkl_G16mpQKrFs0u5okHM<RR-fqgeE2KJY3Wbr",
    "organization_fides_key": "default_organization",
    "tags": [
      "string"
    ],
    "name": "string",
    "description": "string",
    "rules": [
      {
        "name": "string",
        "data_categories": {
          "matches": "ANY",
          "values": [
            "pppr0IgkPJwLAqCw63lTVJxBl>k.K2cVROyHZnFHBg-ib7ixBBeR4yPQLHEmkv3pA3d0vVZNozW2GbDbumR"
          ]
        },
        "data_uses": {
          "matches": "ANY",
          "values": [
            "CKywI8KXUYyx8V6FJpiLBu2LFkEJ1MlrvdX6s<uiTsN8xYF..Vt6qf"
          ]
        },
        "data_subjects": {
          "matches": "ANY",
          "values": [
            "12j.3gF6OdWUYk955iP6UoDJl3oXQF9i_egDkvcmeWx9Dayws7GSUqStXMOPKo1PFM4xtiv_agmKR5sMccJC"
          ]
        },
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"
      }
    ]
  }
'
  • Registry: PUT /api/v1/system with a payload with a registry_id field - the API call succeeds, but registry_id field is ignored, as expected:
curl -X 'PUT' \
  'http://localhost:8080/api/v1/system' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '
  {
    "fides_key": "ts1",
    "organization_fides_key": "default_organization",
    "tags": [],
    "name": "ts1",
    "description": "",
    "meta": null,
    "fidesctl_meta": null,
    "system_type": "",
    "egress": null,
    "ingress": null,
    "privacy_declarations": [
      {
        "registry_id": 3,
        "name": "",
        "data_categories": [
          "system"
        ],
        "data_use": "analytics",
        "data_subjects": [
          "citizen_voter"
        ],
        "dataset_references": null,
        "egress": null,
        "ingress": null,
        "features": [],
        "flexible_legal_basis_for_processing": true,
        "legal_basis_for_processing": null,
        "impact_assessment_location": "",
        "retention_period": "",
        "processes_special_category_data": false,
        "special_category_legal_basis": null,
        "data_shared_with_third_parties": false,
        "third_parties": null,
        "shared_categories": [],
        "cookies": []
      }
    ],
    "administrating_department": "",
    "vendor_id": null,
    "previous_vendor_id": null,
    "dataset_references": [],
    "processes_personal_data": true,
    "exempt_from_privacy_regulations": false,
    "reason_for_exemption": null,
    "uses_profiling": false,
    "legal_basis_for_profiling": [],
    "does_international_transfers": false,
    "legal_basis_for_transfers": [],
    "requires_data_protection_assessments": false,
    "dpa_location": null,
    "dpa_progress": null,
    "privacy_policy": null,
    "legal_name": "",
    "legal_address": "",
    "responsibility": [],
    "dpo": "",
    "joint_controller_info": "",
    "data_security_practices": "",
    "cookie_max_age_seconds": null,
    "uses_cookies": false,
    "cookie_refresh": false,
    "uses_non_cookie_access": false,
    "legitimate_interest_disclosure_url": null,
    "cookies": [],
    "created_at": "2023-12-13T22:14:13.123634+00:00"
  }
'
  • Data use fields: PUT /api/v1/data_use with a payload with our deprecated fields - the API call succeeds, but deprecated fields are ignored, as expected:
curl -X 'PUT' \
  'http://localhost:8080/api/v1/data_use' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "version_added": "2.0.0",
    "version_deprecated": null,
    "replaced_by": null,
    "is_default": true,
    "fides_key": "analytics",
    "organization_fides_key": "default_organization",
    "tags": null,
    "name": "Analytics",
    "description": "Provides analytics for activities such as system and advertising performance reporting, insights and fraud detection.",
    "parent_key": null,
    "active": true,
    "legal_basis": "Consent",
    "recipients": ["foo"],
    "special_category": "Consent",
    "legitimate_interest": true,
    "legitimate_interest_impact_assessment": "http://foo.com"
  }'
  • System fields: PUT /api/v1/system with a payload with our deprecated fields - the API call succeeds, but deprecated fields are ignored, as expected:
curl -X 'PUT' \
  'http://localhost:8080/api/v1/system' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '
  {
    "fides_key": "ts1",
    "joint_controller": {
      "name": "foo"
    },
    "third_country_transfers": ["US"],
    "data_responsibility_title": "Controller",
    "data_protection_impact_assessment": {
      "is_required": true
    },
    "organization_fides_key": "default_organization",
    "tags": [],
    "name": "ts1",
    "description": "",
    "meta": null,
    "fidesctl_meta": null,
    "system_type": "",
    "egress": null,
    "ingress": null,
    "privacy_declarations": [
      {
        "registry_id": 3,
        "name": "",
        "data_categories": [
          "system"
        ],
        "data_use": "analytics",
        "data_subjects": [
          "citizen_voter"
        ],
        "dataset_references": null,
        "egress": null,
        "ingress": null,
        "features": [],
        "flexible_legal_basis_for_processing": true,
        "legal_basis_for_processing": null,
        "impact_assessment_location": "",
        "retention_period": "",
        "processes_special_category_data": false,
        "special_category_legal_basis": null,
        "data_shared_with_third_parties": false,
        "third_parties": null,
        "shared_categories": [],
        "cookies": []
      }
    ],
    "administrating_department": "",
    "vendor_id": null,
    "previous_vendor_id": null,
    "dataset_references": [],
    "processes_personal_data": true,
    "exempt_from_privacy_regulations": false,
    "reason_for_exemption": null,
    "uses_profiling": false,
    "legal_basis_for_profiling": [],
    "does_international_transfers": false,
    "legal_basis_for_transfers": [],
    "requires_data_protection_assessments": false,
    "dpa_location": null,
    "dpa_progress": null,
    "privacy_policy": null,
    "legal_name": "",
    "legal_address": "",
    "responsibility": [],
    "dpo": "",
    "joint_controller_info": "",
    "data_security_practices": "",
    "cookie_max_age_seconds": null,
    "uses_cookies": false,
    "cookie_refresh": false,
    "uses_non_cookie_access": false,
    "legitimate_interest_disclosure_url": null,
    "cookies": [],
    "created_at": "2023-12-13T22:14:13.123634+00:00"
  }
'
  • Dataset fields: confirmed using the YAML editor in the UI that the old demo_dataset.yml from fideslang, which includes some of our deprecated fields, uploaded successfully. the deprecated fields were ignored. 👍
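
Each of the checks above boils down to "send extra keys, confirm they don't come back". A hedged sketch of that comparison follows, using a hypothetical `dropped_keys` helper on plain dictionaries. It makes no HTTP calls and assumes the request and response bodies have already been parsed from JSON; the `returned` value is invented for illustration.

```python
from typing import Any, Dict, Set


def dropped_keys(sent: Dict[str, Any], returned: Dict[str, Any]) -> Set[str]:
    """Top-level keys present in the request body but absent from the response."""
    return set(sent) - set(returned)


# Abbreviated version of the data_use payload from the manual test above.
sent = {
    "fides_key": "analytics",
    "name": "Analytics",
    "legal_basis": "Consent",
    "recipients": ["foo"],
    "special_category": "Consent",
    "legitimate_interest": True,
    "legitimate_interest_impact_assessment": "http://foo.com",
}
# Hypothetical response body, assuming the deprecated fields are ignored.
returned = {"fides_key": "analytics", "name": "Analytics"}

ignored = dropped_keys(sent, returned)
```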

@pattisdr
Contributor

pattisdr commented Dec 13, 2023

ah such thorough testing thank you 🙏 #4502 (comment)

Do you anticipate any issues with previously-saved data with deprecated fields being serialized? I don't think it's a problem, since it's similar logic to your endpoints (thinking of some of the fields where data is stored in json)

@adamsachs
Contributor Author

> ah such thorough testing thank you 🙏 #4502 (comment)
>
> Do you anticipate any issues with previously-saved data with deprecated fields being serialized? I don't think it's a problem, since it's similar logic to your endpoints

hmm, well shouldn't all that data be removed as part of the migrations? i may be missing something or misunderstanding what you mean!

@pattisdr
Contributor

I was thinking about fields stored as json? like ctl_datasets has a collections json field - I assume this would have data qualifiers buried in it, would just want to make sure these were likewise ignored

@adamsachs
Contributor Author

> I was thinking about fields stored as json? like ctl_datasets has a collections json field - I assume this would have data qualifiers buried in it, would just want to make sure these were likewise ignored

Ah right, forgot about the embedded JSON fields - i'll look to confirm that tomorrow 👍

@adamsachs
Contributor Author

> > I was thinking about fields stored as json? like ctl_datasets has a collections json field - I assume this would have data qualifiers buried in it, would just want to make sure these were likewise ignored
>
> Ah right, forgot about the embedded JSON fields - i'll look to confirm that tomorrow 👍

OK, this is looking good. just to document what i did:

  • nox -s teardown -- volumes to get into a totally clean db state
  • git checkout main to get onto our current main
  • nox -s dev -- shell, then fides user login -u root_user -P [password], and then fides push demo_resources. this adds the demo_dataset.yml to the server, which on main has data_qualifiers and other removed fields embedded in its collections.
    • confirmed the DB entry has those deprecated fields within the collections JSON. here's the column value:
[
  {
    "name": "users",
    "description": "User information",
    "data_categories": [],
    "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
    "retention": null,
    "fields": [
      {
        "name": "created_at",
        "description": "User's creation timestamp",
        "data_categories": [
          "system.operations"
        ],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": null,
        "fides_meta": null,
        "fields": null
      },
      {
        "name": "email",
        "description": "User's Email",
        "data_categories": [
          "user.contact.email"
        ],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": "Account termination",
        "fides_meta": null,
        "fields": null
      },
      {
        "name": "first_name",
        "description": "User's first name",
        "data_categories": [
          "user.name"
        ],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": "Account termination",
        "fides_meta": null,
        "fields": null
      },
      {
        "name": "food_preference",
        "description": "User's favorite food",
        "data_categories": [],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": null,
        "fides_meta": null,
        "fields": null
      },
      {
        "name": "state",
        "description": "User's State",
        "data_categories": [
          "user.contact.address.state"
        ],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": null,
        "fides_meta": null,
        "fields": null
      },
      {
        "name": "uuid",
        "description": "User's unique ID",
        "data_categories": [
          "user.unique_id"
        ],
        "data_qualifier": "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        "retention": null,
        "fides_meta": null,
        "fields": null
      }
    ],
    "fides_meta": null
  }
]
  • stopped my server, git checkout asachs/PROD-1490-fides, started my server again (no db reset) with nox -s dev -- shell
  • confirmed i can still GET /api/v1/dataset just fine, the dataset is returned, the deprecated fields are just excluded from the response. here's the response payload:
[
  {
    "fides_key": "demo_users_dataset",
    "organization_fides_key": "default_organization",
    "tags": null,
    "name": "Demo Users Dataset",
    "description": "Data collected about users for our analytics system.",
    "meta": null,
    "data_categories": [],
    "fides_meta": null,
    "collections": [
      {
        "name": "users",
        "description": "User information",
        "data_categories": [],
        "retention": null,
        "fields": [
          {
            "name": "created_at",
            "description": "User's creation timestamp",
            "data_categories": [
              "system.operations"
            ],
            "retention": null,
            "fides_meta": null,
            "fields": null
          },
          {
            "name": "email",
            "description": "User's Email",
            "data_categories": [
              "user.contact.email"
            ],
            "retention": "Account termination",
            "fides_meta": null,
            "fields": null
          },
          {
            "name": "first_name",
            "description": "User's first name",
            "data_categories": [
              "user.name"
            ],
            "retention": "Account termination",
            "fides_meta": null,
            "fields": null
          },
          {
            "name": "food_preference",
            "description": "User's favorite food",
            "data_categories": [],
            "retention": null,
            "fides_meta": null,
            "fields": null
          },
          {
            "name": "state",
            "description": "User's State",
            "data_categories": [
              "user.contact.address.state"
            ],
            "retention": null,
            "fides_meta": null,
            "fields": null
          },
          {
            "name": "uuid",
            "description": "User's unique ID",
            "data_categories": [
              "user.unique_id"
            ],
            "retention": null,
            "fides_meta": null,
            "fields": null
          }
        ],
        "fides_meta": null
      }
    ]
  }
]
  • i was able to fides push demo_resources/ (which now has the new definition on this branch, i.e. without deprecated fields) and things still worked well. i even tried reverting demo_dataset.yml back to its state on main, with the deprecated fields, and that still worked with the fides push (as expected, i'd basically tested that before!)

so i think we're looking good on this front!
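
The effect observed above, where `data_qualifier` is stored in the `collections` JSON but absent from the `GET /api/v1/dataset` response, can be mimicked with a small recursive filter. This is an illustration only: `strip_keys` is hypothetical, and in the real code path the fields are presumably dropped by schema validation rather than by a helper like this. Note that `retention` is left alone, since it still appears on collections and fields in the response shown above.

```python
from typing import Any, FrozenSet

QUALIFIER = "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"


def strip_keys(value: Any, keys: FrozenSet[str] = frozenset({"data_qualifier"})) -> Any:
    """Return a copy of nested dicts/lists with the given keys removed at every level."""
    if isinstance(value, dict):
        return {k: strip_keys(v, keys) for k, v in value.items() if k not in keys}
    if isinstance(value, list):
        return [strip_keys(item, keys) for item in value]
    return value


# Trimmed-down version of the stored collections value shown above.
stored = [
    {
        "name": "users",
        "data_qualifier": QUALIFIER,
        "retention": None,
        "fields": [
            {
                "name": "email",
                "data_categories": ["user.contact.email"],
                "data_qualifier": QUALIFIER,
                "retention": "Account termination",
                "fields": None,
            }
        ],
    }
]

cleaned = strip_keys(stored)
```

The input is not mutated, so the stored value keeps its deprecated keys while the serialized view omits them.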

@pattisdr
Contributor

excellent thank you for verifying @adamsachs 🏆

@adamsachs
Contributor Author

as mentioned @pattisdr , i think this is just about ready for a final review! i've done some amount of manual testing here and in fidesplus, but i'll look to be a bit more thorough in the next day or two 👍

Contributor

@pattisdr pattisdr left a comment

nice careful work here 👍

@adamsachs adamsachs merged commit 508e81e into main Dec 15, 2023
48 checks passed
@adamsachs adamsachs deleted the asachs/PROD-1490-fides branch December 15, 2023 17:17
@adamsachs adamsachs mentioned this pull request Dec 15, 2023
9 tasks