Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HJ-23 Add connection_type to NamespaceMeta #5387

Merged
merged 5 commits into from
Oct 18, 2024
Merged

HJ-23 Add connection_type to NamespaceMeta #5387

merged 5 commits into from
Oct 18, 2024

Conversation

erosselli
Copy link
Contributor

@erosselli erosselli commented Oct 16, 2024

Part of HJ-23

Description Of Changes

Migration to add the connection_type field to all existing namespace dicts in the fides_meta field of the Dataset model.

Code Changes

  • Migration to populate connection_type for existing datasets with a namespace

Steps to Confirm

  • Make sure you have at least 4 datasets created (but the more the merrier!):
    • A: One without a namespace key in its fides_meta field
    • B: One with a namespace key and value { "project_id": "some-id" } (simulates a BigQuery dataset created through D&D)
    • C: One with a namespace key and value { "database_instance_id": "some-id" } (simulates an RDS MySQL dataset created through D&D)
    • D: One with a namespace key and value { "other": "test" }
  • Run the migration (if it was already run when you started the server, you can just downgrade and then upgrade again using the alembic CLI)
  • Check that the dataset in created in (B) now has a connection_type field with value bigquery as part of its namespace
  • Check that the dataset in created in (C) now has a connection_type field with value rds_mysql as part of its namespace
  • The other two datasets should remain unchanged

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

Copy link

vercel bot commented Oct 16, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Oct 18, 2024 3:23pm

Copy link

cypress bot commented Oct 16, 2024

fides    Run #10511

Run Properties:  status check passed Passed #10511  •  git commit 6e70867336 ℹ️: Merge 88b77a6f8adaaf38eb579561591f0e67db8efd7c into cb0579859b31e8f3b8c8ef5c232f...
Project fides
Run status status check passed Passed #10511
Run duration 00m 39s
Commit git commit 6e70867336 ℹ️: Merge 88b77a6f8adaaf38eb579561591f0e67db8efd7c into cb0579859b31e8f3b8c8ef5c232f...
Committer erosselli
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 4
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.58%. Comparing base (cb05798) to head (88b77a6).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5387   +/-   ##
=======================================
  Coverage   85.58%   85.58%           
=======================================
  Files         379      379           
  Lines       23984    23989    +5     
  Branches     2623     2623           
=======================================
+ Hits        20526    20531    +5     
  Misses       2906     2906           
  Partials      552      552           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

"""
update ctl_datasets
set fides_meta = jsonb_set(fides_meta::jsonb, '{namespace, connection_type}', '"bigquery"', true)
where fides_meta::jsonb ? 'namespace';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to call out that I am a bit concerned about the performance of this migration -- opening the json to check whether it has a namespace key seems like it might not scale well.

Copy link
Contributor Author

@erosselli erosselli Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay tested on 500,000 rows and it took like 7s and on 1,120,000 rows it took 20s. we should be okay

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing that, makes for much higher confidence

@erosselli erosselli marked this pull request as ready for review October 17, 2024 17:59
Copy link
Contributor

@andres-torres-marroquin andres-torres-marroquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks perfect to me!

Copy link
Contributor

@thingscouldbeworse thingscouldbeworse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we get these back out of the database? Does the app just parse everything in fides_meta already?

)

# If the namespace dict has a database_instance_id, we know it's an RDS MySQL dataset.
op.execute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing context from the ticket, sorry, but do we care about any other dialects? We'll fill those in later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fidesplus PR https://github.com/ethyca/fidesplus/pull/1673 (not ready for review yet) makes the necessary changes to start storing it for all datasets created through D&D . This migration just takes advantage of the fact that BQ and RDS MySQL were both already using the namespace key so we have a reliable way to identify existing datasets from these integrations. for other integrations we'll start storing the connection_type from now on but we don't have a way to populate their values for existing datasets

@erosselli
Copy link
Contributor Author

How do we get these back out of the database? Does the app just parse everything in fides_meta already?

@thingscouldbeworse yes, fides_meta is a JSON column so it gets parsed as a dictionary.

Copy link
Contributor

@thingscouldbeworse thingscouldbeworse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

@erosselli erosselli merged commit ca8a095 into main Oct 18, 2024
39 checks passed
@erosselli erosselli deleted the HJ-23 branch October 18, 2024 15:54
Copy link

cypress bot commented Oct 18, 2024

fides    Run #10512

Run Properties:  status check passed Passed #10512  •  git commit ca8a0950d3: HJ-23 Add connection_type to NamespaceMeta (#5387)
Project fides
Run status status check passed Passed #10512
Run duration 00m 40s
Commit git commit ca8a0950d3: HJ-23 Add connection_type to NamespaceMeta (#5387)
Committer erosselli
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 4
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants