PROD-2663: specify dataset on fides pull #5260

Merged: 31 commits merged on Sep 12, 2024

@thingscouldbeworse (Contributor) commented Sep 5, 2024

Closes #PROD-2663

Description Of Changes

Adds an option to specify a resource type and fides_key on fides push/pull.

Code Changes

  • fides pull now has a subcommand for each resource type; each subcommand takes a fides_key argument that specifies which resource to pull
  • If a manifest file already exists for that resource type, the file is rewritten with the server version of the resource. Otherwise, a new file is created
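A minimal sketch of the target-file decision described above, assuming manifests live at `<manifests_dir>/<resource_type>.yaml` (the same path scheme as the `pull.py` excerpt quoted later in this thread). The helper name `resolve_manifest_path` is hypothetical, it is not the actual fides implementation, and YAML serialization is left out:

```python
import os
from typing import Tuple


def resolve_manifest_path(manifests_dir: str, resource_type: str) -> Tuple[str, str]:
    """Return the manifest path for a resource type and whether it would be
    rewritten ("rewrite") or newly created ("create").

    Hypothetical helper mirroring f"{manifests_dir}/{resource_type}.yaml";
    not the fides source.
    """
    # Drop one trailing slash so ".fides/" and ".fides" resolve to the same file.
    # endswith() also avoids an IndexError on an empty string.
    if manifests_dir.endswith("/"):
        manifests_dir = manifests_dir[:-1]
    manifest_path = f"{manifests_dir}/{resource_type}.yaml"
    # An existing file for this resource type is overwritten; otherwise one is created.
    action = "rewrite" if os.path.exists(manifest_path) else "create"
    return manifest_path, action
```

Note that the decision is keyed only on the resource-type file name, so a definition of the same fides_key in a differently named manifest is not detected.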

Steps to Confirm

All of these commands should be valid:

  • Install in editable mode: pip install -e ./
  • fides pull
  • fides pull dataset fides_db
  • fides pull .fides/
  • fides pull .fides/ -a .fides/test_resources.yml
  • fides pull dataset fides_db

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

vercel bot commented Sep 5, 2024

The latest updates on your projects.

1 Skipped Deployment
fides-plus-nightly: Ignored (updated Sep 12, 2024 7:37pm UTC)

cypress bot commented Sep 5, 2024

fides Run #9943

Run Properties:
Project: fides
Branch Review: refs/pull/5260/merge
Run status: Passed #9943
Run duration: 00m 37s
Commit: ca4603a753: Merge 3dfe8e9dc47210b703d86b7f88caf7afc3767d0a into 82324976d9120e7d139c9657b88e...
Committer: Kirk Hardy

Test results
Failures: 0
Flaky: 0
Pending: 0
Skipped: 0
Passing: 4

def resource_type_option(command: Callable) -> Callable:
    "Add the resource_type option."
    command = click.option(
        "--resource-type",
@NevilleS (Contributor) Sep 6, 2024

I'd really prefer this as an argument (e.g. fides pull dataset {key}) rather than as an option (--resource-type). It just feels out of place to me as an option, and inconsistent with fides ls [resource type]?

Is there a particular reason you didn't use the argument type?

Contributor Author

In my mind (and in Click's default behavior), an "argument" like that is required, whereas an option is optional. It looks like we could override that behavior and write a custom class to make optional Click arguments, if you feel strongly enough.
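For reference, Click itself can make a positional argument optional without a custom class: `click.argument` accepts `required=False`, and the parameter is then `None` when omitted. A minimal sketch; the command and its messages are illustrative, not the fides CLI:

```python
from typing import Optional

import click
from click.testing import CliRunner


@click.command(name="dataset")
@click.argument("fides_key", required=False)
def pull_dataset(fides_key: Optional[str]) -> None:
    """Pull one dataset by fides_key, or all datasets when no key is given."""
    # With required=False, Click passes None when the argument is omitted.
    if fides_key is None:
        click.echo("pulling all datasets")
    else:
        click.echo(f"pulling dataset {fides_key}")


if __name__ == "__main__":
    # Exercise both forms in-process with Click's test runner.
    runner = CliRunner()
    print(runner.invoke(pull_dataset, []).output, end="")
    print(runner.invoke(pull_dataset, ["fides_db"]).output, end="")
```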

Contributor Author

This only comes up because fides push pushes everything by default. We could also split this functionality out into a separate command.

Contributor

I'd avoid trying to hack the Click internals at all for this; they are carefully designed to parse terminal commands unambiguously.

That said... can you achieve this by just having the "push" command group have a few subcommands with names like:

  • name=system
  • name=dataset
  • name=data_category
  • name=''

The empty string would effectively be the default sub-command. I think it might let you do that?

If it feels janky and weird, don't do it and follow the standard way for Click apps 👍
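The standard Click mechanism for a default action when no subcommand is given is `invoke_without_command=True` on the group plus a check of `ctx.invoked_subcommand`, rather than an empty-string subcommand name. A minimal sketch under that approach; `RESOURCE_TYPES`, `make_pull_command`, and the echoed messages are hypothetical:

```python
import click
from click.testing import CliRunner

# Hypothetical subset of resource types; the real CLI may register more.
RESOURCE_TYPES = ("system", "dataset", "data_category")


@click.group(invoke_without_command=True)
@click.pass_context
def pull(ctx: click.Context) -> None:
    """Pull all resources unless a resource-type subcommand is given."""
    # invoked_subcommand is None only when the user ran plain `pull`.
    if ctx.invoked_subcommand is None:
        click.echo("pulling all resources")


def make_pull_command(resource_type: str) -> click.Command:
    """Build a `pull <resource_type> <fides_key>` subcommand."""

    @click.command(name=resource_type)
    @click.argument("fides_key")
    def _pull(fides_key: str) -> None:
        click.echo(f"pulling {resource_type} {fides_key}")

    return _pull


# Register one subcommand per resource type.
for _resource_type in RESOURCE_TYPES:
    pull.add_command(make_pull_command(_resource_type))
```

The factory function gives each subcommand its own `resource_type` binding, which avoids the late-binding pitfall of defining the commands directly inside the loop.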

@pattisdr self-requested a review September 9, 2024 19:56

@pattisdr (Contributor) left a comment

I know there's some in-progress work here; just adding a few comments from experimenting with this:

src/fides/cli/commands/ungrouped.py (outdated, resolved)
config = ctx.obj["CONFIG"]
taxonomy = _parse.parse(manifests_dir)
# if the user has specified a specific fides_key, push only that dataset from taxonomy
Contributor

I'm just confused about the expected behavior. I guess I thought only the particular dataset would be pushed, and not system and policy resources.

Contributor Author

The ticket specifies that we should be able to push any resource, if I'm reading it correctly: https://ethyca.atlassian.net/browse/PROD-2663 "argument accepts top level resources via keys. (datasets, systems, categories, uses, subjects)"

Contributor

I guess I interpreted it as meaning only that resource would be pushed or pulled when the argument was supplied. If that's not expected, that's fine.

Contributor Author

Hmm, the argument being the fides_key? That is my intention here, but I might be missing something. Does it seem like I'm pushing the wrong thing under the old "default" behavior?

@pattisdr (Contributor) Sep 12, 2024

It does limit on the fides_key: I have two datasets, fides_db and fides_db_2, and if I run fides push dataset fides_db_2, only fides_db_2 is pushed.

However, I wasn't expecting the policies, data_categories, subjects, uses, etc. to be pushed as well. If I'm understanding correctly, your intent is still to push all other resource types and only limit the one type when a fides_key is supplied.

I am running these commands from the .fides directory that has other resources defined.

fidesuser@3b88299a132b:/fides$  fides push dataset fides_db_2

> Loaded config from: /fides/.fides/fides.toml
Loading resource manifests from: .fides/
Taxonomy successfully created.
Orphan Dataset Warning: The following datasets are not found referenced on a System
fides_db_2
----------
Processing data_category resource(s)...
PUSHED 85 data_category resource(s).
----------
Processing policy resource(s)...
PUSHED 1 policy resource(s).
----------
Processing organization resource(s)...
PUSHED 1 organization resource(s).
----------
Processing dataset resource(s)...
PUSHED 1 dataset resource(s).
----------
Processing data_use resource(s)...
PUSHED 56 data_use resource(s).
----------
Processing data_subject resource(s)...
PUSHED 15 data_subject resource(s).
----------
Processing system resource(s)...
PUSHED 3 system resource(s).
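The behavior described above (every resource type is still pushed, and only the named type is narrowed to the single fides_key) can be sketched over a plain mapping of resource type to resources. This is a hypothetical illustration with made-up names, not the actual fides parsing or push code:

```python
from typing import Dict, List, Optional

# Hypothetical shape: resource type -> list of resources, each with a "fides_key".
ResourceMap = Dict[str, List[dict]]


def filter_for_push(
    resources: ResourceMap,
    resource_type: Optional[str],
    fides_key: Optional[str],
) -> ResourceMap:
    """Keep every resource type, narrowing only `resource_type` to `fides_key`."""
    if resource_type is None or fides_key is None:
        return resources  # default: push everything unchanged
    filtered = dict(resources)  # shallow copy; other types are left as-is
    filtered[resource_type] = [
        r for r in resources.get(resource_type, []) if r["fides_key"] == fides_key
    ]
    return filtered
```

Under this reading, the log above is expected: data_category, policy, organization, data_use, data_subject, and system resources are all still pushed, while the dataset list is reduced to fides_db_2.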

src/fides/core/pull.py (outdated, resolved)
src/fides/core/pull.py (outdated, resolved)
CHANGELOG.md (outdated, resolved)

@pattisdr (Contributor) left a comment

Pausing here: I want to switch to main to make sure I understand the current fides push/fides pull behavior, and compare it to this branch.

Also, it looks like no additional test coverage is included here. Our CLI tests could be more thorough, but there are some existing tests that could be extended.

"""
if manifests_dir[-1] == "/":
    manifests_dir = manifests_dir[:-1]
manifest_path = f"{manifests_dir}/{resource_type}.yaml"
Contributor

So if there's already an existing file, instead of reconciling that resource, you write to a separate local file? (I'm just asking for clarification on whether these were requirements, not asking that it be changed.)

This feels a little confusing with multiple definitions floating around:

  • Say I have one file, resource.yml
  • I push it up to the server: fides push
  • Some changes are made in the UI to dataset_1
  • I pull down just dataset_1 with your new fides pull dataset dataset_1
  • It gets added to a new file dataset.yaml
  • Now I have two different definitions in separate yaml files with the same fides key
  • I can no longer do a fides push:
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.CardinalityViolationError'>: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values
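The error above comes from the same key appearing twice in one batch upsert, which PostgreSQL's `INSERT ... ON CONFLICT DO UPDATE` rejects. A hypothetical pre-push check (not part of fides) that reports which manifest files define the same fides_key could look like this; it works on already-parsed data so it does not assume any particular YAML loader:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: manifest file name -> list of (resource_type, fides_key) pairs.
Manifests = Dict[str, List[Tuple[str, str]]]


def find_duplicate_keys(manifests: Manifests) -> Dict[Tuple[str, str], List[str]]:
    """Return each (resource_type, fides_key) defined in more than one place."""
    seen: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for filename, resources in manifests.items():
        for resource in resources:
            seen[resource].append(filename)
    # Keep only keys that were defined more than once.
    return {key: files for key, files in seen.items() if len(files) > 1}
```

Run before the upsert, a check like this would turn the database error into a message naming both files (here, resource.yml and dataset.yaml).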

resource_type="dataset",
all_resources_file=None,
)
echo_green(f"Successfully pulled {fides_key} resource from the server.")
Contributor

This blanket echo_green can be a little confusing:
[Screenshot: 2024-09-11 at 7:35:20 PM]

Contributor Author

Yes, and it feels redundant with the other message. Removed here: 1901145#diff-aa790a7a4419e75ad08bcd578d54f03be70c59691b10c20c3857fcf5a9bdb783L83

@thingscouldbeworse changed the title from "PROD-2663: specify dataset on fides push and pull" to "PROD-2663: specify dataset on fides pull" Sep 12, 2024

@pattisdr (Contributor) left a comment

Good call to make a more targeted change here.

To capture what I understand is the behavior here:

  • A fides pull dataset test_dataset command will write the test_dataset definition to a brand new dataset.yaml file even if there is already a dataset definition within the current directory in a separate pre-existing file of a different name.

I think a little more clarity in the docstrings would be useful.

resource_type: str,
) -> None:
"""
Pull a resource from the server by its fides_key and update the local manifest file.
Contributor

What might help clarify here is describing the file it's writing to: you're writing out to a separate file that you're likely creating.

@thingscouldbeworse merged commit b6dee79 into main Sep 12, 2024
8 of 9 checks passed
@thingscouldbeworse deleted the PROD-2663_fides-pull-specify-dataset branch September 12, 2024 19:34

cypress bot commented Sep 12, 2024

fides Run #9944

Run Properties:
Project: fides
Branch Review: main
Run status: Passed #9944
Run duration: 00m 37s
Commit: b6dee79e00: PROD-2663: specify dataset on fides pull (#5260)
Committer: Kirk Hardy

Test results
Failures: 0
Flaky: 0
Pending: 0
Skipped: 0
Passing: 4

Labels
None yet
Projects
None yet

3 participants