-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PROD-2663: specify dataset on fides pull #5260
PROD-2663: specify dataset on fides pull #5260
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
fides Run #9943
Run Properties:
|
Project |
fides
|
Branch Review |
refs/pull/5260/merge
|
Run status |
Passed #9943
|
Run duration | 00m 37s |
Commit |
ca4603a753 ℹ️: Merge 3dfe8e9dc47210b703d86b7f88caf7afc3767d0a into 82324976d9120e7d139c9657b88e...
|
Committer | Kirk Hardy |
View all properties for this run ↗︎ |
Test results | |
---|---|
Failures |
0
|
Flaky |
0
|
Pending |
0
|
Skipped |
0
|
Passing |
4
|
Upgrade your plan to view test results. | |
View all changes introduced in this branch ↗︎ |
def resource_type_option(command: Callable) -> Callable: | ||
"Add the resource_type option." | ||
command = click.option( | ||
"--resource-type", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really prefer this as an argument (e.g. fides pull dataset {key}
) than as an option (--resource-type
). It just feels out of place to me as an option, and inconsistent with fides ls [resource type]
?
Is there a particular reason you didn't use the argument type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind (and the default behavior for click) an "argument" like that is a requirement, non optional, whereas an option is optional. It looks like we could override the behavior and write a custom class to make optional click arguments if you feel strongly enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all a consideration because fides push
by default pushes everything. We could also split out this functionality into a different command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid trying to hack the Click internals at all for this, they are well considered to make parsing a terminal command unambiguously.
That said... can you do this without achieve this by just having the "push" command group have a few subcommands with names like:
- name=system
- name=dataset
- name=data_category
- name=''
The empty string would effectively be the default sub-command. I think it might let you do that?
If it feels janky and weird, don't do it and follow the standard way for Click apps 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there's some in-progress work here, just adding a few comments from experimenting with this -
src/fides/cli/commands/ungrouped.py
Outdated
config = ctx.obj["CONFIG"] | ||
taxonomy = _parse.parse(manifests_dir) | ||
# if the user has specified a specific fides_key, push only that dataset from taxonomy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just confused about the expected behavior. I guess I thought only the particular dataset would be pushed and not system resources and policy resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ticket it specifies that we should be able to push any resource (if I'm reading that correctly) https://ethyca.atlassian.net/browse/PROD-2663 " argument accepts top level resources via keys. (datasets, systems, categories, uses, subjects) "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I interpreted it as only that resource would be pushed or pulled when the argument was supplied. If that's not expected that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the argument being the fides_key
? That is my intention here, I might be missing something. Does it seem like I'm pushing the wrong thing under the old "default" behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does limit on the fides_key
- I have two datasets, fides_db
and fides_db_2
, and if I do fides push dataset fides_db_2
, just fides_db_2
is pushed.
However, I wasn't expecting the policies, data_categories, and subjects, uses, etc. to be also pushed, but if I'm understanding your intent is still to push all other types and only limit the one if a fides key is supplied.
I am running these commands from the .fides
directory that has other resources defined
fidesuser@3b88299a132b:/fides$ fides push dataset fides_db_2
> Loaded config from: /fides/.fides/fides.toml
Loading resource manifests from: .fides/
Taxonomy successfully created.
Orphan Dataset Warning: The following datasets are not found referenced on a System
fides_db_2
----------
Processing data_category resource(s)...
PUSHED 85 data_category resource(s).
----------
Processing policy resource(s)...
PUSHED 1 policy resource(s).
----------
Processing organization resource(s)...
PUSHED 1 organization resource(s).
----------
Processing dataset resource(s)...
PUSHED 1 dataset resource(s).
----------
Processing data_use resource(s)...
PUSHED 56 data_use resource(s).
----------
Processing data_subject resource(s)...
PUSHED 15 data_subject resource(s).
----------
Processing system resource(s)...
PUSHED 3 system resource(s).
src/fides/cli/commands/ungrouped.py
Outdated
config = ctx.obj["CONFIG"] | ||
taxonomy = _parse.parse(manifests_dir) | ||
# if the user has specified a specific fides_key, push only that dataset from taxonomy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does limit on the fides_key
- I have two datasets, fides_db
and fides_db_2
, and if I do fides push dataset fides_db_2
, just fides_db_2
is pushed.
However, I wasn't expecting the policies, data_categories, and subjects, uses, etc. to be also pushed, but if I'm understanding your intent is still to push all other types and only limit the one if a fides key is supplied.
I am running these commands from the .fides
directory that has other resources defined
fidesuser@3b88299a132b:/fides$ fides push dataset fides_db_2
> Loaded config from: /fides/.fides/fides.toml
Loading resource manifests from: .fides/
Taxonomy successfully created.
Orphan Dataset Warning: The following datasets are not found referenced on a System
fides_db_2
----------
Processing data_category resource(s)...
PUSHED 85 data_category resource(s).
----------
Processing policy resource(s)...
PUSHED 1 policy resource(s).
----------
Processing organization resource(s)...
PUSHED 1 organization resource(s).
----------
Processing dataset resource(s)...
PUSHED 1 dataset resource(s).
----------
Processing data_use resource(s)...
PUSHED 56 data_use resource(s).
----------
Processing data_subject resource(s)...
PUSHED 15 data_subject resource(s).
----------
Processing system resource(s)...
PUSHED 3 system resource(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pausing here, I want to switch to main and make sure I understand current fides push
/fides pull
behavior, and compare it to here -
Also there's no additional test coverage included here looks like - our cli tests could be more thorough but there's some tests that could be appended to
""" | ||
if manifests_dir[-1] == "/": | ||
manifests_dir = manifests_dir[:-1] | ||
manifest_path = f"{manifests_dir}/{resource_type}.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if there's already an existing file, instead of reconciling that resource you write to a separate local file? I'm just asking for clarification on whether these were requirements, not asking that it be changed)
This feels a little confusing with multiple definitions floating around:
- Say I have one file,
resource.yml
- I push it up to the server:
fides push
- Some changes are made in the UI to dataset_1
- I pull down just dataset 1 with your new
fides pull dataset dataset_1
- It gets added to a new file
dataset.yaml
- Now I have two different definitions in separate yaml files with the same fides key
- I can no longer do a
fides push
:
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.CardinalityViolationError'>: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values
src/fides/cli/commands/pull.py
Outdated
resource_type="dataset", | ||
all_resources_file=None, | ||
) | ||
echo_green(f"Successfully pulled {fides_key} resource from the server.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and it feels redundant with the other. Removed here: 1901145#diff-aa790a7a4419e75ad08bcd578d54f03be70c59691b10c20c3857fcf5a9bdb783L83
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call to do a more targeted change here.
To capture what I understand is the behavior here:
- A
fides pull dataset test_dataset
command will write thetest_dataset
definition to a brand newdataset.yaml
file even if there is already a dataset definition within the current directory in a separate pre-existing file of a different name.
I think a little more clarity in the docstrings would be useful
src/fides/core/pull.py
Outdated
resource_type: str, | ||
) -> None: | ||
""" | ||
Pull a resource from the server by its fides_key and update the local manifest file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What might help clarify here are talking about the file its going to - you're writing out to a separate file you're likely creating.
fides Run #9944
Run Properties:
|
Project |
fides
|
Branch Review |
main
|
Run status |
Passed #9944
|
Run duration | 00m 37s |
Commit |
b6dee79e00: PROD-2663: specify dataset on fides pull (#5260)
|
Committer | Kirk Hardy |
View all properties for this run ↗︎ |
Test results | |
---|---|
Failures |
0
|
Flaky |
0
|
Pending |
0
|
Skipped |
0
|
Passing |
4
|
Upgrade your plan to view test results. | |
View all changes introduced in this branch ↗︎ |
Closes #PROD-2663
Description Of Changes
option for resource and key on push/pull
Code Changes
fides pull
now has subcommands for each resource type and those subcommands take afides_key
argument to specify which oneSteps to Confirm
All these commands should be valid
pip install -e ./
fides pull
fides pull dataset fides_db
fides pull .fides/
fides pull .fides/ -a .fides/test_resources.yml
fides pull dataset fides_db
Pre-Merge Checklist
CHANGELOG.md
main
downgrade()
migration is correct and works