Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate + make third-party consent API requests #2144

Closed
seanpreston opened this issue Jan 5, 2023 · 12 comments
Closed

Generate + make third-party consent API requests #2144

seanpreston opened this issue Jan 5, 2023 · 12 comments
Assignees

Comments

@seanpreston
Copy link
Contributor

seanpreston commented Jan 5, 2023

My basic proposal for this is to take the existing SaaS connector config and extend the DDL such that inside run_consent_request we can build + execute all requests required to execute the consent flow.

One idea I've had for this would be to add something akin to a used_for value to each request definition, e.g.

  endpoints:
    - name: messages
      requests:
        read:
          used_for: [dsr, consent]  # this would indicate that we should build this request for our consent connector
          method: GET
          path: /3.0/conversations/<conversation_id>/messages
          param_values:
            - name: conversation_id
              references:
                - dataset: <instance_fides_key>
                  field: conversations.id
                  direction: from
          data_path: conversation_messages
          postprocessors:
            - strategy: filter
              configuration:
                field: from_email
                value:
                  identity: email

A downside to this would be bloat within the config files themselves should either DSR or consent processing require a large number of requests — is this preferable to having two top-level sections? e.g.

consent_endpoints:
  - name: ...

dsr_endpoints:
  - name: ...

and then eating any request duplication that happens as a result? (ie. requests that must exist in both DSR and consent flows)

Note: this doesn't address the dataset generation issue — I'm happy to keep Dawn's workaround for the time being.

If we can agree upon one of the above as the end-state solution, it leaves us open to hard-coding the Mandrill case inside the run_consent_request method (with a check on the type of connector)

@adamsachs
Copy link
Contributor

adamsachs commented Jan 5, 2023

my gut feeling is that there won't be significant duplication/overlap between DSR endpoints and consent endpoints, to the point that they're hitting the same paths. so of your two options, the latter - the two top-level sections - definitely feels more natural to me.

to throw it out there - my initial idea on how we'd model this is that we'd actually just use a different request type, i.e. something like being able to define on any given endpoint - read, update, ... , consent. using mandrill/mailchimp as a quick example, it'd probably look something like this:

endpoints:
    - name: messages
      requests:
        read:
          method: GET
          path: /3.0/conversations/<conversation_id>/messages
          param_values:
            - name: conversation_id
              references:
                - dataset: <instance_fides_key>
                  field: conversations.id
                  direction: from
          data_path: conversation_messages
          postprocessors:
            - strategy: filter
              configuration:
                field: from_email
                value:
                  identity: email
    - name: allowlist
      requests: 
        consent:
          method: POST
          path: /allowlists/delete
          query_params:
            - name: email
              value: <email>
          param_values:
            - name: email
              identity: email

to me that felt like it'd fit pretty seamlessly into what @pattisdr's done so far in the POC, given that we've got essentially an analogous execution path for consent as we do for access and erasure requests. that being said, fitting seamlessly into what we have so far isn't the only consideration here: we need to take into account what the consent propagation APIs actually look like and how they compare to existing DSR endpoints for the provider, the dataset generation phase, etc.

in fact, now that i think through it more, i actually think the top-level sections option that you offered makes the most sense out of all 3 options laid out - it may lend itself better to differentiate the dataset generation from what is done in DSR execution, i.e. it'd probably give an easier route to decoupling nodes from particular dataset collections, which seems like something we'll want, at a minimum. and based on what i've seen in the consent APIs so far, i'm just not sure there's really a logical grouping between the endpoints used for DSR execution and those used for consent propagation - so with this in mind, having them as disjoint sets of endpoints would probably be more intuitive from the POV of someone configuring these.

@seanpreston
Copy link
Contributor Author

it'd probably give an easier route to decoupling nodes from particular dataset collections, which seems like something we'll want

This will be useful — we'll certainly want to leave our commercialisation options open too.

@galvana
Copy link
Contributor

galvana commented Jan 5, 2023

I agree with @adamsachs that the consent endpoints would most likely be separate from the data endpoints we use for access and erasure requests. With that said, I agree with @seanpreston's proposal to add a new top-level section. This is consistent with the way we already define dedicated requests liketest_request and data_protection_request (single GDRP, CCPA endpoint).

consent_requests:
  read:
    method: GET
    path: /consent
  update:
    method: POST
    path: /consent
    body: |
      {
        <consent_payload>
      }

This assumes we require read functionality as well but if we're just update consent settings we can simplify it to

consent_request:
  method: POST
  path: /consent
  body: |
    {
      <consent_payload>
    }

This approach has the benefit of keeping these two workflows separate in the config and it makes it easier to refer to since it will be at the root level instead of nested with the access and erasure requests.

@pattisdr
Copy link
Contributor

pattisdr commented Jan 8, 2023

@galvana nice proposal: Adding a new top-level section similar to data_protection_request makes a lot of sense, defining this more at the dataset level rather than the collection level too, as Adam highlighted.

@galvana @adamsachs Are my assumptions correct: ?

  • I assume these resources will still be persisted to the db via the same connector workflow, where we create a saas connector from a template
  • I assume both a ConnectionConfig and corresponding DatasetConfig will exist.
  • I assume most relevant information here to build the API requests is saved under connectionconfig.saas_config.consent_request
  • I assume we will not need anything in particular from the DatasetConfig resource, except I can still use this to build the consent request graph. (The access/erasure graphs are currently built from DatasetConfig resources).
    • For example, the graph might be built by creating one node per DatasetConfig whose ConnectionConfig's SaasConfig has consent_request instructions.

@adamsachs
Copy link
Contributor

adamsachs commented Jan 9, 2023

@pattisdr that generally looks correct to me!

i think you're getting at this, but in some ways the DatasetConfig seems like it's not really required here from an end-user/UX POV, and it's more of an artifact that our implementation constraints presuppose. i'm not saying that we need to get rid of it, just pushing a bit to see if we could perhaps, at some point, reduce any of the pieces here that aren't substantively providing functionality. or at least we can think about how we can provide some mechanisms/utilities such that an end user doesn't really need to know about this?

just as a caveat, take this with a grain of salt, as i haven't fully thought this piece through 😄 more of an initial impression and i'll continue to think it over on my end...

@pattisdr
Copy link
Contributor

pattisdr commented Jan 9, 2023

@adamsachs I understand your point, we don't technically need a DatasetConfig per se, but I do think we need some database artifact that we query from which to build the graph. Further, if we don't require a DatasetConfig, then we need a separate set of code changes to handle resources not backed by DatasetConfigs, and is that worth the lift?

So a thought experiment: what if everything we need to make the consent request is stored in ConnectionConfig.saas_config, so we don't create a DatasetConfig there. What are the next steps? I can build the graph by querying for ConnectionConfigs with consent capability directly. But how is that ConnectionConfig created in the first place? Our connector workflow creates DatasetConfigs by default in the "create from a template" flow. And what happens if a certain type of Saas Connector needs to support both access and consent requests? I create a DatasetConfig if access is defined, but not consent? Are there other cons by special-casing not creating this expected resource for consent connectors only?

@adamsachs
Copy link
Contributor

@pattisdr thanks for walking through that, very helpful to get it all spelled out! i think we're on the same page, i just wanted to run that idea through to its conclusions.

based on what you've spelled out, it is feeling to me like requiring a sort of "placeholder" DatasetConfig is better than all the special-casing, at least as things stand now. that still seems to be what you're leaning toward as well, right? @galvana if you've got strong opinions one way or the other, speak up! :)

@pattisdr
Copy link
Contributor

pattisdr commented Jan 9, 2023

@adamsachs your calling it a "placeholder" is making me second guess myself. I was still hoping that there might be useful information in the dataset yaml file, similar to how a mailchimp dataset yaml details the fields in various API requests. That we may not need to use it, but there would still be information worth persisting there. But if there's truly nothing to put in the DatasetConfig resource, that probably changes things.

(The pending release of unified-fides-resources will also necessitate more "placeholder" resources as the DatasetConfig.dataset field won't exist, instead there will be a FK to another ctl_datasets resource).

Also is this not going to be a connector of type saas but something else?

@adamsachs
Copy link
Contributor

honestly i haven't given this the thought it deserves quite yet, i may have misstated things by calling it a placeholder, perhaps there is information we'd want to store there. i think we need to spend some time looking at the consent propagation APIs and fleshing out what a config schema would look like more precisely.

to me, the assumption of yours here

I assume we will not need anything in particular from the DatasetConfig resource, except I can still use this to build the consent request graph. (The access/erasure graphs are currently built from DatasetConfig resources).

seems pretty reasonable to design around at this point. does that allow you to keep you going and let us come back to this a bit later once we can consider the details a bit more closely?

apologies for starting a digression that i hadn't really thought through. if it's blocking you now, i can prioritize some closer thinking on this now :)

@pattisdr
Copy link
Contributor

pattisdr commented Jan 9, 2023

That sounds fine to me @adamsachs! My POC branch makes the assumption you stated, I should have it ready for review by tomorrow morning! Assuming there's a DatasetConfig is an easy place to start, and we can modify from there.

@pattisdr
Copy link
Contributor

We settled on a simple configuration here #2194, where we have a top-level consent_requests key with opt_in and opt_out headers, each detailing a single saas request. The dataset here is more of a placeholder for now, but its existence is what we use to create the graph.

The current assumption that the consent saas connector framework makes is that a consent request will be sent with a single data use, and the saas connector just needs to pick whether it should opt in or opt out. More product discussion is needed to determine the mapping between data uses and more nuanced consent requests.

Mailchimp Transactional SaaS Config

saas_config:
  fides_key: <instance_fides_key>
  name: Mailchimp Transactional SaaS Config
  type: mailchimp_transactional
  description: A sample schema representing the Mailchimp Transactional (Mandrill) connector for Fides
  version: 0.0.1

  connector_params:
    - name: domain
      default_value: mandrillapp.com/api/1.0
    - name: api_key

  client_config:
    protocol: https
    host: <domain>
    authentication:
      strategy: api_key
      configuration:
        body: |
          {
            "key": "<api_key>"
          }

  test_request:
    method: GET
    path: /users/ping


  consent_requests:
    opt_in:
      method: POST
      path: /allowlists/add
      body: |
        {
          "email": "<email>"
        }

    opt_out:
      method: POST
      path: /allowlists/delete
      body: |
        {
          "email": "<email>"
        }

  endpoints: []

Mailchimp Transactional SaaS Dataset

dataset:
  - fides_key: <instance_fides_key>
    name: Mailchimp Transactional Dataset
    description: A sample dataset representing the Mailchimp Transactional connector for Fides
    collections: []

@pattisdr
Copy link
Contributor

OK marking this issue as done - there's no code to merge here, the ability to make and generate consent requests via the saas connector framework was added in #2194.

Discussed with @NevilleS we're okay with not being able to customize based on the consent request for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants