Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accommodations for differential privacy #94

Closed
cjpatton opened this issue Jun 28, 2022 · 18 comments · Fixed by #339
Closed

Accommodations for differential privacy #94

cjpatton opened this issue Jun 28, 2022 · 18 comments · Fixed by #339
Assignees

Comments

@cjpatton
Copy link
Collaborator

We wish for the VDAF draft to have first-class support for schemes that are amendable to differential privacy (DP). The task for this issue is to identify what changes need to be made, if any, to accommodate DP. In addition, we should consider adding guidance to the draft about best practices.

@cjpatton
Copy link
Collaborator Author

At a minimum, I imagine we will need a way for:
(a) an Aggregator to add noise to its aggregate share
(b) a Client to add noise to its measurement

(a) is straight-forward for both Prio3 and Poplar1 since the aggregate shares are vectors over some finite field. In general, I think all we'll need syntactically is that the set of all aggregate shares form a (semi-)group. The mechanism for (b) seems like it'll depend somewhat on the measurement type.

@tgeoghegan
Copy link
Contributor

My intuition is that (b) would be out of scope for [V]DAF or DAP since the DP would be applied before Vdaf.measurement_to_input_shares and removed after Vdaf.agg_shares_to_result. So long as a noised input is still a valid encoding of a measurement, VDAF doesn't need to know about the DP, right?

@cjpatton
Copy link
Collaborator Author

cjpatton commented Jun 28, 2022

I think that's probably right, except it may be desirable to add noise to the aggregate shares (a).

@chris-wood
Copy link

Yeah, I think I agree with @tgeoghegan here. I think it makes sense to add discussion to this draft as to how clients might add local DP, and how, for example, deployments might use shuffling to amplify that DP effect, but both seem outside the scope of the draft. In contrast, (a) being an application of central DP seems like something squarely in scope?

@simon-friedberger
Copy link
Contributor

simon-friedberger commented Jun 29, 2022

Maybe I am misunderstanding but I think for heavy hitters the distribution complicates things. Some of the finite field vector values are just too unlikely.

If I flip a bit in "https://www.facebook.com" I might end up with "https://www.fa{ebook.com" which doesn't add meaningful privacy from a client point of view and makes the value almost useless from a collector point of view.

To clarify: The client has no good way of adding DP randomness here for (b). For (a) the servers could still adjust the count for "https://www.facebook.com" by random +/-10 when computing the Private Subset Histogram solution for that string. So, doing (b) would also be difficult here.

@csharrison
Copy link

Note I wrote a bit of my thoughts on how we might want to do this in the DAP repo at ietf-wg-ppm/draft-ietf-ppm-dap#19 (comment).

A few relevant points from that discussion:

  • Do we even want to add noise addition to VDAF vs. specify it in DAP (my preference is to do it in VDAF)
  • Are we concerned about malicious helpers adding "bad" noise shares such that we would want to "verify" the noise shares (I think we don't need this given our threat model)

@simon-friedberger I agree for the heavy hitters problem the client side noise addition is difficult. Your observation that a single bit flip in "facebook.com" doesn't add privacy is correct, the DP analysis would agree with you. You need to flip many bits in the entire (high entropy) output domain. However, remember that the privacy that we get is stronger than just the noise from a single client, we just want to achieve good privacy after aggregation (when everything is back in cleartext).

The biggest problem I see with client-side noise is that the poplar protocol (as far as I can tell) only really works with one-hot input vectors. This means that a RAPPOR-like randomizer similar to ENPAs which randomly flips many bits independently in the whole domain would mean constructing tons of dpf keys (scaling linearly with the domain size) on the client for each noisy "1", leading to large communication and processing overhead.

There may be solutions here using different local randomizers, but I think it's hard to get around the domain size problem (e.g. regular randomized response will guarantee you that you only send a one-hot vector, but then the noise scales with the domain size)

@chris-wood
Copy link

chris-wood commented Jun 29, 2022

For what it's worth, the Poplar paper describes how to add noise for (a) in Appendix E. It may be the case that this is sufficient for reasonable measures of privacy, without requiring the client to do anything.

@cjpatton
Copy link
Collaborator Author

cjpatton commented Jun 29, 2022

Thanks for bumping that thread, @csharrison.

  • Do we even want to add noise addition to VDAF vs. specify it in DAP (my preference is to do it in VDAF)

I agree, I think the mechanism needs to live in VDAF-land. However, ideally this would require only minimal syntax changes (e.g., as suggested for (a) here: #94 (comment)) and would leave it up to applications (like a DAP deployment) to tune noise as needed. The VDAF draft could also provide detailed recommendations for adding local DP, or central DP, or both (if applicable) to the VDAFs it specifies.

  • Are we concerned about malicious helpers adding "bad" noise shares such that we would want to "verify" the noise shares (I think we don't need this given our threat model)

I think this threat model only matters insofar as it impacts privacy. For correctness, we already concede that all of the Aggregators are trusted to execute the protocol correctly.

@csharrison
Copy link

csharrison commented Jun 29, 2022

For what it's worth, the Poplar paper describes how to add noise for (a) in Appendix E. It may be the case that this is sufficient for reasonable measures of privacy, without requiring the client to do anything.

Yeah this is the standard way we'd add DP in the central model. I think there may be ways to optimize the noise in that section with (discretized) gaussian/skellam noise instead of Laplace too (see https://desfontain.es/privacy/gaussian-noise.html for an intro).

I also think that section assumes each noisy prefix is published publicly, which I don't think is necessarily the view of the collector in the poplar vdaf (would need to double check that).

@cjpatton
Copy link
Collaborator Author

cjpatton commented Jun 29, 2022

@schoppmp and I have been working on completing the spec for Poplar1 (#84). Once that's done, I think a great first step would be to add a subsection that spells out recommendations for adding DP to the heavy hitters computation. @csharrison would you be willing to spend time on this?

Alternatively, we could get cracking on Prio3 right away.

@csharrison
Copy link

csharrison commented Jun 29, 2022

@schoppmp and I have been working on completing the spec for Poplar1 (#84). Once that's done, I think a great first step would be to add a subsection that spells out recommendations for adding DP to the heavy hitters computation. @csharrison would you be willing to spend time on this?

I am actually on pat leave right now until late Aug so unfortunately I can't spend much time on this (just enough free time to bug ya'll on issues 😄 ) . Let's just make sure what's specified in the poplar paper is possible and I might be able to help out when I get back. In general, I think we want to consider:

  • Alternative noise mechanisms other than Laplace (this should be pretty simple)
  • Configurations where noise is only added at the last step of the heavy hitters protocol, vs. adding it to every prefix and publishing those prefixes publicly (this means that a malicious server could publish non-DP prefix counts, which may be OK in some deployments).

@kunal-talwar
Copy link

I think it would be worth supporting both (a) and (b).
For (a), having an option to add central DP noise to the aggregate makes sense, and can perhaps be implemented as part of Daf.out_shares_to_agg_share() and this can be specified in each VDAF.
For (b): there are many ways of doing local randomization which are compatible with prio/poplar. In particular, for frequency estimation / heavy hitters, when working over a small universe, RAPPOR can be used in conjunction with pro. Over larger universes, PI-RAPPOR (https://arxiv.org/abs/2102.12099) and ProjectiveResponse (https://arxiv.org/abs/2203.00194) can be used to randomize to a one-hot vector over a smaller domain. This makes these well-compatible with prio or poplar, and I could imagine, for example, pi-rappor+poplar to be a useful VDAF.
The shuffling/aggregation based privacy amplification results though depend on a the batch over which we are summing to be large enough. This application therefore would make a strong case to have the min_batch_size be a natively supported feature in DAP. E.g. it would be great to have a way for the task parameter to specify just the minimum batch size and have the servers agree on a batch boundary that is close to getting this minimum batch size, without having to predict the batch interval in advance.

@csharrison
Copy link

@kunal-talwar I would love to hear more about how concretely we could compose e.g. pi-rappor with poplar. It seems very non-obvious to me but I might be missing something. In pi-rappor we're sending some seed to the server, are you saying we'd encode this seed as a one-hot vector and aggregate over seeds in the VDAF?

I hadn't seen the paper on ProjectiveReponse, will need to read it 😄 . In general though I agree with you about supporting both (a) and (b). The most obvious candidate requiring (b) support is the existing ENPA system.

@kunal-talwar
Copy link

It is indeed not obvious (or at least wasn't obvious to me at first). But an approach along the lines you suggested works. We send the seed using poplar, and can either aggregate the seeds, or decode each seed to a vector in the original domain and aggregate those. The generalized version of PI-RAPPOR can allow additional efficiencies on top of this basic approach. We have a write-up that should be on the arxiv in a couple of weeks.

@cjpatton
Copy link
Collaborator Author

cjpatton commented Jul 8, 2022

Great discussion, I'm very happy to see engagement on this draft from folks who are deep in differential privacy. Something to keep in mind here is that this document will be developed by the CFRG ("Cryptography Forum Research Group"), which, to my knowledge, has not done a lot with DP as of yet. Thus I think a useful goal would be to ensure that (V)DAFs are compatible with a variety of mechanisms for (a) and (b), but without being too prescriptive about a particular mechanism.

That said, @kunal-talwar we would welcome text in the document that describes how some of these methods might be applied to either Prio3 or Poplar1. Such text would be maximally useful if it also provided a gentle introduction to the method described.

@cjpatton
Copy link
Collaborator Author

Just to follow up here: It's very likely that the VDAF draft is going to say nothing about DP. Our idea right now is to encapsulate the details of composing DP with a VDAF into a "DP policy" that would be used by DAP. We're working on a draft for PPM: https://github.com/wangshan/draft-wang-ppm-differential-privacy/

If adopted, I'll close this issue.

@cjpatton
Copy link
Collaborator Author

As of this writing, PPM has not reached consensus about adopting https://github.com/wangshan/draft-wang-ppm-differential-privacy/. It's clear however that folks think there is something to be done about differential privacy, and that PPM is the right place to do it. It also seems fairly clear that whatever needs to be specified for DP is orthogonal to the VDAF draft.

I'm going to close this issue after clarifying in the draft that DP is out-of-scope and emphasizing that VDAFs can be made differentially private.

@cjpatton
Copy link
Collaborator Author

cjpatton commented May 1, 2024

Hi folks, I've put up a PR to resolve this issue. Basically it says that VDAFs SHOULD be composed with a DP mechanism, but leaves it to the application. We don't need any changes to this draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants