Accommodations for differential privacy #94
Comments
At a minimum, I imagine we will need a way for (a) Aggregators to add noise to their aggregate shares and (b) Clients to add noise to their measurements. (a) is straightforward for both Prio3 and Poplar1, since the aggregate shares are vectors over some finite field. In general, I think all we'll need syntactically is that the set of all aggregate shares forms a (semi-)group. The mechanism for (b) seems like it'll depend somewhat on the measurement type.
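Since the aggregate shares are vectors over a finite field, adding noise to them is just one more vector addition. The following is a minimal sketch of that idea, not text from any draft: the modulus, the two-Aggregator setup, the helper names, and the fixed noise values are all assumptions chosen for illustration.

```python
import secrets

# Illustrative prime modulus (an assumption; the actual Prio3/Poplar1 fields
# are defined in the VDAF draft).
MODULUS = 2**61 - 1


def vec_add(a, b):
    """Element-wise addition in the field: the (semi-)group operation."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def split(vec):
    """Additively secret-share a vector between two Aggregators."""
    share0 = [secrets.randbelow(MODULUS) for _ in vec]
    share1 = [(v - s) % MODULUS for v, s in zip(vec, share0)]
    return share0, share1


def encode_noise(noise):
    """Map signed integer noise into the field."""
    return [n % MODULUS for n in noise]


def decode_signed(vec):
    """Interpret field elements above MODULUS // 2 as negative integers."""
    return [x - MODULUS if x > MODULUS // 2 else x for x in vec]


# Two one-hot measurements, each split into two shares.
m1_s0, m1_s1 = split([1, 0, 0])
m2_s0, m2_s1 = split([0, 0, 1])

# Each Aggregator sums the shares it holds, then adds its own noise vector.
# The noise is fixed here so the example is reproducible; a real deployment
# would sample it from a DP mechanism.
agg0 = vec_add(vec_add(m1_s0, m2_s0), encode_noise([2, -1, 0]))
agg1 = vec_add(vec_add(m1_s1, m2_s1), encode_noise([-3, 0, 1]))

# The Collector merges the noisy aggregate shares with the same operation.
result = decode_signed(vec_add(agg0, agg1))
```

Here `result` is the true histogram `[1, 0, 1]` plus the sum of both noise vectors. Nothing beyond the existing addition of aggregate shares is needed, which is the sense in which (a) only requires a (semi-)group structure.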
My intuition is that (b) would be out of scope for [V]DAF or DAP since the DP would be applied before |
I think that's probably right, except it may be desirable to add noise to the aggregate shares (a). |
Yeah, I think I agree with @tgeoghegan here. I think it makes sense to add discussion to this draft as to how clients might add local DP, and how, for example, deployments might use shuffling to amplify that DP effect, but both seem outside the scope of the draft. In contrast, (a) being an application of central DP seems like something squarely in scope? |
Maybe I am misunderstanding but I think for heavy hitters the distribution complicates things. Some of the finite field vector values are just too unlikely. If I flip a bit in "https://www.facebook.com" I might end up with "https://www.fa{ebook.com" which doesn't add meaningful privacy from a client point of view and makes the value almost useless from a collector point of view. To clarify: The client has no good way of adding DP randomness here for (b). For (a) the servers could still adjust the count for "https://www.facebook.com" by random +/-10 when computing the Private Subset Histogram solution for that string. So, doing (b) would also be difficult here. |
Note I wrote a bit of my thoughts on how we might want to do this in the DAP repo at ietf-wg-ppm/draft-ietf-ppm-dap#19 (comment). A few relevant points from that discussion:
@simon-friedberger I agree that for the heavy hitters problem client-side noise addition is difficult. Your observation that a single bit flip in "facebook.com" doesn't add privacy is correct; the DP analysis would agree with you. You need to flip many bits across the entire (high-entropy) output domain. However, remember that the privacy we get is stronger than just the noise from a single client; we just want to achieve good privacy after aggregation (when everything is back in cleartext). The biggest problem I see with client-side noise is that the Poplar protocol (as far as I can tell) only really works with one-hot input vectors. This means that a RAPPOR-like randomizer similar to ENPA's, which randomly flips many bits independently across the whole domain, would mean constructing tons of DPF keys (scaling linearly with the domain size) on the client, one for each noisy "1", leading to large communication and processing overhead. There may be solutions here using different local randomizers, but I think it's hard to get around the domain size problem (e.g., regular randomized response guarantees that you only send a one-hot vector, but then the noise scales with the domain size).
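To illustrate the trade-off in the last point, here is a rough sketch of standard k-ary randomized response. It is not part of Poplar or any draft; the function names and parameters are made up for this example. Each client still reports exactly one domain element (so the report stays one-hot), but the probability of reporting the true value collapses as the domain grows.

```python
import math
import random


def p_truthful(domain_size, epsilon):
    """Probability that k-ary randomized response reports the true value."""
    return math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)


def k_rr(value, domain_size, epsilon, rng):
    """Report `value` with probability p_truthful, else a uniformly random
    *other* element of the domain. The output is always a single element."""
    if rng.random() < p_truthful(domain_size, epsilon):
        return value
    other = rng.randrange(domain_size - 1)
    # Skip over the true value so that it is never picked in this branch.
    return other if other < value else other + 1


def debias(observed_count, num_reports, domain_size, epsilon):
    """Unbiased estimate of how many clients truly held a given value."""
    p = p_truthful(domain_size, epsilon)
    q = (1.0 - p) / (domain_size - 1)  # chance of reporting a specific wrong value
    return (observed_count - num_reports * q) / (p - q)


rng = random.Random(0)  # fixed seed for a reproducible demo, not for real privacy
report = k_rr(value=3, domain_size=8, epsilon=1.0, rng=rng)
small_domain = p_truthful(2, 1.0)      # roughly 0.73
large_domain = p_truthful(2**20, 1.0)  # a few in a million
```

With a domain of 2^20 strings and epsilon = 1, almost every report is a random wrong value, so the debiased counts are dominated by noise unless the number of clients is very large.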
For what it's worth, the Poplar paper describes how to add noise for (a) in Appendix E. It may be the case that this is sufficient for reasonable measures of privacy, without requiring the client to do anything. |
Thanks for bumping that thread, @csharrison.
I agree, I think the mechanism needs to live in VDAF-land. However, ideally this would require only minimal syntax changes (e.g., as suggested for (a) here: #94 (comment)) and would leave it up to applications (like a DAP deployment) to tune noise as needed. The VDAF draft could also provide detailed recommendations for adding local DP, or central DP, or both (if applicable) to the VDAFs it specifies.
I think this threat model only matters insofar as it impacts privacy. For correctness, we already concede that all of the Aggregators are trusted to execute the protocol correctly. |
Yeah, this is the standard way we'd add DP in the central model. I think there may be ways to optimize the noise in that section with (discretized) Gaussian/Skellam noise instead of Laplace too (see https://desfontain.es/privacy/gaussian-noise.html for an intro). I also think that section assumes each noisy prefix is published publicly, which I don't think is necessarily the view of the collector in the Poplar VDAF (would need to double-check that).
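For readers less familiar with the central model, here is a small sketch of integer-valued Laplace-style noise added to a count. It uses the discrete Laplace (two-sided geometric) distribution, which is a swapped-in integer analogue and not necessarily what the Poplar paper's Appendix E specifies. The function names are hypothetical, and the floating-point sampling is for illustration only and should not be treated as safe for a real DP deployment.

```python
import math
import random


def sample_geometric(scale, rng):
    """Number of failures before the first success, where the failure
    probability is exp(-1 / scale). Sampled by inverting the CDF."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return math.floor(-math.log(u) * scale)


def sample_discrete_laplace(scale, rng):
    """Difference of two i.i.d. geometric variables, which gives
    P(X = k) proportional to exp(-|k| / scale) over the integers."""
    return sample_geometric(scale, rng) - sample_geometric(scale, rng)


def noisy_count(count, sensitivity, epsilon, rng):
    """Add discrete Laplace noise with scale sensitivity / epsilon."""
    return count + sample_discrete_laplace(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed for reproducibility only
samples = [sample_discrete_laplace(2.0, rng) for _ in range(20000)]
released = noisy_count(1000, sensitivity=1, epsilon=0.5, rng=rng)
```

Because the output is an integer, it can be reduced modulo the field size and added directly to an aggregate share. Swapping in a discrete Gaussian or Skellam sampler would change only `sample_discrete_laplace`.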
@schoppmp and I have been working on completing the spec for Poplar1 (#84). Once that's done, I think a great first step would be to add a subsection that spells out recommendations for adding DP to the heavy hitters computation. @csharrison would you be willing to spend time on this? Alternatively, we could get cracking on Prio3 right away. |
I am actually on pat leave right now until late Aug, so unfortunately I can't spend much time on this (just enough free time to bug y'all on issues 😄). Let's just make sure what's specified in the Poplar paper is possible, and I might be able to help out when I get back. In general, I think we want to consider:
I think it would be worth supporting both (a) and (b). |
@kunal-talwar I would love to hear more about how concretely we could compose, e.g., PI-RAPPOR with Poplar. It seems very non-obvious to me, but I might be missing something. In PI-RAPPOR we're sending some seed to the server; are you saying we'd encode this seed as a one-hot vector and aggregate over seeds in the VDAF? I hadn't seen the paper on ProjectiveReponse, will need to read it 😄. In general, though, I agree with you about supporting both (a) and (b). The most obvious candidate requiring (b) support is the existing ENPA system.
It is indeed not obvious (or at least wasn't obvious to me at first). But an approach along the lines you suggested works. We send the seed using Poplar, and can either aggregate the seeds, or decode each seed to a vector in the original domain and aggregate those. The generalized version of PI-RAPPOR can allow additional efficiencies on top of this basic approach. We have a write-up that should be on arXiv in a couple of weeks.
Great discussion, I'm very happy to see engagement on this draft from folks who are deep in differential privacy. Something to keep in mind here is that this document will be developed by the CFRG ("Cryptography Forum Research Group"), which, to my knowledge, has not done a lot with DP as of yet. Thus I think a useful goal would be to ensure that (V)DAFs are compatible with a variety of mechanisms for (a) and (b), but without being too prescriptive about a particular mechanism. That said, @kunal-talwar we would welcome text in the document that describes how some of these methods might be applied to either Prio3 or Poplar1. Such text would be maximally useful if it also provided a gentle introduction to the method described. |
Just to follow up here: It's very likely that the VDAF draft is going to say nothing about DP. Our idea right now is to encapsulate the details of composing DP with a VDAF into a "DP policy" that would be used by DAP. We're working on a draft for PPM: https://github.com/wangshan/draft-wang-ppm-differential-privacy/ If adopted, I'll close this issue. |
As of this writing, PPM has not reached consensus about adopting https://github.com/wangshan/draft-wang-ppm-differential-privacy/. It's clear however that folks think there is something to be done about differential privacy, and that PPM is the right place to do it. It also seems fairly clear that whatever needs to be specified for DP is orthogonal to the VDAF draft. I'm going to close this issue after clarifying in the draft that DP is out-of-scope and emphasizing that VDAFs can be made differentially private. |
Hi folks, I've put up a PR to resolve this issue. Basically it says that VDAFs SHOULD be composed with a DP mechanism, but leaves it to the application. We don't need any changes to this draft. |
We wish for the VDAF draft to have first-class support for schemes that are amenable to differential privacy (DP). The task for this issue is to identify what changes need to be made, if any, to accommodate DP. In addition, we should consider adding guidance to the draft about best practices.