Accommodations for differential privacy #94

cjpatton · 2022-06-28T15:22:45Z

We wish for the VDAF draft to have first-class support for schemes that are amendable to differential privacy (DP). The task for this issue is to identify what changes need to be made, if any, to accommodate DP. In addition, we should consider adding guidance to the draft about best practices.

cjpatton · 2022-06-28T15:28:20Z

At a minimum, I imagine we will need a way for:
(a) an Aggregator to add noise to its aggregate share
(b) a Client to add noise to its measurement

(a) is straight-forward for both Prio3 and Poplar1 since the aggregate shares are vectors over some finite field. In general, I think all we'll need syntactically is that the set of all aggregate shares form a (semi-)group. The mechanism for (b) seems like it'll depend somewhat on the measurement type.

tgeoghegan · 2022-06-28T17:20:26Z

My intuition is that (b) would be out of scope for [V]DAF or DAP since the DP would be applied before Vdaf.measurement_to_input_shares and removed after Vdaf.agg_shares_to_result. So long as a noised input is still a valid encoding of a measurement, VDAF doesn't need to know about the DP, right?

cjpatton · 2022-06-28T17:22:32Z

I think that's probably right, except it may be desirable to add noise to the aggregate shares (a).

chris-wood · 2022-06-28T17:33:45Z

Yeah, I think I agree with @tgeoghegan here. I think it makes sense to add discussion to this draft as to how clients might add local DP, and how, for example, deployments might use shuffling to amplify that DP effect, but both seem outside the scope of the draft. In contrast, (a) being an application of central DP seems like something squarely in scope?

simon-friedberger · 2022-06-29T12:16:29Z

Maybe I am misunderstanding but I think for heavy hitters the distribution complicates things. Some of the finite field vector values are just too unlikely.

If I flip a bit in "https://www.facebook.com" I might end up with "https://www.fa{ebook.com" which doesn't add meaningful privacy from a client point of view and makes the value almost useless from a collector point of view.

To clarify: The client has no good way of adding DP randomness here for (b). For (a) the servers could still adjust the count for "https://www.facebook.com" by random +/-10 when computing the Private Subset Histogram solution for that string. So, doing (b) would also be difficult here.

csharrison · 2022-06-29T13:47:27Z

Note I wrote a bit of my thoughts on how we might want to do this in the DAP repo at ietf-wg-ppm/draft-ietf-ppm-dap#19 (comment).

A few relevant points from that discussion:

Do we even want to add noise addition to VDAF vs. specify it in DAP (my preference is to do it in VDAF)
Are we concerned about malicious helpers adding "bad" noise shares such that we would want to "verify" the noise shares (I think we don't need this given our threat model)

@simon-friedberger I agree for the heavy hitters problem the client side noise addition is difficult. Your observation that a single bit flip in "facebook.com" doesn't add privacy is correct, the DP analysis would agree with you. You need to flip many bits in the entire (high entropy) output domain. However, remember that the privacy that we get is stronger than just the noise from a single client, we just want to achieve good privacy after aggregation (when everything is back in cleartext).

The biggest problem I see with client-side noise is that the poplar protocol (as far as I can tell) only really works with one-hot input vectors. This means that a RAPPOR-like randomizer similar to ENPAs which randomly flips many bits independently in the whole domain would mean constructing tons of dpf keys (scaling linearly with the domain size) on the client for each noisy "1", leading to large communication and processing overhead.

There may be solutions here using different local randomizers, but I think it's hard to get around the domain size problem (e.g. regular randomized response will guarantee you that you only send a one-hot vector, but then the noise scales with the domain size)

chris-wood · 2022-06-29T15:05:24Z

For what it's worth, the Poplar paper describes how to add noise for (a) in Appendix E. It may be the case that this is sufficient for reasonable measures of privacy, without requiring the client to do anything.

cjpatton · 2022-06-29T15:07:52Z

Thanks for bumping that thread, @csharrison.

Do we even want to add noise addition to VDAF vs. specify it in DAP (my preference is to do it in VDAF)

I agree, I think the mechanism needs to live in VDAF-land. However, ideally this would require only minimal syntax changes (e.g., as suggested for (a) here: #94 (comment)) and would leave it up to applications (like a DAP deployment) to tune noise as needed. The VDAF draft could also provide detailed recommendations for adding local DP, or central DP, or both (if applicable) to the VDAFs it specifies.

Are we concerned about malicious helpers adding "bad" noise shares such that we would want to "verify" the noise shares (I think we don't need this given our threat model)

I think this threat model only matters insofar as it impacts privacy. For correctness, we already concede that all of the Aggregators are trusted to execute the protocol correctly.

csharrison · 2022-06-29T16:04:22Z

For what it's worth, the Poplar paper describes how to add noise for (a) in Appendix E. It may be the case that this is sufficient for reasonable measures of privacy, without requiring the client to do anything.

Yeah this is the standard way we'd add DP in the central model. I think there may be ways to optimize the noise in that section with (discretized) gaussian/skellam noise instead of Laplace too (see https://desfontain.es/privacy/gaussian-noise.html for an intro).

I also think that section assumes each noisy prefix is published publicly, which I don't think is necessarily the view of the collector in the poplar vdaf (would need to double check that).

cjpatton · 2022-06-29T16:24:15Z

@schoppmp and I have been working on completing the spec for Poplar1 (#84). Once that's done, I think a great first step would be to add a subsection that spells out recommendations for adding DP to the heavy hitters computation. @csharrison would you be willing to spend time on this?

Alternatively, we could get cracking on Prio3 right away.

csharrison · 2022-06-29T17:16:51Z

@schoppmp and I have been working on completing the spec for Poplar1 (#84). Once that's done, I think a great first step would be to add a subsection that spells out recommendations for adding DP to the heavy hitters computation. @csharrison would you be willing to spend time on this?

I am actually on pat leave right now until late Aug so unfortunately I can't spend much time on this (just enough free time to bug ya'll on issues 😄 ) . Let's just make sure what's specified in the poplar paper is possible and I might be able to help out when I get back. In general, I think we want to consider:

Alternative noise mechanisms other than Laplace (this should be pretty simple)
Configurations where noise is only added at the last step of the heavy hitters protocol, vs. adding it to every prefix and publishing those prefixes publicly (this means that a malicious server could publish non-DP prefix counts, which may be OK in some deployments).

kunal-talwar · 2022-06-29T21:24:37Z

I think it would be worth supporting both (a) and (b).
For (a), having an option to add central DP noise to the aggregate makes sense, and can perhaps be implemented as part of Daf.out_shares_to_agg_share() and this can be specified in each VDAF.
For (b): there are many ways of doing local randomization which are compatible with prio/poplar. In particular, for frequency estimation / heavy hitters, when working over a small universe, RAPPOR can be used in conjunction with pro. Over larger universes, PI-RAPPOR (https://arxiv.org/abs/2102.12099) and ProjectiveResponse (https://arxiv.org/abs/2203.00194) can be used to randomize to a one-hot vector over a smaller domain. This makes these well-compatible with prio or poplar, and I could imagine, for example, pi-rappor+poplar to be a useful VDAF.
The shuffling/aggregation based privacy amplification results though depend on a the batch over which we are summing to be large enough. This application therefore would make a strong case to have the min_batch_size be a natively supported feature in DAP. E.g. it would be great to have a way for the task parameter to specify just the minimum batch size and have the servers agree on a batch boundary that is close to getting this minimum batch size, without having to predict the batch interval in advance.

csharrison · 2022-06-30T03:15:00Z

@kunal-talwar I would love to hear more about how concretely we could compose e.g. pi-rappor with poplar. It seems very non-obvious to me but I might be missing something. In pi-rappor we're sending some seed to the server, are you saying we'd encode this seed as a one-hot vector and aggregate over seeds in the VDAF?

I hadn't seen the paper on ProjectiveReponse, will need to read it 😄 . In general though I agree with you about supporting both (a) and (b). The most obvious candidate requiring (b) support is the existing ENPA system.

kunal-talwar · 2022-06-30T15:30:44Z

It is indeed not obvious (or at least wasn't obvious to me at first). But an approach along the lines you suggested works. We send the seed using poplar, and can either aggregate the seeds, or decode each seed to a vector in the original domain and aggregate those. The generalized version of PI-RAPPOR can allow additional efficiencies on top of this basic approach. We have a write-up that should be on the arxiv in a couple of weeks.

cjpatton · 2022-07-08T14:47:22Z

Great discussion, I'm very happy to see engagement on this draft from folks who are deep in differential privacy. Something to keep in mind here is that this document will be developed by the CFRG ("Cryptography Forum Research Group"), which, to my knowledge, has not done a lot with DP as of yet. Thus I think a useful goal would be to ensure that (V)DAFs are compatible with a variety of mechanisms for (a) and (b), but without being too prescriptive about a particular mechanism.

That said, @kunal-talwar we would welcome text in the document that describes how some of these methods might be applied to either Prio3 or Poplar1. Such text would be maximally useful if it also provided a gentle introduction to the method described.

cjpatton · 2023-10-17T01:16:57Z

Just to follow up here: It's very likely that the VDAF draft is going to say nothing about DP. Our idea right now is to encapsulate the details of composing DP with a VDAF into a "DP policy" that would be used by DAP. We're working on a draft for PPM: https://github.com/wangshan/draft-wang-ppm-differential-privacy/

If adopted, I'll close this issue.

cjpatton · 2024-01-25T17:48:35Z

As of this writing, PPM has not reached consensus about adopting https://github.com/wangshan/draft-wang-ppm-differential-privacy/. It's clear however that folks think there is something to be done about differential privacy, and that PPM is the right place to do it. It also seems fairly clear that whatever needs to be specified for DP is orthogonal to the VDAF draft.

I'm going to close this issue after clarifying in the draft that DP is out-of-scope and emphasizing that VDAFs can be made differentially private.

cjpatton · 2024-05-01T21:18:50Z

Hi folks, I've put up a PR to resolve this issue. Basically it says that VDAFs SHOULD be composed with a DP mechanism, but leaves it to the application. We don't need any changes to this draft.

cjpatton mentioned this issue Jun 29, 2022

How to integrate our custom type into prio divviup/libprio-rs#257

Closed

cjpatton mentioned this issue Jul 28, 2022

Review: Henry Corrigan-Gibbs #102

Closed

7 tasks

This was referenced Sep 8, 2022

Add task configuration schema and describe client behaviour ietf-wg-ppm/draft-ietf-ppm-dap-taskprov#2

Merged

Clean up TaskConfig definition ietf-wg-ppm/draft-ietf-ppm-dap-taskprov#7

Merged

MxmUrw mentioned this issue Feb 1, 2023

Implementing differential privacy for the bounded-norm vector type divviup/libprio-rs#440

Closed

cjpatton mentioned this issue Feb 8, 2023

VDAF security analysis: Requirements for report processing ietf-wg-ppm/draft-ietf-ppm-dap#408

Closed

cjpatton added the editorial label Jan 25, 2024

cjpatton self-assigned this Jan 25, 2024

cjpatton mentioned this issue May 1, 2024

Clarify that VDAFs aren't sufficient for DP #339

Merged

cjpatton added the draft-09 label May 2, 2024

cjpatton closed this as completed in #339 May 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accommodations for differential privacy #94

Accommodations for differential privacy #94

cjpatton commented Jun 28, 2022

cjpatton commented Jun 28, 2022

tgeoghegan commented Jun 28, 2022

cjpatton commented Jun 28, 2022 •

edited

Loading

chris-wood commented Jun 28, 2022

simon-friedberger commented Jun 29, 2022 •

edited

Loading

csharrison commented Jun 29, 2022

chris-wood commented Jun 29, 2022 •

edited

Loading

cjpatton commented Jun 29, 2022 •

edited

Loading

csharrison commented Jun 29, 2022 •

edited

Loading

cjpatton commented Jun 29, 2022 •

edited

Loading

csharrison commented Jun 29, 2022 •

edited

Loading

kunal-talwar commented Jun 29, 2022

csharrison commented Jun 30, 2022

kunal-talwar commented Jun 30, 2022

cjpatton commented Jul 8, 2022

cjpatton commented Oct 17, 2023

cjpatton commented Jan 25, 2024

cjpatton commented May 1, 2024

Accommodations for differential privacy #94

Accommodations for differential privacy #94

Comments

cjpatton commented Jun 28, 2022

cjpatton commented Jun 28, 2022

tgeoghegan commented Jun 28, 2022

cjpatton commented Jun 28, 2022 • edited Loading

chris-wood commented Jun 28, 2022

simon-friedberger commented Jun 29, 2022 • edited Loading

csharrison commented Jun 29, 2022

chris-wood commented Jun 29, 2022 • edited Loading

cjpatton commented Jun 29, 2022 • edited Loading

csharrison commented Jun 29, 2022 • edited Loading

cjpatton commented Jun 29, 2022 • edited Loading

csharrison commented Jun 29, 2022 • edited Loading

kunal-talwar commented Jun 29, 2022

csharrison commented Jun 30, 2022

kunal-talwar commented Jun 30, 2022

cjpatton commented Jul 8, 2022

cjpatton commented Oct 17, 2023

cjpatton commented Jan 25, 2024

cjpatton commented May 1, 2024

cjpatton commented Jun 28, 2022 •

edited

Loading

simon-friedberger commented Jun 29, 2022 •

edited

Loading

chris-wood commented Jun 29, 2022 •

edited

Loading

cjpatton commented Jun 29, 2022 •

edited

Loading

csharrison commented Jun 29, 2022 •

edited

Loading

cjpatton commented Jun 29, 2022 •

edited

Loading

csharrison commented Jun 29, 2022 •

edited

Loading