WiP: Signed measurements #116

Open
wants to merge 1 commit into
base: master

Conversation

nbraud
Contributor

@nbraud nbraud commented Oct 1, 2018

Following the OONI data format session, here is a proposal for signing measurements in a format-agnostic and privacy-preserving way.

Note that this is a work-in-progress: I tried to provide rationales for all the choices I proposed, but everything in here is up for discussion, and feedback on any aspect is most definitely welcome.

@nbraud
Contributor Author

nbraud commented Oct 1, 2018

One thing I haven't attempted to spec yet is how organisations can fetch their probes' measurements.
As an “extreme” solution, they can download all measurements, attempt to verify each signature against each of their probes' pubkeys, and discard measurements that weren't signed by one of their probes.

This is of course completely impractical, so we should provide a mechanism to fetch measurements signed by a given (set of?) pubkey(s), and do so without undermining the privacy properties of the scheme. For instance, if we have an index directly matching pubkeys to sets of measurements, it could be possible to run a timing attack against it to discover the set of probe pubkeys in the index.
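For illustration only (this is not part of the proposal), the brute-force approach described above might look like the following sketch. Python's standard library has no public-key signature primitive, so `toy_sign` and `toy_verify` are hypothetical HMAC-based stand-ins in which the "pubkey" doubles as the secret; they exist only to make the example runnable and are not a signature scheme. The `(payload, signature)` layout and the function names are assumptions as well.

```python
import hashlib
import hmac
from typing import Callable, Iterable, List, Tuple

# (payload, signature) pairs; this layout is an assumption for illustration.
Measurement = Tuple[bytes, bytes]
# verify(pubkey, payload, signature) -> bool
Verifier = Callable[[bytes, bytes, bytes], bool]


def toy_sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for signing: an HMAC, NOT a public-key signature."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def toy_verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Stand-in for signature verification, matching toy_sign."""
    return hmac.compare_digest(toy_sign(key, payload), signature)


def filter_own_measurements(
    measurements: Iterable[Measurement],
    pubkeys: List[bytes],
    verify: Verifier,
) -> Tuple[List[Measurement], int]:
    """Keep measurements that verify under any of our pubkeys.

    Also returns the number of verification attempts, which grows as
    (number of measurements) x (number of pubkeys) in the worst case.
    """
    kept: List[Measurement] = []
    checks = 0
    for payload, signature in measurements:
        for pubkey in pubkeys:
            checks += 1
            if verify(pubkey, payload, signature):
                kept.append((payload, signature))
                break  # one matching pubkey is enough
    return kept, checks
```

Every measurement that does not belong to the organisation costs one verification per pubkey, which is why this does not scale to the full dataset.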

@hellais Is the OONI infrastructure currently in a position to attribute multiple measurements to a single probe? (As in, do we see metadata that would allow that?) If so, we can submit the pubkey alongside the measurement data (in an HTTP header?), have OONI verify the signature, and maintain an index linking pubkeys to measurements; this would not introduce new privacy concerns, and it would be easy to provide a query-by-pubkey mechanism.
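A minimal sketch of what such an index could look like, assuming the collector has already verified the signature before calling `add`. The class name, the use of a SHA-256 fingerprint as the key, and the string measurement IDs are all hypothetical choices for illustration; the sketch also does nothing about the timing side channel mentioned earlier.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


class PubkeyIndex:
    """In-memory index from pubkey fingerprint to measurement IDs (illustrative)."""

    def __init__(self) -> None:
        self._by_key: Dict[str, List[str]] = defaultdict(list)

    @staticmethod
    def _fingerprint(pubkey: bytes) -> str:
        # Index by a hash of the key rather than the raw key bytes.
        return hashlib.sha256(pubkey).hexdigest()

    def add(self, pubkey: bytes, measurement_id: str) -> None:
        """Record a measurement; call only after the signature was verified."""
        self._by_key[self._fingerprint(pubkey)].append(measurement_id)

    def query(self, pubkeys: Iterable[bytes]) -> List[str]:
        """Return the IDs of all measurements signed by any of the given pubkeys."""
        result: List[str] = []
        for pubkey in pubkeys:
            result.extend(self._by_key.get(self._fingerprint(pubkey), []))
        return result
```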

@nbraud
Contributor Author

nbraud commented Oct 2, 2018

PS: I also noticed that, while objecthash will produce the same hash for the same object regardless of the format it was serialized in, it won't help when converting from one major format version to another. From our previous conversations, I think that's not an issue?
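To make the distinction concrete, here is a simplified sketch in the spirit of objecthash: it hashes the structure of the object rather than its serialized bytes. It is not guaranteed to be byte-compatible with the real objecthash (floats and other types are left out), and the function names are my own. The ASN is taken from the range reserved for documentation.

```python
import hashlib
import json


def _tagged(tag: bytes, data: bytes) -> bytes:
    """Hash a one-byte type tag followed by the encoded value."""
    return hashlib.sha256(tag + data).digest()


def object_hash(obj) -> bytes:
    """Structural hash of a JSON-like object (simplified; no floats)."""
    if obj is None:
        return _tagged(b"n", b"")
    if isinstance(obj, bool):  # must come before int: bool is a subclass of int
        return _tagged(b"b", b"1" if obj else b"0")
    if isinstance(obj, int):
        return _tagged(b"i", str(obj).encode("ascii"))
    if isinstance(obj, str):
        return _tagged(b"u", obj.encode("utf-8"))
    if isinstance(obj, list):
        return _tagged(b"l", b"".join(object_hash(item) for item in obj))
    if isinstance(obj, dict):
        # Sorting the per-entry hashes makes the result independent of key order.
        entries = sorted(object_hash(k) + object_hash(v) for k, v in obj.items())
        return _tagged(b"d", b"".join(entries))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Same object, two different serializations: the structural hash is identical.
a = json.loads('{"probe_cc": "IT", "probe_asn": "AS64496"}')
b = json.loads('{ "probe_asn":"AS64496",   "probe_cc":"IT" }')
same_object = object_hash(a) == object_hash(b)

# A format change that renames a field (hypothetical) yields a different object.
renamed = {"cc": "IT", "probe_asn": "AS64496"}
survives_rename = object_hash(a) == object_hash(renamed)
```

Here `same_object` is true while `survives_rename` is false: a signature over the hash survives re-serialization, but not a change to the object's structure such as a major format revision.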

@hellais
Member

hellais commented Oct 18, 2018

First, thank you so much for putting this together and apologies for not getting around to reviewing it sooner.

One thing to keep in mind while thinking about this is what the threat model of OONI Probe users is, which I think is to some extent addressed in our data usage policy: https://ooni.torproject.org/about/data-policy/.
That is to say, OONI Probe is not a privacy tool but an investigatory tool. We make a "best effort" to protect users, yet the nature of the tool is such that we can't guarantee their protection.

> Is the OONI infrastructure currently in a position to attribute multiple measurements to a single probe?

It could potentially be in a position to do that. For example, all mobile probes currently register with the orchestra registry, and they have a long-term secret with the registry that they can use to perform authenticated requests to any piece of OONI Probe infrastructure (though they don't currently do so with the measurements collector).

> If so, we can submit the pubkey alongside the measurement data (in an HTTP header?), have OONI verify the signature, and maintain an index linking pubkeys to measurements; this would not introduce new privacy concerns, and it would be easy to provide a query-by-pubkey mechanism.

I think that we don't want to keep an index of this sort in a database, as it would quickly grow out of control (we currently collect ~10k measurements per day, or ~300k per month).
All our databases at this scale can be regenerated directly from the raw measurement data, and we would like to preserve that. I think that whatever we come up with for this scheme should work in such a way that these metrics are stamped directly on the raw data. It is possible for us to strip some fields before we publish the data, and we already keep two copies of the data, so this is a possibility.

Regarding the properties that I think are desirable from this sort of system:

  1. It should be possible for us to distinguish measurements that come from a trusted set of probes from those that do not
  2. It should be possible for users of OONI Probe to retrieve all the measurements that come from their probe
  3. It should be possible for users of the OONI data to know which measurements come from a set of probes that OONI Probe trusts
  4. It should be possible for users of the OONI data to know, within the same network, country and approximate geographical location, measurements that come from the same user (For example that a certain set of measurements from Vodafone Italia are run by user abd231fe).

In terms of privacy properties, I think our number 1 concern is:

  1. We don't want anybody who has access to the public OONI data to be able to track where a user travels around the world, unless they have gained access to some secret of theirs (see 2.).

To illustrate what I mean by this, let's take the naive example of just stamping every measurement of a user with their ID, where this ID doesn't change over time. If I download all the OONI data and GROUP BY user_id, I can figure out where an OONI Probe user has traveled to (and run OONI Probe), because they will have measurements stamped with their user_id and different probe_cc and probe_asn.
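The grouping described above takes only a few lines. This sketch assumes measurements are dicts carrying `user_id`, `probe_cc` and `probe_asn`; the sample records are invented, and the ASNs come from the range reserved for documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def networks_by_user(measurements: Iterable[dict]) -> Dict[str, Set[Tuple[str, str]]]:
    """Equivalent of GROUP BY user_id, collecting each user's (country, ASN) pairs."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for m in measurements:
        seen[m["user_id"]].add((m["probe_cc"], m["probe_asn"]))
    return dict(seen)


# Invented sample data: one user measuring from two countries, another from one.
public_data = [
    {"user_id": "abd231fe", "probe_cc": "IT", "probe_asn": "AS64496"},
    {"user_id": "abd231fe", "probe_cc": "IT", "probe_asn": "AS64496"},
    {"user_id": "abd231fe", "probe_cc": "DE", "probe_asn": "AS64497"},
    {"user_id": "77c0ffee", "probe_cc": "IT", "probe_asn": "AS64496"},
]

trails = networks_by_user(public_data)
```

With a stable ID, `trails["abd231fe"]` contains both the Italian and the German network, which is exactly the travel history we don't want to expose.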

In light of this, I believe we should have, as part of this scheme, something that "refreshes" the IDs that we make available publicly (or does so before publication) when the network of a given user changes, so that we cannot say user X was on networks A and B at times T.
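One possible way to get IDs that refresh per network, offered as a sketch and not as a settled design, is to derive the public ID from a per-user secret and the network using a keyed hash. The function name, the `probe_cc|probe_asn` context string, and the truncation to 16 hex characters are assumptions made for illustration.

```python
import hashlib
import hmac


def network_scoped_id(user_secret: bytes, probe_cc: str, probe_asn: str) -> str:
    """Derive a public ID that is stable within one network and differs across networks."""
    context = f"{probe_cc}|{probe_asn}".encode("utf-8")
    digest = hmac.new(user_secret, context, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; the length is an arbitrary choice


secret = b"example-user-secret"  # hypothetical; would be the probe's long-term secret
id_italy = network_scoped_id(secret, "IT", "AS64496")
id_italy_again = network_scoped_id(secret, "IT", "AS64496")
id_germany = network_scoped_id(secret, "DE", "AS64497")
```

Measurements from the same user on the same network share an ID, while linking `id_italy` to `id_germany` requires knowing the secret. A user who holds the secret can recompute their IDs for each network they used and retrieve their own measurements.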

I also think we may not necessarily need a public/private key scheme, in light of the fact that we already have an authentication layer based on JWT tokens (see the point above on the registry).

In relation to the data storage requirements, I would be OK with storing just the "temporary IDs" that are mapped to the master user's credentials and change every time the user is on a different network, as these will likely be significantly fewer than the measurements and are in any case bounded by the number of clients we have and the number of networks in the world.
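A rough sketch of such a mapping, assuming the server assigns a random temporary ID per (user, network) pair. The class and method names are hypothetical, and a real deployment would persist this table instead of keeping it in memory.

```python
import secrets
from typing import Dict, Set, Tuple


class TemporaryIdRegistry:
    """Maps (user, country, ASN) to a random temporary ID (illustrative, in-memory)."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str, str], str] = {}

    def id_for(self, user: str, probe_cc: str, probe_asn: str) -> str:
        """Return the temporary ID for this user on this network, creating it if needed."""
        key = (user, probe_cc, probe_asn)
        if key not in self._ids:
            self._ids[key] = secrets.token_hex(8)
        return self._ids[key]

    def ids_of(self, user: str) -> Set[str]:
        """All temporary IDs belonging to one user (needs access to the private table)."""
        return {tid for (u, _, _), tid in self._ids.items() if u == user}

    def __len__(self) -> int:
        # At most (number of users) x (number of networks) entries.
        return len(self._ids)
```

The table grows with the number of distinct networks each user measures from, not with the number of measurements: a thousand submissions from one network add a single row.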

4 participants