Geolocation-based sharding. #16

Open
hdevalence opened this issue Apr 1, 2020 · 6 comments

Comments

@hdevalence
Collaborator

This issue tracks discussion on geolocation-based sharding.

@hdevalence
Collaborator Author

Here is a description of the sharding proposal I suggested in Slack. It uses a uniform quantization of space and time. Each shard corresponds to a time interval (e.g., 1 to 7 days) and a latitude/longitude interval (e.g., 0.25 degrees). This is chosen to be roughly city-sized (0.25 deg = 15 arcminutes ≈ 15 nmi ≈ 28 km along a line of longitude). The spatial blocks get narrower in the east-west direction away from the equator, but they remain at least 1/3 as wide at latitudes within 70 degrees of the equator (70 degrees is already beyond the Arctic Circle).
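As a quick check of the arithmetic above, here is a small sketch that computes a cell's extent at a few latitudes. It assumes a spherical Earth and uses 1 arcminute of latitude ≈ 1 nautical mile (1.852 km); the helper name `cell_size_km` is hypothetical.

```python
import math

# One degree of latitude is about 60 arcminutes = 60 nautical miles.
KM_PER_DEG_LAT = 60 * 1.852  # ~111.1 km


def cell_size_km(cell_deg, lat_deg):
    """Return (north-south, east-west) extent in km of a cell_deg x cell_deg
    cell at latitude lat_deg, using a spherical-Earth approximation."""
    ns = cell_deg * KM_PER_DEG_LAT
    # East-west width shrinks with the cosine of the latitude.
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


for lat in (0, 45, 66.5, 70):
    ns, ew = cell_size_km(0.25, lat)
    print(f"lat {lat:5.1f}: {ns:.1f} km N-S x {ew:.1f} km E-W "
          f"({ew / ns:.2f} of the N-S size)")
```

Since cos(70°) ≈ 0.342, the east-west width stays just above 1/3 of the north-south size up to 70 degrees and drops below it shortly after.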

This quantization is extremely easy to compute (it only involves rounding the latitude, longitude, and Unix time), so all users can determine which shards are relevant to them.
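A minimal sketch of that computation, assuming the example parameters above (0.25-degree cells, 7-day periods). The function names are hypothetical, and this sketch rounds down (floor) rather than to the nearest value so that every shard is a half-open interval on each axis, including for negative coordinates.

```python
import math
import time

# Example parameters from the proposal (0.25-degree cells, 7-day periods).
CELL_DEG = 0.25
PERIOD_S = 7 * 24 * 60 * 60  # seconds in 7 days


def shard_id(lat, lon, unix_time, cell_deg=CELL_DEG, period_s=PERIOD_S):
    """Quantize a (lat, lon, Unix time) point to an integer shard index.

    Rounds down (floor), so each shard covers [k * size, (k + 1) * size)
    on each axis, including for negative latitudes and longitudes.
    Longitude is assumed to be in [-180, 180).
    """
    return (
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
        math.floor(unix_time / period_s),
    )


def relevant_shards(history):
    """Shards a user should fetch, given a history of (lat, lon, unix_time)."""
    return {shard_id(lat, lon, t) for lat, lon, t in history}


if __name__ == "__main__":
    print(shard_id(52.52, 13.405, time.time()))
```

Because cell boundaries are fixed, two users about 20 m apart can still fall into different shards when a boundary runs between them: latitudes 51.2499 and 51.2501 map to indices 204 and 205.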

@GallagherCommaJack

Are we still centralizing the writes? I guess this leaks the set of locations to the server, but not time-series data, so it's not too terrible.

@degregat
Contributor

degregat commented Apr 6, 2020

So, preliminary calculations based on the above scheme give the following bandwidth figures as a function of key-rotation frequency (for a shard on the equator, over a two-week timeframe):

At 1 rotation
download per user is: 0.45 to 2.54 MB
total download per day is: 3000 to 16600 GB

At 14 rotations (1 a day)
download per user is: 6.41 to 35.6 MB
total download per day is: 42000 to 233000 GB

At 336 rotations (1 an hour)
download per user is: 154 to 856 MB
total download per day is: 1.0e6 to 5.6e6 GB

At 1344 rotations (4 an hour)
download per user is: 616 to 3420 MB
total download per day is: 4.03e6 to 2.24e7 GB
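These rows scale linearly with the number of rotations: 14 is one per day over two weeks, 336 is 14 × 24 (one per hour), and 1344 is 4 × 336. The sketch below assumes cost is simply proportional to the rotation count and reproduces the later rows from the 1-rotation figures to within a few percent (the 1-rotation baseline is itself rounded). It is a consistency check on the posted numbers, not the original calculation.

```python
# Figures posted for a single key rotation in the two-week window
# (shard on the equator), as (low, high) ranges.
BASELINE_USER_MB = (0.45, 2.54)    # download per user, MB
BASELINE_TOTAL_GB = (3000, 16600)  # total download per day, GB

WINDOW_HOURS = 14 * 24  # two-week window

# Rotations in the window for each schedule in the comment.
SCHEDULES = {
    "1 per two weeks": 1,
    "1 a day": 14,
    "1 an hour": WINDOW_HOURS,      # 336
    "4 an hour": 4 * WINDOW_HOURS,  # 1344
}


def scaled(rng, n_rotations):
    """Assumed model: cost grows linearly with the number of rotations."""
    low, high = rng
    return low * n_rotations, high * n_rotations


for name, n in SCHEDULES.items():
    u_lo, u_hi = scaled(BASELINE_USER_MB, n)
    t_lo, t_hi = scaled(BASELINE_TOTAL_GB, n)
    print(f"{name:>15} ({n:4d}): {u_lo:7.1f} to {u_hi:7.1f} MB/user, "
          f"{t_lo:.3g} to {t_hi:.3g} GB/day")
```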

@hdevalence
Collaborator Author

One perspective on sharding and anonymity is that sharding provides a second "knob", in addition to the report duration, for trading off reporter privacy against bandwidth.

The report duration trades the number of reports (proxy for bandwidth) against user linkability over some interval (loss of reporter privacy). This tradeoff is made with respect to the bandwidth usage over all users. However, if we do sharding, we get to change what is considered to be the set of all users, potentially allowing shorter report durations for the same bandwidth. This means that, all else fixed, implementing sharding can increase reporter privacy, rather than decrease it.
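A toy model makes this tradeoff concrete. Assume, hypothetically, that each reporter publishes one fixed-size report per report duration and that every client downloads all reports in its set over a fixed window. Per-client download is then proportional to reporters × window / duration, so shrinking the set a client must fetch from (by sharding) lets the report duration shrink by the same factor at equal bandwidth. All parameter values and function names below are made up for illustration.

```python
def download_mb(reporters, window_days, report_duration_days, report_kb):
    """Per-client download (MB) if each reporter publishes one report of
    `report_kb` KB per `report_duration_days` and clients fetch them all."""
    reports = reporters * window_days / report_duration_days
    return reports * report_kb / 1000


def shortest_duration_days(reporters, window_days, report_kb, budget_mb):
    """Shortest report duration that keeps download_mb() within budget_mb."""
    return reporters * window_days * report_kb / (budget_mb * 1000)


# Hypothetical numbers: 10,000 reporters spread evenly over 100 shards,
# 1 KB per report, a 14-day window, and a 10 MB per-client budget.
unsharded = shortest_duration_days(10_000, 14, 1, 10)
sharded = shortest_duration_days(10_000 // 100, 14, 1, 10)
print(f"unsharded: {unsharded:.2f} days, sharded: {sharded:.2f} days")
```

Under these assumptions, the same 10 MB budget that forces two-week reports without sharding allows reports covering a few hours with 100 shards, which is the sense in which sharding can increase reporter privacy.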

@scottleibrand
Contributor

And from a practical perspective, some degree of location information is already known to the server, simply from geolocating the client's IP address. So geo-based sharding down to the precision available from IP geolocation (approximately city level) incurs no real loss of privacy, but it significantly reduces the number of reports that every client in that city must process, which lets us use shorter report durations and improve overall reporter privacy.

@degregat
Contributor

degregat commented Apr 8, 2020

Geolocation from IP address is very coarse; in Europe, for example, it can point to the other side of the country. So even if two users were in contact, they might end up in different shards.

We need to synchronize switching between shards with the switching of TCNs; otherwise this will also be susceptible to rollover attacks, like TCN/MAC pairs. Ideally we would generate a new reporting keypair, so that no more than one shard is correlated with any given keypair.
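A minimal sketch of the client-side bookkeeping this implies, assuming the client knows its current shard ID (for example from a quantization like the one proposed earlier in the thread). `ShardKeyedReporter` is a hypothetical name, and `secrets.token_bytes(32)` merely stands in for generating a real reporting keypair; it is not how TCN keys are actually derived.

```python
import secrets


class ShardKeyedReporter:
    """Hypothetical bookkeeping: rotate the reporting key (and restart the
    TCN sequence derived from it) whenever the device's shard changes, so
    each reporting key is only ever associated with a single shard."""

    def __init__(self):
        self.current_shard = None
        self.current_key = None
        self.key_to_shard = {}  # reporting key -> the one shard it was used in

    def observe(self, shard):
        """Call with the device's current shard ID; returns the key in use."""
        if shard != self.current_shard:
            # Placeholder for generating a fresh reporting keypair. A real
            # client would switch TCNs at this same moment so that no TCN
            # (or key) spans two shards.
            self.current_key = secrets.token_bytes(32)
            self.current_shard = shard
            self.key_to_shard[self.current_key] = shard
        return self.current_key


reporter = ShardKeyedReporter()
for shard in [(205, 53, 2650), (205, 53, 2650), (205, 54, 2650)]:
    reporter.observe(shard)
print(len(reporter.key_to_shard))  # number of keys generated so far
```

Returning to a previously visited shard also produces a new key, so a key never links two separate visits.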

4 participants