Geolocation-based sharding. #16

hdevalence · 2020-04-01T05:58:43Z

This issue tracks discussion on geolocation-based sharding.

hdevalence · 2020-04-01T06:21:52Z

Here is a description of the sharding proposal I suggested in Slack. It uses a uniform quantization of space and time. Each shard corresponds to a time interval (e.g., 1-7 days) and a latitude / longitude interval (e.g., 0.25 degrees). The spatial interval is chosen to be roughly city-sized (0.25 deg = 15 arcminutes = 15 nmi ≈ 28 km along a line of longitude). The spatial blocks get narrower east-west away from the equator, but stay at least 1/3 this width below 70 degrees latitude (which is already beyond the Arctic Circle).
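To make the block-size arithmetic concrete, here is a minimal sketch. It assumes a spherical Earth with 1 arcminute of latitude equal to 1 nautical mile (1.852 km); the function name `block_size_km` is only illustrative and is not part of the proposal.

```python
import math

# Assumption: spherical Earth, 1 degree of latitude = 60 nmi = 111.12 km.
KM_PER_DEGREE_LAT = 60 * 1.852


def block_size_km(lat_deg, step_deg=0.25):
    """Return (north_south_km, east_west_km) for a block at the given latitude."""
    north_south = step_deg * KM_PER_DEGREE_LAT
    # Lines of longitude converge toward the poles, so the east-west
    # extent shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for lat in (0, 45, 70):
    ns, ew = block_size_km(lat)
    print(f"lat {lat:2d}: {ns:.1f} km north-south, {ew:.1f} km east-west")
```

Under these assumptions the east-west extent at 70 degrees is cos(70°) ≈ 0.342 of the equatorial value, which is where the "at least 1/3" bound comes from.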

This quantization is extremely easy to compute -- it involves rounding the latitude, longitude, and unix time -- so all users can determine which shards are relevant to them.
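As a rough sketch of that computation (not a specification), a shard can be identified by three integer indices. The name `shard_key`, the default parameters, and the choice of flooring rather than rounding to nearest are all assumptions made for illustration.

```python
import math


def shard_key(lat_deg, lon_deg, unix_time, step_deg=0.25, interval_days=1):
    """Quantize a position and timestamp to (lat_idx, lon_idx, time_idx).

    Assumption: indices are obtained by flooring, so every point inside a
    block maps to the index of the block's south-west corner.
    """
    interval_s = interval_days * 86400
    lat_idx = math.floor(lat_deg / step_deg)
    lon_idx = math.floor(lon_deg / step_deg)
    time_idx = int(unix_time) // interval_s
    return lat_idx, lon_idx, time_idx


# Two nearby points at the same time fall into the same shard.
a = shard_key(47.61, -122.33, 1586000000)
b = shard_key(47.62, -122.30, 1586000500)
print(a, b, a == b)
```

Flooring keeps negative longitudes and latitudes consistent (a plain integer truncation would merge the two blocks on either side of zero).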

GallagherCommaJack · 2020-04-04T01:06:38Z

Are we still centralizing the writes? I guess this leaks the set of locations to the server, but not time series data, so not too terrible.

degregat · 2020-04-06T17:30:12Z

So preliminary calculations based on the above scheme give the following bandwidth as a function of key rotation frequency (for a shard on the equator, over a timeframe of two weeks):

At 1 rotation
download per user is: 0.45 to 2.54 MB
total download per day is: 3000 to 16600 GB

At 14 rotations (1 a day)
download per user is: 6.41 to 35.6 MB
total download per day is: 42000 to 233000 GB

At 336 rotations (1 an hour)
download per user is: 154 to 856 MB
total download per day is: 1.0e6 to 5.6e6 GB

At 1344 rotations (4 an hour)
download per user is: 616 to 3420 MB
total download per day is: 4.03e6 to 2.24e7 GB
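The rotation counts follow directly from the two-week window, and the per-user figures above look close to linear in the number of rotations. The sketch below checks that reading; the linear model and the helper names are my assumptions, not the method used to produce the numbers.

```python
WINDOW_DAYS = 14  # two-week timeframe from the figures above


def rotations(per_day, window_days=WINDOW_DAYS):
    """Number of key rotations in the window."""
    return per_day * window_days


def scale_download_mb(mb_at_daily_rotation, per_day, window_days=WINDOW_DAYS):
    """Assumed linear model: download grows in proportion to rotation count.

    The per-rotation cost is derived from the daily-rotation figure,
    which corresponds to `window_days` rotations.
    """
    per_rotation_mb = mb_at_daily_rotation / window_days
    return per_rotation_mb * rotations(per_day, window_days)


for label, per_day in (("1 a day", 1), ("1 an hour", 24), ("4 an hour", 96)):
    low = scale_download_mb(6.41, per_day)
    high = scale_download_mb(35.6, per_day)
    print(f"{label}: {rotations(per_day)} rotations, {low:.0f} to {high:.0f} MB")
```

Scaling the daily-rotation figures this way lands within about 1% of the hourly and quarter-hourly numbers listed above.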

hdevalence · 2020-04-06T21:24:03Z

One perspective on sharding and anonymity is that it provides a second "knob" to turn for adjusting reporter privacy vs. bandwidth, in addition to the report duration.

The report duration trades the number of reports (proxy for bandwidth) against user linkability over some interval (loss of reporter privacy). This tradeoff is made with respect to the bandwidth usage over all users. However, if we do sharding, we get to change what is considered to be the set of all users, potentially allowing shorter report durations for the same bandwidth. This means that, all else fixed, implementing sharding can increase reporter privacy, rather than decrease it.
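A toy model may help illustrate the two knobs. It assumes every reporter emits one fixed-size report per report duration and that reporters are spread evenly across shards; both assumptions, and all names and numbers below, are hypothetical.

```python
def reports_per_day(reporters, report_duration_hours):
    """Reports a client must download per day for a given set of reporters."""
    return reporters * 24 / report_duration_hours


def shortest_duration_hours(reporters, max_reports_per_day):
    """Shortest report duration that stays within a fixed report budget."""
    return reporters * 24 / max_reports_per_day


budget = 1000          # hypothetical per-client budget, in reports per day
all_reporters = 1000   # hypothetical reporters with no sharding
shards = 100           # hypothetical number of equally populated shards

unsharded = shortest_duration_hours(all_reporters, budget)
sharded = shortest_duration_hours(all_reporters / shards, budget)
print(f"unsharded: {unsharded} h per report, sharded: {sharded} h per report")
```

In this model, shrinking the set of relevant reporters by some factor lets the report duration shrink by the same factor at constant bandwidth, which is the sense in which sharding can improve reporter privacy.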

scottleibrand · 2020-04-07T05:19:13Z

And from a practical perspective, some degree of geo information is already known to the server, based simply on geolocation of client IP address. So geo-based sharding down to the level of precision available from IP geolocation (approximately city-level) incurs no real loss of privacy, but significantly reduces the number of reports that must be processed by every client in that city, thereby allowing us to use shorter report durations and improve overall reporter privacy.

degregat · 2020-04-08T20:15:24Z

Geolocation from IP address is very coarse, and e.g. in Europe can point to the other side of the country, so even if we have contact, we might end up in different shards.

We need to sync switching between shards with the switching of TCNs, otherwise this will also be susceptible to rollover attacks, like TCN/MAC pairs. Ideally we would generate a new reporting keypair, so no more than one shard is correlated to any given keypair.

hdevalence mentioned this issue Apr 1, 2020

Add CEN Reporting Proposal. #15

Closed

degregat mentioned this issue Apr 6, 2020

Added sharding calculations #41

Closed

elliemdaw mentioned this issue Apr 24, 2020

When consuming the TCNs, utilize k-anonymity #59

Open
