Geolocation-based sharding. #16
Comments
Here is a description of the sharding proposal I suggested in Slack. It uses a uniform quantization of space and time. Each shard corresponds to a time interval (e.g., 1-7 days) and a latitude/longitude interval (e.g., 0.25 degrees). The spatial interval is chosen to be roughly city-sized (0.25 deg = 15 arcminutes = 15 nmi ≈ 28 km along a line of longitude). The spatial blocks get narrower away from the equator, but they stay at least 1/3 of this width at latitudes below 70 degrees, which is already beyond the Arctic Circle. This quantization is extremely easy to compute: it only involves rounding the latitude, longitude, and Unix time, so every user can determine which shards are relevant to them.
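The quantization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `shard_id` and `cell_width_km` are hypothetical, the one-day interval is one arbitrary choice from the 1-7 day range, and `floor` is used as the rounding rule so that each cell is a half-open interval (the proposal only says "rounding").

```python
import math

CELL_DEG = 0.25          # spatial cell size in degrees (example value from the proposal)
INTERVAL_S = 24 * 3600   # time interval in seconds (assumed here: 1 day)


def shard_id(lat, lon, unix_time, cell_deg=CELL_DEG, interval_s=INTERVAL_S):
    """Quantize a position and timestamp to a (lat_idx, lon_idx, time_idx) tuple."""
    lat_idx = math.floor(lat / cell_deg)
    lon_idx = math.floor(lon / cell_deg)
    time_idx = int(unix_time // interval_s)
    return (lat_idx, lon_idx, time_idx)


def cell_width_km(lat, cell_deg=CELL_DEG):
    """Approximate east-west width of a cell at a latitude (spherical Earth)."""
    km_per_deg = 60 * 1.852  # 60 arcminutes per degree, 1 nmi = 1.852 km
    return cell_deg * km_per_deg * math.cos(math.radians(lat))


if __name__ == "__main__":
    print(shard_id(37.7749, -122.4194, 1586000000))
    print(round(cell_width_km(0), 2), round(cell_width_km(70), 2))
```

The second function checks the width claim: the cosine of 70 degrees is about 0.34, so a cell at that latitude is still slightly more than a third as wide as one on the equator.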
Are we still centralizing the writes? I guess this leaks the set of locations to the server, but not time-series data, so it's not too terrible.
So preliminary calculations based on the above scheme say the following about bandwidth as a function of key rotation (in a shard on the equator, over a timeframe of two weeks):
- At 1 rotation
- At 14 rotations (1 a day)
- At 336 rotations (1 an hour)
- At 1344 rotations (4 an hour)
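The rotation counts for the two-week window follow directly from the rotation period. A minimal sketch of that arithmetic, where `rotations_in_window` is a hypothetical helper and no bandwidth figures are implied:

```python
WINDOW_MINUTES = 14 * 24 * 60  # two-week window, in minutes


def rotations_in_window(period_minutes, window_minutes=WINDOW_MINUTES):
    """Number of key rotations that fit in the window for a given rotation period."""
    return window_minutes // period_minutes


if __name__ == "__main__":
    periods = [
        ("once per window", WINDOW_MINUTES),
        ("1 a day", 24 * 60),
        ("1 an hour", 60),
        ("4 an hour", 15),
    ]
    for label, period in periods:
        print(label, rotations_in_window(period))
```

If each rotation produces one report of fixed size (an assumption, not something stated in the thread), the report volume per reporter scales linearly with these counts.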
One perspective on sharding and anonymity is that it provides a second "knob" for adjusting reporter privacy vs. bandwidth, in addition to the report duration. The report duration trades the number of reports (a proxy for bandwidth) against user linkability over some interval (a loss of reporter privacy). This tradeoff is made with respect to the bandwidth usage over all users. However, if we do sharding, we get to change what is considered to be the set of all users, potentially allowing shorter report durations for the same bandwidth. This means that, all else fixed, implementing sharding can increase reporter privacy rather than decrease it.
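To make the two "knobs" concrete, here is a rough sketch of the tradeoff under a simple assumed model: a client downloads every report from every reporter in its scope, and each report has a fixed size. The function name and all numbers are hypothetical placeholders, not measurements from this project.

```python
def max_reports_per_reporter(budget_bytes, reporters, report_bytes):
    """Largest number of reports per reporter that fits in a client's download budget."""
    return budget_bytes // (reporters * report_bytes)


if __name__ == "__main__":
    budget = 1_000_000   # hypothetical per-client download budget, in bytes
    report_size = 100    # hypothetical size of one report, in bytes

    # Without sharding, every client sees all reporters.
    print(max_reports_per_reporter(budget, 1000, report_size))

    # With sharding, a client only sees the reporters in its own shard.
    print(max_reports_per_reporter(budget, 10, report_size))
```

In this model, shrinking the set of reporters a client must process raises the number of reports each reporter can afford, which corresponds to shorter report durations at the same bandwidth.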
And from a practical perspective, some degree of geo information is already known to the server, based simply on geolocation of the client IP address. So geo-based sharding down to the level of precision available from IP geolocation (approximately city-level) incurs no real loss of privacy, but significantly reduces the number of reports that must be processed by every client in that city, thereby allowing us to use shorter report durations and improve overall reporter privacy.
Geolocation from IP address is very coarse and, e.g. in Europe, can point to the other side of the country, so even if two users have been in contact, they might end up in different shards. We need to sync switching between shards with the switching of TCNs; otherwise this will also be susceptible to rollover attacks, as with TCN/MAC pairs. Ideally we would generate a new reporting keypair on each switch, so that no more than one shard is correlated with any given keypair.
This issue tracks discussion on geolocation-based sharding.