Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree-like hash ratchet #40

Open
burdges opened this issue Apr 6, 2020 · 4 comments
Open

Tree-like hash ratchet #40

burdges opened this issue Apr 6, 2020 · 4 comments

Comments

@burdges
Copy link

burdges commented Apr 6, 2020

In the current design, if users can create a time barrier that permits them to disclose CENs before a specified point in time, but only if they start a new report authorization key. There are two problems with this approach: Users suck at thinking about privacy in advance. Users might start too many report authorization keys.

Instead, you could assume some fixed rotation schedule for report authorization keys consisting of 2^d broadcasts for some fixed d, but produce CENs using a tree-like hashing structure

    cek_0 = rak
    (cek_{2i}, cek_{2i+1} ← H_cek(cek_i).
    cen_i = cek_{2^d+i}

At any time, the application only requires O(d) storage, which appears negligible. Now users reveal whatever cek_j they like along with the depth log_2 j, but not j itself. Reveals need not cover the cen_i.

Initial versions could simply use this hashing strategy, while leaving the reveal control interface to future work. In particular charging logs or GPS logs could exclude time spent at home automatically. There is a small privacy risk in revealing the depth log_2 j, but the system could randomly do slightly deeper reveals as cover.

@burdges burdges changed the title Tree-like hashing Tree-like hash ratchet Apr 6, 2020
@hdevalence
Copy link
Collaborator

hdevalence commented Apr 6, 2020

Hey Jeff,

Thanks for the suggestion. Kenny Paterson mentioned something similar in the context of DP-3T in a call yesterday -- do you know whether there any detailed writeups floating around?

Tree hashing sounds like a nice idea, but I worry that once all the details are worked out, it will pose significant additional complexity that doesn't outweigh the benefits.

First, I think in your description, iterating through the CENs in index order iterates through the tree in depth order, so it doesn't actually let you select time intervals by revealing a root. I think to do that, you would need to generate CENs from the leaves of the tree. (Maybe I am misreading your notation, so in what follows I'm going to assume that CENs are generated from the leaves).

I don't think it's sufficient to reveal just a tree root (or rather, that having that capability alone is not sufficient to justify the change), because this only allows you to select a single 2^k-sized, 2^k-aligned subtree. But this probably doesn't actually align with the time interval that a user actually wants to disclose. This can be patched up by specifying multiple tree roots, so that a user can specify a particular, custom range (here I'm using H to denote hidden, R to denote revealed, r to denote revealed by a higher-level root):

         H               (root CEK)
        / \       
       /   \      
      /     \     
     /       \    
     H       H       
    / \     / \   
   /   \   /   \  
   H   R   H   H   
  / \ / \ / \ / \ 
  H R r r R H H H (leaf CEKs)
  | | | | | | | |
  v v v v v v v v
  0 1 2 3 4 5 6 7  (CEN indexes)

This means that reporting clients now have to do relatively more complex tree traversal logic to select the reporting interval, and the report format has to include multiple tree roots, and presumably some kind of limit on how many roots can be included. However, it does mean that reporters can select an arbitrary subset of the CENs they report, which is nice.

In contrast, in the current protocol, I'm not sure that the problems that you mentioned will actually be significant in practice. I don't think users are required to think about privacy in advance. Their user-agent can automatically start a new report at some regular interval (say 1-4 per day), then submit multiple reports when they wish to disclose. I think that the common case would then be the one where j_1 = 0, in which case having more fine-grained reporting tools doesn't help much.

@burdges
Copy link
Author

burdges commented Apr 7, 2020

Yes, I wrote it recursively using "heap addressing" so 2^d+i with i>0 runs over the leaves. You'd implement the iteration over CENs with a right-leaning copath Vec<Option<[u8; 16]>> from the root to your current leaf, not heap addressing.

You'd regenerate any desired roots when revealing because you only hash twice as many times as currently, and bluetooth prevents consuming CENs faster than one per minute anyways.

Yes, anytime you reveal a 2^k size subtree then you'd reveal subtrees of many smaller sizes too, exactly like your picture.

I agree people cannot really remember exactly when they visited some embarrassing place while coughing up their lungs. ;) I'd mostly envision the resolution being useful for automated tools that separate or omit time spent in different places. I've no idea if anyone would add should tools but this opens the door.

@hdevalence
Copy link
Collaborator

Thanks for the clarification!

I guess the big-picture question is whether a tree-like hash ratchet is useful at all, if the lifetime of each report is dialed down to try to prevent tracking of reporters. If each report only covers, say, 4 hours, is it useful to be able to exclude 10-minute intervals?

@burdges
Copy link
Author

burdges commented Apr 13, 2020

In practice, I suspect one could report every 1 minute period separately because any contact tracing fails anyways beyond relative few people. shrug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants