DIDs as Enhanced URNs #457

Merged
merged 14 commits into from
Jan 31, 2021

Conversation

talltree
Contributor

@talltree talltree commented Nov 13, 2020

The note on persistence of DIDs was revised per the text discussed on slide 34 of the presentation deck for the DID WG TPAC virtual F2F session.



Member

@kdenhartog kdenhartog left a comment


This is a very warranted warning. Thank you for writing up this PR in a way that cleanly nuances the tradeoffs and properly sets expectations!

Contributor

@jandrieu jandrieu left a comment


Unfortunately, this continues the miscommunication that DIDs can be bound permanently.

This is neither true nor is it desirable.

DIDs have no innate connection to any particular subject.

As identifiers they may be used to refer to particular entities, e.g., as the subject property of a Verifiable Credential, but on their own, they are no more permanently bound to one and only one entity than the term "Mom" or "Mary Smith" is bound to one and only one person for their entire life. Additional information is ALWAYS necessary to determine which entity an issuer really means when issuing a VC with a DID Subject. With sufficient assertions, it may be reasonable to accept that an identifier refers to a mutually understood physical person and to correlate those assertions with that person. However, the systems that accept such a binding SHOULD evaluate the authority & authenticity of those assertions before accepting them--as well as the evidence that the assertions were appropriately related to the intended entity. They SHOULD also be prepared to update that dependency should those assertions come under question.

Please clarify that any link between a DID and its subject can only be understood within a specific context and the confidence in that link is subject to evaluation of any and all evidence that might support or detract from it.

All that DIDs do is provide a way to attach a flexible cryptographic proof of control mechanism to an identifier without relying on a trusted party. It is at best one factor in what, for many use cases, will need to be a multi-factor assessment of identity.

Identifiers are only valid within a given context. Attempting to assert a global and permanent binding of a given identifier to a specific physical person is a privacy fail and a human rights nightmare. Attempting to rely on the same for corporate entities is a security failure.

The permanence that is alluded to in this write up is the privacy problem that we need to avoid.

@kdenhartog
Member

Unfortunately, this continues the miscommunication that DIDs can be bound permanently.
This is neither true...

While there are many cases where I agree the permanence is squishy at best and untrue in most cases, I can't agree with this statement.

  1. If the DID Subject is a cryptographic key using did:key it can be permanently bound.
  2. If the DID Subject is an information resource and the DID is a hash of the information resource then it's cryptographically bound to that information resource.

nor is it desirable.

This is a subjective assertion that assumes the usage of DIDs is only for personal identifiers. With the addition of did:schema to the did registry we're past the point of saying that's the only use case we're addressing with DIDs.

For these reasons, I believe this PR accurately reflects the landscape of implementations available today and doesn't mislead or misconstrue the circumstances, which is why I'm +1 to the addition. I do agree with your assertion that in personal use cases it's problematic to assume permanence, which the PR points out in the caveats section.

@jandrieu
Contributor

@kdenhartog wrote

This is a subjective assertion that assumes the usage of DIDs is only for personal identifiers. With the addition of did:schema to the did registry we're past the point of saying that's the only use case we're addressing with DIDs.

I'm not asserting DIDs are only for personal use.

I am saying that DIDs are for personal use and that any statement about DIDs as a whole, must also be true for their use by individuals. To say that you can bind a DID to a subject suggests that this is both standard practice and a desirable thing. It is not.

I stand by the original statement.

DIDs cannot be permanently bound to a real-world entity.

There are those who are using DIDs which are deterministically generated by a representation of an entity, but that is not a binding, it is simply a mathematical equivalence. For example, using a hash as a DID doesn't bind that DID to the subject, it is already deterministically bound to that subject, whether or not that hash is used as a DID. Math is math.
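A minimal sketch of the "math is math" point, assuming SHA-256 as the hash and using `did:example` purely as a placeholder method name (neither comes from this thread): the digest is fixed by the content alone, and wrapping it in a DID prefix adds nothing to that relationship.

```python
import hashlib


def content_identifier(content: bytes) -> str:
    """Hex SHA-256 digest of the content; it depends on nothing but the bytes."""
    return hashlib.sha256(content).hexdigest()


def as_hypothetical_did(content: bytes) -> str:
    # "did:example" is a placeholder method name, not a registered method.
    return "did:example:" + content_identifier(content)


doc = b'{"title": "schema v1"}'  # illustrative content only
digest = content_identifier(doc)
did = as_hypothetical_did(doc)

# The DID string is just the digest with a prefix; any change to the
# content yields a different digest and therefore a different identifier.
same_again = as_hypothetical_did(doc)
changed = as_hypothetical_did(doc + b" ")
```

Note that nothing in this sketch involves a key or a controller, which is the distinction being drawn from proof-of-control.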

I'll go further and say that the use of DIDs that are deterministically generated, IMO, fall outside the intended use of DIDs, and frankly, is an anti-pattern. The ONE thing that DIDs enable is publicly verifiable proof-of-control without reliance on a central authority. DIDs for content don't do that. Full stop. We've got one job, let's not mess it up.

Also, don't conflate the addition of a given method to a registry as an endorsement of that method. What is allowed in the registry != what is true of all DIDs. By its very design, we will absolutely have tons of DIDs registered that are absolute privacy nightmares--because we have essentially zero filter on what can be accepted by the registry.

So, the fact that some people are using DIDs in weird ways is not a justification for altering the spec to support a subset of DIDs that are unable to support the use cases that DIDs are designed to support.

More importantly, the very NOTION of "permanently binding" a DID to a subject not only invites a misunderstanding (suggesting something that is possible that isn't) it encourages privacy-violating practices and will lead to the creation of even more methods that use DIDs in a privacy violating manner.

Please read, or re-read, Section 10.5 on Herd Privacy https://w3c.github.io/did-core/#herd-privacy:

When a DID subject is indistinguishable from others in the herd, privacy is available. When the act of engaging privately with another party is by itself a recognizable flag, privacy is greatly diminished. DIDs and DID methods need to work to improve herd privacy, particularly for those who legitimately need it most. Choose technologies and human interfaces that default to preserving anonymity and pseudonymity.

DIDs that permanently bind to any subject diminish the herd privacy that can and should be achieved by ensuring that at the DID layer, DIDs and their Documents are indistinguishable when used by controllers for different purposes.

Binding a DID to a particular Subject is not the role of DIDs. DIDs exist to allow anyone to be able to prove control over an identifier, which can then allow the use of that identifier for specific references, such as a DID as the subject of a VC. In that scenario it is the VC issuer that does the binding, representing a specific statement about that DID that is to be taken as "about" an intended entity. Whether or not that identifier in fact refers to the party in question is entirely dependent on the quality of identity assurance performed by the issuer, which may or may not even include an initiating Proof of Control.

The only binding you have in DIDs is the binding of a particular set of proof mechanisms for particular verification relationships.

Full stop.

@kdenhartog
Member

kdenhartog commented Nov 15, 2020

Responding inline to @jandrieu's response.

I'm not asserting DIDs are only for personal use.

I am saying that DIDs are for personal use and that any statement about DIDs as a whole, must also be true for their use by individuals. To say that you can bind a DID to a subject suggests that this is both standard practice and a desirable thing. It is not.

It is a standard and desirable practice when identifying information resources in my opinion. So we've got a point of disagreement here.

I stand by the original statement.

DIDs cannot be permanently bound to a real-world entity.

I agree with this statement and am not contending that point.

There are those who are using DIDs which are deterministically generated by a representation of an entity, but that is not a binding, it is simply a mathematical equivalence. For example, using a hash as a DID doesn't bind that DID to the subject, it is already deterministically bound to that subject, whether or not that hash is used as a DID. Math is math.

Sure it does if the DID method says it does. If the DID Method asserts that the DID subject is the pre-image of the hash in the representation of a DID Document, and that when converted into a DID format the guarantee is provided by the cryptographic security of the hash algorithm, then the DID is "bound" to the DID subject (the DID Document representation of the information resource). This seems to work with definition 1.C of Webster's dictionary of the term "bind".

I'll go further and say that the use of DIDs that are deterministically generated, IMO, fall outside the intended use of DIDs, and frankly, is an anti-pattern. The ONE thing that DIDs enable is publicly verifiable proof-of-control without reliance on a central authority. DIDs for content don't do that. Full stop. We've got one job, let's not mess it up.

Sure, but that's your opinion and as far as I can tell much of the working group would like to enable these types of use cases based on #199. Furthermore, based on my read of the charter,

"Use cases from other industries may be included if there is significant industry participation."

It is acceptable and within reason for us to address use cases beyond the original scope set out by the initial use cases.

Also, don't conflate the addition of a given method to a registry as an endorsement of that method. What is allowed in the registry != what is true of all DIDs. By its very design, we will absolutely have tons of DIDs registered that are absolute privacy nightmares--because we have essentially zero filter on what can be accepted by the registry.

If a DID method meets all the normative statements set out within the specification necessary to define a DID Method and doesn't violate any of the normative statements in the technical portion of the specification then it's a valid DID Method. If you disagree with this please provide feedback to the normative statements that should be adjusted, modified, or added and we can discuss that from there. As far as I can tell from reading the DID Method specification nothing from the did:schema method is in violation of any of the normative statements defined in the did-core specification.

So, the fact that some people are using DIDs in weird ways is not a justification for altering the spec to support a subset of DIDs that are unable to support the use cases that DIDs are designed to support.

Looking back to the charter it's acceptable for us to address additional use cases beyond the scope of the initial set of use cases. Furthermore, I haven't seen any normative statement within the specification that states that a DID Method MUST support all use cases in order to be considered a valid DID method and I highly doubt that statement would pass WG consensus.

More importantly, the very NOTION of "permanently binding" a DID to a subject not only invites a misunderstanding (suggesting something that is possible that isn't) it encourages privacy-violating practices and will lead to the creation of even more methods that use DIDs in a privacy violating manner.

Please point me to evidence that shows at least one privacy-violating practice that is enabled by the misinterpretation that DIDs are permanently bound to a single DID Subject in a way. I've not seen or heard anyone making this point and I've reviewed and worked with a variety of different DID methods.

Please read, or re-read, Section 10.5 on Herd Privacy https://w3c.github.io/did-core/#herd-privacy:

When a DID subject is indistinguishable from others in the herd, privacy is available. When the act of engaging privately with another party is by itself a recognizable flag, privacy is greatly diminished. DIDs and DID methods need to work to improve herd privacy, particularly for those who legitimately need it most. Choose technologies and human interfaces that default to preserving anonymity and pseudonymity.

When the DID Subject is a person this statement holds true. When the DID Subject is an information resource what is the benefit in giving it herd privacy? What about when the DID Subject is an IoT device - does it need privacy too?

DIDs that permanently bind to any subject diminish the herd privacy that can and should be achieved by ensuring that at the DID layer, DIDs and their Documents are indistinguishable when used by controllers for different purposes.

In theory this would be nice to hold true, but in practice it's not going to hold. Even if we were to say that a DID Document can only contain keys, the usage of particular algorithms would violate this herd privacy requirement by diminishing the entropy into subsets of DIDs that use particular algorithms. If we wanted to prevent that capability to maximize herd privacy we'd shoot ourselves in the foot on the extensibility front to be able to support post quantum cryptography in the future. So my question to you is, can you provide me with some rough estimates in the drop in entropy (see here to understand what I mean) in such a way that what you're arguing for here won't actually be greatly diminished anyways through the different usage patterns of implementers?
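For a sense of what such an estimate could look like, here is a rough sketch. It assumes every DID in a population is equally likely and that the only distinguishing feature is the key algorithm visible in the DID document; the population size and the algorithm labels are invented for illustration.

```python
import math


def anonymity_bits(set_size: int) -> float:
    """Bits of uncertainty when a DID could be any one of set_size equally likely DIDs."""
    return math.log2(set_size)


def bits_lost(total: int, subset: int) -> float:
    """Bits an observer gains by learning that the DID falls in a subset of the total."""
    return anonymity_bits(total) - anonymity_bits(subset)


# Hypothetical population of 2**20 DIDs, partitioned by visible key algorithm.
total = 2 ** 20
by_algorithm = {"alg-A": 2 ** 19, "alg-B": 2 ** 18, "alg-C": 2 ** 18}

# Entropy drop for a DID observed in each partition.
loss = {alg: bits_lost(total, n) for alg, n in by_algorithm.items()}
```

Under these assumptions, a DID in the largest partition loses 1 bit of anonymity and a DID in either smaller partition loses 2 bits; real usage patterns would not be uniform, so this is only an order-of-magnitude aid.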

Binding a DID to a particular Subject is not the role of DIDs. DIDs exist to allow anyone to be able to prove control over an identifier, which can then allow the use of that identifier for specific references, such as a DID as the subject of a VC. In that scenario it is the VC issuer that does the binding, representing a specific statement about that DID that is to be taken as "about" an intended entity. Whether or not that identifier in fact refers to the party in question is entirely dependent on the quality of identity assurance performed by the issuer, which may or may not even include an initiating Proof of Control.

The only binding you have in DIDs is the binding of a particular set of proof mechanisms for particular verification relationships.

Full stop.

Again, not if the DID Method is written in such a way to enable use cases other than the initial use cases and is also a valid DID method. From my reading, this PR further nuances the language to suggest that it's up to the DID method to make these guarantees, just as the URI spec does in section 7.1 of RFC 3986 by saying "Such guarantees can only be obtained from the person(s) controlling that namespace and the resource in question."

With all of these responses in mind, I think the premise of this debate lies upon whether or not the usage of a DID to identify an information resource is acceptable and therefore whether or not we should keep the text that allows a DID method to declare a DID provides permanence through cryptographic binding for certain use cases. I can tell that you take a pretty hardline no to that, but from everything that I've read and understood while participating in this WG they've not violated any of the normative statements in the specification. So this leaves me with the final question for you. How would you like to concretely change this language in such a way that your concerns about alluding to some privacy violations are met, while my use cases around DIDs identifying information resources can still be met?

@msporny
Member

msporny commented Nov 24, 2020

@jandrieu need you to engage in the discussion to see if anything @kdenhartog said has changed your position. Concrete text changes would help move things forward at a more rapid pace.

@jandrieu
Contributor

Attempting to advance the conversation per @msporny's request.

Starting with the primary point of agreement:
I wrote

DIDs cannot be permanently bound to a real-world entity.

@kdenhartog replied:

I agree with this statement and am not contending that point.

Great. Then let's remove language that talks about DIDs, in general, as being bound to their subject. The only methods that currently "bind" are exceptions rather than the rule.

I have opposed that language since it was first introduced, and I have stood up and attempted to clarify for the co-editors exactly how and why that persistence as specified is NOT a desirable goal, full stop. Note that the current use-case document avoids the problematic language of permanent binding.

@kdenhartog continues:

It is a standard and desirable practice when identifying information resources in my opinion. So we've got a point of disagreement here.

Yes, we want identifiers that nobody can take away from us.

We DO NOT want identifiers that are permanently bound to subjects. Such binding is an administrative and bureaucratic act that has nothing to do with the goals of DIDs. It violates self-sovereignty and provides exactly the wrong kind of ammunition for both regulators who are working to protect individuals (because it confuses them about exactly what DIDs do), and for greedy actors who, in fact, want such binding so they can monetize it, despite the privacy harms.

The act of binding is a fundamentally cognitive one. It requires a role of an observer to assign such an ID to a Subject. To make such a binding PERMANENT gives some observer more authority than others. This is the fundamental problem.

In the normal use of identifiers, the symbols are only ever bound within a specific context. This is why JSON-LD has such a disconnect with "pure" JSON (a term I dislike, but which clarifies the point concisely). Because JSON has no means for resolving ambiguous identifiers.

And NEITHER do DIDs.

And this works great: within the relevant context, establish identifiers and their meanings.

You can use RDF and JSON-LD to combine or establish unambiguous contexts, PRECISELY because identifiers, in fact, DON'T just magically bind themselves to subjects: it requires an understanding of the context to do so, as well as a party that is capable of recognizing the subject and assigning the identifier in some "permanent" way.

If identifiers could be bound, globally, automatically, and permanently, as you seem to be suggesting, then we wouldn't need RDF or JSON-LD. Instead, we would just use those bound identifiers for everything and we could all communicate without lexical confusion. But identifiers don't work this way. In fact, the contextual nature of identifiers is an evergreen field of generational distinction and cohort exclusivity: if you don't know the hip terms, you're a square.

Which is to say that privacy itself depends on ambiguous contexts providing a natural boundary between appropriate uses of different identifiers.
Another exchange
I write:

I'll go further and say that the use of DIDs that are deterministically generated, IMO, fall outside the intended use of DIDs, and frankly, is an anti-pattern. The ONE thing that DIDs enable is publicly verifiable proof-of-control without reliance on a central authority. DIDs for content don't do that. Full stop. We've got one job, let's not mess it up.

@kdenhartog replies:

Sure, but that's your opinion and as far as I can tell much of the working group would like to enable these types of use cases based on #199. Furthermore, based on my read of the charter,

"Use cases from other industries may be included if there is significant industry participation."

It is acceptable and within reason for us to address use cases beyond the original scope set out by the initial use cases.

I'm editor of that use case document, and we have significantly expanded the set of use cases from the starting set.

However, there remains no use case for content identifiers. They have not been shown to be capable of the functionality that is already outlined in that document:

  1. They can't be used to prove control (5.2.2)
  2. They can't be revoked, rotated, or otherwise updated or recovered (5.4.1, 5.4.3, 5.4.4)
  3. They cannot be used to sign, nor to verify signatures (5.2.3, 5.2.4)

At the end of the day such content-identifiers are more appropriately created as URNs, as they already are: http://www.nuke24.net/docs/2015/HashURNs.html

Yes, I realize it is easier to get a new DID method approved--because there is essentially no oversight of the approval process. However, just because some enterprising developers see a way to use DIDs for content identifiers, doesn't mean that DIDs are appropriate for content identifiers and they were NOT created for that purpose.

Regarding issue #199, that merely established a point of precedence for the idea of a "representation" property. It did not establish that such a property is accepted, merely that such a property request made it in before the feature-freeze.

It remains up to the working group to decide whether or not such DIDs are appropriate.

@kdenhartog also writes:

If a DID method meets all the normative statements set out within the specification necessary to define a DID Method and doesn't violate any of the normative statements in the technical portion of the specification then it's a valid DID Method. If you disagree with this please provide feedback to the normative statements that should be adjusted, modified, or added and we can discuss that from there. As far as I can tell from reading the DID Method specification nothing from the did:schema method is in violation of any of the normative statements defined in the did-core specification.

Yes, this is a fine statement once the specification is complete.

It is not.

The changes with this PR double down on issues that have plagued this work from the beginning, and perhaps now we can get enough attention on the problem to resolve one way or another.

I don't know enough about did:schema to comment on it specifically, but if it is just a CID, I would suggest there are better mechanisms for minting identifiers.

However, my previous point remains: just because there is a method or two or several that use the specification in an unanticipated but conformant manner has no bearing on whether or not the core DID spec should support that unanticipated use. Just because someone comes up with a method there's no reason to think that THAT particular method should be used as a guide star for changing the specification.

This PR, in particular, is meant to clarify the dangers of persistent DIDs for humans. It was NOT requested to establish a new form of DIDs that are, at their most fundamental, incapable of several actions that define what you can do with DIDs.

I agree with @kdenhartog in that there are no hard requirements for what use cases a method must support. That's an unfortunate side effect of trying to play "big tent" without first establishing clear boundaries for the common work. The result is long, drawn-out, painful debates like this, precisely because implementers ran ahead of the spec--and now want their special feature added to the core. I get it, but it is why these github discussions

We adopted the rubric as an approach to cut the Gordian knot of defining "decentralized" rigorously enough to be used to distinguish conformant methods from undesired methods. What came out of that work is that, after months of effort by half a dozen contributors, the work did, in fact, result in a clear and distinct definition that would have worked for all DID methods envisioned at the beginning of the work group:

To qualify as a decentralized identifier, it must be possible to prove control--in a publicly verifiable manner--without reliance on a trusted third party.

The only methods I know of that DON'T provide this affordance are CIDs. There may be others, but my inventory of the registry is incomplete.

Since a co-chair and an editor favor DIDs for CIDs, I expect I won't be able to establish the hard-earned distinction from the rubric as group consensus. However,

  1. I will continue to make the case for that distinction, as it is one of the few concrete boundaries that actually aligns ALL of the early methods.
  2. Even if such a method is conformant, such conformance is not, in and of itself, justification for embedding privacy violating practices as the norm. ESPECIALLY in a section that is designed to help clarify privacy.

@kdenhartog also says

Please point me to evidence that shows at least one privacy-violating practice that is enabled by the misinterpretation that DIDs are permanently bound to a single DID Subject in a way. I've not seen or heard anyone making this point and I've reviewed and worked with a variety of different DID methods.

Sure. The "type" property.

Putting such a property in DID core would imply that "type" is an expected property. This has been discussed at great length, with a clear consensus that, yes, "type" is a privacy problem for individuals. The author of this PR, in a separate security and privacy note, himself used an example of "type": "person" to illustrate the intended advantage of the property. So, even though said author is one of the most knowledgeable experts in this field, even he got the privacy impact wrong. It is an easy mistake to make, one that we should take extra care to help implementers avoid.

Since we also have language clarifying that such a property should NOT be used with individual human subjects, methods that use "type" for non-humans will, in fact, be creating a problem for the herd privacy goals of DIDs.

Herd privacy works by ensuring that one can "get lost in the crowd". There is a mathematical notion of anonymity that defines your level of anonymity by the number of people you can be confused with. If there are only 100 people in the world that satisfy a given constraint, that is less anonymous than if there are a million. Identifiers that can be definitively linked with those 100 people fundamentally have less privacy than an otherwise equivalent set of identifiers which define a group of 1 million people.

This is how de-anonymization works. You gather what evidence you can to tease out correlatable details that allow you to further refine the set of individuals. Get enough correlatable data and you can often reduce that to a set of 1.
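The narrowing process can be sketched with a toy example. The records and attribute names below are invented for illustration and do not come from any real dataset; the point is only that each additional correlatable detail shrinks the set of candidates.

```python
from typing import Dict, List

Record = Dict[str, str]


def narrow(population: List[Record], known: Record) -> List[Record]:
    """Return the records consistent with every known attribute."""
    return [r for r in population if all(r.get(k) == v for k, v in known.items())]


# Invented toy population; not real data.
population = [
    {"zip": "02138", "birth_year": "1945", "sex": "M"},
    {"zip": "02138", "birth_year": "1945", "sex": "F"},
    {"zip": "02138", "birth_year": "1971", "sex": "M"},
    {"zip": "02139", "birth_year": "1945", "sex": "M"},
]

# Add one observed attribute at a time and record how many candidates remain.
sizes = []
known: Record = {}
for key, value in [("zip", "02138"), ("birth_year", "1945"), ("sex", "M")]:
    known[key] = value
    sizes.append(len(narrow(population, known)))
```

In this toy population the candidate set shrinks from four records to three, then two, then one as the three attributes are applied in turn.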

Check out this NYT article on de-anonymizing Netflix user data: https://www.nytimes.com/2009/10/18/business/18stream.html

Or this more recent one that looked at an approach that could deanonymize 99.98% of Americans from almost any available dataset with as few as 15 attributes: https://www.nytimes.com/2019/07/23/health/data-privacy-protection.html

Or this paper about Massachusetts Governor William Weld's medical data:
https://fpf.org/wp-content/uploads/The-Re-identification-of-Governor-Welds-Medical-Information-Daniel-Barth-Jones.pdf

Or this one about 20 million web search queries from AOL, which the NYT was able to use to identify searchers: https://www.nytimes.com/2006/08/09/technology/09aol.html

DID Documents that expose properties about the presumed DID Subject, such as "type" will, by their nature, create a bifurcation at the DID layer. DIDs that have a "type" will come to be known as having non-person subjects. And thus, without receiving ANY other information than the DID and a DID Document, observers--aka crawlers, watchers, surveillors--can use that property to whittle down the possibility of who, in fact, a DID refers to.

The goal of herd privacy, which is already a part of the spec, advocates for a world where DID Documents for any two entities are indistinguishable without recourse to additional information.

So, ANYTHING that binds a DID to a specific Subject or class of Subjects violates herd privacy.

@kdenhartog sums up with:

With all of these responses in mind, I think the premise of this debate lies upon whether or not the usage of a DID to identify an information resource is acceptable and therefore whether or not we should keep the text that allows a DID method to declare a DID provides permanence through cryptographic binding for certain use cases.

Let's not let a desired feature of a few isolated methods undermine the opportunity to speak rigorously and clearly about what DIDs actually do.

Nothing in what I have said implies that DIDs can't be used to refer to an information resource. Of course they can. They can be used to refer to anything. However, the particular adoption by a specific method of a controversial application of the spec should not be taken as license to reduce the privacy and security of DIDs in general. Turning DIDs into content identifiers does just that. More importantly, the assurances that @kdenhartog and others are seeking only apply to a limited set of methods, which by their nature, are unsuitable for humans. Language like

Even if a permanent binding is desired, maintaining this binding is dependent
on the infrastructure required by the DID method.

is so off the mark as to be an invitation to invent DID methods that do, in fact, bind DIDs to specific individuals. In contrast, IMO, we should actively discourage that.

@msporny Also implied that providing some concrete spec-text might help.

Here's text that would be suitable for me:

DIDs are designed to persist beyond the life of any particular institution or organization. However, it is important to understand that they provide no innate means to bind a particular DID to a particular subject. Anyone can use any DID to refer to any Subject. It is incumbent on those using the DIDs to ascertain what inferences can be made based on the level of assurance needed.

The controller has the ability to control when and where such a DID is authenticated (they can elect to opt-out of such ceremonies), however, they cannot control what Issuers put into a Verifiable Credential, nor what information might be stored alongside a DID in a proprietary database.

Like all identifiers, DIDs are used to refer to specific Subjects. For example, the issuer and verifier of a Verifiable Credential with a DID for the "subject" property take the statements in the VC to be about the Subject intended by the Issuer.

Establishing that a given DID refers to a specific Subject for a particular credential typically uses a form of proof-of-control: once before issuance and then on every presentation. This mechanism provides a specific level of assurance that the current party has access to secret information that only the intended Subject is expected to have. As such, we can use DIDs to independently verify at least one factor of identity that

a. uses strong cryptographic means,
b. WITHOUT reliance on a trusted third party, and
c. WITH explicit opt-in (because the controller must actively participate in proof-of-control)
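As an illustration only, the challenge-response shape of such a proof-of-control can be sketched with a Lamport one-time signature built from SHA-256. This scheme is a stand-in chosen because it needs only the standard library; it is not what any particular DID method specifies, a Lamport key must not be reused for a second signature, and all function names here are hypothetical.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(H(a), H(b)) for a, b in private]
    return private, public


def bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret from each pair, selected by the message digest bits.
    return [private[i][bit] for i, bit in enumerate(bits(message))]


def verify(public, message: bytes, signature) -> bool:
    # Anyone holding the public key can check this; no third party is consulted.
    return len(signature) == 256 and all(
        H(sig) == public[i][bit]
        for (i, bit), sig in zip(enumerate(bits(message)), signature)
    )


private, public = keygen()            # controller publishes `public`
challenge = secrets.token_bytes(16)   # verifier picks a fresh nonce
response = sign(private, challenge)   # controller opts in by answering
ok = verify(public, challenge, response)
replayed = verify(public, challenge + b"x", response)  # response reused for another challenge
```

The check succeeds only for the challenge that was actually signed, and the controller can always decline to answer, which is the opt-in property in (c).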

DIDs used in this manner provide exceptional identity assurance in the form of an identifier that cannot be administratively denied, because there is no trusted third party involved in establishing proof of control.

Treating DIDs as permanently bound to one DID Subject for all time is understood to cause several privacy and security issues that should be considered.

Missing Guarantees

First, there is no way to guarantee that the Controller is using the DID for a single Subject. You could create a DID today for your cat and tomorrow decide to re-use that same DID for your dog.

There's nothing preventing this.

Second, there is no way to guarantee that the intended Subject does not evolve over time. What might begin as "The President of the United States" may come to refer to a specific president of the United States. This semantic drift is a fundamental part of language.

Third, there is no way to guarantee that the keys used in proof of control are in fact still under the control of the expected party. We can establish a specific level of assurance—but that assurance is dependent on the secrecy of the cryptographic material behind the proof mechanism. For high value or life-critical systems, additional mechanisms should be put in place to deal with the potential of compromised keys.

Fourth, there is no way to guarantee that the Subject has not been mischaracterized. The initial binding may be in error or otherwise no longer deemed appropriate.

Security Impacts

If you assume that a DID is permanently bound to a particular Subject, you are likely going to under-evaluate the edge cases that violate the above, assumed guarantees. A robust security architecture SHOULD anticipate the failings of these guarantees and provide mechanisms to respond appropriately.

Privacy Impacts

Thinking of the persistence of a DID as the permanent binding of a DID to a specific Subject creates a regulatory nightmare, dramatically increasing the likelihood that a given DID will, in fact, become public knowledge as referring to specific individuals and therefore be deemed personal data or "personally identifiable information". This causes problems both with "super cookies" and with herd privacy.

Super cookies emerge when DIDs unintentionally create a long-term tracking mechanism that can be correlated by anyone who happens to see the DID in use. In the worst case, if everything done online uses one, and only one, identifier, permanently bound to a physical body, then every single service would be able to collude to track my behavior across different websites. Similarly, the use of the same identifier over time invites the eventual correlation as data accumulates at different service providers. The best practice is to use identifiers for limited periods of time, in specific contexts, thus reducing the operational effect of context violations. The solution is not so much to enforce permanence, but rather to embrace the ephemerality of all identifiers: know the context, know the identifier: contexts that are limited in time and use, rather than permanently bound to a particular subject.
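
One way to picture "know the context, know the identifier" is a controller that derives a separate identifier for each context from a private secret, so that two services comparing notes see unrelated strings. The sketch below is only an illustration under that assumption: the `contextual_did` helper and the HMAC-based derivation are hypothetical, and real DID methods define their own creation procedures. The `did:example:` prefix is used purely as a placeholder method name.

```python
import hashlib
import hmac
import secrets


def contextual_did(master_secret: bytes, context: str) -> str:
    """Derive a per-context identifier (hypothetical scheme, for illustration only)."""
    tag = hmac.new(master_secret, context.encode("utf-8"), hashlib.sha256).hexdigest()
    return "did:example:" + tag[:32]


# The controller keeps one secret and presents a different identifier per context.
master_secret = secrets.token_bytes(32)
shop_did = contextual_did(master_secret, "shop.example")
clinic_did = contextual_did(master_secret, "clinic.example")
```

Without the secret, the shop and the clinic have no identifier in common to correlate on. The controller can still reproduce either identifier when it returns to the same context.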

Herd privacy (Section 10.5) works by ensuring that one can "get lost in the crowd". You can evaluate it mathematically by the number of people an individual can be confused with. If there are only 100 people in the world who satisfy a given constraint, any one of them is easier to re-identify than if there are a million. Identifiers that can be definitively linked with those 100 people fundamentally have less privacy than an otherwise equivalent set of identifiers which define a group of 1 million people.

This is how re-identification and de-anonymization work. One gathers available evidence to tease out correlatable data that allows further refinement of the set of individuals. Get enough correlatable data and you can often reduce that to a set of 1.
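
The narrowing process described above can be sketched in a few lines of Python. The population, attribute names, and the `anonymity_set` helper are all invented for the example. The point is only that each additional correlatable attribute shrinks the set of candidates an individual can hide among.

```python
from typing import Dict, List

Record = Dict[str, str]

# Entirely made-up population; each record is what an observer could learn.
population: List[Record] = [
    {"city": "Oslo", "job": "nurse", "key_type": "ed25519"},
    {"city": "Oslo", "job": "nurse", "key_type": "secp256k1"},
    {"city": "Oslo", "job": "teacher", "key_type": "ed25519"},
    {"city": "Lima", "job": "nurse", "key_type": "ed25519"},
    {"city": "Lima", "job": "teacher", "key_type": "secp256k1"},
    {"city": "Oslo", "job": "teacher", "key_type": "secp256k1"},
]


def anonymity_set(people: List[Record], observed: Record) -> List[Record]:
    """Return everyone who is consistent with the attributes observed so far."""
    return [p for p in people if all(p.get(k) == v for k, v in observed.items())]


# Each step adds one more piece of correlatable data.
observations: List[Record] = [
    {},
    {"city": "Oslo"},
    {"city": "Oslo", "job": "nurse"},
    {"city": "Oslo", "job": "nurse", "key_type": "ed25519"},
]
sizes = [len(anonymity_set(population, obs)) for obs in observations]
```

With this toy data the candidate set shrinks from 6 to 4 to 2 to 1. Even a property as mundane as a key type can be the final piece that singles someone out.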

DID Documents that express an innate binding to a specific subject will, by their nature, create a bifurcation at the DID layer. DIDs that have binding-specific data will come to be known as having non-person subjects (because binding to individuals is actively discouraged). Thus, without receiving ANY other information than the DID and a DID Document, observers--aka crawlers, watchers, surveillors--can use those distinct properties to whittle down the possibilities of who, in fact, a DID refers to.

Herd privacy advocates for a world where DID Documents for any two parties, whether they are people or things, are essentially indistinguishable, without recourse to additional information.

ANYTHING that binds a DID to a specific Subject or class of Subjects violates herd privacy.

This is certainly more text than I think we want in that section on persistence. However, my more concise feedback didn't provide enough detail, so here's a good chunk of language that can be dropped in or iterated on, as desired.

@agropper
Contributor

@jandrieu

I agree with your proposed text but I also seek clarity on a particular use case, especially useful in healthcare, where control of DIDs is delegated to a semi-autonomous agent.

In our example implementation, a 'guardian' controls a mobile wallet (with biometric lock). The guardian's wallet controls one or more 'personas' delegated to act on behalf of children, incompetent adults, or the guardian themselves. The personas are designed to be semi-autonomous and, for example, create and or use anonymous / same-domain credentials. These derived persona credentials may or may not have verifiable links back to the guardian.

It's clear how a DID would be used by the guardian. Do we need anything in particular to consider the case where a (persona) DID is derived from another (guardian) DID? See also the NIST paper on Derived PIV Credentials.

@kdenhartog
Member

kdenhartog commented Dec 1, 2020

Thanks @jandrieu for the long and drawn out response. I know you and I have discussed this topic for quite some time in the controller related issues. I did read through the entire thing and spent some time thinking about the points you make outside the text you provided. Some of it I agree with and some I don't, but going back and forth on that seemed like it would be a less direct route to us achieving consensus. In an effort to get closer to consensus I'll focus on the text you proposed rather than the response to my comments and the text above your proposed text. I think this will get us closer than continuing to seek alignment at the mental model layer, because I don't think we'll ever actually find complete alignment there, nor do I think we need to. Since this is the case, I suspect we may not end up with language that all of us are happy with, but rather something we as a group are willing to grit our teeth and accept.

Below I have tried to nuance the language so that it speaks specifically about real world entity subjects (your first proposal as stated by "DIDs cannot be permanently bound to a real-world entity"), but doesn't go so far as to assume it's generalized for all classes of DID Subjects like you argued for in the second follow-up response. This is something that I think we may be able to get consensus on or at least get closer to consensus with. I'd be curious where @talltree and @peacekeeper would land on my revisions to your text. Either of you want to modify this?

DIDs are designed to persist beyond the life of any particular institution or organization. However, it is important to understand that they provide no innate means to bind a particular DID to a real world entity. Anyone can use any DID to refer to any Subject whether it's a real world entity or an information resource. It is incumbent on those using the DIDs to ascertain what inferences can be made based on the level of assurance needed.

I've modified this to change "particular subject" to "real world entity". That was your original point in stating "DIDs cannot be permanently bound to a real-world entity.", but it doesn't stretch it so far as to generalize it for all DIDs in a way that isn't true about information resources.

The controller has the ability to control when and where such a DID is authenticated (they can elect to opt-out of such ceremonies), however, they cannot control what Issuers put into a Verifiable Credential, nor what information might be stored alongside a DID in a proprietary database.

Agree with this paragraph. Fine to leave as is.

Like all identifiers, DIDs are used to refer to specific Subjects. For example, the issuer and verifier of a Verifiable Credential with a DID for the "subject" property take the statements in the VC to be about the Subject intended by the Issuer.

I'm not sure we want to overly bind this section to VCs. I'd prefer to strike this paragraph or rewrite it in a more generalized way not explicitly linked to VCs.

Establishing that a given DID refers to a specific Subject typically requires a form of proof-of-control. This mechanism provides a specific level of assurance that the current party has access to secret information that only the intended Controller is expected to have.

DIDs used in this manner provide exceptional identity assurance in the form of an identifier that cannot be administratively denied, because there is no trusted third party involved in establishing proof of control.

I suggest striking this as well, as it's largely unrelated to persistence. I think it would better fit in a different portion of text about control.

Treating DIDs as permanently bound to one real world entity for all time is understood to cause several privacy and security issues that should be considered.

Missing Guarantees

First, there is no way to guarantee that the Controller is using the DID for a single real world entity. You could create a DID today for your cat and tomorrow decide to re-use that same DID for your dog.

Second, there is no way to guarantee that the intended real world entity does not evolve over time. What might begin as "The President of the United States" may come to refer to a specific president of the United States. This semantic drift is a fundamental part of language.

Third, there is no way to guarantee that the keys used in proof of control are in fact still under the control of the expected real world entity. We can establish a specific level of assurance—but that assurance is dependent on the secrecy of the cryptographic material behind the proof mechanism. For high value or life-critical systems, additional mechanisms should be put in place to deal with the potential of compromised keys.

Fourth, there is no way to guarantee that the real world entity has not been mischaracterized. The initial binding may be in error or otherwise no longer deemed appropriate.

I changed the usage of subject to real world entity in order to more precisely represent the original point you made. I'd be happy to keep one or two of these examples, but don't think all 4 are necessary.

Security Impacts

If you assume that a DID is permanently bound to a particular real-world entity, you are likely going to under-evaluate the edge cases that violate the above, assumed guarantees. A robust security architecture should anticipate the failings of these guarantees and provide mechanisms to respond appropriately.

Again same mention as above. Changed subject to real world entity. Also changed the normative statement to non-normative language because that's untestable.

Privacy Impacts

Thinking of the persistence of a DID as the permanent binding of a DID to a specific Subject creates a regulatory nightmare, dramatically increasing the likelihood that a given DID will, in fact, become public knowledge as referring to specific individuals and therefore be deemed personal data or "personally identifiable information". This causes problems both with "super cookies" and with herd privacy.

Super cookies emerge when DIDs unintentionally create a long-term tracking mechanism that can be correlated by anyone who happens to see the DID in use. In the worst case, if everything done online uses one, and only one, identifier, permanently bound to a physical body, then every single service would be able to collude to track my behavior across different websites. Similarly, the use of the same identifier over time invites the eventual correlation as data accumulates at different service providers. The best practice is to use identifiers for limited periods of time, in specific contexts, thus reducing the operational effect of context violations. The solution is not so much to enforce permanence, but rather to embrace the ephemerality of all identifiers: know the context, know the identifier: contexts that are limited in time and use, rather than permanently bound to a particular subject.

These two paragraphs above read like subjective experience and seem orthogonal to persistence. While I understand where you're going with this, it seems to stray a bit too far from what we'd want to put inside a specification. Perhaps focus the text more on the point around embracing ephemeral DIDs instead. As it stands I'd prefer to strike these two paragraphs, but would be willing to heavily modify them to focus more on the points at hand if you feel strongly about keeping these.

Herd privacy (Section 10.5) works by ensuring that one can "get lost in the crowd". You can evaluate it mathematically by the number of people an individual can be confused with. If there are only 100 people in the world who satisfy a given constraint, any one of them is easier to re-identify than if there are a million. Identifiers that can be definitively linked with those 100 people fundamentally have less privacy than an otherwise equivalent set of identifiers which define a group of 1 million people.

This is how re-identification and de-anonymization work. One gathers available evidence to tease out correlatable data that allows further refinement of the set of individuals. Get enough correlatable data and you can often reduce that to a set of 1.

While true and I'm aware of this, we're inevitably going to face unintended classifications of DIDs even just by public key types used combined with usage patterns in number of public keys. We have no enforcement over these. If you want to keep this section it should be reworded to address these concerns better and be moved into section 10.5 rather than keeping in this note in my opinion.

DID Documents that express an innate binding to a specific real world entity will, by their nature, create a bifurcation at the DID layer. DIDs that have binding-specific data will come to be known as having non-person subjects (because binding to individuals is actively discouraged). Thus, without receiving ANY other information than the DID and a DID Document, observers--aka crawlers, watchers, surveillors--can use those distinct properties to whittle down the possibilities of who, in fact, a DID refers to.

This reads more as an opinion about the level of conformance implementers should adopt than as anything to do with permanence. As stated above, due to the variety of publicKey types and number of keys (and serviceEndpoints etc.) available, we'll already be naturally bifurcating the DID layer, so I don't understand how this will further help implementers with their decisions around permanence of a DID. As such, I'd suggest striking this paragraph.

Herd privacy advocates for a world where DID Documents for any two parties, whether they are people or things, are essentially indistinguishable, without recourse to additional information.

ANYTHING that binds a DID to a specific Subject or class of Subjects violates herd privacy.

This could be moved to section 10.5 as well if you wish to keep it.

@brentzundel
Member

In my view, persistence is less about the persistence of the binding between a DID and a subject, and much more about the persistence of the DID itself, i.e., it cannot be taken away by some third party.
The language this PR adds which clarifies that the subject of a DID may change over time is good, but takes focus away from the main point, which is that the controller is in charge of determining how persistent the identifier is.

@msporny
Member

msporny commented Dec 6, 2020

Please retarget this PR to the main branch. You can do this by going to the top of this page, clicking the Edit button beside the title of the pull request, and using the drop-down menu beside the w3c:master target branch indicator for the pull request and changing it to main.

@talltree talltree changed the base branch from master to main December 11, 2020 00:30
Signed-off-by: Drummond Reed <[email protected]>
@shigeya
Contributor

shigeya commented Dec 22, 2020

On the same note on persistence, I raised issue #504. The text's current location is at 3.1 DID Syntax, and this text is not about the syntax. We want to relocate the text to a reasonable location, possibly to Sec.4. Data Model?

@talltree
Contributor Author

In the interests of closing on this PR, I will try to address several of the points made on this thread.

Point # 1
@jandrieu wrote:

Then let's remove language that talks about DIDs, in general, as being bound to their subject. The only methods that currently "bind" are exceptions rather than the rule.

I have opposed that language since it was first introduced, and I have stood up and attempted to clarify for the co-editors exactly how and why that persistence as specified is NOT a desirable goal, full stop.

Joe, on this point we just need to agree to disagree. The very first version of this specification written over four years ago said that one use of DIDs is to be able to serve as a cryptographically verifiable version of a URN (Uniform Resource Name). To quote from the URN spec (RFC 8141):

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that is assigned under the "urn" URI scheme and a particular URN namespace, with the intent that the URN will be a persistent, location-independent resource identifier.

This does not mean that all DIDs must function as URNs. But it does mean that some DIDs need to be able to function as URNs. That's why this PR says exactly what it does about persistence.

Point # 2
@jandrieu wrote:

Nothing in what I have said implies that DIDs can't be used to refer to an information resource. Of course they can. They can be used to refer to anything. However, the particular adoption by a specific method of a controversial application of the spec should not be taken as license to reduce the privacy and security of DIDs in general. Turning DIDs into content identifiers does just that.

This is a second area where we need to agree to disagree. I understand you have a particular point of view about DIDs for natural persons. I agree with many aspects of that POV. However, as you say, a DID can be used to refer to anything. Over the past year we have watched the universe of DID applications expand. Many of them involve non-human DID subjects. Some of them involve cryptographic binding to information resources. That usage of DIDs is consistent with and supported by the spec.

My overall point is that it does not make sense to constrain the use of all DIDs to conform to the requirements of DIDs whose subject is a natural person. Yes, indeed herd privacy is an important factor in the latter case. But if the DID subject is not a human, there are many cases where herd privacy does not apply. In fact for DIDs whose subject is an organization, the requirement is often just the opposite: strong correlation of publicly-verifiable DIDs is needed.

So herd privacy as a topic should be taken up in the Privacy Considerations section of the spec. I will revisit the text of this PR over the holiday break to revise it to further reinforce that persistence of DIDs whose subject is a natural person involves very important privacy considerations, and I will add links to the appropriate subsection(s) of the Privacy Considerations section.

@talltree
Contributor Author

On the same note on persistence, I raised issue #504. The text's current location is at 3.1 DID Syntax, and this text is not about the syntax. We want to relocate the text to a reasonable location, possibly to Sec.4. Data Model?

@shigeya While the text about persistence is not strictly about the syntax, it is about DIDs in general, and not about the data model. So perhaps it would be best to move it from section 3.1 (DID Syntax) to section 3 (Identifier)?

@agropper
Contributor

agropper commented Dec 23, 2020 via email

@TallTed
Member

TallTed commented Dec 28, 2020

@kdenhartog wrote --

I've modified this to change "particular subject" to "real world entity". That was your original point in stating "DIDs cannot be permanently bound to a real-world entity.", but it doesn't stretch it so far as to generalize it for all DIDs in a way that isn't true about information resources.

"Real world entity" is problematic, because, as you've acknowledged, DIDs may refer to anything -- concrete a/k/a "real world" or conceptual a/k/a "unreal world". Further, "real world entity" typically doesn't include information resources, which seem to be much of what you want DIDs to refer to, so I don't understand why you've suggested this change.

"Particular subject" avoids both of those, but may be seen as further overloading "subject", being somewhat circular in definition, and otherwise problematic (outside your arguments, even).

"Particular entity of interest" might be acceptable to all ... though I recognize that simply by saying that, I'm likely triggering an objection.

@TallTed
Member

TallTed commented Dec 28, 2020

@jandrieu @kdenhartog

The suggested text above has a few points where there are missing words, primarily negations like "not", the lack of which changes the meaning from what I'm pretty sure was meant to its direct opposite. I was not able to keep track of these while digesting the long thread, and will look carefully at the revised PR that I think will be coming (either by revision of this one, or creation of another), but it would be in your best interest to carefully re-review as well.

@shigeya
Contributor

shigeya commented Dec 29, 2020

@talltree

@shigeya While the text about persistence is not strictly about the syntax, it is about DIDs in general, and not about the data model. So perhaps it would be best to move it from section 3.1 (DID Syntax) to section 3 (Identifier)?

Moving the text to the identifier section is better than leaving it in the syntax section, but I think it depends on how the text ends up.
The current version of the text briefly discusses the relationship to the DID document, which goes beyond a discussion of the DID as an identifier.

(note: I'm in a slow work mode so responding slowly..)

Member

@kdenhartog kdenhartog left a comment

I'm good with these updates in combination with @jandrieu's PR on persistence as well

Contributor

@jandrieu jandrieu left a comment

@talltree This update addresses many of my concerns, and I proposed some language that would clear up the rest. Hopefully I managed to do that in a way that highlights your goals.

DIDs really aren't URNs. I think that's the wrong way to frame what I think you might be looking for. In particular, URNs by their most fundamental definition do not provide any location information about where to retrieve those resources.

However, DIDs can be used like super URNs, which is what I think you're getting at.

Hopefully this adjustment helps.

index.html Outdated
Comment on lines 4317 to 4321
If desired by a <a>DID controller</a>, a <a>DID</a> is capable of fulfilling
the functions of a Uniform Resource Name (URN) as defined by [[RFC8141]], i.e.,
"a persistent, location-independent resource identifier". A <a>DID controller</a>
who intends to use a <a>DID</a> for this purpose is advised to follow the
security considerations in [[RFC8141]]. In particular:
Member

@TallTed TallTed Jan 28, 2021

Since I can't suggest over a suggestion, but I can make a multi-line suggestion... This includes @jandrieu's suggestions for this paragraph plus turns his -- into &mdash; and <a>-wraps (I think, all) relevant terms—

Suggested change

Original:
If desired by a <a>DID controller</a>, a <a>DID</a> is capable of fulfilling
the functions of a Uniform Resource Name (URN) as defined by [[RFC8141]], i.e.,
"a persistent, location-independent resource identifier". A <a>DID controller</a>
who intends to use a <a>DID</a> for this purpose is advised to follow the
security considerations in [[RFC8141]]. In particular:

Suggested:
If desired by a <a>DID controller</a>, a <a>DID</a> is capable of acting
as an enhanced Uniform Resource Name (URN) as defined by [[RFC8141]], i.e.,
"a persistent, location-independent resource identifier". <a>DIDs</a> used
in this way provide a cryptographically secure, location-independent
identifier for a digital resource, while also providing the metadata that
allows retrieval. Because of the indirection between the <a>DID
document</a> and the <a>DID</a> itself, the <a>DID controller</a> can
adjust the actual location of the resource &mdash; or even provide the
resource directly &mdash; without adjusting the <a>DID</a>. <a>DIDs</a>
of this type can definitively verify that the resource retrieved is, in
fact, the resource identified.
</p>
<p>
A <a>DID controller</a> who intends to use a <a>DID</a> for this purpose is
advised to follow the security considerations in [[RFC8141]]. In particular:

Contributor Author

@TallTed I incorporated your suggestions. I realized my last push still had the bullets as a numbered list. I will fix that now.
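
For readers following the "enhanced URN" wording in the suggested text, here is a minimal Python sketch of the indirection it describes: the DID stays fixed while the controller changes where the resource lives, and the retrieved bytes are checked against a digest. Everything in it is an assumption made for illustration. The in-memory `registry` and `storage` dictionaries stand in for a DID method and the network, and the `location` and `sha256` fields are hypothetical names, not properties defined by the DID specification.

```python
import hashlib

RESOURCE = b"annual report 2020"
DID = "did:example:123456789abcdefghi"

# Stand-in for resolution: DID -> document-like record (hypothetical fields).
registry = {
    DID: {
        "id": DID,
        "location": "https://old.example/report",
        "sha256": hashlib.sha256(RESOURCE).hexdigest(),
    }
}

# Stand-in for the network: location -> bytes served there.
storage = {"https://old.example/report": RESOURCE}


def dereference(did: str) -> bytes:
    """Resolve the DID, fetch from the current location, and verify the digest."""
    record = registry[did]
    data = storage[record["location"]]
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError("retrieved resource does not match the identified resource")
    return data


def relocate(did: str, new_location: str) -> None:
    """Controller moves the resource; only the record changes, never the DID."""
    record = registry[did]
    storage[new_location] = storage.pop(record["location"])
    record["location"] = new_location
```

After `relocate(DID, "https://new.example/report")`, anyone holding the DID still retrieves the same bytes through `dereference(DID)`. A plain URN would offer persistence and location independence but no way to retrieve the resource.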

@talltree
Contributor Author

@jandrieu and @TallTed I am in agreement with all of your suggestions—they substantially improve the PR. I just need to uplevel my GitHub ninja skills to figure out how to process suggestions on suggestions. I'm consulting my GitHub mentor for assistance and will process ASAP.

@TallTed
Member

TallTed commented Jan 28, 2021

@talltree -- I think you can accept the suggestions from @jandrieu first, and then accept mine (which I think will show fewer changes against the "original", since mine include his which will have already been made). I think. :-)

@iherman
Member

iherman commented Jan 29, 2021

The issue was discussed in a meeting on 2021-01-28

  • no resolutions were taken
View the transcript

1. PRs on the Appendix

See GitHub pull requests #457, #460, #574.

Manu Sporny: Real quick, Drummond, just to check in on your PRs. For appendix and persistence and the URN thing. Can we get an update on those?

Drummond Reed: Lots of progress on one, not the other. Came down to the amount of time in the day. The PR on persistence... on our last call, I said "Hey, Joe, can I add something on URNs." He said he'd consider it.
… When I looked at his PR it was really about a different set of topics on persistence. In the end I felt that the points I wanted to be made about URNs, and so on should just be in a different section, which I called DIDs as URNs.
… Rather than shoehorn/appending it to Joe's section, I just made a new section. I revised it to just be a standalone section with a different name. Meanwhile, I reviewed Joe's PR and made one suggestion about formatting. That's my full report. Because I changed the substance of what my PR was suggesting, I just went back to the reviewers to ask to look at it again.
… I completed last night at midnight so maybe no reviews yet.

Manu Sporny: I'm hearing persistence PR is on a good glide path, good reviews, will probably go in. You've got another PR that needs review, and maybe there's alignment there with some of Joe's comments on it. Potential alignment there. There were a lot of change suggestions for the appending, but it seems no one is objecting anymore.

Drummond Reed: Everything after Joe had done the same thing and read it closely, he's got a particular way to characterize identification and in the end we agreed. There are like 20 small suggestions, I agree with all of them and just need time to incorporate them. The largest one is a formatting suggestion from Ivan and I just need to reformat to be a single section.
… Then I need to change the settings inside and do wordsmithing.
… My next highest priority is to get that done by the end of this weekend. My next assignment after that is either another section or adding more to a section on privacy regulations. I've prepared wording for that, short and sweet.
… Then I can put all my energy into review.
… That's my report.

Manu Sporny: Unless there are any objections, we can move onto concerns around the CBOR section. Anyone want to cover anything else first?

@iherman iherman mentioned this pull request Jan 29, 2021
talltree and others added 3 commits January 29, 2021 09:43
Co-authored-by: Ted Thibodeau Jr <[email protected]>
Co-authored-by: Ted Thibodeau Jr <[email protected]>
Member

@msporny msporny left a comment

Minor editorial suggestions to avoid normative language that is not testable.

talltree and others added 4 commits January 31, 2021 12:27
@talltree
Contributor Author

I have accepted all suggestions and done one more clean-up, so I believe this is ready to merge once the merge conflict is resolved (deferring to @msporny on that).

@talltree talltree changed the title DIDs as URNs DIDs as Enhanced URNs Jan 31, 2021
@msporny msporny merged commit 75b403b into w3c:main Jan 31, 2021
@msporny
Member

msporny commented Jan 31, 2021

Editorial, multiple reviews, changes requested and made, no objections, merging.

10 participants