Add type property #410

Closed
wants to merge 2 commits into from

rhiaro
Member

@rhiaro rhiaro commented Sep 21, 2020

Adds the type property to the DID subject section. Supersedes PR #348.


Review comment on index.html (outdated):
{
"@context": "https://www.w3.org/ns/did/v1",
"id": "did:example:21tDAKCERh95uGgKbJNHYp",
"type": ["https://schema.org/Person"],
Member

I'd prefer a potentially less contentious example, perhaps:

Suggested change:
- "type": ["https://schema.org/Person"],
+ "type": ["https://schema.org/Book"],

Contributor

@brentzundel While I appreciate that "Book" is much less controversial, it's also potentially a much lower-value use case. May I suggest "https://schema.org/Organization"?

Contributor

I think types are a great idea. I try to give them to all of my linked objects, as it helps to query by type.

If a DID is of type Person, and elsewhere we say <did:example:21tDAKCERh95uGgKbJNHYp> :created <Timestamp>, that leads to an ambiguity as to what was actually created, since the statement applies to both the person and the identifier.

Contributor

Can we change this to "inmate" or "terrorist"? ... I prefer that the worst-case scenarios be addressed up front :)

@melvincarvalho I believe "created" is DID document metadata and is not returned in the DID document itself, although it would be fine for "JSON-only" DID documents, which do not care about RDF / triples.

@jandrieu
Contributor

jandrieu commented Sep 21, 2020

This is an absolute privacy violation.

The DID Document, IMO, should never include any affirmative statements about the supposed nature of the Subject. It should include ONLY those properties necessary to establish secure interactions.

The DID Document architecture is designed to enable the robust establishment of root authority for secure interactions without reliance on any trusted third party.

Attributes of the kind expressed in this PR should be expressed separately, such as in a Verifiable Credential.

@agropper
Contributor

+1 @jandrieu

Although I know that many methods keep the DID Doc private, I hope we can keep the assumption that DID Documents are public and, as Miranda goes: "anything you say will be used against you."

@brentzundel
Member

What if the DID subject is a schema, or a book? What privacy implications does it present to identify that the DID Document may also be interpreted as a schema?

I completely agree that it could be a privacy violation, but disagree that in all cases it would be.
The real question before the group is whether the possibility of privacy violations should restrict the possibility of use cases that don't introduce them.

I sometimes wonder if our biggest error is the whole DID subject notion in the first place.
I don't believe most VDRs are interested in serving as a repository of public identifiers, but rather they wish to simply be a key lookup service.
Would it be terrible to say that the DID identifies a set of keys and endpoints controlled by some entity, and leave the idea of subject out of it entirely?

@jandrieu
Contributor

@brentzundel wrote:

Would it be terrible to say that the DID identifies a set of keys and endpoints controlled by some entity, and leave the idea of subject out of it entirely?

I think that would be awesome.

IMO, it is the adjustable proof-of-control without reliance on a trusted third party that defines DIDs. Everything else is optional.

To your previous question:

What if the DID subject is a schema, or a book? What privacy implications does it present to identify that the DID Document may also be interpreted as a schema?

I would argue that using a DID for a schema or a book, while possible, is not what DIDs are designed for, nor should we design for that use case. There are already URNs based on content hashes, which meet that need without all the fancy extra stuff in DIDs and DID Documents. Just like you can use a blockchain as an expensive database--but I wouldn't recommend it--you can use a DID as a URN for static content, but I wouldn't recommend it.

For example, why would you need a DID Document for a subject or a schema? What would be in it? Why would it mean anything when anyone can publish such a DID Document?

I don't think you need a DID Document for those subjects. You just need a content-based identifier. There is no need to look up key material, verification methods, or service endpoints. All of which would be meaningless without understanding WHO the controller is. Would it be appropriate for me to create a DID and DID Document for Harry Potter and the Sorcerer's Stone, then add some service endpoints that, conveniently, point to my website?

What DIDs do well--and they do it better than any other mechanism to date--is provide an adjustable, reliable way to prove control over an identifier without a trusted third party involved. Asymmetric keys also provide a way to prove control over an identifier without a trusted third party, but they aren't adjustable in the same way. Because the keys are just inputs to (and outputs of) mathematical functions, there is no built-in way to rotate or revoke: you need additional publish/subscribe mechanisms such as those used for PGP key servers. The indirection in the DID Document is something different: it provides a decentralized way to publish updates to authoritative key material without relying on a central authority. That's huge.
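To make the indirection concrete, here is a minimal sketch assuming a toy in-memory registry. The `Registry` class, the DID, and the key strings are hypothetical placeholders (no real DID method, no real cryptography, and no authorization check on updates). The point is only that rotating a key changes the document while the DID stays the same, so verifiers who resolve the DID always get the current key.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy stand-in for a verifiable data registry (hypothetical, no crypto)."""
    docs: dict[str, dict] = field(default_factory=dict)

    def publish(self, did: str, key: str) -> None:
        # Create or replace the DID document; the DID itself never changes.
        self.docs[did] = {
            "id": did,
            "verificationMethod": [{"publicKeyMultibase": key}],
        }

    def rotate(self, did: str, new_key: str) -> None:
        # A real DID method would require this update to be authorized by the
        # current controller; this sketch simply replaces the document.
        if did not in self.docs:
            raise KeyError(did)
        self.publish(did, new_key)

    def current_keys(self, did: str) -> list[str]:
        # Verifiers resolve the DID to learn which keys are authoritative now.
        return [vm["publicKeyMultibase"] for vm in self.docs[did]["verificationMethod"]]


registry = Registry()
did = "did:example:21tDAKCERh95uGgKbJNHYp"
registry.publish(did, "z-old-key")   # placeholder key strings
registry.rotate(did, "z-new-key")
print(registry.current_keys(did))    # ['z-new-key']
```

With a bare public key as the identifier, the same rotation would force a new identifier, which is the gap the comment attributes to PGP-style key servers.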

IMO, there's no value in a DID Document for a Book, although there might be value in a DID Document for the author of a book.

@talltree
Contributor

I would argue that using a DID for a schema or a book, while possible, is not what DIDs are designed for, nor should we design for that use case. There are already URNs based on content hashes, which meet that need without all the fancy extra stuff in DIDs and DID Documents.

@jandrieu Joe, I'm surprised this is coming up now. We have spent over 6 months discussing the need for this ever since @brentzundel raised the question in issue #199. On multiple DID WG calls, including one entire special topic call, we discussed the proposed representation property and reached consensus that we do not need it, because we can simply use the type property that's already so well defined in multiple vocabularies. This is also a key point covered in the proposed DID Appendices issue #373, which started on August 8.

Did you not participate in those calls?

I am also surprised to see the assertion that somehow DIDs are supposed to be just for people or organizations. Since the very first version of the DID spec four years ago, we have said that DIDs can be used for anything a URI can be used for. To quote RFC 3986:

This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

So then you ask:

For example, why would you need a DID Document for a subject or a schema? What would be in it? Why would it mean anything when anyone can publish such a DID Document?

I don't think you need a DID Document for those subjects. You just need a content-based identifier. There is no need to look up key material, verification methods, or service endpoints.

Although I could list a dozen reasons, let me simply cite the four key properties of a DID:

  1. Persistent (like a URN)
  2. Resolvable (like a URL)
  3. Cryptographically verifiable (like a content-based identifier)
  4. Decentralized (like a UUID or a content-based identifier)

Content-based identifiers have three of those four properties but not the second (resolvability). It turns out that being able to resolve a DID to get a verifiable digital object in the DID document is enormously valuable in the context of infrastructure like the ToIP stack. It allows DIDs to provide 100% of the interface at ToIP Layer 1. Neither URNs nor content-based identifiers can do that.

In terms of just the use cases we have for ToIP, this means DIDs can be used to identify, retrieve, and when needed (using the type property) represent all of the following:

  • People
  • Organizations of any kind
  • Devices, products, physical things of any kind
  • Schema definitions
  • Credential definitions
  • Revocation registries
  • KERI event logs
  • Human-readable governance frameworks
  • Machine-readable governance frameworks

In short, provided that DID documents support the type property, we can use them to represent any digital object that needs the four properties of a DID.

Lastly, DID documents are far more powerful than content-based identifiers because they can also bind the content to an author (or any other resource) in a way that content-based identifiers cannot.

@talltree
Contributor

Would it be terrible to say that the DID identifies a set of keys and endpoints controlled by some entity, and leave the idea of subject out of it entirely?

Yes, @brentzundel, that would be a terrible idea ;-)

Seriously, it would gut the primary goal of DIDs as an identifier. Please review the latest posts to #373 and see if you don't agree.

@talltree
Contributor

Although I know that many methods keep the DID Doc private, I hope we can keep the assumption that DID Documents are public and, as Miranda goes: "anything you say will be used against you."

Each time I see a statement like this, I have to point out that it is an incorrect assumption that DIDs and DID documents are only for people. They are URIs, and as it says in RFC 3986, URIs are for anything with identity.

While SOME of those entities are people, and we definitely want people to use DIDs and DID documents safely, a huge swath of usage of DIDs and DID documents is for organizations, products, and other entities that are not subject to GDPR and which have a crying need for public DID documents with public keys and public service endpoints.

I am currently working on four ToIP Layer 4 governance frameworks, one of which is national in scope and three of which are global in scope. Every one of them needs public DIDs for public institutions with public keys and public service endpoints. The only one I can mention publicly right now is GLEIF; you can see a presentation about their use cases here.

@agropper
Contributor

@talltree We seem to be talking past each other. I have no issue with DIDs for institutions, cats, or books but Self-Sovereign Identity likely applies only to people and maybe clubs or militias that live outside any clear governance structure.

To oversimplify, neither SSI nor DIDs have anything much to do with governance and everything to do with privacy and security through accountability of individuals (as in Zero-Trust Architecture). Our discussion should not feel like a zero-sum game between DIDs as URIs and DIDs as the foundation for SSI, but I'm not smart enough to propose a solution.

@talltree
Contributor

I have no issue with DIDs for institutions, cats, or books but Self-Sovereign Identity likely applies only to people and maybe clubs or militias that live outside any clear governance structure.

@agropper Two points:

  1. I personally don't define "SSI" as applying only to people, but as a decentralized identity model that implements a specific set of person-centric values, but which also applies to anything a person may interact with, i.e., organizations and things (either physical or logical).
  2. Regardless of how you want to define "SSI", DIDs are not just for SSI. DIDs are for any resource anywhere that needs permanent, resolvable, cryptographically verifiable, decentralized identification.

Contributor

@talltree talltree left a comment

Amy, I'm for pulling this in even if you don't update the example per my suggestion as we can always do that later and I think we benefit from getting this property added to the spec now.

@msporny
Member

msporny commented Sep 27, 2020

Merge conflicts need to be resolved.

@msporny
Member

msporny commented Sep 27, 2020

Need clarification from @jandrieu, @brentzundel, and @agropper on whether or not they're objecting to this PR being merged.

@agropper
Contributor

agropper commented Sep 27, 2020

From a privacy perspective, I see no benefit whatsoever to adding a type property, and I tend to agree with @jandrieu's logic.

I find the arguments by @talltree to be abstract. DIDs and DID methods are needed to help solve an otherwise difficult problem related to people as data subjects. The problems involve the privacy of individuals in the face of vast asymmetry relative to powerful institutions and corporations. Things and institutional entities do not have the same rights to self-determination that many societies ascribe to people. They are not, or should not be, self-sovereign.

DIDs do not guarantee an easy solution to the privacy and security issues that affect humans. Diluting the human-centered features, including privacy, of DIDs seems like a very risky proposition for no obvious gain. There are other ways to attach attributes to things and institutional entities.

@jandrieu
Contributor

@msporny I would.

IMO, the DID Document is for bootstrapping secure communications; I do not see it as an appropriate place for arbitrary assertions about any subject, whether they are the DID Subject or not.

We literally created an entire architectural layer to make arbitrary assertions. We should use it.

Conflating these layers will not improve the security or privacy of the system, and in many cases will reduce them. Since we know that developers will misuse and abuse any properties we give them, we should do what we can to keep the foot-guns locked away.

In particular, we should help developers avoid creating GDPR and CCPA compliance issues by reducing the explicit linkage between DIDs and any particular entity.

The DID & DID Document architecture is about proofs and proof mechanisms, not assertions and attributes. When used in a particular way (using 2-phase proof-of-control), with particular technologies (such as asymmetric cryptography), it is reasonable to treat a DID as identifying a subject.

HOWEVER, when DID Documents are used to say anything about anyone, then we are encouraging building global directories without appropriate privacy controls.

@talltree
Contributor

HOWEVER, when DID Documents are used to say anything about anyone, then we are encouraging building global directories without appropriate privacy controls.

I think we are going to need a special topic call to talk about this worldview issue. We just went through that whole debate with service endpoints. Now it appears we are having it again about type.

Please please please do not think I am ever advocating that we should not have "appropriate privacy controls" in the spec. I'm a living breathing Privacy by Design Ambassador. I have already written and will continue to author major portions of the Privacy Considerations section of the spec.

That said, I cannot emphasize strongly enough that DIDs are not just for people. The spec has NEVER said that. In fact from Day One it has always said they are for anything with identity. Organizations. Physical things. Logical things. Please see this post I made earlier in this thread for more about that whole subject.

In the vast majority of cases, those non-person entities do not need to be private. In fact, from a trust establishment standpoint, it benefits everyone when the DIDs and DID docs for those entities are public. The irony is that arguing against that actually works against the DID spec helping protect people's privacy.

Why? Because giving public DIDs and DID docs to the organizations, physical things, and logical things that people need to connect with means those people can:

  • Easily form persistent, secure, private connections with those publicly-identified entities.
  • Easily access the schemas, credential definitions, revocation registries, and other cryptographic metadata they need to have VCs issued and verified over those private channels.
  • Easily use ToIP Layer 3 data exchange protocols to have private exchanges of data under the person's full control.
  • Easily access and verify machine-readable governance frameworks that the person's agent can process to help protect the person's privacy.

@agropper
Contributor

We may be approaching consensus that, for clear privacy reasons, there will be one or more service endpoints with a normative abstract data model -- #382 (comment)

What if one of the optional service endpoints was 'Public'?

Consider the IoT case where the thing is a secure client with a manufacturer's certificate. The DID would be the (same-domain) identifier, the 'Public' service endpoint information would be the type 'thing' and the manufacturer's certificate.

In another case, a DID that refers to a self-sovereign doctor could use 'Public' to publish a medical credential or maybe just their National Provider Identifier (NPI). The doctor's cell number or calendar would be posted behind service endpoints for 'Mediator' or 'Authorization' respectively. The mediator would hide the phone number from public view and the scope of calendar operations available to a requesting party would be controlled by an authorization server.
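A sketch of how the doctor example might look, assuming the 'Public', 'Mediator', and 'Authorization' service types proposed above (they are not defined by any specification). The DID, URLs, NPI digits, and phone number are made-up placeholders; only the `id`/`type`/`serviceEndpoint` shape follows the DID Core service model.

```python
from __future__ import annotations

import json
from typing import Optional

# Hypothetical DID document for the self-sovereign doctor example.
# 'Public', 'Mediator', and 'Authorization' are the service types proposed in
# the comment, not types from any spec; all URLs and the NPI are placeholders.
doctor_doc = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:doctor123",
    "service": [
        {"id": "did:example:doctor123#public", "type": "Public",
         "serviceEndpoint": "https://public.example/npi/0000000000"},
        {"id": "did:example:doctor123#mediator", "type": "Mediator",
         "serviceEndpoint": "https://mediator.example/inbox"},
        {"id": "did:example:doctor123#authz", "type": "Authorization",
         "serviceEndpoint": "https://authz.example/calendar"},
    ],
}

# Held by the mediator, never written into the public DID document.
private_phone = "+1-555-0100"


def endpoint_for(doc: dict, service_type: str) -> Optional[str]:
    """Return the first endpoint of the given service type, or None."""
    for svc in doc.get("service", []):
        if svc["type"] == service_type:
            return svc["serviceEndpoint"]
    return None


print(endpoint_for(doctor_doc, "Mediator"))  # https://mediator.example/inbox
```

A caller who wants to reach the doctor gets only the mediator's address; the phone number itself never appears in the published document.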

@kdenhartog
Member

I'm not overly opinionated on this topic, but I do see the benefits of both sides of the arguments presented so far, and unfortunately I don't have a compromise to offer yet either.

One thing I'm taking into consideration that I've not seen anyone else mention yet is that excluding the type property seems likely to end up strongly correlating the identifier to a Person if Person is the only type that gets excluded. Is that something we also need to consider if we decide that excluding the type property is correct for people, but it ends up being used for everything else?

Noting that my consideration is very much a "what-if" scenario that's hard to work around, but if we can consider it I think it would be beneficial.
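A toy illustration of the correlation concern, assuming a hypothetical set of resolved DID documents in which every non-person entity publishes a `type` and person DIDs omit it. The short type names and DIDs are placeholders; the naive `guess_persons` function shows how the absence of `type` alone can single out people.

```python
from __future__ import annotations

# Hypothetical resolved DID documents. Under the assumed convention, only
# person DIDs omit `type`, so the omission itself becomes a signal.
docs = [
    {"id": "did:example:org1", "type": ["Organization"]},
    {"id": "did:example:schema1", "type": ["Schema"]},
    {"id": "did:example:device1", "type": ["Device"]},
    {"id": "did:example:alice"},
    {"id": "did:example:bob"},
]


def guess_persons(documents: list[dict]) -> list[str]:
    """Naive correlator: any DID document without a type is presumed to be a person."""
    return [d["id"] for d in documents if not d.get("type")]


print(guess_persons(docs))  # ['did:example:alice', 'did:example:bob']
```

The heuristic is crude, but it needs no access to anything beyond the public documents, which is what makes exclusion-by-omission leaky.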

@talltree
Contributor

talltree commented Sep 28, 2020

Just to clarify in case anyone on this thread is confused, the use of the type property is OPTIONAL.

Speaking as one WG member with numerous use cases for this property, right now 100% of these uses cases involve public DIDs and DID docs for either public institutions or logical objects (such as schemas, credential definitions, etc.) that need to be fully public. Right now, I don't have a single use case that would involve needing a type property in a DID document where the subject is a person. (I'm not saying such use cases don't exist, just that I don't have any.)

@msporny
Member

msporny commented Sep 28, 2020

One thing I'm taking into consideration that I've not seen anyone else mention yet is that excluding the type property seems likely to end up strongly correlating the identifier to a Person if Person is the only type that gets excluded. Is that something we also need to consider if we decide that excluding the type property is correct for people, but it ends up being used for everything else?

I was just thinking the same thing @kdenhartog ... good point.

A compromise would be what I expect the compromise for services to be: allow it, but put giant warnings all over it stating that expressing things like Person, Organization, etc. is better done via other mechanisms (like private data exchanges), and leave it up to DID Methods to specify how Person/Organization can be expressed safely in their specific DID Method. So: deny all types by default, and suggest that DID Methods can create allow-lists for certain types.
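A minimal sketch of the "deny all types by default, method allow-lists" idea, assuming a hypothetical per-method table. No DID method is known to publish such a list; the table contents and the function names `did_method` and `check_types` are illustrative only.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical per-method allow-lists. Any method not listed allows no types.
METHOD_TYPE_ALLOW_LISTS: dict[str, frozenset[str]] = {
    "example": frozenset({"https://schema.org/Organization"}),
}


def did_method(did: str) -> str:
    """Extract the method name from a DID of the form did:<method>:<id>."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did":
        raise ValueError(f"not a DID: {did}")
    return parts[1]


def check_types(did: str, types: Iterable[str]) -> list[str]:
    """Return the types that are NOT permitted (deny-all unless allow-listed)."""
    allowed = METHOD_TYPE_ALLOW_LISTS.get(did_method(did), frozenset())
    return [t for t in types if t not in allowed]


print(check_types("did:example:123", ["https://schema.org/Person"]))
# ['https://schema.org/Person']
```

An empty result means every declared type is allowed for that method; anything returned would be rejected, which keeps the default safe while letting a method opt in to specific types.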

@dhh1128
Contributor

dhh1128 commented Sep 28, 2020

I get the architectural attractiveness of Joe's worldview that DID documents are just for describing the cryptographic control over the identifier. In software architecture, it is generally wise to limit the responsibilities of each component to something crisp and simple. Making the DID doc just about enabling proofs of control would seem to follow that wisdom.

However, I think the assertion that VCs are a layer for communicating all other metadata is making the same put-too-many-responsibilities-on-a-single-layer mistake in the other direction. VCs are wonderfully useful, but they are not a catch-all for random communication of any kind of metadata. The "V" in VCs reminds us that their purpose is to convey trust in a carefully imagined way; they are not a general information publishing mechanism, and they are not intended to be consumed in the same way that other metadata is.

It may be helpful here to think by analogy. The DNS system is somewhat like the system we're trying to create for DIDs. (Yes, I know there are important differences. Hang with me.) Squinting to suppress details, the raison d'etre for DNS is to resolve hostnames to IP addresses. However, are we losing something important when we squint this way? CNAME records turn out not to be the only useful artifact in DNS. We have A records and MX records and TXT records that all leverage the same resolution mechanism to convey additional metadata -- and that are all consumed in the same way, by the same types of code.

We also have X.509 certs to convey trust. But we don't imagine that we should shift the content of A records and MX records and TXT records into certs. Why? Is this just because the designers of DNS were dumb about architectural principles, and didn't see the wisdom of making everything a cert unless it focused purely on hostname --> IP resolution?

I think not.

It turns out that certs are consumed at a different level of abstraction, and with a different goal, from DNS data. The software components that are going to consume an IP address are also the software components that need to know how to route email to a domain, and are also the software components that need to handle hostname aliases, etc. These layers don't need to deal in certs and CA chains, and their central concern is not verification.

It also turns out that the people and processes that maintain CNAME records tend to be the same ones that maintain A records and TXT records -- and they tend NOT to be the people and processes that issue certs. Different expertise, tools, and constraints. It makes sense for DNS, not certs, to carry the metadata in these alternate record types.

(You could argue that DNS has made bad choices here; that the division between certs and DNS is what allows various vulnerabilities, and the world would be more secure if everything were cert-backed except CNAMEs. But I disagree; the flaws in DNS derive from its centralization, not its unwillingness to put everything in a cert. We are talking about one of the most phenomenally successful and ubiquitous technologies on the planet...)

Anyway, I suggested the analogy because I think it applies here. The architectural level that resolves a DID to control information is the same level that needs to know where to send messages to a DID (service endpoints). This is because a DID controller needs to prove they've spoken (keys), but also needs to specify how they listen (endpoints). And that same architectural level is what needs to know other metadata that doesn't have privacy constraints but that the DID controller deems central to the exercise of their control. Saying, "offload everything else into a VC" is misunderstanding VCs as a general information conveyance mechanism, and is oversimplifying DID control as only a key control mechanism. Using a DID to identify a pharmaceutical that a company is developing IS better than using a UUID as the identifier, because the company can demonstrate control over the DID, and can guarantee that everybody in the world resolves it the same way. And if the company wants that resolution to also convey to the world that the identifier in question is used to track a pharmaceutical, is that really an architectural perversion? Or is it just good sense, because the systems that consume the resolved data aren't interested in crawling a public registry of pharmaceutical VCs to learn the same info?

In allowing a DID doc to contain other types of metadata, I admit that we are opening up the doc to be polluted/abused. It's the camel's nose in the tent; pretty soon someone will try to dump a genome in a DID doc, and we should all be rightfully annoyed at that prospect. But these will be minor exceptions, won't they? And, in keeping with crisp responsibilities, isn't protection against that abuse the job of whatever persistence mechanism is used by the DID method in question, rather than being the job of our spec? This risk of abuse is inherent in almost any architecture. We hear about URLs that contain 4k or 10k of data, and HTTP POSTs of 50 GB of data, and DNS TXT records that contain more than they should. Won't good sense and interop pressures eventually distill best practice, making this risk of little practical consequence? And won't some non-normative advice in the spec help?

@jandrieu
Contributor

I get the architectural attractiveness of Joe's worldview that DID documents are just for describing the cryptographic control over the identifier. In software architecture, it is generally wise to limit the responsibilities of each component to something crisp and simple. Making the DID doc just about enabling proofs of control would seem to follow that wisdom.

However, I think the assertion that VCs are a layer for communicating all other metadata is making the same put-too-many-responsibilities-on-a-single-layer mistake in the other direction.

I did not make that assertion, nor do I believe anyone else has. In the example I gave Adrian with the sequence diagram, there was no VC involved at all. The relationship to the service endpoint was fully captured in the zCap. The DID Document was only used to verify signatures.

@dhh1128 continues:

It may be helpful here to think by analogy. The DNS system is somewhat like the system we're trying to create for DIDs. (Yes, I know there are important differences. Hang with me.) Squinting to suppress details, the raison d'etre for DNS is to resolve hostnames to IP addresses. However, are we losing something important when we squint this way? CNAME records turn out not to be the only useful artifact in DNS. We have A records and MX records and TXT records that all leverage the same resolution mechanism to convey additional metadata -- and that are all consumed in the same way, by the same types of code.
...
Anyway, I suggested the analogy because I think it applies here. The architectural level that resolves a DID to control information is the same level that needs to know where to send messages to a DID (service endpoints).

This is exactly where I disagree. You do NOT need to know how to send messages to me to prove that I control a given cryptographic identifier.

Period.

The ability to get a DID implies at least an initiating communication. So, yes, if you want to interact with me, you need a channel, but that channel not only doesn't need to be in the DID Document, the initial communication of the DID itself will ALWAYS be outside of the DID Document. The only case where the DID is NOT communicated outside the DID Document is when data aggregators scrape up all the DID Documents in a registry: an anti-use case that we SHOULD not support.

Similarly, just because you have an identifier for me doesn't mean you NEED a means to communicate with me. Just because you have my name doesn't mean it's appropriate to call me or email me or visit my home.

This is the whole point of privacy: the freedom to interact WITHOUT revealing unnecessary information that one may or may not want to reveal.

It is that higher interaction layer of directories, resource delivery, and service endpoints that risks breaking the privacy model of DIDs: namely that you can prove control over an identifier without reliance on a trusted third party AND without unnecessarily revealing unintended information.

Pointing to examples where some entities have information they want to publish has nothing to do with what DIDs need to do, in fact, to be DIDs. Both @talltree and @csuwildcat have made this argument, but it is irrelevant to the impact that DIDs are going to have on individuals--which is where the privacy compliance issues create a legitimate existential threat.

If DIDs are deemed PII by courts in the EU, all ledger based DIDs risk becoming illegal for individuals in Europe. Is that what people are arguing for? Build DIDs for corporations and stop worrying about people? That's a non-starter for me.

IMO, DIDs aren't here to solve the directory and storage / distribution problem. They are here to provide a decentralized root of trust for anyone, even those who have higher privacy requirements than corporations.

In allowing a DID doc to contain other types of metadata, I admit that we are opening up the doc to be polluted/abused. It's the camel's nose in the tent; pretty soon someone will try to dump a genome in a DID doc, and we should all be rightfully annoyed at that prospect. But these will be minor exceptions, won't they? And, in keeping with crisp responsibilities, isn't protection against that abuse the job of whatever persistence mechanism is used by the DID method in question, rather than being the job of our spec? This risk of abuse is inherent in almost any architecture. We hear about URLs that contain 4k or 10k of data, and HTTP POSTs of 50 GB of data, and DNS TXT records that contain more than they should. Won't good sense and interop pressures eventually distill best practice, making this risk of little practical consequence? And won't some non-normative advice in the spec help?

These are great examples of exactly why we should not include these abuses as justification for design decisions. YES, people will abuse whatever you allow. YES, you could establish an entire web-like linked set of resources using DNS TXT records. But if you did so, you would be abusing the system and using it in unintended ways with unintended consequences. You would have unforeseen privacy problems and you would have absolutely horrible quality of service controls. It makes no sense to champion hackable work-arounds as design goals.

So, let's get the DID part of DIDs right: let's make it easy, secure, and reliable for anyone to prove cryptographic control over an identifier without reliance on a trusted third party and without revealing any unnecessary information.

We can solve the directory and data storage problems in future specs.

In contrast, encouraging controllers and/or DID Methods to assert arbitrary attestations in DID Documents WILL lead to privacy harms. Full stop.

@brentzundel
Member

Everyone, please consider this thread locked and refrain from commenting on it until the chairs open it again.

@brentzundel
Member

[wearing my chair hat] We are now unlocking this thread.

@w3c w3c unlocked this conversation Oct 5, 2020
@brentzundel
Member

brentzundel commented Oct 5, 2020

[removing my chair hat]
I believe it would meet my use case (as stated in #199) if the type property was understood the same way it is in the VC Spec.
That is, type should be informative about the DID Document, not about the DID Subject.

@brentzundel
Member

[putting my chair hat back on] The chairs also discussed with @jandrieu his concerns about process and introducing new features and believe that those concerns have been resolved. Joe, please feel free to correct this assertion, or to add more information as you prefer.

@ChristopherA
Contributor

On Mon, Oct 5, 2020 at 2:31 PM Brent Zundel wrote:

I believe it would meet my use case (as stated in #199) if the type property was understood the same way it is in the VC Spec. That is, type should be informative about the DID Document, not about the DID Subject.

I'm slightly more comfortable with that, with the caveat that if this becomes another way to coerce subject information I'll reject it, i.e.

"type": ["DIDDocument", "SomethingUniqueAboutThisKindOfDIDDocumentLikeOffersCapabilities"]

and not:

"type": ["DIDDocument", "PersonTypeOfDIDDocument"]

— Christopher Allen

@peacekeeper
Contributor

That is, type should be informative about the DID Document, not about the DID Subject.

This is not how properties in the DID document work. Properties describe the DID subject, not the DID document. If you want to say something about the DID document, that should go into DID document metadata.

This is pretty fundamental to how we ended up designing the DID document's abstract data model (ADM), and metadata structures.
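To make the distinction concrete, here is a minimal Python sketch that assumes the three-part DID resolution result (`didResolutionMetadata`, `didDocument`, `didDocumentMetadata`). The `type` value placed in `didDocumentMetadata` is hypothetical; it only illustrates where a statement about the document, rather than about the subject, would live under this model.

```python
from typing import Any

# Hypothetical resolution result; the metadata "type" value is illustrative only.
resolution_result: dict[str, Any] = {
    "didResolutionMetadata": {},
    "didDocument": {
        "@context": "https://www.w3.org/ns/did/v1",
        "id": "did:example:123456789abcdefghi",
        "authentication": ["did:example:123456789abcdefghi#keys-1"],
    },
    "didDocumentMetadata": {"type": ["ExampleDocumentType"]},
}


def statements_about_subject(result: dict[str, Any]) -> dict[str, Any]:
    """Properties of the DID document describe the DID subject (minus @context/id)."""
    doc = result["didDocument"]
    return {k: v for k, v in doc.items() if k not in ("@context", "id")}


def statements_about_document(result: dict[str, Any]) -> dict[str, Any]:
    """Statements about the DID document itself are carried in its metadata."""
    return dict(result["didDocumentMetadata"])


print(sorted(statements_about_subject(resolution_result)))  # ['authentication']
print(statements_about_document(resolution_result))         # {'type': ['ExampleDocumentType']}
```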

@brentzundel
Member

That is, type should be informative about the DID Document, not about the DID Subject.

This is not how properties in the DID document work. Properties describe the DID subject, not the DID document. If you want to say something about the DID document, that should go into DID document metadata.

This is pretty fundamental to how we ended up designing the DID document's abstract data model (ADM), and metadata structures.

This makes sense to me. Would it fit best here?

@OR13
Contributor

OR13 commented Oct 5, 2020

If the type is not about the DID subject, it does not belong in the DID document. I am assuming this is about adding type to the DID document (and therefore about describing "types" of subjects).

Pretty sure we can't restrict the domain of RDF types to only politically acceptable values.

There is nothing stopping someone from making:

  • https://example.com/ns/Sniper -> ["DIDDocument", "Sniper"]
  • https://example.com/ns/IllegalCombatant -> ["DIDDocument", "IllegalCombatant"]
  • https://example.com/ns/DailyKillsCredential -> ["VerifiableCredential", "DailyKillsCredential"]

Then issuing credentials associated with number of combatants killed per day.

This is not a privacy / political thing... it's a data modeling thing... and if we are saying that type is treated like https://www.w3.org/TR/vc-data-model/#types

I expect we are inviting RDF extensibility features to be used, which is a useful, powerful feature, and with great power comes great responsibility.....

I don't see a problem with warning people clearly about type and service and defining them formally.

TL;DR: I'm in favor of adding type to the DID document, but I would be careful to warn people about using it... similar to using services.

@jandrieu
Contributor

jandrieu commented Oct 5, 2020

I like the "type" semantics as described by @brentzundel

I see how that approach addresses the use case of "representation" and is a natural evolution of the original issue #199 and PR #348. As such, there are no process issues involved.

I agree with @ChristopherA that, from a privacy perspective, it isn't foolproof. You could still coerce a document type into an assertion about the individual, but I doubt we could avoid all of the ways someone could abuse the system by embedding different fields. For example, it would be easy to have a verification method that leaks group membership because only members of a particular group can get the secret mechanism that such a proof requires. We can't plug all the holes.

The most important thing it does for me is shift from assertions about the Subject to assertions about the DID Document, which is especially useful when you're trying to figure out the right way to interpret this digital object which may combine a DID Document with other schema types. That makes perfect sense and aligns with how we used type in VCs.

I have neither process nor privacy concerns with the definition of the term "type" as @brentzundel proposed. As long as we avoid semantics of saying the type is about the Subject, I'm good.

@iherman
Member

iherman commented Oct 6, 2020

I believe it would meet my use case (as stated in #199) if the type property was understood the same way it is in the VC Spec.
That is, type should be informative about the DID Document, not about the DID Subject.

This then touches on a different discussion addressed in several other issues, namely on the usage of id to identify the subject. Indeed, I believe what you propose contradicts the semantics of JSON-LD/RDF as used today for DID.

At this moment, the consensus seems to be (see for example the discussion on the Appendix spearheaded by @talltree in #373) that statements in the DID Document are on the subject. The fact that we use the id property for the subject, i.e., the JSON-LD @id keyword, is in line with this. However, this statement is then valid for the type property (i.e., the JSON-LD @type keyword), too: these two are interrelated in the JSON-LD semantics (i.e., its RDF representation). If we want to say that type refers to the DID Document and not the subject, then id cannot identify the subject, it must identify the DID Document.
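A simplified Python sketch of that coupling, assuming the DID context aliases `id` to `@id` and `type` to `@type`. It is not a JSON-LD processor (terms are not expanded to IRIs and nested objects are ignored); it only shows that, in the flat shape, whatever `type` says is stated about the node that `id` names, i.e. the DID subject.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def node_to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Map one flat node object to (subject, predicate, object) triples.

    Simplification: "id" names the node and "type" becomes rdf:type;
    other keys are kept as short terms instead of being expanded to IRIs.
    """
    subject = node.get("id", "_:b0")  # no id means a blank node
    triples = []
    for key, value in node.items():
        if key in ("@context", "id"):
            continue
        predicate = RDF_TYPE if key == "type" else key
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, predicate, str(v)))
    return triples


flat_doc = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123456789abcdefghi",
    "type": "WhateverTypeWeWantToUse",
    "authentication": "did:example:123456789abcdefghi#keys-1",
}

# Both triples share the DID as their subject: "type" describes the DID subject.
for triple in node_to_triples(flat_doc):
    print(triple)
```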

This is possible, but requires changes. I did raise the alternative (in #401) to introduce a distinct subject property (we can bike shed on the property name) to identify the DID subject explicitly. As a consequence the id property would clearly identify the DID document and not the DID subject. (The id could actually be missing in practice, i.e., the DID Document itself would be identified through a blank node in RDF.) The change looks superficial, but it is a significant change in terms of the Linked Data representation of the DID concepts. (@talltree will be having fun reshaping #373 :-)

We may have to look at the definition of all other properties to check that we don't hit a different semantic problem. Personally, I do not expect so, but one may never know without checking.

See #413, #401, #404, #373 for related issues...

@iherman
Member

iherman commented Oct 6, 2020

Just an addition to the previous comment. If we want to keep to the notion that all properties that we have defined so far are to be understood as defined for the DID subject (see the comment of @peacekeeper in #410 (comment)), but we also want to keep the approach to type by @brentzundel in #410 (comment), we can make it work in JSON-LD through something like:

{
    "@context": "https://www.w3.org/ns/did/v1",
    "type" : "WhateverTypeWeWantToUse",
    "subject" : {
        "id" : "did:example:123456789abcdefghi",
        "authentication" : {
            ...
        },
        ...
    }
}

This keeps the semantics described by @peacekeeper but keeps the freedom of assigning a type to the DID document. Note that, even in this case, we cannot avoid somebody using type on the object of the subject property, but we can avoid talking about it in the normative part of the spec (and mention the problems in a separate section).

I am not saying we should or shouldn't do this, I put this forward simply as a technical alternative. It is a bit more convoluted than what we have, but it may work.
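A simplified Python sketch of the wrapper shape, assuming `id` and `type` are aliased to `@id` and `@type` and skipping IRI expansion. `subject` is the placeholder property name from the comment above; because the outer object has no `id`, it becomes a blank node, so `type` lands on the document node while `authentication` lands on the DID subject.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(node: dict, counter=None) -> tuple[str, list[tuple[str, str, str]]]:
    """Return (node label, triples); nested objects become their own nodes."""
    counter = counter if counter is not None else count()
    # A node without "id" gets a fresh blank-node label.
    label = node.get("id") or f"_:b{next(counter)}"
    triples = []
    for key, value in node.items():
        if key in ("@context", "id"):
            continue
        predicate = RDF_TYPE if key == "type" else key
        for v in value if isinstance(value, list) else [value]:
            if isinstance(v, dict):
                child, child_triples = to_triples(v, counter)
                triples.append((label, predicate, child))
                triples.extend(child_triples)
            else:
                triples.append((label, predicate, str(v)))
    return label, triples


wrapped_doc = {
    "@context": "https://www.w3.org/ns/did/v1",
    "type": "WhateverTypeWeWantToUse",
    "subject": {
        "id": "did:example:123456789abcdefghi",
        "authentication": "did:example:123456789abcdefghi#keys-1",
    },
}

doc_node, triples = to_triples(wrapped_doc)
for triple in triples:
    print(triple)
```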

@peacekeeper
Contributor

Thanks @iherman, I think it's useful to be reminded of this! Just for reference, this was also one of the options that was discussed in the metadata thread (e.g. see my comment #65 (comment) and your reply #65 (comment)).

@msporny
Member

msporny commented Oct 6, 2020

I like the "type" semantics as described by @brentzundel

If we go this route, we might as well use a Verifiable Credential to express a DID Document. This was raised as an option years ago and the consensus of the group was to not do that. As a result, we've created a new way of expressing metadata, a new abstract data model, a new registry process, etc...

I agree that just using a VC from day one would have been easier and avoided all of these issues. So, the question now is if there is consensus to go that direction. Here's what we could end up with:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/ns/did/v1"
  ],
  "type": ["VerifiableCredential", "DidDocument"],
  "credentialSubject": {
    "id": "did:example:123456789abcdefghi",
    "authentication": [{
      "id": "did:example:123456789abcdefghi#keys-1",
      "type": "Ed25519VerificationKey2018",
      "controller": "did:example:123456789abcdefghi",
      "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
    }],
    "service": [{
      "id": "did:example:123456789abcdefghi#vcs",
      "type": "VerifiableCredentialService",
      "serviceEndpoint": "https://example.com/vc/"
    }]
  }
}

Example of a schema

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/ns/did/v1",
    "https://www.example.org/schema-ledger/v1"
  ],
  "type": ["VerifiableCredential", "CryptographicSchema"],
  "credentialSubject": {
    "id": "did:example:schema:zyx9876",
    "credentialSchema": { ... }
  }
}

We would move all metadata into the top-level object, including type and be done. I expect the JSON-only worldview would object to this approach.
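A minimal Python sketch of how a consumer might read the VC-shaped form, assuming the structure of the first example above (trimmed here). Like that example, it omits `issuer`, `issuanceDate`, and `proof`, which a conforming verifiable credential would also carry; the helper name `split_vc_wrapped` is hypothetical.

```python
def split_vc_wrapped(vc: dict) -> tuple[list[str], dict]:
    """Return (top-level types, DID subject properties) for a VC-shaped DID document."""
    types = vc.get("type", [])
    if not isinstance(types, list):
        types = [types]
    # Statements about the DID subject stay inside credentialSubject.
    return types, dict(vc["credentialSubject"])


vc_did_doc = {
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/ns/did/v1",
    ],
    "type": ["VerifiableCredential", "DidDocument"],
    "credentialSubject": {
        "id": "did:example:123456789abcdefghi",
        "authentication": ["did:example:123456789abcdefghi#keys-1"],
    },
}

doc_types, subject = split_vc_wrapped(vc_did_doc)
print(doc_types)      # ['VerifiableCredential', 'DidDocument']
print(subject["id"])  # did:example:123456789abcdefghi
```

In this shape the top-level `type` describes the credential object, and the DID subject carries no `type` unless someone adds one inside `credentialSubject`.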

If we decide to go this direction, there will have to be a massive refactoring of the DID Core specification -- we will almost certainly miss the CR deadline (possibly by a lot depending on how much push back there is on changes to metadata/resolution/dereferencing).

The other path could be that we move type into metadata, which will not address @jandrieu's concerns about things like Person being associated with a DID Document. I expect a number of -1s on that path because it's quite non-standard wrt. everything we're working with... it's yet another non-standard way to express type information.

And finally, the other compromise is to use type associated with the DID Subject, because we are talking about the DID Subject, not metadata about the DID Document / resolution process... the DID Subject is a schema in some cases.

At this point, I don't see a clear set of proposals that could get to consensus... we're going to need a special topic call for this.

@OR13
Contributor

OR13 commented Oct 6, 2020

PROPOSAL: Add type as a DID Document property (a property of a DID Subject)
PROPOSAL: Add type as DID Document Meta Data
PROPOSAL: Make DID Documents Verifiable Credentials

At this point, I could take any of these, though I am pretty sure 3 would not succeed...

It would be nice to see the ambiguous use of "credential" for both attestations and key material be unified by the did spec...

The more I think about this, the more correct option 3 looks, although I expect it's politically nearly impossible to achieve.

@msporny
Member

msporny commented Oct 6, 2020

PROPOSAL: Add type as a DID Document property (a property of a DID Subject)

This one is going to fail because of the arguments that @jandrieu made above.

PROPOSAL: Add type as DID Document Meta Data

This one doesn't change the concern that @jandrieu has above and makes the typing system unnecessarily vague and/or complicated.

PROPOSAL: Make DID Documents Verifiable Credentials

This one is going to fail from the JSON-only worldview standpoint.

I expect that none of them will reach consensus and the first one will have the least number of -1s. We can run the proposals and see where we end up.

@jandrieu
Contributor

jandrieu commented Oct 6, 2020

FWIW, if "type" is about the DID Document as proposed by @brentzundel, then adding it to meta-data would be fine. It would also make it much clearer that the "type" is about the DID Document and not about the Subject.

I agree that this would be a different approach than basically the entire rest of JSON-LD, communicating the type information in a separate document, but it fixes the issue of type wrt privacy.

FWIW, I would also support DID Documents just being VCs, although I understand the process issues involved. That would also solve a number of issues about provenance, providing an assurance that all of the DID Document itself is from the Controller rather than generated by the DID Method.

@dhh1128
Contributor

dhh1128 commented Oct 6, 2020

I am strongly opposed to turning DID documents into VCs, and will feel compelled to actively campaign against any standard that heads that direction.

@OR13
Contributor

OR13 commented Oct 6, 2020

@dhh1128 care to take a position on adding "type" for DID Subjects vs DID Document Meta Data?

@brentzundel
Member

During the WG call today, we reached the following:
RESOLVED: Document how dangerous specifying properties such as the "type" of a DID subject, that are not related to expressing cryptographic material/verification methods related to using the DID, in a VDR can be for entities such as people.

We also had broad agreement that more conversation was needed.
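As one illustration of the kind of guidance the resolution calls for, here is a hypothetical lint check in Python. The allowlist is an assumption chosen for this sketch (identifier, controller, and verification-relationship properties), not a normative list; anything outside it, such as `type` or `service`, is flagged for review.

```python
# Assumed allowlist: properties that identify the subject or express
# verification material. Illustrative only, not normative.
VERIFICATION_RELATED = {
    "@context", "id", "controller", "verificationMethod",
    "authentication", "assertionMethod", "keyAgreement",
    "capabilityInvocation", "capabilityDelegation",
}


def flag_non_verification_properties(did_document: dict) -> list[str]:
    """Return top-level properties that may reveal information about the DID subject."""
    return sorted(k for k in did_document if k not in VERIFICATION_RELATED)


example = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:21tDAKCERh95uGgKbJNHYp",
    "type": ["https://schema.org/Person"],
    "authentication": ["did:example:21tDAKCERh95uGgKbJNHYp#keys-1"],
}

for prop in flag_non_verification_properties(example):
    print(f"warning: '{prop}' is not verification material; review privacy impact")
```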

@brentzundel brentzundel added the needs special call Needs a special topic call to make progress label Oct 6, 2020
@OR13
Contributor

OR13 commented Oct 20, 2020

Another question related to type in did:web, w3c-ccg/did-method-web#16 (comment)

@peacekeeper
Contributor

I will add one perspective to this thread which is probably not very popular. If you add various properties such as "type" to your DID document, then yes, that increases the risk of correlation and surveillance and can lead to a lot of bad things. And that's why we have wallets and agents and secure data stores, etc.

BUT: Those are additional components which require additional software and architectures and processes than just DID Resolution. If you have properties such as "type" or even VCs directly in your DID document, then the DID Resolution process will often have guarantees (cryptographic verifiability, immutability, persistence) that you don't have if you have to rely on service endpoints or other external components. Of course this also depends on the exact DID method and service endpoints you are using.

All I'm saying is that there may be a (very small, admittedly) set of use cases, where the benefits and guarantees of DID Resolution can outweigh the privacy risks, and therefore justify including additional data in the DID document.

@msporny msporny added the do not merge Do not merge - waiting on resolution to issue label Oct 26, 2020
@jonnycrunch
Contributor

So, one of my biggest issues with this is that @type would be implied through JSON-LD rather than explicitly defined in the ADM. The problem is the following:

Using JSON-LD and the @context, we have effectively accepted mandatory JSON-LD processing by introducing an implicit property into the Abstract Data Model that was never explicitly defined. While I understand the convenience of knowing that the DID subject is an IoT device and NOT a person, I am too concerned about the reverse ... that a DID subject is a person and could be more explicitly defined as a Jew or [insert pet name], which opens the door to despotism. To quote the DID spec: "The process of binding a DID to something in the real world, such as a person or a company, for example with credentials with the same subject as that DID, is out of scope for this specification. For more information, see the [VC-DATA-MODEL] instead." Proof of personhood could easily be accomplished in the VC/VP spec, i.e.:

```
{ "credentialSubject": { "@type": "person" } } ...
```
This keeps a clear delineation of the layer at which assertions are made, since ultimately @type is an assertion about the DID subject, which could come from the solipsistic view (self-signed) or from an issuer.

@OR13
Contributor

OR13 commented Oct 28, 2020

In a JSON-only representation, you can add any property like "kind", "type", "ssn", "biometricTemplate", and it will be preserved by the rules of the abstract data model.... So from a privacy perspective, while we are arguing over open world data modeling issues with RDF, the non-RDF DID Document representations support an even more permissive stance....
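A small Python sketch of the contrast, under stated assumptions: `ssn` stands in for any sensitive extension property, and `expand_like` is a hypothetical stand-in for JSON-LD expansion with a context that defines only a few terms and no `@vocab`. Real JSON-LD expansion does drop terms that have no IRI mapping, but involves much more than this filter.

```python
import json

doc = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123456789abcdefghi",
    "ssn": "000-00-0000",  # hypothetical sensitive extension property
}

# JSON-only consumer: a parse/serialize round trip keeps every member.
roundtrip = json.loads(json.dumps(doc))

# Hypothetical set of terms defined by a context without @vocab.
DEFINED_TERMS = {"id", "controller", "verificationMethod", "authentication", "service"}


def expand_like(document: dict) -> dict:
    """Mimic JSON-LD expansion dropping terms that have no mapping (simplified)."""
    return {k: v for k, v in document.items() if k in DEFINED_TERMS}


print("ssn" in roundtrip)         # True
print("ssn" in expand_like(doc))  # False
```

This is not a privacy guarantee for JSON-LD either: an author can define a term for `ssn` in their own context, and it will then survive expansion.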

The abstract data model has made even the simplest conversations regarding privacy more complex, because it's not just JSON-LD privacy engineering we must consider... now we must consider JSON, CBOR, JSON-LD, and maybe in the future XML, YAML, PDF, Protocol Buffers.... each with their own privacy issues, and each with their own production and consumption rules, which might yield privacy issues.

Every DID Document representation we define increases the attack surface for both privacy and security issues related to DIDs. Each representation's parsers have different exploits, which imply different sanitization, and different "meta data fields" which might leak privacy-related information, such as timing or filesystem details.

In DID Core we are saying "we care deeply about privacy and security" and "we want to enable unbounded did document representations"... both cannot be true without admitting that privacy and security are less important than the preferences of did method authors.

I don't think the privacy minded folks understand what the abstract data model and "many representations objective" has opened us up to.

@dmitrizagidulin

@OR13

The abstract data model has made even the simplest conversations regarding privacy more complex

+1, very much agree.

@rhiaro
Member Author

rhiaro commented Nov 16, 2020

There has been no consensus to add a type property for DID docs. Language about danger of type-like properties has been added. Nothing remains to do on this PR.

@rhiaro rhiaro closed this Nov 16, 2020
@msporny msporny deleted the rhiaro-type branch November 24, 2020 15:05
Labels
do not merge Do not merge - waiting on resolution to issue needs special call Needs a special topic call to make progress