Does DID Document metadata belong in the Document? #65
A DID Document is a graph of information. That information is primarily about the DID subject. If we want to make statements about the graph itself, those statements do not belong in that very graph. There may perhaps be exceptions we can make for things like |
This issue was discussed in a meeting.
Brent Zundel: #65
Markus Sabadello: this issue is about the question of having data in the DID document some of which is about the subject and some of which is about the DID document itself … there were ideas to maybe remove some properties like created or updated … or proof property. That’s where this discussion started … we thought created, updated, proof is it about the subject or the DID document itself … dmitriz wrote a really good summary … The question that we need to agree on is are we okay with data that is sometimes about the subject and sometimes about the document … or do we want to separate that somehow … I think there’s a third category which is data about the resolution process, some metadata may be added about that in the result … but the primary question is are we fine having data about the did subject like services and public keys … as well as about the document like proof … and if we’re not, do the subject and the document have separate identifiers?
… we spent a lot of time discussing that … we felt we are comfortable with combining that, they don’t need separate identifiers … the same identifier for the subject and the did document
Manu Sporny: #27
Manu Sporny: #28
Manu Sporny: I want to point out that we have two PRs pending, 27 and 28, when I put those PRs in there my assumption was that the created and updated were being used to describe metadata about the DID document … but after putting it in I can see how people thought they were about the DID itself, the identifier … this is really a metadata discussion, if created and updated are truly about the identifier itself and not metadata about the DID document then it’s fine to keep them in there … but if it is metadata about the DID document I feel strongly we should take it out, we shouldn’t be conflating those two things … we need to decide whether or not it’s okay to use the same identifier to kind of sort of refer to two different things
Ivan Herman: +1 to manu
Manu Sporny: that is a huge red flag in the linked data space, your semantics get really messy … similarly you do not need to have an identifier for everything … you can do autogenerated identifiers, that’s a common thing, we use it in VCs … we could have metadata about the DID document that’s outside of the DID document itself, much cleaner separation
Daniel Burnett: The DID document is not the resource.
It is an explicit representation of access mechanisms (to use the HTTP URI analogy)
Manu Sporny: if we come to that philosophy it’s much easier for us to determine if a particular item is in or outside of the DID document … I thought the original issue was about metadata about the DID document, interested to see if anyone hears differently
Jonathan Holt: I thought these were for convenience, and if you wanted to find the original source of truth you spin up a resolver or your own node and verify the assertions being made in the DID document
Manu Sporny: I’m hearing Jonathan say “issued” and “created” are about the DID Document.
Jonathan Holt: my interpretation was they were self asserted related to creation of the DID document, and are there for convenience … what markus mentioned for identifiers, the keys ed25519, hiding keys.. was that what you were talking about as far as the subject identifier? you have to have a self asserted key identifier in the DID document that’s only about its own keys? … or are we having this conceptual framework of referring to delegate keys or controller keys? … what are the semantics we are working with?
Markus Sabadello: we’re not talking about identifiers for keys, we’re talking about whether the DID is an identifier for the subject, that’s where we ended up after a few months of httpRange-14, or is the DID the identifier for the document, or is it both? … I think the community thinks it should be both … but understand that’s ambiguous from a linked data perspective
Jonathan Holt: the subject is the identifier around the DID document, not a human subject?
Markus Sabadello: the DID subject is the person, org, thing, whatever, resource, identified by the DID
Daniel Burnett: “The DID subject is the subject of the DID.” <- Official definition :)
Dmitri Zagidulin: interpretation about created, updated, my summary in issue 65, I was interpreting them to be metadata about the DID document … I’m not sure it makes sense to have metadata about the DID because it doesn’t apply separately to the DID document
Markus Sabadello: +1 to dmitriz that created, updated are metadata about the DID document
Dmitri Zagidulin: On the issue of does metadata about the document belong there. On which grounds is manu objecting? … I laid out several arguments that i’ve seen you make in the various issues against it, which are relevant and how do you feel about the counterpoints?
Joe Andrieu: I’ve flipflopped on this issue … one aha for me right now is the definitive way to find out if a given DID document is the correct DID document for a given DID is to execute the resolution process
Daniel Burnett: markus_sabadello, does created not apply to BTCR DIDs where DID documents are generated rather than stored?
Joe Andrieu: If that’s correct, there’s not necessarily a baked in way for a document to demonstrate on its own as a set of bytes that it’s the authoritative one, I think whatever metadata you need to verify the process needs to be in the DID document … otherwise that separation feels a little false to me
Manu Sporny: there is a certain subset of things I’m strongly objecting to, and that’s the conflation of any kind of semantics … it’s not clear to me what the group thinks issued, attributed, means yet … one thing that might be helpful, there are two categories of information we are talking about … information about the identifier itself, the DID string, and whatever it may identify … and then information about the DID document itself … those are two distinct categories that I think we should keep distinct … if we conflate them there’s nasty stuff that can happen … that’s where my concern comes from … Let’s say that we say that updated is the time the identifier was updated. Semantically that’s meaningless. I know the identifier was updated but it doesn’t tell me anything more than that … whereas if the DID document was updated, there’s a change the resolver can check, that’s about the document itself not the identifier.
The semantics are very different
Dmitri Zagidulin: nobody is proposing that it would be about the identifier
Manu Sporny: i’m not convinced … I think some people are and some people aren’t, and some people don’t understand what conflating those two things does to the entire data model … You may not be proposing that and I think other people might be, we need to get down to the definition of what created and updated really means to people, and then see if those definitions are the problem
Dmitri Zagidulin: the topic of this issue is does metadata about the document belong in the document … that’s a separate httpRange-14 discussion … nobody is conflating, just discussing whether data about the document belongs in the document
Ted Thibodeau Jr: “How do we identify the identifier which identifies an entity?”
Dmitri Zagidulin: Having metadata about the DID document in the document allows portability … it allows for standardizing of that metadata among mutable DID methods that don’t have underlying ledger mechanisms
Markus Sabadello: manu is saying sometimes we’re talking about metadata about the identifier, I don’t think that makes much sense … it always identifies something, and the data is about that resource … we can’t have data about the identifier, we can only have data about the thing being identified … with data about the subject, data about the DID document … I agree it’s better to separate them, even though conflating was the outcome of a few months of discussion of httpRange-14, makes more sense to keep separate … agree with dmitriz that the metadata about the document inside the DID document is the issue. inside the DID document we need a separate object or level of JSON-LD structure … on one level describe the document, on one about the subject
Dave Longley: when we’re talking about updating or applying an update to a DID document, eg.
adding a key, we’re really updating the subject
Daniel Burnett: yes, explicitly marking any meta data as such by placing it in a separate subtree in the DID doc would at least make clear that it is different
Dave Longley: The predicates in a DID document are things like authorization, which the subject, some aspect of a person or some thing, and when you add a key you say this person authorizes this key for some purpose … that’s the statement you’re making … if you make that kind of update you’re updating information about the subject, not the document … these update times that are metadata might actually be information about the subject not the DID document
Manu Sporny: I agree with dlongley, that’s the point I’m attempting to make.
Dave Longley: dmitri also brought up portability, we’re talking about porting information about the subject, not the document … the information inside the DID document is about the DID subject. That’s what you’d want to port … I think we disagree less than we think because a lot of these things we’re talking about are really just more information about the DID subject … manu was talking about the identifier, I think he really meant information about the subject not the DID, we’re not changing DIDs, that doesn’t make any sense … a lot of the disagreements go away because we’re not talking about metadata that happens to live on some registry somewhere, we’re talking about the subject
Joe Andrieu: manu, you conflated the identifier with the subject. A lot of people have been responding in confusion because of that. I don’t think anyone is talking about putting information about the subject in a DID, that would be a privacy antipattern … we have a did that’s a string, we don’t need metadata about that … The subject..
the DID document is how you get from the DID to secure interaction with that subject … We need to be much more careful about the language we use here, it’s confusing us, going to be more confusing for others … we have this weird issue of the definitive DID document is not a string of bytes anywhere, it’s the output of a resolution process … to understand if it’s definitive, whatever metadata we use, needs to be part of the DID document
Daniel Burnett: I wanted to bump up a level here … the mental model that led to where we are … as long as we can keep that mental model we’ll be fine … what joe said matches what manu said … we wanted our use of DIDs as URIs to work similarly to the way other URIs work … such as http URIs … if you look at the definition that we always refer to of a URI there is a resolution process and a dereferencing process … the resolution process is where you discover what the access method and operation methods with the resource are, including any kinds of authn approaches that are necessary … we’re different from http - we put a lot of that information that is part of the resolution process in the DID document … we’re getting confused by making the DID document something more magical than it is intended to be … which is a representation about how you access and update the resource … it’s not the access to the resource itself, it is the things you can do with the resource and how you can authenticate yourself for that … That may help.
We may still decide that there is information that is not about the resource itself but we still may put it inside the DID document … joe is correct that conceptually the resource access methods all of this exists even for DID methods that do not explicitly store a representation of the DID document … the DID document can be generated if necessary, not have to live at a location somewhere
Ivan Herman: from a linked data / semantic web point of view … with JSON-LD for the did document, that means we define in a particular syntax a bunch of RDF triples and if I can imagine a linked data environment which includes lots of triples, includes the triples in the did document … according to the JSON-LD and RDF, there are triples, and all what I see in the DID document. The triple consists of subject, predicate, object, and the subject is a DID URL … that’s what happens in RDF
Manu Sporny: yep, to Ivan.
Ivan Herman: none of those triples have to say anything about the DID document itself because the DID document is just a collection of triples
Manu Sporny: exactly, Ivan.
Ivan Herman: if we want to say something about the DID document, we need another subject that identifies it, in order to play properly with the linked data world … if you link it to any other process that wants to use these identifiers, we have to be careful because you will get wrong triples … triples that say things you don’t want … and someone may use those triples to deduce things that semweb technologies can deduce, you will get wrong statements, you cannot mix these two up
Brent Zundel: we have 9 mins left
Dmitri Zagidulin: to draw a parallel with the VC data model … we had the same discussion about the created metadata, and there we have two separate sections, subgraphs … one about the credential and the other about the credential subject … we label it … we standardized the created timestamp for the verifiable credential
Dave Longley: +1 to ivan, the DID Document is a graph/dataset with triples about the DID subject in it
Dmitri Zagidulin: this is the same thing that’s being proposed for the DID document … we standardize it for the DID document, not to the person or org … if we need to have a separate linked data section so that the triples don’t get confused, that’s fine, let’s talk about that
Ivan Herman: +1 to dmitriz
Dmitri Zagidulin: but I want to re-emphasize the need for storing the data about the document not the subject in the document itself
Joe Andrieu: the conversation isn’t about triples. it’s about quads. about statements about statements.
Dmitri Zagidulin: the counter that manu seems to be proposing is we let each did method standardize their own. That doesn’t seem right
Manu Sporny: dmitriz that is absolutely not what I’m suggesting … I think there’s some miscommunication … we need some concrete examples
Dmitri Zagidulin: let’s take ‘created’ as a concrete example.
Manu Sporny: The thing that you raised is spot on - in VC we had two subgraphs, one for the credential, the other for the credential subject … this is the exact same thing … the issue is that.. what we need is put some concrete examples and ways we could address this problem … we can use created and issued as examples … that would help people see how the philosophy applies to an actual concrete solution … we only need two examples, there are two ways we can go … that’s what we need for the next time we discuss this
Ivan Herman: +1 to manu, we need specific examples
Manu Sporny: people can see what’s being proposed
Joe Andrieu: +1 for specific examples … The thing is we’re not talking about triples, we’re talking about quads
Kenneth Ebert: I like the examples, too
Joe Andrieu: I’m not familiar enough with JSON-LD spaghetti, methods for representing quads
Brent Zundel: +1 for examples
Joe Andrieu: we’re talking about the context in which the triples are stated
Dave Longley: {id:
Joe Andrieu: we need to make statements about that context … we need to be able to in the DID document say something about the DID document … metadata about the resolution is part of proof … why do we believe this? here’s some metadata about the process to increase your confidence that this is legitimate … What needs to be in there we should figure out at the DID document level, not at the DID resolution level
Markus Sabadello: +1 to keep the triples/quads clean and separate. Strictly speaking we would need a separate identifier for the DID document
Daniel Burnett: +1 dlongley
Manu Sporny: you don’t need to give the DID Document a separate identifier… can be a blank node… works just fine.
Markus Sabadello: the problem with that which we’ve discussed before for a few months, if we give the DID document a separate identifier we ran into problems defining the dereferencing process with URLs, especially if the DID URL has a fragment … the way you dereference a fragment is you first deref the primary resource, without the fragment. The result has a mime type and dereferencing the fragment depends on the mime type
Ivan Herman: +1 to markus_sabadello
Markus Sabadello: if it’s an identifier for the subject, we can’t dereference it because it’s a real world resource and doesn’t have a mime type … I like what dmitri said, parallel with VC, separate sections about the document and the subject
Dave Longley: a DID Document itself is much more ephemeral – you generally don’t “talk about it”, except perhaps to make statements in a resolution process
Brent Zundel: we had a recommendation to present real world examples so we can have something more concrete to discuss about … The issue, 65 is assigned to markus
Manu Sporny: {resolution_things… didDocument: {did document things}}
Brent Zundel: markus, comfortable working to arrange some concrete examples?
Markus Sabadello: I can come up with some examples
Manu Sporny: {metadata_about_did_document… didDocument: {did_document_stuff}}
Daniel Burnett: yes, dlongley, this is what I meant by giving a DID document more reality than it should have, which is a physical representation of resolution info
Dave Longley: I think it helps to think of the DID Document as a graph … for which we generally don’t give an identifier
Ted Thibodeau Jr: DID document … is { .ttl owl:sameAs .jsonld owl:sameAs .rdfxml }? Can you speak of one serialization? Or only of all?
Ted Thibodeau Jr: It can be important to track when info about a subject was changed, as well as when the subject changed, as well as when the info about the subject was logged (which may be different from when it changes)…
Ted Thibodeau Jr: VERY complex! |
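The envelope shape sketched in the transcript (`{metadata_about_did_document… didDocument: {did_document_stuff}}`) could look roughly like the following. This is only an illustrative Python sketch, not an agreed structure: the key name `didDocumentMetadata`, the DID `did:example:123`, the service entry, and the timestamps are all assumptions made up for the example.

```python
import json

# Hypothetical DID, used only for illustration.
DID = "did:example:123"

# Statements used to interact with the DID subject (keys, services).
did_document = {
    "id": DID,
    "authentication": [{
        "id": f"{DID}#key-1",
        "type": "Ed25519VerificationKey2018",
        "controller": DID,
    }],
    "service": [{
        "id": f"{DID}#agent",
        "type": "ExampleService",  # hypothetical service type
        "serviceEndpoint": "https://example.com/agent",
    }],
}


def wrap(document, created, updated):
    """Return an envelope that keeps document metadata outside the DID document.

    The key name "didDocumentMetadata" is an assumption for this sketch.
    """
    return {
        "didDocumentMetadata": {"created": created, "updated": updated},
        "didDocument": document,
    }


result = wrap(did_document, "2019-10-01T00:00:00Z", "2019-10-08T00:00:00Z")
print(json.dumps(result, indent=2))
```

With this layout, `created` and `updated` never appear as properties of the DID itself, so they cannot be read as statements about the DID subject.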
@dlongley I believe that statement is fundamentally incorrect.
The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the DID subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble. It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document. Statements about Subjects don't belong in DID Documents. If we don't toe that line, we are inviting a privacy nightmare with this work. |
Yes, this information is about the subject. That there are risks is not a reason to break the model, IMO.
I think it would be confusing to create a new model here (both mentally and technically) -- i.e, "public information about a subject is not about the subject, but private information is". The issue isn't with whether or not the information is about the subject. It's about public, discoverable information vs. private information. What we need to do is provide clear guidance on what should be said where. This is no different from talking about people in general and I suspect moving away from that will only create more confusion. I think it is better to draw on what people already know about public vs. private to help avoid trouble rather than try to obscure it away with a special model.
Privacy is always going to be a consideration no matter what we do. We have to be clear and upfront about what kind of information should be in a DID Document that is publicly available or on a blockchain, for example. And, yes, no private information should ever be there. |
@jandrieu @dlongley chiming in again with my Semantic Web hat on; maybe this is one of those cases when the RDF terminology does help. (It does help me, but I am biased by my background.) If I look at the DID document, then I only see triples like
I.e., strictly speaking, we are making statements about the DID (URI). The RDF Semantics doesn't require anything more about the DID URI and what it "denotes" (in our case about the relationship between the DID URI and the DID Subject). It says:
(Emphasis is mine). In other words: the only thing the DID document contains are statements about the DID as a URI, and any relationship between the DID and the DID subject is defined "outside" of the DID document. You guys tell me exactly where. Does this help? |
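The triple view described here, together with the blank-node suggestion from the meeting, can be sketched in plain Python. This is a toy model under stated assumptions, not RDF tooling: the DID `did:example:123`, the predicate names, the blank-node label `_:didDocument`, and the timestamp are all hypothetical.

```python
# Hypothetical DID, used only for illustration.
DID = "did:example:123"

# Triples as (subject, predicate, object). Every statement that comes
# from the DID document has the DID as its subject.
triples = [
    (DID, "authentication", f"{DID}#key-1"),
    (DID, "service", f"{DID}#agent"),
]

subjects = {s for s, _, _ in triples}

# A statement about the document needs a different subject. A blank-node
# label (an assumption here) avoids minting a second dereferenceable identifier.
DOC = "_:didDocument"
doc_triples = [(DOC, "created", "2019-10-01T00:00:00Z")]

graph = triples + doc_triples


def statements_about(subject, graph):
    """Return the (predicate, object) pairs whose subject matches."""
    return [(p, o) for s, p, o in graph if s == subject]


print(statements_about(DID, graph))
print(statements_about(DOC, graph))
```

Because `created` hangs off `_:didDocument` and not off the DID, a consumer merging this graph with other data would not conclude that the DID, or whatever it denotes, was "created" at that time.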
@dlongley The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/ MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document. What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28:
You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject". Asserting the broader statement will lead to inappropriate information included in DID Documents rather than expressing them through other secure or verifiable mechanisms, like VCs. This would directly undermine the separation of concerns that underlies the entire framework of VCs and DIDs and the idea of decentralized identity as we--as a community--have been working on for years. If we don't make the distinction about what goes in a DID Document clearly, early, and consistently, we will be enabling massive global tracking systems such as that proposed by GADI http://didalliance.org/. |
@iherman I think you have the gist of it, with one clarification. The statements are not about the DID-URI, but rather about how you use the DID. The distinction between DID-URIs and DIDs is an unfortunate one, but the DID Document can't know the full DID-URI that might be ultimately dereferenced. All the statements are relative to the DID. This makes for some delicate nuance between a DID-URI (whose ABNF is in the spec) and a DID as a URI, both of which might be referred to as a DID URI. |
In my view, this is in support of not drawing some artificial line at the data modeling layer between public and private. The data is about the subject -- the only question is about whether it is appropriate to express certain pieces of information in places where anyone can read them.
I don't think the terms are meaningless -- though they can get sticky to pin down, violating expectations. I think we'll find a similar problem with other approaches, too, as I mention below.
Yes, but you could say that "how you interact with the Subject did:123" is you "must call him by the name Joe Andrieu". Similarly, you could say "how you interact with Subject did:123" is you use endpoint "https://my-website.com/my-SSN/my-other-private-info/foo". Perhaps we'll end up debating the semantics of "secure" instead. Who knows? But I'm sure a nearly unbounded set of examples like this can be used to violate expectations here as well. None of this changes (or should change) that we have a graph data model that expresses information about subjects. Again, this is a debate about what should be expressed and where. You may have argued that "private" and "public" are semantically meaningless, but they clearly get across some meaning, even in this conversation. I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve. I also don't think we should shy away from terms that are more commonly understood; they get us closer to where we want to be and help establish the very expectations we worry may be violated. Perhaps it would be simpler and better to talk about the information in a DID Document in terms of who can read the DID Document. |
"Subject" is causing trouble again, still, forever. Also, a DID document may contain a representation of a graph -- but a DID document is not itself a graph! We interact with entities (that may be humans, organizations, or otherwise). Those entities may be identified by DIDs (but those entities are not DIDs). If identified by DIDs, those entities should be the subjects of DID documents which documents contain sentences describing those entities identified by the DIDs, and which documents might also contain sentences describing the documents themselves -- as they should in a Linked Data world -- and in such case, the documents should be identified with a different identifier than that which identifies the entity (the DID) which description is the purpose of the DID document. |
@dlongley I'm not saying they are meaningless terms, I'm saying they aren't black & white. What is private in one context may not be in another. Privacy is innately contextual and the context in which a DID Document might be read is unknowable. In fact, ANY data might be considered private, depending on context. Therefore, private v public is an ineffective way to distinguish between what should be in a DID Document and what should not. There will absolutely be service endpoints that some would consider private, while others will bend over backwards to keep correlatable yet non-private pseudonyms out. It's up to the DID Controller whether or not to use service endpoints (or other data) that might be correlatable and thereby, in some context, be considered private. It's not up to us, in the specification, to define, embed, and then police some abstract notion of what should be private and what should be public. That way lies madness. @TallTed is right. In one lens, of course graphs are about subjects. That's how RDF works. I'm using Subject as the term is defined in VCs and in the spec: the entity referred to by the DID. It's unclear how you mean it. If the defining nature of what should and should not go in a DID Document is whether or not a statement is about a subject (RDF sense), then there is no meaningful distinction; ALL RDF statements are about a subject. Equally so, if the litmus test is whether or not the statement is about the Subject (in the VC and DID sense), that is equally meaningless AND invites putting inappropriate information in a DID Document. If, instead, you build on the RFC3986 distinction about resolution, then the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject, including, IMO, the provenance of the DID Document itself, because it tells you why you should believe any of those statements are "secure". That's my litmus test.
@dlongley, is there anything you want to put in a DID Document that doesn't pass that test? The examples you gave made my point more than yours. It's trivial (and yet potentially useful) to put information about secure interactions, which violates some notion of privacy. That's why private is a horrible litmus test. In contrast, any information you put in a DID that isn't about secure interactions with the subject absolutely should not go in the DID Document. Back to the point of this issue... For ALL DIDs, the only way to know you have the authentic DID Document is to exercise DID resolution according to the DID's method. As such, any supporting meta-data for why you should believe that resolution returned a correct DID Document is provenance that, IMO, should be included in the DID Document itself. Data without provenance is meaningless; therefore, we should embed the provenance WITH the data. You said
Could you unpack that? All I want it to solve is defining a litmus test of what should and should not go into a DID Document. The distinction I offer is actually a distinction. Your statements about subjects (or Subjects) provide no distinction whatsoever. You also said
"Privacy" is one of the least understood terms in this industry. Talk to anyone who has been working on the problem professionally for more than a freshman year and they will tell you that regulators, legislators, developers, end-users, and entrepreneurs constantly put forth different notions on what privacy means to them. To some it means to be left alone (Brandeis), to others it means agency (Gropper), to still others it means avoiding PII leaks. There is no commonly accepted definition of what is "private". For a hot minute Personally Identifiable Information (PII) was the red herring many thought would provide a functional way to manage privacy. Turned out that was a horrible way to try and discuss privacy, much less regulate it. Public and private are not well defined terms. Period. |
I think the problem is with this test -- I suspect just about anything can be construed to meet its demand. Any piece of information about the subject could be understood to be required to have a secure interaction with the subject, depending on the context. The subject's cat's name? Well, on catville.com, that's key. I think this test is actually less useful than thinking about who can read the contents of the DID Document. |
Exactly. So the requirements for catville are different than those for others. But let's take your offer and talk about who can read the contents of a DID Document. To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some? Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification. For all of the use cases currently in the DID Use Case document, it is presumed that DID Documents are accessible to anyone who has the DID and access to the mechanisms of resolution per its method. Notable exceptions in the community discussion are contextual DIDs such as did:git and did:peer, where if you aren't a part of the context, you can't resolve the DID. I expect adding authorization isn't what you mean. Some notion of baking authorization to read a DID Document into the DID Document would be a significant departure from current conversations. So, from a specifications standpoint, we should assume that ANYONE might read any given DID Document. Which is why ONLY that information directly relevant to secure interactions with the subject should be included. Putting your favorite cat, a street address, or an email address into a DID Document is an anti-pattern, UNLESS it, in fact, contributes to secure interactions with the Subject. Not that it might--that would lead us to potentially putting the entire data warehouse worth of PII in--but that it specifically DOES. A service endpoint of http://twitter.com/JoeAndrieu IS completely reasonable if that is how the controller chooses to present a channel for secure interaction. Arbitrary statements like "The Subject is known to the State of California as Joseph Andrieu" are NOT. 
In fact, that service endpoint MUST NOT be interpreted as saying the Subject is the person who controls http://twitter.com/JoeAndrieu, but rather simply that http://twitter.com/JoeAndrieu is a means to interact with the Subject. That interaction may be understood to be posting @JoeAndrieu publicly--which is, in fact, interpreted by others as sort of a digital drop of messages never even intended for Joe Andrieu. Can you unpack the insights you think we'd get by asking who gets to read a DID Document? |
No, I'm not suggesting we propose any. I'm suggesting that we're using an open world data model and that whether or not something appears in a DID Document should depend on a combination of what the DID controller wants to put there and what the DID method allows. These, in turn, should be governed, at the very least, by an understanding of who is able to read the DID Document. If anyone can read the DID Document -- then only put information in the DID Document that you're ok with anyone reading. I don't think it has to be more complicated than that in terms of data visibility. Beyond this, all we're saying in the spec is: if you're going to represent verification methods, controllers, services, etc. -- here's the interoperable way of doing that. Side note: There are still discussions this group needs to have on GDPR-compliant "proxy/see also" services that can appear in DID Documents registered on blockchains. These services would direct people to more information about the DID subject, including additional service endpoints that may not be able to be written to the blockchain in a GDPR-compliant way. This other graph of information could potentially require some authorization to get access to it ... which is one thing I was alluding to. |
I think I'm mostly with @dlongley in this thread. The RDF statements in the DID document are about the DID subject. The intention is that these statements contain only public information, and the primary motivation is that they will be used for secure interaction with the DID subject. I'm also supportive of the open world model, i.e. a DID document could contain arbitrary other statements, if the DID controller wants that and the DID method supports it. We had a long discussion about "hardening" (i.e. strongly constraining) DID documents about 2 years ago. The DNS record analogy is partially useful when talking about resolution, but one difference is that a DID is an identifier for a real-world entity, whereas a domain name is not (an HTTP URI containing the domain name might be). To get back to the original topic, if we want to make statements about the DID document itself, then as @TallTed has noted we would strictly speaking need a separate identifier, and we would therefore need to change the overall JSON-LD structure. Example 1:
In this example, the identifier of the DID subject is Example 2:
Or similar, with several possible variations. I believe this has similar problems with regard to DID URL dereferencing as Example 1. Or we just leave things the way they are (maybe prepending a prefix to certain property names, such as "docCreated", as suggested by @dmitrizagidulin). This means that we would accept a certain "conflation" (aka "simplification") of identifiers for the DID subject and the DID document. I believe we have had this conflation for a long time anyway, due to the two assumptions that 1. the DID identifies the DID subject, and 2. we want to use DID URLs such as |
Looking at the first pattern of @peacekeeper, with a little additional JSON-LD trick it can be turned into a semantically perfectly sound structure. I have turned example 1 into finished JSON-LD with an additional statement in the context: {
"@context": [
"https://www.w3.org/ns/did/v1",
{
"didSubject": "@graph"
}
],
"type": "DidDocument",
"created": "2019-11-26",
"didSubject": {
"id": "did:ex:1234",
"authentication": [
"did:example:123456789abcdefghi#keys-1",
{
"id": "did:example:123456789abcdefghi#keys-2",
"controller": "did:example:123456789abcdefghi",
"publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
}
]
}
} Which translates into a set of TriG statements as follows:
(see JSON-LD Playground to experiment with this further.) I have not looked at example 2 but, at first glance, that seems semantically a bit less clear. |
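To make the separation in the JSON-LD above concrete, here is a minimal Python sketch under stated assumptions. It treats the outer object as statements about the document and the value of "didSubject" as statements about the subject. The helper name `split_document` is hypothetical, the example data is an abbreviated copy of the structure above, and the code does plain dictionary handling only; it performs no JSON-LD processing and produces no TriG.

```python
import json

# Abbreviated copy of the structure above (assumption: plain JSON handling only,
# no JSON-LD expansion or context resolution).
EXAMPLE = json.loads("""
{
  "@context": ["https://www.w3.org/ns/did/v1", {"didSubject": "@graph"}],
  "type": "DidDocument",
  "created": "2019-11-26",
  "didSubject": {
    "id": "did:ex:1234",
    "authentication": ["did:example:123456789abcdefghi#keys-1"]
  }
}
""")


def split_document(doc):
    """Return (statements_about_document, statements_about_subject).

    Hypothetical helper: everything outside "didSubject" (other than
    "@context") is treated as a statement about the document itself.
    """
    about_subject = doc.get("didSubject", {})
    about_document = {
        key: value
        for key, value in doc.items()
        if key not in ("@context", "didSubject")
    }
    return about_document, about_subject


about_document, about_subject = split_document(EXAMPLE)
print(about_document)
print(about_subject["id"])
```

In this sketch "created" ends up with the document-level statements, while "id" and "authentication" stay with the subject.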
Thanks to @peacekeeper for the examples, building on what he has said above. Example 1 is sort of how we dealt with this topic with Verifiable Credentials. Both are valid ways of expressing metadata about information, but here's the real issue: We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource. It's information at that point in time... and that's all it is... and calling it a DID Document is confusing people. There is information, and metadata about information. Sometimes you serialize that information, and some people call that serialization "a document"... but it isn't. It isn't a unique parchment of which there is only one copy in the entire universe. It's this ephemeral thing, and sometimes you need to say things about that ephemeral thing. We got this right with Verifiable Credentials. The outermost thing was metadata about the information (metadata about the credential), and the innermost thing was the information itself (the subject(s) of the credential). I really worry about both Examples, I think they're both wrong. Example 1 is wrong because it breaks all blockchain-based mechanisms. Submitting Example 1 to Veres One would mean that the DID subject would be setting the created and updated dates, and they have no right to do that. It's the consensus algorithm that decides when entries in the ledger are created and updated. Example 2 is wrong for the same reason. The DID subject has no right to set the created/updated dates except in the fringe case where they actually control that information (like for did:web). So, I think the correct solution is this (Example 3):
The proposal above (Example 3) is nuanced in its difference from Example 1. It works for did:web and did:v1/did:btcr/did:ethr where Example 1 is very problematic in the latter use cases. Here's how it could work: the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server. If a developer just wants the "DID Document", they pull the We do have the authority in the Working Group to specify a data model for "where metadata about the DID Document should go". The trick is doing this w/o opening a massive can of worms that is DID Resolution. So, we have a few options going forward:
|
this puts a lot of power in the resolver. |
also, just to highlight the self-sovereign cryptographic signature that I as the author of the DID data assert the time created and updated, not that it is necessarily |
@jonnycrunch IMO, you shouldn't trust a resolver you aren't running any more than a bitcoin node you aren't running, for the same reasons. DLT-based resolution generally requires a full node under the hood. The point of meta-data about DID Document resolution is for a given resolver to provide some level of assurance (mechanism TBD and per-method) for making a trust decision about that result. The kind of meta-data we are talking about could include just about anything, including identifying information about the resolver, so that one could rely on specific resolvers (either pseudonymous with some notion of reputation or bound to legal entities and their reputations). Another kind of "meta-data" could include the block height of the tip (for BTCR) or even a Merkle bloom filter that could be used elsewhere to prove the existence on chain of the root of the DID Document. I'm just speculating about these cryptographic assurances, but they are definitely part of the "stack" for deciding whether or not to rely on the result from a given resolver. |
The discussion on the WG call today was all over the place, and I think the root cause was because no one, including me, has defined what "created" means. At least six definitions popped up during the discussion today:
I think @dmitrizagidulin was talking about either 1 or 4, I was talking about 3 or 6, and I'm not sure which one @peacekeeper was talking about. Let's go at this from the other direction and get very specific about the items being discussed. I don't think having the conversation in the abstract is helping us. Let's just focus on "created" and all make sure we're talking about the same definition before we start talking about where the data should be stored. |
As an httpRange-14 nerd, I would say: What is that resource that the DID identifies? The DID subject, right? Well that's not an "information resource", therefore it has no representation that can be retrieved, has no media type, and there is no way to dereference fragments like did:ex:123#keys-1. From an RDF semantics perspective, we treat DIDs like identifiers for the DID subject, but from a URI dereferencing perspective, we treat DIDs like identifiers for the DID document. I think this is the reason why originally we didn't really mind having properties like "created", "updated", "services", "authentication" side by side without distinction. |
@msporny - The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked. Which may include:
2 and 3 already have mechanisms in the (DID Resolution) data model. And what we're arguing is that item 1, the self-asserted creation date of the document, belongs in the DID document. (We were not talking about the timestamp that the DID was created (as a separate entity from the DID Document), because it's not really possible to record or keep track of.) |
@msporny I agree with you, btw, that the current property, |
At the Amsterdam F2F meeting in January 2020, @gannan08 ran a session on this topic (see slides). We then started a document to collect (meta-)data items related to DIDs and DID documents. The next steps are:
|
Chairs set a 2 week deadline on the document from today after which we can move to the next step. |
See also #203 |
DID Document metadata does not belong in the did document, it does belong in the response from a DID Method resolver.... in the same way that The current Google Document makes no sense to me, it contains both properties of a did subject, and properties related to cryptographic construction of the did method... I consider properties related to the construction of the did method to be "did method/document metadata" and properties related to the did subject to be "properties of a did document". I think we need to define DID Method Resolution in the Core Spec, as a process which converts:
After which point it will be possible to actually create a did method of "application/json" and know the difference without sniffing didDocument content. ... @kdenhartog I guess I agree with you now... :) And yes, I get that the point of the document was to collect attributes, and then decide... consider this comment as me adding all attributes defined in the did-core json-ld context to the list, along with all existing defined mime types :) |
I propose the following next step, which I think is in line with what we discussed at the F2F: Now that we are collecting some items in that Google doc mentioned above, decide what the "buckets" or "categories" will be where (meta-)data will go. Remember that this discussion started because we had different understandings of what the "created" property means. Some of the interpretations were:
So now we should try to identify and name the "buckets" or "categories" we want to accurately express everything. I would also recommend reviewing @gannan08 's excellent presentation from the F2F again. For now, let's leave out the related topics of what the concrete data structures would look like, or how they would be returned by a DID resolver. Let's discuss that separately later. |
My personal proposal would be that we have 3 "buckets" which we could describe and name as follows:
Again, I would propose to not discuss concrete data structures or resolver behavior yet. For example, it may be possible to express two of the above "buckets" as part of a single data structure or merge them, instead of inventing too many separate ones. Also, the format may not necessarily be JSON(-LD) for all of them, perhaps we'll have key/value pairs similar to HTTP headers, or perhaps we'll have multiple representations. But let's agree on the "buckets" first. Thoughts? |
I think what @peacekeeper is suggesting (as well as all of the bullet items) is an excellent next step. I agree with the buckets and all bullet points (but reserve the right to change my mind if complexities require us to fine-tune the bullet points later). What Markus says above fits my mental model of the buckets. |
There is a bright line between document and metadata, and it's that apart from the DID Document itself, I need to have a way to understand the DID Document. What I see is that when you call a From an abstract standpoint, it's a hashmap of strings. The keys cannot be repeated, the values are always strings. A metadata definition can define an internal syntax on top of that if it wants to, for things like dates, but I think keeping these as simple as possible and not having them be a rich structure is a feature. We don't want there to be a ton of different things here, just what's needed. And we can define a single way to serialize this structure in a way that's simple. I would even argue to re-use the HTTP header grammar if that makes sense. As a bonus, this gives us a way to express input "options" to the DID resolver. We send in a bunch of request headers along with the DID. To be perfectly clear: I am not saying we should use HTTP, nor that this would require HTTP to implement. I'm saying that other protocols like HTTP, SMTP, and many others have this same kind of separation between headers that are always in the same format and content that can be in a wide variety of formats. And these protocols have this pattern for a reason: it's simple, it's powerful, and it's functional. |
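The flat structure described in the comment above (string keys that cannot repeat, string values, a single simple serialization) could be sketched roughly as follows. This is an illustration under assumptions, not a proposed format: the function names, the "Key: value" line layout, and the sample keys are all hypothetical, and the grammar is only loosely modeled on header-style fields.

```python
def serialize_metadata(metadata):
    """Serialize a flat string-to-string map as "key: value" lines."""
    lines = []
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("metadata keys and values must be strings")
        if ":" in key or any(ch in "\r\n" for ch in key + value):
            raise ValueError("illegal character in metadata entry")
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


def parse_metadata(text):
    """Parse "key: value" lines into a dict, rejecting repeated keys."""
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed metadata line: {line!r}")
        key = key.strip()
        if key in metadata:
            raise ValueError(f"repeated metadata key: {key}")
        metadata[key] = value.strip()
    return metadata


# Hypothetical sample entries; the key names are placeholders.
sample = {
    "content-type": "application/did+ld+json",
    "created": "2019-11-26T00:00:00Z",
}
round_tripped = parse_metadata(serialize_metadata(sample))
print(round_tripped == sample)
```

Because values are split on the first colon only, a timestamp value that itself contains colons survives the round trip, and any richer syntax (such as dates) stays a convention layered on top of plain strings.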
I find the list of buckets by @peacekeeper at #65 (comment) an excellent starting point. But I think @jricher's comment #65 (comment) goes in the wrong direction. IMHO the correct paradigm to consider for DID documents is that of digital certificates. DCs can be retrieved and transferred using a number of protocols, they are "understood" by many systems and applications, they are portable, and they can even be transferred using out-of-band mechanisms. This happens because DCs are self-contained. I hope the same property will hold for DID documents. I hope they can be easily ported from one registry to another, and that the amount of trust placed in registries will be minimal. Having said that, I believe bucket 2 should be part of the document, at least the metadata created by the controller. And under no circumstances should the proof property be removed from the document! |
example:
|
Current status of this issue: It should be addressed by the PRs related to the DID Resolution contracts that the WG is actively discussing right now: https://github.com/w3c/did-core/pulls?q=is%3Apr+label%3Acontract+ |
I think our current understanding is that DID document metadata is returned separately from the DID document by the abstract
Also, in section Metadata Structure, we are now defining data types for metadata, but we are not defining how it would be serialized or represented by implementations of the With this understanding, to return to the original question in this issue, DID document metadata is logically NOT part of the DID document. BUT: Implementations of the From my perspective, this is a good solution. Logically, metadata is not part of the DID document, but the abstract definition of the Can we close this issue? |
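The outcome summarized above, where metadata is logically separate from the DID document but returned alongside it, might look like this in a toy resolver. Everything here is an assumption made for illustration: the function name `resolve_sketch`, the in-memory registry, the field names, and the "not-found" error value are placeholders rather than anything defined in this thread.

```python
from typing import NamedTuple, Optional


class ResolutionResult(NamedTuple):
    """Three separate values: the document is never mixed with its metadata."""
    resolution_metadata: dict
    did_document: Optional[dict]
    did_document_metadata: dict


# Hypothetical in-memory "registry" standing in for a real DID method.
_REGISTRY = {
    "did:example:123": {
        "document": {"id": "did:example:123"},
        "created": "2019-11-26T00:00:00Z",
    }
}


def resolve_sketch(did):
    """Toy resolver: look up a DID and return document and metadata separately."""
    entry = _REGISTRY.get(did)
    if entry is None:
        # Placeholder error value; a real resolver would define its own.
        return ResolutionResult({"error": "not-found"}, None, {})
    return ResolutionResult(
        resolution_metadata={},
        did_document=entry["document"],
        did_document_metadata={"created": entry["created"]},
    )


result = resolve_sketch("did:example:123")
print(result.did_document)
print(result.did_document_metadata)
```

In this sketch the "created" timestamp comes from the registry entry and appears only in the document metadata, which matches the concern raised earlier in the thread that such dates are not always the controller's to assert.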
+1 on closing the issue because we have a concrete answer now (outlined by @peacekeeper above). |
@dmitrizagidulin based on the last few comments, can we close this issue? |
No comments since marked pending close, closing. |
Does metadata about the DID Document (such as when it was created, updated, or who it was signed by) belong in that DID Document?
Note that this question is not about a) the metadata for the subject of the DID (keys, service endpoints) or b) the metadata about the resolution of a particular DID Document (proof added by a resolver, caching data, what servers/nodes were used for resolution) -- that belongs either in the Resolver metadata or Method metadata sections.
So far, there have been arguments both for and against placing this metadata in the DID Document itself (vs outside of it, say in the Resolver metadata sections).
A) This metadata is already in the registry
A - against: Since much of this metadata (specifically, the created and updated timestamps and the proof, which includes authorship metadata and document integrity protection) will also likely reside in the underlying DID registry mechanism (distributed ledger, etc), a Resolver should be able to figure out this data from the registry, and include it in the resolution metadata.
A - for: In many (most?) cases, these are two separate sets of metadata - one about the document itself, and one about the underlying registry mechanism.
Also: The DID Document should be self-contained, in terms of critical metadata, in case it is archived or otherwise separated from its underlying ledger or storage medium.
B) Potential for developer confusion
B - against: If the DID Doc metadata (such as when the document was created) differs from the did registry metadata (when the document was registered on a ledger, for example), this may confuse developers.
B - for: @TallTed
In other words, these two categories of metadata are separate, and developers constantly have to keep this difference in mind anyway.
C) Use cases
C - against: There are no use cases currently for this metadata. (Or, the use cases are unclear.)
C - for: There are use cases -- this topic is highly relevant to any DID registry using a mutable storage mechanism, such as the BTCR mutable extension documents or did:web method documents.
Also, as @peacekeeper points out:
D) Offload this topic to DID method specific specs
D - against: Even if this metadata does belong in the DID Document, perhaps we should hand this off to each DID method to decide (rather than the main DID spec).
D - for: @ChristopherA
In other words, this is going to be a common enough problem that we should address this in the main spec.
E) Conceptual elegance
E - against: @dlongley:
E - for: ... an excellent point. Perhaps we can continue to benefit from this conceptual simplicity (of having the DID Doc be mostly about the DID subject) by making it clear via the attribute names that the metadata is about the doc, not the subject? Like, having the field be named docCreated instead of just created, to prevent ambiguity?