Service Endpoints in the DID Doc might be an anti-pattern #382

Closed
msporny opened this issue Aug 28, 2020 · 149 comments
Labels: pr exists (There is an open PR to address this issue)

Comments

@msporny
Member

msporny commented Aug 28, 2020

TL;DR: We don't need service endpoints in the DID Document... it's an overly-complicated anti-pattern that has a lot of downsides when we already have patterns that are implemented today that would work for all use cases.

It has been asserted that Service Endpoints in the DID Document might be an anti-pattern because, at worst, they can be used to express PII in DID Documents, and in every use case that we know of to date, they can be discovered through other means that are already employed today.

Ultimately, the problem is that developers need to be educated about the dangers of placing PII in service endpoints... many won't read the spec in detail... we have over 70 DID Methods now and the number is only increasing.

What are the chances that a non-trivial subset of them implement unwisely? My guess is the chances are pretty high, and that weakens the ecosystem.

We do have an option to not give developers foot guns... and we should try very hard not to do that. I'm afraid that non-normative documentation is better than nothing, but not good enough.

Here's what the group resolved yesterday (pending 7 days for objections to the resolutions):

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

I wish we would do more than that... there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

Both of those solutions allow us to 1) Use what we already have today, and 2) address all of the use cases that we know of.

@msporny msporny self-assigned this Aug 28, 2020
@msporny msporny added the discuss Needs further discussion before a pull request can be created label Aug 28, 2020
@OR13
Contributor

OR13 commented Aug 28, 2020

Add a serviceEndpoint just in time, without updating the verifiable data registry using signed-ietf-json-patch.
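As a minimal sketch of the just-in-time idea, the snippet below applies a single RFC 6902 JSON Patch "add" operation to a locally resolved copy of a DID document, appending a service entry without touching the registry. Assumptions: the DID `did:example:123`, the service `id`/`type`, and the endpoint URL are hypothetical; only the "add" operation on object keys and arrays (including the append token `-`) is handled; and the signature that would make the patch "signed", along with its verification against a key in the DID document, is omitted entirely.

```python
import copy


def apply_json_patch_add(doc: dict, op: dict) -> dict:
    """Return a copy of `doc` with one RFC 6902 "add" operation applied.

    Deliberately partial: only "add" is supported, and checking a signature
    on the patch is out of scope for this sketch.
    """
    if op.get("op") != "add":
        raise ValueError("this sketch only handles the 'add' operation")
    result = copy.deepcopy(doc)
    # Split the JSON Pointer (RFC 6901) into tokens, unescaping ~1 then ~0.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in op["path"].split("/")[1:]]
    parent = result
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        # "-" means "append to the end of the array".
        if last == "-":
            parent.append(op["value"])
        else:
            parent.insert(int(last), op["value"])
    else:
        parent[last] = op["value"]
    return result


# Hypothetical DID document as resolved from the registry (no services yet).
resolved = {"id": "did:example:123", "service": []}

# Hypothetical patch delivered alongside a message, never written to the registry.
patch_op = {
    "op": "add",
    "path": "/service/-",
    "value": {
        "id": "did:example:123#inbox",
        "type": "ExampleInbox",
        "serviceEndpoint": "https://example.com/inbox",
    },
}

patched = apply_json_patch_add(resolved, patch_op)
```

The patched copy lives only with the recipient, so the registry entry never carries the endpoint. The `value` still has to follow some agreed shape, which is the data-model point made in the follow-up comment.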

However ^ this solution still requires us to define a data model.... and I would argue that so does "getting service endpoints" in credentials... unless you want every vendor to construct them differently, which will harm interoperability.

In other words... there is no solution to this problem that does not include a data model... but there are proposals for how that data model should be communicated, which have privacy, security and usability tradeoffs :)

@mwklein

mwklein commented Aug 28, 2020

Only using DID Documents on-ledger for well-known public identities, and using private off-ledger peer-wise DIDs for all personal identifiers mitigates the described issue as well. Personal service end-points would be shared only via the peer-wise connection, and public service endpoints are by definition meant to be public.

@csuwildcat
Contributor

...there are alternatives that the group should consider in order to discover service endpoints:

  • Go to an entity's website, which would have a DID Auth button, which you could then use to send them your service endpoints privately using VCs.
  • Find an entity like we do today -- using a search engine of some kind... schema.org markup can be used to express public endpoints using VCs.

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document. Anyone who disagrees implicitly (whether they are aware or not) takes one of the two positions below, there simply is no third:

  1. All services should require centralized parties for location/distribution.
  2. Entities should not be able to share their intended-public data with others without participating in an explicit, out-of-band, DID Doc-external activity.

If you fall under Position 2 above, please do the following to ensure you are abiding by your own beliefs, if you have not already:

  • Delete your Twitter account
  • Delete your public blog domain
  • Delete your resume wherever you post it
  • Delete your images and videos from other social media sites
  • Turn all forms of openly accessible sharing and connections off in all interaction-based apps, such that no one can read your posts, messages, or communications without somehow contacting you and exchanging permissions through another channel.

If you do the above things in response to the implicit Position 2 that many seem to be taking, that is a first step in building credibility for the case that we should deprive people, companies, IoT devices and other entities from a more direct, decentralized mechanism of expressing themselves in fulfillment of application and service use cases.

@dhh1128
Contributor

dhh1128 commented Aug 28, 2020

we already have patterns that are implemented today that would work for all use cases...that we know of

I think a comparison will help explain why this argument falls flat for me.

The reason we need DIDs isn't because use cases aren't addressable, exactly -- it's because the nature of a use case's guarantees and semantics changes if we don't root them in DIDs. We could do VCs with SSH keys instead of DIDs, but we don't, because SSH keys don't have the same properties (decentralization, discovery, rotation, potential for multisig...) that DIDs do.

Similarly, the nature of a service endpoint's guarantees and semantics changes if we don't put them in DID docs. This is the essence of @csuwildcat's comment above, which I agree with -- sure, you can do discovery with existing mechanisms, but you can't do it in a decentralized way unless you either use DID docs or invent an entirely new mechanism with the same characteristics as DID docs. Yes, there are alternative ways to communicate an endpoint. The DID controller may or may not control those alternative mechanisms. Therefore, by removing the service endpoint from the DID doc, we are allowing someone other than the DID controller to frame any conversations associated with that DID. You could say, "No big deal; the non-DID-controller can't lie about controlling the DID when a digital signature or encryption is required." I answer: "True, but that's not the full requirement, because just controlling the endpoint value itself allows a malicious party to simulate the silence, uncooperativeness, or flakiness of a DID owner they want to harass."

The recent Twitter hack of accounts belonging to Obama, Biden, Elon Musk, and others is exactly the sort of thing we enable if we communicate service endpoints outside the DID doc. That was an existing communication mechanism that could communicate endpoints, and its security properties are different from a DID doc itself. The claim that leaving service endpoints in the spec is an invitation for disaster is only half a story. Yes, doing service endpoints right is hard, and doing it wrong could be obnoxious. But taking it out is just as problematic, and I don't think developers write code that guards against ordinary cybersecurity risks any better than they write code that guards against service endpoint abuse. The difference is that service endpoints is a new field of knowledge where developers will be open to guidance, rather than familiar territory where developers will casually assume they already know best practice.

@wyc
Contributor

wyc commented Aug 28, 2020

I agree that service endpoints certainly will reduce privacy in their (mis)use, and this is an important consideration to make.

However, I think that if we excluded them from the DID spec then the new risk we incur is one related to standards adoption--the standards will become far less useful without an ascribed way to do service discovery. +1 to @dhh1128's points about "what makes DIDs different and more useful than SSH keys?", with this being a core reason. Consider the impact of this on DIDComm, which in my mind is a major use case for DIDs. I believe we will need service endpoints to enable the discovery portion of DIDComm, though I'm not certain. cc @TelegramSam @awoie

Also agreed to the point that if we punt service endpoints into another standard, then the problem still doesn't go away. In fact it might be solved in a lot less decentralized way than with DID documents, such as state-owned BigCo saying that they are the #1 DID Broker that's easiest to use for everyone because they can direct a slush fund towards winning the market in this way--and everyone would likely use the most convenient and free thing around, as we've seen for the past 10 years on the Internet.

So in summary, I recommend we keep service endpoints while acknowledging they will bring privacy problems, with the understanding that having their functionality provided somewhere else could cause (1) significant adoption risks and (2) even larger systemic privacy risks. Perhaps if we agree on these logic inputs but disagree on the specific risk measures, we can make them part of the calculus from which the decision is made.

Finally, wanted to mention that resolution of this would unblock our work with the W3C privacy self-assessment here: #291 (comment)

@agropper
Contributor

I propose a compromise solution based on my privacy-inspired perspective in #370 (comment)

Relative to yesterday's pending resolutions:

RESOLVED: Discuss in a non-normative appendix how one might model Service Endpoints that preserve privacy.

Treat the PDP serviceEndpoint as normative, if present.

RESOLVED: Define an abstract data model for serviceEndpoints in normative text, like we have done with verification methods.

Define an abstract data model for the PDP serviceEndpoint based on standard UMA2 and pending GNAP practices.

RESOLVED: Define how you do service endpoint extensions using the DID Spec Registry.

Yes.

@msporny msporny changed the title Service Endpoints might be an anti-pattern Service Endpoints in the DID Doc might be an anti-pattern Aug 28, 2020
@peacekeeper
Contributor

there are alternatives that the group should consider in order to discover service endpoints

I have some sympathies for this view; it seems to align with what Sam Smith has been trying to tell us since the Amsterdam F2F, which is that DID documents should only be about establishing control authority over the identifier, and that everything else (including service endpoints) should happen on a different layer.

But as others have pointed out in this thread, I also believe that alternatives (such as sending service endpoints together with the DID via the original channel, or using a search engine, or using a special refresh/notification/etc. service) will usually not provide the same guarantees that DID resolution and DID methods are supposed to provide, i.e. decentralization, control, cryptographic verifiability.

DIDs should enable service and data portability in the same way as they enable key rotation. Services are not comparable to VCs, they are much more foundational. DIDs are an indirection layer on top of both verification methods and services, since those are the fundamental constructs that enable trustable interaction associated with the subject.

@agropper
Contributor

agropper commented Aug 28, 2020 via email

@jonnycrunch
Copy link
Contributor

I have a procedural objection to this approach. The proposals that we agreed to were an attempt to communicate consensus among the participants in a special topic call and as such are non-binding. As Ivan @iherman pointed out in the minutes, these "resolutions" would be brought back to the rest of the group for broader discussion. Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

@msporny
Member Author

msporny commented Aug 29, 2020

Placing a 7 day window doesn't seem fair to such an important topic and itself is an "anti-pattern" to the standards development process.

The 7 day window is for the RESOLUTIONs we made, not for the topic at hand. This 7 day window is the process the group agreed to for the special topic calls. It provides an opportunity for people to object on the main topic call while ensuring that there is closure to resolutions so the group can build upon them.

/cc @brentzundel @burnburn -- we may want to remind the group of this process during the next call.

@jonnycrunch -- are you objecting to any of the RESOLUTIONS made during the last call? I note that you didn't object at the time: https://www.w3.org/2019/did-wg/Meetings/Minutes/2020-08-27-did-topic#res

@dlongley
Copy link
Contributor

@csuwildcat,

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging more places to express service endpoints, not fewer. Note that that's a decentralized mechanism for solving this problem, not a centralized one.

@dhh1128 -- Can you provide a link to how the DIDComm community is considering how "GDPR-compliant service endpoints" might be implemented and how a VDR might differentiate them from non-compliant ones?

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

@csuwildcat
Contributor

csuwildcat commented Aug 29, 2020

No centralized intermediaries should be required for everyone on the planet to read my decentralized profile/gravatar object, my resume object, my decentralized tweet objects, my blog post objects, my code repo objects, or any number of other things I want everyone to be able to locate without engaging in a contorted, centralization-injecting dance external to the DID Document.

This may actually be more likely to happen as a result of exposing service endpoints in DID Documents. Especially if herd privacy is desirable -- it may result in a limited number of centralized parties providing service endpoint routers that can adequately provide that feature. You may end up having to choose from this limited selection in the same way we have to choose to "login with X" today.

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

You may say: But for cases where I don't care about unwanted correlation, I don't need herd privacy! Ok, I get it. You don't care about the privacy cases -- you've made that very clear. Please note, however, that it may be very challenging (or impossible) for a VDR (Verifiable Data Registry, aka DID ledger) to determine whether a service endpoint is "public" or not.

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

There's an implicit "typing" of service endpoints relative to whether or not people care about correlation here. If a VDR needs to accept service endpoints of "type" A and reject service endpoints of "type" B, but the VDR can't tell the difference, how would you resolve this problem? You may also say you don't care, you just want to use a DID Doc from a VDR. Well, there may not be such a VDR without solving this problem -- or the VDR you've chosen may get sued into the ground after you started using it and you'll be quite grumpy.

A ledger is not the place where adjudication of purported types is resolved, that is always going to be in a less resource constrained system that has more latitude to evaluate assertions based on evidence that can be computed ad hoc. The ledger is the place for key awareness, routing, and type declaration - on the latter point, it's about efficient global sorting in the aggregate sense, not assertion validity evaluation, which is not a singular, universal, globally shared test anyway.

I want to see a solution here that addresses these issues. Ignoring them or saying they can't be discussed unless you delete your Twitter account -- while entertaining -- is missing the point. I also don't want to see a solution that furthers the kind of centralization problems we've seen in the past. Of course, this may mean leveraging more places to express service endpoints, not fewer. Note that that's a decentralized mechanism for solving this problem, not a centralized one.

I am not reacting in this way to oppose any entity/implementer deciding to not use Service Endpoints, my opposition is strictly contained to spec changes and normative language that negatively impacts these features such that it hinders other entities/implementations from utilizing them.

All: I think it would be most helpful to go through a number of concrete use cases around service endpoints to determine how they might be solved using service endpoints expressed in VDR-backed DID Documents vs. alternative approaches.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.
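The "lookup DID > instantly know of endpoint" steps can be sketched as below. This assumes a hypothetical in-memory table standing in for real DID resolution and a made-up service type `ExampleBlog`; the final "send message" step is left out because it depends on whatever protocol sits behind the endpoint.

```python
from typing import Optional

# Hypothetical stand-in for DID resolution; a real resolver would read the
# DID method's verifiable data registry instead of a local dict.
RESOLVED_DOCS = {
    "did:example:alice": {
        "id": "did:example:alice",
        "service": [
            {
                "id": "did:example:alice#blog",
                "type": "ExampleBlog",
                "serviceEndpoint": "https://blog.example.com/alice",
            }
        ],
    }
}


def resolve(did: str) -> dict:
    """Return the DID document for `did` (raises KeyError if unknown)."""
    return RESOLVED_DOCS[did]


def find_endpoint(did: str, service_type: str) -> Optional[str]:
    """Return the first serviceEndpoint of the requested type, if any."""
    for service in resolve(did).get("service", []):
        if service.get("type") == service_type:
            return service.get("serviceEndpoint")
    return None


endpoint = find_endpoint("did:example:alice", "ExampleBlog")
```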

@dhh1128
Contributor

dhh1128 commented Aug 29, 2020

@dlongley : providing a link is a bit challenging, because knowledge about the question exactly as you framed it is scattered through numerous documents. The best single doc I can offer is here. This covers about 70% of your question. I will attempt a summary here that is partly redundant with that doc, and that fills in some gaps.

First, it's important to understand that, because DIDComm is not API-centric, it doesn't need a different endpoint for every service or protocol it exposes. The DIDComm community is assuming that a party usually needs only one DIDComm endpoint (per transport) no matter how many services they intend to offer. (The "per transport" note is just to acknowledge that if you want to speak DIDComm over http, smtp, AMQP, Bluetooth, and sneakernet, those may be different endpoints -- but you don't need different ones for credential issuance, verification, and so forth. Those are all just protocols running over a single endpoint.)
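To illustrate "one endpoint per transport", the sketch below lists one hypothetical DIDComm service entry per transport and selects by URI scheme. The type string `DIDCommMessaging`, the URLs, and the use of a `mailto:` URI for SMTP are assumptions for illustration. Note that there is no per-protocol entry: issuance, verification, and other protocols would all arrive at whichever endpoint matches the transport.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical service entries: one per transport, none per protocol.
SERVICES = [
    {"id": "#didcomm-http", "type": "DIDCommMessaging",
     "serviceEndpoint": "https://agents-r-us.com/inbox"},
    {"id": "#didcomm-smtp", "type": "DIDCommMessaging",
     "serviceEndpoint": "mailto:inbox@agents-r-us.com"},
]


def endpoint_for_transport(services: List[dict], scheme: str) -> Optional[str]:
    """Pick the DIDComm endpoint whose URI scheme matches the sender's transport."""
    for entry in services:
        if entry.get("type") != "DIDCommMessaging":
            continue
        if urlparse(entry["serviceEndpoint"]).scheme == scheme:
            return entry["serviceEndpoint"]
    return None
```

Whether the sender is starting a credential-issuance exchange or a proof request, `endpoint_for_transport(SERVICES, "https")` returns the same URL; the protocol is distinguished by the message itself, not by the endpoint.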

Now, a DIDComm endpoint has baked into it the potential (but not the requirement) for routing. Routing is done by a mostly untrusted mediator that has its own encryption keys. If Alice is talking to Bob, and Bob is using a mediator, then Bob's service endpoint will be hosted by the mediator. Thousands or millions of other parties can (should) have exactly the same service endpoint. The URI for the endpoint has no query string and nothing in its domain name that identifies Bob in any way. Alice places her plaintext message (let's call this M[0]) inside an encryption envelope that only Bob can open. Let's call the encrypted result M[1]. Then Alice places M[1] inside an encryption envelope that only the mediator can open. Let's call that encrypted result M[2]. The encrypted header of M[2] tells the mediator what Bob's DID is. Bob and the mediator have previously arranged for the mediator to forward messages for Bob's DID to Bob. (There's a DIDComm protocol they can use, if they want -- or they can do it any proprietary way they like, since it doesn't have to be interoperable.)

When the mediator receives the message M[2], it opens the outer encryption envelope and peers inside. It sees that the encrypted inner message is intended for Bob's DID. It then forwards M[1] to Bob. How it does this is never publicly known; it is a private arrangement between Bob and the mediator.
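The M[0], M[1], M[2] wrapping and the mediator's forwarding step can be traced with the structural sketch below. Encryption is replaced by a placeholder `Envelope` that only records which key may open it, so this shows who can see what, not real cryptography. The key names, DIDs, and the `PRIVATE_ROUTES` table (standing in for Bob's private arrangement with the mediator) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Envelope:
    """Placeholder for an encryption envelope. No cryptography happens here;
    `recipient` just names the only key that is allowed to open it."""
    recipient: str
    payload: Any


def seal(recipient: str, payload: Any) -> Envelope:
    return Envelope(recipient, payload)


def open_envelope(holder: str, env: Envelope) -> Any:
    if env.recipient != holder:
        raise PermissionError(f"{holder} cannot open an envelope sealed for {env.recipient}")
    return env.payload


# Alice: M[0] is plaintext, M[1] is sealed for Bob, M[2] is sealed for the mediator.
m0 = "Hello Bob"
m1 = seal("bob-key", m0)
m2 = seal("mediator-key", {"forward_to": "did:example:bob", "msg": m1})

# Private arrangement between Bob and the mediator; never published.
PRIVATE_ROUTES = {"did:example:bob": "bob-device-queue"}


def mediate(env: Envelope) -> Tuple[str, Envelope]:
    """Open the outer envelope, look up Bob's route, and pass M[1] on unchanged."""
    header = open_envelope("mediator-key", env)
    return PRIVATE_ROUTES[header["forward_to"]], header["msg"]


route, forwarded = mediate(m2)
```

The mediator learns Bob's DID and that a message exists, but calling `open_envelope("mediator-key", forwarded)` raises `PermissionError`, mirroring the claim that it cannot read the inner message.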

In order for Alice to know that she must do the double wrapping required by Bob's mediator, the service endpoint for Bob needs to contain an ordered list of the keys (or DIDs that let her look up keys) that she has to use when encrypting for Bob's route. Thus we have a serviceEndpoint declaration with a routingKeys field that might contain: [<DID or key of Bob's mediator>]. A route that uses one mediator will have one entry in this array; a route that uses two mediators will have two, etc. (Why you'd want two mediators is beyond scope here; suffice it to say that either one or two might be common, but anything more than two will not be.)
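Generalizing to the ordered `routingKeys` list, a sender can seal once for the recipient and then add one forwarding layer per routing key. This is a sketch under stated assumptions: the envelope is a placeholder dict rather than real encryption, the DIDs, key ids, and `type` value are hypothetical, and the list is assumed to be ordered from the mediator nearest the recipient to the one the sender contacts first, so the last key becomes the outermost layer. The ordering convention of the DIDComm version in use should be checked before relying on this.

```python
from typing import Any, List

# Hypothetical service entry for Bob, shaped like the declaration described above.
BOB_SERVICE = {
    "id": "did:example:bob#didcomm",
    "type": "DIDCommMessaging",
    "serviceEndpoint": "https://agents-r-us.com/inbox",
    "routingKeys": ["did:example:mediator#key-1"],
}


def seal(recipient: str, payload: Any) -> dict:
    # Placeholder envelope, NOT encryption: it only records who could open it.
    return {"recipient": recipient, "payload": payload}


def wrap_for_route(plaintext: Any, recipient_key: str, recipient_did: str,
                   routing_keys: List[str]) -> dict:
    """Seal for the recipient, then add one forwarding layer per routing key.

    Assumes routing_keys[0] is the mediator closest to the recipient.
    """
    message = seal(recipient_key, plaintext)
    next_hop = recipient_did
    for key in routing_keys:
        message = seal(key, {"forward_to": next_hop, "msg": message})
        next_hop = key  # the next-outer mediator forwards to this one
    return message


outer = wrap_for_route("hi", "did:example:bob#key-1", "did:example:bob",
                       BOB_SERVICE["routingKeys"])
```

With one routing key, `outer` is addressed to the mediator and its header names Bob's DID; with an empty list the message is sealed for Bob alone, which corresponds to an unmediated endpoint.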

Now, note the properties I've just described:

  1. There is no identifier for a recipient embedded in the service endpoint, and it is not transmitted as plaintext anywhere (in HTTP headers, in a POST body...) either. No eavesdroppers can learn anything.

  2. The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

  3. The mediator knows that they have a message to give to Bob's DID, but they don't necessarily know who it's from, and they don't know anything about the message except the size of the encrypted BLOB. The mediator does not know the content of Bob's DID doc. Bob's DID doc can be pairwise; it doesn't have to be on a ledger.

  4. There are two abuses that a mediator could perpetrate: they could record all the times and the sizes or encrypted content of all inbound messages for Bob, and they could fail to forward messages (selective or total delete).

Given this, we believe the requirements for GDPR compliance of the endpoint are:

  • If the endpoint is directly owned/maintained by the DID controller, no requirements (there is no separate processor of data; all control resides with the DID controller, so GDPR is irrelevant). This is not the case I described above, but I mention it just for completeness. We know this condition obtains when the endpoint has no routing keys.

  • If the endpoint is mediated (which we can detect because there are one or more routing keys for the endpoint), then the mediator becomes a data processor, and their duty is to A) faithfully deliver messages; and B) delete all data and metadata about messages after they are delivered. In cases where duty B is nuanced in some way, this should be clearly specified in the terms and conditions that were worked out when Bob and his mediator negotiated services. (The DIDComm protocol that does this has a place for that.)
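The rule of thumb in the two bullets keys the analysis off whether routing keys are present. A minimal sketch of that classification follows; the function name and return shape are hypothetical, and it encodes only the heuristic stated here, not a legal determination.

```python
from typing import List, Tuple


def processing_role(service: dict) -> Tuple[str, List[str]]:
    """Classify a service entry by the presence of routing keys.

    No routing keys: the DID controller runs the endpoint, so there is no
    separate data processor. One or more: a mediator acts as a data
    processor with the duties listed.
    """
    if not service.get("routingKeys"):
        return "direct", []
    return "mediated", [
        "faithfully deliver messages",
        "delete all data and metadata about messages after delivery",
    ]
```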

Now, you asked how the outside world can know that Bob's endpoint is GDPR-compliant. I would like to point out that this is far less interesting than how BOB knows that his service is GDPR-compliant; in fact, I'm not even sure the outside world's question is legitimate. We send one another emails all day long without knowing whether the email service used by the recipient is GDPR compliant. It's none of our business; all we need to know is that the person we're attempting to contact has asked us to hand off the data to a particular mail transfer agent, and is apparently satisfied that that MTA will do the right thing.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.

@dhh1128
Contributor

dhh1128 commented Aug 29, 2020

I would like to point out a fundamental misalignment that permeates this thread. @msporny is approaching service endpoints from the standpoint that the goal of putting them into a DID document is to communicate a place to talk. I don't agree that this is an accurate summary of the goal. I would say that the goal of putting them into a DID document is to communicate a place to talk such that the communication is known to emanate from the DID controller, and such that the key material in the DID doc is known to apply to the associated endpoint in crisp, indivisible version evolution. That is, I want to be able to say that DID doc version X bundles a key state + an endpoint state, and version Y bundles a different key+endpoint state; I don't want them to be able to evolve independently. The "such thats" are very important to me, and I haven't yet seen any proposal that accomplishes these goals other than one of putting the endpoint in the DID doc. Manu has suggested that we need to explore alternatives. I'm totally fine with that -- but I'm only interested in alternatives that include my "such thats." Everything else is abandoning a vital security and control requirement of the system, IMO.
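The "crisp, indivisible version evolution" requirement can be pictured as an immutable record in which every version carries both the key state and the endpoint state, so neither can change without minting a new version of the whole bundle. The class and field names are hypothetical, and the sketch models only the bundling, not how a DID method would anchor or sign versions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DidDocVersion:
    """One immutable DID doc version bundling key state and endpoint state."""
    version: int
    verification_keys: Tuple[str, ...]
    service_endpoints: Tuple[str, ...]


def next_version(current: DidDocVersion,
                 keys: Optional[Iterable[str]] = None,
                 endpoints: Optional[Iterable[str]] = None) -> DidDocVersion:
    """Changing either part yields a new version that still carries both parts."""
    return DidDocVersion(
        version=current.version + 1,
        verification_keys=current.verification_keys if keys is None else tuple(keys),
        service_endpoints=current.service_endpoints if endpoints is None else tuple(endpoints),
    )


v1 = DidDocVersion(1, ("key-A",), ("https://agents-r-us.com/inbox",))
v2 = next_version(v1, endpoints=["https://myisp.com/didcomm"])
```

Because `v1` is frozen, anyone holding version 1 always sees `key-A` paired with the first endpoint; an endpoint change appears only as version 2, which still names its key state explicitly.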

@agropper
Contributor

@dhh1128, is your "such that" framing for an optional but normative notification serviceEndpoint type the same idea as what I proposed above #382 (comment), except that we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

As for @csuwildcat Use Case:

decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

I'm confused by the inclusion of both "intended-public info" in the same use case as "or ad hoc encrypted direct sends...". Can we deal with these separately?

The intendedPublic serviceEndpoint type does not benefit from access control but may benefit from checks on authenticity. We should be able to craft a normative data model for this optional serviceEndpoint.

The "ad hoc encrypted" serviceEndpoint type will require something like the "PDP" serviceEndpoint type where the other "entity" can provide some claims, endpoints, and encryption keys.

@dlongley
Contributor

@csuwildcat,

I don't buy this argument at all - a Service Endpoint can contain a decentralized protocol URI.

Which one? Which one(s) will the VDR permit? Will there be a centralized allow list for the ones that are permitted? How will the URI handle herd privacy? After all of these questions are answered, could it be that you should have just asked that other decentralized network directly for a VC signed by one of the DID's keys?

The owner of the DID determines this, not the DID ledger (nor should it, I would argue), so I don't find this line of argument persuasive.

Then you don't understand the core problem I'm trying to highlight. The VDR/DID method gets to decide what will be accepted in a DID Document. This is related to the GDPR/privacy problem of what kind of information is allowed onto an immutable ledger.

I am not reacting in this way to oppose any entity/implementer deciding to not use Service Endpoints, my opposition is strictly contained to spec changes and normative language that negatively impacts these features such that it hinders other entities/implementations from utilizing them.

I also want to make sure we have a healthy ecosystem that can leverage service endpoints. All of these issues are interrelated.

Use cases: decentralizing literally every app that centers around posting intended-public info, or ad hoc encrypted direct sends of info, to/from an entity to the world, or some subset down to N+1, and doing so in a way that is as easy as lookup DID > instantly know of endpoint > send message.

Please describe a single user story that is specific and concrete for people in this thread to talk about. I think the above is too abstract to help move the needle.

@dlongley
Contributor

@dhh1128,

Thank you for your response, there's a lot of good information in it. I'm going to try and focus down to the specific problem with expressing information on an immutable VDR.

But if we really have to have a way for the outside world to know an endpoint has this property, we could add it by simply adding a gdpr-compliant property inside the serviceEndpoint data model. This would be self-attested by the DID controller, and I think that's both clear and plenty good.

I think my question was unclear because it was interpreted to be talking about whether or not the service behind the endpoint itself was GDPR-compliant. Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it. And I'm not talking about incidental PII or information that is intentionally encoded in some abusive way to circumvent the feature of expressing non-PII information in a DID Document.

As an example, how does a VDR distinguish this:

https://danielhardman.com/my-personal-handle

From something like this:

https://public-company.com/foo

From something like this:

ipfs://fl3hf4kjh4fk3f/fhjl2fjlk23f32f/23423

The first URL has implicitly human-meaningful identifiers baked into it for a private party, the second has implicitly human-meaningful identifiers baked into it for a public party, and the last has no human-meaningful identifiers baked directly into the URL.
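A purely syntactic check makes the difficulty concrete. The hypothetical heuristic below flags host names and path segments made only of letters (joined by hyphens or dots). It flags the first two URLs the same way and the IPFS one not at all, so at best it separates "human-readable" from "opaque"; it cannot tell whether a readable token names a private or a public party, and an opaque string could still encode PII.

```python
import re
from typing import List
from urllib.parse import urlparse

# Letters-only words joined by "-" or "." (an arbitrary, hypothetical rule).
WORDY = re.compile(r"^[a-z]+(?:[-.][a-z]+)*$", re.IGNORECASE)


def readable_tokens(url: str) -> List[str]:
    """Return host and path segments that look like human-readable words."""
    parts = urlparse(url)
    tokens = [parts.netloc] + [seg for seg in parts.path.split("/") if seg]
    return [tok for tok in tokens if WORDY.match(tok)]


EXAMPLES = [
    "https://danielhardman.com/my-personal-handle",
    "https://public-company.com/foo",
    "ipfs://fl3hf4kjh4fk3f/fhjl2fjlk23f32f/23423",
]

results = {url: readable_tokens(url) for url in EXAMPLES}
```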

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

If the endpoint is directly owned/maintained by the DID controller, no requirements (there is no separate processor of data; all control resides with the DID controller, so GDPR is irrelevant).

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want into a DID Document -- and there would be no "right to be forgotten" issues?

Another side question:

The serviceEndpoint section of Bob's DID doc #1 would be identical to that section in Bob's DID doc #2...N, and to the endpoints of all customers of the same mediator.

How many of these mediators do you expect to exist in the ecosystem?

@dhh1128
Contributor

dhh1128 commented Aug 29, 2020

Rather, I'm looking for a way to know whether or not the service endpoint itself, the URL, has PII in it.

Ah. Yes, you're right; I misinterpreted the question.

I know of no way to inspect a raw URL and conclude with certainty that it does or doesn't contain PII.

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

Could you provide an example DIDComm herd-privacy mediator URL? What would it look like? From your linked article I found this: http://agents-r-us.com/inbox. Is that a good example?

Yes, that's a reasonable example. It could also be https://myisp.com/didcomm or https://myuniversity.edu/students or whatever. (HTTPS is not strictly required for security properties, but there are some benefits to it, such as the fact that mobile apps will pass review by app stores if they make only HTTPS calls.)
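To make the herd-privacy point concrete, here is a minimal sketch of several DID documents whose service entries point at the same mediator. It assumes a simplified DID document shape, uses placeholder `did:example` DIDs, and uses "DIDComm" as the service type only because that is the shorthand in this thread; none of it is normative.

```python
import json

# Mediator URL taken from the thread; "DIDComm" as the service type follows the
# shorthand used in this discussion and is an assumption, not a registered type.
MEDIATOR_ENDPOINT = "https://agents-r-us.com/inbox"


def did_document(did: str, endpoint: str) -> dict:
    """Build a minimal (hypothetical) DID document with one mediator service entry."""
    return {
        "id": did,
        "service": [
            {"id": f"{did}#didcomm", "type": "DIDComm", "serviceEndpoint": endpoint},
        ],
    }


docs = [did_document(did, MEDIATOR_ENDPOINT)
        for did in ("did:example:123", "did:example:456", "did:example:789")]

# Apart from the id (which repeats the DID itself), every customer of the
# mediator publishes the same service data, so the endpoint singles no one out.
shared = {json.dumps({k: v for k, v in svc.items() if k != "id"}, sort_keys=True)
          for doc in docs for svc in doc["service"]}
print(len(shared))  # 1
```

The larger the mediator's customer base, the larger the herd each of these identical entries hides in.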

Does this statement mean that you also believe that an immutable VDR can permit a DID controller to put any PII they want into a DID Document -- and there would be no "right to be forgotten" issues?

No. Each VDR has to solve this problem. The first Indy/Aries solution to this problem is to use peer DIDs, which are never written to a ledger in the first place, and to use ZKPs for VCs, which don't require a binding to a public DID. Building on that, Sovrin's next solution is to decompose full DID docs into individual sections that have more specialized data models and their own transaction types. This makes them amenable to careful validation. That may filter out some obvious stuff (query strings with DID values in them), but it will not fix the deeper problem in your 3-part example. Its next proximate solution is to require each writer to the ledger to include with their write a signature over a Transaction Author Agreement (essentially terms and conditions clarifying that no PII is allowed, and that by writing the data, any claim to a right to be forgotten is explicitly forfeited). That will probably limit problems significantly, but it may not be enough in the end. Sovrin's final solution is to support a tombstoning mechanism that can be applied on a per-node, per-jurisdiction basis, such that read requests for a tombstoned record cause the semantic equivalent of an HTTP 451 error, yet the ledger's integrity, and the ability to forge consensus among nodes in different jurisdictions, is maintained.

Note that I deliberately said "Sovrin" in the preceding paragraph. Other Indy ledgers may choose to layer their own solutions on top of the peer DID strategy (or a different DID strategy), according to the governance they choose. Non-Indy ledgers each have to solve it, also. I'm not aware of a good solution yet for Bitcoin and Ethereum.
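A rough sketch of two of the mechanisms described above: a validation filter that catches the "obvious" case of a DID in an endpoint's query string, and per-node tombstoning that makes reads return a 451-style status while the record and hash chain stay intact. Every name here (`validate_endpoint`, `LedgerNode`, the DID regex) is hypothetical and not taken from Indy or Sovrin code.

```python
import hashlib
import json
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern for "a DID value appears here".
DID_PATTERN = re.compile(r"did:[a-z0-9]+:", re.IGNORECASE)


def validate_endpoint(url: str) -> None:
    """Reject endpoints whose query string carries a DID value (an "obvious" case)."""
    if DID_PATTERN.search(urlparse(url).query):
        raise ValueError(f"endpoint query string contains a DID: {url}")


class LedgerNode:
    """Append-only hash-chained log plus this node's own tombstone list."""

    def __init__(self, jurisdiction: str):
        self.jurisdiction = jurisdiction
        self.records: list[dict] = []
        self.hashes: list[str] = []
        self.tombstones: set[int] = set()

    def write(self, record: dict) -> int:
        prev = self.hashes[-1] if self.hashes else ""
        data = prev + json.dumps(record, sort_keys=True)
        self.records.append(record)
        self.hashes.append(hashlib.sha256(data.encode()).hexdigest())
        return len(self.records) - 1

    def tombstone(self, seq: int) -> None:
        # Only this node's read path changes; the record and hash chain stay intact.
        self.tombstones.add(seq)

    def read(self, seq: int) -> tuple[int, Optional[dict]]:
        if seq in self.tombstones:
            return 451, None  # semantic equivalent of "Unavailable For Legal Reasons"
        return 200, self.records[seq]


eu, us = LedgerNode("EU"), LedgerNode("US")
for node in (eu, us):
    node.write({"did": "did:example:123", "serviceEndpoint": "https://agents-r-us.com/inbox"})
eu.tombstone(0)
print(eu.read(0)[0], us.read(0)[0])  # 451 200
print(eu.hashes == us.hashes)  # True
```

Because tombstoning only affects reads, both nodes still agree on the hash chain, which is the property that keeps cross-jurisdiction consensus possible.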

How many of these mediators do you expect to exist in the ecosystem?

The answer here will vary over time. Aries includes an Apache-2-licensed implementation of one, and there are currently several SaaS vendors in production, which have an interoperable wallet scheme to prevent vendor lock-in... In the youth of the ecosystem, dozens or hundreds? Eventually, I'd say they will be offered by a meaningful percentage of ISPs or email providers and will have a long-tail distribution of customer counts, like mail transfer agents -- so maybe tens of thousands, with a small handful supporting herd sizes in the millions or billions?

@dhh1128
Contributor

dhh1128 commented Aug 30, 2020

@agropper :

is your "such that" framing for an optional but normative notification serviceEndpoint type the same idea as what I proposed above #382 (comment), except we substitute "notification" where I had "PDP" for the type and substitute "DIDComm" where I had "UMA2 and pending GNAP practices" for the data model?

I'm not sure. I don't think DIDComm is a "notification" service endpoint type; I think it's a service endpoint type all its own. It can be used for anything that DIDComm can be used for, which is any message-based interaction (protocol) that wants to inherit DIDComm's security and privacy guarantees and processing model. I also don't know enough about UMA2 and GNAP to feel confident about the analog.

@dlongley
Contributor

@dhh1128,

You seem to be poking at whether putting into a DID doc a service endpoint with PII in it alters the GDPR analysis, as if the service endpoint is the locus of the risk. I don't think this implication is correct, because a DID value on its own is PII. If you can write a personal DID doc to a ledger at all, you have a GDPR problem, whether or not you include a PII-containing service endpoint in it. This is why I assumed the other interpretation of your question.

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

@dlongley
Contributor

@csuwildcat -- Please take a look at @dhh1128's comment. He covers Sovrin's view of putting PII onto a VDR and all of the problems there. This is the sort of thing I've been trying to highlight as a problem for the case you want supported.

@dhh1128
Contributor

dhh1128 commented Aug 30, 2020

@dlongley :

A DID on its own does not necessarily identify a person. This depends on its use outside of the VDR. However, a URL that includes a person's full name identifies a person, all on its own.

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. (Or perhaps more precisely, experts I've talked to say that they believe legal rulings will eventually formalize this legal conclusion.) The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

@dlongley
Contributor

@csuwildcat,

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

@dhh1128
Contributor

dhh1128 commented Aug 30, 2020

I'm pretty sure @dhh1128 is mostly advocating in this issue for service endpoints in did:peer DID Documents, which is a separate case from putting service endpoints directly on a VDR -- which, I believe, is what you want.

True. Well, sort of. I want service endpoints in the spec because A) I want institutions to publish their endpoints in their DID docs; and B) I want private individuals to put their endpoints in peer DID docs.

Daniel B's case of individuals publishing an endpoint for a public DID on a public ledger for discovery purposes is one I've thought less about. I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, [[edit: and in lots of other places]]). They still have all the characteristics of security and control you need, but they don't incur any right-to-be-forgotten issues if the individuals publish them in places they control.

@dlongley
Contributor

@dhh1128,

A DID that has as its subject a person is PII, according to legal experts who've studied PII+GDPR+SSI carefully. The fact that some DIDs have subjects that aren't individuals is irrelevant. Putting a DID that identifies a person onto a public ledger is putting PII onto that ledger, even if it is not obvious to an outside observer that the DID in question has an individual as its subject. Obviousness is not a definitional criterion of PII, and does not eliminate the right-to-be-forgotten requirement.

I understand that this is your position. It's not settled yet -- and until it is, there are possible interpretations that split the information into two separate classes. There are also a number of exceptions to the "right to be forgotten" for which this difference might be important either on its own or in conjunction with the function of or governance/authority structures for a particular VDR. So, there remain open questions. It's harder to make the case for any difference, however, when your full legal name is explicitly called out in a DID Doc as merely additional information. This is in contrast to other "authoritative data" in the DID Doc including the DID itself and public key material that can be more readily linked to legal purposes and public interest, etc.

@dlongley
Contributor

@dhh1128,

I do believe in individuals having public DIDs, and putting endpoints in the associated DID docs -- but I don't believe that requires a ledger. Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc). They still have all the characteristics of security and control you need, but they don't incur any right-to-be-forgotten issues if the individuals publish them in places they control.

Yes, but this approach is what @csuwildcat is railing against as being insufficient for his use case (which we still need to get more concrete about).

@csuwildcat
Contributor

csuwildcat commented Aug 30, 2020

Peer DIDs can be public and published without a ledger (e.g., on your FB page, on your twitter profile, etc).

Guys, please reread this and consider how it is explicitly failing to solve for the needs I have. To help, I will restate the comment above in the scope of the use case: "Dan, you can create decentralized social networks, decentralized secondhand sales networks, decentralized gig economy exchanges, etc. that don't require centralized intermediary services, like Twitter, Craigslist, and Uber, by creating unregistered, uncrawlable DIDs, and simply attaching them to your Twitter, LinkedIn, and Uber accounts"

@msporny msporny added pre-cr-p3 ready for pr Issue is ready for a PR and removed discuss Needs further discussion before a pull request can be created labels Nov 1, 2020
@msporny
Member Author

msporny commented Nov 1, 2020

This issue can be resolved by writing a PR that addresses the resolutions raised by the group. This issue is waiting for an editorial PR to be written, and is thus a low priority to get done before CR.

@tahpot

tahpot commented Nov 27, 2020

Wow, what a thread and subsequent meeting on 09/24. I learned a lot, so thanks to everyone for their great contributions and thoughtful discussion.

At the risk of opening a can of worms that has seemingly been shut, I see value in introducing an optional hiddenService (or unpublishedService?) core property that defines a single service endpoint that external users can call to request access to a set of hidden serviceEndpoints at a point in time.

This hiddenService endpoint would only support a sub-set of common protocols (HTTP...) and auth methods (TBD). The endpoint would be expected to return a signed DID document listing all the service endpoints visible to the requester.

This allows the spec to:

  • be explicit about distinguishing between visible and hidden service endpoints
  • provide a means to "discover" hidden service endpoints if you only have a user's DID. If the endpoint is hit with no authorization, a list of public endpoints can be returned, but the hidden endpoints are never published, so the controller can remove them at any time
  • enable dynamic endpoints, whereby a requestor's credentials could determine a subset of private service endpoints to be returned (e.g., I'm okay with letting Google contact me via Twitter, but not via my PornHub account). As discussed in this thread, I could alternatively provide many VCs to Google. However, imagine I have 50 different accounts: that's a lot of data to send (you can't embed all that in an onboarding URL!), plus any time that information changes I need to resend every serviceEndpoint to Google (and others). I'm better off providing Google with an auth token to access my hidden service endpoints, dynamically controlling which endpoints are returned for Google's auth token (or equivalent).
  • avoid breaking the existing services property, so the spec can clearly state that services is a list of public endpoints and should be used with extreme caution.
  • state that any serviceEndpoints returned from the hiddenService should not be published or indexed, and that doing so could put the publisher in breach of various laws because they contain PII.
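A minimal sketch of how such a hiddenService endpoint might behave, assuming a hypothetical grant table that maps auth tokens to the hidden services they may see. None of these names exist in DID Core, and HMAC-SHA256 stands in for a real signature only to keep the sketch stdlib-only; an actual deployment would sign with a key listed in the DID document so requesters can verify with the public key.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical controller secret; HMAC is a stand-in for a real DID-key signature.
CONTROLLER_KEY = b"demo-secret-not-for-production"
DID = "did:example:123"

PUBLIC_SERVICES = [
    {"id": f"{DID}#inbox", "type": "DIDComm", "serviceEndpoint": "https://agents-r-us.com/inbox"},
]
HIDDEN_SERVICES = {
    "twitter": {"id": f"{DID}#twitter", "type": "Profile", "serviceEndpoint": "https://twitter.com/example"},
    "forum": {"id": f"{DID}#forum", "type": "Profile", "serviceEndpoint": "https://forum.example/u/42"},
}
# Which hidden services each (hypothetical) auth token may see; revocable at any time.
GRANTS = {"token-google": {"twitter"}}


def _sign(doc: dict) -> str:
    payload = json.dumps(doc, sort_keys=True).encode()
    return hmac.new(CONTROLLER_KEY, payload, hashlib.sha256).hexdigest()


def hidden_service(token: Optional[str]) -> dict:
    """Return a signed, requester-specific view: public services plus granted hidden ones."""
    visible = list(PUBLIC_SERVICES)
    for name in sorted(GRANTS.get(token, set())):
        visible.append(HIDDEN_SERVICES[name])
    doc = {"id": DID, "service": visible}
    return {"document": doc, "signature": _sign(doc)}


def verify(response: dict) -> bool:
    return hmac.compare_digest(_sign(response["document"]), response["signature"])


print(len(hidden_service(None)["document"]["service"]))  # 1
print(len(hidden_service("token-google")["document"]["service"]))  # 2
```

Revoking access is then just deleting a grant; nothing was ever written to a VDR, which is the property the proposal is after.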

@agropper
Contributor

agropper commented Nov 27, 2020 via email

@tahpot

tahpot commented Nov 27, 2020

As far as I understand GNAP, hiddenService could well be a GNAP-protected resource with the spec defining the structure of the returned resource.

@OR13
Contributor

OR13 commented Nov 30, 2020

I don't see the value of hiddenService as a separate field... but I do see the value of a service type that accomplishes the same functionality. Also related to hidden services: https://github.com/BlockchainCommons/did-method-onion

@TelegramSam

DIDComm supports the disclosure of supported protocols at the discretion of the DID owner for all the reasons stated above, without a separate hiddenService field. Given that disclosure is likely to be interactive in some way, specifying the method to do so in the spec seems limiting to the development of new and improved ways to accomplish that disclosure.

@OR13
Contributor

OR13 commented Nov 30, 2020

@TelegramSam agree, I would personally like to see a service type of DIDComm or something similar... so that you can start interrogating the service directly...

@jandrieu
Contributor

@OR13 I thought DIDComm was transport agnostic. How would you know what transport to use? Or is there a standard http API that provides a DIDComm binding for URLs?

@OR13
Contributor

OR13 commented Nov 30, 2020

@jandrieu assuming the serviceEndpoint=https://example.com/... you would know it's HTTP-ready. AFAIK the DID Core spec has no examples other than HTTP, but I assume there might be service definitions that express type=DIDComm transport=bluetooth... DID Core would be responsible for defining services sufficiently to support transport agnosticism; IMO it's not doing a great job of that today.
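The lookup described here (find the service by type, then decide how to reach it) can be sketched as follows. The `transport` property and the scheme-to-transport table are hypothetical, modeled on the `type=DIDComm transport=bluetooth` idea above; DID Core defines neither.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to transport name.
SCHEME_TRANSPORTS = {"https": "http", "http": "http", "ws": "websocket", "wss": "websocket", "mailto": "smtp"}


def find_service(did_doc: dict, service_type: str) -> Optional[dict]:
    """Return the first service entry of the given type, if any."""
    return next((s for s in did_doc.get("service", []) if s.get("type") == service_type), None)


def pick_transport(service: dict) -> Optional[str]:
    """Prefer an explicit (hypothetical) "transport" property, else infer from the URI scheme."""
    if "transport" in service:
        return service["transport"]
    scheme = urlparse(service["serviceEndpoint"]).scheme
    return SCHEME_TRANSPORTS.get(scheme)  # None means "no known transport for this scheme"


doc = {
    "id": "did:example:123",
    "service": [
        {"id": "did:example:123#didcomm", "type": "DIDComm", "serviceEndpoint": "https://example.com/didcomm"},
    ],
}
print(pick_transport(find_service(doc, "DIDComm")))  # http
print(pick_transport({"type": "DIDComm", "serviceEndpoint": "bluetooth:mydeviceid", "transport": "bluetooth"}))  # bluetooth
```

Inferring from the scheme covers the HTTPS case for free; an explicit property only matters when the scheme alone is ambiguous or unknown to the client.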

@jandrieu
Contributor

@OR13 Ok. That matches my expectation. The addition of a transport property might do the trick, but I'll leave that to DIDComm folks.

@dhh1128
Contributor

dhh1128 commented Nov 30, 2020

assuming the serviceEndpoint=https://example.com/... you would know it's HTTP-ready

Yes.

Or you could have...

  • type=DIDComm and endpoint=mailto:[email protected] OR
  • type=DIDComm and endpoint=kafka:kafka.agentsrus.com/DIDComm OR
  • type=DIDComm and endpoint=bluetooth:mydeviceid OR
  • type=DIDComm and endpoint=post:123+main+street+Anywhere+USA+12345 OR
  • type=DIDComm and endpoint=s3:bucketid OR
  • type=DIDComm and endpoint=tor:foo.onion/xyz ...etc

In all cases, the encryption/packaging/security guarantees are identical. The logical bytes of the messages are also identical, although they may be MIME-encoded for email or use transfer chunk encoding with HTTP POST. This is what is meant when we say that DIDComm is transport agnostic.

DIDComm runs arbitrary protocols, so you never need more than one endpoint. One of the protocols you can run is a feature discovery protocol that lets you discover what other protocols the other party supports/is willing to engage in. A hidden service endpoint is thus unnecessary; any agent gets to decide what services it wants to expose to each party that contacts it there.
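A rough sketch of the "one endpoint, per-party disclosure" idea: the agent answers a discovery query with only the protocols it is willing to reveal to that sender. The protocol identifiers, message shape, and policy table are illustrative assumptions loosely modeled on a DIDComm feature-discovery exchange, not copied from any spec.

```python
from fnmatch import fnmatchcase

# Illustrative protocol identifiers and message shapes; not taken from a spec.
SUPPORTED = [
    "https://didcomm.org/basicmessage/1.0",
    "https://didcomm.org/issue-credential/1.0",
    "https://didcomm.org/present-proof/1.0",
]
# Per-party disclosure policy decided by the agent (hypothetical DIDs).
POLICY = {"did:example:issuer": set(SUPPORTED)}
DEFAULT_DISCLOSURE = {"https://didcomm.org/basicmessage/1.0"}


def handle_query(sender: str, query: str) -> dict:
    """Answer a discovery query with only the protocols this sender may learn about."""
    allowed = POLICY.get(sender, DEFAULT_DISCLOSURE)
    protocols = [p for p in SUPPORTED if p in allowed and fnmatchcase(p, query)]
    return {"type": "disclose", "protocols": protocols}


print(handle_query("did:example:issuer", "https://didcomm.org/*")["protocols"])
print(handle_query("did:example:stranger", "https://didcomm.org/*")["protocols"])
```

The known issuer learns about all three protocols while an unknown sender learns only about basic messaging, and neither answer required publishing anything beyond the single endpoint.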

@OR13
Contributor

OR13 commented Nov 30, 2020

@tahpot

tahpot commented Dec 6, 2020

A hidden service endpoint is thus unnecessary; any agent gets to decide what services it wants to expose to each party that contacts it there.

That makes sense, so the "feature discovery protocol" could be used via DIDComm to expose additional services.

However, the definition of those services would differ from the serviceEndpoint spec within DID Core. Is that inconsistency acceptable? Is the dependency on DIDComm for discovery of these additional services acceptable?

@dhh1128
Copy link
Contributor

dhh1128 commented Dec 7, 2020 via email

@tahpot

tahpot commented Dec 10, 2020

Probably not for everyone. I wasn't arguing that everyone should adopt DIDComm; I was just explaining why, if you assume DIDComm, you don't need a solution for this additional challenge.

I think that's the heart of what I'm trying to say.

While it's technically possible to use DIDComm (or another type), we can't assume everyone is going to use DIDComm to communicate a "hidden serviceEndpoint".

As it seems very useful to support the concept of a hidden serviceEndpoint (or similar), I would prefer to see such capability explicitly defined in the spec.

@agropper
Contributor

agropper commented Dec 10, 2020 via email

@msporny
Member Author

msporny commented Dec 20, 2020

I have authored PR #511 to address the resolutions the WG made here: #382 (comment)

This issue can be closed once PR #511 is merged.

@agropper
Contributor

Made suggested changes to #511 (review) that I believe are consistent with the four resolutions.

@dhh1128
Contributor

dhh1128 commented Dec 22, 2020

I turned Adrian's latest suggestions into PR #515, which is an alternative embodiment of the resolutions made here. If accepted, this would supersede PR #511.

@peacekeeper peacekeeper added the pr exists There is an open PR to address this issue label Dec 22, 2020
@msporny
Member Author

msporny commented Jan 3, 2021

This issue will be closed once PR #515 is merged.

@msporny msporny removed the ready for pr Issue is ready for a PR label Jan 12, 2021
@msporny
Member Author

msporny commented Jan 12, 2021

PR #515 has been merged, closing.

@msporny msporny closed this as completed Jan 12, 2021
17 participants