Clarify the heuristics to determine the interaction model if none is specified #128

RubenVerborgh · 2019-12-04T14:54:30Z

No description provided.

kjetilk · 2019-12-05T19:15:29Z

Since we don't have the actual clarification here, I'll return this to TODO.

kjetilk · 2019-12-16T14:55:30Z

I think there are two parts to this, one is the specific heuristics, which is simple. The other thing is consistency, which is probably harder. The latter was more the intention of #121 , but it really is connected.

My proposal for the heuristics:

Iff the request-URI path component ends with the character / is an LDP-C. Since only LDP-BC is supported by Solid, it is permissible to interprete it as that. Note, however, that in Solid, LDP-C doesn't necessarily imply LDP-RS, since it may be represented by an RDF-bearing document.
If the request-URI path component does not end with /, then the resource is an LDP-RS iff it has a media type that implies that it state is fully represented in RDF, corresponding to an RDF graph.
Any other resources are assumed to be LDP-NR.

I'm thinking it is good to define it in terms of media type in the second point, so that the interaction model can always be inferred from headers (since #70), the alternative is to inspect the body to ensure that. We could probably enumerate the media types that are LDP-RS at this point.

I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS, (#69), but I'm willing to go with that if it makes stuff simpler.

TallTed · 2019-12-16T17:40:21Z

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR). HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

Otherwise, I think I agree with your last.

acoburn · 2019-12-16T20:21:56Z

Would it make sense to require the response headers to include an indication of how the server understood the request? That is, if the server understands the new resource to be an LDP-BC, then the server MUST respond with a corresponding Link header, indicating an LDP-BC was created (the same would apply to any other LDP interaction model).

kjetilk · 2019-12-16T20:40:58Z

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR).

But that's not what it says, it says:
"An LDPR whose state is fully represented in RDF," vs "An LDPR whose state is not represented in RDF.", which leaves open a class of things that are partially represented in RDF.

HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

I think I can go with that though, since I can see that it may be just inaccuracy, which happens.

csarven · 2019-12-16T21:45:26Z

/ is a RDF Source (RDF 1.1). Should apply containment behaviour and resource lifecycle like LDP-BC.
re "media type that implies", you can't go from there exists a possibility of a representation with non-RDF emitting content to all representations of a media type. That doesn't rule out text/html, if that was what you were aiming at. It is permitted to have a representation in HTML+RDFa without any non-RDF emitting content corresponding to an RDF graph. If you want to relate to LDP, it qualifies for LDP-RS.

csarven · 2019-12-16T21:49:58Z

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

kjetilk · 2019-12-16T23:14:43Z

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

That's a practical question, I think. As a matter of definition, it was what I was referring to when I said that LDP-C does not imply LDP-RS. That may, or may not take care of it, depending on the practical implications of the assumptions made by pure LDP implementations.

We could also say that HTML+RDFa etc are just LDPR...

kjetilk · 2019-12-16T23:50:21Z

Or... We could say that "these media types must be inspected to determine if the document is fully represented in RDF and thus LDP-RS", and then leave the rest to be LDP-NR. But it won't solve the conflict of LDP-C not necessarily being LDP-RS.

csarven · 2019-12-17T10:38:32Z

If / is supposed to be handled as a LDP-BC, we shouldn't venture from LDP-RS. I don't particularly like the idea of Solid having a different take on LDP eg. / is LDP-BC but not necessarily LDP-RS.. or even suggesting that a resource has a (default?) interaction model but will differ depending on the representation. That would be internally inconsistent and not a simple design. Just because LDP works with vague notions attempting to cover spectrum of things with eg "fully", "partial", or "not have useful", doesn't mean that Solid has to go on that path. If it is really about that, I feel much more comfortable to default to RDF 1.1's notion of what RDF Source entails and not get sucked into pedantic over-engineering.

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

To answer @acoburn 's question:

Would it make sense to require the response headers to include an indication of how the server understood the request?

Yes, I agree that makes sense (as I've already proposed elsewhere). We note that LDP doesn't specify how requests without an interaction model should be handled. So, specifying it for Solid is not about compatibility with LDP but just us striving for normalisation and uniform handling.

kjetilk · 2019-12-17T11:17:50Z

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

We'd have to take that up with @timbl , basically, it is his design that / can return a representation where the state is not fully represented in RDF that is incompatible with LDP's notion that LDP-C isa LDP-RS.

My opinion is, and has always been, that this notion (LDP-C isa LDP-RS) is a Really Bad Idea[tm], so, I don't feel the tension personally, but indeed, it is important for Solid to have a clear relationship to LDP.

kjetilk · 2019-12-18T12:50:22Z

We need to define heuristics with the use of a Slug, BTW. It may be better to define that in #108 , though.

csarven · 2019-12-18T14:29:54Z

#96

kjetilk · 2019-12-18T18:58:56Z

#96

Good catch. Forgot that one, since it wasn't on the project board... I'm starting to think that we may need to elevate that from the implementation detail status. I envision that the presence of a / will influence whether it will be classified as a Container or not, which would not be just an implementation detail.

csarven · 2019-12-18T19:11:15Z

Yes, I think our perspective on that evolved (through other issues).

I suggest to apply the same heuristics from the resolution of #107 on #96 .

kjetilk · 2019-12-18T23:07:42Z

Actually, rereading, I found that the description #96 to be quite different from what needs to be resolved here. Also, I think it should be orthogonoal to #107 , since I think the use of a Slug is about the client making a suggestion, the server will do it damnedest to find a reasonable solution. If the client wants to be exact, they should use PUT, which I defined in terms of exactness in #40.

So, I'll guess I'll go with my proposal right here. :-) It seems there are six possibilities, combinations between whether the Slug ends with a / or not and whether the request has an LDP Link rel="type" header.

Here's my suggestion:

Given `Slug`	Given `Link`	Result
`foo/`	no `Link` header	Create a container with `foo/` appended to the request-URI.
`foo/`	`Link` header setting LDP-C	Create a container with `foo/` appended to the request-URI.
`foo/`	`Link` header setting some other LDP interaction model	Create a resource with `foo` appended to the request-URI, but check for consistency (as discussed in #40)
`foo`	no `Link` header	Create a resource with `foo` appended to the request-URI, check for consistency
`foo`	`Link` header setting LDP-C	Create a container with `foo/` appended to the request-URI.
`foo`	`Link` header setting some other LDP interaction model	Create a resource with `foo` appended to the request-URI, check for consistency

[Removed updated suggestion]

csarven · 2020-01-02T00:21:12Z

If Slug is provided, the slash semantics is used for the intended URI.

If the Link header accompanies Slug, Link has a higher specificity than Slug for the algorithm generating the URI.

kjetilk · 2020-01-02T16:11:40Z

Right! You can put it that way, but it has to say something more concrete about what to do with the slash if it conflicts with the expectation set by the Link, like in the third row in the table.

Can we formulate a consensus around this?

I think we first need to construct the Resource URL. With some methods, (i.e. PUT, PATCH), the URL is the Request-URI, while with POST), the Resource URL is constructed by the server, using the heuristics based on Slug and Link as discussed above, or assigned a string by the server if Slug is not present.

Then, the Resource URL is checked for consistency, e.g. a Resource URL not ending with / cannot be an LDP-C.

Then,

Iff the Resource URL path component ends with the character / the resource can be assumed to behave like an LDP-BC. However, in Solid, LDP-C doesn't necessarily imply LDP-RS.
If the Resource URL path component does not end with /, then the resource is an LDP-RS iff it has a media type listed in the a list TBD.
Other resources may be treated as LDP-NR.

csarven · 2020-01-05T23:27:41Z

I agree with the steps involving the construction of the effective request URI and setting the interaction model.

kjetilk · 2020-02-18T16:54:23Z

Just to note, NSS says:

The name of new file POSTed may not contain : | or /

in the case where the Slug ends with a /.

I don't think Slug should be restricted like that, it only serves to limit the client for no gain, so I think we should stay with the proposed consensus.

csarven · 2020-02-18T21:55:16Z

I've encountered that error on NSS as well when I tested Slug. Noted it as a bug because as far as I can tell from https://tools.ietf.org/html/rfc5023#section-9.7.1 , / is a permitted reserved character and doesn't need to be percent-encoded. If you can confirm this as well, that'd be great. / can appear anywhere and any number of times in slugtext.

RubenVerborgh · 2020-02-18T21:57:10Z

/ can appear anywhere and any number of times in slugtext

But then slugs can be used to create folders recursively, given that slashes do have a meaning in Solid? And this would create complex locking problems?

Not to mention the attack vector for security problems.

I'm against.

csarven · 2020-02-18T22:09:39Z

Recursive container creation is an atomic operation and not unique to Slug use. Applying the slash semantics to the Slug value keeps things consistent ( #96 (comment) ).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

RubenVerborgh · 2020-02-18T22:14:14Z

Recursive container creation is an atomic operation

Understood; could you add a reference for that (might provide context for the next question)?

Applying the slash semantics to the Slug value keeps things consistent ( #96 (comment) ).

But then I think we need a MUST there (will comment).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

Depending on how recursive resource creation is defined, the problems would be the same. I.e., we should in both cases warn against things such as:

foo/bar/../../secret
foo/bar/secret.acl
foo/bar/secret.acl/whatever
…

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

A such, I would ask the question differently: what is the use case for recursive container creation via slug?

kjetilk · 2020-02-18T23:04:31Z

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142), but that the server should otherwise have the liberty to construct identifiers based on the Slug, rejecting something should be a rare occurance.

That said, I say we defer recursive container creation with Slug until after FPWD. When using PUT, the URI resolution is already well-defined. Later, we can port that definition into our understanding of Slug, but for now, lets not do that.

csarven · 2020-02-18T23:04:57Z

Understood; could you add a reference for that

#68 (comment) , #118 (comment) , #126 (comment) , #137 (comment) .. there is more out there I'm sure. Resolving security follow up #129 can help further. Requirements like #107 (comment) can help towards halting if a particular segment (container) can't be created.

I think we need a MUST there

IIRC, the rationale for SHOULD was because of what Slug entails and that it would still be within the confines of client suggestion as well as server choosing to ignore or use as it prefers. I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

we should in both cases warn against things such as

These would be fine as non-normative information in general but I don't see it being specific to Slug.

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

Servers that use the Slug (with a particular algorithm eg. slash semantics) will have to generate an effective request URI and perform the request based on that as usual. Again I'm not sure if there is anything unique here for Slug but I'd be content to clarify these further as non-normative information because we are really not defining anything new other than suggesting to apply the slash semantics algorithm for the sake of consistent path segment handling.

what is the use case for recursive container creation via slug?

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

RubenVerborgh · 2020-02-18T23:10:46Z

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142),

And you might even want to limit length, as per whatwg/fetch#862 (comment), to avoid that people ask for 10MB slugs. (Also note for PUT.)

I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

Security seems key here. If Slug has it as a SHOULD, but you create it, and the slash interpretation is a SHOULD or MUST on GET, we might get in trouble.

non-normative information in general

The danger is huge. Why allow .acl slugs and ../ sequences? Open attack vector IMHO.

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

Then use those, I'd say. This is the kind of security risk where it's hard to imagine all implementations to get it right. I'd strongly ask the security experts to weigh in.

csarven · 2020-02-18T23:32:41Z

By mere definition of Slug, server can ignore or change it to what it deems to be useful or safe. ".acl" has no special meaning or expectation as far as the spec is concerned. If an implementation considers that to be a pattern that it best avoids, it will strip it off or use something else. As for ".." , these fall under how the effective URI is handled. No different than request-target being "..". Server still has to construct the intended target. In some cases it will be justifiably useful (relative paths) and in other cases it may be a malicious or nonsensical attempt that the server can ignore. Server choosing what to do also goes for client asking to create many nested containers and not getting it.

You asked for use cases that would entail recursive container creation which happens to route through Slug (because of slash semantics for consistency).

RubenVerborgh · 2020-02-18T23:44:50Z

That, or we can avoid the whole attack vector by not allowing these things in slug, such that there is no expectation of them to work at all. The risk is that Slug could follow different code paths from regular request URI handling, and be way less tested because of it being more rare and having more edge cases because of it being a relative URI.

Anyway, not my call; just bringing in arguments of why a security expert should really look at this.

csarven · 2020-02-19T00:09:51Z

All features in all specs will be reviewed by security experts.

There are indeed many attack vectors but the ones that you've raised can either be ignored or has no more specialty than relative request-targets on their way to the effective request URI. Nevertheless, I do agree that potential risks due to different code paths would be useful to note in the spec. Thanks for raising that.

Edit: "attack vectors" not "vendors" =)

kjetilk · 2020-02-19T00:09:56Z

We could surely define in terms of request URI handling and detail how to construct an effective request URI that would need to validated. We could put some common attacks in the test suite, but lets just consider it later :-) For now, lets just redefine slugtext to exclude recursive container creation. We could be even more restrictive and for example say that we allow only the regexp \w+/?$. We could also remove any non-matching character.

Edit: Oh, I didn't read what it said about Unicode, that last part would exclude that. So, what I wrote was correct, just much more restrictive than I intended :-)

kjetilk · 2020-02-19T13:09:15Z

Another note on NSS, as a response to the case of foo with no Link header (edit: remove attempt to make simple table)

NSS5 responds with

Location: /foo.ttl

rather than just foo. Is this what we expect? For now, I'll assume yes.

RubenVerborgh · 2020-02-19T13:23:51Z

Is this what we expect?

It does not seem correct.

csarven · 2020-02-19T13:27:17Z

That would be a valid response that would pass the tests. I think the primary concern was about classifying the request as a container or a non-container resource. I don't think NSS's response is good because it should probably stick to /foo and do /foo$.ttl on disk. It is not entirely clear what requesting JSON-LD on /foo.ttl is supposed to do.. not to mention the slight confusion. Any way, this bit is an implementation detail.

csarven · 2020-06-12T11:23:47Z

Noting here that the rough consensus in this issue: #128 (comment) was taken up in PR #160 , reviewed, and approved.

However, the issue was revisited in #160 (comment) highlighting the functional requirements, variability, and complexity to implement the heuristics.

#160 (comment) proposed an alternative approach that was deemed to simplify implementation and remove variability. It was reviewed and approved. c1aac42 includes the criteria based on the proposal.

RubenVerborgh added doc: Protocol category: editorial Concerns phrasing/wording topic: resource access labels Dec 4, 2019

This was referenced Dec 4, 2019

Handling requests without an interaction model #105

Closed

Handling resources and updates with conflicting interaction models #121

Closed

kjetilk added this to the December 19th milestone Dec 5, 2019

acoburn mentioned this issue Dec 12, 2019

Specify/advise semantics of default Container resources (index.html/.ttl) #69

Open

kjetilk mentioned this issue Dec 18, 2019

Specify semantics of resource creation via PUT #40

Closed

RubenVerborgh mentioned this issue Feb 18, 2020

Resource creation behaviour when using the Slug header #96

Open

csarven mentioned this issue Feb 19, 2020

Precedence of response codes #146

Open

kjetilk mentioned this issue Feb 25, 2020

Tests for variations over Slug and Link solid-contrib/test-suite#81

Merged

csarven mentioned this issue May 4, 2020

Identify Solid Server and Spec Version #179

Open

csarven mentioned this issue May 12, 2020

Add heuristic to determine resource type #160

Merged

csarven closed this as completed Jun 12, 2020

acoburn mentioned this issue Jul 28, 2020

Container questions CommunitySolidServer/CommunitySolidServer#44

Closed

kjetilk mentioned this issue Jun 28, 2021

Should we allow slugs ending on a slash to create containers? CommunitySolidServer/CommunitySolidServer#631

Closed

kjetilk mentioned this issue Aug 18, 2021

Slash semantics and conflicting requests on resource creation #301

Open

kjetilk mentioned this issue Oct 29, 2021

Server should partially encode Slug header values #339

Open

csarven added this to Specification Sep 25, 2022

csarven moved this to Done in Specification Sep 25, 2022

Clarify the heuristics to determine the interaction model if none is specified #128

Clarify the heuristics to determine the interaction model if none is specified #128

Comments

RubenVerborgh commented Dec 4, 2019 • edited Loading

kjetilk commented Dec 5, 2019

kjetilk commented Dec 16, 2019

TallTed commented Dec 16, 2019

acoburn commented Dec 16, 2019

kjetilk commented Dec 16, 2019

csarven commented Dec 16, 2019 • edited Loading

csarven commented Dec 16, 2019

kjetilk commented Dec 16, 2019

kjetilk commented Dec 16, 2019

csarven commented Dec 17, 2019

kjetilk commented Dec 17, 2019

kjetilk commented Dec 18, 2019

csarven commented Dec 18, 2019

kjetilk commented Dec 18, 2019

csarven commented Dec 18, 2019

kjetilk commented Dec 18, 2019 • edited Loading

csarven commented Jan 2, 2020

kjetilk commented Jan 2, 2020

csarven commented Jan 5, 2020

kjetilk commented Feb 18, 2020 • edited Loading

csarven commented Feb 18, 2020

RubenVerborgh commented Feb 18, 2020

csarven commented Feb 18, 2020

RubenVerborgh commented Feb 18, 2020

kjetilk commented Feb 18, 2020

csarven commented Feb 18, 2020

RubenVerborgh commented Feb 18, 2020

csarven commented Feb 18, 2020

RubenVerborgh commented Feb 18, 2020

csarven commented Feb 19, 2020 • edited Loading

kjetilk commented Feb 19, 2020 • edited Loading

kjetilk commented Feb 19, 2020 • edited Loading

RubenVerborgh commented Feb 19, 2020

csarven commented Feb 19, 2020

csarven commented Jun 12, 2020

RubenVerborgh commented Dec 4, 2019 •

edited

Loading

csarven commented Dec 16, 2019 •

edited

Loading

kjetilk commented Dec 18, 2019 •

edited

Loading

kjetilk commented Feb 18, 2020 •

edited

Loading

csarven commented Feb 19, 2020 •

edited

Loading

kjetilk commented Feb 19, 2020 •

edited

Loading

kjetilk commented Feb 19, 2020 •

edited

Loading