Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the heuristics to determine the interaction model if none is specified #128

Closed
RubenVerborgh opened this issue Dec 4, 2019 · 44 comments

Comments

@RubenVerborgh
Copy link
Contributor

RubenVerborgh commented Dec 4, 2019

No description provided.

@kjetilk
Copy link
Member

kjetilk commented Dec 5, 2019

Since we don't have the actual clarification here, I'll return this to TODO.

@kjetilk
Copy link
Member

kjetilk commented Dec 16, 2019

I think there are two parts to this, one is the specific heuristics, which is simple. The other thing is consistency, which is probably harder. The latter was more the intention of #121 , but it really is connected.

My proposal for the heuristics:

  1. Iff the request-URI path component ends with the character / is an LDP-C. Since only LDP-BC is supported by Solid, it is permissible to interprete it as that. Note, however, that in Solid, LDP-C doesn't necessarily imply LDP-RS, since it may be represented by an RDF-bearing document.
  2. If the request-URI path component does not end with /, then the resource is an LDP-RS iff it has a media type that implies that it state is fully represented in RDF, corresponding to an RDF graph.
  3. Any other resources are assumed to be LDP-NR.

I'm thinking it is good to define it in terms of media type in the second point, so that the interaction model can always be inferred from headers (since #70), the alternative is to inspect the body to ensure that. We could probably enumerate the media types that are LDP-RS at this point.

I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS, (#69), but I'm willing to go with that if it makes stuff simpler.

@TallTed
Copy link
Contributor

TallTed commented Dec 16, 2019

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR). HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

Otherwise, I think I agree with your last.

@acoburn
Copy link
Member

acoburn commented Dec 16, 2019

Would it make sense to require the response headers to include an indication of how the server understood the request? That is, if the server understands the new resource to be an LDP-BC, then the server MUST respond with a corresponding Link header, indicating an LDP-BC was created (the same would apply to any other LDP interaction model).

@kjetilk
Copy link
Member

kjetilk commented Dec 16, 2019

[@kjetilk] I'm still of the opinion that the HTML+RDFa falls between LDP-NR and LDP-RS

There is no between. "Fully represented in RDF" (LDP-RS) and "not fully represented in RDF" (LDP-NR) add up to 100% of all possibilities (LDPR).

But that's not what it says, it says:
"An LDPR whose state is fully represented in RDF," vs "An LDPR whose state is not represented in RDF.", which leaves open a class of things that are partially represented in RDF.

HTML+RDFa docs are not fully represented in RDF, so they are LDP-NR.

I think I can go with that though, since I can see that it may be just inaccuracy, which happens.

@csarven
Copy link
Member

csarven commented Dec 16, 2019

  1. / is a RDF Source (RDF 1.1). Should apply containment behaviour and resource lifecycle like LDP-BC.
  2. re "media type that implies", you can't go from there exists a possibility of a representation with non-RDF emitting content to all representations of a media type. That doesn't rule out text/html, if that was what you were aiming at. It is permitted to have a representation in HTML+RDFa without any non-RDF emitting content corresponding to an RDF graph. If you want to relate to LDP, it qualifies for LDP-RS.

@csarven
Copy link
Member

csarven commented Dec 16, 2019

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

@kjetilk
Copy link
Member

kjetilk commented Dec 16, 2019

Categorically pinning text/html, application/xhtml+xml, image/svg+xml etc to LDP-NR conflicts with the possibility of having an HTML(+RDFa) representation available from /. In #69 we have already acknowledged and accepted the use cases which requires HTML(+RDFa).

That's a practical question, I think. As a matter of definition, it was what I was referring to when I said that LDP-C does not imply LDP-RS. That may, or may not take care of it, depending on the practical implications of the assumptions made by pure LDP implementations.

We could also say that HTML+RDFa etc are just LDPR...

@kjetilk
Copy link
Member

kjetilk commented Dec 16, 2019

Or... We could say that "these media types must be inspected to determine if the document is fully represented in RDF and thus LDP-RS", and then leave the rest to be LDP-NR. But it won't solve the conflict of LDP-C not necessarily being LDP-RS.

@csarven
Copy link
Member

csarven commented Dec 17, 2019

If / is supposed to be handled as a LDP-BC, we shouldn't venture from LDP-RS. I don't particularly like the idea of Solid having a different take on LDP eg. / is LDP-BC but not necessarily LDP-RS.. or even suggesting that a resource has a (default?) interaction model but will differ depending on the representation. That would be internally inconsistent and not a simple design. Just because LDP works with vague notions attempting to cover spectrum of things with eg "fully", "partial", or "not have useful", doesn't mean that Solid has to go on that path. If it is really about that, I feel much more comfortable to default to RDF 1.1's notion of what RDF Source entails and not get sucked into pedantic over-engineering.

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

To answer @acoburn 's question:

Would it make sense to require the response headers to include an indication of how the server understood the request?

Yes, I agree that makes sense (as I've already proposed elsewhere). We note that LDP doesn't specify how requests without an interaction model should be handled. So, specifying it for Solid is not about compatibility with LDP but just us striving for normalisation and uniform handling.

@kjetilk
Copy link
Member

kjetilk commented Dec 17, 2019

/ is intended to be a LDP-BC and LDP-RS and so must be handled as a RDF graph. Everything else is secondary or a non-issue... out of scope.

We'd have to take that up with @timbl , basically, it is his design that / can return a representation where the state is not fully represented in RDF that is incompatible with LDP's notion that LDP-C isa LDP-RS.

My opinion is, and has always been, that this notion (LDP-C isa LDP-RS) is a Really Bad Idea[tm], so, I don't feel the tension personally, but indeed, it is important for Solid to have a clear relationship to LDP.

@kjetilk
Copy link
Member

kjetilk commented Dec 18, 2019

We need to define heuristics with the use of a Slug, BTW. It may be better to define that in #108 , though.

@csarven
Copy link
Member

csarven commented Dec 18, 2019

#96

@kjetilk
Copy link
Member

kjetilk commented Dec 18, 2019

#96

Good catch. Forgot that one, since it wasn't on the project board... I'm starting to think that we may need to elevate that from the implementation detail status. I envision that the presence of a / will influence whether it will be classified as a Container or not, which would not be just an implementation detail.

@csarven
Copy link
Member

csarven commented Dec 18, 2019

Yes, I think our perspective on that evolved (through other issues).

I suggest to apply the same heuristics from the resolution of #107 on #96 .

@kjetilk
Copy link
Member

kjetilk commented Dec 18, 2019

Actually, rereading, I found that the description #96 to be quite different from what needs to be resolved here. Also, I think it should be orthogonoal to #107 , since I think the use of a Slug is about the client making a suggestion, the server will do it damnedest to find a reasonable solution. If the client wants to be exact, they should use PUT, which I defined in terms of exactness in #40.

So, I'll guess I'll go with my proposal right here. :-) It seems there are six possibilities, combinations between whether the Slug ends with a / or not and whether the request has an LDP Link rel="type" header.

Here's my suggestion:

Given Slug Given Link Result
foo/ no Link header Create a container with foo/ appended to the request-URI.
foo/ Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo/ Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, but check for consistency (as discussed in #40)
foo no Link header Create a resource with foo appended to the request-URI, check for consistency
foo Link header setting LDP-C Create a container with foo/ appended to the request-URI.
foo Link header setting some other LDP interaction model Create a resource with foo appended to the request-URI, check for consistency

[Removed updated suggestion]

@csarven
Copy link
Member

csarven commented Jan 2, 2020

If Slug is provided, the slash semantics is used for the intended URI.

If the Link header accompanies Slug, Link has a higher specificity than Slug for the algorithm generating the URI.

@kjetilk
Copy link
Member

kjetilk commented Jan 2, 2020

Right! You can put it that way, but it has to say something more concrete about what to do with the slash if it conflicts with the expectation set by the Link, like in the third row in the table.

Can we formulate a consensus around this?

I think we first need to construct the Resource URL. With some methods, (i.e. PUT, PATCH), the URL is the Request-URI, while with POST), the Resource URL is constructed by the server, using the heuristics based on Slug and Link as discussed above, or assigned a string by the server if Slug is not present.

Then, the Resource URL is checked for consistency, e.g. a Resource URL not ending with / cannot be an LDP-C.

Then,

  1. Iff the Resource URL path component ends with the character / the resource can be assumed to behave like an LDP-BC. However, in Solid, LDP-C doesn't necessarily imply LDP-RS.
  2. If the Resource URL path component does not end with /, then the resource is an LDP-RS iff it has a media type listed in the a list TBD.
  3. Other resources may be treated as LDP-NR.

@csarven
Copy link
Member

csarven commented Jan 5, 2020

I agree with the steps involving the construction of the effective request URI and setting the interaction model.

@kjetilk
Copy link
Member

kjetilk commented Feb 18, 2020

Just to note, NSS says:

The name of new file POSTed may not contain : | or /

in the case where the Slug ends with a /.

I don't think Slug should be restricted like that, it only serves to limit the client for no gain, so I think we should stay with the proposed consensus.

@csarven
Copy link
Member

csarven commented Feb 18, 2020

I've encountered that error on NSS as well when I tested Slug. Noted it as a bug because as far as I can tell from https://tools.ietf.org/html/rfc5023#section-9.7.1 , / is a permitted reserved character and doesn't need to be percent-encoded. If you can confirm this as well, that'd be great. / can appear anywhere and any number of times in slugtext.

@RubenVerborgh
Copy link
Contributor Author

/ can appear anywhere and any number of times in slugtext

But then slugs can be used to create folders recursively, given that slashes do have a meaning in Solid? And this would create complex locking problems?

Not to mention the attack vector for security problems.

I'm against.

@csarven
Copy link
Member

csarven commented Feb 18, 2020

Recursive container creation is an atomic operation and not unique to Slug use. Applying the slash semantics to the Slug value keeps things consistent ( #96 (comment) ).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

@RubenVerborgh
Copy link
Contributor Author

Recursive container creation is an atomic operation

Understood; could you add a reference for that (might provide context for the next question)?

Applying the slash semantics to the Slug value keeps things consistent ( #96 (comment) ).

But then I think we need a MUST there (will comment).

Can you elaborate on how the attack vector for security problems that you are thinking of is unique to Slug use?

Depending on how recursive resource creation is defined, the problems would be the same. I.e., we should in both cases warn against things such as:

  • foo/bar/../../secret
  • foo/bar/secret.acl
  • foo/bar/secret.acl/whatever

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

A such, I would ask the question differently: what is the use case for recursive container creation via slug?

@kjetilk
Copy link
Member

kjetilk commented Feb 18, 2020

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142), but that the server should otherwise have the liberty to construct identifiers based on the Slug, rejecting something should be a rare occurance.

That said, I say we defer recursive container creation with Slug until after FPWD. When using PUT, the URI resolution is already well-defined. Later, we can port that definition into our understanding of Slug, but for now, lets not do that.

@csarven
Copy link
Member

csarven commented Feb 18, 2020

Understood; could you add a reference for that

#68 (comment) , #118 (comment) , #126 (comment) , #137 (comment) .. there is more out there I'm sure. Resolving security follow up #129 can help further. Requirements like #107 (comment) can help towards halting if a particular segment (container) can't be created.

I think we need a MUST there

IIRC, the rationale for SHOULD was because of what Slug entails and that it would still be within the confines of client suggestion as well as server choosing to ignore or use as it prefers. I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

we should in both cases warn against things such as

These would be fine as non-normative information in general but I don't see it being specific to Slug.

However, the real danger is that implementers overlook the fact that it is recursive container creation, as they might expect the slug to be just a filename.

Servers that use the Slug (with a particular algorithm eg. slash semantics) will have to generate an effective request URI and perform the request based on that as usual. Again I'm not sure if there is anything unique here for Slug but I'd be content to clarify these further as non-normative information because we are really not defining anything new other than suggesting to apply the slash semantics algorithm for the sake of consistent path segment handling.

what is the use case for recursive container creation via slug?

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

@RubenVerborgh
Copy link
Contributor Author

I'm thinking that we do need to restrict the acceptable characters or even what the server may use to construct a resource identifier (with relevance to #142),

And you might even want to limit length, as per whatwg/fetch#862 (comment), to avoid that people ask for 10MB slugs. (Also note for PUT.)

I also understand the rationale for MUST and that would be equally valid but it may be overstepping Slug.

Security seems key here. If Slug has it as a SHOULD, but you create it, and the slash interpretation is a SHOULD or MUST on GET, we might get in trouble.

non-normative information in general

The danger is huge. Why allow .acl slugs and ../ sequences? Open attack vector IMHO.

The same use cases that would require PUT or PATCH to create recursive containers. Slug just happens to be a hint or a suggestion to the server on the naming.

Then use those, I'd say. This is the kind of security risk where it's hard to imagine all implementations to get it right. I'd strongly ask the security experts to weigh in.

@csarven
Copy link
Member

csarven commented Feb 18, 2020

By mere definition of Slug, server can ignore or change it to what it deems to be useful or safe. ".acl" has no special meaning or expectation as far as the spec is concerned. If an implementation considers that to be a pattern that it best avoids, it will strip it off or use something else. As for ".." , these fall under how the effective URI is handled. No different than request-target being "..". Server still has to construct the intended target. In some cases it will be justifiably useful (relative paths) and in other cases it may be a malicious or nonsensical attempt that the server can ignore. Server choosing what to do also goes for client asking to create many nested containers and not getting it.

You asked for use cases that would entail recursive container creation which happens to route through Slug (because of slash semantics for consistency).

@RubenVerborgh
Copy link
Contributor Author

That, or we can avoid the whole attack vector by not allowing these things in slug, such that there is no expectation of them to work at all. The risk is that Slug could follow different code paths from regular request URI handling, and be way less tested because of it being more rare and having more edge cases because of it being a relative URI.

Anyway, not my call; just bringing in arguments of why a security expert should really look at this.

@csarven
Copy link
Member

csarven commented Feb 19, 2020

All features in all specs will be reviewed by security experts.

There are indeed many attack vectors but the ones that you've raised can either be ignored or has no more specialty than relative request-targets on their way to the effective request URI. Nevertheless, I do agree that potential risks due to different code paths would be useful to note in the spec. Thanks for raising that.

Edit: "attack vectors" not "vendors" =)

@kjetilk
Copy link
Member

kjetilk commented Feb 19, 2020

We could surely define in terms of request URI handling and detail how to construct an effective request URI that would need to validated. We could put some common attacks in the test suite, but lets just consider it later :-) For now, lets just redefine slugtext to exclude recursive container creation. We could be even more restrictive and for example say that we allow only the regexp \w+/?$. We could also remove any non-matching character.

Edit: Oh, I didn't read what it said about Unicode, that last part would exclude that. So, what I wrote was correct, just much more restrictive than I intended :-)

@kjetilk
Copy link
Member

kjetilk commented Feb 19, 2020

Another note on NSS, as a response to the case of foo with no Link header (edit: remove attempt to make simple table)

NSS5 responds with

Location: /foo.ttl

rather than just foo. Is this what we expect? For now, I'll assume yes.

@RubenVerborgh
Copy link
Contributor Author

Is this what we expect?

It does not seem correct.

@csarven
Copy link
Member

csarven commented Feb 19, 2020

That would be a valid response that would pass the tests. I think the primary concern was about classifying the request as a container or a non-container resource. I don't think NSS's response is good because it should probably stick to /foo and do /foo$.ttl on disk. It is not entirely clear what requesting JSON-LD on /foo.ttl is supposed to do.. not to mention the slight confusion. Any way, this bit is an implementation detail.

@csarven
Copy link
Member

csarven commented Jun 12, 2020

Noting here that the rough consensus in this issue: #128 (comment) was taken up in PR #160 , reviewed, and approved.

However, the issue was revisited in #160 (comment) highlighting the functional requirements, variability, and complexity to implement the heuristics.

#160 (comment) proposed an alternative approach that was deemed to simplify implementation and remove variability. It was reviewed and approved. c1aac42 includes the criteria based on the proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

6 participants