Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accept handling during view lookup is unpredictable #1264

Closed
mmerickel opened this issue Mar 7, 2014 · 37 comments
Closed

accept handling during view lookup is unpredictable #1264

mmerickel opened this issue Mar 7, 2014 · 37 comments
Milestone

Comments

@mmerickel
Copy link
Member

As per the examples in #1259, it should be clear that the accept handling in Pyramid is somewhat unpredictable. It is a smell that the view invoked is somewhat dependent on the ordering of a set. When defining some views with accept predicates, a user should be able to have a solid understanding of when each view will be invoked, and possibly be able to influence what happens what a client simply asks for */* or does not include an Accept header.

@digitalresistor
Copy link
Member

👍

This ultimately also comes down to the way that WebOb parses the accept headers, and what it considers a match. Maybe there should be a way to verify it matches exactly (ignore Accept */* in matching), so that the predicate can be more predictable...

@mmerickel
Copy link
Member Author

I don't actually think this is a problem with WebOb but rather how Pyramid handles the accept predicate in the MultiView.

@dobesv
Copy link
Contributor

dobesv commented Mar 12, 2014

FWIW, in WebOb they allow you specify a "server preference" for the accept offers by specifying a pair of (content_type, server_q) instead of just the content_type. If pyramid supported the same thing it could pass-through to that mechanism to let the user specify the ordering in cases where abiguity is a problem.

@dobesv
Copy link
Contributor

dobesv commented Mar 12, 2014

Another idea: for a route one could optionally specify a list of content-types available for that route. The order specified in the route would be used to resolve ambiguities like the one mentioned to provide a final order of preference for the result representation. This order could be used to order the views before checking their other predicates.

@dobesv
Copy link
Contributor

dobesv commented Mar 12, 2014

I think in cases where there IS ambiguity it might make sense to issue a warning of some sort and provide a way for developers to resolve the issue in some manner such as the two ideas above.

@mmerickel
Copy link
Member Author

I think that the accept= predicate should be updated to support a quality measurement. For example accept='application/json;q=1.0' vs another view with accept='application/json;q=0.5'. We could leave the default quality at 1.0. It'd also be useful if ambiguous views could be detected and emit a warning (possibly a ConfigurationError in pyramid 2.0).

@digitalresistor
Copy link
Member

Those two that you just mentioned should conflict. They are both of type 'application/json' and the second would NEVER match.

accept='application/json;q=1.0' vs accept='text/html;q=0.5' should mean that application/json gets picked.

@mmerickel
Copy link
Member Author

Yep, bad example on my part.

@mmerickel
Copy link
Member Author

The server quality is already supported by webob and so we could start supporting it with only minimal changes to pyramid's accept handling.

https://github.com/Pylons/webob/blob/31620bb8b7749c92693e4d6d7233d9547b942a57/webob/acceptparse.py#L126

I'm interested in arguments for or against this.

@mmerickel
Copy link
Member Author

The first argument against I can think of is that the interaction between server quality and client quality is a little hard to understand because they are multiplied together to come up with the final quality / sorted order of views (it's not defined at config time but rather per request). Regardless of that, I think it's not a bad solution.

@digitalresistor
Copy link
Member

In most cases the server quality will be used mainly to disambiguate which one is preferred ("default") server side.

@mmerickel
Copy link
Member Author

In most cases the server quality will be used mainly to disambiguate which one is preferred ("default") server side.

Are you saying that the quality option is overkill then? You'd rather just have a server-side weighting that is not combined with the client-side weighting?

@digitalresistor
Copy link
Member

I would like the option of having it take into account client side weighting (so that for example if my client sends application/json first, and text/html with q=0.5 then application/json could win if I have set up a view with text/html;q=1.0 and another with application/json;q=0.9. But I think that would be more difficult to document if we had to fully explain how accept matching works in the first place.

Just based on my experience it is pretty rare to require full on matching. I haven't had a use case for it yet... All I generally care about is that text/html is selected when the browser hits my Pyramid app, and all other requests are JSON from JavaScript and those all hit application/json.

@mmerickel
Copy link
Member Author

All I generally care about is that text/html is selected when the browser hits my Pyramid app, and all other requests are JSON from JavaScript and those all hit application/json.

So hypothetically let's use the solution I've described as it's the only one I can find that's actually concretely defined. How does a user get that to work?

config.add_view(..., accept='application/json')
config.add_view(..., accept='text/html')
config.add_view(..., accept=None)

This snippet works fine when the client specified different quality for application/json and text/html. How can we change that snippet to work when the client specifies the same quality?

config.add_view(..., accept='application/json')
config.add_view(..., accept='text/html;q=0.9')

Okay, we now declare that we prefer application/json. Using multiplication we get that application/json;q=1.0*1.0=1.0 and text/html;q=1.0*0.9=0.9 so we will prefer application/json.

Is this what we want? Now when a client requests text/html they will get back application/json unless they specific give text/html a higher quality. So let's say they do that and specify text/html,application/json;q=0.8. Now we have application/json;q=0.8,text/html;q=0.9. So the client will receive html. Note that the selection of 0.8 by the client was dependent on knowing that the server defined q=0.9. If the client said 0.9 we'd still have an issue!

Long story short, I'm not sure multiplying client_q * server_q is the right answer. We may just want to construct a sorted list on the server using the weights as sorting keys.

@digitalresistor
Copy link
Member

Welcome to the shitty part of the HTTP standard ;-) There is also no defined standard for how Accept/Accept-Encoding/Accept-language and whatnot should be checked, so depending on implementation you could end up with different documents.

@mmerickel
Copy link
Member Author

Well the sanest thing I can think of is to use the quality of the client cross-checked against a sorted list from the server. The question is how to define the sorted list. It's not possible in pyramid to define one directly because views - urls is many - many so we are left with what?

  1. a weighting factor per-view that lazily creates a sorted list
  2. a globally sorted list of mime-type expressions
  3. a pluggable resolver executed per-request that returns a sorted list of mime-type expressions

I'm trying my hardest to come up with some options!

@dobesv
Copy link
Contributor

dobesv commented Mar 15, 2016

Use cases:

  1. Legacy clients does not send any Accept header, or sends Accept: */*, but the server has started to support multiple content types for that resource now, and needs a way to make the legacy format the default.
  2. The client sends a very specific Accept header, which actually only matches a single view on the server. This case is already handled.
  3. A web browser sends a list of various content types with a preferred type text/html but allows */*. This works fine now for the "preferred" types of the client (e.g. text/html), since the client's preference is probably the best choice if they have provided one. However, if you fall into */* land it would be nice to let the server pick the better default here.

So, looking at the cases I can think of, I think the server-side matching should only be used to break a tie if it's not clear which type the client prefers.

@mmerickel
Copy link
Member Author

Thanks @dobesv, I think we are in agreement here that Pyramid currently works correctly in every situation except when the client specifies multiple accept types with the same quality.

@zart
Copy link

zart commented Apr 12, 2016

https://httpd.apache.org/docs/trunk/content-negotiation.html might be of interest to see how other projects handle this.

@digitalresistor
Copy link
Member

@zart that document is one of the few that I've found that shows a concrete ordering, because none of the standards mention it.

It basically devolves into a very large complicated matrix of possible choices...

@mmerickel and I had a discussion using that document as an example a while back with some ideas that came out it, but it's still difficult within Pyramid.

@zart
Copy link

zart commented Apr 12, 2016

It's one of implementations of https://tools.ietf.org/html/rfc2295 as far as I understand.

@bjmc
Copy link

bjmc commented Mar 28, 2017

It sounds like this is a reasonably complicated issue made worse by ambiguity in the RFC. Would it be possible to pick the low-hanging fruit and at least provide a way to specify a predictable "fallback" choice for when the client doesn't send an Accept header or sends */*? (use case 1 above) Or is there concern that the wrong implementation to handle that special case will make it harder to solve the general problem later?

@mmerickel
Copy link
Member Author

The two scenarios that are really difficult in pyramid right now are:

  1. The client sends ambiguous accept headers where there's multiple options with the same quality. This could be text/html, text/plain or simply text/* where we have more than one view that could match.

  2. The client does not send an accept header, meaning it would accept anything.

We had a very long conversation in irc about this that I'll link here for now until I have time to go back and distill it into possible action items.

https://botbot.me/freenode/pyramid/2016-11-28/?msg=77139347&page=2

@whiteroses
Copy link
Contributor

I've been looking into this issue to see if I can help with fixing it, and have a couple of questions:

  1. The accept parameter for config.add_route, config.add_view, config.add_notfound_view and add_forbidden_view accepts wildcards like text/* and */*. This is in line with MIMEAccept's .best_match (which it inherits from Accept), but it doesn't seem right to me: a view or a resource cannot claim to offer all the media types (*/*) or all the subtypes under a type (text/*). This is also not allowed for the corresponding NilAccept.best_match, where there is a call to _check_offer (which is why we get the ValueError exception here).
    I did not allow offers with wildcards in the new acceptable_offers method implemented for WebOb, as offers with wildcards were not in the RFC and did not seem to make sense, but only noticed yesterday that Pyramid also allowed wildcards in offers. I think a view can offer multiple media types, but it should always be a finite set — would you agree?

  2. The cause of the unpredictability in this issue was identified in PR #1259, but the concern in rejecting that PR was that it introduced an ordering to view definitions, and due to the usage of the @view_config decorator, Pyramid views should be very explicitly unordered. But should they be? I tested with @view_config decorators, and if we deduplicated without the set(), the accepts provided to @view_configs come in in a predictable order (from inner-most @view_config outwards). Am I missing any reasons why view registration needs to be unordered?

@mmerickel
Copy link
Member Author

The accept parameter for config.add_route

The accept parameter on config.add_route is not really part of content negotiation... it is a true predicate since routes are matched in config-order. It is improperly labeled in the docs right now as a "non-predicate argument".

I think a view can offer multiple media types, but it should always be a finite set — would you agree?

I think I disagree with this. I mean practically you could make that argument but theoretically a view can just say I accept these types of things and then internally decide between lots of things what type to return. We don't really know what the view wants to do with the request after it matches.

Am I missing any reasons why view registration needs to be unordered?

Yes, it's possible to override views later during config-time etc. There is 100% absolutely no guaranteed order between views defined by their place in a python module.

@whiteroses
Copy link
Contributor

whiteroses commented Sep 13, 2017

Yes, it's possible to override views later during config-time etc. There is 100% absolutely no guaranteed order between views defined by their place in a python module.

Yes sorry, I was totally focusing on the wrong thing, but I see it now.

In that case, we could add an optional order value, either specified with the accept argument as a pair, or as a separate argument. View configurations with no order value specified would take a default value of 0; when specified, the optional order value could be any integer (positive, 0, or negative). The views could then be looked up from highest order value to lowest?

This made me wonder whether the order value might be useful beyond the accept parameter: are there other cases where we might need to control view lookup order? Such as when two view configurations have the same number of predicates (so same specificity), and they both match; or cases where we might need view lookup to ignore specificity completely and order the views in another way? I'm still familiarising myself with the internals of how Pyramid view lookup works, and what I've been able to gather from testing and reading the code so far is that there may be more to view lookup than just specificity; that each predicate also has its own weight that affects lookup? Would that be correct?



I think a view can offer multiple media types, but it should always be a finite set — would you agree?

I think I disagree with this. I mean practically you could make that argument but theoretically a view can just say I accept these types of things and then internally decide between lots of things what type to return. We don't really know what the view wants to do with the request after it matches.

I think I see that possibility. But let's say the Accept header is 'text/html;q=0, */*', and the accept argument to config.add_view() is 'text/*'. Should that be a match?

If we take the 'text/*' to mean "this view will handle any text/ media types", then it should be a match. But then how would the view know that text/html has been ruled out? Would it then have to go and check the header again? (Which would rather defeat the point of the accept parameter.)

Or we could take the 'text/*' to mean "this view should match only when all subtypes under text/ are considered acceptable by the header" — in which case it would not be a match.

(Right now best_match and __contains__ of MIMEAccept (both currently used by Pyramid in view lookup) both return True for this, because of MIMEAccept._match.)

This behaviour is not in RFC 7231 — there are no wildcards on the server side. To me this seems to be something outside content negotiation, and Pyramid-specific; and it turns out it was actually introduced into WebOb in response to Pylons/webob#155, where it was noticed that Pyramid documentation allowed wildcards in the accept parameter, but WebOb didn't. I was unable to find the reasoning for why Pyramid chose to allow wildcards; the pull request for WebOb changed MIMEAccept to allow wildcards in offers, but did not change MIMENilAccept, which means that currently any matching routes or views with wildcards in its accept argument would raise a ValueError if the Accept header is not in the request, or if it is empty. As no one has filed an issue for this, it would suggest that no one is using wildcards. For the reasons above, imo wildcards should not be allowed in the accept parameter; in the unlikely case where a view does need wildcard-like behaviour (which of the two meanings?), I think it could be relatively easily and more suitably implemented with a custom predicate.

@mmerickel
Copy link
Member Author

Is your concern about wildcards simply the q=0 situation? It'd help to pin down exactly what the problem is with wildcards to figure out what to do with them.

@dobesv
Copy link
Contributor

dobesv commented Sep 13, 2017

The wildcard accept header might be a weird case we can safely ignore for the purposes of this ticket? If it causes trouble it can get its own ticket.

This issue (in my mind) really boils down to what does the client get when they don't send an Accept header and you have a few different views that can satisfy that request. Currently the choice is arbitrary and subject to change on server restart.

I think I see that possibility. But let's say the Accept header is 'text/html;q=0, /', and the accept argument to config.add_view() is 'text/*'. Should that be a match?

I think if you use a wildcard in your accept predicate you must manually go and see what the client actually said they would accept. For example if the client sends Accept: image/png and your view matches on accept=image/* then you must not go ahead and send back image/jpg. This doesn't defeat the purpose of the accept predicate. Perhaps you have an endpoint that is capable of generating images in various formats and you are feeling too lazy to list them explicitly in your accept predicate, but you will use the wildcard and then scan the Accept header to determine the format to use.

Not really a case I'd work hard to support, though, as you said it can easily be worked around by using the WebOb APIs yourself to pick a media type.

@whiteroses
Copy link
Contributor

This issue (in my mind) really boils down to what does the client get when they don't send an Accept header and you have a few different views that can satisfy that request. Currently the choice is arbitrary and subject to change on server restart.

The issue is present for any Accept header, where there is more than one equally-preferred "best match" given the view configurations. I proposed a fix in the first half of my last comment, using the optional order value — what do you think? I have read through many of the discussions on this issue, and believe similar solutions to the problem have been discussed, but not with 0 as the default value — I think 0 should work well.

I mentioned the issue with wildcards because it seemed to me it would significantly simplify the task if wildcards weren't allowed in the accept parameter, and I think the fix for this issue would involve fixing the other issues around Accept handling, but I can open another issue for this. @mmerickel: I'll try to explain the concerns with wildcards better in the new issue, when I get a chance to write it up, hopefully in the next couple of days.

What do you think about the fix for the issue in this ticket proposed in the first half of my last comment?

@dobesv
Copy link
Contributor

dobesv commented Sep 14, 2017

For our app I treat any Accept header that is missing, blank, or */* as if it were application/json, text/html;q=0.9, image/*;q=0.7, text/*;q=0.6, */*;q=0.1. This solves the wildcard problem for me.

We don't use wildcards in the accept predicate, I think that is not a great idea even if it could be useful from time to time as a shortcut. I would suggest ignoring that case even if it is supported by accident.

I don't worry about more complicated "theoretical" cases since I know the full range of Accept headers I need to support.

That said, the original issue here was some vague feeling that Accept header processing should be more deterministic.

I think once you have addressed the missing/empty/wildcard Accept header using a default as described above, it might be fine to simply return an error any time there are a set of views that match a request equally well. Or perhaps return an arbitrary view response and log a warning. I don't think it's really necessary to support every imaginable use case.

It's really just browsers (who always prefer text/html), API clients (who should use mandated, simple Accept headers or no Accept header if they are legacy clients), and curl (where it's nice not to have to provide Accept every time) you need to deal with here.

@whiteroses
Copy link
Contributor

whiteroses commented Sep 14, 2017

For our app I treat any Accept header that is missing, blank, or / as if it were application/json, text/html;q=0.9, image/;q=0.7, text/;q=0.6, /;q=0.1. This solves the wildcard problem for me.

But the problem isn't just with wildcards and missing header. Let's say your app has one view configuration with an accept argument of 'text/html', and another view configuration with an accept argument of 'application/json'. The header coming in is 'text/html, application/json'. Currently which view is called for the request is essentially random.

I think once you have addressed the missing/empty/wildcard Accept header using a default as described above, it might be fine to simply return an error any time there are a set of views that match a request equally well. Or perhaps return an arbitrary view response and log a warning. I don't think it's really necessary to support every imaginable use case.

It's really not an error: there is nothing wrong with having more than one equally preferred 'best matches'. We find this in one of the examples in RFC 7231:

A more elaborate example is

  Accept: text/plain; q=0.5, text/html,
          text/x-dvi; q=0.8, text/x-c

Verbally, this would be interpreted as "text/html and text/x-c are
the equally preferred media types, but if they do not exist, then
send the text/x-dvi representation, and if that does not exist, send
the text/plain representation".

So if we have two views here, one with an accept argument of 'text/html', and the other with an accept argument of 'text/x-c', then they are both equally preferred. The RFC does not say what to do then, I imagine because there are applications where we might want to return both media types at the same time; but in Pyramid's case, the logical action to take is for the server to just choose one for the client.

To choose a best match, Pyramid needs to pass a sequence-type (tuple/list) offers argument to WebOb — which has an order. But because Pyramid's views are registered one at a time, we currently have no way of telling Pyramid how to order the views when they are equally preferred — hence the proposal of the optional order value. There is no need to raise an exception or to warn, because the client is saying explicitly that the media types are equally preferred, so choosing between the views randomly is not even wrong; but that is rarely the behaviour we want. To get deterministic view lookup that we can control, we just need to find a way to specify the order we want within the view configurations so that Pyramid can know what order the media types should be in in offers; then Pyramid passes it to WebOb, WebOb uses the order in offers as tiebreaker if necessary, problem solved.

@dobesv
Copy link
Contributor

dobesv commented Sep 15, 2017

The only thing I don't like about the ordering thing is that it requires more book-keeping and, at the end of the day, you might end up with a lot of views with the same order value that are being selected arbitrarily. Or developers have to come up with some kind of internal rule like "set 100 for JSON, 50 for html, etc...".

I wonder if setting a global preference on media types would be more practical, since you just have to do it once, globally for the application. Some kind of "tie breaker" rule where you can say "If in doubt, prefer application/json, then image/png, then text/html, then text/plain."

I suppose the list is not complete you can still end up with an arbitrary choice.

I guess at the end of the day we can never eliminate the possibility of a tie, we can provide some tools to help avoid them like the ordering parameter you gave. Honestly I don't know which is better, I'm happy with my solution, so I'm just putting in my 2 cents here. It's up to Michael in the end, I think.

@mmerickel
Copy link
Member Author

mmerickel commented Sep 15, 2017

Hey folks I've been following this discussion and it has been excellent. I agree that handling the accept issues on a per-view basis can be pretty annoying and is why I'm also leaning toward some form of global hook which can sort the views however it wishes. The api isn't quite solidified in my head because I don't really want something that looks like view lookup. It's probably something that takes the request and a list of accept offers from view_config(accept=...) and returns the ordered list of offers. The offers would then be used to order the views for the remainder of view lookup. It's basically the get_views function here that needs some assistance testing accepts in the right order.

Ideally this sorting would be done at config-time such that the sort wouldn't even need the request, but I'm not clear yet on whether that makes sense. It basically means sorting offers in a vacuum but that might be want we want here.

@dobesv
Copy link
Contributor

dobesv commented Sep 16, 2017 via email

@whiteroses
Copy link
Contributor

I agree that we need a way to specify the order globally — we are talking about being able to specify the order both globally and per view, right?

Unfortunately I don't think we can specify the order on the route, as it also needs to work with traversal?

I believe the sort can be done at config-time, as it is just about coming up with an ordered sequence-type offers to pass to WebOb, which does not involve the request (so it should happen in add rather than get_views?)

@mmerickel
Copy link
Member Author

An active PR and discussion on this topic is in #3326 and I would really appreciate anyone who thinks Pyramid should actually support media ranges in the accept predicate to weigh in on their use case because right now everyone seems to be against supporting them and requiring concrete media types.

@stevepiercy stevepiercy added this to the 1.10 milestone Aug 30, 2018
@mmerickel
Copy link
Member Author

this issue is fixed via #3326

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants