WebOb's MIMEAccept does follow standards for best_match(), but provides results that are less than satisfactory #239

Closed
digitalresistor opened this issue Apr 13, 2016 · 2 comments

@digitalresistor
Member

So I just figured out that MIMEAccept in WebOb returns results that, while valid according to the standard, feel off.

Accept header:

application/json, text/html, */*;q=0.8

Server offers the following variants:

text/plain, text/html, application/json

The current MIMEAccept.best_match() will return text/html, even though the client arguably prefers application/json, since it is listed first. This is because we currently treat the server variants as the definitive list and iterate over them in server order.
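
To make the effect concrete, here is a minimal sketch of server-order matching. It is not WebOb's actual code: parse_accept, matches and best_by_server_order are hypothetical helpers written for illustration, and the matching is deliberately simplified (the highest q among matching ranges is used, and a later offer only wins if it is strictly better).

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, in header order."""
    parsed = []
    for item in header.split(','):
        item = item.strip()
        if not item:
            continue
        media_range, _, params = item.partition(';')
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(';'):
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                q = float(value)
        parsed.append((media_range.strip(), q))
    return parsed


def matches(media_range, offer):
    """True if the offer is covered by the range (exact, type/*, or */*)."""
    if media_range in ('*/*', offer):
        return True
    range_type, _, range_subtype = media_range.partition('/')
    return range_subtype == '*' and offer.split('/')[0] == range_type


def best_by_server_order(header, offers):
    """Walk the offers in server order; ties go to the earlier offer."""
    accept = parse_accept(header)
    best, best_q = None, 0.0
    for offer in offers:
        q = max((q for rng, q in accept if matches(rng, offer)), default=0.0)
        if q > best_q:
            best, best_q = offer, q
    return best


result = best_by_server_order(
    'application/json, text/html, */*;q=0.8',
    ['text/plain', 'text/html', 'application/json'])
print(result)  # text/html
```

text/html and application/json both score 1.0, so whichever the server lists first wins.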

@digitalresistor
Member Author

To the RFCs we go...

Well, it would be nice if for a change we had a standard that was sane and was fully specced out, but unfortunately we don't get nearly that lucky.

https://tools.ietf.org/html/rfc7231#page-38

RFC 7231, section 5.3.2 (Accept), is the relevant spec, and it states:

Accept: audio/*; q=0.2, audio/basic

is interpreted as "I prefer audio/basic, but send me any audio type
if it is the best available after an 80% markdown in quality".

This lists audio/basic after audio/*, but thankfully there are weights, so implementing that correctly can be done pretty sanely.

Here's another example:

Accept: text/plain; q=0.5, text/html,
text/x-dvi; q=0.8, text/x-c

Which states:

Verbally, this would be interpreted as "text/html and text/x-c are
the equally preferred media types, but if they do not exist, then
send the text/x-dvi representation, and if that does not exist, send
the text/plain representation".

This could mean that given the following:

    from webob.acceptparse import MIMEAccept

    mimeaccept = MIMEAccept('text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c')
    best = mimeaccept.best_match(['text/x-c', 'text/html'])
    assert best == 'text/x-c'

It is acceptable to respond that text/x-c is the best match, because both are equally preferred. It would be nice if the RFC actually resolved such ambiguities, but I won't hold my breath.
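
As an illustration of where the ambiguity comes from, the following sketch groups the media ranges of that header into quality tiers. parse_accept is a hypothetical helper written for this example, not part of WebOb.

```python
from itertools import groupby


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, in header order."""
    parsed = []
    for item in header.split(','):
        media_range, _, params = item.strip().partition(';')
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(';'):
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                q = float(value)
        parsed.append((media_range.strip(), q))
    return parsed


header = 'text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c'

# sorted() is stable, so ranges with equal q keep their header order.
ranked = sorted(parse_accept(header), key=lambda pair: -pair[1])
tiers = [(q, [rng for rng, _ in group])
         for q, group in groupby(ranked, key=lambda pair: pair[1])]

for q, ranges in tiers:
    print(q, ranges)
# 1.0 ['text/html', 'text/x-c']
# 0.8 ['text/x-dvi']
# 0.5 ['text/plain']
```

The top tier holds two entries, and the RFC says nothing about how to order them.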

Prior Art

Apache HTTPD: this thing has implemented, or probably still implements, most of the various specs, including all those fun experimental ones, specifically RFC 2295 (Transparent Content Negotiation in HTTP) and RFC 2296 (HTTP Remote Variant Selection Algorithm -- RVSA/1.0).

It's called mod_negotiation.c and it's a beastly thing. The documentation is here: http://httpd.apache.org/docs/current/content-negotiation.html

It has two methods: one used if the browser supports RFC 2295 and RFC 2296, and otherwise what it calls:

Server driven negotiation with the httpd algorithm is used in the normal case. The httpd algorithm is explained in more detail below. When this algorithm is used, httpd can sometimes 'fiddle' the quality factor of a particular dimension to achieve a better result. The ways httpd can fiddle quality factors is explained in more detail below.

I am more interested in server-driven negotiation, because we want WebOb to choose a best match. We have no interest in involving the client, and RFC 2295/2296 can be implemented on top of WebOb.

Apache negotiates across multiple dimensions, so I had to dig through the source code here: https://github.com/apache/httpd/blob/8322599c746bbdf1410a098a5d4764499baf7670/modules/mappers/mod_negotiation.c to get a handle on what it is doing.

I am simplifying this down to just dealing with mime accept:

  1. Take all variants offered by the server and loop over them, storing the one that scores best:
    1. Loop over client Accept for the variant:
      1. Check match "level"
        • For a full on match, give it 3 stars (type/subtype)
        • For a partial match (type/*) give it 2 stars
        • For a wildcard match (*/*) give it 1 star
      2. If the stars are higher than previous check (better match):
        • If 3 stars, set quality to client accept quality
        • If 2 stars, set to 0.02
        • If 1 star, set to 0.01
      3. Take the quality from step 2 and multiply it by the source quality; check whether this is better than the previous best, and if not, throw it away
  2. Return the best result we have stored.
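
The steps above can be sketched in a few lines of Python. This follows the simplified list in this comment only, not mod_negotiation.c itself; parse_accept, match_stars and negotiate are hypothetical names, and variants are assumed to be (media_type, source_quality) pairs.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, in header order."""
    parsed = []
    for item in header.split(','):
        media_range, _, params = item.strip().partition(';')
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(';'):
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                q = float(value)
        parsed.append((media_range.strip(), q))
    return parsed


def match_stars(media_range, variant):
    """3 = exact type/subtype, 2 = type/*, 1 = */*, 0 = no match."""
    if media_range == variant:
        return 3
    if media_range == '*/*':
        return 1
    range_type, _, range_subtype = media_range.partition('/')
    if range_subtype == '*' and variant.split('/')[0] == range_type:
        return 2
    return 0


def negotiate(header, variants):
    """Loop over server variants (step 1) and return the best one (step 2)."""
    accept = parse_accept(header)
    best, best_score = None, 0.0
    for variant, source_q in variants:
        stars, quality = 0, 0.0
        for media_range, client_q in accept:
            level = match_stars(media_range, variant)
            if level > stars:  # a more specific match replaces a weaker one
                stars = level
                quality = {3: client_q, 2: 0.02, 1: 0.01}[level]
        score = quality * source_q
        if score > best_score:  # not strictly better: throw it away
            best, best_score = variant, score
    return best


header = 'text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c'
print(negotiate(header, [('text/x-c', 1.0), ('text/html', 1.0)]))  # text/x-c
```

Because a later variant has to score strictly higher to replace the stored one, ties always go to whichever variant the server lists first.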

Given:

Accept header: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c and server offer list: text/x-c, text/html

It too would return text/x-c.

Conclusion

I was wrong. It's not busted, but it doesn't exactly return results that make a lot of sense to a human being looking at it. Unfortunately, the spec doesn't provide enough information to state which answer is correct.

I personally think the first one mentioned in the quoted text, "text/html and text/x-c are the equally preferred media types", should be the one returned. It comes first in the list at quality 1 (the default quality), and the server (in this case our "offer") doesn't care: it hasn't set any quality on its answers at all, and it can return either one because it has both choices.

Browsers, for example, send a default header for general-purpose requests that looks similar to:

Safari/Firefox:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Chrome:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8

If the server offered the content in both application/xhtml+xml and text/html, in that order, as a user I'd be surprised to see the server reply with application/xhtml+xml, even though it is technically as correct as text/html.

If instead of looping by server variant we looped by client accept:

text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c with server offer: text/x-c, text/html

We would end up with text/html, because text/x-c would get discarded: it has the same quality as text/html, which was seen first.
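
A minimal sketch of that client-order loop, assuming a later Accept entry only replaces the current best when its quality is strictly higher. The names parse_accept, matches and best_by_client_order are hypothetical and this is not WebOb code.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, in header order."""
    parsed = []
    for item in header.split(','):
        media_range, _, params = item.strip().partition(';')
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(';'):
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                q = float(value)
        parsed.append((media_range.strip(), q))
    return parsed


def matches(media_range, offer):
    """True if the offer is covered by the range (exact, type/*, or */*)."""
    if media_range in ('*/*', offer):
        return True
    range_type, _, range_subtype = media_range.partition('/')
    return range_subtype == '*' and offer.split('/')[0] == range_type


def best_by_client_order(header, offers):
    """Walk the Accept entries in client order; ties go to the earlier entry."""
    best, best_q = None, 0.0
    for media_range, q in parse_accept(header):
        if q <= best_q:
            continue  # same or lower quality than what we already have
        for offer in offers:
            if matches(media_range, offer):
                best, best_q = offer, q
                break
    return best


header = 'text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c'
print(best_by_client_order(header, ['text/x-c', 'text/html']))  # text/html
```

With the Accept header from the original report (application/json, text/html, */*;q=0.8) and the offers text/plain, text/html, application/json, the same loop picks application/json.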

@digitalresistor changed the title from "WebOb's MIMEAccept does not follow standards for best_match()" to "WebOb's MIMEAccept does follow standards for best_match(), but provides results that are less than satisfactory" on Apr 14, 2016
@digitalresistor
Member Author

I'm adding a simple best_client_match() function to w.Accept which flips the order in which things are iterated. Rather than iterating over the server variants and checking each against the client Accept header, we now iterate over the client Accept header and check whether any offers match.

from webob.acceptparse import MIMEAccept


def test_MIMEAccept_best_client_match():
    mimeaccept = MIMEAccept('text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c')
    # Entries are kept in header order; a missing q defaults to 1.
    assert mimeaccept._parsed == [
        ('text/plain', 0.5),
        ('text/html', 1),
        ('text/x-dvi', 0.8),
        ('text/x-c', 1)]
    # best_client_match() is the new method proposed in this issue.
    best = mimeaccept.best_client_match(['text/x-c', 'text/html'])
    assert best == 'text/html'


def test_MIMEAccept_best_match():
    mimeaccept = MIMEAccept('text/html, application/json, */*;q=0.2')
    server_offer = ['application/json', 'text/html']
    # Server order wins in best_match(); client order wins in best_client_match().
    assert mimeaccept.best_match(server_offer) == 'application/json'
    assert mimeaccept.best_client_match(server_offer) == 'text/html'
