Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually constructed content-type headers with quotes are not processed correctly #1181

Closed
wrosenuance opened this issue Jun 20, 2020 · 13 comments

Comments

@wrosenuance
Copy link

wrosenuance commented Jun 20, 2020

I was attempting to create a MTOM SOAP payload using SAAJ to send through Karate, and encountered an issue because the Content-Type header that SAAJ generated was being adjusted by Karate. While debugging, I was able to narrow down the location where the issue was introduced to the ApacheHttpUtils class, and I can replicate the issue with the following code:

package karate;

import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;

import com.intuit.karate.http.apache.ApacheHttpUtils;

public class ApacheHttpUtilsTest {
    public static void main(String[] args) {
        final String originalContentType = "multipart/related; charset=UTF-8; boundary=\"----=_Part_19_1913847857.1592612068756\"; type=\"application/xop+xml\"; start-info=\"text/xml\"";

        HttpEntity httpEntity = ApacheHttpUtils.getEntity("content is not important",
                originalContentType, StandardCharsets.UTF_8);

        System.out.println("Original: " + originalContentType);
        System.out.println("Updated:  " + httpEntity.getContentType().getValue());
        System.out.println("They are " +
                (originalContentType.equals(httpEntity.getContentType().getValue()) ? "equal" : "not equal"));
    }
}

The output from running this is:

Original: multipart/related; charset=UTF-8; boundary="----=_Part_19_1913847857.1592612068756"; type="application/xop+xml"; start-info="text/xml"
Updated:  multipart/related; charset=UTF-8; boundary="\"----=_Part_19_1913847857.1592612068756\""; type="\"application/xop+xml\""; start-info="\"text/xml\""
They are not equal

The problem is that the Content-Type header is split at semi-colons and then broken into name-value pairs, but quotes around values are not removed. Then the names and values are passed to code that escapes the quotes that were not removed, leading to the appearance above.

To workaround this I had to remove quotes manually from the Content-Type value coming from SAAJ, which seems less than ideal since these values could in theory contain spaces or other characters that would cause problems for the string splitting algorithm.

@ptrthomas
Copy link
Member

@wrosenuance thanks for the details, since you have all the details, do you think you can submit a PR, also see if karate-jersey works. am tagging as help wanted

@wrosenuance
Copy link
Author

@ptrthomas I will check with my company - I need to get approval for contributions, which can take a while. Assuming that comes through I can definitely work on a PR!

@ptrthomas
Copy link
Member

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 2, 2020

@ptrthomas can you assign this issue to me if it is not yet picked?

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 2, 2020

@ptrthomas Have done initial analysis.

https://github.com/intuit/karate/blob/3322d55a30b06b3cc5f16d2c2e00562d3d0eea54/karate-core/src/main/java/com/intuit/karate/http/HttpUtils.java#L118

Double Quotes if present in front or in end are not removed in above class and it end up escaping the quotes while creating ContentType object.

Will be adding a new method in StringUtils.trimDoubleQuotes to trim the double quotes from starting and end so if any double quotes are present in front and last it will be removed.

Is this approach looks fine?

Should we also remove all the double quotes from front or end or only the first quote.

If media type has below values what should be the expected out come?

"abc" -> abc
"\"abc\"" -> should it be "abc" or should it also be abc
"\"\"\"abc\"\"\"" -> should it be abc?

Based on this will add a regex.

@ptrthomas
Copy link
Member

this may need research but if we are only trying to get the Charset - we should use the HttpUtils from Netty which has a public static Charset getCharset(CharSequence contentTypeValue, Charset defaultCharset) method.

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 2, 2020

Till here everything works fine.
https://github.com/intuit/karate/blob/master/karate-apache/src/main/java/com/intuit/karate/http/apache/ApacheHttpUtils.java#L89
But when we are trying to withParameters with key,value this is the place where double quotes are getting escaped.

@ptrthomas
Copy link
Member

@Nishant-sehgal let me try again, I'm saying we need to rewrite the karate httputils to use this method: https://netty.io/4.1/api/io/netty/handler/codec/http/HttpUtil.html#getCharset-java.lang.CharSequence-java.nio.charset.Charset-

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 3, 2020

@ptrthomas I am trying to understand how will using HttpUtil.getCharset of netty help.

If we use netty Implementation or existing karate httputils both are behaving the same way and returning the same Charset object with same data.
We can refractor the code to start using netty httputil instead of existing httpUtil implementation.But behaviour will be same.

Issue is when we are setting parameter in content type.

ct = ct.withParameters(new BasicNameValuePair(entry.getKey(), entry.getValue()));

Since we are creating HttpEntity object that requires ContentType.

I also found ContentType.parse(mediaType) correctly populate mimeType,charset and parameters and internally handles quotes as well.

But we are calling

ContentType ct = ContentType.parse(mediaType).withCharset(charset); 

where withCharset reset the parameters object as it creates a new object and now if we call withParameters it adds the parameters but that does not handles the quotes.

I think there are 2 options:

  1. handle double quotes while populating parameters or Similar to what i initial thought while reading value from content-type.
  2. call ContentType.parse(mediaType) only once and make sure if mediaType does not have charset, we set it before calling parse method.

Please suggest how shall i take it forward.

@ptrthomas
Copy link
Member

@Nishant-sehgal thanks for the analysis, this helps ! option 2 sounds like less of a hack to me

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 4, 2020

@ptrthomas Have raised an initial PR Please share your thoughts for the same. PR

@Nishant-sehgal
Copy link
Contributor

Nishant-sehgal commented Oct 4, 2020

Meanwhile i also though of an other implementation

 private static ContentType getContentType(String mediaType, Charset charset) {
        if (!HttpUtils.isPrintable(mediaType)) {
            try {
                return ContentType.create(mediaType);
            } catch (Exception e) {                
                return null;
            }
        }
       
        final CharArrayBuffer buf = new CharArrayBuffer(mediaType.length());
        buf.append(mediaType);
        
        final HeaderElement[] elements =
        		BasicHeaderValueParser.INSTANCE.parseElements(buf, new ParserCursor(0, mediaType.length()));
        
        HeaderElement headerElement = null;
        
        if(null != elements && elements.length == 1) {
        	headerElement = elements[0];
        }
        
        ContentType ct = ContentType.create(headerElement.getName(), HttpUtil.getCharset(mediaType, charset));
        
        if(headerElement.getParameters() != null) {
        	for (NameValuePair nameValuePair : headerElement.getParameters()) {
        		ct = ct.withParameters(nameValuePair);
        	}
        }

        return ct;
    }

@ptrthomas
Copy link
Member

1.0 released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants