Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally require a valid content type for all rest requests with content #22691

Merged
merged 43 commits into from
Feb 2, 2017

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented Jan 19, 2017

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting http.content_type.required: true in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In non strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See #19388

This change enforces that all incoming rest requests have a valid and supported content
type header before the request is dispatched. The content type header is parsed to the
matching XContentType value with the only exception being for plain text requests. This
value is then passed on with the content bytes so that we can reduce the number of places
where we need to autodetect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on autodetection
have been deprecated.

Closes elastic#19388
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for working on this @jaymode . I did a first review round and left a bunch of comments. I will need to do another round another time as my eyes are crossing now :) looks good though. I am super happy that we are fixing this.

return xContentType;
}

@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add javadocs with the motivation for the deprecation and what it is replaced with?
I assume that these methods can be removed in master once this PR is backported to 5.x?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this can go away in master after backporting.

}

@Override
public String toString() {
String sSource = "_na_";
try {
sSource = XContentHelper.convertToJson(script, false);
if (xContentType == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify that this if is only needed for bw comp and should go away once the bw comp layer is removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually going to remove this and require a non-null xContentType.

@@ -40,9 +41,14 @@ public PutStoredScriptRequestBuilder setId(String id) {
return this;
}

@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadocs?

public PutMappingRequest source(String mappingSource) {
this.source = mappingSource;
return source(mappingSource, XContentFactory.xContentType(mappingSource));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not something that you changed but maybe something to fix as a followup, we have 4 xContentTypes, but given that source is a string, we only support yaml or json here. I wonder if source should rather be a BytesReference? I never thought we'd support yaml here though :) who knows if anybody is using that format. I also wonder if instead of converting here we should keep around the xContentType and carry it around all the way to DocumentMapperParser#parse where we actually parse it. That may be a bigger effort but sounds cleaner to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the way I suggested to go before is totally different from what I am saying in this second review. That previous direction was probably the most correct but had a high cost (bw comp mainly). It's the only one that doesn't require any conversion, but just parsing in MapperService once we know to content type. We can still go that way if we really want to... but in most of the cases we get json so we may end up complicating things to handle edge cases which is never good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also add tests for all these different content types around mappings, otherwise it's all theories. I can look into it as a followup

return this;
}

/**
* The settings to crete the index template with (either json/yaml/properties format).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix this typo while at it? s/crete/create

storedScriptRequest.script(new BytesArray("{}"), XContentType.JSON);

assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
BytesStreamOutput output = new BytesStreamOutput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit maybe use try finally around stream outputs and stream inputs?

serialized.readFrom(in);
assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
assertEquals(storedScriptRequest.scriptLang(), serialized.scriptLang());
assertEquals(storedScriptRequest.id(), serialized.id());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for testing it. It is some more effort but what we've been doing to test also that when we receive from an older version properly is to deserialize a base64 encoded request that was serialized using the previous version. That is much better than simulating things.

request.writeTo(bytesStreamOutput);

StreamInput in = StreamInput.wrap(bytesStreamOutput.bytes().toBytesRef().bytes);
in.setVersion(Version.V_5_0_0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all these serialization tests, shall we randomize the version just to make sure that the conditional target the proper versions? you may bump into the fact that some released versions are still marked unreleased hence cannot be used in VersionUtils.randomVersion.

// this is hacky but here goes
PutMappingRequest request = new PutMappingRequest("foo");
String mapping = YamlXContent.contentBuilder().startObject().field("foo", "bar").endObject().string();
request.source(mapping, XContentType.JSON); // THIS IS NOT A BUG! Intentionally specifying the wrong type so we serialize it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what practical usecase does this test? user is providing the mapping in yaml but?

}
/**
* A wrapper of {@link HttpHeaders} that implements a map to prevent copying unnecessarily. This class does not support modifications
* and due to the underlying implementation, it performs case insensitive lookups of key to values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice some other comment of mine above may be wrong then, seems like you have made headers case-insensitive. So does it mean that they we case-sensitive before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers were case-insensitive before. Both the Netty 3 and 4 implementations implemented this.

@clintongormley
Copy link
Contributor

clintongormley commented Jan 20, 2017

@elastic/es-clients this will require the clients to send a Content-Type header

@jaymode if no Content-Type header is received, do we just assume JSON? Otherwise this will impact command line curl statements

@jaymode
Copy link
Member Author

jaymode commented Jan 20, 2017

if no Content-Type header is received, do we just assume JSON?

We do not assume JSON. This commit requires the header to be present and a known type (JSON, YAML, SMILE, CBOR, Plain text). Curl sends the content-type as application/x-www-form-urlencoded by default (can be overriden with -H 'Content-Type: application/json').I do not think we should blindly accept this value

@karmi
Copy link
Contributor

karmi commented Jan 20, 2017

(...) this will require the clients to send a Content-Type header

I think it's trivial to set a default header, but we'll have to think about the "Cat" APIs -- I've tried to set application/json by default once, and of course broke all the _cat endpoints, which started to return JSON instead of text.

On the other hand, there could be a debate whether the "Cat" APIs shouldn't return parsed JSON when you call them from something like JavaScript, Python, ..., instead of plain text.

We do not assume JSON. (... Curl sends the content-type as application/x-www-form-urlencoded by default (...) I do not think we should blindly accept this value.

I might be missing a larger picture here, but do we have grave enough reason to just break the most simple use case when people do curl localhost:9200 on the command line?

@jaymode
Copy link
Member Author

jaymode commented Jan 20, 2017

I might be missing a larger picture here, but do we have grave enough reason to just break the most simple use case when people do curl localhost:9200 on the command line?

This is my fault as I did not state this clearly. We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

Another aspect is that sometimes we allow a GET with a source parameter and currently that doesn't require a header; I did add a source_type parameter that we may want to make required in the future to avoid autodetection of content.

@jasontedor
Copy link
Member

I think it's trivial to set a default header, but we'll have to think about the "Cat" APIs -- I've tried to set application/json by default once, and of course broke all the _cat endpoints, which started to return JSON instead of text.

I think this is because we use to mistakenly use the Content-Type field as determining the media type of the response. That is no longer the case, we use the Accept header for that now. So, I think that this should not be a problem anymore.

@honzakral
Copy link
Contributor

We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

but it would still break curl -X PUT localhost:9200/i/t/42 -d '{"hello": "world"}' and other uses of curl like put settings etc (often executed as one-off from the cmdline). That seems like a huge breaking change and downgrade for usability.

Also what would happen with bulk and m* APIs? Those are technically not valid JSON...

I would be 100% for abolishing the auto-detect in favor of assuming json (not trying to detect it) and requiring content-type if it's anything else. How does that sound?

@karmi
Copy link
Contributor

karmi commented Jan 21, 2017

We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

but it would still break curl -X PUT localhost:9200/i/t/42 -d '{"hello": "world"}' and other uses of curl like put settings etc (often executed as one-off from the cmdline). That seems like a huge breaking change and downgrade for usability.

Yes, that is what I was pointing out. Again, do we have a good enough reason to break things like this?

I would be 100% for abolishing the auto-detect in favor of assuming json (not trying to detect it) and requiring content-type if it's anything else. How does that sound?

+1 Assume JSON by default, require Accept for any other format.

@alexbrasetvik
Copy link
Contributor

How new (official) clients do users need to have for this?

@javanna
Copy link
Member

javanna commented Jan 23, 2017

I think that assuming json when the header is not available is already a huge improvement over auto-detection. We may want to make the header required at some point whenever a body is provided, but we can always do that later, and potentially add deprecation logging for that first and break with the next major version. After all the vast majority of our users use json so that assumption seems about right to me for now.

@jaymode
Copy link
Member Author

jaymode commented Jan 23, 2017

I think that assuming json when the header is not available is already a huge improvement over auto-detection. We may want to make the header required at some point whenever a body is provided, but we can always do that later, and potentially add deprecation logging for that first and break with the next major version. After all the vast majority of our users use json so that assumption seems about right to me for now.

Doing this doesn't solve the issue the usability issues with curl since the Content-Type header will be sent but will not be a supported value unless specified by a user.

I think the requirement of this header should be removed from this PR, based on the text from RFC 7231:

A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

In practice, resource owners do not always properly configure their
origin server to provide the correct Content-Type for a given
representation, with the result that some clients will examine a
payload's content and override the specified type. Clients that do
so risk drawing incorrect conclusions, which might expose additional
security risks (e.g., "privilege escalation"). Furthermore, it is
impossible to determine the sender's intent by examining the data
format: many data formats match multiple media types that differ only
in processing semantics. Implementers are encouraged to provide a
means of disabling such "content sniffing" when it is used.

Assuming a media type of application/octet-stream does not do us much good and the auto-detection is what we want to remove.

@javanna
Copy link
Member

javanna commented Jan 23, 2017

Doing this doesn't solve the issue the usability issues with curl since the Content-Type header will be sent but will not be a supported value unless specified by a user.

Right, thanks for clarifying. Seems like we don't want to break curl in any case, hence if we do want to remove auto-detection, we may have to make curl work with some hack. If curl is resolved one way or the other, do we want to assume json or always require the header (besides the curl specific case)?

@honzakral
Copy link
Contributor

To not break curl, and still be somewhat consistent, would essentially mean to assume json unless another known content-type header value is present.

...or decide (in a different issue) we don't need another formats other than json and completely side step this issue.

@javanna
Copy link
Member

javanna commented Jan 23, 2017

to assume json unless another known content-type header value is present

I think this is too lenient. I would rather special case curl in some other way and find out about other cases that may need special treatment (hopefully there aren't).

@Mpdreamz
Copy link
Member

To not break curl, and still be somewhat consistent, would essentially mean to assume json unless another known content-type header value is present.

++ we can document sending any unknown Content-Type will default to application/json and that this may break at anytime if at some point we will actually support that mime type.

This actually also brings its behavior inline with Accept which will revert to application/json for any unsupported output media type.

karmi added a commit to elastic/elasticsearch-ruby that referenced this pull request Feb 7, 2017
…' header

In future, Elasticsearch will require specifying the format of data which is being sent
 (eg. when indexing a document). Current versions of Elasticsearch will start to
 print a warning to the deprecation log.

 A default value 'application/json' for the 'Content-Type' header has been added
 to prevent the deprecation messages and to be prepared for the requirement in
 next major version of Elasticsearch.

 See: elastic/elasticsearch#22691

Related:

* c22ec89
* logstash-plugins/logstash-input-elasticsearch#55
* logstash-plugins/logstash-input-elasticsearch#56
* elastic/elasticsearch#22691

Closes #400
karmi added a commit to elastic/elasticsearch-ruby that referenced this pull request Feb 7, 2017
…' header

In future, Elasticsearch will require specifying the format of data which is being sent
 (eg. when indexing a document). Current versions of Elasticsearch will start to
 print a warning to the deprecation log.

 A default value 'application/json' for the 'Content-Type' header has been added
 to prevent the deprecation messages and to be prepared for the requirement in
 next major version of Elasticsearch.

 See: elastic/elasticsearch#22691

Related:

* c22ec89
* logstash-plugins/logstash-input-elasticsearch#55
* logstash-plugins/logstash-input-elasticsearch#56
* elastic/elasticsearch#22691

Closes #400
jaymode added a commit that referenced this pull request Feb 7, 2017
…ntent (#22691)

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x
and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting
`http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported
content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected
and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text
requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need
to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType
along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type
header.

See #19388
@jaymode
Copy link
Member Author

jaymode commented Feb 7, 2017

5.x commit b9b9400

dakrone added a commit to dakrone/es-mode that referenced this pull request Feb 7, 2017
jaymode added a commit to jaymode/elasticsearch that referenced this pull request Feb 9, 2017
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The
methods that do not accept XContentType with bytes have been deprecated by this commit.

Relates elastic#22691
jaymode added a commit that referenced this pull request Feb 10, 2017
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The
methods that do not accept XContentType with bytes have been deprecated by this commit.

Relates #22691
jaymode added a commit that referenced this pull request Feb 10, 2017
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The
methods that do not accept XContentType with bytes have been deprecated by this commit.

Relates #22691
jaymode added a commit that referenced this pull request Feb 10, 2017
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The
methods that do not accept XContentType with bytes have been deprecated by this commit.

Relates #22691
jaymode added a commit that referenced this pull request Feb 13, 2017
This commit fixes communication with 5.3.0 nodes to send XContentType to these nodes since #22691 was backported to the
5.3 branch.
jaymode added a commit to jaymode/elasticsearch that referenced this pull request Feb 15, 2017
The backport of elastic#22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong
method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so
that it properly handles a request with a plain text body.
jaymode added a commit that referenced this pull request Feb 15, 2017
The backport of #22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong
method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so
that it properly handles a request with a plain text body.
jaymode added a commit that referenced this pull request Feb 15, 2017
The backport of #22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong
method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so
that it properly handles a request with a plain text body.
@paladox
Copy link

paladox commented Mar 29, 2017

Hi, after upgrading from 5.2.* to 5.3.0 i found that the config http.content_type.required is on by default in 5.x. Doing http.content_type.required: false in the elastic search config file fixed it for me.

@pickypg
Copy link
Member

pickypg commented Mar 29, 2017

@paladox

FYI, the setting defaults to false if it has not been set until 6.0.

https://github.com/elastic/elasticsearch/blob/v5.3.0/core/src/main/java/org/elasticsearch/http/HttpTransportSettings.java#L72

I also downloaded an instance of 5.3.0 to ensure something didn't sneak in and ran:

$ curl localhost:9200/_search -d '{"query":{"match_all":{}}}'

It worked without the content-type specified and no settings adjusted.

@paladox
Copy link

paladox commented Mar 29, 2017

Oh.

@paladox
Copy link

paladox commented Mar 29, 2017

We managed to fix it with https://phabricator.wikimedia.org/P5158#27475

@pickypg
Copy link
Member

pickypg commented Mar 29, 2017

@paladox

Ah, so that's a different issue. Your client was technically sending the wrong content-type for the content. Regardless, I'm glad you fixed it by sending the content-type; that puts you all ahead of the curve!

dliappis added a commit to elastic/elasticsearch-docker that referenced this pull request May 15, 2017
Starting with Elasticsearch 6.0.0 products are required to set
Content-Type to application/json. Elasticsearch 5.x already accepts
this[1].

Add acceptance tests when we load test data and ensure content type is
set to the correct value.

Also add ipdb to requirements.txt as a convenience for troubleshooting
tests.

[1] elastic/elasticsearch#22691
picandocodigo pushed a commit to elastic/elastic-transport-ruby that referenced this pull request Jun 7, 2021
…' header

In future, Elasticsearch will require specifying the format of data which is being sent
 (eg. when indexing a document). Current versions of Elasticsearch will start to
 print a warning to the deprecation log.

 A default value 'application/json' for the 'Content-Type' header has been added
 to prevent the deprecation messages and to be prepared for the requirement in
 next major version of Elasticsearch.

 See: elastic/elasticsearch#22691

Related:

* c22ec891b0be0e3e8f3c10fd7844adfbd2d608ed
* logstash-plugins/logstash-input-elasticsearch#55
* logstash-plugins/logstash-input-elasticsearch#56
* elastic/elasticsearch#22691

Closes #400
renaperes824 added a commit to renaperes824/ruby-elastic-transport that referenced this pull request Sep 12, 2022
…' header

In future, Elasticsearch will require specifying the format of data which is being sent
 (eg. when indexing a document). Current versions of Elasticsearch will start to
 print a warning to the deprecation log.

 A default value 'application/json' for the 'Content-Type' header has been added
 to prevent the deprecation messages and to be prepared for the requirement in
 next major version of Elasticsearch.

 See: elastic/elasticsearch#22691

Related:

* c22ec891b0be0e3e8f3c10fd7844adfbd2d608ed
* logstash-plugins/logstash-input-elasticsearch#55
* logstash-plugins/logstash-input-elasticsearch#56
* elastic/elasticsearch#22691

Closes #400
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.