Optionally require a valid content type for all rest requests with content #22691
This change enforces that all incoming rest requests have a valid and supported content type header before the request is dispatched. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to autodetect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on autodetection have been deprecated. Closes elastic#19388
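As a rough illustration of the header-to-type mapping described above, here is a minimal Java sketch. BodyFormat and ContentTypeParser are hypothetical stand-ins, not the real XContentType API, and the media type strings are assumptions for the example. text/plain is recognized separately because it has no matching format, mirroring the plain-text exception.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical stand-in for XContentType; the media types are assumptions for illustration. */
enum BodyFormat {
    JSON("application/json"),
    YAML("application/yaml"),
    SMILE("application/smile"),
    CBOR("application/cbor");

    final String mediaType;

    BodyFormat(String mediaType) {
        this.mediaType = mediaType;
    }
}

final class ContentTypeParser {
    private ContentTypeParser() {}

    /** True for text/plain, which is accepted but deliberately not mapped to a BodyFormat. */
    static boolean isPlainText(String header) {
        return "text/plain".equals(baseMediaType(header));
    }

    /** Maps a Content-Type header value to a BodyFormat, ignoring parameters such as charset. */
    static Optional<BodyFormat> parse(String header) {
        String base = baseMediaType(header);
        for (BodyFormat format : BodyFormat.values()) {
            if (format.mediaType.equals(base)) {
                return Optional.of(format);
            }
        }
        return Optional.empty(); // missing or unsupported header
    }

    /** Strips parameters ("; charset=UTF-8"), then trims and lower-cases the media type. */
    private static String baseMediaType(String header) {
        if (header == null) {
            return "";
        }
        int semicolon = header.indexOf(';');
        String base = semicolon >= 0 ? header.substring(0, semicolon) : header;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

A header such as application/x-www-form-urlencoded, which curl -d sends by default, maps to nothing here; that is the case much of the discussion below revolves around.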
Thanks a lot for working on this @jaymode. I did a first review round and left a bunch of comments. I will need to do another round later, as my eyes are crossing now :) Looks good though. I am super happy that we are fixing this.
return xContentType;
}

@Deprecated
shall we add javadocs with the motivation for the deprecation and what it is replaced with?
I assume that these methods can be removed in master once this PR is backported to 5.x?
Yes, this can go away in master after backporting.
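A sketch of what the requested javadoc could look like, using a hypothetical StoredScriptRequestSketch class rather than the real request. The new overload takes the bytes together with a non-null type; the deprecated one states why it is deprecated, what replaces it, and that it is meant to go away after the backport. The guessType helper is a deliberately naive placeholder for auto-detection.

```java
import java.util.Objects;

/** Hypothetical formats; only two text-based ones are needed for this sketch. */
enum SourceType { JSON, YAML }

final class StoredScriptRequestSketch {
    private byte[] content;
    private SourceType contentType;

    /**
     * Sets the script source together with its declared content type.
     * Both arguments are required, so no auto-detection is needed downstream.
     */
    StoredScriptRequestSketch content(byte[] content, SourceType contentType) {
        this.content = Objects.requireNonNull(content, "content must not be null");
        this.contentType = Objects.requireNonNull(contentType, "contentType must not be null");
        return this;
    }

    /**
     * @deprecated the content type is guessed from the bytes, which is ambiguous and error prone;
     * use {@link #content(byte[], SourceType)} instead. Kept only for backwards compatibility and
     * intended to be removed in master once the replacement has been backported.
     */
    @Deprecated
    StoredScriptRequestSketch content(byte[] content) {
        return content(content, guessType(content));
    }

    /** Naive placeholder: JSON if the first non-whitespace byte is '{', otherwise YAML. */
    private static SourceType guessType(byte[] content) {
        for (byte b : content) {
            if (!Character.isWhitespace((char) b)) {
                return b == '{' ? SourceType.JSON : SourceType.YAML;
            }
        }
        return SourceType.JSON;
    }

    SourceType contentType() {
        return contentType;
    }
}
```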
}

@Override
public String toString() {
String sSource = "_na_";
try {
sSource = XContentHelper.convertToJson(script, false);
if (xContentType == null) {
can you clarify that this if is only needed for bw comp and should go away once the bw comp layer is removed?
I'm actually going to remove this and require a non-null xContentType.
@@ -40,9 +41,14 @@ public PutStoredScriptRequestBuilder setId(String id) {
return this;
}

@Deprecated
javadocs?
public PutMappingRequest source(String mappingSource) {
this.source = mappingSource;
return source(mappingSource, XContentFactory.xContentType(mappingSource));
Not something that you changed, but maybe something to fix as a followup: we have 4 xContentTypes, but given that source is a string, we only support yaml or json here. I wonder if source should rather be a BytesReference? I never thought we'd support yaml here though :) who knows if anybody is using that format. I also wonder if, instead of converting here, we should keep around the xContentType and carry it all the way to DocumentMapperParser#parse where we actually parse it. That may be a bigger effort but sounds cleaner to me.
Sorry, the way I suggested to go before is totally different from what I am saying in this second review. That previous direction was probably the most correct but had a high cost (bw comp mainly). It's the only one that doesn't require any conversion, just parsing in MapperService once we know the content type. We can still go that way if we really want to... but in most cases we get json, so we may end up complicating things to handle edge cases, which is never good.
we should also add tests for all these different content types around mappings, otherwise it's all theories. I can look into it as a followup
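Carrying bytes plus their declared type instead of a String, as suggested above, could look roughly like the following sketch (MappingFormat, MappingSource and LossyStringDemo are hypothetical names). It also shows why a String only fits the text formats: decoding arbitrary binary bytes as UTF-8 and re-encoding them does not reliably give back the original bytes, so SMILE or CBOR content could be corrupted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for the four XContent formats. */
enum MappingFormat {
    JSON(true), YAML(true), SMILE(false), CBOR(false);

    final boolean textual;

    MappingFormat(boolean textual) {
        this.textual = textual;
    }
}

/** Keeps the raw bytes and their declared format together until the point where they are parsed. */
final class MappingSource {
    private final byte[] bytes;
    private final MappingFormat format;

    MappingSource(byte[] bytes, MappingFormat format) {
        this.bytes = Arrays.copyOf(Objects.requireNonNull(bytes), bytes.length);
        this.format = Objects.requireNonNull(format);
    }

    /** A String can only faithfully hold the text-based formats. */
    static MappingSource fromString(String source, MappingFormat format) {
        if (!format.textual) {
            throw new IllegalArgumentException(format + " is a binary format and cannot be passed as a String");
        }
        return new MappingSource(source.getBytes(StandardCharsets.UTF_8), format);
    }

    MappingFormat format() {
        return format;
    }

    byte[] bytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }
}

final class LossyStringDemo {
    /** True only if the bytes survive a UTF-8 decode followed by a re-encode. */
    static boolean survivesStringRoundTrip(byte[] binary) {
        byte[] roundTripped = new String(binary, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(binary, roundTripped);
    }
}
```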
return this;
}

/**
* The settings to crete the index template with (either json/yaml/properties format).
can you fix this typo while at it? s/crete/create
storedScriptRequest.script(new BytesArray("{}"), XContentType.JSON);

assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
BytesStreamOutput output = new BytesStreamOutput();
Nit: maybe use try/finally around stream outputs and stream inputs?
serialized.readFrom(in);
assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
assertEquals(storedScriptRequest.scriptLang(), serialized.scriptLang());
assertEquals(storedScriptRequest.id(), serialized.id());
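For the try/finally nit, Java's try-with-resources closes each stream even when a read or an assertion throws. The sketch below uses plain java.io streams and a hypothetical ScriptRequest as stand-ins for BytesStreamOutput, StreamInput and the real request class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical request carrying the fields checked in the test above. */
final class ScriptRequest {
    String id;
    String lang;
    String contentType; // e.g. "JSON"

    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(id);
        out.writeUTF(lang);
        out.writeUTF(contentType);
    }

    static ScriptRequest readFrom(DataInputStream in) throws IOException {
        ScriptRequest r = new ScriptRequest();
        r.id = in.readUTF();
        r.lang = in.readUTF();
        r.contentType = in.readUTF();
        return r;
    }
}

final class RoundTrip {
    /** Serializes and deserializes a request; each stream is closed even if writing or reading fails. */
    static ScriptRequest copy(ScriptRequest original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            original.writeTo(out);
        } // closing flushes everything into the buffer
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return ScriptRequest.readFrom(in);
        }
    }
}
```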
Thanks for testing it. It is some more effort, but the way we have been testing that requests received from an older version are read properly is to deserialize a base64-encoded request that was serialized using the previous version. That is much better than simulating things.
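A minimal sketch of the fixture approach, under these assumptions: a hypothetical wire format where the old version wrote only the id and the source (two DataOutputStream.writeUTF calls) and the new version appends the content type. The base64 literal was computed by hand for id="id1" and source="{}"; in a real test it would be captured from a build of the older version. Decoding it exercises exactly the bytes an old node sends, including the fallback used when no type is on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Base64;

final class VersionedScriptRequest {
    static final int V_OLD = 1; // hypothetical: before the content type was on the wire
    static final int V_NEW = 2; // hypothetical: content type written after the source

    /** Old-version bytes for id="id1", source="{}" (two writeUTF calls), base64 encoded. */
    static final String OLD_VERSION_FIXTURE = "AANpZDEAAnt9";

    String id;
    String source;
    String contentType;

    static VersionedScriptRequest readFrom(DataInputStream in, int wireVersion) throws IOException {
        VersionedScriptRequest r = new VersionedScriptRequest();
        r.id = in.readUTF();
        r.source = in.readUTF();
        if (wireVersion >= V_NEW) {
            r.contentType = in.readUTF();
        } else {
            // Old senders never sent a type, so fall back to (naive) detection from the content.
            r.contentType = r.source.trim().startsWith("{") ? "JSON" : "YAML";
        }
        return r;
    }

    static VersionedScriptRequest decodeFixture(String base64, int wireVersion) throws IOException {
        byte[] bytes = Base64.getDecoder().decode(base64);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readFrom(in, wireVersion);
        }
    }
}
```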
request.writeTo(bytesStreamOutput);

StreamInput in = StreamInput.wrap(bytesStreamOutput.bytes().toBytesRef().bytes);
in.setVersion(Version.V_5_0_0);
In all these serialization tests, shall we randomize the version just to make sure that the conditionals target the proper versions? You may bump into the fact that some released versions are still marked unreleased and hence cannot be used in VersionUtils.randomVersion.
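One way to randomize the wire version, sketched with java.util.Random over a hypothetical list of version ids instead of the VersionUtils.randomVersion helper mentioned above. The codec writes the type only from an assumed cut-over version onwards, and the reader fails if bytes are left unread, which would catch a writer and reader whose version conditionals disagree. A fixed seed keeps a failing run reproducible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Random;

final class VersionGatedCodec {
    // Hypothetical version ids; in this sketch the type is on the wire from "5.3.0" onwards.
    static final List<Integer> VERSIONS = List.of(5_00_00, 5_01_01, 5_02_00, 5_03_00, 5_04_00);
    static final int TYPE_ON_WIRE_SINCE = 5_03_00;

    static byte[] write(String source, String type, int version) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(source);
            if (version >= TYPE_ON_WIRE_SINCE) {
                out.writeUTF(type);
            }
        }
        return buffer.toByteArray();
    }

    /** Returns the type read from the wire, or null when the version predates it. */
    static String readType(byte[] bytes, int version) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF(); // source
            String type = version >= TYPE_ON_WIRE_SINCE ? in.readUTF() : null;
            if (in.available() != 0) {
                throw new IllegalStateException("unread bytes left for version " + version);
            }
            return type;
        }
    }

    static int randomVersion(Random random) {
        return VERSIONS.get(random.nextInt(VERSIONS.size()));
    }
}
```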
// this is hacky but here goes
PutMappingRequest request = new PutMappingRequest("foo");
String mapping = YamlXContent.contentBuilder().startObject().field("foo", "bar").endObject().string();
request.source(mapping, XContentType.JSON); // THIS IS NOT A BUG! Intentionally specifying the wrong type so we serialize it
What practical use case does this test? User is providing the mapping in yaml but?
}
/**
* A wrapper of {@link HttpHeaders} that implements a map to prevent copying unnecessarily. This class does not support modifications
* and due to the underlying implementation, it performs case insensitive lookups of key to values.
Oh nice, some other comment of mine above may be wrong then; it seems like you have made headers case-insensitive. So does it mean that they were case-sensitive before?
The headers were case-insensitive before. Both the Netty 3 and 4 implementations implemented this.
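A read-only, case-insensitive view in the spirit of the wrapper above can be sketched on top of AbstractMap. CaseInsensitiveHeaders is a hypothetical name and wraps a plain Map rather than Netty's HttpHeaders. Lookups scan the existing entries, so nothing is copied, and mutating methods throw UnsupportedOperationException.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Read-only Map view over existing header storage; lookups ignore the case of header names. */
final class CaseInsensitiveHeaders extends AbstractMap<String, List<String>> {
    private final Map<String, List<String>> raw;

    CaseInsensitiveHeaders(Map<String, List<String>> raw) {
        this.raw = raw; // wrapped, not copied
    }

    @Override
    public List<String> get(Object key) {
        if (!(key instanceof String)) {
            return null;
        }
        String name = (String) key;
        for (Map.Entry<String, List<String>> e : raw.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                return Collections.unmodifiableList(e.getValue());
            }
        }
        return null;
    }

    @Override
    public boolean containsKey(Object key) {
        return get(key) != null;
    }

    @Override
    public Set<Map.Entry<String, List<String>>> entrySet() {
        // AbstractMap.put already throws; an unmodifiable entry set also blocks remove/clear/setValue.
        return Collections.unmodifiableMap(raw).entrySet();
    }
}
```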
@elastic/es-clients this will require the clients to send a Content-Type header @jaymode if no Content-Type header is received, do we just assume JSON? Otherwise this will impact command line
We do not assume JSON. This commit requires the header to be present and a known type (JSON, YAML, SMILE, CBOR, Plain text). Curl sends the content-type as
I think it's trivial to set a default header, but we'll have to think about the "Cat" APIs -- I've tried to set On the other hand, there could be a debate whether the "Cat" APIs shouldn't return parsed JSON when you call them from something like JavaScript, Python, ..., instead of plain text.
I might be missing a larger picture here, but do we have a grave enough reason to just break the simplest use case when people do
This is my fault as I did not state this clearly. We only require the Content-Type header for a request made to elasticsearch with a body (e.g. POST with JSON body). So Another aspect is that sometimes we allow a
I think this is because we used to mistakenly use the
but it would still break Also what would happen with I would be 100% for abolishing the auto-detect in favor of assuming json (not trying to detect it) and requiring content-type if it's anything else. How does that sound?
Yes, that is what I was pointing out. Again, do we have a good enough reason to break things like this?
+1 Assume JSON by default, require
How new do users' (official) clients need to be for this?
I think that assuming json when the header is not available is already a huge improvement over auto-detection. We may want to make the header required at some point whenever a body is provided, but we can always do that later, and potentially add deprecation logging for that first and break with the next major version. After all the vast majority of our users use json so that assumption seems about right to me for now.
Doing this doesn't solve the usability issues with curl, since the Content-Type header will be sent but will not be a supported value unless specified by a user. I think the requirement of this header should be removed from this PR, based on the text from RFC 7231:
Assuming a media type of application/octet-stream does not do us much good and the auto-detection is what we want to remove.
Right, thanks for clarifying. Seems like we don't want to break curl in any case, hence if we do want to remove auto-detection, we may have to make curl work with some hack. If curl is resolved one way or the other, do we want to assume json or always require the header (besides the curl-specific case)?
To not break curl, and still be somewhat consistent, would essentially mean to assume ...or decide (in a different issue) we don't need formats other than json and completely sidestep this issue.
I think this is too lenient. I would rather special-case curl in some other way and find out about other cases that may need special treatment (hopefully there aren't).
++ we can document sending any unknown This actually also brings its behavior in line with
…' header In future, Elasticsearch will require specifying the format of data which is being sent (eg. when indexing a document). Current versions of Elasticsearch will start to print a warning to the deprecation log. A default value 'application/json' for the 'Content-Type' header has been added to prevent the deprecation messages and to be prepared for the requirement in next major version of Elasticsearch. See: elastic/elasticsearch#22691 Related: * c22ec89 * logstash-plugins/logstash-input-elasticsearch#55 * logstash-plugins/logstash-input-elasticsearch#56 * elastic/elasticsearch#22691 Closes #400
…ntent (#22691) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See #19388
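The strict and non-strict behaviour described in this commit message can be summarized as a small decision function. This is a sketch, not the actual rest layer code: ContentTypeGate and Decision are hypothetical, the strict flag stands in for http.content_type.required, the supported media types are assumed, and the warning text is a placeholder. It only looks at the header when the request has a body, and in this sketch a warning is logged for both a missing and an unsupported header.

```java
import java.util.Locale;
import java.util.function.Consumer;

/** Hypothetical outcome of inspecting a request before dispatch. */
enum Decision { USE_HEADER_TYPE, PLAIN_TEXT, AUTO_DETECT, NO_BODY, REJECT }

final class ContentTypeGate {
    private final boolean strict; // models http.content_type.required
    private final Consumer<String> deprecationLog;

    ContentTypeGate(boolean strict, Consumer<String> deprecationLog) {
        this.strict = strict;
        this.deprecationLog = deprecationLog;
    }

    Decision decide(boolean hasBody, String contentTypeHeader) {
        if (!hasBody) {
            return Decision.NO_BODY; // the header only matters when there is content
        }
        String base = contentTypeHeader == null ? null
                : contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if ("text/plain".equals(base)) {
            return Decision.PLAIN_TEXT;
        }
        if (isSupported(base)) {
            return Decision.USE_HEADER_TYPE;
        }
        if (strict) {
            return Decision.REJECT;
        }
        // Non-strict mode: keep the old auto-detection but warn (placeholder message).
        deprecationLog.accept("request body sent without a supported Content-Type header");
        return Decision.AUTO_DETECT;
    }

    private static boolean isSupported(String base) {
        return "application/json".equals(base) || "application/yaml".equals(base)
                || "application/smile".equals(base) || "application/cbor".equals(base);
    }
}
```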
5.x commit b9b9400
Resolves #37 Relates to elastic/elasticsearch#22691
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The methods that do not accept XContentType with bytes have been deprecated by this commit. Relates elastic#22691
This commit fixes communication with 5.3.0 nodes to send XContentType to these nodes since #22691 was backported to the 5.3 branch.
The backport of elastic#22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so that it properly handles a request with a plain text body.
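The fix comes down to branching on the body's content type before extracting the scroll id. A hedged sketch of that branch, with a hypothetical ScrollIdResolver and the structured (JSON) parser injected as a function because the example has no JSON library:

```java
import java.util.function.Function;

/** Hypothetical scroll id resolution; not the actual search scroll action code. */
final class ScrollIdResolver {
    private ScrollIdResolver() {}

    /**
     * A text/plain body is the scroll id itself; any other body goes to the structured parser,
     * so plain text never reaches code that expects structured content.
     */
    static String resolve(String body, boolean plainText, Function<String, String> structuredParser) {
        if (body == null || body.isEmpty()) {
            throw new IllegalArgumentException("request body is required");
        }
        if (plainText) {
            return body.trim();
        }
        return structuredParser.apply(body);
    }
}
```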
Hi, after upgrading from 5.2.* to 5.3.0 I found that the config http.content_type.required is on by default in 5.x. Setting http.content_type.required: false in the Elasticsearch config file fixed it for me.
FYI, the setting defaults to I also downloaded an instance of 5.3.0 to ensure something didn't sneak in and ran:

$ curl localhost:9200/_search -d '{"query":{"match_all":{}}}'

It worked without the content-type specified and no settings adjusted.
Oh.
We managed to fix it with https://phabricator.wikimedia.org/P5158#27475
Ah, so that's a different issue. Your client was technically sending the wrong content-type for the content. Regardless, I'm glad you fixed it by sending the content-type; that puts you all ahead of the curve!
Starting with Elasticsearch 6.0.0 products are required to set Content-Type to application/json. Elasticsearch 5.x already accepts this[1]. Add acceptance tests when we load test data and ensure content type is set to the correct value. Also add ipdb to requirements.txt as a convenience for troubleshooting tests. [1] elastic/elasticsearch#22691
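On the client side, declaring the body type is a single header. A minimal sketch with the JDK's java.net.http.HttpRequest (Java 11+); JsonRequests is a hypothetical helper and the URL is only an example:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class JsonRequests {
    private JsonRequests() {}

    /** Builds a POST whose body is declared as JSON, so the server never has to guess. */
    static HttpRequest postJson(URI uri, String json) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Actually sending it, e.g. with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), needs a running node; building the request does not.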