Remove content detection mechanism from REST endpoints #19388

jasontedor · 2016-07-12T13:11:48Z

Today, some Elasticsearch REST endpoints (e.g., update index settings API) attempt to determine the content type (e.g., JSON vs. YAML) of the request body with a rudimentary detection mechanism. This mechanism breaks down on YAML bodies that contain braces because the content is inappropriately detected as JSON. Instead, this mechanism should not be used, Elasticsearch should default to JSON for all request bodies, and possibly obey the Content-Type header for allowing YAML.

Relates #19366

The text was updated successfully, but these errors were encountered:

javanna · 2016-07-12T13:41:16Z

I was under the impression that we already obey to the Content-Type header, and it's a bug if some apis don't. Seems like we do things differently only for apis that handle settings. I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

jasontedor · 2016-07-12T13:52:29Z

I was under the impression that we already obey to the Content-Type header, and it's a bug if some apis don't.

Sadly we do not always. 😢

I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

I think that should be removed as part of closing this issue. 😄

javanna · 2016-07-12T13:54:37Z

Sadly we do not.

It seems like we do apart from where we accept settings in the body? Or are there other REST endpoints where we try and detect content type?

I think that should be removed as part of closing this issue.

Sure, why not. Maybe it should be brought up before it gets removed though :)

clintongormley · 2016-07-12T14:08:24Z

I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

Not documented. Please rip it out!

javanna · 2016-07-12T14:16:27Z

Not documented. Please rip it out!

Perfect that makes it a no-brainer, with properties out of the way it makes no sense to do any auto-detection I think. We should do for settings what we already do for any other request I believe.

jasontedor · 2016-07-12T14:24:26Z

We can completely remove support for properties. I opened #19391.

anti-social · 2016-07-13T12:49:00Z

If you removed --- requirement YAML parser could parse JSON since YAML is a superset of JSON.

clintongormley · 2016-07-13T19:28:47Z

To quote from https://metacpan.org/pod/JSON::XS#JSON-and-YAML

YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax, so you should make sure that your hash keys are noticeably shorter than the 1024 "stream characters" YAML allows and that you do not have characters with codepoint values outside the Unicode BMP (basic multilingual page). YAML also does not allow / sequences in strings

popravich · 2016-07-13T20:12:03Z

ok, but what about checking Content-Type header?
YAML format suits our flow the best -- its easier to read, write, add comments and maintain at all.
It would be really great if you guys atleast keep current YAML support.

jasontedor · 2016-07-13T20:20:07Z

@popravich That's the plan.

javanna · 2016-07-13T20:38:04Z

I care to highlight that we already take into account the Content-Type header. I think the title of this issue is misleading as it makes people think that we guess the content type all the time, while we (used to) have this detection in place only for apis that accept settings in their request body. The general approach is to take Content-Type into account and that's always been the case apart from the mentioned exceptions.

jasontedor · 2016-07-13T20:54:17Z

@javanna It's in the description of the issue that this is a problem only on certain endpoints.

javanna · 2016-07-14T09:25:37Z

ok it appears that what I cared to highlight above was completely wrong. We don't read the Content-Type header, ever, sorry for the confusion. We rather auto-detect based on markers on all REST endpoints. Settings were indeed a special place as we supported also properties format there (now gone).

The reason for this is that we have the java api, which exposes a lot of source setters that allow to effectively provide the response body as a byte array. Those methods are also used from REST to transport where we convert from RestRequest to java api object, so no content type is ever taken into account.

bleskes · 2016-07-14T09:45:53Z

I think the discussion should be split in two. First for the future where we will rely on http for all out of cluster communication and then second for what to do until then. As Luca noted, we don't do anything with the incoming content type at the moment.

For the future, I would suggest we move over to using content type headers exclusively and use JSON as default. That means we will need to extend our transport layer to carry that information. Maybe move from ByteReference to XContentByteReference and drop support for setting raw strings.

Until then we have two options - either keep things as is (but improve detection and make it uniform - i.e., not have one detection logic for settings and another for the rest) or change our transport client to require content type information. If we go for the latter, we have to do it on a major version. Since that is quite a big job for us and will case pain for users to upgrade to something that will be going away any way, my vote is to keep things the same for now and just solve the little issues we find. As soon as we can fully rely on http this will become simpler on it's own.

javanna · 2016-07-14T10:03:27Z

One compromise could be to keep the current source(BytesReference) etc. methods (maybe deprecate them) and add new methods that accept the content-type as a second argument, so that the REST layer can call those instead and hand over the content-type header. Where we don't know, we would still auto-detect. This is a big job but it doesn't necessarily have to be a pain for users.

Looking at what we are moving towards to (http), having those new methods may actually end up helping users migrating as the content-type will always be required with the rest client when providing a request body. Not too sure about this but maybe worth considering.

Another option we would have is to copy the content-type header from REST layer to transport layer so that the info is accessible when actually parsing the bytes and can be taken into account if available. This is suboptimal as the transport client would not have that info unless the header is explicitly set for each request, which is something that I am not sure users would do. Would work within elasticsearch though without changing methods signatures, probably too sneaky though.

clintongormley · 2016-07-15T09:38:12Z

Discussed in FixitFriday. To fix the transport client we have to either:

Require a content header
Remove the ability to specify bytes
Remove the transport client

All of these are big changes. In the meantime, should we just fix the REST layer and require a content header for anything that is not JSON?

javanna · 2016-07-15T11:53:35Z

Fixing the REST layer anyhow requires the transport layer to accept the content-type and carry it around wherever bytes can be provided. We agreed on FixItFriday that we want to fix the REST layer to not do auto-detection and rely on Content-Type instead. I think next step is to more closely evaluate what the impact is on the java api, and see if we can at least decrease the places where one can provide bytes in the java api, and add the content-type bit wherever providing bytes is needed. It is a big job but it is required to fix the REST layer, regardless of future deprecation of transport client.

With recent changes to our parsing code we have drastically reduced the places where we auto-detect the content type from the input. The usage of these methods spread in our codebase for no reason, given that in most of the cases we know the content type upfront and we don't need any auto-detection mechanism. Deprecating these methods is a way to try and make sure that these methods are carefully used, and hopefully not introduced in newly written code. We have yet to fix the REST layer to read the Content-Type header, which is the long term solution, but for now we just want to make sure that the usage of these methods doesn't spread any further. Relates to elastic#19388

) With recent changes to our parsing code we have drastically reduced the places where we auto-detect the content type from the input. The usage of these methods spread in our codebase for no reason, given that in most of the cases we know the content type upfront and we don't need any auto-detection mechanism. Deprecating these methods is a way to try and make sure that these methods are carefully used, and hopefully not introduced in newly written code. We have yet to fix the REST layer to read the Content-Type header, which is the long term solution, but for now we just want to make sure that the usage of these methods doesn't spread any further. Relates to #19388

This change enforces that all incoming rest requests have a valid and supported content type header before the request is dispatched. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to autodetect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on autodetection have been deprecated. Closes elastic#19388

…ntent (#22691) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See #19388

…ntent (elastic#22691) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See elastic#19388

…ntent (#22691) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See #19388

…ted methods This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport requests and their usages. While doing this, it turns out that there are many places where *Entity classes are used from the apache http client libraries and many of these usages did not specify the content type. The methods that do not specify a content type explicitly have been added to forbidden apis to prevent more of these from entering our code base. Relates elastic#19388

…ted methods (#23146) This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport requests and their usages. While doing this, it turns out that there are many places where *Entity classes are used from the apache http client libraries and many of these usages did not specify the content type. The methods that do not specify a content type explicitly have been added to forbidden apis to prevent more of these from entering our code base. Relates #19388

javanna · 2017-03-17T20:32:23Z

@jaymode should we close this? We have removed auto-detection from the REST layer, is there anything left to do here?

jaymode · 2017-04-05T13:38:52Z

@javanna you are correct. There is nothing left to do here.

jasontedor added discuss :Core/Infra/REST API REST infrastructure and utilities labels Jul 12, 2016

jasontedor mentioned this issue Jul 12, 2016

Remove support for properties #19398

Merged

bleskes mentioned this issue Jul 13, 2016

_settings API fails to determine source format #19366

Closed

javanna added >enhancement and removed discuss labels Jul 15, 2016

javanna self-assigned this Jul 15, 2016

imotov mentioned this issue Aug 12, 2016

Put script API should be able to accept scripts not wrapped into a JSON object #19963

Closed

javanna added the high hanging fruit label Dec 10, 2016

nik9000 mentioned this issue Dec 13, 2016

Continue consolidating XContentParser construction in tests #22145

Merged

javanna mentioned this issue Dec 14, 2016

Deprecate XContentType auto detection methods in XContentFactory #22181

Merged

jaymode mentioned this issue Jan 19, 2017

Optionally require a valid content type for all rest requests with content #22691

Merged

pickypg mentioned this issue Feb 2, 2017

[DOCS] Low Level REST Client should emphasize ContentType #22940

Merged

jaymode self-assigned this Feb 13, 2017

jaymode mentioned this issue Feb 13, 2017

Enforce Content-Type requirement on the rest layer and remove deprecated methods #23146

Merged

jaymode closed this as completed Apr 5, 2017

danielmitterdorfer mentioned this issue Jan 9, 2018

406 response when POSTing to _bulk with a charset specified #28123

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove content detection mechanism from REST endpoints #19388

Remove content detection mechanism from REST endpoints #19388

jasontedor commented Jul 12, 2016 •

edited

Loading

javanna commented Jul 12, 2016 •

edited

Loading

jasontedor commented Jul 12, 2016 •

edited

Loading

javanna commented Jul 12, 2016

clintongormley commented Jul 12, 2016

javanna commented Jul 12, 2016

jasontedor commented Jul 12, 2016

anti-social commented Jul 13, 2016

clintongormley commented Jul 13, 2016

popravich commented Jul 13, 2016

jasontedor commented Jul 13, 2016

javanna commented Jul 13, 2016

jasontedor commented Jul 13, 2016

javanna commented Jul 14, 2016

bleskes commented Jul 14, 2016

javanna commented Jul 14, 2016

clintongormley commented Jul 15, 2016

javanna commented Jul 15, 2016

javanna commented Mar 17, 2017

jaymode commented Apr 5, 2017

Remove content detection mechanism from REST endpoints #19388

Remove content detection mechanism from REST endpoints #19388

Comments

jasontedor commented Jul 12, 2016 • edited Loading

javanna commented Jul 12, 2016 • edited Loading

jasontedor commented Jul 12, 2016 • edited Loading

javanna commented Jul 12, 2016

clintongormley commented Jul 12, 2016

javanna commented Jul 12, 2016

jasontedor commented Jul 12, 2016

anti-social commented Jul 13, 2016

clintongormley commented Jul 13, 2016

popravich commented Jul 13, 2016

jasontedor commented Jul 13, 2016

javanna commented Jul 13, 2016

jasontedor commented Jul 13, 2016

javanna commented Jul 14, 2016

bleskes commented Jul 14, 2016

javanna commented Jul 14, 2016

clintongormley commented Jul 15, 2016

javanna commented Jul 15, 2016

javanna commented Mar 17, 2017

jaymode commented Apr 5, 2017

jasontedor commented Jul 12, 2016 •

edited

Loading

javanna commented Jul 12, 2016 •

edited

Loading

jasontedor commented Jul 12, 2016 •

edited

Loading