Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove content detection mechanism from REST endpoints #19388

Closed
jasontedor opened this issue Jul 12, 2016 · 19 comments
Closed

Remove content detection mechanism from REST endpoints #19388

jasontedor opened this issue Jul 12, 2016 · 19 comments
Assignees
Labels

Comments

@jasontedor
Copy link
Member

jasontedor commented Jul 12, 2016

Today, some Elasticsearch REST endpoints (e.g., update index settings API) attempt to determine the content type (e.g., JSON vs. YAML) of the request body with a rudimentary detection mechanism. This mechanism breaks down on YAML bodies that contain braces because the content is inappropriately detected as JSON. Instead, this mechanism should not be used, Elasticsearch should default to JSON for all request bodies, and possibly obey the Content-Type header for allowing YAML.

Relates #19366

@jasontedor jasontedor added discuss :Core/Infra/REST API REST infrastructure and utilities labels Jul 12, 2016
@javanna
Copy link
Member

javanna commented Jul 12, 2016

I was under the impression that we already obey to the Content-Type header, and it's a bug if some apis don't. Seems like we do things differently only for apis that handle settings. I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

@jasontedor
Copy link
Member Author

jasontedor commented Jul 12, 2016

I was under the impression that we already obey to the Content-Type header, and it's a bug if some apis don't.

Sadly we do not always. 😢

I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

I think that should be removed as part of closing this issue. 😄

@javanna
Copy link
Member

javanna commented Jul 12, 2016

Sadly we do not.

It seems like we do apart from where we accept settings in the body? Or are there other REST endpoints where we try and detect content type?

I think that should be removed as part of closing this issue.

Sure, why not. Maybe it should be brought up before it gets removed though :)

@clintongormley
Copy link
Contributor

I noticed that we support "properties" format for settings, which may be the reason behind this weird choice... wondering who uses/remembers that we support properties format there.

Not documented. Please rip it out!

@javanna
Copy link
Member

javanna commented Jul 12, 2016

Not documented. Please rip it out!

Perfect that makes it a no-brainer, with properties out of the way it makes no sense to do any auto-detection I think. We should do for settings what we already do for any other request I believe.

@jasontedor
Copy link
Member Author

We can completely remove support for properties. I opened #19391.

@anti-social
Copy link

If you removed --- requirement YAML parser could parse JSON since YAML is a superset of JSON.

@clintongormley
Copy link
Contributor

To quote from https://metacpan.org/pod/JSON::XS#JSON-and-YAML

YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax, so you should make sure that your hash keys are noticeably shorter than the 1024 "stream characters" YAML allows and that you do not have characters with codepoint values outside the Unicode BMP (basic multilingual page). YAML also does not allow / sequences in strings

@popravich
Copy link

ok, but what about checking Content-Type header?
YAML format suits our flow the best -- its easier to read, write, add comments and maintain at all.
It would be really great if you guys atleast keep current YAML support.

@jasontedor
Copy link
Member Author

@popravich That's the plan.

@javanna
Copy link
Member

javanna commented Jul 13, 2016

I care to highlight that we already take into account the Content-Type header. I think the title of this issue is misleading as it makes people think that we guess the content type all the time, while we (used to) have this detection in place only for apis that accept settings in their request body. The general approach is to take Content-Type into account and that's always been the case apart from the mentioned exceptions.

@jasontedor
Copy link
Member Author

@javanna It's in the description of the issue that this is a problem only on certain endpoints.

@javanna
Copy link
Member

javanna commented Jul 14, 2016

ok it appears that what I cared to highlight above was completely wrong. We don't read the Content-Type header, ever, sorry for the confusion. We rather auto-detect based on markers on all REST endpoints. Settings were indeed a special place as we supported also properties format there (now gone).

The reason for this is that we have the java api, which exposes a lot of source setters that allow to effectively provide the response body as a byte array. Those methods are also used from REST to transport where we convert from RestRequest to java api object, so no content type is ever taken into account.

@bleskes
Copy link
Contributor

bleskes commented Jul 14, 2016

I think the discussion should be split in two. First for the future where we will rely on http for all out of cluster communication and then second for what to do until then. As Luca noted, we don't do anything with the incoming content type at the moment.

For the future, I would suggest we move over to using content type headers exclusively and use JSON as default. That means we will need to extend our transport layer to carry that information. Maybe move from ByteReference to XContentByteReference and drop support for setting raw strings.

Until then we have two options - either keep things as is (but improve detection and make it uniform - i.e., not have one detection logic for settings and another for the rest) or change our transport client to require content type information. If we go for the latter, we have to do it on a major version. Since that is quite a big job for us and will case pain for users to upgrade to something that will be going away any way, my vote is to keep things the same for now and just solve the little issues we find. As soon as we can fully rely on http this will become simpler on it's own.

@javanna
Copy link
Member

javanna commented Jul 14, 2016

One compromise could be to keep the current source(BytesReference) etc. methods (maybe deprecate them) and add new methods that accept the content-type as a second argument, so that the REST layer can call those instead and hand over the content-type header. Where we don't know, we would still auto-detect. This is a big job but it doesn't necessarily have to be a pain for users.

Looking at what we are moving towards to (http), having those new methods may actually end up helping users migrating as the content-type will always be required with the rest client when providing a request body. Not too sure about this but maybe worth considering.

Another option we would have is to copy the content-type header from REST layer to transport layer so that the info is accessible when actually parsing the bytes and can be taken into account if available. This is suboptimal as the transport client would not have that info unless the header is explicitly set for each request, which is something that I am not sure users would do. Would work within elasticsearch though without changing methods signatures, probably too sneaky though.

@clintongormley
Copy link
Contributor

Discussed in FixitFriday. To fix the transport client we have to either:

  • Require a content header
  • Remove the ability to specify bytes
  • Remove the transport client

All of these are big changes. In the meantime, should we just fix the REST layer and require a content header for anything that is not JSON?

@javanna
Copy link
Member

javanna commented Jul 15, 2016

Fixing the REST layer anyhow requires the transport layer to accept the content-type and carry it around wherever bytes can be provided. We agreed on FixItFriday that we want to fix the REST layer to not do auto-detection and rely on Content-Type instead. I think next step is to more closely evaluate what the impact is on the java api, and see if we can at least decrease the places where one can provide bytes in the java api, and add the content-type bit wherever providing bytes is needed. It is a big job but it is required to fix the REST layer, regardless of future deprecation of transport client.

@javanna javanna self-assigned this Jul 15, 2016
javanna added a commit to javanna/elasticsearch that referenced this issue Dec 16, 2016
With recent changes to our parsing code we have drastically reduced the places where we auto-detect the content type from the input. The usage of these methods spread in our codebase for no reason, given that in most of the cases we know the content type upfront and we don't need any auto-detection mechanism. Deprecating these methods is a way to try and make sure that these methods are carefully used, and hopefully not introduced in newly written code.

We have yet to fix the REST layer to read the Content-Type header, which is the long term solution, but for now we just want to make sure that the usage of these methods doesn't spread any further.

Relates to elastic#19388
javanna added a commit that referenced this issue Dec 16, 2016
)

With recent changes to our parsing code we have drastically reduced the places where we auto-detect the content type from the input. The usage of these methods spread in our codebase for no reason, given that in most of the cases we know the content type upfront and we don't need any auto-detection mechanism. Deprecating these methods is a way to try and make sure that these methods are carefully used, and hopefully not introduced in newly written code.

We have yet to fix the REST layer to read the Content-Type header, which is the long term solution, but for now we just want to make sure that the usage of these methods doesn't spread any further.

Relates to #19388
jaymode added a commit to jaymode/elasticsearch that referenced this issue Jan 19, 2017
This change enforces that all incoming rest requests have a valid and supported content
type header before the request is dispatched. The content type header is parsed to the
matching XContentType value with the only exception being for plain text requests. This
value is then passed on with the content bytes so that we can reduce the number of places
where we need to autodetect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on autodetection
have been deprecated.

Closes elastic#19388
jaymode added a commit that referenced this issue Feb 2, 2017
…ntent (#22691)

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See #19388
jaymode added a commit to jaymode/elasticsearch that referenced this issue Feb 2, 2017
…ntent (elastic#22691)

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See elastic#19388
jaymode added a commit that referenced this issue Feb 7, 2017
…ntent (#22691)

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x
and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting
`http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported
content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected
and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text
requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need
to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType
along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type
header.

See #19388
@jaymode jaymode self-assigned this Feb 13, 2017
jaymode added a commit to jaymode/elasticsearch that referenced this issue Feb 13, 2017
…ted methods

This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.

While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.

Relates elastic#19388
jaymode added a commit that referenced this issue Feb 17, 2017
…ted methods (#23146)

This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.

While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.

Relates #19388
@javanna
Copy link
Member

javanna commented Mar 17, 2017

@jaymode should we close this? We have removed auto-detection from the REST layer, is there anything left to do here?

@jaymode
Copy link
Member

jaymode commented Apr 5, 2017

@javanna you are correct. There is nothing left to do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants