Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] version number is incompatible with existing ES clients #693

Closed
dlvenable opened this issue May 12, 2021 · 49 comments
Closed

[BUG] version number is incompatible with existing ES clients #693

dlvenable opened this issue May 12, 2021 · 49 comments

Comments

@dlvenable
Copy link
Member

dlvenable commented May 12, 2021

Describe the bug
The API which provides version includes a field version.number. Clients parse this value to determine what version of ElasticSearch is running, and is used for checking API compatibility.

In ElasticSearch 7.10, this value is reported as 7.10.

In OpenSearch 1.0.0, the value is 1.0.0-SNAPSHOT.

Existing clients which parse this string are unable to communicate with OpenSearch when they rely on the value of the version.

To Reproduce
Steps to reproduce the behavior:

  1. Start ElasticSearch 7.10
  2. curl http://localhost:9200
  3. Read version.number.
  4. Start OpenSearch
  5. curl http://localhost:9200
  6. Read version.number.

Expected behavior

Because OpenSearch uses the same API as ElasticSearch 7.10.2, the value of version.number should be 7.10.2.

Expected:

  "version" : {
    "number" : "7.10.2",

Actual:

  "version" : {
    "distribution" : "opensearch",
    "number" : "1.0.0-SNAPSHOT",

Plugins
N/A

Screenshots
N/A

Host/Environment (please complete the following information):
N/A

Additional context

We confirmed that this is the cause of #667. It also appears that this may be causing #669.

@dlvenable dlvenable added Beta bug Something isn't working untriaged labels May 12, 2021
@dlvenable
Copy link
Member Author

Since the problem is only with the number property, a possible solution would be:

  • Retain the number property as matching the ElasticSearch API version
  • Add a new property representing the OpenSearch version.

For example:

"version" : {
    "distribution" : "opensearch",
    "number" : "7.10",               // This is the ES API
    "opensearchVersion" : "1.0.0"    // This is the OpenSearch version
    ...
}  

There is probably a property name to use rather than "opensearchVersion." But, the advantage of this solution is that callers can know the ElasticSearch API version and the actual OpenSearch version (if that matters to the client).

@dblock
Copy link
Member

dblock commented May 13, 2021

Doesn't feel right that we're retaining ES number anywhere. Are we saying that we don't want clients to have to modify anything to work against OpenSearch? Won't clients want to assert compatibility with OpenSearch 1.0?

@dlvenable
Copy link
Member Author

dlvenable commented May 13, 2021

Yes, I believe that clients should not have to make changes to migrate from ElasticSearch 7.10.2 to OpenSearch 1.0. We've already found that Logstash with the Logstash ES plugin can work with OpenSearch if we manually change version.number to 7.10.2.

The introduction blog post also indicates this is the case.

The Amazon OpenSearch Service APIs will be backward compatible with the existing service APIs to eliminate any need for customers to update their current client code or applications.

Clients could assert compatibility with OpenSearch by using the new distribution field which has already been added.

Additionally, the solution I proposed above may serve as a stepping stone from OpenSearch 1.x to 2.x. The change allows existing clients to work with OpenSearch 1.x without modifications (assuming other APIs are still compatible). Clients and libraries could eventually move to support OpenSearch only, or attempt to support both ElasticSearch and OpenSearch by utilizing the distribution field. Perhaps OpenSearch 2.x would remove the version.number entirely in favor of version.opensearchVersion.

For example, clients could start to write code along these lines. It would be compatible with the suggestions above and would continue to work if version.number were removed in 2.x.

if version.distribution == 'opensearch':
  if version.opensearchVersion.startsWith('1.x'):
    handleOpensearch1();
else:
  if version.number.startsWith('7.x'):
    handleElasticsearch7();

@dblock
Copy link
Member

dblock commented May 13, 2021

I am 💯 with you on backward compatibility, but OpenSearch didn't break backward compatibility by changing the value of version.number (the API continues to return the same fields, including number, but a value has changed to 1.0-...). Even though reverting it to 7.x seems to work around the problem, someone should figure out what the actual root cause of #667 is without that. The exception produced (java.lang.IllegalArgumentException: mapper [log.level] cannot be changed from type [keyword] to [text]) looks odd to me.

Someone here will definitely get to #667 in order, but don't let them beat you to it :)

@jcgraybill
Copy link

@dlvenable raises a great point that this is almost certainly going to be more than a logstash issue: a lot of clients that interact with OpenSearch are going to check the version # to know the wire protocol to use. The proposed solution seems really elegant: keep version.number at 7.10.2 which reflects the API compatibility and lets current clients work without changing, and have a separate openSearchVersion identifier for new clients.

@dblock
Copy link
Member

dblock commented May 13, 2021

@jcgraybill I think I agree, but I do worry that this could lead to users having broken software more often as a production surprise.

Either way, we need to contribute integration tests against OpenSearch to https://github.com/logstash-plugins/logstash-output-elasticsearch. It currently runs against multiple versions of ES.

@dlvenable
Copy link
Member Author

dlvenable commented May 14, 2021

@dblock , Thanks for engaging on this.

I'm unsure exactly what issues you might be imagining. Are they problems with automated tools expecting specific behavior from specific versions? Or more of a human-related issue?

With the latter, I could see an issue with logging. Code like log.info("ElasticSearch version {}", version.number) would log out ElasticSearch version 7.9.2. That could lead to confusion for human operators.

It is likely a fair assumption that automated clients which parse the version mostly care about the major version, and maybe sometimes care about the minor version. The patch version is probably ignored.

Perhaps OpenSearch could have an alternate patch value? Some possible ideas are:

  1. Format the OpenSearch version into the patch value. Something like MMmmpp (M=major, m=minor, p=patch).
  2. Start at some large version and auto-increment from there
  3. To support a 2-digit only patch, start at a smaller patch version and auto-increment from there
OpenSearch Version Formatted Auto-Increment Auto-Increment w/ 2 Digits
1.0.0 7.9.10000 7.9.1001 7.9.51
1.0.1 7.9.10001 7.9.1002 7.9.52
1.0.2 7.9.10002 7.9.1003 7.9.53
1.1.0 7.9.10100 7.9.1004 7.9.54
2.0.0 8.0.20000 8.0.1000 8.0.51

Now that log line about could read ElasticSearch version 7.9.10000.

@nknize
Copy link
Collaborator

nknize commented May 14, 2021

Let's be careful not to engineer a ton of cosmetic surgery just to treat a secondary symptom. Doing so risks causing other problems. First things first, the backwards compatibility tests are still disabled; so we absolutely need to re-enable those in order to catch any/all other areas affected by the version rebase. (/cc @harold-wang) That's being worked now.

As for this particular symptom, the root cause is similar to the "opensearch node fails to join" issue - /cc @shwetathareja. In short, since the server side transport layer is "version aware" of the client we can detect what version of the client is attempting to connect and "mask" the OpenSearch version to predecessor 7.10.2 appropriately. This is why we've been super thoughtful to have the 1.0.0 release not introduce any breaking changes from 7.10.2.

@nknize
Copy link
Collaborator

nknize commented May 14, 2021

Perhaps OpenSearch could have an alternate patch value?

There is already a patch value in the semantic versioning system. (e.g., 1.0.0 is the 0 patch version of Major 1, Minor 0 release).

Are you suggesting a "feature" or "nightly" number? This has been brought up before (in the context of plugins) and I think it's worth discussing as a separate topic independent of this particular bug because I think it has merit. Would you like to open a separate discuss issue?

@dlvenable
Copy link
Member Author

@nknize , The issue you referenced for "OpenSearch node fails to join" mentions a "backward compatibility mode" solution. Is the idea that OpenSearch would report version.number as 7.10.2 when that flag is set?

Regarding the patch idea, this was an extension of my original proposal. Rather than returning 7.9.2 as the version, we could have a different patch value so that human-operators can clearly see that it is OpenSearch. I offered it as a suggestion to @dblock 's concern:

I do worry that this could lead to users having broken software more often as a production surprise.

@dlvenable
Copy link
Member Author

To elaborate on the purpose of this bug, we found that Logstash and Filebeats both fail to load templates when running against OpenSearch 1.0.0.


In local experiments, we modified MainResponse to return a hardcoded value:

.field("number", "7.10.2")

After this change, Logstash and Filebeats can run against OpenSearch.

It does appear that this is the cause of #706 as well. I did not create the exact setup that author included. However, both of us saw this same error in OpenSearch 1.0.0.

couldn't load template: 400 Bad Request: {"error":{"root_cause":[{"type":"mapper_parsing_exception"

Again, that manual change to MainResponse was sufficient to get past the error.

@Smithx10
Copy link

@dlvenable,

Just chiming in to say thanks, and this worked in our testing. #706 (comment)

@nknize
Copy link
Collaborator

nknize commented May 15, 2021

Is the idea that OpenSearch would report version.number as 7.10.2 when that flag is set?

No. I'm saying use the versioning system as it is intended (for backward compatibility) and fix MainResponse to return the predecessor version for Legacy Clients which wasn't covered in the initial patch that adds the distribution key-value to the MainResponse to identify OpenSearch builds from other forks.

In local experiments, we modified MainResponse to return a hardcoded value:

No hardcoded values. It should be based on the client version. But as I mentioned, I wouldn't treat this symptom just yet while Backwards Compatibility testing is disabled. This could be "patching" one issue and causing another. This "fix" I'm suggesting is on hold until we can smoke test w/ bwc tests enabled.

@dblock
Copy link
Member

dblock commented May 17, 2021

@harold-wang or @nknize, are you working on this?

@nknize
Copy link
Collaborator

nknize commented May 17, 2021

@harold-wang or @nknize, are you working on this?

First PR opened

@dlvenable
Copy link
Member Author

@nknize , As I understand PR #708, this will support the internal node-to-node communication inside an ElasticSearch cluster.

I created this ticket mostly to support external clients. Also, #667 and #706 relate to Logstash and Filebeats. How would ElasticSearch determine the client version in order to change the reported version number?

@nknize
Copy link
Collaborator

nknize commented Jun 3, 2021

Let's not overthink this hack. It's a stopgap solution for 1.0+ forward compatibility with existing legacy clients to prevent users from having to mass upgrade.

The distribution change I suggested might not be necessary

It's not necessary but what your suggesting isn't a bad idea to spoof MainResponse exactly like a legacy cluster. This could be done in a follow up PR / "enhancement".

my instinct is to make that value consistent across all routes

/cat API uses the transport layer. Changing it breaks nodes joining the cluster so we don't want to do this.

I have mixed feelings to be allowing any random version number in opensearch.yml, but I can see how it can be convenient if we only want to change the number in the REST API.

Yes. This approach is a forward compatibility hack for MainResponse content until a compatibility module can do this "correctly". I would suggest that compatibility module provide an endpoint to return the spoofed version information instead of :9200 returning a lie.

Also, what is the plan to make all these clients understand OpenSearch distribution?

If OpenSearch sticks with Elastic clients then compatibility going forward is an unknown. Case in point: the new java client which Elastic intends to replace the existing Java High Level rest client. At some point here (preferably before 2.0) OpenSearch will be forced to fork the clients to ensure compatibility with a diverging core.

Having a REST API would be convenient as it would not require cluster restarts.

Rolling upgrades are required to deploy OpenSearch anyway.

@dblock dblock assigned mch2 and unassigned nknize and harold-wang Jun 3, 2021
@dblock
Copy link
Member

dblock commented Jun 3, 2021

Assigned to @mch2

@mch2
Copy link
Member

mch2 commented Jun 3, 2021

Put up a draft here for some initial comments

@liangxibing
Copy link

liangxibing commented Jun 7, 2021

Hi, do we have any idea when we will have this released?
Just tried latest opensearch docker image (with tag 1.0.0-rc1), still return below version number:

{
  "name" : "6d33cb208961",
  "cluster_name" : "es-cluster",
  "cluster_uuid" : "Y034pkVIRxq7E4DtbPH6Kg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "1.0.0-rc1",
    "build_type" : "tar",
    "build_hash" : "26d579287f50bb33e17c8fe1f05ea208d5c64d1f",
    "build_date" : "2021-05-28T18:18:49.848386Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  }
}

And when logstash started, still report below errors:

logstash            | [2021-06-07T23:08:25,184][ERROR][logstash.outputs.elasticsearch] Failed to install template. {:message=>"Got response code '400' contacting Elasticsearch at URL 'https://es:9200/_template/logstash'", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError", :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/manticore_adapter.rb:80:in `perform_request'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:317:in `perform_request_to_url'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:304:in `block in perform_request'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:399:in `with_connection'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:303:in `perform_request'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:311:in `block in put'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client.rb:388:in `template_put'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/http_client.rb:86:in `template_install'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/template_manager.rb:31:in `install'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch/template_manager.rb:17:in `install_template'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch.rb:426:in `install_template'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/outputs/elasticsearch.rb:274:in `block in register'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.8.6-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:137:in `block in setup_after_successful_connection'"]}

Logstash image: docker.elastic.co/logstash/logstash:7.12.1

@CEHENKLE
Copy link
Member

CEHENKLE commented Jun 8, 2021

@liangxibing Hi Simon -- PR 814 wasn't ready in time to make this release, so we cut it without it. Once it's merged, we'll figure out how to get it a bundle out that includes it, preferably before GA. I'm anxious to see it out, too!

/C

@dlvenable
Copy link
Member Author

The upcoming change should certainly help allow easy migration to using OpenSearch with existing clients.

I also want to bring up the topic of how compatibility will be expressed going forward. Once an ES administrator flips the switch off, clients will need to examine at the version.distribution and version.number to determine how to communicate with the OpenSearch cluster. This means clients must have built-in mappings from each OpenSearch version to compatible ElasticSearch versions.

For example, clients will need to know OpenSearch 1.0 is compatible with ElasticSearch 7.10, OpenSearch 2.0 is compatible with ElasticSearch 8.0 (hypothetically), OpenSearch 3.0 is compatible with ElasticSearch 8.5 (again, hypothetical), or whatever it is.

I'm hoping to open the discussion on how what support OpenSearch should provide for advertising ElasticSearch compatibility with clients. OpenSearch could make it easier for clients to know which ElasticSearch version the current OpenSearch version is compatible with. One possible solution is adding a new property named version.compatibility_number which would have the 7.10.2.

For example, curl http://localhost:9200 would have:

"version" : {
    "distribution" : "opensearch",
    "number" : "1.0.0-SNAPSHOT",
    "compatibility_number" : "7.10"

This can make it much easier for clients to support both ElasticSearch and OpenSearch going forward.

I think one big question is whether OpenSearch will retain compatibility with ElasticSearch or not. If not, then something like this makes little sense. But if so, then this could be a good way to encourage clients to write code for both ElasticSearch and OpenSearch.

@dblock
Copy link
Member

dblock commented Jun 8, 2021

I think one big question is whether OpenSearch will retain compatibility with ElasticSearch or not. If not, then something like this makes little sense. But if so, then this could be a good way to encourage clients to write code for both ElasticSearch and OpenSearch.

OpenSearch is derived from Elasticsearch. It's a fork at 7.10.2, so it's only compatible with ES until then. I expect ES to diverge. Clients can decide whether they want to support OpenSearch or ES, or both.

We tried to answer some of these questions in the upgrading FAQ. If we didn't do a good job, maybe bring more questions up? Try to answer them in a PR and we can discuss!

@dlvenable
Copy link
Member Author

Thank you for clarifying on the current expectations for compatibility.

@shwetathareja
Copy link
Member

We tried to answer some of these questions in the upgrading FAQ. If we didn't do a good job, maybe bring more questions up? Try to answer them in a PR and we can discuss!

@dblock FAQ link is broken

@dblock
Copy link
Member

dblock commented Jun 9, 2021

@shwetathareja Works for me... where does it land you? https://opensearch.org/faq/#c3

Looks like the website had a snafu overnight, opensearch-project/project-website#126 (comment)

@dlvenable
Copy link
Member Author

@dblock , @nknize
I want to bring up the idea of the distribution property again so that clients can know how to handle version going forward.

With the merge of #847 , if the setting is activated, then clients should get this:

"version" : {
    "distribution" : "opensearch",
    "number" : "7.10.2",

I believe that we want our clients to write code that can support all of these possibilities: ElasticSearch 7, OpenSearch 1 with version compatibility, and OpenSearch 1 without version compatibility. This helps with the migration path to OpenSearch 1 without version compatibility.

Currently, when in version compatibility mode, the client will have to add some short-term hack that also assumes OpenSearch 7 is the same as OpenSearch 1. Clients can write cleaner, more-robust code if we just remove the distribution value, or give it some other constant name, like version-compatible-opensearch. Then, OpenSearch 1 with version compatibility is treated like ElasticSearch 7 by the client.

@dblock
Copy link
Member

dblock commented Jun 24, 2021

I see your point @dlvenable, we're going to have a problem when we release OpenSearch 7.10.2. I am tempted to say that we should remove distribution=opensearch for this, but I am worried that there will be other side effects, WDYT @nknize @mch2?

@nknize
Copy link
Collaborator

nknize commented Jun 24, 2021

I think by the time OpenSearch 7.10.2 is released there will be a long tail divergence from the Elasticsearch codebase that this "compatibility" issue will be a distant memory. But this is a good point.

The distribution key-value was added as a way of tracking compatibility and "forked" distributions. It's a new field in OpenSearch that legacy clients (forward compatibilty) will not even be aware of so there should be no issue. Nevertheless, for completeness we could simply not set this field when running in "compatibility" mode so that MainResponse looks exactly like the legacy response.

@dblock
Copy link
Member

dblock commented Jun 24, 2021

I opened #880

@dlvenable
Copy link
Member Author

Yes, because ElasticSearch does not have the distribution property, that is why I think we can remove it in version compatibility mode. Ticket #880 is what I suggested and should help clients write code supporting different scenarios. Thanks!

@newzubakhin
Copy link

Are you going to release a new RC in July with the fix? My team can't use filebeat in production

@dblock
Copy link
Member

dblock commented Jul 7, 2021

A cluster setting was introduced in #847. We removed the distribution in backwards-compatible mode in #898. Set compatibility.override_main_response_version to put the cluster into a configuration that returns 7.10.2 as with ES OSS.

I am going to close this because the code is done. I added an issue to document it in opensearch-project/documentation-website#72.

@dblock dblock closed this as completed Jul 7, 2021
@dblock
Copy link
Member

dblock commented Jul 7, 2021

Are you going to release a new RC in July with the fix? My team can't use filebeat in production

This will be in 1.0 GA, shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests