merge from author #5

dengweisysu · 2019-10-09T06:00:48Z

No description provided.

Re-enable BWC tests now that #46534 has been backported to 7.x

Today we log and swallow exceptions during cluster state application, but such an exception should not occur. This commit adds assertions of this fact, and updates the Javadocs to explain it. Relates #47038

Simplify `SnapshotResiliencyTests` to more closely match the structure of `AbstractCoordinatorTestCase` and allow for future drying up between the two classes:

* Make the test cluster nodes a nested-class in the test cluster itself
* Remove the needless custom network disruption implementation and simply track disconnected node ids like `AbstractCoordinatorTestCase` does

Relates #30101

Relates #47107

This change removes unused parameters and simplifies the logic.

Relates #30101

We already prevent flushing in Engine if it's recovering. Hence, we can remove the protection in IndexShard.

#46180 added support for the `[source,console]` language for snippets which should be tested. This removes support for the `// CONSOLE` magic comment, which serves a similar purpose. Snippets that include the `// CONSOLE` magic comment will now cause an exception.

This PR removes a use-case of the ClusterFormationTasks and converts a project that flew under the radar so far. There's probably more clean-up possible here, but for now the goal is to be able to remove that code after `RunTask` is also updated.

* [DOCS] Reformats ranking evaluation API. Co-Authored-By: James Rodewig <[email protected]>

This PR changes ingest execution to be non-blocking by adding an additional method to the Processor interface that accepts a BiConsumer as handler, and by changing IngestService#executeBulkRequest(...) to ingest documents in a non-blocking fashion iff a processor executes in a non-blocking fashion. This is the second PR that merges changes made to the server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the PR that will merge enrich into master, so that these changes can be reviewed in isolation. This change originates from the enrich branch and was introduced there in #43361.

* ILM: parse origination date from index name Introduce the `index.lifecycle.parse_origination_date` setting that indicates if the origination date should be parsed from the index name. If set to true, an index which doesn't match the expected format (namely `indexName-{dateFormat}-optional_digits`) will fail before being created. The origination date will be parsed when initialising a lifecycle for an index and it will be set as the `index.lifecycle.origination_date` for that index. A user-set value for `index.lifecycle.origination_date` will always override a possible parsable date from the index name.
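The parsing rule described above can be sketched as follows. This is a minimal illustration in Python, not the Elasticsearch implementation: the concrete date format (`yyyy.MM.dd`), the regular expression, and the function names `parse_origination_date` and `resolve_origination_date` are all assumptions made for the example.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <indexName>-<yyyy.MM.dd>[-<digits>]. The date format is an
# assumption; the description only says `indexName-{dateFormat}-optional_digits`.
INDEX_NAME_PATTERN = re.compile(r"^.*-(\d{4}\.\d{2}\.\d{2})(-\d+)?$")


def parse_origination_date(index_name: str) -> int:
    """Return the date embedded in the index name as epoch milliseconds (UTC)."""
    match = INDEX_NAME_PATTERN.match(index_name)
    if match is None:
        # Mirrors "will fail before being created" for non-matching names.
        raise ValueError(f"index name [{index_name}] does not match the expected format")
    parsed = datetime.strptime(match.group(1), "%Y.%m.%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def resolve_origination_date(index_name: str, user_set_value=None) -> int:
    """A user-set origination date always wins over the parsed one."""
    if user_set_value is not None:
        return user_set_value
    return parse_origination_date(index_name)
```

For example, under these assumptions `resolve_origination_date("logs-2019.10.09-000001")` yields midnight UTC on 2019-10-09, while passing an explicit `user_set_value` returns that value unchanged.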

Today if metadata persistence is excessively slow on a master-ineligible node then the `ClusterApplierService` emits a warning indicating that the `GatewayMetaState` applier was slow, but gives no further details. If it is excessively slow on a master-eligible node then we do not see any warning at all, although we might see other consequences such as a lagging node or a master failure. With this commit we emit a warning if metadata persistence takes longer than a configurable threshold, which defaults to `10s`. We also emit statistics that record how much index metadata was persisted and how much was skipped since this can help distinguish cases where IO was slow from cases where there are simply too many indices involved.
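The threshold check described above amounts to timing the write and warning when it runs long. The sketch below illustrates that idea in Python; `persist_with_slow_warning`, the `persist` callable, and the message wording are hypothetical, and only the `10s` default comes from the description.

```python
import logging
import time

logger = logging.getLogger("metadata_persistence_sketch")


def persist_with_slow_warning(persist, threshold_seconds=10.0, clock=time.monotonic):
    """Run persist() and warn if it takes longer than the threshold.

    `persist` is a hypothetical callable returning a tuple
    (indices_written, indices_skipped). Returns (elapsed_seconds, warned).
    """
    start = clock()
    written, skipped = persist()
    elapsed = clock() - start
    warned = elapsed > threshold_seconds
    if warned:
        # Reporting written vs. skipped counts helps tell slow IO apart from
        # simply having too many indices to write.
        logger.warning(
            "writing metadata took %.1fs, above the %.1fs threshold; "
            "wrote %d indices and skipped %d unchanged indices",
            elapsed, threshold_seconds, written, skipped,
        )
    return elapsed, warned
```

Injecting the clock keeps the sketch testable without sleeping.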

This debug was never logged, so although #45652 is not yet fixed there is no point keeping it. The strange IndexNotFoundException comes entirely from the authz layer. Relates #46739 Relates #45652

Currently the logic to check if a connection to a remote discovery node exists, and otherwise create a proxy connection, is mixed with the node collection, cluster connection lifecycle, and other RemoteClusterConnection logic. This commit introduces a specialized RemoteConnectionManager class which handles the open connections. Additionally, it reworks the "round-robin" proxy logic to create the list of potential connections at connection open/close time, as opposed to each time a connection is requested.
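The reworked round-robin idea, rebuilding the candidate list only when a connection opens or closes, can be sketched like this. The class and method names below are hypothetical and the code is a Python illustration of the technique, not the actual RemoteConnectionManager.

```python
class RoundRobinConnectionsSketch:
    """Keeps an immutable snapshot of open connections and rotates over it."""

    def __init__(self):
        self._connections = ()  # rebuilt only on open/close, never per request
        self._counter = 0

    def on_opened(self, node_id):
        if node_id not in self._connections:
            self._connections = self._connections + (node_id,)

    def on_closed(self, node_id):
        self._connections = tuple(c for c in self._connections if c != node_id)

    def any_connection(self):
        """Pick the next connection in round-robin order."""
        if not self._connections:
            raise LookupError("no open connections")
        chosen = self._connections[self._counter % len(self._connections)]
        self._counter += 1
        return chosen
```

Requests only index into the prebuilt tuple, so the per-request cost no longer depends on walking all known nodes.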

This commit adds support for POST requests to the SLM `_execute` API, because POST is a more appropriate HTTP verb for this action as it is not idempotent. The docs are also changed to favor POST over PUT, although PUT is not removed or officially deprecated.

Enables support for Cartesian geometries shape type. We still need to decide how to handle the distance function since it is currently using the haversine distance formula and returns results in meters, which doesn't make any sense for Cartesian geometries. Closes #46412 Relates to #43644

* [ML][Inference] adding tree model * renaming features for updated schema

* Wait for snapshot completion in SLM snapshot invocation This changes the snapshots internally invoked by SLM to wait for completion. This allows us to capture more snapshotting failure scenarios. For example, previously a snapshot would be created and then registered as a "success", however, the snapshot may have been aborted, or it may have had a subset of its shards fail. These cases are now handled by inspecting the response to the `CreateSnapshotRequest` and ensuring that there are no failures. If any failures are present, the history store now stores the action as a failure instead of a success. Relates to #38461 and #43663

We can have a large number of shard copies in this test. For example, the two recent failures have 24 and 27 copies respectively and all replicas have to copy segment files as their stores are corrupted. Our CI needs more than 30 seconds to start all these copies. Note that in two recent failures, the cluster was green just after the cluster health timed out. Closes #41899

This commit removes a bunch of unused private fields and unused private methods from the code base.

…cified (#46897) This commit adds documentation to point out to the user that when one creates API keys with no role descriptor specified, the API key will have a point-in-time snapshot of the user's permissions. Closes #46876

…es (#47019) When we added support for wildcard application names, we started to build the prefix query along with the term query, but we used a 'filter' clause instead of 'should', so this would not fetch the correct application privilege descriptor, thereby failing the _has_privilege checks. This commit changes the clause to 'should' with minimum_should_match set to 1.

This commit adds a Java source formatter and checker into the build process. This is not yet enabled for any sub-projects - to format and check a sub-project, add its Gradle path into `build.gradle` and run:

    ./gradlew spotlessApply

to format, and:

    ./gradlew spotlessJavaCheck
    # or:
    ./gradlew precommit

to verify formatting.

) When deactivating a watch, there is a chance that it is fully deactivated and reporting as not running but the history is not fully written yet. There is not a tight coupling between the associated watcher history index and the deactivation. This test assumes that once a watch is deactivated, all history is fully written in a very short time period. If the Watch is deactivated but the history is slow to write, the test can fail. This change removes an assertion that assumes that the deactivation of a watch ensures that all of the watch history was written. There is still a minor race condition with respect to the remaining history assertions. However, if the history is slow to be written, the test will still pass. fixes #47503

Move the main endpoint from /_data_frame/transforms/ to /_transform/, providing backwards compatibility and deprecation warnings

…47611) This has ELambda and ENewArrayFunctionRef add their generated synthetic methods to the SClass node during the semantic pass and removes this data from the write pass. This is the first step to remove "Globals" (mutable state) from the write pass.

…7447) * [ML][Inference] adjusting definition object schema and validation * finalizing schema and fixing inference npe * addressing PR comments

If a thread pool rejection exception happens, an alternative code path is chosen to write history and delete the trigger. If an exception happens during deletion of the trigger, it may be thrown and not caught. This commit catches the exception and provides a meaningful error message. fixes #47008

This test is believed to be fixed by #43939 Closes #45585

These tests are believed to be fixed by #43939 closes #45582 and #43975

This test was less stable following a backport as the shards in these indices did not always show up as allocated immediately after closing them. This ensures those shards have stabilized before trying to roll over.

This test is believed to be fixed by #43939 closes #43889

This test is believed to be fixed by #43939 closes #43988

* Explicit name for doc snippets

  This change adds an option to specify an explicit name for a doc snippet. This name will be used instead of the line number when creating the yml file in the buildRestTests task. Stable names should improve tracking changes through history and allow Gradle to skip tests on non-code docs changes.
* Avoid duplication in names
* Changes id declaration, more examples
* Fix names in examples
* Unit test added
* Throw exception on duplicate names
* Moved UT to Java

The example use of a scoring script was incorrectly using a filter script query, which has no scoring, and thus no _score variable available. This commit converts the example doc to using the newer script_score query.

This enhances the existing SLM test using users/roles/etc to also test that SLM retention works when security is enabled. Relates to #43663

Previously when retrieving an SLM policy it would always return a 200 with `{}` in the body, even if the policy did not exist. This changes that behavior to throw an error (similar to our other APIs) if a policy doesn't exist. This also adds a basic CRUD yml test for the behavior. Resolves #47664

Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a bad idea since it will lead to the kinds of overshoot that were otherwise fixed in #46079. This commit deprecates this setting so it can be removed in the next major release.

Importing dangling indices with aliases risks breaking functionalities using those aliases. For instance, writing to an alias may break if there is no is_write_index indication on the existing alias and the dangling index import adds a second index to the alias. Or an application could have an assumption about the alias only ever pointing to one index and suddenly seeing the alias also linked to an old index could break it. With this change we strip aliases of the index meta data found before importing a dangling index.

Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a bad idea since it will lead to the kinds of overshoot that were otherwise fixed in #46079. This setting was deprecated in #47443. This commit removes it.

When exceptions could be returned from another node, the exception might be wrapped in a `RemoteTransportException`. In places where we handled specific exceptions using `instanceof` we ought to unwrap the cause first. This commit attempts to fix this issue after searching code in the ML plugin.
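The unwrap-before-checking pattern can be illustrated with a small Python analogue. The exception classes here are stand-ins defined only for this example (they are not the Java classes), and `unwrap_cause` is a hypothetical helper name.

```python
class RemoteTransportException(Exception):
    """Stand-in for a wrapper raised when an error crosses a node boundary."""


class ResourceNotFoundException(Exception):
    """Stand-in for a specific exception that callers want to handle."""


def unwrap_cause(exc):
    """Peel off transport wrappers until the underlying cause is reached."""
    while isinstance(exc, RemoteTransportException) and exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def is_not_found(exc) -> bool:
    # Checking the type of `exc` directly would miss a wrapped remote failure,
    # so the type check is applied to the unwrapped cause instead.
    return isinstance(unwrap_cause(exc), ResourceNotFoundException)
```

A locally raised `ResourceNotFoundException` and one wrapped in `RemoteTransportException` are then handled identically.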

) * Convert RunTask to use testclusters, remove ClusterFormationTasks This PR adds a new RunTask and a way for it to start a testclusters cluster out of band and block on it to replace the old RunTask that used ClusterFormationTasks. With this we can now remove ClusterFormationTasks.

* Add Snapshot Lifecycle Retention documentation

  This commit adds API and general purpose documentation for SLM retention. Relates to #43663
* Fix docs tests
* Update default now that #47604 has been merged
* Update docs/reference/ilm/apis/slm-api.asciidoc Co-Authored-By: Gordon Brown <[email protected]>
* Update docs/reference/ilm/apis/slm-api.asciidoc Co-Authored-By: Gordon Brown <[email protected]>
* Update docs with feedback

Similar to #47507. We are throwing `SnapshotException` when you (and SLM tests) would expect a `SnapshotMissingException` for concurrent snapshot status and snapshot delete operations with a very low probability. Fixed the exception type and added a test for this scenario.

This commit introduces a simple remote connection strategy which will open remote connections to a configurable list of user supplied addresses. These addresses can be remote Elasticsearch nodes or intermediate proxies. We will perform normal clustername and version validation, but otherwise rely on the remote cluster to route requests to the appropriate remote node.

Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful. With this change, failed snapshots are deleted using the following strategy: `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted. If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either `min_count` or `max_count`.
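The deletion strategy for failed snapshots can be written as a small predicate. The sketch below is a Python illustration with hypothetical names (`Snapshot`, `failed_snapshot_is_deletable`); treating "period has passed" as an inclusive comparison is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    name: str
    start_time: int  # epoch milliseconds
    state: str       # "SUCCESS" or "FAILED"


def failed_snapshot_is_deletable(
    snapshot: Snapshot,
    policy_snapshots: List[Snapshot],
    now: int,
    expire_after_millis: Optional[int],
) -> bool:
    """Decide whether a FAILED snapshot from a policy should be deleted."""
    if snapshot.state != "FAILED":
        raise ValueError("this rule only applies to FAILED snapshots")
    if expire_after_millis is not None:
        # With expire_after configured, keep the failure until it has expired.
        return now - snapshot.start_time >= expire_after_millis
    # Without expire_after, delete only if a more recent success exists.
    return any(
        s.state == "SUCCESS" and s.start_time > snapshot.start_time
        for s in policy_snapshots
    )
```

Because failed snapshots do not count towards `min_count` or `max_count`, a caller would apply this predicate separately from the count-based retention rules.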

If a cluster sending monitoring data is unhealthy and triggers an alert, then stops sending data, the following exception [1] can occur. This exception stops the current Watch, and the behavior is actually correct in part due to the exception. Simply fixing the exception introduces some incorrect behavior: now that the Watch does not error in this case, it will result in an incorrectly "resolved" alert. The fix here has two parts: a) fix the exception, and b) fix the incorrect behavior that follows.

a) Fixing the exception is as easy as checking the size of the array before accessing it.

b) Fixing the incorrect behavior that follows is a bit more intrusive. Note that the UI depends on the success/met state for each condition to determine an "OK" or "FIRING" status. In this scenario, where an unhealthy cluster triggers an alert and then goes silent, it should keep "FIRING" until it hears back that the cluster is green. To keep the Watch "FIRING", either the index action or the email action needs to fire. Since the Watch is neither a "new" alert nor a "resolved" alert, we do not want to keep sending an email (that would be non-passive too). Without completely changing the logic of how an alert is resolved, allowing the index action to take place would result in the alert being resolved. Since we cannot keep "FIRING" with either the email or the index action (we don't want to resolve the alert, nor rewrite the logic for alert resolution), we introduce a third action: a logging action that WILL fire when the cluster is unhealthy. Specifically, it will fire when there is an unresolved alert and it cannot find the cluster state. This logging action is logged at debug, so it should not be noticed much. It serves as an 'anchor' for the UI to keep the state in a "FIRING" status until the alert is resolved. This presents a possible scenario where a cluster starts firing, then goes completely silent forever, and the Watch will be "FIRING" forever. This is an edge case that already exists in some scenarios and requires manual intervention to remove that Watch.

This change also uses a template-like method to populate the version_created for the default monitoring watches. The version is set to 7.5 since that is where this is first introduced. Fixes #43184

* Separate SLM stop/start/status API from ILM

  This separates a start/stop/status API for SLM from being tied to ILM's operation mode. These APIs look like:

  ```
  POST /_slm/stop
  POST /_slm/start
  GET /_slm/status
  ```

  This allows administrators to have fine-grained control over preventing periodic snapshots and deletions while performing cluster maintenance. Relates to #43663
* Allow going from RUNNING to STOPPED
* Align with the OperationMode rules
* Fix slmStopping method
* Make OperationModeUpdateTask constructor private
* Wipe snapshots better in test

This commit adds a thread filter for gradle unit tests which omits threads gradle may create that we have no control over shutting down. The current example of this is for project.exec which gradle pools. closes #47417

jkakavas and others added 30 commits September 25, 2019 14:30

Re-enable BWC tests (#47101)

bd5573d

Re-enable BWC tests now that #46534 has been backported to 7.x

Assert no exceptions during state application (#47090)

053e95b

Today we log and swallow exceptions during cluster state application, but such an exception should not occur. This commit adds assertions of this fact, and updates the Javadocs to explain it. Relates #47038

Disable the use of artifactory in CI (#47100)

cbe8b36

Mute monitoring/bulk/20_privileges

af57562

Relates #30101

Mute ClusterShardLimitIT.testIndexCreationOverLimitFromTemplate

b52d1da

Relates #47107

Tidy up Store#trimUnsafeCommits (#47062)

65374c9

This change removes unused parameters and simplifies the logic.

Mute second test in monitoring/bulk/10_basic

de88249

Relates #30101

Remove isRecovering method from Engine (#47039)

9df6cbe

We already prevent flushing in Engine if it's recovering. Hence, we can remove the protection in IndexShard.

[DOCS] Reformat suggesters page. (#47010)

69422b9

[DOCS] Reformats ranking evaluation API (#46974)

36502b2

* [DOCS] Reformats ranking evaluation API. Co-Authored-By: James Rodewig <[email protected]>

Remove temporary _cat/indices debug (#47116)

f565b1d

This debug was never logged, so although #45652 is not yet fixed there is no point keeping it. The strange IndexNotFoundException comes entirely from the authz layer. Relates #46739 Relates #45652

[DOCS] Reformat clone index API docs (#46762)

48471b2

Track enabled test task candidate class files as task input (#47054)

a67180d

[DOCS] Fix links to transform pages (#47134)

6c64566

[ML][Inference] adding tree model (#47044)

85f1272

* [ML][Inference] adding tree model * renaming features for updated schema

Remove unused private methods and fields (#47115)

b1a03a1

This commit removes a bunch of unused private fields and unused private methods from the code base.

jrodewig and others added 29 commits October 7, 2019 09:37

[DOCS] Correct deprecation note in mapping docs (#47656)

5cec47d

[Transform] move root endpoint to _transform with BWC layer (#47127)

e9e121c

Move the main endpoint from /_data_frame/transforms/ to /_transform/, providing backwards compatibility and deprecation warnings

[ML][Inference] adjusting definition object schema and validation (#4…

890b3db

…7447) * [ML][Inference] adjusting definition object schema and validation * finalizing schema and fixing inference npe * addressing PR comments

Re-enable Watcher rest test (#47687)

7db3dc3

This test is believed to be fixed by #43939 Closes #45585

Re-enable Watcher rest tests (#47690)

07ed3d5

These tests are believed to be fixed by #43939 closes #45582 and #43975

Ensure index is green after closing in test (#47541)

8b38e9f

This test was less stable following a backport as the shards in these indices did not always show up as allocated immediately after closing them. This ensures those shards have stabilized before trying to roll over.

Re-enable Watcher rest tests (#47692)

04a1b1d

This test is believed to be fixed by #43939 closes #43889

Re-enable Watcher rest test (#47699)

9358b2f

This test is believed to be fixed by #43939 closes #43988

Switch stored script example to script_score query (#47691)

4de0d77

The example use of a scoring script was incorrectly using a filter script query, which has no scoring, and thus no _score variable available. This commit converts the example doc to using the newer script_score query.

Add a test for SLM retention with security enabled (#47608)

e0c2ac1

This enhances the existing SLM test using users/roles/etc to also test that SLM retention works when security is enabled. Relates to #43663

Remove include_relocations setting (#47717)

7b652ad

Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a bad idea since it will lead to the kinds of overshoot that were otherwise fixed in #46079. This setting was deprecated in #47443. This commit removes it.

[DOCS] Fix errors in rollover index API docs (#47702)

29c6da7

[ML] Add basic BWC tests for data frame analytics (#47212)

2246767

Filter out special gradle threads from leak control (#47713)

603c3e6

This commit adds a thread filter for gradle unit tests which omits threads gradle may create that we have no control over shutting down. The current example of this is for project.exec which gradle pools. closes #47417

dengweisysu merged commit 50a4f37 into dengweisysu:master Oct 9, 2019
