Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use String.join() to describe a list of tasks #28941

Conversation

DaveCTurner
Copy link
Contributor

This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.

This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
@DaveCTurner DaveCTurner added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.3.0 v5.6.9 labels Mar 8, 2018
@DaveCTurner DaveCTurner requested a review from bleskes March 8, 2018 14:12
@DaveCTurner
Copy link
Contributor Author

NB I haven't been able to reproduce the slowness reported in #28920, because I don't have easy access to a partitionable cluster with 10k shards. @danielmitterdorfer do you have any opinions about this change and/or good ideas about how to quantify the benefit?

If this change looks ok, I can ask the OP of #28920 to try it.

@danielmitterdorfer
Copy link
Member

do you have any opinions about this change

I am not familiar with this part of the code. But if this is the bottleneck then your change makes sense to me.

good ideas about how to quantify the benefit?

I think this would be a good candidate for a microbenchmark? We have some infrastructure for that in place in the benchmarks module. You can also have a look at an example benchmark that I did recently in the context of #28702. Happy to help out here as well and we can also run it in our microbenchmarking environment (which is tuned to reduce measurement noise).

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return s1 + ", " + s2;
}
}).orElse("");
return String.join(", ", tasks.stream().map(t -> (CharSequence)t.toString()).filter(t -> t.length() == 0)::iterator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity - didn't T::toString work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, apparently Iterator<String> cannot be converted to Iterator<CharSequence>:

> Task :server:compileJava 
/Users/davidturner/src/elasticsearch-master/server/src/main/java/org/elasticsearch/cluster/ClusterStateTaskExecutor.java:59: error: no suitable method found for join(String,tasks.stre[...]rator)
        return String.join(", ", tasks.stream().map(T::toString).filter(t -> t.length() == 0)::iterator);
                     ^
    method String.join(CharSequence,CharSequence...) is not applicable
      (varargs mismatch; CharSequence is not a functional interface
          multiple non-overriding abstract methods found in interface CharSequence)
    method String.join(CharSequence,Iterable<? extends CharSequence>) is not applicable
      (argument mismatch; bad return type in method reference
          Iterator<String> cannot be converted to Iterator<CharSequence>)

@DaveCTurner
Copy link
Contributor Author

Thanks @danielmitterdorfer for the pointer to the benchmarks module, that's just what I was after. I just overwrote the existing benchmark with this one: https://gist.github.com/DaveCTurner/de8763bc791d860e4fb0c9a9f98df7cd. At 10,000 tasks, each with a 100-byte description, the improvement is from ~500ms to ~120µs:

Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStreamReduce":
  578.282 ±(99.9%) 14.884 ms/op [Average]
  (min, avg, max) = (534.191, 578.282, 633.771), stdev = 22.278
  CI (99.9%): [563.398, 593.166] (assumes normal distribution)

...


Result "org.elasticsearch.benchmark.routing.allocation.AllocationBenchmark.measureStringJoin":
  0.119 ±(99.9%) 0.004 ms/op [Average]
  (min, avg, max) = (0.114, 0.119, 0.138), stdev = 0.006
  CI (99.9%): [0.115, 0.123] (assumes normal distribution)


# Run complete. Total time: 00:02:14

Benchmark                                Mode  Cnt    Score    Error  Units
AllocationBenchmark.measureStreamReduce  avgt   30  578.282 ± 14.884  ms/op
AllocationBenchmark.measureStringJoin    avgt   30    0.119 ±  0.004  ms/op

This seems worth doing.

@danielmitterdorfer
Copy link
Member

You're welcome. That's quite a significant difference indeed.

@DaveCTurner DaveCTurner merged commit 033a83b into elastic:master Mar 9, 2018
DaveCTurner added a commit that referenced this pull request Mar 9, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 9, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and elastic#28920
includes a report that this operation sometimes takes a long time.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 12, 2018
* master: (28 commits)
  Maybe die before failing engine (elastic#28973)
  Remove special handling for _all in nodes info
  Remove Booleans use from XContent and ToXContent (elastic#28768)
  Update Gradle Testing Docs (elastic#28970)
  Make primary-replica resync failures less lenient (elastic#28534)
  Remove temporary file 10_basic.yml~
  Use different pipeline id in test. (pipelines do not get removed between tests extending from ESIntegTestCase)
  Use fixture to test the repository-gcs plugin (elastic#28788)
  Use String.join() to describe a list of tasks (elastic#28941)
  Fixed incorrect test try-catch statement
  Plugins: Consolidate plugin and module loading code (elastic#28815)
  percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
  Build: Remove rest tests on archive distribution projects (elastic#28952)
  Remove FastStringReader in favor of vanilla StringReader (elastic#28944)
  Remove FastCharArrayReader and FastCharArrayWriter (elastic#28951)
  Continue registering pipelines after one pipeline parse failure. (elastic#28752)
  Build: Fix ability to ignore when no tests are run (elastic#28930)
  [rest-api-spec] update doc link for /_rank_eval
  Switch XContentBuilder from BytesStreamOutput to ByteArrayOutputStream (elastic#28945)
  Factor UnknownNamedObjectException into its own class (elastic#28931)
  ...
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 1, 2018
In elastic#28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This PR fixes that.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 2, 2018
In elastic#28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that. Backport of elastic#34182.
DaveCTurner added a commit that referenced this pull request Oct 2, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that. Backport of #34182.
kcm pushed a commit that referenced this pull request Oct 30, 2018
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
@DaveCTurner DaveCTurner deleted the 2018-03-08-describeTasks-using-String-join branch July 23, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >non-issue v5.6.9 v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants