Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Relax allowed delta in extended_stats aggregation #27171

Merged
merged 1 commit into from
Nov 10, 2017

Conversation

cbuescher
Copy link
Member

The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.

e.g. the CI failure that triggered this PR:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/4835/testReport/junit/org.elasticsearch.search.aggregations.metrics/InternalExtendedStatsTests/testReduceRandom/

@cbuescher cbuescher added >enhancement >test Issues or PRs that are addressing/adding tests v7.0.0 review labels Oct 30, 2017
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue appears only when we add an incremental reduce in InternalAggregationTestCase#testReduceRandom.
In this case the internal aggregations are shuffled and we do the reduce in two steps. This breaks the global ordering of the reduce and causes all sorts of issues when using doubles. Could we instead fix the base class to preserve the order like this:

diff --git a/test/framework/src/main/java/org/elasticsearch/test/InternalAggregationTestCase.java b/test/framework/src/main/java/org/elasticsearch/test/InternalAggregationTestCase.java
index fa342ee468..ee258a8c06 100644
--- a/test/framework/src/main/java/org/elasticsearch/test/InternalAggregationTestCase.java
+++ b/test/framework/src/main/java/org/elasticsearch/test/InternalAggregationTestCase.java
@@ -231,15 +231,14 @@ public abstract class InternalAggregationTestCase<T extends InternalAggregation>
         MockBigArrays bigArrays = new MockBigArrays(Settings.EMPTY, new NoneCircuitBreakerService());
         if (randomBoolean() && toReduce.size() > 1) {
             // sometimes do an incremental reduce
-            Collections.shuffle(toReduce, random());
             int r = randomIntBetween(1, toReduceSize);
             List<InternalAggregation> internalAggregations = toReduce.subList(0, r);
             InternalAggregation.ReduceContext context =
                 new InternalAggregation.ReduceContext(bigArrays, mockScriptService, false);
             @SuppressWarnings("unchecked")
             T reduced = (T) inputs.get(0).reduce(internalAggregations, context);
-            toReduce = new ArrayList<>(toReduce.subList(r, toReduceSize));
-            toReduce.add(reduced);
+            toReduce = toReduce.subList(r, toReduceSize);
+            toReduce.add(0, reduced);
         }
         InternalAggregation.ReduceContext context =
             new InternalAggregation.ReduceContext(bigArrays, mockScriptService, true);

This way we can change all the deltas to 0 and remove the true negatives that could hide a bug ?

@cbuescher cbuescher force-pushed the fix-InternalExtendedStatsTest branch from 6da39f2 to c7887cf Compare November 6, 2017 12:43
@cbuescher
Copy link
Member Author

@jimczi thanks for the review, I changed the PR according to your suggestions by removing the random shuffling. However, looking at the commit that first introduced the randomization (f933f80#diff-8e77ff5eaf3bae1ff401adfc0507766cR60), I'm wondering if we are still testing the right thing here then. When I read that commit right, this branch of the reduce test is there to simulate the behaviour of the aggregtion reduce phase in a real cluster where some shard aggregation results are reduced before others, which also breaks ordering. So getting the small deltas when doing this for many double values is something we also see in the real system behaviour, shouldn't we try to account for that in tests also? To me e.g. it was important to see in tests that changing the ordering here affects the result slightly, something that we also need to explain (and detect if it changes significantly). I'm happy to merge, but I'm rather for allowing small deltas here and be aware of the behaviour that I think also happens in non-unit test scenarios.

@cbuescher cbuescher force-pushed the fix-InternalExtendedStatsTest branch from c7887cf to 2559164 Compare November 6, 2017 15:18
@cbuescher
Copy link
Member Author

@jimczi could you take another look at this? I changed the tests according to your suggestions, but the more I think about it I'd prefer going back to the previous option of having the randomization and allowing (and testing) small deviations). Maybe we should get another opinion on this? That said, I'm fine either way, just let me know what you think about my reasons stated above.

@jimczi
Copy link
Contributor

jimczi commented Nov 10, 2017

I changed the tests according to your suggestions, but the more I think about it I'd prefer going back to the previous option of having the randomization and allowing (and testing) small deviations).

I don't have a strong opinion on this but I don't like that we need a delta to test a simple sum.
Could we test with random integers instead ? The order should not matter in this case and we can make safe assert ?

@cbuescher
Copy link
Member Author

cbuescher commented Nov 10, 2017

I don't have a strong opinion on this but I don't like that we need a delta to test a simple sum.

I completely agree, but IMHO the purpose of this test is not just testing a sum but to test the behaviour we get on a real multi-shard aggregation where we get those fluctuations with incremental reduction.

Could we test with random integers instead ? The order should not matter in this case and we can make safe assert ?

I don't think this works because internally we use double values in the aggregations. I will try though.

@cbuescher
Copy link
Member Author

Could we test with random integers instead ? The order should not matter in this case and we can make safe assert ?

Unfortunately this doesn't seem to work, we store the sum as a double in the InternalStats aggregation, which already introduces some inexactness.

@jimczi
Copy link
Contributor

jimczi commented Nov 10, 2017

Yes sorry I meant small integers but this is also broken. We already bound the doubles that we randomly create to -1M, 1M so I am ok with the small delta change to fix this issue for the moment.
We can discuss the reduce issue separately since it's the same for all aggregator tests.
No need for another review, you can push the delta change directly if you want.

The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.
@cbuescher cbuescher force-pushed the fix-InternalExtendedStatsTest branch from 2559164 to 8e4a2d2 Compare November 10, 2017 12:07
@cbuescher cbuescher merged commit 4fa33e7 into elastic:master Nov 10, 2017
cbuescher added a commit that referenced this pull request Nov 10, 2017
The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.
cbuescher added a commit that referenced this pull request Nov 10, 2017
The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.
martijnvg added a commit that referenced this pull request Nov 14, 2017
* es/master: (24 commits)
  Reduce synchronization on field data cache
  add json-processor support for non-map json types (#27335)
  Properly format IndexGraveyard deletion date as date (#27362)
  Upgrade AWS SDK Jackson Databind to 2.6.7.1
  Stop responding to ping requests before master abdication (#27329)
  [Test] Fix POI version in packaging tests
  Allow affix settings to specify dependencies (#27161)
  Tests: Improve size regex in documentation test (#26879)
  reword comment
  Remove unnecessary logger creation for doc values field data
  [Geo] Decouple geojson parse logic from ShapeBuilders
  [DOCS] Fixed link to docker content
  Plugins: Add versionless alias to all security policy codebase properties (#26756)
  [Test] #27342 Fix SearchRequests#testValidate
  [DOCS] Move X-Pack-specific Docker content (#27333)
  Fail queries with scroll that explicitely set request_cache (#27342)
  [Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts()
  Set minimum_master_nodes to all nodes for REST tests (#27344)
  [Tests] Relax allowed delta in extended_stats aggregation (#27171)
  Remove S3 output stream (#27280)
  ...
@lcawl lcawl removed the v6.1.0 label Dec 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>test Issues or PRs that are addressing/adding tests v6.0.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants