-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix InternalAutoDateHistogram reproducible failure #32723
Conversation
…incorrect reference to innerintervals
Pinging @elastic/es-search-aggs |
RoundingInfo[] roundings = new RoundingInfo[6]; | ||
DateTimeZone timeZone = DateTimeZone.UTC; | ||
roundings[0] = new RoundingInfo(createRounding(DateTimeUnit.SECOND_OF_MINUTE, timeZone), | ||
1000L, 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using a large innerInterval for the first rounding used by the function is important to trigger the problematic code path
|
||
OffsetDateTime timestamp = Instant.parse("2018-01-01T00:00:01.000Z").atOffset(ZoneOffset.UTC); | ||
int result = InternalAutoDateHistogram.getAppropriateRounding(timestamp.toEpochSecond()*1000, | ||
timestamp.plusDays(1).toEpochSecond()*1000, 0, roundings, 25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've also gotta start with an index that we know won't be used: to trigger the code path, the function needs to iterate past the first rounding.
On repeated runs I still see failures. For instance a run of 1000 iterations yielded this:
I can reproduce this on the command line:
However I haven't been able to get this single case to reproduce. |
import java.time.Instant; | ||
import java.time.OffsetDateTime; | ||
import java.time.ZoneOffset; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jenkins is not amused ;) There's a setting in IntelliJ to avoid wildcard imports (at least until it hits an unreasonable number, I use 9999) and probably in other IDEs too.
@DaveCTurner thanks for the review, I'll track down the failure. once more into the breach! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a minor comment but I think the change looks good. I haven't approved the PR only because it looks like there is another failure that is being looked into as well here.
@@ -77,6 +84,22 @@ protected InternalAutoDateHistogram createTestInstance(String name, | |||
return new InternalAutoDateHistogram(name, buckets, targetBuckets, bucketInfo, format, pipelineAggregators, metaData); | |||
} | |||
|
|||
public void testGetAppropriateRounding() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be testing a specific scenario rather than testing the method in general. Could we maybe make the name reflect this so its not assumed that its testing the method in general? Would it also be helpful to add a comment/javaDoc which explains the scenario it is testing?
I was able to reproduce the problem @DaveCTurner found reliably by adding the |
@DaveCTurner @colings86 this is ready (fingers crossed) for final review: I reproduced the failures and believe I have addressed them, and have run a bunch of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good but it seems you left the @Seed
annotation on the test which need to be removed but I also think it would be worth running a battery of test runs again to make sure there aren't more bugs lurking when the seed is removed. Assuming that battery of tests passes this LGTM
@@ -229,6 +230,7 @@ protected T createUnmappedInstance(String name, | |||
return createTestInstance(name, pipelineAggregators, metaData); | |||
} | |||
|
|||
@Seed("ED2D551807CFA25B") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you intend to commit this?
Removing the
|
If it helps, my preferred way to run lots of iterations of tests like this is the following:
This runs the tests until they fail, capturing the log output, and tries different timezones and locales (which I guess to be important here) in a way that a single |
I've used @DaveCTurner's command as a test, and after running a bunch of local iterations, I believe this PR is ready for re-review. This proved quite tricky to get correct! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for beasting the tests @DaveCTurner and @pcsanwald
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - one nit about whitespace
@@ -480,7 +480,7 @@ private int getAppropriateRounding(long minKey, long maxKey, int roundingIdx, | |||
currentKey = currentRounding.nextRoundingValue(currentKey); | |||
} | |||
currentRoundingIdx++; | |||
} while (requiredBuckets > (targetBuckets * roundings[roundingIdx].getMaximumInnerInterval()) | |||
} while (requiredBuckets > (targetBuckets * roundings[currentRoundingIdx-1].getMaximumInnerInterval()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitespace nit: indentation looks wrong, and spaces around the -
sign?
* elastic/master: (46 commits) NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764) [DOCS] Splits the users API documentation into multiple pages (#32825) [DOCS] Splits the token APIs into separate pages (#32865) [DOCS] Creates redirects for role management APIs page Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966) TEST: Mute testRetentionPolicyChangeDuringRecovery [DOCS] Fixes more broken links to role management APIs [Docs] Tweaks and fixes to rollup docs [DOCS] Fixes links to role management APIs [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature [DOCS] Splits the roles API documentation into multiple pages (#32794) [TEST] Run pre 6.4 nodes in non-FIPS JVMs (#32901) Make Geo Context Mapping Parsing More Strict (#32821) [ML] fix updating opened jobs scheduled events (#31651) (#32881) Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597) Tests: Fix timezone conversion in DateTimeUnitTests Enable FIPS140LicenseBootstrapCheck (#32903) Fix InternalAutoDateHistogram reproducible failure (#32723) Remove assertion in testDocStats on deletedDocs counter (#32914) HLRC: Move ML request converters into their own class (#32906) ...
Closes #32215. This PR addresses two issues:
The problem here was a mistake in setting up the expected results: I had originally written the test setup to start with the smallest possible
innerInterval
and try the larger ones, but the approach in the reduce method is to try the largest and step down. I've updated the logic to reflect this and run the test with-Dtests.iters=200
to convince myself this won't cause a further failure.maximumInnerInterval
of the current rounding, we use the index of the initial rounding passed in.To fix this, I've updated the logic and added a test case that fails using the old code path, and passes with the modified code path.