Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate results and model snapshot retention using latest bucket timestamps #51061

Merged
merged 14 commits into from
Jan 22, 2020

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Jan 15, 2020

The retention period is calculated relative to the last bucket result or model snapshot. For example if results retention is set to 30 days and the last bucket result was from 10 days ago then only results older than 40 days will be deleted. The same logic applies to model snapshots, but measured from the timestamp of the most recent model snapshot.

Previously retention was calculated relative to wall clock time, which was surprising for jobs that were not running continuously.

There is still an element of confusion, because model snapshot timestamps record the wall clock time that the model snapshot was created, not the model time. However, this change is still an improvement in that if you stop a real-time job for a period of days then you don't lose all model snapshots other than the active one.

@davidkyle davidkyle added WIP :ml Machine learning v8.0.0 labels Jan 15, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

terminate(threadPool);
}

public void testRemove_GivenJobsWithoutRetentionPolicy() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to create a job without a model snapshot retention policy. If you set modelSnapshotRetentionDays to null then the builder will automatically default it back to 1 when the job is read back from xcontent.

Negative numbers are not tolerated and throw a validation exception so the only way of not having a retention policy is setting it to a large value

@droberts195 droberts195 changed the title Calculate results and snapshot retention using latest timestamps Calculate results and snapshot retention using latest bucket timestamps Jan 16, 2020
@davidkyle davidkyle added v7.7.0 and removed WIP labels Jan 16, 2020
The time in days that model snapshots are retained for the job. Older snapshots
are deleted. The default value is `1`, which means snapshots are retained for
one day (twenty-four hours).
Advanced configuration option. Denotes the period for which model snapshots
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szabosteve can you look over these docs changes please. Do they make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidkyle Sorry, I haven't noticed this one earlier.

Copy link
Contributor

@szabosteve szabosteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs changes LGTM. Thank you for the documentation effort!

// We are going to create data for last 2 days
long nowMillis = System.currentTimeMillis();
// We are going to create 2 days of data starting 24 hrs ago
long lastestBucketTime = System.currentTimeMillis() - TimeValue.timeValueHours(1).millis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: lastest -> latest

@@ -57,15 +57,15 @@ public void setUpData() throws IOException {
.setMapping("time", "type=date,format=epoch_millis")
.get();

// We are going to create data for last 2 days
long nowMillis = System.currentTimeMillis();
// We are going to create 2 days of data starting 24 hrs ago
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We are going to create 2 days of data starting 24 hrs ago
// We are going to create 3 days of data ending 1 hour ago

long nowEpochMs = Instant.now(Clock.systemDefaultZone()).toEpochMilli();
return nowEpochMs - new TimeValue(retentionDays, TimeUnit.DAYS).getMillis();
listener.onResponse(nowEpochMs - new TimeValue(retentionDays, TimeUnit.DAYS).getMillis());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this method be abstract instead of providing a default based on wall clock time? It seems that now we've made deletion of model snapshots and results relative to latest bucket time rather than wall clock time we should do that for all job related documents that have a timestamp. So having a default implementation of this method that uses wall clock time just seems like a way that we'll introduce a bug by accidentally deleting some other type of document based on wall clock time.

ActionListener<SearchResponse> listener = (ActionListener<SearchResponse>) invocationOnMock.getArguments()[2];
listener.onResponse(response);
return null;
doAnswer(invocationOnMock -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is indented more than the line above.

}

private WrappedBatchedJobsIterator newJobIterator() {
BatchedJobsIterator jobsIterator = new BatchedJobsIterator(client, AnomalyDetectorsIndex.configIndexName());
return new WrappedBatchedJobsIterator(jobsIterator);
}

private long calcCutoffEpochMs(long retentionDays) {
void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Long> listener) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other methods that extending classes are expected to override are protected, even though the actual classes that do extend this class are all in the same package. But then this class's constructor is package private so it would be impossible to have a derived class in another package despite the abstract methods being set up for that. I think for consistency they should all be the same - either protected or package private. Certainly with a default implementation here I think protected makes it clearer that we expect derived classes to modify it rather than it's just been made accessible for testing. But then if you agree with my other suggestion and make this abstract then that also makes that clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is package-private for testing otherwise it is very difficult to test and can only be done indirectly. abstract makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But protected would also allow it to be tested directly. Basically I think all the abstract methods and the constructor should have the same accessibility, whether that be protected or package private. So either change the ones that are currently protected to be package private or change this one plus the constructor to be protected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the base class is package private, only classes in the same package can implement the abstract method whether the method is protected or package private. The only difference is I could derive a new class from one of the package's public non-abstract classes and reimplement calcCutoffEpochMs in a different package if it was protected but not if package private. In practice this isn't a concern so I've gone for the principle of least visibility and made the abstract methods package private.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I saw a couple of nits but happy to merge without further review


protected abstract Long getRetentionDays(Job job);
abstract Long getRetentionDays(Job job);

/**
* Template method to allow implementation details of various types of data (e.g. results, model snapshots).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next two methods (removeDataBefore and createQuery) might as well be package private too for consistency with the other abstract methods.

@@ -57,15 +57,15 @@ public void setUpData() throws IOException {
.setMapping("time", "type=date,format=epoch_millis")
.get();

// We are going to create data for last 2 days
long nowMillis = System.currentTimeMillis();
// We are going to create 3 days of data starting 1 hr ago
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We are going to create 3 days of data starting 1 hr ago
// We are going to create 3 days of data ending 1 hr ago

@davidkyle
Copy link
Member Author

Thanks for the rewrites @lcawl

@davidkyle
Copy link
Member Author

run elasticsearch-ci/default-distro

@davidkyle davidkyle merged commit 7978f0b into elastic:master Jan 22, 2020
@davidkyle davidkyle deleted the results-retention branch January 22, 2020 10:08
davidkyle added a commit that referenced this pull request Jan 22, 2020
…estamps (#51061) (#51301)

The retention period is calculated relative to the last bucket result or snapshot
time rather than wall clock
@droberts195 droberts195 changed the title Calculate results and snapshot retention using latest bucket timestamps Calculate results and model snapshot retention using latest bucket timestamps May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants