Calculate results and model snapshot retention using latest bucket timestamps #51061

davidkyle · 2020-01-15T17:54:29Z

The retention period is calculated relative to the last bucket result or model snapshot. For example, if results retention is set to 30 days and the last bucket result is from 10 days ago, then only results older than 40 days are deleted. The same logic applies to model snapshots, but measured from the timestamp of the most recent model snapshot.
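As a rough illustration of the arithmetic above, the sketch below contrasts the two ways of computing the cutoff. The class and method names are hypothetical and are not taken from this PR; only the subtraction itself reflects the description.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: the class and method names are hypothetical, not code from this PR. */
final class RetentionCutoff {

    private RetentionCutoff() {}

    /** New behaviour: measure the cutoff back from the latest bucket (or snapshot) timestamp. */
    static long cutoffFromLatestTimestamp(long latestTimestampMs, long retentionDays) {
        return latestTimestampMs - Duration.ofDays(retentionDays).toMillis();
    }

    /** Previous behaviour: measure the cutoff back from wall clock time. */
    static long cutoffFromWallClock(Instant now, long retentionDays) {
        return now.toEpochMilli() - Duration.ofDays(retentionDays).toMillis();
    }
}
```

With a latest bucket 10 days old and 30 days of retention, `cutoffFromLatestTimestamp` lands 40 days before now, whereas `cutoffFromWallClock` lands 30 days before now.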

Previously retention was calculated relative to wall clock time, which was surprising for jobs that were not running continuously.

There is still an element of confusion, because model snapshot timestamps record the wall clock time at which the model snapshot was created, not the model time. However, this change is still an improvement: if you stop a real-time job for a period of days, you no longer lose all model snapshots other than the active one.

elasticmachine · 2020-01-15T17:54:31Z

Pinging @elastic/ml-core (:ml)

davidkyle · 2020-01-15T18:03:19Z

...rc/test/java/org/elasticsearch/xpack/ml/job/retention/ExpiredModelSnapshotsRemoverTests.java

-        terminate(threadPool);
-    }
-
-    public void testRemove_GivenJobsWithoutRetentionPolicy() throws IOException {


There is no way to create a job without a model snapshot retention policy. If you set modelSnapshotRetentionDays to null then the builder will automatically default it back to 1 when the job is read back from xcontent.

Negative numbers are not tolerated and throw a validation exception, so the only way of not having a retention policy is to set it to a large value.
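The behaviour described in this comment can be sketched roughly as follows. This is a hypothetical stand-in and not the real job builder: the class name, method name and exception type are assumptions, and only the "null defaults to 1" and "negative values are rejected" rules come from the comment.

```java
/** Hypothetical stand-in for how the builder treats model snapshot retention days. */
final class SnapshotRetentionSketch {

    /** Default applied when no value is configured, as stated in the comment above. */
    static final long DEFAULT_RETENTION_DAYS = 1L;

    private SnapshotRetentionSketch() {}

    static long effectiveRetentionDays(Long configuredDays) {
        if (configuredDays == null) {
            // A null setting falls back to the default rather than disabling retention.
            return DEFAULT_RETENTION_DAYS;
        }
        if (configuredDays < 0) {
            // Assumed exception type; the comment only says a validation exception is thrown.
            throw new IllegalArgumentException("retention days cannot be negative: " + configuredDays);
        }
        // Zero is passed through here; the thread does not say how the real builder treats it.
        return configuredDays;
    }
}
```

Under these rules a policy is always in effect, which is why a test for jobs "without retention policy" no longer has anything to exercise.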

davidkyle · 2020-01-16T10:36:35Z

docs/reference/ml/ml-shared.asciidoc

-The time in days that model snapshots are retained for the job. Older snapshots
-are deleted. The default value is `1`, which means snapshots are retained for
-one day (twenty-four hours).
+Advanced configuration option. Denotes the period for which model snapshots


@szabosteve can you look over these docs changes please. Do they make sense?

@davidkyle Sorry, I didn't notice this one earlier.

szabosteve

The docs changes LGTM. Thank you for the documentation effort!

droberts195 · 2020-01-20T14:03:34Z

...lti-node-tests/src/test/java/org/elasticsearch/xpack/ml/integration/DeleteExpiredDataIT.java

-        // We are going to create data for last 2 days
-        long nowMillis = System.currentTimeMillis();
+        // We are going to create 2 days of data starting 24 hrs ago
+        long lastestBucketTime = System.currentTimeMillis() - TimeValue.timeValueHours(1).millis();


typo: lastest -> latest

droberts195 · 2020-01-20T14:09:08Z

...lti-node-tests/src/test/java/org/elasticsearch/xpack/ml/integration/DeleteExpiredDataIT.java

@@ -57,15 +57,15 @@ public void setUpData() throws IOException {
                .setMapping("time", "type=date,format=epoch_millis")
                .get();

-        // We are going to create data for last 2 days
-        long nowMillis = System.currentTimeMillis();
+        // We are going to create 2 days of data starting 24 hrs ago


Suggested change

-        // We are going to create 2 days of data starting 24 hrs ago
+        // We are going to create 3 days of data ending 1 hour ago
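To make the "starting" versus "ending" distinction concrete, here is a small sketch of the time window the suggested comment describes. The helper is hypothetical and not part of `DeleteExpiredDataIT`; it assumes the newest bucket sits one hour before now and the data extends a whole number of days before that.

```java
import java.time.Duration;

/** Hypothetical helper, not part of DeleteExpiredDataIT. */
final class TestDataWindowSketch {

    private TestDataWindowSketch() {}

    /** Timestamp of the newest bucket: one hour before the supplied "now". */
    static long latestBucketTimeMs(long nowMs) {
        return nowMs - Duration.ofHours(1).toMillis();
    }

    /** Timestamp of the oldest data point: the given number of days before the newest bucket. */
    static long earliestDataTimeMs(long nowMs, int days) {
        return latestBucketTimeMs(nowMs) - Duration.ofDays(days).toMillis();
    }
}
```

The window therefore ends one hour ago and starts three days and one hour ago, which is why "ending" is the accurate word.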

droberts195 · 2020-01-20T14:17:54Z

...ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/AbstractExpiredJobDataRemover.java

        long nowEpochMs = Instant.now(Clock.systemDefaultZone()).toEpochMilli();
-        return nowEpochMs - new TimeValue(retentionDays, TimeUnit.DAYS).getMillis();
+        listener.onResponse(nowEpochMs - new TimeValue(retentionDays, TimeUnit.DAYS).getMillis());
    }


Could this method be abstract instead of providing a default based on wall clock time? Now that we've made deletion of model snapshots and results relative to the latest bucket time rather than wall clock time, it seems we should do the same for all job-related documents that have a timestamp. A default implementation of this method that uses wall clock time just seems like a way to introduce a bug by accidentally deleting some other type of document based on wall clock time.
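A minimal sketch of the shape being suggested, under stated assumptions: the class names are hypothetical, `LongConsumer` stands in for `ActionListener<Long>`, and a synchronous lookup function stands in for whatever search the real remover performs to find the latest bucket time.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.ToLongFunction;

/** Hypothetical base class: no wall clock default, so every subclass must choose its reference time. */
abstract class ExpiredDataRemoverSketch {

    /** Passes the cutoff (epoch millis) for the given job to the listener. */
    abstract void calcCutoffEpochMs(String jobId, long retentionDays, LongConsumer listener);
}

/** Hypothetical subclass that measures the cutoff from the job's latest bucket time. */
final class ResultsRemoverSketch extends ExpiredDataRemoverSketch {

    // Assumption: a simple lookup replaces the asynchronous query used in real code.
    private final ToLongFunction<String> latestBucketTimeLookup;

    ResultsRemoverSketch(ToLongFunction<String> latestBucketTimeLookup) {
        this.latestBucketTimeLookup = latestBucketTimeLookup;
    }

    @Override
    void calcCutoffEpochMs(String jobId, long retentionDays, LongConsumer listener) {
        long latestBucketMs = latestBucketTimeLookup.applyAsLong(jobId);
        listener.accept(latestBucketMs - TimeUnit.DAYS.toMillis(retentionDays));
    }
}
```

Because the base class offers no fallback, a new remover for another document type cannot silently inherit wall clock behaviour; the compiler forces the author to pick a reference timestamp.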

droberts195 · 2020-01-20T14:35:04Z

...c/test/java/org/elasticsearch/xpack/ml/job/retention/AbstractExpiredJobDataRemoverTests.java

-            ActionListener<SearchResponse> listener = (ActionListener<SearchResponse>) invocationOnMock.getArguments()[2];
-            listener.onResponse(response);
-            return null;
+            doAnswer(invocationOnMock -> {


Looks like this is indented more than the line above.

droberts195 · 2020-01-20T14:48:14Z

...ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/AbstractExpiredJobDataRemover.java

    }

    private WrappedBatchedJobsIterator newJobIterator() {
        BatchedJobsIterator jobsIterator = new BatchedJobsIterator(client, AnomalyDetectorsIndex.configIndexName());
        return new WrappedBatchedJobsIterator(jobsIterator);
    }

-    private long calcCutoffEpochMs(long retentionDays) {
+    void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Long> listener) {


The other methods that extending classes are expected to override are protected, even though the actual classes that do extend this class are all in the same package. But then this class's constructor is package private so it would be impossible to have a derived class in another package despite the abstract methods being set up for that. I think for consistency they should all be the same - either protected or package private. Certainly with a default implementation here I think protected makes it clearer that we expect derived classes to modify it rather than it's just been made accessible for testing. But then if you agree with my other suggestion and make this abstract then that also makes that clear.

The method is package-private for testing; otherwise it is very difficult to test and can only be tested indirectly. Making it abstract makes sense.

But protected would also allow it to be tested directly. Basically I think all the abstract methods and the constructor should have the same accessibility, whether that be protected or package private. So either change the ones that are currently protected to be package private or change this one plus the constructor to be protected.

Given that the base class is package private, only classes in the same package can implement the abstract method whether the method is protected or package private. The only difference is I could derive a new class from one of the package's public non-abstract classes and reimplement calcCutoffEpochMs in a different package if it was protected but not if package private. In practice this isn't a concern so I've gone for the principle of least visibility and made the abstract methods package private.

droberts195

LGTM

I saw a couple of nits, but I'm happy to merge without further review.

droberts195 · 2020-01-20T17:39:08Z

...ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/AbstractExpiredJobDataRemover.java


-    protected abstract Long getRetentionDays(Job job);
+    abstract Long getRetentionDays(Job job);

    /**
     * Template method to allow implementation details of various types of data (e.g. results, model snapshots).


The next two methods (removeDataBefore and createQuery) might as well be package private too for consistency with the other abstract methods.

droberts195 · 2020-01-20T17:41:49Z

...lti-node-tests/src/test/java/org/elasticsearch/xpack/ml/integration/DeleteExpiredDataIT.java

@@ -57,15 +57,15 @@ public void setUpData() throws IOException {
                .setMapping("time", "type=date,format=epoch_millis")
                .get();

-        // We are going to create data for last 2 days
-        long nowMillis = System.currentTimeMillis();
+        // We are going to create 3 days of data starting 1 hr ago


Suggested change

-        // We are going to create 3 days of data starting 1 hr ago
+        // We are going to create 3 days of data ending 1 hr ago

docs/reference/ml/ml-shared.asciidoc

Co-Authored-By: Lisa Cawley <[email protected]>

davidkyle · 2020-01-21T10:05:47Z

Thanks for the rewrites @lcawl

davidkyle · 2020-01-21T15:06:42Z

run elasticsearch-ci/default-distro

…estamps (#51061) (#51301) The retention period is calculated relative to the last bucket result or snapshot time rather than wall clock

davidkyle added WIP :ml Machine learning v8.0.0 labels Jan 15, 2020

davidkyle commented Jan 15, 2020

droberts195 changed the title ~~Calculate results and snapshot retention using latest timestamps~~ Calculate results and snapshot retention using latest bucket timestamps Jan 16, 2020

davidkyle force-pushed the results-retention branch from fdd99d3 to d1fc8f8 Compare January 16, 2020 10:35

davidkyle added v7.7.0 and removed WIP labels Jan 16, 2020

davidkyle commented Jan 16, 2020

davidkyle added 6 commits January 16, 2020 13:47

Calculate the results retention period based on the latest bucket time

7c0161a

Define retention period in docs

7ca8db9

Start expired snapshots

f620d6b

Adapt for origin setting client

6c56dda

Fix the tests

11fa81b

Rework docs

aee2d13

davidkyle force-pushed the results-retention branch from d1fc8f8 to aee2d13 Compare January 16, 2020 13:48

davidkyle added the >bug label Jan 16, 2020

szabosteve approved these changes Jan 20, 2020

droberts195 reviewed Jan 20, 2020

davidkyle added 2 commits January 20, 2020 17:00

Address review comments

498f5b0

Make package-private

580c517

droberts195 approved these changes Jan 20, 2020

nits

e7724b9