Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Flawed logic for when autodetect writes state/quantiles on graceful close #393

Closed
droberts195 opened this issue Feb 13, 2019 · 0 comments · Fixed by #437
Closed

[ML] Flawed logic for when autodetect writes state/quantiles on graceful close #393

droberts195 opened this issue Feb 13, 2019 · 0 comments · Fixed by #437

Comments

@droberts195
Copy link
Contributor

When the autodetect process is shut down gracefully it does the following:

  1. Writes the latest quantiles to its output unconditionally.
  2. Persists state if at least one input record was observed while it was running.

Both these conditions have problems:

  1. If a job is opened and closed with nothing else in between then it unnecessarily writes quantiles on close, causing a renormalization on the Java side. This leads to problems like [CI] Had to resort to force-closing job, something went wrong? elasticsearch#30300.
  2. If a job is opened, time is advanced causing results to be generated, then it is closed, then it does not persist state, thus losing the knowledge that time was advanced over a period in which there was no input data.

Both quantiles and state should be output on graceful close if and only if one or more of the following conditions is true:

  1. An input record has been processed.
  2. Time has been advanced.

Once this is fixed, state will be output more eagerly than it is now and quantiles less eagerly.

@edsavage edsavage self-assigned this Feb 25, 2019
edsavage added a commit to edsavage/ml-cpp that referenced this issue Mar 8, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit that referenced this issue Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes #393
edsavage added a commit to edsavage/ml-cpp that referenced this issue Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit to edsavage/ml-cpp that referenced this issue Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit to edsavage/ml-cpp that referenced this issue Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit to edsavage/elasticsearch that referenced this issue Mar 20, 2019
Additional checks have been added to exercise the behaviour of
persistence on graceful close of an anomaly job. In particular:

 - check that persistence does not occur for a job that is opened and
 then immediately closed, with nothing else having happened.
 - check that persistence occurs on graceful close of a job if it has
 processed data.
 - check that persistence occurs subsequent to time being manually
 advanced - even if no additional data has been seen by the job
 - check the edge case where persistence occurs if a job is opened, time
 is manually advanced and then the job is closed, having seen no data.

 Related to elastic/ml-cpp#393
edsavage added a commit to elastic/elasticsearch that referenced this issue Mar 21, 2019
Additional checks have been added to exercise the behaviour of
persistence on graceful close of an anomaly job. In particular:

 - check that persistence does not occur for a job that is opened and
 then immediately closed, with nothing else having happened.
 - check that persistence occurs on graceful close of a job if it has
 processed data.
 - check that persistence occurs subsequent to time being manually
 advanced - even if no additional data has been seen by the job
 - check the edge case where persistence occurs if a job is opened, time
 is manually advanced and then the job is closed, having seen no data.

 Related to elastic/ml-cpp#393
edsavage added a commit to edsavage/elasticsearch that referenced this issue Mar 21, 2019
Additional checks have been added to exercise the behaviour of
persistence on graceful close of an anomaly job. In particular:

 - check that persistence does not occur for a job that is opened and
 then immediately closed, with nothing else having happened.
 - check that persistence occurs on graceful close of a job if it has
 processed data.
 - check that persistence occurs subsequent to time being manually
 advanced - even if no additional data has been seen by the job
 - check the edge case where persistence occurs if a job is opened, time
 is manually advanced and then the job is closed, having seen no data.

 Related to elastic/ml-cpp#393
edsavage added a commit to edsavage/elasticsearch that referenced this issue Mar 21, 2019
Additional checks have been added to exercise the behaviour of
persistence on graceful close of an anomaly job. In particular:

 - check that persistence does not occur for a job that is opened and
 then immediately closed, with nothing else having happened.
 - check that persistence occurs on graceful close of a job if it has
 processed data.
 - check that persistence occurs subsequent to time being manually
 advanced - even if no additional data has been seen by the job
 - check the edge case where persistence occurs if a job is opened, time
 is manually advanced and then the job is closed, having seen no data.

 Related to elastic/ml-cpp#393
edsavage added a commit to elastic/elasticsearch that referenced this issue Mar 21, 2019
Additional checks to exercise the behaviour of
persistence on graceful close of an anomaly job.

Related to elastic/ml-cpp#393
Backports #40272
edsavage added a commit to elastic/elasticsearch that referenced this issue Mar 21, 2019
Additional checks to exercise the behaviour of
persistence on graceful close of an anomaly job

Related to elastic/ml-cpp#393
Backports #40272
edsavage added a commit to elastic/elasticsearch that referenced this issue Mar 21, 2019
Additional checks to exercise the behaviour of
persistence on graceful close of an anomaly job.

Related to elastic/ml-cpp#393
Backports #40272
edsavage added a commit to edsavage/ml-cpp that referenced this issue Feb 25, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see elastic#393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, this PR takes the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.
edsavage added a commit that referenced this issue Mar 1, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see #393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, we take the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.
edsavage added a commit to edsavage/ml-cpp that referenced this issue Mar 1, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see elastic#393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, we take the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.
edsavage added a commit that referenced this issue Mar 1, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see #393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, we take the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.

Backports #1774
2lambda123 pushed a commit to 2lambda123/elastic-elasticsearch that referenced this issue May 2, 2024
Additional checks have been added to exercise the behaviour of
persistence on graceful close of an anomaly job. In particular:

 - check that persistence does not occur for a job that is opened and
 then immediately closed, with nothing else having happened.
 - check that persistence occurs on graceful close of a job if it has
 processed data.
 - check that persistence occurs subsequent to time being manually
 advanced - even if no additional data has been seen by the job
 - check the edge case where persistence occurs if a job is opened, time
 is manually advanced and then the job is closed, having seen no data.

 Related to elastic/ml-cpp#393
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants