Commit
* Improve HCAD cold start

ΔT is the length of the interval in minutes. A few tenets:

* If ΔT is small, we can wait for real data to fill up to 32 + shingleSize points.
* If ΔT is large, then either the user has a lot of data (and we can probe it slowly, or have large clusters) or the only option is to wait ...
* If ΔT divides 1440, then a “day” is meaningful; this is likely a common case.
* The current solution in prod uses linear interpolation, and changing that would take significant work. Given that, it makes sense to embrace linearity all the way for missing data as well.
* If we have 32 + shingleSize (hopefully recent) values, RCF can get up and running. It will be noisy (there is a reason the default size is 256 + shingleSize), but it may be more useful for people to start seeing some results.

We have two parameters, numberOfSamples and strideLength:

* We probe numberOfSamples + 1 values at a gap of strideLength * ΔT.
* For each pair of consecutive values present, we interpolate into strideLength pieces. A 0 returned by the engine should constitute a valid answer; “null” is a missing answer. 0 may be meaningless in some cases, but it is meaningful in others. It may be that the query defining the metric is ill-formed, but that cannot be solved by the cold-start strategy of the AD plugin; if we attempt to do that, we will have issues with legitimate interpretations of 0.
* For the missing entries we use linear interpolation as well. Denote the samples S0, S1, ... in reverse order of time. Each [Si, Si−1] corresponds to a gap of strideLength * ΔT. If we get samples for S0, S1, S4 (both S2 and S3 are missing), then we interpolate [S4, S1] into 3 * strideLength pieces.
* If the above provides 32 + shingleSize points, then we have a model. (Note that if S0 is missing, or all Si for some i > N are missing, then we would miss a lot of points, but the points we do get are contiguous based on the suggestion.) Otherwise we issue another round of queries; if there is any sample in the second round, then we would have 32 + shingleSize points. If there is no sample in the second round, then we should wait for real data.
* If there is no data, there is ultimately nothing that can be done.

How to set numberOfSamples and strideLength?

* Suppose ΔT ≤ 30 and ΔT divides 60. Then set numberOfSamples = ceil((shingleSize + 32) / 24) * 24 and strideLength = 60 / ΔT. Note that if there is enough data, we may have a lot more than shingleSize + 32 points, which is only good.
* Otherwise, set numberOfSamples = shingleSize + 32 and strideLength = 1. This should be an uncommon case, but if someone wants a 23-minute interval and the system permits it, let's give it to them. Note that the smallest ΔT that does not divide 60 is 7, which is quite long to wait for one data point.

Testing done:
1. Added precision tests and various unit tests to cover changes.
2. Manually verified HCAD cold start does not break.
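
The parameter rule and the interpolation step described in this message can be illustrated with a minimal Python sketch. It is not the plugin's code: the function names `choose_sampling` and `interpolate` are hypothetical, `None` stands in for a "null" (missing) sample, and samples are listed oldest first for readability, whereas the message indexes S0, S1, ... in reverse order of time.

```python
import math
from typing import List, Optional, Tuple

MIN_POINTS = 32  # the "32" in "32 + shingleSize" from the commit message


def choose_sampling(interval_minutes: int, shingle_size: int) -> Tuple[int, int]:
    """Return (numberOfSamples, strideLength) per the two rules in the message."""
    needed = shingle_size + MIN_POINTS
    if interval_minutes <= 30 and 60 % interval_minutes == 0:
        # Round the sample count up to a multiple of 24; stride spans one hour.
        return math.ceil(needed / 24) * 24, 60 // interval_minutes
    # Uncommon case (for example a 23-minute interval): no striding.
    return needed, 1


def interpolate(samples: List[Optional[float]], stride_length: int) -> List[float]:
    """Linearly fill the gaps between present samples.

    `samples` is ordered oldest first (an assumption of this sketch).
    None marks a missing sample; 0 is treated as valid data.
    Missing samples at either end are dropped, so the result is contiguous.
    """
    present = [(i, v) for i, v in enumerate(samples) if v is not None]
    if len(present) < 2:
        return [v for _, v in present]

    points: List[float] = []
    for (i0, v0), (i1, v1) in zip(present, present[1:]):
        # A gap of (i1 - i0) strides is cut into (i1 - i0) * strideLength pieces.
        pieces = (i1 - i0) * stride_length
        for k in range(pieces):
            points.append(v0 + (v1 - v0) * k / pieces)
    points.append(present[-1][1])  # close the last segment
    return points


if __name__ == "__main__":
    n_samples, stride = choose_sampling(interval_minutes=5, shingle_size=8)
    print(n_samples, stride)
    # Two middle samples missing: the surrounding pair is cut into 3 * stride pieces.
    print(interpolate([0.0, None, None, 6.0, 8.0], stride_length=2))
```

With every probed sample present, this sketch yields numberOfSamples * strideLength + 1 points, which is then compared against the 32 + shingleSize threshold to decide whether a model can be built or another round of queries is needed.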
Showing 19 changed files with 1,199 additions and 303 deletions.