PR to filter big jumps even if all segments are in clusters #897
Commits on Jan 23, 2023
434a9a1 - Add a new unit test for the big jumps caused when all smoothing segments are clusters

Once the actual issue is addressed, this will fix e-mission/e-mission-docs#843. For now, we load the location dataframes for the two use cases and verify that the returned values are the ones in the current implementation.

Procedure:
- Perturb the location points in the original use cases to avoid leaking information
- Load the location points into the test case
- Run the filtering code
- Verify that the output is consistent with e-mission/e-mission-docs#843 (comment) and e-mission/e-mission-docs#843 (comment)

Also change the location smoothing code from `logging.info` to `logging.exception` so that we can see where the error is in a more meaningful way.

Testing done:
- Test passes

```
----------------------------------------------------------------------
Ran 1 test in 0.387s
```

Note that due to the perturbation of the location points, the outliers no longer perfectly match the original use case, but are close enough:

```
2023-01-22 22:37:57,262:INFO:4634275328:After first round, still have outliers
    accuracy   altitude  ...      distance         speed
17    70.051  88.551857  ...  8.468128e+06  50922.935508
26     3.778  66.404068  ...  8.467873e+06   2878.645674
49     3.900  72.118635  ...  4.673209e+00      2.336605
2023-01-22 22:37:57,308:INFO:4634275328:After first round, still have outliers
    Unnamed: 0  accuracy    altitude  ...    heading      distance          speed
14          14     5.638  470.899994  ...  88.989357  1.113137e+07  284923.028227
```
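The commit message does not include the perturbation step itself; a minimal sketch of the idea, assuming a pandas dataframe with `latitude`/`longitude` columns (the function name and jitter scale are illustrative, not the actual helper), might look like:

```python
import numpy as np

def perturb_locations(loc_df, max_jitter_deg=1e-4, seed=42):
    """Add small uniform noise to the coordinates so the checked-in test
    data cannot be traced back to the original trips, while keeping the
    trajectory shape (and hence the big jumps) essentially intact."""
    rng = np.random.default_rng(seed)
    perturbed = loc_df.copy()
    for col in ("latitude", "longitude"):
        perturbed[col] = perturbed[col] + rng.uniform(
            -max_jitter_deg, max_jitter_deg, size=len(perturbed))
    return perturbed
```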
7d44d63 - Change the assertion checks to use the row index instead of the id

To make it easier to debug in case there are errors.
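For illustration, a hedged sketch of what the switch looks like in a test; the setup helper is hypothetical, and the expected row indices come from use case 1 in the Jan 24 commit below:

```python
import unittest

class TestFilteredRows(unittest.TestCase):
    def test_deleted_rows_use_case_1(self):
        # run_smoothing_on_use_case_1 is a hypothetical stand-in for the
        # actual test setup in this PR
        filtered_points = run_smoothing_on_use_case_1()
        # Asserting on positional row indices instead of opaque entry ids
        # means a failure message directly names the offending rows
        self.assertEqual(list(filtered_points.index), [16, 17, 18, 19, 20])
```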
Commits on Jan 24, 2023
67f5c86 - Implement a backup algorithm in case the first zigzag algo does not work

Since we have already implemented many different smoothing algorithms, we pick POSDAP to use as the backup:
- if we still have outliers after the first round, and the max value is over MACH1, we fall back to the backup algo
- after running the backup algo, if we don't have outliers, the backup algo has succeeded and we use its results
- if we do have outliers, but the max value is under MACH1, the backup algo has succeeded and we use its results
- if we have outliers, and the max is high (> MACH1), the backup algo has failed

(See the sketch of this policy below.)

With this change, both the tests also change to the correctly deleted values:
- [16 17 18 19 20] for use case 1 (e-mission/e-mission-docs#843 (comment))
- [11] for use case 2 (e-mission/e-mission-docs#843 (comment))

In this commit, we also check in the csv data files for the two test cases.
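A condensed sketch of the fallback policy, assuming each algorithm exposes a `filter()` method and an `inlier_mask_` attribute (the attribute appears verbatim in a later commit in this PR); `find_outliers` is a hypothetical helper that recomputes speeds over the surviving points and returns the rows still flagged as outliers:

```python
import logging

MACH1 = 340.29  # m/s; any remaining "speed" above this is clearly bad data

def filter_jumps(with_speeds_df, primary_algo, backup_algo):
    # First round: the zigzag algorithm
    primary_algo.filter(with_speeds_df)
    outliers = find_outliers(with_speeds_df, primary_algo.inlier_mask_)
    if len(outliers) == 0 or outliers.speed.max() <= MACH1:
        return primary_algo.inlier_mask_, primary_algo  # first round is fine

    # Outliers remain and the max is over MACH1: fall back to POSDAP
    backup_algo.filter(with_speeds_df)
    outliers = find_outliers(with_speeds_df, backup_algo.inlier_mask_)
    if len(outliers) == 0 or outliers.speed.max() <= MACH1:
        return backup_algo.inlier_mask_, backup_algo    # backup succeeded

    logging.error("backup algo also failed: outliers remain over MACH1")
    return backup_algo.inlier_mask_, backup_algo
```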
cebb81f - Move the first round check and the backup algo code to the location smoothing file

This addresses a long-term TODO: https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/intake/cleaning/cleaning_methods/jump_smoothing.py#L262

It also:
- ensures that the individual algorithms are clean and modular and don't depend on other algorithms
- lets us swap in any algorithm for the backup algo
- lets us support more complex backups in the future

Testing done:
- modified the test to pass in the backup algo
- tests pass
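Roughly, the modularity goal looks like the sketch below; the class names and constructor arguments are assumptions for illustration, and only `get_points_to_filter` is confirmed by a later commit in this PR:

```python
# In the location smoothing file: the individual algorithms stay ignorant
# of each other, and the caller decides what the fallback is.
primary = SmoothZigzag()        # name assumed
backup = SmoothPosdap(MACH1)    # name and argument assumed
points_to_filter, sel_algo = get_points_to_filter(with_speeds_df, primary, backup)

# Swapping in a different fallback is now a one-line change:
points_to_filter, sel_algo = get_points_to_filter(with_speeds_df, primary,
                                                  SmoothBoundary())  # name assumed
```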
0f1b24a - Added unit test for `None` backup algo + unify algo outputs

Added a new unit test for the case of `backup_algo == None`, which should return the original algo results. While testing, found that the ZigZag algo returns a pandas Series, while the Posdap algo returns a numpy array, which means that combining them could be problematic. Changed ZigZag to also return a numpy array to unify the implementations.

Testing done:
- All tests now pass
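The mismatch is easy to demonstrate in isolation; a minimal sketch (the masks here are made up):

```python
import numpy as np
import pandas as pd

zigzag_mask = pd.Series([True, False, True], index=[10, 11, 12])  # index-carrying
posdap_mask = np.array([True, True, False])                       # positional

# A Series carries its index labels along, while a numpy array is purely
# positional, so code written against one can misbehave when handed the
# other. Converting at the source makes the two algorithms interchangeable:
zigzag_mask = zigzag_mask.to_numpy()
assert isinstance(zigzag_mask, np.ndarray) and isinstance(posdap_mask, np.ndarray)
```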
🎨 988871d - Return and record the selected algo correctly

Before this change, we only used one algorithm, so we hardcoded it into the result. However, we can now use either the main algorithm or the backup algorithm, so we also return the algo from `get_points_to_filter` and attribute it correctly. `get_points_to_filter` is used only in `location_smoothing` and in the tests, so we also fix the tests to read both values and check the sel algo in each case.

Testing done: tests pass
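A sketch of the new call-site shape; the exact signature and the key used to record the algo are assumptions, since only the two return values are described above:

```python
# Before: the caller hardcoded the algorithm into the saved result.
# After: get_points_to_filter returns the selected algo alongside the
# points to filter, and both location_smoothing and the tests unpack the pair.
points_to_filter, sel_algo = get_points_to_filter(with_speeds_df, backup_algo)
filtered_entry["algo"] = sel_algo  # attribute the result correctly (key name assumed)
```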
95f88c5 - Unify algo outputs

- Unify algo outputs: `self.inlier_mask_ = self.inlier_mask_.to_numpy()`
- remove `to_numpy()` from all the checks in the tests
- Return two outputs -> `return (None, None)`

Testing done:
- All tests in this file pass
Commits on Jan 25, 2023
5a4ae3d - Fix regression caused by moving the second round checking out

When we moved the second round checks to the calling function in cebb81f, we caused a very subtle regression. The filtering code had an early return if there were no jumps detected, so in that case, we would not try the second round of checks or attempt to filter again. However, when we moved the second round checking to the outer function, we called the second round anyway, even if the first round didn't detect any jumps. And in this one case, we actually found an outlier in the second round, which caused the test to fail.

Fixed by checking to see if there were no outliers in the first round and skipping the second round check in that case (see the sketch below). Everything in the `else` for the `if outlier_arr[0].shape[0] == 0:` check is unchanged, just indented in a bit. The check was unexpectedly complicated and took many hours to debug, so I added it as a simple use case.

Note also that it is not clear if this is the correct long-term approach. If there were no jumps, then why did using the backup change anything? Maybe we should always use the backup. But changing this to avoid the regression for now; will look at this the next time we look at smoothing.

Testing done:
- `TestPipelineRealData.testIosJumpsAndUntrackedSquishing` passes
- `TestLocationSmoothing` passes
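In sketch form, the guard looks like this; `outlier_arr` is assumed to come from `np.nonzero` over the inverted inlier mask (the check itself is quoted from the commit message), and `run_second_round_checks` is a hypothetical name for the moved block:

```python
import numpy as np

# outlier_arr[0] holds the positions of the points flagged for removal
outlier_arr = np.nonzero(np.logical_not(sel_algo.inlier_mask_))

if outlier_arr[0].shape[0] == 0:
    # First round found no jumps at all: restore the old early-return
    # behaviour and skip the second-round / backup check entirely
    pass
else:
    # Everything that used to run unconditionally now lives under the else,
    # unchanged apart from the indentation
    sel_algo = run_second_round_checks(with_speeds_df, sel_algo, backup_algo)
```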
🔥 29e78de - Remove unused function and extraneous logs

- `get_filtered_points` is not used anywhere else
- we don't need to print out the series and the numpy version any more, now that we have added the unit test in 5a4ae3d