PR to filter big jumps even if all segments are in clusters #897

Merged
merged 9 commits into from
Jan 25, 2023

shankari
Contributor

…nts are clusters

Once the actual issue is addressed, this will fix
e-mission/e-mission-docs#843

For now, we load the location dataframes for the two use cases and verify that
the returned values are the ones in the current implementation.

Procedure:
- Perturb the location points in the original use cases to avoid leaking information
- Load the location points into the test case
- Run the filtering code
- Verify that the output is consistent with
e-mission/e-mission-docs#843 (comment)
e-mission/e-mission-docs#843 (comment)

Also change the location smoothing code from `logging.info` to
`logging.exception` so that the traceback is logged and we can see where the
error occurred.
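
A minimal sketch of the difference, using only the standard `logging` module. The logger names, the message text, and the `KeyError` trigger are illustrative assumptions, not the server's code:

```python
import io
import logging

def log_failure(logger, use_exception):
    """Trigger an error and report it with logger.exception or logger.info."""
    try:
        {}["missing"]  # raises KeyError, standing in for a smoothing failure
    except KeyError as e:
        if use_exception:
            # Logs at ERROR level and appends the current traceback
            logger.exception("Found error %s while smoothing", e)
        else:
            # Logs the message only; the traceback is lost
            logger.info("Found error %s while smoothing", e)

def capture(use_exception):
    """Run log_failure against an in-memory handler and return the log text."""
    stream = io.StringIO()
    logger = logging.getLogger("smoothing_demo_%s" % use_exception)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    log_failure(logger, use_exception)
    logger.removeHandler(handler)
    return stream.getvalue()
```

Calling `capture(True)` yields output containing a `Traceback (most recent call last):` section, while `capture(False)` yields the message alone.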

Testing done:
- Test passes

```
----------------------------------------------------------------------
Ran 1 test in 0.387s
```

Note that due to the perturbation of the location points, the outliers no
longer perfectly match the original use case, but are close enough.

```
2023-01-22 22:37:57,262:INFO:4634275328:After first round, still have outliers     accuracy   altitude  ...      distance         speed
17    70.051  88.551857  ...  8.468128e+06  50922.935508
26     3.778  66.404068  ...  8.467873e+06   2878.645674
49     3.900  72.118635  ...  4.673209e+00      2.336605

2023-01-22 22:37:57,308:INFO:4634275328:After first round, still have outliers     Unnamed: 0  accuracy    altitude  ...    heading      distance          speed
14          14     5.638  470.899994  ...  88.989357  1.113137e+07  284923.028227

```
To make it easier to debug in case there are errors
- Since we have already implemented several smoothing algorithms, we
  pick POSDAP as the backup.
- If we still have outliers after the first round, and the max value is over
  MACH1, we fall back to the backup algo.
- After running the backup algo, if there are no outliers,
  the backup algo has succeeded and we use its results.
- If there are outliers, but the max value is under MACH1,
  the backup algo has also succeeded and we use its results.
- If there are outliers, and the max value is high (> MACH1),
  the backup algo has failed.
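
The decision rules above can be sketched roughly as follows. This is a hypothetical helper, not the server's API: `select_mask`, `is_acceptable`, and `run_backup` are made-up names, the `MACH1` value is assumed to be the speed of sound in m/s, and raising on failure is this sketch's choice because the description only says the backup "has failed".

```python
import numpy as np

MACH1 = 340.29  # assumption: speed of sound in m/s; check the server's own constant

def is_acceptable(outlier_speeds):
    """Acceptable if no outliers remain, or the largest one is at most MACH1."""
    outlier_speeds = np.asarray(outlier_speeds, dtype=float)
    return outlier_speeds.size == 0 or outlier_speeds.max() <= MACH1

def select_mask(main_mask, main_outlier_speeds, run_backup):
    """
    Pick the inlier mask to use (hypothetical helper).

    main_mask: inlier mask from the main algorithm
    main_outlier_speeds: speeds still flagged as outliers after the first round
    run_backup: callable returning (backup_mask, backup_outlier_speeds), or None
    """
    # Keep the main result if it is good enough or there is no backup to try
    if is_acceptable(main_outlier_speeds) or run_backup is None:
        return main_mask
    backup_mask, backup_outlier_speeds = run_backup()
    if is_acceptable(backup_outlier_speeds):
        return backup_mask
    # The description does not say what happens here; raising is an assumption
    raise ValueError("backup algorithm left outliers above MACH1")
```

With the first log excerpt above, the remaining outlier speeds include 50922.9, which is well over `MACH1`, so a sketch like this would call `run_backup`.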

With this change, both tests now expect the correctly deleted values:
- [16 17 18 19 20] for use case 1 (e-mission/e-mission-docs#843 (comment))
- [11] for use case 2 (e-mission/e-mission-docs#843 (comment))

In this commit, we also check in the csv data files for the two test cases
…moothing file

This addresses a long-term TODO
https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/intake/cleaning/cleaning_methods/jump_smoothing.py#L262

It also ensures that:
- the individual algorithms are clean and modular and don't depend on other algorithms
- we can swap in any algorithm for the backup algo
- we can support more complex backups in the future

Testing done:
- modified the test to pass in the backup algo
- tests pass
@shankari changed the title from "PR to fix an issue where if all the segments in a section are clusters, we don't filter big jumps" to "PR to filter big jumps even if all segments are in clusters" on Jan 24, 2023
Added a new unit test for the case of `backup_algo == None`, which should
return the original algo results.

While testing, found that the ZigZag algo returns a pandas Series,
while the Posdap algo returns a numpy array, which means that combining
their results could be problematic.

Changed ZigZag to also return a numpy array to unify the implementations.
Testing done:
- All tests now pass
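
A small sketch of why mixing the two types is awkward and how converting to numpy sidesteps it. The helper name `to_numpy_mask` and the sample masks are assumptions for illustration. On a Series with a non-default integer index, `mask[0]` is a label lookup rather than a positional one, whereas a numpy array is always positional:

```python
import numpy as np
import pandas as pd

def to_numpy_mask(mask):
    """Return the inlier mask as a plain boolean numpy array, whatever its type."""
    if isinstance(mask, pd.Series):
        # Drops the index, so later access is purely positional
        return mask.to_numpy(dtype=bool)
    return np.asarray(mask, dtype=bool)

# A Series mask whose index does not start at 0, as happens after slicing a dataframe
series_mask = pd.Series([True, False, True], index=[10, 11, 12])
array_mask = np.array([True, True, False])

# Once both are numpy arrays, they combine element by element, by position
combined = np.logical_and(to_numpy_mask(series_mask), array_mask)
```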
Before this change, we only used one algorithm, so we hardcoded it into the
result. However, we can now use either the main algorithm or the backup
algorithm. So we return the algo also from `get_points_to_filter` and attribute
it correctly.

`get_points_to_filter` is used only in `location_smoothing` and in the tests.
So also fix the tests to read both values and check the selected algo in each case.

Testing done: tests pass
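
A rough sketch of the two-value return described above. It does not reproduce the real `get_points_to_filter` signature: `FakeAlgo`, `points_to_filter`, and the `use_backup` flag are hypothetical, and returning `(None, None)` when no algorithm is configured is an assumption about when that early return applies.

```python
import numpy as np

class FakeAlgo:
    """Hypothetical stand-in for a smoothing algorithm that exposes inlier_mask_."""
    def __init__(self, inlier_mask):
        self.inlier_mask_ = np.asarray(inlier_mask, dtype=bool)

def points_to_filter(main_algo, backup_algo, use_backup):
    """Return (selected_algo, indices_to_delete) so callers can attribute the result."""
    if main_algo is None:
        # Two outputs even on the early return, so callers can always unpack
        return (None, None)
    sel_algo = backup_algo if (use_backup and backup_algo is not None) else main_algo
    to_delete = np.nonzero(np.logical_not(sel_algo.inlier_mask_))[0]
    return (sel_algo, to_delete)
```

A caller would then unpack both values, for example `sel_algo, to_delete = points_to_filter(main, backup, use_backup=True)`, and record which algorithm produced the deletions.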
- Unify algo outputs: `self.inlier_mask_ = self.inlier_mask_.to_numpy()`
    - remove `to_numpy()` from all the checks in the tests
- Return two outputs -> `return (None, None)`

Testing done:
- All tests in this file pass
When we moved the second round checks to the calling function in
cebb81f, we caused a very subtle regression.

The filtering code had an early return if there were no jumps detected.
So in that case, we would not try the second round of checks, or attempt to
filter again.

However, when we moved the second round checking to the outer function, we
ran the second round even if the first round didn't detect any jumps.
In this one case, we actually found an outlier in the second round, which
caused the test to fail.

Fixed by checking to see if there were no outliers in the first round and
skipping the second round check in that case.

Everything in the `else` branch of
`if outlier_arr[0].shape[0] == 0:` is unchanged, just indented one level.
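
A minimal sketch of the guard, assuming `outlier_arr` is the tuple returned by `np.nonzero` over the inverted inlier mask (the function name `needs_second_round` is hypothetical):

```python
import numpy as np

def needs_second_round(inlier_mask):
    """Only run the second round checks if the first round flagged any outliers."""
    # np.nonzero returns a tuple of index arrays, one per dimension
    outlier_arr = np.nonzero(np.logical_not(inlier_mask))
    # No flagged points: skip the second round, matching the old early return
    return outlier_arr[0].shape[0] != 0
```

Here `needs_second_round(np.array([True, True, True]))` is `False`, so the second round is skipped, while a mask with any `False` entry triggers it.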

The check for the length was unexpectedly complicated and took many hours to
debug, so I added it as a simple use case.

Note also that it is not clear if this is the correct long-term approach.
If there were no jumps, then why did using the backup change anything?
Maybe we should always use the backup.

But changing this to avoid the regression for now; will look at this the next
time we look at smoothing

Testing done:
- `TestPipelineRealData.testIosJumpsAndUntrackedSquishing` passes
- `TestLocationSmoothing` passes
`get_filtered_points` is not used anywhere else
We no longer need to print out the series and the numpy version now that we have added the unit test in
5a4ae3d.
@shankari shankari merged commit b7749d0 into e-mission:master Jan 25, 2023
Successfully merging this pull request may close these issues.

Giant, obvious GPS jump was not filtered