
PR to filter big jumps even if all segments are in clusters #897

Merged
merged 9 commits into from
Jan 25, 2023

Commits on Jan 23, 2023

  1. Add a new unit test for the big jumps caused when all smoothing segments are clusters
    
    Once the actual issue is addressed, this will fix
    e-mission/e-mission-docs#843
    
    For now, we load the location dataframes for the two use cases and verify that
    the returned values match those produced by the current implementation.
    
    Procedure:
    - Perturb the location points in the original use cases to avoid leaking information
    - Load the location points into the test case
    - Run the filtering code
    - Verify that the output is consistent with
    e-mission/e-mission-docs#843 (comment)
    e-mission/e-mission-docs#843 (comment)
    
    Also change the location smoothing code from `logging.info` to
    `logging.exception`, so that the log includes the stack trace and we can see where the error occurred.
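The difference between the two calls can be seen in a small, self-contained sketch. It uses a named logger and an in-memory handler (both hypothetical, for inspection only) rather than the module-level `logging` calls in the smoothing code, and a toy `smooth` function that fails on an empty section. Inside an `except` block, `logging.exception` logs at ERROR level and appends the traceback of the active exception, while `logging.info` records only the message.

```python
import io
import logging

# Send log records to an in-memory buffer so the output can be inspected
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger = logging.getLogger("smoothing_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

def smooth(points):
    # Hypothetical smoothing step that fails on an empty section
    return sum(points) / len(points)

try:
    smooth([])
except ZeroDivisionError:
    logger.info("info: error while smoothing")            # message only
    logger.exception("exception: error while smoothing")  # message + traceback

output = buffer.getvalue()
print(output)
```

The INFO line carries no location information, whereas the ERROR line is followed by the full traceback pointing at the failing line in `smooth`.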
    
    Testing done:
    - Test passes
    
    ```
    ----------------------------------------------------------------------
    Ran 1 test in 0.387s
    ```
    
    Note that, due to the perturbation of the location points, the outliers no
    longer exactly match the original use case, but they are close enough:
    
    ```
    2023-01-22 22:37:57,262:INFO:4634275328:After first round, still have outliers     accuracy   altitude  ...      distance         speed
    17    70.051  88.551857  ...  8.468128e+06  50922.935508
    26     3.778  66.404068  ...  8.467873e+06   2878.645674
    49     3.900  72.118635  ...  4.673209e+00      2.336605
    
    2023-01-22 22:37:57,308:INFO:4634275328:After first round, still have outliers     Unnamed: 0  accuracy    altitude  ...    heading      distance          speed
    14          14     5.638  470.899994  ...  88.989357  1.113137e+07  284923.028227
    
    ```
    shankari committed Jan 23, 2023 (434a9a1)
  2. Change the assertion checks to use the row index instead of the id

    This makes failures easier to debug.
    shankari committed Jan 23, 2023 (7d44d63)

Commits on Jan 24, 2023

  1. Implement a backup algorithm in case the first zigzag algo does not work

    - Since we have already implemented many different smoothing algorithms, we
      pick POSDAP to use as backup
    - if we still have outliers after the first round, and the max value is over
      MACH1, we fall back to the backup algo
    - after running the backup algo, if we don't have outliers,
      the backup algo has succeeded and we use its results
    - if we do have outliers, but the max value is under MACH1,
      the backup algo has succeeded and we use its results
    - if we have outliers, and the max is high (> MACH1)
      the backup algo has failed
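The fallback rule above can be sketched as a single predicate applied twice: once to the first-round result, to decide whether to run the backup, and once to the backup's result, to decide whether the backup succeeded. This is a minimal sketch, assuming that "the max value" means the largest speed among the points still flagged as outliers and that MACH1 is roughly the speed of sound (343 m/s); `round_failed` and its inputs are hypothetical, not the e-mission API.

```python
import numpy as np

# Assumption: MACH1 is roughly the speed of sound in m/s; check the actual
# constant in the e-mission code before relying on this value.
MACH1 = 343.0

def round_failed(speeds, outlier_mask, max_speed=MACH1):
    """Decide whether a filtering round still leaves unacceptable jumps.

    speeds:       speeds (m/s) recomputed over the retained points
    outlier_mask: True where a retained point is still flagged as an outlier
    """
    speeds = np.asarray(speeds, dtype=float)
    outlier_mask = np.asarray(outlier_mask, dtype=bool)
    if not outlier_mask.any():
        return False  # no outliers left: the round succeeded
    # Outliers remain, but only one faster than MACH1 counts as a failure
    return bool(speeds[outlier_mask].max() > max_speed)

speeds = np.array([1.2, 2.3, 50922.9, 2.0])
print(round_failed(speeds, [False, False, False, False]))  # False: no outliers
print(round_failed(speeds, [False, True, False, False]))   # False: outlier is slow
print(round_failed(speeds, [False, False, True, False]))   # True: jump above MACH1
```

A `True` result after the first round corresponds to falling back to the backup algo; a `True` result for the backup's output corresponds to the backup algo having failed.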
    
    With this change, both tests now expect the correct set of deleted points:
    - [16 17 18 19 20] for use case 1 (e-mission/e-mission-docs#843 (comment))
    - [11] for use case 2 (e-mission/e-mission-docs#843 (comment))
    
    In this commit, we also check in the csv data files for the two test cases
    shankari committed Jan 24, 2023 (67f5c86)
  2. Move the first round check and the backup algo code to the location smoothing file
    
    This addresses a long-term TODO
    https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/intake/cleaning/cleaning_methods/jump_smoothing.py#L262
    
    It also:
    - ensures that the individual algorithms are clean and modular and don't depend on each other
    - allows us to swap in any algorithm as the backup algo
    - allows us to support more complex backups in the future
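Once the check lives in the caller, the algorithms only need a common interface. The sketch below assumes each algorithm exposes a `filter(df)` method that sets a boolean `inlier_mask_` (the attribute name appears in these commits; the method name is an assumption). `FixedMask`, `too_fast`, the `speed` column and this `get_points_to_filter` signature are all hypothetical. It also covers a `None` backup and returning the selected algorithm, which later commits in this PR add, and it assumes the main result is kept if the backup also fails.

```python
import numpy as np
import pandas as pd

class FixedMask:
    """Hypothetical stand-in for ZigZag/POSDAP that returns a preset mask."""
    def __init__(self, name, mask):
        self.name = name
        self._mask = np.asarray(mask, dtype=bool)

    def filter(self, df):
        # A real algorithm would analyse df; this toy just records its mask
        self.inlier_mask_ = self._mask

def get_points_to_filter(df, main_algo, backup_algo, has_big_jumps):
    """Sketch: return (outlier row positions, algorithm whose result is used).

    has_big_jumps(df, inlier_mask) -> bool is a hypothetical check for
    whether the retained points still contain unacceptable jumps.
    """
    main_algo.filter(df)
    selected = main_algo
    if backup_algo is not None and has_big_jumps(df, main_algo.inlier_mask_):
        backup_algo.filter(df)
        # Assumption: if the backup also fails, keep the main result
        if not has_big_jumps(df, backup_algo.inlier_mask_):
            selected = backup_algo
    return np.nonzero(~selected.inlier_mask_)[0], selected

def too_fast(df, inlier_mask):
    # Toy check: any retained point faster than ~343 m/s counts as a big jump
    return bool((df["speed"].to_numpy()[inlier_mask] > 343).any())

df = pd.DataFrame({"speed": [1.0, 2.0, 9000.0, 1.5]})
zigzag = FixedMask("zigzag", [True, True, True, True])    # misses the jump
posdap = FixedMask("posdap", [True, True, False, True])

outliers, algo = get_points_to_filter(df, zigzag, posdap, too_fast)
print(outliers, algo.name)    # [2] posdap
outliers, algo = get_points_to_filter(df, zigzag, None, too_fast)
print(outliers, algo.name)    # [] zigzag
```

Because the backup is just another argument, swapping POSDAP for a different algorithm, or passing `None`, needs no change to the algorithms themselves.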
    
    Testing done:
    - modified the test to pass in the backup algo
    - tests pass
    shankari committed Jan 24, 2023 (cebb81f)
  3. Added unit test for None backup algo + unify algo outputs

    Added a new unit test for the case of `backup_algo == None`, which should
    return the original algo results.
    
    While testing, we found that the ZigZag algo returns a pandas Series,
    while the POSDAP algo returns a numpy array, which means that combining them
    could be problematic.
    
    Changed ZigZag to also return a numpy array to unify the implementations.
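A small illustration of why mixing the two return types is risky: a Series keeps the index of the dataframe it came from, so `mask[0]` on a Series is a label lookup, while on a numpy array it is positional. `as_inlier_array` is a hypothetical helper; the actual change, per the "Fix regressions in tests" commit, is calling `.to_numpy()` on `inlier_mask_` inside ZigZag.

```python
import numpy as np
import pandas as pd

def as_inlier_array(mask):
    """Hypothetical helper: normalize an inlier mask to a 1-D boolean ndarray.

    Accepts a pandas Series (what ZigZag returned before this change) or
    anything array-like (such as the numpy array POSDAP returns).
    """
    if isinstance(mask, pd.Series):
        # Drop the index so downstream code can only index by position
        return mask.to_numpy(dtype=bool)
    return np.asarray(mask, dtype=bool)

# A Series built from a filtered dataframe keeps its original labels (14, 15, 16)
zigzag_mask = pd.Series([True, False, True], index=[14, 15, 16])
posdap_mask = np.array([True, True, False])

# On the raw Series, zigzag_mask[0] is a label lookup and raises KeyError;
# after conversion both masks are indexed by position in the same way
zz = as_inlier_array(zigzag_mask)
pp = as_inlier_array(posdap_mask)
print(zz[0], pp[0])            # True True
print(np.nonzero(~zz)[0])      # [1]
```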
    Testing done:
    - All tests now pass
    shankari committed Jan 24, 2023 (0f1b24a)
  4. 🎨 Return and record the selected algo correctly

    Before this change, we only used one algorithm, so we hardcoded it into the
    result. However, we can now use either the main algorithm or the backup
    algorithm. So `get_points_to_filter` now also returns the selected algo, and
    we attribute the result to it correctly.
    
    `get_points_to_filter` is used only in `location_smoothing` and in the tests,
    so we also fix the tests to read both values and check the selected algo in each case.
    
    Testing done: tests pass
    shankari committed Jan 24, 2023 (988871d)
  5. Fix regressions in tests

    - Unify algo outputs: `self.inlier_mask_ = self.inlier_mask_.to_numpy()`
        - remove `to_numpy()` from all the checks in the tests
    - Return two outputs -> `return (None, None)`
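Since callers now unpack two values from `get_points_to_filter`, every exit path has to return a pair; a bare `return None` makes the unpacking itself raise `TypeError`. Both functions below are hypothetical stand-ins, and the empty-input condition is only a placeholder for whatever triggers the early exit in the real code.

```python
def old_style(points):
    """Hypothetical early exit that still returns a single value."""
    if len(points) == 0:
        return None
    return [0], "zigzag"

def new_style(points):
    """Hypothetical early exit fixed to return two values."""
    if len(points) == 0:
        return (None, None)
    return [0], "zigzag"

# Callers unpack two values, so the single-value early exit breaks them
try:
    outliers, algo = old_style([])
except TypeError as e:
    unpack_error = str(e)
    print("old version failed:", unpack_error)

outliers, algo = new_style([])
print("new version:", outliers, algo)   # new version: None None
```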
    
    Testing done:
    - All tests in this file pass
    shankari committed Jan 24, 2023 (95f88c5)

Commits on Jan 25, 2023

  1. Fix regression caused by moving the second round checking out

    When we moved the second round checks to the calling function in
    cebb81f,
    we caused a very subtle regression.
    
    The filtering code had an early return if no jumps were detected,
    so in that case we would not try the second round of checks or attempt to
    filter again.
    
    However, when we moved the second round checking to the outer function, we
    ran the second round even if the first round didn't detect any jumps.
    In this one case, the second round actually found an outlier, which
    caused the test to fail.
    
    Fixed by checking to see if there were no outliers in the first round and
    skipping the second round check in that case.
    
    Everything in the `else` branch of
    `if outlier_arr[0].shape[0] == 0:` is unchanged; it is just indented one more level.
    
    The check for the length was unexpectedly complicated and took many hours to
    debug, so I added it as a simple use case.
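One possible reason a length check like this is easy to get wrong, assuming `outlier_arr` is the tuple returned by `np.nonzero` (or single-argument `np.where`): the tuple always has one entry per array dimension, so `len(outlier_arr)` is 1 even when nothing was flagged. The number of outliers is the length of the first index array. `should_run_second_round` is a hypothetical helper, not the code in this commit.

```python
import numpy as np

inlier_mask = np.array([True, True, True, True])
outlier_arr = np.nonzero(~inlier_mask)   # tuple with one index array per dimension

print(len(outlier_arr))          # 1, even though there are no outliers
print(outlier_arr[0].shape[0])   # 0 outliers

def should_run_second_round(inlier_mask):
    outlier_arr = np.nonzero(~np.asarray(inlier_mask, dtype=bool))
    # No jumps detected in the first round: skip the second round entirely
    return outlier_arr[0].shape[0] > 0
```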
    
    Note also that it is not clear if this is the correct long-term approach.
    If there were no jumps, then why did using the backup change anything?
    Maybe we should always use the backup.
    
    For now, though, we make this change to avoid the regression, and will revisit it
    the next time we work on smoothing.
    
    Testing done:
    - `TestPipelineRealData.testIosJumpsAndUntrackedSquishing` passes
    - `TestLocationSmoothing` passes
    shankari committed Jan 25, 2023 (5a4ae3d)
  2. 🔥 Remove unused function and extraneous logs

    - `get_filtered_points` is not used anywhere
    - we no longer need to print out both the Series and the numpy version, now that we have added the unit test in
    5a4ae3d
    shankari committed Jan 25, 2023 (29e78de)