Trip Segmentation Optimization #1014

JGreenlee · 2025-01-30T19:26:36Z


I noticed that in `trip_segmentation`, we query the user timeseries for `background/filtered_location`. We then call `segment_into_trips`, which performs the same query on the same timeseries with the same time query. Passing `loc_df` as an argument eliminates the need for this second query.
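As a rough illustration of the duplicate query described above, here is a minimal sketch. `CountingTimeseries` is a hypothetical stand-in for the user timeseries, plain lists of dicts stand in for dataframes, and the `segment_into_trips` signature is an assumption for illustration, not the actual signature in the codebase.

```python
class CountingTimeseries:
    """Hypothetical stand-in for the user timeseries; counts queries issued."""

    def __init__(self, entries):
        self._entries = entries
        self.query_count = 0

    def get_data_df(self, key, time_query):
        # Each call simulates one DB round trip.
        self.query_count += 1
        start_ts, end_ts = time_query
        return [e for e in self._entries
                if e["key"] == key and start_ts <= e["ts"] <= end_ts]


def segment_into_trips(ts, time_query, loc_df=None):
    # Assumed signature: query only when the caller did not pass the data in.
    if loc_df is None:
        loc_df = ts.get_data_df("background/filtered_location", time_query)
    return len(loc_df)  # placeholder for the real segmentation logic


entries = [{"key": "background/filtered_location", "ts": t} for t in range(10)]
time_query = (2, 7)

# Before: the caller queries, then segment_into_trips repeats the same query.
ts_before = CountingTimeseries(entries)
ts_before.get_data_df("background/filtered_location", time_query)
segment_into_trips(ts_before, time_query)

# After: the caller queries once and passes the result along.
ts_after = CountingTimeseries(entries)
loc_df = ts_after.get_data_df("background/filtered_location", time_query)
n_points = segment_into_trips(ts_after, time_query, loc_df=loc_df)

print(ts_before.query_count, ts_after.query_count)
```

Keeping `loc_df` optional in the sketch means existing callers that do not pass it would still work, at the cost of the extra query.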

`restart_checking` has two functions that are called repeatedly from both segmentation filter classes: `is_tracking_restarted_in_range` and `get_ongoing_motion_in_range`. Both of these functions performed a DB query (for `statemachine/transition` and `background/motion_activity` entries, respectively). In the case of the former, we already keep all the `statemachine/transition` entries in the current processing time range in memory as a dataframe. This can be passed as an (optional) argument; we then just have to filter it down to the range of start_ts -> end_ts. We do not keep all the `background/motion_activity` entries as a dataframe, but we can load them all at once and filter down later, just as we do with the `statemachine/transition` entries. This will use more memory, but I think it is likely still more efficient than multiple queries because DB calls are a bottleneck on production. I will investigate the effect of these changes further, but by my initial estimates, this drastically reduces the number of DB queries during trip segmentation (by a factor of ~100).
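The preload-then-filter idea can be sketched as follows. This is only an illustration under stated assumptions: `FakeTimeseries`, its `find` method, the `is_restart` field, and the exact parameter list of `is_tracking_restarted_in_range` are all hypothetical, and lists of dicts stand in for dataframes.

```python
class FakeTimeseries:
    """Hypothetical in-memory stand-in for the DB-backed timeseries."""

    def __init__(self, entries):
        self.entries = entries
        self.query_count = 0

    def find(self, key, start_ts, end_ts):
        self.query_count += 1  # one simulated DB round trip per call
        return [e for e in self.entries
                if e["key"] == key and start_ts <= e["ts"] <= end_ts]


def is_tracking_restarted_in_range(start_ts, end_ts, ts, transition_df=None):
    if transition_df is None:
        # Lazy path: one query per call.
        transition_df = ts.find("statemachine/transition", start_ts, end_ts)
    # Preloaded path: narrow the in-memory rows to start_ts -> end_ts.
    in_range = [t for t in transition_df if start_ts <= t["ts"] <= end_ts]
    return any(t["is_restart"] for t in in_range)


entries = [
    {"key": "statemachine/transition", "ts": 50, "is_restart": True},
    {"key": "statemachine/transition", "ts": 120, "is_restart": False},
]
windows = [(start, start + 10) for start in range(0, 200, 10)]  # 20 windows

# Lazy: every check issues its own query.
lazy_ts = FakeTimeseries(entries)
lazy = [is_tracking_restarted_in_range(s, e, lazy_ts) for s, e in windows]

# Preloaded: one query for the whole range, then in-memory filtering.
preloaded_ts = FakeTimeseries(entries)
all_transitions = preloaded_ts.find("statemachine/transition", 0, 210)
preloaded = [
    is_tracking_restarted_in_range(s, e, preloaded_ts, all_transitions)
    for s, e in windows
]

print(lazy_ts.query_count, preloaded_ts.query_count)
```

Both paths return the same answers in this sketch; the difference is that the number of queries no longer grows with the number of checks. The same pattern would apply to `get_ongoing_motion_in_range` with preloaded `background/motion_activity` entries.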

shankari

This change seems fine; it is a pretty straightforward change in which we pre-load a bunch of values and then re-use them instead of loading them lazily.

I am fine with merging this so we can see the impact on the DB queries.

As part of cleanup, I would like to see:

- What was the justification for this change? In particular, what were the timing results from our sample programs (stage, ccebike, smart commute, stm-community) that led you to focus on these areas?
- I see that we are querying for `loc_df` in `trip_segmentation` and passing it in to the time/distance filters, and then we are querying for transition and motion activity inside `segment_into_trips`. Why? It seems like we can just read all input data from that time range and pass it into `segment_into_trips` at the same time, in three different dataframes. This will not have an impact on performance, but it is cleaner and easier to understand.
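One possible shape for the cleanup suggested in the second point, sketched under assumptions: `SegmentationInputs`, `load_segmentation_inputs`, and the `read_rows` callable are hypothetical names introduced here, and lists stand in for the three dataframes.

```python
from typing import NamedTuple


class SegmentationInputs(NamedTuple):
    """Hypothetical bundle of the three inputs for one processing time range."""
    loc_df: list
    transition_df: list
    motion_df: list


def load_segmentation_inputs(read_rows, time_query):
    # Read every input for the time range in one place, up front.
    return SegmentationInputs(
        loc_df=read_rows("background/filtered_location", time_query),
        transition_df=read_rows("statemachine/transition", time_query),
        motion_df=read_rows("background/motion_activity", time_query),
    )


def segment_into_trips(inputs):
    # Placeholder for the real logic: it only consumes what it was given
    # and issues no queries of its own.
    return {
        "locations": len(inputs.loc_df),
        "transitions": len(inputs.transition_df),
        "motion": len(inputs.motion_df),
    }


calls = []


def fake_read_rows(key, time_query):
    # Stand-in reader that records which keys were requested.
    calls.append(key)
    return [{"key": key, "ts": time_query[0]}]


inputs = load_segmentation_inputs(fake_read_rows, (0, 100))
summary = segment_into_trips(inputs)
print(calls, summary)
```

With this layout, all reads happen in one function, so it is easy to see which inputs the segmentation step depends on.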

shankari · 2025-02-07T06:51:49Z

I checked staging, and this seems to be working correctly. The logs show a successful run:

2025-02-06 03:09:22,152:INFO:140272741308224:For stage PipelineStages.TRIP_SEGMENTATION, start_ts is None
2025-02-06 03:09:23,516:INFO:140272741308224:For stage PipelineStages.TRIP_SEGMENTATION, last_ts_processed = 2025-02-06T02:16:41.052000

2025-02-06 05:09:04,586:INFO:140433989416768:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2025-02-06T02:16:41.052000
2025-02-06 05:09:10,450:INFO:140433989416768:For stage PipelineStages.TRIP_SEGMENTATION, last_ts_processed = 2025-02-06T04:59:49.876000

And I see two composite trips retrieved by the UI

10623,1738860615.793,2025-02-06T08:50:15.793000-08:00,"js : getRawEntries, args: [[""analysis/composite_trip""],1738810605.3665705,1738817817.996,""data.end_ts""];
      
got 2 entries"
10624,1738860615.798,2025-02-06T08:50:15.798000-08:00,"js : Timeline: readCompositePromise resolved with 2 trips; 
      readUnprocessedPromise resolved with 0 trips"

I can't see them in the UI because of e-mission/e-mission-docs#1108

@JGreenlee can you verify that you saw the trips from yesterday (Wednesday) in the UI?

shankari · 2025-02-07T06:54:30Z

@TeachMeTW as an aside, while evaluating this fix, I notice that we store the has_trip_ended stat for every point considered. Given that, we may want to add them up in addition to looking at the individual checks
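Adding up the per-point stats could look roughly like the sketch below. The entry layout (a `name` and a numeric `reading` per stat) and the sample values are assumptions made for illustration, not the actual stored format.

```python
from collections import defaultdict


def summarize_stats(stats):
    """Return the call count and summed reading for each stat name."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entry in stats:
        totals[entry["name"]] += entry["reading"]
        counts[entry["name"]] += 1
    return {name: {"count": counts[name], "total": totals[name]}
            for name in totals}


# Made-up sample readings, one has_trip_ended entry per point considered.
stats = [
    {"name": "has_trip_ended", "reading": 0.5},
    {"name": "has_trip_ended", "reading": 0.25},
    {"name": "has_trip_ended", "reading": 0.25},
    {"name": "segment_into_trips", "reading": 2.0},
]

summary = summarize_stats(stats)
print(summary["has_trip_ended"])
```

Looking at the total alongside the individual readings shows how much of the stage's time the repeated check accounts for overall, even when each individual call is short.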

JGreenlee · 2025-02-07T16:15:35Z

> @JGreenlee can you verify that you saw the trips from yesterday (Wednesday) in the UI?

Yes, they are there and processed on all the phones I brought.

TeachMeTW · 2025-02-07T18:50:02Z

> @TeachMeTW as an aside, while evaluating this fix, I notice that we store the has_trip_ended stat for every point considered. Given that, we may want to add them up in addition to looking at the individual checks

~~@shankari to clarify, summarize how many times has_trip_ended was True?~~

@shankari @JGreenlee is no longer using `has_trip_ended` in his segmentation optimization, so we believe this metric and flag are unneeded.

JGreenlee force-pushed the segmentation_optimization branch from 740c75d to 6243b40 Compare January 30, 2025 19:59

JGreenlee added 3 commits January 30, 2025 15:33

update trip segmentation tests

11fd09e

JGreenlee force-pushed the segmentation_optimization branch from 6243b40 to 11fd09e Compare January 30, 2025 20:34

JGreenlee mentioned this pull request Jan 31, 2025

Pipeline Optimization Strategies e-mission/e-mission-docs#1105

Open

JGreenlee marked this pull request as ready for review January 31, 2025 22:06

shankari approved these changes Jan 31, 2025


shankari merged commit c941e5d into e-mission:master Jan 31, 2025
5 checks passed
