use labels to override trip mode #476

Open
st-patrick opened this issue Nov 28, 2019 · 43 comments
Labels
enhancement New feature or request

Comments

@st-patrick
Contributor

We want to use the transportation mode label to actually impact the metrics shown inside the dashboard.

Since the labels are currently "decoration only", we need to implement some code that e.g. adds an endpoint that will basically do the same as the userMetrics endpoint but use the label as mode of transportation if a label has been applied to that trip.

My question being: in which file would I best start to look for a place to implement this feature?

@shankari
Contributor

shankari commented Dec 4, 2019

@st-patrick the metrics code in general is here
https://github.com/e-mission/e-mission-server/tree/master/emission/analysis/result/metrics

simple_metrics.py is where we actually generate the metrics - it depends upon the mode_section_grouped_df dataframe being computed correctly.

https://github.com/e-mission/e-mission-server/blob/8747e2279393f05f86e6be58f94de77909a4d455/emission/analysis/result/metrics/time_grouping.py#L118
mode_grouped_df = section_group_df.groupby('sensed_mode')

And here's where we group the sections by the sensed mode.
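To illustrate what that grouping could look like once a label is allowed to override the sensed mode, here is a minimal sketch in plain Python (the server code groups a pandas dataframe instead). The `user_label` field and the sample records are assumptions made for illustration, not the server's actual schema.

```python
from collections import defaultdict

# Hypothetical section records; the field names are illustrative only.
sections = [
    {"sensed_mode": "CAR", "user_label": "BUS", "distance": 5000.0},
    {"sensed_mode": "WALKING", "user_label": None, "distance": 400.0},
    {"sensed_mode": "CAR", "user_label": None, "distance": 12000.0},
]


def effective_mode(section):
    """Use the user's label when present, otherwise the sensed mode."""
    label = section.get("user_label")
    return label if label is not None else section["sensed_mode"]


def distance_by_mode(records):
    """Sum section distances per effective mode."""
    totals = defaultdict(float)
    for record in records:
        totals[effective_mode(record)] += record["distance"]
    return dict(totals)


totals = distance_by_mode(sections)
```

Grouping on the effective mode rather than on `sensed_mode` is the only change the metrics would need in this simplified picture.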

@shankari
Contributor

shankari commented Dec 4, 2019

Let me think through the best way to modify that, for the record.

  • we should be consistent with the rest of the pipeline, and create new sections analysis/confirmed_sections where the mode is set to the overridden value. Then the change to the metrics code is trivial, we would just set the analysis results to analysis/confirmed_section and everything would Just Work.
  • But then we need to figure out how and when to generate these analysis/confirmed_section objects. Note that the trips may not yet be confirmed when the section is analysed. In that case, we can:
    • generate analysis/confirmed_section objects only for sections that have been confirmed. This is not a great solution because it gets us back to the situation where we have to query for two kinds of objects and merge them. We could theoretically do this for the existing manual/trip_confirm objects anyway.
    • generate analysis/confirmed_section objects for all sections. Essentially, we would have a 1:1:1 mapping between analysis/cleaned_section, analysis/inferred_section and analysis/confirmed_section. This seems like a much better option in terms of usability, so let us explore it further.

The biggest challenge with this approach is that the manual/trip_confirm objects are not synchronized with the analysis pipeline. So we need to handle the case where they appear before the analysis is run, as well as after the analysis is run.

So a rough outline of the proposed design is:

  • add a new wrapper class, confirmed_section, with fields for sensed_mode, overridden_mode and final_mode
  • add a new pipeline step at the end that creates confirmed_section objects with the following algorithm for filling in the final_mode:
    • final_mode = overridden_mode if overridden_mode is not None else sensed_mode
  • add a new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section. Make sure to retain the manual/trip_confirm objects since our rule is that all incoming data is read-only and can never be modified.

You can then use the new objects by using the confirmed_section and final_mode instead of inferred_section and sensed_mode in the metrics. This should also potentially make the client code easier since you can retrieve confirmed_section directly instead of retrieving inferred_section and the manual/* override objects separately and merging them on the phone.
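As a rough illustration of the proposed wrapper, the sketch below models `confirmed_section` as a plain dataclass. This is an assumption for readability: the server's wrapper classes follow their own conventions, which this does not attempt to reproduce, and `final_mode` is computed here as a property rather than stored as a field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfirmedSection:
    """Hypothetical stand-in for the proposed confirmed_section wrapper."""
    sensed_mode: str
    overridden_mode: Optional[str] = None

    @property
    def final_mode(self) -> str:
        # The user's override wins; otherwise fall back to the sensed mode.
        if self.overridden_mode is not None:
            return self.overridden_mode
        return self.sensed_mode


unlabeled = ConfirmedSection(sensed_mode="CAR")
labeled = ConfirmedSection(sensed_mode="CAR", overridden_mode="BUS")
```

Deriving `final_mode` on read keeps the two inputs from drifting apart; storing it as a third field, as outlined above, would instead make it directly queryable.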

This is also consistent with the reproducible pipeline because:

  • even if we delete all the analysis/confirmed_section objects as part of deleting all analysis/* objects, the manual/* objects are retained
  • the first step of the pipeline (that processes incoming objects) will not find any manual/* objects, BUT
  • the final step of the pipeline (that matches against existing confirmations) will find the existing manual/* objects and incorporate them while creating analysis/confirmed_section objects. This is crucial, and one of the main reasons, in addition to race conditions and cases without internet connectivity, why we cannot rely only on processing the manual/* objects as they come in.

That seems like a pretty solid design with no holes.
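The interplay of the two steps and a pipeline reset can be sketched with in-memory dictionaries. Everything here is a simplification under stated assumptions: the stores, the function names, and keying confirmations by a section id are hypothetical and stand in for the real timeseries and matching logic.

```python
# In-memory stand-ins for the timeseries (hypothetical, for illustration).
manual_confirms = {}     # section_id -> label; incoming data, never modified
inferred_sections = {}   # section_id -> sensed mode
confirmed_sections = {}  # section_id -> dict; analysis output, can be deleted


def handle_incoming_confirmation(section_id, label):
    """First step: keep the label and patch an existing confirmed section."""
    manual_confirms[section_id] = label
    if section_id in confirmed_sections:
        entry = confirmed_sections[section_id]
        entry["overridden_mode"] = label
        entry["final_mode"] = label


def generate_confirmed_sections():
    """Last step: create missing confirmed sections, using stored labels."""
    for section_id, sensed in inferred_sections.items():
        if section_id in confirmed_sections:
            continue
        label = manual_confirms.get(section_id)
        confirmed_sections[section_id] = {
            "sensed_mode": sensed,
            "overridden_mode": label,
            "final_mode": label if label is not None else sensed,
        }


def reset_analysis():
    """Pipeline reset: drop analysis output, keep the manual inputs."""
    confirmed_sections.clear()


inferred_sections["s1"] = "CAR"
generate_confirmed_sections()                # label not received yet
before_label = confirmed_sections["s1"]["final_mode"]
handle_incoming_confirmation("s1", "BUS")    # label arrives later
after_label = confirmed_sections["s1"]["final_mode"]
reset_analysis()
generate_confirmed_sections()                # replay after a reset
after_reset = confirmed_sections["s1"]["final_mode"]
```

After the reset, the override is recovered by the last step alone, because the manual input was retained.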

@st-patrick let me know if you have additional questions
@PatGendre @kafitz FYI for understanding design decisions for the future 😄

@shankari
Contributor

shankari commented Dec 4, 2019

wrt

This should also potentially make the client code easier since you can retrieve confirmed_section directly instead of retrieving inferred_section and the manual/* override objects separately and merging them on the phone.

you would change inferred -> confirmed and sensed -> final in the GeoJSON export as well
https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/plotting/geojson/geojson_feature_converter.py

Then all the data returned from the /timeline/getTrips/<day> call will already have the overridden values in place. While reading unprocessed data, though, you will have to retain the existing code, since unprocessed = pipeline not run = no inferred sections or confirmed sections. But there might be performance benefits to not having to retrieve the manual/* objects for processed data.

@shankari
Contributor

shankari commented Dec 4, 2019

@st-patrick you probably want to send out a draft PR once you have a significant chunk of the code written so that I can review and give feedback. Since this is a non-trivial change, probably best to make the development/review cycle interactive.

@PatGendre
Contributor

@shankari thanks, this will be a great feature, and thanks @st-patrick for working on it :-)
It may also be useful for the bicycle survey in development in Nantes.

The solution you describe seems fine,
still I have a question about the timing between pipeline and mode_confirm:
if the pipeline is run daily, say every night, it may well happen that the user confirms the mode for a trip the day after, or even a few days after, depending on the application (i.e. a weekly survey). I understand that the pipeline does not process data prior to the last processing date (except when reset), so here, the daily pipeline will not process the confirmed_section older than the last day, will it?

Another suggestion :
is it possible to add two more fields in the confirmed_section: purpose (as it can be useful for many use cases), and say "custominfo", a field that could be used for asking the user any additional info at trip confirm; this would spare the developer from creating another field in the database, he/she would then just have to complete the display/dashboard feature if needed but not modify the data structure.
I must admit I do not have a direct use case for this request but it seems likely to be useful.

@shankari
Contributor

shankari commented Dec 5, 2019

still I have a question about the timing between pipeline and mode_confirm:
I understand that the pipeline does not process data prior to the last processing date (except when reset), so here, the daily pipeline will not process the confirmed_section older than the last day, will it?

You are absolutely correct that the existing section segmentation and inference steps will not process the older confirmed_sections. But that's why I propose adding a "new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section."

Note that every pipeline step manages its own last_processed_ts. This new pipeline step at the beginning would either check the objects before moving them from the usercache to the timeseries (e.g. before move to long_term) or it would update its own last_processed_ts based on the write timestamp.
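A minimal sketch of per-step state tracked by write timestamp follows. The `StepState` class and the entry fields are hypothetical and only illustrate why a late confirmation for an old trip is still picked up: the step looks at when the entry was written, not at when the trip happened.

```python
class StepState:
    """Tracks how far one pipeline step has processed, by write timestamp."""

    def __init__(self):
        self.last_processed_ts = None

    def unprocessed(self, entries):
        """Return the entries written after the last processed timestamp."""
        if self.last_processed_ts is None:
            return list(entries)
        return [e for e in entries if e["write_ts"] > self.last_processed_ts]

    def mark_processed(self, entries):
        """Advance the state to the newest write timestamp seen."""
        if entries:
            self.last_processed_ts = max(e["write_ts"] for e in entries)


confirm_state = StepState()

# A label written at ts=300 for a trip that ended much earlier, at ts=100.
confirms = [{"write_ts": 300.0, "trip_end_ts": 100.0, "label": "bike"}]

first_batch = confirm_state.unprocessed(confirms)
confirm_state.mark_processed(first_batch)
second_batch = confirm_state.unprocessed(confirms)
```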

So to return to your use case

if the user confirms the mode for a trip the day after, or even a few days after, depending on the application (i.e. a weekly survey)

then the confirmation will be saved as manual/mode_confirm in the local phone DB, it will be pushed up to the usercache as usual. When it is moved from the usercache to the timeseries, or right after that, the new pipeline step will find the corresponding section and modify it.

So, before the manual/mode_confirm is processed, the confirmed_section will have the automatically inferred value; after it is processed, it will have the overridden value. But it is an analysis output which will be deleted when the pipeline is reset, so it doesn't break the reproducibility guarantees.

@PatGendre Does this make sense?

@shankari
Contributor

shankari commented Dec 5, 2019

is it possible to add two more fields in the confirmed_section: purpose (as it can be useful for many use cases), and say "custominfo", a field that could be used for asking the user any additional info at trip confirm; this would spare the developer from creating another field in the database, he/she would then just have to complete the display/dashboard feature if needed but not modify the data structure.

Definitely makes sense to add purpose. Not that sure about custom info because the standard algorithms can't process it without understanding its structure. Can we defer that until it is needed so that we don't overengineer?

@PatGendre
Contributor

@shankari

that's why I propose adding a "new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section." ... So, before the manual/mode_confirm is processed, the confirmed_section will have the automatically inferred value; after it is processed, it will have the overridden value. But it is an analysis output which will be deleted when the pipeline is reset, so it doesn't break the reproducibility guarantees.
@PatGendre Does this make sense?

Yes, it's very clever, I think it will work for the proposed use too :-)

Definitely makes sense to add purpose. Not that sure about custom info . Can we defer that until it is needed so that we don't overengineer?

Yes

@shankari
Contributor

shankari commented Dec 7, 2019

Just for the record, I just wanted to point out a slight difference between this step and other previous steps. The previous steps are identical under replay. So every time you run the pipeline, the output of every intermediate step will be identical to the previous runs.

But for the proposed new handling_incoming_confirmations and generate_confirmed_sections steps, the first run after the confirmation is received will potentially be different from subsequent runs. With @PatGendre's use case, in the first run, the confirmed_section will be changed by the handling_incoming_confirmations step; in subsequent runs, it will be generated by the generate_confirmed_sections step.

This is not really an issue - as we have seen, the design will still work. But it is a subtle little difference that we should note for interaction with subsequent design decisions.

@shankari
Contributor

shankari commented Dec 7, 2019

@jf87 fyi in case this is useful for you too 😄

@st-patrick
Contributor Author

Quite frankly I don't understand where I would even begin with these changes.

Over the last few days, I have tried to get the trip data with the labels at every request but couldn't find anything, since dataframes and timeseries are not serializable, which made debugging quite a headache.

Is there any way we could implement a serialization for that?

Also, I don't think I will have time to work on the above mentioned solution, simply because we only have one week left and I just don't have the comprehension of the server code that is needed for that.

But if you can draft something a little more practical and add some hints for debugging, especially how to access dataframe and timeseries data, that would be a great help. There's probably a really simple way I didn't see. I saw that in time_grouping you used .iloc[i] but since the data doesn't contain any labels at that point, it didn't really help my case.

@shankari
Contributor

shankari commented Dec 13, 2019

@st-patrick dataframes are pandas dataframes. There are tons of pandas tutorials on the internet - it is part of the standard data science toolkit.

https://duckduckgo.com/?q=pandas+dataframe+tutorial&t=ffsb&ia=web should help you get started.

not sure what you mean by serializable, do you mean that they don't print properly? How are you trying to print them? I don't have time to test this right now, but per stackoverflow, print(df) should work and that's what I remember as well.
https://stackoverflow.com/questions/49826909/how-to-print-out-dataframe-in-python

if you can draft something a little more practical

Here are even more detailed steps:

  • add a new wrapper class for confirmed_section (similar to e-mission-server#517, "Add support for the new mode_confirm and purpose_confirm classes")
  • add a new pipeline step in emission/pipeline/intake_stage.py
    • emission/analysis/classification/inference/mode/rule_engine.py is an example of a simple pipeline step, where in runPredictionPipeline, you find the unprocessed sections, process them and save the results
    • for the processing, for each unprocessed inferred section, find the corresponding confirm object (if any) and create a confirmed_section
  • change the analysis.result.section.key from conf/analysis/debug.conf.json to analysis/confirmed_section

This will get you a working solution in the case where the confirmation happens almost immediately.

Once I review that, the second pipeline step should be much easier and will handle the other case in which the confirmation comes later.

@shankari
Contributor

shankari commented Aug 12, 2020

Since there have been requests for this from both DFKI and Heidelberg, and now I need it for the CEO e-bike project, I am going to tackle this issue now.

@lefterav @jf87 @EstherEu @PatGendre

@PatGendre
Contributor

@shankari that's great :-)
this would be useful, I guess, if you can include purpose, not only the overridden mode.

@shankari
Contributor

shankari commented Dec 2, 2020

Picking this up again, one challenge is that the confirmation currently happens at the trip level and not the section level. How do we then deal with creating confirmed_section objects correctly?

One option is to only create confirmed_trip objects, not confirmed_section. If/when we support trip editing, we can create confirmed_section objects.

But then how would we deal with confirmed_trip objects that don't have user input associated with them? They can have multiple sections and we would want to use them.

This will be fixed if/when we have trip editing in place but we need to figure out what to do as a temporary workaround.

@shankari
Contributor

shankari commented Dec 3, 2020

Another challenge is that some of the data is genuinely represented at the trip level. For example, a trip has a purpose, not a section.

@PatGendre
Contributor

@shankari I agree with you, the major difficulty may be that the mode and purpose buttons are at the trip level, so there is no clear way to induce modes at the section level.

Actually the mode button should be named "principal mode for the trip" and possibly could be pre-filled with the mode totalling most of the trip length, so that we could have a relation between trip (principal) mode and section modes (thus possibly changing section modes - i.e. confirmed section mode - if the principal mode is modified) ... but that would be complicated to implement and to understand for the end user!

@shankari
Contributor

shankari commented Dec 3, 2020

@PatGendre @robfitzgerald @jf87 @asiripanich since all of you have worked with the data model, feedback would be appreciated

Here's a more detailed design

we will have both confirmed_trip and confirmed_section

  • confirmed_trip will have a field for confirmed_vals.

    • On master, this will have primary_mode and purpose entries. For branches that use embedded surveys, this will have the survey JSON.
    confirmed_trip
       - confirmed_vals
          - primary_mode
          - purpose
    
    • confirmed_trip will also have an inferred_vals with a primary_mode entry on master. This will be the same even for branches that use embedded surveys, since they do not include any additional inference algorithms.
    • the primary mode will be the mode of the primary section, as determined below
  • confirmed_section will also have user_input. The user_input will have only a mode entry. Only the primary section of a trip will have the user_input set.

Determining the primary section of a trip:

  • if there is only one section, it is primary. Due to COVID, we are likely to have many unimodal trips now.
  • if there is more than one section, it is the longest section

Using the confirmed sections for calculations:

For most of the pre-defined modes, we can determine what the calculation factors (energy efficiency/carbon emissions) are. But we do allow users to enter their own modes, and it is not clear how we can handle those in calculations.

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead.
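The primary-section rule and the "user_input only on the primary section" rule can be sketched as follows. This is a simplified illustration, not the server implementation: the dictionaries and field names are assumptions, and "longest" is taken to mean longest by distance, as clarified later in this thread.

```python
def primary_section_index(sections):
    """Index of the primary section: the only one, or the longest by distance."""
    if not sections:
        raise ValueError("a trip needs at least one section")
    return max(range(len(sections)), key=lambda i: sections[i]["distance"])


def build_confirmed_sections(sections, trip_mode_label):
    """Copy each section, attaching the trip-level label to the primary one."""
    primary = primary_section_index(sections)
    confirmed = []
    for i, section in enumerate(sections):
        user_input = {}
        if i == primary and trip_mode_label is not None:
            user_input = {"mode": trip_mode_label}
        # The original section dict is left untouched.
        confirmed.append({**section, "user_input": user_input})
    return confirmed


trip_sections = [
    {"sensed_mode": "WALKING", "distance": 300.0},
    {"sensed_mode": "CAR", "distance": 8000.0},
    {"sensed_mode": "WALKING", "distance": 150.0},
]
confirmed = build_confirmed_sections(trip_sections, "shared_ride")
```

Only the car section receives the label here; the walking legs keep an empty `user_input` and therefore their sensed mode.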

Whoo! That was a bit complicated but I think it works for now.

@shankari
Contributor

shankari commented Dec 3, 2020

From a UI perspective, we will continue to show the confirmed_vals (if they exist) in the big buttons of trip diary and the inferred mode from the sections on the top - so this just simplifies and pre-computes the values for now.

However, in the dashboard and the CEO ebike gamification, we can switch to confirmed trips and confirmed sections. Concretely, for the CEO ebike gamification, we can use confirmed_trips to determine the number of ebike trips and the % of travel by ebike.
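As a rough sketch of the gamification numbers mentioned above, the function below counts ebike trips and the share of travel by ebike from a list of confirmed trips. The `"ebike"` label, the `distance` field, and measuring the share by distance are all assumptions made for the example.

```python
def ebike_summary(confirmed_trips, ebike_label="ebike"):
    """Return (number of ebike trips, percent of total distance by ebike)."""
    total_distance = sum(trip["distance"] for trip in confirmed_trips)
    ebike_trips = [
        trip for trip in confirmed_trips
        if trip["confirmed_vals"].get("primary_mode") == ebike_label
    ]
    ebike_distance = sum(trip["distance"] for trip in ebike_trips)
    if total_distance > 0:
        percent = 100.0 * ebike_distance / total_distance
    else:
        percent = 0.0
    return len(ebike_trips), percent


trips = [
    {"distance": 3000.0,
     "confirmed_vals": {"primary_mode": "ebike", "purpose": "work"}},
    {"distance": 1000.0, "confirmed_vals": {}},  # not confirmed by the user
    {"distance": 6000.0, "confirmed_vals": {"primary_mode": "ebike"}},
]
ebike_count, ebike_percent = ebike_summary(trips)
```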

For the dashboard, we can switch to getting the metrics from confirmed_sections. But if we only get metrics, how do we do the mapping from unknown modes to the corresponding sensed mode? I guess we can do that mapping in the metrics calculation for now.

@robfitzgerald

i agree with the general strategy here, to follow the pattern of building these immutable documents and not to lose any information. regarding the algorithmic selection of primary modes, a few thoughts (which both may not be helpful at this point):

if there is more than one section, it is the longest section

longest by distance or time? also, i could imagine this might differ by survey, where users may want to inject their own "primary section function" (like, for instance, "if {driving|transit} appears anywhere, set it as primary").

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead

might be helpful (?) to write this "analysis_mode" to a field so the user has a record of it.

Whoo!

🦉

@shankari
Contributor

shankari commented Dec 4, 2020

longest by distance or time? also, i could imagine this might differ by survey, where users may want to inject their own

Good point. I was going to use distance, since the primary mode is typically motorized, and likely to be much faster than the modes used for the first and last leg.

"primary section function" (like, for instance, "if {driving|transit} appears anywhere, set it as primary").

my assumption was that if people wanted to change this, they would change it in the code. But I guess I could have people pass in a primary section function instead. I might not do that in this first pass though, pending evidence that people actually need it.
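If a pluggable rule were ever needed, passing in a "primary section function" could look like the sketch below. The function names, the set of motorized modes, and the section fields are hypothetical; the default mirrors the longest-by-distance rule discussed above.

```python
def longest_by_distance(sections):
    """Default rule: the section covering the most distance is primary."""
    return max(sections, key=lambda s: s["distance"])


def motorized_first(sections, motorized=("CAR", "BUS", "TRAIN")):
    """Alternative rule: prefer motorized sections if any appear in the trip."""
    motorized_sections = [s for s in sections if s["sensed_mode"] in motorized]
    # Fall back to all sections when the trip has no motorized leg.
    return longest_by_distance(motorized_sections or sections)


def primary_mode(sections, primary_fn=longest_by_distance):
    """Mode of the primary section, as chosen by the supplied rule."""
    return primary_fn(sections)["sensed_mode"]


sections = [
    {"sensed_mode": "BICYCLING", "distance": 6000.0},
    {"sensed_mode": "BUS", "distance": 2000.0},
]
default_choice = primary_mode(sections)
custom_choice = primary_mode(sections, motorized_first)
```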

@shankari
Contributor

shankari commented Dec 4, 2020

I guess we can do that mapping in the metrics calculation for now.

We can actually do this by adding an entry for mode_for_calculation for each section on the server. At some point, we can actually move the CO2 calculation to the server so it can be based on the energy profile of transportation in the trip location.
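A minimal sketch of filling in `mode_for_calculation` on each section is shown below. The factor table contains placeholder values, not real emission data, and the field names other than `mode_for_calculation` are assumptions.

```python
# Placeholder factors for illustration only; not real emission data.
CALCULATION_FACTORS = {"CAR": 1.0, "BUS": 0.5, "WALKING": 0.0}


def mode_for_calculation(confirmed_mode, sensed_mode,
                         factors=CALCULATION_FACTORS):
    """Use the confirmed mode only when a calculation factor exists for it."""
    if confirmed_mode is not None and confirmed_mode in factors:
        return confirmed_mode
    return sensed_mode


def annotate(section):
    """Return a copy of the section with mode_for_calculation recorded."""
    chosen = mode_for_calculation(
        section.get("confirmed_mode"), section["sensed_mode"])
    return {**section, "mode_for_calculation": chosen}


known = annotate({"sensed_mode": "CAR", "confirmed_mode": "BUS"})
custom = annotate({"sensed_mode": "WALKING", "confirmed_mode": "hoverboard"})
unlabeled = annotate({"sensed_mode": "CAR"})
```

Recording the chosen mode on the section also leaves a record of which mode the calculation actually used.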

We originally started calculating values on the client because we were also calculating the calories burned and we didn't want to send over weight and height information to the server. But the CO2/EE calculations are making more and more sense on the server.

@PatGendre
Contributor

hi @shankari

here are a few thoughts:

primary section [...] it is the longest section

It is still not clear to me if the end-user will really understand what "primary section" vs "primary mode" means, and what it implies in terms of metrics calculation.
An alternative would be to link trip mode and section mode only if there is only one section, or if the longest section is obviously the primary section, say if its length is more than half of the trip length. And otherwise to leave the primary section confirmed mode blank ... and wait until there is a full trip/section editing feature.

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead.

This is a reasonable rule.
In the future a feature could be added so that the end-user can state that the mode is similar to an existing mode, for example wheelchair is similar to walking in terms of calculation, and e-scooter similar to ebike...
Anyway, as long as the calculation parameters are not personalised for each end-user, the calculations are only indicative (e.g. for cars the emission figures can vary a lot from one model to another).

We originally started calculating values on the client because we were also calculating the calories burned and we didn't want to send over weight and height information to the server. But the CO2/EE calculations are making more and more sense on the server.

I agree, as long as the weight/height privacy can be managed and/or the end-user agrees to send these personal data to the server.

@shankari
Contributor

shankari commented Dec 7, 2020

I agree, as long as the weight/height privacy can be managed and/or the end-user agrees to send these personal data to the server.

I think we would split the calculations. CO2/EE would be on the server; calorie (which doesn't depend on motorized mode) would be on the phone.

@shankari
Contributor

shankari commented Dec 7, 2020

It is still not clear to me if the end-user will really understand what "primary section" vs "primary mode" means, and what it implies in terms of metrics calculation.
An alternative would be to link trip mode and section mode only if there is only one section, or if the longest section is obviously the primary section, say if its length is more than half of the trip length. And otherwise to leave the primary section confirmed mode blank ... and wait until there is a full trip/section editing feature.

I want to clarify that this will not necessarily affect the UI at this time. The UI currently displays section modes at the top of the card, and displays trip mode overrides at the bottom. The proposed data model will allow us to continue doing that.

  • The main user-visible difference will be in the dashboard and the calculations.
  • The main deployer-level difference will be in the dashboard.

@PatGendre
Contributor

@shankari Thank you for clarifying, I did not get that point.

The main user-visible difference will be in the dashboard and the calculations

Also, the diary screen code could label automatically the primary mode button (with the primary section inferred mode when a section makes >50% of the length of the trip) but it might not be very useful.

@shankari
Contributor

Also, the diary screen code could label automatically the primary mode button (with the primary section inferred mode when a section makes >50% of the length of the trip) but it might not be very useful.

Yes, that would also be confusing, as you pointed out earlier. We already show the inferred mode at the top of the trip card, so it is not like the information is missing. We can change this later if we have time to run some user tests.

@shankari
Contributor

One final consideration while creating the pipeline is how to set the timestamps for pipeline states. We will need to use the write_ts of the user input but the end_ts of the confirmed trips. Let's walk through how this will work in both cases and ensure that there are no surprises.

  • user confirms in draft mode
    • user inputs are copied to the timeseries in the first step
    • no matching confirmed trips, pipeline state is not updated
    • we create confirmed trips in the last step
    • match with the user inputs from the timeseries
    • on the next pipeline run, we will have to handle the user inputs again

Let's try a different approach:

  • user confirms in draft mode

    • user inputs are copied to the timeseries in the first step
    • no matching confirmed trips, pipeline state is updated to the last write_ts since we have "processed" them
    • we create confirmed trips in the last step
    • match with the user inputs from the timeseries
    • on the next pipeline run, user inputs are already processed
  • user confirms in confirmed mode

    • first pipeline run, we create confirmed trips on the last step
    • no match for user inputs
    • user inputs arrive after the first run
    • next pipeline run, they are copied to the timeseries in the first step
    • matching confirmed trips, pipeline state is updated to the last write_ts
    • on the last step, confirmed trips have already been processed, so no change necessary

shankari added a commit to shankari/emdash that referenced this issue Dec 18, 2020
After doing the heavy lifting for implementing
e-mission/e-mission-docs#476
this is actually super simple

- Switch to reading confirmed_trips instead of cleaned_trips
- This leads to a table with user_inputXXX for each user input XXX
    - Remove the user input prefix

We now end up with a table with the user inputs as columns.
Et Voila!

Please see screenshots in PR

This doesn't fully solve the problem of how to determine whether trips have
been confirmed or not, but that will require some careful design to be sufficiently general.
I will open an issue for that and send a separate PR.
@shankari
Contributor

Fixed in e-mission/e-mission-server#780

@shankari
Contributor

Actually, that only fixed the trips, we also need to fix sections.

@shankari shankari reopened this Dec 18, 2020
shankari added a commit to shankari/e-mission-phone that referenced this issue Feb 18, 2021
While matching user inputs on the server, found that user input matching was
broken for some cleaned trips on the phone.
e-mission/e-mission-docs#476 (comment)

Expanded the user input end check to fix.
e-mission/e-mission-docs#476 (comment)

Will merge into master after additional testing on the branch.

+ bump up the allowed delta to 15 minutes since the time threshold default for
the distance filter is 10 minutes.
shankari added a commit to shankari/e-mission-phone that referenced this issue Feb 18, 2021
The enhanced trip matching (75129db) caused a
regression in which a spurious trip (not a trip) that occurred after the real
trip fit the criteria for a match.

And since it was confirmed after the real trip, as you would expect while going
down the trip list, it was matched preferentially.
e-mission/e-mission-docs#476 (comment)

Fixed by checking the degree of overlap and rejecting too short matches
shankari added a commit to e-mission/e-mission-phone that referenced this issue Feb 18, 2021
* Improve the matching of user inputs to cleaned trips

While matching user inputs on the server, found that user input matching was
broken for some cleaned trips on the phone.
e-mission/e-mission-docs#476 (comment)

Expanded the user input end check to fix.
e-mission/e-mission-docs#476 (comment)

Will merge into master after additional testing on the branch.

+ bump up the allowed delta to 15 minutes since the time threshold default for
the distance filter is 10 minutes.

* Fix regression in enhanced trip matching

The enhanced trip matching (75129db) caused a
regression in which a spurious trip (not a trip) that occurred after the real
trip fit the criteria for a match.

And since it was confirmed after the real trip, as you would expect while going
down the trip list, it was matched preferentially.
e-mission/e-mission-docs#476 (comment)

Fixed by checking the degree of overlap and rejecting too short matches

* Fix the infinite scroll user input matching

to be consistent with the refactor in
21ab3fa
and
6bd731a
@PatGendre
Contributor

Hi @shankari
you've been improving the trip labeling process lately, and introduced a new confirmed_trip key, but it is not clear to me; I still have at least 2 questions:

  • can the app dashboard already take the user labelled modes into account in the indicator computations?
  • if yes: currently the user can only label the trips, not the sections, so how do you replace the user labeled modes at the section level for the computation? What is the heuristic (e.g. you replace only the detected mode with the user mode for the largest section)? (I didn't find it in the code)

jf87 pushed a commit to jf87/e-mission-server that referenced this issue Jun 21, 2021
+ Fix the existing TestTripQueries for real data by passing in the correct
    trip_id instead of the first entry every time

+ this exposed several flaws, fix them by expanding the check. The corresponding UI fixes are in
    e-mission/e-mission-phone@75129db
    - Bump up the trip end buffer to 15 minutes since the time threshold
      default for the distance filter is 10 minutes
      e-mission/e-mission-docs#476 (comment)
    - Expand the trip end check to handle the corner case, where if the user
      input end is significantly after the trip end because of weird sensing issues
      e-mission/e-mission-docs#476 (comment)

+ This expansion generated a regression in which a spurious (not a trip) that
    occurred after the real trip fit the criteria for a match. And since it was
    confirmed after the real trip, as you would expect while going down the trip
    list, it was matched preferentially.
    e-mission/e-mission-docs#476 (comment)

    Fixed by checking the degree of overlap and rejecting too short matches.

    Corresponding UI Fixes are in
    e-mission/e-mission-phone@9438799

+ Implemented a function to find the matching trip given a user input
    - Generalized the validity checks to take both a trip and a user input
    - Created two curried wrapper functions: one which took a trip and the other which took a userinput
    - Generalized the final_candidate function to take in the curried function
      and invoke it, and fix the entry detail logging to match
    - Create a new function that retrieves all trips within a one day window in
      each direction based on the user input start timestamp

+ Added a new test to check the new function
    - The test loads the data
    - Runs the pipeline
    - Loads the user inputs
    - Finds the trip matching each user input
    - Note that there can be duplicate entries for each trip as users override
      their prior inputs, so we may have duplicate matches, as in the case with
      the purpose confirm objects.

Testing done:
User input related tests pass
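The matching approach outlined in this commit message (a generic validity check, two curried wrappers, and a one-day candidate window) could look roughly like the sketch below. It is only an illustration under stated assumptions: trips and user inputs are plain dicts with `start_ts` and `end_ts` fields, and every name except `final_candidate` is hypothetical. The real `final_candidate` may also choose among valid candidates differently; here it simply returns the last one accepted.

```python
from functools import partial

END_BUFFER_SECS = 15 * 60    # trip end buffer mentioned in the commit message
ONE_DAY_SECS = 24 * 60 * 60  # search window in each direction


def is_valid_pair(trip, user_input):
    """Generic check that takes both a trip and a user input (plain dicts here)."""
    starts_within = trip["start_ts"] <= user_input["start_ts"] <= trip["end_ts"]
    ends_within = user_input["end_ts"] <= trip["end_ts"] + END_BUFFER_SECS
    return starts_within and ends_within


def candidate_trips(all_trips, user_input):
    """Trips that start within one day of the user input start, in either direction."""
    lo = user_input["start_ts"] - ONE_DAY_SECS
    hi = user_input["start_ts"] + ONE_DAY_SECS
    return [t for t in all_trips if lo <= t["start_ts"] <= hi]


def final_candidate(check_fn, candidates):
    """Return the last candidate accepted by the curried check, or None."""
    valid = [c for c in candidates if check_fn(c)]
    return valid[-1] if valid else None


def user_input_for_trip(trip, all_inputs):
    # Curried on the trip: the remaining argument is the user input
    return final_candidate(partial(is_valid_pair, trip), all_inputs)


def trip_for_user_input(all_trips, user_input):
    # Curried on the user input: the remaining argument is the trip
    check = partial(is_valid_pair, user_input=user_input)
    return final_candidate(check, candidate_trips(all_trips, user_input))
```

Currying the same check in both directions keeps the trip-to-input and input-to-trip lookups consistent, so a change to the validity rules only has to be made in one place.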
@shankari
Contributor

shankari commented Jun 22, 2021

@PatGendre the deployer dashboard (emdash) uses the user labeled modes, but the in-app dashboard (the "metrics" screen) does not, at least partially because I wasn't sure how to deal with the mismatch between trip and section labeling that you outlined.

The A-mission branch of the project (https://github.com/xubowenhaoren/A-Mission) from UW supports confirming sections, along with a bunch of accessibility improvements. If somebody wanted to merge the changes over to master, it would help a lot wrt modifying the in-app dashboard as well.

@PatGendre
Contributor

@shankari thanks! Yann will have a look at A-mission, and of course I'll let you know if we consider merging this appealing section labeling into master (we'd have to find a budget).

@PatGendre
Contributor

@shankari FYI, since there is no budget to complete the labeling feature so that the indicators take into account the modes labeled manually by the user, it was decided to remove the labeling feature (at least for a few months). We think users won't understand why the (mode, purpose) labels they enter have no effect on the dashboard indicators.

I also have another question, as I've seen that GabrielKS is implementing a "label inference pipeline": do you have some kind of "functional spec" of what this pipeline will do (on the server and on the app side)? Thanks

@shankari
Contributor

@PatGendre I apologize for the delay in responding to your comments about the confirmed_trip objects, but the CanBikeCO deployments are just starting up, and I was on vacation for a week.

And I have a couple of interns working on improving the labeling by determining common and novel trips. The related issues are:
for the analysis: #606
for the system integration: #647

This is a key component of our ongoing work since there is information that we cannot automatically detect - e.g. purpose and replaced mode. So we have to urgently reduce the user labeling burden.

@PatGendre
Contributor

@shankari no problem, it is not an urgent question for us!
Thanks for your reply, I understand better what you intend to do.
If I understand correctly, even with what your interns will implement, there will still be the question of labeling modes at the trip level while the actual mode is at the section level.

FYI, Fouad and Yann worked on clustering stops outside of e-mission, in PostGIS, in 2019, so as to produce statistics on frequent places and frequent trips between places. Even though we didn't intend to try ML for automatic mode/purpose labeling, it was already interesting.

We would like to do this again with La Rochelle, but in a Python notebook rather than in PostGIS, but (looking at it quickly) I've found the k-means clustering methods of postgis available in shapely for python...

shankari added a commit to JGreenlee/e-mission-server that referenced this issue Jun 20, 2023
We added these as part of the design in
e-mission/e-mission-docs#476 (comment)

However, we never filled them in

```
$ grep -r primary_section emission
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-39.pyc matches
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-37.pyc matches
emission/core/wrapper/confirmedtrip.py:                  "primary_section": ecwb.WrapperBase.Access.WORM,
$ grep -r inferred_primary_mode emission
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-39.pyc matches
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-37.pyc matches
emission/core/wrapper/confirmedtrip.py:                  "inferred_primary_mode": ecwb.WrapperBase.Access.WORM,
```

And we are now adding a summary of all the sections into the confirmed trip,
to be consistent with the error bars branch
https://github.com/e-mission/e-mission-eval-private-data/pull/33/files#diff-bc613b970b833193aafe96cfd4ce0c145cbe0a4a9c52b8967b60cf936fc260b6

So we don't need the primary section and the inferred primary mode any more.
The inferred primary mode is basically the mode that has the greatest
proportion in the `cleaned_section_summary` or the `inferred_section_summary`
(as the case may be).
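
As a rough illustration of that last point, the primary mode can be derived from a section summary on demand instead of being stored. The sketch below assumes the summary is a dict keyed by metric (for example `distance`), each mapping a mode name to a total. That layout, the function name `primary_mode`, and the choice of `distance` as the default metric are assumptions for the example, not details confirmed by this commit.

```python
def primary_mode(section_summary, metric="distance"):
    """Return the mode with the largest share of `metric`, or None if empty.

    `section_summary` is assumed to look like
    {"distance": {"WALKING": 400.0, ...}, "duration": {...}}.
    """
    by_mode = section_summary.get(metric, {})
    if not by_mode:
        return None
    # The mode with the largest total also has the greatest proportion
    return max(by_mode, key=by_mode.get)


example_summary = {
    "distance": {"WALKING": 400.0, "BICYCLING": 3200.0},
    "duration": {"WALKING": 300.0, "BICYCLING": 720.0},
}
print(primary_mode(example_summary))
```

The same helper would work for either `cleaned_section_summary` or `inferred_section_summary`, provided both follow the assumed layout.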