use labels to override trip mode #476

st-patrick · 2019-11-28T16:31:22Z

We want to use the transportation mode label to actually impact the metrics shown inside the dashboard.

Since the labels are currently "decoration only", we need to implement some code that e.g. adds an endpoint that will basically do the same as the userMetrics endpoint but use the label as mode of transportation if a label has been applied to that trip.

My question being: in which file would I best start to look for a place to implement this feature?

shankari · 2019-12-04T03:07:21Z

@st-patrick the metrics code in general is here
https://github.com/e-mission/e-mission-server/tree/master/emission/analysis/result/metrics

simple_metrics.py is where we actually generate the metrics - it depends upon the mode_section_grouped_df dataframe being computed correctly.

https://github.com/e-mission/e-mission-server/blob/8747e2279393f05f86e6be58f94de77909a4d455/emission/analysis/result/metrics/time_grouping.py#L118
mode_grouped_df = section_group_df.groupby('sensed_mode')

And here's where we group the sections by the sensed mode.
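As a rough illustration of what "use the label as the mode" could look like at this grouping step, here is a minimal pandas sketch. The `user_label`, `effective_mode` and `distance` columns are assumptions made for the example; only `sensed_mode` comes from the code linked above.

```python
import pandas as pd

# Toy stand-in for the section dataframe. Only `sensed_mode` is a real
# column name; `user_label` and `distance` are hypothetical.
section_df = pd.DataFrame({
    "sensed_mode": ["WALKING", "IN_VEHICLE", "BICYCLING", "IN_VEHICLE"],
    "user_label": [None, "bus", "e-bike", None],
    "distance": [400.0, 5200.0, 3100.0, 8000.0],
})

# Use the label where one exists, otherwise fall back to the sensed mode.
section_df["effective_mode"] = section_df["user_label"].fillna(section_df["sensed_mode"])

# Same kind of grouping as before, but keyed on the effective mode.
distance_by_mode = section_df.groupby("effective_mode")["distance"].sum()
print(distance_by_mode)
```

With this shape, the grouping code itself barely changes; the work is in getting the label next to the section in the first place.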

shankari · 2019-12-04T16:39:32Z

Let me think through the best way to modify that, for the record.

We should be consistent with the rest of the pipeline and create new analysis/confirmed_section objects where the mode is set to the overridden value. Then the change to the metrics code is trivial: we would just point the analysis results at analysis/confirmed_section and everything would Just Work.
But then we need to figure out how and when to generate these analysis/confirmed_section objects. Note that the trips may not yet be confirmed when the section is analysed. In that case, we can:
- generate analysis/confirmed_section objects only for sections that have been confirmed. This is not a great solution because it gets us back to the situation where we have to query for two kinds of objects and merge them. We could theoretically do this for the existing manual/trip_confirm objects anyway.
- generate analysis/confirmed_section objects for all sections. Essentially, we would have a 1:1:1 mapping between analysis/cleaned_section, analysis/inferred_section and analysis/confirmed_section. This seems like a much better option in terms of usability, so let us explore it further.

The biggest challenge with this approach is that the manual/trip_confirm objects are not synchronized with the analysis pipeline. So we need to handle the case where they appear before the analysis is run, as well as after the analysis is run.

So a rough outline of the proposed design is:

add a new wrapper class, confirmed_section, with fields for sensed_mode, overridden_mode and final_mode
add a new pipeline step at the end that creates confirmed_section objects with the following algorithm for filling in the final_mode:
- final_mode = overridden_mode if overridden_mode is not None else sensed_mode
add a new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section. Make sure to retain the manual/trip_confirm objects since our rule is that all incoming data is read-only and can never be modified.

You can then use the new objects by using the confirmed_section and final_mode instead of inferred_section and sensed_mode in the metrics. This should also potentially make the client code easier since you can retrieve confirmed_section directly instead of retrieving inferred_section and the manual/* override objects separately and merging them on the phone.
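The rule for filling in final_mode can be sketched as follows. This is only an illustration under assumptions: `ConfirmedSection` is a plain dataclass standing in for the proposed wrapper class, and final_mode is computed as a property here rather than stored as a field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfirmedSection:
    """Illustrative stand-in for the proposed wrapper class, not the real one."""
    sensed_mode: str
    overridden_mode: Optional[str] = None

    @property
    def final_mode(self) -> str:
        # The label wins whenever one has been applied to the section.
        return self.overridden_mode if self.overridden_mode is not None else self.sensed_mode


unlabeled = ConfirmedSection(sensed_mode="IN_VEHICLE")
labeled = ConfirmedSection(sensed_mode="IN_VEHICLE", overridden_mode="bus")
print(unlabeled.final_mode, labeled.final_mode)
```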

This is also consistent with the reproducible pipeline because:

even if we delete all the analysis/confirmed_section objects as part of deleting all analysis/* objects, the manual/* objects are retained
the first step of the pipeline (that processes incoming objects) will not find any manual/* objects, BUT
the final step of the pipeline (that matches against existing confirmations) will find the existing manual/* objects and incorporate them while creating analysis/confirmed_section objects. This is crucial: along with race conditions and cases without internet connectivity, it is one of the main reasons why we cannot rely only on processing the manual/* objects as they come in.

That seems like a pretty solid design with no holes.

@st-patrick let me know if you have additional questions
@PatGendre @kafitz FYI for understanding design decisions for the future 😄

shankari · 2019-12-04T16:46:31Z

wrt

This should also potentially make the client code easier since you can retrieve confirmed_section directly instead of retrieving inferred_section and the manual/* override objects separately and merging them on the phone.

you would change inferred -> confirmed and sensed -> final in the GeoJSON export as well
https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/plotting/geojson/geojson_feature_converter.py

Then all the data returned from the /timeline/getTrips/<day> call will already have the overridden values in place. While reading unprocessed data, though, you will have to retain the existing code, since unprocessed = pipeline not run = no inferred sections or confirmed sections. But there might be performance benefits to not having to retrieve the manual/* objects for processed data.

shankari · 2019-12-04T17:00:24Z

@st-patrick you probably want to send out a draft PR once you have a significant chunk of the code written so that I can review and give feedback. Since this is a non-trivial change, probably best to make the development/review cycle interactive.

PatGendre · 2019-12-05T07:34:27Z

@shankari thanks, this will be a great feature, and thanks @st-patrick for working on it :-)
It may also be useful for the bicycle survey in dev in Nantes.

The solution you describe seems fine,
still I have a question about the timing between pipeline and mode_confirm:
if the pipeline is run daily, say every night, it may well happen that the user confirms the mode for a trip the day after, or even a few days after, depending on the application (i.e. a weekly survey). I understand that the pipeline does not process data prior to the last processing date (except when reset), so here, the daily pipeline will not process the confirmed_section older than the last day, will it?

Another suggestion:
is it possible to add two more fields in the confirmed_section: purpose (as it can be useful for many use cases), and say "custominfo", a field that could be used for asking the user any additional info at trip confirm; this would spare the developer from creating another field in the database, he/she would then just have to complete the display/dashboard feature if needed but not modify the data structure.
I must admit I do not have a direct use case for this request but it seems likely to be useful.

shankari · 2019-12-05T16:04:14Z

still I have a question about the timing between pipeline and mode_confirm:
I understand that the pipeline does not process data prior to the last processing date (except when reset), so here, the daily pipeline will not process the confirmed_section older than the last day, will it?

You are absolutely correct: the existing section segmentation and inference steps will not process the older confirmed_sections. But that's why I propose adding a "new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section."

Note that every pipeline step manages its own last_processed_ts. This new pipeline step at the beginning would either check the objects before moving them from the usercache to the timeseries (e.g. before move to long_term) or it would update its own last_processed_ts based on the write timestamp.

So to return to your use case

if the user confirms the mode for a trip the day after, or even a few days after, depending on the application (i.e. a weekly survey)

then the confirmation will be saved as manual/mode_confirm in the local phone DB, it will be pushed up to the usercache as usual. When it is moved from the usercache to the timeseries, or right after that, the new pipeline step will find the corresponding section and modify it.

So, before the manual/mode_confirm is processed, the confirmed_section will have the automatically inferred value; after it is processed, it will have the overridden value. But it is an analysis output which will be deleted when the pipeline is reset, so it doesn't break the reproducibility guarantees.
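To make the late-confirmation case concrete, here is a minimal sketch of such a step that tracks its own last processed write timestamp. Everything in it is hypothetical: the function name, the dict layout, and in particular the `section_id` key, which stands in for whatever lookup the real step would use to find the corresponding section.

```python
def handle_incoming_confirmations(confirmed_sections, confirmations, last_processed_ts):
    """Apply confirmations written after last_processed_ts; return the new timestamp.

    confirmed_sections: dict of section id -> {"sensed_mode", "final_mode"}
    confirmations: list of {"section_id", "label", "write_ts"}, treated as read-only
    """
    newest_ts = last_processed_ts
    for conf in sorted(confirmations, key=lambda c: c["write_ts"]):
        if conf["write_ts"] <= last_processed_ts:
            continue  # already handled in an earlier run
        section = confirmed_sections.get(conf["section_id"])
        if section is not None:
            # Only the analysis output changes; the confirmation itself is untouched.
            section["final_mode"] = conf["label"]
        newest_ts = max(newest_ts, conf["write_ts"])
    return newest_ts


sections = {"s1": {"sensed_mode": "IN_VEHICLE", "final_mode": "IN_VEHICLE"}}
late_confirmations = [{"section_id": "s1", "label": "bus", "write_ts": 1500.0}]
new_ts = handle_incoming_confirmations(sections, late_confirmations, last_processed_ts=1000.0)
# Running the step again with the updated timestamp is a no-op.
same_ts = handle_incoming_confirmations(sections, late_confirmations, last_processed_ts=new_ts)
```

Because the step keys off the write timestamp rather than the trip time, a confirmation that arrives days after the trip is still picked up on the next run.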

@PatGendre Does this make sense?

shankari · 2019-12-05T16:06:00Z

is it possible to add two more fields in the confirmed_section: purpose (as it can be useful for many use cases), and say "custominfo", a field that could be used for asking the user any additional info at trip confirm; this would spare the developer from creating another field in the database, he/she would then just have to complete the display/dashboard feature if needed but not modify the data structure.

Definitely makes sense to add purpose. Not that sure about custom info because the standard algorithms can't process it without understanding its structure. Can we defer that until it is needed so that we don't overengineer?

PatGendre · 2019-12-05T16:53:36Z

@shankari

that's why I propose adding a "new pipeline step at the beginning that looks through the incoming objects and for every manual/trip_confirm, update the corresponding confirmed_section." ... So, before the manual/mode_confirm is processed, the confirmed_section will have the automatically inferred value; after it is processed, it will have the overridden value. But it is an analysis output which will be deleted when the pipeline is reset, so it doesn't break the reproducibility guarantees.
@PatGendre Does this make sense?

Yes, it's very clever, I think it will work for the proposed use too :-)

Definitely makes sense to add purpose. Not that sure about custom info . Can we defer that until it is needed so that we don't overengineer?

Yes

shankari · 2019-12-07T15:55:48Z

Just for the record, I just wanted to point out a slight difference between this step and other previous steps. The previous steps are identical under replay. So every time you run the pipeline, the output of every intermediate step will be identical to the previous runs.

But for the proposed new handling_incoming_confirmations and generate_confirmed_sections steps, the first run after the confirmation is received will potentially be different from subsequent runs. With @PatGendre's use case, in the first run, the confirmed_section will be changed by the handling_incoming_confirmations step; in subsequent runs, it will be generated by the generate_confirmed_sections step.

This is not really an issue - as we have seen, the design will still work. But it is a subtle little difference that we should note for interaction with subsequent design decisions.

shankari · 2019-12-07T21:03:28Z

@jf87 fyi in case this is useful for you too 😄

st-patrick · 2019-12-13T18:50:49Z

Quite frankly I don't understand where I would even begin with these changes.

Over the last few days, I have tried to get the trip data with the labels at every request but couldn't find anything, since dataframe and timeseries are not serializable, which made debugging quite the headache.

Is there any way we could implement a serialization for that?

Also, I don't think I will have time to work on the above mentioned solution, simply because we only have one week left and I just don't have the comprehension of the server code that is needed for that.

But if you can draft something a little more practical and add some hints for debugging, especially how to access dataframe and timeseries data, that would be a great help. There's probably a really simple way I didn't see. I saw that in time_grouping you used .iloc[i] but since the data doesn't contain any labels at that point, it didn't really help my case.

shankari · 2019-12-13T19:08:52Z

@st-patrick dataframes are pandas dataframes. There are tons of pandas tutorials on the internet - it is part of the standard data science toolkit.

https://duckduckgo.com/?q=pandas+dataframe+tutorial&t=ffsb&ia=web should help you get started.

not sure what you mean by serializable, do you mean that they don't print properly? How are you trying to print them? I don't have time to test this right now, but per stackoverflow, print(df) should work and that's what I remember as well.
https://stackoverflow.com/questions/49826909/how-to-print-out-dataframe-in-python

if you can draft something a little more practical

Here are even more detailed steps:

add a new wrapper class for confirmed_section (similar to e-mission-server#517, "Add support for the new mode_confirm and purpose_confirm classes")
add a new pipeline step in emission/pipeline/intake_stage.py
- emission/analysis/classification/inference/mode/rule_engine.py is an example of a simple pipeline step, where in runPredictionPipeline, you find the unprocessed sections, process them and save the results
- for the processing, for each unprocessed inferred section, find the corresponding confirm object (if any) and create a confirmed_section
change analysis.result.section.key in conf/analysis/debug.conf.json to analysis/confirmed_section

This will get you a working solution in the case where the confirmation happens almost immediately.
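For the "find the corresponding confirm object (if any) and create a confirmed_section" part, a minimal sketch might look like the following. It assumes a confirm object corresponds to a section when its time range contains the section, and that the most recently written one wins; the field names are made up for illustration and are not the real wrapper fields.

```python
def find_confirmation(section, confirmations):
    """Return the latest-written confirmation whose time range contains the section, or None."""
    matches = [
        c for c in confirmations
        if c["start_ts"] <= section["start_ts"] and section["end_ts"] <= c["end_ts"]
    ]
    return max(matches, key=lambda c: c["write_ts"]) if matches else None


def make_confirmed_section(section, confirmations):
    """Build a confirmed_section dict for one inferred section."""
    conf = find_confirmation(section, confirmations)
    overridden = conf["label"] if conf else None
    return {
        "inferred_section": section["id"],
        "sensed_mode": section["sensed_mode"],
        "overridden_mode": overridden,
        "final_mode": overridden if overridden is not None else section["sensed_mode"],
    }


confirmations = [
    {"start_ts": 0, "end_ts": 1000, "label": "car", "write_ts": 10},
    {"start_ts": 0, "end_ts": 1000, "label": "bus", "write_ts": 20},  # later override
]
section_a = {"id": "s1", "start_ts": 100, "end_ts": 900, "sensed_mode": "IN_VEHICLE"}
section_b = {"id": "s2", "start_ts": 2000, "end_ts": 2500, "sensed_mode": "WALKING"}
confirmed_a = make_confirmed_section(section_a, confirmations)
confirmed_b = make_confirmed_section(section_b, confirmations)
```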

Once I review that, the second pipeline step should be much easier and will handle the other case in which the confirmation comes later.

shankari · 2020-08-12T17:58:11Z

Since there have been requests for this from both DFKI and Heidelberg, and now I need it for the CEO e-bike project, I am going to tackle this issue now.

@lefterav @jf87 @EstherEu @PatGendre

PatGendre · 2020-08-12T19:31:03Z

@shankari that's great :-)
I guess it would be useful if you could include purpose, not only the overridden mode.

shankari · 2020-12-02T18:49:56Z

Picking this up again, one challenge is that the confirmation currently happens at the trip level and not the section level. How do we then deal with creating confirmed_section objects correctly?

One option is to only create confirmed_trip objects, not confirmed_section. If/when we support trip editing, we can create confirmed_section objects.

But then how would we deal with confirmed_trip objects that don't have user input associated with them? They can have multiple sections and we would want to use them.

This will be fixed if/when we have trip editing in place but we need to figure out what to do as a temporary workaround.

shankari · 2020-12-03T07:02:30Z

Another challenge is that some of the data is genuinely represented at the trip level. For example, a trip has a purpose, not a section.

PatGendre · 2020-12-03T08:29:03Z

@shankari I agree with you, the major difficulty may be that the mode and purpose buttons are at the trip level, so there is no clear way to induce modes at the section level.

Actually the mode button should be named "principal mode for the trip" and possibly could be pre-filled with the mode totalling most of the trip length, so that we could have a relation between trip (principal) mode and section modes (thus possibly changing section modes - i.e. confirmed section mode - if the principal mode is modified) ... but that would be complicated to implement and to understand for the end user!

shankari · 2020-12-03T16:30:40Z

@PatGendre @robfitzgerald @jf87 @asiripanich since all of you have worked with the data model, feedback would be appreciated

Here's a more detailed design

we will have both confirmed_trip and confirmed_section

confirmed_trip will have a field for confirmed_vals.
- On master, this will have primary_mode and purpose entries. For branches that use embedded surveys, this will have the survey JSON.
```
confirmed_trip
   - confirmed_vals
      - primary_mode
      - purpose
```
- confirmed_trip will also have an inferred_vals with a primary_mode entry on master. This will be the same even for branches that use embedded surveys, since they do not include any additional inference algorithms.
- the primary mode will be the mode of the primary section, as determined below
confirmed_section will also have user_input. The user_input will have only a mode entry. Only the primary section of a trip will have the user_input set.

Determining the primary section of a trip:

if there is only one section, it is primary. Due to COVID, we are likely to have many unimodal trips now.
if there is more than one section, it is the longest section
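The two rules above collapse into a single "pick the longest" function. A minimal sketch, assuming "longest" means distance (as settled later in this thread) and a hypothetical `distance` key on each section:

```python
def primary_section(sections):
    """Return the section with the greatest distance, or None for a trip without sections."""
    if not sections:
        return None
    # With a single section, max() trivially returns that section.
    return max(sections, key=lambda s: s["distance"])


trip_sections = [
    {"sensed_mode": "WALKING", "distance": 300.0},
    {"sensed_mode": "IN_VEHICLE", "distance": 7500.0},
    {"sensed_mode": "WALKING", "distance": 450.0},
]
primary = primary_section(trip_sections)
```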

Using the confirmed sections for calculations:
For most of the pre-defined modes, we can determine what the calculation factors (energy efficiency/carbon emissions) are. But we do allow users to enter their own modes, and it is not clear how we can handle those in calculations.

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead.
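The fallback rule can be sketched like this. The set of modes with calculation factors is a placeholder invented for the example, and `choose_calculation_mode` is a hypothetical name.

```python
# Placeholder list of modes for which calculation factors are assumed to exist.
MODES_WITH_FACTORS = {"walk", "bike", "e-bike", "bus", "car"}


def choose_calculation_mode(confirmed_mode, sensed_mode):
    """Use the confirmed mode only if we know how to calculate with it."""
    if confirmed_mode in MODES_WITH_FACTORS:
        return confirmed_mode
    # Custom user-entered modes and missing confirmations fall back to sensing.
    return sensed_mode


known = choose_calculation_mode("e-bike", "BICYCLING")
custom = choose_calculation_mode("hoverboard", "WALKING")
unconfirmed = choose_calculation_mode(None, "IN_VEHICLE")
```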

Whoo! That was a bit complicated but I think it works for now.

shankari · 2020-12-03T16:47:50Z

From a UI perspective, we will continue to show the confirmed_vals (if they exist) in the big buttons of trip diary and the inferred mode from the sections on the top - so this just simplifies and pre-computes the values for now.

However, in the dashboard and the CEO ebike gamification, we can switch to confirmed trips and confirmed sections. Concretely, for the CEO ebike gamification, we can use confirmed_trips to determine the number of ebike trips and the % of travel by ebike.
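As an illustration of the gamification numbers, here is a minimal sketch that counts e-bike trips and computes the e-bike share from confirmed trips. It assumes the share is measured by distance, that each trip carries a hypothetical `distance` value, and that the label string is "e-bike"; the `confirmed_vals` / `primary_mode` nesting follows the design above.

```python
def ebike_stats(confirmed_trips, ebike_label="e-bike"):
    """Return (number of e-bike trips, e-bike share of total distance in percent)."""
    ebike_trips = [
        t for t in confirmed_trips
        if t.get("confirmed_vals", {}).get("primary_mode") == ebike_label
    ]
    total_distance = sum(t["distance"] for t in confirmed_trips)
    ebike_distance = sum(t["distance"] for t in ebike_trips)
    share = 100.0 * ebike_distance / total_distance if total_distance > 0 else 0.0
    return len(ebike_trips), share


trips = [
    {"confirmed_vals": {"primary_mode": "e-bike", "purpose": "work"}, "distance": 3000.0},
    {"confirmed_vals": {"primary_mode": "e-bike", "purpose": "shopping"}, "distance": 1000.0},
    {"confirmed_vals": {"primary_mode": "car", "purpose": "work"}, "distance": 12000.0},
    {"confirmed_vals": {}, "distance": 0.0},  # not confirmed yet
]
ebike_count, ebike_share = ebike_stats(trips)
```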

For the dashboard, we can switch to getting the metrics from confirmed_sections. But if we only get metrics, how do we do the mapping from unknown modes to the corresponding sensed mode? I guess we can do that mapping in the metrics calculation for now.

robfitzgerald · 2020-12-04T15:51:42Z

i agree with the general strategy here, to follow the pattern of building these immutable documents and not to lose any information. regarding the algorithmic selection of primary modes, a few thoughts (which both may not be helpful at this point):

if there is more than one section, it is the longest section

longest by distance or time? also, i could imagine this might differ by survey, where users may want to inject their own "primary section function" (like, for instance, "if {driving|transit} appears anywhere, set it as primary").

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead

might be helpful (?) to write this "analysis_mode" to a field so the user has a record of it.

Whoo!

🦉

shankari · 2020-12-04T18:03:10Z

longest by distance or time? also, i could imagine this might differ by survey, where users may want to inject their own

Good point. I was going to use distance, since the primary mode is typically motorized, and likely to be much faster than the modes used for the first and last leg.

"primary section function" (like, for instance, "if {driving|transit} appears anywhere, set it as primary").

my assumption was that if people wanted to change this, they would change it in the code. But I guess I could have people pass in a primary section function instead. I might not do that in this first pass though, pending evidence that people actually need it.

shankari · 2020-12-04T19:11:13Z

I guess we can do that mapping in the metrics calculation for now.

We can actually do this by adding an entry for mode_for_calculation for each section on the server. At some point, we can actually move the CO2 calculation to the server so it can be based on the energy profile of transportation in the trip location.

We originally started calculating values on the client because we were also calculating the calories burned and we didn't want to send over weight and height information to the server. But the CO2/EE calculations are making more and more sense on the server.

PatGendre · 2020-12-07T15:41:58Z

hi @shankari

here are a few thoughts:

primary section [...] it is the longest section

It is still not clear to me if the end-user will really understand what "primary section" vs "primary mode" means, and what it implies in terms of metrics calculation.
An alternative would be to link trip mode and section mode only if there is only one section, or if the longest section is obviously the primary section, say if its length is more than half of the trip length. And otherwise to leave the primary section confirmed mode blank ... and wait until there is a full trip/section editing feature.
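A minimal sketch of this stricter alternative, with a hypothetical `distance` key and the "more than half" threshold exposed as a parameter:

```python
def dominant_section(sections, min_share=0.5):
    """Return the section covering more than min_share of the trip distance, else None."""
    if len(sections) == 1:
        return sections[0]
    total = sum(s["distance"] for s in sections)
    if total <= 0:
        return None
    longest = max(sections, key=lambda s: s["distance"])
    # Leave the primary section undefined when no section clearly dominates.
    return longest if longest["distance"] / total > min_share else None


clear_trip = [{"mode": "walk", "distance": 300.0}, {"mode": "bus", "distance": 7000.0}]
mixed_trip = [
    {"mode": "walk", "distance": 1000.0},
    {"mode": "bus", "distance": 1200.0},
    {"mode": "bike", "distance": 1100.0},
]
clear_primary = dominant_section(clear_trip)
mixed_primary = dominant_section(mixed_trip)
```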

Since all our calculations are currently based on mode, if the confirmed_mode is one for which we don't have a calculation factor, we will use the sensed mode instead.

This is a reasonable rule.
In the future a feature could be added so that the end-user can state that the mode is similar to an existing mode, for example wheelchair is similar to walking in terms of calculation, and e-scooter is similar to ebike...
Anyway, as long as the calculation parameters are not personalised for each end-user, the calculations are only indicative (e.g. for cars, the emission figures can vary a lot from one model to another).

We originally started calculating values on the client because we were also calculating the calories burned and we didn't want to send over weight and height information to the server. But the CO2/EE calculations are making more and more sense on the server.

I agree, as long as the weight/height privacy can be managed and/or the end-user agrees to send these personal data to the server.

shankari · 2020-12-07T16:06:44Z

I agree, as long as the weight/height privacy can be managed and/or the end-user agrees to send these personal data to the server.

I think we would split the calculations. CO2/EE would be on the server; calorie (which doesn't depend on motorized mode) would be on the phone.

shankari · 2020-12-07T19:45:12Z

It is still not clear to me if the end-user will really understand what "primary section" vs "primary mode" means, and what it implies in terms of metrics calculation.
An alternative would be to link trip mode and section mode only if there is only one section, or if the longest section is obviously the primary section, say if its length is more than half of the trip length. And otherwise to leave the primary section confirmed mode blank ... and wait until there is a full trip/section editing feature.

I want to clarify that this will not necessarily affect the UI at this time. The UI currently displays section modes at the top of the card, and displays trip mode overrides at the bottom. The proposed data model will allow us to continue doing that.

The main user-visible difference will be in the dashboard and the calculations.
The main deployer-level difference will be in the dashboard.

PatGendre · 2020-12-08T07:41:29Z

@shankari Thank you for clarifying, I did not get that point.

The main user-visible difference will be in the dashboard and the calculations

Also, the diary screen code could label automatically the primary mode button (with the primary section inferred mode when a section makes >50% of the length of the trip) but it might not be very useful.

shankari · 2020-12-14T05:27:31Z

Also, the diary screen code could label automatically the primary mode button (with the primary section inferred mode when a section makes >50% of the length of the trip) but it might not be very useful.

Yes, that would also be confusing, as you pointed out earlier. We already show the inferred mode at the top of the trip card, so it is not like the information is missing. We can change this later if we have time to run some user tests.

shankari · 2020-12-18T00:08:02Z

One final consideration while creating the pipeline is how to set the timestamps for pipeline states. We will need to use the write_ts of the user input but the end_ts of the confirmed trips. Let's walk through how this will work in both cases and ensure that there are no surprises.

user confirms in draft mode
- user inputs are copied to the timeseries in the first step
- no matching confirmed trips, pipeline state is not updated
- we create confirmed trips in the last step
- match with the user inputs from the timeseries
- on the next pipeline run, we will have to handle the user inputs again

Let's try a different approach:

user confirms in draft mode
- user inputs are copied to the timeseries in the first step
- no matching confirmed trips, pipeline state is updated to the last write_ts since we have "processed" them
- we create confirmed trips in the last step
- match with the user inputs from the timeseries
- on the next pipeline run, user inputs are already processed
user confirms in confirmed mode
- first pipeline run, we create confirmed trips on the last step
- no match for user inputs
- user inputs arrive,
- next pipeline run, are copied to the timeseries in the first step
- matching confirmed trips, pipeline state is updated to the last write_ts
- on the last step, confirmed trips have already been processed, so no change necessary
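The second approach can be checked with a toy model of the two steps. This is a sketch under stated assumptions, not the real pipeline: the class and method names are invented, and user inputs are matched to trips through a hypothetical `trip_id` instead of timestamps.

```python
class ToyPipeline:
    """Toy model of the first and last pipeline steps; all names are illustrative."""

    def __init__(self):
        self.user_inputs = []          # stored (write_ts, trip_id, label) tuples
        self.confirmed_trips = {}      # trip_id -> label or None
        self.last_input_write_ts = 0.0

    def match_incoming_inputs(self, new_inputs):
        """First step: store the inputs and update any existing confirmed trips."""
        for write_ts, trip_id, label in sorted(new_inputs):
            self.user_inputs.append((write_ts, trip_id, label))
            if trip_id in self.confirmed_trips:
                self.confirmed_trips[trip_id] = label
            # Advance the state even without a match: the input counts as processed.
            self.last_input_write_ts = max(self.last_input_write_ts, write_ts)

    def create_confirmed_trips(self, cleaned_trip_ids):
        """Last step: create missing confirmed trips, matching stored inputs."""
        for trip_id in cleaned_trip_ids:
            if trip_id in self.confirmed_trips:
                continue  # already processed in an earlier run
            labels = [label for (_, t, label) in sorted(self.user_inputs) if t == trip_id]
            self.confirmed_trips[trip_id] = labels[-1] if labels else None


# Draft mode: the input arrives before the trip has been processed.
draft = ToyPipeline()
draft.match_incoming_inputs([(100.0, "t1", "bike")])
draft.create_confirmed_trips(["t1"])

# Confirmed mode: the trip is processed first, the input arrives a run later.
late = ToyPipeline()
late.match_incoming_inputs([])
late.create_confirmed_trips(["t1"])
late.match_incoming_inputs([(200.0, "t1", "bike")])
late.create_confirmed_trips(["t1"])
```

Both orderings end with the same confirmed trip, and in both the pipeline state has moved past the user input, so nothing is handled twice.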

After doing the heavy lifting for implementing e-mission/e-mission-docs#476 this is actually super simple
- Switch to reading confirmed_trips instead of cleaned_trips
- This leads to a table with user_inputXXX for each user input XXX
- Remove the user input prefix

We now end up with a table with the user inputs as columns. Et Voila! Please see screenshots in PR

This doesn't fully solve the problem of how to determine whether trips have been confirmed or not, but that will require some careful design to be sufficiently general. I will open an issue for that and send a separate PR.

shankari · 2020-12-18T21:36:54Z

Fixed in e-mission/e-mission-server#780

shankari · 2020-12-18T21:37:14Z

Actually, that only fixed the trips, we also need to fix sections.

While matching user inputs on the server, found that user input matching was broken for some cleaned trips on the phone. e-mission/e-mission-docs#476 (comment) Expanded the user input end check to fix. e-mission/e-mission-docs#476 (comment) Will merge into master after additional testing on the branch. + bump up the allowed delta to 15 minutes since the time threshold default for the distance filter is 10 minutes.

The enhanced trip matching (75129db) caused a regression in which a spurious trip (not a trip) that occurred after the real trip fit the criteria for a match. And since it was confirmed after the real trip, as you would expect while going down the trip list, it was matched preferentially. e-mission/e-mission-docs#476 (comment) Fixed by checking the degree of overlap and rejecting too short matches.
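The idea behind that fix can be illustrated with a small sketch. This is not the actual matching code: the validity conditions, the overlap threshold and the field names are all assumptions, and only the 15 minute buffer comes from the commit messages here.

```python
END_BUFFER_SECS = 15 * 60   # allowed delta past the trip end, per the commit message
MIN_OVERLAP_RATIO = 0.5     # assumed threshold; the source does not state one


def overlap_ratio(trip, user_input):
    """Fraction of the user input's time range that falls inside the trip."""
    overlap = min(trip["end_ts"], user_input["end_ts"]) - max(trip["start_ts"], user_input["start_ts"])
    duration = user_input["end_ts"] - user_input["start_ts"]
    return max(overlap, 0) / duration if duration > 0 else 0.0


def is_match(trip, user_input):
    """Assumed validity check: starts inside the trip, ends near it, overlaps enough."""
    starts_inside = trip["start_ts"] <= user_input["start_ts"] <= trip["end_ts"]
    ends_near = user_input["end_ts"] <= trip["end_ts"] + END_BUFFER_SECS
    return starts_inside and ends_near and overlap_ratio(trip, user_input) >= MIN_OVERLAP_RATIO


def best_match(trip, user_inputs):
    """Among valid candidates, prefer the most recently written input."""
    candidates = [u for u in user_inputs if is_match(trip, u)]
    return max(candidates, key=lambda u: u["write_ts"]) if candidates else None


trip = {"start_ts": 0, "end_ts": 1000}
real_input = {"start_ts": 0, "end_ts": 1000, "label": "bike", "write_ts": 5000}
# Label for a spurious trip that barely overlaps the real one but was written later.
spurious_input = {"start_ts": 900, "end_ts": 1500, "label": "walk", "write_ts": 6000}
chosen = best_match(trip, [real_input, spurious_input])
```

Without the overlap check, the later-written spurious input would pass the buffered end check and win on write time.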

* Improve the matching of user inputs to cleaned trips
  While matching user inputs on the server, found that user input matching was broken for some cleaned trips on the phone. e-mission/e-mission-docs#476 (comment)
  Expanded the user input end check to fix. e-mission/e-mission-docs#476 (comment)
  Will merge into master after additional testing on the branch.
  + bump up the allowed delta to 15 minutes since the time threshold default for the distance filter is 10 minutes.
* Fix regression in enhanced trip matching
  The enhanced trip matching (75129db) caused a regression in which a spurious trip (not a trip) that occurred after the real trip fit the criteria for a match. And since it was confirmed after the real trip, as you would expect while going down the trip list, it was matched preferentially. e-mission/e-mission-docs#476 (comment)
  Fixed by checking the degree of overlap and rejecting too short matches
* Fix the infinite scroll user input matching to be consistent with the refactor in 21ab3fa and 6bd731a

PatGendre · 2021-06-14T07:55:52Z

Hi @shankari
you've been improving the trip labeling process lately and introduced a new confirmed_trip key, but some things are still not clear to me. I have at least 2 questions:

can the app dashboard already take the user-labeled modes into account in the indicator computations?
if yes: currently the user can only label trips, not sections, so how do you substitute the user-labeled modes at the section level for the computation? What is the heuristic (e.g. do you replace the detected mode with the user mode only for the longest section)? (I didn't find it in the code.)

+ Fix the existing TestTripQueries for real data by passing in the correct trip_id instead of the first entry every time
+ this exposed several flaws, fix them by expanding the check. The corresponding UI fixes are in e-mission/e-mission-phone@75129db
  - Bump up the trip end buffer to 15 minutes since the time threshold default for the distance filter is 10 minutes e-mission/e-mission-docs#476 (comment)
  - Expand the trip end check to handle the corner case where the user input end is significantly after the trip end because of weird sensing issues e-mission/e-mission-docs#476 (comment)
+ This expansion generated a regression in which a spurious trip (not a trip) that occurred after the real trip fit the criteria for a match. And since it was confirmed after the real trip, as you would expect while going down the trip list, it was matched preferentially. e-mission/e-mission-docs#476 (comment) Fixed by checking the degree of overlap and rejecting too short matches. Corresponding UI Fixes are in e-mission/e-mission-phone@9438799
+ Implemented a function to find the matching trip given a user input
  - Generalized the validity checks to take both a trip and a user input
  - Created two curried wrapper functions: one which took a trip and the other which took a userinput
  - Generalized the final_candidate function to take in the curried function and invoke it, and fix the entry detail logging to match
  - Create a new function that retrieves all trips within a one day window in each direction based on the user input start timestamp
+ Added a new test to check the new function
  - The test loads the data
  - Runs the pipeline
  - Loads the user inputs
  - Finds the trip matching each user input
  - Note that there can be duplicate entries for each trip as users override their prior inputs, so we may have duplicate matches, as in the case with the purpose confirm objects.

Testing done: User input related tests pass

shankari · 2021-06-22T15:42:18Z

@PatGendre the deployer dashboard (emdash) uses the user labeled modes, but the in-app dashboard (the "metrics" screen) does not, at least partially because I wasn't sure how to deal with the mismatch between trip and section labeling that you outlined.

The A-mission branch of the project (https://github.com/xubowenhaoren/A-Mission) from UW supports confirming sections, along with a bunch of accessibility improvements. If somebody wanted to merge the changes over to master, it would help a lot wrt modifying the in-app dashboard as well.

PatGendre · 2021-06-22T16:22:00Z

@shankari thanks ! Yann will have a look at A-mission, of course I'll tell you if we envisage merging this appealing section labeling into master (we'd have to find a budget).

PatGendre · 2021-06-30T07:29:26Z

@shankari FYI, as there is no budget to complete the labeling feature with indicators that take into account the modes labeled manually by the user, it was decided to remove the labeling feature (at least for a few months), because we think users won't understand why the (mode, purpose) labels they enter have no effect on the dashboard indicators.

And I have another question, as I've seen that GabrielKS is implementing a "label inference pipeline" : do you have a kind of "functional spec" of what this pipeline will do (on the server and on the app side)? Thanks

shankari · 2021-06-30T16:26:50Z

@PatGendre I apologize for the delay in responding to your comments about the confirmed_trip objects, but the CanBikeCO deployments are just starting up, and I was on vacation for a week.

And I have a couple of interns working on improving the labeling by determining common and novel trips. The related issues are:
for the analysis: #606
for the system integration: #647

This is a key component of our ongoing work since there is information that we cannot automatically detect - e.g. purpose and replaced mode. So we have to urgently reduce the user labeling burden.

PatGendre · 2021-06-30T17:00:07Z

@shankari no problem, it is not an urgent question for us!
Thanks for your reply, I understand better what you intend to do.
If I understand correctly, even with what your interns will implement, there will still be the question of labeling modes at the trip level while the actual mode is at the section level.

FYI Fouad and Yann worked on clustering stops outside of e-mission in postgis in 2019, so as to produce statistics on frequent places and frequent trips between places. Even though we didn't intend to try ML on automatic mode/purpose labeling, it was already interesting.

We would like to do this again with la Rochelle, but rather in a python notebook than in postgis, but (looking at it quickly) I've found the k-means clustering methods of postgis available in shapely for python...

We added these as part of the design in e-mission/e-mission-docs#476 (comment)
However, we never filled them in

```
$ grep -r primary_section emission
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-39.pyc matches
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-37.pyc matches
emission/core/wrapper/confirmedtrip.py:        "primary_section": ecwb.WrapperBase.Access.WORM,
$ grep -r inferred_primary_mode emission
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-39.pyc matches
Binary file emission/core/wrapper/__pycache__/confirmedtrip.cpython-37.pyc matches
emission/core/wrapper/confirmedtrip.py:        "inferred_primary_mode": ecwb.WrapperBase.Access.WORM,
```

And we are now adding a summary of all the sections into the confirmed trip, to be consistent with the error bars branch
https://github.com/e-mission/e-mission-eval-private-data/pull/33/files#diff-bc613b970b833193aafe96cfd4ce0c145cbe0a4a9c52b8967b60cf936fc260b6

So we don't need the primary section and the inferred primary mode any more. The inferred primary mode is basically the mode that has the greatest proportion in the `cleaned_section_summary` or the `inferred_section_summary` (as the case may be).
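As a rough illustration of "the mode that has the greatest proportion", the sketch below picks the dominant mode from a section summary. The summary layout (a metric name such as `"distance"` mapping to per-mode totals) and the function name `inferred_primary_mode` are assumptions made for this example, not something the thread specifies.

```python
from typing import Mapping, Optional


def inferred_primary_mode(
    section_summary: Mapping[str, Mapping[str, float]],
    metric: str = "distance",
) -> Optional[str]:
    """Return the mode with the largest total for `metric`, or None if empty.

    Assumes a hypothetical layout such as:
        {"distance": {"WALKING": 500.0, "BICYCLING": 3200.0}}
    """
    per_mode = section_summary.get(metric) or {}
    if not per_mode:
        return None
    # The mode with the largest total also has the greatest proportion.
    return max(per_mode, key=per_mode.get)
```

Because the largest total and the largest share pick the same mode, no normalization is needed. Which metric to use (distance, duration or something else) is left as a parameter since the source does not say.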

shankari added the enhancement label Jan 2, 2020

shankari mentioned this issue Feb 12, 2020

correction of a trip’s transportation mode in case of multi-modal trips #500

Closed

shankari mentioned this issue Jun 11, 2020

Have a sandbox environment to play with the app before a full deployment #326

Open

shankari mentioned this issue Dec 18, 2020

Display user labels generically in the dashboard asiripanich/emdash#23

Merged

shankari closed this as completed Dec 18, 2020

shankari reopened this Dec 18, 2020

shankari mentioned this issue Jan 8, 2021

Record and display non trip data on EmTripLog #593

Open

shankari mentioned this issue Feb 12, 2021

Suggestion needed regarding manually editing the per-section motion mode prediction #616

Open

shankari mentioned this issue Jun 22, 2021

Itinerum boolean commit #1 e-mission/e-mission-server#823

Closed

shankari mentioned this issue Jun 29, 2021

Implement label inference pipeline e-mission/e-mission-server#825

Merged

PatGendre mentioned this issue Jul 1, 2021

Common trip system building #647

Closed

shankari mentioned this issue Dec 6, 2021

Change dashboard to support user inputs #688

Closed

shankari mentioned this issue Jun 14, 2022

Push user labels to the server without waiting for trip end #640

Open

shankari mentioned this issue Oct 21, 2022

Closes e-mission/e-mission-docs#813 e-mission/e-mission-server#883

Merged

shankari mentioned this issue Feb 1, 2023

feat: Add timeseries purge script e-mission/e-mission-server#899

Open

shankari mentioned this issue Mar 22, 2023

Add composite trip shankari/e-mission-server#4

Merged
