check unit test code #826

corinne-hcr · 2021-06-30T01:21:18Z

TestDataPreprocessing.py and TestGetUsers.py need to read trips. I also need help with the setup and teardown functions.
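For the setup and teardown question, here is a minimal sketch of the standard unittest pattern. It assumes a hypothetical in-memory store (FakeTripStore) in place of the real database; the class, the store, and the trip fields are illustrative only and are not part of e-mission.

```python
import unittest
import uuid


class FakeTripStore:
    """Hypothetical in-memory stand-in for the trip database."""

    def __init__(self):
        self._trips = {}

    def insert(self, user_id, trip):
        self._trips.setdefault(user_id, []).append(trip)

    def read(self, user_id):
        return list(self._trips.get(user_id, []))

    def clear(self, user_id):
        self._trips.pop(user_id, None)


STORE = FakeTripStore()


class TestReadTrips(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: create a fresh user and load fixtures.
        self.testUUID = uuid.uuid4()
        STORE.insert(self.testUUID, {"distance": 250})
        STORE.insert(self.testUUID, {"distance": 40})

    def tearDown(self):
        # Runs after every test method (pass or fail): remove the fixtures.
        STORE.clear(self.testUUID)

    def test_reads_all_trips(self):
        self.assertEqual(len(STORE.read(self.testUUID)), 2)


def run_tests():
    # Helper so the sketch can be run without the unittest command line.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReadTrips)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because tearDown removes whatever setUp created, each test starts from a clean store and tests do not depend on the order in which they run.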

shankari · 2021-07-16T22:39:11Z

Please submit a separate PR for the code refactoring since the unit tests are not complete.

corinne-hcr · 2021-07-21T17:30:42Z

Just for the record:
1. I am currently reading in shankari_2016-06-20.expected_confirmed_trips. I can get 6 confirmed trips in total. But when the process reaches the similarity code (the filter_too_short function), the code cannot read start_place.
2. To be able to test the functions for analyzing common trips, I need to create common trips.
3. Here is the code for creating start_place (or end_place), suggested by @shankari:

import emission.core.wrapper.place as ecwp
start_place = ecwp.Place()
start_place.location = ct.start_loc
start_place['_id'] = ct.start_place
import emission.storage.timeseries.abstract_timeseries as esta
esta.update(start_place)

P.S. ct is an instance of a confirmed trip for which you are trying to create a start place

corinne-hcr · 2021-07-21T21:38:03Z

I am still blocked by this:
start_place is an instance of Place() and contains the location and the id. But esta.update(start_place) doesn't work.

AttributeError: module 'emission.storage.timeseries.abstract_timeseries' has no attribute 'update'

The definition of update() is pass.
I don't know how to solve that.

shankari · 2021-07-21T22:02:24Z

update is a static method in TimeSeries, so esta.TimeSeries.update should work. Again, my suggestions are not intended to be runnable code - they give you an indication of the methods you should look at, but you need to actually understand the python and adapt as necessary.

corinne-hcr · 2021-07-21T22:37:27Z

I ran this code to get filter_trips, but esta.TimeSeries.update(start_place) doesn't work.
The error is

start_lon = start_place.data.location["coordinates"][0]
AttributeError: 'NoneType' object has no attribute 'data'

import emission.core.wrapper.place as ecwp
import emission.storage.timeseries.abstract_timeseries as esta
self.readAndStoreTripsFromFile("emission/tests/data/real_examples/shankari_2016-06-20.expected_confirmed_trips")
ts = esta.TimeSeries.get_time_series(self.testUUID)
user = self.testUUID
trips = preprocess.read_data(user)
for trip in trips:
    ct = trip
    start_place = ecwp.Place()
    start_place.location = ct.data.start_loc
    start_place['_id'] = ct.data.start_place
    esta.TimeSeries.update(start_place)
filter_trips = preprocess.filter_data(trips, 100)
print(filter_trips)

shankari · 2021-07-21T23:03:10Z

Again, my suggestions are not intended to be runnable code - they give you an indication of the methods you should look at, but you need to actually understand the code and adapt as necessary.

You are trying to read start_place.data.location. You are setting start_place.location. Make them be consistent and it will work.

If I need to give you runnable code, I might as well write the code myself.

corinne-hcr · 2021-07-21T23:31:27Z

To be consistent like this?
But it still doesn't work

user = self.testUUID
trips = preprocess.read_data(user)
for trip in trips:
  ct = trip
  ct.data.start_place = ecwp.Place()
  ct.data.start_place.location = ct.data.start_loc
  ct.data.start_place.data['_id'] = ct.data.start_place
  esta.TimeSeries.update(ct.data.start_place.data)
  test = similarity.filter_too_short([ct], 100)

For this line,

ct.data.start_place = ecwp.Place()

the error is

AttributeError: property start_place is read-only

If I use

start_place.location = ct.start_loc
start_place.data['_id'] = ct.start_place

the error is

AttributeError: property start_loc is not defined for Entry

Since ct is an instance of a confirmed trip, it is an Entry object, so start_loc is only accessible as ct.data.start_loc.

shankari · 2021-07-22T14:09:51Z

@corinne-hcr have you checked out the e-mission data model and the different wrapper classes?
https://github.com/e-mission/e-mission-server/tree/master/emission/core/wrapper

They are linked from the timeseries notebook (Timeseries_Sample.ipynb) that you experimented with earlier. The notebook also outlines the various ways to access the timeline objects as entry objects, data tables, etc. Please review them and become familiar with the data model.

To be consistent like this?

No. You don't want to modify the cleaned trip object. You want to read the cleaned trip object and modify the start place, since that is the object that you are creating.

Concretely. Entry objects should set data into the "data" sub-object.

    start_place["data"]["location"] = ct.data.start_loc

shankari · 2021-07-22T17:02:39Z

Double-checking the code

def update(entry):

works with an entry.

The Entry class includes both data and metadata. Place is an example of data, so you need to call

def update_data(user_id, key, obj_id, data):

instead.
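The difference between the two signatures can be sketched as follows. This is an illustration with a hypothetical in-memory class, not the real timeseries: only the two signatures come from the code quoted above, while the document layout (_id, user_id, metadata, data), the use of instance methods, and the sample values are assumptions.

```python
class FakeTimeSeries:
    """Hypothetical in-memory stand-in; not the real TimeSeries class."""

    def __init__(self):
        self.docs = {}

    def update(self, entry):
        # `entry` is a complete document: it already carries its own
        # _id, metadata and data, so it is stored as-is.
        self.docs[entry["_id"]] = entry

    def update_data(self, user_id, key, obj_id, data):
        # `data` is only the payload (e.g. a place); the surrounding
        # entry is assembled from the other arguments.
        self.docs[obj_id] = {
            "_id": obj_id,
            "user_id": user_id,
            "metadata": {"key": key},
            "data": data,
        }


ts = FakeTimeSeries()
place_data = {"location": {"type": "Point", "coordinates": [-122.08, 37.39]}}
ts.update_data("user-1", "analysis/cleaned_place", "place-1", place_data)
stored = ts.docs["place-1"]
```

After the call, the payload is reachable under the "data" sub-object of the stored document, which is the shape that code reading start_place.data.location expects.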

shankari · 2021-07-22T17:06:48Z

To clarify: my previous comment:

Concretely. Entry objects should set data into the "data" sub-object.

Assumed that you were creating an Entry object. Since you are creating a Place object that corresponds to data in an entry object with metadata.key = analysis/cleaned_place, you can use .location directly.

shankari · 2021-07-22T17:15:13Z

Again, looking at the implementation of update_data in TimeSeries, it is a static method without a concrete implementation. You need to call the method in BuiltinTimeSeries instead, which has the actual implementation. Note that static methods are not dispatched to subclasses: calling TimeSeries.update_data runs the base-class version, not the BuiltinTimeSeries one.
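The static method behaviour can be seen in a small, self-contained sketch. The two classes below are hypothetical stand-ins that only mirror the relationship between the abstract and builtin timeseries classes; their bodies are assumptions and do not reproduce the real implementations.

```python
class TimeSeriesBase:
    @staticmethod
    def update_data(user_id, key, obj_id, data):
        pass  # placeholder body: does nothing and returns None


class BuiltinLike(TimeSeriesBase):
    @staticmethod
    def update_data(user_id, key, obj_id, data):
        # Stand-in for a real implementation that would write to the database.
        return {"_id": obj_id, "user_id": user_id, "key": key, "data": data}


# Calling through the base class runs the placeholder, not the subclass code.
base_result = TimeSeriesBase.update_data("u1", "analysis/cleaned_place", "p1", {})

# Calling through the subclass runs the concrete implementation.
sub_result = BuiltinLike.update_data("u1", "analysis/cleaned_place", "p1", {})
```

With no instance involved, Python has nothing to dispatch on, so the class named in the call decides which version runs.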

corinne-hcr · 2021-07-23T15:20:41Z

I found that I get a different result if I change similarity to extract coordinates directly from trip.data.start_loc["coordinates"] (for example) instead of from start_place = esda.get_entry(esda.CLEANED_PLACE_KEY, trip.data.start_place).
Here is the result from start_place = esda.get_entry(esda.CLEANED_PLACE_KEY, trip.data.start_place)
1st round

2nd round

Here is the result from trip.data.start_loc["coordinates"]
1st round

2nd round

Also, the way I extract coordinates for the 2nd round now is just extracting from the confirmed trip, not from the place. Should I change that?

shankari · 2021-07-23T15:45:57Z

I can't see the images, I am not sure that they were uploaded correctly, but that is an interesting finding.

Having said that:

The existing results are not immutable. They are not very good (so we don't need to strive to meet them), and the choice of using the place vs. the trip locations is based on a decision by an undergrad who worked on the project for a summer. I don't think there was a lot of thought put into it; at least, there's no documentation of such thinking. This is why documentation is important :)
This is also why unit tests are important. If we had them, we would be able to see where the difference originated and see if it was meaningful, instead of looking at the final result and trying to make sense of it.

Given that the original choice of using the place locations was somewhat arbitrary anyway, I think it is fine to make this change unless the results are dramatically worse. I have already made the change in my local branch as I dig deep into the current similarity code to see how we can improve our results.

Part of the goal of having this kind of testing is to at least know that there are differences, and to reason about whether they are important. If the differences are meaningful, we fix the code. If they are not, and the new code is an improvement, we fix the tests. We are in the second case here.

shankari · 2021-07-24T03:31:08Z

@corinne-hcr I still can't see the images

shankari · 2021-07-24T14:01:30Z

@corinne-hcr I can see the images now. I don't see a significant difference between the two results; while individual points are moved around, that falls within the bounds of normal probability differences. If you are familiar with generating boxplots, and can generate a boxplot with the results, that would make it more clear. But generating the boxplot is not a priority.

@corinne-hcr

Read the location directly from the trip instead of looking up the place and getting it from there. This has two advantages over the current implementation:
1. We don't have to make 2 separate database calls for each trip. Note that we compute an nxn distance matrix, so this is likely to be a substantial savings.
2. We can pass in an in-memory trip list. That makes it easier to write unit tests, and to use alternate load methods (e.g. for working with federated data e-mission/e-mission-eval-private-data@952c476).

@corinne-hcr reported that the place location and the trip start/loc locations are not identical. We don't have unit tests to verify this (alas!) but the top level results are not changed significantly. So the ROI seems high enough; we are going ahead with this change. e-mission#826 (comment)
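As an illustration of the in-memory approach, here is a minimal sketch of an nxn distance matrix built from the trips' own start coordinates, with no database lookups. It is not the similarity code: the function names, the haversine distance, and the plain-dict trip layout are assumptions, with coordinates taken to be GeoJSON-style [longitude, latitude].

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def start_distance_matrix(trips):
    """Pairwise start-to-start distances for an in-memory list of trips."""
    # Each coordinate pair is read once from the trip itself.
    coords = [t["data"]["start_loc"]["coordinates"] for t in trips]
    n = len(coords)
    return [[haversine_m(*coords[i], *coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical trips: same longitude, one degree of latitude apart.
sample_trips = [
    {"data": {"start_loc": {"type": "Point", "coordinates": [0.0, 0.0]}}},
    {"data": {"start_loc": {"type": "Point", "coordinates": [0.0, 1.0]}}},
]
matrix = start_distance_matrix(sample_trips)
```

Because the function only needs a list of trips, a unit test can pass in hand-built trips like sample_trips instead of loading data into the database first.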

shankari · 2021-07-26T04:14:35Z

At a high level, there should typically be multiple tests for each function to test various scenarios that might happen, not just ones that happened to occur while processing one particular user in one particular dataset and one possible method. (e.g. corinne-hcr@b46a370) However, this should hopefully catch most regressions while refactoring so it is a good start!
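A minimal sketch of what "multiple tests for each function" can look like, using a hypothetical filter_short function rather than the real similarity code; the function, the trip fields, and the threshold are assumptions chosen for illustration.

```python
import unittest


def filter_short(trips, min_distance):
    """Hypothetical filter: keep trips whose distance is at least min_distance."""
    return [t for t in trips if t["distance"] >= min_distance]


class TestFilterShort(unittest.TestCase):
    def test_scenarios(self):
        # Each case is (description, input trips, expected output).
        cases = [
            ("empty input", [], []),
            ("all too short", [{"distance": 10}, {"distance": 99}], []),
            ("boundary value is kept", [{"distance": 100}], [{"distance": 100}]),
            ("mixed", [{"distance": 50}, {"distance": 500}], [{"distance": 500}]),
        ]
        for name, trips, expected in cases:
            # subTest reports each failing scenario separately.
            with self.subTest(name):
                self.assertEqual(filter_short(trips, 100), expected)


def run_filter_tests():
    # Helper so the sketch can be run without the unittest command line.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFilterShort)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The empty and boundary cases are the ones least likely to show up when processing a single real user, which is why they are worth writing by hand.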

@corinne-hcr Can you list out what additional tests you plan to complete?

I will review this tomorrow, I have a presentation to finish tonight to meet my own deadlines.

corinne-hcr · 2021-07-26T06:26:53Z

Currently, the similarity code is not significantly changed. @shankari added new ways of accessing it, but the old ways should still work. So the tests of the similarity code should still be fine.

shankari · 2021-07-26T06:32:19Z

For the record, @corinne-hcr's questions (asked in private chat) were:

I realize you changed the similarity code and also wrote a test for it.
Since you have changed the similarity code, my test for the old one may be useless. Should I integrate the new similarity code into evaluation_pipeline? e.g. change the first_round code
There are so many changes now. I am confused about what my next step is.

My response was:

I explicitly did not change the similarity code significantly; your existing code should work without any changes (I was not comfortable making changes without unit tests)
I added new ways of accessing it which I use, but the old ways should still work

@corinne-hcr so do you have any additional questions? How many more tests do you plan to write and what is their ETA?

corinne-hcr · 2021-07-26T07:01:31Z

I plan to write 3 more tests: get_score, second_round_of_clustering, and evaluation_pipeline.
They should be done by tomorrow's meeting.
I feel overwhelmed by so many changes at this point. I need some time tomorrow to digest the recent PR and the changed approach. I don't know which approach and which analysis I need to run and put in the paper. I think I will need to ask some questions after reading the code. Please forgive me if I am not able to fully understand the new analysis in a very short time, given my knowledge and skill set.

shankari · 2021-07-26T07:21:55Z

@corinne-hcr I would start with e-mission/e-mission-eval-private-data@abf4f78 which compares the similarity code with different settings. In particular, the results around whether or not to filter and whether or not to use the cutoff. Note that I am now returning the labels from similarity (mapped to the original trip indices) so we don't need to maintain the "trip" data structure any more.

Note also that I have different results for the filtered_cutoff case than you do. I am pretty sure this is because of a bug in my h-score calculation code.
e-mission/e-mission-eval-private-data#28 (comment)

I don't care about that for the pipeline changes, since my selected settings are no_filter_no_cutoff, but you can feel free to do so to add additional analysis results to the paper!

corinne-hcr · 2021-07-26T07:29:44Z

I don't care about that for the pipeline changes, since my selected settings are no_filter_no_cutoff, but you can feel free to do so to add additional analysis results to the paper!

I remember you mentioned single level in one of the PRs. Do you mean that you use just 1 round of clustering instead of 2 rounds? If so, is the evaluation_pipeline no longer useful?

shankari · 2021-07-26T07:48:14Z

Not sure what you mean by single_level (link to code would be helpful). But yes, for the deployment on Monday night, I will use only the first round, and I will not filter or cutoff any trips, and I will not model only one split. I anticipate changing the evaluation pipeline significantly, as I said earlier.

corinne-hcr added 3 commits June 29, 2021 18:17

check unit test code

a79d4fa

not yet refactored, just for testing notebook codes

8de7c77

add kmeans at the 2nd round of clutering

47e0f61

shankari mentioned this pull request Jul 19, 2021

Modeling and functions #829

Merged

corinne-hcr added 2 commits July 25, 2021 18:34

tests and fake trips I have so far

3ba040d

done with TestGetRequestPercentage, update fake_trips for testing

f815cff

update tests and fake trips

2a0bc45

shankari changed the base branch from random-forest-mode-detection to master September 23, 2023 04:15
