check unit test code #826
base: master
Please submit a separate PR for the code refactoring since the unit tests are not complete. |
Just for the record.
|
I am still blocked by this:
The definition of |
|
I ran this code to get
|
Again, my suggestions are not intended to be runnable code - they give you an indication of the methods you should look at, but you need to actually understand the code and adapt as necessary. You are trying to read If I need to give you runnable code, I might as well write the code myself. |
To be consistent like this?
For this line,
the error is
If I use
the error is
Since ct is a confirmed trip, it is an instance of the Entry class. Only |
@corinne-hcr have you checked out the e-mission data model and the different wrapper classes? They are linked from the timeseries notebook (
No. You don't want to modify the cleaned trip object. You want to read the cleaned trip object and modify the start place, since that is the object that you are creating. Concretely, Entry objects should set data into the "data" sub-object.
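A minimal sketch of that distinction, using plain dictionaries rather than the real e-mission wrapper classes. The helper `make_start_place`, the metadata key, and the field names (`start_loc`, `start_ts`, `location`, `exit_ts`) are assumptions for illustration and should be checked against the actual data model.

```python
import copy

def make_start_place(cleaned_trip):
    """Build a new place-like entry from a trip entry (hypothetical helper).

    The trip is only read; all writes go into the "data" sub-object
    of the newly created place.
    """
    place = {
        "metadata": {"key": "example/start_place"},  # placeholder key, not a real e-mission key
        "data": {},
    }
    # Copy so that later changes to the place cannot leak back into the trip.
    place["data"]["location"] = copy.deepcopy(cleaned_trip["data"]["start_loc"])
    place["data"]["exit_ts"] = cleaned_trip["data"]["start_ts"]
    return place

# Made-up trip entry, for illustration only.
trip = {
    "metadata": {"key": "example/cleaned_trip"},
    "data": {
        "start_loc": {"type": "Point", "coordinates": [-122.08, 37.39]},
        "start_ts": 1440000000,
    },
}
place = make_start_place(trip)
```

Writing `place["location"] = ...` at the top level instead of `place["data"]["location"] = ...` is the kind of mistake this layout is meant to rule out.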
|
Double-checking the code
works with an entry. The
instead. |
To clarify: my previous comment:
Assumed that you were creating an |
Again, looking at the implementation of |
I can't see the images, I am not sure that they were uploaded correctly, but that is an interesting finding. Having said that:
Given that the original choice of using the place locations was somewhat arbitrary anyway, I think it is fine to make this change unless the results are dramatically worse. I have already made the change in my local branch as I dig deep into the current similarity code to see how we can improve our results. Part of the goal of having this kind of testing is to at least know that there are differences, and to reason about whether they are important. If the differences are meaningful, we fix the code. If they are not, and the new code is an improvement, we fix the tests. We are in the second case here. |
@corinne-hcr I still can't see the images |
@corinne-hcr I can see the images now. I don't see a significant difference between the two results; while individual points are moved around, that falls within the bounds of normal probability differences. If you are familiar with generating boxplots, and can generate a boxplot with the results, that would make it more clear. But generating the boxplot is not a priority. |
Instead of looking up the place and getting it instead. This has two advantages over the current implementation:
1. We don't have to make 2 separate database calls for each trip. Note that we compute an nxn distance matrix, so this is likely to be a substantial savings.
2. We can pass in an in-memory trip list. That makes it easier to write unit tests, and to use alternate load methods (e.g. for working with federated data e-mission/e-mission-eval-private-data@952c476).
@corinne-hcr reported that the place location and the trip start/loc locations are not identical. We don't have unit tests to verify this (alas!) but the top level results are not changed significantly. So the ROI seems high enough; we are going ahead with this change. e-mission#826 (comment)
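To illustrate why reading locations straight from the trips helps, here is a rough sketch that builds the n x n distance matrix from an in-memory trip list with no database access. It is not the project's similarity code: the function names, the haversine distance, and the `data.start_loc.coordinates` layout (GeoJSON `[lon, lat]` order) are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in meters

def haversine_m(p1, p2):
    """Great-circle distance in meters between two [lon, lat] pairs."""
    lon1, lat1 = (math.radians(v) for v in p1)
    lon2, lat2 = (math.radians(v) for v in p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def start_distance_matrix(trips):
    """n x n matrix of distances between trip start points.

    Every coordinate is read from the trip itself, so the n*n comparisons
    need no per-trip place lookups.
    """
    starts = [t["data"]["start_loc"]["coordinates"] for t in trips]
    return [[haversine_m(a, b) for b in starts] for a in starts]

# Made-up in-memory trips, for illustration only.
example_trips = [
    {"data": {"start_loc": {"type": "Point", "coordinates": [0.0, 0.0]}}},
    {"data": {"start_loc": {"type": "Point", "coordinates": [0.0, 1.0]}}},
]
matrix = start_distance_matrix(example_trips)
```

Because the input is just a list, a unit test can pass hand-built trips like `example_trips` without setting up a database.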
At a high level, there should typically be multiple tests for each function to test various scenarios that might happen, not just ones that happened to occur while processing one particular user in one particular dataset and one possible method. (e.g. corinne-hcr@b46a370) However, this should hopefully catch most regressions while refactoring so it is a good start! @corinne-hcr Can you list out what additional tests you plan to complete? I will review this tomorrow, I have a presentation to finish tonight to meet my own deadlines. |
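As a sketch of what "multiple tests for each function" could look like, the example below covers one small function with four scenarios: empty input, nothing passing, a boundary value, and a mixed list. `filter_short_trips` and `make_trip` are hypothetical stand-ins, not functions from this PR, and the distances are made up.

```python
import unittest

def filter_short_trips(trips, min_distance_m):
    """Hypothetical helper: keep trips whose distance is at least min_distance_m."""
    return [t for t in trips if t["data"]["distance"] >= min_distance_m]

def make_trip(distance_m):
    """Build a minimal in-memory trip with only the field under test."""
    return {"data": {"distance": distance_m}}

class TestFilterShortTrips(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(filter_short_trips([], 100), [])

    def test_all_trips_too_short(self):
        trips = [make_trip(10), make_trip(99.9)]
        self.assertEqual(filter_short_trips(trips, 100), [])

    def test_boundary_distance_is_kept(self):
        trips = [make_trip(100)]
        self.assertEqual(filter_short_trips(trips, 100), trips)

    def test_mixed_input_preserves_order(self):
        trips = [make_trip(500), make_trip(50), make_trip(300)]
        kept = filter_short_trips(trips, 100)
        self.assertEqual([t["data"]["distance"] for t in kept], [500, 300])
```

Each scenario is its own test method, so a failure report names the case that broke rather than only the function.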
Currently, the similarity code has not changed significantly. @shankari added new ways of accessing the data, but the old ways should still work, so the existing similarity tests should still be fine. |
For the record, @corinne-hcr's questions (asked in private chat) were:
My response was:
@corinne-hcr so do you have any additional questions? How many more tests do you plan to write and what is their ETA? |
I plan to write 3 more tests: get_score, second_round_of_clustering, and evaluation_pipeline.
@corinne-hcr I would start with e-mission/e-mission-eval-private-data@abf4f78 which compares the similarity code with different settings. In particular, the results around whether or not to filter and whether or not to use the cutoff. Note that I am now returning the labels from similarity (mapped to the original trip indices) so we don't need to maintain the "trip" data structure any more. Note also that I have different results for the I don't care about that for the pipeline changes, since my selected settings are |
I remember you mentioned |
Not sure what you mean by |
TestDataPreprocessing.py and TestGetUsers.py need to read trips. I also need help with the setup and teardown functions.
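For reference, a generic sketch of how `setUp` and `tearDown` work in Python's `unittest`. It does not use e-mission's own test helpers or database: the fixture here is a temporary JSON file, and the class name, file name, and trip values are made up for illustration.

```python
import json
import os
import tempfile
import unittest

class TestReadTrips(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: create a fresh fixture.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.trip_file = os.path.join(self.tmpdir.name, "trips.json")
        with open(self.trip_file, "w") as fp:
            json.dump([{"data": {"distance": 500.0}},
                       {"data": {"distance": 50.0}}], fp)

    def tearDown(self):
        # Runs after every test method, even if the test failed: clean up.
        self.tmpdir.cleanup()

    def read_trips(self):
        with open(self.trip_file) as fp:
            return json.load(fp)

    def test_reads_all_trips(self):
        self.assertEqual(len(self.read_trips()), 2)

    def test_distances_are_positive(self):
        for trip in self.read_trips():
            self.assertGreater(trip["data"]["distance"], 0)
```

The same shape should carry over when the fixture is a test database instead of a file: load the sample trips in `setUp`, and delete them in `tearDown` so that tests do not affect each other.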