Cleanup PR for existing Survey Assist RF PR #938 #995
base: master
Conversation
…result generation more carefully
…VM_decision_boundaries` compatible with changes in the `clustering.py` and `mapping.py` files. Also ported these 3 notebooks to trip_model; `cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` now have no dependence on the custom branch. Plot results are attached to show there is no difference between their previous and current outputs.
Unified interface for the `fit` function across all models. 'Entry'-type data is now passed from the notebooks down to the binning functions. Default set to 'none'.
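For illustration, a minimal sketch of what such a unified `fit` interface could look like; the class and parameter names below are hypothetical, not the actual trip_model API:

```python
from abc import ABC, abstractmethod

class TripModelSketch(ABC):
    """Hypothetical base class: every model exposes the same fit() signature."""

    @abstractmethod
    def fit(self, trips=None):
        """`trips` is expected to be a list of Entry-like objects;
        None is the default when no data is passed through."""

class BinningModelSketch(TripModelSketch):
    def fit(self, trips=None):
        if trips is None:
            raise ValueError("fit() called without trip data")
        # real binning logic lives in the trip_model code; this sketch
        # only records how many trips were seen
        self.n_trips_ = len(trips)
        return self
```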
…results.py` Prior to this update, `NaiveBinningClassifier` in `models.py` had dependencies on both the tour model and the trip model. Now, this classifier depends only on the trip model. All other notebooks (except `classification_performance.ipynb`) were tested as well and work as before. Other minor fixes to support the previous changes.
1. Removed mentions of `tour_model` and `tour_model_first_only`. 2. Removed two reads from the database. 3. Removed notebook outputs (this could be why a few diffs are too big to view).
RF initialisation and fit function. Build test written and tested. The random forest `fit` uses a DataFrame (`df`).
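A minimal sketch of a random-forest fit over a trip DataFrame, assuming scikit-learn; the feature and label columns here are placeholders rather than the actual ones used in the PR:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# placeholder trip DataFrame with purely numeric features and a label column
trip_df = pd.DataFrame({
    "distance": [120.0, 4300.0, 800.0, 2500.0],
    "duration": [300.0, 1800.0, 600.0, 1200.0],
    "mode_confirm": ["walk", "bike", "walk", "bike"],  # placeholder label
})

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(trip_df[["distance", "duration"]], trip_df["mode_confirm"])

new_trips = pd.DataFrame({"distance": [900.0], "duration": [700.0]})
print(clf.predict(new_trips))
```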
Predict is now included. Just need to figure out model storage and model testing.
Model loading and storing are now improved: only the required predictors and encoders are stored. Regression and null-value tests are included in the tests.
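A sketch of the "store only what prediction needs" idea, assuming joblib for serialization; the dictionary keys and argument names are assumptions, not the actual storage format:

```python
import joblib

def save_forest_model(fitted_clf, encoders, path="forest_model.joblib"):
    # persist only the fitted predictor and its label encoders,
    # not the full training state
    joblib.dump({"predictor": fitted_clf, "encoders": encoders}, path)

def load_forest_model(path="forest_model.joblib"):
    payload = joblib.load(path)
    return payload["predictor"], payload["encoders"]
```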
1. Switching a model is as simple as changing `model_type` in the config file. 2. ForestModel is now working; the main model is in the `model.py` file, which is copied from label_assist. 3. `TestRunForestModel.py` is working. 4. Regression tests in `TestForestModel.py` are still under construction.
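An illustration of config-driven model selection; the config key mirrors the `model_type` mentioned above, but the registry contents are stand-ins rather than the real trip_model classes:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier

# hypothetical config, e.g. loaded from a JSON config file
config = {"model_type": "forest"}  # switching models means editing this one key

MODEL_REGISTRY = {
    "forest": RandomForestClassifier,  # stand-in for the ForestModel class
    "greedy": DummyClassifier,         # stand-in for the binning model
}

def build_model(cfg):
    try:
        return MODEL_REGISTRY[cfg["model_type"]]()
    except KeyError as e:
        raise ValueError(f"unknown model_type: {cfg.get('model_type')}") from e

model = build_model(config)
```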
Removed redundancies and unnecessary code segments.
Config copy not required.
…' into SurveyAssistgetModel
Updating branch with changes from master. Moved models from the eval repo to the server repo.
This will fail due to the `testForest.py` file. Changes here include: 1. Integrated the move of the random forest model from eval to server. 2. Unit tests for model save and load. 3. Regression test for the RF model in `testRandomForest.py`.
This is replaced by `models.py` (moved with history).
Improving the test file by changing the way previous predictions are stored.
Fixing circular import
The changes in this iteration are improvements to the forest model tests: 1. Following last week's discussion, the regression test (`TestForestModel.py`) was removed, since it won't be useful once model performance improves; instead, the structure of the predictions is checked, and this check is merged into `TestForestModel.py`. 2. After e-mission#944, `predict_labels_with_n` in `run_model.py` expects a list and then iterates over it; the forest model and the rest of the tests were updated accordingly.
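A sketch of the calling convention described in point 2, i.e. the prediction entry point takes a list of trips and iterates over it internally; the helper and dummy model below are illustrative, not the actual `run_model.py` code:

```python
class _DummyModel:
    """Stand-in so the sketch runs; the real model would be the forest model."""
    def predict(self, trip):
        # hypothetical per-trip output: (label predictions, number of matches)
        return [{"labels": {"mode_confirm": "walk"}, "p": 1.0}], 1

def predict_labels_with_n_sketch(trip_list, model):
    # takes a *list* of trips and iterates over it, as described above
    return [model.predict(trip) for trip in trip_list]

# callers with a single trip therefore wrap it in a list
print(predict_labels_with_n_sketch([{"trip_id": 1}], _DummyModel()))
```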
1. Improved tests in `TestForestModelLoadandSave.py`. 2. Better comments, imports, and cleanup.
While testing model integration, 2 forest-model-specific features are added in the `TestForestModelIntegration.py` file rather than in the `entry.py` file.
2 more (4 in total) forest-model-specific features are now added after generating random trips for testing purposes.
Forest-model-specific values added in the test setup for random trips.
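A rough sketch of the kind of test-setup change being described, i.e. enriching generated random trips with extra model-specific values; the feature names below are placeholders, not the actual four features added in the test files:

```python
import random

def add_forest_features(mock_trip):
    """Attach placeholder forest-model-specific values to a mock trip."""
    data = mock_trip.setdefault("data", {})
    data["distance"] = random.uniform(100, 5000)
    data["duration"] = random.uniform(60, 3600)
    data["start_local_dt"] = {"hour": random.randint(0, 23)}
    data["end_local_dt"] = {"hour": random.randint(0, 23)}
    return mock_trip

mock_trips = [add_forest_features({"data": {}}) for _ in range(10)]
```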
The test in `testRandomForestTypePreservation` was using an `allceodata`-specific user id, but the tests on GitHub use a different DB. Fixed by generating random samples.
1. Newline fixes: reverted the removal of a line in a. `util.py` (e-mission#938 (comment)) and b. `run_model.py` (e-mission#938 (comment)).
2. Import format changed to "import X as Y" instead of "from X import Y". Files: `clustering.py`, `models.py`, `TestForestModelIntegration.py`, `TestForestModelLoadandSave.py`, `TestRunForestModel.py` (e-mission#938 (comment)).
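A small illustration of the import convention referred to above, using a stdlib module as a stand-in for the e-mission modules that were changed:

```python
# before (the style being replaced):
#     from collections import defaultdict
# after: import the module under an alias and qualify the names
import collections as coll

label_counts = coll.defaultdict(int)
label_counts["walk"] += 1
```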
Utility functions were added in this commit from the original PR (e-mission@e9abd51#diff-e910051a05987388fa06c96f484c23b3e491408a61136ed62235117c445fd664). However, these are already present in `clustering.py`, which is also imported and used in `models.py`, while the `util.py` functions are not being used anywhere.
a. Removed the check for remaining test data from previous test runs. This should not be possible if data is cleared correctly in tearDown(); improved the database clearing in tearDown() just to be sure.
b. Moved the model build to setUp(), since all tests need it. I did see Shankari's comment stating that model building is a heavyweight process (e-mission@104dd9a#r1486605432), but it is required by all tests anyway, and moving it to setUp() helps reduce duplicate code.
c. Merged the equality test with the type-preservation test. Shankari had left a comment to check for values versus checking for types (e-mission#938 (comment)). Satyam had added changes to check the predictions list after serialization and deserialization respectively; however, this equality test was already being done in a previous test, hence the two were merged.
d. Merged the serialization and deserialization error-handling tests. These tests were identical and mock functions were being used to assert raised exceptions, so they were merged as well.
Merging tests helps reduce the number of times we have to build the model, since for all tests the common steps are building the model and fetching predictions.
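An illustrative unittest skeleton of the structure described above (model build in setUp(), database clearing in tearDown(), merged equality/round-trip check); all helpers are placeholders so the sketch stays self-contained, not the real test code:

```python
import unittest

class TestForestModelSketch(unittest.TestCase):
    def setUp(self):
        # heavyweight, but needed by every test, so it lives here
        self.trips = self._build_mock_trips()
        self.model = self._build_model(self.trips)

    def tearDown(self):
        # make sure nothing from this run leaks into the next one
        self._clear_db()

    def test_predictions_survive_round_trip(self):
        # equality and serialization checks merged into one test, since
        # both start from the same model build and predictions
        before = self._predict(self.model, self.trips)
        restored = self._round_trip(self.model)
        after = self._predict(restored, self.trips)
        self.assertEqual(before, after)

    # --- placeholder helpers so the sketch runs on its own ---
    def _build_mock_trips(self):
        return [{"trip_id": i} for i in range(5)]

    def _build_model(self, trips):
        return {"trained_on": len(trips)}

    def _clear_db(self):
        pass

    def _predict(self, model, trips):
        return [model["trained_on"]] * len(trips)

    def _round_trip(self, model):
        return dict(model)  # stand-in for save + load

if __name__ == "__main__":
    unittest.main()
```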
…tance variables
1. Split up the mock trips data into train / test data. Saw that this was already being done in one of the tests in `TestForestModelLoadandSave.py` itself, as well as in `TestGreedySimilarityBinning.py`; hence added it for all forest model tests for uniformity.
2. Reduced the number of instance variables, since they were used inside setUp() only. This addresses the review comment mentioned originally for TestForestModelIntegration (e-mission#938 (comment)).
3. Cleaned up `TestForestModelIntegration.py`: added equality tests that check the prediction values generated in the pipeline (addresses review comment e-mission#938 (comment)); added the train / test data split; removed the check for empty data in setUp() (addresses review comment e-mission#938 (comment)).
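A minimal sketch of the train/test split of mock trips described in point 1; the trip count and the 80/20 ratio are assumptions:

```python
mock_trips = [{"trip_id": i} for i in range(50)]  # placeholder mock trips
split = int(len(mock_trips) * 0.8)
train_trips, test_trips = mock_trips[:split], mock_trips[split:]
# the model is built on train_trips and predictions are checked on test_trips
```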
A. Files directly involved in original PR as well as this PR
a. Mukul’s comments / code changes
|
a. Initial Comments
Mukul’s comments / code changes
b. Mukul’s comments / code changes
PENDING QUERY c. Initial Comments
Mukul’s comments / code changes
d. Initial comments
Mukul’s comments / code changes
PENDING QUERY e. Initial Comments
Mukul’s comments / code changes
Satyam's explanation looks valid, but I will try alternative approaches if required. f. Initial comments
Mukul’s comments / code changes
PENDING QUERY g. Comment, Code - average of probabilities
Mukul’s comments / code changes
h. Initial comments
Mukul’s comments / code changes
|
a. Initial comments
Mukul’s comments / code changes
b. Initial comments
Mukul’s comments / code changes
|
a. Mukul’s comments / code changes: the “from X import Y”-type imports are present in various files not in this PR.
b. Mukul’s comments / code changes
PENDING QUERY c. Initial comments
Mukul’s comments / code changes
|
a. Mukul’s comments / code changes
b. Mukul’s comments / code changes
|
a. Mukul’s comments / code changes
b. Mukul’s comments / code changes
PENDING QUERY c. Initial comments
Mukul’s comments / code changes
d. Initial comments
Mukul’s comments / code changes
e. Mukul’s comments / code changes
|
Extra changes made by me that were not explicitly a part of code review comments:
PENDING QUERY
a. Initial comments.
Mukul’s comments / code changes
b. Mukul’s comments / code changes
c. Initial comments.
Mukul’s comments / code changes
d. Mukul’s comments / code changes
|
a. Mukul’s comments / code changes
Confirmed, review comments addressed with code changes.
b. Mukul’s comments / code changes
|
N/A - No review comments
B. Part of original PR but not a part of this PR
Changes I added; Not affiliated with any review comments
a. Initial comments.
Mukul’s comments / code changes
b. Initial comments
Mukul’s comments / code changes
c. Mukul’s comments / code changes
Extra code fixes d. Mukul’s comments / code changes
e. Mukul’s comments / code changes
|
C. Involved in original PR but removed later in some commits
There was a single change made related to start/end timestamps. This code has now been moved to `TestForestModelIntegration.py` and is discussed under that file.
Commit - First time changes made
a. Comment - start/end timestamps to be handled in tests, not in core
Mukul’s comments / code changes
|
Commit - File removed
a. Initial comments
Mukul’s comments / code changes
b. Mukul’s comments / code changes
c. Mukul’s comments / code changes
Confirmed, review comments addressed with code changes.
d. Mukul’s comments / code changes
e. Mukul’s comments / code changes
f. Mukul’s comments / code changes
g. Mukul’s comments / code changes
|
a. Mukul’s comments / code changes
D. Extra files that I made changes to:
a. Mukul’s comments / code changes
|
This is a cleanup PR to address pending review comments in the Survey Assist using RF PR (#938).
I have approached this by going through all the comments throughout that PR.
Then I checked the later commits to see whether those comments had already been addressed or needed any improvements.
I will list all the relevant information file-wise below, along with my comments and any relevant changes I've made.
This information will be in a table format with these columns:
This will be followed by more details, including my own changes and comments.
I have marked review comments / code changes where I need more clarification with the status tag:
PENDING QUERY
Listing files involved in the PR:
A. Directly involved in original PR as well as this PR
B. Part of original PR but not a part of this PR
C. Involved in original PR but removed later in some commits
D. Extra files that I made changes to