-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make backend deletion robust after fit #1155
Make backend deletion robust after fit #1155
Conversation
Codecov Report
@@ Coverage Diff @@
## development #1155 +/- ##
===============================================
- Coverage 88.12% 88.00% -0.13%
===============================================
Files 139 139
Lines 10955 10959 +4
===============================================
- Hits 9654 9644 -10
- Misses 1301 1315 +14
Continue to review full report at Codecov.
|
Hey, could you please rebase this to the latest changes in the development branch? |
836b08f
to
40abe51
Compare
@@ -175,6 +175,7 @@ def cli_start_worker(scheduler_file_name): | |||
X_train, X_test, y_train, y_test = \ | |||
sklearn.model_selection.train_test_split(X, y, random_state=1) | |||
automl = AutoSklearnClassifier( | |||
delete_tmp_folder_after_terminate=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, what's the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good question. I have made it so, that backend is created on a need basis and destroyed after it is no longer needed. We cannot relay on the del method.
If you look a couple of lines below, we do fit_ensemble. If we do not preserve the backend after fit, the backend directory will be removed.
@@ -123,6 +123,7 @@ async def do_work(): | |||
# 3. Start the client | |||
with dask.distributed.Client(address=cluster.scheduler_address) as client: | |||
automl = AutoSklearnClassifier( | |||
delete_tmp_folder_after_terminate=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here?
@@ -47,11 +47,10 @@ def __del__(self): | |||
pass | |||
|
|||
|
|||
def test_fit(dask_client, backend): | |||
def test_fit(dask_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the overall difference here will be that the temporary directories will be at random while they were given by a pytest fixture before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is anyone taking care that the temporary directories which are now no longer deleted by Auto-sklearn or the fixture are removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. If we do delete_tmp_folder_after_terminate=False
then they will remain on the temporary directory and will get deleted after the laptop is rebooted. This happens in some test, but not many.
I thought about creating a custom code for deleting this, but as the name is random and did not want to complicate the code.
However I can create a fixture at the begining/end that uses the temporary_directory and clean it if you think we should do this.
@@ -62,6 +62,7 @@ def __call__(self, *args, **kwargs): | |||
get_smac_object_wrapper_instance = get_smac_object_wrapper() | |||
|
|||
automl = AutoSklearnClassifier( | |||
delete_tmp_folder_after_terminate=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this now necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because after fit there is no need for the backend to survive, yet the test still wants to go to the temporary directory and check for the correctness of some files.
Just to set some context in the PR: The problem statement is that when calling The backend gets deleted as of now when the object is garbage collected. This does not work due to the point above. So this PR tried to move away from garbage collection deletion to delete the backend when is not needed, after Something that I did not explore is using the atexit from python which might help. |
* Make backend deletion robust after fit * [FIX] smbo unittest * Output folder is needed for ensemble output * fit_ensemble also runs in manual spawning * fit_ensemble also runs in manual spawning * Rebase artifacts * Fix rebase * Missed arg in examples * Remove rebase artifacts Co-authored-by: Eddie Bergman <[email protected]>
Addresses #1149 which is caused by destructor being garbage collected when the modules where already terminated, as explained here.