Add faq #1109

Merged
merged 9 commits into from
May 27, 2021

@mfeurer mfeurer commented Mar 26, 2021

No description provided.

codecov bot commented Mar 26, 2021

Codecov Report

Merging #1109 (b675954) into development (79627e1) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development    #1109      +/-   ##
===============================================
- Coverage        85.82%   85.80%   -0.02%     
===============================================
  Files              137      137              
  Lines            10625    10625              
===============================================
- Hits              9119     9117       -2     
- Misses            1506     1508       +2     

Impacted Files                                            Coverage Δ
...ature_preprocessing/select_rates_classification.py    85.91% <0.00%> (-1.41%) ⬇️
...ine/components/classification/gradient_boosting.py    91.30% <0.00%> (-0.87%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

doc/faq.rst Outdated
``forkserver`` and ``spawn``. The default ``fork`` copies the whole process memory into the
subprocess. If the main process already uses 1.5GB of main memory and we apply a 3GB memory
limit to Auto-sklearn, it will only be able to use 1.5GB of that. We would have loved to use
``forkserver`` or ``spawn`` instead, which both don't suffer from this issue (and have some
Contributor

What is "this issue"? The linked page covers deadlocks when using multiprocessing and how to solve them.
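
For readers following the quoted paragraph, here is a minimal sketch of how the three start methods it names can be inspected and selected with the standard-library `multiprocessing` module. The helper name `pick_context` and its preference order are assumptions made for illustration; this is not Auto-sklearn's actual selection logic.

```python
import multiprocessing as mp


def pick_context(preferred=("forkserver", "spawn", "fork")):
    """Return a multiprocessing context for the first available start method.

    `preferred` is a hypothetical preference order chosen for this sketch.
    """
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return mp.get_context(method)
    raise RuntimeError(f"none of {preferred} is available on this platform")


ctx = pick_context()
# The chosen method depends on the platform: Windows only offers "spawn".
print(ctx.get_start_method())
```

Requesting a context only selects the mechanism; no subprocess is started until a `Process` or pool is created from it.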

doc/faq.rst Outdated
memory limit and a time limit. To start such a process, Python gives three options: ``fork``,
``forkserver`` and ``spawn``. The default ``fork`` copies the whole process memory into the
subprocess. If the main process already uses 1.5GB of main memory and we apply a 3GB memory
limit to Auto-sklearn, it will only be able to use 1.5GB of that. We would have loved to use
Contributor

it -> executing a machine learning algorithm is limited to at most 1.5GB.
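
The arithmetic behind the quoted example can be written out as a short sketch. It assumes, as the paragraph describes for ``fork``, that memory inherited from the main process counts against the limit; the function name `usable_memory_mb` is hypothetical.

```python
def usable_memory_mb(memory_limit_mb, inherited_mb):
    """Memory left for a model evaluation when the limit also covers
    memory the subprocess inherited from the main process (the fork case)."""
    return max(memory_limit_mb - inherited_mb, 0)


# 3GB limit, 1.5GB already used by the main process.
print(usable_memory_mb(3072, 1536))  # 1536
```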

doc/faq.rst Outdated

There are now two possible solutions:

1. Use parallel Auto-sklearn: if you use Auto-sklearn in parallel, it defaults to ``forkserver``
Contributor

Is there a link to an example or a hint how to do this? Is this just setting the n_jobs flag?
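
This thread does not record an answer to the question. As a rough sketch only, and assuming that "parallel" here means passing `n_jobs` to the estimator, the settings might be collected as below. `n_jobs`, `memory_limit` (in MB) and `time_left_for_this_task` (in seconds) are constructor arguments of `AutoSklearnClassifier`; the helper names and the default values are hypothetical.

```python
def parallel_settings(n_jobs=4, memory_limit_mb=3072, time_limit_s=120):
    """Collect constructor arguments for a parallel run (values are illustrative)."""
    return {
        "n_jobs": n_jobs,
        "memory_limit": memory_limit_mb,          # in MB
        "time_left_for_this_task": time_limit_s,  # in seconds
    }


def make_classifier(**overrides):
    # Imported lazily so the sketch can be read without auto-sklearn installed.
    from autosklearn.classification import AutoSklearnClassifier

    return AutoSklearnClassifier(**parallel_settings(**overrides))
```

Calling `make_classifier(n_jobs=2)` would then build an estimator with two workers, provided auto-sklearn is installed.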

doc/faq.rst Outdated

We therefore suggest using one of the above settings by default.

Auto-sklearn is extremely memory hungry in a sequential setting
Contributor

This is the same title as above

doc/faq.rst Outdated

When running Auto-sklearn in a parallel setting it starts new processes for evaluating machine
learning models using the ``forkserver`` mechanism. If not all code in the main script is guarded
by ``if __name__ == "__main__"`` it is executed for each subprocess. If now part of the code that
Contributor

If now ... your RAM -> This sentence is hard to parse.
If the code loading your dataset is not guarded, it is executed for every evaluation of a machine learning algorithm and thus blocks your RAM.
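
A minimal sketch of the guard being discussed: with ``forkserver`` (and ``spawn``), each new process imports the main script again, so any top-level statement runs again in that process. Keeping the data load inside a function that is called only under the guard avoids this. `load_dataset` is a hypothetical stand-in for the user's own loading code.

```python
def load_dataset(n_rows=1000):
    """Hypothetical stand-in for an expensive, memory-heavy data load."""
    return [[float(i), float(i) ** 2] for i in range(n_rows)]


def main():
    # Runs only in the process that was started as the script, so the
    # dataset is loaded once rather than once per worker process.
    X = load_dataset()
    # ... construct and fit the estimator here ...
    return len(X)


if __name__ == "__main__":
    main()
```

Had `X = load_dataset()` been written at module level instead, every worker process would load its own copy on import.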

doc/faq.rst Outdated
-----------------------------------

In certain cases, for example for debugging, it can be helpful to limit the number of
models to try. We do not provide this as an argument in the API as we believe that it
Contributor

model evaluations

doc/manual.rst Outdated
@@ -77,6 +77,8 @@ For a full list please have a look at the source code (in `autosklearn/pipeline/
* `Regressors <https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/components/regression>`_
* `Preprocessors <https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/components/feature_preprocessing>`_

We do also provide an example `on how to restrict the classifiers to search over <examples/80_advanced/example_interpretable_models.html>`_.
Contributor

I believe this is in "40_advanced"

@mfeurer mfeurer merged commit 5a72e52 into development May 27, 2021
@mfeurer mfeurer deleted the ADD_FAQ branch May 27, 2021 13:28
github-actions bot pushed a commit that referenced this pull request May 27, 2021
2 participants