Add optional dependency GSTools-Core #215
The GitHub Actions still need to be updated.
One thing that would be nice would be to create a largish regression test suite using the existing implementation and apply that to the new one to ensure equivalent results.
Nearly all of the tests done with during the |
I was pondering something more automated, e.g. using the existing implementation as the model in a model-based test driven by a coverage-guided fuzzer. The fuzzer would create test cases automatically, and each test case would simply require the new implementation to yield the same results as the existing one. I am just not sure how to integrate the existing implementation, i.e. at the source level or maybe by just tapping into a fixed binary distribution.
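A lighter-weight approximation of this idea is a randomized differential test, where seeded random inputs are fed to both implementations and the results are compared within a tolerance. The sketch below is not coverage-guided, and `reference_impl` and `candidate_impl` are hypothetical stand-ins, not functions from GSTools or GSTools-Core.

```python
import math
import random


def reference_impl(values):
    """Hypothetical stand-in for the existing (trusted) implementation."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def candidate_impl(values):
    """Hypothetical stand-in for the new implementation under test."""
    return math.fsum(v * v for v in values)


def differential_test(n_cases=200, seed=0, rel_tol=1e-9, abs_tol=1e-9):
    """Compare both implementations on seeded random inputs.

    Returns a list of (input, expected, actual) tuples for every mismatch.
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        size = rng.randint(0, 50)
        values = [rng.uniform(-1e3, 1e3) for _ in range(size)]
        expected = reference_impl(values)
        actual = candidate_impl(values)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((values, expected, actual))
    return failures
```

Fixing the seed keeps any failure reproducible. A floating-point tolerance is used instead of exact equality because two correct implementations may sum in a different order.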
After a nice conversation with @LSchueler we decided:
|
Shouldn't this be

    try:
        import gstools_core
        USE_RUST = True
    except ImportError:
        del gstools_core
        USE_RUST = False

i.e. only if the import succeeds, |
I meant |
GSTools-Core is now used if the package can be imported, but it can also be switched off at runtime by setting the global variable `gstools.config.USE_RUST = False`.
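The behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual GSTools code: `config` is a stand-in namespace for the `gstools.config` module, and `active_backend` is a hypothetical helper.

```python
import importlib.util
from types import SimpleNamespace

# Stand-in for the `gstools.config` module (hypothetical, for illustration).
# The flag defaults to True only if `gstools_core` can be found.
config = SimpleNamespace(
    USE_RUST=importlib.util.find_spec("gstools_core") is not None
)


def active_backend():
    """Return the name of the backend that would be used right now.

    The flag is read at call time, so changing `config.USE_RUST` during
    the runtime takes effect on the next call.
    """
    return "gstools_core" if config.USE_RUST else "cython"
```

The key design point is that the flag is checked when the work is dispatched rather than once at import time. Otherwise switching it off later would have no effect.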
This makes the Cython code compatible with GSTools-Core v0.1.2 again.
The Cython code is back in again and GSTools-Core is going to be used if the package is installed and |
gstools/variogram/variogram.py
Outdated

    if config.USE_RUST:
        # pylint: disable=E0401
        from gstools_core import (

Shouldn't it be possible to merge those, or is this a formatter issue?
Yeah... that was `isort`'s idea. It's about as ugly as it gets, right?! :-D And `black` and `isort` can't even agree on where to put the `# pragma: no cover`. `rustfmt` is so nice :-)
What about setting `# pragma: no cover` behind `if config.USE_RUST:` once?
Also the `# pylint: disable=C0412` is redundant (see line 13) and we could also just add `E0401` in line 13.
Sorry, I think I was a bit annoyed yesterday by the combination of a false positive coming from pylint and then the `isort` and `black` peskiness. I improved the situation, but it seems that `isort` does not like grouped renaming of imports.
I don't want to disable the pylint error `E0401` globally, as import errors can be quite a severe problem.
For the moment, we don't provide GSTools-Core via conda, which makes it a bit awkward to use if GSTools is installed via conda. But I think that's a problem of GSTools-Core, and as the Rust implementations are labeled as experimental anyway, I think this is ready to be merged.
This is really cool! I have some ideas for improvement; maybe you like them. And what about a routine in `gstools.config` to set the environment variable for the number of cores used by the parallel Rust runs?
GSTools-Core will automatically use all your cores in parallel, without having
to use OpenMP or a local C compiler.
In case you want to restrict the number of threads used, you can set the
environment variable ``RAYON_NUM_THREADS`` to the desired amount.
What about a little function in `gstools.config` to set this environment variable?
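For illustration, such a function might look like the sketch below. `set_rayon_threads` is a hypothetical name and not part of `gstools.config`. The only real interface assumed is the `RAYON_NUM_THREADS` environment variable mentioned in the README text above.

```python
import os


def set_rayon_threads(num_threads):
    """Hypothetical helper: request a thread count for Rayon via the environment.

    This only has an effect if it runs before the first Rayon-based
    computation in this process, and it affects every Rayon user in the
    process, not just GSTools-Core.
    """
    if not isinstance(num_threads, int) or num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    os.environ["RAYON_NUM_THREADS"] = str(num_threads)
```

Modifying `os.environ` changes the environment of the current process and of child processes started afterwards. It does not alter the parent shell.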
One argument against that is that this variable will affect all Rust software using Rayon, not just GSTools-Core's backend.
If we do provide a programmatic interface, we should probably make the Core initialize a dedicated thread pool and make this function affect only this thread pool?
@adamreichold raises a fair point there. But that env. variable would only exist during the shell session from which GSTools is run (I don't have enough experience with Windows to know how it handles env. variables).
Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not sure if this is worth the effort.
Another alternative would be to capture the env. variable at the start of GSTools and set it back to that value when exiting, but that doesn't sound very elegant.
I personally would just keep the hint in the readme for the moment. But if you have stronger opinions than I do, I'd be happy to implement your preferred solution.
> Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not sure if this is worth the effort.

Setting the environment variable will affect Rayon's so-called global thread pool, which is initialized once, either on demand when the first Rayon task is spawned or explicitly by the application. Hence, changing the environment variable after a first computation using Rayon has run has no effect.
This is why I suggested making the GSTools-Core module store an `Option<ThreadPool>` which would be affected (meaning re-initialized using a given number of threads) by the function from `gstools.config`. If the option is set, that thread pool is used via `ThreadPool::install`; otherwise GSTools-Core falls back to the global thread pool.

> I personally would just keep the hint in the readme for the moment.

Personally, I would say that online control over parallelism is a somewhat orthogonal issue and would best be served via a follow-up PR / issue.
gstools/variogram/variogram.py
Outdated

    if config.USE_RUST:
        # pylint: disable=E0401
        from gstools_core import (

What about setting `# pragma: no cover` behind `if config.USE_RUST:` once?
Also the `# pylint: disable=C0412` is redundant (see line 13) and we could also just add `E0401` in line 13.
I think an import error (`E0401`) should only be disabled locally.
LGTM
I removed all Cython code and replaced it with the GSTools-Core PyPI package v0.1.2.
The GitHub Actions still need to be updated, as the distribution of GSTools has become much simpler.