Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional dependency GSTools-Core #215

Merged
merged 10 commits into from
Nov 11, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -3,6 +3,15 @@
All notable changes to **GSTools** will be documented in this file.


## [1.3.4] - Pure Pink ?

### Enhancements
- add GStools-Core as optional dependency [#215](https://github.com/GeoStat-Framework/GSTools/pull/215)

### Changes
- remove unnecessary `dim` argument in Cython code [#216](https://github.com/GeoStat-Framework/GSTools/issues/216)


## [1.3.3] - Pure Pink - 2021-08

### Enhancements
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -345,6 +345,7 @@ in memory for immediate 3D plotting in Python.

### Optional

- [GSTools-Core >= 0.2.0](https://github.com/GeoStat-Framework/GSTools-Core)
- [matplotlib](https://matplotlib.org)
- [pyvista](https://docs.pyvista.org/)

30 changes: 30 additions & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -70,6 +70,8 @@ To get the latest development version you can install it directly from GitHub:

If something went wrong during installation, try the :code:`-I` `flag from pip <https://pip-python3.readthedocs.io/en/latest/reference/pip_install.html?highlight=i#cmdoption-i>`_.

**Speeding up GSTools by parallelization**

To enable the OpenMP support, you have to provide a C compiler and OpenMP.
Parallel support is controlled by an environment variable ``GSTOOLS_BUILD_PARALLEL``,
that can be ``0`` or ``1`` (interpreted as ``0`` if not present).
@@ -89,6 +91,33 @@ For the development version, you can do almost the same:
export GSTOOLS_BUILD_PARALLEL=1
pip install git+git://github.com/GeoStat-Framework/GSTools.git@main

**Using experimental GSTools-Core for even more speed**

You can install the optional dependency `GSTools-Core <https://github.com/GeoStat-Framework/GSTools-Core>`_,
which is a re-implementation of the main algorithms used in GSTools. The new
package uses the language Rust and it should be faster (in some cases by orders
of magnitude), safer, and it will potentially completely replace the current
standard implementation in Cython. Once the package GSTools-Core is available
on your machine, it will be used by default. In case you want to switch back to
the Cython implementation, you can set :code:`gstools.config.USE_RUST=False` in
your code. This also works at runtime. You can install the optional dependency
e.g. by

.. code-block:: none

pip install gstools[rust]

or by manually installing the package

.. code-block:: none

pip install gstools-core

GSTools-Core will automatically use all your cores in parallel, without having
LSchueler marked this conversation as resolved.
Show resolved Hide resolved
to use OpenMP or a local C compiler.
In case you want to restrict the number of threads used, you can set the
environment variable ``RAYON_NUM_THREADS`` to the desired amount.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a little function in gstools.config to set this environment variable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One argument against that is that this variable will affect all Rust software using Rayon, not just GSTools-Core's backend.

If we do provide a programmatic interface, we should probably make the Core initialize a dedicated thread pool and make this function affect only this thread pool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamreichold raises a fair point there. But that env. variable would only exist during the shell session from which GSTools is run (I don't have enough experience with Windows to know how it handles env. variables).
Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not if this is worth the effort.
Another alternative would be to capture the env. variable at the start of GSTools and set it back to that value when exiting, but that doesn't sound very elegant.
I personally would just keep the hint in the readme for the moment. But if you have stronger opinions than I do, I'd be happy to implement your preferred solution.

Copy link

@adamreichold adamreichold Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I understand it correctly, that the number of threads used by Rayon can only be set once at the start and doing it a second time results in an error? I'm not if this is worth the effort.

Setting the environment variable will affect Rayon's so-called global thread pool which is initialized once, either on-demand when the first Rayon task is spawned or explicitly be the application. Hence, changing the environment variable after a first computation using Rayon was run has no effect.

This why I suggested making the GSTools-Core module store an Option<ThreadPool> which would be affected (meaning re-initialized using a given number of threads) by the function from gstools.config. If the option is set, that thread pool is used via ThreadPool::install, otherwise GSTools-Core falls back to the global thread pool.

I personally would just keep the hint in the readme for the moment.

Personally, I would say that online control over parallelism is a somewhat orthogonal issue and would best be served via a follow-up PR / issue.



Citation
========
@@ -387,6 +416,7 @@ Requirements
Optional
--------

- `GSTools-Core >= 0.2.0 <https://github.com/GeoStat-Framework/GSTools-Core>`_
- `matplotlib <https://matplotlib.org>`_
- `pyvista <https://docs.pyvista.org>`_

1 change: 1 addition & 0 deletions gstools/__init__.py
Original file line number Diff line number Diff line change
@@ -127,6 +127,7 @@
"""
# Hooray!
from gstools import (
config,
covmodel,
field,
krige,
14 changes: 14 additions & 0 deletions gstools/config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
"""
GStools subpackage providing global variables.

.. currentmodule:: gstools.config

"""
# pylint: disable=W0611
try: # pragma: no cover
import gstools_core

USE_RUST = True
except ImportError:
USE_RUST = False
10 changes: 8 additions & 2 deletions gstools/field/generator.py
Original file line number Diff line number Diff line change
@@ -10,16 +10,22 @@
RandMeth
IncomprRandMeth
"""
# pylint: disable=C0103, W0222
# pylint: disable=C0103, W0222, C0412
import warnings
from copy import deepcopy as dcp

import numpy as np

from gstools import config
from gstools.covmodel.base import CovModel
from gstools.field.summator import summate, summate_incompr
from gstools.random.rng import RNG

if config.USE_RUST: # pragma: no cover
# pylint: disable=E0401
from gstools_core import summate, summate_incompr
else:
from gstools.field.summator import summate, summate_incompr

__all__ = ["RandMeth", "IncomprRandMeth"]


16 changes: 11 additions & 5 deletions gstools/krige/base.py
Original file line number Diff line number Diff line change
@@ -9,23 +9,29 @@
.. autosummary::
Krige
"""
# pylint: disable=C0103, W0221, E1102, R0201
# pylint: disable=C0103, W0221, E1102, R0201, C0412
import collections

import numpy as np
import scipy.linalg as spl
from scipy.spatial.distance import cdist

from gstools import config
from gstools.field.base import Field
from gstools.krige.krigesum import (
calc_field_krige,
calc_field_krige_and_variance,
)
from gstools.krige.tools import get_drift_functions, set_condition
from gstools.tools.geometric import rotated_main_axes
from gstools.tools.misc import eval_func
from gstools.variogram import vario_estimate

if config.USE_RUST: # pragma: no cover
# pylint: disable=E0401
from gstools_core import calc_field_krige, calc_field_krige_and_variance
else:
from gstools.krige.krigesum import (
calc_field_krige,
calc_field_krige_and_variance,
)

__all__ = ["Krige"]


4 changes: 2 additions & 2 deletions gstools/variogram/estimator.pyx
Original file line number Diff line number Diff line change
@@ -178,7 +178,6 @@ cdef _normalization_func_vec choose_estimator_normalization_vec(str estimator_ty


def directional(
const int dim,
const double[:,:] f,
const double[:] bin_edges,
const double[:,:] pos,
@@ -203,6 +202,7 @@ def directional(
choose_estimator_normalization_vec(estimator_type)
)

cdef int dim = pos.shape[0]
cdef int d_max = direction.shape[0]
cdef int i_max = bin_edges.shape[0] - 1
cdef int j_max = pos.shape[1] - 1
@@ -237,13 +237,13 @@ def directional(
return np.asarray(variogram), np.asarray(counts)

def unstructured(
const int dim,
const double[:,:] f,
const double[:] bin_edges,
const double[:,:] pos,
str estimator_type='m',
str distance_type='e',
):
cdef int dim = pos.shape[0]
cdef _dist_func distance

if distance_type == 'e':
31 changes: 21 additions & 10 deletions gstools/variogram/variogram.py
Original file line number Diff line number Diff line change
@@ -10,8 +10,10 @@
vario_estimate
vario_estimate_axis
"""
# pylint: disable=C0412
import numpy as np

from gstools import config
from gstools.normalizer.tools import remove_trend_norm_mean
from gstools.tools.geometric import (
ang2dir,
@@ -20,12 +22,20 @@
generate_grid,
)
from gstools.variogram.binning import standard_bins
from gstools.variogram.estimator import (
directional,
ma_structured,
structured,
unstructured,
)

if config.USE_RUST: # pragma: no cover
# pylint: disable=E0401
from gstools_core import variogram_directional as directional
from gstools_core import variogram_ma_structured as ma_structured
from gstools_core import variogram_structured as structured
from gstools_core import variogram_unstructured as unstructured
else:
from gstools.variogram.estimator import (
directional,
ma_structured,
structured,
unstructured,
)

__all__ = [
"vario_estimate",
@@ -59,7 +69,8 @@ def _separate_dirs_test(direction, angles_tol):
for j in range(i + 1, direction.shape[0]):
s_prod = np.minimum(np.abs(np.dot(direction[i], direction[j])), 1)
separate_dirs &= np.arccos(s_prod) >= 2 * angles_tol
return separate_dirs
# gstools-core doesn't like the type `numpy.bool_`
return bool(separate_dirs)


def vario_estimate(
@@ -341,7 +352,6 @@ def vario_estimate(
# "h"aversine or "e"uclidean distance type
distance_type = "h" if latlon else "e"
estimates, counts = unstructured(
dim,
field,
bin_edges,
pos,
@@ -350,7 +360,6 @@ def vario_estimate(
)
else:
estimates, counts = directional(
dim,
field,
bin_edges,
pos,
@@ -441,7 +450,9 @@ def vario_estimate_axis(
field = np.ma.array(field, ndmin=1, dtype=np.double)
if missing:
field.mask = np.logical_or(field.mask, missing_mask)
mask = np.asarray(np.ma.getmaskarray(field), dtype=np.int32)
mask = np.ma.getmaskarray(field)
if not config.USE_RUST:
mask = np.asarray(mask, dtype=np.int32)
else:
field = np.array(field, ndmin=1, dtype=np.double, copy=False)
missing_mask = None # free space
2 changes: 2 additions & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
@@ -66,6 +66,8 @@ doc =
plotting =
matplotlib>=3,<4
pyvista>=0.29,<1
rust =
gstools_core>=0.2.0,<1
test =
coverage[toml]>=5.2.1,<6
pytest>=6.0,<7