Unified input handling and dynamic input embedding (dynamicslab#63)
This commit unifies the way the different models preprocess the static and dynamic input data.

Closes dynamicslab#57 and dynamicslab#74, part of dynamicslab#64.

Similar to the model head, there is now an InputLayer that all (single-frequency) models use (a usage sketch follows below). Depending on the config, this layer will:
- pass the static variables through an embedding layer (if cfg.statics_embedding is True)
- pass the dynamic variables through an embedding layer (if cfg.dynamics_embedding is True)
- concatenate the static inputs to each step of the dynamic inputs (unless this is deactivated, e.g., for EA-LSTM, where this is not wanted)

Key changes:
- New config arguments statics_embedding, dynamics_embedding (both bool, default False).
- EmbCudaLSTM is no longer needed since the functionality is now part of CudaLSTM. For backwards compatibility, the class still exists but defers to CudaLSTM.
- If EA-LSTM and static embeddings are used, there is now a linear layer between the embedding output and the LSTM input to make sure the dimensions match. Previously, this linear layer was left out if static embeddings were used.
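
For orientation, here is a minimal sketch, not part of this commit, of how a single-frequency model body wires in the new `InputLayer`. It only uses pieces that appear in the diffs below (`InputLayer(cfg)`, its `output_size` attribute, and calling it on the data dictionary); the prediction head and the contents of `cfg` are assumed and omitted.

```python
# Sketch only: a single-frequency model body built around the new InputLayer.
# Assumes `cfg` is a fully populated neuralhydrology Config; head omitted.
import torch.nn as nn

from neuralhydrology.modelzoo.basemodel import BaseModel
from neuralhydrology.modelzoo.inputlayer import InputLayer
from neuralhydrology.utils.config import Config


class SingleFreqSketch(BaseModel):
    # list the input layer so it can be targeted during finetuning
    module_parts = ['embedding_net', 'lstm']

    def __init__(self, cfg: Config):
        super(SingleFreqSketch, self).__init__(cfg=cfg)
        # InputLayer reads cfg.statics_embedding / cfg.dynamics_embedding and
        # exposes the resulting feature dimension as `output_size`.
        self.embedding_net = InputLayer(cfg)
        self.lstm = nn.LSTM(input_size=self.embedding_net.output_size,
                            hidden_size=cfg.hidden_size)

    def forward(self, data):
        # embeds statics/dynamics as configured, merges the statics into every
        # time step, and returns a [seq_length, batch_size, features] tensor
        x_d = self.embedding_net(data)
        lstm_output, (h_n, c_n) = self.lstm(input=x_d)
        return {'lstm_output': lstm_output.transpose(0, 1), 'h_n': h_n, 'c_n': c_n}
```

With a body like this, setting statics_embedding and/or dynamics_embedding in the config switches the embeddings on without touching the model code.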
gauchm authored Jan 22, 2021
1 parent bed1880 commit ce419bc
Showing 23 changed files with 526 additions and 401 deletions.
7 changes: 7 additions & 0 deletions docs/source/api/neuralhydrology.modelzoo.inputlayer.rst
@@ -0,0 +1,7 @@
InputLayer
==========

.. automodule:: neuralhydrology.modelzoo.inputlayer
:members:
:undoc-members:
:show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/neuralhydrology.modelzoo.rst
@@ -17,6 +17,7 @@ nh.modelzoo
neuralhydrology.modelzoo.fc
neuralhydrology.modelzoo.gru
neuralhydrology.modelzoo.head
neuralhydrology.modelzoo.inputlayer
neuralhydrology.modelzoo.mtslstm
neuralhydrology.modelzoo.odelstm
neuralhydrology.modelzoo.template
21 changes: 13 additions & 8 deletions docs/source/usage/config.rst
@@ -105,8 +105,8 @@ General model configuration

- ``model``: Defines the core of the model that will be used. Names
have to match the values in `this
function <https://github.com/neuralhydrology/neuralhydrology/blob/master/neuralhydrology/modelzoo/__init__.py#L14>`__,
e.g., [``cudalstm``, ``ealstm``, ``embcudalstm``, ``mtslstm``]
function <https://github.com/neuralhydrology/neuralhydrology/blob/master/neuralhydrology/modelzoo/__init__.py#L17>`__,
e.g., [``cudalstm``, ``ealstm``, ``mtslstm``]

- ``head``: The prediction head that is used on top of the output of
the core model. Currently supported is ``regression``.
@@ -185,12 +185,17 @@ These are used if ``model == odelstm``.
Embedding network settings
--------------------------

These settings apply to small fully connected networks that are used in
various places, such as the embedding network for static features in the
``embcudalstm`` model or as an optional extended input gate network in
the ``ealstm`` model. For all other models, these settings can be ignored.
If specified, but the ``cudalstm`` model is selected, the code will print a
warning.
These settings define fully connected networks that are used in various places, such as the embedding network
for static or dynamic features in the single-frequency models or as an optional extended input gate network in
the EA-LSTM model. For multi-timescale models, these settings can be ignored.

- ``statics_embedding``: Boolean to indicate whether the static inputs should be passed through an embedding network.
Note that for EA-LSTM, there will always be an additional linear layer that maps to the EA-LSTM's hidden size. This
means that the embedding layer output size does not have to be equal to ``hidden_size``.

- ``dynamics_embedding``: Boolean to indicate whether the dynamic inputs should be passed through an embedding network.
If both ``statics_embedding`` and ``dynamics_embedding`` are true, each input group uses its own embedding network,
but both networks have the same structure (as defined by ``embedding_hiddens/activation/dropout``); a short
illustration follows below.

- ``embedding_hiddens``: List of integers that define the number of
neurons per layer in the fully connected network. The last number is
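
To make the ``statics_embedding``/``dynamics_embedding`` interplay concrete, here is a plain-PyTorch illustration. It is not the library's actual FC implementation; the helper name, activation choice, and layer sizes are made up for the example.

```python
# Illustration only: with both statics_embedding and dynamics_embedding enabled,
# each input group gets its OWN network, but both follow the same structure
# defined by embedding_hiddens (and the shared activation/dropout settings).
import torch.nn as nn


def build_embedding(n_in: int, hiddens=(30, 20, 64), dropout=0.0) -> nn.Sequential:
    layers = []
    for n_out in hiddens:
        layers += [nn.Linear(n_in, n_out), nn.Tanh(), nn.Dropout(p=dropout)]
        n_in = n_out
    return nn.Sequential(*layers)


# same architecture, independent weights
statics_embedding = build_embedding(n_in=27)   # e.g., 27 catchment attributes
dynamics_embedding = build_embedding(n_in=5)   # e.g., 5 meteorological forcings
```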
22 changes: 18 additions & 4 deletions docs/source/usage/models.rst
@@ -12,14 +12,16 @@ CudaLSTM
--------
:py:class:`neuralhydrology.modelzoo.cudalstm.CudaLSTM` is a network using the standard PyTorch LSTM implementation.
All features (``x_d``, ``x_s``, ``x_one_hot``) are concatenated and passed to the network at each time step.
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will be passed through embedding networks before
being concatenated.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``) and will be set accordingly during
model initialization.

CustomLSTM
----------
:py:class:`neuralhydrology.modelzoo.customlstm.CustomLSTM` is a variant of the ``CudaLSTM`` and ``EmbCudaLSTM``
:py:class:`neuralhydrology.modelzoo.customlstm.CustomLSTM` is a variant of the ``CudaLSTM``
that returns all gate and state activations for all time steps. This class is mainly implemented for exploratory
reasons. You can use the method ``model.copy_weights()`` to copy the weights of a ``CudaLSTM`` or ``EmbCudaLSTM`` model
reasons. You can use the method ``model.copy_weights()`` to copy the weights of a ``CudaLSTM`` model
into a ``CustomLSTM`` model. This allows you to use the fast CUDA implementation for training and to use this class only for
inference with more detailed outputs. You can, however, also use this model during training (``model: customlstm`` in the
config.yml) or as a starter for your own modifications to the LSTM cell. Note, however, that the runtime of this model
@@ -31,16 +33,28 @@ EA-LSTM
`Kratzert et al. "Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets" <https://hess.copernicus.org/articles/23/5089/2019/hess-23-5089-2019.html>`__.
The static features (``x_s`` and/or ``x_one_hot``) are used to compute the input gate activations, while the dynamic
inputs ``x_d`` are used in all other gates of the network.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``). If ``embedding_hiddens`` is passed, the input gate consists of the so-defined
FC network and not a single linear layer.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``).
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will first be passed through embedding networks.
The output of the static embedding network will then be passed through the input gate, which consists of a single linear
layer.
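
For readers new to the architecture, the following simplified sketch shows the core EA-LSTM idea from the paper above: the input gate is computed once per sample from the static features, while the remaining gates depend on the dynamic inputs and the hidden state. It is an illustration, not the neuralhydrology implementation (no biases, no forget-gate initialization, no embeddings).

```python
# Simplified EA-LSTM cell: statics drive the (time-constant) input gate,
# dynamics drive the forget/cell/output gates. Illustration only.
import torch
import torch.nn as nn


class ToyEALSTMCell(nn.Module):
    def __init__(self, dyn_size: int, stat_size: int, hidden_size: int):
        super().__init__()
        self.input_gate = nn.Linear(stat_size, hidden_size)                   # statics -> i
        self.dyn_gates = nn.Linear(dyn_size + hidden_size, 3 * hidden_size)   # f, g, o

    def forward(self, x_d: torch.Tensor, x_s: torch.Tensor):
        # x_d: [seq_length, batch_size, dyn_size], x_s: [batch_size, stat_size]
        i = torch.sigmoid(self.input_gate(x_s))      # constant over all time steps
        h = x_d.new_zeros(x_d.size(1), i.size(1))
        c = torch.zeros_like(h)
        for x_t in x_d:
            f, g, o = torch.chunk(self.dyn_gates(torch.cat([x_t, h], dim=-1)), 3, dim=-1)
            c = torch.sigmoid(f) * c + i * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```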

EmbCudaLSTM
-----------
.. deprecated:: 0.9.9-beta
Use `CudaLSTM`_ with ``embedding_hiddens`` and ``statics_embedding: True``.

:py:class:`neuralhydrology.modelzoo.embcudalstm.EmbCudaLSTM` is similar to `CudaLSTM`_,
with the only difference that static inputs (``x_s`` and/or ``x_one_hot``) are passed through an embedding network
(defined, for instance, by ``embedding_hiddens``) before being concatenated to the dynamic inputs ``x_d``
at each time step.

GRU
---
:py:class:`neuralhydrology.modelzoo.gru.GRU` is a network using the standard PyTorch GRU implementation.
All features (``x_d``, ``x_s``, ``x_one_hot``) are concatenated and passed to the network at each time step.
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will be passed through embedding networks before
being concatenated.

MTS-LSTM
--------
:py:class:`neuralhydrology.modelzoo.mtslstm.MTSLSTM` is a newly proposed model by `Gauch et al. "Rainfall--Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network" <https://arxiv.org/abs/2010.07921>`__.
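
As described in the CustomLSTM section above, a common workflow is to train the fast ``CudaLSTM`` and copy its weights into a ``CustomLSTM`` for detailed inspection. A hedged sketch of that workflow follows; the exact argument of ``copy_weights`` (the trained model) and the contents of ``cfg`` and ``batch`` are assumptions, so check the class docstring before relying on it.

```python
# Hedged sketch: train with CudaLSTM, inspect gates/states with CustomLSTM.
from typing import Dict

import torch

from neuralhydrology.modelzoo.cudalstm import CudaLSTM
from neuralhydrology.modelzoo.customlstm import CustomLSTM
from neuralhydrology.utils.config import Config


def inspect_gates(cfg: Config, trained: CudaLSTM, batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Passing the trained model to copy_weights is an assumption based on the
    # documentation above.
    custom_lstm = CustomLSTM(cfg=cfg)
    custom_lstm.copy_weights(trained)          # reuse the trained weights
    custom_lstm.eval()
    with torch.no_grad():
        return custom_lstm(batch)              # includes gate and state activations
```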
99 changes: 27 additions & 72 deletions examples/02-Adding-Models/adding-gru.ipynb
@@ -70,35 +70,35 @@
"Every model's constructor receives a single argument: an instance of the run configuration.\n",
"Based on this config, we'll construct the GRU.\n",
"\n",
"Like most our models, the GRU will consist of two components: The \"body\" that represents the actual GRU cell, and the \"head\" that acts as a final output layer.\n",
"To maintain a modular architecture, the head should not be implemented inside the model, but we should use the `get_head` function in `neuralhydrology.modelzoo.head` to retrieve the head that fits to the run configuration."
"Like most our models, the GRU will consist of three components: \n",
"\n",
"- An optional input layer that acts as an embedding network for static or dynamic features. If used, the features will be passed through a fully-connected network before we pass them to the actual GRU. If no embedding is specified, this layer will do nothing.\n",
"- The \"body\" that represents the actual GRU cell.\n",
"- The \"head\" that acts as a final output layer.\n",
"\n",
"To maintain a modular architecture, the input and head layers should not be implemented inside the model. Instead, we should use the `InputLayer` in `neuralhydrology.modelzoo.inputlayer` and the `get_head` function in `neuralhydrology.modelzoo.head` which will automatically construct layers that fit to the run configuration."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class GRU(BaseModel):\n",
"\n",
" # specify submodules of the model that can later be used for finetuning. Names must match class attributes\n",
" module_parts = ['gru', 'head']\n",
" module_parts = ['embedding_net', 'gru', 'head']\n",
"\n",
" def __init__(self, cfg: Config):\n",
"\n",
" super(GRU, self).__init__(cfg=cfg)\n",
"\n",
" # calculate the dimension of inputs that will be fed into our model\n",
" input_size = len(cfg.dynamic_inputs + cfg.evolving_attributes + cfg.hydroatlas_attributes + cfg.static_attributes)\n",
" if cfg.use_basin_id_encoding:\n",
" input_size += cfg.number_of_basins\n",
"\n",
" if cfg.head.lower() == \"umal\":\n",
" input_size += 1\n",
" # retrieve the input layer\n",
" self.embedding_net = InputLayer(cfg)\n",
"\n",
" # create the actual GRU\n",
" self.gru = nn.GRU(input_size=input_size, hidden_size=cfg.hidden_size)\n",
" self.gru = nn.GRU(input_size=self.embedding_net.output_size, hidden_size=cfg.hidden_size)\n",
"\n",
" # add dropout between GRU and head\n",
" self.dropout = nn.Dropout(p=cfg.output_dropout)\n",
@@ -143,28 +143,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n",
" \n",
" # transpose to [seq_length, batch_size, n_features]\n",
" x_d = data['x_d'].transpose(0, 1)\n",
"\n",
" # concatenate all inputs\n",
" if 'x_s' in data and 'x_one_hot' in data:\n",
" x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_s, x_one_hot], dim=-1)\n",
" elif 'x_s' in data:\n",
" x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_s], dim=-1)\n",
" elif 'x_one_hot' in data:\n",
" x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_one_hot], dim=-1)\n",
" else:\n",
" pass\n",
"\n",
" # possibly pass dynamic and static inputs through embedding layers, then concatenate them\n",
" x_d = self.embedding_net(data, concatenate_output=True) \n",
"\n",
" # run the actual GRU\n",
" gru_output, h_n = self.gru(input=x_d)\n",
@@ -187,61 +173,30 @@
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see, much of the heavy lifting is being done by existing methods, so we just have to wire everything up.\n",
"The input layer merges the static inputs (`data['x_s']` and/or `data['x_one_hot']`) to each step of the dynamic inputs (`data['x_d']`) and returns a single tensor that we can pass to the GRU cell.\n",
"\n",
"### Using the Model\n",
"\n",
"That's it! We now have a working GRU model that we can use to train and evaluate models.\n",
"The only thing left is registering the model in the `get_model` method of `neuralhydrology.modelzoo` to make sure we can specify the model in a run configuration.\n",
"\n",
"Since GRU already exists in the modelzoo, it's already there:"
]
"Since GRU already exists in the modelzoo, it's already there:\n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"def get_model(cfg: Config) -> nn.Module:\n",
" \"\"\"Get model object, depending on the run configuration.\n",
" \n",
" Parameters\n",
" ----------\n",
" cfg : Config\n",
" The run configuration.\n",
"\n",
" Returns\n",
" -------\n",
" nn.Module\n",
" A new model instance of the type specified in the config.\n",
" \"\"\"\n",
" if cfg.model in SINGLE_FREQ_MODELS and len(cfg.use_frequencies) > 1:\n",
" raise ValueError(f\"Model {cfg.model} does not support multiple frequencies.\")\n",
"\n",
" if cfg.model == \"cudalstm\":\n",
" model = CudaLSTM(cfg=cfg)\n",
" elif cfg.model == \"ealstm\":\n",
" model = EALSTM(cfg=cfg)\n",
" elif cfg.model == \"lstm\":\n",
" model = LSTM(cfg=cfg)\n",
" elif cfg.model == \"gru\":\n",
" model = GRU(cfg=cfg)\n",
" elif cfg.model == \"embcudalstm\":\n",
" model = EmbCudaLSTM(cfg=cfg)\n",
" elif cfg.model == \"mtslstm\":\n",
" model = MTSLSTM(cfg=cfg)\n",
" elif cfg.model == \"odelstm\":\n",
" model = ODELSTM(cfg=cfg)\n",
" else:\n",
" raise NotImplementedError(f\"{cfg.model} not implemented or not linked in `get_model()`\")\n",
"\n",
" return model\n",
"\n"
"def get_model(cfg: Config) -> nn.Module:\n \"\"\"Get model object, depending on the run configuration.\n \n Parameters\n ----------\n cfg : Config\n The run configuration.\n\n Returns\n -------\n nn.Module\n A new model instance of the type specified in the config.\n \"\"\"\n if cfg.model in SINGLE_FREQ_MODELS and len(cfg.use_frequencies) > 1:\n raise ValueError(f\"Model {cfg.model} does not support multiple frequencies.\")\n\n if cfg.model == \"cudalstm\":\n model = CudaLSTM(cfg=cfg)\n elif cfg.model == \"ealstm\":\n model = EALSTM(cfg=cfg)\n elif cfg.model == \"customlstm\":\n model = CustomLSTM(cfg=cfg)\n elif cfg.model == \"lstm\":\n warnings.warn(\n \"The `LSTM` class has been renamed to `CustomLSTM`. Support for `LSTM` will we dropped in the future.\",\n FutureWarning)\n model = CustomLSTM(cfg=cfg)\n elif cfg.model == \"gru\":\n model = GRU(cfg=cfg)\n elif cfg.model == \"embcudalstm\":\n model = EmbCudaLSTM(cfg=cfg)\n elif cfg.model == \"mtslstm\":\n model = MTSLSTM(cfg=cfg)\n elif cfg.model == \"odelstm\":\n model = ODELSTM(cfg=cfg)\n else:\n raise NotImplementedError(f\"{cfg.model} not implemented or not linked in `get_model()`\")\n\n return model\n\n"
]
}
],
@@ -274,7 +229,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.7.9-final"
}
},
"nbformat": 4,
3 changes: 1 addition & 2 deletions neuralhydrology/__about__.py
@@ -1,2 +1 @@
__version__ = "0.9.10-beta"

__version__ = "0.9.11-beta"
50 changes: 12 additions & 38 deletions neuralhydrology/modelzoo/cudalstm.py
@@ -1,25 +1,23 @@
import logging
from typing import Dict

import torch
import torch.nn as nn

from neuralhydrology.modelzoo.inputlayer import InputLayer
from neuralhydrology.modelzoo.head import get_head
from neuralhydrology.modelzoo.basemodel import BaseModel
from neuralhydrology.utils.config import Config

LOGGER = logging.getLogger(__name__)


class CudaLSTM(BaseModel):
"""LSTM model class, which relies on PyTorch's CUDA LSTM class.
This class implements the standard LSTM combined with a model head, as specified in the config. All features
(time series and static) are concatenated and passed to the LSTM directly. If you want to embed the static features
prior to the concatenation, use the `EmbCudaLSTM` class.
To control the initial forget gate bias, use the config argument `initial_forget_bias`. Often it is useful to set
This class implements the standard LSTM combined with a model head, as specified in the config. Depending on the
embedding settings, static and/or dynamic features may or may not be fed through embedding networks before being
concatenated and passed through the LSTM.
To control the initial forget gate bias, use the config argument `initial_forget_bias`. Often it is useful to set
this value to a positive value at the start of the model training, to keep the forget gate closed and to facilitate
the gradient flow.
the gradient flow.
The `CudaLSTM` class only supports single-timescale predictions. Use `MTSLSTM` to train a model and get
predictions on multiple temporal resolutions at the same time.
@@ -29,23 +27,14 @@ class CudaLSTM(BaseModel):
The run configuration.
"""
# specify submodules of the model that can later be used for finetuning. Names must match class attributes
module_parts = ['lstm', 'head']
module_parts = ['embedding_net', 'lstm', 'head']

def __init__(self, cfg: Config):
super(CudaLSTM, self).__init__(cfg=cfg)

if cfg.embedding_hiddens:
LOGGER.warning("## Warning: Embedding settings are ignored. Use EmbCudaLSTM for embeddings")

input_size = len(cfg.dynamic_inputs + cfg.evolving_attributes + cfg.hydroatlas_attributes +
cfg.static_attributes)
if cfg.use_basin_id_encoding:
input_size += cfg.number_of_basins
self.embedding_net = InputLayer(cfg)

if cfg.head.lower() == "umal":
input_size += 1

self.lstm = nn.LSTM(input_size=input_size, hidden_size=cfg.hidden_size)
self.lstm = nn.LSTM(input_size=self.embedding_net.output_size, hidden_size=cfg.hidden_size)

self.dropout = nn.Dropout(p=cfg.output_dropout)

@@ -69,28 +58,13 @@ def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
Returns
-------
Dict[str, torch.Tensor]
Model outputs and intermediate states as a dictionary.
Model outputs and intermediate states as a dictionary.
- `y_hat`: model predictions of shape [batch size, sequence length, number of target variables].
- `h_n`: hidden state at the last time step of the sequence of shape [batch size, 1, hidden size].
- `c_n`: cell state at the last time step of the sequence of shape [batch size, 1, hidden size].
"""
# transpose to [seq_length, batch_size, n_features]
x_d = data['x_d'].transpose(0, 1)

# concat all inputs
if 'x_s' in data and 'x_one_hot' in data:
x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_s, x_one_hot], dim=-1)
elif 'x_s' in data:
x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_s], dim=-1)
elif 'x_one_hot' in data:
x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_one_hot], dim=-1)
else:
pass

# possibly pass dynamic and static inputs through embedding layers, then concatenate them
x_d = self.embedding_net(data)
lstm_output, (h_n, c_n) = self.lstm(input=x_d)

# reshape to [batch_size, seq, n_hiddens]
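
For reference, when no embeddings are configured, the concatenation performed by `embedding_net(data)` is essentially the logic that was removed from `forward()` above. A small, runnable illustration with arbitrary shapes:

```python
# Broadcast the static features over the sequence and concatenate them to the
# dynamic inputs, exactly as the removed forward() code did.
import torch

seq_length, batch_size, n_dynamic, n_static = 365, 8, 5, 27
x_d = torch.rand(batch_size, seq_length, n_dynamic)   # data['x_d']
x_s = torch.rand(batch_size, n_static)                # data['x_s']

x_d = x_d.transpose(0, 1)                              # [seq_length, batch_size, n_dynamic]
x_s = x_s.unsqueeze(0).repeat(x_d.shape[0], 1, 1)      # repeat statics for every time step
lstm_input = torch.cat([x_d, x_s], dim=-1)             # [seq_length, batch_size, n_dynamic + n_static]
print(lstm_input.shape)                                # torch.Size([365, 8, 32])
```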