Unified input handling and dynamic input embedding (dynamicslab#63)
This commit unifies the way the different models preprocess the static and dynamic input data.

Closes dynamicslab#57 and dynamicslab#74, part of dynamicslab#64.

Similar to the model head, there is now an InputLayer that all (single-frequency) models use (a usage sketch follows below). Depending on the config, this layer will:
- pass the static variables through an embedding layer (if cfg.statics_embedding is True)
- pass the dynamic variables through an embedding layer (if cfg.dynamics_embedding is True)
- concatenate the static inputs to each step of the dynamic inputs (unless this is deactivated, e.g., for EA-LSTM, where this is not wanted)

Key changes:
- New config arguments statics_embedding, dynamics_embedding (both bool, default False).
- EmbCudaLSTM is no longer needed since the functionality is now part of CudaLSTM. For backwards compatibility, the class still exists but defers to CudaLSTM.
- If EA-LSTM and static embeddings are used, there is now a linear layer between the embedding output and the LSTM input to make sure the dimensions match. Previously, this linear layer was left out if static embeddings were used.
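
For orientation, here is a minimal sketch, not part of this commit, of how a single-frequency model body wires in the new `InputLayer`. It only uses pieces that appear in the diffs below (`InputLayer(cfg)`, its `output_size` attribute, and calling it on the data dictionary); the prediction head and the contents of `cfg` are assumed and omitted.

```python
# Sketch only: a single-frequency model body built around the new InputLayer.
# Assumes `cfg` is a fully populated neuralhydrology Config; head omitted.
import torch.nn as nn

from neuralhydrology.modelzoo.basemodel import BaseModel
from neuralhydrology.modelzoo.inputlayer import InputLayer
from neuralhydrology.utils.config import Config


class SingleFreqSketch(BaseModel):
    # list the input layer so it can be targeted during finetuning
    module_parts = ['embedding_net', 'lstm']

    def __init__(self, cfg: Config):
        super(SingleFreqSketch, self).__init__(cfg=cfg)
        # InputLayer reads cfg.statics_embedding / cfg.dynamics_embedding and
        # exposes the resulting feature dimension as `output_size`.
        self.embedding_net = InputLayer(cfg)
        self.lstm = nn.LSTM(input_size=self.embedding_net.output_size,
                            hidden_size=cfg.hidden_size)

    def forward(self, data):
        # embeds statics/dynamics as configured, merges the statics into every
        # time step, and returns a [seq_length, batch_size, features] tensor
        x_d = self.embedding_net(data)
        lstm_output, (h_n, c_n) = self.lstm(input=x_d)
        return {'lstm_output': lstm_output.transpose(0, 1), 'h_n': h_n, 'c_n': c_n}
```

With a body like this, setting statics_embedding and/or dynamics_embedding in the config switches the embeddings on without touching the model code.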
gauchm authored Jan 22, 2021
1 parent bed1880 commit ce419bc
Showing 23 changed files with 526 additions and 401 deletions.
7 changes: 7 additions & 0 deletions docs/source/api/neuralhydrology.modelzoo.inputlayer.rst
@@ -0,0 +1,7 @@
InputLayer
==========

.. automodule:: neuralhydrology.modelzoo.inputlayer
:members:
:undoc-members:
:show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/neuralhydrology.modelzoo.rst
@@ -17,6 +17,7 @@ nh.modelzoo
neuralhydrology.modelzoo.fc
neuralhydrology.modelzoo.gru
neuralhydrology.modelzoo.head
neuralhydrology.modelzoo.inputlayer
neuralhydrology.modelzoo.mtslstm
neuralhydrology.modelzoo.odelstm
neuralhydrology.modelzoo.template
21 changes: 13 additions & 8 deletions docs/source/usage/config.rst
@@ -105,8 +105,8 @@ General model configuration

- ``model``: Defines the core of the model that will be used. Names
have to match the values in `this
function <https://github.com/neuralhydrology/neuralhydrology/blob/master/neuralhydrology/modelzoo/__init__.py#L14>`__,
e.g., [``cudalstm``, ``ealstm``, ``embcudalstm``, ``mtslstm``]
function <https://github.com/neuralhydrology/neuralhydrology/blob/master/neuralhydrology/modelzoo/__init__.py#L17>`__,
e.g., [``cudalstm``, ``ealstm``, ``mtslstm``]

- ``head``: The prediction head that is used on top of the output of
the core model. Currently supported is ``regression``.
@@ -185,12 +185,17 @@ These are used if ``model == odelstm``.
Embedding network settings
--------------------------

These settings apply to small fully connected networks that are used in
various places, such as the embedding network for static features in the
``embcudalstm`` model or as an optional extended input gate network in
the ``ealstm`` model. For all other models, these settings can be ignored.
If specified, but the ``cudalstm`` model is selected, the code will print a
warning.
These settings define fully connected networks that are used in various places, such as the embedding network
for static or dynamic features in the single-frequency models or as an optional extended input gate network in
the EA-LSTM model. For multi-timescale models, these settings can be ignored.

- ``statics_embedding``: Boolean to indicate whether the static inputs should be passed through an embedding network.
Note that for EA-LSTM, there will always be an additional linear layer that maps to the EA-LSTM's hidden size. This
means that the embedding layer output size does not have to be equal to ``hidden_size``.

- ``dynamics_embedding``: Boolean to indicate whether the dynamic inputs should be passed through an embedding network.
If both ``statics_embedding`` and ``dynamics_embedding`` are true, each input group uses its own embedding network,
but both networks have the same structure (as defined by ``embedding_hiddens/activation/dropout``); a short
illustration follows below.

- ``embedding_hiddens``: List of integers that define the number of
neurons per layer in the fully connected network. The last number is
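
To make the ``statics_embedding``/``dynamics_embedding`` interplay concrete, here is a plain-PyTorch illustration. It is not the library's actual FC implementation; the helper name, activation choice, and layer sizes are made up for the example.

```python
# Illustration only: with both statics_embedding and dynamics_embedding enabled,
# each input group gets its OWN network, but both follow the same structure
# defined by embedding_hiddens (and the shared activation/dropout settings).
import torch.nn as nn


def build_embedding(n_in: int, hiddens=(30, 20, 64), dropout=0.0) -> nn.Sequential:
    layers = []
    for n_out in hiddens:
        layers += [nn.Linear(n_in, n_out), nn.Tanh(), nn.Dropout(p=dropout)]
        n_in = n_out
    return nn.Sequential(*layers)


# same architecture, independent weights
statics_embedding = build_embedding(n_in=27)   # e.g., 27 catchment attributes
dynamics_embedding = build_embedding(n_in=5)   # e.g., 5 meteorological forcings
```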
22 changes: 18 additions & 4 deletions docs/source/usage/models.rst
@@ -12,14 +12,16 @@ CudaLSTM
--------
:py:class:`neuralhydrology.modelzoo.cudalstm.CudaLSTM` is a network using the standard PyTorch LSTM implementation.
All features (``x_d``, ``x_s``, ``x_one_hot``) are concatenated and passed to the network at each time step.
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will be passed through embedding networks before
being concatenated.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``) and will be set accordingly during
model initialization.

CustomLSTM
----------
:py:class:`neuralhydrology.modelzoo.customlstm.CustomLSTM` is a variant of the ``CudaLSTM`` and ``EmbCudaLSTM``
:py:class:`neuralhydrology.modelzoo.customlstm.CustomLSTM` is a variant of the ``CudaLSTM``
that returns all gate and state activations for all time steps. This class is mainly implemented for exploratory
reasons. You can use the method ``model.copy_weights()`` to copy the weights of a ``CudaLSTM`` or ``EmbCudaLSTM`` model
reasons. You can use the method ``model.copy_weights()`` to copy the weights of a ``CudaLSTM`` model
into a ``CustomLSTM`` model. This allows you to use the fast CUDA implementation for training and to use this class only for
inference with more detailed outputs. You can, however, also use this model during training (``model: customlstm`` in the
config.yml) or as a starter for your own modifications to the LSTM cell. Note, however, that the runtime of this model
@@ -31,16 +33,28 @@ EA-LSTM
`Kratzert et al. "Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets" <https://hess.copernicus.org/articles/23/5089/2019/hess-23-5089-2019.html>`__.
The static features (``x_s`` and/or ``x_one_hot``) are used to compute the input gate activations, while the dynamic
inputs ``x_d`` are used in all other gates of the network.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``). If ``embedding_hiddens`` is passed, the input gate consists of the so-defined
FC network and not a single linear layer.
The initial forget gate bias can be defined in config.yml (``initial_forget_bias``).
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will first be passed through embedding networks.
The output of the static embedding network will then be passed through the input gate, which consists of a single linear
layer.
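
For readers new to the architecture, the following simplified sketch shows the core EA-LSTM idea from the paper above: the input gate is computed once per sample from the static features, while the remaining gates depend on the dynamic inputs and the hidden state. It is an illustration, not the neuralhydrology implementation (no biases, no forget-gate initialization, no embeddings).

```python
# Simplified EA-LSTM cell: statics drive the (time-constant) input gate,
# dynamics drive the forget/cell/output gates. Illustration only.
import torch
import torch.nn as nn


class ToyEALSTMCell(nn.Module):
    def __init__(self, dyn_size: int, stat_size: int, hidden_size: int):
        super().__init__()
        self.input_gate = nn.Linear(stat_size, hidden_size)                   # statics -> i
        self.dyn_gates = nn.Linear(dyn_size + hidden_size, 3 * hidden_size)   # f, g, o

    def forward(self, x_d: torch.Tensor, x_s: torch.Tensor):
        # x_d: [seq_length, batch_size, dyn_size], x_s: [batch_size, stat_size]
        i = torch.sigmoid(self.input_gate(x_s))      # constant over all time steps
        h = x_d.new_zeros(x_d.size(1), i.size(1))
        c = torch.zeros_like(h)
        for x_t in x_d:
            f, g, o = torch.chunk(self.dyn_gates(torch.cat([x_t, h], dim=-1)), 3, dim=-1)
            c = torch.sigmoid(f) * c + i * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```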

EmbCudaLSTM
-----------
.. deprecated:: 0.9.9-beta
Use `CudaLSTM`_ with ``embedding_hiddens`` and ``statics_embedding: True``.

:py:class:`neuralhydrology.modelzoo.embcudalstm.EmbCudaLSTM` is similar to `CudaLSTM`_,
with the only difference that static inputs (``x_s`` and/or ``x_one_hot``) are passed through an embedding network
(defined, for instance, by ``embedding_hiddens``) before being concatenated to the dynamic inputs ``x_d``
at each time step.

GRU
---
:py:class:`neuralhydrology.modelzoo.gru.GRU` is a network using the standard PyTorch GRU implementation.
All features (``x_d``, ``x_s``, ``x_one_hot``) are concatenated and passed to the network at each time step.
If ``statics/dynamics_embedding`` are true, the static/dynamic inputs will be passed through embedding networks before
being concatenated.

MTS-LSTM
--------
:py:class:`neuralhydrology.modelzoo.mtslstm.MTSLSTM` is a newly proposed model by `Gauch et al. "Rainfall--Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network" <https://arxiv.org/abs/2010.07921>`__.
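
As described in the CustomLSTM section above, a common workflow is to train the fast ``CudaLSTM`` and copy its weights into a ``CustomLSTM`` for detailed inspection. A hedged sketch of that workflow follows; the exact argument of ``copy_weights`` (the trained model) and the contents of ``cfg`` and ``batch`` are assumptions, so check the class docstring before relying on it.

```python
# Hedged sketch: train with CudaLSTM, inspect gates/states with CustomLSTM.
from typing import Dict

import torch

from neuralhydrology.modelzoo.cudalstm import CudaLSTM
from neuralhydrology.modelzoo.customlstm import CustomLSTM
from neuralhydrology.utils.config import Config


def inspect_gates(cfg: Config, trained: CudaLSTM, batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Passing the trained model to copy_weights is an assumption based on the
    # documentation above.
    custom_lstm = CustomLSTM(cfg=cfg)
    custom_lstm.copy_weights(trained)          # reuse the trained weights
    custom_lstm.eval()
    with torch.no_grad():
        return custom_lstm(batch)              # includes gate and state activations
```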
99 changes: 27 additions & 72 deletions examples/02-Adding-Models/adding-gru.ipynb
@@ -70,35 +70,35 @@
"Every model's constructor receives a single argument: an instance of the run configuration.\n",
"Based on this config, we'll construct the GRU.\n",
"\n",
"Like most our models, the GRU will consist of two components: The \"body\" that represents the actual GRU cell, and the \"head\" that acts as a final output layer.\n",
"To maintain a modular architecture, the head should not be implemented inside the model, but we should use the `get_head` function in `neuralhydrology.modelzoo.head` to retrieve the head that fits to the run configuration."
"Like most our models, the GRU will consist of three components: \n",
"\n",
"- An optional input layer that acts as an embedding network for static or dynamic features. If used, the features will be passed through a fully-connected network before we pass them to the actual GRU. If no embedding is specified, this layer will do nothing.\n",
"- The \"body\" that represents the actual GRU cell.\n",
"- The \"head\" that acts as a final output layer.\n",
"\n",
"To maintain a modular architecture, the input and head layers should not be implemented inside the model. Instead, we should use the `InputLayer` in `neuralhydrology.modelzoo.inputlayer` and the `get_head` function in `neuralhydrology.modelzoo.head` which will automatically construct layers that fit to the run configuration."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class GRU(BaseModel):\n",
"\n",
" # specify submodules of the model that can later be used for finetuning. Names must match class attributes\n",
" module_parts = ['gru', 'head']\n",
" module_parts = ['embedding_net', 'gru', 'head']\n",
"\n",
" def __init__(self, cfg: Config):\n",
"\n",
" super(GRU, self).__init__(cfg=cfg)\n",
"\n",
" # calculate the dimension of inputs that will be fed into our model\n",
" input_size = len(cfg.dynamic_inputs + cfg.evolving_attributes + cfg.hydroatlas_attributes + cfg.static_attributes)\n",
" if cfg.use_basin_id_encoding:\n",
" input_size += cfg.number_of_basins\n",
"\n",
" if cfg.head.lower() == \"umal\":\n",
" input_size += 1\n",
" # retrieve the input layer\n",
" self.embedding_net = InputLayer(cfg)\n",
"\n",
" # create the actual GRU\n",
" self.gru = nn.GRU(input_size=input_size, hidden_size=cfg.hidden_size)\n",
" self.gru = nn.GRU(input_size=self.embedding_net.output_size, hidden_size=cfg.hidden_size)\n",
"\n",
" # add dropout between GRU and head\n",
" self.dropout = nn.Dropout(p=cfg.output_dropout)\n",
@@ -143,28 +143,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n",
" \n",
" # transpose to [seq_length, batch_size, n_features]\n",
" x_d = data['x_d'].transpose(0, 1)\n",
"\n",
" # concatenate all inputs\n",
" if 'x_s' in data and 'x_one_hot' in data:\n",
" x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_s, x_one_hot], dim=-1)\n",
" elif 'x_s' in data:\n",
" x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_s], dim=-1)\n",
" elif 'x_one_hot' in data:\n",
" x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)\n",
" x_d = torch.cat([x_d, x_one_hot], dim=-1)\n",
" else:\n",
" pass\n",
"\n",
" # possibly pass dynamic and static inputs through embedding layers, then concatenate them\n",
" x_d = self.embedding_net(data, concatenate_output=True) \n",
"\n",
" # run the actual GRU\n",
" gru_output, h_n = self.gru(input=x_d)\n",
@@ -187,61 +173,30 @@
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see, much of the heavy lifting is being done by existing methods, so we just have to wire everything up.\n",
"The input layer merges the static inputs (`data['x_s']` and/or `data['x_one_hot']`) to each step of the dynamic inputs (`data['x_d']`) and returns a single tensor that we can pass to the GRU cell.\n",
"\n",
"### Using the Model\n",
"\n",
"That's it! We now have a working GRU model that we can use to train and evaluate models.\n",
"The only thing left is registering the model in the `get_model` method of `neuralhydrology.modelzoo` to make sure we can specify the model in a run configuration.\n",
"\n",
"Since GRU already exists in the modelzoo, it's already there:"
]
"Since GRU already exists in the modelzoo, it's already there:\n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"name": "stdout",
"text": [
"def get_model(cfg: Config) -> nn.Module:\n",
" \"\"\"Get model object, depending on the run configuration.\n",
" \n",
" Parameters\n",
" ----------\n",
" cfg : Config\n",
" The run configuration.\n",
"\n",
" Returns\n",
" -------\n",
" nn.Module\n",
" A new model instance of the type specified in the config.\n",
" \"\"\"\n",
" if cfg.model in SINGLE_FREQ_MODELS and len(cfg.use_frequencies) > 1:\n",
" raise ValueError(f\"Model {cfg.model} does not support multiple frequencies.\")\n",
"\n",
" if cfg.model == \"cudalstm\":\n",
" model = CudaLSTM(cfg=cfg)\n",
" elif cfg.model == \"ealstm\":\n",
" model = EALSTM(cfg=cfg)\n",
" elif cfg.model == \"lstm\":\n",
" model = LSTM(cfg=cfg)\n",
" elif cfg.model == \"gru\":\n",
" model = GRU(cfg=cfg)\n",
" elif cfg.model == \"embcudalstm\":\n",
" model = EmbCudaLSTM(cfg=cfg)\n",
" elif cfg.model == \"mtslstm\":\n",
" model = MTSLSTM(cfg=cfg)\n",
" elif cfg.model == \"odelstm\":\n",
" model = ODELSTM(cfg=cfg)\n",
" else:\n",
" raise NotImplementedError(f\"{cfg.model} not implemented or not linked in `get_model()`\")\n",
"\n",
" return model\n",
"\n"
"def get_model(cfg: Config) -> nn.Module:\n \"\"\"Get model object, depending on the run configuration.\n \n Parameters\n ----------\n cfg : Config\n The run configuration.\n\n Returns\n -------\n nn.Module\n A new model instance of the type specified in the config.\n \"\"\"\n if cfg.model in SINGLE_FREQ_MODELS and len(cfg.use_frequencies) > 1:\n raise ValueError(f\"Model {cfg.model} does not support multiple frequencies.\")\n\n if cfg.model == \"cudalstm\":\n model = CudaLSTM(cfg=cfg)\n elif cfg.model == \"ealstm\":\n model = EALSTM(cfg=cfg)\n elif cfg.model == \"customlstm\":\n model = CustomLSTM(cfg=cfg)\n elif cfg.model == \"lstm\":\n warnings.warn(\n \"The `LSTM` class has been renamed to `CustomLSTM`. Support for `LSTM` will we dropped in the future.\",\n FutureWarning)\n model = CustomLSTM(cfg=cfg)\n elif cfg.model == \"gru\":\n model = GRU(cfg=cfg)\n elif cfg.model == \"embcudalstm\":\n model = EmbCudaLSTM(cfg=cfg)\n elif cfg.model == \"mtslstm\":\n model = MTSLSTM(cfg=cfg)\n elif cfg.model == \"odelstm\":\n model = ODELSTM(cfg=cfg)\n else:\n raise NotImplementedError(f\"{cfg.model} not implemented or not linked in `get_model()`\")\n\n return model\n\n"
]
}
],
@@ -274,7 +229,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.7.9-final"
}
},
"nbformat": 4,
3 changes: 1 addition & 2 deletions neuralhydrology/__about__.py
@@ -1,2 +1 @@
__version__ = "0.9.10-beta"

__version__ = "0.9.11-beta"
50 changes: 12 additions & 38 deletions neuralhydrology/modelzoo/cudalstm.py
@@ -1,25 +1,23 @@
import logging
from typing import Dict

import torch
import torch.nn as nn

from neuralhydrology.modelzoo.inputlayer import InputLayer
from neuralhydrology.modelzoo.head import get_head
from neuralhydrology.modelzoo.basemodel import BaseModel
from neuralhydrology.utils.config import Config

LOGGER = logging.getLogger(__name__)


class CudaLSTM(BaseModel):
"""LSTM model class, which relies on PyTorch's CUDA LSTM class.
This class implements the standard LSTM combined with a model head, as specified in the config. All features
(time series and static) are concatenated and passed to the LSTM directly. If you want to embed the static features
prior to the concatenation, use the `EmbCudaLSTM` class.
To control the initial forget gate bias, use the config argument `initial_forget_bias`. Often it is useful to set
This class implements the standard LSTM combined with a model head, as specified in the config. Depending on the
embedding settings, static and/or dynamic features may or may not be fed through embedding networks before being
concatenated and passed through the LSTM.
To control the initial forget gate bias, use the config argument `initial_forget_bias`. Often it is useful to set
this value to a positive value at the start of the model training, to keep the forget gate closed and to facilitate
the gradient flow.
the gradient flow.
The `CudaLSTM` class only supports single-timescale predictions. Use `MTSLSTM` to train a model and get
predictions on multiple temporal resolutions at the same time.
@@ -29,23 +27,14 @@ class CudaLSTM(BaseModel):
The run configuration.
"""
# specify submodules of the model that can later be used for finetuning. Names must match class attributes
module_parts = ['lstm', 'head']
module_parts = ['embedding_net', 'lstm', 'head']

def __init__(self, cfg: Config):
super(CudaLSTM, self).__init__(cfg=cfg)

if cfg.embedding_hiddens:
LOGGER.warning("## Warning: Embedding settings are ignored. Use EmbCudaLSTM for embeddings")

input_size = len(cfg.dynamic_inputs + cfg.evolving_attributes + cfg.hydroatlas_attributes +
cfg.static_attributes)
if cfg.use_basin_id_encoding:
input_size += cfg.number_of_basins
self.embedding_net = InputLayer(cfg)

if cfg.head.lower() == "umal":
input_size += 1

self.lstm = nn.LSTM(input_size=input_size, hidden_size=cfg.hidden_size)
self.lstm = nn.LSTM(input_size=self.embedding_net.output_size, hidden_size=cfg.hidden_size)

self.dropout = nn.Dropout(p=cfg.output_dropout)

@@ -69,28 +58,13 @@ def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
Returns
-------
Dict[str, torch.Tensor]
Model outputs and intermediate states as a dictionary.
Model outputs and intermediate states as a dictionary.
- `y_hat`: model predictions of shape [batch size, sequence length, number of target variables].
- `h_n`: hidden state at the last time step of the sequence of shape [batch size, 1, hidden size].
- `c_n`: cell state at the last time step of the sequence of shape [batch size, 1, hidden size].
"""
# transpose to [seq_length, batch_size, n_features]
x_d = data['x_d'].transpose(0, 1)

# concat all inputs
if 'x_s' in data and 'x_one_hot' in data:
x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_s, x_one_hot], dim=-1)
elif 'x_s' in data:
x_s = data['x_s'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_s], dim=-1)
elif 'x_one_hot' in data:
x_one_hot = data['x_one_hot'].unsqueeze(0).repeat(x_d.shape[0], 1, 1)
x_d = torch.cat([x_d, x_one_hot], dim=-1)
else:
pass

# possibly pass dynamic and static inputs through embedding layers, then concatenate them
x_d = self.embedding_net(data)
lstm_output, (h_n, c_n) = self.lstm(input=x_d)

# reshape to [batch_size, seq, n_hiddens]
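
For reference, when no embeddings are configured, the concatenation performed by `embedding_net(data)` is essentially the logic that was removed from `forward()` above. A small, runnable illustration with arbitrary shapes:

```python
# Broadcast the static features over the sequence and concatenate them to the
# dynamic inputs, exactly as the removed forward() code did.
import torch

seq_length, batch_size, n_dynamic, n_static = 365, 8, 5, 27
x_d = torch.rand(batch_size, seq_length, n_dynamic)   # data['x_d']
x_s = torch.rand(batch_size, n_static)                # data['x_s']

x_d = x_d.transpose(0, 1)                              # [seq_length, batch_size, n_dynamic]
x_s = x_s.unsqueeze(0).repeat(x_d.shape[0], 1, 1)      # repeat statics for every time step
lstm_input = torch.cat([x_d, x_s], dim=-1)             # [seq_length, batch_size, n_dynamic + n_static]
print(lstm_input.shape)                                # torch.Size([365, 8, 32])
```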