Enables production data from CSV file and fixes failing probability distribution test (#425)

* Enables production data from CSV file
* Adds config_parser changes and fixes black formatting
* Fixes pylint
* Fixes description CSVData class
* Fixes probability distributions test
* Updates documentation and CHANGELOG
* Changes position of observation vectors entry in config file
* Changes CI testdata branch for CI
* Reverts CI back to equinor/master testdata branch and updates CHANGELOG
equinor · Aug 6, 2021 · 9ffca96
1 parent 3568b46
commit 9ffca96

Showing 9 changed files with 354 additions and 265 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,7 +5,8 @@ This project adheres to [Semantic Versioning](https://semver.org/).
 ## Unreleased
 
 ### Added
-- [#417] (https://github.com/equinor/flownet/pull/417) Added functionality to history match dissolved salts (TDS) in produced water.
+- [#425](https://github.com/equinor/flownet/pull/425) Added functionality to load production data from CSV file. Changed position of observation `vectors` entry in configuration file (now one level higher, same level as `simulation` and `database` in `data_source` entry).
+- [#417](https://github.com/equinor/flownet/pull/417) Added functionality to history match dissolved salts (TDS) in produced water.
 - [#404](https://github.com/equinor/flownet/pull/404) Added possibility for regional multipliers for permeability, porosity and bulkvolume multiplier. Current implementation allows for defining either one global multiplier, or regional multipliers based on a region parameter extracted from an existing simulation model (typically FIPNUM, EQLNUM, SATNUM etc). The regional multiplier will be in addition to the per tube multipliers. New keys in config yaml are: porosity_regional_scheme (global, individual or regions_from_sim), porosity_regional (define prior same way as for other model parameters) and porosity_parameter_from_sim_model (name of region parameter in simulation model). The same three keys exist for permeability and bulkvolume_mult.
 - [#383](https://github.com/equinor/flownet/pull/383) Added option to either define a prior distribution for KRWMAX directly by using krwmax in the config yaml, or to let KRWMAX be calculated as KRWEND + delta. To do the latter, set krwmax_add_to_krwend to true, and then the prior distribution definition in the config yaml for krwmax will be interpreted as a prior distribution for the delta value to be added to KRWEND to get the KRWMAX.
 - [#386](https://github.com/equinor/flownet/pull/386) Expose FlowNet timeout to user.
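
The #425 entry above moves the observation `vectors` entry from `data_source.simulation` up to `data_source`. A minimal sketch of how an existing configuration could be adapted, assuming the YAML has already been loaded into a plain dictionary; `move_vectors_up` is a hypothetical helper for illustration and is not part of FlowNet:

```python
from copy import deepcopy
from typing import Any, Dict


def move_vectors_up(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with `vectors` moved one level up.

    Hypothetical helper: moves flownet.data_source.simulation.vectors
    to flownet.data_source.vectors and leaves the input untouched.
    """
    migrated = deepcopy(config)
    data_source = migrated["flownet"]["data_source"]
    simulation = data_source.get("simulation", {})
    # Only move the entry if the old location is used and the new one is free.
    if "vectors" in simulation and "vectors" not in data_source:
        data_source["vectors"] = simulation.pop("vectors")
    return migrated


# Old layout: `vectors` nested under `simulation`.
old_config = {
    "flownet": {
        "data_source": {
            "simulation": {
                "input_case": "/path/to/simulation_model.DATA",
                "vectors": {"WOPR": {"rel_error": 0.1, "min_error": 50}},
            }
        }
    }
}

new_config = move_vectors_up(old_config)
print(sorted(new_config["flownet"]["data_source"]))
```

The helper works on a deep copy so the original dictionary can still be compared against the migrated one.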

diff --git a/docs/configuration_file.rst b/docs/configuration_file.rst
@@ -27,6 +27,8 @@ Example of the entire flownet part of the configuration yaml file:
 
   flownet:
     data_source:
+      database:
+        input_data: ../input_data/norne_production_data.csv
       simulation:
         input_case: ../input_model/norne/NORNE_ATW2013
         vectors:
@@ -182,12 +184,21 @@ FlowNet will extract the data used to construct and condition the model from an
   FlowNet has an option to generate separate FlowNet models for each layer. To initiate this, supply a list of lists containing the 
   start and end layer in the input simulation model for each distinct layer
 
+database
+~~~~~~~~~~
+
+FlowNet will extract the production data used to history match the model from a CSV file.
+
+* **input_data**: Path to the production data CSV file.
+
 Example yaml section:
 
 .. code-block:: yaml 
 
   flownet:
     data_source:
+      database:
+        input_data: /path/to/production_data.csv
       simulation:
         input_case: /path/to/simulation_model.DATA
         vectors:
@@ -204,7 +215,9 @@ Example yaml section:
 In this example, the input simulation model (which has been simulated with Flow or Eclipse or similar) will be found in 
 */path/to/simulation_model.DATA*, the vectors to use in the conditioning of the FlowNet model are *WOPR* and *WGPR*, each
 with a relative error of 10% and minimum error of 50 (Sm3). Two FlowNet models will be created, one based on layers 1 to 5 
-in the input simulation model, and one based on layers 6 to 10 in the input simulation model.
+in the input simulation model, and one based on layers 6 to 10 in the input simulation model. If no input database CSV file
+containing production data is provided, FlowNet will use the simulated production data from the input simulation model.
+If a CSV file is specified, the production data from the CSV file will be used.
 
 resampling
 ~~~~~~~~~~
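
The precedence rule documented in this hunk (use the CSV file when `input_data` is given, otherwise fall back to the simulated production data) can be sketched in isolation. This is an illustration under stated assumptions, not FlowNet's implementation: `select_production_data` is a hypothetical function, and the well name and rates are made-up placeholder values.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def select_production_data(
    simulated: pd.DataFrame, input_data: Optional[Union[Path, str]]
) -> pd.DataFrame:
    """Prefer the CSV file when a path is configured, else keep simulated data."""
    if input_data:
        return pd.read_csv(input_data)
    return simulated


# Placeholder "simulated" production data.
simulated = pd.DataFrame({"WELL_NAME": ["A-1"], "WOPR": [100.0]})

with tempfile.TemporaryDirectory() as tmp:
    # Write a small placeholder CSV file and select it as the data source.
    csv_path = Path(tmp) / "production_data.csv"
    pd.DataFrame({"WELL_NAME": ["A-1"], "WOPR": [95.0]}).to_csv(csv_path, index=False)
    from_csv = select_production_data(simulated, csv_path)

# Without a configured path the simulated data is returned unchanged.
fallback = select_production_data(simulated, None)
print(from_csv["WOPR"].iloc[0], fallback["WOPR"].iloc[0])
```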

diff --git a/src/flownet/ahm/_run_ahm.py b/src/flownet/ahm/_run_ahm.py
@@ -26,7 +26,7 @@
     FaultTransmissibility,
     Parameter,
 )
-from ..data import FlowData
+from ..data import FlowData, CSVData
 
 
 def _set_up_ahm_and_run_ert(
@@ -542,6 +542,9 @@ def run_flownet_history_matching(
         layers=config.flownet.data_source.simulation.layers,
     )
     df_production_data: pd.DataFrame = field_data.production
+    if config.flownet.data_source.database.input_data:
+        csv_data = CSVData(config.flownet.data_source.database.input_data)
+        df_production_data = csv_data.production
     df_well_connections: pd.DataFrame = field_data.get_well_connections(
         config.flownet.perforation_handling_strategy
     )

diff --git a/src/flownet/config_parser/_config_parser.py b/src/flownet/config_parser/_config_parser.py
diff --git a/src/flownet/data/__init__.py b/src/flownet/data/__init__.py
@@ -1,3 +1,4 @@
 from ..data import from_source
 
 from ..data.from_flow import FromSource, FlowData
+from ..data.from_csv import CSVData
diff --git a/src/flownet/data/from_csv.py b/src/flownet/data/from_csv.py
@@ -0,0 +1,64 @@
+from pathlib import Path
+from typing import Union
+
+import pandas as pd
+
+
+class CSVData:
+    """
+    CSV data source class
+
+    Args:
+         input_data: Full path to CSV file to load production data from
+
+    """
+
+    def __init__(
+        self,
+        input_data: Union[Path, str],
+    ):
+        super().__init__()
+
+        self._input_data: Path = Path(input_data)
+
+    # pylint: disable=too-many-branches
+    def _production_data(self) -> pd.DataFrame:
+        """
+        Function to read production data for all producers and injectors from a CSV file.
+
+        Returns:
+            A DataFrame indexed by datetime.date objects with the following columns:
+                - date          equal to index
+                - WELL_NAME     Well name as used in Flow
+                - WOPR          Well Oil Production Rate
+                - WGPR          Well Gas Production Rate
+                - WWPR          Well Water Production Rate
+                - WOPT          Well Cumulative Oil Production
+                - WGPT          Well Cumulative Gas Production
+                - WWPT          Well Cumulative Water Production
+                - WBHP          Well Bottom Hole Pressure
+                - WTHP          Well Tubing Head Pressure
+                - WGIR          Well Gas Injection Rate
+                - WWIR          Well Water Injection Rate
+                - WSPR          Well Salt Production Rate
+                - WSIR          Well Salt Injection Rate
+                - WSPT          Well Cumulative Salt Production
+                - WSIT          Well Cumulative Salt Injection
+                - WSTAT         Well status (OPEN, SHUT, STOP)
+                - TYPE          Well Type: "OP", "GP", "WI", "GI"
+                - PHASE         Main producing/injecting phase fluid: "OIL", "GAS", "WATER"
+
+        Todo:
+            * Remove deprecation warning suppression when solved in LibEcl.
+            * Improve robustness of setting of Phase and Type.
+
+        """
+        df_production_data = pd.read_csv(self._input_data)
+        df_production_data["date"] = pd.to_datetime(df_production_data["date"]).dt.date
+        df_production_data = df_production_data.set_index("date", drop=False)
+        return df_production_data
+
+    @property
+    def production(self) -> pd.DataFrame:
+        """dataframe with all production data"""
+        return self._production_data()
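
To see what the three pandas calls in `_production_data` produce, the sketch below runs the same steps on an in-memory CSV. The file content is an assumption made for illustration (one row per well and date, with only a few of the columns listed in the docstring); the commit does not include a sample file.

```python
import datetime
from io import StringIO

import pandas as pd

# Assumed layout: one row per date and well, placeholder values.
CSV_TEXT = """date,WELL_NAME,WOPR,WWPR
2020-01-01,A-1,100.0,5.0
2020-02-01,A-1,95.0,7.5
2020-01-01,B-2,80.0,0.0
"""

# Same steps as in CSVData._production_data, reading from memory instead of disk.
df = pd.read_csv(StringIO(CSV_TEXT))
df["date"] = pd.to_datetime(df["date"]).dt.date
df = df.set_index("date", drop=False)

print(type(df.index[0]))
print(isinstance(df.index, pd.DatetimeIndex))
```

Because `.dt.date` converts the timestamps to plain `datetime.date` objects, the resulting index holds Python dates rather than being a `DatetimeIndex`. The `date` column is kept alongside the index because `drop=False` is passed.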
diff --git a/src/flownet/ert/_create_ert_setup.py b/src/flownet/ert/_create_ert_setup.py
@@ -112,7 +112,7 @@ def create_observation_file(
                     {
                         "dates": dates,
                         "schedule": schedule,
-                        "error_config": config.flownet.data_source.simulation.vectors,
+                        "error_config": config.flownet.data_source.vectors,
                         "num_beginning_date": setting[1],
                         "num_end_date": setting[2],
                         "last_training_date": dates[num_training_dates - 1],

diff --git a/tests/test_check_obsfiles_ert_yaml.py b/tests/test_check_obsfiles_ert_yaml.py
@@ -189,108 +189,103 @@ def test_check_obsfiles_ert_yaml() -> None:
     # pylint: disable=maybe-no-member
     config = collections.namedtuple("configuration", "flownet")
     config.flownet = collections.namedtuple("flownet", "data_source")
-    config.flownet.data_source = collections.namedtuple("data_source", "simulation")
-    config.flownet.data_source.simulation = collections.namedtuple(
-        "simulation", "vectors"
-    )
-    config.flownet.data_source.simulation.vectors = collections.namedtuple(
-        "vectors", "WTHP"
-    )
-    config.flownet.data_source.simulation.vectors.WOPR = collections.namedtuple(
+    config.flownet.data_source = collections.namedtuple("data_source", "vectors")
+    config.flownet.data_source.vectors = collections.namedtuple("vectors", "WTHP")
+    config.flownet.data_source.vectors.WOPR = collections.namedtuple(
         "WOPR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WOPR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WOPR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WOPR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WOPR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WGPR = collections.namedtuple(
+    config.flownet.data_source.vectors.WGPR = collections.namedtuple(
         "WGPR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WGPR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WGPR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WGPR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WGPR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WWPR = collections.namedtuple(
+    config.flownet.data_source.vectors.WWPR = collections.namedtuple(
         "WWPR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WWPR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WWPR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WWPR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WWPR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WOPT = collections.namedtuple(
+    config.flownet.data_source.vectors.WOPT = collections.namedtuple(
         "WOPT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WOPT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WOPT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WOPT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WOPT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WGPT = collections.namedtuple(
+    config.flownet.data_source.vectors.WGPT = collections.namedtuple(
         "WGPT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WGPT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WGPT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WGPT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WGPT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WWPT = collections.namedtuple(
+    config.flownet.data_source.vectors.WWPT = collections.namedtuple(
         "WWPT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WWPT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WWPT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WWPT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WWPT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WBHP = collections.namedtuple(
+    config.flownet.data_source.vectors.WBHP = collections.namedtuple(
         "WBHP", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WBHP.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WBHP.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WBHP.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WBHP.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WTHP = collections.namedtuple(
+    config.flownet.data_source.vectors.WTHP = collections.namedtuple(
         "WTHP", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WTHP.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WTHP.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WTHP.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WTHP.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WGIR = collections.namedtuple(
+    config.flownet.data_source.vectors.WGIR = collections.namedtuple(
         "WGIR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WGIR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WGIR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WGIR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WGIR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WWIR = collections.namedtuple(
+    config.flownet.data_source.vectors.WWIR = collections.namedtuple(
         "WWIR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WWIR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WWIR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WWIR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WWIR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WGIT = collections.namedtuple(
+    config.flownet.data_source.vectors.WGIT = collections.namedtuple(
         "WGIT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WGIT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WGIT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WGIT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WGIT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WWIT = collections.namedtuple(
+    config.flownet.data_source.vectors.WWIT = collections.namedtuple(
         "WWIT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WWIT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WWIT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WWIT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WWIT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WSPR = collections.namedtuple(
+    config.flownet.data_source.vectors.WSPR = collections.namedtuple(
         "WSPR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WSPR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WSPR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WSPR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WSPR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WSPT = collections.namedtuple(
+    config.flownet.data_source.vectors.WSPT = collections.namedtuple(
         "WSPT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WSPT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WSPT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WSPT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WSPT.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WSIR = collections.namedtuple(
+    config.flownet.data_source.vectors.WSIR = collections.namedtuple(
         "WSIR", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WSIR.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WSIR.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WSIR.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WSIR.rel_error = _REL_ERROR
 
-    config.flownet.data_source.simulation.vectors.WSIT = collections.namedtuple(
+    config.flownet.data_source.vectors.WSIT = collections.namedtuple(
         "WSIT", "min_error"
     )
-    config.flownet.data_source.simulation.vectors.WSIT.min_error = _MIN_ERROR
-    config.flownet.data_source.simulation.vectors.WSIT.rel_error = _REL_ERROR
+    config.flownet.data_source.vectors.WSIT.min_error = _MIN_ERROR
+    config.flownet.data_source.vectors.WSIT.rel_error = _REL_ERROR
 
     config.flownet.data_source.resampling = _RESAMPLING
 

diff --git a/tests/test_probability_distributions.py b/tests/test_probability_distributions.py
@@ -1,5 +1,6 @@
 from typing import List
 
+import numpy as np
 import pandas as pd
 
 
@@ -132,7 +133,7 @@
 
 DISTRIBUTION_DF = pd.DataFrame(DATA)
 # NaNs to None
-DISTRIBUTION_DF = DISTRIBUTION_DF.where(DISTRIBUTION_DF.notnull(), None)
+DISTRIBUTION_DF = DISTRIBUTION_DF.replace({np.nan: None})
 
 
 def test_probability_distributions() -> None:
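
The `replace({np.nan: None})` call introduced in this hunk can be tried on a small frame. The commit message does not state why the previous `where(..., None)` form started failing; one plausible cause, stated here as an assumption, is that newer pandas versions keep float columns as floats in `where`, so the NaN values survive. The column names and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder frame with NaN in two float columns.
df = pd.DataFrame(
    {
        "distribution": ["normal", "uniform"],
        "mean": [1.0, np.nan],
        "base": [np.nan, 2.0],
    }
)

# Dictionary form of replace: every NaN becomes None.
cleaned = df.replace({np.nan: None})

print(cleaned)
```

The dictionary form is used deliberately: passing `None` as a scalar replacement value to `replace` is ambiguous in some pandas versions, whereas the mapping form states the substitution explicitly.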