ZEN-garden with linopy #281

jafluri · 2023-05-15T13:05:39Z

jafluri
May 15, 2023

Overview

This post discusses the changes in ZEN-Garden that resulted from the transition from Pyomo to linopy. This decision was made, because linopy allows one to use vectorization to create constraints which can speed up the model creation and the handover to the solver, as well as decrease the overall memory usage. This guide is intended to cover all the basics about linopy, its underlying framework xarray, and things that should be considered when writing constraints or other functions in ZEN-Garden. We start with a broad overview and then explain the concepts of vectorization, followed by a deep dive into the "Art of Writing Constraints".

Main Changes

There are a couple of general things that changed with the transition, e.g. how objects are accessed, which are summarized here, more details follow in the later sections.

Sets

The sets did not change a lot in the usage. However, linopy does not have the concept of sets or indexed sets and a new class IndexSet was written. With this class, sets can be added and used in more or less the same way as before. The class has some additional useful member functions which will appear in later sections. The sets are now a member of the optimization_setup and use dictionary-like access, e.g.

sets = optimization_setup.sets
sets["set_transport_technologies"] # not indexed set
sets["set_reference_carriers"][tech] # indexed set

For backwards compatibility, they support the functions .is_indexed() and have a name attribute.

Note: linopy uses the coordinates of xarray objects as sets. The combination of ZEN-Garden sets and linopy coordinates would be a nice unification of the frameworks. However, it was not done initially because it would require some global and deep changes to the package.

Config parameters

There are a few new config parameters which are explained in the following list:

solver["solver_dir"] is a directory where temporary solver files will be saved. These can be the lp-files, as well as the solution files. It defaults to .//outputs//solver_files.
solver["keep_files"] is a boolean that specifies whether or not to keep the written temporary files. This defaults to False because the temporary files can become fairly large.
solver["io_api"] is an enum that defines how linopy communicates with the chosen solver. The possible values are "direct" (default), "lp" or "mps". With "direct", the problem is handed over to the API of the solver, which can slightly increase the handover time but usually decreases the solving time. Using "direct" and solver["name"] = "gurobi", is more or less the same as "gurobi_persistent" with pyomo. Using "lp" or "mps" will write the problem to a file in the given format and the solver will read to problem from this file. Note that while writing the files, linopy will create a warning for all unused variables in the problem.

Parameter

All parameters are now xarray DataArray objects that allow vectorized access. The original, dictionary-based parameters still exist within the parameter object. You can access the DataArray objects via normal attribute access and the dictionaries within the dict_parameters:

capacity_existing_xarray = optimization_setup.parameters.capacity_existing # The normal DataArray parameters
capacity_existing_dict = optimization_setup.parameters.dict_parameter.capacity_existing # The old dictionary parameter

We kept the old dictionary-based parameters because they have much faster random access times (see later sections). Additionally, the parameter object has a new member function add_helper_parameter, which allows one to add a helper parameter. These are precalculated parameters that help with the constraints, parameters added this way are not saved in the results.

Variables

Variables can be added to the model in the same way as before, using the variables.add_variable function. However, the variables are accessed in a different way to pyomo, i.e. via dictionary access instead of attribute access:

model.flow_storage_discharge # old way of accessing variables from the model
model.variables["flow_storage_discharge"] # new way via dictionary access

Additionally, linopy always creates variables for all possible combinations of the index set (i.e. the cartesian product). However, it will mask all variables for which no index was supplied in the add_variable function. This means that those variables can still be accessed (e.g. for vectorization) but have no effect (dummy variables).

Constraints

The biggest changes in the framework affect the constraints. The basic, high-level functions did not change a lot though. One can still add constraints with the constraints.add_constraint_rule function. These constraints are usually very easy to read. Additionally, one can add vectorized constraints with the constraints.add_constraint_block function. This function adds many constraints at the same time, which will be explained later.

Piecewise linear constraints have to be added with the constraints.add_pw_constraint constraints that add them in a SOS2 manner.

Disjunctive constraints are a bit more tricky because linopy does not support them natively yet. Therefore, one has to create the disjunction variable manually and hand them over to the function that creates the constraints. This is implemented as an additional argument inside the disjunctive constraints in the technologies.

linopy

This section explains the basics of linopy, its underlying framework xarray and what to consider when using them.

Basic concepts

Similar to pyomo, linopy is an open-source Python package that facilitates linear or mixed-integer optimisation. However, compared to pyomo, linopy is a much more simplified framework. The main differences are:

Linopy only allows linear expressions and not arbitrary expressions. The term "linear" is taken from the mathematical definition and is very literal, meaning not even "affine" expressions, containing constants, are supported.
Linopy is based on xarray and supports vectorized operations.
Linopy is fast when vectorization is used and slow for random access, pyomo is optimized for random access.
Due to vectorization, constraints are usually less readable compared to constraints written with pyomo.

We will now proceed with a quick overview of the underlying framework and then highlight the slight but important differences between pure xarray and linopy.

xarray

In linopy, all objects, i.e. variables, linear expressions and constraints are represented as xarray DataArray or DataSet objects, where DataSets are essentially just a collection of various DataArray objects. Here we will give a quick overview of the basics of xarray and then proceed to its usage in linopy.

In short, xarray is a combination of pandas and numpy. It creates multidimensional arrays that can be accessed via integer indices (numpy) or coordinate-based (pandas). We can create a pure DataArray by explicitly specifying, data, coordinates and dimension names

In [1]: import xarray as xr
   ...:
   ...: x = xr.DataArray(data=1.0, coords=[["a1", "a2", "a3"], ["b1", "b2", "b3"]], dims=["A", "B"])
   ...: x
Out[1]:
<xarray.DataArray (A: 3, B: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * A        (A) <U2 'a1' 'a2' 'a3'
  * B        (B) <U2 'b1' 'b2' 'b3'

Here we created a DataArray that contains only ones (the data attribute can be a constant or a numpy array), it has two coordinates with the dimension names A and B. We can access the data either index based or coordinated based

x.loc["a1", "b2"] # coordinate-based access
x[0, 0] # integer index based access

for coordinate-based access, one has to use the .loc property. It is important to note, that xarray will always return a DataArray object, even if you just access a single element. If you want a single value you need to use the item() function

In [7]: x.loc["a1", "b2"] # returns a DataArray with a single element
Out[7]:
<xarray.DataArray ()>
array(1.)
Coordinates:
    A        <U2 'a1'
    B        <U2 'b2'

In [8]: x.loc["a1", "b2"].item() # returns a float
Out[8]: 1.0

Multiple values across a single dimension can be accessed normally

x[:, 0] # index-based access of all elements in the A dimension
x.loc["a1", :] # coordinate-based access of all elements in the B dimension
x.loc["a1", ["b2", "b3"]] # coordinate-based access of a subset of elements in the B dimension

The first big difference with numpy is how multiple elements are accessed. Compared to numpy, xarray uses orthogonal vectorization, meaning when multiple elements are accessed across multiple dimensions, all combinations are returned

In [13]: x.loc[["a1", "a3"], ["b2", "b3"]]
Out[13]:
<xarray.DataArray (A: 2, B: 2)>
array([[1., 1.],
       [1., 1.]])
Coordinates:
  * A        (A) <U2 'a1' 'a3'
  * B        (B) <U2 'b2' 'b3'

Here we get all combinations of the coordinates, meaning a two by two array instead of an array with two elements which would be the expected result from numpy. The reason for this is the coordinates and dimensions. If we want a subset across multiple dimensions, we need to supply a new dimension name. In xarray, this is still possible by supplying DataArray objects instead of lists

In [14]: x.loc[xr.DataArray(["a1", "a3"], dims="C"), xr.DataArray(["b2", "b3"], dims="C")]
Out[14]:
<xarray.DataArray (C: 2)>
array([1., 1.])
Coordinates:
    A        (C) <U2 'a1' 'a3'
    B        (C) <U2 'b2' 'b3'
Dimensions without coordinates: C

By supplying an explicit new dimension with the DataArray objects that access the coordinates, we get the desired subset which has now a new dimension name.

Broadcasting

Broadcasting in xarray is also based on the dimensions and the coordinates. Per default, only matching dimensions and coordinates are broadcasted. Let's create some more DataArray objects

 y = xr.DataArray(data=2.0, coords=[["c1", "c2"]], dims="C")
x1 = xr.DataArray(data=3.0, coords=[["a1", "a3"], ["b2", "b3"]], dims=["A", "B"])

Here, y has a new dimension and x1 has the same dimensions as x, but only a subset of the coordinates. If we add x and y the new object will have ALL dimensions and coordinates of the two

In [24]: x + y
Out[24]:
<xarray.DataArray (A: 3, B: 3, C: 2)>
array([[[3., 3.],
        [3., 3.],
        [3., 3.]],

       [[3., 3.],
        [3., 3.],
        [3., 3.]],

       [[3., 3.],
        [3., 3.],
        [3., 3.]]])
Coordinates:
  * A        (A) <U2 'a1' 'a2' 'a3'
  * B        (B) <U2 'b1' 'b2' 'b3'
  * C        (C) <U2 'c1' 'c2'

whereas if we add x1 and x only the matching subset will be summed

In [25]: x + x1
Out[25]:
<xarray.DataArray (A: 2, B: 2)>
array([[4., 4.],
       [4., 4.]])
Coordinates:
  * A        (A) <U2 'a1' 'a3'
  * B        (B) <U2 'b2' 'b3'

Variables and linear expressions in linopy

In linopy, variables are based on DataArray objects and linear expressions and constraints on DataSets. That means a lot of the access and broadcasting rules apply. Note that a lot more examples are provided in the later section about constraints, which might make things clearer.

Access

Let's have a look at a common variable of a ZEN-Garden optimization problem:

In [4]: model.variables["cost_opex"]
Out[4]: 
Variable (set_technologies: 3, set_location: 4, set_time_steps_operation: 1) - 6 masked entries
-----------------------------------------------------------------------------------------------
[natural_gas_boiler, CH, 0]:      cost_opex[natural_gas_boiler, CH, 0]      ∈ [0.0, inf]
[natural_gas_boiler, CH-DE, 0]:   None                                      
[natural_gas_boiler, DE, 0]:      cost_opex[natural_gas_boiler, DE, 0]      ∈ [0.0, inf]
[natural_gas_boiler, DE-CH, 0]:   None                                      
[natural_gas_pipeline, CH, 0]:    None                                      
[natural_gas_pipeline, CH-DE, 0]: cost_opex[natural_gas_pipeline, CH-DE, 0] ∈ [0.0, inf]
[natural_gas_pipeline, DE, 0]:    None                                      
[natural_gas_pipeline, DE-CH, 0]: cost_opex[natural_gas_pipeline, DE-CH, 0] ∈ [0.0, inf]
[natural_gas_storage, CH, 0]:     cost_opex[natural_gas_storage, CH, 0]     ∈ [0.0, inf]
[natural_gas_storage, CH-DE, 0]:  None                                      
[natural_gas_storage, DE, 0]:     cost_opex[natural_gas_storage, DE, 0]     ∈ [0.0, inf]
[natural_gas_storage, DE-CH, 0]:  None

Here "cost_opex" has three dimensions and some coordinate combinations are masked (None). In contrast to pure xarray, linopy does not allow integer index-based access to individual variables, but only coordinate-based access, which is similar to pyomo

In [7]: model.variables["cost_opex"]["natural_gas_boiler", "CH", 0]
Out[7]: ScalarVariable: cost_opex[natural_gas_boiler, CH, 0]

Note that in contrast to pure xarray, we did not need to use the item() function. Using the .loc property, we can access a subset of the variable at once, note the orthogonal access

In [9]: model.variables["cost_opex"].loc[:, ["CH", "DE"], 0]
Out[9]: 
Variable (set_technologies: 3, set_location: 2) - 2 masked entries
------------------------------------------------------------------
[natural_gas_boiler, CH]:   cost_opex[natural_gas_boiler, CH, 0]  ∈ [0.0, inf]
[natural_gas_boiler, DE]:   cost_opex[natural_gas_boiler, DE, 0]  ∈ [0.0, inf]
[natural_gas_pipeline, CH]: None                                  
[natural_gas_pipeline, DE]: None                                  
[natural_gas_storage, CH]:  cost_opex[natural_gas_storage, CH, 0] ∈ [0.0, inf]
[natural_gas_storage, DE]:  cost_opex[natural_gas_storage, DE, 0] ∈ [0.0, inf]

An important difference to pure xarray is that we can not select a subset of variables with DataArray objects and an explicit new dimension name!

Broadcasting

Let's look at another variable

In [14]: model.variables["flow_import"]
Out[14]: 
Variable (set_carriers: 2, set_nodes: 2, set_time_steps_operation: 1)
---------------------------------------------------------------------
[heat, CH, 0]:        flow_import[heat, CH, 0]        ∈ [0.0, inf]
[heat, DE, 0]:        flow_import[heat, DE, 0]        ∈ [0.0, inf]
[natural_gas, CH, 0]: flow_import[natural_gas, CH, 0] ∈ [0.0, inf]
[natural_gas, DE, 0]: flow_import[natural_gas, DE, 0] ∈ [0.0, inf]

Here "flow_import" has also three dimensions. However, only one dimension has the same name (set_time_steps_operation) as "cost_opex" from before, so if we add these two variables, we get a linear expression with a total of 5 dimensions

In [15]: model.variables["cost_opex"] + model.variables["flow_import"]
Out[15]: 
LinearExpression (set_time_steps_operation: 1, set_technologies: 3, set_location: 4, set_carriers: 2, set_nodes: 2):

which is usually not what we want. To avoid this type of broadcasting, one can rename the dimension names, e.g.

In [16]: model.variables["cost_opex"] + model.variables["flow_import"].rename({"set_nodes": "set_location"})
Out[16]: 
LinearExpression (set_time_steps_operation: 1, set_technologies: 3, set_location: 4, set_carriers: 2):

Unlike pure xarray, if you add two variables where one variable contains the same dimension name, but only a subset of coordinates (here we rename "set_nodes" to "set_location" and "set_nodes" is a subset of "set_location"), the final linear expression has the full shape, however, only matching coordinates are added and dummy variables are used otherwise.

Finally, linear expressions can not be indexed, but it is possible to add more variables to them, e.g.

lin_expr = model.variables["cost_opex"] + model.variables["flow_import"].rename({"set_nodes": "set_location"})
lin_expr += model.variables["cost_opex"] # works as expected
lin_expr[:] # not possible

The reason for this is how linopy stores the linear expressions (as DataSet objects), which do not allow indexing. Note that it is also impossible to add constants to linear expressions.

The Art of Writing Constraints

Before we can go into the details of writing constraints, we go through the different ways of accessing data, which will make the necessity of vectorization in linopy apparent. Given a dictionary, pandas DataFrame and an xarray DataArray (or linopy variable), accessing a single, random element from the dictionary is by far the fastest. Accessing a single, random element from the pandas DataFrame is approximately 100 times slower and a xarray DataArray is even 1000 slower than a dictionary. Therefore, if we would use linopy in the same way as pyomo, i.e. using only single element access, we would make the package over 1000 times slower. However, linopy can outperform pyomo, when it accesses many elements at the same time. The more elements you can vectorize over, the better! Because of this, we decided to keep the parameters as both, dictionary and xarray DataArray. This makes it possible to use the dictionaries in functions that rely on single-element access and the DataArray objects for vectorization. In the following, we will go through many examples of constraints in ZEN-Garden and potential choices.

General rules

As mentioned before, whenever possible, you should use vectorization to speed up the model generation. However, there are sometimes cases where vectorization would only bring a minor speed improvement and it is valid to keep the slower, more readable indexed version.

Besides vectorizations, there is one more important difference to pyomo. In linopy, linear expressions do not contain constants. For constraints, this means that all constants have to go to the right-hand side of the constraints and all variables to the left-hand side.

In all of the following examples, if the name of the linopy constraint starts with "get_" then it produces a single block constraint that is added to the model with the add_constraint_block function, otherwise, it is a rule like the pyomo constraints.

Simple Rule-Based Constraints

Some constraints have very little potential for vectorization. Let's have a look at the following constraints in pyomo

def constraint_carbon_emissions_limit_rule(self, model, year):
    """ time-dependent carbon emissions limit from technologies and carriers"""
    # get parameter object
    params = self.optimization_setup.parameters
    if params.carbon_emissions_limit[year] != np.inf:
        return (params.carbon_emissions_limit[year] >= model.carbon_emissions_total[year])
    else:
        return pe.Constraint.Skip

This rule iterates only over the years, of which there are usually less than 100. Additionally, there is an if-statement in the constraint which makes vectorization a bit harder (see below). Therefore, we can keep the readability and don't vectorize without a big penalty in runtime. In linopy, the same constraint looks like this:

def constraint_carbon_emissions_limit_rule(self, year):
    """ time-dependent carbon emissions limit from technologies and carriers"""
    # get parameter object
    model = self.optimization_setup.model
    params = self.optimization_setup.parameters
    if params.carbon_emissions_limit.loc[year] != np.inf:
        return (model.variables["carbon_emissions_total"][year]
                <= params.carbon_emissions_limit.loc[year].item())
    else:
        return None

The only differences are:

The linopy constraints do not take the model as an argument.
Note the .item() at the parameter to get a single value instead of a DataArray object
Instead of a pe.Constraint.Skip, we just return None to skip the constraint.

Vectorization without Control flow

Constraints which are the same for all indices can be vectorized quite easily. Let's look at this constraint in pyomo:

def constraint_carbon_emissions_total_rule(self, model, year):
    """ add up all carbon emissions from technologies and carriers """
    return (model.carbon_emissions_total[year] ==
            # technologies
            model.carbon_emissions_technology_total[year] +
            # carriers
            model.carbon_emissions_carrier_total[year])

This constraint is the same for all indices, we can therefore vectorize it simply by

def get_constraint_carbon_emissions_total(self):
    """ add up all carbon emissions from technologies and carriers """
    model = self.optimization_setup.model

    return (model.variables["carbon_emissions_total"]
            - model.variables["carbon_emissions_technology_total"]
            - model.variables["carbon_emissions_carrier_total"]
            == 0)

Sums in Constraints

There are three ways to sum expressions and variables in linopy:

.sum(dims) sums the variable or linear expression along a subset or all dimensions
lp.merge(*terms, dim) The built-in merge function. When dim is not specified, it sums all terms element-wise, if dim is specified, it merges the terms along a new dimension.
zen_garden.utils.lp_sum(list) sums a list of expressions, if the list is empty it returns 0, and it sums the expressions element-wise.

Whenever possible, one should use one of these functions, because summing terms in a loop is very inefficient. Let's have a look at the following constraint in pyomo

def constraint_cost_carrier_total_rule(self, model, year):
    """ total cost of importing and exporting carrier"""
    # get parameter object
    params = self.optimization_setup.parameters
    return (model.cost_carrier_total[year] == sum(sum(
        (model.cost_carrier[carrier, node, time] + model.cost_shed_demand[carrier, node, time]) * params.time_steps_operation_duration[carrier, time] for time in
        self.energy_system.time_steps.get_time_steps_year2operation(carrier, year)) for carrier, node in Element.create_custom_set(["set_carriers", "set_nodes"], self.optimization_setup)[0]))

Here we have two sums, one over time and one over carrier and node. We rewrite the constraints in the following way

def constraint_carbon_emissions_carrier_total_rule(self, year):
    """ total carbon emissions of importing and exporting carrier"""
    # get parameter object
    params = self.optimization_setup.parameters
    sets = self.optimization_setup.sets
    model = self.optimization_setup.model

    # calculate the sums via loop
    terms = []
    for carrier in sets["set_carriers"]:
        time_arr = xr.DataArray(self.energy_system.time_steps.get_time_steps_year2operation(carrier, year),
                                dims=["set_time_steps_operation"])
        expr = model.variables["carbon_emissions_carrier"].loc[carrier, :, time_arr] * params.time_steps_operation_duration.loc[carrier, time_arr]
        terms.append(expr.sum())

    return (model.variables["carbon_emissions_carrier_total"].loc[year]
            - lp_sum(terms)
            == 0)

In the first loop, we cycle over all carriers. The sum over time and nodes is done in expr.sum(), since there are no dims specified, it sums over all elements. We now have a list of terms, one for each carrier, which we can sum with the lp_sum function.

Vectorization with Control flow

Omittable Control flow

For some constraints, it is possible to just omit the control flow, e.g.

def constraint_availability_import_rule(self, model, carrier, node, time):
    """node- and time-dependent carrier availability to import from outside the system boundaries"""
    # get parameter object
    params = self.optimization_setup.parameters
    if params.availability_import[carrier, node, time] != np.inf:
        return (model.flow_import[carrier, node, time] <= params.availability_import[carrier, node, time])
    else:
        return pe.Constraint.Skip

Here, ignoring the condition will just set the upper bound to infinity, which is fine for linopy and results in a constraint that is always satisfied. We can rewrite it like this

def get_constraint_availability_import(self):
    """node- and time-dependent carrier availability to import from outside the system boundaries"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model

    return (model.variables["flow_import"]
            <= params.availability_import)

Simple Control flow

Let's have a look at a constraint that has a single if-statement in pyomo

def constraint_cost_carrier_rule(self, model, carrier, node, time):
    """ cost of importing and exporting carrier"""
    # get parameter object
    params = self.optimization_setup.parameters
    if params.availability_import[carrier, node, time] != 0 or params.availability_export[carrier, node, time] != 0:
        return (model.cost_carrier[carrier, node, time] ==
                params.price_import[carrier, node, time] * model.flow_import[carrier, node, time]
                - params.price_export[carrier, node, time] * model.flow_export[carrier, node, time])
    else:
        return (model.cost_carrier[carrier, node, time] == 0)

To vectorize this constraint, one can use the .where(mask) function. This function masks parameters and variables where the mask is False, for example

def get_constraint_cost_carrier(self):
    """ cost of importing and exporting carrier"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model

    # normal tuple constraints
    mask = ((params.availability_import != 0) | (params.availability_export != 0))
    mask = xr.align(model.variables.labels, mask)[1]
    tuples = [(1.0, model.variables["cost_carrier"]),
              (-params.price_import.where(mask), model.variables["flow_import"].where(mask)),
              (params.price_export.where(mask), model.variables["flow_export"].where(mask))]
    return model.linexpr(*tuples) == 0

We build the mask for all parameters at the same time and then create a list of tuples (coefficient, variable), where all coefficients and variables are masked according to the control flow. The list of tuples is then transformed to a linear expression via model.linexpr(*tuples). Note that it is also possible to mask linear expressions themselves, which was not done here since we only need to mask a subset of the expression (model.variables["cost_carrier"]) is unmasked).

More advanced Control flow

Let's have a look at these nested conditions in pyomo

def constraint_technology_capacity_limit_rule(self, model, tech, capacity_type, loc, time):
    """limited capacity_limit of technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    if params.capacity_limit[tech, capacity_type, loc] != np.inf:
        capacities_existing = Technology.get_available_existing_quantity(self.optimization_setup, tech, capacity_type, loc, time, type_existing_quantity="capacity")
        if capacities_existing < params.capacity_limit[tech, capacity_type, loc]:
            return (params.capacity_limit[tech, capacity_type, loc] >= model.capacity[tech, capacity_type, loc, time])
        else:
            return (model.capacity_addition[tech, capacity_type, loc, time] == 0)
    else:
        return pe.Constraint.Skip

Here we have two nested if-statements, which we can vectorize for example as follows

def get_constraint_technology_capacity_limit(self):
    """limited capacity_limit of technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model

    # create the masks for the two cases
    m1 = (params.capacity_limit != np.inf) & (params.existing_capacities < params.capacity_limit)
    m2 = (params.capacity_limit != np.inf) & ~(params.existing_capacities < params.capacity_limit)

    # the lhs, rhs and sign depend on the case
    lhs = model.variables["capacity"].where(m1) + model.variables["capacity_addition"].where(m2)
    rhs = params.capacity_limit.where(m1, 0.0)
    sign = xr.DataArray("<=", coords=model.variables["capacity"].coords).where(m1, "==")
    return AnonymousConstraint(lhs, sign, rhs)

We create two masks, one for each nesting, and conditionally create the left-hand side, right-hand side and the sign of the constraint using the where function. Here params.existing_capacities is a helper parameter that was precalculated to avoid the function call in the constraint. For parameters, we can use the argument in the where function to replace the parameter wherever the mask is False with another value. Note that we use the coordinates from the model variables to make sure everything has the right dimensions. Then we put everything together with AnonymousConstraint(lhs, sign, rhs).

Loops, combining and reordering

Sometimes it is just impossible or at least impractical to reduce the constraints to a single block and loops are needed. For example, in this pyomo constraint

def constraint_technology_construction_time_rule(self, model, tech, capacity_type, loc, time):
    """ construction time of technology, i.e., time that passes between investment and availability"""
    # get parameter object
    params = self.optimization_setup.parameters
    start_time_step, _ = Technology.get_start_end_time_of_period(self.optimization_setup, tech, time, period_type="construction_time", clip_to_first_time_step=False)
    if start_time_step in model.set_time_steps_yearly:
        return (model.capacity_addition[tech, capacity_type, loc, time] == model.capacity_investment[tech, capacity_type, loc, start_time_step])
    elif start_time_step in model.set_time_steps_yearly_entire_horizon:
        return (model.capacity_addition[tech, capacity_type, loc, time] == params.capacity_investment_existing[tech, capacity_type, loc, start_time_step])
    else:
        return (model.capacity_addition[tech, capacity_type, loc, time] == 0)

we have a call to get_start_end_time_of_period which depends on the technology and the time followed by conditions that depend on the result. While it would be possible to get rid of all loops, it is fairly impractical in this case. So we rewrite it in the following way:

def get_constraint_technology_construction_time(self):
    """ construction time of technology, i.e., time that passes between investment and availability"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model
    sets = self.optimization_setup.sets

    # get all the constraints
    index_values, index_names = Element.create_custom_set(["set_technologies", "set_capacity_types", "set_location", "set_time_steps_yearly"], self.optimization_setup)
    index = ZenIndex(index_values, index_names)
    constraints = []
    for tech, time in index.get_unique([0, 3]):
        start_time_step, _ = Technology.get_start_end_time_of_period(self.optimization_setup, tech, time, period_type="construction_time", clip_to_first_time_step=False)
        if start_time_step in sets["set_time_steps_yearly"]:
            constraints.append(model.variables["capacity_addition"].loc[tech, :, :, time]
                               - model.variables["capacity_investment"].loc[tech, :, :, start_time_step]
                               == 0)
        elif start_time_step in sets["set_time_steps_yearly_entire_horizon"]:
            constraints.append(model.variables["capacity_addition"].loc[tech, :, :, time]
                               == params.existing_invested_capacity.loc[tech, :, :, start_time_step])
        else:
            constraints.append(model.variables["capacity_addition"].loc[tech, :, :, time]
                               == 0)

    return self.optimization_setup.constraints.reorder_list(constraints, index.get_unique([0, 3]), [index_names[0], index_names[3]], model), model.variables["capacity_addition"].mask

This constraint introduces a couple of novel concepts, let's go through them one by one. The first one is the ZenIndex, which is a helper class that makes looping easier. Its main functions are get_unique and get_values. Here we make use of the get_unique function which returns all unique combinations of the index given a list of levels. Here the index was built using ["set_technologies", "set_capacity_types", "set_location", "set_time_steps_yearly"], so by specifying levels 0 and 3, we loop over all unique combinations of technologies and yearly timesteps. For each cycle of the loop, we generate a block constraint for all capacity types and locations (levels 1 and 2 of the index), depending on the condition. This gives us a list of constraints, which we could return, as add_constraint_block accepts lists of constraints. However, the dual variables have the same dimensions as the constraints, so it would be nice if the constraints have proper dimensions wherever possible. For this we can use the optimization_setup.constraints.reorder_list function, which takes a list of constraints and reorders them, giving the constraints extra dimensions. In this case, we add the dimensions for the loop variables back to the constraints. Finally, you might notice that this function actually returns two objects, the reordered constraints and a mask. This is possible as add_constraint_block accepts masks for constraints as well. Here we use it to make sure that the constraint is masked properly, i.e. entries, where there should be no constraint, are dummies. Note that it would not be necessary here, however, it is good practice after the reordering, because it might not be clear what has been reordered exactly.

Another way to combine constraints would be to combine the list into a single constraint by just adding a new dimension (similar to np.stack). This can be useful if the constraint is masked almost everywhere and just stacking the constraint saves a lot of memory compared to filling it with dummies. This is done in the constraint_technology_diffusion_limit and uses the optimization_setup.constraints.combine_constraints function.

Maintaining shapes

Let's have a look at the following constraint in pyomo

def constraint_carbon_emissions_technology_rule(self, model, tech, loc, time):
    """ calculate carbon emissions of each technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    reference_carrier = model.set_reference_carriers[tech].at(1)
    if tech in model.set_conversion_technologies:
        if reference_carrier in model.set_input_carriers[tech]:
            reference_flow = model.flow_conversion_input[tech, reference_carrier, loc, time]
        else:
            reference_flow = model.flow_conversion_output[tech, reference_carrier, loc, time]
    elif tech in model.set_transport_technologies:
        reference_flow = model.flow_transport[tech, loc, time]
    else:
        reference_flow = model.flow_storage_charge[tech, loc, time] + model.flow_storage_discharge[tech, loc, time]
    return (model.carbon_emissions_technology[tech, loc, time] == params.carbon_intensity_technology[tech, loc] * reference_flow)

There are a couple of conditions, that all depend on the value of tech. So when we rewrite the constraint in linopy, it would be best to keep the loop over the technologies

def get_constraint_carbon_emissions_technology(self):
    """ calculate carbon emissions of each technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model
    sets = self.optimization_setup.sets

    # get all constraints
    constraints = []
    index_values, index_names = Element.create_custom_set(["set_technologies", "set_location", "set_time_steps_operation"], self.optimization_setup)
    index = ZenIndex(index_values, index_names)
    for tech in index.get_unique(["set_technologies"]):
        locs = index.get_values([tech], 1, unique=True)
        reference_carrier = sets["set_reference_carriers"][tech][0]
        if tech in sets["set_conversion_technologies"]:
            if reference_carrier in sets["set_input_carriers"][tech]:
                reference_flow = model.variables["flow_conversion_input"].loc[tech, reference_carrier, locs].to_linexpr()
                reference_flow = reference_flow.rename({"set_nodes": "set_location"})
            else:
                reference_flow = model.variables["flow_conversion_output"].loc[tech, reference_carrier, locs].to_linexpr()
                reference_flow = reference_flow.rename({"set_nodes": "set_location"})
        elif tech in sets["set_transport_technologies"]:
            reference_flow = model.variables["flow_transport"].loc[tech, locs].to_linexpr()
            reference_flow = reference_flow.rename({"set_edges": "set_location"})
        else:
            reference_flow = model.variables["flow_storage_charge"].loc[tech, locs] + model.variables["flow_storage_discharge"].loc[tech, locs]
            reference_flow = reference_flow.rename({"set_nodes": "set_location"})
        # merge everything, the first is just to ensure full shape
        constraints.append(lp.merge(model.variables["carbon_emissions_technology"].loc[tech].where(False).to_linexpr(),
                                    model.variables["carbon_emissions_technology"].loc[tech, locs].to_linexpr(),
                                    - params.carbon_intensity_technology.loc[tech, locs] * reference_flow,
                                    compat="broadcast_equals")
                           == 0)

    return self.optimization_setup.constraints.reorder_list(constraints, index.get_unique([0]), index_names[:1], model)

The constraint follows a similar logic as before, we loop over a subset of the index and vectorize over the rest, and finally, we reorder the list of constraints to have a nice shape. Now, there are some additional things that we need to take care of here. Depending on the technology, the set of locations can be different (e.g. nodes or edges). We can get the relevant subset by calling index.get_values([tech], 1, unique=True), this retrieves all values from level 1 of the index (set_location) given concrete values of the previous levels (the value of tech for the level 0 which is set_technologies) and by specifying unique=True, we only get the unique values. Then, depending on the condition, the reference_flow will have different dimensions from the dimensions of the index (and the final constraint). Therefore, we rename the dimension mismatch by, e.g. renaming the dimension of the nodes (set_nodes) to the dimension of the locations (set_location). Finally, we combine all terms. Here we need to consider, that depending on the technology, the constraint we append to the list might have a different form (e.g. set_nodes or set_edges). To ensure that all constraints we add to the list have the same shape, meaning containing everything in the set_location, we add the term model.variables["carbon_emissions_technology"].loc[tech].where(False).to_linexpr() which contains just dummy variables, but because of the broadcasting rules for linopy (see above) it will cause the final constraint to have the same shape as "carbon_emissions_technology".

Combining terms without explicit renaming

Consider the following constraint in pyomo

def constraint_storage_technology_capex_rule(self, model, tech, capacity_type, node, time):
    """ definition of the capital expenditures for the storage technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    return (model.cost_capex[tech, capacity_type, node, time] == model.capacity_addition[tech, capacity_type, node, time] * params.capex_specific_storage[tech, capacity_type, node, time])

Looks almost like one from the beginning of the section and should be quite easy. Let's have a look at the rewritten version

def get_constraint_storage_technology_capex(self):
    """ definition of the capital expenditures for the storage technology"""
    # get parameter object
    params = self.optimization_setup.parameters
    model = self.optimization_setup.model

    # get all the constraints
    index_values, index_names = Element.create_custom_set(["set_storage_technologies", "set_capacity_types", "set_nodes", "set_time_steps_yearly"], self.optimization_setup)
    if len(index_values) == 0:
        return []

    techs, capacity_types, nodes, times = IndexSet.tuple_to_arr(index_values, index_names, unique=True)
    coords = [model.variables.coords["set_storage_technologies"], model.variables.coords["set_capacity_types"], model.variables.coords["set_nodes"], model.variables.coords["set_time_steps_yearly"]]
    tuples = [(1.0, model.variables["cost_capex"].loc[techs, capacity_types, nodes, times]),
              (-params.capex_specific_storage.loc[techs, capacity_types, nodes, times], model.variables["capacity_addition"].loc[techs, capacity_types, nodes, times])]

    return linexpr_from_tuple_np(tuples, coords, model) == 0

This translation seems a bit longer than necessary. The first part is just catching errors. The index list might be empty, then we need to return an empty list or a dummy constraint to make sure it is dealt with correctly. Returning an empty list will ensure that no constraint is added. Afterwards, it is tempting to just combine all terms as blocks. However, this would not give you the expected result, since not all dimensions have the same names and broadcasting rules would apply. Instead of manually renaming the dimension we choose another approach here. Using IndexSet.tuple_to_arr(index_values, index_names, unique=True) we get all the unique values from all the levels of the index separately. Then we construct tuples with pairs of coefficients and variables using only the relevant entries. Finally, we combine them using the function linexpr_from_tuple_np. This function is similar to the model.linexpr from before, it transforms the tuples into a linear expression. However, it uses numpy-style broadcasting and does not care about the dimension names. The final coordinates have to be supplied separately to get a proper linear expression.

Going Low

Sometimes a constraint is very complex and it is not possible to have a version that is vectorized and fast in linopy. In those extreme cases, it is possible to resort to low-level functions. In linopy, linear expressions are xarray DataSet objects with two DataArray objects, the coeffs and the vars, the first containing the coefficients of the linear expression and the latter the variables, whereby variables are represented as unique integers (labels). These two DataArray objects have the same coordinates as the linear expression, however, they contain an additional dimension without coordinates, the _term dimension. This dimension represents the different terms of the linear expression which are summed over. For example

In [31]: coeffs
Out[31]:
<xarray.DataArray (set_nodes: 2, _term: 3)>
array([[ 0.5,  0.1,  0.2],
       [ 1.5,  0.1, np.nan]])
Coordinates:
  * set_nodes  (set_nodes) <U2 'CH' 'DE'
Dimensions without coordinates: _term

In [32]: vars
Out[32]:
<xarray.DataArray (set_nodes: 2, _term: 3)>
array([[ 254,  111,  231],
       [ 9,  32, -1]])
Coordinates:
  * set_nodes  (set_nodes) <U2 'CH' 'DE'
Dimensions without coordinates: _term

can represent the DataArray objects of a linear expression. Here, the linear expression has the coordinates "set_nodes", meaning we have a total of two expressions and each expression has at most 3 terms (size of the _term dimension). This would translate into the following two expressions for the solver

0.5 * x254 + 0.1 * x111 + 0.2 * x231
1.5 * x9 + 0.1 * x32

note that a term having NaN has a coefficient and -1 as a variable label is skipped. This can be used to make very fast constraints. However, the resulting constraints are almost completely unreadable, which is why they should only be used in extreme cases. In ZEN-Garden it is only used in the nodal-balance constraint. Let's have a look at a part of this constraint in pyomo:

def constraint_nodal_energy_balance_rule(self, model, carrier, node, time):
    """
    nodal energy balance for each time step.
    """
    
    ....

    # carrier flow transport technologies
    flow_transport_in, flow_transport_out = 0, 0
    set_edges_in = self.energy_system.calculate_connected_edges(node, "in")
    set_edges_out = self.energy_system.calculate_connected_edges(node, "out")
    for tech in model.set_transport_technologies:
        if carrier in model.set_reference_carriers[tech]:
            flow_transport_in += sum(model.flow_transport[tech, edge, time] - model.flow_transport_loss[tech, edge, time] for edge in set_edges_in)
            flow_transport_out += sum(model.flow_transport[tech, edge, time] for edge in set_edges_out)

where we skipped some lines that are not relevant to the example. Note that the rule cycles over carrier, node and time. Additionally, inside the constraint, we loop over technology and edges. This nested loop structure is very difficult to translate into linopy, which is why we adopted the low-level approach. The translation into linopy looks like this

# carrier flow transport technologies
if model.variables["flow_transport"].size > 0:
    # recalculate all the edges
    edges_in = {node: self.energy_system.calculate_connected_edges(node, "in") for node in sets["set_nodes"]}
    edges_out = {node: self.energy_system.calculate_connected_edges(node, "out") for node in sets["set_nodes"]}
    max_edges = max([len(edges_in[node]) for node in sets["set_nodes"]] + [len(edges_out[node]) for node in
                                                                           sets["set_nodes"]])

    # create the variables
    flow_transport_in_vars = xr.DataArray(-1, coords=[model.variables.coords["set_carriers"],
                                                      model.variables.coords["set_nodes"],
                                                      model.variables.coords["set_time_steps_operation"],
                                                      xr.DataArray(np.arange(
                                                          len(sets["set_transport_technologies"]) * (
                                                                      2 * max_edges + 1)), dims=["_term"])])
    flow_transport_in_coeffs = xr.full_like(flow_transport_in_vars, np.nan, dtype=float)
    flow_transport_out_vars = flow_transport_in_vars.copy()
    flow_transport_out_coeffs = xr.full_like(flow_transport_in_vars, np.nan, dtype=float)
    for carrier, node in index.get_unique([0, 1]):
        techs = [tech for tech in sets["set_transport_technologies"] if
                 carrier in sets["set_reference_carriers"][tech]]
        edges_in = self.energy_system.calculate_connected_edges(node, "in")
        edges_out = self.energy_system.calculate_connected_edges(node, "out")

        # get the variables for the in flow
        in_vars_plus = model.variables["flow_transport"].labels.loc[techs, edges_in, :].data
        in_vars_plus = in_vars_plus.reshape((-1, in_vars_plus.shape[-1])).T
        in_coefs_plus = np.ones_like(in_vars_plus)
        in_vars_minus = model.variables["flow_transport_loss"].labels.loc[techs, edges_out, :].data
        in_vars_minus = in_vars_minus.reshape((-1, in_vars_minus.shape[-1])).T
        in_coefs_minus = np.ones_like(in_vars_minus)
        in_vars = np.concatenate([in_vars_plus, in_vars_minus], axis=1)
        in_coefs = np.concatenate([in_coefs_plus, -in_coefs_minus], axis=1)
        flow_transport_in_vars.loc[carrier, node, :, :in_vars.shape[-1] - 1] = in_vars
        flow_transport_in_coeffs.loc[carrier, node, :, :in_coefs.shape[-1] - 1] = in_coefs

        # get the variables for the out flow
        out_vars_plus = model.variables["flow_transport"].labels.loc[techs, edges_out, :].data
        out_vars_plus = out_vars_plus.reshape((-1, out_vars_plus.shape[-1])).T
        out_coefs_plus = np.ones_like(out_vars_plus)
        flow_transport_out_vars.loc[carrier, node, :, :out_vars_plus.shape[-1] - 1] = out_vars_plus
        flow_transport_out_coeffs.loc[carrier, node, :, :out_coefs_plus.shape[-1] - 1] = out_coefs_plus

    # craete the linear expression
    flow_transport_in = lp.LinearExpression(xr.Dataset({"coeffs": flow_transport_in_coeffs,
                                                        "vars": flow_transport_in_vars}), model)
    flow_transport_out = lp.LinearExpression(xr.Dataset({"coeffs": flow_transport_out_coeffs,
                                                         "vars": flow_transport_out_vars}), model)
else:
    # if there is no carrier flow we just create empty arrays
    flow_transport_in = model.variables["flow_import"].where(False).to_linexpr()
    flow_transport_out = model.variables["flow_import"].where(False).to_linexpr()

First, the if-statement in the beginning is just to catch errors with empty variables. Then, instead of looping over everything and using the high-level functions to sum over the technologies and edges, we directly build the linear expressions. This requires a substantial amount of preparation. We first create the DataArray objects for the variables and the coefficients with the right coordinates and the appropriate size of the _term dimension. Then we will it with the correct variable labels and coefficients with proper signs. Finally, we build the linear expression directly from a DataSet object. There are a couple more intricacies, e.g. the necessary loop over carriers and nodes, index handling and so on, which are omitted here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZEN-garden with linopy #281

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

ZEN-garden with linopy #281

jafluri May 15, 2023

Overview

Main Changes

Sets

Config parameters

Parameter

Variables

Constraints

linopy

Basic concepts

xarray

Broadcasting

Variables and linear expressions in linopy

Access

Broadcasting

The Art of Writing Constraints

General rules

Simple Rule-Based Constraints

Vectorization without Control flow

Sums in Constraints

Vectorization with Control flow

Omittable Control flow

Simple Control flow

More advanced Control flow

Loops, combining and reordering

Maintaining shapes

Combining terms without explicit renaming

Going Low

Replies: 0 comments

jafluri
May 15, 2023