Automated unit handling in operations #536

Closed
pjuergens opened this issue May 20, 2021 · 5 comments

@pjuergens
Contributor

This issue describes the problem already noted in PR #535.

The _op_data function doesn't handle units automatically. I guess the corresponding functions add, subtract, multiply, divide and apply only work if the timeseries have the same units, but I haven't yet tested what happens if the timeseries have different units.

When doing df.multiply('Emission Factor|Gases', 'Final Energy|Gases', 'Emissions|Gases'), I'd expect Emissions|Gases to have the unit Mt CO2.

To make _op_data handle units automatically, pint might be useful. Pint can do calculations with units, even with numpy arrays. My first thoughts on it:

import iam_units

# Load the iam-units registry (based on pint)
registry = iam_units.registry

# Calculate with pint quantities; this should be adapted to pyam timeseries and the units stored there
a = registry.Quantity(5, 'TW')
b = registry.Quantity(3, 'Mt')
c = a / b

# Format the resulting unit as the string 'TW / Mt'; this needs to be passed to the pyam timeseries
unit = '{:~}'.format(c.units)

I think it's important to handle units when calculating on the variables axis, e.g. emissions = emission factor * energy. However, on other axes like regions or scenarios, I'm not sure how much sense it makes or how difficult it would be to implement handling of different units at once.

@danielhuppmann danielhuppmann self-assigned this May 25, 2021
@danielhuppmann danielhuppmann changed the title Automatted unit handling in operations Automated unit handling in operations May 25, 2021
@danielhuppmann
Member

I started playing around with this feature and found a reasonably smart way to handle this using the iam-units package (which is based on pint), see danielhuppmann@6509394

Follow-up question: how should the binary-ops function work if automated unit handling doesn't work? This happens if the units are not defined in the iam-units registry.

  • Option 1: Add unknown units to the registry on the fly (which also doesn't always work, for example with - as the unit).
  • Option 2: Add a new keyword argument ignore_units (default False) to all binary-ops functions.
    • If True, return an empty string as unit
    • If a string, use that as unit in the returned timeseries data.

@znicholls @khaeru @gidden @Rlamboll, any thoughts?

@znicholls
Collaborator

how should the binary-ops function work if automated unit handling doesn't work?

My default here would be to raise an error about the unknown unit, or simply let pint raise a dimensionality error or similar. After that, option 2 sounds good with one tweak: if ignore_units is True, I would use 'unknown' as the unit (an empty string is often treated as NaN in pyam, so it might be dangerous).

@khaeru
Contributor

khaeru commented May 25, 2021

I'm asking myself similar questions at khaeru/genno#32.

The patterns are slightly different, but some reusable ideas:

  • Don't auto-add units to the registry, since this kind of defeats the point of using pint.
  • For a single quantity, if the user inputs non-SI or unparseable units (that they haven't explicitly added to the registry), one of:
    1. Discard: silently, or with log.warning() and/or warnings.warn().
    2. Raise an exception. You might do this at the time each single quantity becomes an operand to one of these binary ops, or earlier; I don't know.
  • When performing operations on ≥2 quantities:
    • Explicit but incompatible units are always an error.
    • Missing units/dimensionless (these are different) for ≥1 operand can result in:
      1. Dimensionless output: silently, or with log.warning() and/or warnings.warn().
      2. Raise an exception.
      3. Units that are inferred according to some clear, easily-documented and -understood rules.

As I mention in the linked issue, one possibility is a package-level global or option (or 2) that controls the "strictness" (behaviour from the numbered lists above).

@pjuergens
Contributor Author

The approach looks really nice :) Some thoughts about it:

  • for addition and subtraction, the original unit should be returned
  • when doing operations e.g. on the scenario-axis, unit conversion might not make much sense - but maybe multiplying scenarios doesn't make sense in the first place.
  • until now, pyam has been flexible enough to use either iam-units, a self-defined registry, or even self-defined units without a registry. I would try to keep that flexibility

As a solution, we could add a keyword unit-handling to _op_data, which can take either a pint registry as an argument, or keywords like ignore_units (the current behaviour) or add_automatically (add units to the registry on the fly if possible), or any other string to use as the unit in the returned timeseries.

A question about dimensionless units: these might be wanted and converted into percentages later. Is that possible with iam-units?

@danielhuppmann
Copy link
Member

closed via #541
