add new data source: Ember #327

Merged
merged 9 commits into from
Oct 21, 2022

Conversation

pweigmann
Contributor

Includes new readSource() and convert() function for the yearly electricity data set from Ember:
https://ember-climate.org/data-catalogue/yearly-electricity-data/

The Ember data includes

  • electricity capacities which are added to calcCapacity(subtype = "ember"),
  • electricity generation which is used in a new function calcSE() and
  • emissions from electricity supply which are currently not used anywhere.

In this case, I tried to follow the madrat etiquette by dividing the Ember data set among the calcOutput() functions for the respective "category" of variables it contains (capacities, SE). However, other data sets contain a much wider range of variables, which makes this approach very tedious and the code repetitive. In some of these cases we created one calcOutput() function per source rather than per variable category (BP, HRE, ...).

Any ideas how to organize this in the future?

@cchrisgong
Contributor

Thanks! I'm sorry, I didn't fully understand your question though. I thought the logic was that one writes the read and convert functions per source (BP, Ember, ...), and these are then called in something like calcCapacity and calcSE, where many sources can be read in and converted. Tagging @Renato-Rodrigues for a second opinion.

@pweigmann
Contributor Author

Yes, that's the ideal. My point is that for some sources this isn't very practical, because they have data for all variables and it would require a lot of repeated code in many functions. A "lighter" and quicker approach is to have only one calc function for that source (which is what Falk and I, and maybe also others, did in the past...). Definitely curious about @Renato-Rodrigues' opinion as well, thanks!

@Renato-Rodrigues
Member

Renato-Rodrigues commented Oct 14, 2022

I am not sure I am following the discussion 100%, but if it helps, I always follow these principles when dealing with REMIND input data:

  • each data source should have a single read and convert function.
  • read and convert functions can have different subtypes if you want to treat different data from the same source; see for example readREMIND_11Regi.R and convertREMIND_11Regi.R.
  • the read function should return the data as close as possible to what we get directly from the data source.
  • the convert function should return the data in a way that makes sense to REMIND, i.e., disaggregated at country level and with variable and technology names as close as possible to the ones that we use as standard.
  • calcOutput functions serve mainly two purposes: (1) merge together data from different sources that refer to the same topic, or (2) do the necessary transformations to create an input file for REMIND.
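
The division of labour between the three layers can be sketched in plain base R. This is only an illustration under stated assumptions: it does not use madrat or magclass, the function names (readToySource, convertToySource, calcToyCapacity), the three-country "world" and all numbers are invented, and the list returned by the calc step only loosely mirrors what a real calc function would return.

```r
# Toy stand-ins for the read / convert / calc layers (hypothetical names, base R only).

# "read": return the data as close as possible to the raw source.
readToySource <- function() {
  data.frame(
    country  = c("DEU", "FRA"),
    year     = c(2020, 2020),
    variable = c("wind", "wind"),
    value    = c(62, 18),            # invented numbers, unit: GW
    stringsAsFactors = FALSE
  )
}

# "convert": make the data complete at country level.
# Countries missing in the source are added and filled with `fill`.
convertToySource <- function(x, countries = c("DEU", "FRA", "POL"), fill = 0) {
  grid <- expand.grid(
    country  = countries,
    year     = unique(x$year),
    variable = unique(x$variable),
    stringsAsFactors = FALSE
  )
  out <- merge(grid, x, by = c("country", "year", "variable"), all.x = TRUE)
  out$value[is.na(out$value)] <- fill
  out
}

# "calc": prepare the data for its use on a given topic and document it.
calcToyCapacity <- function() {
  x <- convertToySource(readToySource())
  list(x = x, unit = "GW", description = "toy wind capacity per country")
}

res <- calcToyCapacity()
print(res$x)
```

In this sketch the read step never touches the values, the convert step only changes the spatial coverage, and everything topic-specific is left to the calc step.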

So, I would approach this in a different way.

  1. I would create a readEmber function that has different subtypes (e.g. "capacity", "demand", "generation", "imports", "emissions", "wholesale_price"). If all the data comes from the same place, you can load it all together no matter the subtype and just filter afterwards for what you want to return.
  2. I would create a convert function that fills the missing country values and, if possible, maps the data using naming conventions as close as possible to what we use. E.g. under convertEmber(x, "capacity") you could map "wind" values to "Cap|Wind", and so on. You can use a single mapping file to convert multiple subtype variables if you want.
  3. Historical mif file creation could call readSource("Ember", subtype = "capacity") directly for any Ember-specific values.
  4. I would only add an Ember reference to the calcCapacity function if the Ember capacity information is important or of better quality for determining the REMIND historical bounds used in the model. In this case you would not need a new subtype for that, as this is already covered by an existing subtype in the function.
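
Steps 1 and 2 above could look roughly like the following base R sketch. It is not the code from this pull request: the table layout, the link between subtype and unit, the function names readToyEmber and convertToyEmber, and every target name other than "Cap|Wind" are assumptions made for illustration, and the numbers are invented.

```r
# Hypothetical raw table: everything is loaded at once, the unit is a column.
toyRaw <- data.frame(
  variable = c("wind", "solar", "wind", "solar"),
  unit     = c("GW", "GW", "TWh", "TWh"),
  value    = c(62, 54, 131, 49),     # invented numbers
  stringsAsFactors = FALSE
)

# Assumed link between subtype and unit, used to filter after loading.
subtypeUnit <- c(capacity = "GW", generation = "TWh")

readToyEmber <- function(subtype) {
  if (!subtype %in% names(subtypeUnit)) stop("unknown subtype: ", subtype)
  toyRaw[toyRaw$unit == subtypeUnit[[subtype]], ]
}

# One mapping table shared by all subtypes (target names are placeholders,
# except "Cap|Wind", which is the example given in the comment above).
toyMapping <- data.frame(
  subtype = c("capacity", "capacity", "generation", "generation"),
  source  = c("wind", "solar", "wind", "solar"),
  target  = c("Cap|Wind", "Cap|Solar", "Gen|Wind", "Gen|Solar"),
  stringsAsFactors = FALSE
)

convertToyEmber <- function(x, subtype) {
  m <- toyMapping[toyMapping$subtype == subtype, ]
  x$variable <- m$target[match(x$variable, m$source)]
  x
}

cap <- convertToyEmber(readToyEmber("capacity"), "capacity")
print(cap)
```

Keeping the mapping in one table means a new subtype only needs new rows, not new code.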

@pweigmann
Contributor Author

2. You can use a single mapping file to convert multiple subtype variables if you want.

Thanks for the input, I will do it the way you proposed! Somehow, I wasn't aware that the mapping step can be in the convert function, but it does make a lot of sense to me.

This also means that all of the steps to bring the data into the right format for the historical.mif need to be moved from the calcOutput function to the convert function. In this case I don't see a problem, but I am not sure if this is the "madrat way" of doing things?

@Renato-Rodrigues
Member

I am not sure what I wrote is 100% compatible with the "madrat way", but this has always been my workflow for dealing with input data that can be used in the model.
If you want a second opinion on that, you could ask somebody from the RSE group.

@LaviniaBaumstark
Member

Hi, for the most part @Renato-Rodrigues explained the madrat way. The only part that is a bit different is where the mapping to REMIND-specific variables happens. Mostly, we recommend adjusting only the spatial dimension in a convert* function (providing information for all ISO countries). The mapping (which can also include some calculations) should happen in a calc* function. If you do not want to repeat it in many other calc* functions using the same source, you can write a calc* function only for mapping variable names. This "intermediate" calc* function can then be used by all following calc* functions.
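
The "intermediate" calc* idea can be sketched in base R as follows. This is a toy under stated assumptions, not madrat code: toyConverted stands in for the output of a convert* function, and calcToyMapped, calcToyTotal and calcToyShares are hypothetical names; the numbers are invented.

```r
# Stand-in for convert* output: all countries present, source names untouched.
toyConverted <- data.frame(
  country  = c("DEU", "FRA", "POL"),
  variable = "wind",
  value    = c(62, 18, 7),           # invented numbers, unit: GW
  stringsAsFactors = FALSE
)

# Intermediate calc step: its only job is to map variable names.
calcToyMapped <- function(x = toyConverted) {
  map <- c(wind = "Cap|Wind")
  x$variable <- unname(map[x$variable])
  x
}

# Downstream calc steps reuse the mapped data instead of repeating the mapping.
calcToyTotal <- function() {
  sum(calcToyMapped()$value)
}

calcToyShares <- function() {
  x <- calcToyMapped()
  x$value <- x$value / sum(x$value)
  x
}

print(calcToyTotal())
print(calcToyShares())
```

The mapping lives in exactly one place, so a renamed variable only has to be changed in the intermediate step.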

@pweigmann
Contributor Author

I changed the structure of the functions and now use a 'calcEmber()' with subtypes. Is this more or less what you imagined, @LaviniaBaumstark?


if (subtype == "capacity") {
  # choose only capacity variables
  x <- x[, , "GW"]
Member

Why do you need to treat "capacity" and "generation" specially here, rather than always reading in everything?

Member

ah this is already the calc-function - got it

#'
#' @export

calcEmber <- function(subtype = "all") {
Member

Maybe another name would help in understanding what is happening, e.g. calcEmberCleaned?

Contributor Author

@pweigmann pweigmann Oct 21, 2022

So, I didn't plan to use another calc function for "Ember" but would just call calcOutput("Ember", subtype = "capacity") in calcCapacity() for example. The only thing left to do there would be to convert to TW.
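
The remaining unit conversion is a plain rescaling (1 TW = 1000 GW). A minimal base R sketch, assuming a hypothetical toyEmberCapacityGW() that stands in for the result of calcOutput("Ember", subtype = "capacity") and returns invented numbers:

```r
# Hypothetical stand-in for the Ember capacity data: named vector in GW.
toyEmberCapacityGW <- function() {
  c(DEU = 62, FRA = 18)              # invented numbers
}

# Convert GW to TW: 1 TW = 1000 GW.
capacityTW <- toyEmberCapacityGW() / 1000
print(capacityTW)
```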

4 participants