technology_share for target_* is now properly calculated and weighted #294

Merged
merged 22 commits into from
May 19, 2021

Member

@jdhoffa jdhoffa commented May 10, 2021

The target_* values of technology_share should be calculated as follows (in line with the methodology):
1 - Calculate the production_target for each company (without weighting)
2 - Calculate the technology_share_target for each company (without weighting)
3 - Apply the portfolio_weight to each of production, technology_share, production_target and technology_share_target
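
A minimal sketch of these three steps, in base R with made-up numbers. The toy data frame, the `loan_weight` column, and the helper `share_within()` are assumptions for illustration only; the package itself uses dplyr pipelines and its own helpers.

```r
# Toy data: two companies in one sector and year (all values are made up).
# `loan_weight` stands in for the portfolio weight of each company's loan.
companies <- data.frame(
  name_ald = c("A", "A", "B", "B"),
  technology = c("electric", "ice", "electric", "ice"),
  production = c(10, 90, 500, 500),
  production_target = c(20, 80, 600, 400),
  loan_weight = c(0.25, 0.25, 0.75, 0.75)
)

# Steps 1-2: per-company shares, computed before any weighting.
# `share_within()` is a hypothetical helper, not a package function.
share_within <- function(x, group) x / ave(x, group, FUN = sum)
companies$technology_share <-
  share_within(companies$production, companies$name_ald)
companies$technology_share_target <-
  share_within(companies$production_target, companies$name_ald)

# Step 3: weight every metric by the loan weight, then sum by technology.
metrics <- c(
  "production", "technology_share",
  "production_target", "technology_share_target"
)
weighted <- companies[metrics] * companies$loan_weight
names(weighted) <- paste0("weighted_", metrics)
portfolio <- aggregate(weighted, companies["technology"], sum)

# For contrast: a share derived from the already-weighted production.
naive_share <- portfolio$weighted_production / sum(portfolio$weighted_production)
```

With these numbers, `portfolio$weighted_technology_share` is 0.4 for electric and 0.6 for ice, whereas `naive_share` differs because company B produces far more than company A. This shows why the shares are computed per company first and weighted afterwards.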

This has been achieved.
@maurolepore this is an MVP solution, as this has been an open bug for some time. Please review (and note that I recognize that some of my solutions are not so elegant).

Also @georgeharris2deg if you could please have a look at this it would be much appreciated.

I would rather we get the solution out the door, and we can refactor later.

Closes #277

@jdhoffa jdhoffa requested a review from maurolepore May 10, 2021 12:51
Contributor

@maurolepore maurolepore left a comment

LGTM. I have a few questions you may want to consider before merging.

NEWS.md (outdated, resolved)
R/summarize_weighted_production.R (outdated, resolved)
R/target_market_share.R (outdated, resolved)
Comment on lines 214 to 225
add_technology_share_target <- function(data) {
  crucial <- c("production_target", "sector_ald", "year", "technology")

  check_crucial_names(data, crucial)
  walk_(crucial, ~ check_no_value_is_missing(data, .x))

  data %>%
    group_by(.data$sector_ald, .data$year, .data$scenario, .data$name_ald) %>%
    mutate(technology_share_target = .data$production_target / sum(.data$production_target)) %>%
    group_by(!!!dplyr::groups(data))
}

Member Author

this more or less duplicates add_technology_share. Good candidate for refactoring

Comment on lines -156 to -165
if (weight_production) {
  data <- summarize_weighted_production(
    data,
    !!!rlang::syms(summary_groups),
    use_credit_limit = use_credit_limit
  )
} else {
  data <- summarize_unweighted_production(
    data,
    !!!rlang::syms(summary_groups)

Member Author

This now happens AFTER adding the targets. (See line 231). I also had to expand the summarize_weighted_production function, as it now needs to implement something different.

A lot of WET code here unfortunately.

Comment on lines 232 to 277
  data <- data %>%
    ungroup() %>%
    add_loan_weight(use_credit_limit = use_credit_limit) %>%
    add_technology_share() %>%
    add_technology_share_target() %>%
    calculate_weighted_loan_metric("production") %>%
    calculate_weighted_loan_metric("technology_share") %>%
    calculate_weighted_loan_metric("production_target") %>%
    calculate_weighted_loan_metric("technology_share_target") %>%
    group_by(
      .data$sector_ald,
      .data$technology,
      .data$year,
      !!!rlang::syms(summary_groups)
    ) %>%
    summarize(
      weighted_production = sum(.data$weighted_loan_production),
      weighted_technology_share = sum(.data$weighted_loan_technology_share),
      weighted_production_target = sum(.data$weighted_loan_production_target),
      weighted_technology_share_target = sum(.data$weighted_loan_technology_share_target)
    ) %>%
    # Restore old groups
    group_by(!!!dplyr::groups(data))
} else {
  data <- data %>%
    select(-c(
      .data$id_loan,
      .data$loan_size_credit_limit,
      .data$loan_size_outstanding
    )) %>%
    distinct() %>%
    group_by(.data$sector_ald, .data$technology, .data$year, !!!rlang::syms(summary_groups)) %>%
    # FIXME: Confusing: `weighted_production` holds unweighted_production?
    summarize(
      weighted_production = .data$production,
      weighted_production_target = .data$production_target,
      .groups = "keep"
    ) %>%
    ungroup(.data$technology) %>%
    mutate(
      weighted_technology_share = .data$weighted_production / sum(.data$weighted_production),
      weighted_technology_share_target = .data$weighted_production_target / sum(.data$weighted_production_target)
    ) %>%
    group_by(!!!dplyr::groups(data))
}

Member Author

These dplyr chains are essentially summarize_weighted_production and summarize_unweighted_production, but expanded. The difference is that they now need to summarize both the technology_share and technology_share_target. Unfortunately, I couldn't change summarize_weighted_production itself that much, as it is an exported function.

One possibility is to add an argument to summarize_weighted_production that allows you to include targets or not. @maurolepore happy to explore options.

Comment on lines -205 to +208
-  technology = c("electric", "ice", "electric", "ice", "electric", "ice", "electric", "ice"),
-  year = c(2020, 2020, 2021, 2021, 2020, 2020, 2021, 2021),
-  tmsr = c(1, 1, 1.85, 0.6, 1, 1, 1.85, 0.6),
-  smsp = c(0, 0, 0.34, -0.2, 0, 0, 0.34, -0.2)
+  technology = c("electric", "ice", "electric", "ice"),
+  year = c(2020, 2020, 2021, 2021),
+  tmsr = c(1, 1, 1.85, 0.6),
+  smsp = c(0, 0, 0.34, -0.2)

Member Author

This was a poorly written test, since the scenario was effectively duplicated.
That said, perhaps it would be good to write a test for what happens when the scenario input contains duplicated scenarios?
I think the function should throw an error in that case, as I wouldn't know how to handle it otherwise.
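
One way such a check could look, as a rough sketch in base R. The helper name `abort_if_duplicated_scenario()` and the default key columns in `by` are assumptions for illustration; neither is taken from the package.

```r
# Hypothetical helper (not part of the package): fail fast when the scenario
# data has more than one row for the same key columns.
abort_if_duplicated_scenario <- function(scenario,
                                         by = c("scenario", "sector", "technology", "year")) {
  is_duplicated <- duplicated(scenario[by])
  if (any(is_duplicated)) {
    stop(
      "`scenario` must have one row per ", paste(by, collapse = ", "), ". ",
      "Found ", sum(is_duplicated), " duplicated row(s).",
      call. = FALSE
    )
  }
  invisible(scenario)
}

# Made-up scenario data mirroring the columns used in the test above.
scenario <- data.frame(
  scenario = "sds",
  sector = "automotive",
  technology = c("electric", "ice", "electric", "ice"),
  year = c(2020, 2020, 2021, 2021),
  tmsr = c(1, 1, 1.85, 0.6),
  smsp = c(0, 0, 0.34, -0.2)
)

# Valid input passes through unchanged.
abort_if_duplicated_scenario(scenario)

# Stacking the scenario on itself reproduces the duplication and errors.
duplicated_scenario <- rbind(scenario, scenario)
result <- tryCatch(
  abort_if_duplicated_scenario(duplicated_scenario),
  error = function(e) conditionMessage(e)
)
```

A test could then assert that the duplicated input raises an error, for example with `testthat::expect_error()`.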

tests/testthat/test-target_market_share.R (resolved)
@maurolepore
Contributor

Thanks @jdhoffa for your comments. I see a bunch of things we should follow up on, and I worry that if they are left only here they will fall through the cracks. I suggest we do one of these:

  1. Do them now
  2. Move them to issues
  3. Write TODO comments in code

The order reflects my preference. You still have the context in your head and getting it done now seems the cheapest. This issue has already waited for a long time, and the number of impacted users I believe was low.

@jdhoffa
Member Author

jdhoffa commented May 12, 2021

Happy to do them now, except unfortunately I am in calls all afternoon, and tomorrow is a holiday, so I will have to get to it on Friday.

@jdhoffa jdhoffa requested a review from maurolepore May 18, 2021 13:33
  summarize_weighted_production_(data, ..., use_credit_limit = use_credit_limit, add_targets = FALSE)
}

summarize_weighted_production_ <- function(data, ..., use_credit_limit = FALSE, add_targets = FALSE) {

Member Author

I have added a new internal function, summarize_weighted_production_, which takes the argument add_targets. The exported summarize_weighted_production now calls this internal function with add_targets = FALSE, which preserves the existing arguments of summarize_weighted_production.

@maurolepore let me know what you think of this solution as a way to avoid WET code
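
The pattern, sketched with toy functions in base R. The names `summarize_production()` and `summarize_production_()` and their bodies are hypothetical stand-ins, not the package's code; only the wrapper-plus-internal-flag shape is the point.

```r
# Internal worker (hypothetical): `add_targets` switches on the extra column.
summarize_production_ <- function(data, add_targets = FALSE) {
  out <- aggregate(production ~ technology, data = data, FUN = sum)
  if (add_targets) {
    targets <- aggregate(production_target ~ technology, data = data, FUN = sum)
    out <- merge(out, targets, by = "technology")
  }
  out
}

# Exported function: its signature stays the same, so existing callers
# are unaffected; it simply delegates with the flag turned off.
summarize_production <- function(data) {
  summarize_production_(data, add_targets = FALSE)
}

# Made-up data to exercise both paths.
toy <- data.frame(
  technology = c("electric", "ice", "electric", "ice"),
  production = c(1, 2, 3, 4),
  production_target = c(2, 2, 4, 3)
)
without_targets <- summarize_production(toy)
with_targets <- summarize_production_(toy, add_targets = TRUE)
```

Internal callers that need the target columns use the underscore-suffixed function directly, so the shared logic lives in one place.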

Contributor

@maurolepore maurolepore left a comment

I see some CI workflows fail

Comment on lines 125 to 137
    # FIXME: Confusing: `weighted_production` holds unweighted_production?
    summarize(weighted_production = .data$production, .groups = "keep") %>%
    ungroup(.data$technology, .data$tmsr, .data$smsp) %>%
    mutate(weighted_technology_share = .data$weighted_production / sum(.data$weighted_production)) %>%
    group_by(!!!dplyr::groups(data))

  if (add_targets) {
    data %>%
      summarize(
        weighted_production = .data$production,
        weighted_production_target = .data$production_target,
        .groups = "keep"
      ) %>%
      ungroup(.data$technology) %>%
      mutate(
        weighted_technology_share = .data$weighted_production / sum(.data$weighted_production),
        weighted_technology_share_target = .data$weighted_production_target / sum(.data$weighted_production_target)

Contributor

The #FIXME comment is now misplaced. Maybe we can remove it and instead write a comment explaining why we use the "weighted_" prefix in this function? Or change the prefix to a temporary one?

@maurolepore maurolepore self-requested a review May 18, 2021 14:49
jdhoffa added 2 commits May 19, 2021 09:43
More accurate name, since this flag doesn't actually add anything
just checks to see if targets are already there or not.
@jdhoffa
Member Author

jdhoffa commented May 19, 2021

Exploring the failing CI checks right now. It seems they are on older R releases only?

@@ -300,7 +300,7 @@ target_market_share <- function(data,
   ald_with_benchmark <- calculate_ald_benchmark(ald, region_isos, by_company)

   data %>%
-    rbind(ald_with_benchmark) %>%
+    dplyr::bind_rows(ald_with_benchmark) %>%

Member Author

well hot diggity!

@maurolepore maurolepore merged commit c302426 into RMI-PACTA:master May 19, 2021
@jdhoffa jdhoffa deleted the 277-target_before_weight branch May 19, 2021 15:03
Successfully merging this pull request may close these issues.

Initial value of technology_share differs between projected and target_*
2 participants