Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal loanbook_stable and region_isos_stable (#227) #230

Merged
merged 20 commits into from
Dec 1, 2020
Merged

Add internal loanbook_stable and region_isos_stable (#227) #230

merged 20 commits into from
Dec 1, 2020

Conversation

jdhoffa
Copy link
Member

@jdhoffa jdhoffa commented Nov 26, 2020

Closes #227

Relates to https://github.com/2DegreesInvesting/r2dii.match/pull/328

With these internal datasets, we will have a more stable output from regression tests, that depend less on unrelated changes to r2dii.data.

this loanbook_demo is stable as of v0.1.4 of r2dii.data
@jdhoffa jdhoffa requested a review from maurolepore November 26, 2020 12:47
Copy link
Contributor

@maurolepore maurolepore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that maybe it is too obscure to overwrite the name loanbook_demo so deep into the code base. An alternative would be to use another name, e.g. loanbook_stable and use it one of these ways:

  • To replace the calls to loanbook_demo (most transparent; many lines of code change).
# r2dii.analysis/tests/testthat/test-*.R
match_name(loanbook_stable, ald_demo)
  • To overwrite loanbook_demo but in a more obvious place, like at the top of a test-*.R file (a bit more transparent; only one line of code changes).
# r2dii.analysis/tests/testthat/test-*.R
loanbook_demo <- loanbook_stable
  • To output loanbook_stable indirectly, via an interface (many lines of code change but the code becomes more maintainable because if we later want to output something different we only need to change the implementation of the interface).
# r2dii.analysis/tests/testthat/helper-fake_lbk.R
fake_lbk <- function() {
  loanbook_stable
}

# r2dii.analysis/tests/testthat/test-*.R
match_name(fake_lbk(), ald_demo)

One thing to consider is if we can find an intermediate refactoring step that keeps the tests passing at all times. I see CI is failing now.

@jdhoffa
Copy link
Member Author

jdhoffa commented Nov 27, 2020

One thing to consider is if we can find an intermediate refactoring step that keeps the tests passing at all times. I see CI is failing now.

They are expected to fail until 2DegreesInvesting/r2dii.match#327 is merged. Not sure if it's worth our time for an intermediate refactoring? what do you think?

@jdhoffa
Copy link
Member Author

jdhoffa commented Nov 27, 2020

Or I guess, are you worried about reverse dependencies on r2dii.match?

@maurolepore
Copy link
Contributor

Yeah, I'm likely overthinking -- in part because I'm looking for real-world opportunities to apply what I'm learning in "Working effectively with legacy code" as I read it. Sorry you are a victim of this process :)

maurolepore referenced this pull request in RMI-PACTA/r2dii.match Nov 27, 2020
Closes #326
Closes #327 -- this is a simpler/safer way to solve it.
Relates to:
* https://github.com/2DegreesInvesting/r2dii.analysis/pull/227
* https://github.com/2DegreesInvesting/r2dii.analysis/pull/230

This change is internal, and motivated by our need to break the
dependency between r2dii.data and its reverse dependencies. The
problem we are trying to solve is this:

r2dii.data provides data that is used in some functions if its
reverse dependencies -- like r2dii.match. Concretely, changes in
the sector classification dataset (in r2dii.data) make tests to
fail in reverse dependencies like r2dii.match that store a
snapshot of expected output (e.g. of match_name()).

This commit allows us to use generate such snaphsots with a
frozen loanbook, and thus make r2dii packages less prone to
meaningless errors when CRAN checks for reverse dependencies.
Copy link
Contributor

@maurolepore maurolepore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdhoffa, have a look and edit or merge when you are ready. This should come after
https://github.com/2DegreesInvesting/r2dii.match/pull/328.

@jdhoffa
Copy link
Member Author

jdhoffa commented Nov 30, 2020

@maurolepore so this LGTM, but I have a question.
This PR will cause the dev version of r2dii.analysis to fail with the current CRAN release of r2dii.match. Does this mean r2dii.analysis now depends on r2dii.match (>= 0.0.7)? Should we bypass these tests if r2dii.match (<= 0.0.6) for stability?

@maurolepore
Copy link
Contributor

maurolepore commented Nov 30, 2020

Should we bypass these tests if r2dii.match (<= 0.0.6) for stability?

I like this idea. I think it makes sense because the test is valuable only in that it helps develop new software, so in this test we don't care about older versions.

@maurolepore
Copy link
Contributor

In order to decide if we need to skip the test, I'm unsure if the best condition is that r2dii.match <= 0.0.6. The problem is that some commit where r2dii.match = 0.0.6.9000 may lack the ability to allow reserved columns. Instead, a more direct way may be to check that allow_reserved_columns() exists in r2dii.match -- whatever the version. This is a bit fragile because allow_reserved_columns is internal, but I would trust our tests.

packageVersion("r2dii.match")
#> [1] '0.0.6.9000'

# https://bit.ly/2VksIPj (@hadly at StackOverflow)
allows_reserved_columns <- exists(
  'allow_reserved_columns',
  where = asNamespace('r2dii.match'),
  mode = 'function'
)

# We can use skip_if_not()
allows_reserved_columns
#> [1] TRUE
packageVersion("r2dii.match")
#> [1] '0.0.6'

# https://bit.ly/2VksIPj (@hadly at StackOverflow)
allows_reserved_columns <- exists(
  'allow_reserved_columns',
  where = asNamespace('r2dii.match'),
  mode = 'function'
)

# We can use skip_if_not()
allows_reserved_columns
#> [1] FALSE

Created on 2020-11-30 by the reprex package (v0.3.0)

Comment on lines +189 to +194
allows_reserved_columns <- exists(
'allow_reserved_columns',
where = asNamespace('r2dii.match'),
mode = 'function'
)
skip_if_not(allows_reserved_columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdhoffa this is ready for your review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still getting plenty of test failures when I run the tests locally. This is with the dev version of r2dii.*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Good that I now see the failures on gh-actions. I'll check them out.

@jdhoffa
Copy link
Member Author

jdhoffa commented Nov 30, 2020

So it seems like the most recent failing tests (on devel) are coming from the most recent commit to r2dii.data: 2DegreesInvesting/r2dii.data@4d83cc1
This was surprising to me, as I expected only region_isos to change from this commit, however region_isos_demo changed as well. Since region_isos_demo is used in most tests, this will cause small (sometimes large) differences in the regression values.

Upon closer inspection, I can see that region_isos_demo is sourced using the actual values of region_isos: https://github.com/2DegreesInvesting/r2dii.data/blob/master/data-raw/region_isos_demo.R

As a result, region_isos_demo is not actually that stable. I think this is an issue, and it would make sense to freeze a stable version of region_isos_demo (either in r2dii.analysis or r2dii.data), again for the purposes of testing. This will be an easier job, since this data is passed explicitly to all functions as an argument.

Let me know if you have questions, and hopefully we can fix this together tomorrow.

Copy link
Contributor

@maurolepore maurolepore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jdhoffa, your idea is a great one. I polished the script a bit. See my comments below. I'm happy to merge whenever you are ready. Locally, I see the checks pass cleanly both with release and devel versions of r2dii.data and r2dii.match.

Comment on lines +3 to +5
* New internal data `loanbook_stable` and `region_isos_stable` make regression
tests more stable (#227).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a mention to region_isos_stable.

Comment on lines +1 to +5
# The `*_stable` datasets help create regression tests independent from changes
# in r2dii.data which might cause meaningless errors when checking for problems
# in reverse dependencies of r2dii.data. For example, in `loanbook_demo`, values
# of the columns `sector` and `borderline` may change if the classification
# system is updated (see #227).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I polished the text a bit.

Comment on lines +11 to +20
# Access older version of r2dii.data without changing current version
temp_lib <- tempdir()
fs::dir_create(temp_lib)

withr::with_libpaths(temp_lib, {
remotes::install_version("r2dii.data", version = "0.1.4")
region_isos_demo <- r2dii.data::region_isos_demo
nace_classification <- r2dii.data::nace_classification
loanbook_demo <- r2dii.data::loanbook_demo
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I install the older version on a disposable library so we don't mess up with the current installation.

Comment on lines +8 to +9
library(fs)
library(withr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unecessary but mostly to clearly anounce which packages I'm using.

usethis::use_data(
region_isos_stable,
loanbook_stable,
internal = TRUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in R/sysdata.R so I renamed this file to data-raw/sysdata.R. All sysdata must be defined in the same call to use_data(), so this new name should make it easier to find the right place to add new sysdata.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me!

@maurolepore
Copy link
Contributor

@jdhoffa, while we have this context in the head, do you see an opportunity to reuse this idea to make r2dii.match more robust too?

@maurolepore maurolepore changed the title Add stable internal loanbook_demo (#227) Add internal loanbook_stable and region_isos_stable (#227) Nov 30, 2020
@jdhoffa
Copy link
Member Author

jdhoffa commented Dec 1, 2020

Ya let's think about it. Luckily region_isos doesn't play a role in match_name() at all, but could make sense to add a loanbook_stable object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Are these differences expected?
2 participants