Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finally rip out these Connect user and content caches #793

Merged
merged 4 commits into from
Oct 5, 2023

Conversation

juliasilge
Copy link
Member

In #667 we moved from caches saved to disk to environment caches, but this still causes problems that are very confusing and hard to reason about, as @ryjohnson09 demonstrates in #789. I would like to propose we just throw this whole idea away; my opinion is that these Connect user and content caches are not gaining us much (especially now that they are only environment caches) and when things go wrong, it's pretty horrible.

@juliasilge juliasilge marked this pull request as ready for review September 29, 2023 18:40
@juliasilge juliasilge requested a review from hadley September 29, 2023 18:51
@hadley
Copy link
Member

hadley commented Oct 4, 2023

Just to reassure me that this doesn't have a big performance impact, could you give a couple of benchmarks of reading/writing a pin before and after this change?

@juliasilge
Copy link
Member Author

juliasilge commented Oct 4, 2023

It's definitely slower. Here is with current CRAN:

bench::mark(
    pins = {
        library(pins)
        board <- board_connect()
        name <- pin_write(board, 1, pins:::random_pin_name())
        testthat::expect_equal(pin_read(board, name), 1)
        pin_delete(board, name)
    },
    iterations = 5
)
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-NfgrIuJ1Yf'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-aqX9EJ6vgk'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-Tobj1QG7JD'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-UsFthxGSFR'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-2DtQsuQjFY'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-OaKWLFT8vB'
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pins          3.64s     3.8s     0.264    14.8MB    0.685

Created on 2023-10-04 with reprex v2.0.2

And here is with this PR:

bench::mark(
    pins = {
        library(pins)
        board <- board_connect()
        name <- pin_write(board, 1, pins:::random_pin_name())
        testthat::expect_equal(pin_read(board, name), 1)
        pin_delete(board, name)
    },
    iterations = 5
)
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-jJODh5a5wa'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-oe8m7ajoYw'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-sGcG0aUn9t'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-9pB1xPTZpm'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-RngIENxoHL'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-veDVBVQXWv'
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pins          4.01s    4.05s     0.245      15MB    0.686

Created on 2023-10-04 with reprex v2.0.2

In this example, we look up what content on Connect belongs to name three times (write, read, delete). With CRAN right now, we look that up one time and then after that look in the content cache, while with this PR we look it up every time and it's 5-10% slower. If we increased that a lot (say, write to the same pin a dozen times) the difference would increase more and more with each additional API call that we weren't doing before.

The argument for making the change in this PR would be that this gain in speed isn't worth it because it is difficult for us to safely accommodate when the local cache gets out of sync, and then it's super confusing to the user.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The small speed savings don't seem worth the additional hassle that caching adds.

@juliasilge juliasilge merged commit 1f61a84 into main Oct 5, 2023
14 checks passed
@juliasilge juliasilge deleted the no-more-connect-caches branch October 5, 2023 01:23
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants