Feature request: Hide inactive tests from validation report #563

Closed
jl5000 opened this issue Aug 12, 2024 · 4 comments · Fixed by #565

jl5000 commented Aug 12, 2024

In my use case we have a master dataset containing all columns and rows. This data is then used for bespoke downstream analyses, often on subsets of it. I have a function which creates a validation report for the master dataset, but I would also like to run the same function on subsets of the data and apply/show only the tests that are relevant to each subset. The report will be viewed by stakeholders, so I'd prefer not to show them lots of greyed-out rows.

I thought I would be able to do this by editing the interrogated agent, but it doesn't seem to work. (Incidentally, I was puzzled why the active column is a list column).

I have a similar situation when creating the data dictionary: it would be good if this skipped columns that don't exist, so I could reuse the same code.

library(pointblank)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x <- iris |> 
  create_agent() |> 
  col_exists("Petal.Length",
             active = has_columns(iris, Petal.Length)) |> 
  col_exists("Spec",
             active = has_columns(iris, Spec)) |> 
  col_exists("Sepal.Length",
             active = has_columns(iris, Sepal.Length)) |> 
  interrogate()

x$validation_set <- filter(x$validation_set, unlist(active))
x
#> Error in if (assertion_type[x] == "serially" && !is.na(agent$validation_set[x, : missing value where TRUE/FALSE needed

Created on 2024-08-12 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/London
#>  date     2024-08-12
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  blastula      0.3.5   2024-02-24 [1] CRAN (R 4.4.1)
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  digest        0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
#>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
#>  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  knitr         1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  pointblank  * 0.12.1  2024-03-25 [1] CRAN (R 4.4.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
#>  withr         3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun          0.46    2024-07-18 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/Program Files/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

yjunechoe (Collaborator) commented Aug 12, 2024

I think I'd prefer this to be handled via post-processing in {gt} (you should get the "hide inactive rows" behavior for free once {gt} gets something like gt::rows_hide(); see rstudio/gt#975). Unfortunately, I don't think it's currently possible to do this post hoc.

In the meantime, the missing hack in your solution is to also align validation_set$i. So given your agent x, this should work:

x$validation_set <- filter(x$validation_set, sapply(active, isTRUE))
x$validation_set$i <- seq_len(nrow(x$validation_set))
x

Note that this isn't public API (!!), though my hope is that this workaround can be made a bit more painless until we get the more principled solution from {gt} (especially w.r.t. needing to update the i column - this surprised me too).


A note for the future - error is triggered here:

validation_set <- validation_set[report_tbl$i, ]
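
One plausible reading of why that line fails, sketched in base R with a toy table (the data frame below is an illustrative stand-in, not pointblank's actual internal structure): after filtering, the surviving rows keep their original step numbers in `i`, so indexing by `i` asks for a row position that no longer exists. R returns a row of NAs for an out-of-range position instead of raising an error, and a later `if ()` on that NA is what produces "missing value where TRUE/FALSE needed".

```r
# Toy stand-in for a filtered validation_set: step 2 was dropped,
# but the surviving rows still carry their original step numbers in `i`.
validation_set <- data.frame(
  i = c(1L, 3L),
  assertion_type = c("col_exists", "col_exists"),
  stringsAsFactors = FALSE
)

# Positional indexing with the stale `i` asks for row 3 of a 2-row table;
# R answers with a row of NAs rather than an error.
stale <- validation_set[validation_set$i, ]
has_na_row <- anyNA(stale$assertion_type)

# Renumbering `i` first keeps every requested position within bounds.
validation_set$i <- seq_len(nrow(validation_set))
aligned <- validation_set[validation_set$i, ]
no_na_row <- !anyNA(aligned$assertion_type)
```

This is why the workaround above renumbers `i` right after filtering.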

yjunechoe (Collaborator) commented

Separately, to your comment:

"Incidentally, I was puzzled why the active column is a list column"

This is because active can also hold expressions that evaluate to TRUE/FALSE. For example, if you specify the has_columns() condition as a ~ formula, you get to keep a record of that (and not simply whether they evaluated to TRUE/FALSE):

x <- iris |> 
  create_agent() |> 
  col_exists("Petal.Length",
             active = ~ . %>% has_columns(Petal.Length)) |> 
  col_exists("Spec",
             active = ~ . %>% has_columns(Spec)) |> 
  col_exists("Sepal.Length",
             active = ~ . %>% has_columns(Sepal.Length)) |> 
  interrogate()
  
x$validation_set$active
#> [[1]]
#> ~. %>% has_columns(Petal.Length)
#> 
#> [[2]]
#> ~. %>% has_columns(Spec)
#> 
#> [[3]]
#> ~. %>% has_columns(Sepal.Length)
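
A small base R sketch of the same point, using made-up values rather than a real agent: a list column can hold logicals and unevaluated formulas side by side, which an ordinary logical vector cannot. It also suggests why `sapply(active, isTRUE)` is a safer filter than `unlist(active)` once formulas are involved.

```r
# Illustrative values only: two plain logicals plus one unevaluated formula,
# mimicking the mix that an `active` list column may hold.
active <- list(TRUE, FALSE, ~ has_columns(., Spec))

# A list stores the formula as-is; it is never evaluated here.
is_formula <- vapply(active, inherits, logical(1), what = "formula")

# isTRUE() treats anything that is not a literal TRUE (including a formula)
# as FALSE, so this always yields a plain logical vector.
literal_true <- sapply(active, isTRUE)

# unlist() cannot flatten a list that contains a formula, so the result
# is still a list rather than a logical vector.
still_list <- is.list(unlist(active))
```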

So actually, while active works for your specific example, you should instead read eval_active, which is the logical vector column you're looking for:

x$validation_set$eval_active
#> [1]  TRUE FALSE  TRUE
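
Combining this with the earlier workaround, a minimal sketch might look like the following. `hide_inactive_steps()` is a hypothetical helper name, not a pointblank function, and it edits agent internals that are not public API, so treat it as an interim hack that assumes the pointblank 0.12.1 behavior discussed in this thread.

```r
library(pointblank)

# Hypothetical helper (not part of pointblank): keep only the steps whose
# `eval_active` is TRUE, then renumber `i` so it matches the row positions.
# This edits agent internals, which are not public API.
hide_inactive_steps <- function(agent) {
  vs <- agent$validation_set
  vs <- vs[vs$eval_active %in% TRUE, , drop = FALSE]
  vs$i <- seq_len(nrow(vs))
  agent$validation_set <- vs
  agent
}

# Same three steps as the original reprex; "Spec" does not exist in iris,
# so that step is inactive.
agent <- iris |>
  create_agent() |>
  col_exists("Petal.Length", active = has_columns(iris, Petal.Length)) |>
  col_exists("Spec", active = has_columns(iris, Spec)) |>
  col_exists("Sepal.Length", active = has_columns(iris, Sepal.Length)) |>
  interrogate()

visible <- hide_inactive_steps(agent)
```

Filtering on `eval_active` rather than `active` keeps the helper usable when `active` is given as a formula.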

jl5000 (Author) commented Aug 12, 2024

Many thanks! I shall take your advice for the interim workarounds and wait for the {gt} functionality :)

I'm happy for you to close this issue if you would like.

yjunechoe (Collaborator) commented

@jl5000 The ability to hide rows by filtering on $validation_set is still not part of the public API, but your mental model of the equivalence between the rows of $validation_set and the rows of the agent report is accurate.

On dev, the intuitive behavior from your reprex should now work!
