Adding progress bar display func #951

Open: meztez wants to merge 10 commits into main

Conversation

@meztez meztez commented Jan 5, 2025

For #199.

@meztez
Author

meztez commented Jan 5, 2025

It's not printing yet; I'm looking for further insight. @krlmlr, any ideas?

@krlmlr
Collaborator

krlmlr commented Jan 5, 2025

Thanks for working on it! I think this should be an R option with a callback that is called when the option is set, see duckdb.materialize_callback for an example.

@krlmlr
Collaborator

krlmlr commented Jan 5, 2025

Or, perhaps even a slot in the duckdb_connection class?

@meztez
Author

meztez commented Jan 5, 2025

Well, I'm almost done with the callback. So I'll test that first.

@meztez meztez force-pushed the main branch 2 times, most recently from b0cdaba to 560eee1 Compare January 6, 2025 01:01
@meztez
Author

meztez commented Jan 6, 2025

All right, it works, now ironing out the bugs.

@meztez
Author

meztez commented Jan 6, 2025

library(duckdb)
library(cli)

progress <- function(x) {
  if (cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
  if (x > 100) {
    cli::cli_progress_done(.envir = .GlobalEnv)
  }
}
options("duckdb.progress_display" = progress)
conn <- duckdb::dbConnect(duckdb::duckdb())
duckdb::dbSendQuery(conn, "SET progress_bar_time = 0;")
q <- "CREATE OR REPLACE TABLE BOB AS (
      SELECT * FROM 'ldbc-sf300-comments-creationDate.parquet')"
duckdb::dbSendQuery(conn, q)

Mytherin added a commit to duckdb/duckdb that referenced this pull request Jan 6, 2025
The #ifndef DUCKDB_DISABLE_PRINT guard seems redundant, since the same
macro is already checked in printer.cpp. It also prevents using a display
set via config.create_display_func when compiling with the flag
-DDUCKDB_DISABLE_PRINT, as the duckdb-r package does, which is where I'm
trying to implement a display.

https://github.com/duckdb/duckdb/blob/main/src/common/printer.cpp
duckdb/duckdb-r#951

The call chain is PrintProgress -> TerminalProgressBarDisplay::Update ->
TerminalProgressBarDisplay::PrintProgressInternal -> Printer::RawPrint,
and the macro is checked there.

Plus, there is already a config option, enable_progress_bar, and its
default is FALSE.

So, can it be removed?
cc: @krlmlr
@meztez
Author

meztez commented Jan 6, 2025

I'm done with this one. Let me know if this works for you.

@e-kotov

e-kotov commented Jan 6, 2025

Testing with {spanishoddata}:

library(spanishoddata)
library(duckdb)
library(tidyverse)


x_dates <- c("2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04")
x <- spod_get(type = "od", zones = "distr", dates = x_dates)

dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")
dbSendQuery(x$src$con, "SET enable_progress_bar = true;")
dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")


progress <- function(x) {
  if (cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
  if (x > 100) {
    cli::cli_progress_done(.envir = .GlobalEnv)
  }
}

options("duckdb.progress_display" = progress)
duckdb::dbSendQuery(x$src$con, "SET progress_bar_time = 0;")

xx <- x |>
  group_by(id_origin, date, activity_origin) |>
  summarise(mean_trips = mean(n_trips)) |>
  collect()
[Screenshot 2025-01-07 at 00 31 57]

And it works!

@meztez, do we have to define the progress function manually, though? What is the final idea of this PR? I would expect the progress bar to just 'magically' appear as soon as we do:

dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")

P.S. In my case I use x$src$con because spod_get returns a tbl_duckdb_connection, so you have to reach into it for the connection itself.

@meztez
Author

meztez commented Jan 6, 2025

It could provide a dummy default. The callback is just a function(x) that duckdb-r calls with the progress percentage.

I'm not the package maintainer and I just needed it for a deliverable, so whatever works is fine by me.
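
For illustration, a "dummy default" along those lines could be a plain base-R function. This is only a sketch: default_progress is a hypothetical name, not something that exists in duckdb-r, and it assumes the callback receives a single numeric percentage, as in the examples above.

```r
# Hypothetical default callback (not part of duckdb-r).
# Assumes `x` is a single number, nominally between 0 and 100.
default_progress <- function(x) {
  pct <- max(0, min(100, x))               # clamp in case of overshoot
  cat(sprintf("\rDuckDB: %3.0f%%", pct))   # rewrite the same console line
  if (pct >= 100) cat("\n")                # finish the line once done
  invisible(pct)
}

# Register it the same way as the cli-based callback above
options(duckdb.progress_display = default_progress)
```

Because it only uses cat(), this variant would need no extra dependency, at the cost of a much plainer display than the cli progress bar.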

@krlmlr
Collaborator

krlmlr commented Jan 7, 2025

Thanks for the PR!

Looking at the implementation, I think the callback function should be a slot in the connection object. There could be basic reporting (opt-out, in interactive mode only) in the duckdb R package, and more sophisticated progress in duckplyr.
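
One way to picture the "opt-out, interactive mode only" behaviour is a small lookup helper. This is a rough sketch under assumptions: resolve_progress is a hypothetical name, using FALSE as the opt-out value is an assumption, and it reads the option from this PR rather than a connection slot, which would need changes to the connection class itself.

```r
# Hypothetical helper (not part of duckdb-r): decide which callback to use.
# Assumption: setting the option to FALSE means "opt out".
resolve_progress <- function(interactive_session = interactive()) {
  cb <- getOption("duckdb.progress_display")
  if (is.function(cb)) return(cb)           # a user-supplied callback wins
  if (isFALSE(cb)) return(NULL)             # explicit opt-out
  if (!interactive_session) return(NULL)    # stay quiet in scripts and CI
  function(x) cat(sprintf("\r%3.0f%%", x))  # basic built-in reporting
}
```

A package such as duckplyr could then register a richer callback through the same option without the basic reporter getting in the way.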

@e-kotov

e-kotov commented Jan 7, 2025

> I'm not the package maintainer and I just needed it for a deliverable, so whatever works is fine by me.

@meztez totally makes sense. Thanks for the work in the internals to make this possible!

Looking forward to this being merged!

@HenrikBengtsson

In the above examples, (x > 100) indicates that the processing is complete. Shouldn't that be (x >= 100)? I think it's more common to consider 100% to indicate "done" than "still processing".

@meztez
Author

meztez commented Jan 7, 2025

> In the above examples, (x > 100) indicates that the processing is complete. Shouldn't that be (x >= 100)? I think it's more common to consider 100% to indicate "done" than "still processing".

progress <- function(x) {
  if (x < 100 && cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
}

options("duckdb.progress_display" = progress)
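
As a variation on the callback above, the bar state could live in a closure rather than in .GlobalEnv, with x >= 100 treated as "done". This is a sketch only: make_progress is a hypothetical name, and it swaps cli for base R's utils::txtProgressBar so that it runs without extra packages.

```r
# Hypothetical factory (not part of duckdb-r): returns a callback that keeps
# its own state in a closure. Assumes `x` is a percentage from 0 to 100.
make_progress <- function() {
  bar <- NULL
  function(x) {
    if (is.null(bar)) {
      # Already complete: do not open a bar just to close it again
      if (x >= 100) return(invisible(FALSE))
      bar <<- utils::txtProgressBar(min = 0, max = 100, style = 3)
    }
    utils::setTxtProgressBar(bar, min(x, 100))
    if (x >= 100) {
      close(bar)
      bar <<- NULL  # reset so the next query starts a fresh bar
    }
    invisible(!is.null(bar))  # TRUE while a bar is still active
  }
}

options(duckdb.progress_display = make_progress())
```

Resetting the state at 100 means the same callback can be reused across several queries on one connection.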

4 participants