Help
Hello,

How does … For example, I have some data like this (partitioned Parquet files created by arrow):

Would I define the target like this, where I simply track the top directory (and targets tracks changes to any file inside that directory)?

```r
library(targets)

list(
  tar_target(
    fake_data_loc,
    {
      out_path <- "some_location/FAKE_arrow10"
      create_data() |>
        dplyr::group_by(group) |>
        arrow::write_dataset(out_path)
      out_path # <---- just the root directory
    },
    format = "file"
  ),
  tar_target(
    downstream,
    {
      fake_data_loc
      # ...
    }
  )
  # ...
)
```

Or would I have to define the target like this, where the target is a vector of all the files inside it?

```r
library(targets)

list(
  tar_target(
    fake_data_loc,
    {
      out_path <- "some_location/FAKE_arrow10"
      create_data() |>
        dplyr::group_by(group) |>
        arrow::write_dataset(out_path)
      fs::dir_ls(out_path, recurse = TRUE, type = "file") # <---- a vector of all the files
    },
    format = "file"
  ),
  tar_target(
    downstream,
    {
      fake_data_loc
      # ...
    }
  )
  # ...
)
```

Thanks!
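To make the difference between the two return values concrete, here is a minimal base-R sketch (no targets involved) of what "tracking a directory" amounts to conceptually: enumerate every file under the root recursively, roughly what `fs::dir_ls(out_path, recurse = TRUE, type = "file")` returns, and fingerprint their contents so that editing any partition file changes the result even though the root path itself is unchanged. The helper `dir_fingerprint()` and the fake `group=a` / `group=b` layout are hypothetical stand-ins, and this is not a description of how targets hashes directories internally.

```r
# Hypothetical helper: fingerprint a directory by hashing every file beneath it.
dir_fingerprint <- function(path) {
  # Relative paths of all regular files under `path`, recursively
  # (roughly the set fs::dir_ls(path, recurse = TRUE, type = "file") lists).
  rel <- sort(list.files(path, recursive = TRUE))
  # One MD5 per file, then collapse "relative/path:hash" pairs into one string.
  hashes <- unname(tools::md5sum(file.path(path, rel)))
  paste(rel, hashes, sep = ":", collapse = ";")
}

# Fake partitioned layout standing in for arrow::write_dataset() output.
out_path <- tempfile("FAKE_arrow10")
dir.create(file.path(out_path, "group=a"), recursive = TRUE)
dir.create(file.path(out_path, "group=b"), recursive = TRUE)
writeLines("1,2,3", file.path(out_path, "group=a", "part-0.csv"))
writeLines("4,5,6", file.path(out_path, "group=b", "part-0.csv"))

before <- dir_fingerprint(out_path)

# Edit one file deep inside the directory; the root path stays the same.
cat("7,8,9\n", file = file.path(out_path, "group=b", "part-0.csv"), append = TRUE)
after <- dir_fingerprint(out_path)

identical(before, after) # FALSE: the nested edit is visible from the root
```

Either way, the information that matters is the set of files and their contents; returning the root directory or the explicit file vector just changes who does the enumeration.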
Replies: 1 comment 1 reply
I suppose I answered my own question with an experiment:

```r
library(targets)

list(
  tar_target(
    mtcars_out_1,
    {
      tibble::as_tibble(mtcars) |>
        dplyr::group_by(cyl) |>
        arrow::write_dataset("folder_out")
      here::here("folder_out") # just the root directory
    },
    format = "file"
  ),
  tar_target(
    mtcars_out_2,
    {
      tibble::as_tibble(mtcars) |>
        dplyr::group_by(cyl) |>
        arrow::write_dataset("file_out")
      fs::dir_ls("file_out", recurse = TRUE, type = "file") # every file
    },
    format = "file"
  )
)
```

Both approaches seem to work correctly. Is one approach more performant than the other, especially for very large datasets?
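One way to extend the experiment is to check that both targets actually notice a change inside their output directories. The sketch below assumes targets, tibble, dplyr, arrow and fs are installed. It runs in a throwaway directory via `tar_dir()`, writes a `_targets.R` with `tar_script()`, builds with `tar_make()`, appends a byte to one partition file of each dataset (this corrupts that Parquet file, which is harmless in a scratch directory), and collects what `tar_outdated()` reports. It returns the relative path `"folder_out"` instead of `here::here("folder_out")` so the tracked path matches where `write_dataset()` wrote inside the temporary directory.

```r
library(targets)

# Run everything in a temporary directory so the real project is untouched.
outdated <- tar_dir({
  tar_script({
    library(targets)
    list(
      # Approach 1: return only the root directory
      tar_target(
        mtcars_out_1,
        {
          tibble::as_tibble(mtcars) |>
            dplyr::group_by(cyl) |>
            arrow::write_dataset("folder_out")
          "folder_out"
        },
        format = "file"
      ),
      # Approach 2: return every file under the directory
      tar_target(
        mtcars_out_2,
        {
          tibble::as_tibble(mtcars) |>
            dplyr::group_by(cyl) |>
            arrow::write_dataset("file_out")
          fs::dir_ls("file_out", recurse = TRUE, type = "file")
        },
        format = "file"
      )
    )
  }, ask = FALSE)
  tar_make()

  # Simulate an external edit to one partition file in each dataset.
  for (path in c("folder_out", "file_out")) {
    part <- list.files(path, recursive = TRUE, full.names = TRUE)[1]
    cat("x", file = part, append = TRUE)
  }

  # Names of targets whose tracked files changed should appear here.
  tar_outdated()
})

print(outdated)
```

Timing `tar_make()` the same way (for example with `system.time()`) on a dataset of realistic size would be a direct way to compare the two approaches on the performance question.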