write_fst Seems To Skip Small Tables When Writing In A for Loop #280
Comments
I have personally never seen this behaviour, and am unable to replicate it based on the information given. A few questions whose answers might enable you to narrow down the issue without de-anonymising your data include:
While there's no reproducible example, I doubt there's a lot more help that can be provided, but following the advice in this article may help you create one: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
@AndyTwo2 My answers bullet-by-bullet:
When the big job is complete I will go back through all the archives containing
Thanks for suggestions!
@AndyTwo2 I should probably have mentioned that, although they differ in number of rows, all tables contain the same columns (in reference to the second bullet in your reply).
Hi @drag05, thanks for reporting your issue! I've done a small test using a table with 9 rows, but cannot reproduce your finding:

nr_of_rows <- 9
test_dir <- tempdir()
df <- data.frame(
  Logical = sample(c(TRUE, FALSE, NA), prob = c(0.85, 0.1, 0.05), nr_of_rows, replace = TRUE),
  Integer = sample(1L:100L, nr_of_rows, replace = TRUE),
  Real = sample(sample(1:10000, 20) / 100, nr_of_rows, replace = TRUE),
  Factor = as.factor(sample(labels(UScitiesD), nr_of_rows, replace = TRUE))
)
# write using various compression settings
df |>
  fst::write_fst(paste0(test_dir, "/compress_0.fst"), compress = 0) |>
  fst::write_fst(paste0(test_dir, "/compress_1.fst"), compress = 1) |>
  fst::write_fst(paste0(test_dir, "/compress_50.fst"), compress = 50) |>
  fst::write_fst(paste0(test_dir, "/compress_75.fst"), compress = 75) |>
  fst::write_fst(paste0(test_dir, "/compress_100.fst"), compress = 100)
# test roundtrip against source table
fst::read_fst(paste0(test_dir, "/compress_0.fst")) |> testthat::expect_equal(df)
fst::read_fst(paste0(test_dir, "/compress_1.fst")) |> testthat::expect_equal(df)
fst::read_fst(paste0(test_dir, "/compress_50.fst")) |> testthat::expect_equal(df)
fst::read_fst(paste0(test_dir, "/compress_75.fst")) |> testthat::expect_equal(df)
fst::read_fst(paste0(test_dir, "/compress_100.fst")) |> testthat::expect_equal(df)

Is there a way you can adapt the example above to reflect the type of data you are using and reproduce the issue? Thanks!
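To move the test above closer to the reported scenario, it could be adapted to write several tables of different sizes in a for loop and then audit what ended up on disk. The following is only a sketch under assumptions: the helper make_table(), the row counts, and the "chunk_<i>.fst" naming scheme are hypothetical and are not taken from the reporter's code.

```r
# Sketch: write tables of different sizes in a loop, then verify every file.
# make_table(), row_counts and the file naming are illustrative assumptions.
library(fst)

make_table <- function(n) {
  data.frame(
    Integer = sample(1L:100L, n, replace = TRUE),
    Real    = runif(n),
    Factor  = as.factor(sample(letters, n, replace = TRUE))
  )
}

out_dir <- file.path(tempdir(), "fst_loop_test")
dir.create(out_dir, showWarnings = FALSE)

# mix of small (10-11 rows) and larger tables, as described in the report
row_counts <- c(10L, 11L, 500L, 10L, 25000L, 11L)
paths <- file.path(out_dir, paste0("chunk_", seq_along(row_counts), ".fst"))

for (i in seq_along(row_counts)) {
  write_fst(make_table(row_counts[i]), paths[i], compress = 50)
}

# Audit: every file should exist and contain the number of rows that was written
written <- file.exists(paths)
rows_on_disk <- vapply(paths[written], function(p) nrow(read_fst(p)), integer(1))
```

If any entry of written is FALSE, or rows_on_disk differs from row_counts, the affected index points at a table worth isolating in a smaller reproducible example.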
@MarcusKlik A "small test" does not replicate the error. A large dataset, from which table chunks of different
As I have mentioned to @AndyTwo2 above, the source data that is being transferred to another location on disk has millions of rows and tens of thousands of table chunks of different
I am no longer sure this is a package issue. It could be a disk cache or transfer issue. Thank you!
I am using fst version 0.9.8 with R-4.3.2.

I am writing 'data.table' class data frames in a for loop. The tables have different numbers of rows, as they are the result of in-silico chemical modification of a list of peptides (protein fragments).

When writing these data tables to csv format using data.table::fwrite (with append = TRUE), all modified peptides are written correctly and all are present.

When writing them to fst format with compress = 50 or compress = 100, data tables with 10-11 rows (resulting from peptides with 1-2 modifications) are skipped, while the bigger ones are written as expected.

Unfortunately, proprietary rights do not allow me to present a full example, just the code for write_fst with the argument uniform_encoding set to its default (for speed, as there are millions of such tables to be written):

Here, dt1 is a 'data.table' class data frame residing in memory, fname is a character vector of length 1, i stands for the current iteration, and the arguments of paste0 form the unique name of the fst file written to disk. The fst tables that are written have names formatted as expected.

The upstream code is the same; the only difference at this point is the output file format, decided by an if control selected by the user: if (compress == yes) write "fst" else write "csv".

Thank you!
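One difference between the two branches described above is that fwrite with append = TRUE accumulates all tables in a single csv file, whereas each write_fst call produces its own file and replaces any existing file at that path. A possible sanity check for the fst branch is sketched below. It is not the reporter's code: the tables list, the fname prefix, and the naming scheme are assumptions made purely for illustration.

```r
# Hypothetical sketch; `tables`, `fname` and the naming scheme are assumptions,
# not the reporter's actual code.
library(fst)
library(data.table)

fname <- file.path(tempdir(), "peptide_")   # assumed file name prefix
tables <- list(                              # stand-ins for the in-memory tables
  data.table(id = 1:10,  mass = runif(10)),
  data.table(id = 1:11,  mass = runif(11)),
  data.table(id = 1:500, mass = runif(500))
)

targets <- paste0(fname, seq_along(tables), ".fst")

# A duplicated target name would let a later write replace an earlier table,
# so check uniqueness before writing anything
stopifnot(!anyDuplicated(targets))

missing_after_write <- character(0)
for (i in seq_along(tables)) {
  write_fst(tables[[i]], targets[i], compress = 50)
  # record any file that is not visible on disk right after the write
  if (!file.exists(targets[i])) {
    missing_after_write <- c(missing_after_write, targets[i])
  }
}
```

Logging missing_after_write inside the real loop would show whether a file is absent immediately after the write or only disappears later, for example during a transfer to another location.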