
Message printed to console on read.fst #181

Closed
jangorecki opened this issue Dec 2, 2018 · 3 comments · Fixed by #182

@jangorecki
Contributor

When reading an fst file, an extra message about loading the data.table package is printed to the console. There should be an option to suppress that message.

```r
> fst::write.fst(iris, "iris.fst")
> ir <- fst::read.fst("iris.fst")
Loading required namespace: data.table
```

After investigating, I found that reading an fst file actually requires the data.table package to be installed, even though DESCRIPTION lists it only as a Suggested dependency. In that case, every use of data.table should be guarded by a check that the package is available. When we try to read an fst file without data.table installed, we get the following error:

```r
> fst::write.fst(iris, "iris.fst")
> ir <- fst::read.fst("iris.fst")
Loading required namespace: data.table
Failed with error:  'there is no package called 'data.table''
```
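For reference, a common R pattern for a Suggested dependency is to check for it with `requireNamespace(..., quietly = TRUE)` before first use: with `quietly = TRUE` the "Loading required namespace" message is not printed, and the call returns `FALSE` instead of failing when the package is missing. The sketch below is not fst's actual fix; `as_result_table()` is a hypothetical helper, and its `as.data.table` flag is an assumption used only to show that data.table is touched only when the caller asks for it.

```r
# Hypothetical helper (not part of fst): return a data.table only when the
# caller asks for one, and only touch data.table after checking it is installed.
as_result_table <- function(df, as.data.table = FALSE) {
  if (!as.data.table) {
    # No data.table needed: plain data.frame, no namespace loading, no message
    return(df)
  }
  # quietly = TRUE suppresses the "Loading required namespace" message;
  # the call returns FALSE instead of erroring when the package is missing
  if (!requireNamespace("data.table", quietly = TRUE)) {
    stop("Package 'data.table' is needed for as.data.table = TRUE; ",
         "please install it.", call. = FALSE)
  }
  data.table::as.data.table(df)
}

# Usage: the default path never loads data.table
ir <- as_result_table(iris)
```

With this shape, users without data.table installed only get an error if they explicitly request a data.table, and that error names the missing package directly.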
@MarcusKlik
Collaborator

MarcusKlik commented Dec 2, 2018

Hi @jangorecki, thanks a lot for the fix!

I think at some point data.table will end up in the Imports field again, as I plan to use data.table's fast sorting capabilities to sort chunks of data that constitute one or more groups of the dataset. Combined with a merge sort over the chunks, that would allow out-of-memory sorting of very large tables stored in an fst file.
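The sort-chunks-then-merge idea is the classic external merge sort. The base-R sketch below only illustrates that structure under simplifying assumptions: it sorts a plain numeric vector without NAs, uses base `sort()` where the plan calls for data.table's sorting, and spills sorted runs to temporary binary files with `writeBin()`/`readBin()` instead of fst. `external_sort()` and `merge_runs()` are hypothetical names. Each merge step streams its two input runs one value at a time, so memory use stays bounded by the chunk size.

```r
# Hypothetical sketch of an external (out-of-memory) merge sort in base R.
# Assumptions: x is a numeric vector without NA; runs are stored as raw
# doubles in temp files (a real version could store them with fst instead).

# Merge two sorted run files into one sorted output file, streaming one
# value at a time so only a few values are held in memory.
merge_runs <- function(file_a, file_b, file_out) {
  con_a <- file(file_a, "rb")
  con_b <- file(file_b, "rb")
  con_out <- file(file_out, "wb")
  on.exit({ close(con_a); close(con_b); close(con_out) })
  a <- readBin(con_a, "double", n = 1L)  # numeric(0) signals end of run
  b <- readBin(con_b, "double", n = 1L)
  while (length(a) > 0L && length(b) > 0L) {
    if (a <= b) {
      writeBin(a, con_out)
      a <- readBin(con_a, "double", n = 1L)
    } else {
      writeBin(b, con_out)
      b <- readBin(con_b, "double", n = 1L)
    }
  }
  # Copy whatever remains in the run that is not yet exhausted
  while (length(a) > 0L) { writeBin(a, con_out); a <- readBin(con_a, "double", n = 1L) }
  while (length(b) > 0L) { writeBin(b, con_out); b <- readBin(con_b, "double", n = 1L) }
  invisible(file_out)
}

external_sort <- function(x, chunk_size) {
  n <- length(x)
  if (n == 0L) return(as.double(x))
  # Phase 1: sort each chunk in memory and spill it to disk as a "run"
  starts <- seq(1L, n, by = chunk_size)
  runs <- character(length(starts))
  for (k in seq_along(starts)) {
    idx <- starts[k]:min(starts[k] + chunk_size - 1L, n)
    runs[k] <- tempfile(fileext = ".bin")
    writeBin(sort(as.double(x[idx])), runs[k])
  }
  # Phase 2: merge runs pairwise until a single sorted run remains
  while (length(runs) > 1L) {
    merged <- character(0)
    for (k in seq(1L, length(runs), by = 2L)) {
      if (k == length(runs)) {
        merged <- c(merged, runs[k])  # odd run out: carry it to the next pass
        next
      }
      out <- tempfile(fileext = ".bin")
      merge_runs(runs[k], runs[k + 1L], out)
      unlink(runs[c(k, k + 1L)])
      merged <- c(merged, out)
    }
    runs <- merged
  }
  result <- readBin(runs, "double", n = n)
  unlink(runs)
  result
}

# Usage
set.seed(42)
x <- runif(1000)
stopifnot(identical(external_sort(x, chunk_size = 100), sort(x)))
```

Reading a single value per call keeps the memory bound obvious but is slow in interpreted R; reading each run in blocks (for example via the `from`/`to` arguments of `fst::read_fst()`) would cut the per-value overhead considerably.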

Thanks again for the corrections!

@MarcusKlik added this to the fst v0.8.10 milestone on Dec 2, 2018
@MarcusKlik added the bug label on Dec 2, 2018
@xiaodaigh
Contributor

> Together with a merge-sort algorithm (for the chunks)

I wish I knew enough C++ to help. I have started implementing an R-only version.

@MarcusKlik
Collaborator

Hi @xiaodaigh, an R-only version using fst as a backend for writing the chunks might be almost as fast as a C++ implementation!

I think most of the computational work during a merge sort goes into serializing and deserializing chunks and writing them to and reading them from disk; the actual sorting of the chunks themselves (using data.table) will probably take less time.

You'll have to coordinate your workers, however, and that will be relatively slow (especially on Windows :-)).
