Streaming approach for reading data in fread #1958
Labels: design-doc, fread, new feature
See also: #1843, #1950
This issue concerns reading data from sources other than a plain file. Such cases could include:
file-like object;
Currently we handle such use cases by first dumping the content into a file and then reading it via fread as usual. This approach, however, is suboptimal:
max_nrows= parameter is used;
Suggested Implementation
In most of the cases listed above, reading the data is inherently a sequential task. It therefore has to run single-threaded (in many cases with access to Python). The suggestion is to dedicate one thread to reading the data, while all other threads are busy parsing it.
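The split between one input thread and several parsing threads can be illustrated with a minimal Python sketch. This is not fread's implementation: the function name `read_streaming`, the line-based chunking, and the comma-splitting "parser" are all assumptions chosen only to show the shape of the design.

```python
import io
import queue
import threading

def read_streaming(src, n_workers=3, lines_per_chunk=2):
    """Parse comma-separated lines from a file-like object `src` (hypothetical helper)."""
    chunks = queue.Queue(maxsize=8)   # bounded, so the reader cannot run far ahead
    parsed = {}                       # chunk index -> list of parsed rows
    lock = threading.Lock()

    def reader():
        # The only thread that touches `src`: the input is consumed sequentially.
        index, batch = 0, []
        for line in src:
            batch.append(line)
            if len(batch) == lines_per_chunk:
                chunks.put((index, batch))
                index, batch = index + 1, []
        if batch:
            chunks.put((index, batch))
        for _ in range(n_workers):
            chunks.put(None)          # one stop marker per worker

    def worker():
        while True:
            item = chunks.get()
            if item is None:
                break
            index, batch = item
            rows = [line.rstrip("\n").split(",") for line in batch]
            with lock:
                parsed[index] = rows

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble the rows in input order.
    return [row for i in sorted(parsed) for row in parsed[i]]

rows = read_streaming(io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n"))
```

The bounded queue is a deliberate choice in this sketch: when the workers fall behind, `put` blocks and the reader stops pulling data from the source instead of buffering it without limit.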
The input thread must also be ready to receive a signal from the worker threads to pause receiving data. At that point the input thread must exit its current task, leaving the Input object in a state from which it can resume receiving data at the point where it left off.
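One way to picture the pause/resume requirement is the sketch below. The `Input` class here is hypothetical and only borrows the name from the text; its attributes (`pause_requested`, `exhausted`) and the `receive` method are assumptions, not an existing API. The pause flag is checked between chunks, so the object is never left in the middle of a read.

```python
import io
import threading

class Input:
    """Hypothetical resumable input wrapper around a file-like object."""

    def __init__(self, src, chunk_size=4):
        self.src = src
        self.chunk_size = chunk_size
        self.pause_requested = threading.Event()  # set by worker threads
        self.exhausted = False

    def receive(self, sink):
        """Append chunks to `sink` until a pause is requested or the source ends.

        Returns True once the source is exhausted, False if paused earlier.
        """
        while not self.pause_requested.is_set():
            chunk = self.src.read(self.chunk_size)
            if not chunk:
                self.exhausted = True
                break
            sink.append(chunk)
        return self.exhausted
```

To resume, a caller would clear `pause_requested` and call `receive` again; because the underlying source keeps its own read position, reading continues from where the previous call stopped.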