[BUG] Quadratic (in number of columns) behaviour in read_csv
#14005
Labels: 2 - In Progress, bug, Performance
Describe the bug
When calling `cudf.read_csv` on a CSV file with very many columns (hundreds of thousands), we take an unexpectedly long time. Yes, I don't expect this to be performant, but...

I create a sequence of CSV files with 1 row and N columns:
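A minimal sketch of how such files could be generated, using only the Python standard library. The helper name `write_wide_csv`, the `col{i}` header names, the chosen column counts, and the assumption that "1 row" means one header line plus one data line are all illustrative and not taken from the original reproducer.

```python
import csv
import tempfile
from pathlib import Path


def write_wide_csv(path, ncolumns):
    """Write a header line plus a single data row with `ncolumns` fields.

    Hypothetical helper for illustration; not part of cuDF.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"col{i}" for i in range(ncolumns)])  # header
        writer.writerow(range(ncolumns))  # the single data row


# One file per column count, written to a fresh temporary directory.
outdir = Path(tempfile.mkdtemp())
paths = {}
for n in (1_000, 10_000, 100_000):
    paths[n] = outdir / f"wide_{n}.csv"
    write_wide_csv(paths[n], n)
```

Timing `cudf.read_csv(paths[n])` for each `n` then shows the scaling: with linear behaviour a 10x increase in columns costs roughly 10x the time, with quadratic behaviour roughly 100x.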
The culprit is this bit of code:
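This looks innocuous, but unfortunately `df._dtypes` is a property:

So the name-to-dtype lookup in the loop is `O(ncolumns)` rather than `O(1)`: the property is re-evaluated on every access, so each iteration pays for all columns and the loop as a whole is quadratic.

After a localised fix: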
and all is right with the world.
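
For illustration, the general pattern can be sketched without cuDF. The class `Frame`, its attributes, and the `accesses` counter below are hypothetical stand-ins and not cuDF's actual implementation; the sketch only assumes that the property rebuilds a name-to-dtype mapping on each access, and that the fix is to evaluate it once outside the loop.

```python
class Frame:
    """Hypothetical stand-in for a dataframe; not cuDF's real class."""

    def __init__(self, names, dtypes):
        self._names = list(names)
        self._column_dtypes = list(dtypes)
        self.accesses = 0  # counts how often the property is evaluated

    @property
    def _dtypes(self):
        # Rebuilt from scratch on every access: O(ncolumns) per call.
        self.accesses += 1
        return dict(zip(self._names, self._column_dtypes))


def collect_slow(frame):
    # Property evaluated once per column: O(ncolumns**2) overall.
    return [frame._dtypes[name] for name in frame._names]


def collect_fast(frame):
    # Property evaluated once, then plain O(1) dict lookups.
    dtypes = frame._dtypes
    return [dtypes[name] for name in frame._names]


n = 2000
frame = Frame([f"c{i}" for i in range(n)], ["int64"] * n)

slow_result = collect_slow(frame)
slow_accesses = frame.accesses

frame.accesses = 0
fast_result = collect_fast(frame)
fast_accesses = frame.accesses

print(f"slow: {slow_accesses} property evaluations, fast: {fast_accesses}")
```

Both functions return the same list; the only difference is how many times the mapping is rebuilt. Where the underlying columns cannot change during the loop, hoisting the property access like this is usually the least invasive fix.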