Describe the bug
Given a Dask environment with two workers, the following script produces a fatal error while reading and processing a file with null data. The same issue also occurs with other formats, such as CSV or ORC files.
Traceback (most recent call last):
  File "bug-dask.py", line 16, in <module>
    gdf.to_csv("*.csv")
  File "/home/workspace/lib/python3.7/site-packages/dask/dataframe/core.py", line 1459, in to_csv
    return to_csv(self, filename, **kwargs)
  File "/home/workspace/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 871, in to_csv
    delayed(values).compute(**compute_kwargs)
  File "/home/workspace/lib/python3.7/site-packages/dask/base.py", line 281, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/workspace/lib/python3.7/site-packages/dask/base.py", line 563, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/workspace/lib/python3.7/site-packages/distributed/client.py", line 2655, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/workspace/lib/python3.7/site-packages/distributed/client.py", line 1970, in gather
    asynchronous=asynchronous,
  File "/home/workspace/lib/python3.7/site-packages/distributed/client.py", line 839, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/home/workspace/lib/python3.7/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/workspace/lib/python3.7/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/workspace/lib/python3.7/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/home/workspace/lib/python3.7/site-packages/distributed/client.py", line 1829, in _gather
    raise exception.with_traceback(traceback)
  File "/home/workspace/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 685, in _write_csv
    df.to_csv(f, **kwargs)
  File "/home/workspace/lib/python3.7/site-packages/cudf/core/dataframe.py", line 7390, in to_csv
    **kwargs,
  File "/home/workspace/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/workspace/lib/python3.7/site-packages/cudf/io/csv.py", line 209, in to_csv
    index=index,
  File "cudf/_lib/csv.pyx", line 418, in cudf._lib.csv.write_csv
  File "cudf/_lib/csv.pyx", line 486, in cudf._lib.csv.write_csv
RuntimeError: CUDA error at: /home/workspace/include/rmm/device_buffer.hpp:445: cudaErrorInvalidValue invalid argument
If the same file is loaded with Pandas, the script runs smoothly.
Steps/Code to reproduce bug
The current log is the traceback shown above.
Expected behavior
Loading and processing such a file with null data should run smoothly.
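For comparison, here is a minimal pandas-only sketch of the expected behavior: writing a frame that contains nulls to CSV and reading it back. This is not the original reproduction script. The column names and values are hypothetical, and it assumes only that pandas is installed.

```python
import io

import pandas as pd

# Hypothetical data: one null in a numeric column and one in a string column.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Read the CSV back; the empty fields are parsed as missing values again.
roundtrip = pd.read_csv(io.StringIO(csv_text))
null_counts = roundtrip.isna().sum()
```

With pandas, the nulls are written as empty fields and no error is raised. That is the behavior I would expect from the GPU-backed path as well.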
Environment overview