I cannot share the code because it is against my company's policies.
I have a very large number of small .parquet files (around 60 KB each) that I read with Dask. When I call ".compute()" on a Dask DataFrame, it raises this error:
Traceback (most recent call last):
df = df[[Key, "Index"]].reset_index(drop=True).compute()
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 286, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 568, in compute
results = schedule(dsk, keys, **kwargs)
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 560, in get_sync
return get_async(
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 503, in get_async
for key, res_info, failed in queue_get(queue).result():
File "D:\Python38\lib\concurrent\futures\_base.py", line 437, in result
return self.__get_result()
File "D:\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
raise self._exception
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 545, in submit
fut.set_result(fn(*args, **kwargs))
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in batch_execute_tasks
return [execute_task(*a) for a in it]
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in <listcomp>
return [execute_task(*a) for a in it]
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 228, in execute_task
result = pack_exception(e, dumps)
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 223, in execute_task
result = _execute_task(task, data)
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\dataframe\shuffle.py", line 448, in __call__
path = tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)
File "D:\Python38\lib\tempfile.py", line 358, in mkdtemp
_os.mkdir(file, 0o700)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:/temp_dask/1729501515.144889\\tmpol0vhvzl.partd'
Anything else we need to know?: It happens when I have a lot of small files and compute is called on each one of them.
Environment:
Dask version: 2021.7.0
Python version: 3.8.0
Operating System: Windows
Install method (conda, pip, source): pip
Thanks for reporting this issue! It looks like your Dask version lags by several years. Please try updating to the latest release and see if the error still occurs. There's a lot of development activity, and your problem may have already been fixed.
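For anyone hitting the same error in the meantime: the last frames of the traceback can be reproduced with the standard library alone. This is only a minimal sketch, not the reporter's code. The directory names are hypothetical stand-ins for 'E:/temp_dask/<timestamp>', and all it shows is that `tempfile.mkdtemp` does not create a missing parent directory, which is what the `FileNotFoundError` from the `mkdtemp(suffix=".partd", dir=...)` call in shuffle.py suggests is happening.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a per-run scratch folder such as
# 'E:/temp_dask/<timestamp>'. It lives under a throwaway root and is
# deliberately NOT created yet.
root = tempfile.mkdtemp()
scratch = os.path.join(root, "temp_dask", "run_0001")


def make_partd_dir(parent):
    # Same standard-library call as the last Dask frame in the traceback.
    return tempfile.mkdtemp(suffix=".partd", dir=parent)


# 1) Parent directory missing: mkdtemp raises FileNotFoundError.
try:
    make_partd_dir(scratch)
    failed = False
except FileNotFoundError:
    failed = True

# 2) Create the parent first, then the same call succeeds.
os.makedirs(scratch, exist_ok=True)
partd_dir = make_partd_dir(scratch)
created = os.path.isdir(partd_dir)

# Clean up the throwaway directories.
shutil.rmtree(root)
```

If the real setup points Dask at a per-run folder that is never created, or that gets cleaned up while other computes are still running, creating it with `os.makedirs(..., exist_ok=True)` before each compute may be enough. That is an assumption about the setup, since the code was not shared.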