`ketos test` performs inference on the CPU, but its data loader uses pinned memory: https://github.com/mittagessen/kraken/blob/773cc00cc07df4b44056512a601f7bffba8f2ada/kraken/ketos/recognition.py#LL464C1-L468C61

This can cause various errors. For example, if a CUDA device is currently in use to train a model, torch will throw an error:
```
Traceback (most recent call last):
  File "/home/colibri/mambaforge/envs/kraken_master/bin/ketos", line 10, in <module>
    sys.exit(cli())
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/core.py", line 1055, in main
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/core.py", line 760, in invoke
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
  File "/home/colibri/files/kraken/kraken/ketos/recognition.py", line 474, in test
    for batch in ds_loader:
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.

Original Traceback (most recent call last):
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 32, in do_one_step
    data = pin_memory(data, device)
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 58, in pin_memory
    return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 58, in <dictcomp>
    return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
  File "/home/colibri/mambaforge/envs/kraken_master/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 53, in pin_memory
    return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
I think `pin_memory` should be set to `False`.
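A minimal sketch of what that fix could look like. This is not the actual kraken code; the dataset here is a hypothetical stand-in for the recognition test set, and the `use_cuda` flag is an assumption (in `ketos test` the choice would presumably follow the selected device). The point is just that `pin_memory` should only be enabled when batches will actually be copied to a CUDA device; for CPU-only inference it serves no purpose, and pinning allocates against the GPU's pin-memory machinery, which can fail when a concurrent training run has exhausted device memory:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the recognition test set used by `ketos test`.
ds = TensorDataset(torch.zeros(4, 1, 48, 128))

# Only pin memory when inference will run on a CUDA device (assumption: in
# ketos this flag would be derived from the --device option, not hard-coded).
use_cuda = torch.cuda.is_available()

ds_loader = DataLoader(ds, batch_size=2, num_workers=0, pin_memory=use_cuda)

for (batch,) in ds_loader:
    # On a CPU-only run, pin_memory is False and no CUDA calls are made,
    # so iteration cannot trigger the OOM seen in the traceback above.
    assert batch.is_pinned() == (use_cuda and batch.is_cuda is False and use_cuda)
    pass
```

With `pin_memory=False` (or gated on the device as above), iterating the loader on a CPU-only run never touches the CUDA runtime at all.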