Oreo Parsing Bug #19

Open
braceal opened this issue Mar 29, 2024 · 0 comments
Labels
bug Something isn't working

braceal commented Mar 29, 2024

How did you install pdfwf?

See Readme.

What version of pdfwf are you using?

0.1.4, oreo_debug branch

Describe the problem.

parse raised an exception: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 374, in __getitem__
    page = self.current_doc_doc[rel_page_idx]
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/__init__.py", line 2593, in __getitem__
    return self.load_page(i)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/__init__.py", line 4734, in load_page
    page = mupdf.fz_load_page(self.this, page_id)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/mupdf.py", line 39348, in fz_load_page
    return _mupdf.fz_load_page(doc, number)
RuntimeError: code=2: cannot find page 17 in page tree

Second bug:

Traceback (most recent call last):
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/oreo.py", line 282, in parse
    ) = get_packed_patch_tensor(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 2102, in get_packed_patch_tensor
    packed_patches_and_indices = [
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 2103, in <listcomp>
    get_packed_patch_list(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1648, in get_packed_patch_list
    merge_patches_into_row(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1350, in merge_patches_into_row
    [
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1351, in <listcomp>
    F.pad(patch, (0, 0, row_height - patch.size()[1], 0), value=1.0)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/nn/functional.py", line 4495, in pad
    return torch._C._nn.pad(input, pad, mode, value)
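
The traceback above stops inside `F.pad` without the exception message, so the actual cause is not shown here. As a diagnostic sketch only, and assuming (unconfirmed) that the problem involves a patch taller than its row, the pad amounts could be validated before the call so that a failure names the offending sizes. `top_pad_spec` is a hypothetical helper, not part of pdfwf; it builds the same `(left, right, top, bottom)` tuple that appears in `merge_patches_into_row`.

```python
from typing import Tuple


def top_pad_spec(row_height: int, patch_height: int) -> Tuple[int, int, int, int]:
    """Build the (left, right, top, bottom) pad tuple for a patch.

    Raises ValueError with both sizes if the patch is taller than the row,
    which would otherwise produce a negative top pad.
    """
    top = row_height - patch_height
    if top < 0:
        raise ValueError(
            f"patch height {patch_height} exceeds row height {row_height}"
        )
    return (0, 0, top, 0)
```

The returned tuple could be passed as the `pad` argument in place of the inline expression, which would turn an opaque failure into a message that identifies the patch.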

braceal added the bug label Mar 29, 2024