Non-break space character in PDF causing load problem with text Brat sentence view #5269
Comments
That should actually have been fixed in recent versions. If you still have a project lying around from an older version, you can try this to fix your project: Open the CAS Doctor in the project settings and then follow the steps below (leaving all the repair options that are enabled by default active as well)
Hopefully, no errors should be left now and the document should render.
Probably duplicate of #5035
Thanks for the guidance on the CAS Doctor repair feature, but this PDF seems trickier, and the issue couldn't be resolved via the CAS Doctor approach. I wish I could share the exact document here, but I can't because the file contains sensitive data. It still loads fine in PDF viewer mode, but not in the text viewing mode of the PDF file. What I tried was the Sentence (BRAT) view. I use pymupdf and the work-around was to delete I am not sure of the exact condition of the error. It looks like there are other
Does CAS Doctor detect an issue or does it claim everything to be in order?
For INITIAL and CURATION, nothing to report for that document, but for annotation I am seeing
The unreachable annotations are normal. This happens when a user deletes annotations. They stick around for a while until the document is opened for annotation again, at which point they are garbage-collected. I think this is only an info-level message.
Code I used with pdfbox v3:
I uploaded a file here that can reproduce the issue. The steps to reproduce the error with this redacted.pdf:
…xt Brat sentence view - Use the central TrimUtils in the brat visualizer instead of a local copy (which was out-of-sync)
Thanks for helping to track this down. There was some duplicate code, and the recent fix for the nbsp issues only fixed one copy, not the other. I have now removed the second copy, so only the working one is used. Looks like that fixes the issue.
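For readers who want a feel for the kind of logic involved, here is a minimal sketch of offset trimming that treats the non-breaking space as whitespace. It is not the actual TrimUtils code from the project (that is Java, and its details are not shown in this thread); the names `is_trimmable` and `trim_span` are made up for illustration. The explicit U+00A0 check is redundant in Python, where `str.isspace()` already covers it, but it is spelled out because Java's `Character.isWhitespace` excludes non-breaking spaces, which is the sort of detail that lets two copies of trimming code drift apart.

```python
from typing import Tuple

NBSP = "\u00a0"  # non-breaking space


def is_trimmable(ch: str) -> bool:
    """Hypothetical helper: ordinary whitespace or a non-breaking space."""
    # In Python, str.isspace() is already True for U+00A0; the explicit
    # comparison only makes the intent visible.
    return ch.isspace() or ch == NBSP


def trim_span(text: str, begin: int, end: int) -> Tuple[int, int]:
    """Shrink the half-open span [begin, end) until it neither starts
    nor ends on a trimmable character. Hypothetical helper."""
    while begin < end and is_trimmable(text[begin]):
        begin += 1
    while end > begin and is_trimmable(text[end - 1]):
        end -= 1
    return begin, end


if __name__ == "__main__":
    sample = "\u00a0Hello world\u00a0 "
    b, e = trim_span(sample, 0, len(sample))
    print(repr(sample[b:e]))
```

If every place that computes span boundaries calls one shared helper like this, a fix to the whitespace rule applies everywhere at once.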
Thank you for applying the fix so quickly!
…k-space-character-in-PDF-causing-load-problem-with-text-Brat-sentence-view #5269 - Non-break space character in PDF causing load problem with text Brat sentence view
…xt Brat sentence view - Bit of cleaning up
Describe the bug and To Reproduce
When a PDF file contains a non-breaking space (U+00A0), the document loads fine in the PDF view, but switching to the Brat sentence text view fails to load it.
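To check whether a document is affected, a small diagnostic along these lines may help. It is only a sketch: it works on text that has already been extracted from the PDF by whatever tool you use, and the function names `find_nbsp_offsets` and `replace_nbsp` are hypothetical. Whether replacing the characters avoids the problem in an affected version is an assumption, not something confirmed here.

```python
from typing import List

NBSP = "\u00a0"  # non-breaking space


def find_nbsp_offsets(text: str) -> List[int]:
    """Return the character offsets of every U+00A0 in the text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


def replace_nbsp(text: str) -> str:
    """Swap each U+00A0 for a regular space. This is a one-for-one
    replacement, so the text length and all offsets stay the same."""
    return text.replace(NBSP, " ")


if __name__ == "__main__":
    extracted = "Total:\u00a042\u00a0EUR"
    print(find_nbsp_offsets(extracted))
    print(replace_nbsp(extracted))
```

Because the replacement keeps the length unchanged, any existing annotation offsets would still point at the same characters.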
Expected behavior
A document containing non-breaking spaces should load in the Brat sentence text view just as it does in the PDF view.