Lost pages #55

Open
sailxjx opened this issue Apr 18, 2024 · 0 comments
sailxjx commented Apr 18, 2024

pythonlearn.pdf

I used a local Docker server to parse the document above, which has 239 pages. However, the ingestor parsed only 158 pages and discarded the remaining content. Is this a bug?

Here are the logs:

processing page: 140 Number of p_tags.... 178
processing page: 141 Number of p_tags.... 4
processing page: 142 Number of p_tags.... 251
processing page: 143 Number of p_tags.... 303
processing page: 144 Number of p_tags.... 322
processing page: 145 Number of p_tags.... 287
processing page: 146 Number of p_tags.... 330
processing page: 147 Number of p_tags.... 308
processing page: 148 Number of p_tags.... 265
processing page: 149 Number of p_tags.... 312
processing page: 150 Number of p_tags.... 298
processing page: 151 Number of p_tags.... 346
processing page: 152 Number of p_tags.... 412
processing page: 153 Number of p_tags.... 287
processing page: 154 Number of p_tags.... 193
processing page: 155 Number of p_tags.... 5
processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -
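
To pin down where processing stops, the log can be summarized with a short script. This is a minimal sketch that assumes only the log format shown above; the `summarize` helper and its regular expressions are hypothetical and not part of the ingestor. It reports the last page that was started and the last page that also printed a `p_tags` count.

```python
import re

# Matches the "processing page: N" prefix seen in the logs above.
PAGE_RE = re.compile(r"processing page: (\d+)")
# Matches a page line that also reports its p_tags count.
DONE_RE = re.compile(r"processing page: (\d+) Number of p_tags\.+ (\d+)")


def summarize(log_text):
    """Return the last page started and the last page with a p_tags count.

    Hypothetical helper: it only inspects log text and makes no
    assumptions about the ingestor's internals.
    """
    started = [int(n) for n in PAGE_RE.findall(log_text)]
    finished = [int(m.group(1)) for m in DONE_RE.finditer(log_text)]
    return {
        "last_started": max(started) if started else None,
        "last_finished": max(finished) if finished else None,
    }


# Tail of the log from this report.
sample = """processing page: 154 Number of p_tags.... 193
processing page: 155 Number of p_tags.... 5
processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -"""

print(summarize(sample))
```

If the two numbers differ, as they do for the tail above, the last started page never reported a `p_tags` count before the request returned.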
