community: Fix the problem of error reporting when OCR extracts text from PDF. #29378

jiangtongxueya · 2025-01-23T13:47:33Z

Description: Fixes image recognition in the _extract_images_from_page() method, which failed because xObject[obj]["/Filter"] can be either a string or a list of strings. This also resolves the bug where Faiss vectorization fails with "IndexError: list index out of range" when image extraction fails on a PDF that contains only images.
Issue:
Fix the following issues:
#15227 #22892 #26652 #27153
Related issues:
#7067
Dependencies: None
Twitter handle: None
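
For illustration only, here is a minimal sketch of how a /Filter value that may be either a single name or a list of names could be normalized before checking it against a set of supported filters. The helper name normalize_pdf_filters is hypothetical and is not taken from this PR's diff; the sketch uses plain Python strings and lists, on the assumption that the PDF library's name and array objects behave like str and list.

```python
from typing import List, Sequence, Union


def normalize_pdf_filters(
    filter_value: Union[str, Sequence[str], None],
) -> List[str]:
    """Return filter names without the leading slash, always as a list.

    Hypothetical helper: accepts "/DCTDecode", ["/FlateDecode", "/DCTDecode"],
    or None (no /Filter entry) and returns a uniform list of names.
    """
    if filter_value is None:
        return []
    if isinstance(filter_value, str):
        # A single filter name, e.g. "/DCTDecode".
        names = [filter_value]
    else:
        # A list of filter names applied in sequence.
        names = [str(name) for name in filter_value]
    # Strip the PDF name prefix "/" if present.
    return [name[1:] if name.startswith("/") else name for name in names]


if __name__ == "__main__":
    print(normalize_pdf_filters("/DCTDecode"))
    print(normalize_pdf_filters(["/FlateDecode", "/DCTDecode"]))
```

With a uniform list, a membership check against a set of supported filter names no longer depends on whether the PDF stored one filter or several.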

vercel · 2025-01-23T13:47:38Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
langchain	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jan 23, 2025 3:01pm

Fix the problem of error reporting when OCR extracts text from PDF.

a0011d3

Fix the following issues: #15227 #22892 #26652 #27153 Related issues: #7067

dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jan 23, 2025

dosubot bot added community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jan 23, 2025

jiangtongxueya added 2 commits January 23, 2025 21:56

Adjust the length of code lines.

9d907dc

Code formatting

f92f351

vercel bot deployed to Preview January 23, 2025 14:06 View deployment

vercel bot deployed to Preview January 23, 2025 14:21 View deployment

format

fa90550

ccurme approved these changes Jan 23, 2025

dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jan 23, 2025

ccurme enabled auto-merge (squash) January 23, 2025 14:53

vercel bot deployed to Preview January 23, 2025 15:01 View deployment

ccurme merged commit a1e6207 into langchain-ai:master Jan 23, 2025
18 checks passed