Indexing PDFs broken when using Elasticsearch and Azure Media Storage features together #17291
Comments
If this is related to PR #16958, could you please debug and check why the stream does not support reading?
After further debugging, this seems to be the offending line:
If I change the parameter […]
Do you plan to submit a PR for this?
Fixes OrchardCMS#17291 by specifying that both Read and Write access are needed on the FileStream used for reading PDF files. This bug was observed when indexing PDFs stored in Azure Blob Storage.
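For anyone following along, the failure mode discussed in this thread (a stream opened for writing only cannot be read back) can be reproduced outside of OrchardCore. The sketch below is a Python analogy, not the OrchardCore code. The helper name can_read_back and the sample bytes are made up for illustration, and Python's "wb" and "w+b" open modes are assumed to stand in for write-only versus read/write access on a .NET FileStream.

```python
import io
import os
import tempfile


def can_read_back(mode: str) -> bool:
    """Write sample bytes to a temp file opened with `mode`, rewind, and try to read.

    Returns True if the bytes could be read back through the same stream,
    False if the stream does not support reading.
    (Hypothetical helper, for illustration only.)
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # reopen below with the mode under test
    payload = b"%PDF-1.7 sample"
    try:
        with open(path, mode) as stream:
            stream.write(payload)
            stream.seek(0)
            try:
                return stream.read() == payload
            except io.UnsupportedOperation:
                # Write-only streams refuse to read, much like the
                # "stream does not support reading" error in this issue.
                return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("write-only:", can_read_back("wb"))
    print("read/write:", can_read_back("w+b"))
```

The write-only case fails at the read, while the read/write case succeeds. That matches the shape of the fix described above, where the temporary file stream is opened with both read and write access before being handed to the PDF parser.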
Describe the bug
Indexing PDFs using the Elasticsearch feature with the Azure Media Storage feature appears broken after upgrading to OrchardCore 2.x.
We store our media library files in Azure blob storage, and index the contents of PDFs stored in the media library using the Elasticsearch integration. This worked perfectly fine in OrchardCore 1.x, but after upgrading to 2.x we now get this error:
This issue seems to be related to a change in PdfMediaFileTextProvider.cs, which now uses a FileStream instead of a MemoryStream to hand off the file data to UglyToad.PdfPig for processing. If I modify the OrchardCore source code to revert back to using a MemoryStream, everything works fine again.
Orchard Core version
2.1.3 (using NuGet packages)
To Reproduce
Expected behavior
Indexing should work fine, and text from the PDF should show up in the search index.