Extract PDF on file system instead of memory #16958

Merged
10 commits merged on Nov 7, 2024
@@ -11,16 +11,15 @@ public async Task<string> GetTextAsync(string path, Stream fileStream)
 // https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Core/StreamInputBytes.cs#L45.
 // Thus if it isn't, which is the case with e.g. Azure Blob Storage, we need to copy it to a new, seekable
 // Stream.
-MemoryStream seekableStream = null;
+FileStream seekableStream = null;
 try
 {
 if (!fileStream.CanSeek)
 {
-// Since fileStream.Length might not be supported either, we can't preconfigure the capacity of the
-// MemoryStream.
-seekableStream = new MemoryStream();
-// While this involves loading the file into memory, we don't really have a choice.
+seekableStream = GetTemporaryFileStream();
+
 await fileStream.CopyToAsync(seekableStream);
+
 seekableStream.Position = 0;
 }
 
@@ -39,7 +38,16 @@ public async Task<string> GetTextAsync(string path, Stream fileStream)
 if (seekableStream != null)
 {
 await seekableStream.DisposeAsync();
+
+await Task.Run(() => File.Delete(seekableStream.Name));
 }
 }
 }
+
+private static FileStream GetTemporaryFileStream()
+{
+var tempFilePath = Path.GetTempFileName();
+
+return new FileStream(tempFilePath, FileMode.Create, FileAccess.ReadWrite);
+}
 }
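The copy-to-temporary-file flow in the diff can be tried out in isolation with the minimal sketch below, written as C# top-level statements. It is an illustration under stated assumptions, not OrchardCore's code: `ParseAsync` is a hypothetical stand-in for PdfPig that simply reads the stream back as UTF-8 text, `ExtractTextAsync` and `lastTempPath` are illustrative names, and a decompressing `GZipStream` (which never supports seeking) plays the role of a non-seekable source such as an Azure Blob Storage stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

// Illustrative only: records the temporary file path so the demo can check clean-up.
string? lastTempPath = null;

// Hypothetical stand-in for PdfPig: reads the whole (seekable) stream as UTF-8 text.
async Task<string> ParseAsync(Stream seekable)
{
    using var reader = new StreamReader(seekable, Encoding.UTF8,
        detectEncodingFromByteOrderMarks: false, bufferSize: 4096, leaveOpen: true);
    return await reader.ReadToEndAsync();
}

// Sketch of the PR's pattern: copy a non-seekable stream into a temporary file that is
// opened for reading *and* writing, rewind it, parse it, then dispose and delete the file.
async Task<string> ExtractTextAsync(Stream fileStream)
{
    FileStream? seekableStream = null;
    string? tempPath = null;
    try
    {
        if (!fileStream.CanSeek)
        {
            // Path.GetTempFileName() already returns a full path and creates an empty file.
            tempPath = Path.GetTempFileName();
            lastTempPath = tempPath;
            seekableStream = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite);

            await fileStream.CopyToAsync(seekableStream);

            seekableStream.Position = 0;
        }

        return await ParseAsync(seekableStream ?? fileStream);
    }
    finally
    {
        if (seekableStream != null)
        {
            // Close the handle first so the file can be deleted on every platform.
            await seekableStream.DisposeAsync();
            File.Delete(tempPath!);
        }
    }
}

// A decompressing GZipStream never supports seeking, so it simulates a remote blob stream.
var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
{
    byte[] payload = Encoding.UTF8.GetBytes("Hello from a non-seekable stream");
    gzip.Write(payload, 0, payload.Length);
}
compressed.Position = 0;

using var source = new GZipStream(compressed, CompressionMode.Decompress);
string text = await ExtractTextAsync(source);
Console.WriteLine(text); // prints "Hello from a non-seekable stream"
```

Opening the temporary file with `FileAccess.ReadWrite` is what lets the parser read the copied bytes back; a `FileStream` opened with `FileAccess.Write` reports `CanRead == false`. As an alternative to calling `File.Delete` explicitly, `FileOptions.DeleteOnClose` asks the OS to remove the file once the handle is closed.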