Investigate option to skip derivative creation on .odt files #723

Open
2 tasks done
eporter23 opened this issue Jan 22, 2025 · 2 comments

eporter23 (Contributor) commented Jan 22, 2025

In several large import tests, we have found that the .odt files attached to every migrated work cause errors and incomplete ingests when CreateDerivatives jobs run on them. It is not yet clear whether the failures and hung application processes come from the attempt to create a thumbnail image or from text extraction.

These files are important for preservation, but they are always set to Private and will not be seen by end users. We also expect that few users would ever manually upload one of these files, since the majority of deposits are either PDF or standard MS Office files.

For this ticket, we want to investigate the level of effort to:

  • Exclude .odt files from CreateDerivatives jobs
  • Ensure that other types of plain text documents (.txt, .rtf) can still be submitted through normal manual deposit processes
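
As a starting point for that investigation, the exclusion rule in the two bullets above could be sketched as a small predicate. This is only an illustration under stated assumptions: `skip_derivatives?`, `SKIPPED_EXTENSIONS`, and `ODT_MIME_TYPE` are hypothetical names that do not exist in this repository or in any library, and where such a check would actually be called from is still to be determined.

```ruby
# Hypothetical sketch: none of these names exist in the application yet.

# Standard media type for OpenDocument Text (.odt) files.
ODT_MIME_TYPE = 'application/vnd.oasis.opendocument.text'

# Assumption: only .odt needs to be excluded; .txt and .rtf are left alone.
SKIPPED_EXTENSIONS = ['.odt'].freeze

# Returns true when derivative creation should be skipped for a file.
# Checks the extension first (case-insensitive), then falls back to the
# characterized MIME type in case the filename has no usable extension.
def skip_derivatives?(filename:, mime_type: nil)
  extension = File.extname(filename.to_s).downcase
  SKIPPED_EXTENSIONS.include?(extension) || mime_type == ODT_MIME_TYPE
end

skip_derivatives?(filename: 'metadata.odt')                       # => true
skip_derivatives?(filename: 'notes.txt', mime_type: 'text/plain') # => false
skip_derivatives?(filename: 'letter.rtf')                         # => false
```

A guard such as `return if skip_derivatives?(...)` at the start of the derivative job is one possible place to use it, but that placement is an assumption and would need to be confirmed against the actual job code.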

bwatson78 (Contributor) commented

#744

bwatson78 (Contributor) commented

@eporter23 .txt files that are characterized as text/plain are not among the file types that get processed for derivatives in the first place.

bwatson78 self-assigned this Jan 28, 2025