Protected docx, xlsx, pptx may no longer be classified properly #1805

Closed
lfcnassif opened this issue Aug 9, 2023 · 1 comment
@lfcnassif
Member

While working on #1793, I tried adding rules to CustomSignatures.xml to identify iWork 13 files by extension. Since the pages, numbers and key extensions are already present in Tika's default definitions, I added them in uppercase to avoid extension conflicts. This is a workaround I have used before for protected OOXML files:
https://github.com/sepinf-inc/IPED/blob/master/iped-app/resources/config/conf/CustomSignatures.xml#L876-L892

Unfortunately, that had no effect and I had to put custom rules into RefineCategoryTask.js. The workaround used to work and may have stopped working after the last Tika upgrade, so we should check whether protected OOXML MS Office files are still classified properly...
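
For context when spot-checking: a password-protected OOXML file is not a plain ZIP container. Office wraps the encrypted package in an OLE2 compound file, so the leading bytes look like a legacy Office document, which is presumably why extension-based rules help here. Below is a minimal Java sketch that only reports which container a file starts with. `ContainerSniffer` and its methods are hypothetical names for illustration, not part of IPED or Tika, and the check says nothing about the category IPED would finally assign.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not IPED or Tika code): reports the outer container of a file.
public class ContainerSniffer {

    // OLE2 (Compound File Binary) header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK" followed by 0x03 0x04).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // True if data begins with the given magic bytes.
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Classifies a header as "ole2", "zip" or "unknown".
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "ole2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "zip";
        }
        return "unknown";
    }

    // Reads up to the first 8 bytes of a file and classifies them.
    static String sniff(Path file) throws IOException {
        byte[] buf = new byte[8];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(buf, 0, buf.length);
            return sniff(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ContainerSniffer file1.docx file2.xlsx ...
        for (String arg : args) {
            System.out.println(arg + " -> " + sniff(Paths.get(arg)));
        }
    }
}
```

A protected docx, xlsx or pptx that reports `ole2` while an unprotected one reports `zip` would match the expectation above. The actual classification still has to be checked in a processed case.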

@lfcnassif lfcnassif self-assigned this Aug 9, 2023
@lfcnassif
Member Author

Fortunately, this is not happening, because we have hard-coded handling of encrypted MS Office documents in SignatureTask. However, I found a minor issue while testing this and will open another ticket to fix it. Closing this as invalid.