Upgrading ingest-attachment dependencies #3111

kartg · 2022-04-29T18:35:12Z

Signed-off-by: Kartik Ganesh [email protected]

Description

Multiple dependencies under ingest-attachment have been upgraded:

Tika libraries upgraded from 1.24.1 to 2.4.0
- The major version upgrade requires an explicit dependency on tika-parsers-standard-package to import the parser implementations, and an update to the namespace of RTFParser.
- This upgrade also required an update of Apache Commons-IO from 2.7 to 2.11.0, and PDFBox to 2.0.25 as per Tika release notes
- Also, LanguageIdentifier has been deprecated. This must be replaced by a concrete implementation of LanguageDetector. Tika publishes an implementation based on Optimaize via tika-langdetect-optimaize
  - This in turn brings in dependencies on Optimaize's language-detector and Google Guava (set to version 18 since that is what Optimaize uses, and to minimize the list of ignored violations)
  - Language-detector and Guava do not supply LICENSE and NOTICE files in the right format, so these have been manually added
xmlbeans libraries updated from 3.0.1 to 5.0.2
- xmlbeans is now a subproject of Apache POI, so the POI libraries were upgraded from 4.1.2 to 5.2.2
- With POI 5.x the ooxml-schemas library has been moved to ooxml-lite / ooxml-full. Since ooxml-schemas no longer exists, the LICENSE and NOTICE files in the licenses/ directory have been removed.

Alongside these version upgrades, code changes have been made to use the updated dependencies:

OptimaizeLangDetector is now used in place of the deprecated LanguageIdentifier
The new library versions have removed processing of certain fields so fallback logic has been added to AttachmentProcessor
Attachment Processor unit tests have been updated to accommodate non-deterministic results across library upgrades
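A minimal sketch of the fallback idea, for illustration only. The commit message for this change describes two fallbacks: use CREATOR when "Author" is not populated, and use SUBJECT when keywords are not extracted. The class below is hypothetical, uses a plain `Map` in place of Tika's `Metadata`, and the key strings are placeholders rather than the actual Tika property names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the fallback logic; a plain Map replaces Tika's Metadata.
class MetadataFallbackSketch {

    // Placeholder key names (assumptions); the plugin reads Tika property constants instead.
    static final String AUTHOR = "Author";
    static final String CREATOR = "creator";
    static final String KEYWORDS = "keywords";
    static final String SUBJECT = "subject";

    // Returns the first non-blank value among the given keys, or null if none is set.
    static String firstNonBlank(Map<String, String> metadata, String... keys) {
        for (String key : keys) {
            String value = metadata.get(key);
            if (value != null && !value.trim().isEmpty()) {
                return value;
            }
        }
        return null;
    }

    // Prefer the legacy author field, fall back to the creator field.
    static String author(Map<String, String> metadata) {
        return firstNonBlank(metadata, AUTHOR, CREATOR);
    }

    // Prefer extracted keywords, fall back to the subject field.
    static String keywords(Map<String, String> metadata) {
        return firstNonBlank(metadata, KEYWORDS, SUBJECT);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CREATOR, "Jane Doe");
        metadata.put(SUBJECT, "search, ingest");
        System.out.println(author(metadata));
        System.out.println(keywords(metadata));
    }
}
```

Keeping the lookup order in one helper means a future library upgrade that moves a field again only needs a change to the key list.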

Issues Resolved

Once this is merged, the dependabot PR #2138 can be closed

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

opensearch-ci-bot · 2022-04-29T18:37:03Z

❌ Gradle Check failure c654d875d2a7f5990181ac7bc462d09ccd27c228
Log 4870

Reports 4870

kartg · 2022-04-29T21:11:42Z

start gradle check

opensearch-ci-bot · 2022-04-29T21:14:09Z

❌ Gradle Check failure c654d875d2a7f5990181ac7bc462d09ccd27c228
Log 4873

Reports 4873

peterzhuamazon · 2022-04-29T22:54:13Z

start gradle check

opensearch-ci-bot · 2022-04-29T23:02:30Z

❌ Gradle Check failure c654d875d2a7f5990181ac7bc462d09ccd27c228
Log 4874

Reports 4874

dblock

LGTM assuming you can get it to green

dblock · 2022-05-02T20:42:18Z

Test failures look legit.


```
org.opensearch.ingest.attachment.AttachmentProcessorTests > testEnglishTextDocument FAILED
    java.lang.IllegalStateException: No language detectors available
        at __randomizedtesting.SeedInfo.seed([346CD47C4FE8213C:CE265966D8557F82]:0)
        at org.apache.tika.language.detect.LanguageDetector.getDefaultLanguageDetector(LanguageDetector.java:67)
        at org.opensearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:138)
        at org.opensearch.ingest.attachment.AttachmentProcessorTests.parseDocument(AttachmentProcessorTests.java:333)
        at org.opensearch.ingest.attachment.AttachmentProcessorTests.parseDocument(AttachmentProcessorTests.java:323)
        at org.opensearch.ingest.attachment.AttachmentProcessorTests.testEnglishTextDocument(AttachmentProcessorTests.java:85)
```

opensearch-ci-bot · 2022-05-02T22:25:42Z

❌ Gradle Check failure abe7e74b3b77c64fcdc3d9151cc6e616b03e448f
Log 4913

Reports 4913

kartg · 2022-05-02T23:00:38Z

> Test failures look legit.

Yup, need to figure out how to configure the language detectors. Flipping to draft PR.
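For context, the "No language detectors available" failure is what a provider lookup produces when no implementation is on the classpath. The sketch below is not Tika code: it uses the JDK's `java.util.ServiceLoader` and a hypothetical `Detector` interface to illustrate the general pattern, on the assumption that `getDefaultLanguageDetector()` discovers implementations in a similar way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical illustration of provider discovery; not the Tika implementation.
class DetectorDiscoverySketch {

    // Stand-in service interface (assumption), playing the role of a language detector.
    public interface Detector {
        String detect(String text);
    }

    // Collects every implementation registered on the classpath via META-INF/services.
    static List<Detector> discover() {
        List<Detector> found = new ArrayList<>();
        for (Detector detector : ServiceLoader.load(Detector.class)) {
            found.add(detector);
        }
        return found;
    }

    // With no provider registered there is nothing to pick as a default, so fail loudly.
    static Detector defaultDetector() {
        List<Detector> found = discover();
        if (found.isEmpty()) {
            throw new IllegalStateException("No language detectors available");
        }
        return found.get(0);
    }

    public static void main(String[] args) {
        try {
            defaultDetector();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this pattern there are two ways out: put a module that registers an implementation on the classpath, or skip discovery and instantiate a concrete detector directly, which is the route the later commits in this PR take with OptimaizeLangDetector.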

opensearch-ci-bot · 2022-05-03T18:04:36Z

❌ Gradle Check failure d992f12
Log 4950

Reports 4950

opensearch-ci-bot · 2022-05-03T18:12:56Z

❌ Gradle Check failure b2fc07a
Log 4951

Reports 4951

opensearch-ci-bot · 2022-05-03T19:56:40Z

❌ Gradle Check failure 777f012
Log 4958

Reports 4958

opensearch-ci-bot · 2022-05-03T20:23:41Z

❌ Gradle Check failure d35aede
Log 4960

Reports 4960

opensearch-ci-bot · 2022-05-03T23:48:31Z

❌ Gradle Check failure eb74e94
Log 4971

Reports 4971

opensearch-ci-bot · 2022-05-03T23:59:33Z

❌ Gradle Check failure 5765ed2
Log 4976

Reports 4976

kartg · 2022-05-04T00:38:12Z

More failures around non-determinism of language recognition - this time in IngestAttachmentClientYamlTestSuiteIT. Repro commands:

./gradlew ':plugins:ingest-attachment:yamlRestTest' --tests "org.opensearch.ingest.attachment.IngestAttachmentClientYamlTestSuiteIT" -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .doc file}" -Dtests.seed=1E07D222039A022B -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-ES -Dtests.timezone=CNT -Druntime.java=17

./gradlew ':plugins:ingest-attachment:yamlRestTest' --tests "org.opensearch.ingest.attachment.IngestAttachmentClientYamlTestSuiteIT" -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file}" -Dtests.seed=1E07D222039A022B -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-ES -Dtests.timezone=CNT -Druntime.java=17

It looks like the failures are limited to doc/docx file processing, and the assertion failure is the same in all cases:

expected String [pl] but was String [en]

kartg · 2022-05-04T00:52:57Z

This is an error with the test case, and the new output is more accurate.

The two files are represented as Base64 encoded strings in the yml test file. Decoding them and opening as Word documents shows the contents to be:

Test opensearch

Previously, language detection incorrectly identified this as Polish (pl), likely because the phrase is so short. With the upgraded libraries, the result has changed to en, which is the accurate value.

opensearch-ci-bot · 2022-05-04T01:24:03Z

✅ Gradle Check success 59f6b92
Log 4980

Reports 4980

reta · 2022-05-05T13:20:49Z

...ns/ingest-attachment/src/main/java/org/opensearch/ingest/attachment/AttachmentProcessor.java

-            // TODO: stop using LanguageIdentifier...
-            LanguageIdentifier identifier = new LanguageIdentifier(parsedContent);
-            String language = identifier.getLanguage();
+            OptimaizeLangDetector langDetector = new OptimaizeLangDetector();


It looks like initialization could be an expensive operation, should OptimaizeLangDetector + loadModels() be done only once during AttachmentProcessor initialization?
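One way to act on this suggestion is a holder that runs the expensive setup at most once and reuses the result. The sketch below is a hypothetical, JDK-only illustration: the `Lazy` class and the string stand-in are assumptions, and in the plugin the factory would instead construct the OptimaizeLangDetector and call loadModels() (handling any checked exception that call declares).

```java
import java.util.function.Supplier;

// Hypothetical sketch of one-time initialization; not code from this PR.
class LazyDetectorSketch {

    // Thread-safe holder that invokes the factory at most once (double-checked locking).
    // The factory must not return null.
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private volatile T value;

        Lazy(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public T get() {
            T local = value;
            if (local == null) {
                synchronized (this) {
                    local = value;
                    if (local == null) {
                        local = factory.get();
                        value = local;
                    }
                }
            }
            return local;
        }
    }

    // Counts how often the simulated model loading runs.
    static int loads = 0;

    // Stand-in for the costly detector construction and model loading.
    static final Lazy<String> DETECTOR = new Lazy<>(() -> {
        loads++;
        return "detector-with-models";
    });

    public static void main(String[] args) {
        DETECTOR.get();
        DETECTOR.get();
        System.out.println("model loads: " + loads);
    }
}
```

Whether a single shared detector instance is safe depends on whether the detector keeps per-call state; that would need to be confirmed against the Tika documentation before adopting this shape.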

owaiskazi19 · 2022-05-10T21:08:14Z

@kartg should we backport this to 2.x?

kartg · 2022-05-10T21:10:06Z

@owaiskazi19 Yes, looks like it is blocking this backport.

I'll dig into @reta's comment above and make sure any fixes for that get backported too.

* Upgrading Tika from 1.24.1 to 2.1.0 and bumping xmlbeans version
  This major version upgrade requires an explicit dependency on tika-parsers-standard-package to import the parser implementations, and an update to the namespace of RTFParser. Also, LanguageIdentifier has been deprecated and replaced by LanguageDetector. This change includes a bump in xmlbeans version from 3.0.1 to 3.1.0
  Signed-off-by: Kartik Ganesh <[email protected]>
* Upgrade Tika libraries from 2.1.0 to 2.2.0
  This also requires a update of Apache Commons-IO from 2.7 to 2.11.0
  Signed-off-by: Kartik Ganesh <[email protected]>
* Upgrade Tika libraries from 2.2.0 to 2.2.1
  Also update PDFBox to 2.0.25 as per Tika release notes
  Signed-off-by: Kartik Ganesh <[email protected]>
* Upgraded Tika and xmlbeans libraries
  Tika libraries have been upgraded from 2.2.1 to 2.3.0. xmlbeans is now a subproject of POI, so POI was upgraded from 4.1.2 to 5.2.2. With POI 5.x the ooxml-schemas library has been moved to ooxml-lite/ooxml-full. Since ooxml-schemas no longer exists, the LICENSE and NOTICE files in the licenses/ directory have been removed. Finally, xmlbeans has been updated from 3.1.0 to 5.0.2
  Signed-off-by: Kartik Ganesh <[email protected]>
* (In progress) Added tika-langdetect
  Signed-off-by: Kartik Ganesh <[email protected]>
* Upgrading tika libraries to 2.4.0
  Signed-off-by: Kartik Ganesh <[email protected]>
* Switched from tika-langdetect to tika-langdetect-optimaize
  To fix the license check, the mapping regex was expanded to tika-.* This now means the tika-core LICENSE and NOTICE files are no longer needed.
  Signed-off-by: Kartik Ganesh <[email protected]>
* (Work in progress) Switching AttachmentProcessor to use OptimaizeLangDetector
  This is a concrete implementation of LanguageDetector. Using this requires bringing in the optimaize dependency.
  Signed-off-by: Kartik Ganesh <[email protected]>
* Manually added LICENSE and NOTICE files for Optimaize language-detector
  Signed-off-by: Kartik Ganesh <[email protected]>
* Move Optimaize dependency to runtimeOnly
  Also bring in transitive Guava dependency. This requires manual addition of LICENSE and NOTICE files as with other plugins.
  Signed-off-by: Kartik Ganesh <[email protected]>
* Fix Optimaize langDetector to load models first before detecting
  Signed-off-by: Kartik Ganesh <[email protected]>
* Fallback logic, and test updates
  Following the Tika library upgrade, some fallback logic is necessary:
  1. "Author" is deprecated for MSOffice document parsing. It is recommended to use CREATOR from Tika Core Properties instead.
  2. EPUB parsing no longer automatically extracts keywords. The convention to fall back to SUBJECT is now manually implemented in AttachmentProcessor
  Finally, unit tests have been upgraded to account for non-deterministic language results across library upgrades.
  Signed-off-by: Kartik Ganesh <[email protected]>
* Drop Guava version from 31.1 to 18.0
  This is the version that Optimaize 0.6 depends on, and it allows for a smaller ignoreViolations list
  Signed-off-by: Kartik Ganesh <[email protected]>
* Fix ingest-attachment integration test to assert correct language
  Signed-off-by: Kartik Ganesh <[email protected]>

(cherry picked from commit fc0f446)


kartg requested review from a team and reta as code owners April 29, 2022 18:35

kartg mentioned this pull request Apr 29, 2022

Bump xmlbeans from 3.0.1 to 5.0.3 in /plugins/ingest-attachment #2138

Merged

dblock approved these changes May 2, 2022

kartg marked this pull request as draft May 2, 2022 23:00

kartg added 5 commits May 3, 2022 10:33

Upgrade Tika libraries from 2.1.0 to 2.2.0

b8b2025

This also requires a update of Apache Commons-IO from 2.7 to 2.11.0 Signed-off-by: Kartik Ganesh <[email protected]>

Upgrade Tika libraries from 2.2.0 to 2.2.1

246a885

Also update PDFBox to 2.0.25 as per Tika release notes Signed-off-by: Kartik Ganesh <[email protected]>

(In progress) Added tika-langdetect

d992f12

Signed-off-by: Kartik Ganesh <[email protected]>

kartg force-pushed the xmlBeansUpdate branch from abe7e74 to d992f12 Compare May 3, 2022 17:52

Upgrading tika libraries to 2.4.0

b2fc07a

Signed-off-by: Kartik Ganesh <[email protected]>

kartg added 3 commits May 3, 2022 12:08

Switched from tika-langdetect to tika-langdetect-optimaize

61aef61

To fix the license check, the mapping regex was expanded to tika-.* This now means the tika-core LICENSE and NOTICE files are no longer needed. Signed-off-by: Kartik Ganesh <[email protected]>

(Work in progress) Switching AttachmentProcessor to use OptimaizeLang…

777f012

…Detector This is a concrete implementation of LanguageDetector. Using this requires bringing in the optimaize dependency. Signed-off-by: Kartik Ganesh <[email protected]>

Manually added LICENSE and NOTICE files for Optimaize language-detector

d35aede

Signed-off-by: Kartik Ganesh <[email protected]>

kartg added 3 commits May 3, 2022 15:13

Move Optimaize dependency to runtimeOnly

6e5eea9

Also bring in transitive Guava dependency. This requires manual addition of LICENSE and NOTICE files as with other plugins. Signed-off-by: Kartik Ganesh <[email protected]>

Fix Optimaize langDetector to load models first before detecting

eb74e94

Signed-off-by: Kartik Ganesh <[email protected]>

Drop Guava version from 31.1 to 18.0

5765ed2

This is the version that Optimaize 0.6 depends on, and it allows for a smaller ignoreViolations list Signed-off-by: Kartik Ganesh <[email protected]>

Fix ingest-attachment integration test to assert correct language

59f6b92

Signed-off-by: Kartik Ganesh <[email protected]>

kartg marked this pull request as ready for review May 4, 2022 01:26

mch2 approved these changes May 4, 2022

kartg merged commit fc0f446 into opensearch-project:main May 4, 2022

reta reviewed May 5, 2022

andrross mentioned this pull request May 10, 2022

Bump xz from 1.8 to 1.9 in /plugins/ingest-attachment #3248

Merged

kartg deleted the xmlBeansUpdate branch May 10, 2022 21:07

kartg added the backport 2.x Backport to 2.x branch label May 10, 2022

opensearch-trigger-bot bot mentioned this pull request May 10, 2022

[Backport 2.x] Upgrading ingest-attachment dependencies #3279

Merged

mch2 added the backport 1.x label Jul 5, 2022

opensearch-trigger-bot bot mentioned this pull request Jul 5, 2022

[Backport 1.x] Upgrading ingest-attachment dependencies #3774

Merged

mch2 added the backport 1.3 Backport to 1.3 branch label Jul 7, 2022

opensearch-trigger-bot bot mentioned this pull request Jul 7, 2022

[Backport 1.3] Upgrading ingest-attachment dependencies #3794

Merged

mch2 pushed a commit that referenced this pull request Jul 7, 2022

Upgrading ingest-attachment dependencies (#3111) (#3774)

77bdbdf
