[plugin] repository-azure is not working properly hangs on basic operations #1740

Merged
merged 2 commits into from
Dec 16, 2021


@reta (Collaborator) commented Dec 16, 2021

Signed-off-by: Andriy Redko [email protected]

Description

The issue is closely related to FasterXML/jackson-databind#3322. In a nutshell, the Azure Blob v12 APIs rely heavily on empty XML elements and attributes being deserialized as null.

Unfortunately, whether that happens depends on which XML stream reader implementation is picked up at runtime: with Woodstox it does, whereas with the JDK's default reader (com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl) it does not. This leads to an infinite loop in listBlobsByHierarchy or listBlobs, because the page iterator only recognizes null as the termination condition.
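To make the termination problem concrete, here is a small sketch that uses only the JDK's built-in StAX reader (`XMLInputFactory.newDefaultFactory()`, Java 9+). It does not reproduce the Azure SDK's Jackson-based deserialization; it only shows that, at the raw StAX level, an empty `<NextMarker/>` element reads as an empty string rather than null, so some layer above has to map it to null for a null-only termination check to stop. The class name is illustrative, and `NextMarker` is assumed here as the element that carries the continuation marker.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: raw StAX, not the Azure SDK / Jackson deserialization path.
final class EmptyMarkerDemo {

    /** Returns the text of the first {@code <NextMarker>} element, or null if there is none. */
    static String readNextMarker(String xml) {
        try {
            // Java 9+: always the JDK's built-in StAX implementation, never Woodstox.
            XMLStreamReader reader = XMLInputFactory.newDefaultFactory()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "NextMarker".equals(reader.getLocalName())) {
                        // For an empty element such as <NextMarker/> this is "", not null.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String lastPage = "<EnumerationResults><Blobs/><NextMarker/></EnumerationResults>";
        String marker = readNextMarker(lastPage);
        // A loop that stops only on null would see this marker and request yet another page.
        System.out.println(marker == null ? "null" : "empty string? " + marker.isEmpty());
    }
}
```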

The quickest fix is to fall back to manual iteration. A bug report has also been filed upstream: Azure/azure-sdk-for-java#26064
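A minimal sketch of the manual-iteration idea, assuming a hypothetical `PageFetcher` that returns one page plus the marker for the next page. This is not the plugin's actual code and does not use Azure SDK types; the point is the termination check, which treats both a missing (null) and an empty marker as the last page, so an empty marker cannot send the loop around forever.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not the Azure SDK or OpenSearch API.
final class MarkerPagination {

    /** One page: the item names on it plus the marker for the next page. */
    static final class Page {
        final List<String> names;
        final String nextMarker;

        Page(List<String> names, String nextMarker) {
            this.names = names;
            this.nextMarker = nextMarker;
        }
    }

    /** Fetches a single page starting at {@code marker} (null means "first page"). */
    interface PageFetcher {
        Page fetch(String marker);
    }

    /** A missing or empty marker both mean "no more pages". */
    static boolean isLastPage(String marker) {
        return marker == null || marker.isEmpty();
    }

    /** Walks the listing one page at a time, following the marker manually. */
    static List<String> listAll(PageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String marker = null;
        do {
            Page page = fetcher.fetch(marker);
            all.addAll(page.names);
            marker = page.nextMarker;
        } while (!isLastPage(marker));
        return all;
    }

    public static void main(String[] args) {
        // Fake service whose last page carries "" instead of null.
        PageFetcher fetcher = marker -> marker == null
                ? new Page(List.of("a", "b"), "m1")
                : new Page(List.of("c"), "");
        System.out.println(listAll(fetcher)); // prints [a, b, c]
    }
}
```

With a null-only check (`while (marker != null)`) the same fake service would keep being asked for the page after `""` indefinitely.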

Issues Resolved

Closes #1734

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot (Collaborator)

Can one of the admins verify this patch?

@opensearch-ci-bot (Collaborator)

✅ Gradle Check success 5f25be4 (Log 1520, Reports 1520)

@nknize (Collaborator) left a comment

Can we write a test (with appropriate timeouts) for this?

Will this need to be reverted once it's fixed upstream? If so, let's open an issue and reference it in a TODO.

public void onFailure(Exception e) {
exceptions.add(e);
}
@Override

(Collaborator)

Is this here because of Spotless formatting?

(Collaborator Author)

Yep :(

(Collaborator)

👍

@nknize nknize added bug Something isn't working Plugins v1.2.0 Issues related to version 1.2.0 labels Dec 16, 2021
@reta (Collaborator Author) commented Dec 16, 2021

Thanks @nknize

Can we write a test (with appropriate timeouts) for this?

We do test this, and the timeouts are actually fine and do trigger. The problem is that the tests always use Woodstox, so they never hit this problem. But I know how to test it, hold on.
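As a generic illustration of how a timeout turns such a hang into a test failure instead of a stuck build, here is a sketch using plain `java.util.concurrent`. The helper name `callWithTimeout` is hypothetical and is not an OpenSearch test-framework API. The worker thread is a daemon and the simulated loop checks its interrupt flag, so cancelling the future actually stops it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not an OpenSearch test-framework API.
final class ListingTimeout {

    /** Runs {@code task}, throwing TimeoutException if it has not finished within the limit. */
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "listing-with-timeout");
            thread.setDaemon(true); // a stuck task must not keep the JVM alive
            return thread;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker that is still looping
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a page loop that only stops on a null marker but keeps getting "".
        Callable<Integer> hangingListing = () -> {
            String marker = "";
            int pages = 0;
            while (marker != null && !Thread.currentThread().isInterrupted()) {
                pages++;
                marker = ""; // the "last page" never reports null
            }
            return pages;
        };
        try {
            callWithTimeout(hangingListing, 200, TimeUnit.MILLISECONDS);
            System.out.println("listing finished");
        } catch (TimeoutException e) {
            System.out.println("listing timed out");
        }
    }
}
```

As the comment notes, a timeout only helps if the problematic code path is exercised at all; with Woodstox on the test classpath the loop terminates normally and nothing times out.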

Will this need to be reverted once it's fixed upstream? If so, let's open an issue and reference it in a TODO.

👍

@dblock (Member) commented Dec 16, 2021

Thank you for fixing this.

How bad a regression is this? Who’s affected? Given that the log4j vulnerability makes upgrading a must, do we need to queue a 1.2.3 once this fix is in?

@dblock dblock requested a review from andrross December 16, 2021 15:40
@dblock (Member) commented Dec 16, 2021

@reta we ought to be able to write an integration test that exercises the plug-in in the same environment as when one runs it (no Woodstox).

setClasspath(internalTestSourceSet.getRuntimeClasspath())
dependsOn tasks.internalClusterTest
include '**/AzureStorageCleanupThirdPartyTests.class'
systemProperty 'javax.xml.stream.XMLInputFactory', "com.sun.xml.internal.stream.XMLInputFactoryImpl"

@dblock (Member) Dec 16, 2021

Shouldn’t this be the default for all tests to match real usage?

(Collaborator Author)

Hopefully we will get an upstream fix, and then we will test with both. This is the only test that can actually run against the Azure cloud, and I did run it in all configurations.

@reta (Collaborator Author) commented Dec 16, 2021

@reta we ought to be able to write an integration test that exercises the plug-in in the same environment as when one runs it (no Woodstox).

@dblock done: I added the azureThirdPartyDefaultXmlTest Gradle task, which uses a custom XML factory, and it fails on main right away.
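The Gradle task relies on the standard `javax.xml.stream.XMLInputFactory` system property, which `XMLInputFactory.newInstance()` consults before looking for service providers on the classpath, so it takes precedence over Woodstox. A small sketch to check which implementation actually gets resolved (the class name `XmlFactorySelection` is illustrative; the property and factory class are the ones from the task):

```java
import javax.xml.stream.XMLInputFactory;

// Illustrative class name; the property and factory class are the ones from the Gradle task.
final class XmlFactorySelection {

    /** Class name of the StAX input factory that newInstance() resolves in this JVM right now. */
    static String currentFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("before: " + currentFactoryClass());
        // JVM-wide: the system property is checked first, ahead of providers on the classpath.
        System.setProperty("javax.xml.stream.XMLInputFactory",
                "com.sun.xml.internal.stream.XMLInputFactoryImpl");
        System.out.println("after:  " + currentFactoryClass());
    }
}
```

Because the property is JVM-wide, setting it per Gradle test task (rather than globally) keeps the existing Woodstox-based runs intact while adding coverage for the JDK default.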

@reta (Collaborator Author) commented Dec 16, 2021

Thank you for fixing this.

How bad a regression is this? Who’s affected? Given that the log4j vulnerability makes upgrading a must, do we need to queue a 1.2.3 once this fix is in?

To be honest, it is bad: the plugin is essentially unusable. It would be good to get this out in 1.2.2 together with the log4j 2.16.0 update, if possible; the pull request is #1745.

@dblock (Member) commented Dec 16, 2021

Are we worried this problem exists elsewhere in OpenSearch that can cause an infinite loop depending on input?

@reta (Collaborator Author) commented Dec 16, 2021

Are we worried this problem exists elsewhere in OpenSearch that can cause an infinite loop depending on input?

This is very specific to the Azure SDK; I would not worry about it outside the scope of the plugin itself.

@dblock (Member) commented Dec 16, 2021

Thank you for fixing this.
How bad a regression is this? Who’s affected? Given that the log4j vulnerability makes upgrading a must, do we need to queue a 1.2.3 once this fix is in?

To be honest, it is bad: the plugin is essentially unusable. It would be good to get this out in 1.2.2 together with the log4j 2.16.0 update, if possible; the pull request is #1745.

This won’t make it into 1.2.2; we shouldn’t delay a security fix even for a super major regression. But I raised this with the release team, so we’ll hear back today, and I hear someone on @anasalkouz's team is looking into #1707, which would make a release easier.

@reta (Collaborator Author) commented Dec 16, 2021

@nknize tests & TODOs added, thank you!

@opensearch-ci-bot (Collaborator)

✅ Gradle Check success 3850986 (Log 1533, Reports 1533)

@nknize (Collaborator) commented Dec 16, 2021

Nicely done! Especially on the test.

To be honest, it is bad, the plugin is essentially unusable

💯 agreement, @reta. Having a DOA Azure plugin is horrible. Essentially all users who rely on Azure snapshot/restore are blocked from upgrading; that's absolutely a bad time. Agreed that we need to get a bugfix release out ASAP. It's unfortunate that the release process takes a very long time and prevents this from happening in a timely manner, but that's being worked on.

There's a related discussion around plugin/core version compatibility to determine how a patched core (M.m.p) can be released with minimal work for the rest of the stack. I'm not sure there's a straightforward answer, but a compromise might be to support semver range syntax for patch versions while keeping strict Major.minor enforcement... for now.
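One possible reading of that compromise, sketched as a hypothetical check: an exact version such as `1.2.1` keeps requiring an exact match, while an npm-style tilde range such as `~1.2.1` (same major and minor, patch at least 1) lets a plugin load on later patch releases of the core. Neither the `~` syntax nor the `PatchRangeCheck` class is something OpenSearch is stated to have in this thread; both are assumptions for illustration.

```java
// Hypothetical sketch of "patch ranges, strict Major.minor"; not OpenSearch's actual plugin check.
final class PatchRangeCheck {

    /** Parses "M.m.p" into three ints (no pre-release or build metadata). */
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected M.m.p, got: " + version);
        }
        int[] v = new int[3];
        for (int i = 0; i < 3; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    /**
     * "1.2.1"  : core must be exactly 1.2.1.
     * "~1.2.1" : core must be 1.2.x with x >= 1 (npm-style tilde range).
     */
    static boolean isCompatible(String coreVersion, String declared) {
        boolean tilde = declared.startsWith("~");
        int[] core = parse(coreVersion);
        int[] want = parse(tilde ? declared.substring(1) : declared);
        if (core[0] != want[0] || core[1] != want[1]) {
            return false; // Major.minor stays strict in both forms
        }
        return tilde ? core[2] >= want[2] : core[2] == want[2];
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("1.2.3", "~1.2.1")); // true
        System.out.println(isCompatible("1.3.0", "~1.2.1")); // false
        System.out.println(isCompatible("1.2.3", "1.2.1"));  // false
    }
}
```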

@nknize nknize requested review from dblock and nknize December 16, 2021 18:46
@nknize (Collaborator) left a comment

LGTM. Let's just make sure we have a companion OpenSearch issue so we don't forget to revert this when upgrading the Azure dependency.


do {
// Fetch one page at a time, others are going to be fetched by continuation token
// TODO: reconsider reverting to simplified approach once https://github.com/Azure/azure-sdk-for-java/issues/26064

(Collaborator)

Can we open an OpenSearch issue as well so we don't forget this? Something like "Revert repository-azure patch once upstream fixes are available" would be fine.


@@ -283,4 +283,23 @@ task azureThirdPartyTest(type: Test) {
nonInputProperties.systemProperty 'test.azure.endpoint_suffix', "${-> azureAddress.call() }"
}
}
check.dependsOn(azureThirdPartyTest)

task azureThirdPartyDefaultXmlTest(type: Test) {

(Collaborator)

👍

@dblock dblock merged commit 3cae72d into opensearch-project:1.x Dec 16, 2021
@dblock (Member) commented Dec 16, 2021

@reta Are you making a PR into main as well? Thanks

@reta (Collaborator Author) commented Dec 16, 2021

@reta Are you making a PR into main as well? Thanks

@dblock absolutely; I want to cover 1.x first. On it.

reta added a commit to reta/OpenSearch that referenced this pull request Dec 16, 2021
…ations (opensearch-project#1740)

* [plugin] repository-azure is not working properly hangs on basic operations

Signed-off-by: Andriy Redko <[email protected]>

* Added tests cases and TODO items, addressing code review comments

Signed-off-by: Andriy Redko <[email protected]>
nknize pushed a commit that referenced this pull request Dec 16, 2021
…ations (#1740) (#1749)

This commit fixes repository-azure hanging on basic operations. This will be reverted 
once it's fixed upstream in the Azure library.

Signed-off-by: Andriy Redko <[email protected]>

4 participants