-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[plugin] repository-azure is not working properly hangs on basic operations #1740
Conversation
…ations Signed-off-by: Andriy Redko <[email protected]>
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we write a test (with appropriate timeouts) for this?
Will this need to be reverted once fixed upstream? If so lets open an issue and reference in a todo?
public void onFailure(Exception e) { | ||
exceptions.add(e); | ||
} | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this here from spotless formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks @nknize
We do test, timeouts are actually fine and will trigger, the problem is that in the test we always use Woodstox and never see this problem, but I know how to test, hold on
👍 |
Thank you for fixing this. How bad of a regression is this? Who’s affected? Given the log4j vulnerability being a must upgrade do we need to queue a 1.2.3 when this fix is in? |
Signed-off-by: Andriy Redko <[email protected]>
@reta we ought to be able to write an integration test that exercises the plug-in in the same environment as when one runs it (no Woodstox). |
setClasspath(internalTestSourceSet.getRuntimeClasspath()) | ||
dependsOn tasks.internalClusterTest | ||
include '**/AzureStorageCleanupThirdPartyTests.class' | ||
systemProperty 'javax.xml.stream.XMLInputFactory', "com.sun.xml.internal.stream.XMLInputFactoryImpl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t this be the default for all tests to match real usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would hopefully do upstream fix and will test with both, this is the only test which actually could run against Azure cloud, I did run it in all configurations.
To be honest, it is bad, the plugin is essentially unusable, would be good to get it out in 1.2.2 with |
Are we worried this problem exists elsewhere in OpenSearch that can cause an infinite loop depending on input? |
This is very Azure SDK specific, I would not worry about it in the scope outside the plugin itself |
Won’t make it into 1.2.2, we shouldn’t be delaying a security fix for even a super major regression. But I raised this to the release team, so we’ll hear back today, and i hear someone on @anasalkouz team is looking into #1707 which would make a release easier. |
@nknize tests & TODOs added, thank you! |
Nicely done! Especially on the test.
💯 agreement @reta . Having a DOA azure plugin is horrible. Essentially all users that rely on azure snapshot/restore are blocked from upgrading; that's absolutely a bad time. Agree that we do need to get a bugfix release out ASAP. It's unfortunate the release process takes a very long time and prevents this in a timely manner but that's being worked. There's a related discussion around plugin/core version compatibility to determine how a patched core (M.m.p) can be released with minimal work to the rest of the stack. I'm not sure it's a straightforward answer but a compromise might be to introduce semver range syntax support on patched versions while keeping the strict Major.minor enforcement.. for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.. let's just make sure we have a companion OpenSearch issue so we don't forget to revert when upgrading the Azure dependency.
|
||
do { | ||
// Fetch one page at a time, others are going to be fetched by continuation token | ||
// TODO: reconsider reverting to simplified approach once https://github.com/Azure/azure-sdk-for-java/issues/26064 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we open an OpenSearch issue as well so we don't forget here? Something like "Revert repository-azure patch once upstream fixes are available" would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void onFailure(Exception e) { | ||
exceptions.add(e); | ||
} | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -283,4 +283,23 @@ task azureThirdPartyTest(type: Test) { | |||
nonInputProperties.systemProperty 'test.azure.endpoint_suffix', "${-> azureAddress.call() }" | |||
} | |||
} | |||
check.dependsOn(azureThirdPartyTest) | |||
|
|||
task azureThirdPartyDefaultXmlTest(type: Test) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@reta Are you making a PR into main as well? Thanks |
…ations (opensearch-project#1740) * [plugin] repository-azure is not working properly hangs on basic operations Signed-off-by: Andriy Redko <[email protected]> * Added tests cases and TODO items, addressing code review comments Signed-off-by: Andriy Redko <[email protected]>
…ations (#1740) (#1749) This commit fixes repository-azure hanging on basic operations. This will be reverted once it's fixed upstream in the Azure library. Signed-off-by: Andriy Redko <[email protected]>
Signed-off-by: Andriy Redko [email protected]
Description
The issue is closely related to FasterXML/jackson-databind#3322 and in the nutshell, Azure Blob APIs V12 heavily relies on the fact that empty XML elements / attributes are going to be nullified.
However, sadly, it highly depends on
XMLInputReader
instance being picked up at runtime: theWoodstox
does that, whereas the default one from JDKcom.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl
does not. It leads to infinite loop withinlistBlobsByHierarchy
orlistBlobs
- the page iterator only understandsnull
as termination condition.The fastest option to get it fixed is to fallback to manual iteration. The bug report to upstream is also submitted Azure/azure-sdk-for-java#26064
Issues Resolved
Closes #1734
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.