Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Narayana JTA: expose property to set xaAssumeRecoveryComplete #35806

Open
turing85 opened this issue Sep 7, 2023 · 20 comments
Open

Narayana JTA: expose property to set xaAssumeRecoveryComplete #35806

turing85 opened this issue Sep 7, 2023 · 20 comments
Labels
area/narayana Transactions / Narayana kind/enhancement New feature or request

Comments

@turing85
Copy link
Contributor

turing85 commented Sep 7, 2023

Description

There is a gap in the XA restore process. If all resources have acknowledged the transaction, narayana deletes the transaction from its transaction store. If the application dies right in this moment and restarts, narayana still thinks that the XA transaction is not finished, but the transactions on the resources were already committed, and the resources do not have any memory about those transactions. This, in return, leads the a log message similar to:

... ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord ...

This message will appear periodically with log-level WARN, until the correspoding transaction log is removed.

This link (access.redhat.com) suggests setting system-property com.arjuna.ats.jta.xaAssumeRecoveryComplete to true.

While we can start the application with java ... -Dcom.arjuna.ats.jta.xaAssumeRecoveryComplete=true ..., it would be convenient to have a configuration property like quarkus.transaction-manager.xa-assume-recovery-complete that can be set in application.properties/application.yml.

Implementation ideas

No response

@turing85 turing85 added the kind/enhancement New feature or request label Sep 7, 2023
@gastaldi gastaldi added area/narayana Transactions / Narayana and removed triage/needs-triage labels Sep 7, 2023
@quarkus-bot
Copy link

quarkus-bot bot commented Sep 7, 2023

/cc @mmusgrov (narayana)

@turing85
Copy link
Contributor Author

turing85 commented Sep 9, 2023

@gsmet If the corresponding PR gets merged, I'd like to have a downstream backport to 2.13 and 2.16. Would this be possible? Only change required is adding import java.util.List; in file NarayanaJtaProcessor.java on both branches.

turing85 added a commit to turing85/quarkus that referenced this issue Sep 9, 2023
turing85 added a commit to turing85/quarkus that referenced this issue Sep 9, 2023
@mmusgrov
Copy link
Contributor

Doing this is risky because you may end up deleting logs for in doubt transactions if the assumption is wrong which could result in inconsistent data.

We added the interface org.jboss.tm.XAResourceWrapper to mitigate against this and it should be implemented by the resources that get passed to the transaction extension.

@turing85
Copy link
Contributor Author

turing85 commented Sep 11, 2023

Okay... then I am out. I took a quick look at the interface and I have no idea how to implement this and what resource have to implement this interface. I am going to close my PR.
@mmusgrov will you take over?

@mmusgrov
Copy link
Contributor

@turing85 Not really. We do not know which resources might have known about the transaction unless our transaction record can identify the resource. If the resource that's registered with us implements org.jboss.tm.XAResourceWrapper then we know its JNDI name and if we know that we have asked the resource, corresponding to the JNDI name, for its in doubt Xids and if it did not know about this transaction then we may infer that it must have completed.

But we do not control the resource registrations, that is the job of the Agroal subsystem so only Agroal can help in this situation. But Agroal does not know either because Quarkus does not initialise the integration code with the datasource name. @barreiro is that a fair description of the situation and if so can anyone suggest what it would take to ensure that Agroal can be supplied with that name.

@turing85
Copy link
Contributor Author

turing85 commented Sep 11, 2023

@mmusgrov I am very sure that the resource causing the issue is the JMS connection, so no Agroal is involved.

What does this exactly mean? Should I re-open the PR?

@mmusgrov
Copy link
Contributor

The issue needs to be fixed by whatever component is supplying us with the resource.

@turing85
Copy link
Contributor Author

That would be quarkus-artemis. But quarkus-artemis supplies a ActiveMQConnectionFactory extends JNDIStorable implements ConnectionFactoryOptions, Externalizable, ConnectionFactory, XAConnectionFactory, AutoCloseable. How would I implement that interface then? I am a little bit at a loss here.

@mmusgrov
Copy link
Contributor

Maybe the thing that implements the XASession methods can check. You'd need to do it just for the resources that you register with us since end user apps would not want the wrapper. I assumed that the route from ActiveMQConnectionFactory to the call to register the resource with us is not obvious so I did not look.

@turing85
Copy link
Contributor Author

turing85 commented Sep 11, 2023

It is obvious. CF -> XASession -> XAResource. Do you have an example of, for example, an agroal datasource implementing the interface?

@mmusgrov
Copy link
Contributor

Agroal does not implement it, I was hoping that this problem would be a trigger for Luis and the quarkus team to support the interface because it's a problem. But IronJacamar does and I believe there is some work in Quarkus to support that.

@turing85
Copy link
Contributor Author

turing85 commented Sep 11, 2023

Well in this case, I propose we merge my original PR until this issue is resolved for good. Having these logs popping up over and over again is noise, and setting the property does not work in native mode.

@mmusgrov
Copy link
Contributor

The window where this crash can happen is very small:

  1. commit resource.
  2. application crashes.
  3. delete log

So I'd be surprised to see the problem popping up over and over again, perhaps there is another root cause analysis or the reproducer is contrived.

The consequences of leaving the TM log lying around is rare noise.
The consequence of using xaAssumeRecoveryComplete and getting the assumption being wrong is data integrity.

I'd definitely want to avoid the latter so I'd resist approving the PR.

@turing85
Copy link
Contributor Author

Multiply the window times 100. We have multiple services producing those logs.

@mmusgrov
Copy link
Contributor

I'd still prefer noise over loss of data integrity so lets see if we can up the priority of implementing XAResourceWrapper.

@mmusgrov
Copy link
Contributor

For completeness, as an aside, the name of the datasource doesn't have to be JNDI, which I'm not sure if Quarkus supports, it just needs to be unique.

@turing85
Copy link
Contributor Author

turing85 commented Sep 11, 2023

Nobody is forced to set this option. We are right now forced to have this noise. And setting said property produces a very clear log message, stating that the transaction will be assumed to be completed.

@mmusgrov
Copy link
Contributor

I'm not convinced every developer knows the value of maintaining ACID semantics.

@mmusgrov
Copy link
Contributor

I'd also check your environment to determine why so many applications are crashing or are being scaled down with pending work.

@mmusgrov
Copy link
Contributor

mmusgrov commented Jul 5, 2024

@turing85 As mentioned, even though the property xaAssumeRecoveryComplete requires that wrapped resources have JNDI names any unique value (amongst all resources) will do. So I agree that we could add support for XAResourceWrapper to the jta extension now so that if, in the future, Agroal is able to wrap its resources and provide such a unique value then the issue would not be blocked waiting for the jta extension part of the feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/narayana Transactions / Narayana kind/enhancement New feature or request
Projects
None yet
3 participants