Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Narayana automatic recovery of JDBC XA resources only works after datasources was initialized manually #35238

Closed
michalvavrik opened this issue Aug 7, 2023 · 12 comments · Fixed by #35262
Labels
area/narayana Transactions / Narayana kind/bug Something isn't working
Milestone

Comments

@michalvavrik
Copy link
Member

Describe the bug

If I want periodic recovery module to recover my failed transaction stored in JDBC object store, I need first to have datasources initialized (e.g. when I sent request or with startevent I use datasources).

Expected behavior

When app starts, failed transactions recover.

Actual behavior

Exception and no recovery:

2023-08-06 23:56:26,715 WARN  [com.arj.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a0000b9:af1b:64d01657:11, node_name=quarkus-quickstart, branch_uid=0:ffff0a0000b9:af1b:64d01657:17, subordinatenodename=null, eis_name=0 >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@55ea0fec >
2023-08-06 23:56:26,719 WARN  [com.arj.ats.jta] (Periodic Recovery) ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a0000b9:af1b:64d01657:11, node_name=quarkus-quickstart, branch_uid=0:ffff0a0000b9:af1b:64d01657:13, subordinatenodename=null, eis_name=0 >

Recovery doesn't work till I use datasources myself. @zhfeng explained to me here quarkusio/quarkus-quickstarts#1311 (comment) that there was changes in Agroal regarding ds initialization https://github.com/quarkusio/quarkus/blob/main/extensions/agroal/runtime/src/main/java/io/quarkus/agroal/runtime/AgroalRecorder.java#L28-L30.

I've tested Zheng suggestion with following code that did a trick (I do not propose use such thing! it's just for context):
quarkusio/quarkus-quickstarts#1311 (comment)

How to Reproduce?

Reproducer:

follow readme added here quarkusio/quarkus-quickstarts#1311. There are steps that you need to follow, you will see that if you don't do curl -v localhost:8080/transaction-logs/annotation-way, transaction is never recovered and an exception is repeatedly logged.

Output of uname -a or ver

Fedora

Output of java -version

OpenJDK Runtime Environment Temurin-17.0.7+7

GraalVM version (if different from Java)

22.3

Quarkus version or git rev

999-SNAPSHOT

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.3

Additional information

Related to #34545 ? Didn't check (on PTO).

@michalvavrik michalvavrik added kind/bug Something isn't working area/narayana Transactions / Narayana labels Aug 7, 2023
@michalvavrik
Copy link
Member Author

michalvavrik commented Aug 7, 2023

@gsmet this looks like important for Ghost, would it be possible to get fix in 2.3.4.Final, please?

@michalvavrik
Copy link
Member Author

michalvavrik commented Aug 7, 2023

btw maybe even more important fact is that application don't even start when I test quickstart with 3.2.x.Final due to exception also explained here quarkusio/quarkus-quickstarts#1311 (comment)

@gsmet
Copy link
Member

gsmet commented Aug 7, 2023

Do we actually have tests for this feature in the tree? Because it looks very broken.

As for knowing if we can get a fix in 3.2.4, we don't have a fix right now so I don't know.

@michalvavrik
Copy link
Member Author

michalvavrik commented Aug 7, 2023

Do we actually have tests for this feature in the tree? Because it looks very broken.

The test that is there doesn't actually test the recovery feature.

As for knowing if we can get a fix in 3.2.4, we don't have a fix right now so I don't know.

Sorry about that, I just wanted to draw your attention to this as I don't know when is 3.2.4 happening and wanted you to know about this.

Anyway, it feels like fixable for Narayana recorder just needs to be run when datasources can be initialized and make sure it initializes them if recovery is enabled. I presume Narayana folk will have a look @mmusgrov @zhfeng , but in unlikely case everyone is busy, let me know, I can look.

@gsmet
Copy link
Member

gsmet commented Aug 7, 2023

Given the time frame, let's focus on getting it to work for 3.2 and let's keep 3f9de0c out of the loop. We will have to discuss it later.

As for 3.2, there's at least one big issue: using addLastShutdownTask to register the shutdown of the recovery service will make it stop AFTER ArC has been shut down. Thus there's no way for you to actually get the datasource.

I'll have a look at this this afternoon but still unsure if I will be able to get to the bottom of it.

@michalvavrik
Copy link
Member Author

Thank you

@gsmet
Copy link
Member

gsmet commented Aug 7, 2023

@michalvavrik I was able to play with your quickstart with 3.2, with this applied: #35242 .

The interesting part is the last commit, the first ones are just backports.

That being said, I'm not completely happy and we probably need to get some people from Quarkus (including people working on CDI) and Narayana in the same virtual room to sort out this problem and come up with a solid solution.

As for main, there's the above and the other issue of datasources not being initialized when Narayana needs them that @geoand probably introduced in 3f9de0c . I'm not completely sure if this commit needs love or if it's Narayana's side that needs fixing. We probably need to talk more about it.

@michalvavrik I let you have a look at the commit and tell me what you think?

@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

I would not be surprised if the CDI handling in Narayana is not correct - I came across plenty of that in the OpenTelemetry extension.

That is not a slight against anybody, it's just stuff that is very easy to get wrong due to the complicated dance during startup.

That said, I won't be around for the next 10 days or so.
If we still have the problem towards the end of the month, I can have a look

@gsmet
Copy link
Member

gsmet commented Aug 7, 2023

Yeah I think we need to take a step back and understand exactly what are the requirements/dependencies of the various services and then get it fixed. It's not an easy task though so let's tackle this when we're out of August.

@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

I absolutely agree

@michalvavrik
Copy link
Member Author

Added comments to the linked PR for 3.2, thanks.

@zhfeng
Copy link
Contributor

zhfeng commented Aug 8, 2023

Thanks @gsmet for taking a look at this issue!

We don't have any test for transaction crash recovery feature automatically right now and can only run it manually because it needs to crash the application at first and then restart it to watch the recovery happening. I'm not sure if it is possible to work with the quarkus test framework? e.g. restart a application after crashing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/narayana Transactions / Narayana kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants