Fixed first/last_indexed, improved import perf, support for mariadb connector #4108
base: dev
Conversation
Thanks, @damien-git! Since I'm out of office this week, I haven't had time to give this an especially close look yet, but I did run the full integration test suite on this branch in my local test environment using MySQL and everything passed. That's hardly an exhaustive test, but at least there's no catastrophic issue there. I would like to do the same thing for PostgreSQL, but upgrading to Ubuntu 24 in my VM seems to have broken my PostgreSQL install; I'll need to figure that out when time permits so I can do further testing. I'll try to find time next week when I'm back in office, though with things piling up, it's possible it will take me a little longer! |
Just a quick note to say that I'm making some progress on getting my PostgreSQL test environment working. There have been quite a few distractions this week, but I hope to get back to this PR next week, once #4149 is merged. |
@demiankatz Thanks. I will be on vacation from Dec 13 to Jan 3 (inclusive), and then I will be working on FOLIO development again for 6 months. If we can't finish it before I go on vacation, maybe someone else on my team can pick this up in January, or maybe I will find some time to fix issues in January. |
Enjoy your vacation, @damien-git! If I can make more progress on this early next week, I will try! I've just opened another PR (#4151) that will also be necessary to move this forward with PostgreSQL, but hopefully that won't take long to get approved. |
@maccabeelevine and @mtrojan-ub, since we've had conversations related to Java optimization and database logic, I thought you might have interest in this PR. No obligation to look if you're busy or not interested, but this seems like something that would benefit from more eyes! (I'm still waiting to get some PostgreSQL bug fix PRs merged so that I can move forward with testing more scenarios on my end). |
Thanks again, @damien-git! With all of my PostgreSQL fixes now in place, I was able to run the full test suite here using PostgreSQL, and it seems to be passing, just as MySQL did previously.
It would probably be wise to test this with something a bit more robust than the standard test suite, though I'm not sure exactly what scale would be necessary to constitute an adequate test. Open to suggestions!
In any case, I'm going to approve this but leave it open for a few more days in case others have comments or report different results.
This sounds like a great improvement, thank you very much for implementing it! I can't really judge whether the implementation is entirely correct, but I can confirm that first/last_indexed fields are usually the fields with the worst performance during the import process, so there is a lot of potential in optimizing this. I also cannot see any obvious bugs on a first look at the implementation. |
Thanks for taking a look, @mtrojan-ub. Do you want to try this out in your local environment before we merge it, or is that not feasible? If nothing else, I can try doing a larger import in my test environment to see if any problems emerge before merging this. |
@demiankatz: Time until Christmas is very short, so it might take until January for me to test this, sorry. If you have the time, it would be great if you could test this. |
Thanks, @mtrojan-ub. If I have time to do more testing of my own, I'll share the results here. In any case, if you think it's worth doing testing on your end in January, I'll hold off on merging until you have time to do that. |
Unfortunately, there does seem to be a problem here.
I decided to test what happens if I turn on change tracking for all records in our standard test environment. Here's what I did:
1. I started up the standard test environment with MySQL.
2. I copied and uncommented the first_indexed and last_indexed lines from import/marc_local.properties to import/marc.properties (to ensure that they would get picked up by the Phing build process).
3. I ran vendor/bin/phing import_biblios to reimport all of the test records.
What I expected: 328 entries in the change_tracker table (326 for the real records, plus the 2 fake deleted records we set up during the build process).
What I got: only the entries from the initial startup; the larger-scale import failed with these errors:
ERROR [Thread-3] (UpdateDateTracker.java:82) - SQLException in possiblyExecuteBatch(): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ',('biblio', 'dot.dash-underscore__3.colon:18', '2024-12-16 15:11:03.741921', '20' at line 1
ERROR [Thread-3] (UpdateDateTracker.java:183) - SQLException in shutdown hook: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ',('biblio', 'dot.dash-underscore__3.colon:18', '2024-12-16 15:11:03.741921', '20' at line 1
My guess would be that there's some kind of escaping problem where the records with unusual IDs are breaking the SQL. I'm not sure why that would be the case, though! It seems like the abstraction layer is being used appropriately. However, when I repeat my test procedure on the dev branch, I do get the expected result and no errors... so the problem is somehow introduced by the changes here.
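To illustrate the escaping concern discussed above: when a batched multi-row INSERT is built with JDBC placeholders, record IDs never appear in the SQL text itself, so unusual characters cannot break the syntax. The sketch below is hypothetical (it is not VuFind's actual code, and the change_tracker column list is assumed), but it shows the placeholder-based shape a safe multi-row statement would take before values are bound via PreparedStatement.setString().

```java
// Hypothetical sketch: build a multi-row INSERT for the change_tracker table
// using placeholder groups. IDs like "dot.dash-underscore__3.colon:18" are
// bound later via PreparedStatement, so they never touch the SQL text.
public class MultiRowInsertSketch {
    static String buildInsert(int rows) {
        StringBuilder sql = new StringBuilder(
            "INSERT INTO change_tracker (core, id, first_indexed, last_indexed) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) {
                sql.append(",");
            }
            sql.append("(?, ?, ?, ?)"); // one placeholder group per queued record
        }
        return sql.toString();
    }
}
```

If the driver instead rewrites a batch into a concatenated multi-row statement with literal values, quoting bugs in an old connector could produce exactly the kind of syntax error shown in the logs above.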
Addendum: I decided to try this with PostgreSQL as well to see if I would get different results. Following the previous experiment, I simply ran the import again. In this scenario, I get the expected result. So it seems the problem may somehow be MySQL-specific. |
Update: the problem seems to be related to the very old MySQL JDBC .jar included in the project. Things work correctly with a newer version. I will open a separate PR to upgrade this. |
Okay, #4162 should fix the MySQL problems I encountered. |
Okay, #4162 has been merged to dev, and I've merged dev into this branch. This should now be in reasonable shape for @mtrojan-ub to test in the new year, if time permits. In the meantime, one question for @damien-git: is there any possibility that using a more up-to-date MySQL jar eliminates the need for a separate MariaDB jar? (I suspect the answer is no, but just double-checking in case there's an opportunity to simplify). |
Thanks, @demiankatz. I was thinking it would be a good idea to update the other drivers, but I didn't want to mix that up with this ticket and I didn't have the time to test it. We do need a separate MariaDB jar for Java; the two drivers have diverged noticeably, especially in regard to bulk edits. |
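One concrete consequence of the two drivers diverging is that each connector registers its own JDBC URL scheme (jdbc:mysql:// for MySQL Connector/J, jdbc:mariadb:// for MariaDB Connector/J). The tiny helper below is purely illustrative (jdbcUrl is a hypothetical function, not part of VuFind); it just shows how a driver name would map to the corresponding connection URL.

```java
// Hypothetical helper: map a configured driver name to its JDBC URL scheme.
// MySQL Connector/J answers to jdbc:mysql://, MariaDB Connector/J to
// jdbc:mariadb://, which is why the config change in this PR matters.
public class JdbcUrlSketch {
    static String jdbcUrl(String driver, String host, String database) {
        if (!driver.equals("mysql") && !driver.equals("mariadb")) {
            throw new IllegalArgumentException("unsupported driver: " + driver);
        }
        return "jdbc:" + driver + "://" + host + "/" + database;
    }
}
```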
VuFind 10 includes a PR that modifies UpdateDateTracker.java to avoid a deprecated method. After we started using that in prod, our Galera cluster crashed. Disabling first/last_indexed fixed it. We are guessing Galera could not keep up with the frequent statement creations during import. This PR reverts the UpdateDateTracker PR. The finalize() code could have simply been removed, because DatabaseManager closes the database connection, and hence the statements, on shutdown.

It also improves performance by sending database updates in batches. This required changing DatabaseManager to call the shutdown in UpdateDateTracker before the connection is closed; otherwise the statements could not have been used reliably to finish sending database data in batches. When the change_tracker table is empty and only database inserts are used, import performance improved by 30% in a test.

Sending batches is done a bit differently in MySQL and MariaDB. We are using MariaDB, so I had to add specific support for MariaDB and include the MariaDB JDBC connector. In PHP code the MySQL driver can be used, but support for the mariadb connection string is needed.

NOTE 1: this change will require MariaDB users to change their config to use mariadb as the database driver (or in the connection string). If they don't, first/last_indexed import might fail with a syntax error.

NOTE 2: this code has only been tested with MariaDB so far, but it also changes execution for MySQL and PostgreSQL.
TODO