Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Batch ordering - Regression introduced in version 13.26.1, insertAll() leaves batchMode=true enabled ... when it should not #3487

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

nPraml
Copy link
Contributor

@nPraml nPraml commented Oct 2, 2024

Hello @rbygrave,

we are currently in the process of updating from ebean 13 to 14 and we encountered a bug, see the test case.

The entity Contact has a @ManyToOne property (without cascadeType) ContactGroup.

In a transaction we store some elements using DB.insertAll(). Then we save a contact (without a group), we create a group with a certain id, we save it and then we put the group in the contact.

With transaction flush, however, the order in the queue is not correct: contact (with group) is saved first, but group does not yet exist in the DB.

What we discovered during debugging: after DB.insertAll(), txn.batchMode = true is set (this was not the case with ebean 13) and during flush the objects contact and group are in the wrong order in the queue.

With ebean 13 you have to explicitly call txn.setBatchMode(true) for the test to fail.

Could you please take a look and help us fix this?

Cheers,
Noemi

@rPraml FYI

@nPraml
Copy link
Contributor Author

nPraml commented Oct 2, 2024

if I modify insertAll on the same way as saveAll:

image

it fixes the test, but the test case fails if I use instead insertAll a trx.setBatchMode(true)

@rbygrave
Copy link
Member

rbygrave commented Oct 8, 2024

txn.flushBatchOnCollection()

Yes, this bug is a regression that was introduced in version 13.26.1 by #3324 line - https://github.com/ebean-orm/ebean/pull/3324/files#diff-0804968369579e408eea483a9e582485cd7faf94e5d78dc8f7a918c729962a70R1695

It should have that txn.flushBatchOnCollection()

but the test case fails if I use ... trx.setBatchMode(true)

That is expected by me.

The thinking here is that this isn't a case of "Cascading" persistence [where ebean is determining the "depth" and ordering based on the "depth"] but instead a case where both Contact and ContactGroup are explicitly saved as "Top Level" via database.save()

So when we have:

...
database.save(contact);
...
database.save(contactGroup);

The contact was saved before the contactGroup ... and these are both "Top Level" ... and so the ordering is then down to the order in which those are saved, which is contact first.

Does that make sense?

@rbygrave rbygrave changed the title Bug: Batch ordering Bug: Batch ordering - Regression introduced in version 13.26.1, insertAll() leaves batchMode=true enabled ... when it should not Oct 8, 2024
@rPraml
Copy link
Contributor

rPraml commented Oct 8, 2024

Hello Rob, unfortunately the places in our code are a bit spread out. One part of the code creates the contacts, the other the contact groups. A third then takes care of mapping both.
Imagine there are 3 CSV files (contacts.csv, groups.csv and mapping.csv) I tried to model this in a unit test

mappingCsv.forEach((contactId, groupId) -> {
Contact contact = createdContacts.get(contactId);
contact.setGroup(DB.reference(ContactGroup.class, groupId));
DB.save(contact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last code part does some post-processing on some of the objects (the objects may not be saved, yet due ebean batching)

Yes, there are several ways, how to solve this issue

  • do a DB.saveAll(createdContacts.value())
  • disable batching
  • do a flush
  • write a better import-code 😉

I'm unsure, if ebean can do much here. It must know or adjust the save-order, but this may rise up new problems

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure, if ebean can do much here

I know its an example but to me its pretty clear that import code should look to first materialise and save the [ContactGroup] beans that will be later referenced by other beans [Contact].

Ebean gives the application exact control over this. With that control, this import can be done as batch inserts of ContactGroup followed by batch inserts of Contact with no actual updates required [optimal handling of this from the database perspective].

You are going to have a hard time convincing me that ebean should change behaviour here.

What we want is for ebean to give developers the ability to control exactly how this works because at certain levels of scale we absolutely need that control. Trying to get "fancy" here would not end well.

do a DB.saveAll(createdContacts.value())

So, do that and rely on ebean cascade persisting from Contact to ContactGroup [and then ebean will determine the ordering because we are now using cascade persist and ebean will get it right]. Yes, a good option.

disable batching

At some scale people regret this. I say invest the time and think about ordering and stay with jdbc batch [and get orders of magnitude performance benefit from that investment]. I'd personally never go without batching here.

do a flush

If we really must. Doing so, implies that we'll see extra updates being used when if the ordering was correct the app would avoid those extra updates. So generally not optimal.

write a better import-code

IMO this is actually the best answer. Actually spend some time thinking about the ordering and design import processing with that in mind, looking to use JDBC batch and avoiding extra updates when possible.

@rPraml
Copy link
Contributor

rPraml commented Oct 8, 2024

You are going to have a hard time convincing me that ebean should change behaviour here.

Everything is fine, we just noticed this "regression" because with ebean 14 some unit tests in our application failed because batching was active after insertAll.

We have the situation here that many people work on our code base and (unfortunately) often produce suboptimal code, as in this case.

Fortunately, we also have a lot of unit tests, so we can find most of these errors quickly.

If you say it "works as planned", then I also have good arguments in our company that maybe some parts of the code need to be revised.

@rPraml rPraml force-pushed the batch-ordering-bug branch from 3286276 to bf6c7e2 Compare October 8, 2024 11:13
@rPraml rPraml force-pushed the batch-ordering-bug branch from bf6c7e2 to 8b4804b Compare October 8, 2024 11:17
@rPraml
Copy link
Contributor

rPraml commented Oct 8, 2024

Stripped down the testcase

@rob-bygrave
Copy link
Contributor

Stripped down the testcase

Beautiful, love it, thanks !!

I'm happy with this PR. As this is a "Regression Bug" I think this should go into version 14.7.0 (not patch version).

So, wondering if there is anything that should go ahead into a 14.6.1 release or if the next release is 14.7.0 with this change. I suspect we are not going to have a 14.6.1 at this stage.

@rbygrave rbygrave added this to the 14.6.1 milestone Oct 10, 2024
@rbygrave rbygrave merged commit a2ee8b6 into ebean-orm:master Oct 10, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants