Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deleteBy operation easily triggers OOM and has horrible performance #3177

Open
wwadge opened this issue Sep 27, 2023 · 5 comments
Open

deleteBy operation easily triggers OOM and has horrible performance #3177

wwadge opened this issue Sep 27, 2023 · 5 comments
Assignees
Labels
type: documentation A documentation update

Comments

@wwadge
Copy link

wwadge commented Sep 27, 2023

Hello,

We recently had a production incident caused by this as a repository method:

@Transactional
void deleteAllByCreatedDateBefore(OffsetDateTime startTimeStart);

Digging in the source code, this leads to:

in our case the contents of that table was large enough that it caused an OutOfMemoryException.

I would like to argue that the default behaviour is poor:

  • There's no paging, so OOM is possible
  • It's surprising: as a developer you're not expecting a SELECT to be done when you've just issued a delete.
  • Poor performance: each row is then sent back 1 row at a time back to be deleted.
  • To work around it, we need to drop to using HQL/SQL/etc. This means we need to revisit all code-bases wherever this delete pattern was applied.

I know why it was done that way (the docs explain it, it's to fire off events "on delete") but IMHO it violates the principle of least surprise. Looking around at our microservices, I can see multiple developers taking the same wrong assumption all over the place.

At the very least, this should be done paging style though I'd much rather have it so as that you'd want to add some config/annotation to indicate that you care about such events.

Thoughts?

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 27, 2023
@mp911de
Copy link
Member

mp911de commented Sep 27, 2023

There's no paging, so OOM is possible

How would that help? JPA entities are attached to the session, and generally, paging and batch sizes are subject to tuning so all simplicity would vaporize.

Poor performance: each row is then sent back 1 row at a time back to be deleted.

You might be caused by surprise but specifically, JPA can cause cascading deletes resulting in multiple operations for one object to be deleted. However, batching in the sense of Criteria Delete queries could leverage optimizations within the Persistence Provider.

To work around it, we need to drop to using HQL/SQL/etc. This means we need to revisit all code-bases wherever this delete pattern was applied.

Any reason you do not use @Modifying @Query("DELETE …") JPQL queries?

I know why it was done that way

The rationale for this design is to be able to get hold of the deleted elements. Also, JPA lifecycle events are honored.

Part of the problem is that we use Criteria Queries to build the query object. Update and Delete Criteria do not share the same base types to create a query through EntityManager (see createQuery(CriteriaQuery) vs. createQuery(CriteriaDelete) and that complicates the way such queries have to be passed around.

I would argue that using JPA with bulk data is intricate on its own because the amount of data you can reliably process is constrained by the transactional capacity of your system.

@wwadge
Copy link
Author

wwadge commented Sep 27, 2023

How would that help? JPA entities are attached to the session, and generally, paging and batch sizes are subject to tuning so all simplicity would vaporize.

Good point and I get the cascading/optimisation JPA brings about though this is sort of assuming we want to care about cascades and events. In our case, we had a very simple table: no @onetomany etc.

We worked around it by @Modifying @query("DELETE...") as you mentioned but I guess what annoyed me most is that I have no way of stopping or warning other devs that they are about to potentially introduce behaviour they're not expecting.

It feels like though we have a way of working around it our options are:

  • Either opt for repository naming style, but risk OOM; or
  • Forced to drop to HQL/querydsl/sql

There's no option to do something like:

@DumbMode 
void deleteByFoo(...)

to avoid writing HQL.

Here's a graph of what happened when we did the @query("DELETE...") change, just as an indication of how badly this default hit us.

image

Happy to see different opinions on the matter obviously. We have a workound in place but I'd be OK to open a MR if you think this is something that's worth pursuing.

@mp911de mp911de added the for: team-attention An issue we need to discuss as a team to make progress label Sep 27, 2023
@quaff
Copy link
Contributor

quaff commented Oct 8, 2023

If the entity has no cascades and events, then bulk delete should be used instead of fetching all and delete one by one.

@s-volkov-1
Copy link

Any updates on this issue?

@mp911de
Copy link
Member

mp911de commented Oct 9, 2024

Likely, the only thing we can do here is add a bit of documentation to make its effects more obvious.

@mp911de mp911de assigned mp911de and unassigned gregturn Oct 9, 2024
@mp911de mp911de added type: documentation A documentation update and removed status: waiting-for-triage An issue we've not yet triaged for: team-attention An issue we need to discuss as a team to make progress labels Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: documentation A documentation update
Projects
None yet
Development

No branches or pull requests

6 participants