deleteBy operation easily triggers OOM and has horrible performance #3177
How would that help? JPA entities are attached to the session, and generally, paging and batch sizes are subject to tuning, so all the simplicity would evaporate.
You might be caught by surprise, but specifically, JPA can cause cascading deletes, resulting in multiple operations for a single object being deleted. However, batching in the sense of Criteria Delete queries could leverage optimizations within the Persistence Provider.
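For illustration, a minimal (hypothetical) mapping where deleting one object fans out into several delete operations; the `Parent`/`Child` names are made up and not from this thread:

```java
import jakarta.persistence.*;
import java.util.ArrayList;
import java.util.List;

@Entity
class Parent {
    @Id @GeneratedValue
    Long id;

    // Cascaded removal: removing one Parent also removes each of its children,
    // so a single logical delete turns into several DELETE statements.
    @OneToMany(mappedBy = "parent", cascade = CascadeType.REMOVE)
    List<Child> children = new ArrayList<>();
}

@Entity
class Child {
    @Id @GeneratedValue
    Long id;

    @ManyToOne
    Parent parent;

    @PreRemove
    void beforeRemove() {
        // lifecycle callback that a single bulk DELETE statement would never fire
    }
}
```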
Any reason you do not use
The rationale for this design is to be able to get hold of the deleted elements. Also, JPA lifecycle events are honored. Part of the problem is that we use Criteria Queries to build the query object; Update and Delete Criteria do not share the same base types to create a query through. I would argue that using JPA with bulk data is intricate on its own, because the amount of data you can reliably process is constrained by the transactional capacity of your system.
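For context, a rough sketch (not the Spring Data internals) of what a bulk delete through the Criteria API looks like when issued directly against the `EntityManager`; the `Account` entity and its `status` attribute are made-up names:

```java
import jakarta.persistence.EntityManager;
import jakarta.persistence.criteria.CriteriaBuilder;
import jakarta.persistence.criteria.CriteriaDelete;
import jakarta.persistence.criteria.Root;

int bulkDeleteExpired(EntityManager em) {
    CriteriaBuilder cb = em.getCriteriaBuilder();

    // CriteriaDelete is a separate hierarchy from CriteriaQuery (select),
    // which is part of why the derived execution cannot simply be switched over.
    CriteriaDelete<Account> delete = cb.createCriteriaDelete(Account.class);
    Root<Account> root = delete.from(Account.class);
    delete.where(cb.equal(root.get("status"), "EXPIRED"));

    // A single DELETE statement: bypasses the persistence context,
    // cascades and entity lifecycle events.
    return em.createQuery(delete).executeUpdate();
}
```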
Good point, and I get the cascading/optimisation benefits JPA brings, though this is sort of assuming we want to care about cascades and events. In our case we had a very simple table: no `@OneToMany` etc. We worked around it with `@Modifying @Query("DELETE...")` as you mentioned, but what annoyed me most is that I have no way of stopping or warning other devs that they are about to introduce behaviour they're not expecting. It feels like, even though we have a way of working around it, our options are:
There's no option to do something like:
to avoid writing HQL. Here's a graph of what happened when we made the `@Query("DELETE...")` change, just as an indication of how badly this default hit us. Happy to hear different opinions on the matter, obviously. We have a workaround in place, but I'd be OK with opening an MR if you think this is something worth pursuing.
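A minimal sketch of the two variants being compared here, assuming a hypothetical `Event` entity with an `expiresAt` property (all names are placeholders, not from the original report):

```java
import java.time.Instant;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.transaction.annotation.Transactional;

public interface EventRepository extends JpaRepository<Event, Long> {

    // Derived delete: SELECTs every matching entity, then removes them one by one
    // (cascades and lifecycle events fire; memory usage grows with the result set).
    void deleteByExpiresAtBefore(Instant cutoff);

    // Workaround: a single bulk DELETE statement, no entities loaded,
    // no cascades and no lifecycle events.
    @Modifying
    @Transactional
    @Query("delete from Event e where e.expiresAt < :cutoff")
    int deleteExpired(@Param("cutoff") Instant cutoff);
}
```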
If the entity has no cascades and events, then a bulk delete should be used instead of fetching everything and deleting one by one.
Any updates on this issue?
Likely, the only thing we can do here is add a bit of documentation to make its effects more obvious. |
Hello,
We recently had a production incident caused by this behaviour in a repository method:
Digging into the source code, this leads to:
spring-data-jpa/spring-data-jpa/src/main/java/org/springframework/data/jpa/repository/query/JpaQueryExecution.java
Line 295 in 7cdf53f
In our case the contents of that table were large enough that it caused an OutOfMemoryError.
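For readers following along, a simplified sketch (not the actual Spring Data source) of what that execution path effectively does; `Event`, `expiresAt` and `cutoff` are placeholder names:

```java
import jakarta.persistence.EntityManager;
import java.time.Instant;
import java.util.List;

void derivedDelete(EntityManager em, Instant cutoff) {
    // Step 1: the entire matching result set is materialised as managed entities,
    // which is what blew the heap for a large table.
    List<Event> matches = em.createQuery(
            "select e from Event e where e.expiresAt < :cutoff", Event.class)
        .setParameter("cutoff", cutoff)
        .getResultList();

    // Step 2: each entity is removed individually so cascades and lifecycle
    // events fire, at the cost of at least one DELETE per row.
    for (Event match : matches) {
        em.remove(match);
    }
}
```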
I would like to argue that the default behaviour is poor:
I know why it was done that way (the docs explain it; it's to fire off events "on delete"), but IMHO it violates the principle of least surprise. Looking around at our microservices, I can see multiple developers making the same wrong assumption all over the place.
At the very least, this should be done in a paging style (sketched below), though I'd much rather have it so that you'd have to add some config/annotation to indicate that you care about such events.
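One possible shape of the paging idea, sketched under the assumption that the repository exposes a `Slice`-returning finder such as `findByExpiresAtBefore(Instant, Pageable)` (all names are hypothetical). In practice each chunk would also need its own transaction, or a flush/clear, so the persistence context doesn't grow without bound:

```java
import java.time.Instant;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Slice;

public void deleteExpiredInChunks(EventRepository repository, Instant cutoff) {
    Slice<Event> chunk;
    do {
        // Always fetch "page 0": deleting the previous chunk shifts the remaining rows forward.
        chunk = repository.findByExpiresAtBefore(cutoff, PageRequest.of(0, 500));

        // deleteAll(...) still removes entities one by one, so lifecycle events keep firing,
        // but memory usage is bounded by the chunk size.
        repository.deleteAll(chunk.getContent());
    } while (chunk.hasNext());
}
```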
Thoughts?