Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient SQL query on persisted actors recovery causes database overload #5419

Closed
lucavice opened this issue Dec 8, 2021 · 5 comments
Closed
Labels
akka-persistence-sql-common Akka.Persistence.Sql.Common perf
Milestone

Comments

@lucavice
Copy link
Contributor

lucavice commented Dec 8, 2021

Version Information
Reproduced with Akka 1.4.10.
Uses Akka.Persistence.SqlServer for persistance.

Describe the bug
When persisted actors are recovered, the highest sequence number is retrieved with the following query:

HighestSequenceNrSql = $@"
SELECT MAX(u.SeqNr) as SequenceNr
FROM (
SELECT e.{Configuration.SequenceNrColumnName} as SeqNr FROM {Configuration.FullJournalTableName} e WHERE e.{Configuration.PersistenceIdColumnName} = @PersistenceId
UNION
SELECT m.{Configuration.SequenceNrColumnName} as SeqNr FROM {Configuration.FullMetaTableName} m WHERE m.{Configuration.PersistenceIdColumnName} = @PersistenceId) as u";

Due to how the query is written, the max value is calculated from a temporary table that selects everything matching the specified PersistenceId , and doing an union between the journal and the metadata table. This is normally fine with actors with a low number of events for a specific PersistenceId, but the process of building the intermediate table forces the database to actually select all the matching rows and retrieve them. For actors that have a high number of events for a single PersistenceId this causes a very high I/O throughput and a long execution time.

We noticed this on our system where we have some EventProcessors actors that have million of events for a single PersistenceId. Even though they have snapshots, this query is still executed and it takes several seconds to complete. When Akka is booted and persisted actors are recovered, we have a dozen of those EventProcessors actors that compete to complete this query, causing I/O exhaustion and SQL timeouts after several minutes:

System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.SqlCommand.InternalEndExecuteReader(IAsyncResult asyncResult, String endMethod)
   at System.Data.SqlClient.SqlCommand.EndExecuteReaderInternal(IAsyncResult asyncResult)
   at System.Data.SqlClient.SqlCommand.EndExecuteReader(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location ---
   at Akka.Persistence.Sql.Common.Journal.AbstractQueryExecutor.SelectHighestSequenceNrAsync(DbConnection connection, CancellationToken cancellationToken, String persistenceId)
   at Akka.Persistence.Sql.Common.Journal.SqlJournal.ReadHighestSequenceNrAsync(String persistenceId, Int64 fromSequenceNr)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)
ClientConnectionId:1ea18be6-f53f-43aa-b760-fea225669a7f
Error Number:-2,State:0,Class:11

To Reproduce
Steps to reproduce the behavior:

  1. Run the affected SQL query on any PersistenceId. Example with example actor called CheckoutSagaManager:
    SELECT MAX(u.SeqNr) as SequenceNr FROM ( SELECT e.SequenceNr as SeqNr FROM dbo.EventJournal e WHERE e.PersistenceId = 'CheckoutSagaManager' UNION SELECT m.SequenceNr as SeqNr FROM dbo.Metadata m WHERE m.PersistenceId = 'CheckoutSagaManager') as u

  2. Analyze the execution plan, and notice how the number of rows read matches the number of events for that PersistenceId:

image

Expected behavior
This query does not need to read all rows, it should only select the MAX by using the existing index

Environment
Windows

Solution
The solution to this problem is very simple, as it is sufficient to include the MAX statement in the sub-queries in the UNION, so that the database is not forced to retrieve all these rows.

I'm currently working on the fix, and I will create PR with resolution shortly.

@Aaronontheweb Aaronontheweb added perf akka-persistence-sql-common Akka.Persistence.Sql.Common labels Dec 8, 2021
@Aaronontheweb Aaronontheweb added this to the 1.4.29 milestone Dec 8, 2021
@Aaronontheweb
Copy link
Member

Yep, looks like a bad query.

@Aaronontheweb
Copy link
Member

This fix will be available in tonight's nightly: https://getakka.net/community/getting-access-to-nightly-builds.html

We're scheduled to do a new release soon and this will be included in it.

@Aaronontheweb
Copy link
Member

closed via #5420

@lucavice
Copy link
Contributor Author

@Aaronontheweb, just dropping by to say that I've tested release 1.4.29 and I can see in SQL profiler the updated query being applied and outputting expected result.

image

@Aaronontheweb
Copy link
Member

@lucavice excellent, we're going to release updates with this fix to all of the SQL Akka.Persistence plugins this shortly too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
akka-persistence-sql-common Akka.Persistence.Sql.Common perf
Projects
None yet
Development

No branches or pull requests

2 participants