# Performance
This page has tips for getting better performance out of Nevermore.
Nevermore 12.0+ is generally faster for most queries, and with larger documents, uses much less memory. If you're on an older version, consider upgrading. See what's new in Nevermore 12.0+ for benchmarks.
Nevermore is optimized particularly around reading, with a focus on being fast (similar to hand-coded `SqlCommand` speed) and with very few large memory allocations. This section explains a bit about how it all works.
When querying or loading documents, Nevermore uses a different "strategy" depending on the type being queried (primitive, value tuple, arbitrary class, or a mapped document). Each strategy behaves a little differently, but the core idea is the same.
When Nevermore calls `ExecuteReader`, it specifies the `SequentialAccess` command behavior for the data reader, which allows Nevermore to read and process data as it streams in from SQL Server without buffering the entire row. This is especially important for large documents. Each Nevermore strategy ensures data is only read in the correct order (i.e., field 2 is read before field 3, and so on).
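As an illustration only - this is raw ADO.NET, not Nevermore's actual code, and the table, column names, and `connection` variable are made up - sequential access looks like this:

```csharp
using System.Data;
using System.Data.Common;

// Assumes an open DbConnection called `connection`.
using var command = connection.CreateCommand();
command.CommandText = "select [Id], [FirstName], [JSON] from dbo.Contact";

// SequentialAccess streams the row: each column must be read in order, once.
using var reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
while (reader.Read())
{
    var id = reader.GetString(0);        // must read column 0...
    var firstName = reader.GetString(1); // ...before column 1...
    var json = reader.GetString(2);      // ...before column 2
}
```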
Most strategies keep a cache of queries they've seen before, and a compiled func that knows how to execute the reader for that particular query.
- For primitives, it creates a func for each type you query (`int`, `string`)
- For ValueTuples, it creates a func for each tuple combination you query (`ValueTuple<int, int>`, `ValueTuple<string, int>`)
- For arbitrary classes (without a document map) and documents, it creates one per type + returned columns
The compiled func will look as much as possible as if it was written by hand. For example, when reading a document with two mapped columns, it will generate and compile something like this:
```csharp
(DbDataReader reader, DocumentReaderContext context) =>
{
    var temp0 = reader.GetString(0);                                  // Id
    var temp1 = reader.GetString(1);                                  // FirstName
    var temp2 = reader.IsDBNull(2) ? (int?)null : reader.GetInt32(2); // Age
    var result = context.DeserializeText<Contact>(reader, 7);
    result.Id = temp0;
    result.FirstName = temp1;
    result.Age = temp2;
    return result;
}
```
The above is overly simplified - there are extra null checks (e.g., in case a column value is `DBNull.Value`), and some field assignments to keep track of which column is being read in case there's an error, and so on. But you get the picture. Note that each field is read and converted into a temporary variable, as we can't deserialize until we encounter the JSON column, and the JSON column is usually not at the start of the query.
If you're curious, look at `DocumentReaderExpressionBuilder` in the source code. Also, if your query causes an exception, the error message will include the source code for the generated reader in case it isn't what you expect.
The func that is compiled out of this will be fast, but building it is slow, so the result is kept for future queries.
Nevermore can store JSON either as text, or in a compressed format.
For JSON as text, there are two strategies: one when we know there will always be small documents, and one if we think there could be large documents.
- Small documents: Nevermore simply calls `DbDataReader.GetString()` and then passes the result to JSON.NET for deserialization
- Large documents: Nevermore uses a custom `TextReader` that calls through to `DbDataReader.GetChars()`, which allows it to read and process the text directly into a buffer as needed, without loading the entire document into memory
The benefit of the large document approach is significant: Nevermore can completely avoid LOH allocations and keep overall memory usage down. However, for small documents (1K or smaller) it's slightly slower, which is why there are two strategies.
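To make the large-document strategy concrete, here is a minimal sketch (illustrative, not Nevermore's actual class) of a `TextReader` over `DbDataReader.GetChars()`; JSON.NET can consume it via `JsonTextReader`, so the document never exists as one big string:

```csharp
using System.Data.Common;
using System.IO;

class DataReaderTextReader : TextReader
{
    readonly DbDataReader reader;
    readonly int ordinal; // column index of the JSON text
    long offset;          // characters consumed so far

    public DataReaderTextReader(DbDataReader reader, int ordinal)
    {
        this.reader = reader;
        this.ordinal = ordinal;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        // Copy the next chunk of the column into the caller's buffer;
        // GetChars returns 0 once the value is exhausted.
        var read = reader.GetChars(ordinal, offset, buffer, index, count);
        offset += read;
        return (int)read;
    }
}
```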
How does Nevermore know which strategy to use? It looks at an `ExpectLargeDocuments` property on the document map, which defaults to `false`. If it's `true`, Nevermore always uses the large document strategy.
By default, `ExpectLargeDocuments` is `false` - but don't worry. If Nevermore reads a string larger than 1K when deserializing, it will set `ExpectLargeDocuments` to `true` and remember it for next time (until the application restarts). So in this way, Nevermore is self-tuning.
This means:

- If you know for sure that the table will have at least a few large documents, set `ExpectLargeDocuments` to `true`.
- If you are very confident it won't - or at least that for some customers it won't - leave it as `false`.
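For example, a map that opts in up front might look like this (a sketch - the `Contact` map is illustrative, and it assumes `ExpectLargeDocuments` can be set from the map's constructor):

```csharp
public class ContactMap : DocumentMap<Contact>
{
    public ContactMap()
    {
        Column(m => m.FirstName);
        Column(m => m.Age);

        // This table is known to hold some very large documents, so skip
        // the self-tuning and always use the streaming strategy.
        ExpectLargeDocuments = true;
    }
}
```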
In Octopus Deploy's case, roughly 80% of documents fit the small category (1K or under). Looking at large customer databases, only certain tables tend to have clusters of large documents.
For compressed JSON, Nevermore calls `GetStream` on the data reader, and sends it through a `GZipStream` to decompress it on the way to the JSON.NET deserializer.
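Conceptually, the compressed read path looks like this (a sketch only - the column ordinal and document type are assumptions):

```csharp
using System.Data.Common;
using System.IO;
using System.IO.Compression;
using Newtonsoft.Json;

static Contact ReadCompressedJson(DbDataReader reader, int ordinal)
{
    using var raw = reader.GetStream(ordinal);                        // stream the column, don't buffer it
    using var gzip = new GZipStream(raw, CompressionMode.Decompress); // decompress as bytes arrive
    using var text = new StreamReader(gzip);
    using var json = new JsonTextReader(text);
    return JsonSerializer.CreateDefault().Deserialize<Contact>(json);
}
```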
Nevermore also makes use of array pools when possible to reduce allocations, though all buffers used are quite small anyway.
Once the expression is compiled, the actual reading is simply done one row at a time by calling the compiled func.
If you call `transaction.Stream`, you'll get one row at a time. This is good if you need to process the rows but don't need them all in memory at once.
If you use `transaction.Query()`, you have the choice of `Stream()`, `ToList()` and so on. Use `Stream()` if possible.
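For example (assuming a mapped `Customer` document and a `Process` method of your own):

```csharp
// Streams rows one at a time; only the current document is in memory.
foreach (var customer in transaction.Query<Customer>().Stream())
    Process(customer);

// Buffers the entire result set into a list - fine for small results,
// wasteful for large ones.
var everyone = transaction.Query<Customer>().ToList();
```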
In general, most optimization work in Nevermore is focused on reading data. For writing, the focus is mostly on memory use, especially with large documents.
When inserting or updating documents, Nevermore simply generates a SQL statement with the fields you'd expect, generates the command arguments, and executes it. The properties are read from the object into the `SqlCommand` using compiled funcs (one for each property, cached on the document map).
When serializing JSON (as text or compressed), Nevermore first has to serialize the document to a memory stream. For this, we use `RecyclableMemoryStream`, which in testing works much better, especially when documents can be quite large. The memory stream is then passed to the `SqlCommand` as a stream - when the command is executed, the stream is read. Again, this means there should be no LOH allocations when dealing with even very large documents.
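A sketch of that write path for a compressed JSON column (the `streams` manager, `document` variable, `command`, and parameter name are all illustrative, not Nevermore's actual code):

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;
using Microsoft.IO;
using Newtonsoft.Json;

// One manager per application; it pools the underlying buffers.
var streams = new RecyclableMemoryStreamManager();

// Serialize (and here, compress) into a pooled, chunked stream -
// no single large string or byte[] is ever allocated.
using var buffer = streams.GetStream();
using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
using (var writer = new StreamWriter(gzip))
    JsonSerializer.CreateDefault().Serialize(writer, document);
buffer.Position = 0;

// Pass the stream to the command; it is read when the command executes.
var parameter = command.Parameters.Add("@JSON", SqlDbType.VarBinary, -1);
parameter.Value = buffer;
```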
When JSON is compressed, Nevermore uses `GZipStream` with the compression level set to `Optimal` by default. This is because we assume the SQL Server database will be on another server, and the goal is to reduce network I/O on large documents. Documents are written infrequently but read often, so this seems like a good tradeoff. You can, however, change the compression level if you need to.
Beyond reading and writing documents, Nevermore has a few other operations - `ExecuteQuery`, `ExecuteReader` and so on. These all call the corresponding method on `SqlCommand`, so there isn't much to optimize.
> Whenever a query is run for the first time in SQL Server, it is compiled and a query plan is generated for the query. Every query requires a query plan before it is actually executed. This query plan is stored in the SQL Server query plan cache. This way, when that query is run again, SQL Server doesn't need to create another query plan; rather, it uses the cached query plan, which improves database performance.
>
> From https://www.sqlshack.com/understanding-sql-server-query-plan-cache/
What's not obvious is that a lot of things can determine whether SQL Server can re-use the query plan. These include:
- The names of parameters (`@firstname1` vs. `@firstname2`)
- The types and sizes of parameters (`@firstname1 nvarchar(32)` vs. `@firstname1 nvarchar(27)`)
- Casing and other things
These can make a significant difference to query performance. To make best use of the query plan cache, Nevermore does a few things:
## Sizes of string variables
When you add a string parameter to a `SqlCommand`, by default it's actually defined with the length of the string value. For example, if you do two queries with different length strings as parameters, you'll get two query plans.
To solve this, when adding string parameters, Nevermore sets the declared parameter size by rounding the string length up into buckets. The buckets are:
- `nvarchar(100)`
- `nvarchar(200)`
- `nvarchar(600)`
- `nvarchar(1000)`
So if you do these queries:
```csharp
transaction.Query<Customer>().Where(c => c.Email == "jo@example.com").FirstOrDefault();
transaction.Query<Customer>().Where(c => c.Email == "alexandra@example.com").FirstOrDefault();
```
The `@email` parameter will be passed as `nvarchar(100)` for both, and SQL Server can reuse the query plan.
## IN queries
When you perform a query like this:
```csharp
var surnames = new[] { "Robinson", "Smith" };
transaction.Query<Customer>().Where(c => c.LastName.In(surnames)).FirstOrDefault();
```
Nevermore will generate an `IN` query:
```sql
select * from dbo.Customer where LastName in (@lastname_0, @lastname_1)
```
The number of values in the array therefore determines the number (and names) of parameters in the query. This is a problem because it will generate a lot of query plans that cannot be reused.
Instead, Nevermore will attempt to "pad" the array into buckets of a set size, so that the query plan is more likely to be reused. It does this by repeating the last value in the array until it reaches the size.
The bucket sizes are:
1, 2, 3, 4, 5, 10, 15, 20, 30, 50, 75, 100, 150, 200, 500, 750, 1000
It should be noted that if you're doing queries with a hundred or more values, you should use table-valued parameters.
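To illustrate the padding (surnames made up): six surnames fall between the 5 and 10 buckets, so Nevermore would generate ten parameters, repeating the last value:

```sql
select * from dbo.Customer
where LastName in (@lastname_0, @lastname_1, @lastname_2, @lastname_3, @lastname_4,
                   @lastname_5, @lastname_6, @lastname_7, @lastname_8, @lastname_9)
-- @lastname_5 holds the sixth surname; @lastname_6..@lastname_9 repeat it
```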
## Predictable parameter names
Nevermore tries to use predictable names for SQL parameters - if you run the same query within a transaction (or between transactions), you should get the same names for command parameter values.
To detect cases where this doesn't happen, Nevermore has a feature you can enable in the configuration:
```csharp
#if DEBUG
configuration.DetectQueryPlanThrashing = true;
#endif
```
We suggest doing this in debug builds of your application only, as it adds overhead to processing each query, and in production it's better to go ahead with the query than break the application!
If it detects an issue, it will throw an exception with a message like this:
```
Nevermore.Diagnostics.DuplicateQueryException:
Detected a SQL query that is otherwise a perfect duplicate of another
query, except with different parameter names. This is likely to create
thrashing of the query plan cache.

The statement being executed this time was:

select * from dbo.Customer where Id = @id_1

The statement executed last time was:

select * from dbo.Customer where Id = @id

Rewrite your query to use more predictable parameter names, as this
will allow the database to re-use the query plan for both queries.
```
TODO: Include
This section includes a few tips around performance.
A common pattern is to query a list of documents, then discover you need an additional document for each one you just queried. This is the classic "SELECT N+1" problem.
Nevermore transactions have a `Load` method that takes a list of document IDs. In old versions of Nevermore, this generated a `where Id in (@id1, @id2, @id3...)` query, and was best avoided. In Nevermore 12+, it uses table-valued parameters to pass the list of IDs, then joins on it. This is much faster - Nevermore can load 50,000+ documents this way in a few milliseconds. So use this method often!
In fact, when doing any other kind of query where you need hundreds of parameters, you can call `CommandParameterValues.AddTable`, which will define a table parameter in your query filled with the list you give, with a single column called `ParameterValue`. You can then join on it:
```csharp
List<string> emails = GetHugeListOfEmailAddresses();

var args = new CommandParameterValues();
args.AddTable("emailList", emails);

transaction.Stream<Person>(@"
    select p.* from Person p
    inner join @emailList t
        on t.[ParameterValue] = p.Email",
    args);
```
Learn more about Table-Valued Parameters in Nevermore.
If the time spent reading and writing JSON from the database and transferring it over the network between the application and SQL Server is a concern, consider enabling compression. This makes a very big difference in bandwidth-constrained environments, especially for larger documents.
Nevermore uses JSON.NET by default. You could instead use the `System.Text.Json` serializer, which seems to use less memory. You can do this by setting your own `IDocumentSerializer` on the store configuration. However, `System.Text.Json` assumes everything you do is in UTF-8, which isn't the case in SQL Server, so you might waste time converting anyway.
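The wiring would look something like this (a sketch - `SystemTextJsonSerializer` is a hypothetical class you'd write yourself, and it assumes the store configuration exposes a `DocumentSerializer` property):

```csharp
// Hypothetical: your own implementation of Nevermore's IDocumentSerializer.
public class SystemTextJsonSerializer : IDocumentSerializer { /* ... */ }

var configuration = new RelationalStoreConfiguration(connectionString)
{
    DocumentSerializer = new SystemTextJsonSerializer()
};
```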
There is an overload of `Stream` that takes a callback, which is used when doing a projection - when you want to do a query with a join, then map the result to a pair (or three, or four) of documents. It works well when the result is small (or when you query an arbitrary class), but not for large documents. This is because the data reader cannot use `SequentialAccess` (as your projection may read columns out of order), so the entire row must be read into memory.
Nevermore works hard to avoid any LOH allocations (objects 85K or more) by streaming and buffering everything it can. However, if your document includes one huge allocation anyway - e.g., because it has an 85K HTML string embedded in it, or a giant array of ints - then those allocations will still be needed.
For this kind of design, you might want to consider a separate table: one table stores the JSON document (which is fully deserialized on every read), and the huge value lives in a second table that you fetch from only when the full data is needed (of course, maybe this data makes more sense to store on disk anyway!).
If you call `IRelationalTransaction.ExecuteReader` you'll get raw access to a data reader - you could use this, for example, to read the text using `GetChars` in chunks and write it to the HTML response directly, without ever allocating a large string.
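For example (a sketch - the query, column ordinal, `args`, and the `output` writer are assumptions):

```csharp
using var reader = transaction.ExecuteReader(
    "select [HtmlBody] from dbo.Page where [Id] = @id", args);

if (reader.Read())
{
    var buffer = new char[4096];
    long offset = 0;
    long read;

    // Copy the column to the response in 4K chunks; no large string is built.
    while ((read = reader.GetChars(0, offset, buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, (int)read); // output: a TextWriter over the response
        offset += read;
    }
}
```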
SQL Server has two data types for representing strings - `nvarchar` and `varchar`. Nevermore must pass all strings as `nvarchar`, since the .NET string type can contain Unicode. This results in implicit conversions when querying against `varchar` columns, which slows queries down.
For example, take this query:
```csharp
var args = new CommandParameterValues
{
    { "ids", new List<string> { "MyId-18", "MyId-48" } }
};

var count = transaction.ExecuteScalar<int>(
    "select count(*) from dbo.SomeTable where Id in @ids", args);
```
The generated SQL looks like this:
```sql
exec sp_executesql N'select count(*) from dbo.SomeTable where Id in (@ids_1, @ids_2)',N'@ids_1 nvarchar(100),@ids_2 nvarchar(100)',@ids_1=N'MyId-18',@ids_2=N'MyId-48'
```
Notice how all string command values are passed to SQL with the `N` prefix, indicating `nvarchar`.
If the `Id` column is declared as `nvarchar`, we get a nice, efficient index seek.
However, if the `Id` column is declared as `varchar`, the implicit conversion results in an index scan, and a warning when you look at the execution plan.
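So if you control the schema, prefer `nvarchar` for string columns you filter on. An illustrative definition:

```sql
create table dbo.SomeTable
(
    [Id] nvarchar(50) not null primary key, -- nvarchar key: parameter types match, index seek
    [JSON] nvarchar(max) not null
);
```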