# Compression
JSON documents can be big! 📄
Nevermore historically, and by default, assumes the JSON is stored as text (in an `nvarchar(max)` column), which makes sense for small documents. There are some optimizations in Nevermore 12.0+ that make it work better for larger documents, but it's still a lot of data to shuffle back and forth.
Rather than storing documents as text, Nevermore 12.0+ supports compressing the JSON.
Much of the data in JSON documents is repetitive. Property names, types, and IDs are repeated over and over.

In Octofront, for example, the `LicenseKeys` table keeps a history of the changes made to each license key. Each history entry has a type, date, and some properties. Those strings are repeated potentially hundreds of times. At the time of writing, compressing this data reduces the size of the table from 435 MB to 118 MB.

Good examples to try in Octopus would be `Variables` and `DeploymentProcess`. Bigger documents with collections as properties will naturally mean repeating many of the same strings.
The benefits of compression are:
- Less space used in the database
- Less space used in data transfer
The downsides, of course, are:
- CPU time spent compressing and decompressing data when writing and reading
- Loss of the ability to use features like `JSON_VALUE`
In testing with `(local)\SQLEXPRESS` and a 120 KB document with a 50% compression ratio, the CPU cost and the transfer savings roughly canceled each other out:
| Format | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| Text | 1.560 ms | 0.2761 ms | 0.0151 ms | 64.4531 | 29.2969 | - | 357.26 KB |
| Compressed | 1.703 ms | 0.3508 ms | 0.0192 ms | 66.4063 | 31.2500 | - | 356.59 KB |
But SQL Server rarely runs on the same machine. In those cases, compression makes a big difference. Here's the same test when using NetLimiter to limit bandwidth in/out of SQLEXPRESS to 100 Mbps:
| Format | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| Text | 21.242 ms | 1.637 ms | 0.0898 ms | 62.5000 | 31.2500 | - | 357.29 KB |
| Compressed | 7.756 ms | 1.099 ms | 0.0602 ms | 62.5000 | 23.4375 | - | 356.6 KB |
100 Mbps is probably a real-world best case, too. In Octofront, for example, it's unlikely we get that when talking from our web apps to the SQL Azure database. I suspect the same will be true for Octopus Cloud. Customer networks may vary, but I'd expect compression to make a considerable impact on many of them.
To use compression, you need to:

- Add a `[JSONBlob]` column of type `varbinary(max)` (see the sketch after this list)
- In your `DocumentMap`, set the `JsonStorageFormat` property
- Migrate the data somehow
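For the first step, the column can be added with a plain SQL migration run through whatever tooling you already use. Here's a minimal sketch using `Microsoft.Data.SqlClient`; the table name `BigDocument` and the connection string are placeholders, not something Nevermore prescribes:

```csharp
using Microsoft.Data.SqlClient;

// Placeholder connection string; substitute your own or use your schema migration framework.
const string connectionString = "Server=(local)\\SQLEXPRESS;Database=MyApp;Integrated Security=true;TrustServerCertificate=true";

using var connection = new SqlConnection(connectionString);
connection.Open();

// Adds the compressed column alongside the existing [JSON] column.
using var command = new SqlCommand(
    "ALTER TABLE dbo.BigDocument ADD [JSONBlob] varbinary(max) NULL",
    connection);
command.ExecuteNonQuery();
```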
Here's an example of a document map:
```csharp
public class BigDocumentMap : DocumentMap<BigDocument>
{
    public BigDocumentMap()
    {
        Column(m => m.Name);
        Column(m => m.Type);
        JsonStorageFormat = JsonStorageFormat.CompressedOnly;
    }
}
```
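With the map registered, documents are inserted and loaded the same way as any other Nevermore document, and the compression happens inside Nevermore. A rough usage sketch follows; the store construction, the `Name`/`Type` property values, and the document ID are assumptions for illustration, not taken from this page:

```csharp
// Assumed store setup: register the map with the store configuration.
var config = new RelationalStoreConfiguration(connectionString);
config.DocumentMaps.Register(new BigDocumentMap());
var store = new RelationalStore(config);

// Writing: the document is serialized to UTF-8 JSON, gzipped, and stored in [JSONBlob].
using (var transaction = store.BeginTransaction())
{
    transaction.Insert(new BigDocument { Name = "Web App", Type = "Production" });
    transaction.Commit();
}

// Reading: the blob is decompressed and deserialized back into the document.
using (var transaction = store.BeginTransaction())
{
    var document = transaction.Load<BigDocument>("BigDocuments-1"); // placeholder ID
}
```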
There are four options for `JsonStorageFormat`. The first two are for when you're creating a new type of document:

- `TextOnly`: The default. Use when you only have a `[JSON]` column.
- `CompressedOnly`: Use when you only have a `[JSONBlob]` column.
The other two are for when you've already got documents with data in them and can't do a big-bang migration. Nevermore supports automatic migration for each document as you load and save it:
- `MixedPreferCompressed`: Use when you have both `[JSONBlob]` and `[JSON]` columns, and want data in the `[JSON]` column to be migrated to the `[JSONBlob]` column.
- `MixedPreferText`: Use when you have both `[JSONBlob]` and `[JSON]` columns, and want data in the `[JSONBlob]` column to be migrated to the `[JSON]` column.
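For an existing table that already has data in the `[JSON]` column, the map from earlier might instead look like this sketch:

```csharp
public class BigDocumentMap : DocumentMap<BigDocument>
{
    public BigDocumentMap()
    {
        Column(m => m.Name);
        Column(m => m.Type);

        // Documents can be read from either column; when saved, they are written
        // to [JSONBlob], so data migrates to the compressed column over time.
        JsonStorageFormat = JsonStorageFormat.MixedPreferCompressed;
    }
}
```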
When inserting or updating data, Nevermore uses the JSON serializer to write the document as UTF-8, then compresses the result with `GZipStream` before storing it in the column. When reading, Nevermore uses `GZipStream` to decompress it, again assuming the content is UTF-8.
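In other words, the stored bytes are a standard GZIP stream wrapping UTF-8 text. If you ever need to inspect a blob outside of Nevermore, something along these lines should round-trip it (a sketch, not Nevermore's internal code):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class JsonBlob
{
    // Compress a JSON string the same general way: UTF-8 bytes wrapped in a GZIP stream.
    public static byte[] Compress(string json)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            var bytes = Encoding.UTF8.GetBytes(json);
            gzip.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }

    // Decompress a [JSONBlob] value back into the original JSON text.
    public static string Decompress(byte[] blob)
    {
        using var input = new MemoryStream(blob);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```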
Data in the column is stored compressed using GZIP, which means you can use the `COMPRESS` and `DECOMPRESS` functions in SQL Server 2016 and later to interact with it.

However, the text within the GZIP stream is encoded using UTF-8, while SQL Server naturally uses UCS-2 (similar to, but not quite the same as, UTF-16).
In fact, dealing with text encodings in SQL is such a mess that we had a choice to make:

- Use UTF-8, which uses half as many bytes and works naturally with `System.Text.Json`, or:
- Have Nevermore read and write data in the encoding SQL Server naturally understands, which may not actually work anyway

The whole thing is a bit of a mess: if the database collation ends with `_SC`, you can use UTF-16. SQL Server 2019 adds UTF-8 support, but you get it by changing your collation again. We just can't predict what it will be.
So at this point, it's probably best to assume that if you enable compression, you can no longer use `JSON_VALUE` or other functions from within SQL to interact with the JSON.
If you enable compression on a table that already has data, you need to migrate the data - you can't have some records uncompressed.
The `COMPRESS` function in SQL Server would have been helpful for the migration, except the text would remain encoded with whatever collation SQL Server uses, which Nevermore might not be able to read. The basic plan would need to be:

- Create a new table with all the same columns (e.g. `VariableSet2`), but make the JSON column `varbinary(max)`
- From C#, do an insert into/select from (see the sketch after this list)
- Drop the old table and any indexes on it, and rename the new table
- Add indexes to the new table
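Here's a rough sketch of the insert into/select from step, assuming the old table stores text in `[JSON]` and the new one stores the compressed blob. The connection string, table names, and column lists are placeholders, and a real migration would batch rows and handle failures:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.Data.SqlClient;

// Placeholder connection string.
const string connectionString = "Server=(local)\\SQLEXPRESS;Database=MyApp;Integrated Security=true;TrustServerCertificate=true";

using var readConnection = new SqlConnection(connectionString);
readConnection.Open();
using var writeConnection = new SqlConnection(connectionString);
writeConnection.Open();

// Read each row of the old table (columns shown here are illustrative).
using var select = new SqlCommand("SELECT [Id], [Name], [JSON] FROM dbo.VariableSet", readConnection);
using var reader = select.ExecuteReader();

while (reader.Read())
{
    var json = reader.GetString(2);

    // Same scheme Nevermore uses: UTF-8 text wrapped in a GZIP stream.
    using var buffer = new MemoryStream();
    using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
    {
        var bytes = Encoding.UTF8.GetBytes(json);
        gzip.Write(bytes, 0, bytes.Length);
    }

    // Write the row into the new table, with the JSON now stored as varbinary(max).
    using var insert = new SqlCommand(
        "INSERT INTO dbo.VariableSet2 ([Id], [Name], [JSONBlob]) VALUES (@id, @name, @blob)",
        writeConnection);
    insert.Parameters.AddWithValue("@id", reader.GetString(0));
    insert.Parameters.AddWithValue("@name", reader.GetString(1));
    insert.Parameters.AddWithValue("@blob", buffer.ToArray());
    insert.ExecuteNonQuery();
}
```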
We could have chosen other approaches for compression - using BSON instead of JSON, or using a better compression algorithm than GZIP. However, those would make it difficult to use any of the data from within SQL directly. The compatibility and migration benefits of this setup seem to be the optimal balance.
Further reading:

- Overview
- Getting started
- Extensibility
- Misc