Compression

JSON documents can be big! 📄

Historically, and by default, Nevermore assumes the JSON is stored as text (in an nvarchar(max) column), which makes sense for small documents. Nevermore 12.0+ includes some optimizations that make this work better for larger documents, but it's still a lot of data to shuffle back and forth.

Rather than storing documents as text, Nevermore 12.0+ supports compressing the JSON.

Why use compression?

Much of the data in JSON documents is repetitive. Property names, types, and IDs are repeated over and over.

In Octofront, for example, the LicenseKeys table keeps a history of the changes made to each license key. Each history entry has a type, date, and some properties. Those strings are repeated potentially hundreds of times. Compressing this data reduces the size of the table (at the time of writing) from 435 MB to 118 MB.

Good examples to try in Octopus would be Variables and DeploymentProcess. Bigger documents with collections as properties naturally mean many of the same strings are repeated.

The benefits of compression are:

  1. Less space used in the database
  2. Less data transferred between the application and the database

The downsides, of course, are:

  1. CPU time spent compressing data on writes and decompressing it on reads
  2. Loss of the ability to use features like JSON_VALUE

In testing with (local)\SQLEXPRESS and a 120 KB document with a 50% compression ratio, the costs and benefits canceled each other out.

| Format     | Mean     | Error     | StdDev    | Gen 0   | Gen 1   | Gen 2 | Allocated |
|------------|----------|-----------|-----------|---------|---------|-------|-----------|
| Text       | 1.560 ms | 0.2761 ms | 0.0151 ms | 64.4531 | 29.2969 | -     | 357.26 KB |
| Compressed | 1.703 ms | 0.3508 ms | 0.0192 ms | 66.4063 | 31.2500 | -     | 356.59 KB |

But SQL Server rarely runs on the same machine. When it doesn't, compression makes a big difference. Here's the same test, using NetLimiter to limit bandwidth in and out of SQLEXPRESS to 100 Mbps:

| Format     | Mean      | Error    | StdDev    | Gen 0   | Gen 1   | Gen 2 | Allocated |
|------------|-----------|----------|-----------|---------|---------|-------|-----------|
| Text       | 21.242 ms | 1.637 ms | 0.0898 ms | 62.5000 | 31.2500 | -     | 357.29 KB |
| Compressed | 7.756 ms  | 1.099 ms | 0.0602 ms | 62.5000 | 23.4375 | -     | 356.6 KB  |

100 Mbps is probably a real-world best case, too. In Octofront, for example, it's unlikely we get that between our web apps and the SQL Azure database. I suspect the same will be true for Octopus Cloud. Customer networks vary, but I'd expect compression to make a considerable difference on many of them.

Enabling compression

To use compression, you need to:

  1. Add a [JSONBlob] column of type varbinary(max) (a table sketch follows this list)
  2. In your DocumentMap, set the JsonStorageFormat property.
  3. Migrate the data somehow
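
For step 1, here's a hedged sketch of what the table behind the example map below might look like. The [JSONBlob] varbinary(max) column is the important part; the other column names and types are assumptions for illustration only.

public static class BigDocumentSchema
{
    // Hypothetical schema for the BigDocument example used below. CompressedOnly writes to
    // [JSONBlob]; a [JSON] nvarchar(max) column is only needed for the Mixed* formats.
    public const string CreateTable = @"
CREATE TABLE dbo.BigDocument
(
    [Id]       nvarchar(50)   NOT NULL CONSTRAINT PK_BigDocument PRIMARY KEY,
    [Name]     nvarchar(200)  NOT NULL,
    [Type]     nvarchar(50)   NOT NULL,
    [JSONBlob] varbinary(max) NOT NULL
)";
}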

Here's an example of a document map:

public class BigDocumentMap : DocumentMap<BigDocument>
{
    public BigDocumentMap()
    {
        Column(m => m.Name);
        Column(m => m.Type);
        JsonStorageFormat = JsonStorageFormat.CompressedOnly;
    }
}

There are four options for JsonStorageFormat. The first two are for when you're creating a new type of document:

  • TextOnly: The default. Use when you only have a [JSON] column.
  • CompressedOnly: Use when you only have a [JSONBlob] column.

The other two are for when you already have documents with data in them, and you can't do a big-bang migration. Nevermore supports automatic migration for each document as you load and save it.

  • MixedPreferCompressed: Use when you have both [JSONBlob] and [JSON] columns, and want data in the [JSON] column to be migrated to the [JSONBlob] column (see the sketch after this list).
  • MixedPreferText: Use when you have both [JSONBlob] and [JSON] columns, and want data in the [JSONBlob] column to be migrated to the [JSON] column.
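
For example, the BigDocument map from above could opt into the mixed mode during a migration period. A sketch, assuming the table now has both columns (only the storage format changes):

public class BigDocumentMap : DocumentMap<BigDocument>
{
    public BigDocumentMap()
    {
        Column(m => m.Name);
        Column(m => m.Type);

        // The table has both [JSON] and [JSONBlob] columns. Per the automatic migration
        // described above, documents still stored as text are moved to the compressed
        // column as they are loaded and saved.
        JsonStorageFormat = JsonStorageFormat.MixedPreferCompressed;
    }
}

Once every document has been migrated, the map could presumably be switched to CompressedOnly and the [JSON] column dropped.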

How it works

When inserting or updating data, Nevermore uses the JSON serializer to write the document as UTF-8, then compresses the result with a GZipStream and stores it in the column.

When reading, Nevermore uses a GZipStream to decompress the data, again assuming the decompressed content is UTF-8.
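
Conceptually, the round trip looks something like this (a minimal sketch of the idea using System.Text.Json and GZipStream, not Nevermore's actual implementation):

using System.IO;
using System.IO.Compression;
using System.Text.Json;

public static class CompressedJson
{
    // Write path: serialize the document to UTF-8 JSON, then GZip it for the varbinary(max) column.
    public static byte[] Compress<T>(T document)
    {
        var utf8Json = JsonSerializer.SerializeToUtf8Bytes(document);
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
        {
            gzip.Write(utf8Json, 0, utf8Json.Length);
        }
        return output.ToArray();
    }

    // Read path: decompress the GZip stream and deserialize the UTF-8 JSON back into a document.
    public static T Decompress<T>(byte[] blob)
    {
        using var input = new MemoryStream(blob);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var buffer = new MemoryStream();
        gzip.CopyTo(buffer);
        return JsonSerializer.Deserialize<T>(buffer.ToArray());
    }
}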

Querying compressed data

Data in the column is stored compressed with GZIP, which means you can use the COMPRESS and DECOMPRESS functions available in SQL Server 2016 and later to interact with it.

However, the text within the GZIP stream is encoded using UTF-8, while SQL Server naturally uses UCS-2 (similar, but not quite the same as UTF-16).

In fact, dealing with text encodings in SQL is such a mess that we had a choice to make:

  • Use UTF-8, which uses half as many bytes, and works naturally with System.Text.Json, or:
  • Have Nevermore read and write data in the encoding format SQL naturally understands, which may not actually work anyway

The whole thing is a bit of a mess: if the database collation ends with _SC, you can use UTF-16. SQL Server 2019 adds UTF-8 support, but you only get it by changing the collation again. We just can't predict what a given database's collation will be.

So at this point, it's probably best to assume that if you enable compression, you can't use JSON_VALUE or other functions from within SQL to interact with the JSON anymore.

Migration

If you enable compression on a table that already has data, you need to migrate the data - you can't have some records uncompressed.

The COMPRESS function in SQL Server would have been helpful for the migration, except that the text would remain encoded with whatever collation SQL Server uses, which Nevermore might not be able to read. The basic plan would need to be:

  1. Create a new table with all the same columns (VariableSet2), but make the JSON column varbinary(max)
  2. From C#, do an insert into/select from (see the sketch after this list)
  3. Drop the old table and any indexes on it, and rename the new table
  4. Add indexes to the new table
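
For step 2, here's a hedged sketch of the copy, assuming hypothetical VariableSet/VariableSet2 tables with Id, Name, and [JSON]/[JSONBlob] columns (the table and column names are illustrative, and this uses plain SqlClient rather than any Nevermore API):

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.Data.SqlClient;

public static class VariableSetMigrator
{
    public static void Migrate(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // Read the existing text documents first, so the reader is closed before inserting.
        // (This loads everything into memory for simplicity; a real migration would batch.)
        var rows = new List<(string Id, string Name, string Json)>();
        using (var read = new SqlCommand("SELECT Id, Name, [JSON] FROM dbo.VariableSet", connection))
        using (var reader = read.ExecuteReader())
        {
            while (reader.Read())
                rows.Add((reader.GetString(0), reader.GetString(1), reader.GetString(2)));
        }

        foreach (var row in rows)
        {
            // Re-encode each document as GZip-compressed UTF-8, the format Nevermore expects
            // to find in the [JSONBlob] column.
            using var output = new MemoryStream();
            using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
            {
                var utf8 = Encoding.UTF8.GetBytes(row.Json);
                gzip.Write(utf8, 0, utf8.Length);
            }

            using var insert = new SqlCommand(
                "INSERT INTO dbo.VariableSet2 (Id, Name, [JSONBlob]) VALUES (@id, @name, @blob)",
                connection);
            insert.Parameters.AddWithValue("@id", row.Id);
            insert.Parameters.AddWithValue("@name", row.Name);
            insert.Parameters.AddWithValue("@blob", output.ToArray());
            insert.ExecuteNonQuery();
        }
    }
}

Steps 3 and 4 then drop the old table and its indexes, rename the new table, and re-create the indexes.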

Why not...

We could have chosen other approaches for compression - using BSON instead of JSON, or using a better compression algorithm than GZIP. However, those would make it difficult to use any of the data from within SQL directly. This setup seems to strike the best balance between compatibility and ease of migration.
