Support Clustered Columnstore Indexes (CCI) #68

Closed
JocaPC opened this issue Jan 8, 2017 · 15 comments · Fixed by #166

@JocaPC

JocaPC commented Jan 8, 2017

Your sample schema for the Log table is a classic B-tree, which is not optimal for storing large amounts of data such as logs. A CLUSTERED COLUMNSTORE index would be a better choice for storing logs when the JSON format is used (this is not applicable to XML).

  • CCIs are available in all editions of SQL Server (even Express) since SQL Server 2016 SP1.
  • CCI supports the nvarchar(max) type in the current CTP and in Azure SQL Database, so we can store the LogEvent column in a CCI.
  • JSON support has been available since SQL Server 2016, which lets us query the LogEvent column using functions like JSON_VALUE(LogEvent, '$.Properties.RequestPath').

Unfortunately XML is not yet supported in CCI, but if someone uses JSON as the log format, a CCI would be a much better option than a B-tree (JSON is stored in nvarchar(max)). Could you update your docs to propose something like the following table schema when JSON logging is used with the latest SQL Server:

CREATE TABLE [Logs] (
   [Message] nvarchar(max) NULL,
   [Level] nvarchar(128) NULL,
   [TimeStamp] datetimeoffset(7) NOT NULL,
   [LogEvent] nvarchar(max) NULL,
   [Exception] nvarchar(max) NULL,
   INDEX cci CLUSTERED COLUMNSTORE
)
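
For example, the proposed table could then be queried with JSON_VALUE like this (the property path follows the example above; the filter values are just placeholders):

SELECT [TimeStamp], [Level], [Message],
       JSON_VALUE([LogEvent], '$.Properties.RequestPath') AS RequestPath
FROM [Logs]
WHERE [TimeStamp] >= '2017-01-01' AND [Level] = 'Error'
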
@nblumhardt added the enhancement and up-for-grabs labels Jan 3, 2018
@MV10
Contributor

MV10 commented Sep 5, 2018

I can't agree with this one at all.

CCI's primary use-case is super-fast reads over huge datasets (millions and billions of rows) where, crucially, the column has very high repetition (low cardinality) and queries typically match very high percentages of the data (e.g. you're querying a billion rows and expect half a billion matches... Microsoft's introductory article suggested queries with 90%+ matching rows!). The low cardinality factor supports the secondary use-case which is high compression, and the compression process is well-known for causing high CPU usage. CCI's ability to index into JSON documents is cool, but that has a pretty massive CPU overhead, too.

CCI is meant for data warehouse scenarios where you're doing massive offline ETLs to import data that is meant to be queried, not updated. Inserts are faster than updates (actually updates are implemented as a delete followed by an insert for CCI tables) but they're still measurably slower.

All of that is bad for write performance, which is of primary importance for logging.

@JocaPC
Author

JocaPC commented Sep 6, 2018

The main value of CCI is high compression. Yes, it has found the most use in DW scenarios, because batch-mode processing is the best fit for analytic workloads and because the many repetitive values can be nicely compressed; however, it is the generally recommended structure for large tables with a read+append workload.

In addition to the removal of duplicates, CCI has a number of additional compression techniques. SQL Server 2017 adds LOB support that can provide good compression of text/JSON fields even when there are no duplicates - see https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/09/extreme-25x-compression-of-json-data-using-clustered-columnstore-indexes/

I believe that it might be better to store logs in a CCI instead of a B-tree: a B-tree cannot compress large data well; there are no updates or deletes of log records (except truncating partitions), so there is no benefit from B-tree index SEEKs; and the primary query pattern is a range scan by some filter (for example timestamp/level), which can be nicely handled with CCI row-group elimination. CCI might have better bulk-insert performance than a B-tree because it can insert data directly into compressed storage, but I don't believe that this is applicable in the logging scenario, because CCI inserts are only faster if you insert batches of more than 100K rows. In the worst case, a CCI write is the same as a B-tree write, because new rows are inserted into the delta store, which is a B-tree structure.

Generally, if someone decides to put logs in a database, they can expect a large amount of data, and querying that data even with XML filters would be a very CPU-intensive operation (the same is true for JSON processing). This would be a problem in both the B-tree and the CCI case, and we are aware that XML/JSON filtering cannot be compared with relational data filtering. However, CCI could at least mitigate the IO impact, because SQL Server will fetch compressed data from the storage layer and use row-group elimination to skip rowgroups that don't contain data matching the criterion.

@MV10
Contributor

MV10 commented Sep 9, 2018

@JocaPC Since I've set up all these tests for #69, I'm going to do some CCI testing, too. I realized last night I can pretty easily set them up for this issue as well. I'm still concerned about write-performance impact but I'll go wherever the numbers lead!

@MV10
Contributor

MV10 commented Sep 10, 2018

From #69:

@MV10 - It's great to see these results. I see in #68 that you will try it with CCI too (just beware that you will need SQL2017 to put LOB in CCI), so could you add another dimension in your test?

SQL Server places LOB data off-row, meaning that each row contains just a pointer to the NVARCHAR(MAX) value. There is a table option that forces SQL Server to keep NVARCHAR(MAX) values up to a certain size in-row, while larger values are stored off-row with a pointer:

EXEC sp_tableoption 'dbo.Log', 'large value types out of row', 0;
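-- (A value of 0 keeps large value types in-row when they fit, up to about 8,000 bytes; 1 always stores them off-row with a pointer.)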

It would be interesting to see what would be a recommended config option in this scenario.

More details are here: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-tableoption-transact-sql?view=sql-server-2017

@MV10
Contributor

MV10 commented Sep 10, 2018

Regarding off-table LOBs, given that SQL2016+ can do CCI for LOBs in-row, I'm not sure that adding even more write-overhead to logging makes sense, nor would I expect to find many people dumping more than 8K of data into log events. I'd say at that point you're pretty firmly Doing It Wrong when it comes to logging.

I do have an MSDN subscription, but I'm on the fence about whether I want to install SQL2017 to try this. It just doesn't strike me as a likely (or sensible) use-case.

@JocaPC
Author

JocaPC commented Sep 10, 2018

Regarding SQL Server 2017, I agree, and I don't expect that everyone will be using it in the near future.

Just note that my proposal to test sp_tableoption is not related to CCI. It is an option that is applied to the classic row-store tables that you already use. I just hoped that you could fit it in as another config option in your tests, to see whether it is beneficial or whether it would degrade performance.

@MV10
Contributor

MV10 commented Sep 10, 2018

Ah, I misunderstood what you were saying about off-table LOBs. I can't imagine how writing external files is going to do anything but severely reduce performance, and if the LOBs aren't large except in name (which ought to be the case with logging), it's hard to see the point. It seems irrelevant to logging to me. SQL has thousands of settings, and I'm not going to benchmark them all. 😁

But in terms of CCI, if I'm reading the docs correctly, it looks like older SQL versions can handle nvarchar and other variable-length LOB types as long as they have a specific size (not MAX). Is that your understanding as well? And I think (not 100% sure?) those sizes can be up to 8K, which is far beyond what anyone should probably be logging.

My plan is to set specific sizes for Message and MessageTemplate (fairly generous, maybe 1024 bytes) and the JSON LogEvent column (probably a bit more, 2048 or something like that). No Id column. Then I'll blast a few million rows into it to see what kinds of write speeds we get with CCI.

In terms of comparing that to the write speeds of non-CCI logging, I'd then create a heap/b-tree table (no Id) with the same columns and put a non-clustered index on the TimeStamp column, since people would likely search against that. I don't feel Exceptions would be a common search target, and there's clearly no point in indexing the message columns (even though, I guess, technically they're indexed for CCI).
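
Roughly, the two test tables would look something like this (table names and sizes are just illustrative placeholders for the plan above, not a final schema):

CREATE TABLE [LogTestCCI] (
   [Message] nvarchar(1024) NULL,
   [MessageTemplate] nvarchar(1024) NULL,
   [TimeStamp] datetimeoffset(7) NOT NULL,
   [LogEvent] nvarchar(2048) NULL,
   INDEX cci CLUSTERED COLUMNSTORE
)

CREATE TABLE [LogTestHeap] (
   [Message] nvarchar(1024) NULL,
   [MessageTemplate] nvarchar(1024) NULL,
   [TimeStamp] datetimeoffset(7) NOT NULL,
   [LogEvent] nvarchar(2048) NULL
)
CREATE NONCLUSTERED INDEX [IX_LogTestHeap_TimeStamp] ON [LogTestHeap] ([TimeStamp])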

A third scenario is to simulate CCI's ability to index JSON. I'll add individual property columns and index those as well.

So, three scenarios, six tests (read/write). Am I missing anything?

@MV10
Contributor

MV10 commented Sep 10, 2018

Just saw your post on MSDN:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/09/extreme-25x-compression-of-json-data-using-clustered-columnstore-indexes/

(Oops, same one you linked earlier ... I thought it all sounded oddly familiar.)

From your reply in the comments it sounds like I misunderstood: does CCI+JSON require SQL2017, or only if the JSON is nvarchar(MAX)?

@MV10
Contributor

MV10 commented Sep 11, 2018

I'm impressed with the write performance. The "unindexed heap" has a 2K LogEvent column and two property columns, an int UserId and nvarchar(1024) RequestUri. Not surprisingly it is the fastest. Then I index TimeStamp and the property columns on a heap and performance is as we saw in #69. But the CCI which only has LogEvent (no separate property columns) comes very close to the unindexed heap performance. The btree version could, at best, approach CCI performance only when limited to a single index.

This was writing 3 passes of 1 million rows each. Smaller than I'd planned but spot-checking different values showed write performance didn't vary much with row count. I'll do the larger row counts for the read test where I gather it'll have the most visible impact. I also just went with a 1000-row batch size since we already saw in #69 that it had significant impact.

                                          avg---  var---  low---  high--  row/sec
unindexed heap...................... >>>  16.229  00.607  15.827  16.434  30809
btree, timestamp and property index. >>>  34.620  03.262  33.269  36.531  14443
clustered columnstore index......... >>>  19.762  01.289  19.004  20.293  25301

Just to be clear, CCI doesn't index the JSON content, correct? As in, the individual properties stored within the JSON?

Hopefully tomorrow I'll have time to work on read comparisons.

@JocaPC
Author

JocaPC commented Sep 11, 2018

JSON is not a type. It is either NVARCHAR(MAX) or NVARCHAR(n), but SQL Server doesn't recognize that the content is formatted as JSON and does not index the values inside the JSON. You can use CCI+NVARCHAR(n) in earlier versions, but CCI+NVARCHAR(MAX) is available only in SQL Server 2017 and above. One important difference between NVARCHAR(MAX) and NVARCHAR(n) in CCI is that (MAX) has some additional compression and disables some useless compression that just spends CPU without much effect on the compression ratio. This reduces IO, which is good for writes, but increases query cost if the rows are already in the buffer pool. If you want to achieve a similar thing with NVARCHAR(n), you should use WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE): https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression
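
For example, on an existing NVARCHAR(n) columnstore table that could be applied with something like the following (table and index names are placeholders; the index has to be rebuilt for the option to take effect):

ALTER INDEX cci ON [Logs]
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)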

@MV10
Contributor

MV10 commented Sep 12, 2018

Sure, I know JSON isn't a type but the original post led me to believe there was some deeper JSON/CCI relationship that I later realized doesn't exist. I gather the JSON query improvement is a side-effect of compression.

To be sure I follow: you're saying COLUMNSTORE_ARCHIVE on nvarchar(n) should produce improved write results similar to SQL2017's compression strategies for nvarchar(max)? I've seen earlier articles indicating COLUMNSTORE_ARCHIVE produces a relatively large CPU hit (adding "only" 5% to 10% but that would double the CPU usage I saw running the #69 tests... although even 20% would still leave a lot of slack in the system). And you're also saying COLUMNSTORE_ARCHIVE would reduce read performance?

Given that my write test above shows CCI without COLUMNSTORE_ARCHIVE outperforming the btree with indexes, I suppose it doesn't appear to be necessary. Is there another benefit I've missed?

@MV10
Contributor

MV10 commented Sep 15, 2018

I decided to add a fourth scenario with discrete property columns so my read test can compare JSON queries against the same property with its own index. The results of this fourth test are kind of interesting: we still see great throughput, but the average time is high, which I think reflects CCI's CPU overhead.

                                          avg---  var---  low---  high--  row/sec
unindexed heap...................... >>>  16.229  00.607  15.827  16.434  30809
btree, timestamp and property index. >>>  34.620  03.262  33.269  36.531  14443
clustered columnstore index......... >>>  19.762  01.289  19.004  20.293  25301
clustered columnstore with property. >>>  40.361  01.896  39.193  41.089  24776

Working on read tests now, hopefully I'll have time to finish them this morning.

@MV10
Contributor

MV10 commented Sep 15, 2018

The read test results are also interesting. The query is:

SELECT [RequestUri] FROM LogTest WHERE [UserId] BETWEEN {low} AND {high} ORDER BY [TimeStamp]

with variations that exclude ORDER BY and/or that specify JSON_VALUE([LogEvent], '$.Properties.UserId') instead of a direct reference to the UserId column.
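
For reference, the JSON variant looks something like this (assuming the value extracted from the JSON is cast back to int for the range comparison):

SELECT [RequestUri] FROM LogTest WHERE CAST(JSON_VALUE([LogEvent], '$.Properties.UserId') AS int) BETWEEN {low} AND {high} ORDER BY [TimeStamp]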

Testing 10 query passes against 3000000 rows.
Database is sized for 5 million rows and is rebuilt for each test.
Platform timer is high resolution? True @ 4605438/sec.

Possible values are 0 - 2999999, query is BETWEEN 1500000 AND 1875000, ORDER BY on TimeStamp.

                                         avg---  var---  low---  high--  reads
heap/btree, json.................... wr  33.163  00.202  33.063  33.265  374448
heap/btree, json.................... wr  33.208  00.258  33.068  33.326  374448 ordered
heap/btree, indexed................. wr  03.201  00.445  03.144  03.589  374448
heap/btree, indexed................. wr  04.867  00.160  04.807  04.967  374448 ordered
clustered columnstore, json......... wr  23.479  00.387  23.356  23.743  374448
clustered columnstore, json......... wr  24.227  00.803  24.044  24.847  374448 ordered
clustered columnstore, property col. wr  01.718  00.345  01.665  02.010  374448
clustered columnstore, property col. wr  01.972  00.041  01.949  01.990  374448 ordered

This is a somewhat unfair test of the CCI+JSON scenario, since I used the same cold-cache approach as the read tests in #69 and I think part of the benefit of CCI is compression in the cache. Nonetheless it's obvious that CCI outperforms an equivalent heap setup every time, and I think the difference would have been greater with more rows in the database. I could also have improved the non-CCI story by using a clustered ID, which would yield better read performance, but I think this still paints a useful picture.

I did monitor CPU usage, and CCI showed quite a bit of overhead in both the write and read testing, although overall CPU usage remained low. In the #69 tests none of the cores ever went above about 10% and they were usually down around 4% or 5% (8-core 5.1GHz AMD, 64GB 2.2GHz with a mild overclock, I forget the exact speed). However, the CCI tests often showed one or two cores spiking as high as 80% and several other cores occasionally jumping to 25%. It's something to keep in mind.

@JocaPC any criticisms or thoughts about how I've tested this?

@JocaPC
Author

JocaPC commented Sep 15, 2018

That's what I expected. In the CCI+property case, CCI will use column elimination and just ignore the JSON column as if it didn't exist at all, unlike a B-tree, which needs to read all columns for each row and discard the JSON at execution time when it is not needed. In the read scenario CCI is superior because, besides less IO due to compression, it uses batch-mode processing that can push the processed columns into the L2 cache instead of the regular RAM that a B-tree uses. In the case of filtering by a JSON value there is performance degradation because every JSON document is parsed, but this is the same overhead for both CCI and B-tree; however, once filtering is finished, CCI can put the remaining scalar columns in the L2 cache and be faster. CCI uses more CPU for compression/decompression, so that is expected. Not sure about the spikes. Maybe CCI had to flush out some parts of memory and fetch rowgroups from disk, so it had to use more CPU to decompress them.

Also, what exactly is unfair? A cold-cache scenario is better for CCI than a cold cache is for a B-tree, because CCI loads the data from disk more quickly. However, I'm not sure that this is a key difference.

@MV10
Contributor

MV10 commented Sep 16, 2018

Doesn't CCI also cache data in compressed form? That's what I mean about "unfair" -- I think there is a bit more to the CCI story (in CCI's favor) in more real-world usage.

There's an open request to add column indexing; I'll add a CCI flag when I get around to that other request.

@MV10 changed the title from "(DOC) Recommended schema for Log table in the latest SQL server/Azure SQL Database" to "Support Clustered Columnstore Indexes (CCI)" Oct 5, 2018
@MV10 self-assigned this Oct 7, 2018
@MV10 removed the up-for-grabs label Oct 7, 2018
@MV10 closed this as completed in #166 Oct 16, 2018