PS-8879: OOM when alter column to compression (percona#5164) (percona#5168)
https://jira.percona.com/browse/PS-8879

Post-push fix. Keep uncompressed data on blob_heap; compress_heap is used
only for compressed data.

Q: Why do we need separate heaps for compression and decompression?
A: There are scenarios, described below, in which we use one or both heaps
and need to manage them separately.

Case 1: Consider the following table:

  CREATE TABLE t1 (a INT, b BLOB COLUMN_FORMAT COMPRESSED);

If we update rows in the following way:

  UPDATE t1 SET a=1 WHERE a = 0;

the flow is:

For every row do:
1. Read row
   1.1. Read the row with the compressed BLOB from storage into blob_heap
        (allocates memory from blob_heap)
   1.2. Decompress the BLOB. Put the decompressed version on heap_x and
        link the record with the BLOB pointer (allocates memory from heap_x)
   1.3. Construct the new row. As the BLOB did not change, reuse the
        pointer to it.
2. Write row
   2.1. Compress the BLOB. Put the compressed version on compress_heap
        (allocates memory from compress_heap)
   2.2. Store the row with the compressed BLOB to storage.
3. Write the row event to the binlog
   3.1. Use the decompressed BLOB pointer (from heap_x)

Note that the above flow involves one table, so there is one 'prebuilt'
object. We therefore need to keep the decompressed BLOB buffer valid until
we write the row into the binlog. Every loop iteration allocates memory
from the heaps, so we need to free it once it is no longer needed to avoid
OOM. That is why blob_heap is cleared before the row read and compress_heap
is cleared after the row write (it could equally be cleared before the row
write, as in the original implementation). The sketch after Case 2 below
shows these clearing points in code.

We cannot use compress_heap to keep the decompressed version of the BLOB,
because that version has to stay valid up to binlogging. The last moment we
are able to deallocate compress_heap is at the end of the row write, which
is too early. That is why we free the decompressed version before the next
row read (and additionally after the query).

Note: Decompression applies not only to BLOB columns, but also to TEXT and
VARCHAR. For such columns (with no BLOB in the row) no blob_heap is
allocated, which is why we allocate it during decompression. Logically, we
could use a dedicated decompress_heap (heap_x above) to keep the
decompressed version of a column, but that would lead to bigger changes
(also in partitioned tables). Both blob_heap and decompress_heap (heap_x)
would serve the same purpose, keeping the decompressed version of a column,
so we use blob_heap for it.

Case 2: Consider the following table:

  CREATE TABLE t1 (b1 BLOB, b2 BLOB COLUMN_FORMAT COMPRESSED);

If we alter this table in the following way:

  ALTER TABLE t1 MODIFY b1 BLOB COLUMN_FORMAT COMPRESSED;

the flow is:

1. Create a temp table with a compressed BLOB b1 column (target table)
2. For every row do:
   2.1. Read the row from the original table
        2.1.1. Read the row with BLOBs from storage into blob_heap
        2.1.2. Decompress BLOB b2. Put the decompressed version on heap_x
               and link the record with the BLOB b2 pointer
   2.2. Write the row to the target table
        2.2.1. Compress the BLOBs. Put the compressed versions on
               compress_heap
        2.2.2. Store the row with the compressed BLOBs to storage.

Note that the above flow involves two different tables, so there are two
different 'prebuilt' objects. Again, while looping over the rows we
allocate memory from the heaps, and here we need the decompressed version
of a row only until it is stored back, so we could free it right after
writing the row. But it has to be kept longer because of Case 1. This is
why we free it before the next row read (or after the query).
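To make the clearing points concrete, here is a minimal, self-contained
C++ sketch of the Case 1 loop. Heap, read_row, decompress, compress,
store_row, and binlog_row are toy stand-ins invented for illustration (the
server uses InnoDB's mem_heap_t and its row-processing code); only the
placement of the clear() calls mirrors the fix described above. As in the
final fix, blob_heap also plays the role of heap_x.

  #include <cstdio>
  #include <deque>
  #include <string>

  // Toy arena: like mem_heap_t, everything it holds is freed at once.
  // std::deque keeps references valid while we keep appending.
  struct Heap {
    std::deque<std::string> chunks;
    const std::string &put(std::string s) {
      chunks.push_back(std::move(s));
      return chunks.back();
    }
    void clear() { chunks.clear(); }
  };

  // Hypothetical stand-ins for storage access, (de)compression and binlog.
  static const char *storage[] = {"zip[row0]", "zip[row1]"};
  static const std::string &read_row(Heap &h, int i) { return h.put(storage[i]); }
  static const std::string &decompress(Heap &h, const std::string &c) {
    return h.put("plain<" + c + ">");
  }
  static const std::string &compress(Heap &h, const std::string &p) {
    return h.put("zip<" + p + ">");
  }
  static void store_row(const std::string &c) { std::printf("store:  %s\n", c.c_str()); }
  static void binlog_row(const std::string &p) { std::printf("binlog: %s\n", p.c_str()); }

  int main() {
    Heap blob_heap, compress_heap;
    for (int i = 0; i < 2; ++i) {
      // The previous row's decompressed data was already binlogged; the
      // earliest safe point to drop it is right before the next read.
      blob_heap.clear();
      const std::string &raw   = read_row(blob_heap, i);      // 1.1
      const std::string &plain = decompress(blob_heap, raw);  // 1.2
      store_row(compress(compress_heap, plain));              // 2.1 + 2.2
      compress_heap.clear();  // compressed copy is dead after the write
      binlog_row(plain);      // 3.1: blob_heap must still be valid here
    }
    // After the query, both heaps are freed entirely.
  }

Clearing blob_heap after binlog_row() inside the loop would also work for
Case 1, but clearing it before the read keeps the same rule valid for
Case 2, where the two heaps belong to different 'prebuilt' objects.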
Considering the two cases above, we reach the following conclusions:
1. We need to keep the decompressed version of a row until it is binlogged
   => we can free it before the next row read and after the query is
   completed.
2. We need to free the compressed version of a row to avoid OOM
   => we can free it before or after the row write, and after the query is
   completed.
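In short, the heap lifetimes described above are:

  heap            holds                    cleared
  blob_heap       decompressed row data    before the next row read, and
                                           after the query
  compress_heap   compressed row data      after (or before) the row write,
                                           and after the query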