While investigating a query case that could OOM the server (see #6556), it was discovered that compactions likely have the same problem.
If there are a lot of points on disk for the same series, spread across many blocks in TSM files, and a point is overwritten near the beginning of the shard's time range, the full series could be loaded into RAM, triggering OOMs and huge allocations.
The compaction code handles updates to past points in much the same way as the query path: it reads the whole series and deduplicates it in memory. It's very likely that this can cause the server to OOM during compactions if enough data is being compacted. The compaction code needs to be updated along the lines of the fix in #6556.
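To make the memory problem concrete, here is a minimal sketch of the "read the whole series and deduplicate in memory" pattern described above. It is not the actual compactor code: `Point` and `dedupAll` are hypothetical names, and blocks are modeled as plain slices ordered oldest write first.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a hypothetical (timestamp, value) pair, not the TSM engine's type.
type Point struct {
	Time  int64
	Value float64
}

// dedupAll copies every block of a series into one slice, then sorts and
// deduplicates it. Blocks are assumed to be ordered oldest write first, so
// for equal timestamps the later write wins. Peak memory grows with the
// size of the whole series, which is the behavior the issue describes.
func dedupAll(blocks [][]Point) []Point {
	var all []Point
	for _, b := range blocks {
		all = append(all, b...)
	}
	// A stable sort keeps write order among points with equal timestamps.
	sort.SliceStable(all, func(i, j int) bool { return all[i].Time < all[j].Time })

	out := all[:0] // filter in place; out never runs ahead of the read index
	for _, p := range all {
		if n := len(out); n > 0 && out[n-1].Time == p.Time {
			out[n-1] = p // later write replaces the earlier one
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	blocks := [][]Point{
		{{Time: 1, Value: 1}, {Time: 2, Value: 2}},
		{{Time: 3, Value: 3}, {Time: 4, Value: 4}},
		{{Time: 1, Value: 9}}, // overwrite near the start of the time range
	}
	fmt.Println(dedupAll(blocks))
}
```

A single overwritten point at the start of the range is enough to force every block through the `all` slice, even though only one timestamp actually conflicts.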
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction. If
the series was large, it could cause very large RAM spikes and OOMs.
The change reworks the compactor to merge blocks more incrementally,
similar to the fix done in #6556.
Fixes #6557
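As a rough illustration of what merging blocks more incrementally can look like, the sketch below walks one cursor per block and emits bounded output chunks instead of materializing the whole series. This is not the code from the actual change: `Point`, `mergeIncremental`, `chunkSize`, and `emit` are hypothetical, and the sketch assumes each block is sorted by time with unique timestamps, blocks are ordered oldest write first, and `chunkSize` is at least 1.

```go
package main

import "fmt"

// Point is a hypothetical (timestamp, value) pair, not the TSM engine's type.
type Point struct {
	Time  int64
	Value float64
}

// mergeIncremental merges sorted blocks and calls emit for each output chunk
// of at most chunkSize points. Only the cursors and one output chunk are held
// as working state. Here the input blocks are already in memory; a real
// engine would presumably decode them from disk as the cursors advance.
func mergeIncremental(blocks [][]Point, chunkSize int, emit func([]Point)) {
	pos := make([]int, len(blocks)) // read cursor per block
	buf := make([]Point, 0, chunkSize)
	for {
		// Pick the smallest unread timestamp; on ties the later block wins.
		winner := -1
		for i, b := range blocks {
			if pos[i] >= len(b) {
				continue
			}
			if winner == -1 || b[pos[i]].Time <= blocks[winner][pos[winner]].Time {
				winner = i
			}
		}
		if winner == -1 {
			break // every block is exhausted
		}
		p := blocks[winner][pos[winner]]
		// Advance every cursor sitting on this timestamp, which drops the
		// overwritten copies.
		for i, b := range blocks {
			if pos[i] < len(b) && b[pos[i]].Time == p.Time {
				pos[i]++
			}
		}
		buf = append(buf, p)
		if len(buf) == chunkSize {
			emit(buf)
			buf = make([]Point, 0, chunkSize)
		}
	}
	if len(buf) > 0 {
		emit(buf)
	}
}

func main() {
	blocks := [][]Point{
		{{Time: 1, Value: 1}, {Time: 2, Value: 2}},
		{{Time: 3, Value: 3}, {Time: 4, Value: 4}},
		{{Time: 1, Value: 9}}, // overwrite near the start of the time range
	}
	mergeIncremental(blocks, 2, func(chunk []Point) {
		fmt.Println(chunk)
	})
}
```

The linear scan over cursors keeps the sketch short; with many blocks, a heap keyed on the current timestamp would be the usual replacement.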