
[Segment Replication] Replicating *.liv files may cause performance issues #3929

Open
Tracked by #2194
hydrogen666 opened this issue Jul 16, 2022 · 2 comments
Labels: enhancement, Indexing:Replication

Comments

@hydrogen666

In the document replication scenario, *.liv files are only written to disk when a flush operation (a Lucene commit) is performed. But in segment replication, the *.liv file must be written on every refresh (by setting writeAllDeletes to true in the DirectoryReader#open method).
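For reference, a minimal sketch of the Lucene call this refers to. The class and helper names below are hypothetical; only `DirectoryReader#open` and its `writeAllDeletes` flag come from the paragraph above.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

// Hypothetical helper, not the actual OpenSearch code path. The third argument,
// writeAllDeletes, controls whether pending deletes are persisted to *.liv files
// when the near-real-time reader is opened at refresh time.
public final class RefreshReaderSketch {

    // Document replication: deletes can stay in memory until the next commit,
    // so a refresh does not have to write *.liv files.
    static DirectoryReader openForDocumentReplication(IndexWriter writer) throws IOException {
        return DirectoryReader.open(writer, /* applyAllDeletes */ true, /* writeAllDeletes */ false);
    }

    // Segment replication: replicas copy only what is on disk, so the primary
    // must persist the full live-docs bitmap on every refresh.
    static DirectoryReader openForSegmentReplication(IndexWriter writer) throws IOException {
        return DirectoryReader.open(writer, /* applyAllDeletes */ true, /* writeAllDeletes */ true);
    }
}
```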

This may not cause any problem in an append-only scenario (no deletes are ever issued against old segments). But in an update scenario, as soon as a delete operation hits an old segment, the primary shard's refresh will write the full live-docs bitmap to disk and replicate it to the replica shards.

*.liv files can be very large for merged segments: the bitmap uses roughly one bit per document, so the *.liv file for a segment with 16,000,000 docs takes up ~2 MB of disk space. Unlike segment data files, we cannot reuse the old *.liv file when a new one is generated; even if only one doc is deleted in the segment, we must replicate the full *.liv file. So in segrep, writing and replicating *.liv files may introduce more network and CPU load than document replication (from writing and loading *.liv files).

Several ways to fix this issue (both are sketched below):

  1. Write a diff rather than the full bitmap when a refresh is performed
  2. Compress the live-docs file with LZ4 or zstd
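A rough, hypothetical sketch of both ideas. This is not OpenSearch or Lucene code: `java.util.BitSet` stands in for Lucene's live-docs bitset, and the JDK `Deflater` stands in for LZ4/zstd.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public final class LiveDocsDiffSketch {

    // Option 1: ship only the doc IDs whose live/deleted state changed since
    // the last refresh, instead of the whole bitmap.
    static List<Integer> diff(BitSet previousLiveDocs, BitSet currentLiveDocs) {
        BitSet changed = (BitSet) previousLiveDocs.clone();
        changed.xor(currentLiveDocs); // bits remain set only where the state flipped
        List<Integer> changedDocIds = new ArrayList<>();
        for (int doc = changed.nextSetBit(0); doc >= 0; doc = changed.nextSetBit(doc + 1)) {
            changedDocIds.add(doc);
        }
        return changedDocIds;
    }

    // Option 2: compress the full bitmap before sending it over the wire.
    static byte[] compress(byte[] liveDocsBytes) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out =
                new DeflaterOutputStream(compressed, new Deflater(Deflater.BEST_SPEED))) {
            out.write(liveDocsBytes);
        }
        return compressed.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A 16M-doc segment in which a single doc was deleted since the last refresh.
        int maxDoc = 16_000_000;
        BitSet before = new BitSet(maxDoc);
        before.set(0, maxDoc);            // all docs live -> ~2 MB bitmap
        BitSet after = (BitSet) before.clone();
        after.clear(42);                  // one delete flips one bit

        System.out.println("changed docs: " + diff(before, after));                   // [42]
        System.out.println("full bitmap:  " + after.toByteArray().length + " bytes"); // ~2,000,000
        System.out.println("compressed:   " + compress(after.toByteArray()).length + " bytes");
    }
}
```

Either approach trades CPU on the primary for less data on the wire; a diff would also need a fallback to the full bitmap when the replica has no previous state to apply it against.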
hydrogen666 added the enhancement and untriaged labels on Jul 16, 2022
@Jeevananthan-23

Adding some previously measured results on bandwidth over the wire:

Performance:
Early performance tests show improvements with segment replication enabled. This run using OpenSearch Benchmark showed a ~40-45% drop in CPU and memory usage, a 19% drop in p99 latency, and a 57% increase in p100 throughput.

Instance type: m5.xlarge
Cluster Details: 3 Nodes with 6 shards and 1 replica each.
Test Dataset: Stackoverflow for 3 test iterations with 2 warmup iterations.

IOPS:
Document Replication: (Read 852k + Write 71k) / 1hr = 256 IOPS
Segment Replication: (Read 145k + Write 1M) / 1 hr = 318 IOPS

Total Bandwidth used:
Document Replication: 527 Gb
Segment Replication: 929 Gb

@anasalkouz

@Poojita-Raj Any update on this?

Bukhtawar added the Indexing:Replication label on Jul 27, 2023