tsdb: Switch split16 to be a single file #370

pquentin · 2023-01-25T12:51:15Z

That way, Rally will still split the file into 16 parts, but each part will span the data from beginning to end.

martijnvg

Thanks @pquentin! I left a question for my own understanding.
LGTM

martijnvg · 2023-01-25T12:57:12Z

tsdb/track.json

-            "source-file": "documents-split-15.json.bz2",
-            "document-count": 7289606,
-            "uncompressed-bytes": 8252803345
+            "source-file": "documents-split16.json.bz2",


Question: how does Rally know how to split this file into 16 parts (with each part having the correct date range)?

Splitting the file into 16 parts is the default behavior of Rally, provided we configure it to use 16 bulk indexing clients. I rearranged the original file so that each of the 16 splits runs from beginning to end. Given the previous split16 work, all I had to do was:

cat documents-split-0.json documents-split-1.json documents-split-2.json documents-split-3.json documents-split-4.json documents-split-5.json documents-split-6.json documents-split-7.json documents-split-8.json documents-split-9.json documents-split-10.json documents-split-11.json documents-split-12.json documents-split-13.json documents-split-14.json documents-split-15.json > documents-split16.json

Does that make sense?

I see, and each of the documents-split-N.json files has an equal size, so that Rally knows exactly how to split the documents-split16.json file?

Yes, each bulk indexing client independently splits off and mmaps its own part of the file, based on its id (from 0 to 15) and the number of lines (which is why we have to provide document-count).
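
The per-client split described above can be sketched as follows. This is only an illustration under simple assumptions, not Rally's actual code: `client_line_range` is a hypothetical helper, and the way Rally handles a remainder when the line count is not divisible by the number of clients may differ.

```python
from typing import Tuple


def client_line_range(total_lines: int, num_clients: int, client_id: int) -> Tuple[int, int]:
    """Return the half-open [start, end) line range owned by one client.

    Hypothetical helper for illustration; not part of Rally's API.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    if not 0 <= client_id < num_clients:
        raise ValueError("client_id must be in [0, num_clients)")
    base, extra = divmod(total_lines, num_clients)
    # Assumption: the first `extra` clients each take one additional line.
    start = client_id * base + min(client_id, extra)
    end = start + base + (1 if client_id < extra else 0)
    return start, end


if __name__ == "__main__":
    # Toy numbers: 16 concatenated parts of 100 lines each.
    for cid in range(16):
        print(cid, client_line_range(1600, 16, cid))
```

When the 16 parts have equal line counts and are concatenated in order, client N ends up with exactly the lines that came from documents-split-N.json, so every client starts at the beginning of the date range.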

tsdb: Switch split16 to be a single file

1e02c02

pquentin added the enhancement label Jan 25, 2023

pquentin requested a review from martijnvg January 25, 2023 12:51

pquentin self-assigned this Jan 25, 2023

martijnvg approved these changes Jan 25, 2023

pquentin merged commit 27f3554 into elastic:master Jan 25, 2023

pquentin deleted the single-file-split16 branch January 25, 2023 13:33

martijnvg mentioned this pull request Jan 26, 2023

Skip duplicate checks on segments that don't document's timestamp elastic/elasticsearch#92456

Merged

pquentin mentioned this pull request Jan 27, 2023

Allow indexing data in order with multiple indexing clients elastic/rally#1650

Closed

pquentin added a commit that referenced this pull request Feb 1, 2023

tsdb: Switch split16 to be a single file (#370)

b2f887a

pquentin added a commit that referenced this pull request Feb 1, 2023

tsdb: Switch split16 to be a single file (#370)

4cc0b50
