-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow conflict-probability value of 0 #510
Allow conflict-probability value of 0 #510
Conversation
For the use case of externally generated ids (no updates), allow setting conflict-probability to the value 0.
ab7e94f
to
0f8cfa4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising already. I left a few minor comments about the docs and a suggestion to improve one of the new test cases.
docs/track.rst
Outdated
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). | ||
* ``conflict-probability`` (optional, defaults to 25 percent): A number between (0, 100] that defines how many of the documents will get replaced. | ||
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). | ||
* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `Automatic ID Generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Automatic ID Generation" -> "automatic id generation"?
docs/track.rst
Outdated
@@ -314,8 +314,8 @@ With the operation type ``bulk`` you can execute `bulk requests <http://www.elas | |||
* ``indices`` (optional): A list of index names that defines which indices should be used by this bulk-index operation. Rally will then only select the documents files that have a matching ``target-index`` specified. | |||
* ``batch-size`` (optional): Defines how many documents Rally will read at once. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes (e.g. if you want to benchmark with a bulk-size of 1, you should set ``batch-size`` higher). | |||
* ``pipeline`` (optional): Defines the name of an (existing) ingest pipeline that should be used (only supported from Elasticsearch 5.0). | |||
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). | |||
* ``conflict-probability`` (optional, defaults to 25 percent): A number between (0, 100] that defines how many of the documents will get replaced. | |||
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refer to "non-autogenerated ids" as "external ids" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I googled everywhere but I haven't seen the term "external ids", however, so that's why I opted for this :)
tests/track/params_test.py
Outdated
self.assertEqual(idx("100"), next(generator)) | ||
self.assertEqual(idx("200"), next(generator)) | ||
self.assertEqual(idx("300"), next(generator)) | ||
self.assertEqual(idx("400"), next(generator)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That leaves the possibility that the generator is creating more entries than you expect. You could convert it to a list and compare the list contents to be sure it is only generating four elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! Addressed in 400b170
* Improve test case to match the complete list of generated documented with 0 conflict probability * Doc fixes
@danielmitterdorfer I addressed the comments in 400b170 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one typo, apart from that LGTM. No need for another review cycle.
docs/track.rst
Outdated
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). | ||
* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `Automatic ID Generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_. | ||
* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use external index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id). | ||
* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `automatic iD generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"iD" -> "ID"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in f00b32e
Thanks @danielmitterdorfer ! |
@dliappis can we settle for either "bug" or "enhancement" for this PR? |
@danielmitterdorfer I thought it contains a bit of both :) However, I realized there's something missing here, which are the checks in |
PR elastic#510 doesn't allow setting conflict probability to 0 as the parameter check is failing, assuming only open intervals are allowed. Fix bug and also add a number of unit tests.
For the use case of externally generated ids (no updates), allow setting conflict-probability to the value 0. Relates elastic#510
PR elastic#510 doesn't allow setting conflict probability to 0 as the parameter check is failing, assuming only open intervals are allowed. Fix bug and also add a number of unit tests.
PR elastic#510 doesn't allow setting conflict probability to 0 as the parameter check is failing, assuming only open intervals are allowed. Fix bug and also add a number of unit tests.
For the use case of externally generated ids (no updates), allow
setting conflict-probability to the value 0.