Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Datastreams #1092

Merged
merged 9 commits into from
Oct 22, 2020
Merged

Conversation

gingerwizard
Copy link

Addresses #1053

This PR adds support for data streams. I've tried to address this in a backwards compatible way i.e. it should in no way impact current use of indices. The PR itself is of limited use until we have #1054 as we would be entirely reliant on dynamic mappings otherwise. Assuming this PR is good, i will address this functionality next and then document together ie. i propose we don't document until #1054 is addressed.

This PR supports creating and deleting data streams through new explicit operations. Furthermore, we attempt to provide the same support for data streams as indices. More specifically, data-streams is a top level key in the track and target-data-stream can be specified on a corpus documents. These definitions will be re-usable in:

  1. Force Merge
  2. Search
  3. Bulk Indexing. This is probably the most invasive as we support a create action.

In all cases we prefer the specification of an index over a data stream if both are specified e.g. we don't error on the specification of both.

Finally, there are cases where internally a data stream value is used to set an index construct. This can appear confusing but the ES api doesn't extinguish also e.g. a bulk operation would contain {"create": {"_index": "logs-k8-application.log-default"}} when indexing into a data stream

I've held off implementing statistics. I suspect we will want to implement the stats endpoint for datastreams as a telemetry device.

Finally, PR doesn't include integration tests. We can either wait for #1054 or add limited now e.g. creation and deletion but not indexing.

@danielmitterdorfer danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. enhancement Improves the status quo labels Oct 19, 2020
@danielmitterdorfer danielmitterdorfer added this to the 2.0.2 milestone Oct 19, 2020
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I did an initial pass.

esrally/resources/track-schema.json Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/driver/runner.py Outdated Show resolved Hide resolved
esrally/driver/runner.py Outdated Show resolved Hide resolved
esrally/driver/runner.py Outdated Show resolved Hide resolved
esrally/track/params.py Show resolved Hide resolved
@@ -344,10 +408,14 @@ def __init__(self, track, params, **kwargs):
super().__init__(track, params, **kwargs)
if len(track.indices) == 1:
default_index = track.indices[0].name
elif len(track.data_streams) == 1:
default_index = track.data_streams[0].name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more generic name, such as default_target would be less confusing. This applies similarly also to other parts of this PR such as ForceMergeParamSource.

Copy link
Author

@gingerwizard gingerwizard Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Should we also prevent the use of index when data streams are in use and vise versa? This would be consistent with our proposal for the track i.e. data-streams and indices cannot co-exist.

Copy link
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! Great work.

Did an initial pass and left a few comments.

esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
@gingerwizard
Copy link
Author

@dliappis ready for review. Assuming this is good, i'd like to hold off docs + int tests until we have Index Templates. These are really needed to exploit this feature properly and any int tests wouldn't be representative.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual change looks fine to me (apart from nits) but there are a lot of unrelated changes in there that I think we should revert.

esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/loader.py Outdated Show resolved Hide resolved
esrally/track/params.py Outdated Show resolved Hide resolved
@gingerwizard
Copy link
Author

The actual change looks fine to me (apart from nits) but there are a lot of unrelated changes in there that I think we should revert.

Unrelated changes are formatting on long lines which my editor had configured. I'll revert but i do think we need a max line length and wrap appropriately.

@gingerwizard
Copy link
Author

@dm @dliappis apologies for the formatting changes here. I can’t imagine how horrible this was to review. I’ve re-pushed with most of the formatting reverted. I’ve left a few formatting changes which i could revert but which make sense to me.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants