Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] data_stream fields #980

Merged
merged 14 commits into from
Nov 11, 2020
Merged

[RFC] data_stream fields #980

merged 14 commits into from
Nov 11, 2020

Conversation

roncohen
Copy link
Contributor

@roncohen roncohen commented Sep 28, 2020

Very early draft on an RFC for the data_stream fields.

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • For proposing substantial changes or additions to the schema, have you reviewed the RFC process?
  • If submitting code/script changes, have you verified all tests pass locally using make test?
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes?
  • Is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • Have you added an entry to the CHANGELOG.next.md?

Markdown preview of this RFC

@roncohen roncohen changed the title [RFC] data_stream [RFC] data_stream fields Sep 28, 2020
ruflin
ruflin previously approved these changes Sep 29, 2020
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
@roncohen
Copy link
Contributor Author

roncohen commented Sep 29, 2020

updated with your comments addressed, thanks @ruflin !

@roncohen roncohen marked this pull request as ready for review September 29, 2020 12:08
@webmat webmat added the RFC label Sep 29, 2020
@jamiehynds
Copy link
Contributor

@roncohen @ruflin we have an RFC to categorise datasources. Do you think we could add an appropriate field under data_stream that categorises the datasource?

@ruflin
Copy link
Contributor

ruflin commented Sep 30, 2020

@jamiehynds Good question. Definitively possible but I would like to decouple it form this RFC to get it in first without the category field to not block it on it.

One issue I see this that these fields describe the data stream itself and when I understand https://github.com/elastic/ecs/pull/958/files#r492541198 correctly, it categorised where the data is coming from. Can it be that different sources match a different category and end up in the same data stream?

I know @jpountz was also thinking about categories in the past related to EQL. I guess it would be worth to file a separate issue or discuss it directly in #958 ?

@jamiehynds
Copy link
Contributor

jamiehynds commented Sep 30, 2020

@ruflin Totally agree that the categorisation topic shouldn't block the data_stream field. Will move the discussion to #958.

The issue above is certainly something we'll need to consider. Another point is when a data source produces several data streams - do we then have nested categories for each data stream. I'll move the discussion to my issue and go from there.

I believe @seth-goodwin is also working on categories for detection rules too, which may be related.

Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @roncohen for putting together this proposal and submitting!

The current proposal focuses on how the data_stream.* fields have been adopted as part of the new indexing strategy for Elastic Agent. I think this makes great sense given the tight relationship that the data_stream.* fields have in that larger strategy.

Should we also note any guidance on how the data_stream.* fields should or could be used by other data sources?

For example, a couple of questions I had after reviewing the proposal:

  • Should the data_stream.* fields only to be used alongside data streams
  • If not, what cases exist where the data_stream.* fields could be present, but the index isn't backing a data stream?

rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
@ebeahan
Copy link
Member

ebeahan commented Oct 13, 2020

Should any of the value limitations/restrictions discussed in elastic/kibana#75846 be included in this proposal as well?

@ruflin
Copy link
Contributor

ruflin commented Oct 14, 2020

@ebeahan Good catch, I think yes.

@cla-checker-service
Copy link

cla-checker-service bot commented Oct 14, 2020

💚 CLA has been signed

@roncohen
Copy link
Contributor Author

thanks again for the comments! I added a note on the restrictions on the values and the option to use the fields with other sources. Please let me know if i missed anything!

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. Thanks @roncohen and @ruflin for the work so far.

Here's my initial comments

rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
@webmat
Copy link
Contributor

webmat commented Oct 15, 2020

Additionally, I have a question unrelated to this RFC. Has the team thought of a name for this indexing strategy that will better stand the test of time?

This indexing strategy is new now, but it will no longer be in 2 years :-) I call this the “New Beetle” naming issue. 😉

@roncohen
Copy link
Contributor Author

I took another round on this. Please have at it :)

Has the team thought of a name for this indexing strategy that will better stand the test of time?

I don't think we have. I assume it'll just be named "the indexing strategy" eventually ;)

@webmat
Copy link
Contributor

webmat commented Oct 22, 2020

Or what about "The recommended indexing strategy"? ;-)

@roncohen
Copy link
Contributor Author

OK, ready for another round! When Nicolas' doc is ready, we can link to it from here.

rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjustments, Ron.

As discussed in #980 (comment), I think a call out to this inconsistency would be appropriate.

Additionally, I've two more small things noted below as review comments. Then I think we can merge as stage 1 and move on to stage 2.

We can link to the indexing strategy docs when it becomes available. I'll make sure this doesn't get lost between stage 1 & 2 PRs.

@webmat
Copy link
Contributor

webmat commented Oct 30, 2020

We may have to adjust the naming restrictions a bit, after elastic/elasticsearch#63987

@ruflin
Copy link
Contributor

ruflin commented Oct 30, 2020

@webmat Why? . indices are currently not supported by the indexing strategy.

@webmat
Copy link
Contributor

webmat commented Oct 30, 2020

@ruflin I guess you're right. I keep conflating using data streams in general with the use of this field set, which happens to use data streams, but is actually only for one specific use case / indexing strategy.

This gets back to my main concern, that I raised months ago. This field set is called data_stream, but it's not necessary (nor always expected) to use it when using data streams. Case in point: time series hidden indices can now use data streams, yet this has no impact nor relation to this field set.

As I said back then, I get the feeling that naming these two somewhat unrelated things the same way will be confusing to people. Me first, apparently 😄

Come to think of it, this actual concern on the naming is not captured in the RFC. Could we add a section in "Concerns" to mention this? We don't need to change anything else, but at least document that this potential confusion was noted.

@ruflin
Copy link
Contributor

ruflin commented Nov 3, 2020

@webmat I look at this a bit different. data_streams can be used in any way if the users wants, but what we strongly recommend is to use data_streams with the new indexing strategy as it works best with all the parts in our stack. In the ideal case, data_streams would ONLY be used in this way.

@roncohen
Copy link
Contributor Author

roncohen commented Nov 4, 2020

This gets back to my main concern, that I raised months ago. This field set is called data_stream, but it's not necessary (nor always expected) to use it when using data streams. Case in point: time series hidden indices can now use data streams, yet this has no impact nor relation to this field set.

I see what you mean. The way i think of it, data stream fields are a convention we've decided upon, for use in the new indexing strategy which relies on data streams (cannot be used with indices). As always with conventions, you can decide to use it in places it wasn't intended for or you can chose not to use it in cases were it does fit.

Separately, it's my hope that we'll use this convention even with hidden time series data streams.

webmat
webmat previously approved these changes Nov 9, 2020
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm good with the RFC as it stands now. Thanks for all the discussion and adjustments 👍

I'd like an Approve review from Observability as well, to confirm nothing's missing from your point of view.

We can merge afterwards

rfcs/text/0000-data_stream-fields.md Outdated Show resolved Hide resolved
ruflin
ruflin previously approved these changes Nov 11, 2020
@ebeahan
Copy link
Member

ebeahan commented Nov 11, 2020

Excellent thanks all! With the approvals from @webmat and @ruflin, I'll update the advancement date, assign the RFC #, and merge.

@ebeahan ebeahan dismissed stale reviews from ruflin and webmat via 9a179aa November 11, 2020 16:04
@ebeahan ebeahan merged commit fad7fa8 into elastic:master Nov 11, 2020
@roncohen
Copy link
Contributor Author

thanks folks!

ebeahan added a commit to ebeahan/ecs that referenced this pull request Dec 3, 2020
Co-authored-by: Mathieu Martin <[email protected]>
Co-authored-by: Eric Beahan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants