Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to define routing rules in data streams #535

Merged
merged 32 commits into from
Jun 22, 2023

Conversation

mrodm
Copy link
Contributor

@mrodm mrodm commented Jun 12, 2023

What does this PR do?

This PR adds support for routing rules for data streams.
If routing_rules are defined, then dataset field becomes mandatory to be set too.

This new feature will be part of the following package-spec version 2.9.0 (updated changelog accordingly).

These routing rules definitions are going to be defined in a new file under each data stream folder. The file name would be routing_rules.yml

Example of the contents/definitions of this new file (following example in #514):

  # "Local" routing rules are included under this current dataset, not a special case
- source_dataset:  nginx
  # Route error logs to `nginx.error` when they're sourced from an error logfile
  - target_dataset: nginx.error
    if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
    namespace:
      - {{labels.data_stream.namespace}}
      - default

  # Route access logs to `nginx.access` when they're sourced from an access logfile
  - target_dataset: nginx.access
    if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
    namespace:
      - {{labels.data_stream.namespace}}
      - default
  
# Route K8's container logs to this catch-all dataset for further routing
- source_dataset: k8s.router
  - target_dataset: nginx
    if: "ctx?.container?.image?.name == 'nginx'"
    namespace:
      - {{labels.data_stream.namespace}}
      - default
   
  # Route syslog entries tagged with nginx to this catch-all dataset
- source_dataset: syslog
  - target_dataset: nginx
    if: "ctx?.tags?.contains('nginx')"
    namespace:
      - {{labels.data_stream.namespace}}
      - default

Why is it important?

Integration packages need to expose their routing rules as part of their data stream manifest files.

Checklist

  • I have added test packages to test/packages that prove my change is effective.
    • Added a new data-stream in good_v2
    • Added a new test package bad_routing_rules
  • I have added an entry in spec/changelog.yml.

Related issues

How to test

  1. Build elastic-package with the latest changeset from this branch. Example:
    go mod edit -replace github.com/elastic/package-spec/v2=github.com/mrodm/package-spec/v2@4c8a7e2
    go mod tidy
    make build
    
  2. Update a package including routing_rules key and defining dataset field.
  3. Build package, start an Elastic stack and installs that new package vesrion
    elastic-package build -v
    elastic-package stack down ; elastic-package stack up -v -d
    # install package using the CLI or through the UI
    elastic-package install -v 
    

@elasticmachine
Copy link

elasticmachine commented Jun 12, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-06-21T15:08:48.609+0000

  • Duration: 9 min 40 sec

Test stats 🧪

Test Results
Failed 0
Passed 886
Skipped 0
Total 886

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Jun 12, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (8/8) 💚
Files 70.968% (22/31) 👎 -2.366
Classes 74.419% (32/43) 👎 -1.772
Methods 57.778% (78/135) 👎 -1.313
Lines 43.72% (731/1672) 👎 -1.481
Conditionals 100.0% (0/0) 💚

@jlind23
Copy link
Collaborator

jlind23 commented Jun 13, 2023

@mrodm is this one ready to be reviewed?

@mrodm
Copy link
Contributor Author

mrodm commented Jun 13, 2023

@mrodm is this one ready to be reviewed?

Not yet, pending some changes. I'm working to include the routing_rules key in the same data stream manifest. So it can be removed for spec versions before 2.9.0

@mrodm mrodm self-assigned this Jun 13, 2023
Comment on lines 351 to 357
allOf:
- if:
required:
- routing_rules
then:
required:
- dataset
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case routing_rules is defined, dataset becomes a required field.

@mrodm
Copy link
Contributor Author

mrodm commented Jun 13, 2023

/test

type: object
additionalProperties: false
properties:
dataset:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be more representative here ? dataset or package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option in the processor is called dataset, right? Though it seems it could accept a package name.
I would use dataset, to be coherent with the processor, or name if we want to be generic enough, but maybe name is too generic (would it be the option passed to the processor or would it be a name for the route itself?).

Copy link
Contributor Author

@mrodm mrodm Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the docs and the example in the issue #514, this field would define the source dataset to be used by the reroute processor to check for documents to be routed to another dataset. This latter dataset would be the ones defined in each element under rules key.

Being that, would dataset be a good name? another options that I can think of source, source_dataset? @jsoriano @juliaElastic

If so, I would add these description to these fields

--- spec/integration/data_stream/manifest.spec.yml
+++ spec/integration/data_stream/manifest.spec.yml
@@ -364,8 +364,10 @@ spec:
         additionalProperties: false
         properties:
           dataset:
+            description: Source dataset to be used for this reroute processsor
             type: string
           rules:
+            description: List of routing rules
             type: array
             items:
               $ref: "#/definitions/routing_rule"

These definitions are now defined in its own file spec/integration/data_stream/routing_rules.spec.yml

Copy link

@juliaElastic juliaElastic Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like source_dataset and target_dataset, it is very easy to understand.

@@ -307,6 +332,29 @@ spec:
dynamic_namespace:
description: When set to true, agents running this integration are granted data stream privileges for all namespaces of its type
type: boolean
routing_rules: # technical preview
Copy link
Contributor Author

@mrodm mrodm Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to install a package with this new routing_rules key in 8.7.1 and it is installed successfully. No error and no reference to those processors neither.

Should we add some validation rule for this ? Would it be needed to set here some validation rule regarding minimum kibana version? Currently, it would be like ignored by Fleet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, if we consider this as an additional feature that will work only where supported, then I guess it is fine to leave it without the check.
But in theory we should add a validation to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, it could be kept as it is without validation rule

Comment on lines 13 to 15
- description: Add support to define routing rules in data streams (technical preview)
type: enhancement
link: https://github.com/elastic/package-spec/pull/535
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added technical preview since it is documented also that reroute processors is in technical preview too. https://www.elastic.co/guide/en/elasticsearch/reference/current/reroute-processor.html

Is that ok ? @jsoriano @juliaElastic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, yes.

@mrodm mrodm marked this pull request as ready for review June 13, 2023 15:54
@mrodm mrodm requested a review from a team as a code owner June 13, 2023 15:54
@mrodm mrodm requested a review from juliaElastic June 13, 2023 15:54
Comment on lines 13 to 15
- description: Add support to define routing rules in data streams (technical preview)
type: enhancement
link: https://github.com/elastic/package-spec/pull/535
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, yes.

@@ -169,6 +169,31 @@ spec:
hidden:
description: Makes the data stream hidden
type: boolean
route:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are not defining the routes on separate files at the end then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wanted to remove this new key routing_rules in case spec version is < 2.9.0, I had to do in this way (adding it to the manifest).

Adding it as a new file, I would have to remove the entry also in the spec.yml but I think it is not supported json patches in those files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, but well, we should give priority to how we want packages to look like. If this is the only reason to keep this in the manifest, lets add support for json patches in the folder spec files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check how to take into account JSON patches in folder spec files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support to add JSON patches in folder spec files will be added in this PR
#540

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a possible drawback of moving this to its own file.
If routing_rules are defined in it own file, the condition to set as required the dataset field , I think it cannot be achieved. They would be in different files.

Referring to this block:

  allOf:
    - if:
        required:
          - routing_rules
      then:
        required:
          - dataset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new validation rule in code for this requirement, since it cannot be achieved with if-then-else block in JSON schema.

Comment on lines 181 to 185
if:
type: string
examples:
- "ctx?.file?.path?.contains('/var/log/nginx/error')"
- "ctx?.container?.image?.name == 'nginx'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this if is not interpreted by json schema 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been doing some tests and it works as expected.
To be considered that if as an if-then-else block, I would say it has to be at the same level (indentation) as properties key.

@@ -307,6 +332,29 @@ spec:
dynamic_namespace:
description: When set to true, agents running this integration are granted data stream privileges for all namespaces of its type
type: boolean
routing_rules: # technical preview
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, if we consider this as an additional feature that will work only where supported, then I guess it is fine to leave it without the check.
But in theory we should add a validation to avoid confusion.

spec/integration/data_stream/manifest.spec.yml Outdated Show resolved Hide resolved
@@ -169,6 +169,31 @@ spec:
hidden:
description: Makes the data stream hidden
type: boolean
route:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Call this routing_rule?

Suggested change
route:
routing_rule:

type: object
additionalProperties: false
properties:
dataset:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option in the processor is called dataset, right? Though it seems it could accept a package name.
I would use dataset, to be coherent with the processor, or name if we want to be generic enough, but maybe name is too generic (would it be the option passed to the processor or would it be a name for the route itself?).

type: object
properties:
dataset:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this corresponds to dataset in the reroute processor, this can be also a list: https://www.elastic.co/guide/en/elasticsearch/reference/current/reroute-processor.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.
Reading that definition, I set both string and array of strings for both dataset and namespace keys.

Comment on lines 337 to 338
type: array
items:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this if you want to try to make this an object instead of an array:

Suggested change
type: array
items:
type: object
additionalProperties:

Copy link
Contributor Author

@mrodm mrodm Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying this suggestion:

--- spec/integration/data_stream/manifest.spec.yml
+++ spec/integration/data_stream/manifest.spec.yml
@@ -358,20 +358,11 @@ spec:
           type: boolean
     routing_rules:  # technical preview
       description: Routing rules set.
-      type: array
-      items:
-        type: object
-        additionalProperties: false
-        properties:
-          dataset:
-            type: string
-          rules:
-            type: array
-            items:
-              $ref: "#/definitions/routing_rule"
-        required:
-          - dataset
-          - rules
+      type: object
+      additionalProperties:
+        type: array
+        items:
+          $ref: "#/definitions/routing_rule"
   allOf:
     - if:
         required:

works and it would be like in the example given in the issue.

But it cannot be applied, since some elements under routing_rules could contain dots, for instance k8s.router. As it is implemented now package-spec, every dot in these keys are expanded as objects (link)

I think we would need to keep this a list of elements.

Example of the error:

2023/06/14 12:57:27 Warning: package using an unreleased version of the spec (2.9.0-next)
    validator_test.go:476: 
        	Error Trace:	/home/mariorodriguez/Coding/work/package-spec/code/go/pkg/validator/validator_test.go:476
        	Error:      	Received unexpected error:
        	            	found 1 validation error:
        	            	   1. file "../../../../test/packages/good_v2/data_stream/routing_rules/manifest.yml" is invalid: field routing_rules.k8s: Invalid type. Expected: array, given: object
        	Test:       	TestValidateWarnings/good_v2

mrodm added 4 commits June 20, 2023 10:50
As routing rules are not defined in its own file, its definition has
been updated to be an array directly. No need to have a "routing_rules"
key.
Updated description for source dataset.
@mrodm mrodm force-pushed the add_routing_rules branch from 17bf981 to 56671f0 Compare June 20, 2023 11:10
@@ -131,6 +131,7 @@ func (s Spec) rules(pkgType string, rootSpec spectypes.ItemSpec) validationRules
{fn: semantic.ValidateILMPolicyPresent, since: semver.MustParse("2.0.0"), types: []string{"integration"}},
{fn: semantic.ValidateProfilingNonGA, types: []string{"integration"}},
{fn: semantic.ValidateKibanaObjectIDs, types: []string{"integration"}},
{fn: semantic.ValidateRoutingRulesAndDataset, types: []string{"integration"}, since: semver.MustParse("2.9.0")},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here the since, to avoid checking for routing rules before 2.9.0.

It has been added a JSON Patch to remove it before 2.9.0.

@mrodm mrodm requested a review from jsoriano June 21, 2023 14:29
jsoriano
jsoriano previously approved these changes Jun 21, 2023
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nitpicking but can be ignored 🙂

spec/integration/data_stream/spec.yml Outdated Show resolved Hide resolved
type: string
rules:
description: List of routing rules
type: array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Maybe add a comment here mentioning that this is not an object because using the source dataset as key would require to support keys with dots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, better to keep it as a comment

Co-authored-by: Jaime Soriano Pastor <[email protected]>
jsoriano
jsoriano previously approved these changes Jun 21, 2023
mrodm added 2 commits June 21, 2023 17:02
Adding minItems key results in an error: minItems must be of an integer,
that could be related to the conversions performed in package-spec
between YAML and JSON formats.
Copy link

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Add support for routing rules in integrations
5 participants