Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2: Elastic agent monitoring data stream names should not be per process #1814

Open
cmacknz opened this issue Nov 28, 2022 · 28 comments
Open
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team V2-Architecture

Comments

@cmacknz
Copy link
Member

cmacknz commented Nov 28, 2022

The changes in #1702 changed the naming convention of the data streams used for sending agent process logs to Fleet to match how the processed are modelled in V2.

This has resulted in monitoring logs for spawned processes failing to be indexed in Elasticsearch when monitoring is enabled in Fleet. The root cause is that Fleet hard codes the list of expected monitoring data streams and uses them when adding index permissions into the agent policy. The result is that the agent has no permission to write to data streams using the new naming convention.

Example errors:

action [indices:admin/auto_create] is unauthorized for API key id [...] of user [elastic/fleet-server] on indices [metrics-elastic_agent.log_default-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]

action [indices:admin/auto_create] is unauthorized for API key id [...] of user [elastic/fleet-server] on indices [metrics-elastic_agent.system/metrics_default-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]

The existing monitoring data streams are defined in Fleet at https://github.com/elastic/kibana/blob/c5f20721e1879f1ebe161b0fa3b207f10ed2b6f7/x-pack/plugins/fleet/common/constants/agent_policy.ts#L15-L28

Monitoring permissions are generated here using this list: https://github.com/elastic/kibana/blob/19413b7daae983b95dbb9f5c7b39cb8f3578ebfa/x-pack/plugins/fleet/server/services/agent_policies/monitoring_permissions.ts#L21-L52

We need to either modify Fleet to generate index permissions for the new data stream names or we need to update the agent to use the index names expected in v1 (effectively they match the binary name).

Note that logs for the agent itself can successfully be sent to Fleet.

@kpollich
Copy link
Member

Could you confirm the list of new indices we'd need permissions for here? I'm happy to update the hardcoded list in Fleet. What I see so far from the error messages includes:

// Agent v2 datasets
'elastic_agent.filestream_monitoring',
'elastic_agent.log_default', // Why the `_default` here?
'elastic_agent.system',

It's probably helpful to look at Fleet's snapshot tests for these permissions as well. The full possible set of of permissions is

"logs-elastic_agent-testnamespace123",
"logs-elastic_agent.elastic_agent-testnamespace123",
"logs-elastic_agent.apm_server-testnamespace123",
"logs-elastic_agent.filebeat-testnamespace123",
"logs-elastic_agent.fleet_server-testnamespace123",
"logs-elastic_agent.metricbeat-testnamespace123",
"logs-elastic_agent.osquerybeat-testnamespace123",
"logs-elastic_agent.packetbeat-testnamespace123",
"logs-elastic_agent.endpoint_security-testnamespace123",
"logs-elastic_agent.auditbeat-testnamespace123",
"logs-elastic_agent.heartbeat-testnamespace123",
"logs-elastic_agent.cloudbeat-testnamespace123",
"metrics-elastic_agent-testnamespace123",
"metrics-elastic_agent.elastic_agent-testnamespace123",
"metrics-elastic_agent.apm_server-testnamespace123",
"metrics-elastic_agent.filebeat-testnamespace123",
"metrics-elastic_agent.fleet_server-testnamespace123",
"metrics-elastic_agent.metricbeat-testnamespace123",
"metrics-elastic_agent.osquerybeat-testnamespace123",
"metrics-elastic_agent.packetbeat-testnamespace123",
"metrics-elastic_agent.endpoint_security-testnamespace123",
"metrics-elastic_agent.auditbeat-testnamespace123",
"metrics-elastic_agent.heartbeat-testnamespace123",
"metrics-elastic_agent.cloudbeat-testnamespace123",

@blakerouse
Copy link
Contributor

@kpollich Is it possible to use wildcards? As there is a lot of possibilities and the new data_streams are based on what is defined in the policy.

It is defined as:

elastic_agent.%{input_type}_%{output_name}.%{monitoring_namespace}

Also how is this handled on upgrade? Will it result in a new policy revision? As existing keys will need to be upgraded, as well as not removed the older data-streams as pre-8.6 will still push to those older data-streams.

@kpollich
Copy link
Member

Is it possible to use wildcards? As there is a lot of possibilities and the new data_streams are based on what is defined in the policy.

Maybe? I don't really know how these permissions get handled once they're written to the agent policy. The Fleet code we're looking at here generates this chunk of an agent policy

output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - logs-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc

I think Fleet Server parses these permissions when it generates API keys for agent? If that's correct then wildcards should be supported as far as I know: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role.html#security-api-put-role-request-body

If I understand correctly, the reason we don't use wildcards today is a security precaution to prevent arbitrary access to Elasticsearch by agents. Whether that holds up in this case or not, I'm not sure. @joshdover can probably help out a bit here with some of the gaps in my knowledge.

Also how is this handled on upgrade? Will it result in a new policy revision? As existing keys will need to be upgraded, as well as not removed the older data-streams as pre-8.6 will still push to those older data-streams.

Yes we'd need to migrate all policies to the new version that includes any new permissions, which would generate a new revision.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 29, 2022

Just adding another note here, in the Fleet agent details page there is a view for the logs that includes a way to filter by monitoring data stream. I think these are just prepared queries using the hard coded list of data streams that exist today. This would need to be adjusted in the UI as well. If you click through to the Kibana logs view via the "Open in Logs" link you can filter on whatever you want though.

Screen Shot 2022-11-28 at 5 17 42 PM

@cmacknz
Copy link
Member Author

cmacknz commented Nov 30, 2022

I am increasingly convinced we should use the existing data streams for monitoring data in 8.6 and deal with migrating to the format we want to use afterwards in 8.7. The hard coded data streams will cause problems as we add new input types to monitor, I'm just not convinced there is a way to remove the hard coded names in the time we have left in 8.6

@kpollich
Copy link
Member

+1 to @cmacknz suggestion - I think there are enough side effects to track down as a result of these hardcoded Fleet values that it'd be best if we could solve this on the agent side for 8.6, then iterate in 8.7.

@joshdover
Copy link
Contributor

If I understand correctly, the reason we don't use wildcards today is a security precaution to prevent arbitrary access to Elasticsearch by agents. Whether that holds up in this case or not, I'm not sure. @joshdover can probably help out a bit here with some of the gaps in my knowledge.

Yes, in theory ES should accept wildcards in the index privilege list here and it should work. This would need to be tested though.

I think these are just prepared queries using the hard coded list of data streams that exist today. This would need to be adjusted in the UI as well. If you click through to the Kibana logs view via the "Open in Logs" link you can filter on whatever you want though.

I don't think that's the case. IIRC we do terms enum query on that data_stream.dataset field on logs-elastic_agent* to populate this list. Ingesting the data into a new data stream would be feasible. We also would need to update the elastic_agent integration package for all the new stream names. Ideally, we'd switch to using the dataset_is_prefix option that APM uses to have a wildcard on the index template naming scheme. This way we don't need to add new templates for every new agent process.

That said, I'm +1 to use the existing names if it all possible for this release.

@blakerouse
Copy link
Contributor

I am happy to revert 8.6 only to use old names, but 8.7 needs to use the new names for the data streams.

Wildcarding is the best path forward, I believe the Elastic Agent being able to write-only to the logs-elastic_agent.* is all that we need. That allows the monitoring namespace to be change, new spawned components and new integrations without having to generate a new API key just to ingest monitoring logs from the Elastic Agent.

@blakerouse
Copy link
Contributor

Screen Shot 2022-11-28 at 5 17 42 PM

This UI element doesn't seem hard-coded as I have mine showing the new stream names when I have my elastic-agent injecting with superuser permissions.

@blakerouse
Copy link
Contributor

Wildcard for metrics-elastic_agent.* will also be needed.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 30, 2022

Wildcarding is the best path forward, I believe the Elastic Agent being able to write-only to the logs-elastic_agent.* is all that we need.

Agreed, let's fix this for 8.6 and then open a separate issue summarizing the changes needed for 8.7. I have this issue listed as a blocker for the 8.6 release so we'll want to close it once 8.6 is fixed.

@blakerouse
Copy link
Contributor

Being this issue describes what is needed for 8.7, I would rather leave this one open and create a new one just for 8.6.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 30, 2022

Makes sense, I created #1847 for 8.6 I'll update it as the tracking issue everywhere.

@cmacknz cmacknz changed the title V2: Spawned process logs cannot be sent to fleet. V2: Elastic agent monitoring data stream names should be per input, not per process Dec 7, 2022
@cmacknz cmacknz changed the title V2: Elastic agent monitoring data stream names should be per input, not per process V2: Elastic agent monitoring data stream names should be per input not per process Dec 7, 2022
@joshdover
Copy link
Contributor

While I agree that per-process data streams doesn't fit the long-term architecture we want to provide (where the processes are implementation details and not something the user should know or care about), I think we should discuss more about what the data streams should really be to provide a good experience, especially around applying features like ILM and index authorization.

IMO we should be following the pattern laid down by the data stream naming scheme in general, that is that the data stream names correlate to what kind of data is contained in the data stream rather than where the data came from. Of course we probably still want an elastic_agent.input field for filtering, but I doubt users will want to apply different data retention policies to filestream monitoring data vs. httpjson data.

Here's a quick rough idea of groupings that could make more sense from my perspective:

  • logs-elastic_agent.core
  • logs-elastic_agent.output
  • logs-elastic_agent.input
  • metrics-elastic_agent.core
  • metrics-elastic_agent.output
  • metrics-elastic_agent.input
  • metrics-elastic_agent.cpu
  • metrics-elastic_agent.memory
  • metrics-elastic_agent.network
  • metrics-elastic_agent.filesystem

This type of grouping would allow users to retain information like input and output metrics for longer periods of time than cpu and memory metrics of the agent. This may be useful for more general ingest performance tracking over time to help pinpoint data sources that may be causing a lot more noise or have seasonal patterns. cpu and memory don't have that same useful shelf-life IMO as they're more useful for debugging interactively something that you're trying to optimize.

@joshdover
Copy link
Contributor

cc @nimarezainia I think you likely have useful info to weigh in here

@cmacknz cmacknz changed the title V2: Elastic agent monitoring data stream names should be per input not per process V2: Elastic agent monitoring data stream names should not be per process Dec 8, 2022
@blakerouse
Copy link
Contributor

@joshdover At the moment we map the data streams to the component.dataset. I like @joshdover idea on its probably not something customers will want that level of control over in the case of an ILM for a specific component. We can still offer the filtering in the log viewer but with the ILM mapping to similar roles listed above.

@nimarezainia
Copy link
Contributor

+1 on this approach. CPU and Mem metrics don't need longevity and giving users the ability to apply other Elastic features like ILM to the data they collect is important.

Question:
As an example What grouping would "logs-elastic_agent.apm_server-testnamespace123" map to under this proposal?

@joshdover
Copy link
Contributor

As an example What grouping would "logs-elastic_agent.apm_server-testnamespace123" map to under this proposal?

Those logs would go to logs-elastic_agent.input-testnamespace123 with a field like unit: apm-server for filtering.

@jlind23
Copy link
Contributor

jlind23 commented Jan 5, 2023

@blakerouse @cmacknz What is the status here?
@joshdover will your PR fix all the problems mentioned here?

@blakerouse
Copy link
Contributor

@jlind23 We just need to come to a 100% agreement on the index names we want to use. Then we need to make the change in the Elastic Agent as well as the permissions that are requested for the API key.

@jlind23 jlind23 assigned cmacknz and unassigned blakerouse Jan 5, 2023
@jlind23 jlind23 removed the v8.7.0 label Jan 5, 2023
@cmacknz
Copy link
Member Author

cmacknz commented Jan 5, 2023

I think Josh's suggestions in #1814 (comment) are what we should go with, nobody seems opposed to it.

@cmacknz cmacknz removed their assignment Jan 5, 2023
@cmacknz cmacknz added the v8.8.0 label Jan 5, 2023
@joshdover
Copy link
Contributor

@joshdover will your PR fix all the problems mentioned here?

My PR doesn't address this. Agree we should be unblocked to move forward. It will require changes across Agent, Fleet Server, and the elastic_agent package.

@blakerouse
Copy link
Contributor

The more I think about this structure I do see an issue.

What does into the *.output data stream? Only the shipper logs? At the moment the beats do not place unit.id in its log message so the Elastic Agent could identify the difference between a log message for an input versus and output log.

Do we need to seperate the *.input and *.output events? Maybe we could just go with *.component and all component info go into that index?

@cmacknz
Copy link
Member Author

cmacknz commented Jan 9, 2023

Do we need to seperate the *.input and *.output events?

It seems like a useful separation to me, particularly for allowing output metrics to be separated and with a possibly different ILM policy than input metrics. The output performance metrics can be quite critical to diagnosing performance issues and tuning ingest performance.

I do agree that we are unlikely to use the logs-elastic_agent.output initially since we'd have to modify Beats to write to it. The first use of this will be the shipper in the near-ish future. I don't know that it is a problem for this to be empty initially.

@blakerouse
Copy link
Contributor

Well being that beats cannot be divided easily the .input and .output I believe that calling it .input would be incorrect.

@cmacknz
Copy link
Member Author

cmacknz commented Jan 9, 2023

The Beats logs could likely be divided with enough effort but it's probably not worth doing. This input and output division problem is solved by the introduction of the shipper, so we could just not implement this until the shipper is ready.

We could go with components instead of the input and output division, but I wouldn't want to do that if it is just to work around a temporary state of the V2 architecture where the shipper isn't usable yet.

@blakerouse
Copy link
Contributor

I like the idea of just placing them all under components. Do we really see ILM needed between a shipper and a component? To me they seem like the would almost always share the same lifecycle.

@cmacknz
Copy link
Member Author

cmacknz commented Jan 24, 2023

The output/shipper logs and metrics will have a much stronger correlation with overall ingest performance, so to me they will always be more important and it seems valuable to separate them out for that reason.

We could combine everything into a components index to start and then split the output index out later I suppose. That is definitely much easier to start and avoids having to deal with routing the Beat output metrics and logs to the right index initially.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team V2-Architecture
Projects
None yet
Development

No branches or pull requests

7 participants