This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Synchronize the shipper configuration model with what is implemented in the agent #161

Closed
Tracked by #16
cmacknz opened this issue Nov 4, 2022 · 12 comments · Fixed by #185
Labels
estimation:Week · Team:Elastic-Agent · v8.7.0

Comments

cmacknz (Member) commented Nov 4, 2022

The shipper currently defines a configuration file with the following format: https://github.com/elastic/elastic-agent-shipper/blob/main/elastic-agent-shipper.yml

This configuration file was implemented before the design for integrating the shipper into the agent was finalized. We need to update it to match the final design. Specifically, it was decided that:

  1. The agent will provide the shipper with a copy of the entire agent policy, so that it can parse out the fields it needs to implement features that depend on input configuration, such as processing. See Provide the relevant subset of the agent policy to the shipper as its configuration elastic-agent#617 (comment).
  2. The shipper will be enabled on a per-output basis by including a shipper: sub-object in the configuration for each agent output. All new shipper-specific configuration should be nested under this shipper: key, including the queue configuration. See Define a feature flag for enabling the shipper in the agent policy, defaulting to false elastic-agent#217 (comment). For example:
outputs:
  default:
    type: elasticsearch
    hosts: [https://localhost:9200]
    shipper:
      enabled: true
      queue: {}
  3. The shipper's control protocol connection will receive both input and output units. The output units will match the output section of the agent policy. There will be one input unit per connected component (process or Beat), configuring the gRPC connection between that component and the shipper, including the mTLS certificate to use. A rough sketch of these units follows below.
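For illustration only, here is a rough sketch of the two kinds of units a single shipper process might receive over the control protocol. Every field name below is an assumption made for readability, not the agreed unit schema:

# Hypothetical sketch; field names are illustrative, not the final unit format.
# Output unit: mirrors one output section of the agent policy.
output_unit:
  type: elasticsearch
  hosts: [https://localhost:9200]
  shipper:
    enabled: true
    queue: {}

# Input unit: one per connected component (Beat process), describing that
# component's gRPC connection to the shipper, including its mTLS material.
input_unit:
  id: filestream-default               # illustrative component/unit ID
  server: /run/shipper-default.sock    # same listen address in every input unit
  ssl:
    certificate: /path/to/component.crt
    key: /path/to/component.key
    certificate_authorities: [/path/to/ca.crt]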

These changes are implemented in elastic/elastic-agent#1527, which adds support for the shipper to the agent.

The scope of this issue is to incorporate the new shipper configuration model into the shipper and to test that it is compatible with the agent changes in elastic/elastic-agent#1527.

Key cases to test are:

  1. Multiple components (Beat processes) can successfully connect to a single shipper instance and publish events.
  2. Multiple outputs can be configured in an agent policy, each individually configuring a data shipper. Ensure that each individual shipper process is started and configured correctly (a policy sketch illustrating this follows below).
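As a concrete illustration of the second case, here is a minimal agent policy sketch, assuming the shipper: sub-object format shown above; host values and queue settings are illustrative. With two outputs, each enabling the shipper, the agent would start two independent shipper processes.

# Illustrative policy fragment: two outputs, each enabling the shipper,
# resulting in two independent shipper processes.
outputs:
  default:
    type: elasticsearch
    hosts: [https://localhost:9200]
    shipper:
      enabled: true
      queue:
        mem:
          events: 4096   # illustrative queue tuning
  secondary:
    type: logstash
    hosts: [localhost:5044]
    shipper:
      enabled: true
      queue: {}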
cmacknz (Member, Author) commented Nov 8, 2022

This is the Fleet UI implementation issue for supporting the shipper in the agent policy: elastic/kibana#141508

In particular, I've provided some examples of how the shipper queue and output parameters can be configured in elastic/kibana#141508 (comment).

fearful-symmetry (Contributor) commented:

> The shipper will be enabled on a per output basis by including a shipper: sub-object in the configuration for each agent output.

@cmacknz is the plan to support multiple outputs in that config? Right now it's built on the assumption that we'll get one "main" output config that determines the behavior of the shipper.

cmacknz (Member, Author) commented Nov 14, 2022

When an agent policy configures multiple outputs, the agent will start multiple independent shipper processes. Each shipper process will have a single output type configured (ES, Logstash, Kafka, etc.).

We may support multiple outputs in a single shipper process in the far future. For now we do not.

fearful-symmetry (Contributor) commented Nov 17, 2022

Alright, updating this with the current state and open issues, since it's a fairly large set of changes to be made:

  • Still figuring out how to handle the config coming from the input units. The shipper wasn't designed on the assumption that each input unit would get its own gRPC socket, so there will be a certain amount of retooling of the gRPC server code.
  • Logging config is a bit hacky while we wait for Capture stdout/stderr of spawned components elastic-agent#1702.
  • My ability to test this accurately is somewhat limited, since Fleet doesn't seem to generate any kind of shipper config, so I'm just adding it myself.
  • The output config could do with hard-coded shipper settings: Namespace config values in fleet output elastic-agent#1729.
  • The input settings could use hard-coded fields for shipper-specific connection config: Input configs from units should hard-code shipper connection settings elastic-agent#1744.
  • Right now, elastic-agent spins up two instances of the shipper (a normal one, and one for monitoring), and I'm a tad worried that certain network ports might step on each other.
  • I'm not sure how certain auxiliary things, like expvar monitoring, will be configured. The output unit? The CLI?

cmacknz (Member, Author) commented Nov 17, 2022

> …that each input unit would get its own gRPC socket, so there will be a certain amount of retooling of the gRPC server code

There should be a single gRPC server in the shipper, which accepts a connection from each Beat. In agent terminology there should be a connection per component (process), not per unit (input). If this isn't what is being configured we should consider changing it.

> Right now, elastic-agent spins up two instances of the shipper (a normal one, and one for monitoring), and I'm a tad worried that certain network ports might step on each other

We shouldn't be configuring a second shipper for monitoring, unless monitoring is shipping to a separate Elasticsearch cluster.

> I'm not sure how certain auxiliary things, like expvar monitoring, will be configured. The output unit? The CLI?

I would follow what Beats does; I haven't looked at it in a while, so I don't remember off the top of my head what that is. Follow what is set in the agent.monitoring section of the policy (a sketch follows below).
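For reference, a hedged sketch of the agent.monitoring section of an agent policy; the keys shown are the commonly documented ones, and exact names and defaults should be treated as approximate:

# Sketch of the agent.monitoring policy section (approximate, for orientation).
agent.monitoring:
  enabled: true
  use_output: default   # output that receives monitoring data
  logs: true            # collect logs from the agent and its components
  metrics: true         # collect metrics from the agent and its components
  http:
    enabled: false      # local HTTP endpoint exposing liveness/metrics stats
    host: localhost
    port: 6791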

fearful-symmetry (Contributor) commented Nov 17, 2022

@cmacknz

> There should be a single gRPC server in the shipper, which accepts a connection from each Beat. In agent terminology there should be a connection per component (process), not per unit (input)

Ah, sorry, still learning the logic of how input configs work. The fake shipper used for testing elastic-agent spins up a new server with every input, using the TLS and server settings specified in that input unit's config. Looking at the configs I'm seeing as I develop against elastic-agent, a given instance of the shipper gets the same server endpoint in every input config, which suggests we can use any one input config to spin up the shipper's gRPC server. However, it also implies that different inputs can potentially expect different gRPC server endpoints. Might need some clarification from @blakerouse here.

> We shouldn't be configuring a second shipper for monitoring, unless monitoring is shipping to a separate Elasticsearch cluster.

Right now we're starting two instances of the shipper, one with the ID shipper-default and another with shipper-monitoring:

ps aux | grep shipper
alexk    1499610  0.0  0.0 1691352 27096 pts/17  Sl+  19:36   0:00 /home/alexk/go/src/github.com/elastic/elastic-agent/build/distributions/elastic-agent-8.6.0-linux-x86_64/data/elastic-agent-8eb334/components/shipper -E logging.level=debug -E logging.files.path=./ -E logging.files.name=shipper-hack -d * -E path.data=/home/alexk/go/src/github.com/elastic/elastic-agent/build/distributions/elastic-agent-8.6.0-linux-x86_64/data/elastic-agent-8eb334/run/shipper-default
alexk    1499675  0.0  0.0 1765852 27880 pts/17  Sl+  19:36   0:00 /home/alexk/go/src/github.com/elastic/elastic-agent/build/distributions/elastic-agent-8.6.0-linux-x86_64/data/elastic-agent-8eb334/components/shipper -E logging.level=debug -E logging.files.path=./ -E logging.files.name=shipper-hack -d * -E path.data=/home/alexk/go/src/github.com/elastic/elastic-agent/build/distributions/elastic-agent-8.6.0-linux-x86_64/data/elastic-agent-8eb334/run/shipper-monitoring

They seem to get the same output config, but different input units, presumably based on what inputs are used for cluster self-monitoring.

cmacknz (Member, Author) commented Nov 17, 2022

Got it, thanks for clarifying.

> However, it also implies that different inputs can potentially expect different gRPC server endpoints. Might need some clarification from @blakerouse here.

The way to view this is that the agent is incredibly flexible in what it can provision. In the extreme, each individual input is its own process connected to an input-specific shipper. From an architecture perspective it makes sense to have this level of flexibility.

However, in practice we aren't going to do that: we are going to connect each Beat process (component) to a shipper, not each input, at least in the current iteration of V2.

> They seem to get the same output config, but different input units, presumably based on what inputs are used for cluster self-monitoring.

This makes sense. It is certainly easier to provision a monitoring shipper and a non-monitoring shipper unconditionally, but I don't think we should be creating two shipper instances (and therefore two queues and outputs to be tuned) unless the user explicitly wants to send monitoring to a different instance.

Regardless, this is something to address in the agent, and you can ignore it for the purpose of this issue.

blakerouse commented:

> @cmacknz
>
> > There should be a single gRPC server in the shipper, which accepts a connection from each Beat. In agent terminology there should be a connection per component (process), not per unit (input)
>
> Ah, sorry, still learning the logic of how input configs work. The fake shipper used for testing elastic-agent spins up a new server with every input, using the TLS and server settings specified in that input unit's config. Looking at the configs I'm seeing as I develop against elastic-agent, a given instance of the shipper gets the same server endpoint in every input config, which suggests we can use any one input config to spin up the shipper's gRPC server. However, it also implies that different inputs can potentially expect different gRPC server endpoints. Might need some clarification from @blakerouse here.

The fake shipper is very simple, and you are correct: in its implementation it would start multiple gRPC listeners, one per unit, but the tests know that there will only ever be one, so it doesn't really need to worry about that. The real shipper should only start one listener per process. The "server" address will always be the same for each input unit; the certificates will be different. You can use the connecting address to determine the correct certificate to serve. Here is an example of how the Elastic Agent does that for its control protocol:

https://github.com/elastic/elastic-agent/blob/main/pkg/component/runtime/manager.go#L718
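A minimal Go sketch of the pattern described above, not the actual manager.go implementation: a single TLS listener whose configuration is chosen per connection via tls.Config.GetConfigForClient, keyed by the client's connecting address. The selector type and its map are illustrative; the real shipper would populate them from the mTLS material delivered in the input units.

// Sketch only: one listener, per-connection TLS config selected by the
// client's remote address. Types and field names here are illustrative.
package shippertls

import (
	"crypto/tls"
	"sync"
)

type tlsSelector struct {
	mu      sync.RWMutex
	configs map[string]*tls.Config // per-client TLS config, keyed by remote address
}

// serverConfig returns the base TLS config for the single gRPC listener.
// GetConfigForClient lets one listener present a different server certificate
// (and verify a different client CA) for each connecting component.
func (s *tlsSelector) serverConfig() *tls.Config {
	return &tls.Config{
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			s.mu.RLock()
			defer s.mu.RUnlock()
			// hello.Conn is the underlying connection; its remote address
			// identifies which component is connecting.
			if cfg, ok := s.configs[hello.Conn.RemoteAddr().String()]; ok {
				return cfg, nil
			}
			return nil, nil // nil falls back to this base config
		},
	}
}

Returning nil from GetConfigForClient tells crypto/tls to fall back to the base config, so a default certificate could be installed there for clients that aren't in the map.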

blakerouse commented:

> Regardless, this is something to address in the agent, and you can ignore it for the purpose of this issue.

@cmacknz We will need to fix how the monitoring output configuration is added to the Elastic Agent. At the moment it is defined as a separate output, which is why the Elastic Agent is performing that behavior.

cmacknz (Member, Author) commented Nov 29, 2022

Draft PR for shipper support in Fleet: elastic/kibana#145755

fearful-symmetry (Contributor) commented:

@blakerouse thanks for the clarification, I got somewhat confused just looking at the code.

cmacknz (Member, Author) commented Jan 10, 2023

> @cmacknz We will need to fix how the monitoring output configuration is added to the Elastic Agent. At the moment it is defined as a separate output, which is why the Elastic Agent is performing that behavior.

I created an issue to track this change: elastic/elastic-agent#2078
