Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Fix bug with duplicate Fleet Server inputs in Cloud deployments #119925

Merged

Conversation

kpollich
Copy link
Member

Summary

Fixes #119523

Fixes a bug in our input lookup logic that causes us to create duplicate fleet-server inputs on package policies when upgrading policies in Elastic Cloud created in v7.13.

The root cause here was our lookup logic in cases where policy_template was undefined on a package policy inputs. When the "incoming" input has a policy_template and we don't find an existing input with policy_template at all (not just a mismatched template), we should use that existing input. This guarantees that policies from Kibana versions prior to #101531 in which policy_template was not always provided can still be upgraded.

How to test

  1. Comment out some code in package_to_package_policy.ts to prevent package policies from having policy_template set at all.
diff --git a/x-pack/plugins/fleet/common/services/package_to_package_policy.ts b/x-pack/plugins/fleet/common/services/package_to_package_policy.ts
index 0d40adb4bf7..8586746c160 100644
--- a/x-pack/plugins/fleet/common/services/package_to_package_policy.ts
+++ b/x-pack/plugins/fleet/common/services/package_to_package_policy.ts
@@ -72,7 +72,7 @@ export const packageToPackagePolicyInputs = (
   const hasIntegrations = doesPackageHaveIntegrations(packageInfo);
   const inputs: NewPackagePolicyInput[] = [];
   const packageInputsByPolicyTemplateAndType: {
-    [key: string]: RegistryInput & { data_streams?: string[]; policy_template: string };
+    [key: string]: RegistryInput & { data_streams?: string[] /* policy_template: string */ };
   } = {};
 
   packageInfo.policy_templates?.forEach((packagePolicyTemplate) => {
@@ -83,7 +83,7 @@ export const packageToPackagePolicyInputs = (
         ...(packagePolicyTemplate.data_streams
           ? { data_streams: packagePolicyTemplate.data_streams }
           : {}),
-        policy_template: packagePolicyTemplate.name,
+        // policy_template: packagePolicyTemplate.name,
       };
       packageInputsByPolicyTemplateAndType[inputKey] = input;
     });
@@ -128,15 +128,15 @@ export const packageToPackagePolicyInputs = (
     if (
       enableInput &&
       hasIntegrations &&
-      integrationToEnable &&
-      integrationToEnable !== packageInput.policy_template
+      integrationToEnable
+      // integrationToEnable !== packageInput.policy_template
     ) {
       enableInput = false;
     }
 
     const input: NewPackagePolicyInput = {
       type: packageInput.type,
-      policy_template: packageInput.policy_template,
+      // policy_template: packageInput.policy_template,
       enabled: enableInput,
       streams: streamsForInput,
     };
  1. Define the following config in your kibana.dev.yml to mimic the "Elastic Agent on Cloud" policy we use in Cloud:
xpack.fleet.agents.elasticsearch.host: http://localhost:9200
xpack.fleet.agents.kibana.host: http://localhost:5601

logging:
 loggers:
    - name: plugins.fleet
      appenders: [console]
      level: debug


# Cloud Agent policy from https://github.com/elastic/cloud-assets/blob/master/stackpack/kibana/config/kibana.yml
xpack.fleet.packages:
  - name: apm
    version: latest
  - name: fleet_server
    version: 0.9.1
  - name: system
    version: latest
xpack.fleet.outputs:
  - name: 'ES containerhost output'
    type: 'elasticsearch'
    id: 'es-containerhost'
    hosts: ["http://localhost:9200"]
xpack.fleet.agentPolicies:
  # Cloud Agent policy
  - name: Elastic Cloud agent policy
    description: Default agent policy for agents hosted on Elastic Cloud
    id: policy-elastic-agent-on-cloud

    # set the internal containerhost URL to ES
    data_output_id: es-containerhost
    monitoring_output_id: es-containerhost

    is_default: false
    is_managed: true
    is_default_fleet_server: true

    namespace: default
    monitoring_enabled: []
    # If a user scales up / down the Elastic Agent in Cloud, the old
    # instances will still be shown as offline for 24h
    # The reason for having a value such high is to prevent fleet-servers to self unenroll to soon
    # in case checkin cannot happen for a longer time period.
    unenroll_timeout: 86400 # 1 day TTL

    package_policies:
      - name: Fleet Server
        package:
          name: fleet_server
        inputs:
          - type: fleet-server
            keep_enabled: true

            vars:
              - name: host
                value: http://localhost:8220
                frozen: true
              - name: port
                value: 8220
                frozen: true
              - name: custom
                value: |
                  server.runtime:
                    gc_percent: 20          # Force the GC to execute more frequently: see https://golang.org/pkg/runtime/debug/#SetGCPercent
  # Default policy
  - name: Default policy
    description: Default agent policy created by Kibana

    is_default: true
    is_managed: false

    namespace: default
    monitoring_enabled:
      - logs
      - metrics

    package_policies:
      - name: system-1
        package:
          name: system
  1. From a clean Elasticsearch snapshot, start up and Kibana and ensure Fleet setup completes successfully

  2. Stash your changes to package_to_package_policy.ts or uncomment the code above. We just need this commented out during setup to reproduce the issue with the missing policy_template value 🙂

  3. Grab the ID of the "Fleet Server" package policy under your "Elastic Agent on Cloud" agent policy from the Kibana UI, and run the following in dev tools to confirm that policy_template is not set:

GET .kibana/_doc/ingest-package-policies:<package policy id>
  1. Update Fleet Server with "Upgrade integration policies" checked from the integration settings page.

  2. Verify that the upgrade was successful, and that only a single fleet_server input is defined on the agent policy

  3. Run the same dev tools query as above, and confirm that the policy data now includes the expected policy_template value.

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.
@kpollich kpollich added release_note:fix v8.0.0 Team:Fleet Team label for Observability Data Collection Fleet team auto-backport Deprecated - use backport:version if exact versions are needed v8.1.0 labels Nov 30, 2021
@kpollich kpollich requested a review from nchaulet November 30, 2021 01:09
@kpollich kpollich requested a review from a team as a code owner November 30, 2021 01:09
@kpollich kpollich self-assigned this Nov 30, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I tested an upgrade and it seems to fix the issue 🚀 we already have some unit test to that function maybe we can add this scenario

@kpollich
Copy link
Member Author

Looks good to me, I tested an upgrade and it seems to fix the issue 🚀 we already have some unit test to that function maybe we can add this scenario

Good call. I will add some tests here tomorrow morning and then this should be good to land.

I don't think this will be easy to backport to 7.16.x because of refactoring in this file - are we okay with this or do we need a "manual backport" fix to land in a patch release for 7.16?

@kpollich
Copy link
Member Author

Added tests in 98adb6c

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kpollich

@kpollich
Copy link
Member Author

Waiting on #119971 to land then adding backport label for 7.16 to this

@kpollich kpollich merged commit b2dbac9 into elastic:main Nov 30, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 30, 2021
…ts (elastic#119925)

* Fix bug with duplicate Fleet Server inputs

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.

* Set policy_template when possible

* Add tests for duplicate input case
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 30, 2021
…ts (elastic#119925)

* Fix bug with duplicate Fleet Server inputs

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.

* Set policy_template when possible

* Add tests for duplicate input case
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
8.0
7.16

The backport PRs will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Nov 30, 2021
…ts (#119925) (#120014)

* Fix bug with duplicate Fleet Server inputs

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.

* Set policy_template when possible

* Add tests for duplicate input case

Co-authored-by: Kyle Pollich <[email protected]>
kibanamachine added a commit that referenced this pull request Nov 30, 2021
…ts (#119925) (#120015)

* Fix bug with duplicate Fleet Server inputs

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.

* Set policy_template when possible

* Add tests for duplicate input case

Co-authored-by: Kyle Pollich <[email protected]>
@kpollich kpollich deleted the 119523-fix-duplicate-fleet-server-inputs branch December 1, 2021 17:33
TinLe pushed a commit to TinLe/kibana that referenced this pull request Dec 22, 2021
…ts (elastic#119925)

* Fix bug with duplicate Fleet Server inputs

Rework logic around looking up `originalInput` value to handle cases in
which `policy_template` is `undefined` on the policy's input object.

* Set policy_template when possible

* Add tests for duplicate input case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team v7.16.1 v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Elastic Agent on Cloud policy incurs two Fleet Server inputs on policy upgrade
5 participants