
APM switch to Elastic Agent fails with cryptic error #121934

Closed
simitt opened this issue Dec 23, 2021 · 26 comments
Labels: bug, Team:Fleet, v7.16.2

Comments

@simitt
Contributor

simitt commented Dec 23, 2021

Stack version: 7.16.2

Describe the issue:
A user reported an error when navigating to Kibana/APM/Settings/Schema and initiating the switch to the APM integration.
From the Kibana logs:

YAMLException: name of an alias node must contain at least one character at line 27, column 14:
              - *
                 ^
    at generateError (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:167:10)
    at throwError (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:173:9)
    at readAlias (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1270:5)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1368:20)
    at readBlockMapping (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1036:16)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockSequence (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:955:5)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1358:12)
    at readBlockMapping (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockMapping (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockMapping (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readDocument (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1525:3)
    at loadDocuments (/usr/share/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1588:5) {
  reason: 'name of an alias node must contain at least one character',
  mark: Mark {
    name: null,
    buffer: 'apm-server:\n' +
      '    auth:\n' +
      '        anonymous:\n' +
      '            allow_agent:\n' +
      '            allow_service:\n' +
      '            enabled: \n' +
      '            rate_limit:\n' +
      '                event_limit: 10\n' +
      '                ip_limit: \n' +
      '        api_key:\n' +
      '            enabled: true\n' +
      '            limit: \n' +
      '        secret_token: xyz\n' +
      '    capture_personal_data: \n' +
      '    idle_timeout: \n' +
      '    default_service_environment: \n' +
      '    expvar.enabled: \n' +
      '    host: 0.0.0.0:8200\n' +
      '    max_connections: \n' +
      '    max_event_size: \n' +
      '    max_header_size: \n' +
      '    read_timeout: 3600\n' +
      '    response_headers: \n' +
      '    rum:\n' +
      '        allow_headers:\n' +
      '        allow_origins:\n' +
      '          - *\n' +
      '        enabled: true\n' +
      '        exclude_from_grouping: \n' +
      '        library_pattern: \n' +
      '        response_headers: \n' +
      '    shutdown_timeout: 30s\n' +
      '    ssl:\n' +
      '        enabled: true\n' +
      '        certificate: /app/config/certs/node.crt\n' +
      '        key: /app/config/certs/node.key\n' +
      '        key_passphrase: \n' +
      '        supported_protocols:\n' +
      '        cipher_suites:\n' +
      '        curve_types:\n' +
      '    write_timeout: \n' +
      '\x00',
    position: 606,
    line: 26,
    column: 13
  }
}

Steps to reproduce:
It is not clear how to reproduce this, but the logs contain a lot of data, which might be enough to find the root cause of the issue.

Errors in browser console (if relevant):
(Screenshot, 2021-12-22 at 14.50.12)

@simitt simitt added bug Fixes for quality problems that affect the customer experience Team:APM All issues that need APM UI Team support apm:fleet v7.16.2 labels Dec 23, 2021
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@ruflin
Contributor

ruflin commented Jan 31, 2022

@ogupte Did you manage to reproduce this? Have we seen other cases of this? If not, I would suggest closing this, assuming it was an error that has since been resolved in Fleet or similar (@joshdover).

@nugroho-exp

@ruflin I still have the same issue. We are on v7.16.2 on the Elastic Cloud service.


@ruflin
Contributor

ruflin commented Jan 31, 2022

@nugroho-expereo Thanks for the heads up. We will definitely keep it open then.

@joshdover
Contributor

@nugroho-expereo or @simitt do you know what your APM alias node field was set to before trying to run this upgrade?

@nugroho-exp

@joshdover No, I didn't set anything before running the switch. How do I know whether the alias node field is set?

FYI, we set up APM in v7.11 and upgraded through 7.12, 7.13, 7.14, 7.15, and 7.16 using the Elasticsearch Service console.

@simitt
Contributor Author

simitt commented Feb 2, 2022

@joshdover I wasn't able to reproduce, so cannot offer further advice unfortunately.

@ogupte
Contributor

ogupte commented Feb 28, 2022

I'm also unable to reproduce this error in EC.

I'm curious to inspect your saved APM schema object via GET .kibana/_doc/apm-server-schema:apm-server-schema. There may be a particular setting that should be transformed before it is passed to Fleet for the new Cloud policy.

We don't do any YAML conversions in this APM UI code, but I'm guessing the YAML error originates from the compiled input as it is converted to policy YAML in Fleet?
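To find which saved schema values could trip the YAML parser, a scan along these lines might help. It is a sketch that assumes the schema has already been pulled out into the flat key/value form used in the apm_server_schema request body in the reproduction below; `findAliasLikeValues` and `startsWithAnchorOrAlias` are hypothetical names, not Kibana functions.

```typescript
// Hypothetical helper: list schema keys whose string values (or array
// elements) begin with "*" or "&". Unquoted, such values are read as YAML
// aliases or anchors once they are rendered into a policy template.
type SchemaValue = string | number | boolean | SchemaValue[];
type Schema = Record<string, SchemaValue>;

function startsWithAnchorOrAlias(value: string): boolean {
  return value.startsWith("*") || value.startsWith("&");
}

function findAliasLikeValues(schema: Schema): string[] {
  const offending: string[] = [];
  for (const [key, value] of Object.entries(schema)) {
    // Treat scalars and arrays uniformly by looking at every element.
    const items: SchemaValue[] = Array.isArray(value) ? value : [value];
    if (items.some((item) => typeof item === "string" && startsWithAnchorOrAlias(item))) {
      offending.push(key);
    }
  }
  return offending;
}

// Example using a subset of the schema from the reproduction in this thread.
const schema: Schema = {
  "apm-server.host": "0.0.0.0:8200",
  "apm-server.rum.enabled": true,
  "apm-server.rum.allow_origins": ["*"],
  "apm-server.shutdown_timeout": "30s",
};
console.log(findAliasLikeValues(schema)); // one entry: "apm-server.rum.allow_origins"
```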

@ogupte
Contributor

ogupte commented Mar 1, 2022

After working through a known-bad schema, I was finally able to reproduce the issue consistently.

The setup

curl --request POST \
  --url http://localhost:5603/api/apm/fleet/apm_server_schema \
  --header 'Authorization: Basic YWRtaW46Y2hhbmdlbWU=' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: true' \
  --data '{
	"schema": {
		"apm-server.host": "0.0.0.0:8200",
		"apm-server.secret_token": "asdfkjhasdf",
		"apm-server.api_key.enabled": true,
		"apm-server.read_timeout": 3600,
		"apm-server.register.ingest.pipeline.enabled": true,
		"apm-server.rum.enabled": true,
		"apm-server.rum.allow_origins": [
			"*"
		],
		"apm-server.rum.rate_limit": 10,
		"apm-server.shutdown_timeout": "30s",
		"logging.level": "error",
		"logging.metrics.enabled": false,
		"queue.mem.events": 2000,
		"queue.mem.flush.min_events": 267,
		"queue.mem.flush.timeout": "1s",
		"setup.template.settings.index.auto_expand_replicas": "0-1",
		"setup.template.settings.index.number_of_replicas": 1,
		"setup.template.settings.index.number_of_shards": 1
	}
}'

One of the fields in the saved schema object is "apm-server.rum.allow_origins": [ "*" ], so APM UI sets [ "*" ] as the value of the rum_allow_origins input var. It seems that a YAML conversion takes place in the Fleet plugin API call (packagePolicyService.create), which fails with the following error:

server    log   [00:37:15.910] [info][apm][plugins] Fleet migration on Cloud - apmPackagePolicy create start
server    log   [00:37:17.347] [error][apm][plugins] YAMLException: name of an alias node must contain at least one character at line 27, column 14:
              - *
                 ^
    at generateError (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:167:10)
    ...
    at loadDocuments (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1588:5) {
  reason: 'name of an alias node must contain at least one character',
  mark: Mark {
    name: null,
    buffer: 'apm-server:\n' +
      '    auth:\n' +
      '        anonymous:\n' +
      '            allow_agent:\n' +
      ...
      '    response_headers: \n' +
      '    rum:\n' +
      '        allow_headers:\n' +
      '        allow_origins:\n' +
      '          - *\n' +
      '        enabled: true\n' +
      '        exclude_from_grouping: \n' +
      '        library_pattern: \n' +
      '        response_headers: \n' +
      '    shutdown_timeout: 30s\n' +
      ...
  }
}

The problem

It seems that the YAML parser sees - * as a malformed alias reference (aliases are always prefixed with *). The fix would be to wrap the * in quotes before saving the APM schema to Kibana ("apm-server.rum.allow_origins": [ "\"*\"" ],). Once the * is wrapped in quotes, the migration completes successfully.

Note that the rum_allow_origins input var already defines its default value as '"*"', with the asterisk (*) wrapped in quotes. Also, the Edit integration page in Fleet fails validation when a value starts with an asterisk (*) unless it is wrapped in quotes.

The fix

Wrapping the values in quotes can be done either in APM Server, before it saves the object to Kibana, or in Kibana, by transforming the invalid fields into valid ones when it creates the new package policy.
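For the Kibana-side option, a minimal sketch of the transform could wrap every string that starts with * in double quotes, which yields the [ "\"*\"" ] value described above. `quoteLeadingAsterisks` is a hypothetical name and the flat schema shape is an assumption; like the manual fix, it does not escape quotes that already appear inside a value.

```typescript
// Hypothetical Kibana-side transform applied to the saved APM schema before the
// package policy is created. Strings that start with "*" are wrapped in double
// quotes so the rendered YAML contains - "*" instead of the alias-like - *.
type SchemaValue = string | number | boolean | SchemaValue[];
type Schema = Record<string, SchemaValue>;

function quoteIfLeadingAsterisk(value: SchemaValue): SchemaValue {
  if (Array.isArray(value)) {
    // Recurse into list values such as apm-server.rum.allow_origins.
    return value.map(quoteIfLeadingAsterisk);
  }
  if (typeof value === "string" && value.startsWith("*")) {
    return `"${value}"`;
  }
  return value;
}

function quoteLeadingAsterisks(schema: Schema): Schema {
  const result: Schema = {};
  for (const [key, value] of Object.entries(schema)) {
    result[key] = quoteIfLeadingAsterisk(value);
  }
  return result;
}

const fixed = quoteLeadingAsterisks({ "apm-server.rum.allow_origins": ["*"] });
console.log(JSON.stringify(fixed)); // {"apm-server.rum.allow_origins":["\"*\""]}
```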

@joshdover
Contributor

Interesting find, thanks for the very detailed investigation and write-up @ogupte.

Also the Edit integration page in Fleet fails validation when a user prefixes a value with an asterisk * unless it's wrapped in quotes.

This makes me think we should solve this on the Fleet side by wrapping all input string values in quotes when rendering the Handlebars templates, before we parse the documents as YAML. I can't think of a reason not to do this.

@ogupte is there an extended stack trace that would show us where in the Fleet code this is happening? My guess is it's part of the compileTemplate function

@simitt
Contributor Author

simitt commented Mar 1, 2022

+1 on fixing this directly in Fleet. For context, there were earlier issues with YAML parsing in Fleet that required escaping * (#91401).

@ogupte
Contributor

ogupte commented Mar 1, 2022

@ogupte is there an extended stack trace that would show us where in the Fleet code this is happening? My guess is it's part of the compileTemplate function

Unfortunately not. The YAML exception must be wrapped, thrown out of band, or otherwise obscured. :(

@ogupte
Contributor

ogupte commented Mar 1, 2022

Here's the full stack trace from the logs for reference; it doesn't seem very helpful:

server    log   [00:37:15.910] [info][apm][plugins] Fleet migration on Cloud - apmPackagePolicy create start
server    log   [00:37:17.347] [error][apm][plugins] YAMLException: name of an alias node must contain at least one character at line 27, column 14:
              - *
                 ^
    at generateError (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:167:10)
    at throwError (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:173:9)
    at readAlias (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1270:5)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1368:20)
    at readBlockMapping (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1036:16)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockSequence (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:955:5)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1358:12)
    at readBlockMapping (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockMapping (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readBlockMapping (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1089:11)
    at composeNode (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1359:12)
    at readDocument (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1525:3)
    at loadDocuments (/Users/ogupte/github/kibana/node_modules/js-yaml/lib/js-yaml/loader.js:1588:5) {
  reason: 'name of an alias node must contain at least one character',
  mark: Mark {
    name: null,
    buffer: 'apm-server:\n' +
      '    auth:\n' +
      '        anonymous:\n' +
 [redacted]
      '        curve_types:\n' +
      '    write_timeout: \n' +
      '\x00',
    position: 606,
    line: 26,
    column: 13
  }
}
server   error  [00:37:15.871]  Error: Internal Server Error
    at HapiResponseAdapter.toError (/Users/ogupte/github/kibana/src/core/server/http/router/response_adapter.ts:130:19)
    at HapiResponseAdapter.toHapiResponse (/Users/ogupte/github/kibana/src/core/server/http/router/response_adapter.ts:79:19)
    at HapiResponseAdapter.handle (/Users/ogupte/github/kibana/src/core/server/http/router/response_adapter.ts:71:17)
    at Router.handle (/Users/ogupte/github/kibana/src/core/server/http/router/router.ts:276:34)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at handler (/Users/ogupte/github/kibana/src/core/server/http/router/router.ts:230:13)
    at exports.Manager.execute (/Users/ogupte/github/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/ogupte/github/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/ogupte/github/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/ogupte/github/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/ogupte/github/kibana/node_modules/@hapi/hapi/lib/request.js:281:9)

@joshdover joshdover added the Team:Fleet Team label for Observability Data Collection Fleet team label Mar 3, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover joshdover removed Team:APM All issues that need APM UI Team support apm:fleet labels Mar 3, 2022
@joshdover
Contributor

@ogupte We'll handle fixing this in Fleet and backporting it to 7.17.x. Would you mind documenting your workaround wherever appropriate?

@joshdover joshdover self-assigned this Mar 4, 2022
@joshdover
Contributor

joshdover commented Mar 4, 2022

Some initial findings:

While #93585 has a good explanation of why this problem occurs when converting from YAML to JSON and back to YAML, I don't agree with its conclusion that putting the onus on package developers and users makes sense. We should be able to handle at least the most common scenarios (like a string that starts with *).

It appears the initial implementation of this in #91418 was too heavy-handed and could be re-implemented more safely. Instead of quoting any string that contains a special character at any position, we could start by quoting only strings that start with * or one of the other known special characters.
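The narrower rule could look roughly like the following sketch, which quotes a string only when its first character cannot start a YAML plain scalar (the indicator characters from the YAML 1.2 spec, with -, ? and : allowed when a non-space character follows). The function names are hypothetical, and JSON.stringify is used for quoting because a JSON string is also a valid YAML double-quoted scalar. It deliberately ignores other cases, such as ": " or " #" later in the value, or strings like true or 10 that YAML would read as a boolean or number.

```typescript
// Characters that can never begin a YAML plain (unquoted) scalar.
const ALWAYS_UNSAFE_FIRST = new Set([
  "*", "&", "!", "|", ">", "'", '"', "%", "@", "`", "#", ",", "[", "]", "{", "}",
]);
// "-", "?" and ":" may begin a plain scalar only if a non-space character follows.
const CONDITIONAL_FIRST = new Set(["-", "?", ":"]);

// Hypothetical check: does this string need quoting because of its first character?
function needsQuoting(value: string): boolean {
  if (value.length === 0) return true; // an empty plain value would be read as null
  const first = value.charAt(0);
  if (ALWAYS_UNSAFE_FIRST.has(first)) return true;
  if (CONDITIONAL_FIRST.has(first)) {
    const next = value.charAt(1); // "" when the string has a single character
    return next === "" || next === " " || next === "\t";
  }
  return false;
}

// Quote only when needed; JSON string syntax is valid YAML double-quoted syntax.
function renderScalar(value: string): string {
  return needsQuoting(value) ? JSON.stringify(value) : value;
}

console.log(`- ${renderScalar("*")}`);                   // - "*"
console.log(`- ${renderScalar("https://example.com")}`); // - https://example.com
```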

@joshdover
Contributor

In #114123, another area where we removed default quoting, we opted to use the !!str tag as a workaround. @nchaulet Do you think it would be problematic to reintroduce quoting, but only when the value starts with * or &?

@nchaulet
Member

nchaulet commented Mar 4, 2022

Another area where we removed default quoting in #114123 opted to use !!str directive as a workaround. @nchaulet Do you think it would be problematic to reintroduce quoting, but only in the case where the value starts with * or &?

On the Fleet side we do not really know how variables are used, and some packages use string concatenation (for example Kafka: https://epr.elastic.co/package/kafka/1.1.0/data_stream/log/agent/stream/log.yml.hbs). That's why we chose to put the responsibility on package developers to quote their values.

paths:
{{#each paths as |path i|}}
 - {{../kafka_home}}{{path}}
{{/each}}
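To make the concatenation concern concrete, the sketch below emulates how that Kafka template would render if Fleet quoted every string variable before substitution. The variable values (/opt/kafka, /logs/server.log) and the quote-everything policy are hypothetical; the point is that two individually quoted pieces placed side by side no longer form a single valid YAML scalar.

```typescript
// Hypothetical blanket-quoting policy: every string variable is JSON-quoted.
function quote(value: string): string {
  return JSON.stringify(value);
}

// Emulates the template above:  - {{../kafka_home}}{{path}}  for each path.
function renderKafkaPaths(kafkaHome: string, paths: string[], quoteVars: boolean): string {
  const q = quoteVars ? quote : (v: string) => v;
  const lines = paths.map((path) => ` - ${q(kafkaHome)}${q(path)}`);
  return ["paths:", ...lines].join("\n");
}

console.log(renderKafkaPaths("/opt/kafka", ["/logs/server.log"], false));
// paths:
//  - /opt/kafka/logs/server.log
console.log(renderKafkaPaths("/opt/kafka", ["/logs/server.log"], true));
// paths:
//  - "/opt/kafka""/logs/server.log"   <- two adjacent quoted scalars, not valid YAML
```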

@joshdover
Contributor

@nchaulet is correct, Fleet UI really can't assume much at all about how variables are being used in package templates. The only thing I can think of that Fleet could explore is disabling some of the more advanced YAML features, but this would need to be coordinated with Elastic Agent's YAML parsing as well and it doesn't appear to be a supported feature in the YAML library we're using today in Fleet.

My recommendation would be to add double quotes around the {{this}} block in the APM package's input template here: https://github.com/elastic/apm-server/blob/2cbf33c02345a44ee58709ef456aeebe2b689fe2/apmpackage/apm/agent/input/template.yml.hbs#L39-L42

@simitt Does that make sense to you?
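A sketch of what the recommended template change produces, emulating the {{#each}} loop with literal double quotes around each item. It assumes variables are substituted verbatim (no HTML escaping), and the indentation is illustrative rather than copied from the APM template. Quoting this way fixes the * case, but a value that itself contains a double quote or a backslash would still need escaping.

```typescript
// Emulates a template loop such as:
//   {{#each allow_origins}}
//     - "{{this}}"
//   {{/each}}
// assuming values are substituted verbatim. Indentation is illustrative.
function renderQuotedList(key: string, items: string[]): string {
  const lines = items.map((item) => `  - "${item}"`);
  return [`${key}:`, ...lines].join("\n");
}

console.log(renderQuotedList("allow_origins", ["*"]));
// allow_origins:
//   - "*"

// A value containing a double quote ends the scalar early and breaks the YAML.
console.log(renderQuotedList("allow_origins", ['a"b']));
// allow_origins:
//   - "a"b"
```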

@jen-huang
Contributor

There is a related discussion about the difficulties of YAML parsing here, where Andrew points out that similar issues were solved in Beats module templates by adding a to_json helper: elastic/package-spec#280 (comment)

I wonder if this kind of helper can help us in this kind of case too? @nchaulet @joshdover WDYT?
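A to_json-style helper sidesteps the escaping problem because JSON.stringify output is also valid YAML flow syntax: a JSON string is a YAML double-quoted scalar and a JSON array is a YAML flow sequence. The sketch below is hypothetical; `toJson`, `renderAllowOrigins` and the template line it emulates are illustrations, not Fleet's or Beats' actual helper.

```typescript
// Hypothetical to_json-style helper: serialize the value with JSON.stringify,
// which escapes quotes and backslashes and always emits a quoted string.
function toJson(value: string | string[]): string {
  return JSON.stringify(value);
}

// Emulates a template line such as:  allow_origins: {{to_json allow_origins}}
function renderAllowOrigins(origins: string[]): string {
  return `allow_origins: ${toJson(origins)}`;
}

console.log(renderAllowOrigins(["*"]));         // allow_origins: ["*"]
console.log(renderAllowOrigins(['say "hi"'])); // allow_origins: ["say \"hi\""]
```

Because the template author decides where the helper is applied, a package like Kafka could still concatenate its variables first and serialize only the final value.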

@RichMe1ster

The fix to wrap quotes in the migration schema object can be done in either APM Server before it saves the object to Kibana or in Kibana transforming the invalid fields to valid ones as it creates the new package policy.

Hello @ogupte! Thank you for your write-up as well as the fix. I currently have a customer with a similar issue, but I'm unsure how best to address it with the high-level workaround you've provided. Since my customer already has their services up, I think the latter approach you mentioned (Kibana transforming the invalid fields into valid ones as it creates the new package policy) is our best option, although I'm not sure how to do that technically. Any steps on how to apply a workaround for this issue would be appreciated. Thanks!

This is in regards to https://github.com/elastic/sdh-apm/issues/528

@joshdover
Contributor

joshdover commented Mar 9, 2022

There is a related discussion around the difficulties of yaml parsing here, where Andrew points out that similar issues was solved in Beats module templates by adding a to_json helper: elastic/package-spec#280 (comment)

I wonder if this kind of helper can help us in this kind of case too? @nchaulet @joshdover WDYT?

+1 good find, seems worth exploring for a long-term fix. I've opened #127268

I think in the meantime, we should go ahead and wrap this setting with quotes in the apm package as explained in #121934 (comment) in order to fix this issue for 7.17.x and 8.1.x. @simitt friendly ping on this, do you think we can make this change in the APM package for these patch releases?

@simitt
Contributor Author

simitt commented Mar 9, 2022

Apologies, I missed that message. Yes, we can do that: elastic/apm-server#7508

@joshdover
Contributor

Removing my assignment from this issue. The plan here is to:

@joshdover joshdover removed their assignment Mar 17, 2022
@joshdover
Contributor

I think we can just consider this a duplicate of elastic/apm-server#7508 for now. Going to close this one.

@cauemarcondes
Contributor

Looks like this was already fixed by this PR #128704
