-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[On-Week] Hot update of APM/EBT labels #157093
Conversation
…ibana into onweek/hot-update-of-apm-labels
…ibana into onweek/hot-update-of-apm-labels
…ibana into onweek/hot-update-of-apm-labels
package.json
Outdated
@@ -788,7 +788,7 @@ | |||
"deepmerge": "^4.2.2", | |||
"del": "^6.1.0", | |||
"elastic-apm-http-client": "^11.0.1", | |||
"elastic-apm-node": "^3.45.0", | |||
"elastic-apm-node": "git+https://github.com/afharo/apm-agent-nodejs.git#add-agent.setGlobalLabel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be merged until elastic/apm-agent-nodejs#3337 is merged and published
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is merged. Do we know when we're planning on publishing the version shipping the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@david-luna can you help us with this?
|
||
router.versioned | ||
.put({ | ||
path: '/_settings', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using PUT /_settings
to update the dynamic options. I went with this approach instead of UI Settings because we don't need persistence through restarts... Also, this allows us to set specific settings for each Kibana node in case we have multiple instances.
But happy to iterate on that and move this to UI Settings if we don't want to open this pandora box.
@@ -6,6 +6,7 @@ | |||
"id": "telemetry", | |||
"server": true, | |||
"browser": true, | |||
"enabledOnAnonymousPages": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this allows us to send telemetry on the login screen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the new config keys that are exposed to anon page, I wonder, Is this fine, from a security standpoint?
E.g we're now surfacing the following config props to anon pages:
'telemetry.sendUsageFrom (alternatives)',
'telemetry.sendUsageTo (any)',
Are those safe to be accessed anonymously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they are safe:
sendUsageFrom
can be eitherserver
orbrowser
sendUsageTo
is eitherprod
orstaging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then I think so too yeah. I was afraid those might be urls or something
tap((labels) => { | ||
// Hack to update the APM agent's labels. | ||
// In the future we might want to expose APM as a core service to make reporting metrics much easier. | ||
window.elasticApm?.addLabels(labels); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the UI, the agent is global, so addLabels
works great.
However, on the server, the nodejs agent needs the global labels setter because addLabels
only applies to the current transaction.
options: { | ||
authRequired: 'optional', | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this allows us to fetch configuration while on the login screen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there is nothing sensitive in the response, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure of the output is declared in
kibana/src/plugins/telemetry/server/routes/telemetry_config.ts
Lines 83 to 89 in 0fcc9e0
body: schema.object({ | |
allowChangingOptInStatus: schema.boolean(), | |
optIn: schema.oneOf([schema.boolean(), schema.literal(null)]), | |
sendUsageFrom: schema.oneOf([schema.literal('server'), schema.literal('browser')]), | |
telemetryNotifyUserAboutOptInDefault: schema.boolean(), | |
labels: labelsSchema, | |
}), |
And our awesome tests for the versioned router will fail validation (proved) if any additional info is exposed (kudos to @jloleysens 🧡)
01c251e
to
bdadfab
Compare
I've hidden the feature behind a feature flag that it's only enabled on Serverless (we could decide to disable it on Serverless Prod as well if necessary). On top of that, it's an internal API, so external actors shouldn't be able to call the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I won't comment on the general functionality of coreApp.allowDynamicConfigOverrides
. If it's indeed a necessity for your team, it's acceptable from a security standpoint as long as you:
- ✔️ Ensure that this API is hidden behind a non-default flag.
- ✔️ Require consumers to explicitly opt-in for configuration values that can be altered using this API.
- ✔️ Restrict access to this API to superusers only. While less stringent than our prior requirement of filesystem access, this is acceptable given our other implemented restrictions.
- ⛔ Introduce a snapshot-based test for
dynamicConfig
, similar to the one we have forexposeToBrowser
so that we're notified whenever someone decides to utilize it.
options: { | ||
authRequired: 'optional', | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there is nothing sensitive in the response, right?
/** | ||
* This test is meant to fail when any setting is flagged as capable | ||
* of dynamic configuration {@link PluginConfigDescriptor.dynamicConfig}. | ||
* | ||
* Please, add your settings to the list with a comment of why it's required to be dynamic. | ||
* | ||
* The intent is to trigger a code review from the Core and Security teams to discuss potential issues. | ||
*/ | ||
test('detecting all the settings that have opted-in for dynamic in-memory updates', () => { | ||
expect(getListOfDynamicConfigPaths()).toStrictEqual([ | ||
// We need this for enriching our Perf tests with more valuable data regarding the steps of the test | ||
// Also helpful in Cloud & Serverless testing because we can't control the labels in those offerings | ||
'telemetry.labels', | ||
]); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@azasypkin, I added the test you requested 😇 in the commit 2bbc917
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the security point of view 👍 Thanks for adding tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally and for Cloud deployment using kibana-a-la-carte
. It works as expected. Great work @afharo !
💚 Build Succeeded
Metrics [docs]Public APIs missing comments
Page load bundle
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @afharo |
Summary
This PR is exploring the ability to update the APM/EBT labels on the fly.
Context
During test executions and, more specifically, during Perf and Scalability test executions, we rely on APM labels to tag the transactions that belong to each tested scenario, PR, branch, and test stage (warmup vs. actual test). However, up until now, we only could update these via
kibana.yml
config, which implies restarting Kibana whenever anything of the tagging info needs to be updated.This is also a problem when testing on Cloud and Serverless because restarting is even more expensive.
Alternatively, we could update the file
kibana.yml
and send theSIGHUP
signal to Kibana. However, this solution wouldn't work on Cloud nor Serverless because we don't have access to either re-writing thekibana.yml
or triggering theSIGHUP
signal to the running process.The adopted solution
We are exploring adding an HTTP API that CI can call whenever it needs to update the context. In order to do that, we need to be able to extend the configuration loaded via
kibana.yml
, CLI, and Env vars in a dynamic way.This PR introduces the concept of
dynamicConfiguration
for plugins to opt-in for specific settings. They can be updated viaPUT /internal/core/_settings
. The settings in the body are validated against the dynamicConfiguration list and the actual config schemas.In this PR, we only opt-in for
telemetry.labels
(validation schema).It can be used in any of the forms:
Why this solution?
This solution allows us to:
✅ Temporarily change certain settings: the setting overrides are only stored in memory, so restarts would wipe them.
✅ Control the settings of each individual node: similar to
kibana.yml
, these settings are applied to the node handling the request.✅ Overriding config means the UI has it during bootstrap: the
page-load
transactions include the correct configuration because they don't require an additional HTTP request to fetch the config from another plugin/service.✅ Immediately refreshed config
What's not that good about this solution?
⚠️ Requires calling it for every node: if we test any autoscaling capabilities, we'll need to apply the settings to the new nodes as soon as they come up.
⚠️ Despite being flagged as an internal API, external users may think this is persisted and shared in some way.
Some flaws of this solution:
👆 If we find that persistence is a requirement, we could store it as a SO (although we may miss the immediately refreshed config capability).
Alternative options considered
Exposing a custom API inside the
telemetry
pluginThe
telemetry
plugin could expose the APIPUT /internal/telemetry/labels
to update those.✅ Separation of concerns
✅ No need to support this from core
🚫 The UI needs to call
GET /api/v2/telemetry/config
to fetch the dynamically changed values. This means early transactions/events likepage-load
might miss the dynamically-appended labels.UI Settings
We could use Globally-Scoped UI Settings for this.
✅ They are injected into the page load
⚠️ Can't control individual tags per instance
✅ They are persisted as SO and shared to all instances
✅ Good for autoscaling
✅ Good for updating through proxies
🚫 They persist through restarts (not so good for testing multiple scenarios for the same ES instance)
🚫 They are visible in the Stack Management view
🚫 It requires plugins to handle and merge both config providers.
TODO
apm.setGlobalLabel()
apm-agent-nodejs#3337 is publishedChecklist
For maintainers