[metrics] Expose metrics for cloud observability in serverless #153720

Closed
lukeelmers opened this issue Mar 25, 2023 · 4 comments
Labels
Project:Serverless Work as part of the Serverless project for its initial release Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!

Comments

@lukeelmers
Member

We have a requirement to expose unauthenticated metrics endpoints in serverless, which will be used to collect cloud observability (o11y) metrics. The plan is for these metrics to be collected by elastic agent using the existing kibana metricbeat module, which consumes /api/status and /api/stats. Long term, these metrics will be standardized across stack products (possibly as prometheus metrics endpoints).

Though the routes need to be unauthenticated internally for the purposes of metrics collection, they will be secured at the network level to prevent public access. To enforce the network policy, we need to expose these APIs on a different port, which metricbeat will be given access to.

That means Kibana needs to be able to serve /api/status and /api/stats from two ports, one which is unauthenticated, and one which requires authentication. To prevent complexity from leaking into metricbeat, these ideally need to be the same routes that we have today (rather than introducing separate routes for this purpose).

After some initial discussion, we've landed on two possible approaches:

  1. Metrics service could start a new server that binds to a different port, expose the routes that metricbeat needs (unauthenticated) and call the same handlers behind the scenes. This is lighter weight than a full health-gateway-style approach of spinning up a separate process to proxy requests.
  2. We could investigate whether it's possible to detect the port a request is coming in on, then expose Kibana from two ports and handle authentication requirements accordingly (i.e. we disable the auth requirement for the relevant routes if we detect the special metrics port is being used).
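
The two approaches above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Kibana's actual HTTP service: `Handler`, `handlers`, `buildMetricsRouteTable`, and `isAuthRequired` are hypothetical names, and the route table is a stand-in for the real route handlers. For approach 2, Node's `http` module exposes the port a request arrived on as `req.socket.localPort`, which is the value assumed to be passed in as `localPort`.

```typescript
// Hypothetical sketch: none of these names are real Kibana APIs.

type Handler = () => { status: number; body: string };

// Stand-ins for the existing route handlers that both ports would share.
const handlers: Record<string, Handler> = {
  '/api/status': () => ({ status: 200, body: 'status payload' }),
  '/api/stats': () => ({ status: 200, body: 'stats payload' }),
  '/api/other': () => ({ status: 200, body: 'other payload' }),
};

// Routes the metricbeat module consumes, per the issue description.
const METRICS_ROUTES: ReadonlySet<string> = new Set(['/api/status', '/api/stats']);

// Approach 1: a second server exposes only the metrics routes,
// reusing the very same handler functions without an auth check.
function buildMetricsRouteTable(all: Record<string, Handler>): Record<string, Handler> {
  const table: Record<string, Handler> = {};
  for (const path of Object.keys(all)) {
    if (METRICS_ROUTES.has(path)) {
      table[path] = all[path];
    }
  }
  return table;
}

// Approach 2: one route table served on two ports. Auth is skipped only
// when a metrics route is requested on the dedicated metrics port.
function isAuthRequired(path: string, localPort: number, metricsPort: number): boolean {
  return !(localPort === metricsPort && METRICS_ROUTES.has(path));
}
```

In both variants the routes keep their existing paths, so metricbeat would only need to be pointed at a different port.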

The requirements for this task break down into two phases:

Phase 1: Initial workaround to unblock work on autoscaling (next ~2 weeks). We're hoping to have an autoscaling MVP by the end of April (IC2), but since it requires metrics collection to be working, this task blocks any forward progress there. As an interim solution, we agreed that we could temporarily leave both /api/status and /api/stats unauthenticated in serverless, and use a workaround at the control plane level to expose Kibana from 2 ports on the pod instead of 1. To make this possible, we could leverage the existing status.allowAnonymous config (for /api/status), but would need a similar solution for /api/stats (either a dedicated config like stats.allowAnonymous, or overloading the existing status.allowAnonymous to apply to both endpoints). Then we could force those on in serverless, but leave them off by default in ESS / self-managed, so we don't change any existing behavior.
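
As a rough sketch of the dedicated-config variant of this workaround: only status.allowAnonymous is an existing setting according to the description above; stats.allowAnonymous is the proposed one, and `UserSettings`, `AnonymousAccessConfig`, and `resolveAnonymousAccess` are hypothetical names used purely for illustration.

```typescript
// Hypothetical sketch: not Kibana's real config schema.

interface UserSettings {
  status?: { allowAnonymous?: boolean };
  stats?: { allowAnonymous?: boolean }; // proposed setting, does not exist yet
}

interface AnonymousAccessConfig {
  status: { allowAnonymous: boolean };
  stats: { allowAnonymous: boolean };
}

function resolveAnonymousAccess(user: UserSettings, isServerless: boolean): AnonymousAccessConfig {
  if (isServerless) {
    // Forced on in serverless, regardless of what the user configured.
    return { status: { allowAnonymous: true }, stats: { allowAnonymous: true } };
  }
  // Off by default elsewhere, so ESS / self-managed behavior is unchanged.
  return {
    status: { allowAnonymous: user.status?.allowAnonymous ?? false },
    stats: { allowAnonymous: user.stats?.allowAnonymous ?? false },
  };
}
```

The overloading variant would instead read a single flag and apply it to both endpoints, at the cost of changing what status.allowAnonymous means for existing deployments that already set it.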

Phase 2: Long term solution (next ~2 months). We take option 1 or 2 above (or any other option we can think of), finalize a design for it, and implement it. In either case we'd need to add a way to configure a port for metrics: server.metrics.port, metrics.port, or similar. Then we can remove the Phase 1 workaround.
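
Whichever key name is chosen, the setting would need some basic validation. The sketch below is one possible shape, assuming the metrics port is optional and must differ from the main server port; `validateMetricsPort` is a hypothetical helper and neither parameter corresponds to a finalized config key.

```typescript
// Hypothetical sketch: `metricsPort` stands in for whichever key is chosen
// (server.metrics.port, metrics.port, or similar).

function validateMetricsPort(serverPort: number, metricsPort: number | undefined): number | undefined {
  if (metricsPort === undefined) {
    // No dedicated metrics port configured: behavior stays as it is today.
    return undefined;
  }
  if (!Number.isInteger(metricsPort) || metricsPort < 1 || metricsPort > 65535) {
    throw new Error(`metrics port must be an integer between 1 and 65535, got ${metricsPort}`);
  }
  if (metricsPort === serverPort) {
    // Assumption: the network policy relies on the two ports being distinct.
    throw new Error('metrics port must differ from the main server port');
  }
  return metricsPort;
}
```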

--

cc @legrego @azasypkin

@lukeelmers lukeelmers added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Project:Serverless Work as part of the Serverless project for its initial release labels Mar 25, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

ymao1 added a commit that referenced this issue Jun 12, 2023
…background worker utilization API (#159505)

## Summary

Until [this issue](#153720) is
resolved, this config flag allows us to access the task manager
background worker utilization API in serverless to support autoscaling
of background task deployments.

## To Verify

Run es: `yarn es snapshot`
Run serverless on this branch: `yarn serverless-es`

Verify you see the following warning in the logs:
```
[2023-06-12T12:47:19.641-04:00][WARN ][plugins.taskManager] Disabling authentication for background task utilization API
```

and you can access `/api/task_manager/_background_task_utilization`
without logging in
@lukeelmers
Member Author

lukeelmers commented Jun 12, 2023

Once #159530 merges, we'll be able to temporarily access /api/status and /api/stats without authentication in serverless for collecting UI metrics.

#159505 already added support for collecting unauthenticated background task metrics in serverless, which are also needed for autoscaling.

Edit: To clarify, these PRs are short-term solutions (the initial workaround / Phase 1 mentioned above).

@rudolf rudolf added the Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! label Jun 13, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@rudolf rudolf removed the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Jun 13, 2023
@lukeelmers
Member Author

Closing as I believe this is no longer relevant now that we're using JWTs for authenticating these requests.

@lukeelmers lukeelmers closed this as not planned Oct 2, 2023