Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[http] Prevent access to internal-only APIs when running in serverless #151940

Closed
lukeelmers opened this issue Feb 22, 2023 · 31 comments
Closed

[http] Prevent access to internal-only APIs when running in serverless #151940

lukeelmers opened this issue Feb 22, 2023 · 31 comments
Assignees
Labels
Epic:VersionedAPIs Kibana Versioned APIs Feature:http Project:Serverless Work as part of the Serverless project for its initial release Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@lukeelmers
Copy link
Member

lukeelmers commented Feb 22, 2023

Currently Kibana only uses a naming convention to distinguish between internal (/internal) and public (/api) routes. We've seen many instances in the past of internal APIs being (ab)used, even when they are undocumented. The introduction of a serverless offering presents us with an opportunity to better enforce this separation and ensure we don't get stuck maintaining internal APIs just because they've been picked up.

Elasticsearch recently introduced serverless API protection, where calls to internal HTTP APIs are rejected if an x-elastic-internal-origin header is not present: elastic/elasticsearch#93607. Presumably these headers could eventually be stripped from incoming requests at the control plane level to ensure the protection can't be circumvented.

We should consider doing the same for Kibana, that is, formalize a way to flag routes as internal, and reject incoming requests to these routes if they are missing the relevant header. We could use the same header as ES or consider leveraging the existing x-elastic-product-origin header (which serves a similar purpose, though it wasn't designed with this use case in mind).

This would mean all requests from the client-side http service would need to have the header injected. We wouldn't be able to enforce this boundary for self-managed just yet, but could consider doing so in a future major version upgrade.

This is potentially related to the versioned router discussion -- depending on how that's implemented, we could also use it as an opportunity to incorporate a concept of internal to our APIs.

@lukeelmers lukeelmers added Feature:http Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Project:Serverless Work as part of the Serverless project for its initial release labels Feb 22, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 23, 2023

We should consider doing the same for Kibana, that is, formalize a way to flag routes as internal, and reject incoming requests to these routes if they are missing the relevant header.

++ to that!
As it is, we can't remove the (experimental) global SO HTTP APIs because they're used externally 😢 .

For versioned routers, are you thinking something along the lines of

/**
 * Arguments to create a {@link VersionedRouter | versioned router}.
 * @experimental
 */
export interface CreateVersionedRouterArgs<Ctx extends RqCtx = RqCtx> {
  /**
   * A router instance
   * @experimental
   */
  router: IRouter<Ctx>;
  /**
   * Flag to indicate the route is public
   * Public routes are:
   *    guaranteed to be stable and maintained for a specified amount of time (typically 18 months)
   *    must go through a standard deprecation mechanism
   * Internal-only route requests implement header-presence & match verification mechanism and block any requests missing the required header(s)
   * Header specification TBD
   * @experimental
   */
  isPublic?: boolean;
}

where isPublic would internally add a required header in the response?

Disclaimer: the implementation needs discussion and (possibly) a design proposal

@TinaHeiligers
Copy link
Contributor

@lukeelmers is restricting access to internal-only APIs a hard requirement or is it more of a desired mechanism? We want to know if we should build that into the current versioned API design now.

@lukeelmers
Copy link
Member Author

It's not a hard requirement, however if we don't have a solution like this it opens us up to the risk that internal APIs are discovered/used, and then we get stuck supporting them if they become widely adopted.

Explicitly documenting these APIs as not supported is fine as an interim solution (vs leaving them undocumented and hoping nobody uses them). I just thought if we were already introducing a versioned router & asking all teams to move to it, it would be an ideal time to also ask them to flag their APIs as internal/public in a way that's more formal than a pathname convention.

Even if we don't choose to enforce this until later, it'd probably make our lives easier than chasing folks down to revisit all of their APIs again later. But whether it's worth taking that tradeoff is up to the team 🙂

@TinaHeiligers
Copy link
Contributor

I just thought if we were already introducing a versioned router & asking all teams to move to it, it would be an ideal time to also ask them to flag their APIs as internal/public in a way that's more formal than a pathname convention.

I'm fully on board with adding an optional internal flag as a recommended mechanism to "protect" APIs not intended for external use.
@jloleysens since #151596 is already merged, we could treat this issue as a follow-up/enhancement to that.

@TinaHeiligers
Copy link
Contributor

Thinking aloud here, we could take the approach of making no assumptions as to how a domain reacts to requests in internal APIs, but we'd need to decide pretty soon on an approach.

Elasticsearch recently introduced serverless API protection, where calls to internal HTTP APIs are rejected if an x-elastic-internal-origin header is not present: elastic/elasticsearch#93607.

Since there's already a mechanism, why not implement something similar, maybe even reuse that exact same header?

@lukeelmers
Copy link
Member Author

optional internal flag

Two questions to consider if we decide to do this:

  1. If we add this as optional, what's an appropriate default? It would be lower-risk to force folks to opt-in to expose an API publicly instead of opt-in to making it internal (and hopefully we are producing more internal APIs than public ones anyway). Or alternatively we could just make it a required flag.
  2. Do we also want to have the router automatically enforce/handle the path naming convention (i.e., if you say public: true, it'll prepend /api for you)?

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 23, 2023

good questions!

For 1, we should default to internal and leave public APIs as those that are opted in to make them public. Of course, we'd need to flesh out and agree on a design pretty soon, ideally before teams start their migration.

There's a potential repercussion to opting into a public API: Do we need to implement a modified version for those? Once teams consciously decide to make an API public, they're signing a contract to maintain said API for a specified amount of time (AFAIK, that's 18 months). @jloleysens how would that affect the design?

For 2: That would be a lovely feature!
However (and I'm thinking aloud here), how would we handle a migration from a previously internal API to a public one?
Would it force us to sunset the internal version and introduce a new public API?

It's probably the easiest way and safer option than introducing a whole new API migration strategy.

@TinaHeiligers
Copy link
Contributor

@jloleysens the versionedRouter implementation would return the header we're going to use to verify requests to internal routes.

The example then becomes something like this:
**
 * This interface is the starting point for creating versioned routers and routes
 *
 * @example
 * const versionedRouter = vtk.createVersionedRouter({ router });
 *
 * ```ts
 * const versionedRoute = versionedRouter
 *   .post({
 *     path: '/api/my-app/foo/{id?}',
 *     options: { timeout: { payload: 60000 } },
 *   })
 *   .addVersion(
 *     {
 *       version: '1',
 *       validate: {
 *         query: schema.object({
 *           name: schema.maybe(schema.string({ minLength: 2, maxLength: 50 })),
 *         }),
 *         params: schema.object({
 *           id: schema.maybe(schema.string({ minLength: 10, maxLength: 13 })),
 *         }),
 *         body: schema.object({ foo: schema.string() }),
 *       },
 *     },
 *     async (ctx, req, res) => {
 *       // logic to validate request based on `isPublic` flag
 *       await ctx.fooService.create(req.body.foo, req.params.id, req.query.name);
 *       return res.ok({
 *          body: { foo: req.body.foo },
 *          headers: { 'x-elastic-product-origin': 'kibana' },
 *       });
 *     }
 *   )
 *   // BREAKING CHANGE: { foo: string } => { fooString: string } in body
 *   .addVersion(
 *     {
 *       version: '2',
 *       validate: {
 *         query: schema.object({
 *           name: schema.maybe(schema.string({ minLength: 2, maxLength: 50 })),
 *         }),
 *         params: schema.object({
 *           id: schema.maybe(schema.string({ minLength: 10, maxLength: 13 })),
 *         }),
 *         body: schema.object({ fooString: schema.string() }),
 *       }),
 *     },
 *     async (ctx, req, res) => {
 *       // logic to validate request based on `isPublic` flag
 *       await ctx.fooService.create(req.body.fooString, req.params.id, req.query.name);
 *       return res.ok({
 *          body: { fooName: req.body.fooString },
 *          headers: { 'x-elastic-product-origin': 'kibana' },
 *       });
 *     }
 *   )
 * ```
 
</details>

My rough thoughts don't include a request header. Where/how would we include that?

@afharo
Copy link
Member

afharo commented Feb 24, 2023

++ to being explicit about an app being internal or public!

However, should it be a flag public: true/internal: true, or would it be better to have access: 'public' | 'internal'? IMO, the access attribute would allow us for more granular control:

For instance: 'public' | 'ui' | 'localhost'

  • public is OK to be called externally
  • ui is meant to be used by our UI, and subject to change and evolve with it (no guarantees of BWC)
  • localhost is meant to be used by agents like Elastic Agent or the health gateway (deployed in the same container).

For ui, I wonder if we could add an additional check for the internal ones that validates the referrer in the headers (and/or a custom header with a periodically rotated token shared between the UI and the server). This would add an extra layer of obscurity to the internal APIs to frustrate attempts to use them in external integrations.

WDYT?

@Bamieh
Copy link
Member

Bamieh commented Feb 24, 2023

For the access attributes i'd rather have us use internal instead of localhost since it gives a bit about the internal architecture of the cluster, which is also subject to change so this might not make sense long term

@afharo
Copy link
Member

afharo commented Feb 24, 2023

++ I was trying to use names to make it super-obvious in the comment. I think for the MVP, we should be fine with public and internal only. Other levels of granularity (and their names) can be defined at a later time.

@jloleysens
Copy link
Contributor

jloleysens commented Feb 24, 2023

Presumably these headers could eventually be stripped from incoming requests at the control plane level to ensure the protection can't be circumvented.

So it will also be stripped from browser requests.

x-elastic-internal-origin

If possible to send from browsers we should use this. Otherwise we should pick something else. x-elastic-kibana-internal-origin 😄 . I am not familiar with the original context behind x-elastic-product-origin, but it would be nice to avoid overloading it unless it fits our use case well enough (well enough TBD 🤔).

This would mean all requests from the client-side http service would need to have the header injected.

Yeah, I am curious about how we want to treat the distinction between serverless and on-prem/cloud. For serverless we can definitely be more aggressive in reducing the HTTP APIs we make available, but it seems we will need to continue as we have done for on-prem/cloud for a while. This probably means some kind of on/off switch based on whether we are running serverless (assuming Kibana has the final say about whether a request goes through).

This is potentially related to the versioned router discussion

It is related, but the concept of internal/public does not only apply to versioned routes. It applies to ALL routes. The internal/public concept could be middleware run by our server on all requests.

The design work here should be to decide where to expose this kind of setting: existing RouteConfig or somewhere else? The versioned router design will just adopt whatever we decide there.

For instance: 'public' | 'ui' | 'localhost'

From an API perspective, I think a binary setting of public: boolean or internal: boolean has a clearer case for it and therefore is likely to see its intended use fulfilled. If we have need for further categorisation we should consider a new setting like access. I am weary of designing for more granular categories as there is risk they don't cleanly fit reality. To my mind internal or public is a much safer bet. But I may be missing some data here.

@TinaHeiligers
Copy link
Contributor

From an API perspective, I think a binary setting of public: boolean or internal: boolean has a clearer case for it and therefore is likely to see its intended use fulfilled.

++

@lukeelmers
Copy link
Member Author

Presumably these headers could eventually be stripped from incoming requests at the control plane level to ensure the protection can't be circumvented.

So it will also be stripped from browser requests.

That's a great point, I copied that here because it is what ES will likely do, but this wouldn't work for browser traffic from Kibana. Regardless I don't think it matters too much - even if folks can circumvent it, they'd have to really go out of their way to do so (vs just copying an endpoint that happens to have internal in the path and not thinking about it). And if we eventually split more server-side components out into separate services, it will open up the opportunity to enforce this more strictly in some areas.

x-elastic-internal-origin

If possible to send from browsers we should use this.

Agreed, we should be able to inject it via the client-side http service.

This probably means some kind of on/off switch based on whether we are running serverless

Yeah we'd need to conditionally enforce this on serverless only for awhile, but perhaps in the future could roll it out consistently for self-managed as well.

the concept of internal/public does not only apply to versioned routes. It applies to ALL routes

What types of non-versioned routes are you thinking of? Static assets? It seems to me that pretty much all of our http APIs will need to be versioned in some way. To support downtimeless upgrades we'll even need to have a concept of versioning on our static assets eventually, whether it be handled on the Kibana side or on a CDN.

I think a binary setting of public: boolean or internal: boolean has a clearer case for it

++ I don't think we need to get super granular about this. Ultimately this proposal isn't a security feature, it's about protecting users from foot guns and protecting us from unintended maintenance overhead.

The design work here should be to decide where to expose this kind of setting: existing RouteConfig or somewhere else? The versioned router design will just adopt whatever we decide there.

This is the main thing I wanted to propose we consider now. I don't want to blow up the scope of the versioned router project, and we don't even need to enforce anything at this point, but it'd be worth considering whether to start collecting this info since we're already asking teams to migrate their APIs.

@TinaHeiligers
Copy link
Contributor

we don't even need to enforce anything at this point, but it'd be worth considering whether to start collecting this info since we're already asking teams to migrate their APIs.

Having a route default to internal makes it easier to introduce the mechanism later, leaving us some room to think through this more. I want to keep it simple and we need to get it right.

@pgayvallet
Copy link
Contributor

My main question here is, do we need to make a distinction between 'Kibana-internal' (APIs that should only be accessed from the Kibana UI, or for -TBD- inter-node communcations) and 'Stack-internal' (APIs callable from other components of the stack but not from our customers), or are we fine with just the public and internal (Kibana+stack) isolation?

My feeling is that we're good with only two options, as we're supposed to be in control of internal calls, and therefor adding an isolation layer between Kibana and the rest of the 'internal' components doesn't seem to bring much value.

So, if we go with an access: 'public' | 'internal' option for our routes, my contributions to the discussions:

If we add this as optional, what's an appropriate default?

Adding the option as mandatory would mean adapting all the routes definition in the whole codebase in a single PR, which doesn't seem realistic, so I think we don't have any choice that having it optional for the current unversioned router. For the versioned router, I would just introduce the option in the initial implementation, which would allow us to have it defined as mandatory.

Now regarding the appropriate default, to avoid the risk of introducing breaking changes, I think we're kinda forced to default to public

Do we also want to have the router automatically enforce/handle the path naming convention

I don't really see any value in this. I do see value in doing it the other way around though, to set the default access value on routes that don't explicitly specify it. So /api/foo/bar will be defaulting to access: public and /internal/hello/dolly would default to access: internal (and /other-prefix/wut would be public too FWIW).

x-elastic-internal-origin vs x-elastic-product-origin

TLDR: x-elastic-internal-origin is the way to go ihmo. (x-elastic-product-origin is/was supposed to be used for different purposes, let's not mix those things).

Proposal summary

Allow to define Kibana HTTP APIs either as being public or internal. When Kibana is configured accordingly, access to internal APIs is blocked unless the x-elastic-internal-origin is properly specified.

1. Adapting the current server-side route declaration

Adapt RouteConfigOptions to add a new optional access parameter (NOTE: the access naming is fully open to discussions).

This parameter can set to either public or internal.

When unspecified, the value will be inferred from the path of the route. If the path starts with /internal, it will default to internal. Otherwise, it will default to public.

2. Adapting the versioned router specification

Same idea. note that the optional parameter is at the route's level (opposed to the version's level) - a route access level cannot change between versions.

Given the API isn't exposed / implemented yet, we don't need to have the option optional. For versioned routes, access will be a mandatory parameter from day 1.

3. Adapting Core's internal router to handle the access parameter

Core's internal routing system will be adapted to properly use this access parameter. When accessing a route defined as internal, and when Kibana is configured accordingly (see point 6.), the system will check the presence of the x-elastic-internal-origin header, and return an error (403? other?) if the said header is not present.

Note: validation on the content of the header is probably not necessary, unless someone thinks otherwise.

4. Adapting the browser-side http service

Core's browser http service will be adapted to send an additional x-elastic-internal-origin: Kibana header. This header will be added internally by the fetch/http service, and will not be overridable by consumers of said services (as already done for kbn- headers)

5. Adapting browser-side code not using Core's http service

I know at least bfetch uses their own way of accessing the server-side. We should make sure that all similar calls to internal APIs are also adding the correct header

6. Allowing internal API access to be configurable

Given we only want to restrict access to internal APIs for our serverless env (at least for now), given it would/could be considered a breaking change for other envs, we need a new configuration option to toggle access to APIs defined as internal.

A new http.restrictInternalApis (I'm very open on the naming here, but you get the intent) will be introduced. By default, access to internal APIs will be allowed to avoid this being a breaking change. On serverless, we will set this new option to the proper value to properly restrict access to our internal APIs.

7. Make sure intra-stack components are updated too

Not directly related to the implementation, but we will need to make sure that all internal actors using internal APIs (at least in our serverless env) are properly using the x-elastic-internal-origin when communicating with Kibana.

WDYT, all?

@jloleysens
Copy link
Contributor

Overall: great summary and breakdown of a path forward. I am happy to move ahead with the proposed tasking. I only have a few comments for now.


When unspecified, the value will be inferred from the path of the route. If the path starts with /internal, it will default to internal. Otherwise, it will default to public.

I like this idea!

For versioned routes, access will be a mandatory parameter from day 1.

+1 to this idea too

at least bfetch uses their own way of accessing the server-side.

One interesting case is Console. We would need to make sure that Kibana <-> ES traffic is not flagged as internal for the proxied requests. Might not require any changes, just another instance of the http client not be used.

A new http.restrictInternalApis

I know something like this is strictly necessary to avoid removing these APIs for cloud/on-prem, but I am a bit concerned about the the scale of this change and the impact it may have on tests and QA. Do we need specialised tests for Kibana in "restricted internal APIs" mode? Not saying we need a specific solution to this, rather just think about how we will ensure quality of the serverless variant.

@pgayvallet
Copy link
Contributor

but I am a bit concerned about the the scale of this change and the impact it may have on tests and QA. Do we need specialised tests for Kibana in "restricted internal APIs" mode?

What do you mean here exactly? Are you thinking about the eventual need to have a full FTR coverage that all 'internal' APIs are not accessible without the header when this 'restrict' config option is enabled? Or is this something else?

@jloleysens
Copy link
Contributor

I had in mind that Kibana UI still works with Kibana server as intended (has the same APIs available). But perhaps this is as simple as ensuring that we are sending the needed header via the http client.

@lukeelmers
Copy link
Member Author

lukeelmers commented Feb 27, 2023

we will need to make sure that all internal actors using internal APIs (at least in our serverless env) are properly using the x-elastic-internal-origin when communicating with Kibana

That's probably an argument in favor of tackling this sooner than later, that way folks can hit this barrier early and adapt their code (vs us needing to go back and chase them retroactively after we've broken stuff by enabling the setting).

return an error (403? other?) if the said header is not present.

FWIW it looks like ES is treating it as a 400 (i.e. "you should have known not to call this API in serverless") with an error message that says uri [/foo] with method [POST] exists but is not available when running in serverless mode. The situation is a little different for ES though, because they have an explicit concept of a "serverless mode" which we have avoided in Kibana.

Perhaps in our case it'd be more appropriate to do something like a 400 with uri [/foo] with method [POST] exists but is not available when http.restrictInternalApis is enabled. IMO the advantage of using a 400 is that it more clearly communicates "you did something wrong which is fixable" vs a 403 which implies "access is forbidden even if you did the right things". But I'm just thinking aloud; I don't have strong feelings on this and am open to whichever outcome makes more sense to everyone. [Edit: on second thought we probably don't want to leak the existence of the http.restrictInternalApis setting since (a) folks won't be able to change it, and (b) it will hopefully be disallowed in self-managed anyway]

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 27, 2023

the advantage of using a 400 is that it more clearly communicates "you did something wrong which is fixable" vs a 403 which implies "access is forbidden even if you did the right things"

Personally, I prefer a 403 since it's communicating a restricted-access situation whereas a 400 implies that the problem can be fixed. We know that API customers would try to "fix" their call but they can't (assuming that the http.restrictInternalApis wouldn't be exposed) which could lead to frustrated users and many support cases asking for help.

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 27, 2023

I'll go ahead and create issues for each step in the proposal and @jloleysens and I can work on them in parallel.
(We can consolidate issues and combine some if needs be.)

We don't really have the luxury of spending too much time debating names.
Let's just go with an access tag and an http.restrictInternalApis config setting.

The error code, however, we should settle on (even though we can still change it in PR review). If we want to be consistent with Elasticsearch, then yes, we should go with a 400 but as @pgayvallet points out, kibana's use case is slightly different. I'd rather go with a 403, directly implying restricted access but we can also debate that.

@drewdaemon
Copy link
Contributor

Can I just give a strong +1 here, but not just for serverless.

We are introducing saveable event annotation groups in Lens. The plan was to mark that SO type as hidden so that customers are not able to automate the creation of annotations via the global SO APIs.

However, #149098 directs us to create an HTTP API of our own. This opens up the problem again since external users could again use it to automate the creation of these objects.

@TinaHeiligers
Copy link
Contributor

Having created issues for each step, I think that we could only realistically handle the config option for restricting API access and ensuring that intra-stack components are updated as a separate item of work.

The Versioned Router will inherit from the Core implementation (as will the configuration adaption for specific environments).

For now, we should communicate the intent to add access control in the guide and update the design once the core part is done.

@jloleysens
Copy link
Contributor

@drewdaemon

This opens up the problem again since external users could again use it to automate the creation of these objects.

This control will be available on serverless initially (not cloud or on-prem). One of the conventions for HTTP APIs in Kibana (there are not many) is to prefix your path with /internal. I would recommend doing the same for the new Lens endpoint!

@drewdaemon
Copy link
Contributor

drewdaemon commented Feb 28, 2023

@jloleysens we absolutely will. But, as has already been discussed in this thread, if enough customers use the internal APIs they can use business pressure to force us away from breaking changes.

As an extra deterrent, I will probably end up using some heuristics in the body of the handler to check if the request originates in Kibana UI.

I really like the idea of a universal boolean flag I can just set. 👍

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 28, 2023

The team discussed this and decided to postpone the full implementation until other priorities are met.

For now, we'll lay the groundwork (steps 1, 2 and possibly 3) and add the config option, inferring it for routes that don't explicitly declare the access.

Versioned routes must declare access from the start, that's why we've had to do the first part of the work.

@TinaHeiligers
Copy link
Contributor

@lukeelmers We keep "promising" internal APIs will have restricted access and it would be great to have that in place before folks start migrating to versioned routes. Is there anything more we can use to motivate jumping on the rest of the implementation here?

@lukeelmers
Copy link
Member Author

I think the main motivation is making sure we have these restrictions in place before any customers are using the system in the MVP -- otherwise we risk internal APIs being discovered and used, or folks migrating existing integrations which rely on internal APIs from self-managed to serverless, where they wouldn't have any protections in place and might give the impression that these APIs are still fine to use.

IMO there isn't much risk in communicating to teams how to set the access on their APIs now, even if we aren't enforcing it right away. The key thing is making sure it is enforced before external customers/partners start using the product.

@TinaHeiligers
Copy link
Contributor

Closed by #156935

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic:VersionedAPIs Kibana Versioned APIs Feature:http Project:Serverless Work as part of the Serverless project for its initial release Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

8 participants