[DISCUSS] ML - Spaces and Kibana Privileges #37709

Closed
kobelb opened this issue May 31, 2019 · 11 comments
Labels: discuss, :ml, Team:Security (Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!)

Comments

@kobelb (Contributor) commented May 31, 2019

We've had a number of discussions regarding the plans and requirements to transition Machine Learning to using Spaces and Kibana Privileges. So many, in fact, that I figured it was worth documenting the potential paths forward.

Ability to view all jobs across all spaces

My understanding is that ML's primary concern with adopting Spaces is that it makes it hard for users to get a list of all ML jobs across every space, which could lead to duplicate jobs being created. Duplicated jobs are an issue because ML jobs are computationally expensive and have a non-negligible impact on the health of the Elasticsearch cluster.

To address this concern, it was suggested that ML could add a new section to the Management application which would list all ML jobs across all spaces. Currently, the management sections are either for the entire Elasticsearch cluster, global to Kibana, or specific to the current space. It's not immediately obvious which sections are which "scope", so we've started discussing how we could improve this situation here. However, I don't think these changes absolutely have to be made before ML adds a management section to manage all ML jobs across all spaces.

Using spaces only for organization

Prior to migrating to Kibana Privileges, which is described in more detail below, it's possible for ML to use spaces as an organizational feature without changing the authorization model. This would require no changes to the way ML performs authorization; instead, each ML job would be augmented with a space identifier, and ML's job endpoints in Kibana would filter the jobs based on the currently selected space.

If there's enough need from ML's users to be able to categorize their ML jobs, this might be a reasonable place to start as it shouldn't require any change to "core Kibana" and appears on the surface to require minimal development effort from the ML team.
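
For illustration only, here's a minimal sketch of what that filtering could look like in a Kibana job endpoint. The `spaceIds` field and `fetchAllMlJobs` helper are hypothetical and not part of any existing ML API:

```ts
// A minimal sketch (not ML's actual code) of filtering the jobs returned by
// the Elasticsearch ML APIs down to the currently selected space. The
// `spaceIds` field and `fetchAllMlJobs` helper are hypothetical.
interface MlJobSummary {
  jobId: string;
  spaceIds: string[]; // hypothetical space annotation on each job
}

async function getJobsForSpace(
  fetchAllMlJobs: () => Promise<MlJobSummary[]>,
  currentSpaceId: string
): Promise<MlJobSummary[]> {
  const allJobs = await fetchAllMlJobs();
  // Treat a job with no space annotation as belonging to the default space.
  return allJobs.filter((job) =>
    (job.spaceIds.length ? job.spaceIds : ['default']).includes(currentSpaceId)
  );
}
```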

Migrating to Kibana Privileges

Kibana Privileges allow us to grant access to individual features and spaces within Kibana. Instead of allowing users to have direct access to the underlying system indices, we grant the kibana internal server user access to the system indices. We're then able to perform authorization within Kibana using the Kibana application privileges before executing the query against Elasticsearch using the internal server user. The following is a rough sequence diagram of how this works within the context of Saved Objects:

[Sequence diagram of the Saved Objects authorization flow (original attachment: Screen Shot 2019-05-31 at 10 37 05 AM)]
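
In pseudo-TypeScript, the flow in the diagram boils down to something like the following sketch; the names are assumptions for illustration and this is not the actual SavedObjectsClient implementation:

```ts
// Rough sketch of the flow above (assumed names): check the user's Kibana
// application privileges first, then run the Elasticsearch query as the
// internal server user.
type CheckPrivileges = (actions: string[]) => Promise<{ hasAllRequested: boolean }>;

async function findWithAuthorization<T>(
  checkPrivileges: CheckPrivileges, // e.g. backed by the ES _has_privileges API
  queryAsInternalUser: () => Promise<T>, // query executed as the internal server user
  requiredActions: string[]
): Promise<T> {
  const { hasAllRequested } = await checkPrivileges(requiredActions);
  if (!hasAllRequested) {
    throw new Error('Forbidden: missing Kibana application privileges');
  }
  // Only after the privilege check succeeds does the query hit the system index.
  return queryAsInternalUser();
}
```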

Kibana applications which rely upon "saved objects" get this authorization applied automatically as part of the secured instance of the SavedObjectsClient. However, since ML doesn't store its data in the .kibana index, it isn't currently possible to use the SavedObjectsClient to access the ML jobs. There is an effort underway to allow the SavedObjectsClient to work with other indices, but it will require that the documents in the Elasticsearch index abide by the same "schema" that we use on the .kibana index for this to work properly.

Changing ML to use "saved objects" isn't an absolute necessity, at least immediately. However, to take advantage of the future enhancements to Kibana authorization, it would be advantageous to do so.

Regardless of whether or when we decide to change ML to use saved objects, the following change to the way base privileges behave will be required to ensure that existing roles which aren't supposed to grant access to ML don't unintentionally do so: #35865

Option 1 - ML implements their own authorization

It's theoretically possible for ML to implement, for ML jobs, a workflow similar to the one performed by the existing secured instance of the SavedObjectsClient. This would allow ML to continue using its existing Elasticsearch index structure.

Additionally, this would require no changes to the SavedObjectsClient to support the management section for managing ML jobs across all Spaces.

The largest downside to this approach is that we'd end up re-implementing access patterns already performed by the SavedObjectsClient. Additionally, as we continue to enhance Kibana's authorization, we'd have to continually ensure the ML authorization logic is updated as well, or else we'd lose feature parity.

Option 2 - ML switches to using the SavedObjectsClient

This option puts us in a much better position from the technical-debt/maintenance perspective. Additionally, ML would automatically be able to take advantage of the new Kibana authorization features.

For ML to switch to using the SavedObjectsClient once #35747 is complete, the ML jobs will have to be stored with the same "schema" as existing saved objects within the .kibana index. This will likely be the most challenging aspect of this approach.
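
For illustration, here's a rough, hypothetical example of what an ML job document might look like if it followed that schema. The `ml-anomaly-detector` type and its attributes are invented for this sketch, and the document shape is only an approximation of the saved objects format:

```ts
// Hypothetical example only: an ML job stored following (approximately) the
// saved objects document schema used in the .kibana index. The type name
// 'ml-anomaly-detector' and its attributes are invented for illustration.
const hypotheticalMlJobSavedObject = {
  _id: 'space-a:ml-anomaly-detector:my-anomaly-job', // namespaced id form (approximate)
  _source: {
    type: 'ml-anomaly-detector',
    namespace: 'space-a', // omitted when the object lives in the default space
    updated_at: '2019-05-31T00:00:00.000Z',
    references: [],
    'ml-anomaly-detector': {
      jobId: 'my-anomaly-job',
      description: 'Example job visible only in space-a',
    },
  },
};
```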

Additionally, we'll have to augment the existing ML reserved roles to grant additional privileges to read/write the new ML saved object types.

This would require changes to the SavedObjectsClient to allow ML jobs to be queried across all Spaces, which we don't have an issue to track yet.

@elasticmachine (Contributor):

Pinging @elastic/ml-ui

kobelb added the Team:Security label on May 31, 2019
@elasticmachine (Contributor):

Pinging @elastic/kibana-security

@kobelb (Contributor, Author) commented May 31, 2019

/cc @epixa

@droberts195 (Contributor):

I think there is also an option 3 that is halfway between options 1 and 2, and similar in some ways to how index patterns work. Indices live in Elasticsearch and know nothing about Kibana, so Kibana has index patterns that are Kibana saved objects that refer to the Elasticsearch indices. In the same way we could introduce Kibana saved objects associated with spaces that refer to ML jobs in Elasticsearch. The ML UI would only display jobs in a particular space if there was a corresponding Kibana saved object.

A rough outline of how this might work is as follows:

  • ML will introduce new types of saved objects:
    • anomaly_detector
    • data_frame_transform
    • data_frame_analytics
  • ML jobs created through the ML UI after this feature is implemented will be visible initially only in the space they were created from
  • The ML UI will filter which jobs can be seen in each space
    • It will do this by getting the jobs list from Elasticsearch as now, then using the bulk get saved objects Kibana API to check whether each job returned by Elasticsearch has a corresponding saved object in the current space, and displaying only the jobs that do (see the sketch after this list)
    • For each job there will be a saved object in the .kibana index for each space that it is to be visible in
    • This mimics the way that Elasticsearch indices can be accessible in multiple spaces via space-specific index patterns
    • But unlike index patterns there would be no wildcarding - each saved object refers to one job
  • There will be no saved object for datafeeds, as there is at most one datafeed per job, and that datafeed will always be treated as being in the same space as its associated job
  • On Kibana startup, and periodically thereafter, jobs which are not in any space will be added to the default space
    • This will be done by a server side timer that uses the Kibana system user to look across spaces
    • It means that ML jobs created in earlier versions of the stack will initially be visible in the default space (which matches how other pre-space objects were treated)
  • ML jobs created by directly calling Elasticsearch APIs will also become visible in the default space some time after they are created
  • Users who wish to programmatically create ML jobs in a specific space should use ML's Kibana APIs to create them, not the underlying Elasticsearch APIs
  • All jobs will be available in a special management view
    • This will allow the view of all jobs which is required for resource management
    • This will allow jobs to be added to and removed from spaces
    • This will provide links to the spaces from which the jobs can be managed, so as not to replicate the job management UI
  • CRUD permissions remain as is, controlled using existing Elasticsearch privileges
    • In other words, if you want to open/close/delete a job you still need the machine_learning_admin role in addition to the privilege on the Kibana saved object corresponding to the job
    • However, in future we could relatively easily use Kibana privileges to make jobs read-only via the UI in a particular space, even for users with the machine_learning_admin role

(There are probably holes in this, and it should be taken as a starting point for discussion rather than a spec for implementation.)
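
As a concrete but purely illustrative sketch of the space-filtering bullet above: the `ml-job` saved object type and the `getAllJobIds` helper are hypothetical, and the bulk-get shape below only approximates the saved objects client's behaviour.

```ts
// Rough sketch of the space-filtering step described above. Assumed names
// throughout; not an existing API.
interface BulkGetResult {
  saved_objects: Array<{ id: string; type: string; error?: { statusCode: number } }>;
}
interface SpaceScopedSavedObjectsClient {
  bulkGet(objects: Array<{ type: string; id: string }>): Promise<BulkGetResult>;
}

async function getJobIdsVisibleInCurrentSpace(
  savedObjectsClient: SpaceScopedSavedObjectsClient, // scoped to the current space
  getAllJobIds: () => Promise<string[]> // wraps the ES get-jobs API
): Promise<string[]> {
  const jobIds = await getAllJobIds();
  const { saved_objects: results } = await savedObjectsClient.bulkGet(
    jobIds.map((id) => ({ type: 'ml-job', id }))
  );
  // Keep only the jobs whose "twin" saved object exists in this space.
  return results.filter((so) => !so.error).map((so) => so.id);
}
```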

@kobelb (Contributor, Author) commented Jun 5, 2019

The approach which you've outlined @droberts195 seems reasonable.

The biggest issue that I foresee, which isn't necessarily a deal-breaker, is that ML jobs will have to be interacted with using ML-specific APIs as the existing saved object APIs won't be able to perform the "application-level join" with the jobs which are stored in the machine-learning-specific indices. Similarly, ML jobs won't be able to be exported/imported using the saved object management screens, without augmenting the underlying infrastructure.

Another thing to consider is making the necessary changes to the "saved objects service" to be able to query the ES-specific ML indices. I'm not familiar enough with the structure of the indices to know how feasible this would be.

@droberts195 (Contributor):

> Similarly, ML jobs won't be able to be exported/imported using the saved object management screens

I think that's probably for the best. We have an open issue for the ability to import and export ML jobs: elastic/elasticsearch#37987

An ML anomaly detector job could have a huge amount of data associated with it, spread over multiple indices:

  1. A job config document in the .ml-config index
  2. Probably a datafeed config document in the .ml-config index (it's theoretically possible to create a job without a corresponding datafeed, but in practice UI users wouldn't do this)
  3. One or more audit events in the .ml-notifications index
  4. Possibly some result annotations in the .ml-annotations index
  5. Persisted model state in the .ml-state index - this can potentially be gigabytes for a single job
  6. Results in one or more .ml-anomalies-* indices - there could be hundreds of thousands of documents for the biggest jobs if they've been running for a long time

Whatever is eventually done for the ML import/export functionality will have to take all of this into account. Therefore I think it would be best if the saved objects associated with ML jobs didn't show up at all in the list of objects to be exported. It would be highly misleading to just export or import the single job config document and lose everything else.

> ML jobs will have to be interacted with using ML-specific APIs as the existing saved object APIs won't be able to perform the "application-level join" with the jobs which are stored in the machine-learning-specific indices

I think that this is also for the best based on my experience of the 6.x -> 7.x upgrade process.

The ML job documents may need to be modified during the major version upgrade process. This will need to be done in such a way that jobs can continue to run in the mixed version cluster during a rolling upgrade. To facilitate this we grant no permissions to anyone on the .ml-config index. (Obviously superuser can access it for the time being, but even that might change when Elasticsearch introduces internal indices - then .ml-config may be hidden completely like the .security index.) Instead everyone must query the jobs using the get jobs API (and equivalent APIs for other types of config). If the UI owned the config documents then the UI would need to be responsible for upgrades to the structure/content.

If we get to the stage where we have a saved object twin for every job that is to be visible in the UI then initially these could be used to just define which space the job was in, but in the longer term could potentially replace the machine_learning_admin and machine_learning_user roles for UI access to ML jobs. We could give the kibana_system user permission to query the ML endpoints results indices, then instead of querying Elasticsearch as the logged in user in the ML app, query using kibana_system but after having checked which jobs the currently logged in user is permitted to read and/or administer. This would effectively give job level security for UI users of ML. (Purely backend users, e.g. a handful of OEM customers, would just get the all-or-nothing granularity of access provided by machine_learning_admin and machine_learning_user.)
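
A minimal sketch of that job-level security idea, with assumed helper names (`getPermittedJobIds` and `queryResultsAsKibanaSystem` are hypothetical placeholders, not existing APIs):

```ts
// Sketch of the job-level security idea above (assumed names throughout).
// The results query runs as the internal kibana_system user, but only for
// job IDs the logged-in user is permitted to see in the current space.
async function getAnomalyResultsForUser(
  getPermittedJobIds: (username: string) => Promise<string[]>, // e.g. via saved object "twins"
  queryResultsAsKibanaSystem: (jobIds: string[]) => Promise<unknown>,
  username: string,
  requestedJobIds: string[]
): Promise<unknown> {
  const permitted = new Set(await getPermittedJobIds(username));
  const allowedJobIds = requestedJobIds.filter((id) => permitted.has(id));
  if (allowedJobIds.length === 0) {
    throw new Error('Forbidden: none of the requested jobs are visible to this user');
  }
  return queryResultsAsKibanaSystem(allowedJobIds);
}
```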

I'm sure there are still a lot of details to think through. Maybe we should organise a call to discuss in more depth.

@kobelb (Contributor, Author) commented Jun 5, 2019

> We could give the kibana_system user permission to query the ML endpoints results indices, then instead of querying Elasticsearch as the logged in user in the ML app, query using kibana_system but after having checked which jobs the currently logged in user is permitted to read and/or administer.

Agreed, and making this change would allow us to grant users access to ML jobs on a per-space level. Until we remove the user's privileges to query the ML indices directly, we aren't able to do so.

> I'm sure there are still a lot of details to think through. Maybe we should organise a call to discuss in more depth.

That sounds reasonable to me; let me know if there's anything I can prepare before the meeting that would aid the discussion.

@kobelb (Contributor, Author) commented Jun 20, 2019

During security team roadmap planning, we discussed two alternative approaches for allowing ML to be part of the Kibana "base privileges". Originally we were planning on suggesting Configurable base privileges as the solution. However, we're currently favoring Kibana base privileges opt-in using kibana.yml as the solution.

We are hesitant about adding "configurable base privileges" without allowing users time to adopt "feature controls", so that we can determine whether we're solving a persistent and common issue or introducing functionality which will cause confusion and increased complexity. Implementing "configurable base privileges" primarily makes sense if it's a feature we'd like Kibana to have long term; using it mainly as the solution for the 7.x time-frame, to maintain backwards compatibility as we migrate ML to using Kibana privileges, doesn't make much sense.

Could you all read through the Kibana base privileges opt-in using kibana.yml issue and let us know whether the solution is acceptable, or whether you'd prefer to discuss in person? We're hoping to get one of these solutions on our roadmap and ensure it's synchronized with your timeline for migrating the ML privileges model.

/cc @jinmu03 @droberts195 @sophiec20

@kobelb (Contributor, Author) commented Oct 15, 2019

I've given this some more thought in the context of what changes should be made to more fully support "linked" saved-objects, like the ones that ML will be using. Since the initial discussion, we've begun to think of Code's usage of saved-objects as being "linked" to resources on the Kibana server's filesystem, which is similar to ML's jobs being "linked" to Elasticsearch ML Jobs. The following are purely my thoughts, and are in no way prescriptive of how the ML team should implement this. It's very likely I'm grossly over-simplifying matters because of my lack of detailed knowledge about ML, so please take my opinions with a grain of salt.

From the end-user's perspective, there are two distinct operations which we intend to support for all saved-objects:

  1. Copy to space - currently implemented; creates isolated copies of the saved-object in the destination spaces
  2. Share to space - not yet supported; the same saved-object is visible in multiple spaces

When using a "linked" saved-object, this becomes somewhat more complicated because part of the data is stored in the saved-object itself and some of it in the "linked" resource. However, it's possible to abstract away this complexity from the users.

For example, when the user performs a create/update operation, they can be required to provide all of the data that will be stored in the saved-object and the ES ML job, and the Kibana server can create/update both the saved-object and associated ES ML job. This can be done using a SavedObjectsClientWrapper, so that end-users can continue to use the Saved objects APIs to perform the operation.

A similar approach could be utilized for get/find operations: using the same SavedObjectsClientWrapper, after retrieving the saved-objects the Kibana server could retrieve the corresponding ES ML jobs and merge the two definitions.
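
As a very rough sketch of that wrapper idea, with simplified and assumed types rather than the actual SavedObjectsClientWrapper factory signature, and a hypothetical `ml-job` saved object type:

```ts
// Rough sketch only: on create, write both the saved-object and the linked
// ES ML job; on get, merge the two definitions before returning. All names
// and shapes here are assumptions for illustration.
interface MlJobAttributes {
  jobId: string;
  description?: string;
}
interface SimpleSavedObject<T> {
  id: string;
  type: string;
  attributes: T;
}
interface MinimalSoClient {
  create<T>(type: string, attributes: T, options?: { id?: string }): Promise<SimpleSavedObject<T>>;
  get<T>(type: string, id: string): Promise<SimpleSavedObject<T>>;
}
interface MlJobsApi {
  putJob(jobId: string, config: object): Promise<void>;
  getJob(jobId: string): Promise<object>;
}

function createLinkedMlJobClient(soClient: MinimalSoClient, mlApi: MlJobsApi) {
  return {
    async create(attributes: MlJobAttributes, esJobConfig: object) {
      // Create the ES ML job first, then its saved-object "twin".
      await mlApi.putJob(attributes.jobId, esJobConfig);
      return soClient.create('ml-job', attributes, { id: attributes.jobId });
    },
    async get(jobId: string) {
      const savedObject = await soClient.get<MlJobAttributes>('ml-job', jobId);
      const esJob = await mlApi.getJob(jobId);
      // Merge the saved-object attributes with the linked ES job definition.
      return { ...savedObject, esJob };
    },
  };
}
```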

This would allow end-users and ML UI engineers to interact with Kibana ML Jobs using similar interfaces, and allow us to implement the ability to "share" ML Jobs in multiple spaces when that time comes. We'll likely want to add the ability for specific saved-object types to opt out of being exported, imported, and copied, to encourage the proper usage of these linked saved-objects given the inherent complexities.

@kobelb (Contributor, Author) commented Oct 30, 2019

FWIW, after discussing with @epixa, he's not on board with my previous recommendation to use the SavedObjectsClientWrapper to do the "in-application joins" to the linked ES ML Jobs, and favors using dedicated ML APIs for this. Since import/export and "copy to space" are out of scope for the initial implementation, the non-pedantic benefits we get from my prior recommendation are negligible.

@kobelb (Contributor, Author) commented Jan 14, 2020

Closing this issue, as it's no longer being used for discussing this integration.

kobelb closed this as completed on Jan 14, 2020