[RFC] Enable OpenSearch Dashboards to support multiple OpenSearch clusters #1388

zengyan-amazon · 2022-03-25T20:16:48Z

Problem Statement

OpenSearch Dashboards (OSD for short) was designed and implemented to work with a single OpenSearch cluster. Users who have multiple OpenSearch clusters must navigate between Dashboards endpoints to visualize their data. This experience is not user friendly, and it adds overhead because users need to maintain multiple OpenSearch Dashboards instances, one for each OpenSearch cluster.

We want OpenSearch Dashboards users to have one single Dashboards instance that can visualize data in different OpenSearch clusters. In this RFC, an OpenSearch cluster that stores raw data for analysis is called a data source.

The proposal here is to give OpenSearch Dashboards the capability to let users dynamically manage their data sources. Users can then build visualizations and dashboards against data in those data sources, and combine those visualizations into a single dashboard.

Proposed Solution

We propose to add a new data-source type to Dashboards saved objects. It includes the data source URL, capabilities (such as which plugins are available), and the credentials used to access the data source (credentials will be encrypted by OSD when persisted). An index-pattern can then refer to a data-source, and based on this reference the Dashboards server can execute the query against that specific data-source.

For instance, a data-source object may look like:

{
  "type": "data-source",
  "data-source": {
    "title": "demo-data-source",
    "host": "https://my.opensearch.domain/",
    "auth_type": "basicauth",
    "credentials": {
      "username": "dashboards_user",
      "password": "password"
    },
    "capabilities": {
      "alerting": {
        "enabled": true,
        "version": "1.2"
      },
      "ism": {
        "enabled": true,
        "supported_actions": [
          "roll_over",
          "shrink"
        ]
      }
    }
  },
  ...
}
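One way to read the object above is as a typed set of saved-object attributes. The following is a minimal TypeScript sketch of that reading; the names `PluginCapability`, `DataSourceAttributes`, and `validateDataSource` are hypothetical and are not taken from the PoC, and the validation rules are only illustrative assumptions.

```typescript
// Hypothetical types mirroring the example data-source object above.
interface PluginCapability {
  enabled: boolean;
  version?: string;
  supported_actions?: string[];
}

interface DataSourceAttributes {
  title: string;
  host: string;
  auth_type: "basicauth"; // only the auth type shown in the RFC example
  credentials: { username: string; password: string };
  capabilities: Record<string, PluginCapability>;
}

// Returns a list of problems; an empty list means the attributes look usable.
function validateDataSource(attrs: DataSourceAttributes): string[] {
  const problems: string[] = [];
  if (attrs.title.trim() === "") problems.push("title must not be empty");
  try {
    const url = new URL(attrs.host);
    if (url.protocol !== "https:" && url.protocol !== "http:") {
      problems.push("host must be an http(s) URL");
    }
  } catch {
    problems.push("host is not a valid URL");
  }
  if (attrs.credentials.username === "") problems.push("username must not be empty");
  return problems;
}

// The example object from the RFC, expressed against the sketch types.
const demo: DataSourceAttributes = {
  title: "demo-data-source",
  host: "https://my.opensearch.domain/",
  auth_type: "basicauth",
  credentials: { username: "dashboards_user", password: "password" },
  capabilities: {
    alerting: { enabled: true, version: "1.2" },
    ism: { enabled: true, supported_actions: ["roll_over", "shrink"] },
  },
};

console.log(validateDataSource(demo));
```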

And we will add a reference to data-source in index-pattern, so that an index-pattern object will look like:

{
  "type": "index-pattern",
  "index-pattern": {
    "title": "demo-index-pattern",
    "fields": {
      ...
    },
    "dataSource": "data-source-obj-id" 
  },
  "references" : [
    {
      "id": "data-source-obj-id",
      "name": "kibanaSavedObjectMeta.dataSource",
      "type": "data-source"
    }
  ],
  ...
}

With the new data-source model added, visualizations can get the data source reference id from the index pattern and pass it to the OSD server along with the query. The OSD server can then get the data source attributes using the saved object service and query that specific data source.
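The lookup described here can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the saved-object shapes are simplified, the `Map`-based store stands in for the saved object service, and `getDataSourceId` and `resolveHost` are hypothetical helper names rather than real OSD APIs.

```typescript
// Hypothetical, simplified saved-object shapes; not the real OSD saved objects API.
interface Reference { id: string; name: string; type: string }
interface SavedObject<T> { id: string; type: string; attributes: T; references: Reference[] }
interface IndexPatternAttrs { title: string }
interface DataSourceAttrs { title: string; host: string }

// Stand-in for the saved object service: a lookup keyed by "type:id".
type SavedObjectStore = Map<string, SavedObject<unknown>>;

// Client side: find the data-source reference id on an index pattern, if any.
function getDataSourceId(indexPattern: SavedObject<IndexPatternAttrs>): string | undefined {
  return indexPattern.references.find((ref) => ref.type === "data-source")?.id;
}

// Server side: turn the reference id sent with the query into a target host.
function resolveHost(store: SavedObjectStore, dataSourceId: string): string {
  const obj = store.get(`data-source:${dataSourceId}`) as SavedObject<DataSourceAttrs> | undefined;
  if (!obj) throw new Error(`data source ${dataSourceId} not found`);
  return obj.attributes.host;
}

const store: SavedObjectStore = new Map();
store.set("data-source:ds-1", {
  id: "ds-1",
  type: "data-source",
  attributes: { title: "demo-data-source", host: "https://my.opensearch.domain/" },
  references: [],
});

const pattern: SavedObject<IndexPatternAttrs> = {
  id: "ip-1",
  type: "index-pattern",
  attributes: { title: "demo-index-pattern" },
  references: [{ id: "ds-1", name: "dataSource", type: "data-source" }],
};

const dataSourceId = getDataSourceId(pattern);
if (dataSourceId !== undefined) {
  console.log(resolveHost(store, dataSourceId));
}
```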

The new data-source model changes the user experience. Users need to create data sources before they can create an index pattern. Then, when creating an index pattern, users need to select the data source that the index pattern will be associated with. After that, the visualization and dashboard building experience remains the same as it is today.

A PoC that adds the data-source model and uses it in index-pattern and visualization can be found at: https://github.com/zengyan-amazon/OpenSearch-Dashboards/tree/ext-data-source-discover

There is a caveat: data-source includes user credentials, which need to be encrypted and handled carefully. That may break the general data handling in the saved object service, because data-source needs special handling. Alternatively, we may end up letting OSD manage another secure index (or data store) to handle data-source/credentials.
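To make "encrypted when persisted" concrete, here is a minimal sketch using Node's built-in `crypto` module. The RFC does not choose an algorithm, so AES-256-GCM is purely an assumption, as are the names `EncryptedField`, `encryptField`, and `decryptField`. Key storage and rotation are deliberately left out.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Hypothetical persisted shape for one encrypted credential field (all base64).
interface EncryptedField { iv: string; tag: string; data: string }

// Encrypts a credential field with AES-256-GCM; `key` must be 32 bytes.
function encryptField(plaintext: string, key: Buffer): EncryptedField {
  const iv = randomBytes(12); // 96-bit nonce, generated fresh for each encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts a field; throws if the key is wrong or the stored data was modified.
function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(field.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

With a 32-byte key, `decryptField(encryptField("password", key), key)` returns the original string, and only the `EncryptedField` would be written to the saved object store.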

Scope

For this RFC, we focus on supporting data sources that are compatible with OpenSearch 1.x APIs. We will try to keep the design and implementation extensible to support other data sources, but that is not a design goal.
Handling credentials in a secure way, such as with encryption, is in scope.
Support for non-visualization plugins, such as alerting, to connect to different OpenSearch data sources is in scope.

FAQ

Is it required to have a data source defined for all index patterns? What if I don't want this capability?

The plan is to make the multiple data source feature configurable, so that users can enable or disable it in OSD's yml config file.

We also want to maintain backward compatibility, so that users can upgrade safely. When an index pattern doesn't have a data source, it can fall back to using the same OpenSearch endpoint as its saved object store.
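The feature flag and the fallback can be combined into one small decision, sketched below. The RFC does not name the yml key or any of these types, so `MultiDataSourceConfig`, `IndexPatternTarget`, and `pickQueryTarget` are hypothetical.

```typescript
// Hypothetical settings; the RFC does not name the actual yml keys.
interface MultiDataSourceConfig {
  enabled: boolean;    // feature flag read from the OSD yml config
  defaultHost: string; // the cluster that also holds the saved object store
}

// Hypothetical view of an index pattern after its data source has been resolved.
interface IndexPatternTarget {
  title: string;
  dataSourceHost?: string; // absent for index patterns created before the feature
}

// Chooses which OpenSearch endpoint a query for this index pattern should go to.
function pickQueryTarget(cfg: MultiDataSourceConfig, pattern: IndexPatternTarget): string {
  if (!cfg.enabled || pattern.dataSourceHost === undefined) {
    return cfg.defaultHost; // feature off, or legacy index pattern: old behavior
  }
  return pattern.dataSourceHost;
}

const cfg: MultiDataSourceConfig = { enabled: true, defaultHost: "https://localhost:9200" };
console.log(pickQueryTarget(cfg, { title: "legacy-pattern" }));
console.log(pickQueryTarget(cfg, { title: "demo-index-pattern", dataSourceHost: "https://my.opensearch.domain/" }));
```

Routing both the disabled case and the legacy case to the default endpoint is what would let existing index patterns keep working after an upgrade.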

I enabled the security plugin for both OpenSearch Dashboards and my OpenSearch clusters. Can OpenSearch Dashboards use my OSD credentials to query OpenSearch data sources?

This is more of an implementation-level detail. It can work with basic auth, but it is not likely to work for users who log into OSD using SSO such as OIDC or SAML. We want to provide the simplest experience to users, and will figure out more details during the design and implementation phase.


seraphjiang · 2022-04-01T16:34:21Z

@zhongnansu would you take a look

zhongnansu · 2022-04-05T18:23:31Z

@zengyan-amazon @seraphjiang Thanks for putting up the detailed proposal. I do have some questions that I'd like to discuss here.

Does this allow OSD to support cross-cluster features (queries, visualization, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported, just to confirm. Also, I wonder if this is a valid user need.
For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

zengyan-amazon · 2022-04-05T20:06:57Z

@zhongnansu these are good points and call outs

Does this allow OSD to support cross-cluster features (queries, visualization, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported, just to confirm. Also, I wonder if this is a valid user need.

I would suggest starting with no cross-cluster support, and then we can discuss whether to add the capability to OSD later. Cross-cluster support can be a follow-up project, which may also require aggregation in OSD and a lot of other considerations. We can keep it in mind during design and implementation to leave as much flexibility as possible for the future. Keeping OSD flexible and extensible is always one of our principles.

For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

This is more of a design detail. The point is that OSD needs a way to determine whether a specific capability is supported in a given data source. Defining it in the data source is one approach; using the _cat/plugins API is another.

We can discuss the pros and cons in the design phase. For example, the config-in-data-source approach is not dynamic, while the API approach assumes that the identity used by the data source (or maybe the OSD server) has permission to list all plugins in each backend cluster, which may or may not be a valid assumption.
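The two approaches can also be combined. The sketch below prefers a live plugin listing when one is available and falls back to the capabilities declared on the data source when it is not, for example because the identity lacks permission to list plugins. This ordering is only one possible design, not something the thread settled on. The row fields (`name`, `component`, `version`) are assumed to match the JSON form of the plugin listing, the component name in the usage is illustrative, and `hasCapability` is a hypothetical helper.

```typescript
// Assumed shape of one row from a JSON plugin listing (only the fields used here).
interface CatPluginRow { name: string; component: string; version: string }

// Static declaration stored with the data source, as in the RFC example.
interface DeclaredCapability { enabled: boolean }

// liveRows is undefined when the listing could not be fetched
// (for example, the data source identity is not allowed to list plugins).
function hasCapability(
  declared: Record<string, DeclaredCapability>,
  plugin: string,
  liveRows?: CatPluginRow[]
): boolean {
  if (liveRows !== undefined) {
    // Dynamic approach: trust what the cluster reports right now.
    return liveRows.some((row) => row.component === plugin);
  }
  // Static approach: fall back to what was configured on the data source.
  return declared[plugin]?.enabled ?? false;
}

const declared = { "opensearch-alerting": { enabled: true } };
const rows: CatPluginRow[] = [
  { name: "node-1", component: "opensearch-alerting", version: "1.2.0.0" },
];

console.log(hasCapability(declared, "opensearch-alerting", rows));
console.log(hasCapability(declared, "opensearch-alerting"));
```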

zhongnansu · 2022-04-26T21:35:10Z

Created a proposal for better client management in the multiple data source project. A PoC is needed.
Tracking here #1499

peternied · 2022-04-28T19:28:05Z

@zengyan-amazon This seems like an opportunity to reinvent the primitive data type used to power OpenSearch Dashboards queries; index-patterns are an OpenSearch concept. What do you think about embedding index-patterns into the data-source definition?

When adding support for other sources like SQL tables, DynamoDB, or Cosmos DB, there would be a common interface. Another way to frame this problem is how to write an OpenSearch data-source.

zengyan-amazon · 2022-05-10T18:06:44Z

@peternied A data source here describes an OpenSearch endpoint, which may have multiple index-patterns. The index-pattern is the foundation of visualizations, so embedding index-patterns into the data source definition may not be a good idea given the current model in OpenSearch Dashboards.

If we really want to support other sources like SQL DBs, DynamoDB, or others, we may still consider using composition, making the data source an attribute of index-pattern, and remodel index-pattern to make it generic enough to support other use cases.
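As a rough illustration of the composition idea, a generic pattern could carry its data source as an attribute, with the source kind deciding how the pattern is interpreted. Everything in this sketch is hypothetical (`DataSource`, `GenericPattern`, `describeTarget`, and the `sql` variant), and it is not a proposal for the actual model.

```typescript
// Hypothetical data source kinds; only "opensearch" is in scope for this RFC.
type DataSource =
  | { kind: "opensearch"; host: string }
  | { kind: "sql"; connectionString: string };

// Composition: the pattern owns a reference to its data source,
// rather than the data source embedding its patterns.
interface GenericPattern {
  title: string;
  dataSource: DataSource;
}

// Each source kind interprets the pattern title in its own way.
function describeTarget(pattern: GenericPattern): string {
  const source = pattern.dataSource;
  switch (source.kind) {
    case "opensearch":
      return `OpenSearch index pattern "${pattern.title}" on ${source.host}`;
    case "sql":
      return `SQL table "${pattern.title}" via ${source.connectionString}`;
  }
}

console.log(
  describeTarget({
    title: "demo-index-pattern",
    dataSource: { kind: "opensearch", host: "https://my.opensearch.domain/" },
  })
);
```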

bjo004 · 2022-06-04T11:24:32Z

@zengyan-amazon @seraphjiang Thanks for putting up the detailed proposal. I do have some questions that I'd like to discuss here.

Does this allow OSD to support cross-cluster features (queries, visualization, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported, just to confirm. Also, I wonder if this is a valid user need.

For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

I can confirm that this is a valid user need. I'm dealing with a large number of OpenSearch clusters deployed in Kubernetes, and it would be very nice to have one OSD in front of them all without having to configure SSO for each OSD.

Kind regards,

Bankole.

dblock · 2022-08-31T15:12:51Z

I see that the proposal has a separate UX for credentials and data sources. I think this is a bad idea.

Bad user experience. How many data sources will a typical cluster have? I bet no more than 5, so why would users have to configure credentials in a separate panel, associate them with a data source, etc?
It's a security problem that implies everybody is an admin and anyone who can create a data source can see all credentials. Charlie creates credentials C, attaches them to data source D1. Now Alice attaches credentials C to data source D2, so Alice can now get a copy of C.
It's a 1-way door. Once you can associate a set of credentials with multiple data sources you cannot go back to a 1:1 relationship because users rely on the many:1 behavior.

I think that for the first cut you should simplify and not build a credentials panel, but let users configure credentials in the data source editor UX. You can still store credentials in a separate object so that you can build a credentials management panel in the future.

It's a lot simpler for users to edit credentials in the data source editor.
It enables a security model where Charlie can create a set of credentials that will never be accessible by anyone other than Charlie. Charlie owns the data source they create, nobody else needs to modify/see it.
It's a 2-way door. You can build a 1:1 data-source:credentials now, and always expand it to many:1, but not the other way around.
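The 1:1 ownership rule argued for here can be stated as a small check. This is a sketch with hypothetical names (`Credential`, `canAttach`); it only shows that storing credentials as a separate object does not have to mean they can be shared across data sources.

```typescript
// Hypothetical credential record, stored separately but bound to one data source.
interface Credential {
  id: string;
  owner: string;        // the user who created the credential
  dataSourceId: string; // the single data source it was created for
}

// 1:1 rule: only the owner may attach a credential, and only to its own data source.
function canAttach(cred: Credential, user: string, dataSourceId: string): boolean {
  return cred.owner === user && cred.dataSourceId === dataSourceId;
}

// Charlie creates credentials C for data source D1.
const credC: Credential = { id: "C", owner: "charlie", dataSourceId: "D1" };

console.log(canAttach(credC, "charlie", "D1")); // Charlie using C on D1
console.log(canAttach(credC, "alice", "D2"));   // Alice trying to reuse C on D2
```

Relaxing this later to many:1 would mean loosening the `dataSourceId` comparison, which is the "2-way door" property described above.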

seraphjiang · 2022-09-21T23:07:58Z

here is link to track the feedback
#2400

joshuarrrr · 2022-11-22T02:01:09Z

@zengyan-amazon This issue is labeled v2.4.0. Is it ready to be closed, or should we update/remove the release label?

seraphjiang · 2022-12-01T22:59:14Z

Closing this as it is an RFC; we will track tasks in other meta issues.
@zengyan-amazon @joshuarrrr @seanneumann @kristenTian

zengyan-amazon added the discuss and RFC labels Mar 25, 2022

zengyan-amazon self-assigned this Mar 25, 2022

zengyan-amazon mentioned this issue Mar 31, 2022

OpenSearch Dashboards 2022 Initiatives #1405

Closed

zengyan-amazon mentioned this issue Apr 6, 2022

[Draft][PoC] multiple OpenSearch data source PoC #1430

Closed

7 tasks

kavilla linked a pull request Apr 11, 2022 that will close this issue

[Draft][PoC] multiple OpenSearch data source PoC #1430

Closed

7 tasks

seraphjiang mentioned this issue Apr 14, 2022

[PROPOSAL] Dashboards anywhere for OpenSearch eco-system opensearch-project/dashboards-anywhere#1

Closed

5 tasks

zhongnansu mentioned this issue Apr 26, 2022

[PoC] Efficient client management to support multiple OpenSearch clusters #1499

Closed

6 tasks

This was referenced Jul 28, 2022

[MD] Instantiate credential management plugin code structure #1996

Merged

[MD] Add initial credential management CRUD pages #2040

Merged

kristenTian mentioned this issue Aug 18, 2022

[DOC] Documentation request for Multiple Data Source project opensearch-project/documentation-website#922

Closed

4 tasks

zhongnansu added the multiple datasource label Sep 1, 2022

seraphjiang added the v2.4.0 label Sep 12, 2022

noCharger mentioned this issue Sep 28, 2022

[MD] Research on encryption / decryption strategies #1756

Closed

vagimeli mentioned this issue Oct 26, 2022

[DOC] Create Discover homepage opensearch-project/documentation-website#991

Closed

joshuarrrr mentioned this issue Nov 8, 2022

[Discuss] Separate out data source configurations from existing visualization types #2823

Open

ahopp mentioned this issue Nov 8, 2022

[Feedback] OpenSearch Dashboards Multiple OpenSearch Clusters Support #2829

Open

elfisher mentioned this issue Nov 16, 2022

[FEATURE] Materialized views (aka virtual indexes) on object stores opensearch-project/sql#1080

Open

seraphjiang closed this as completed Dec 1, 2022

wbeckler mentioned this issue Jun 14, 2023

Cross replicate internal users and roles database opensearch-project/cross-cluster-replication#1076

Closed

seraphjiang mentioned this issue Feb 18, 2024

[RFC] Plugins Version Decoupling #5877

Open

This was referenced Feb 20, 2024

[Meta][2.14] Support Multiple Data Source in OpenSearch Dashboards Plugins #5870

Closed

[FEATURE] Support Multiple Data Source in Security Dashboards Plugin opensearch-project/security-dashboards-plugin#1782

Closed

seraphjiang mentioned this issue Feb 20, 2024

[MD] Index pattern selector which was used in maps dashboard plugin and visualization should show data source name as prefix #5900

Closed

cwperks mentioned this issue Mar 26, 2024

[FEATURE] Support external for authentication_backend where user list is defined in another cluster opensearch-project/security#4175

Closed

BionIT mentioned this issue Mar 26, 2024

[Multiple Datasource] Create example plugin with multiple data source support #6275

Closed

BionIT mentioned this issue Apr 22, 2024

[Meta][2.15] Support Multiple Data Source in OpenSearch Dashboards Plugins #6596

Closed

5 tasks

BionIT mentioned this issue Jun 7, 2024

[Meta][2.16] Support Multiple Data Source in OpenSearch Dashboards Plugins #6976

Closed

3 tasks

BionIT mentioned this issue Jul 30, 2024

[Meta][2.17] Support Multiple Data Source in OpenSearch Dashboards Plugins #7578

Open

1 task
