This design is part of the OpenSearch Dashboards multi data source project [RFC], where we need to manage and expose datasource clients. Connections are established by creating clients that can then be used by a caller to interact with any data source (OpenSearch is the only data source type in scope at this phase).
Overall the critical problems we are solving are:
- How to set up connections (clients) for different data sources?
- How to expose data source clients to callers through clean interfaces?
- How to maintain backwards compatibility if the user turns off this feature?
- How to manage multiple clients/connections efficiently, without consuming all the memory?
- Where should we implement the core logic?
- How to register custom API schema and add into the client during initialization?
- Accessibility:
  - Clients need to be accessible by other OpenSearch Dashboards plugins or modules through interfaces, in all stages of the plugin lifecycle, e.g. "Setup" and "Start".
  - Clients should be accessible by plugins through the request handler context.
- Client management: Clients need to be reused in a resource-efficient way so that performance is not harmed.
- Backwards compatibility: If a user enables this feature and later disables it, any related logic should pick up the config change and handle the affected use cases by:
  - Either switching to connect to the default OpenSearch cluster
  - Or blocking the connection to the data source and throwing an error
- Auditing: We need to log each user's queries against each data source, for troubleshooting or log analysis.
- We are adding a new service in core to manage data source clients, and expose interface for plugins and modules to access data source client.
- Existing OpenSearch services and saved object services should not be affected by this change
3.1 Dataflow of the call sequence a plugin (using the viz plugin as an example) follows to retrieve data from any data source.
1. How to set up connections (clients) for different data sources?
Similar to how OpenSearch Dashboards currently talks to the default OpenSearch cluster by creating a client with the opensearch-js library, we also create a client for each data source connection. The critical params that differentiate data sources are url and auth.
const { Client } = require('@opensearch-project/opensearch');
const dataSourceClient = new Client({
node: url,
auth: {
username,
password,
},
...OtherClientOptions,
});
dataSourceClient.search();
dataSourceClient.ping();
2. How to expose datasource clients to callers through clean interfaces?
We create a data source service, similar to the existing opensearch service in core, which provides the client of the default OpenSearch cluster. This new service is dedicated to providing clients for data sources. Following the same paradigm, we can register this new service to CoreStart and CoreRouteHandlerContext in order to expose the data source client to plugins and modules. The interface is exposed from the new service, so it doesn't interfere with any existing services and keeps the interface clean.
// Existing
const defaultClient: OpenSearchClient = core.opensearch.client.asCurrentUser;

// With opensearch_data_services added
const dataSourceClient: OpenSearchClient = core.opensearchData.client;
3. How to maintain backwards compatibility if the user turns off this feature?
The user can only turn the multiple data source feature on or off by updating the boolean config data_source.enabled in opensearch_dashboards.yml and rebooting.
- Browser side: if the data source feature is turned off, the browser should detect the config change and update the UI so that requests cannot be submitted to any data source. UI related to multiple data sources shouldn't render. If a request is not submitted to a data source, the logic won't return a data source client at all.
- Server side: a user may still submit a request to a data source manually, or a plugin may try to access a data source client from the server side. The corresponding core service will hold a flag that maps to the enable_multi_datasource boolean config and throw an error if the API is called while the feature is turned off.
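The server-side guard can be pictured with a minimal sketch. The names GuardedDataSourceService and DataSourceConfig, and the error message, are assumptions made for illustration and are not part of the design; the client factory is injected so the example stays self-contained.

```typescript
// Hypothetical config shape mirroring the data_source.enabled setting.
interface DataSourceConfig {
  enabled: boolean;
}

// Hypothetical service wrapper: refuses to hand out clients while the feature is off.
class GuardedDataSourceService<TClient> {
  private readonly config: DataSourceConfig;
  private readonly createClient: (dataSourceId: string) => TClient;

  constructor(config: DataSourceConfig, createClient: (dataSourceId: string) => TClient) {
    this.config = config;
    this.createClient = createClient;
  }

  getDataSourceClient(dataSourceId: string): TClient {
    if (!this.config.enabled) {
      // Block the call instead of silently falling back to the default cluster.
      throw new Error('Multiple data source feature is disabled');
    }
    return this.createClient(dataSourceId);
  }
}
```

Throwing here corresponds to the "block the connection and throw an error" option. Falling back to the default cluster would instead return the existing default client from the same branch.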
4. How to manage multiple clients/connections efficiently, without consuming all the memory?
- For data sources with different endpoints, use client pooling (e.g. an LRU cache).
- For data sources with the same endpoint but different users, use the connection pooling strategy (child client) provided by opensearch-js.
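As a rough illustration of the first strategy, the sketch below keeps one client per endpoint in a small LRU structure built on a plain Map rather than a caching library. The names ClientPool and PoolableClient are hypothetical, the pool size is assumed to be at least 1, and closing a client on eviction is an assumption rather than a stated requirement.

```typescript
// Hypothetical minimal view of a client: the pool only needs to close it.
interface PoolableClient {
  close(): void;
}

class ClientPool<T extends PoolableClient> {
  // A Map iterates in insertion order, so the first key is the least recently used.
  private readonly clients = new Map<string, T>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.clients.size;
  }

  // Return the pooled client for an endpoint, creating one on a miss.
  getOrCreate(endpoint: string, create: () => T): T {
    const existing = this.clients.get(endpoint);
    if (existing !== undefined) {
      // Re-insert so this endpoint becomes the most recently used entry.
      this.clients.delete(endpoint);
      this.clients.set(endpoint, existing);
      return existing;
    }
    if (this.clients.size >= this.maxSize) {
      // Evict the least recently used client and release its sockets.
      const oldest = this.clients.keys().next().value as string;
      const evicted = this.clients.get(oldest);
      this.clients.delete(oldest);
      if (evicted !== undefined) {
        evicted.close();
      }
    }
    const client = create();
    this.clients.set(endpoint, client);
    return client;
  }

  // Close every pooled client, e.g. when the owning service stops.
  closeAll(): void {
    this.clients.forEach((client) => client.close());
    this.clients.clear();
  }
}
```

Keying by endpoint means two data sources that point at the same cluster share one pooled client, which is what makes the per-user child client strategy in the second bullet possible.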
5. Where should we implement the core logic?
The current opensearch service exists in core. The module we'll implement is functionally similar, but we choose to implement the data source service in a plugin, along with the crypto service, for the following reasons.
- Data source is a feature that can be turned on or off, which is the kind of pluggable use case plugins are designed for.
- We avoid touching OpenSearch Dashboards core. Since this is an experimental feature, using a plugin lowers the risk of breaking existing behavior. In the worst case, the user could just uninstall the plugin.
- Complexity-wise, it's about the same amount of work.
6. How to register custom API schema and add into the client during initialization?
Currently, OpenSearch Dashboards plugins use the following to initialize an instance of the cluster client and register a custom API schema via the plugins configuration option.
core.opensearch.legacy.createClient(
'exampleName',
{
plugins: [ExamplePlugin],
}
);
The downside of this approach is that the schema is defined inside the plugin and there is no centralized registry for schemas, which makes them hard to access. This will be resolved by implementing a centralized API schema registry. Consumers can add the data source plugin as a dependency and consume all the registered schemas, e.g. dataSource.registerCustomApiSchema(sqlPlugin).
The registered schemas will be added to the client configuration when the multi data source client is initialized.
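One possible shape for such a registry is sketched below. CustomApiSchemaRegistry, buildLegacyClientOptions and the CustomApiSchema type are hypothetical names; the sketch assumes a schema is a function handed to the legacy client through its plugins option, as in the createClient call above.

```typescript
// Assumption: a custom API schema is a function the legacy client invokes during construction.
type CustomApiSchema = (client: unknown, config: unknown, components: unknown) => void;

// Hypothetical central registry owned by the data source plugin.
class CustomApiSchemaRegistry {
  private readonly schemas: CustomApiSchema[] = [];

  // Register a schema once; repeated registration of the same function is ignored.
  register(schema: CustomApiSchema): void {
    if (this.schemas.indexOf(schema) === -1) {
      this.schemas.push(schema);
    }
  }

  // Return a copy so callers cannot mutate the registry's internal list.
  getAll(): CustomApiSchema[] {
    return this.schemas.slice();
  }
}

// Hypothetical helper: fold every registered schema into the client options.
function buildLegacyClientOptions(registry: CustomApiSchemaRegistry): { plugins: CustomApiSchema[] } {
  return { plugins: registry.getAll() };
}
```

With this shape, a consumer plugin would register its schema during its own setup, and the data source plugin would call the helper each time it creates a client.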
Create a data source plugin that has only server-side code, to hold most of the core logic of the data source feature, including the data source service, crypto service, and client management. A plugin has setup, start and stop as its lifecycle.
Functionality
- Set up plugin configuration such as data_source.enabled
- Define and register datasource as a new saved object type
- Initiate data source service and crypto service
- Register API to get datasource client to core route handler context
- Set up logging and auditing
- Stop all running services in the plugin stop() phase
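The lifecycle wiring might look roughly like the sketch below. It is deliberately independent of the real OpenSearch Dashboards plugin interfaces: LifecycleService and DataSourcePluginSketch are hypothetical names, and stopping services in reverse setup order is an assumption.

```typescript
// Hypothetical minimal contract for a service owned by the plugin.
interface LifecycleService {
  setup(): void;
  stop(): void;
}

class DataSourcePluginSketch {
  private readonly services: LifecycleService[];
  private isSetUp = false;

  constructor(services: LifecycleService[]) {
    this.services = services;
  }

  // Setup phase: initialize each owned service (e.g. data source service, crypto service).
  setup(): void {
    this.services.forEach((service) => service.setup());
    this.isSetUp = true;
  }

  // Start phase: nothing extra to do in this sketch.
  start(): void {}

  // Stop phase: stop services in reverse order of setup.
  stop(): void {
    if (!this.isSetUp) {
      return;
    }
    for (let i = this.services.length - 1; i >= 0; i--) {
      this.services[i].stop();
    }
    this.isSetUp = false;
  }
}
```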
We need to create a data source service in the data source plugin, to provide the main functionality and APIs for callers to getDataSourceClient(). A service in a plugin has setup, start and stop as its lifecycle.
Functionality
- Initialize the client pool as an empty data structure whose size maps to the user config value (data_source.clientPool.size).
- Configure a data source client and expose it as getDataSourceClient() at the service level.
We need to configure the data source client by either creating a new one, or looking up the client pool.
Functionality
- Get data source meta info: Use the saved object client to retrieve data source info from the OpenSearch Dashboards system index by id, and parse the result into a DataSource object:
    {
      title: ds-sample;
      description?: data source;
      endpoint: http://opensearch.com;
      auth: {
        type: "Basic Auth"
        username: "user name"
        password: "encrypted content"
      };
    }
- Get root client: Look up the client pool by endpoint and return the client if it exists. If no client is found, a new client instance is created and loaded into the pool. At this step, the client won't have any auth info.
- Get credentials: Call the crypto service utilities to decrypt the user credentials from the DataSource object.
- Assemble the actual query client: With the auth info and the root client, we'll leverage the opensearch-js connection pooling strategy to create the actual query client from the root client by client.child().
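The four steps above can be sketched end to end as follows. Everything is expressed against hypothetical interfaces (DataSourceRecord, QueryClient, ConfigureClientDeps) so the example runs without a cluster; the assumption that child() accepts an auth option should be checked against the opensearch-js version in use.

```typescript
// Hypothetical shape of the parsed DataSource saved object.
interface DataSourceRecord {
  title: string;
  endpoint: string;
  auth: { type: string; username: string; password: string };
}

// Hypothetical minimal client: only the child() call used in the last step.
interface QueryClient {
  child(options: { auth: { username: string; password: string } }): QueryClient;
}

// Hypothetical collaborators, injected so each step can be swapped or faked.
interface ConfigureClientDeps {
  getDataSource(id: string): Promise<DataSourceRecord>; // saved object lookup
  getRootClient(endpoint: string): QueryClient; // client pool lookup or creation
  decrypt(encrypted: string): Promise<string>; // crypto service
}

async function configureDataSourceClient(
  dataSourceId: string,
  deps: ConfigureClientDeps
): Promise<QueryClient> {
  // 1. Get data source meta info by id.
  const dataSource = await deps.getDataSource(dataSourceId);
  // 2. Get the root client for the endpoint; it carries no auth info yet.
  const rootClient = deps.getRootClient(dataSource.endpoint);
  // 3. Decrypt the stored credentials.
  const password = await deps.decrypt(dataSource.auth.password);
  // 4. Derive the query client, which shares the root client's connections.
  return rootClient.child({
    auth: { username: dataSource.auth.username, password },
  });
}
```

Because only the root client is pooled, the decrypted password lives in the short-lived child client rather than in the cache.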
OpenSearch Dashboards had two types of clients available for use when created. One was the "new client", which has since been separated into opensearch-js, and the other was the legacy client named elasticsearch-js. Legacy clients are still used by some core features like visualization and index pattern management.
// legacy client
context.core.opensearch.legacy.client.callAsCurrentUser;
// new client
context.core.opensearch.client.asCurrentUser;
Since deprecating the legacy client is a larger project, the multiple data source feature still needs to implement a substitute for it for now. The implementation should be decoupled from the data source client as much as possible, for easier deprecation, similar to the opensearch legacy service in core. See how to initialize the data source client below:
context.dataSource.opensearch.legacy.getClient(dataSourceId);
If using Legacy cluster client with asScoped and callAsCurrentUser, the following is the equivalent when using data source client:
//legacy cluster client
const response = client.asScoped(request).callAsCurrentUser(format, params);
//equivalent when using data source client instead
const response = client.callAPI(format, params);
This is for plugins to access the data source client via the request handler, for example by core.client.search(params). It's a very common use case for a plugin to access a cluster while handling a request. In fact, the data plugin uses it in its search module to get the client, which is discussed in detail in the next section.
- param
  - dataSourceId: needed to retrieve datasource info for either creating a new client or looking up the client pool
- return type: OpenSearchClient
core.http.registerRouteHandlerContext('dataSource', {
  opensearch: {
    getClient: (dataSourceId: string) => {
      // ...
      return dataSourceService.getDataSourceClient();
    },
  },
});
Search strategy is the low-level API of the data plugin search module. It retrieves clients and queries OpenSearch. It needs to be refactored to switch between the default client and the data source client, depending on whether or not a request is sent to a data source.
Currently, the search module of the data plugin retrieves the default client to interact with OpenSearch through this API call. Ref: opensearch-search-strategy.ts
const client: OpenSearchClient = core.opensearch.client.asCurrentUser;
// use API provided by opensearch-js lib to interact with OpenSearch
client.search(params);
Similarly, we'll have the following for the data source use case. AsCurrentUser doesn't really apply to a data source, because it is always the "current" user's credentials, defined in the data source, that are used to initialize the client or look up the client pool.
let client: OpenSearchClient;
if (request.dataSource) {
  // <datasourceId> is a placeholder for the id of the targeted data source
  client = await core.opensearchData.getClient(<datasourceId>);
} else {
  // existing logic to retrieve default client
  client = core.opensearch.client.asCurrentUser;
}
// use API provided by opensearch-js lib to interact with OpenSearch
client.ping();
client.search(params);
When loading a dashboard with visualizations, each visualization sends at least one request to the server side to retrieve data. With the multiple data source feature enabled, requests are sent to multiple data sources, which requires multiple clients. If we return a new client per request, idle clients will soon fill up memory and sockets. We could close a client at any time, but connections are supposed to be kept alive for easy reloading and for periodically pulling data. Therefore, we need a better solution to manage clients efficiently.
- Key: data source endpoint
- Value: OpenSearch client object
- Configurable pool size: data_source.clientPool.size, defaults to 5
- Use the existing JavaScript lru-cache library in OpenSearch Dashboards, which enables easy initialization, lookup, and eviction of outdated clients.
- While stopping the service, we can close all connections by looping over the LRU cache and calling client.close() for each client.
- For data sources with the same endpoint but different users, use the connection pooling strategy (child client) provided by opensearch-js.
import LRUCache from 'lru-cache';
export class OpenSearchClientPool {
private cache?: LRUCache<string, Client>
...