[Discuss] Core Foundation Code Refactoring #5910

nknize · 2023-01-17T20:06:38Z

Is your feature request related to a problem? Please describe.

Elasticsearch never supported a third-party plugin ecosystem. Elasticsearch plugins were implemented as a mechanism to enable internal Elastic developers the ability to quickly write and release independent and decoupled features across a growing startup organization but within a monolithic repo. Third party developers (e.g., ODFE, independent developers) exploited this plugin framework to write custom "untrusted" / "second class" plugins for elasticsearch in separate multi-repositories. When OpenSearch forked, we inherited this disorganized and poorly documented set of plugin interfaces, and quickly (some might say, prematurely) marketed it for downstream usage with strict "semver" guarantees (that actually aren't enforced).

To fix this, OpenSearch is striving for (well supported) modularity, in the same spirit, to enable rapid development of new decoupled features (e.g., replication strategies, decoupling translog, streaming index api).
OpenSearch is also striving for "extensibility" through mechanisms like opensearch-sdk to enable community and third-party development / contribution of plugins, modules, and libraries.

The problem, however, is multi-fold. For instance, transitive / bloated dependencies that are not well encapsulated. The opensearch-sdk, along with countless other plugins, pull server as a dependency which transitively depends on 29 other libraries whether an extension / plugin, needs them or not. This exposes the entire OpenSearch internals to downstream plugins giving the ability to do potentially (most likely unknowingly) malicious or dangerous things.

Additionally, as core is being developed, there is no clear definition of what foundation or concrete classes should go where (e.g., :server vs :module vs :plugin vs :libs). For example, DocValueFormat is an interface inside the :server module search namespace used to define custom DocValueFormatters. It would be useful if this were in the opensearch-core lib so a plugin wouldn't have to depend on the entire server jar just to pull in dependency foundation interfaces do define their own DocValueFormatter.

Finally, and one of the most important justifications for this refactor effort, we all know JSM is deprecated with an API targeted to be replaced with noops sometime in the future. There is no drop in replacement mostly because java has moved away from the world of client side applets into the world of server side trusted code. Shallow sandboxing is the new model. Thus, the interim recommendation is to leverage the "new" language design features (since Java 9) of JEP 403 and Project Jigsaw (JPMS). OpenSearch is unable to use these design features until we eliminate split-package across the core namespaces (e.g., o.o.common, o.o.bootstrap).

At present the code organization is as follows:

libs/ - foundation classes that can be used in any of modules/, plugins/, server/, qa/, sandbox/, test/. (e.g., cli, core, nio, secure-sm, x-content, ...).
modules/ - "first class" features and capabilities considered part of "the core" / "min distribution". Installed by default. (e.g., geo, ingest-common, lang-painless, percolator, reindex, ...)
plugins/ - "first class" features and capabilities not considered part of the "the core" / "min distribution". Not installed by default. (e.g., analysis-icu, ingest-attachment, repository-s3, store-smb, ...)
server/ - "the core" implementation, architected into logical namespaces (e.g., index, ingest, repositories, search, snapshots, ...)

Due to the tightly coupled nature of the plugins and module plugins to the :server module big feature development efforts (e.g., replication strategies, decoupling translog, streaming index api) spend a significant amount of time unwinding foundation :server classes from concrete implementation details just to extend them for new use cases (e.g., local vs remote storage). The unwinding typically leads to more foundation classes remaining in the :server module only because there is no formal distinction of where these foundation classes should go. This problem exacerbated by the limitation that the build is configured such that library modules may not depend on other library modules except opensearch-core. This heavily restricts opensearch from using libraries to trim down the bloated :server module (example provided below).

Describe the solution you'd like
Decoupling foundation class and interface contracts from the :server module into appropriate separable JPMS enabled libraries as defined below:

:libs:opensearch-common - Reusable java components only (e.g., collections, generic data structures). This library is dependency free. It may not depend on any other opensearch or third-party libraries unless it's needed for build-tools only.
:libs:opensearch-core - The OpenSearch foundation classes, interface contracts / definitions, and base classes implemented by :server and other plugins.
:libs:opensearch-* - Foundation libraries implemented by modules and plugins. May depend on other opensearch libraries but not on modules or plugins.
:modules:* - Plugins that are installed by default as part of the minimum distribution
:plugins:* - Plugins that are not installed by default but shipped as part of the min distribution
opensearch-project/Plugin-* - External OpenSearch plugins that are not installed by default and shipped as part of the bundle

We will ensure that the core libraries (e.g., opensearch-core) and concrete module implementations (e.g., :modules:mapper-extras) or library implementations (e.g., :libs:x-content) do not not have package split-brains that prevent using JPMS for strong encapsulation, and breaking SemVer guarantees.

For clarity, here's an example where the cross library dependency limitation currently restricts a big part of modularization:

Aggregations

Third party libraries, plugins, modules, etc, want to build new aggregations all the time. We provide this ability and it's super powerful. To do so is quite boiler plate: 1. If the CoreValuesSourceType library doesn't already provide the requisite ValueSource, write the needed ValuesSourceType the new aggregation depends on, 2. write the Aggregator, 3. write the Aggregator Factory, 4. write the AggregationBuilder, 5. register the aggregation through your plugin using the AggregationSpec

As you can see in this process there is a large number of abstract classes and interfaces required to support implementing a new concrete aggregation anywhere in the "ecosystem". Organizationally, these base classes are spread across the aggregations and aggregations.support namespace, along with the mechanisms needed to implement the aggregation phase(s) in the SearchModule and SearchService class implementations. This means for anyone wanting to build a custom aggregation they would take a build.gradle dependency on "org.opensearch:opensearch:${opensearchVersion}" which pulls in a bloated server jar containing much more than the classes and dependencies they need to build their aggregation.

I think most on this project agree this is a modularization anti-pattern, so to remedy we're striving to make the codebase much more modular. So how do we achieve that here with this clumsy aggregation foundation and user facing "API" (which isn't exactly well documented, or implemented, or supported)? At first pass you might say, "hey, how about we stick with the intent of the libraries and modules design and refactor the generics and base classes from the bloated :server module to a new :opensearch-aggregations library with the concrete implementation in a new :aggs-common module that continue to use the search.aggregations.* namespace? I'd agree with this, in fact I'd go a step further and say this is where I think we should go with, not just aggregations, most of the big opensearch mechanisms (field mappers, field types, doc value types, field data, etc). In the end the :server module should be nothing more than the "motherboard" providing the concrete implementation of all the opensearch base components within the libraries and concrete extendable mechanisms provided by the modules and optional plugins.

But if you embark down that journey you will quickly find the cross library dependency restriction blocks much of this objective and the tight dependency requirement of the :server module with other modules quickly leads to JarHell issues. Take the aggregation decoupling use case, for example. Aggregations, like field mappers, are serializable and therefore depend on XContent in order to implement the ToXContentFragment logic. So how about refactoring all the XContent base classes to the :opensearch-x-content library then? That works until you refactor any base interfaces that depend on XContent to a library other than opensearch-core, because libraries cannot depend on each other (e.g., any new library cannot take a dependency on both core and x-content).

So that leaves us with a few options:

remove the cross library dependency restriction
refactor all base classes that would result in a cross library dependency situation into modules instead
Do nothing, leave all base classes and interfaces that would result in a cross library dependency in the server module and leave users to depend on the massive jar
refactor opensearch core base classes and interface contracts that would result in a cross library dependency to the opensearch-core library and the implementation details to other libraries, modules, and plugins

With this refactoring effort we chose Option 4 because it not only fits the modularization objective but after the aggregation framework is refactored to a new module (and supporting library) users in the aggregation scenario above would only need to take a dependency on the opensearch-core and a (soon coming) opensearch-aggregations library to implement a new concrete aggregation in their own plugin.

Describe alternatives you've considered

remove the cross library dependency restriction
refactor all base classes that would result in a cross library dependency situation into modules instead
Do nothing, leave all base classes and interfaces that would result in a cross library dependency in the server module and leave users to depend on the massive jar

Additional context
relates #2447

Also see conversation in #5902 for additional discussion.

The text was updated successfully, but these errors were encountered:

andrross · 2023-01-20T18:11:22Z

I think this is in line with the work started in #1838. Tagging it here for reference.

dblock · 2023-01-20T22:32:31Z

I think the proposed refactor is a huge undertaking. I agree directionally that it's a good thing, however we are still refactoring a massive monolith which seems like a ton of work to create extension points all while cutting out deep dependencies. Have you considered an alternate strategy where we'd expose new interfaces on top of the current mess, export them into the SDK, build new and existing functionality as clean extensions, fold the default implementation (as an extension) back in the distribution, and finally delete the old implementation? If we do this enough times we'd end up with your idea of motherboard.

(Note that the SDK is pulling server as a dependency is a shortcut, and either way it will only do so at build time and not at runtime, if I am not mistaken - @saratvemulapalli can confirm).

nknize · 2023-01-25T15:03:33Z

I think the proposed refactor is a huge undertaking....

Of course it is. We can do this incrementally without shimming...

Have you considered an alternate strategy where we'd expose new interfaces...

No because shimming will create intermediate interfaces we ultimately may not want exposed in the sdk. Incrementally refactoring core to the proper libraries not only gives us control within this repository but enables us the ability to conservatively hold classes back as @opensearch.internal and thoughtfully expose them on-demand as the sdk is being built. If, in the meantime, the sdk needs to expose an interface we are welcome to do that on a case by case basis. We shouldn't tightly couple the sdk and core house cleaning efforts.

dblock · 2023-01-25T23:59:58Z

Thanks @nknize. My concern is that we continue to overly rely on internal (plugin) interfaces where we shouldn't. @saratvemulapalli brought this up recently around the efforts of moving security into core, for example.

nknize · 2023-01-26T00:45:07Z

Thanks @nknize. My concern is that we continue to overly rely on internal (plugin) interfaces where we shouldn't. @saratvemulapalli brought this up recently around the efforts of moving security into core, for example.

Let's call that out on a case by case basis where it continues to occur. Chances are we inherited this and are trying to move away from it.

The more I get into the throws of reorganizing the house the more I lean into the java plugin sdk being a :libs/sdk project in the OpenSearch repository to keep it close to the internal libs.

dblock · 2023-01-26T17:11:13Z

The extensions SDK is https://github.com/opensearch-project/opensearch-sdk-java, and I think it's a good idea if it only relies on whatever is in :libs/sdk (wouldn't limit it to "plugins").

nknize · 2023-01-30T15:19:27Z

I updated the issue with the following definitions:

:libs:opensearch-common - Reusable java components only (e.g., collections, generic data structures). This library is dependency free. It may not depend on any other opensearch or third-party libraries unless it's needed for build-tools only.
:libs:opensearch-core - The OpenSearch foundation classes, interface contracts / definitions, and base classes implemented by :server and other plugins.
:libs:opensearch-* - Foundation libraries implemented by modules and plugins. May depend on other opensearch libraries but not on modules or plugins.
:modules:* - Plugins that are installed by default as part of the minimum distribution
:plugins:* - Plugins that are not installed by default but shipped as part of the min distribution

saratvemulapalli · 2023-02-04T02:56:31Z

Thanks @nknize for opening this up. I like it!!
As we are working towards Extensions/SDK, we realized compile time dependencies restrict the usage (of plugins/extension) across different versions of OpenSearch.

+1 to :libs/sdk to be the only interface for all things external interacting with the core.
Also these changes will bring pain for folks who depend on OpenSearch (eg clients, plugins etc).
And can only be done in a major version. I believe we are at the right spot that plugins/clients consume 3.0.0-SNAPSHOTS and will have enough time to consume these changes, sooner the better.

nknize · 2023-03-31T19:00:25Z

Coming out of #6875 (comment)

Objective: Eliminate downstream dependencies on :server that look like this implementation("org.opensearch:opensearch:${opensearchVersion}") because they expose concrete OpenSearch implementations we don't want downstream consumers to override.

Option 1: Refactor OpenSearch foundation classes to opensearch-core library and open the opensearch-core library to third party dependencies like lucene
Con: Continues transitive dependencies for anything depending on opensearch-core

Option 2: Refactor OpenSearch foundation classes that do not have third-party dependencies to opensearch-common library, and OpenSearch Foundation classes that do have third-party dependencies to opensearch-core
Con: High bar for first time contributors. Confusion on where foundation classes should go causing potential refactors between two libraries and namespaces.

andrross · 2023-03-31T23:38:53Z

Objective: Eliminate downstream dependencies on :server

@nknize Just to make sure I fully understand the objective, concretely this will mean all *Plugin classes and interfaces and everything else that plugins currently use in :server will go into opensearch-core (or opensearch-common if appropriate). Is that right? Assuming that's true, then I'd vote for Option 1 above as it gets us closer to that goal and doesn't make any existing problems worse. If down the road we want to further split up opensearch-core we can do so if desired.

reta · 2023-04-01T01:10:01Z

@nknize Just to make sure I fully understand the objective, concretely this will mean all *Plugin classes and interfaces and everything else that plugins currently use in :server will go into opensearch-core

Not answering the question, but this is my understanding as per #6875 (comment)

nknize · 2023-04-01T04:06:52Z

concretely this will mean all *Plugin classes and interfaces and everything else that plugins currently use in :server will go into opensearch-core

Correct!

If down the road we want to further split up opensearch-core we can do so if desired.

💯 that's the approach I like...progress toward splitting these into their proper home. It may not be perfect, but it helps lean out the dependencies and gives us some clear understanding where all of the modules live.

I put this rough diagram together to capture the spirit of the modularity vision. The reality is most of the concrete code lives in :server... 😞 I'm trying to split that out to get us closer to this extensible future so we really can start looking at alternative replication models, or translog implementations.

minalsha · 2023-04-01T16:04:15Z

@dbwiddis , @owaiskazi19 thoughts?

nknize added enhancement Enhancement or improvement to existing feature or request discuss Issues intended to help drive brainstorming and decision making untriaged labels Jan 17, 2023

anasalkouz added the Plugins label Jan 17, 2023

This was referenced Jan 18, 2023

[Refactor] XContent base classes from xcontent to core library #5902

Merged

[Refactor] Use local opensearch.common.SetOnce instead of lucene's utility class #5947

Merged

saratvemulapalli mentioned this issue Jan 20, 2023

Add versioning support for SDK opensearch-project/opensearch-sdk-java#337

Closed

7 tasks

nknize mentioned this issue Jan 23, 2023

[Refactor] core.common to new opensearch-common library #5976

Merged

andrross added untriaged and removed untriaged labels Jan 24, 2023

nknize mentioned this issue Jan 25, 2023

Require MediaType in Strings.toString API #6009

Merged

minalsha removed the untriaged label Jan 31, 2023

saratvemulapalli added the extensions label Feb 4, 2023

nknize mentioned this issue Feb 7, 2023

[Refactor] Requests to use MediaType instead of XContentType #6218

Merged

gaiksaya mentioned this issue Aug 2, 2023

[BUG] Rebase Main Branch w/ Refactors from upstream core opensearch-project/index-management#883

Closed

nknize mentioned this issue Aug 11, 2023

Refactor Compressors from CompressorFactory to CompressorRegistry for extensibility #9262

Merged

minalsha removed the Plugins label Aug 11, 2023

nknize mentioned this issue Aug 15, 2023

[Remove] JodaCompatibleZonedDateTime #9355

Merged

nknize mentioned this issue Aug 29, 2023

[Demote] ZStd compression from GA / LTS to experimental and release a 2.9.1 patch #9422

Closed

This was referenced Oct 13, 2023

Update to OpenSearch 2.10 opensearch-project/spring-data-opensearch#174

Merged

[RFC] Pull-based indexing #10610

Closed

This was referenced Oct 25, 2023

Guice resection #10921

Closed

[Remove] Remaining Joda and Joda Dependency #10949

Closed

Refactor common parts from the Rounding class into a separate 'round' package #11023

Merged

linuxpi mentioned this issue Mar 21, 2024

[Feature Request] Separation of Merges #12726

Open

sachinpkale mentioned this issue Apr 4, 2024

[Feature Request] Stronger directory abstraction for segment storage #13075

Open

linuxpi mentioned this issue May 6, 2024

[Design Proposal] Offline Background Tasks #13554

Open

andrross mentioned this issue May 7, 2024

[RFC] Cloud Native SQL Plugin #13274

Closed

andrross added the Roadmap:Modular Architecture Project-wide roadmap label label Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Discuss] Core Foundation Code Refactoring #5910

[Discuss] Core Foundation Code Refactoring #5910

nknize commented Jan 17, 2023 •

edited

Loading

andrross commented Jan 20, 2023

dblock commented Jan 20, 2023 •

edited

Loading

nknize commented Jan 25, 2023

dblock commented Jan 25, 2023

nknize commented Jan 26, 2023

dblock commented Jan 26, 2023

nknize commented Jan 30, 2023

saratvemulapalli commented Feb 4, 2023

nknize commented Mar 31, 2023

andrross commented Mar 31, 2023

reta commented Apr 1, 2023

nknize commented Apr 1, 2023

minalsha commented Apr 1, 2023

[Discuss] Core Foundation Code Refactoring #5910

[Discuss] Core Foundation Code Refactoring #5910

Comments

nknize commented Jan 17, 2023 • edited Loading

Aggregations

andrross commented Jan 20, 2023

dblock commented Jan 20, 2023 • edited Loading

nknize commented Jan 25, 2023

dblock commented Jan 25, 2023

nknize commented Jan 26, 2023

dblock commented Jan 26, 2023

nknize commented Jan 30, 2023

saratvemulapalli commented Feb 4, 2023

nknize commented Mar 31, 2023

andrross commented Mar 31, 2023

reta commented Apr 1, 2023

nknize commented Apr 1, 2023

minalsha commented Apr 1, 2023

nknize commented Jan 17, 2023 •

edited

Loading

dblock commented Jan 20, 2023 •

edited

Loading