-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create new feature interface for querying features on a node/cluster #100330
Conversation
Hi @thecoop, I've created a changelog YAML for you. |
While it's probably been discussed in planning, we're probably just about at the last stage where we can discuss naming. The term "feature" is already used in Elasticsearch to describe a group of system indices and data streams. We have a "get features" API, a feature migration API, and users can specify features to include in snapshot operations. I haven't brought this up before because the concepts seem conceptually distant enough to coexist under the same name in our codebase. And I have no ideas for an alternative. But I think it's worth mentioning just in case any reviewers spot a possibility for confusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
The only concern I have about the statics is testability (ovverriding/mocking features in tests).
Suppose I want to have a specific set of features in my tests; how would I mock this? Do I need to specify a FeatureSpecification? Since this load statically, would this interfere with other tests? Can we get around this somehow (e.g. via specific classloaders for ServiceLoader or something similar)
Or do you see other ways, like overriding DiscoveryNode.hasFeature, or something else?
...ework/src/main/java/org/elasticsearch/cluster/coordination/CoordinationStateTestCluster.java
Show resolved
Hide resolved
@@ -154,7 +161,23 @@ public DiscoveryNode build() { | |||
versionInfo = new VersionInformation(version, minIndexVersion, maxIndexVersion); | |||
} | |||
|
|||
return new DiscoveryNode(name, id, ephemeralId, hostName, hostAddress, address, attributes, roles, versionInfo, externalId); | |||
if (features == null) { | |||
features = FeatureService.readFeatures(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this or if it's better to have Set.of()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're not specifying any features, you want it to have the features of the local codebase so that your new shiny feature is automatically enabled
Why not?
closing over both the Set and FeatureService (if needed?) |
I've added a reset method to |
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Pinging @elastic/es-distributed (Team:Distributed) |
There are two main reasons I've used statics for FeatureService:
That said, I don't like the use of statics here, but it's a tradeoff with ease-of-use for the majority use case. |
Is there a potential problem here with upgrades across the version boundary...
At step 3, are all the DiscoveryNodes re-registered with the new master, and so send the master their features, or does it continue to use the cluster state created by the 8.11 master, so it won't have any features in it until each node re-joins the cluster at some point later? |
I've had a look at this, and the difficulty comes from deserialization - DiscoveryNode needs to get an instance of this from nothing within the deser constructor. Maybe when the serialization is consolidated in DiscoveryNodes there'll be a way to pull an instance from the parent DiscoveryNodes, but that's currently not possible |
Yeah, that's definitely the tricky bit; it clashes with the goal is to minimize what we serialize/deserialize. Of course you can complicate the logic in the injected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/features/FeatureService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this together so quickly! I have a few thoughts.
final Function<Settings, PluginsService> pluginServiceCtor, | ||
Environment initialEnvironment, | ||
Function<Settings, PluginsService> pluginServiceCtor, | ||
Collection<? extends FeatureSpecification> additionalFeatureSpecs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adjusting the ctor of Node, can we make this available for test overriding like other capabilities here? Have a method which loads the features (or feature service), and then have MockNode use a special version of that if a marker plugin exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, but there's an issue with subclass-superclass construction - FeatureService
needs to be created in the Node
constructor, but if we override it in MockNode
the field storing the feature specs in MockNode
won't be assigned when the superclass Node
constructor runs. All the other overrides don't access fields in MockNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MockNode doesn't need to contain any additional members, so there shouldn't be any field access necessary. The pattern used throughout Node is to have a protected, overridable method, eg newFeatureService()
. The MockNode ctor wouldn't be affected, only the overridable method which would be called from the Node ctor. See for example createBigArrays or newSearchService (we are inconsistent about naming...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give an example by the change you're thinking of? Particularly how the node-specific featurespecs get into the Node instance constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor in #100768 will make it a lot easier to introduce this to mock nodes
@@ -473,6 +476,9 @@ protected Node( | |||
SettingsExtension.load().forEach(e -> additionalSettings.addAll(e.getSettings())); | |||
client = new NodeClient(settings, threadPool); | |||
|
|||
FeatureService.registerSpecificationsFrom(pluginsService, additionalFeatureSpecs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a static method here, coudln't this be passed into the ctor on the next line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make it clear what is stored statically and what is in the service instance. Specifying it in the constructor hides the fact that the specs are actually stored statically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the specs may be stored statically, it is better for tests if they are non static (so eg each node can be independent). Is there a reason the specs couldn't be loaded in the ctor altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the discussion #100330 (comment) and #100330 (comment) - DiscoveryNode needs to get the feature specs from nothing, which means they need to be static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The features can be overridden per-node by implementing ESIntegTestCase.nodeFeatureSpecifications
, but I'm not sure if there's a better way to do it - see #100330 (comment)
/** | ||
* Returns information on historical features that should be added to all nodes at or above the {@link Version} specified. | ||
*/ | ||
default Map<NodeFeature, Version> getHistoricalFeatures() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid using Version here? Internally to compare these historic versions we need a very reduced concept of Version, so I wonder if we could hardcode that there, so as not to hold up overall reduction of uses of Version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historical features need to be applied based on what release version nodes are - their Version. Similar to the issues the ML team are having with converting MlConfigVersion, we need some way to identify previous release versions consistently.
We will probably replace Version
with something else later on, and change this interface accordingly, but for the moment, Version
is what we've got.
private static volatile Set<String> NODE_FEATURES; | ||
private static volatile NavigableMap<Version, Set<String>> HISTORICAL_FEATURES; | ||
|
||
private static void readSpecs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the terminology here is confusing, this really should be server features? ie non plugins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loads the FeatureSpecification
implementations, not the features themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still specs from server, vs those from plugins, right? I think we should have some distinction, otherwise it seems like these are all the specs, when they're only some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, in unit tests this will read all the specs as the modules under test are loaded by the main classloader
server/src/main/java/org/elasticsearch/features/FeatureService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/features/FeatureEra.java
Outdated
Show resolved
Hide resolved
We do indeed need to worry about this, with a fixup listener similar to |
We can ensure this works for serverless by ensuring the feature api gets pushed into all serverless deployments with no features declared (so the default of the empty set is actually correct). We can then happily add features to serverless. This is only really an issue for release deployments, where we will be going from no featureset at all to a populated featureset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may sound like déja vu, but I'm not convinced we want to add this to DiscoveryNode
directly. I worry about subtle BwC problems related to applying a cluster state from nodes that don't have features in the DiscoveryNode
object. I know we said we could add a post-upgrade-fix-up thing, but that won't fix up any DiscoveryNode
objects that already exist and have been stored somewhere else (e.g. the keys of the map in org.elasticsearch.transport.ClusterConnectionManager#connectedNodes
).
I expect these features to be cluster-wide things - either the cluster knows about feature X or it doesn't, and the fact that this computation is based on the cluster membership is kind of an implementation detail. So I'd encourage thinking about making these things separate from the DiscoveryNode
objects shared in the transport handshake, rather like we did with TransportVersion
.
We also pass DiscoveryNode
objects over the wire in various places where we want to just identify a node. They're already rather overweight for that purpose unfortunately (see e.g. #99744) but I reckon this set of features is going to grow pretty fast and make that problem a whole lot worse.
Do we need some kind of join barrier for features? Once we've reported that a cluster supports feature X, it seems that it might be unsafe to then go back into a state where it doesn't.
@DaveCTurner Yes, the plan is to use features for a join barrier, as a feature ratchet - a node won't be able to join the cluster if it is missing one or more features that are present in every other node in the cluster (with a few exceptions). I would say this does belong alongside Version on DiscoveryNode, as it is supplanting the concept & use cases of node version. I agree that DiscoveryNode is becoming too big, but that larger refactor should be done separately to this piece of work. This PR is blocking lots of other significant work, so we should get this in soon. |
That is not a good reason to compromise on the design IMO. These BwC concerns are very difficult to resolve once released, and I'd rather not repeat the mistakes we made with pluggable node roles in the 7.x series. |
After discussing with @DaveCTurner and @rjernst, we're going to take a slightly different approach to this:
I'll do an initial pass over code that is checking the version of individual nodes & see if there's any blockers on changing those to cluster-wide checks |
I'm closing this PR & will open a new one for the new implementation, to help clear the floor a bit |
Sounds right. I only found two instances of this in tests so far; one was easily re-written as a check against the whole cluster. The second one was about checking different things (warning headers, IIRC) if a cluster was totally on a version that had the feature or partially. But I suspect that could also be re-written with a different assertion |
Expose features on DiscoveryNode and node startup to publish regular feature, and handle historical features
FeatureService
is the main class handling this. It is largely static, as the feature specs exposed by a node are a static property of the code. And getting aFeatureService
implementation into everyDiscoveryNode
instance is not feasible.As part of IT tests, feature specifications are wound through the the internal clusters to MockNodes where they are added to each node's features without requiring SPI.