-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS #32764
Conversation
Pinging @elastic/es-core-infra |
@@ -48,8 +49,12 @@ | |||
/** | |||
* A list of initial seed nodes to discover eligible nodes from the remote cluster | |||
*/ | |||
public static final Setting.AffixSetting<List<InetSocketAddress>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting("search.remote.", | |||
"seeds", (key) -> Setting.listSetting(key, Collections.emptyList(), RemoteClusterAware::parseSeedAddress, | |||
public static final Setting.AffixSetting<List<String>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting("search.remote.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am stranger here, so sorry if I am wrong, but it would be more readable:
public static final Setting.AffixSetting<List<String>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting(
"search.remote.",
"seeds",
(key) -> Setting.listSetting(
key,
Collections.emptyList(),
s -> {
// validate seed address
RemoteClusterAware.parseSeedAddress(s);
return s;
},
Setting.Property.NodeScope,
Setting.Property.Dynamic
)
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I don't think there's much of a wrong or right here :) I tend to agree with your suggested style but if you look around the codebase you will find that my style is what's used pretty much everywhere so I went with the consistent approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I see now.
That's completely fine then.
Consistency is a very good thing, your're right.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@original-brownbear I use the style suggested by @chernecov. 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe that's all I needed to hear :) Will reformat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed this and the general approach looks right. I think what is missing though, sorry if I overlooked, is a test that shows that we are now resolve addresses on demand? That is, that at least we (independent of what the JVM might be doing), no longer cache addresses.
@jasontedor thanks for taking a look :) You didn't overlook anything, I was a bad one for not adding a test => added one now though :) |
hold off a few minutes with the review, seems the new test is unstable in some way -> fixing |
@jasontedor should be stable/good to review now. Sorry for the noise :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern about how we validate the seed node setting now.
key, Collections.emptyList(), | ||
s -> { | ||
// validate seed address | ||
RemoteClusterAware.parseSeedAddress(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this seems problematic. It means, for example, if a host fails to resolve, we mark the setting as invalid. Yet, it could resolve for us later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor yea, I guess we could leave it out. I mainly just added this because there was an explicit test for this behavior.
I guess we could just delete it and drop the validation. I could go either way here:
Yes this could resolve later, so maybe it's fine not to validate. On the other hand, you could make an annoying typo in a hostname and now you won't catch it right away. Probably a question of how often and if someone would ever configure a hostname that doesn't resolve yet. Also, if you don't have the DNS caching ttl configured, not having this check is kind of annoying because it will never resolve and you might rather want this to fail right away
=> You decide :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a hostname fails to resolve at time of use (i.e., when we try to connect to the remote cluster), we would have a failure message in the logs indicating the problem? That is, the unknown host exception would be there?
I lean towards saying we leave the lookup out of validation, yet I think the remainder of the validation is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor sounds good will adjust accordingly in a bit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor done in a22994a, extracted the port parsing from the address parsing to get all validation except for the hostname lookup into a separate method :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left a few comments, but no need for another round.
if (port <= 0) { | ||
throw new IllegalArgumentException("port number must be > 0 but was: [" + port + "]"); | ||
} | ||
return new InetSocketAddress(hostAddress, port); | ||
return port; | ||
} catch (NumberFormatException e) { | ||
throw new IllegalArgumentException("port must be a number", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error message can be misleading. For example, Integer.valueOf
will fail to parse 123456789123456789
with a NumberFormatException
yet 123456789123456789
is a number! I am okay with catching NumberFormatException
and re-throwing since we can add context that NumberFormatException
does not provide (For input string: "123456789123456789"
is not helpful, specifying that this occurred while parsing a port is) but I think a more correct error message would be helpful. For example, failed to parse port
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I do not see a test for this case.
transportAddress, | ||
Version.CURRENT.minimumCompatibilityVersion()); | ||
nodes.add(node); | ||
List<Supplier<DiscoveryNode>> nodes = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can pre-size the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, one more nit.
parsePort(s); | ||
return s; | ||
}, | ||
Setting.Property.NodeScope, Setting.Property.Dynamic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could probably be on a separate line each, too. 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do :)
@jasontedor thanks addressed all points => will merge once green :) |
Jenkins test this |
* master: NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (elastic#32764) [DOCS] Splits the users API documentation into multiple pages (elastic#32825) [DOCS] Splits the token APIs into separate pages (elastic#32865) [DOCS] Creates redirects for role management APIs page Bypassing failing test PainlessDomainSplitIT#testHRDSplit (elastic#32966) TEST: Mute testRetentionPolicyChangeDuringRecovery [DOCS] Fixes more broken links to role management APIs [Docs] Tweaks and fixes to rollup docs [DOCS] Fixes links to role management APIs [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature [DOCS] Splits the roles API documentation into multiple pages (elastic#32794) [TEST] Run pre 6.4 nodes in non-FIPS JVMs (elastic#32901) Make Geo Context Mapping Parsing More Strict (elastic#32821)
* elastic/master: (46 commits) NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764) [DOCS] Splits the users API documentation into multiple pages (#32825) [DOCS] Splits the token APIs into separate pages (#32865) [DOCS] Creates redirects for role management APIs page Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966) TEST: Mute testRetentionPolicyChangeDuringRecovery [DOCS] Fixes more broken links to role management APIs [Docs] Tweaks and fixes to rollup docs [DOCS] Fixes links to role management APIs [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature [DOCS] Splits the roles API documentation into multiple pages (#32794) [TEST] Run pre 6.4 nodes in non-FIPS JVMs (#32901) Make Geo Context Mapping Parsing More Strict (#32821) [ML] fix updating opened jobs scheduled events (#31651) (#32881) Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597) Tests: Fix timezone conversion in DateTimeUnitTests Enable FIPS140LicenseBootstrapCheck (#32903) Fix InternalAutoDateHistogram reproducible failure (#32723) Remove assertion in testDocStats on deletedDocs counter (#32914) HLRC: Move ML request converters into their own class (#32906) ...
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html) * Changed `InetAddress` type to `String` for that higher up the stack * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset). * Closes elastic#28858
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html) * Changed `InetAddress` type to `String` for that higher up the stack * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset). * Closes #28858
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html) * Changed `InetAddress` type to `String` for that higher up the stack * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset). * Closes #28858
* commit '9088d811ce9cff922e6ef1befbeb0f1e0c27016a': (23 commits) Generalize remote license checker (elastic#32971) Trim translog when safe commit advanced (elastic#32967) Fix an inaccuracy in the dynamic templates documentation. (elastic#32890) HLREST: AwaitsFix ML Test Make Geo Context Mapping Parsing More Strict (elastic#32862) Add mzn and dz to unsupported locales (elastic#32957) Use settings from the context in BootstrapChecks (elastic#32908) Update docs for node specifications (elastic#30468) HLRC: Forbid all Elasticsearch logging infra (elastic#32784) Fix use of deprecated apis Only configure publishing if it's applied externally (elastic#32351) Protect ScriptedMetricIT test cases against failures on 0-doc shards (elastic#32959) (elastic#32968) Scripted metric aggregations: add deprecation warning and system (elastic#32944) Watcher: Properly find next valid date in cron expressions (elastic#32734) Fix some small issues in the getting started docs (elastic#30346) Set forbidden APIs target compatibility to compiler java version (elastic#32935) [TEST] Add "ne" as an unsupported SimpleKdc locale (elastic#32700) MINOR: Remove `IndexTemplateFilter` (elastic#32841) (elastic#32974) INGEST: Create Index Before Pipeline Execute (elastic#32786) (elastic#32975) NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (elastic#32764) (elastic#32976) ...
String
toDiscoveryNode
) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)InetAddress
type toString
for that higher up the stackSupplier<DiscoveryNode>
instead of outrightDiscoveryNode
fromRemoteClusterAware#buildRemoteClustersSeeds
on to lazy resolve DNS when theDiscoveryNode
is actually used (could've also passed down the value ofclusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)
together with theList<String>
of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).