Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ServiceAccountIT classMethod failing #92930

Closed
slobodanadamovic opened this issue Jan 16, 2023 · 7 comments
Closed

[CI] ServiceAccountIT classMethod failing #92930

slobodanadamovic opened this issue Jan 16, 2023 · 7 comments
Assignees
Labels
:Delivery/Build Build or test infrastructure :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Delivery Meta label for Delivery team Team:Security Meta label for security team >test-failure Triaged test failures from CI

Comments

@slobodanadamovic
Copy link
Contributor

slobodanadamovic commented Jan 16, 2023

Build scan:
https://gradle-enterprise.elastic.co/s/ugoqwcksbynes/tests/:x-pack:plugin:security:qa:service-account:javaRestTest/org.elasticsearch.xpack.security.authc.service.ServiceAccountIT

Reproduction line:

./gradlew ':x-pack:plugin:security:qa:service-account:javaRestTest' --tests "org.elasticsearch.xpack.security.authc.service.ServiceAccountIT" -Dtests.seed=7BBBFBC0C57760FE -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Druntime.java=17 -Dtests.fips.enabled=true

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.security.authc.service.ServiceAccountIT&tests.test=classMethod

Failure excerpt:

java.lang.RuntimeException: An error occurred while checking cluster 'test-cluster' status.

  at __randomizedtesting.SeedInfo.seed([7BBBFBC0C57760FE]:0)
  at org.elasticsearch.test.cluster.local.LocalClusterHandle.waitUntilReady(LocalClusterHandle.java:141)
  at org.elasticsearch.test.cluster.local.LocalClusterHandle.start(LocalClusterHandle.java:71)
  at org.elasticsearch.test.cluster.local.LocalElasticsearchCluster$1.evaluate(LocalElasticsearchCluster.java:37)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

  Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    at org.elasticsearch.test.cluster.util.Retry.getValueWithTimeout(Retry.java:54)
    at org.elasticsearch.test.cluster.util.Retry.retryUntilTrue(Retry.java:31)
    at org.elasticsearch.test.cluster.local.LocalClusterHandle.waitUntilReady(LocalClusterHandle.java:134)
    at org.elasticsearch.test.cluster.local.LocalClusterHandle.start(LocalClusterHandle.java:71)
    at org.elasticsearch.test.cluster.local.LocalElasticsearchCluster$1.evaluate(LocalElasticsearchCluster.java:37)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:833)

    Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

      at org.elasticsearch.test.cluster.util.Retry.lambda$getValueWithTimeout$1(Retry.java:49)
      at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
      at java.lang.Thread.run(Thread.java:833)

      Caused by: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

        at java.net.IDN.toASCIIInternal(IDN.java:297)
        at java.net.IDN.toASCII(IDN.java:123)
        at javax.net.ssl.SNIHostName.<init>(SNIHostName.java:99)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:183)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:142)
        at org.elasticsearch.test.cluster.local.WaitForHttpResource.checkResource(WaitForHttpResource.java:123)
        at org.elasticsearch.test.cluster.local.WaitForHttpResource.wait(WaitForHttpResource.java:107)
        at org.elasticsearch.test.cluster.local.LocalClusterHandle.lambda$waitUntilReady$7(LocalClusterHandle.java:136)
        at org.elasticsearch.test.cluster.util.Retry.lambda$retryUntilTrue$0(Retry.java:33)
        at org.elasticsearch.test.cluster.util.Retry.lambda$getValueWithTimeout$1(Retry.java:47)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        at java.lang.Thread.run(Thread.java:833)

Edit: Added reproduction line.

@slobodanadamovic slobodanadamovic added Team:Security Meta label for security team :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) >test-failure Triaged test failures from CI labels Jan 16, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@slobodanadamovic
Copy link
Contributor Author

slobodanadamovic commented Jan 16, 2023

There seems to be an issue with host name which is passed to WaitForHttpResource when it is created.

The problem is that LocalClusterHandle at line 151 calls node.getHttpAddress() which returns IPv6 address with port (e.g. [::1]:64922). This causes an issue downstream when creating SNIHostName.

The getHttpAddress simply takes the first address from the list see:

in this concrete case the list had 2 addresses:

  1. [::1]:64922 and
  2. 127.0.0.1:64923

Note: Using the IPv6 address causes the issue, while using IPv4 doesn't.

I've raised a PR which, as a workaround, sets localhost for http.host setting, but I believe we need a general solution in WaitForHttpResource.

@slobodanadamovic slobodanadamovic added :Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team labels Jan 16, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@ywangd
Copy link
Member

ywangd commented Jan 17, 2023

@slobodanadamovic Thanks for looking into it. I invested sometime trying to understand the failure as well. It was puzzling to me because the http.ports file does not change, i.e. the IPv6 address worked previously before the refactoring of using the new inline test cluster. I was curious why it is not working now.

It turned out to be related to FIPS. As a side effect of moving test cluster setup to JUnit, the client (WaitForHttpResource) code now runs with the same JVM options as the ES server. In this case, it runs with FIPS enabled. With FIPS enabled, SSL factory and socket have different implementations than the JDK ones which trigger SNI configuration within HttpsClient.

I think ideally we do not want the test cluster setup client to run with FIPS since it can potentially create unnecessary maintainence burden. I am not sure how feasible it is to separate them though. This behaviour can probably be argued to be "more correct" since actual rest tests do run in FIPS mode and they are also mostly client code (different client implementations). @mark-vieira

If we accept this new behaviour, one possible workaround is to disable hostname verification for the setup client, e.g. something like the following works:

diff --git a/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java b/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
index edab2cdf1e7..8ee9169f8c9 100644
--- a/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
+++ b/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
@@ -140,8 +140,9 @@ public class WaitForHttpResource {
 
     private void configureSslContext(HttpURLConnection connection, SSLContext ssl) {
         if (ssl != null) {
-            if (connection instanceof HttpsURLConnection) {
-                ((HttpsURLConnection) connection).setSSLSocketFactory(ssl.getSocketFactory());
+            if (connection instanceof final HttpsURLConnection httpsURLConnection) {
+                httpsURLConnection.setSSLSocketFactory(ssl.getSocketFactory());
+                httpsURLConnection.setHostnameVerifier((hostname, sslSession) -> true);
             } else {
                 throw new IllegalStateException("SSL trust has been configured, but [" + url + "] is not a 'https' URL");
             }

Obviously there are other alternatives. I have no strong opinion on how it should be fixed. But it would be better if individual test class does not have to deal with it as suggested by Slobodan.

@slobodanadamovic
Copy link
Contributor Author

@ywangd Thanks for checking this as well. I missed the fact that the issue happens only when FIPS is enabled.

TIL

With FIPS enabled, SSL factory and socket have different implementations than the JDK ones which trigger SNI configuration within HttpsClient.

slobodanadamovic added a commit that referenced this issue Jan 17, 2023
This commit adds a workaround for failing `ServiceAccountIT` test
by explicitly setting a host name to `localhost` instead of using default
`_local_` address.

Relates to #92930
@benwtrent
Copy link
Member

@slobodanadamovic this is also failing on 8.6 if you want to backport your fix: https://gradle-enterprise.elastic.co/s/vng6xiitobh3c

@mark-vieira
Copy link
Contributor

I actually now see that we aren't properly setting up test clusters using the new framework for FIPS at all. This is an oversight and I'll get that fixed along with a proper fix for this that doesn't require hard-coding http.host.

slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this issue Jan 17, 2023
This commit adds a workaround for failing `ServiceAccountIT` test
by explicitly setting a host name to `localhost` instead of using default
`_local_` address.

Relates to elastic#92930

(cherry picked from commit 96d3fce)
elasticsearchmachine pushed a commit that referenced this issue Jan 17, 2023
This commit adds a workaround for failing `ServiceAccountIT` test
by explicitly setting a host name to `localhost` instead of using default
`_local_` address.

Relates to #92930

(cherry picked from commit 96d3fce)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Build Build or test infrastructure :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Delivery Meta label for Delivery team Team:Security Meta label for security team >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

5 participants