[CI] TransportGetAutoscalingCapacityActionIT.testCurrentCapacity fails #67089

Closed
tlrx opened this issue Jan 6, 2021 · 4 comments
Labels
:Distributed Coordination/Autoscaling Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI v7.12.0

Comments

@tlrx
Member

tlrx commented Jan 6, 2021

The test TransportGetAutoscalingCapacityActionIT.testCurrentCapacity failed today on 7.x.

Build scan:
https://gradle-enterprise.elastic.co/s/nhhgubn2rakko

Repro line:

./gradlew ':x-pack:plugin:autoscaling:internalClusterTest' --tests "org.elasticsearch.xpack.autoscaling.action.TransportGetAutoscalingCapacityActionIT.testCurrentCapacity" -Dtests.seed=A296FEA92BAF236F -Dtests.security.manager=true -Dtests.locale=ru-RU -Dtests.timezone=Europe/Amsterdam -Druntime.java=8

Reproduces locally?:
No

Applicable branches:
7.x

Failure history:
Failed once today.

Failure excerpt:

org.elasticsearch.xpack.autoscaling.action.TransportGetAutoscalingCapacityActionIT > testCurrentCapacity FAILED
java.lang.AssertionError:
Expected: a value greater than <0L>
but: <0L> was equal to <0L>
at __randomizedtesting.SeedInfo.seed([A296FEA92BAF236F:400CA2FCCE0775B2]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:956)
at org.junit.Assert.assertThat(Assert.java:923)
at org.elasticsearch.xpack.autoscaling.action.TransportGetAutoscalingCapacityActionIT.testCurrentCapacity(TransportGetAutoscalingCapacityActionIT.java:29)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 6, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen
Contributor

henningandersen commented Jan 7, 2021

This turns out to be a Java bug that shows up on Debian 8 (I have not ruled out that it could also happen on other distros). It is fixed in Java 15. The following small program demonstrates the issue:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class Memory {
    private static final OperatingSystemMXBean osMxBean = ManagementFactory.getOperatingSystemMXBean();

    public static void main(String[] args) throws InvocationTargetException, IllegalAccessException {
        System.out.println(getMethod("getTotalPhysicalMemorySize").invoke(osMxBean));
    }

    private static Method getMethod(String methodName) {
        try {
            return Class.forName("com.sun.management.OperatingSystemMXBean").getMethod(methodName);
        } catch (Exception e) {
            // not available
            return null;
        }
    }

}

This prints 0 on Java 8, 11 and 14 when run on Debian 8. Java 15 works fine. It looks like this was resolved when cgroup v2 support was implemented: as part of that change, a null memory subsystem now results in -1 rather than 0 here:

https://hg.openjdk.java.net/jdk/jdk/rev/ad9525a5d546#l4.127

The problem occurs when some cgroup subsystem is detected but there is no memory subsystem. Whether this only occurs on Debian 8 or also on other kernels is unknown at present. I will change the test so that it is skipped on Debian 8 with Java versions prior to 15.

Note that cgroups are also detected when not running in a container, but they are then typically unlimited.
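To check whether a host matches the condition described above (cgroups present, but no memory subsystem), something like the following sketch may help. It is not part of the fix. It assumes the cgroup v1 line format `hierarchy-ID:controller-list:path` in `/proc/self/cgroup`; the class and method names are made up for illustration. On a cgroup v2 host the controller list is empty, so a `false` result there does not by itself indicate this bug.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CgroupMemoryCheck {

    // Hypothetical helper: returns true if any line lists the "memory" controller.
    // Assumes cgroup v1 lines of the form "hierarchy-ID:controller-list:path", e.g. "4:memory:/".
    static boolean hasMemoryController(List<String> cgroupLines) {
        for (String line : cgroupLines) {
            String[] parts = line.split(":", 3);
            if (parts.length < 3) {
                continue; // not a well-formed cgroup line, ignore it
            }
            // The controller list is comma separated, e.g. "cpu,cpuacct"
            if (Arrays.asList(parts[1].split(",")).contains("memory")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/self/cgroup");
        List<String> lines = Files.exists(path)
            ? Files.readAllLines(path)
            : Collections.<String>emptyList();
        System.out.println("cgroup entries: " + lines.size());
        System.out.println("memory controller listed: " + hasMemoryController(lines));
    }
}
```

If this reports one or more cgroup entries but no memory controller, and the Memory program above prints 0 on the same host, that would be consistent with the explanation given here.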

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Jan 7, 2021
Prior to java 15, ES running on debian 8 will report 0 memory of host,
therefore this test cannot run on debian 8.

Relates elastic#67089
henningandersen added a commit that referenced this issue Jan 8, 2021
Prior to java 15, ES running on debian 8 will report 0 memory of host,
therefore this test cannot run on debian 8.

Relates #67089
@henningandersen
Contributor

Disabled this test for Java versions prior to 15 when running on Debian 8. Autoscaling will not need Debian 8 support (and will always run in containers anyway).

Leaving this issue open until the test has run on all platforms.
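For readers who want to apply a similar guard elsewhere, the skip condition (Java older than 15 and Debian 8) could be sketched as below. This is not the code from the referenced commits, which may detect the platform differently. The class and method names are hypothetical, and the detection assumes that `/etc/os-release` on Debian 8 contains `ID=debian` and `VERSION_ID="8"`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class Debian8Guard {

    // Maps "1.8" to 8 and "11", "14", "15" to themselves.
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    // Assumption: Debian 8 is identified by ID=debian and VERSION_ID="8" in os-release.
    static boolean isDebian8(List<String> osReleaseLines) {
        boolean debian = false;
        boolean versionEight = false;
        for (String line : osReleaseLines) {
            String l = line.trim().replace("\"", "");
            if (l.equals("ID=debian")) {
                debian = true;
            }
            if (l.equals("VERSION_ID=8")) {
                versionEight = true;
            }
        }
        return debian && versionEight;
    }

    // True when the memory assertion is expected to be unreliable.
    static boolean shouldSkip(String specVersion, List<String> osReleaseLines) {
        return featureVersion(specVersion) < 15 && isDebian8(osReleaseLines);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/etc/os-release");
        List<String> lines = Files.exists(path)
            ? Files.readAllLines(path)
            : Collections.<String>emptyList();
        String spec = System.getProperty("java.specification.version");
        System.out.println("skip memory assertion: " + shouldSkip(spec, lines));
    }
}
```

In a JUnit-based test, the result of `shouldSkip` could be passed to an assumption so that the test is reported as skipped rather than failed.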

@henningandersen
Contributor

The test remained stable, closing this issue.
