Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MauveMultiThreadLoadTest_special_16 String.ConsCharset got 65503 but expected 223 #11192

Closed
pshipton opened this issue Nov 16, 2020 · 44 comments · Fixed by eclipse-omr/omr#5748
Closed

Comments

@pshipton
Copy link
Member

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/177
MauveMultiThreadLoadTest_special_16
variation: Mode555
JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
cent7-ppcle-4

LT  00:32:27.176 - First failure detected by thread: load-1. Not creating dumps as no dump generation is requested for this load test
LT  stderr JVMDUMP039I Processing dump event "throw", detail "net/adoptopenjdk/loadTest/reporting/MauveTestFailureException" at 2020/11/15 00:32:27 - please wait.
LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/openjdk-tests/TKG/test_output_1605414111127/MauveMultiThreadLoadTest_special_16/20201115-003214-MauveMultiThreadTestTrc/results/core.20201115.003227.22927.0001.dmp' in response to an event
STF 00:32:38.975 - Found dump at: /home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/openjdk-tests/TKG/test_output_1605414111127/MauveMultiThreadLoadTest_special_16/20201115-003214-MauveMultiThreadTestTrc/results/core.20201115.003227.22927.0001.dmp
LT  00:32:39.073 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  00:32:39.088 - suite.isCreateDump() : false
LT  00:32:39.091 - Test failed
LT    Failure num.  = 1
LT    Test number   = 2649
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 1
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 65503 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 57683 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 57519 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 65533 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 65533 but expected 56640
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 11)
LT  got 㿟⅓₯㿽㿽 but expected ￟��
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 12)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 13)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 14)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 15)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 16)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 17)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 18)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 19)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 20)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 21)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 22)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 23)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 24)
LT  <<<
@pshipton
Copy link
Member Author

The test, which failed converting some UTF8 bytes to a String.

public class ConsCharset
  implements Testlet
{
  public void test (TestHarness h)
  {
    try
      {
        byte[] cp437Bytes = asByteArray(new int[] { 224, 226, 227, 228, 156 });
        checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
                                  "\u03b1\u0393\u03c0\u03a3\u00a3");
      }
    catch (UnsupportedCharsetException e)
      {
        // Skip tests as CP437 is not required by the spec.
      }

    byte[] utf8Bytes = asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
    checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
                "\u00DF\u2153\u20AF\uD800\uDD40");

    byte[] isoBytes = asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
    checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
                "Hello World!");
  }

  private void checkString(TestHarness h, String result, String expected)
  {
    for (int a = 0; a < result.length(); ++a)
      {
        h.check(result.charAt(a), expected.charAt(a));
      }
    h.check(result, expected);
  }

  private byte[] asByteArray(int[] ints)
  {
    byte[] bytes = new byte[ints.length];
    for (int a = 0; a < ints.length; ++a)
      bytes[a] = Integer.valueOf(ints[a]).byteValue();
    return bytes;
  }

}

@pshipton
Copy link
Member Author

pshipton commented Nov 16, 2020

Run some grinders https://ci.eclipse.org/openj9/job/Grinder/1174/
Failed 2 / 24

@pshipton
Copy link
Member Author

@gita-omr can someone take a look pls.

@pshipton
Copy link
Member Author

pshipton commented Nov 18, 2020

@pshipton
Copy link
Member Author

@gita-omr this is failing enough to be a blocker, but it's running in a non-production mode. We need to evaluate if the problem is specific to the non-production options.

@gita-omr
Copy link
Contributor

Yes, we will take a look. @mnalam-p

@mnalam-p
Copy link
Contributor

mnalam-p commented Nov 25, 2020

I tried running in grinder and test locally, I am not able to reproduce the issue with same jdk build (openj9/omr/jcl hash). @pshipton @gita-omr Any suggestion on what else should I try?

@pshipton
Copy link
Member Author

pshipton commented Nov 25, 2020

Trying another grinder on the OpenJ9 machines, using the latest build.
https://ci.eclipse.org/openj9/job/Grinder/1205/ - failed 6/48

@pshipton
Copy link
Member Author

@mnalam-p the grinder isn't done, but there are still failures https://ci.eclipse.org/openj9/job/Grinder_iteration_2/20/

@pshipton
Copy link
Member Author

btw, I suggested to Nazmul he try grinding using NodesByIterations and NUM_MACHINES to get coverage across a number of machines in order to recreate it.

@pshipton
Copy link
Member Author

pshipton commented Nov 26, 2020

@gita-omr
Copy link
Contributor

Started 5 iterations here: https://ci.eclipse.org/openj9/job/Grinder/1226/

@gita-omr
Copy link
Contributor

gita-omr commented Nov 27, 2020

Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/

@gita-omr
Copy link
Contributor

Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/

@jdekonin still did not manage to make it run. Could you please help?

@mnalam-p
Copy link
Contributor

@pshipton NodesByIteration was not able to reproduce the issue. Two config tested -

  1. ITERATIONS=5 and NUM_MACHINES=4
  2. ITERATIONS=20 and NUM_MACHINES=10
    Any other suggestions?
    One question, what is the difference between production / non-production build?

@pshipton
Copy link
Member Author

what is the difference between production / non-production build?

Where do you see that, I might want to change my answer. Something we build nightly or in a personal build is non-production. A build created for a release, with the correct properties, version numbers, and perhaps even building on special machines with restricted access is a production build. There would be no difference in functionality if built from the same SHAs, it's more about the -version output.

Nightly build:
openjdk version "1.8.0_282-internal"
OpenJDK Runtime Environment (build 1.8.0_282-internal-jenkins_2020_11_26_18_17-b00)
Eclipse OpenJ9 VM (build master-2f4ea25a94c, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20201126_579 (JIT enabled, AOT enabled)

Release build:
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b10)
Eclipse OpenJ9 VM (build openj9-0.23.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20201022_822 (JIT enabled, AOT enabled)

@pshipton
Copy link
Member Author

Any other suggestions?

Request access to an OpenJ9 machine where the problem occurs.

@jdekonin
Copy link
Contributor

Had to replace IP address in the CUSTOMIZED_SDK_URL...

@gita-omr CUSTOMIZED_SDK_URL has a few links that need modifying you only got the first which was the SDK. The second was the test-images.tar.gz. I relaunched with the correction
https://ci.eclipse.org/openj9/job/Grinder/1230/console

Side note: @rajdeepsingh1 increased the artifactory space, so anything after November 19th won't matter it should be in both locations. I realize that doesn't help this issue though.

@mnalam-p
Copy link
Contributor

mnalam-p commented Dec 1, 2020

@pshipton I ran 20 iteration of test.MauveMultiThreadLoadTest on an OpenJ9 machine (twice). All test passed and did not see any test failure.

@pshipton
Copy link
Member Author

pshipton commented Dec 1, 2020

I can grind it again if you like. There were more failures last night.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1848

Did you choose a machine where we've seen failures before?

@gita-omr
Copy link
Contributor

gita-omr commented Dec 2, 2020

5 iterations passed: https://ci.eclipse.org/openj9/job/Grinder/1230/console

@gita-omr
Copy link
Contributor

gita-omr commented Dec 2, 2020

I think the above only ran MauveMultiThreadLoadTest_0

Started 1 iteration of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1235/
Passed.

@gita-omr
Copy link
Contributor

gita-omr commented Dec 2, 2020

Started 25 iterations of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1236/

@gita-omr
Copy link
Contributor

gita-omr commented Dec 2, 2020

7 out of 100 failed (each iteration has 4 runs).

@gita-omr
Copy link
Contributor

gita-omr commented Dec 2, 2020

Just wanted to mention that @mnalam-p was able to reproduce on openj9 machine manually and that the failure rate with count=0 seems to be pretty high. Narrowing down the method now.

@mnalam-p
Copy link
Contributor

mnalam-p commented Dec 4, 2020

I have reduced the test cases to only the failing one (ConsCharset) and thus now can cycle test faster. Here is what I have found so far -

  • Test failure is still random, but with single test case running it is faster to cycle through
  • Giving any cmdline options to test vm that effects timing (e,g - traceFile, verbose, count etc) affects the reproducibility
  • @gita-omr ran grinder with disableAutoSIMD and it did not reproduce the test failure
    I am still trying to reproduce the test failure more consistently. I did created a PR to fix an autoSIMD. As a side test, I am going to also try reproduce the test failure with and without the fix.

Any suggestion would be helpful on what to do next.

@pshipton
Copy link
Member Author

pshipton commented Dec 7, 2020

@mnalam-p
Copy link
Contributor

mnalam-p commented Dec 8, 2020

With the AutoSIMD fix, still I was able to generate the failure locally. It is still kind of hit and miss to generate the failure. One observation I can see from the actual vs expected output failure is that the first three values are off by one byte.
For 0xC3,0x9F,0xE2 vs 00DF we get 65503 (FFDF) and our expected was 0x00DF
For 0x85,0x93,0xE2 vs 2153 we get 57683 (E153) and our expected was 0x2153
Based on this observation I tried to look into asByteArray method.

Also, I modified the suite to enable dump of failure and got a javacore and heapdump as well. I will try to see if I can get some clue from them or not. I am still trying to pinpoint the actual failure to consistently reproduce it.

@pshipton
Copy link
Member Author

pshipton commented Dec 8, 2020

FYI a hang in the same test #11400

@mnalam-p
Copy link
Contributor

mnalam-p commented Dec 9, 2020

After reducing the test case, I still see the test failure. I am suspecting the problem is in new String(byte [], Charset). Following is the test case which was able to generate the test failure -

import java.nio.charset.UnsupportedCharsetException;

public class ConsCharset
  implements Testlet
{
  public void test (TestHarness h)
  {
    try
      {
        byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 };  //asByteArray(new int[] { 224, 226, 227, 228, 156 });
        checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
                                  "\u03b1\u0393\u03c0\u03a3\u00a3");
      }
    catch (UnsupportedCharsetException e)
      {
        // Skip tests as CP437 is not required by the spec.
      }

    byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
    checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
                "\u00DF\u2153\u20AF\uD800\uDD40");

    byte[] isoBytes = new byte[] {(byte)0x48,(byte)0x65,(byte)0x6C,(byte)0x6C,(byte)0x6F,(byte)0x20,(byte)0x57,(byte)0x6F,(byte)0x72,(byte)0x6C,(byte)0x64,(byte)0x21}; //asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
    checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
                "Hello World!");
  }

  private void checkString(TestHarness h, String result, String expected)
  {
    for (int a = 0; a < result.length(); ++a)
      {
        h.check(result.charAt(a), expected.charAt(a));
      }
    h.check(result, expected);
  }
}

@gita-omr
Copy link
Contributor

Started here: https://ci.eclipse.org/openj9/job/Grinder/1274/

This time with -Xjit=limit={*} since we suspect that even that option can change the behaviour.

@mnalam-p
Copy link
Contributor

mnalam-p commented Dec 14, 2020

I was able to reproduce this issue with the oldest available nightly build from adoptopenjdk (17 April 2020).

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
Eclipse OpenJ9 VM (build openj9-0.20.0, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20200416_571 (JIT enabled, AOT enabled)
OpenJ9   - 05fa2d361
OMR      - d4365f371
JCL      - cfa47e66cd5f based on jdk8u252-b09)

LT  stderr 
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.529 - Test failed
LT    Failure num.  = 1
LT    Test number   = 0
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 34
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 65503 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 57683 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 57519 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 65533 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 65533 but expected 56640
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 11)
LT  got 㿟⅓₯㿽㿽 but expected ￟��

@pshipton
Copy link
Member Author

While it would be nice to figure this out, obviously it doesn't need to be resolved for the 0.24 release, moving it forward.

@AlenBadel
Copy link
Contributor

AlenBadel commented Dec 15, 2020

We've discovered a way to reliably reproduce this failure nearly 100% of the time.

Use the following simplified test:

import java.nio.charset.UnsupportedCharsetException;
import java.nio.charset.Charset;

public class ConsCharset
    {
    public static void main(String[] args)
        {
        System.out.println("Starting Test");
        for (int i = 0; i < 100000; i++)
            {
                if (!runTest())
                {
                        System.out.println("Test Failed");
                        return;
                }
            }
        System.out.println("Test Passed");
        }

    public static boolean runTest()
        {
        try
            {
            byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 };  //asByteArray(new int[] { 224, 226, 227, 228, 156 });
            checkString(new String(cp437Bytes, Charset.forName("CP437")),
                                        "\u03b1\u0393\u03c0\u03a3\u00a3");
            }
        catch (UnsupportedCharsetException e)
            {
            System.out.println("Skipped Tests");
            // Skip tests as CP437 is not required by the spec.
            }

        byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
        boolean  res = checkString(new String(utf8Bytes, Charset.forName("UTF8")),
                    "\u00DF\u2153\u20AF\uD800\uDD40");
        if (!res) 
            {
            System.out.println("Character Match Failed");
            return false;
            }
        return true;
        }

    private static boolean checkString(String result, String expected)
        {
        boolean res = true;
        for (int a = 0; a < result.length(); ++a)
            {
                if (result.charAt(a) != expected.charAt(a))
                    {
                    res = false;
                    System.out.println("Failed Character Match");
                    break;
                    }
            }

        if (!result.equals(expected))
            {
            System.out.println("Failed String.equals Match");
            res = false;
            }

        if (!res)
            {
            System.out.println("Expected:" + expected + " But Got:" + result);
            return false;
            }

        return true;
        }
}

Invoke using

./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset

-Xshareclasses:none Needs to added along with Mode555, because the AOT compiled method doesn't produce the failure.

The issue occurs only when sun/nio/cs/UTF_8$Decoder.decode([BII[C)I is compiled.

The optimization that causes this to start failing is partialRedundancyElimination.

Invoking the following will produce a trace file reliably:

./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile,limit='{sun/nio/cs/UTF_8$Decoder.decode([BII[C)I}',disableAsyncCompilation,disableInlining,traceFull,log=jit.trc -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset

Notably, disableAsyncCompilation needs to be added or the failure is never reproduced when tracing.

@AlenBadel
Copy link
Contributor

Specifically, this starts failing at transformation 19.

[  2689]  96.19   O^O RECOMPILATION COUNTERS:  VALUE PROFILER: Add JProfiling trees to track the value of node 000075A568374330 near tree 000075A5683743D0, commonNode 1
         O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708DE0
         O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708BB0
              (Invalidating alias info)

@JasonFengJ9
Copy link
Member

Observed a different failure (StringIndexOutOfBoundsException) at same test suite at an internal build Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1749:

LT  05:05:49.192 -   4615 Mauve[gnu.testlet.javax.xml.xpath.XPath]  Weighting=1 
LT  05:05:49.284 - Starting thread. Suite=0 thread=0
LT  05:05:56.217 - First failure detected by thread: load-0. Not creating dumps as no dump generation is requested for this load test
LT  05:05:56.230 - Test failed
LT    Failure num.  = 1
LT    Test number   = 3180
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 0
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 195 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 159 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 226 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 133 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 147 but expected 56640
LT  Test failed:
LT  java.lang.StringIndexOutOfBoundsException
LT  	at java.lang.String.charAt(String.java:1397)
LT  	at gnu.testlet.java.lang.String.ConsCharset.checkString(ConsCharset.java:59)
LT  	at gnu.testlet.java.lang.String.ConsCharset.test(ConsCharset.java:47)
LT  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
LT  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
LT  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
LT  	at java.lang.reflect.Method.invoke(Method.java:498)
LT  	at net.adoptopenjdk.loadTest.adaptors.MauveAdaptor.executeTest(MauveAdaptor.java:74)
LT  	at net.adoptopenjdk.loadTest.LoadTestRunner$2.run(LoadTestRunner.java:182)
LT  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
LT  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
LT  	at java.lang.Thread.run(Thread.java:823)
LT  <<<

@r30shah
Copy link
Contributor

r30shah commented Dec 18, 2020

I was looking into this with @AlenBadel as he mentioned in #11192 (comment) where PRE decides to profile the array length and inserts placeholder JProfiling Call. With a small hack to make further narrow down the issue, we used a driver that limits the profiling values and used binary search to find out which was the last value for which it adds profiling trees and it starts failing. Following are key observations,

  1. When Last Value to profile is the same array length (mentioned in MauveMultiThreadLoadTest_special_16 String.ConsCharset got 65503 but expected 223 #11192 (comment)) it fails .
  2. Power is using the BitShiftHash where for value hash is calculated by series of shift and extract operations compared to BitIndexHash (Needs BitPermute to be enabled) on Z and x86. Going to open up issue to summarize the operation carried out by bit permute evaluator and if seems beneficial, it will work as guide to implement the evaluator.
  3. As the failures happens when we profile that particular array length and Power is using different code to calculate the hash, we had suspicion that it could be the trees for BitShiftIndex which are causing the failure so to verify the doubt, we disabled the Bit Permute and forced to generate code for BitShiftIndex on Z and x86. Both passes and on Z, I have verified, we do profile the same array length and cfg looks similar. It might be possible that the code generated on Power is incorrect so right now looking into the generated code to see if it is the case.
  4. Another thing that does not make any sense is that, no where in the profiling code, we write the array length. It only reads it. Do not know how this could alter the functionality of the Java Method.

I will continue to help Alen and also take a look myself to find out the cause of the failures.

@pshipton
Copy link
Member Author

pshipton commented Dec 24, 2020

@gita-omr
Copy link
Contributor

gita-omr commented Jan 4, 2021

@pshipton we only seen it on Power? Maybe we can set the label?

@gita-omr
Copy link
Contributor

gita-omr commented Jan 4, 2021

Thanks a lot @AlenBadel and @r30shah for narrowing the test case down and the investigation. Just to update: since the test case fails practically every time now, we decided that @mnalam-p will try to step through it in the debugger. Whatever trees JPorifiler is inserting they should not be affecting the user data so I am also suspecting some codegen issue. But @r30shah if you could take another look at the inserted tree it would be very helpful.

@pshipton
Copy link
Member Author

pshipton commented Jan 4, 2021

Sure, set any labels you like.

@pshipton
Copy link
Member Author

pshipton commented Jan 5, 2021

@mnalam-p
Copy link
Contributor

Further debugging reveals that the underlying array of the utf-8 encoded string was changed from 5 to 12. 12 being the initial byte array length. In fact, the String's underlying char has the same byte buffer as if it was encoded in ASCII instead of UTF-8. I am currently narrowing down on this issue (why it is not UTF-8 encoded).

@mnalam-p
Copy link
Contributor

Based on gdb debugging and JCL source, it is clear that the string is wrongly encoded as 1-byte utf-8 for the bad case, thus resulting the same byte and length as source byte array. For the input buffer 0xC3 0x9F 0xE2 0x85 0x93 0xE2 0x82 0xAF 0xF0 0x90 0x85 0x80 the utf-8 encoding should be 2-bytes, 3-bytes, 3-bytes, 4-bytes.

For the 0xC3 0x9F bytes, the code should take this branch. The compare value b1 is an integer which loads the source byte in this line.

The generated IL for the byte to integer load is -

istore
   b2i
      ==>bloadi

The byte value 0xC3 is a negative value, but in the bad case when it is loaded as integer it becomes a positive value of 0x000000C3. This is the main reason that causes the first branch to be taken. And thus wrongly encodes the whole source byte as 1-byte utf-8. This is caused by the missing extsb opcode in the generated code for the bad case.

When the jProfiler is active, the generated IL is changed to -

istore
   b2i
      ==>bRegLoad

This IL does not generates the extsb instruction and thus leads us to the root cause for this issue.
In the b2iEvaluator, the code is set to generate extsb for bloadi but not for bRegLoad.

mnalam-p added a commit to mnalam-p/omr that referenced this issue Jan 13, 2021
When the jProfiler is enabled, it can change the IL from bloadi to
bRegLoad. The b2iEvaluator in power does not generate extsb opcode
for bRegLoad. This commit fixes that.

Fixes: eclipse-openj9/openj9#11192
Signed-off-by: Mohammad Nazmul Alam <[email protected]>
mnalam-p added a commit to mnalam-p/omr that referenced this issue Jan 13, 2021
When the jProfiler is enabled, it can change the IL from bloadi to
bRegLoad. The b2iEvaluator in power does not generate extsb opcode
for bRegLoad. This commit fixes that.

Fixes: eclipse-openj9/openj9#11192
Signed-off-by: Mohammad Nazmul Alam <[email protected]>
mnalam-p added a commit to mnalam-p/omr that referenced this issue Jan 13, 2021
When the jProfiler is enabled, it can change the IL from bloadi to
bRegLoad. The b2iEvaluator in power does not generate extsb opcode
for bRegLoad. This commit fixes that.

Fixes: eclipse-openj9/openj9#11192
Signed-off-by: Mohammad Nazmul Alam <[email protected]>
mnalam-p added a commit to mnalam-p/omr that referenced this issue Jan 14, 2021
The b2iEvaluator currently does not generate sign extension for
bRegLoad. While it will work for sign extended bRegLoad, it may
yield wrong result if the byte was not sign extended. This commit
cover such cases, by making sure bRegLoad is always sign extended.

Fixes: eclipse-openj9/openj9#11192
Signed-off-by: Mohammad Nazmul Alam <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants