
Stop trying to test Elasticsearch 6.8.0 on ARM #1571

Merged
pquentin merged 2 commits into elastic:master from it-apple-silicon on Sep 12, 2022

Conversation

pquentin (Member)

This will allow developers with Apple Silicon hardware to still run
integration tests.
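
A minimal sketch of the kind of guard involved, assuming a hypothetical helper that builds the IT version matrix (the function name and version list are illustrative, not the PR's actual code):

```python
import platform

def it_distribution_versions():
    # Illustrative only: Elastic publishes no ARM artifacts for 6.8.0,
    # so only include it in the test matrix on x86_64 machines.
    versions = ["7.17.6", "8.4.1"]
    if platform.machine().lower() not in ("arm64", "aarch64"):
        versions.append("6.8.0")
    return versions
```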
@pquentin pquentin added this to the 2.6.1 milestone Aug 31, 2022
@pquentin pquentin requested a review from b-deam August 31, 2022 07:52
@pquentin pquentin self-assigned this Aug 31, 2022
@pquentin pquentin changed the title from "Stop trying to test 6.8.0 on ARM" to "Stop trying to test Elasticsearch 6.8.0 on ARM" Aug 31, 2022
b-deam previously approved these changes Aug 31, 2022
@pquentin pquentin requested a review from b-deam August 31, 2022 09:32
@pquentin pquentin dismissed b-deam’s stale review August 31, 2022 09:32

It was submitted by mistake; the metrics store version is also too old.

b-deam (Member) commented Sep 7, 2022

Interestingly, ITs pass up until test_sources, where Rally just hangs:

```
[...]
it/sources_test.py::test_sources[es-it]
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [2b313325-31b2-4f45-884d-c9ca86495c73]
[INFO] Preparing for race ...
```

If we use pstree to locate the leaf esrally process and then py-spy to dump its stack trace, it again seems like it's stuck in the Actor system:

```
$ pstree -s esrally -w
-+= 00001 root /sbin/launchd
 |-+= 00766 bradleydeam /Library/Application Support/iTerm2/iTermServer-3.4.3 /Library/Application Support/iTerm2/iterm2-daemon-1.socket
 | \-+= 00767 root /usr/bin/login -fpl bradleydeam /Applications/iTerm.app/Contents/MacOS/iTerm2 --launch_shell
 |   \-+= 00777 bradleydeam -zsh
 |     \-+= 92100 bradleydeam /Applications/Xcode.app/Contents/Developer/usr/bin/make it
 |       \-+- 92624 bradleydeam /bin/bash -c . /elastic/rally/.venv/bin/activate; tox -e py38-it
 |         \-+- 92625 bradleydeam /elastic/rally/.venv/bin/python3 /elastic/rally/.venv/bin/tox -e py38-it
 |           \-+- 92757 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/pytest -s it --junitxml=junit-py38-it.xml
 |             \-+- 98774 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
 |               \--= 98777 bradleydeam (python3.8)
 \-+- 98778 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
   |--- 98779 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
   \-+- 98780 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
     \-+- 98803 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
       \-+- 98822 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
         \--- 98823 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it

$ sudo py-spy dump --pid 98823
Process 98823: /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
Python v3.8.10 (/.pyenv/versions/3.8.10/bin/python3.8)

Thread 0x100710580 (active): "MainThread"
    exclusive_processing (thespian/system/transport/asyncTransportBase.py:46)
    __enter__ (contextlib.py:113)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1094)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    drainTransmits (thespian/system/systemCommon.py:202)
    run (thespian/system/actorManager.py:112)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_StartEngine (esrally/mechanic/mechanic.py:486)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_StartEngine (esrally/mechanic/mechanic.py:392)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_Setup (esrally/racecontrol.py:112)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    h_PendingActor (thespian/system/admin/adminCore.py:318)
    h_PendingActor (thespian/system/admin/globalNames.py:19)
    handleIncoming (thespian/system/admin/adminCore.py:114)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/admin/convention.py:643)
    startAdmin (thespian/system/multiprocCommon.py:207)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startAdmin (thespian/system/multiprocCommon.py:104)
    __init__ (thespian/system/systemBase.py:326)
    __init__ (thespian/system/multiprocCommon.py:86)
    __init__ (thespian/system/multiprocTCPBase.py:28)
    _startupActorSys (thespian/actors.py:676)
    __init__ (thespian/actors.py:635)
    bootstrap_actor_system (esrally/actor.py:263)
    with_actor_system (esrally/rally.py:775)
    race (esrally/rally.py:767)
    dispatch_sub_command (esrally/rally.py:991)
    main (esrally/rally.py:1082)
    <module> (esrally:8)
```

This is the same behaviour I'm seeing in #1570.

b-deam (Member) commented Sep 8, 2022

Enabling Thespian debug logging with:

```
export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
export THESPLOG_THRESHOLD="DEBUG"
```

This let me capture the following while Rally 'hung':

```
2022-09-08 14:06:16.378618 p73957 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:56895)-pending-ExpiresIn_0:04:59.999941-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 843, "Container [esrally-sour...-quit_0:04:59.999927)

2022-09-08 14:06:44.049477 p73957 dbg  Attempting intent TransportIntent(ActorAddr-(T|:56922)-pending-ExpiresIn_0:04:59.999764-<class 'esrally.mechanic.mechanic.NodesStarted'>-<esrally.mechanic.mechanic.NodesStarted object at 0x10697ba00>-quit_0:04:59.999755)
2022-09-08 14:06:44.052249 p73957 ERR  Actor esrally.mechanic.mechanic.NodeMechanicActor @ ActorAddr-(T|:56941) transport run exception: Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/actorManager.py", line 87, in run
    r = self.transport.run(self.handleMessages)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1219, in _runWithExpiry
    self._acceptNewIncoming()
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1342, in _acceptNewIncoming
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
OSError: [Errno 22] Invalid argument
```

Thanks to @pquentin, we found a workaround for this: commenting out line 1342 in .venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py:

```diff
- lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
+ # lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```
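
A more tolerant sketch of the same workaround (not the actual upstream fix; lsock is the socket variable from the surrounding TCPTransport code) would swallow the error instead of deleting the call:

```python
try:
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
except OSError:
    # On macOS the accepted socket may already be shut down by the time we
    # get here, making setsockopt fail with EINVAL. Leaving Nagle's algorithm
    # enabled on this one connection is harmless compared to the transport
    # thread dying.
    pass
```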

We're not exactly sure what is happening (envoyproxy/envoy#1446 suggests that perhaps we're trying to set an option on a socket that has already been shut down), but this appears to only affect macOS. I will raise an issue in Thespian's repo next week.
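
For reference, a hypothetical, self-contained reproduction of the suspected failure mode (it may or may not trigger the error, depending on the macOS version and timing):

```python
import socket
import struct
import time

# Set up a local TCP connection.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

# Abortive close: SO_LINGER with a zero timeout makes close() send a RST,
# tearing the connection down from under the server side.
cli.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
cli.close()
time.sleep(0.1)  # give the kernel time to process the RST

# On macOS, setting a TCP-level option on a torn-down connection can fail
# (the traceback above saw EINVAL); Linux is typically more lenient.
try:
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    print("setsockopt succeeded")
except OSError as exc:
    print(f"setsockopt failed: {exc}")
```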

b-deam (Member) left a comment

Apart from #1575, ITs pass on ARM-based macOS.

Therefore IMHO we should be good to merge.

@pquentin pquentin merged commit 7768c79 into elastic:master Sep 12, 2022
@pquentin pquentin added the :misc Changes that don't affect users directly: linter fixes, test improvements, etc. label Nov 2, 2022
@pquentin pquentin deleted the it-apple-silicon branch February 16, 2023 06:32