Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to attach telemetry devices without reprovisioning #737

Merged
merged 6 commits into from
Aug 20, 2019

Conversation

ebadyano
Copy link
Contributor

By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then “dynamically” (i.e. without reprovisioning) start the node again with telemetry attached.

Relates to #697
Relates to #711

By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then
“dynamically” (i.e. without reprovisioning) start the node again with
telemetry attached.

Relates to elastic#697
Relates to elastic#711
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The overall direction looks fine but I had some comments regarding the implementation.

#node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path
node_telemetry_dir = os.path.join(node_configuration.node_root_path, "telemetry")

print("here in oy %s" % node_telemetry_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, thanks for catching

@@ -299,14 +299,21 @@ def _start_node(self, node_configuration, node_count_on_host):
car = node_configuration.car
binary_path = node_configuration.binary_path
data_paths = node_configuration.data_paths
node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path
#node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

@@ -340,7 +352,7 @@ def _set_env(self, env, k, v, separator=' '):
if k not in env:
env[k] = v
else: # merge
env[k] = v + separator + env[k]
env[k] = env[k] + separator + v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why you've changed the order here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason JFR telemetry would not start and rally would never return with pid if the the options didn't start with "-XX:+UnlockDiagnosticVMOptions" That's how it was originally, but now that we changed the telemetry to return list instead of string the order got reversed when we were adding them to the env[k]. So I reversed again so it's in the original order. I could change the order in telemetry instead if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That's very subtle but it makes sense that we preserve the order in the merge routine (i.e. as you've implemented it now). Can you please add a test so we ensure that -XX:+UnlockDiagnosticVMOptions is the first option when we activate multiple telemetry devices (you can just use the full list of JVM args in the assertion)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bit more testing, so it seems that order of options for other telemetry devices doesn't matter. It's that specifically for jfr that -XX:+UnlockDiagnosticVMOptions has to come before -XX:+DebugNonSafepoints but other telemetry can be before or after jfr options. So maybe modifying order in telemery for jfr makes more sense than in _set_env.. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your proposal makes sense to me.

@@ -119,6 +119,7 @@ def wait(self):


class ProcessLauncherTests(TestCase):
@mock.patch('os.path.join', return_value="/telemetry")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to mock this? IMHO we shouldn't have mocked quite a few of the dependencies of this test case to begin with and specifically not esrally.mechanic.provisioner.NodeConfiguration.

I think we need to clean this up in a separate PR. @drawlerr can you please work on a new PR (based on master) and rewrite this test case to mock only dependencies that have side effects? (as a guideline: things in os are likely ok, things like esrally.config.Config or esrally.metrics.MetricsStore should not be mocked).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the launcher to use os.path instead of concatenating strings. If I don't update the test I get the following error. Not sure how to get away without mocking it..

________________ ProcessLauncherTests.test_daemon_start_stop __________________

args = (<tests.mechanic.launcher_test.ProcessLauncherTests testMethod=test_daemon_start_stop>, <MagicMock name='Process' id='...ion' id='4573833480'>, <MagicMock name='MetricsStore' id='4573862824'>, <MagicMock name='Config' id='4577452888'>, ...)
keywargs = {}
extra_args = [<MagicMock name='Process' id='4574868424'>, <MagicMock name='wait_for_pidfile' id='4578813376'>, <MagicMock name='Nod...tricsStore' id='4573862824'>, <MagicMock name='Config' id='4577452888'>, <MagicMock name='chdir' id='4578781056'>, ...]
entered_patchers = [<unittest.mock._patch object at 0x10d987438>, <unittest.mock._patch object at 0x10d987ef0>, <unittest.mock._patch obj...bject at 0x10d987a58>, <unittest.mock._patch object at 0x10d987320>, <unittest.mock._patch object at 0x10d987ac8>, ...]
exc_info = (<class 'TypeError'>, TypeError("test_daemon_start_stop() missing 1 required positional argument: 'path'"), <traceback object at 0x110efe5c8>)
patching = <unittest.mock._patch object at 0x10d987438>
arg = <MagicMock name='kill' id='4578728480'>

    @wraps(func)
    def patched(*args, **keywargs):
        extra_args = []
        entered_patchers = []

        exc_info = tuple()
        try:
            for patching in patched.patchings:
                arg = patching.__enter__()
                entered_patchers.append(patching)
                if patching.attribute_name is not None:
                    keywargs.update(arg)
                elif patching.new is DEFAULT:
                    extra_args.append(arg)

            args += tuple(extra_args)
>           return func(*args, **keywargs)
E           TypeError: test_daemon_start_stop() missing 1 required positional argument: 'path'

/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/mock.py:1191: TypeError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to remove path from the argument list of the test. Otherwise the test framework cannot call test_daemon_start_stop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, sorry if if remove both mock and path in the arguments I get this:

    @mock.patch('os.kill')
    @mock.patch('subprocess.Popen',new=MockPopen)
    @mock.patch('esrally.mechanic.java_resolver.java_home', return_value=(12, "/java_home/"))
    @mock.patch('esrally.utils.jvm.supports_option', return_value=True)
    @mock.patch('os.chdir')
    @mock.patch('esrally.config.Config')
    @mock.patch('esrally.metrics.MetricsStore')
    @mock.patch('esrally.mechanic.provisioner.NodeConfiguration')
    @mock.patch('esrally.mechanic.launcher.wait_for_pidfile', return_value=MOCK_PID_VALUE)
    @mock.patch('psutil.Process')
    def test_daemon_start_stop(self, process, wait_for_pidfile, node_config, ms, cfg, chdir, supports, java_home, kill):
        proc_launcher = launcher.ProcessLauncher(cfg, ms, paths.races_root(cfg))

>       nodes = proc_launcher.start([node_config])

tests/mechanic/launcher_test.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
esrally/mechanic/launcher.py:294: in start
    return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
esrally/mechanic/launcher.py:294: in <listcomp>
    return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
esrally/mechanic/launcher.py:302: in _start_node
    node_telemetry_dir = os.path.join(node_configuration.node_root_path, "telemetry")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = <MagicMock name='NodeConfiguration.node_root_path' id='4598523160'>
p = ('telemetry',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/posixpath.py:80: TypeError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in another channel this is the result of mocking other dependencies so it is ok if you keep the mock on os.path.join here as this will be cleaned up in a dedicated PR.

@ebadyano ebadyano added this to the 1.3.0 milestone Aug 12, 2019
@ebadyano ebadyano added :Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch :misc Changes that don't affect users directly: linter fixes, test improvements, etc. enhancement Improves the status quo labels Aug 12, 2019
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a final test with an activated telemetry device and found an issue with the order of JVM arguments. Can you please check again and address this?

@@ -352,7 +349,7 @@ def _set_env(self, env, k, v, separator=' '):
if k not in env:
env[k] = v
else: # merge
env[k] = env[k] + separator + v
env[k] = v + separator + env[k]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with

esrally --test-mode --distribution-version=2.4.6 --on-error=abort --challenge=append-no-conflicts-index-only --telemetry=jfr,gc

and indeed you have to preserve the order of the JVM arguments otherwise you'll get an error that the JVM process could not be created. Can you please:

  • Change it to your previous implementation so that the order is preserved
  • Add a test in ProcessLauncher that adds a telemetry device and checks ES_JAVA_OPTS?

@@ -191,7 +191,7 @@ def instrument_java_opts(self, car, candidate_id):

def java_opts(self, log_file):
recording_template = self.telemetry_params.get("recording-template")
java_opts = ["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"]
java_opts = ["-XX:+DebugNonSafepoints", "-XX:+UnlockDiagnosticVMOptions"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the order is preserved (see my comment above) also this needs to be reversed again (-XX:+UnlockDiagnosticVMOptions must be specified before -XX:+DebugNonSafepoints).

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test. I left two more comments but I think we're almost there.

if v is not None:
if k not in env:
env[k] = v
else: # merge
env[k] = v + separator + env[k]
if (prepend == True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rewrite this as

if k not in env:
  env[k] = v
elif prepend:
# prepend it
else:
# append it

@@ -139,6 +140,25 @@ def test_daemon_start_stop(self, process, wait_for_pidfile, node_config, ms, cfg
proc_launcher.keep_running = False
proc_launcher.stop(nodes)
self.assertTrue(kill.called)

@mock.patch('os.path.join', return_value="/java_home/bin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary and this test should work without mocking os.path.join.

@ebadyano
Copy link
Contributor Author

@danielmitterdorfer Thank you for the review! I addressed your latest comments.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one final comment. Also, can you please reword the PR title a bit so the intent is clearer? E.g. "Allow to attach telemetry without reprovisioning"?

@@ -348,10 +348,9 @@ def _set_env(self, env, k, v, separator=' ', prepend=False):
if v is not None:
if k not in env:
env[k] = v
else: # merge
if (prepend == True):
elif (prepend == True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this will work correctly, idiomatic usage is elif prepend instead of elif (prepend == True). Can you please change that?

@ebadyano ebadyano changed the title Revert PR#711 to use ES_JAVA_OPTS for telemetry devices Allow to attach telemetry devices without reprovisioning Aug 16, 2019
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! LGTM! :)

@ebadyano ebadyano merged commit 556de2d into elastic:master Aug 20, 2019
novosibman pushed a commit to novosibman/rally that referenced this pull request Oct 2, 2019
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then
“dynamically” (i.e. without reprovisioning) start the node again with
telemetry attached.

Relates to elastic#697
Relates to elastic#711
@ebadyano ebadyano deleted the master-javaopts branch December 16, 2022 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch enhancement Improves the status quo :misc Changes that don't affect users directly: linter fixes, test improvements, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants