-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to attach telemetry devices without reprovisioning #737
Conversation
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then “dynamically” (i.e. without reprovisioning) start the node again with telemetry attached. Relates to elastic#697 Relates to elastic#711
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. The overall direction looks fine but I had some comments regarding the implementation.
esrally/mechanic/launcher.py
Outdated
#node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path | ||
node_telemetry_dir = os.path.join(node_configuration.node_root_path, "telemetry") | ||
|
||
print("here in oy %s" % node_telemetry_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, thanks for catching
esrally/mechanic/launcher.py
Outdated
@@ -299,14 +299,21 @@ def _start_node(self, node_configuration, node_count_on_host): | |||
car = node_configuration.car | |||
binary_path = node_configuration.binary_path | |||
data_paths = node_configuration.data_paths | |||
node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path | |||
#node_telemetry_dir = "%s/telemetry" % node_configuration.node_root_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
esrally/mechanic/launcher.py
Outdated
@@ -340,7 +352,7 @@ def _set_env(self, env, k, v, separator=' '): | |||
if k not in env: | |||
env[k] = v | |||
else: # merge | |||
env[k] = v + separator + env[k] | |||
env[k] = env[k] + separator + v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why you've changed the order here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason JFR telemetry would not start and rally would never return with pid if the the options didn't start with "-XX:+UnlockDiagnosticVMOptions" That's how it was originally, but now that we changed the telemetry to return list instead of string the order got reversed when we were adding them to the env[k]. So I reversed again so it's in the original order. I could change the order in telemetry instead if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That's very subtle but it makes sense that we preserve the order in the merge routine (i.e. as you've implemented it now). Can you please add a test so we ensure that -XX:+UnlockDiagnosticVMOptions
is the first option when we activate multiple telemetry devices (you can just use the full list of JVM args in the assertion)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a bit more testing, so it seems that order of options for other telemetry devices doesn't matter. It's that specifically for jfr that -XX:+UnlockDiagnosticVMOptions
has to come before -XX:+DebugNonSafepoints
but other telemetry can be before or after jfr options. So maybe modifying order in telemery for jfr makes more sense than in _set_env
.. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your proposal makes sense to me.
@@ -119,6 +119,7 @@ def wait(self): | |||
|
|||
|
|||
class ProcessLauncherTests(TestCase): | |||
@mock.patch('os.path.join', return_value="/telemetry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to mock this? IMHO we shouldn't have mocked quite a few of the dependencies of this test case to begin with and specifically not esrally.mechanic.provisioner.NodeConfiguration
.
I think we need to clean this up in a separate PR. @drawlerr can you please work on a new PR (based on master) and rewrite this test case to mock only dependencies that have side effects? (as a guideline: things in os
are likely ok, things like esrally.config.Config
or esrally.metrics.MetricsStore
should not be mocked).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the launcher to use os.path instead of concatenating strings. If I don't update the test I get the following error. Not sure how to get away without mocking it..
________________ ProcessLauncherTests.test_daemon_start_stop __________________
args = (<tests.mechanic.launcher_test.ProcessLauncherTests testMethod=test_daemon_start_stop>, <MagicMock name='Process' id='...ion' id='4573833480'>, <MagicMock name='MetricsStore' id='4573862824'>, <MagicMock name='Config' id='4577452888'>, ...)
keywargs = {}
extra_args = [<MagicMock name='Process' id='4574868424'>, <MagicMock name='wait_for_pidfile' id='4578813376'>, <MagicMock name='Nod...tricsStore' id='4573862824'>, <MagicMock name='Config' id='4577452888'>, <MagicMock name='chdir' id='4578781056'>, ...]
entered_patchers = [<unittest.mock._patch object at 0x10d987438>, <unittest.mock._patch object at 0x10d987ef0>, <unittest.mock._patch obj...bject at 0x10d987a58>, <unittest.mock._patch object at 0x10d987320>, <unittest.mock._patch object at 0x10d987ac8>, ...]
exc_info = (<class 'TypeError'>, TypeError("test_daemon_start_stop() missing 1 required positional argument: 'path'"), <traceback object at 0x110efe5c8>)
patching = <unittest.mock._patch object at 0x10d987438>
arg = <MagicMock name='kill' id='4578728480'>
@wraps(func)
def patched(*args, **keywargs):
extra_args = []
entered_patchers = []
exc_info = tuple()
try:
for patching in patched.patchings:
arg = patching.__enter__()
entered_patchers.append(patching)
if patching.attribute_name is not None:
keywargs.update(arg)
elif patching.new is DEFAULT:
extra_args.append(arg)
args += tuple(extra_args)
> return func(*args, **keywargs)
E TypeError: test_daemon_start_stop() missing 1 required positional argument: 'path'
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/mock.py:1191: TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to remove path
from the argument list of the test. Otherwise the test framework cannot call test_daemon_start_stop
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, sorry if if remove both mock and path in the arguments I get this:
@mock.patch('os.kill')
@mock.patch('subprocess.Popen',new=MockPopen)
@mock.patch('esrally.mechanic.java_resolver.java_home', return_value=(12, "/java_home/"))
@mock.patch('esrally.utils.jvm.supports_option', return_value=True)
@mock.patch('os.chdir')
@mock.patch('esrally.config.Config')
@mock.patch('esrally.metrics.MetricsStore')
@mock.patch('esrally.mechanic.provisioner.NodeConfiguration')
@mock.patch('esrally.mechanic.launcher.wait_for_pidfile', return_value=MOCK_PID_VALUE)
@mock.patch('psutil.Process')
def test_daemon_start_stop(self, process, wait_for_pidfile, node_config, ms, cfg, chdir, supports, java_home, kill):
proc_launcher = launcher.ProcessLauncher(cfg, ms, paths.races_root(cfg))
> nodes = proc_launcher.start([node_config])
tests/mechanic/launcher_test.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
esrally/mechanic/launcher.py:294: in start
return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
esrally/mechanic/launcher.py:294: in <listcomp>
return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
esrally/mechanic/launcher.py:302: in _start_node
node_telemetry_dir = os.path.join(node_configuration.node_root_path, "telemetry")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = <MagicMock name='NodeConfiguration.node_root_path' id='4598523160'>
p = ('telemetry',)
def join(a, *p):
"""Join two or more pathname components, inserting '/' as needed.
If any component is an absolute path, all previous path components
will be discarded. An empty last part will result in a path that
ends with a separator."""
> a = os.fspath(a)
E TypeError: expected str, bytes or os.PathLike object, not MagicMock
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/posixpath.py:80: TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in another channel this is the result of mocking other dependencies so it is ok if you keep the mock on os.path.join
here as this will be cleaned up in a dedicated PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a final test with an activated telemetry device and found an issue with the order of JVM arguments. Can you please check again and address this?
esrally/mechanic/launcher.py
Outdated
@@ -352,7 +349,7 @@ def _set_env(self, env, k, v, separator=' '): | |||
if k not in env: | |||
env[k] = v | |||
else: # merge | |||
env[k] = env[k] + separator + v | |||
env[k] = v + separator + env[k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this with
esrally --test-mode --distribution-version=2.4.6 --on-error=abort --challenge=append-no-conflicts-index-only --telemetry=jfr,gc
and indeed you have to preserve the order of the JVM arguments otherwise you'll get an error that the JVM process could not be created. Can you please:
- Change it to your previous implementation so that the order is preserved
- Add a test in ProcessLauncher that adds a telemetry device and checks
ES_JAVA_OPTS
?
esrally/mechanic/telemetry.py
Outdated
@@ -191,7 +191,7 @@ def instrument_java_opts(self, car, candidate_id): | |||
|
|||
def java_opts(self, log_file): | |||
recording_template = self.telemetry_params.get("recording-template") | |||
java_opts = ["-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"] | |||
java_opts = ["-XX:+DebugNonSafepoints", "-XX:+UnlockDiagnosticVMOptions"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the order is preserved (see my comment above) also this needs to be reversed again (-XX:+UnlockDiagnosticVMOptions
must be specified before -XX:+DebugNonSafepoints
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test. I left two more comments but I think we're almost there.
esrally/mechanic/launcher.py
Outdated
if v is not None: | ||
if k not in env: | ||
env[k] = v | ||
else: # merge | ||
env[k] = v + separator + env[k] | ||
if (prepend == True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can rewrite this as
if k not in env:
env[k] = v
elif prepend:
# prepend it
else:
# append it
tests/mechanic/launcher_test.py
Outdated
@@ -139,6 +140,25 @@ def test_daemon_start_stop(self, process, wait_for_pidfile, node_config, ms, cfg | |||
proc_launcher.keep_running = False | |||
proc_launcher.stop(nodes) | |||
self.assertTrue(kill.called) | |||
|
|||
@mock.patch('os.path.join', return_value="/java_home/bin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be necessary and this test should work without mocking os.path.join
.
@danielmitterdorfer Thank you for the review! I addressed your latest comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one final comment. Also, can you please reword the PR title a bit so the intent is clearer? E.g. "Allow to attach telemetry without reprovisioning"?
esrally/mechanic/launcher.py
Outdated
@@ -348,10 +348,9 @@ def _set_env(self, env, k, v, separator=' ', prepend=False): | |||
if v is not None: | |||
if k not in env: | |||
env[k] = v | |||
else: # merge | |||
if (prepend == True): | |||
elif (prepend == True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this will work correctly, idiomatic usage is elif prepend
instead of elif (prepend == True)
. Can you please change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating! LGTM! :)
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then “dynamically” (i.e. without reprovisioning) start the node again with telemetry attached. Relates to elastic#697 Relates to elastic#711
By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then “dynamically” (i.e. without reprovisioning) start the node again with telemetry attached.
Relates to #697
Relates to #711