-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
karabo-bridge-serve-run command #458
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
extra_data/export.py
Outdated
count += 1 | ||
new_time = time.monotonic() | ||
if count % 5 == 0: | ||
rate = len(deque) / (new_time - sent_times[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this is a rolling average, and limited to 10 elements, the length is 10 at max. (10 trains corresp. to 1 second), correct? Then: sent_times[0]
is not the first time ever, but the first of the current deque (oldest elements are thrown out), so sent_times[-1]
minus 10 elements, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, why is it len(deque)
and not len(sent_times)
- do you actually access the overall length of a deque type (not the actual object which has the name "sent_times")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, 10 trains correspond to one second, and with a limit on the deque, each call to .append()
drops one item out of the beginning (once it has filled up).
Each timestamp is measured just after sending a train, and we calculate this before adding the new timestamp. So since sent_times[0]
we have sent 9 more trains corresponding to the other 9 entries in sent_times
, plus the one we've just sent but not yet added. So 10 trains in the measured time interval.
sent_times.append(new_time) | ||
print_update(end='\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factored-out function to allow different line-end behaviour, carriage return vs. line break - fair enough to avoid the longish format string expression twice.
from ..export import serve_data | ||
except ImportError as e: | ||
sys.exit(IMPORT_FAILED_MSG.format(e)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason to do this import here within the function and not at the top of the file? (except of course the fail message constant needs to be defined first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd usually try to avoid side-effects (like sys.exit()
) when loading a module, although it doesn't matter so much for a module defining a CLI like this. It's also handy that --help
still works even without the extra dependencies.
We could still have the import at the top like this:
# Top of file
try:
from ..export import serve_data
except ImportError:
serve_data = None
# In the function
if serve_data is None:
sys.exit(msg)
But that looks less neat to me.
Hi Thomas, irrespective of my questions, LGTM. |
Thanks Fabio! I just tweaked the CLI design slightly. I thought the 3 numbers together (proposal, run, port) with no markers was somewhat confusing, so I've made karabo-bridge-serve-run 4237 219 --port 41234 ... I think this makes the meaning clearer. It also means port can be unspecified, which defaults to picking a random unused port. The 'streamer started on' message gives you the actual port in use, so you can tell the consumer where to connect. |
The optional port argument is a very useful change/addition. (LEBTM 😉 ) |
Testing this was something of a challenge; I've introduced an environment variable ( The child processes also still aren't counted for coverage, and I don't know why not. pytest-cov is meant to count subprocesses, and it does when I run the tests locally. 🤷 |
Seems like it's a bit tricker to measure coverage for spawned subprocesses: https://coverage.readthedocs.io/en/latest/subprocess.html#subprocess |
pytest-cov claims that it handles subprocesses automatically, and it does work on my local machine. I thought it was missing because we use SIGKILL to stop the child process, so I made the tests send SIGINT first and give it some time to clean itself up, but it still doesn't seem to count the coverage on CI. 🤷 |
@@ -53,6 +122,9 @@ Stream data from files in the `Karabo bridge | |||
<https://rtd.xfel.eu/docs/data-analysis-user-documentation/en/latest/online.html#streaming-from-karabo-bridge>`_ | |||
format. See :doc:`streaming` for more information. | |||
|
|||
For streaming data from a run directory, we recommend the newer | |||
:ref:`cmd-serve-run` command in place of this. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I have used the old command (only) for streaming from a run directory, using the full path as argument. Apart from the fact that ...-serve-run
is indeed more convenient to achieve this, what would be the main use case for using the old command now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -serve-files
command makes it easy to stream from a non-standard run directory location, e.g. if we do an experimental correction of a run, we might put it in proposal scratch
. Or it gives you a way to stream from red
before we've integrated support for that. Or if users transfer run data back to their home institution and want to use EXtra-data there. I think the new -serve-run
command will be better for ~95% of use cases.
The biggest reason to retain the old command is compatibility & familiarity, though - don't break what's working for people. 🙂
I'm going to merge this on the grounds that I'd already got an LGTM on the interface & implementation, and Fabio has looked through the tests (we also discussed some more points about this on Zulip) without identifying any problems. Thanks for the review! |
prnote: New command |
This is like
karabo-bridge-serve-files
but with the following improvements:--include
, rather than sending everything by default, which is slow. If you really want everything,--include '*'
should do it.--include
multiple times.--allow-partial
to include them (find better name?)You can select a subset of keys using
[]
syntax. For now, I've done it this way for both instrument and control data, whereas Metro only supports that syntax for instrument data. I'm still not sure what I prefer.Closes #455