karabo-bridge-serve-run command #458

takluyver · 2023-10-24T16:10:54Z

This is like karabo-bridge-serve-files but with the following improvements:

You specify proposal & run number rather than a run directory
It combines raw & proc data for streaming (CLI to stream combined raw+proc data #455)
You have to specify sources with --include; sending everything by default would be slow. If you really want everything, --include '*' should do it.
You can select >1 pattern by passing --include multiple times.
Trains with missing data are skipped by default; use --allow-partial to include them (find a better name?)
Some useful info is shown while it runs (also added to -serve-files)

#                   Proposal run
karabo-bridge-serve-run 4237 219 --port 41234 \
    --include 'FXE_XAD_JF1M/DET/JNGFR*:daqOutput[data.adc]' \
    --include 'FXE_AUXT_LIC/DOOCS/PPODL[*Position]'

You can select a subset of keys using [] syntax. For now, I've done it this way for both instrument and control data, whereas Metro only supports that syntax for instrument data. I'm still not sure what I prefer.
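To make the source[keys] syntax concrete, here is a minimal sketch of how such a pattern could be split and matched with shell-style globs. It is not the parser used in this PR: split_include, select and the key names in the usage note are hypothetical, and treating a missing [...] part as "all keys" is an assumption.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical helper, not EXtra-data's actual parser:
# 'source_glob' optionally followed by '[key_glob]'.
_INCLUDE_RE = re.compile(r'^(?P<source>[^\[\]]+)(\[(?P<keys>[^\[\]]+)\])?$')


def split_include(pattern):
    """Split 'source_glob[key_glob]' into (source_glob, key_glob)."""
    m = _INCLUDE_RE.match(pattern)
    if m is None:
        raise ValueError(f"Invalid include pattern: {pattern!r}")
    # Assumption: no [...] part means every key of the source is selected
    return m.group('source'), m.group('keys') or '*'


def select(available, patterns):
    """Pick sources and keys matching any pattern.

    available maps source name -> iterable of key names;
    the result maps source name -> set of selected keys.
    """
    selected = {}
    for pattern in patterns:
        src_glob, key_glob = split_include(pattern)
        for source, keys in available.items():
            if fnmatchcase(source, src_glob):
                matched = {k for k in keys if fnmatchcase(k, key_glob)}
                if matched:
                    selected.setdefault(source, set()).update(matched)
    return selected
```

With the two patterns from the example command, a detector source offering data.adc and data.gain would contribute only data.adc, and the PPODL source would contribute only keys ending in Position.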

Closes #455

takluyver · 2023-10-24T16:30:28Z

extra_data/export.py

dallanto · 2023-10-30T12:26:22Z

extra_data/export.py

+        count += 1
+        new_time = time.monotonic()
+        if count % 5 == 0:
+            rate = len(deque) / (new_time - sent_times[0])


I understand this is a rolling average over a deque limited to 10 elements, so its length is at most 10 (10 trains correspond to 1 second), correct? Then sent_times[0] is not the first time ever, but the oldest entry in the current deque (older elements are thrown out), i.e. 10 elements back from sent_times[-1], correct?

Technically, why is it len(deque) and not len(sent_times) - do you actually access the overall length of a deque type (not the actual object which has the name "sent_times")?

Yup, 10 trains correspond to one second, and with a limit on the deque, each call to .append() drops one item out of the beginning (once it has filled up).

Each timestamp is measured just after sending a train, and we calculate this before adding the new timestamp. So since sent_times[0] we have sent 9 more trains corresponding to the other 9 entries in sent_times, plus the one we've just sent but not yet added. So 10 trains in the measured time interval.
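The counting argument above can be checked with synthetic timestamps. This is only a sketch: rolling_rate is a hypothetical helper name, and it uses len(sent_times) on the deque object, as the question above suggests.

```python
from collections import deque


def rolling_rate(sent_times, new_time):
    """Trains per second since the oldest timestamp still in the deque.

    Must be called *before* new_time is appended: the interval then covers
    the len(sent_times) - 1 later entries plus the train just sent.
    """
    return len(sent_times) / (new_time - sent_times[0])


# Synthetic timestamps 0.0, 0.1, ..., 0.9 s: one train every 100 ms
sent_times = deque([i / 10 for i in range(10)], maxlen=10)

new_time = 1.0                       # timestamp of the train just sent
rate = rolling_rate(sent_times, new_time)
print(rate)                          # 10 trains in 1.0 s -> 10.0

sent_times.append(new_time)          # maxlen=10 drops the oldest entry (0.0)
```

After the append, sent_times[0] is 0.1, so the window slides forward by one train each time.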

extra_data/export.py

dallanto · 2023-10-30T12:46:27Z

extra_data/export.py

        sent_times.append(new_time)
+    print_update(end='\n')


Factored-out function to allow different line-end behaviour, carriage return vs. line break - fair enough to avoid the longish format string expression twice.
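A rough sketch of that pattern, assuming a made-up message format and explicit arguments (the print_update in the diff takes only end and presumably reads its values from the enclosing scope):

```python
def print_update(count, rate, end='\r'):
    """Print a one-line progress message.

    With end='\r' the cursor returns to the start of the line, so the next
    update overwrites this one; end='\n' leaves the final message in place.
    The message wording here is a placeholder, not the one from the PR.
    """
    print(f"Sent {count} trains ({rate:.1f} Hz)", end=end, flush=True)
```

During streaming this would be called with the default end, and once more with end='\n' when streaming finishes.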

dallanto · 2023-10-30T13:09:22Z

extra_data/cli/serve_run.py

+        from ..export import serve_data
+    except ImportError as e:
+        sys.exit(IMPORT_FAILED_MSG.format(e))
+


What is the reason for doing this import here within the function and not at the top of the file? (Apart from the fact that the failure message constant needs to be defined first, of course.)

I'd usually try to avoid side-effects (like sys.exit()) when loading a module, although it doesn't matter so much for a module defining a CLI like this. It's also handy that --help still works even without the extra dependencies.

We could still have the import at the top like this:

# Top of file
try:
    from ..export import serve_data
except ImportError:
    serve_data = None

# In the function
if serve_data is None:
    sys.exit(msg)

But that looks less neat to me.
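For comparison, a self-contained sketch of the import-inside-the-function variant, showing why --help keeps working when the optional dependency is missing. The module name hypothetical_streaming_backend_xyz, the demo-serve program name and the message wording are placeholders, not names from EXtra-data.

```python
import argparse
import sys

# Placeholder wording; the real IMPORT_FAILED_MSG is defined in the PR
IMPORT_FAILED_MSG = "Could not import the streaming dependencies: {}"


def main(argv=None):
    ap = argparse.ArgumentParser(prog="demo-serve")
    ap.add_argument("path")
    # --help is handled (and exits) here, before the optional import below
    args = ap.parse_args(argv)

    try:
        # Stand-in for 'from ..export import serve_data'; this module is
        # deliberately nonexistent so the failure path can be demonstrated.
        import hypothetical_streaming_backend_xyz as backend
    except ImportError as e:
        sys.exit(IMPORT_FAILED_MSG.format(e))

    backend.serve(args.path)
```

Importing the module that defines main() has no side effects; the process only exits when main() is actually run without the dependency.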

dallanto · 2023-10-30T15:35:29Z

Hi Thomas, irrespective of my questions, LGTM.

takluyver · 2023-10-30T15:59:55Z

Thanks Fabio!

I just tweaked the CLI design slightly. I thought the 3 numbers together (proposal, run, port) with no markers was somewhat confusing, so I've made --port an option:

karabo-bridge-serve-run 4237 219 --port 41234 ...

I think this makes the meaning clearer. It also means port can be unspecified, which defaults to picking a random unused port. The 'streamer started on' message gives you the actual port in use, so you can tell the consumer where to connect.
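As an illustration of the optional port, the sketch below uses a plain TCP socket rather than the ZeroMQ socket the streamer actually binds. Using 0 as the "unspecified" default and the names build_parser and bind are assumptions made for the example.

```python
import argparse
import socket


def build_parser():
    ap = argparse.ArgumentParser(prog="demo-serve-run")
    ap.add_argument("proposal", type=int)
    ap.add_argument("run", type=int)
    # Assumption: 0 stands for "not specified, pick any unused port"
    ap.add_argument("--port", type=int, default=0,
                    help="TCP port to serve on (default: an unused port)")
    return ap


def bind(port):
    """Bind a listening socket and return it with the port actually in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))   # port 0 asks the OS for a free port
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    args = build_parser().parse_args()
    sock, actual_port = bind(args.port)
    print(f"Streamer started on port {actual_port}")
    sock.close()
```

Reporting the port read back from the socket, rather than the one requested, is what lets the consumer connect when no port was given.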

dallanto · 2023-10-31T07:21:22Z

The optional port argument is a very useful change/addition. (LEBTM 😉 )

extra_data/tests/test_streamer.py

takluyver · 2023-11-01T15:37:51Z

Testing this was something of a challenge; I've introduced an environment variable (EXTRA_DATA_DATA_ROOT) to override the /gpfs/exfel/exp root location in a child process. This is undocumented because it's only for testing, at least for now.
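Roughly, the override could work like the sketch below. Only the variable name EXTRA_DATA_DATA_ROOT and the default /gpfs/exfel/exp come from the comment above; the helpers data_root and child_env are hypothetical.

```python
import os
from pathlib import Path

DEFAULT_DATA_ROOT = "/gpfs/exfel/exp"


def data_root():
    """Root folder for proposals, overridable through the environment."""
    return Path(os.environ.get("EXTRA_DATA_DATA_ROOT", DEFAULT_DATA_ROOT))


def child_env(tmp_root):
    """Environment for a test subprocess, pointing it at a mock folder tree."""
    env = os.environ.copy()
    env["EXTRA_DATA_DATA_ROOT"] = str(tmp_root)
    return env
```

A test would then pass env=child_env(tmp_path) to subprocess.Popen when launching the command, leaving the parent process's environment untouched.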

The child processes also still aren't counted for coverage, and I don't know why not. pytest-cov is meant to count subprocesses, and it does when I run the tests locally. 🤷

tmichela · 2023-11-02T07:14:02Z

Seems like it's a bit trickier to measure coverage for spawned subprocesses: https://coverage.readthedocs.io/en/latest/subprocess.html#subprocess

takluyver · 2023-11-02T08:25:28Z

pytest-cov claims that it handles subprocesses automatically, and it does work on my local machine. I thought it was missing because we use SIGKILL to stop the child process, so I made the tests send SIGINT first and give it some time to clean itself up, but it still doesn't seem to count the coverage on CI. 🤷
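The "SIGINT first, then SIGKILL" shutdown could look something like this sketch. The helper name stop_process and the 5 second grace period are assumptions, not what the test suite uses.

```python
import signal
import subprocess


def stop_process(proc, grace=5.0):
    """Ask a child process to exit cleanly, and kill it if it doesn't.

    SIGINT gives the child a chance to run its cleanup (e.g. writing
    coverage data at exit); kill() is the fallback after `grace` seconds.
    """
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()      # SIGKILL on POSIX
            proc.wait()
    return proc.returncode
```

A child stopped with SIGKILL never gets to run exit handlers, which is why that was the first suspect for the missing coverage data.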

dallanto · 2023-11-03T20:06:49Z

docs/cli.rst

@@ -53,6 +122,9 @@ Stream data from files in the `Karabo bridge
 <https://rtd.xfel.eu/docs/data-analysis-user-documentation/en/latest/online.html#streaming-from-karabo-bridge>`_
 format. See :doc:`streaming` for more information.

+For streaming data from a run directory, we recommend the newer
+:ref:`cmd-serve-run` command in place of this.
+


In fact, I have used the old command (only) for streaming from a run directory, using the full path as argument. Apart from the fact that ...-serve-run is indeed more convenient to achieve this, what would be the main use case for using the old command now?

The -serve-files command makes it easy to stream from a non-standard run directory location, e.g. if we do an experimental correction of a run, we might put it in proposal scratch. Or it gives you a way to stream from red before we've integrated support for that. Or if users transfer run data back to their home institution and want to use EXtra-data there. I think the new -serve-run command will be better for ~95% of use cases.

The biggest reason to retain the old command is compatibility & familiarity, though - don't break what's working for people. 🙂

takluyver · 2023-11-06T11:11:19Z

I'm going to merge this on the grounds that I'd already got an LGTM on the interface & implementation, and Fabio has looked through the tests (we also discussed some more points about this on Zulip) without identifying any problems.

Thanks for the review!

takluyver · 2023-11-06T11:14:42Z

prnote: New command karabo-bridge-serve-run to more conveniently stream data from a saved run in Karabo Bridge format.

takluyver and others added 5 commits October 19, 2023 14:00

Split out serve_data function and show more info while streaming

a0f059c

Fix getting length of deque

e97f0cb

Print final update when streaming finishes

2d4f64d

Add karabo-bridge-serve-run command

eae9935

karabo-bridge-serve-run: only send complete trains by default

0e3d0df

takluyver added the enhancement New feature or request label Oct 24, 2023

Fix parameters to serve_data()

a67217f

github-advanced-security bot found potential problems Oct 24, 2023

extra_data/export.py: fixed

Ensure print_update() can't have an undefined variable

cda0b9c

dallanto reviewed Oct 30, 2023

Make --port an option

73208fc

takluyver added 4 commits November 1, 2023 11:11

Document karabo-bridge-serve-run command

10f145a

Add a test for karabo-bridge-serve-run

4554836

Simplify test

505aa3b

Recreate raw+proc folder tree for each test using it

a46793d

takluyver force-pushed the serve-run branch from 067a29a to a46793d Compare November 1, 2023 14:21

Give test subprocesses a chance to exit cleanly

f15401c

github-advanced-security bot found potential problems Nov 1, 2023

extra_data/tests/test_streamer.py: dismissed

dallanto reviewed Nov 3, 2023

takluyver merged commit 3389e48 into master Nov 6, 2023
7 of 9 checks passed

takluyver deleted the serve-run branch November 6, 2023 11:13

takluyver added this to the 1.15 milestone Nov 6, 2023
