Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: unify logging #71

Merged
merged 12 commits into from
Nov 20, 2024
Merged

DOC: unify logging #71

merged 12 commits into from
Nov 20, 2024

Conversation

tangkong
Copy link
Contributor

@tangkong tangkong commented Nov 18, 2024

Description

  • Sweep through package and unify logging format
    • removed color codes since they don't render in plain-text
  • Side quest: get really nitpicky with multiprocessing type hints

Motivation and Context

Our instrumentation here is important, and we want it to be as readable as possible.
I didn't touch the Sequencer / Server logging, I suspect we'll want to redo that once we touch it again. (more aptly: once we fully understand what it's doing in our system)

Conventions

  • Node method logging: node_name.update [STATUS]

    • pytrees uses a similar format, and though we don't directly consume py_trees' logging messages I thought it was reasonable convention
    • An alternative was update(node_name) [STATUS], but this de-emphasizes the node doing the operation
  • Logging from spawned subprocesses prepended with <<-- (node_name): ``, and the spawning process prepended with (node_name) -->>:`

    • I wanted a quick way to see what was being executed from where. Saw some pyepics logs and took inspiration
    • PIDs (which were the old way to differentiate these) have been removed since they are automatically logged in the detailed logs (dumped to file, not console)

How Has This Been Tested?

Interactively. No functional code has been changed, so tests still pass

Where Has This Been Documented?

This PR. Some examples below

image
image

image

Pre-merge checklist

  • Code works interactively
  • Code follows the style guide
  • Code contains descriptive docstrings, including context and API
  • New/changed functions and methods are covered in the test suite where possible
  • Test suite passes locally
  • Test suite passes on GitHub Actions
  • Ran docs/pre-release-notes.sh and created a pre-release documentation page
  • Pre-release docs include context, functional descriptions, and contributors as appropriate

@tangkong tangkong changed the title DOC: unify logging WIP/DOC: unify logging Nov 18, 2024
@tangkong tangkong changed the title WIP/DOC: unify logging DOC: unify logging Nov 18, 2024
@tangkong
Copy link
Contributor Author

Alright this is ready to go. Get your logging formatting nitpicks in now!

Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the changes that I understand
I asked questions about the ones that I don't understand
I need to take this for a spin before hitting approve

@@ -99,8 +99,6 @@ def update(self) -> py_trees.common.Status:
def terminate(self, new_status: py_trees.common.Status) -> None:
"""Nothing to clean up."""
logger.debug(
py_trees.console.red
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a bit more about why the console colors need to go? Do they cause problems in the log files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plain-text files wouldn't strip these. Maybe there's a better way around it but I also didn't think the terminate method warranted big angry red text
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly if we want colorized terminal streams we'd use a file handler that strips the formatting characters, maybe this isn't worth the effort right now though

Copy link
Collaborator

@joshc-slac joshc-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging stuff looks great, have a couple minor questions about other changes

@@ -2,7 +2,7 @@

from py_trees.common import Status

from beams.sequencer.helpers.SharedEnum import SharedEnum
from beams.sequencer.helpers.enum import SharedEnum
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight qualm with the renaming of shared_enum to just enum; it seems to obfuscate the process safe nature of this wrapper utility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object you're using is still called SharedEnum, so that information is still there. I was mostly trying to avoid confusion between:

from beams.sequencer.helpers import SharedEnum   # Previously would import a module, not an object
from beams.sequencer.helpers.SharedEnum import SharedEnum  # "SharedEnum" repetition

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is a pythonic framing I haven't considered before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there utility in making this shared_enum and not just enum though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. Here I leaned toward clear disambiguation and providing a place for future enums to live, tending away from the one-class-per-submodule structure we had before. If we prefer this be beams.sequencer.helpers.shared_enum, I can change it back (and would prefer to do so in this PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some uber nitpicks I have with my logfile output sourced in this file from outside the current PR diff:

  • Here, in action_node.py on line 31, "creating worker" is not capitalized, but in action_worker.py line 113, "Creating worker" is capitalized.
  • These messages feel redundant in the log output
  • So they end up feeling both redundant and clashing somehow
  • They are always followed by a not so useful init message sourced from here on line 41
2024-11-19 11:00:52 - PID 314784     action_node.py: 31  __init__           DEBUG    creating worker from 314784
2024-11-19 11:00:52 - PID 314784   action_worker.py: 113 __init__           DEBUG    Creating worker (act_close_dg2_stp)
2024-11-19 11:00:52 - PID 314784     action_node.py: 41  __init__           DEBUG    ActionNode.__init__()

I suspect the above should either have consistency or be boiled down to one debug message.

One more from this file is that the arrow on line 49 looks strange and unspaced next to subprocess arrows.
It also looks strange to have some log messages be super generic (ActionNode.setup) and others to use the node name.

2024-11-19 11:00:52 - PID 314784     action_node.py: 48  setup              DEBUG    ActionNode.setup()->connections to an external process
2024-11-19 11:00:52 - PID 314784          worker.py: 46  start_work         DEBUG    (act_close_dg2_stp) -->>: Starting work
2024-11-19 11:00:52 - PID 314784     action_node.py: 48  setup              DEBUG    ActionNode.setup()->connections to an external process
2024-11-19 11:00:52 - PID 314784          worker.py: 46  start_work         DEBUG    (act_pfl1_remove) -->>: Starting work
2024-11-19 11:00:52 - PID 314784     action_node.py: 48  setup              DEBUG    ActionNode.setup()->connections to an external process
2024-11-19 11:00:52 - PID 314784          worker.py: 46  start_work         DEBUG    (act_pfl2_remove) -->>: Starting work
2024-11-19 11:00:52 - PID 314784     action_node.py: 48  setup              DEBUG    ActionNode.setup()->connections to an external process

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops some of these were originally removed but got re-added in a merge conflict.

  • action_node:L31 should get removed, made redundant by log file PID tracking
  • action_node:L41 should get removed, I mean to only instrument the BT-related methods
  • action_node:L49 should get removed, redundant information to the log messages in Worker.start_work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this whole file got messed up 😢

Comment on lines 36 to 39
logger.debug(f" <<-- ({self.name}): {self.pv} = {value}")

# specific caput logic to SetPVActionItem
caput(self.pv, self.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is missing some log message that we are setting a PV to a value.
Currently the way these logs read is reporting the current value using python's assignment operator which is confusing:

2024-11-19 11:00:52 - PID 314804          action.py: 36  work_func          DEBUG     <<-- (act_pfl1_remove): MFX:LENS:DIA:01:REMOVE = 0
2024-11-19 11:00:52 - PID 314804   action_worker.py: 76  work_wrapper       DEBUG     <<-- (act_pfl1_remove): RUNNING
2024-11-19 11:00:53 - PID 314804   action_worker.py: 66  work_wrapper       DEBUG     <<-- (act_pfl1_remove) not complete, doing work
2024-11-19 11:00:53 - PID 314804          action.py: 36  work_func          DEBUG     <<-- (act_pfl1_remove): MFX:LENS:DIA:01:REMOVE = 1

Which I originally interpreted as "I'm setting remove to 0" and "I'm setting remove to 1" which is not the case. In some sense the original "value is" was clearer.

Anyway, the specifics should be quibbled more, but I'd like to see something very explicit and specific to this kind of action node including the get and the put for example:

caget MFX:LENS:DIA:01:REMOVE -> got 0
caput MFX:LENS:DIA:01:REMOVE 1

Symbols are fine too but maybe be more explicit about checking and setting

MFX:LENS:DIA:01:REMOVE == 0
MFX:LENS:DIA:01:REMOVE => 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call out, adjusted to:

<<-- (name): caget(PV:NAME) -> value
<<-- (name): caput(PV:NAME, value)

if comp_condition():
return py_trees.common.Status.SUCCESS

# specific caput logic to IncPVActionItem
caput(self.pv, value + self.increment)
return py_trees.common.Status.RUNNING
except Exception as ex:
logger.warning(f"{self.name}: work failed: {ex}")
logger.warning(f" <<-- ({self.name}): work failed, {ex}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that these log formats are specified clearly in the PR text here but then applied manually wherever they might need to be used, across multiple modules.

Possibly we should be defining logger subclasses or helper functions to proliferate elements like the <<-- so we can apply them uniformly and make it easy on ourselves if we change our minds later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I was thinking something of this shape. It could be a PR on top of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key here is to make sure we know what is getting executed in subprocesses. If we truly want a hands off approach, maybe we should be tracking the parent PID and appending this if the process PID is different?

That's some shenanigans beyond the scope of this one I imagine.

ZLLentz
ZLLentz previously approved these changes Nov 19, 2024
Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as a really great step forward, despite my nitpicks and calls for some refactoring

joshc-slac
joshc-slac previously approved these changes Nov 19, 2024
Copy link
Collaborator

@joshc-slac joshc-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as well. Typing changes in this PR all seem well motivated.
Logging changes are a great step forward. Agree a future effort to enable helper logger functions to:

  • unify logging information communication schema
  • make making changes to that schema easier
  • possibly make it easier to use reflection within our codebase

@tangkong tangkong dismissed stale reviews from joshc-slac and ZLLentz via 75435ab November 19, 2024 20:31
@@ -58,7 +61,9 @@ class EPICSValue(BaseValue):
as_string: bool = False

def get_value(self) -> Any:
return caget(self.pv_name, as_string=self.as_string)
value = caget(self.pv_name, as_string=self.as_string)
logger.debug(f" <<-- (EPICSValue): caget({self.pv_name}) -> {value}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These messages clash with the others in the logfile now, since in other cases the parentheses include the node name. I'm not sure if the other redundancies are logging related or if we actually do so many extra cagets.

2024-11-19 13:22:22 - PID 335106            base.py: 65  get_value          DEBUG     <<-- (EPICSValue): caget(MFX:LENS:DIA:01:STATE) -> 1
2024-11-19 13:22:22 - PID 335106   action_worker.py: 66  work_wrapper       DEBUG     <<-- (act_pfl1_remove) not complete, doing work
2024-11-19 13:22:22 - PID 335106            base.py: 65  get_value          DEBUG     <<-- (EPICSValue): caget(MFX:LENS:DIA:01:STATE) -> 1
2024-11-19 13:22:22 - PID 335106   action_worker.py: 66  work_wrapper       DEBUG     <<-- (act_pfl1_remove) not complete, doing work
2024-11-19 13:22:22 - PID 335106          action.py: 33  work_func          DEBUG     <<-- (act_pfl1_remove): caget(MFX:LENS:DIA:01:REMOVE) -> 0
2024-11-19 13:22:22 - PID 335106            base.py: 65  get_value          DEBUG     <<-- (EPICSValue): caget(MFX:LENS:DIA:01:STATE) -> 1
2024-11-19 13:22:22 - PID 335106          action.py: 33  work_func          DEBUG     <<-- (act_pfl1_remove): caget(MFX:LENS:DIA:01:REMOVE) -> 0
2024-11-19 13:22:22 - PID 335106          action.py: 40  work_func          DEBUG     <<-- (act_pfl1_remove): caput(MFX:LENS:DIA:01:REMOVE, 0)
2024-11-19 13:22:22 - PID 335106            base.py: 65  get_value          DEBUG     <<-- (EPICSValue): caget(MFX:LENS:DIA:01:STATE) -> 1
2024-11-19 13:22:22 - PID 335106          action.py: 40  work_func          DEBUG     <<-- (act_pfl1_remove): caput(MFX:LENS:DIA:01:REMOVE, 0)
2024-11-19 13:22:22 - PID 335106   action_worker.py: 76  work_wrapper       DEBUG     <<-- (act_pfl1_remove): RUNNING
2024-11-19 13:22:22 - PID 335106   action_worker.py: 76  work_wrapper       DEBUG     <<-- (act_pfl1_remove): RUNNING

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure exactly how to annotate these. I'm relatively sure they execute inside the worker subprocess, but there is not EpicsValue.name to reference. I'm open to suggestions here, but thought this was the most clear.

I do believe we are actually running this many cagets

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is something to revisit later, I imagine with some logging helpers/adapters/some equivalent we could automatically fill in the node name given the PID

@ZLLentz
Copy link
Member

ZLLentz commented Nov 19, 2024

To my eye, all the other log messages looked maximally clean and correct!

Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-applying green check!

@tangkong tangkong merged commit 2ac8df7 into pcdshub:master Nov 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants