Fix: tools: crm_mon --daemonize should update when disconnected #2868
Branch updated from d14a73c to ba96437.
The first commit may not be comprehensive: for example, "waiting for start" type messages may still appear in daemonized text output during startup. It's an improvement, though, and should take care of most situations once we're already up and running.
Pushed from the wrong local branch...
Updated. This gets the job done for daemonized mode without switching on output formats and such, and without negatively impacting console or one-shot mode. It does not print a string representation of the … I think it would be a good idea to merge them into library functions at some point, and likewise for other similar static functions. I started working on it and named it … I also put a couple of formatter functions for a new message in …
We've improved the crm_mon code drastically over the past few years, but it's still difficult to follow :-/
Depending on how much effort it is, before this is merged I may try to make the console and daemonized modes share more code with one-shot mode, since we're doing that work in #2902. That would entail, at minimum, using …
Updated minimally to address review. Most of the massive Compare output is from rebasing on current main and resolving conflicts. There's still room for improvement in both the output and the implementation, but this gets the job done. I'll be filing another PR with some other refactors as a start.
The CI failures were RPM issues on three hosts. Two of them were openSUSE hosts that timed out when trying to access the libqb100 RPM. The other was CentOS 9 s390x: " - nothing provides libqb(s390-64) = 2.0.6-1.el9.next needed by libqb-devel-2.0.6-1.el9.next.s390x"
Totally replaced commits. Still working out some schema issues (edit: fixed in #2919).
looks good
tools/crm_mon.c
@@ -672,7 +672,7 @@ reconnect_after_timeout(gpointer data)
 static void
 mon_cib_connection_destroy(gpointer user_data)
 {
-    out->info(out, "Connection to the cluster-daemons terminated");
+    out->transient(out, "\nConnection to the cluster lost");
Why a newline?
Just because the string of ellipses (of varying length) before the first "Connection lost" message always bugged me:
Cluster Summary:
  * Stack: corosync
  * Current DC: node1 (version 2.1.5-0b3656e85) - partition with quorum
  * Last updated: Thu Nov 3 17:02:27 2022
  * Last change: Thu Nov 3 15:59:10 2022 by hacluster via crmd on node1
  * 1 node configured
  * 2 resource instances configured (1 DISABLED)
Node List:
  * Online: [ node1 ]
Active Resources:
  * dummy (ocf:pacemaker:Dummy): Started node1
......Connection to the cluster-daemons terminated
Connection to the cluster-daemons terminated
I'm not tied to the newline though. Other options besides the newline:
- Leave it as-is
- Modify the curses error and info/transient functions so that they move the cursor to the beginning of the current line before printing. Not sure if that would have any unintended consequences, but it definitely looks the nicest in this scenario:
Cluster Summary:
...
Active Resources:
  * dummy (ocf:pacemaker:Dummy): Started node1
Connection to the cluster lost
Connection to the cluster lost
I definitely think the handling should be in the implementation rather than the message string. The idea is that the message string should be identical across all formats and not have format-specific control characters.
Moving to the beginning of the line should be fine for error at least (clearing the line would also be a good idea in case the existing content is longer than the error message). For info/transient it would probably be fine too. I suppose the ideal would be to check the line position and output a newline if not at the beginning, but that might not be worth the hassle.
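A minimal sketch of the "check the line position, output a newline only if needed" idea, assuming a plain stdio stream rather than curses. `struct tracked_stream` and its functions are hypothetical (not part of Pacemaker's output API): the wrapper remembers whether the last byte written ended a line, so a complete message gets a leading newline only when a partial line (such as the run of dots above) is still open, and the message string itself stays free of format-specific control characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper (not Pacemaker's pcmk__output_t) that remembers
 * whether the last byte written to the stream ended a line. */
struct tracked_stream {
    FILE *fp;
    bool at_line_start;
};

static void
tracked_init(struct tracked_stream *ts, FILE *fp)
{
    assert(ts != NULL && fp != NULL);
    ts->fp = fp;
    ts->at_line_start = true;   /* assume a fresh stream starts at column 0 */
}

/* Write raw text (e.g. a progress dot) and update the line-start flag. */
static void
tracked_write(struct tracked_stream *ts, const char *text)
{
    size_t len = strlen(text);

    if (len == 0) {
        return;
    }
    fputs(text, ts->fp);
    ts->at_line_start = (text[len - 1] == '\n');
}

/* Print a complete message on its own line. A newline is emitted first only
 * if a partial line is still open, so the message never needs a leading "\n". */
static void
tracked_message(struct tracked_stream *ts, const char *msg)
{
    if (!ts->at_line_start) {
        tracked_write(ts, "\n");
    }
    tracked_write(ts, msg);
    tracked_write(ts, "\n");
}
```

With curses, the equivalent check could read the cursor position (for example with the `getyx()` macro) instead of keeping a flag.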
I'm going to just remove the newline from the commit for now. We can revisit this later.
> that might not be worth the hassle.
For curses, doing something based on the current position is no more difficult than moving to the beginning of the line. Which is to say, not very. It's probably doable in other formats as well (something like getting the character at `ftell(fd) - 1` and acting if that character is not a newline? I haven't done much with streams).
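A rough sketch of the `ftell()` idea for stdio streams, under some assumptions worth spelling out: `ftell()` takes a `FILE *` rather than a file descriptor (for a raw descriptor, `lseek()` plays that role), and reading the previous byte back only works if the stream is seekable and opened for reading as well as writing (for example `"w+"`). On a pipe `ftell()` fails, so the sketch reports "unknown" in that case. `stream_at_line_start()` is a hypothetical helper, not existing Pacemaker code.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the next write would start a new line, 0 if not, and -1 if
 * the position cannot be determined (e.g. a pipe or a write-only stream). */
static int
stream_at_line_start(FILE *fp)
{
    long pos;
    int c;

    assert(fp != NULL);
    if (fflush(fp) != 0) {
        return -1;
    }
    pos = ftell(fp);
    if (pos < 0) {
        return -1;              /* stream is not seekable */
    }
    if (pos == 0) {
        return 1;               /* nothing written yet: column 0 */
    }
    if (fseek(fp, pos - 1, SEEK_SET) != 0) {
        return -1;
    }
    c = fgetc(fp);              /* requires read access to the stream */

    /* Restore the position. A positioning call is also required by C
     * before switching from reading back to writing. */
    if (fseek(fp, pos, SEEK_SET) != 0 || c == EOF) {
        return -1;
    }
    return c == '\n';
}
```

In daemonized mode the output file is a regular file, so this could work there; for console output the curses cursor position is the more natural source of truth.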
There may or may not be existing places where we build part of a string piece by piece based on conditions and then finish it with `out->info(out, "the rest")`. Any new approach that adds a newline or overwrites part of a line would break that.
There shouldn't be, we rejected that possibility when designing the interface. We use the usual techniques to build a string then pass the whole string to the output object function. That's the only way to abstract e.g. text vs xml needing a newline or not. We considered separate methods for build-up-a-line and end-a-line but there are already functions for that purpose, and not all formats are line-oriented.
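To illustrate the "build the whole string, then make one output call" pattern, here is a small self-contained sketch. `struct output`, `strbuf_appendf()`, and `report_node()` are hypothetical stand-ins rather than Pacemaker APIs; real code would use whatever string builder the code base already relies on. The point is that conditional pieces are assembled first and the output object only ever sees a complete message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output object: only a message-level "info" method. */
struct output {
    char last[256];
    int (*info)(struct output *out, const char *msg);
};

/* Example backend that just records the last complete message. */
static int
record_info(struct output *out, const char *msg)
{
    size_t n = strlen(msg);

    if (n >= sizeof(out->last)) {
        n = sizeof(out->last) - 1;
    }
    memcpy(out->last, msg, n);
    out->last[n] = '\0';
    return 0;
}

/* Minimal growable string buffer with printf-style appends. */
struct strbuf {
    char *data;
    size_t len;
    size_t cap;
};

static bool
strbuf_appendf(struct strbuf *sb, const char *fmt, ...)
{
    va_list ap;
    int needed;

    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0) {
        return false;
    }
    if (sb->len + (size_t) needed + 1 > sb->cap) {
        size_t new_cap = (sb->len + (size_t) needed + 1) * 2;
        char *p = realloc(sb->data, new_cap);

        if (p == NULL) {
            return false;
        }
        sb->data = p;
        sb->cap = new_cap;
    }
    va_start(ap, fmt);
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    sb->len += (size_t) needed;
    return true;
}

/* Build the message piece by piece, then hand it over in a single call. */
static int
report_node(struct output *out, const char *node, bool online, bool standby)
{
    struct strbuf sb = { NULL, 0, 0 };
    bool ok;
    int rc;

    assert(out != NULL && out->info != NULL);
    ok = strbuf_appendf(&sb, "Node %s:", node)
         && strbuf_appendf(&sb, " %s", online ? "online" : "OFFLINE");
    if (ok && standby) {
        ok = strbuf_appendf(&sb, " (standby)");
    }
    if (!ok) {
        free(sb.data);
        return -1;
    }
    rc = out->info(out, sb.data);
    free(sb.data);
    return rc;
}
```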
Just dropping the newline is fine for this PR
The XML message should use the short output, not the friendly output. Signed-off-by: Reid Wahl <[email protected]>
This isn't really important, but it's more correct to set the pacemakerd health last updated time based on the time of the query rather than the time of the message call. Signed-off-by: Reid Wahl <[email protected]>
As far as I can tell, it's equivalent to `clear(); refresh();`, except that it loops over all the lines individually. If we feel it's worth functionizing to save one line per call, we can add it back. I default to being explicit here. Signed-off-by: Reid Wahl <[email protected]>
Currently, if we fail to get a query result, we return pcmk_rc_no_input. This masks the true source of the error, so we should only do this as a failsafe when the return code is pcmk_rc_ok and there's still no query result XML. Signed-off-by: Reid Wahl <[email protected]>
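A sketch of the intended return-code handling, using hypothetical stand-ins for the return codes and the query function (the real `pcmk_rc_ok` and `pcmk_rc_no_input` values come from Pacemaker's headers and are not reproduced here). The query's own error is propagated, and the "no input" code is used only as a failsafe when the query reported success but produced no XML.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for standard return codes (not Pacemaker's values). */
enum rc_sketch {
    RC_OK,
    RC_NO_INPUT,
    RC_TIMEOUT,
};

/* Hypothetical query signature: returns a code and may set *xml_out. */
typedef enum rc_sketch (*query_fn)(const char **xml_out);

static enum rc_sketch
get_status_xml(query_fn query, const char **xml_out)
{
    enum rc_sketch rc;

    assert(query != NULL && xml_out != NULL);
    *xml_out = NULL;
    rc = query(xml_out);
    if (rc != RC_OK) {
        return rc;              /* propagate the real cause of the failure */
    }
    if (*xml_out == NULL) {
        return RC_NO_INPUT;     /* failsafe: "success" but nothing to show */
    }
    return RC_OK;
}

/* Fake queries used only to exercise the logic. */
static enum rc_sketch
query_timeout(const char **xml_out)
{
    (void) xml_out;
    return RC_TIMEOUT;
}

static enum rc_sketch
query_empty(const char **xml_out)
{
    (void) xml_out;
    return RC_OK;
}

static enum rc_sketch
query_good(const char **xml_out)
{
    *xml_out = "<cib/>";
    return RC_OK;
}
```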
Branch updated from a30e6f0 to 757936d.
Updated to address review, which was more involved than expected (see #2868 (comment)). This is already way too big, so if it gets any bigger, it'll be split up.
Not addressed yet:
Branch updated from b406e53 to 6d72ffa.
Ref T15 Signed-off-by: Reid Wahl <[email protected]>
Changes:
* New crm-mon-disconnected message when not connected to the CIB
* Pacemaker status in cluster stack

Ref T15
Signed-off-by: Reid Wahl <[email protected]>
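The idea of a new message backed by per-format formatter functions can be sketched as a lookup table keyed by message ID and format name. This is a simplified, hypothetical registry: `registry_add()`, `registry_lookup()`, the formatter signature, and the text and XML shapes are illustrative assumptions, not Pacemaker's actual registration API or output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter signature: render the message into buf. */
typedef int (*message_fn)(char *buf, size_t size, const char *detail);

struct message_entry {
    const char *message_id;
    const char *fmt_name;
    message_fn fn;
};

#define MAX_MESSAGES 16

struct registry {
    struct message_entry entries[MAX_MESSAGES];
    size_t count;
};

static int
registry_add(struct registry *reg, const char *id, const char *fmt,
             message_fn fn)
{
    assert(reg != NULL && id != NULL && fmt != NULL && fn != NULL);
    if (reg->count == MAX_MESSAGES) {
        return -1;
    }
    reg->entries[reg->count++] = (struct message_entry) { id, fmt, fn };
    return 0;
}

/* Find the formatter registered for this message ID and output format. */
static message_fn
registry_lookup(const struct registry *reg, const char *id, const char *fmt)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->entries[i].message_id, id) == 0
            && strcmp(reg->entries[i].fmt_name, fmt) == 0) {
            return reg->entries[i].fn;
        }
    }
    return NULL;
}

static int
disconnected_text(char *buf, size_t size, const char *detail)
{
    return snprintf(buf, size, "Not connected to CIB: %s", detail);
}

/* Assumes detail needs no XML escaping (fixed strings in this sketch). */
static int
disconnected_xml(char *buf, size_t size, const char *detail)
{
    return snprintf(buf, size, "<crm-mon-disconnected description=\"%s\"/>",
                    detail);
}
```

The same message ID renders differently per format, while callers only ever say "emit crm-mon-disconnected".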
crm_mon's cluster stack section now includes the pacemakerd status for native CIB connections. The stack section does not include the pacemakerd status for file and remote CIB connections.

Since the summary should only be shown when we have a CIB connection, there should be three possible values in practice:
* Pacemaker is running
* Pacemaker is shutting down
* pacemaker-remoted is running (on a Pacemaker Remote node)

Signed-off-by: Reid Wahl <[email protected]>
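A table-driven sketch of the three expected stack-line values. The enum and its names are hypothetical simplifications; Pacemaker's own pacemakerd state enum may have more values and different names. Returning `NULL` models the cases where nothing is appended (file and remote CIB connections, or a state that shouldn't occur while we have a CIB connection).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of pacemakerd states. */
enum pcmkd_state_sketch {
    PCMKD_UNKNOWN,
    PCMKD_RUNNING,
    PCMKD_SHUTTING_DOWN,
    PCMKD_REMOTE,
};

/* Status text for the cluster stack section, indexed by state. */
static const char *const stack_suffixes[] = {
    [PCMKD_UNKNOWN] = NULL,
    [PCMKD_RUNNING] = "Pacemaker is running",
    [PCMKD_SHUTTING_DOWN] = "Pacemaker is shutting down",
    [PCMKD_REMOTE] = "pacemaker-remoted is running",
};

/* Only native CIB connections report pacemakerd status. */
static const char *
stack_status_suffix(enum pcmkd_state_sketch state, bool native_cib)
{
    assert((size_t) state < sizeof(stack_suffixes) / sizeof(stack_suffixes[0]));
    return native_cib ? stack_suffixes[state] : NULL;
}
```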
pcmk__pacemakerd_status() can replace most of what the static pacemakerd_status() was doing. A new setup_api_connections() function can take the place of pacemakerd_status() and conditionally connect to the fencer and CIB. There's also no compelling reason to keep the use_cib_native and on_remote_node variables. For the latter, we can use a global pcmkd_state variable. (It may be desirable to use fewer globals, but that's out of scope regardless.) Signed-off-by: Reid Wahl <[email protected]>
To align with the cib and daemons functions Signed-off-by: Reid Wahl <[email protected]>
Should have been done in f88cde7 Signed-off-by: Reid Wahl <[email protected]>
Note also that the "Writing html..." message was obsolete. Signed-off-by: Reid Wahl <[email protected]>
With crm_mon --daemonize, currently the output will continue to show the last known status after the cluster is stopped on the local node. This commit causes it to write "Not connected to CIB" (and more specific details) to the output file:
* before the initial connection;
* as soon as the CIB connection is destroyed; and
* every time a reconnection attempt fails.

External agents are notified via traps rather than via output, so we can ignore them.

We register the new message formatter functions directly instead of via crm_mon_register_messages(). The reason is that crm_mon_register_messages() is used for messages specific to the curses format, so that they can be registered from within crm_mon.c.

The new crm-mon-disconnected formatter isn't used with console output; it's not really necessary, and it's more complicated to implement (attempts so far led to display issues after connection loss).

In the future it might make sense to rename crm_mon_register_messages() in crm_mon_curses.c and reuse that name in crm_mon.c.

Closes T15
Signed-off-by: Reid Wahl <[email protected]>
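A minimal sketch of writing the disconnected report for the three situations listed above, assuming plain-text output to a file path. The enum, the detail strings, and both functions are hypothetical; the real formatter output is not reproduced here. Opening the file with `"w"` truncates it, so the stale status disappears instead of lingering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical events mirroring the three cases in the commit message. */
enum disconnect_event {
    DISCONNECT_BEFORE_FIRST_CONNECT,
    DISCONNECT_CONNECTION_DESTROYED,
    DISCONNECT_RECONNECT_FAILED,
};

/* Hypothetical detail text for each event. */
static const char *
disconnect_detail(enum disconnect_event ev)
{
    switch (ev) {
        case DISCONNECT_BEFORE_FIRST_CONNECT:
            return "Waiting for initial connection";
        case DISCONNECT_CONNECTION_DESTROYED:
            return "Connection to the cluster lost";
        case DISCONNECT_RECONNECT_FAILED:
            return "Reconnection attempt failed";
    }
    return "Unknown";
}

/* Format the complete disconnected report into buf. */
static int
format_disconnected(char *buf, size_t size, enum disconnect_event ev)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Not connected to CIB\n%s\n",
                    disconnect_detail(ev));
}

/* Overwrite the daemonized output file ("w" truncates it) with the report. */
static int
write_disconnected(const char *path, enum disconnect_event ev)
{
    char buf[256];
    FILE *fp;
    int n = format_disconnected(buf, sizeof(buf), ev);

    if (n < 0 || (size_t) n >= sizeof(buf)) {
        return -1;
    }
    fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }
    if (fwrite(buf, 1, strlen(buf), fp) != strlen(buf)) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

The caller would invoke `write_disconnected()` before the first connection, from the connection-destroyed callback, and after each failed reconnection attempt.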
Today I learned you don't have to use "int" here. Signed-off-by: Reid Wahl <[email protected]>
Updated to:
With `crm_mon --daemonize`, the output will continue to show the last known status after the cluster is stopped on the local node. It should go back to something like "Cluster is not available".

Likewise, messages like "Reconnecting..." should not go to the daemonized output. The output file (or external handler) should receive only the cluster status. So we print those messages only if `output_format` is `mon_output_console` (or if we're in one-shot mode, where applicable). Normally those messages don't get printed to the daemonized output anyway because we don't flush the buffer, but theoretically they could get printed if the buffer fills up.

It's questionable whether we want to print "Cluster is not available" in interactive mode as well or only in daemonized mode. For now, we'll do daemonized-only. Interactive (console) mode already gets the "Connection to cluster daemons terminated" message, which is quickly replaced by "Reconnecting...".

External agents are notified via traps rather than via output, so we can ignore them.

Closes T15
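The rule that progress messages stay out of daemonized output boils down to a small predicate. The run-mode enum below is a hypothetical simplification; as described above, crm_mon itself checks `output_format` and one-shot mode rather than a single enum like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical run modes. */
enum run_mode_sketch {
    RUN_CONSOLE,      /* interactive console display */
    RUN_ONE_SHOT,     /* print status once and exit */
    RUN_DAEMONIZED,   /* write status to a file or external handler */
};

/* Progress messages such as "Reconnecting..." are for someone watching a
 * terminal, so they are suppressed in daemonized mode. */
static bool
show_progress_message(enum run_mode_sketch mode)
{
    return mode != RUN_DAEMONIZED;
}

static bool
emit_progress(FILE *fp, enum run_mode_sketch mode, const char *msg)
{
    assert(fp != NULL && msg != NULL);
    if (!show_progress_message(mode)) {
        return false;   /* daemonized output receives only cluster status */
    }
    fprintf(fp, "%s\n", msg);
    return true;
}
```

Checking the mode up front, rather than relying on the buffer never being flushed, keeps such messages out of the output file even if the buffer fills up.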