Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to tune log levels for individual subsystems #16665

Closed
wants to merge 11 commits into from

Conversation

kisunji
Copy link
Contributor

@kisunji kisunji commented Mar 17, 2023

Description

Adds log_sublevels configuration which can fine-tune the log verbosity of individual subcomponents in Consul. This will help users selectively suppress subsystems they may find noisy or help debug certain subsystems by allowing for fine-grained TRACE logs.

Future work

This configuration is not yet reloadable

Examples

Default info logging

$ consul agent -dev -log-level info

...

==> Log data will now stream in as it occurs:

2023-03-17T11:00:57.020-0400 [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:6eb83d32-5a63-7307-898a-cd568ee705b0 Address:127.0.0.3:8300}]"
2023-03-17T11:00:57.020-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at 127.0.0.3:8300 [Follower]" leader-address= leader-id=
2023-03-17T11:00:57.021-0400 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: chriskim-C02FX04VMD6R.dc1 127.0.0.3
2023-03-17T11:00:57.021-0400 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: chriskim-C02FX04VMD6R 127.0.0.3
2023-03-17T11:00:57.021-0400 [INFO]  agent.router: Initializing LAN area manager

Trace logging makes all subcomponents noisy

$ consul agent -dev -log-level trace

...

2023-03-17T11:30:47.812-0400 [TRACE] agent.tlsutil: Update: version=1
2023-03-17T11:30:47.813-0400 [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=1
2023-03-17T11:30:47.813-0400 [TRACE] agent.tlsutil: OutgoingALPNRPCWrapper: version=1
2023-03-17T11:30:47.813-0400 [TRACE] agent: [core][Channel #1] Channel created
2023-03-17T11:30:47.814-0400 [TRACE] agent: [core][Channel #1] original dial target is: "consul://dc1.8d1dc4e3-78e3-f3b3-8199-224f91fc8e7f/server.dc1"
2023-03-17T11:30:47.814-0400 [TRACE] agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:dc1.8d1dc4e3-78e3-f3b3-8199-224f91fc8e7f Endpoint:server.dc1 URL:{Scheme:consul Opaque: User: Host:dc1.8d1dc4e3-78e3-f3b3-8199-224f91fc8e7f Path:/server.dc1 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
2023-03-17T11:30:47.814-0400 [TRACE] agent: [core][Channel #1] Channel authority set to "server.dc1"
2023-03-17T11:30:47.814-0400 [TRACE] agent: [core][Channel #1] Resolver state updated: {
  "Addresses": null,
  "ServiceConfig": null,
  "Attributes": null
} ()
2023-03-17T11:30:47.815-0400 [TRACE] agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
2023-03-17T11:30:47.815-0400 [TRACE] agent.grpc.balancer: creating balancer: target=consul://dc1.8d1dc4e3-78e3-f3b3-8199-224f91fc8e7f/server.dc1
2023-03-17T11:30:47.815-0400 [DEBUG] agent.grpc.balancer: switching server: target=consul://dc1.8d1dc4e3-78e3-f3b3-8199-224f91fc8e7f/server.dc1 from=<none> to=<none>
2023-03-17T11:30:47.815-0400 [TRACE] agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE

Fine-grained trace logging for agent.server.peering-syncer

$ consul agent -dev -log-level warn -log-sublevels agent.server.peering-syncer:trace

...

==> Log data will now stream in as it occurs:

2023-03-17T11:37:47.833-0400 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2023-03-17T11:37:47.833-0400 [TRACE] agent.server.peering-syncer: syncing new list of peers: num_peers=0 sequence_id=a71e3343-e22b-6790-3351-df33bc2bbd11
2023-03-17T11:37:47.833-0400 [TRACE] agent.server.peering-syncer: checking connected streams: streams=map[] sequence_id=a71e3343-e22b-6790-3351-df33bc2bbd11
2023-03-17T11:37:47.833-0400 [TRACE] agent.server.peering-syncer: blocking for changes: sequence_id=a71e3343-e22b-6790-3351-df33bc2bbd11
2023-03-17T11:37:47.834-0400 [TRACE] agent.server.peering-syncer: unblocked: sequence_id=a71e3343-e22b-6790-3351-df33bc2bbd11
2023-03-17T11:37:47.834-0400 [TRACE] agent.server.peering-syncer: syncing new list of peers: num_peers=0 sequence_id=fa411fe1-b824-3c33-2789-8bd7c3df12f6
2023-03-17T11:37:47.834-0400 [TRACE] agent.server.peering-syncer: checking connected streams: streams=map[] sequence_id=fa411fe1-b824-3c33-2789-8bd7c3df12f6
2023-03-17T11:37:47.842-0400 [TRACE] agent.server.peering-syncer: blocking for changes: sequence_id=fa411fe1-b824-3c33-2789-8bd7c3df12f6
2023-03-17T11:37:47.842-0400 [TRACE] agent.server.peering-syncer: unblocked: sequence_id=fa411fe1-b824-3c33-2789-8bd7c3df12f6

Limitations

This feature depends on the root Consul logger (named agent) being the parent in the sublogger tree. Some Consul components may use independently created loggers which do not inherit the configuration OR may not be using named subloggers.

Testing & Reproduction steps

  • Added new tests
  • Tested manually

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@github-actions github-actions bot added pr/dependencies PR specifically updates dependencies of project theme/config Relating to Consul Agent configuration, including reloading theme/telemetry Anything related to telemetry or observability labels Mar 17, 2023
@kisunji kisunji force-pushed the kisunji/subloggers branch from 1300dcd to 8a65a49 Compare March 17, 2023 15:18
go.mod Outdated Show resolved Hide resolved
@kisunji kisunji added the backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. label Mar 17, 2023
@kisunji kisunji requested a review from a team March 17, 2023 15:49
@kisunji kisunji force-pushed the kisunji/subloggers branch 2 times, most recently from 8be3c67 to ffe6dc0 Compare March 17, 2023 20:40
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here! I have wanted this many times myself.

One bikeshed comment would be about the "sublevel" naming. Is that naming a shortened form of subsystem log level? I would probably find something with subsystem or component in the name easier to intuit its use.

@kisunji
Copy link
Contributor Author

kisunji commented Mar 20, 2023

Great work here! I have wanted this many times myself.

One bikeshed comment would be about the "sublevel" naming. Is that naming a shortened form of subsystem log level? I would probably find something with subsystem or component in the name easier to intuit its use.

Would LogSubsystemLevels be clearer? Or LogSubloggerLevel? Open to suggestions!

@kisunji kisunji force-pushed the kisunji/subloggers branch 2 times, most recently from a3f4f1d to 7c1bddb Compare March 20, 2023 15:28
@jkirschner-hashicorp
Copy link
Contributor

@kisunji : It looks like this allows modifying a subsystem log level at startup. Does it make sense to extend this same functionality to consul monitor, so a subsystem log level can be set during runtime (to assist with debugging an already running agent)?

@kisunji
Copy link
Contributor Author

kisunji commented Mar 20, 2023

@kisunji : It looks like this allows modifying a subsystem log level at startup. Does it make sense to extend this same functionality to consul monitor, so a subsystem log level can be set during runtime (to assist with debugging an already running agent)?

Done! c01ee5e

@kisunji
Copy link
Contributor Author

kisunji commented May 11, 2023

@mkeeler is the naming the only blocker for this PR? I believe @lkysow preferred the brevity of -log-sublevels.

@kisunji kisunji force-pushed the kisunji/subloggers branch from 383d4ba to 5a10aff Compare May 11, 2023 19:06
@lkysow
Copy link
Member

lkysow commented May 11, 2023

Although I don't super mind either way, just a preference.

@kisunji kisunji force-pushed the kisunji/subloggers branch from 5a10aff to a71bfba Compare May 12, 2023 13:49
@github-actions
Copy link

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

@github-actions github-actions bot added the meta/stale Automatically flagged for inactivity by stalebot label Aug 20, 2023
@kisunji kisunji added meta/staleproof Exempt from stalebot automation and removed meta/stale Automatically flagged for inactivity by stalebot labels Aug 22, 2023
@kisunji
Copy link
Contributor Author

kisunji commented Aug 22, 2023

Don't have time to bring this over the finish line but potential improvements:

  • break this up into multiple PRs for easier review
  • make config reloadable

@zalimeni
Copy link
Member

Closing this PR as stale for now, but hoping we can return to it if need arises in the future as this was a really nice improvement.

@zalimeni zalimeni closed this Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. meta/staleproof Exempt from stalebot automation pr/dependencies PR specifically updates dependencies of project theme/config Relating to Consul Agent configuration, including reloading theme/telemetry Anything related to telemetry or observability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants