Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The behaviour of cluster settings API for changing log levels #65208

Open
pgomulka opened this issue Nov 18, 2020 · 7 comments
Open

The behaviour of cluster settings API for changing log levels #65208

pgomulka opened this issue Nov 18, 2020 · 7 comments
Labels
>bug :Core/Infra/Logging Log management and logging utilities :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team

Comments

@pgomulka
Copy link
Contributor

pgomulka commented Nov 18, 2020

there is some confusion about how configuring logging levels via cluster settings api works.

as per doc we change logging level for a logger via

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.transport": "DEBUG"
  }
}

the current logging also trickle down the logging hierarchy and sets that level on all the loggers down the hierarchy. Meaning loggers logger.org.elasticsearch.transport.X and logger.org.elasticsearch.transport.X.Y will have a level DEBUG too.
This is expected and works the same way as log4j configuration from a file.

However if you set a logger down the hierarchy earlier
"logger.org.elasticsearch.transport.X": "TRACE"
the subsequent call to set "logger.org.elasticsearch.transport": "DEBUG" will
overridde "logger.org.elasticsearch.transport.X" logger to DEBUG

  • that is different to what you would expect from log4j configuration,
    -- but it make using the API easier. You don't have to remember which loggers down the hierarchy you have changed and allows you to "clear" them back

  • the get /_cluster/settings still shows logger.org.elasticsearch.transport.X" = trace even thought you overriden it with logger.org.elasticsearch.transport = debug

  • in order to really enforce logger.org.elasticsearch.transport.X to be trace you have to trigger cluster state settings change. Meaning you have to change it to something else (like debug) and then switch it back to intended trace value.

I think we should either

  • change the behaviour so that we don't clear the overrides when setting a parent logger
    or
  • document that behaviour and
  • remove the override setting from cluster settings as the result from get /_cluster/settings is incorrect

reproduce steps:

curl --request PUT \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json' \
  --data '{
  "transient": {
		"logger.org.elasticsearch.http.HttpTracer": "trace"
  }
}'

expect:

curl --request GET \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json'

result with

{
  "persistent": {},
  "transient": {
    "logger": {
      "org.elasticsearch.http":  "trace"

      }
    }
  }
}

also expect that HttpTracer starts logging its trace messages

curl --request PUT \
  --url http://localhost:9200/_cluster/settings \
  --header 'authorization: Basic ZWxhc3RpYzpwYXNzd29yZA==' \
  --header 'content-type: application/json' \
  --data '{
  "transient": {
		"logger.org.elasticsearch": "INFO"
  }
}'

HttpTracer no longer logs

the result of _cluster/settings is not clear

curl --request GET \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json'
{
  "persistent": {},
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": "info",
        "elasticsearch.http": "trace"
      }
    }
  }
}

the change was intentionally introduced #20463

@pgomulka pgomulka added >bug discuss :Core/Infra/Settings Settings infrastructure and APIs :Core/Infra/Logging Log management and logging utilities needs:triage Requires assignment of a team area label labels Nov 18, 2020
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Nov 18, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@pgomulka
Copy link
Contributor Author

cc @ywangd and @jasontedor

@pgomulka
Copy link
Contributor Author

and it is more unclear when you "clear" the loggers with
"logger._root": "INFO"
and then "unset" it with logger._root: null because a GET _cluster/setting would still show previously trace logger but not show that a root logger was set to INFO and "cleared" all other loggers

 "logger": {
      "org.elasticsearch.http": "trace"
      }
    }

@pgomulka
Copy link
Contributor Author

pgomulka commented Nov 18, 2020

we discussed this on core infra sync on 18th Nov.
points were raised that for some users this might be an intended behaviour to quickly override all settings for the hierarchy of loggers. User might not be aware that we set some loggers to error and this api allows him to enable everything to some more verbose levels. (example config/repository-s3/log4j2.properties)

However, this also has a downside that if a user overrides a logger which was not set by him earlier, he will not be aware that he is changing it and what was the previous level.

If we were to stay with current behaviour of allowing to apply the level for the whole hierarchy we should consider to:

  • remove the overrides which are invalidated by the setting in parent logger (so that get _cluster/settings response is correct)
  • be able to revert back the "previous" values (by previous we could mean the values that were set on startup)
  • possibly introduce a new get logging/settings api to show the current state of logging levels?

we did not reach conclusion and will continue discussion here

@pgomulka
Copy link
Contributor Author

two issues I found that were asked due to the way it works now
#53061 - user asks if this is a bug, we replied back that this was by design
#47251 - user asks about security logger being enabled after the root logger is cleared (nulled) - fixed by a security team by implementing a marker

@pgomulka pgomulka removed discuss needs:triage Requires assignment of a team area label labels Nov 19, 2020
@albertzaharovits
Copy link
Contributor

albertzaharovits commented Nov 19, 2020

I think that, given the users' expectations coming from log4j2 configuration, the behavior that less specific log level configs override more specific ones is counter intuitive.
Even leaving aside the log4j2 configuration baggage, being able to override more specific loggers by only setting the parent is useful, but it introduces the obverse problem where if you wish to set a very specific logger, you must walk up the parent chain and change those too.

My leaning is to implement the log4j2 behavior that more specific configs override parent configs.

If we desire to maintain the capability to affect all the logger in a hierarchy, I think we could employ the wildcard trick we use elsewhere, eg:

{
  "persistent": {},
  "transient": {
    "logger.org.elasticsearch.*": "info"
  }
}

will iteratively descend in the hierarchy of more specific loggers and change their levels. If no loggers exist, this is a no-op.
To set the parent logger level (which is overriden by any existing or future more specific loggers), leave out the *, eg.

{
  "persistent": {},
  "transient": {
    "logger.org.elasticsearch.*": "info"
  }
}

This latter call, unlike the previous one, will be reflected in the _cluster/settings API.

@ywangd
Copy link
Member

ywangd commented Nov 19, 2020

Just wanted to add the original issue that sparked this discussion. During recent BugPool meeting, a case was reported about ActiveDirectory realm configuration. As a standard approach, trace logging was suggested, i.e.:

PUT _cluster/settings
{ "transient": { "logger.org.elasticsearch.xpack.security.authc.ldap":"trace" } }

However, because the user had logger._root set to INFO, the above configuration ended up doing nothing. This was confusing and caused extra back and forth.

When working with customers through messages, it is challenging to configure the logging level with 100% confidence with this behaviour. We would need to first ask user to provide the content of GET _cluster/settings, then construct the PUT call accordingly to take care of any overridings.

@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@pgomulka pgomulka removed the needs:triage Requires assignment of a team area label label Dec 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Logging Log management and logging utilities :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

5 participants