The behaviour of cluster settings API for changing log levels #65208

pgomulka · 2020-11-18T15:08:54Z

there is some confusion about how configuring logging levels via cluster settings api works.

as per doc we change logging level for a logger via

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.transport": "DEBUG"
  }
}

the current logging also trickle down the logging hierarchy and sets that level on all the loggers down the hierarchy. Meaning loggers logger.org.elasticsearch.transport.X and logger.org.elasticsearch.transport.X.Y will have a level DEBUG too.
This is expected and works the same way as log4j configuration from a file.

However if you set a logger down the hierarchy earlier
"logger.org.elasticsearch.transport.X": "TRACE"
the subsequent call to set "logger.org.elasticsearch.transport": "DEBUG" will
overridde "logger.org.elasticsearch.transport.X" logger to DEBUG

that is different to what you would expect from log4j configuration,
-- but it make using the API easier. You don't have to remember which loggers down the hierarchy you have changed and allows you to "clear" them back
the get /_cluster/settings still shows logger.org.elasticsearch.transport.X" = trace even thought you overriden it with logger.org.elasticsearch.transport = debug
in order to really enforce logger.org.elasticsearch.transport.X to be trace you have to trigger cluster state settings change. Meaning you have to change it to something else (like debug) and then switch it back to intended trace value.

I think we should either

change the behaviour so that we don't clear the overrides when setting a parent logger
or
document that behaviour and
remove the override setting from cluster settings as the result from get /_cluster/settings is incorrect

reproduce steps:

curl --request PUT \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json' \
  --data '{
  "transient": {
		"logger.org.elasticsearch.http.HttpTracer": "trace"
  }
}'

expect:

curl --request GET \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json'

result with

{
  "persistent": {},
  "transient": {
    "logger": {
      "org.elasticsearch.http":  "trace"

      }
    }
  }
}

also expect that HttpTracer starts logging its trace messages

curl --request PUT \
  --url http://localhost:9200/_cluster/settings \
  --header 'authorization: Basic ZWxhc3RpYzpwYXNzd29yZA==' \
  --header 'content-type: application/json' \
  --data '{
  "transient": {
		"logger.org.elasticsearch": "INFO"
  }
}'

HttpTracer no longer logs

the result of _cluster/settings is not clear

curl --request GET \
  --url http://localhost:9200/_cluster/settings \
  --header 'content-type: application/json'

{
  "persistent": {},
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": "info",
        "elasticsearch.http": "trace"
      }
    }
  }
}

the change was intentionally introduced #20463

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-11-18T15:08:57Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

pgomulka · 2020-11-18T15:11:32Z

cc @ywangd and @jasontedor

pgomulka · 2020-11-18T15:52:11Z

and it is more unclear when you "clear" the loggers with
"logger._root": "INFO"
and then "unset" it with logger._root: null because a GET _cluster/setting would still show previously trace logger but not show that a root logger was set to INFO and "cleared" all other loggers

 "logger": {
      "org.elasticsearch.http": "trace"
      }
    }

pgomulka · 2020-11-18T17:18:09Z

we discussed this on core infra sync on 18th Nov.
points were raised that for some users this might be an intended behaviour to quickly override all settings for the hierarchy of loggers. User might not be aware that we set some loggers to error and this api allows him to enable everything to some more verbose levels. (example config/repository-s3/log4j2.properties)

However, this also has a downside that if a user overrides a logger which was not set by him earlier, he will not be aware that he is changing it and what was the previous level.

If we were to stay with current behaviour of allowing to apply the level for the whole hierarchy we should consider to:

remove the overrides which are invalidated by the setting in parent logger (so that get _cluster/settings response is correct)
be able to revert back the "previous" values (by previous we could mean the values that were set on startup)
possibly introduce a new get logging/settings api to show the current state of logging levels?

we did not reach conclusion and will continue discussion here

pgomulka · 2020-11-19T08:06:18Z

two issues I found that were asked due to the way it works now
#53061 - user asks if this is a bug, we replied back that this was by design
#47251 - user asks about security logger being enabled after the root logger is cleared (nulled) - fixed by a security team by implementing a marker

albertzaharovits · 2020-11-19T11:03:24Z

I think that, given the users' expectations coming from log4j2 configuration, the behavior that less specific log level configs override more specific ones is counter intuitive.
Even leaving aside the log4j2 configuration baggage, being able to override more specific loggers by only setting the parent is useful, but it introduces the obverse problem where if you wish to set a very specific logger, you must walk up the parent chain and change those too.

My leaning is to implement the log4j2 behavior that more specific configs override parent configs.

If we desire to maintain the capability to affect all the logger in a hierarchy, I think we could employ the wildcard trick we use elsewhere, eg:

{
  "persistent": {},
  "transient": {
    "logger.org.elasticsearch.*": "info"
  }
}

will iteratively descend in the hierarchy of more specific loggers and change their levels. If no loggers exist, this is a no-op.
To set the parent logger level (which is overriden by any existing or future more specific loggers), leave out the *, eg.

{
  "persistent": {},
  "transient": {
    "logger.org.elasticsearch.*": "info"
  }
}

This latter call, unlike the previous one, will be reflected in the _cluster/settings API.

ywangd · 2020-11-19T13:27:03Z

Just wanted to add the original issue that sparked this discussion. During recent BugPool meeting, a case was reported about ActiveDirectory realm configuration. As a standard approach, trace logging was suggested, i.e.:

PUT _cluster/settings
{ "transient": { "logger.org.elasticsearch.xpack.security.authc.ldap":"trace" } }

However, because the user had logger._root set to INFO, the above configuration ended up doing nothing. This was confusing and caused extra back and forth.

When working with customers through messages, it is challenging to configure the logging level with 100% confidence with this behaviour. We would need to first ask user to provide the content of GET _cluster/settings, then construct the PUT call accordingly to take care of any overridings.

pgomulka added >bug discuss :Core/Infra/Settings Settings infrastructure and APIs :Core/Infra/Logging Log management and logging utilities needs:triage Requires assignment of a team area label labels Nov 18, 2020

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Nov 18, 2020

jrodewig mentioned this issue Nov 18, 2020

[DOCS] Clarify logger-package relationship #65169

Merged

pgomulka removed discuss needs:triage Requires assignment of a team area label labels Nov 19, 2020

rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020

pgomulka removed the needs:triage Requires assignment of a team area label label Dec 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The behaviour of cluster settings API for changing log levels #65208

The behaviour of cluster settings API for changing log levels #65208

pgomulka commented Nov 18, 2020 •

edited

Loading

elasticmachine commented Nov 18, 2020

pgomulka commented Nov 18, 2020

pgomulka commented Nov 18, 2020

pgomulka commented Nov 18, 2020 •

edited

Loading

pgomulka commented Nov 19, 2020

albertzaharovits commented Nov 19, 2020 •

edited

Loading

ywangd commented Nov 19, 2020

The behaviour of cluster settings API for changing log levels #65208

The behaviour of cluster settings API for changing log levels #65208

Comments

pgomulka commented Nov 18, 2020 • edited Loading

elasticmachine commented Nov 18, 2020

pgomulka commented Nov 18, 2020

pgomulka commented Nov 18, 2020

pgomulka commented Nov 18, 2020 • edited Loading

pgomulka commented Nov 19, 2020

albertzaharovits commented Nov 19, 2020 • edited Loading

ywangd commented Nov 19, 2020

pgomulka commented Nov 18, 2020 •

edited

Loading

pgomulka commented Nov 18, 2020 •

edited

Loading

albertzaharovits commented Nov 19, 2020 •

edited

Loading