Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1 day is only 23h #10255

Open
moreamazingnick opened this issue Nov 26, 2024 · 6 comments
Open

1 day is only 23h #10255

moreamazingnick opened this issue Nov 26, 2024 · 6 comments

Comments

@moreamazingnick
Copy link

Describe the bug

I recently noticed something strange using the following config:

object Service "check_by_selenium" {
    host_name = "***"
    check_command = "check_by_selenium"
    check_interval = 1d
    vars.selenium_project = "test"
    vars.selenium_testsuite = "icinga community forum"
    vars.selenium_with_images = true
}

My selenium icingaweb2 module writes to a database so I know the time the check is actually executed and the time between two checks is not 24h but only 23h

To Reproduce

  1. create a service with intervall 24h or 86400
  2. watch next execution

Screenshots

Selenium check:
Screenshot 2024-11-26 221430 - Kopie
Monitoring module:
Screenshot 2024-11-26 221518 - Kopie
IcingaDb module:
Screenshot 2024-11-26 221559

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Version used (icinga2 --version): 2.14.3
  • Operating System and version: Ubuntu 20.4
  • Enabled features (icinga2 feature list):
  • Icinga Web 2 version and modules (System - About): 2.12.2, icingadb 1.1.2
  • Config validation (icinga2 daemon -C):
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

Best Regards
Nicolas

@oxzi
Copy link
Member

oxzi commented Nov 27, 2024

First, this looks kinda strange. So strange that I verified at an own setup that a daily Service check there was actually scheduled 24 hours after the last run.

Could you please provide some more insights by running the following commands as your usual Icinga 2 user. Please alter the object name in the second command accordingly.

icinga2 daemon -C --dump-objects
icinga2 object list -n '__HOST_NAME__!check_by_selenium'

As you have timestamps in your database, would it be possible for you to go further back and verify if this has happened since "always" or if this is a more recent development?

Furthermore, being scheduled exactly at 06:00 seems a bit odd. Is there something rescheduling checks?

@moreamazingnick
Copy link
Author

moreamazingnick commented Nov 27, 2024

thanks for the fast reply

root@***:/home/nicolas# icinga2 daemon -C --dump-objects
[2024-11-27 08:10:05 +0000] information/cli: Icinga application loader (version: r2.14.3-1)
[2024-11-27 08:10:05 +0000] information/cli: Loading configuration file(s).
[2024-11-27 08:10:05 +0000] information/ConfigItem: Committing config item(s).
[2024-11-27 08:10:05 +0000] information/ApiListener: My API identity: ******
[2024-11-27 08:10:05 +0000] warning/ApplyRule: Apply rule 'TESTs' (in /var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/scheduled_downtime_apply.conf: 22:1-22:42) for type 'ScheduledDowntime' does not match anywhere!
[2024-11-27 08:10:05 +0000] warning/ApplyRule: Apply rule 'testtest' (in /var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/service_apply.conf: 76:1-76:24) for type 'Service' does not match anywhere!
[2024-11-27 08:10:05 +0000] warning/ApplyRule: Apply rule 'test' (in /var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/service_apply.conf: 105:1-105:20) for type 'Service' does not match anywhere!
[2024-11-27 08:10:05 +0000] warning/ApplyRule: Apply rule 'wordpress' (in /var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/servicesets.conf: 62:1-62:25) for type 'Service' does not match anywhere!
[2024-11-27 08:10:05 +0000] warning/ApplyRule: Apply rule 'exchange_cu_checker' (in /var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/servicesets.conf: 70:1-70:35) for type 'Service' does not match anywhere!
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 13 NotificationCommands.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 343 Notifications.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 41 HostGroups.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 32 Hosts.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 3 EventCommands.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 Influxdb2Writer.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 IcingaDB.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 6 Zones.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 3 Users.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 4 Endpoints.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 2 ApiUsers.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 368 CheckCommands.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 24 ServiceGroups.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 2 TimePeriods.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 4 ScheduledDowntimes.
[2024-11-27 08:10:05 +0000] information/ConfigItem: Instantiated 77 Services.
[2024-11-27 08:10:05 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-11-27 08:10:05 +0000] information/cli: Finished validating the configuration file(s).

root@***:/home/nicolas# icinga2 object list -n 'community.icinga.com!check_by_selenium'
Object 'community.icinga.com!check_by_selenium' of type 'Service':
  % declared in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 417:1-417:34
  * __name = "community.icinga.com!check_by_selenium"
  * action_url = ""
  * check_command = "check_by_selenium"
    % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/service_templates.conf', lines 1339:5-1339:39
  * check_interval = 86400
    % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/service_templates.conf', lines 1340:5-1340:23
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "check_by_selenium"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = false
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = "icinga-notifications-service-events"
    % = modified in '/etc/icinga2/features-enabled/icinga2.conf', lines 266:2-266:54
  * flapping_ignore_states = null
  * flapping_threshold = 0
  * flapping_threshold_high = 30
  * flapping_threshold_low = 25
  * groups = [ ]
  * host_name = "community.icinga.com"
    % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 418:5-418:38
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 3
  * name = "check_by_selenium"
  * notes = ""
  * notes_url = ""
  * package = "director"
  * retry_interval = 60
  * source_location
    * first_column = 1
    * first_line = 417
    * last_column = 34
    * last_line = 417
    * path = "/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf"
  * templates = [ "check_by_selenium", "generic-icinga-notifications-service", "check_by_selenium" ]
    % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 417:1-417:34
    % = modified in '/etc/icinga2/features-enabled/icinga2.conf', lines 265:1-265:63
    % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/director-global/service_templates.conf', lines 1338:1-1338:36
  * type = "Service"
  * vars
    * selenium_project = "test"
      % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 421:5-421:34
    * selenium_testsuite = "icinga community forum"
      % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 422:5-422:54
    * selenium_with_images = true
      % = modified in '/var/lib/icinga2/api/packages/director/3fa4fec3-3263-48a1-b688-fab62e0278e0/zones.d/master/services.conf', lines 423:5-423:36
  * volatile = false
  * zone = "master"

I previously had a interval of 24h minus 5minutes
than I changed that and the offset is since that.

but it also occurs for newly added checks on another machine
image

@oxzi
Copy link
Member

oxzi commented Nov 27, 2024

This seems to be the normal check scheduler behavior as also documented here.

void Checkable::UpdateNextCheck(const MessageOrigin::Ptr& origin)
{
double interval;
if (GetStateType() == StateTypeSoft && GetLastCheckResult() != nullptr)
interval = GetRetryInterval();
else
interval = GetCheckInterval();
double now = Utility::GetTime();
double adj = 0;
if (interval > 1)
adj = fmod(now * 100 + GetSchedulingOffset(), interval * 100) / 100.0;
if (adj != 0.0)
adj = std::min(0.5 + fmod(GetSchedulingOffset(), interval * 5) / 100.0, adj);
double nextCheck = now - adj + interval;
double lastCheck = GetLastCheck();
Log(LogDebug, "Checkable")
<< "Update checkable '" << GetName() << "' with check interval '" << GetCheckInterval()
<< "' from last check time at " << Utility::FormatDateTime("%Y-%m-%d %H:%M:%S %z", (lastCheck < 0 ? 0 : lastCheck))
<< " (" << GetLastCheck() << ") to next check time at " << Utility::FormatDateTime("%Y-%m-%d %H:%M:%S %z", nextCheck) << " (" << nextCheck << ").";
SetNextCheck(nextCheck, false, origin);
}

However, without having to investigated this further, there seems to be a difference between this code block and the linked documentation.

Could you temporary enable debug logs and wait for the matching "Update checkable" log message with its surrounding messages?

@moreamazingnick
Copy link
Author

moreamazingnick commented Nov 28, 2024

here you go:

[2024-11-28 03:51:38 +0000] debug/CheckerComponent: Scheduling info for checkable 'community.icinga.com!check_by_selenium' (2024-11-28 03:51:38 +0000): Object 'community.icinga.com!check_by_selenium', Next Check: 2024-11-28 03:51:38 +0000(1.73277e+09).
[2024-11-28 03:51:38 +0000] debug/CheckerComponent: Executing check for 'community.icinga.com!check_by_selenium'
[2024-11-28 03:51:38 +0000] debug/Checkable: Update checkable 'community.icinga.com!check_by_selenium' with check interval '86400' from last check time at 2024-11-27 05:00:33 +0000 (1.73268e+09) to next check time at 2024-11-29 03:38:19 +0000 (1.73285e+09).
[2024-11-28 03:51:38 +0000] notice/ApiListener: Relaying 'event::SetLastCheckStarted' message
[2024-11-28 03:51:38 +0000] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2024-11-28 03:51:38 +0000] notice/Process: Running command '/usr/bin/icingacli' 'selenium' 'check' '--project' 'test' '--testsuite' 'icinga community forum' '--with-images': PID 1105608
[2024-11-28 03:51:38 +0000] debug/CheckerComponent: Check finished for object 'community.icinga.com!check_by_selenium'

image

@moreamazingnick
Copy link
Author

Screenshot 2024-12-01 at 09 09 50

Just for the record I pressed the check now button (9:13) and the check is scheduled for 8:59 the next day

@oxzi
Copy link
Member

oxzi commented Jan 14, 2025

Please excuse the delayed response. In your prior comment, the following log file caught my attention.

[2024-11-28 03:51:38 +0000] debug/Checkable: Update checkable 'community.icinga.com!check_by_selenium' with check interval '86400' from last check time at 2024-11-27 05:00:33 +0000 (1.73268e+09) to next check time at 2024-11-29 03:38:19 +0000 (1.73285e+09).

It was

  • logged at 2024-11-28 03:51:38,
  • the check was last executed at 2024-11-27 05:00:33 and
  • the next check was calculated for 2024-11-29 03:38:19.

Since the Checkable::UpdateNextCheck method calculates the next check time based on the scheduling offset1, the interval and the current timestamp, the last execution time is not taken into account.

To make sure that the method itself does not create "too big" offsets, I have converted it in Python and ran the numbers. For a check interval of 24h, the maximum offset were roughly five minutes2.

Details

$ cat foo.py
from math import fmod

RAND_MAX = 32_767
interval = 86_400

for now in range(100):
    for sched_offset in range(RAND_MAX):
        adj = 0.0
        if interval > 1:
            adj = fmod(now * 100 + sched_offset, interval * 100) / 100.0
        if adj != 0.0:
            adj = min(0.5 + fmod(sched_offset, interval * 5) / 100.0, adj)

        print(f"{now}\t{sched_offset}\t{adj}")
$ python foo.py | sort -n -k 3 | head
0       0       0.0
0       1       0.01
0       2       0.02
0       3       0.03
0       4       0.04
0       5       0.05
0       6       0.06
0       7       0.07
0       8       0.08
0       9       0.09
$ python foo.py | sort -n -k 3 | tail
91      32766   328.16
92      32766   328.16
9       32766   328.16
93      32766   328.16
94      32766   328.16
95      32766   328.16
96      32766   328.16
97      32766   328.16
98      32766   328.16
99      32766   328.16

Thus, the relevant question is why the Checkable::UpdateNextCheck method was called at 03:51:38.

Footnotes

  1. Integer returned by rand_r, which is at least 32767.

  2. But even these five minutes are below the diff between the log line above :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants