Error setting schedule on macOS #64153

Closed
cdalvaro opened this issue Apr 25, 2023 · 7 comments
Assignee: dwoz
Labels: Bug (broken, incorrect, or confusing behavior), MacOS (pertains to the OS of fruit), needs-triage

cdalvaro (Contributor) commented Apr 25, 2023

Description

When trying to set or update a schedule with the schedule.present state, the state fails.

This is my current state:

periodic_update schedule set:
  schedule.present:
    - name: periodic_update
    - function: state.apply
    - run_on_start: True
    - hours: 2
    - returner: highstate
    - enabled: True
    - persist: True

The result of applying the state is the following:

local:
    ----------
    changes:
        ----------
    comment:
        Failed to add job periodic_update to schedule.
    result:
        False

After debugging schedule.py (the module/state/util), I found that the issue is here:

event_ret = event_bus.get_event(
    tag="/salt/minion/minion_schedule_add_complete",
    wait=30,
)

The failure happens because no event is ever received within the 30-second wait.
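For context, the surrounding logic looks roughly like this (a simplified sketch, not the exact Salt source): the module fires a request onto the minion event bus and then blocks waiting for the completion event, so a timeout surfaces as the generic failure comment seen above.

# Simplified sketch (not the exact Salt source) of the pattern around
# the quoted call: get_event() returns None if no event arrives in time.
event_ret = event_bus.get_event(
    tag="/salt/minion/minion_schedule_add_complete",
    wait=30,
)
if event_ret and event_ret["complete"]:
    ret["comment"] = "Added job {} to schedule.".format(name)
    ret["result"] = True
else:
    # This is the branch hit in this issue: no event was received.
    ret["comment"] = "Failed to add job {} to schedule.".format(name)
    ret["result"] = False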

Setup

I'm currently running salt 3005 on an Apple Silicon Mac (macOS 13.3.1) with salt installed via Homebrew. However, the issue also happens on an Intel Mac (macOS 12) running salt 3006 installed via the onedir package.

Steps to Reproduce the behavior

The issue can also be reproduced with the following command:

sudo -E salt-call --local schedule.add periodic_update \
  function='state.apply' run_on_start=True hours=2 \
  returner='highstate' enabled=True persist=True

local:
    ----------
    changes:
        ----------
    comment:
        Failed to add job periodic_update to schedule.
    result:
        False
cdalvaro added the Bug and needs-triage labels on Apr 25, 2023
cdalvaro (Contributor, Author) commented Apr 25, 2023

Maybe related to #60765 and #62334.

anilsil added this to the Sulfur v3006.2 milestone on Apr 25, 2023
dwoz (Contributor) commented Apr 25, 2023

Possible duplicate of #64111

dwoz self-assigned this on Apr 25, 2023
cdalvaro (Contributor, Author) commented:

I've been seeing this error since salt 3005, so I'm not sure whether this is a duplicate of #64111.

OrangeDog added the MacOS label on Apr 27, 2023
cdalvaro (Contributor, Author) commented:

More details on this.

It seems that when I start the salt-minion, the schedule is set and jobs are scheduled:

sudo -E salt-call schedule.list
local:
    schedule:
      periodic_update:
        enabled: true
        function: state.apply
        hours: 2
        jid_include: true
        maxrunning: 1
        name: periodic_update
        returner: highstate
        run_on_start: true
        saved: true

sudo -E salt-call schedule.show_next_fire_time periodic_update
local:
    ----------
    next_fire_time:
        2023-05-20T12:10:02
    result:
        True

However, after running the first highstate, the minion loses its connection with the master and the schedule disappears.

Moreover, while the highstate is running, the master can ping the minion, but once the highstate has finished, the master cannot ping it anymore.

So it looks more like an issue with my config than with the code itself.

cdalvaro (Contributor, Author) commented May 20, 2023

It seems to be related to some kind of timeout in my macOS states.

After commenting out several macOS states to reduce the highstate run time, the minion stays connected to the master. However, with the full configuration, the minion is no longer connected once the highstate has finished.

Maybe related to:

cdalvaro (Contributor, Author) commented:

So I've finally found the issue.

Running salt-minion in the foreground (not daemonized), I saw the following log messages after running the highstate and killing the process:

[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] Closing IPCMessageSubscriber instance
[WARNING ] Minion received a SIGINT. Exiting.
[INFO    ] Shutting down the Salt Minion
The Salt Minion is shutdown. Minion received a SIGINT. Exited.
The minion failed to return the job information for job 20230524093102271062. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.
Future <salt.ext.tornado.concurrent.Future object at 0x10a5ba110> exception was never retrieved: Traceback (most recent call last):
  File "/opt/homebrew/Cellar/salt/3006.1/libexec/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 309, in wrapper
    yielded = next(result)
  File "/opt/homebrew/Cellar/salt/3006.1/libexec/lib/python3.10/site-packages/salt/minion.py", line 2921, in handle_event
    self._return_pub(data, ret_cmd="_return", sync=False)
  File "/opt/homebrew/Cellar/salt/3006.1/libexec/lib/python3.10/site-packages/salt/minion.py", line 2263, in _return_pub
    log.trace("ret_val = %s", ret_val)  # pylint: disable=no-member
UnboundLocalError: local variable 'ret_val' referenced before assignment
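The traceback itself boils down to a generic Python pitfall (a minimal sketch below, not Salt's actual _return_pub code): ret_val is only bound when the send succeeds, so the later log.trace reference raises UnboundLocalError whenever the send fails first.

def return_pub_sketch(channel):
    # Hypothetical minimal reproduction of the failure mode in the
    # traceback: ret_val is bound only if channel.send() returns
    # normally, so the log line below references an unbound name
    # whenever send() raises, e.g. on a timed-out job return.
    try:
        ret_val = channel.send({"cmd": "_return"})
    except TimeoutError:
        pass  # ret_val was never assigned on this path
    print("ret_val = %s" % ret_val)  # UnboundLocalError if send() raised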

My salt-master config is set with:

worker_threads: 6
reactor_worker_threads: 10

and I only manage 8 minions. The official documentation recommends roughly one worker thread per 200 minions, so I understand my salt-master config is fine.

Tracking the message, I found issue #53474, where @weswhet suggests setting pub_ret to False. After setting the schedule to not return the job to the master (return_job: False), the minion keeps its connection with the master after applying the highstate, and everything seems to be working properly. 🎉
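For reference, this is the state from the top of the issue with the workaround applied (a sketch of what worked for me, assuming schedule.present forwards the return_job option to the schedule item):

periodic_update schedule set:
  schedule.present:
    - name: periodic_update
    - function: state.apply
    - run_on_start: True
    - hours: 2
    - returner: highstate
    - enabled: True
    - persist: True
    - return_job: False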

I would now like to understand why the master is unable to handle the return. It would also be great to have this information in the minion's log even when the minion runs as a service, since finding the message with the real error has not been easy.
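In the meantime, raising the minion's file log level should help surface tracebacks like this one when the minion runs as a service (a sketch; log_level_logfile is a standard minion config option):

# Minion config: raise the file log verbosity so tracebacks like the
# one above land in the minion log even when running as a service.
log_level_logfile: debug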

I'll close this issue in the coming days.

cdalvaro (Contributor, Author) commented Jun 4, 2023

I haven't found out why my master is unable to process the job returned from the minion, but since it seems to be related to my config, I'm closing this issue now.

However, I hope it can help other people facing similar problems.
