Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[System ready] Extend sysmonitor functionality to wait for host daemons #18817

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
System-ready: SNMP should wait for system ready event
Signed-off-by: Yevhen Fastiuk <[email protected]>
  • Loading branch information
fastiuk committed May 20, 2024
commit 85ca8a9022c2f10058ca56d633a0ba750824c491
31 changes: 31 additions & 0 deletions files/build_templates/docker_image_ctl.j2
Original file line number Diff line number Diff line change
@@ -108,6 +108,37 @@ function preStartAction()
fi
{%- elif docker_container_name == "snmp" %}
$SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you enabling this feature by default for snmp? Shouldn't it be based on configuration?

@dgsudharsan It is disabled for snmp by default: see files/build_templates/init_cfg.json.j2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is different. In the init_cfg.j2 you are considering snmp as irrelevent for system ready but using system ready as a criteria to spawn snmp. Is this correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you consider handling this logic through featured? Today we have a similar flow in featured where certain daemons are spawned after portinitdone. In this case the critera can be systemd. It can be much easy to extend this to multiple daemons

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is different. In the init_cfg.j2 you are considering snmp as irrelevent for system ready but using system ready as a criteria to spawn snmp. Is this correct?

Yes, it is indeed correct. Irrelevant for system ready means that system ready feature won't just collect readiness status of snmp systemd service. But SNMP must be started after system is ready. The logic actually does't break each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you consider handling this logic through featured? Today we have a similar flow in featured where certain daemons are spawned after portinitdone. In this case the critera can be systemd. It can be much easy to extend this to multiple daemons

I looked at the code in featured and I don't want to break anything there by adding wait for system ready. Most of the delayed services there depend on portinit done, and I don't want to add another wait point there as it will just make the logic there more complex and less clear.

# Wait for system ready. Continue after timeout if we won't hear anything
# from DB. Read timeout from config file and add two extra minutes on top of
# it.
CONFFILE=system_health_monitoring_config.json
PLATFORM=${PLATFORM:-`sonic-cfggen -d -v DEVICE_METADATA.localhost.platform`}
TIMEOUT=$(cat /usr/share/sonic/device/$PLATFORM/$CONFFILE | jq ".timeout")
SYSTEM_READY_EVENT="SYSTEM_READY|SYSTEM_STATE"
CHECK_SR_EVENT="sonic-db-dump -n STATE_DB -k $SYSTEM_READY_EVENT"

# Set default if not found
if [[ -z $TIMEOUT ]]; then
MESSAGE="Failed to read timeout from config file, assuming default is \
10 minutes"
echo $MESSAGE
TIMEOUT=10
fi

# Add to extra minutes and convert to seconds and
TIMEOUT=$(((TIMEOUT + 2) * 60))

# Waiting for event
echo "Waiting for $SYSTEM_READY_EVENT event"
while [[ -z "$($CHECK_SR_EVENT | jq '.[]["value"]["Status"]')" ]]; do
UPTIME=$(awk -F. '{print $1}' /proc/uptime)
if [[ $UPTIME -gt $TIMEOUT ]]; then
echo "Got uptime timeout - starting snmp..."
break
fi
sleep 15
done
{%- else %}
: # nothing
{%- endif %}