Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix self-update frequency to spread over 24 hrs for regular type and 4 hrs for hotfix #2948

Merged
merged 4 commits into from
Oct 23, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 35 additions & 42 deletions azurelinuxagent/ga/agent_update_handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ def __init__(self, protocol):
self._is_requested_version_update = True # This is to track the current update type(requested version or self update)
self.update_state = AgentUpdateHandlerUpdateState()

def __should_update_agent(self, requested_version):
def __check_if_agent_update_allowed_and_update_next_upgrade_times(self, requested_version):
narrieta marked this conversation as resolved.
Show resolved Hide resolved
"""
requested version update:
update is allowed once per (as specified in the conf.get_autoupdate_frequency())
Expand All @@ -70,24 +70,20 @@ def __should_update_agent(self, requested_version):
if next_attempt_time > now:
return False
# The time limit elapsed for us to allow updates.
self.update_state.last_attempted_requested_version_update_time = now
return True
else:
next_hotfix_time, next_normal_time = self.__get_next_upgrade_times(now)
upgrade_type = self.__get_agent_upgrade_type(requested_version)

if (upgrade_type == AgentUpgradeType.Hotfix and next_hotfix_time <= now) or (
upgrade_type == AgentUpgradeType.Normal and next_normal_time <= now):
# Update the last upgrade check time even if no new agent is available for upgrade
self.update_state.last_attempted_hotfix_update_time = now
self.update_state.last_attempted_normal_update_time = now
return True
return False

def __update_last_attempt_update_times(self):
now = datetime.datetime.now()
if self._is_requested_version_update:
self.update_state.last_attempted_requested_version_update_time = now
else:
self.update_state.last_attempted_normal_update_time = now
self.update_state.last_attempted_hotfix_update_time = now

def __should_agent_attempt_manifest_download(self):
"""
The agent should attempt to download the manifest if
Expand Down Expand Up @@ -322,49 +318,46 @@ def run(self, goal_state):
if "Missing requested version" in GAUpdateReportState.report_error_msg:
GAUpdateReportState.report_error_msg = ""

if requested_version == CURRENT_VERSION:
# Check if an update is allowed and update next upgrade times even if no new agent is available for upgrade
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the warning at line 308?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if some reason requested version is empty or missing in the goal state. we still want to update with old logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that is the case, since the check is on the boolean flag, and if the boolean is true, the version cannot be missing

Copy link
Contributor Author

@nagworld9 nagworld9 Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean flag is local to agent. If we enable the GA versioning but the version property is empty in GS for whatever reason (may be issue in CRP), in that case, we log the warning and continue with self-update.

Copy link
Member

@narrieta narrieta Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that behavior is correct. If CRP is indicating that the machine is enrolled to RSM updates, we should never fallback to self updates (unless the CRP flag changes, of course).

If the flag is True and the version is missing, then, as you point out, it may be an issue on CRP. We should then report this error and skip the update.

Copy link
Contributor Author

@nagworld9 nagworld9 Oct 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially this flag is not there when we implemented. That time we decided to do like this because things are in flux and if issue in crp and fix takes days to reach prod, till then we don't do the update. That's not good for customer without update so we want to continue with self-update and report status with this msg as RSM update has error.

After all that discussion today we think we shouldn't update sure I'll change it. But do you think is this issue in current release?

I'll consider changing as part of RSM stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the code and add it when it is needed, or comment it out or mark it somehow. Otherwise this may be missed in a future review. Marking it facilitates reviews.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment to revisit this.

if not self.__check_if_agent_update_allowed_and_update_next_upgrade_times(requested_version):
return

# Check if an update is allowed
if not self.__should_update_agent(requested_version):
if requested_version == CURRENT_VERSION:
return

if warn_msg != "":
self.__log_event(LogLevel.WARNING, warn_msg)

try:
# Downgrades are not allowed for self-update version
# Added it in try block after agent update timewindow check so that we don't log it too frequently
if not self.__check_if_downgrade_is_requested_and_allowed(requested_version):
return

daemon_version = self.__get_daemon_version_for_update()
if requested_version < daemon_version:
# Don't process the update if the requested version is less than daemon version,
# as historically we don't support downgrades below daemon versions. So daemon will not pickup that requested version rather start with
# installed latest version again. When that happens agent go into loop of downloading the requested version, exiting and start again with same version.
#
raise AgentUpdateError("The Agent received a request to downgrade to version {0}, but downgrading to a version less than "
"the Agent installed on the image ({1}) is not supported. Skipping downgrade.".format(requested_version, daemon_version))

msg = "Goal state {0} is requesting a new agent version {1}, will update the agent before processing the goal state.".format(
self._gs_id, str(requested_version))
self.__log_event(LogLevel.INFO, msg)

agent = self.__download_and_get_agent(goal_state, agent_family, agent_manifest, requested_version)
# Downgrades are not allowed for self-update version
# Added it in try block after agent update timewindow check so that we don't log it too frequently
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try block is gone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was added because I need finally block before. Now I put that finallly logic when check the times, So I don't need finally, so I removed try too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what does that comment mean? can you clarify it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should remove this msg

if not self.__check_if_downgrade_is_requested_and_allowed(requested_version):
return

if agent.is_blacklisted or not agent.is_downloaded:
msg = "Downloaded agent version is in bad state : {0} , skipping agent update".format(
str(agent.version))
self.__log_event(LogLevel.WARNING, msg)
return
daemon_version = self.__get_daemon_version_for_update()
if requested_version < daemon_version:
# Don't process the update if the requested version is less than daemon version,
# as historically we don't support downgrades below daemon versions. So daemon will not pickup that requested version rather start with
# installed latest version again. When that happens agent go into loop of downloading the requested version, exiting and start again with same version.
#
raise AgentUpdateError("The Agent received a request to downgrade to version {0}, but downgrading to a version less than "
"the Agent installed on the image ({1}) is not supported. Skipping downgrade.".format(requested_version, daemon_version))

msg = "Goal state {0} is requesting a new agent version {1}, will update the agent before processing the goal state.".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is misleading, since it is logged for self-update too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. As part of IsVersionFromRSM PR, I improved this msg to include update mode. But that PR is in hold since we were having discussion with Kashif/Srinath

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please improve this message for this release and you can adjust it later once the RSM stuff is ready

self._gs_id, str(requested_version))
self.__log_event(LogLevel.INFO, msg)

agent = self.__download_and_get_agent(goal_state, agent_family, agent_manifest, requested_version)

if agent.is_blacklisted or not agent.is_downloaded:
msg = "Downloaded agent version is in bad state : {0} , skipping agent update".format(
str(agent.version))
self.__log_event(LogLevel.WARNING, msg)
return

# We delete the directory and the zip package from the filesystem except current version and target version
self.__purge_extra_agents_from_disk(CURRENT_VERSION, known_agents=[agent])
self.__proceed_with_update(requested_version)
# We delete the directory and the zip package from the filesystem except current version and target version
self.__purge_extra_agents_from_disk(CURRENT_VERSION, known_agents=[agent])
self.__proceed_with_update(requested_version)

finally:
self.__update_last_attempt_update_times()

except Exception as err:
if isinstance(err, AgentUpgradeExitException):
Expand Down
Loading