Block extensions disallowed by policy #3259

mgunnala · 2024-11-11T20:43:38Z

Description

Issue #

PR #2 for the policy engine allowlist feature:

invoke policy engine from exthandlers.py
block all extensions and report status if any errors are thrown during engine initialization
block any extensions that are disallowed by policy
add unit and e2e tests

PR information

The title of the PR is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
If applicable, the PR references the bug/issue that it fixes in the description.
New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

I have read the contribution guidelines.

azurelinuxagent/ga/policy/policy_engine.py

codecov · 2024-11-13T00:57:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.77%. Comparing base (3aebcdd) to head (a37508f).
Report is 328 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3259      +/-   ##
===========================================
+ Coverage    71.97%   72.77%   +0.79%     
===========================================
  Files          103      114      +11     
  Lines        15692    17081    +1389     
  Branches      2486     2277     -209     
===========================================
+ Hits         11295    12431    +1136     
- Misses        3881     4107     +226     
- Partials       516      543      +27

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

maddieford

Did an initial review not including tests

azurelinuxagent/ga/policy/policy_engine.py

maddieford · 2024-11-14T22:12:25Z

azurelinuxagent/ga/policy/policy_engine.py

+    """
+    # TODO: when CRP adds terminal error code for policy-related extension failures, set that as the default code.
+    def __init__(self, msg, inner=None, code=-1):
+        msg = "Extension is disallowed by agent policy and will not be processed: {0}".format(msg)


In case where agent failed to parse policy, I'm not sure we should say 'Extension is disallowed by policy'. In this case, extension is disallowed because there's some issue reading or parsing the policy.

I also am hesitant about 'agent policy' since policy is provided by customer

I could change this to "Extension will not be processed: "

Parsing errors (InvalidPolicyError) would look like "Extension will not be processed: customer-provided policy file (path) is invalid, please correct the following error..."

Extension disallowed errors (ExtensionPolicyError) would look like "Extension will not be processed: failed to enable extension CustomScript because extension is not specified in policy allowlist. To enable, add extension to the allowed list in policy file (path)."

I've updated the error message as discussed above.

maddieford · 2024-11-14T22:32:44Z

azurelinuxagent/ga/exthandlers.py

+            policy_op, policy_err_code = policy_err_map.get(ext_handler.state)
+            if policy_error is not None:
+                err = ExtensionPolicyError(msg="", inner=policy_error, code=policy_err_code)
+                self.__handle_and_report_ext_handler_errors(handler_i, err,


Does this create .status files for single config extensions?

I added a new function __handle_and_report_policy_error() - this should create a status file for any extension with settings.

maddieford · 2024-11-14T22:36:07Z

azurelinuxagent/ga/exthandlers.py

+                                                                                             ext_handler.name,
+                                                                                             conf.get_policy_file_path())
+                err = ExtensionPolicyError(msg, code=policy_err_code)
+                self.__handle_and_report_ext_handler_errors(handler_i, err,


same question here about .status file for single config extensions

I added a new function __handle_and_report_policy_error() - this should create a status file for any extension with settings.

maddieford

Changes look good. I'm going to spend tomorrow going through each e2e scenario and unit test, sorry for the slow review :/

maddieford · 2024-11-20T00:15:39Z

azurelinuxagent/ga/exthandlers.py

+                ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),
+                # TODO: CRP does not currently have a terminal error code for uninstall. Once CRP adds
+                # an error code for uninstall or for policy, use this code instead of PluginDisableProcessingFailed
+                # Note that currently, CRP waits for 90 minutes to time out for a failed uninstall operation, instead of


Could we add some more detail to this comment?
Something like:

Note that currently, CRP will poll until the agent does not report a status for an extension that should be uninstalled. In the case of a policy error, the agent will report a failed status on behalf of the extension, which will cause CRP to poll for the full timeout period, instead of failing fast.

maddieford · 2024-11-20T00:21:21Z

azurelinuxagent/ga/exthandlers.py

@@ -692,6 +734,26 @@ def __handle_and_report_ext_handler_errors(ext_handler_i, error, report_op, mess
            add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
                      message=message)

+    @staticmethod
+    def __handle_and_report_policy_error(ext_handler_i, error, report_op, message, report=True, extension=None):
+        # TODO: Consider merging this function with __handle_and_report_ext_handler_errors() above.


Could you please leave some comment explaining why we broke this into a separate function? For policy related failures, we want to fail extensions fast. CRP will continue to poll for single-config ext status until timeout, so agent should write a status for single-config extensions. The other function does not create that status and we didn't want to touch the other function without investigating the impact of that change further

mgunnala · 2024-11-20T17:54:19Z

azurelinuxagent/ga/exthandlers.py

+
+        # Create status file for extensions with settings (single and multi config).
+        if extension is not None:
+            ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,


create_status_file_if_not_exist() will not overwrite existing status file (for the current sequence number). Is this behavior acceptable?

we should overwrite the existing file with the policy error

We now overwrite the existing file with policy error. I've added an "overwrite" parameter and changed the function name to create_status_file( ).

maddieford · 2024-11-21T02:26:07Z

azurelinuxagent/ga/exthandlers.py

+                ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),
+                # Note: currently, when uninstall is requested for an extension, CRP polls until the agent does not
+                # report status for that extension, or until timeout is reached. In the case of a policy error, the
+                # agent reports failed status on behalf of the extension, which will cause CRP to  for the full timeout,


Suggested change

# agent reports failed status on behalf of the extension, which will cause CRP to for the full timeout,

# agent reports failed status on behalf of the extension, which will cause CRP to poll for the full timeout,

nit

Fixed, thanks!

tests/ga/test_extension.py

maddieford · 2024-11-21T17:51:30Z

tests_e2e/test_suites/ext_policy.yml

+name: "ExtensionPolicy"
+tests:
+  - "ext_policy/ext_policy.py"
+images: "random(endorsed)"


I think we should run this on more distros so we can get better coverage before releasing the changes

will running on all endorsed distros add too much overhead to the daily runs?

tests_e2e/tests/ext_policy/ext_policy.py

maddieford · 2024-11-21T18:38:34Z

tests_e2e/tests/ext_policy/ext_policy.py

+            fail(f"The agent should have reported an error trying to {operation} {extension_case.extension.__str__()} "
+                 f"because the extension is disallowed by policy.")
+        except Exception as error:
+            assert_that("Extension will not be processed" in str(error)) \


Could we also check for [ExtensionPolicyError] in the message to confirm the failure was due to policy

azurelinuxagent/ga/exthandlers.py

maddieford · 2024-11-21T18:58:32Z

tests/ga/test_multi_config_extension.py

@@ -630,6 +630,70 @@ def test_it_should_handle_and_report_enable_errors_properly(self):
                }
                self._assert_extension_status(sc_handler, expected_extensions)

+    def test_it_should_handle_and_report_disallowed_extensions_properly(self):


Could you also please add a case for multi config ext allowed by policy

maddieford · 2024-11-21T18:58:56Z

tests_e2e/test_suites/ext_policy_with_dependencies.yml

+name: "ExtPolicyWithDependencies"
+tests:
+  - "ext_policy/ext_policy_with_dependencies.py"
+images: "random(endorsed)"


same comment here, we should get more coverage than 1 run per day, maybe consider running on all endorsed, or 5-10 endorsed images per day

Updated to all endorsed, but I can change to 5-10 if this adds too much overhead.

tests_e2e/tests/ext_policy/ext_policy.py

narrieta

Posting comments for Agent code.

Will post comments for test code separately.

narrieta · 2024-11-26T18:32:53Z

azurelinuxagent/ga/policy/policy_engine.py

+    """
+    Error raised during agent extension policy enforcement.
+    """
+    def __init__(self, msg, code, inner=None):


The 'code' and 'inner' parameters are not in the same order as in the base class, which can lead to subtle coding errors.

I wrote it this way because I wanted "code" to be a required parameter in ExtensionPolicyEngine, but not "inner". But I can set a default value for "code", to keep them in the same order as in the base class.

I ended up removing this class, based on the other comments

narrieta · 2024-11-26T19:18:06Z

azurelinuxagent/ga/exthandlers.py

+            # Invoke policy engine to determine if extension is allowed. If not, block extension and report error on
+            # behalf of the extension.
+            policy_err_map = {
+                ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),


can we add a comment describing the elements in the tuple?

Added and moved this to the class level.

narrieta · 2024-11-26T19:19:43Z

azurelinuxagent/ga/exthandlers.py

        for extension, ext_handler in all_extensions:

            handler_i = ExtHandlerInstance(ext_handler, self.protocol, extension=extension)

+            # Invoke policy engine to determine if extension is allowed. If not, block extension and report error on
+            # behalf of the extension.
+            policy_err_map = {


seems like this is a constant... define it at the class level?

narrieta · 2024-11-26T19:36:22Z

azurelinuxagent/ga/exthandlers.py

+            # Invoke policy engine to determine if extension is allowed. If not, block extension and report error on
+            # behalf of the extension.
+            policy_err_map = {
+                ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),


'enable' and 'disable' are internal CRP/Agent operations; users are not aware of them. They should not be propagated to error messages displayed to the user

Updated this to "run" and "uninstall"

narrieta · 2024-11-26T19:41:42Z

azurelinuxagent/ga/exthandlers.py

+            }
+            policy_op, policy_err_code = policy_err_map.get(ext_handler.state)
+            if policy_error is not None:
+                err = ExtensionPolicyError(msg="", inner=policy_error, code=policy_err_code)


what is the intention of creating an exception object here? seems like it is only used to pass the error code, but it is never raised

I initially implemented the ExtensionPolicyError class to have a centralized error message for extensions blocked by policy, and also to pass the code. But you make a good point - since we never actually raise the exception, I've removed the ExtensionPolicyError class and now pass the code/message directly into the reporting function.

azurelinuxagent/ga/exthandlers.py

narrieta · 2024-11-26T20:21:05Z

azurelinuxagent/ga/exthandlers.py

+
+        # Create status file for extensions with settings (single and multi config).
+        if extension is not None:
+            ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,


we should overwrite the existing file with the policy error

narrieta · 2024-11-26T20:23:14Z

azurelinuxagent/ga/exthandlers.py

+            ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,
+                                                          operation=report_op, message=message)
+
+        if report:


when would report be False?

Currently it isn't ever false, I initially wrote it this way because I was copying the exact structure of __handle_and_report_ext_handler_errors(). But I've removed it since that parameter isn't being used for now.

narrieta · 2024-11-26T23:05:56Z

azurelinuxagent/ga/exthandlers.py

@@ -990,7 +1061,10 @@ def report_ext_handler_status(self, vm_status, ext_handler, goal_state_changed):
        # extension even if HandlerState == NotInstalled (Sample scenario: ExtensionsGoalStateError, DecideVersionError, etc)
        # We also need to report extension status for an uninstalled handler if extensions are disabled because CRP
        # waits for extension runtime status before failing the extension operation.
-        if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config or not conf.get_extensions_enabled():
+        # In the case of policy failures, we want to report extension status with a terminal code so CRP fails fast. If


Let's change this

# We also need to report extension status for an uninstalled handler if extensions are disabled because CRP # waits for extension runtime status before failing the extension operation. # In the case of policy failures, we want to report extension status with a terminal code so CRP fails fast. If # extension status is not present, collect_ext_status() will set a default transitioning status, and CRP will # wait for timeout.

to

# We also need to report extension status for an uninstalled handler if extensions are disabled, or if the extension # failed due to policy, because CRP waits for extension runtime status before failing the extension operation.

The intention of the change is to enter this condition when the extension fails due to policy, but this change means that we enter the condition whenever policy is enabled.

Is there any negative effect to calling ext_handler_i.get_extension_handler_statuses... whenever policy is enabled? Why is this behind the if condition in the first place?

Discussed offline - removed this condition, because it would cause us to enter the if condition even for non-policy-related uninstall failures.

maddieford

Left comments mainly for e2e tests. I'll review unit tests once the comments in exthandlers.py are resolved

maddieford · 2024-11-26T22:53:42Z

tests_e2e/tests/ext_policy/ext_policy.py

+
+        # Only allowlisted extensions should be processed.
+        # We only allowlist CustomScript: CustomScript should be enabled, RunCommand and AzureMonitor should fail.
+        # (Note that CustomScript blocked by policy is tested in a later test case.)


Adding comments to the review so I can follow the scenarios easier. Consider adding these as comments in the code, but ultimately up to you:
This policy tests the following scenarios:
- single config ext (CSE) enable operation succeeds when allowed by policy
- no-config ext (AzureMonitor) enable operation fails fast when disallowed by policy
- single multi-config instance (RunCommandHandler) enable operation fails fast when disallowed by policy

Added, thanks!

tests_e2e/tests/ext_policy/ext_policy.py

maddieford · 2024-11-26T23:04:04Z

tests_e2e/tests/ext_policy/ext_policy.py

+            self._operation_should_succeed("delete", azure_monitor)
+
+        # Should not uninstall disallowed extensions.
+        # CustomScript is removed from the allowlist: delete operation should fail.


This policy tests the following scenarios:
- a delete operation on a previously enabled single-config ext (CSE) which is now disallowed by policy fails fast
- multiple multi-config instances (RunCommandHandler and RunCommandHandler2) enable operations fail fast when disallowed by policy
- single-config ext (CSE) enable operation fails fast when disallowed by policy

Added, thanks!

tests_e2e/tests/ext_policy/ext_policy.py

maddieford · 2024-11-27T00:02:58Z

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py

+                    log.info("CRP returned an error for deletion operation, may be a false error. Checking agent log to determine if operation succeeded. Exception: {0}".format(crp_err))
+                    try:
+                        for ssh_client in ssh_clients.values():
+                            msg = ("Remove the extension slice: {0}".format(str(ext_to_delete)))


This message is related to cgroup. Right now it's logged even when cgroup isn't enabled (which might and probably should change in the future).

Instead, we should check that the handler was successfully uninstalled. i.e. the last ext status reported by the agent shouldn't include the handler:

2024-11-26T23:54:04.306568Z INFO ExtHandler ExtHandler Extension status: [('Microsoft.Azure.Monitor.AzureMonitorLinuxAgent', 'Ready')]

You might also consider doing this by checking the instance view

Updated this - we now check for the last status reported by the agent and confirm that there is no handler status reported. (Instance view doesn't work because CRP reports a stale status)

maddieford · 2024-11-27T00:09:45Z

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py

+    _test_cases = [
+        _should_fail_single_config_depends_on_disallowed_no_config,
+        _should_fail_single_config_depends_on_disallowed_single_config,
+        # TODO: RunCommand is unable to be installed properly, so these tests are currently disabled. Investigate the


Let's specify 'RunCommandHandler' since RunCommand is a different extension (confusing, I know :)

Also is it that RunCommandHandler is unable to be "installed properly" or uninstalled?

Good catch - it's that RunCommandHandler is unable to be uninstalled. I've updated this.

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py

maddieford · 2024-11-27T00:19:39Z

tests_e2e/tests/ext_policy/policy_dependencies_cases.py

+    return policy, template, expected_errors, deletion_order
+
+
+def _should_no_dependencies():


Looks like this may be leftover code, I don't see it referenced

Removed, thanks!

maddieford · 2024-11-27T00:19:48Z

tests_e2e/tests/scripts/agent_ext_workflow-check_data_in_agent_log.py

 from pathlib import Path
 from tests_e2e.tests.lib.agent_log import AgentLog


 def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", dest='data', required=True)
+    parser.add_argument("--after-timestamp", dest='after_timestamp', required=False)


nice :) thanks for adding this!

narrieta

Comments for test code

tests/ga/test_extension.py

narrieta · 2024-11-27T20:18:19Z

tests/ga/test_multi_config_extension.py

@@ -630,6 +630,98 @@ def test_it_should_handle_and_report_enable_errors_properly(self):
                }
                self._assert_extension_status(sc_handler, expected_extensions)

+    def test_it_should_handle_and_report_extensions_disallowed_by_policy_properly(self):


what does "properly" mean? (what is the expected behavior?)

changed this to "test_it_should_report_successful_status_for_extensions_allowed_by_policy" and "test_it_should_report_failed_status_for_extensions_disallowed_by_policy"

narrieta · 2024-11-27T20:36:12Z

tests/ga/test_multi_config_extension.py

+    def test_it_should_handle_and_report_extensions_disallowed_by_policy_properly(self):
+        """If multiconfig extension is disallowed by policy, all instances should be blocked."""
+        policy_path = os.path.join(self.tmp_dir, "waagent_policy.json")
+        patch('azurelinuxagent.common.conf.get_policy_file_path', return_value=str(policy_path)).start()


this should be done using the 'with' statement

Updated, thanks

narrieta · 2024-11-27T20:40:25Z

tests_e2e/test_suites/ext_policy_with_dependencies.yml

+  - "ext_policy/ext_policy_with_dependencies.py"
+images: "endorsed"
+executes_on_scale_set: true
+# This test should run on its own VMSS, because other tests may leave behind extensions


we should handle this in the test and allow it to share the vm

Added code at the start of each test to remove leftover extensions

tests_e2e/tests/ext_policy/ext_policy.py

narrieta · 2024-11-27T22:43:28Z

tests_e2e/tests/ext_policy/ext_policy.py

+
+        # RunCommandHandler is a multi-config extension, so we set up two instances (configurations) here and test both.
+        run_command = ExtPolicy.TestCase(
+            VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler,


should use VirtualMachineRunCommand instead of VirtualMachineExtensionClient

narrieta · 2024-11-27T23:04:44Z

tests_e2e/tests/ext_policy/ext_policy.py

+        self._create_policy_file(policy)
+        self._operation_should_succeed("enable", custom_script)
+        self._operation_should_fail("enable", run_command)
+        if VmExtensionIds.AzureMonitorLinuxAgent.supports_distro((self._ssh_client.run_command("get_distro.py").rstrip())):


how much coverage are we getting for this case?

I've changed ext_policy.yml to run on all endorsed distros, so this case will be run on all distros that support AzureMonitorLinuxAgent.

why do we need the check on distro if it's going to fail due to policy anyways?

I figured I would avoid testing this extension at all on an unsupported distro, but I can remove the distro check specifically for this case, if you think it would be useful.

yes, no need for that check

I realized this is causing an error - when the policy is changed to allow all, the agent tries to re-enable AMA with the next goal state, and the install fails on unsupported distros.

I've added the supported distro check again. Another option is to uninstall AMA before sending any other goal state and then check agent log to validate the uninstall (race condition will result in an error). If you think it's necessary, I could add that change.

narrieta · 2024-11-27T23:07:10Z

tests_e2e/tests/ext_policy/ext_policy.py

+        if VmExtensionIds.AzureMonitorLinuxAgent.supports_distro((self._ssh_client.run_command("get_distro.py").rstrip())):
+            self._operation_should_fail("enable", azure_monitor)
+
+        # When allowlist is turned off, all extensions should be processed.


How about CustomScript?

The final test case tests this - line 256

narrieta · 2024-11-27T23:20:44Z

tests_e2e/tests/ext_policy/ext_policy.py

+            self._operation_should_succeed("enable", azure_monitor)
+            self._operation_should_succeed("delete", azure_monitor)
+
+        # Should not uninstall disallowed extensions.


seems like

# Only allowlisted extensions should be processed.

and

# Should not uninstall disallowed extensions.

or are we trying to test something different?

I've updated the comments to make it more clear what is being tested with each policy

narrieta · 2024-11-27T23:22:30Z

tests_e2e/tests/ext_policy/ext_policy.py

+                }
+            }
+        self._create_policy_file(policy)
+        # # Known CRP issue - delete/uninstall operation times out instead of reporting an error.


This is not a CRP issue, but rather a design issue. Uninstall is best effort and never fails. You should consider checking the agent log for the error and then the instance view to confirm the extension was not uninstalled

I've enabled the failed deletion case and updated the test to validate the agent log and instance view.

azurelinuxagent/ga/exthandlers.py

maddieford · 2025-01-14T22:07:05Z

azurelinuxagent/ga/exthandlers.py

+        # - For extensions with settings (install/enable errors): report at both handler and extension levels.
+
+        # Keep a list of disallowed extensions so that report_ext_handler_status() can report status for them.
+        self.disallowed_ext_handlers.append(ext_handler_i.ext_handler)


When does an extension get removed from disallowed_ext_handlers? The ExtHandlersHandler is instantiated only once on agent init

appending to the list in a method named "report_policy_error" does not feel right (reporting should not change the state of the object). rename to handle_policy_error?

I can change this to "handle_ext_disallowed_error"

maddieford · 2025-01-14T22:15:54Z

azurelinuxagent/ga/exthandlers.py

@@ -498,11 +530,22 @@ def handle_ext_handlers(self, goal_state_id):
                logger.info("{0}: {1}".format(ext_full_name, msg))
                add_event(op=WALAEventOperation.ExtensionProcessing, message="{0}: {1}".format(ext_full_name, msg))
                handler_i.set_handler_status(status=ExtHandlerStatusValue.not_ready, message=msg, code=-1)


If we want to handle extensions_disabled scenario in this PR, I think we should remove the logic to create handler status and extension status here and call __report_policy_error instead.

Although, if we do that, I think we should rename __report_policy_error to something like '__report_ext_disallowed_error' and update the comments in that function to indicate it can be called for either extension disallowed by policy OR extension processing disabled via config.

We should also rename the policy_error_map to be generic to extensions disabled too

azurelinuxagent/ga/exthandlers.py

tests/data/test_waagent.conf

maddieford · 2025-01-15T15:25:50Z

tests_e2e/test_suites/ext_policy_with_dependencies.yml

@@ -0,0 +1,11 @@
+#
+# The test suite verifies that disallowed extensions are not processed, but the agent should still report status.


nit:

Suggested change

# The test suite verifies that disallowed extensions are not processed, but the agent should still report status.

# The test suite verifies that disallowed extensions and any extensions dependent on disallowed extensions are not processed, but the agent should still report status.

maddieford · 2025-01-15T15:55:15Z

tests_e2e/tests/ext_policy/ext_policy.py

+
+        # The full CRP timeout period for extension operation failure is 90 minutes. For efficiency, we reduce the
+        # timeout limit to 15 minutes here. We expect "delete" operations on disallowed VMs to reach timeout instead of
+        # failing fast, because delete is a best effort operation by-design and should not fail.


because delete is a best effort operation by-design and should not fail.
I think we should these details to this comment for future readers:

Delete operations on disallowed VMs reach timeout because the agent reports status for the failure, but CRP is waiting for no status to be reported for the extension.

CRP continues to poll for no status to be reported because delete is treated as a best effort operation and should not fail.

Essentially, we're forcing a delete failure, which is unexpected by CRP

The test no longer updates the crp timeout, but I've added these comments in _operation_should_fail()

maddieford · 2025-01-15T16:18:19Z

tests_e2e/tests/ext_policy/ext_policy.py

+        self._operation_should_fail("enable", run_command)
+        self._operation_should_fail("enable", run_command_2)
+        # Only call enable on AMA if supported. The agent will try to re-enable AMA as a part of the next goal state, when
+        # policy is changed to allow all. This will cause errors on an unsupported distro.


This comment is a little confusing.
I think you added this part as an explanation for why you only call enable on supported distros: The agent will try to re-enable AMA as a part of the next goal state, when policy is changed to allow all. This will cause errors on an unsupported distro.
But it sounds like this is an explanation of what is going to happen even if you only call enable on supported distros

Updated the comment

maddieford · 2025-01-15T16:28:25Z

tests_e2e/tests/ext_policy/ext_policy.py

+                assert_that(expected_msg in str(error)) \
+                    .described_as(
+                    f"Error message is expected to contain '{expected_msg}', but actual error message was '{error}'").is_true()
+                log.info(f"{extension_case.extension} {operation} failed as expected")


Suggested change

log.info(f"{extension_case.extension} {operation} failed as expected")

log.info(f"{extension_case.extension} {operation} failed as expected due to policy")

maddieford · 2025-01-15T16:29:41Z

tests_e2e/tests/ext_policy/ext_policy.py

+                    f"The agent should have reported an error trying to {operation} {extension_case.extension} "
+                    f"because the extension is disallowed by policy.")
+            except Exception as error:
+                expected_msg = "Extension will not be processed: failed to run extension"


I think the expected_msg should include that the ext failed to run due to being disallowed

tests_e2e/tests/ext_policy/ext_policy.py

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py

azurelinuxagent/ga/exthandlers.py

narrieta · 2025-01-15T17:55:40Z

azurelinuxagent/ga/exthandlers.py

@@ -276,6 +297,8 @@ class ExtHandlersHandler(object):
    def __init__(self, protocol):
        self.protocol = protocol
        self.ext_handlers = None
+        # Maintain a list of extensions that are disallowed, and always report extension status for disallowed extensions.
+        self.disallowed_ext_handlers = []


should be private

narrieta · 2025-01-16T21:20:25Z

tests_e2e/orchestrator/scripts/collect-logs

@@ -19,7 +19,8 @@ tar --exclude='journal/*' --exclude='omsbundle' --exclude='omsagent' --exclude='
    -czf "$logs_file_name" \
    /var/log \
    /var/lib/waagent/ \
-    $waagent_conf
+    $waagent_conf \
+    /etc/waagent_policy.json


no issue with collecting this, just curious: why do we need it?

Earlier, I found a bug where the policy file was not being updated by the test correctly. Collecting the policy file was useful for debugging that scenario, and I figured it might be helpful for future debugging too.

Yes, but as far as debugging the tests, they change the policy file multiple times and this collects only the last update. You should remove it from here and add it to the goal state history (ok to mark as TODO for next PR)

Added as a TODO

tests_e2e/tests/ext_policy/ext_policy.py

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py

narrieta · 2025-01-17T23:37:07Z

azurelinuxagent/ga/exthandlers.py

+                agent_conf_file_path = get_osutil().agent_conf_file_path
+                msg = "Extension '{0}' will not be processed since extension processing is disabled. To enable extension " \
+                      "processing, set Extensions.Enabled=y in '{1}'".format(ext_full_name, agent_conf_file_path)
+                # logger.info(msg)


This was removed in commit ce5cf20

maddieford · 2025-01-21T22:31:44Z

azurelinuxagent/ga/exthandlers.py

+        #       it instead of PluginDisableProcessingFailed below.
+        #
+        # Note: currently, when uninstall is requested for an extension, CRP polls until the agent does not
+        #       report status for that extension, or until timeout is reached. In the case of a policy error, the


nit

Suggested change

# report status for that extension, or until timeout is reached. In the case of a policy error, the

# report status for that extension, or until timeout is reached. In the case of an extension disallowed error, the

maddieford · 2025-01-21T22:33:34Z

azurelinuxagent/ga/exthandlers.py

@@ -276,6 +297,9 @@ class ExtHandlersHandler(object):
    def __init__(self, protocol):
        self.protocol = protocol
        self.ext_handlers = None
+        # Maintain a list of extension handler objects that are disallowed (e.g. blocked by policy, extensions disabled, etc.).
+        # Extension status is always reported for the extensions in this list. List is reset for each goal state.


nit: I think this comment should indicate that extension status is always reported for the extensions in the list if an extension status exists (i.e. it's not a no-config ext)

Updated this wording to

Extension status, if it exists, is always reported for the extensions in this list.

maddieford · 2025-01-21T22:37:26Z

azurelinuxagent/ga/exthandlers.py

-                                                          code=-1,
-                                                          operation=handler_i.operation,
-                                                          message=msg)
+                agent_conf_file_path = get_osutil().agent_conf_file_path


nit: should probably use get_agent_conf_file_path() instead

azurelinuxagent/ga/exthandlers.py

mgunnala added 3 commits November 8, 2024 16:22

Block disallowed extension processing

c2cc2c6

Enable policy e2e tests

151081d

Pylint

edec2af

mgunnala requested review from narrieta, ZhidongPeng, nagworld9 and maddieford as code owners November 11, 2024 20:43

mgunnala commented Nov 12, 2024

View reviewed changes

azurelinuxagent/ga/policy/policy_engine.py Outdated Show resolved Hide resolved

Fix e2e test failures

a37508f

mgunnala force-pushed the allowlist_2 branch from b440696 to a37508f Compare November 12, 2024 18:52

maddieford reviewed Nov 14, 2024

View reviewed changes

mgunnala and others added 2 commits November 18, 2024 12:19

Address review comments

b0da554

Merge branch 'develop' into allowlist_2

a4f5cab

maddieford reviewed Nov 20, 2024

View reviewed changes

Address review comments

699b9ba

mgunnala commented Nov 20, 2024

View reviewed changes

maddieford reviewed Nov 21, 2024

View reviewed changes

Address test review comments

86de0c5

mgunnala force-pushed the allowlist_2 branch from e909568 to 86de0c5 Compare November 21, 2024 21:41

mgunnala added 5 commits November 22, 2024 13:08

Remove status file for single-config

c3e9b89

Add back status file for single-config

65d7034

Run e2e tests on all endorsed

95f247a

Fix UT failures

3b18519

Pylint

63da127

narrieta reviewed Nov 26, 2024

View reviewed changes

Merge branch 'develop' into allowlist_2

471cd59

maddieford reviewed Nov 27, 2024

View reviewed changes

narrieta reviewed Nov 27, 2024

View reviewed changes

Address review comments for agent code

8ea989b

mgunnala commented Jan 14, 2025

View reviewed changes

azurelinuxagent/ga/exthandlers.py Outdated Show resolved Hide resolved

maddieford reviewed Jan 14, 2025

View reviewed changes

maddieford reviewed Jan 15, 2025

View reviewed changes

narrieta reviewed Jan 15, 2025

View reviewed changes

azurelinuxagent/ga/exthandlers.py Show resolved Hide resolved

narrieta reviewed Jan 15, 2025

View reviewed changes

mgunnala and others added 2 commits January 15, 2025 15:35

Address agent code comments

7069a0b

Merge branch 'develop' into allowlist_2

8231234

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy.py Outdated Show resolved Hide resolved

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy.py Outdated Show resolved Hide resolved

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy.py Outdated Show resolved Hide resolved

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py Outdated Show resolved Hide resolved

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py Outdated Show resolved Hide resolved

narrieta reviewed Jan 16, 2025

View reviewed changes

tests_e2e/tests/ext_policy/ext_policy_with_dependencies.py Outdated Show resolved Hide resolved

mgunnala and others added 3 commits January 17, 2025 14:16

Address test comments

ce5cf20

Merge branch 'develop' into allowlist_2

e222aa7

Small formatting fix

5abbfc6

narrieta reviewed Jan 17, 2025

View reviewed changes

narrieta and others added 4 commits January 17, 2025 16:43

Merge branch 'develop' into allowlist_2

6abd3f8

Fix comments

9fa38ea

Merge branch 'develop' into allowlist_2

7ebbd2b

Merge branch 'develop' into allowlist_2

1e6a82d

narrieta previously approved these changes Jan 21, 2025

View reviewed changes

maddieford reviewed Jan 21, 2025

View reviewed changes

Address review comments

8cd64d4

mgunnala dismissed narrieta’s stale review via 8cd64d4 January 22, 2025 15:35

maddieford approved these changes Jan 22, 2025

View reviewed changes

narrieta approved these changes Jan 22, 2025

View reviewed changes

narrieta merged commit 9e2ada3 into Azure:develop Jan 22, 2025
11 checks passed

mgunnala deleted the allowlist_2 branch January 22, 2025 16:40

	# agent reports failed status on behalf of the extension, which will cause CRP to for the full timeout,
	# agent reports failed status on behalf of the extension, which will cause CRP to poll for the full timeout,

		return policy, template, expected_errors, deletion_order


		def _should_no_dependencies():

		@@ -0,0 +1,11 @@
		#
		# The test suite verifies that disallowed extensions are not processed, but the agent should still report status.

	log.info(f"{extension_case.extension} {operation} failed as expected")
	log.info(f"{extension_case.extension} {operation} failed as expected due to policy")

	# report status for that extension, or until timeout is reached. In the case of a policy error, the
	# report status for that extension, or until timeout is reached. In the case of an extension disallowed error, the

Block extensions disallowed by policy #3259

Block extensions disallowed by policy #3259

Conversation

mgunnala commented Nov 11, 2024

Description

PR information

Quality of Code and Contribution Guidelines

codecov bot commented Nov 13, 2024

Codecov Report

maddieford left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgunnala Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maddieford left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

narrieta left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maddieford left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

narrieta left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgunnala Nov 15, 2024 •

edited

Loading