Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] RBAC - Add API integration tests for bulk endpoints proceed in batches #191643

Open
jillguyonnet opened this issue Aug 28, 2024 · 1 comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@jillguyonnet
Copy link
Contributor

With #185040 the agents write APIs were made space-aware. As part of this change, a number of API integration tests were added, mostly in https://github.com/elastic/kibana/blob/main/x-pack/test/fleet_api_integration/apis/space_awareness/agents.ts.

However, these tests don't adequately cover the case of certain bulk queries when these are processed in batches. These test cases should be added for bulk endpoints that can result in batch processing.

Example

When a kuery is passed to POST /agents/bulk_reassign and the number of agents to be processed is higher than the batch size, a task is kicked off:

if (res.total <= batchSize) {
givenAgents = res.agents;
} else {
return await new ReassignActionRunner(
esClient,
soClient,
{
...options,
spaceId: currentNameSpace,
batchSize,
total: res.total,
newAgentPolicyId,
},
{ pitId: await openPointInTime(esClient) }
).runActionAsyncWithRetry();
}

The snippet below is a WIP test case that aims to cover this scenario. Note that it only worked intermittently (usually the first run on a fresh setup passed and the following ones timed out).

Note: looking at the ES logs, it seems that the tasks are retried forever after a test fails.

it('should reassign agents in the same space by kuery in batches', async () => {
       // Initial test to check that we are starting as expected
        let agents = await apiClient.getAgents();
        let agentPolicyIds = getAgentPolicyIds(agents);
        expect(agentPolicyIds).to.eql({
          [defaultSpaceAgent1]: defaultSpacePolicy1.item.id,
          [defaultSpaceAgent2]: defaultSpacePolicy2.item.id,
        });
        agents = await apiClient.getAgents(TEST_SPACE_1);
        agentPolicyIds = getAgentPolicyIds(agents);
        expect(agentPolicyIds).to.eql({
          [testSpaceAgent1]: spaceTest1Policy1.item.id,
          [testSpaceAgent2]: spaceTest1Policy2.item.id,
          [testSpaceAgent3]: spaceTest1Policy1.item.id,
        });

       // Do a bulk reassign with a small batch size, resulting in a task running
        const res = await apiClient.bulkReassignAgents(
          {
            agents: '*',
            policy_id: spaceTest1Policy2.item.id,
            batchSize: 1,
          },
          TEST_SPACE_1
        );

       // Utility function to verify the results of reassigning
        const verifyActionResult = async () => {
          agents = await apiClient.getAgents(TEST_SPACE_1);
          agentPolicyIds = getAgentPolicyIds(agents);
          expect(agentPolicyIds).to.eql({
            [testSpaceAgent1]: spaceTest1Policy2.item.id,
            [testSpaceAgent2]: spaceTest1Policy2.item.id,
            [testSpaceAgent3]: spaceTest1Policy2.item.id,
          });
        };

       // Retry logic that waits until the agent actions is ready to verify the results
        await new Promise((resolve, reject) => {
          let attempts = 0;
          const intervalId = setInterval(async () => {
            if (attempts > 20) {
              clearInterval(intervalId);
              reject(new Error('action timed out'));
            }
            ++attempts;
            const actionStatus = await apiClient.getActionStatus(TEST_SPACE_1);
            const action = actionStatus.items.find((a: any) => a.actionId === res.actionId);
            if (action && action.nbAgentsActioned === 3 && action.nbAgentsActionCreated === 2) {
              clearInterval(intervalId);
              await verifyActionResult();
              resolve({});
            }
          }, 1000);
        }).catch((e) => {
          throw e;
        });
      });
    });
@jillguyonnet jillguyonnet added the Team:Fleet Team label for Observability Data Collection Fleet team label Aug 28, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

jillguyonnet added a commit that referenced this issue Aug 30, 2024
## Summary

Closes #185040

Followup to:
#188507
#189519
#190069

This PR contains the last required changed for making Fleet agents write
APIs space aware:
* Implement space awareness in the following endpoints:
   * `POST /agents/{agentId}/unenroll`
   * `POST /agents/{agentId}/request_diagnostics`
   * `POST /agents/bulk_unenroll`
   * `POST /agents/bulk_request_diagnostics`
* Fix a bug in `POST /agents/bulk_update_agent_tags` where the space id
was not passed.
* Simply the setting of the `namespaces` property when creating agent
actions when the space id is known.
* Rename `currentNamespace` to `currentSpaceId` where appropriate (see
comment below).
* Add API integration tests and consolidate existing ones. ~⚠️ At the
time of writing, I would like there to be more tests covering bulk query
processing in batches, which are currently lacking. I have experienced
difficulties getting those tests to pass consistently.~ Filed [followup
issue](#191643) for those.

### A note on terminology

As pointed out in
#191083 (comment),
it seems that the terms "namespace" and "space id" are occasionally used
interchangeably in some parts of the codebase to refer to a Kibana
space. For instance, documents in Fleet indices (agents, agent policies,
agent actions...) [possess a `namespaces`
property](elastic/elasticsearch#108363) to track
the spaces they belong to. The current space id is also returned using
the Saved Object client's `getCurrentNamespace` function.

However, "namespace" is also a datastream property. In the Agent policy
settings UI, the "Spaces" property (which will be linked to the saved
object's `namespaces` property) is above the "Default namespace"
property, which relates to the integration's data streams:
<img width="1916" alt="Screenshot 2024-08-26 at 14 51 10"
src="https://github.com/user-attachments/assets/fe2a0948-3387-4a93-96dc-90fc5cf1a683">

This should not be a source of major issues, but is best clarified for
future reference. In this PR, I've replaced some occurrences of
`namespace` with `spaceId` where appropriate to try to maximise the use
of the latter.

### Testing

* This PR should be put through the Flaky Test Runner prior to merging
(I will kick the job).
* Manual testing should also be performed for the new endpoints
mentioned above.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed

---------

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

No branches or pull requests

2 participants