[Fleet] Return manifest info from get package info endpoints #143198

hop-dev · 2022-10-12T14:32:13Z

In #142353 we moved to using the package content during installation. But we do not use the package content for the get package info APIs. This is because not all manifest fields are returned from the package registry API, and new fields going forward will not be added to the package registry API. This means to get all of the package info, we must go to the archive.

For viewing a given integration (title, description, screenshots etc) , the package registry API is sufficient, however when we go to create a package policy, we need the full manifest. This has caused issues recently e.g #131609 is blocked until we return all fields. Also input packages which were recently added define many new fields, as a result we now have a fork in the get package info logic for input packages, where we always get the info from the archive, this is OK for input packages as they are very small by design.

Going forward we will need a way to get the full manifest information ideally without downloading and extracting the full package archive which is not bounded in size.

Issues/concerns:

how do we minimise the amount we download and extract in order to get all of the information needed to install a package?
how do we keep the APIs speedy
any solution will need to play well with package validation

Solution Ideas

1. Package registry provides a way for us to get all manifest.yml files directly

The package registry API could allow fleet to get the main manifest file and all data stream manifest files directly. This would mean the registry team don't have to update their schema/parsing logic every time a new field is added, this is handled by fleet.

this reduces the bandwidth of downloading the full archive, and reduces the overhead of extracting the archive, but does increase the complexity of the EPR API. And are there backwards compatibility issues?

2. Fleet has different APIs for "viewing" a package vs. getting "full" info needed for create package policy

Fleet could only load the full package info from the archive when the user goes to create a package policy for the first time. This means when clicking and viewing integrations the API will still be quick, but on loading the package policy create page there could be a long delay.

One concern also is this makes our APIs more complex, we would have a get info endpoint as well as a flag or other API for getting the "full" info

3. ???

elasticmachine · 2022-10-12T14:32:46Z

Pinging @elastic/fleet (Team:Fleet)

hop-dev · 2022-10-12T14:34:00Z

@joshdover I have done a brain dump here of what we chatted about. I think we will need to address this sooner rather than later.

CC @kpollich for info!

joshdover · 2022-10-17T11:11:37Z

For viewing a given integration (title, description, screenshots etc) , the package registry API is sufficient, however when we go to create a package policy, we need the full manifest.

In terms of unblocking #131609, this part is interesting to me. I'm not incredibly familiar with this code, but it seems to me that the way the package policy form works today requires that we have all the information about the package's data streams to construct the package policy via the API, when all we should really need is the input vars and stream structure. Essentially, we're requiring the UI to build the full policy rather than the server where all the necessary data should already be available.

I think we may be able to leverage the new package policy API to solve this:

UI loads necessary info to show the package policy form from the registry
Use the new package policy API format to request creating the policy
1. Backend installs the full package first
2. Backend uses the package contents to build the full policy, without relying on the form to send all the necessary data in the API call

That said, I don't think this will be a solution for all cases. Even in this case, if we wanted to display anything in the UI for the allow_routing flag, we'd need to download the entire package to be able to show this since it's not currently specified on the registry API.

I also don't think having different behavior for different package types or requiring additional API calls for each data stream manifest is going to scalable. My suggestion would be that we pursue some version of option 2, and accepting the penalty this will incur for large packages, like ones that contain ML models. This package doesn't even really need a package policy anyways, and could have it's "Add" button switched to just install the package directly.

hop-dev · 2022-10-18T14:40:46Z

UI loads necessary info to show the package policy form from the registry

For input packages this would need some more info returned from the registry API: elastic/package-registry#864. And I think in the future there would be more fields added to the API as packages become more complex, I think this is what the EPR team are trying to move away from.

This package doesn't even really need a package policy anyways, and could have it's "Add" button switched to just install the package directly.

I like this, this could also work for sample data packages which we've talked about before.

hop-dev · 2022-10-18T14:48:31Z

I generally agree with some version of option 2, we have GET /packages/${name}/${version} at the moment.
We could either:

1. add a new /packages/${name}/${version}/full endpoint or similar (could also be a query param) which returns the package info from the archive and leave the existing endpoint as it is.
1. have the existing endpoint return the full info from the archive and add a new /packages/${name}/${version}/summary or similar endpoint which only returns title, description, screenshots etc

Where option 1 is a less breaking change

kpollich · 2022-10-19T14:22:20Z

add a new /packages/${name}/${version}/full endpoint or similar (could also be a query param) which returns the package info from the archive and leave the existing endpoint as it is.

A query param ?full=true aligns with how our existing agent policy endpoint works when requesting the full policy object, so I'm in favor of this change.

joshdover · 2022-10-28T09:50:34Z

@hop-dev are you actively working on this? Should we assign this to you? If not, we need to get this scheduled so we can finally unblock the allow_routing requirement from APM.

hop-dev · 2022-10-28T12:54:06Z

@joshdover not currently but this has just blocked a ticket I was working on ⬆️ , so I could look to pick it up this sprint, need to check with @kpollich

ruflin · 2022-11-01T08:40:40Z

After the package is installed, will there be any future request to the registry? Next time a policy is created, will Fleet try to reach out to the registry again? How does this work in the context of "upload package" feature?

hop-dev · 2022-11-01T11:04:10Z

Just to emphasise that with the chosen solution (have a new ?full param on the get info endpoint which gets all fields from the archive) our behaviour when getting package info wont change, this new param will only be used when opening the package policy editor.

@ruflin regarding your question, for a registry package that is installed we currently attempt go to the registry first to get the package info then fall back to the archive, I won't be changing this behaviour. That is unless ?full is provided in which case the archive will always be used.

ruflin · 2022-11-02T08:04:04Z

for a registry package that is installed we currently attempt go to the registry first to get the package info then fall back to the archive

Doing this has a few downsides:

Adds latency to the request instead of being able to do a local lookup
Creates cost on the CDN side
Creates 2 different paths for packages manually uploaded vs packages which are part of the registry

kpollich · 2022-11-02T11:37:39Z

Adds latency to the request instead of being able to do a local lookup

Creates cost on the CDN side

FWIW we do cache most of our requests that incur EPR requests on Kibana's side as well via cache-control headers on Fleet's /epm/packages/* endpoints - see #125794.

This does help with perceived latency on the client side and limits traffic to the CDN.

During our get info workflow, we always check the registry first today in order to determine whether the package in question is out of date. A better strategy here would probably be to make the "is there an update available" part of the process a separate, non-blocking request. There's a lot of improvements we can make to the critical path of loading the integration to reduce latency here.

hop-dev added enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team labels Oct 12, 2022

hop-dev mentioned this issue Oct 13, 2022

[Fleet] Simplify skipArchive behaviour for input packages #141671

Merged

joshdover mentioned this issue Oct 17, 2022

[Fleet] Add support for allow_routing data streams option #131609

Closed

9 tasks

hop-dev mentioned this issue Oct 28, 2022

[Fleet] Honor value of source_mode for data streams #141211

Closed

hop-dev mentioned this issue Nov 4, 2022

[Fleet] Add ?full option to get package info endpoint to return all package fields #144343

Merged

1 task

hop-dev closed this as completed in #144343 Nov 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fleet] Return manifest info from get package info endpoints #143198

[Fleet] Return manifest info from get package info endpoints #143198

hop-dev commented Oct 12, 2022 •

edited

Loading

elasticmachine commented Oct 12, 2022

hop-dev commented Oct 12, 2022

joshdover commented Oct 17, 2022 •

edited

Loading

hop-dev commented Oct 18, 2022

hop-dev commented Oct 18, 2022 •

edited

Loading

kpollich commented Oct 19, 2022

joshdover commented Oct 28, 2022

hop-dev commented Oct 28, 2022 •

edited

Loading

ruflin commented Nov 1, 2022

hop-dev commented Nov 1, 2022 •

edited

Loading

ruflin commented Nov 2, 2022

kpollich commented Nov 2, 2022

[Fleet] Return manifest info from get package info endpoints #143198

[Fleet] Return manifest info from get package info endpoints #143198

Comments

hop-dev commented Oct 12, 2022 • edited Loading

Solution Ideas

1. Package registry provides a way for us to get all manifest.yml files directly

2. Fleet has different APIs for "viewing" a package vs. getting "full" info needed for create package policy

3. ???

elasticmachine commented Oct 12, 2022

hop-dev commented Oct 12, 2022

joshdover commented Oct 17, 2022 • edited Loading

hop-dev commented Oct 18, 2022

hop-dev commented Oct 18, 2022 • edited Loading

kpollich commented Oct 19, 2022

joshdover commented Oct 28, 2022

hop-dev commented Oct 28, 2022 • edited Loading

ruflin commented Nov 1, 2022

hop-dev commented Nov 1, 2022 • edited Loading

ruflin commented Nov 2, 2022

kpollich commented Nov 2, 2022

hop-dev commented Oct 12, 2022 •

edited

Loading

joshdover commented Oct 17, 2022 •

edited

Loading

hop-dev commented Oct 18, 2022 •

edited

Loading

hop-dev commented Oct 28, 2022 •

edited

Loading

hop-dev commented Nov 1, 2022 •

edited

Loading