-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet][EPM] Save installed package assets in ES #83391
Conversation
Pinging @elastic/ingest-management (Team:Ingest Management) |
path: { type: 'text' }, | ||
media_type: { type: 'keyword' }, | ||
data_utf8: { type: 'text' }, | ||
data_base64: { type: 'binary' }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with two fields because I thought we wanted binary
type for the binary assets vs text
but perhaps one field is possible/desirable?
package_version: { type: 'keyword' }, | ||
install_source: { type: 'keyword' }, | ||
path: { type: 'text' }, | ||
media_type: { type: 'keyword' }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also considered mime_type
. Happy to hear any comments / suggestions.
@jfsiii How are we going to consume these assets later? Do we want to use a generated id like |
@nchaulet good question, perhaps we should discuss in #83426 so others see & comment? I believe combining the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
19b9365
to
cfdb0c9
Compare
@elasticmachine merge upstream |
I left a comment #83426 (comment) to see if there was some concern about having so many saved objects. If not, I'm happy for this to go in. |
579f355
to
5ae363a
Compare
@elasticmachine merge upstream |
The biggest cons to not using the
Without a strong justification, I lean heavily towards using the |
@joshdover Thanks for the quick and helpful reply re: The initial commit used Saved Objects out of habit, but was switched to ES documents because I'm not familiar with the tradeoffs. Getting migrations seemed like a Good Thing ™️ but I didn't know about the impact that number (typically hundreds or thousands but could be tens of thousands) or size of documents. Distribution of the file sizes is shown in https://docs.google.com/document/d/18XoS6CSl9UxxPPBt9LXuJngf1Jv-4tl3jY6l19U1yH8/edit?ts=5fc4f129#heading=h.qnioeo56ltp4 if that helps. Let me know if I can provide any more context |
You're correct that we decided to store the PDFs and PNGs created by Reporting in a separate index because of file size concerns. Currently, Reporting uses a new index per week to allow users to implement their own retention policy for how long to keep these reports around. In general, I think we should be defaulting to using saved-objects to store Kibana-specific assets until there's a substantial reason to not do so. #80912 includes some of the historic reasons for why saved-objects haven't been used, but if it's possible to address the underlying deficiencies in saved-objects, we should do so instead of resorting to a custom index. How large of documents are we talking about for packages? If we do decide to go the custom index route, it should be prefixed by a |
Yes!
Migrations only get applied to objects that have an actual migration function registered, so this should not really be an issue, even with 10k objects.
Most of these look mostly small (it appears 75% of the packages are less than 500kb). Are there any known reasons these will get larger in the future? Also this text from the proposal:
Have we actually verified that a base64 encoded zip file is larger than the unencoded strings of all the individual files? It's possible that the compression on the zip outweighs the 30% increase from base64 encoding. From a long-term maintenance standpoint, I think it's better if we can limit the number of documents in Elasticsearch to just the zip'd packages. This will make removing & reinstalling any corrupted packages much simpler for our support team in the case that something breaks. I also think the runtime of unzipping a 1MB file is quite small and I don't imagine this to be a common operation (maybe I'm wrong?) that we need to optimize for. |
I could have worded this more clearly. I meant that migrations seemed like a good reason to use them but, migrations aside, I didn't know if the number & size of objects we'd be creating would be a concern. Doesn't seem so.
That's size of the individual files. Here are the current archive sizes The distribution might say the same, but I expect we will see an increase in packages at the larger end as we open to the public and get different types of integrations (e.g. example logs
In my mind it's less about total size store on disk as it is minimum & median request. It's less taxing to transfer & process several smaller strings vs one large one. Especially because (de)serialization is CPU-intensive and often blocking. In my experience with web performance, time spent in JSON.parse/stringify and other string operations is a primary culprit.
This should always be possible via a single action, whether an http call, a "delete by" query, or action in the UI.
We'll use the local assets on the package details page to avoid hitting the registry for installed assets. We don't have to have them stored individually to do this. We could simply keep the unzipped buffers in memory as we do now. It's more about what I mentioned above about keeping the transaction small. |
One difference here is that Reporting outputs could potentially grow indefinitely, whereas package assets are more finite. It seems unlikely that a user would install tens of thousands of packages (or that we'd ever even have that many in our registry). This makes file size potentially less of a concern, but I think we should verify it. Could we do a simple test where we create 10,000 of these objects in the
Makes sense, and I haven't spent time looking at how Fleet processes this data. If we store the decompressed output of files, wouldn't we still need to parse the JSON on the responses from ES? Even so, your point about minimum & median request times checks out. I thought the size distribution charts were the entire packages, not the files. I agree that decompressing & parsing 10MB+ during a request is not going to be a good experience. |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]Distributable file count
Page load bundle
Saved Objects .kibana field count
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM John. Thanks again for teh changes to the endpoint generator
@kobelb I missed this question. I believe I addressed it in comments to @joshdover but here's some data on them |
@joshdover I didn't get a chance to run the load tests with 10,000 Saved Objects, but I did run them with every currently available package installed. That resulted in 1951 saved objects
This command
shows the assets installed (only name, version & path) https://gist.github.com/jfsiii/9b2c25fe204a7ebc1912692ed11c13e3 Here are the results of the load tests. They look quite similar to me |
I'm merging this so it's available for @skh, @neptunian, and others to use before Feature Freeze. I believe I've addressed all the feedback from @kobelb, @joshdover, @legrego and others but please comment if there's anything remaining you'd like addressed. We'll add more tests and perhaps instrumentation in follow up PRs. |
## Summary Store package assets (from Registry or local upload) in Elasticsearch. Related to proposal [issue](#83426) & [document](https://docs.google.com/document/d/18XoS6CSl9UxxPPBt9LXuJngf1Jv-4tl3jY6l19U1yH8) * New `epm-packages-assets` saved objects are stored on `.kibana` index, like our existing saved object `epm-packages` * Asset id is uuid v5 based on the package name, package version & file path. See 1974324 * Add a list of IDs of all the installed assets, to `epm-packages` saved object. Like the existing `installed_` properties. [Example](https://github.com/elastic/kibana/pull/83391/files#diff-fa07cac51b6a49bf1e4824bc2250c9a77dac6c7d6b0a56020f559ef1ff9be25fR491-R512) from a test <details><summary>Mapping for new Saved Object</summary> https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fsaved_objects%2Findex.ts#L329-L339 </details> <details><summary>Additional property on existing <code>epm-packages</code> Saved Object</summary> https://github.com/elastic/kibana/blob/c4f27ab25715a225c710727a37f5f105364e2f72/x-pack/plugins/fleet/server/saved_objects/index.ts#L306-L312 I don't think the saved object changes are strictly required. It can be removed without changing much about how things work - Pros: - Preserves accurate record of the assets added at installation time. Separates what assets are currently available for package-version from what was installed. They _should_ be the same, but things happen. - Avoids a query to get the installed assets before operating on them - Cons: - size/noise? Could be tens or hundreds of ids - migration? </details> ### More details **When are saved objects added?** During installation, after all other actions have succeeded, just before marking the save object as installed, we commit all the files from the package to ES https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2F_install_package.ts#L193-L198 **When are documents removed from the index?** In the `removeInstallation` function which is called in response to a `DELETE /api/fleet/epm/packages/pkgkey` https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Fremove.ts#L72 or a failed package (re-)installation https://github.com/elastic/kibana/blob/bf068739acce044ac27902253e8fc31df621f081/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Finstall.ts#L145 **How are we using these assets?** We're not, currently. Here's an example showing how we could update [`getFileHandler`](https://github.com/elastic/kibana/blob/514b50e4c2d7a3be79d77e73838ff57b6cf1304a/x-pack%2Fplugins%2Ffleet%2Fserver%2Froutes%2Fepm%2Fhandlers.ts#L101) to check for local assets before reaching out to the Registry if we wished. It's not DRY, but it does work ```typescript const esDocRoot = `http://elastic:changeme@localhost:9200/${PACKAGE_ASSETS_INDEX_NAME}/_doc`; const escapedDocId = encodeURIComponent(`${pkgName}-${pkgVersion}/${filePath}`); const esRes = await fetch(`${esDocRoot}/${escapedDocId}`); const esJson = await esRes.json(); if (esJson.found) { const asset: PackageAsset = esJson._source; const body = asset.data_utf8 || Buffer.from(asset.data_base64, 'base64'); return response.ok({ body, headers: { 'content-type': asset.media_type, // should add our own `cache-control` header here // kibana default is prevents caching: `private, no-cache, no-store, must-revalidate` // #83631 }, }); } ``` ### Checklist _updated tests to include new saved object output, no tests added yet_ - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
install_started_at: { type: 'date' }, | ||
install_version: { type: 'keyword' }, | ||
install_status: { type: 'keyword' }, | ||
install_source: { type: 'keyword' }, | ||
}, | ||
}, | ||
}, | ||
[ASSETS_SAVED_OBJECT_TYPE]: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to add this new type to allSavedObjectTypes
so that your users are granted access to these new saved objects? It's probably not strictly necessary since you still require superuser
privileges for Fleet, and superusers can do everything regardless of what you specify here
kibana/x-pack/plugins/fleet/server/plugin.ts
Lines 104 to 112 in aa07f5c
const allSavedObjectTypes = [ | |
OUTPUT_SAVED_OBJECT_TYPE, | |
AGENT_POLICY_SAVED_OBJECT_TYPE, | |
PACKAGE_POLICY_SAVED_OBJECT_TYPE, | |
PACKAGES_SAVED_OBJECT_TYPE, | |
AGENT_SAVED_OBJECT_TYPE, | |
AGENT_EVENT_SAVED_OBJECT_TYPE, | |
ENROLLMENT_API_KEYS_SAVED_OBJECT_TYPE, | |
]; |
* Revert "[Fleet][EPM] Save installed package assets in ES (#83391)" This reverts commit 81a340e. * Revert 00c2e96 Co-authored-by: Kibana Machine <[email protected]>
* Revert "[Fleet][EPM] Save installed package assets in ES (#83391)" This reverts commit 81a340e. * Revert 00c2e96 Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
Summary
Store package assets (from Registry or local upload) in Elasticsearch. Related to proposal issue & document
epm-packages-assets
saved objects are stored on.kibana
index, like our existing saved objectepm-packages
epm-packages
saved object. Like the existinginstalled_
properties. Example from a testMapping for new Saved Object
kibana/x-pack/plugins/fleet/server/saved_objects/index.ts
Lines 329 to 339 in 37f7b6d
Additional property on existing
epm-packages
Saved Objectkibana/x-pack/plugins/fleet/server/saved_objects/index.ts
Lines 306 to 312 in c4f27ab
I don't think the saved object changes are strictly required. It can be removed without changing much about how things work
- Preserves accurate record of the assets added at installation time. Separates what assets are currently available for package-version from what was installed. They should be the same, but things happen.
- Avoids a query to get the installed assets before operating on them
- size/noise? Could be tens or hundreds of ids
- migration?
More details
When are saved objects added?
During installation, after all other actions have succeeded, just before marking the save object as installed, we commit all the files from the package to ES
kibana/x-pack/plugins/fleet/server/services/epm/packages/_install_package.ts
Lines 193 to 198 in 37f7b6d
When are documents removed from the index?
In the
removeInstallation
function which is called in response to aDELETE /api/fleet/epm/packages/pkgkey
kibana/x-pack/plugins/fleet/server/services/epm/packages/remove.ts
Line 72 in 37f7b6d
or a failed package (re-)installation
kibana/x-pack/plugins/fleet/server/services/epm/packages/install.ts
Line 145 in bf06873
How are we using these assets?
We're not, currently. Here's an example showing how we could update
getFileHandler
to check for local assets before reaching out to the Registry if we wished. It's not DRY, but it does workChecklist
updated tests to include new saved object output, no tests added yet