Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Store installed packages in cluster #83426

Closed
jfsiii opened this issue Nov 16, 2020 · 16 comments · Fixed by #83384
Closed

[Fleet] Store installed packages in cluster #83426

jfsiii opened this issue Nov 16, 2020 · 16 comments · Fixed by #83384
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team v8.0.0

Comments

@jfsiii
Copy link
Contributor

jfsiii commented Nov 16, 2020

Summary

Store the individual package assets (e.g. templates, images, etc) in an Elasticsearch saved object. index
Discussion follow up to proposal ticket: #81110

Open questions

  1. What should the new index name be? [Fleet] Proposal: Store installed packages in cluster #81110 says "also store them in a dedicated ES index" Should this use the default index like EPM's other saved objects or should it use a different (new?) one
    • Went with elastic-packages-assets.
  2. Any restrictions on file types to be stored?
    • e.g. is a JavaScript file allowed? an exe or other binary?
  3. Any restrictions on file size?
    • Are there limits (hard or practical) from Elasticsearch?
    • Any limits for other parts of the system,
    • Is there one limit for all types or different limits?
    • Should we validate the sizes of all assets before attempting to add any?
    • How do we deal with a file size violation?
      • Do we warn & skip the asset?
      • Does it invalidate the installation entirely?

Initial PR: #83391

ES index mappings

const packageAssetMappings = {
package_name: { type: 'keyword' },
package_version: { type: 'keyword' },
install_source: { type: 'keyword' },
asset_path: { type: 'keyword' },
media_type: { type: 'keyword' },
data_utf8: { type: 'text', index: false },
data_base64: { type: 'binary' },
};

@jfsiii jfsiii self-assigned this Nov 16, 2020
@jfsiii jfsiii added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 16, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@skh
Copy link
Contributor

skh commented Nov 16, 2020

Should this use the default index like EPM's other saved objects or should it use a different (new?) one

How many saved objects will this create and how large will they be?

Any restrictions on file types to be stored?

Can this be offloaded to the (yet to be implemented) validator? I think we can safely restrict files to only allowing valid package contents.

Any restrictions on file size?

Those you mentioned -- restrictions by ES, by the saved objects implementation (if we use it). As long as this is a developer feature I'd probably lean towards warning and skipping.

@ph
Copy link
Contributor

ph commented Nov 16, 2020

@skh I am reading your last comment and you have mentioned that warning and skipping should be ok? I am not if this should be the behavior. If an assets is skipped it mean that my integration is probably invalid and I cannot test it?

@ph
Copy link
Contributor

ph commented Nov 17, 2020

@jfsiii I've looked at #81110 from @ruflin and this issue.

I don't think I understand how it will look like from both of you, can you take a bit of time to write it down your vision a google docs, maybe one or two pages? All the question you have raised should be in there and the document structure with pros and cons. And this make sure it solve the issues: upload installation, index pattern generation, multiple kibanas, removal of integration.

Initially it's to solve only our problem but maybe others project in Kibana would also benefit from it and if we have traces that would help sharing it and understand the decision.

I think that would simplify consuming the feature and would make sure we are aligned.

@jfsiii jfsiii reopened this Nov 17, 2020
@skh
Copy link
Contributor

skh commented Nov 18, 2020

I am not if this should be the behavior. If an assets is skipped it mean that my integration is probably invalid and I cannot test it?

My idea was that when an asset is too large to be stored, but was otherwise handled fine, a warning for the developer would be ok, but I understand your point.

I think checking for file sized could be done during the validation step, so ideally the part this issue is about should not ever run into files that are too large.

@neptunian
Copy link
Contributor

neptunian commented Nov 18, 2020

I think I asked something similar before, but what is the use case for needing to have install_source as a property in the saved object? Since only one package is installed at a time, I thought we'd only have one package and version in the storage at a time, regardless of its install source.

@skh
Copy link
Contributor

skh commented Nov 18, 2020

I think I asked something similar before, but what is the use case for needing to have install_source as a property in the saved object? Since only one package is installed at a time, I thought we'd only have one package and version in the storage at a time, regardless of its install source.

  • At the time the property was added, discussions about storing packages had just started, and we didn't know if we'd use it for registry packages as well.
  • It is valuable information for the user to know from which source the package came -- upload, our registry, at some point maybe some other registry.
  • When the package can't be found in the storage index (which is an error case but I'm sure it will happen), it is valuable to know if the package can be re-fetched from the registry, or if the user should be warned, or asked to re-upload the package.

@neptunian
Copy link
Contributor

To clarify, I was referring to the new storage saved object, not the epm-packages saved object. I suppose if we needed to rollback to the package in the storage index we would want to have that information again.

@skh
Copy link
Contributor

skh commented Nov 18, 2020

To clarify, I was referring to the new storage saved object

Thanks for the clarification! I was indeed referring to epm-packages in my answer.

@neptunian
Copy link
Contributor

Is there any concern we should have about creating a saved object for each asset in the package? Installing a package like AWS, for example, would create hundreds of saved objects.

@jfsiii
Copy link
Contributor Author

jfsiii commented Nov 22, 2020

@neptunian

Is there any concern we should have about creating a saved object for each asset in the package? Installing a package like AWS, for example, would create hundreds of saved objects.

I have switched to using plain ES documents.

I kept install_source in the document because I believe it's helpful metadata. As you said, only one version of a package asset will be in the storage index, but knowing the origin might be useful. It's not required to fetch the document or any other actions.

@neptunian
Copy link
Contributor

neptunian commented Nov 22, 2020

I think I asked something similar before, but what is the use case for needing to have install_source as a property in the saved object? Since only one package is installed at a time, I thought we'd only have one package and version in the storage at a time, regardless of its install source.

  • At the time the property was added, discussions about storing packages had just started, and we didn't know if we'd use it for registry packages as well.
  • It is valuable information for the user to know from which source the package came -- upload, our registry, at some point maybe some other registry.
  • When the package can't be found in the storage index (which is an error case but I'm sure it will happen), it is valuable to know if the package can be re-fetched from the registry, or if the user should be warned, or asked to re-upload the package.

I've discussed with John and I think it makes sense to drop the install source as part of the key name whether for the cache or the package storage. Since we are only storing the one installed, there shouldn't be more than one version of a package at a time so i don't think we need the install source as a key or as a way to search through the cache or storage.

@skh
Copy link
Contributor

skh commented Nov 23, 2020

I've discussed with John and I think it makes sense to drop the install source as part of the key name whether for the cache or the package storage. Since we are only storing the one installed, there shouldn't be more than one version of a package at a time so i don't think we need the install source as a key or as a way to search through the cache or storage.

While it is not needed for search, I still think the install source is interesting metadata to know where the package came from. Would keeping it hurt us in any way, and how?

@neptunian
Copy link
Contributor

I've discussed with John and I think it makes sense to drop the install source as part of the key name whether for the cache or the package storage. Since we are only storing the one installed, there shouldn't be more than one version of a package at a time so i don't think we need the install source as a key or as a way to search through the cache or storage.

While it is not needed for search, I still think the install source is interesting metadata to know where the package came from. Would keeping it hurt us in any way, and how?

Sorry, yes, I think we should keep it as metadata.

jfsiii pushed a commit that referenced this issue Dec 7, 2020
## Summary
Store package assets (from Registry or local upload) in Elasticsearch. Related to proposal [issue](#83426) & [document](https://docs.google.com/document/d/18XoS6CSl9UxxPPBt9LXuJngf1Jv-4tl3jY6l19U1yH8)

 * New `epm-packages-assets` saved objects are stored on `.kibana` index, like our existing saved object `epm-packages`
 * Asset id is uuid v5 based on the package name, package version & file path. See 1974324
 * Add a list of IDs of all the installed assets, to `epm-packages` saved object. Like the existing `installed_` properties.  [Example](https://github.com/elastic/kibana/pull/83391/files#diff-fa07cac51b6a49bf1e4824bc2250c9a77dac6c7d6b0a56020f559ef1ff9be25fR491-R512) from a test

<details><summary>Mapping for new Saved Object</summary>

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fsaved_objects%2Findex.ts#L329-L339
</details>

<details><summary>Additional property on existing <code>epm-packages</code> Saved Object</summary>

https://github.com/elastic/kibana/blob/c4f27ab25715a225c710727a37f5f105364e2f72/x-pack/plugins/fleet/server/saved_objects/index.ts#L306-L312

 I don't think the saved object changes are strictly required. It can be removed without changing much about how things work

- Pros: 
      - Preserves accurate record of the assets added at installation time. Separates what assets are currently available for package-version from what was installed. They _should_ be the same, but things happen.
      - Avoids a query to get the installed assets before operating on them
- Cons:
      - size/noise? Could be tens or hundreds of ids
      - migration?
</details>

### More details

**When are saved objects added?**
During installation, after all other actions have succeeded, just before marking the save object as installed, we commit all the files from the package to ES

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2F_install_package.ts#L193-L198

**When are documents removed from the index?**

In the `removeInstallation` function which is called in response to a `DELETE /api/fleet/epm/packages/pkgkey`

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Fremove.ts#L72

or a failed package (re-)installation

https://github.com/elastic/kibana/blob/bf068739acce044ac27902253e8fc31df621f081/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Finstall.ts#L145




**How are we using these assets?**
We're not, currently. Here's an example showing how we could update [`getFileHandler`](https://github.com/elastic/kibana/blob/514b50e4c2d7a3be79d77e73838ff57b6cf1304a/x-pack%2Fplugins%2Ffleet%2Fserver%2Froutes%2Fepm%2Fhandlers.ts#L101) to check for local assets before reaching out to the Registry if we wished. It's not DRY, but it does work

```typescript
const esDocRoot = `http://elastic:changeme@localhost:9200/${PACKAGE_ASSETS_INDEX_NAME}/_doc`;
const escapedDocId = encodeURIComponent(`${pkgName}-${pkgVersion}/${filePath}`);
const esRes = await fetch(`${esDocRoot}/${escapedDocId}`);
const esJson = await esRes.json();
if (esJson.found) {
  const asset: PackageAsset = esJson._source;
  const body = asset.data_utf8 || Buffer.from(asset.data_base64, 'base64');
  return response.ok({
    body,
    headers: {
      'content-type': asset.media_type,
      // should add our own `cache-control` header here
      // kibana default is prevents caching: `private, no-cache, no-store, must-revalidate`
      // #83631
    },
  });
}
```

### Checklist
_updated tests to include new saved object output, no tests added yet_
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
jfsiii pushed a commit that referenced this issue Dec 7, 2020
## Summary
Store package assets (from Registry or local upload) in Elasticsearch. Related to proposal [issue](#83426) & [document](https://docs.google.com/document/d/18XoS6CSl9UxxPPBt9LXuJngf1Jv-4tl3jY6l19U1yH8)

 * New `epm-packages-assets` saved objects are stored on `.kibana` index, like our existing saved object `epm-packages`
 * Asset id is uuid v5 based on the package name, package version & file path. See 1974324
 * Add a list of IDs of all the installed assets, to `epm-packages` saved object. Like the existing `installed_` properties.  [Example](https://github.com/elastic/kibana/pull/83391/files#diff-fa07cac51b6a49bf1e4824bc2250c9a77dac6c7d6b0a56020f559ef1ff9be25fR491-R512) from a test

<details><summary>Mapping for new Saved Object</summary>

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fsaved_objects%2Findex.ts#L329-L339
</details>

<details><summary>Additional property on existing <code>epm-packages</code> Saved Object</summary>

https://github.com/elastic/kibana/blob/c4f27ab25715a225c710727a37f5f105364e2f72/x-pack/plugins/fleet/server/saved_objects/index.ts#L306-L312

 I don't think the saved object changes are strictly required. It can be removed without changing much about how things work

- Pros: 
      - Preserves accurate record of the assets added at installation time. Separates what assets are currently available for package-version from what was installed. They _should_ be the same, but things happen.
      - Avoids a query to get the installed assets before operating on them
- Cons:
      - size/noise? Could be tens or hundreds of ids
      - migration?
</details>

### More details

**When are saved objects added?**
During installation, after all other actions have succeeded, just before marking the save object as installed, we commit all the files from the package to ES

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2F_install_package.ts#L193-L198

**When are documents removed from the index?**

In the `removeInstallation` function which is called in response to a `DELETE /api/fleet/epm/packages/pkgkey`

https://github.com/elastic/kibana/blob/37f7b6ded747edb5cc487661b801c5e1c0a102a7/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Fremove.ts#L72

or a failed package (re-)installation

https://github.com/elastic/kibana/blob/bf068739acce044ac27902253e8fc31df621f081/x-pack%2Fplugins%2Ffleet%2Fserver%2Fservices%2Fepm%2Fpackages%2Finstall.ts#L145




**How are we using these assets?**
We're not, currently. Here's an example showing how we could update [`getFileHandler`](https://github.com/elastic/kibana/blob/514b50e4c2d7a3be79d77e73838ff57b6cf1304a/x-pack%2Fplugins%2Ffleet%2Fserver%2Froutes%2Fepm%2Fhandlers.ts#L101) to check for local assets before reaching out to the Registry if we wished. It's not DRY, but it does work

```typescript
const esDocRoot = `http://elastic:changeme@localhost:9200/${PACKAGE_ASSETS_INDEX_NAME}/_doc`;
const escapedDocId = encodeURIComponent(`${pkgName}-${pkgVersion}/${filePath}`);
const esRes = await fetch(`${esDocRoot}/${escapedDocId}`);
const esJson = await esRes.json();
if (esJson.found) {
  const asset: PackageAsset = esJson._source;
  const body = asset.data_utf8 || Buffer.from(asset.data_base64, 'base64');
  return response.ok({
    body,
    headers: {
      'content-type': asset.media_type,
      // should add our own `cache-control` header here
      // kibana default is prevents caching: `private, no-cache, no-store, must-revalidate`
      // #83631
    },
  });
}
```

### Checklist
_updated tests to include new saved object output, no tests added yet_
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
@EricDavisX
Copy link
Contributor

is this ready for testing in 8.0 @jfsiii ?

@ph
Copy link
Contributor

ph commented Feb 2, 2021

@jfsiii This should be closed right?

@jfsiii jfsiii closed this as completed Feb 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team v8.0.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants