[Change Proposal][Discuss] Support including sample data in packages #348

Closed
jsoriano opened this issue Jun 1, 2022 · 4 comments
Labels
discuss Issue needs discussion Team:Ecosystem Label for the Packages Ecosystem team


@jsoriano
Member

jsoriano commented Jun 1, 2022

Currently, packages can include a sample document that is used for documentation purposes. The proposal is to include more sample data in packages.

This data could be stored in packages as ndjson files in a dedicated directory. It wouldn't be installed by default, but Kibana/Fleet or other tools could make use of it.
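A minimal sketch of how a consuming tool might read such a directory, assuming one JSON document per line and a hypothetical `sample_data/` directory name (the proposal does not fix one). The function names are illustrative only, not part of any existing Kibana, Fleet, or elastic-package API.

```python
import json
from pathlib import Path
from typing import Iterator

# Hypothetical directory name; the proposal only says "a directory".
SAMPLE_DATA_DIR = "sample_data"


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one document per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_sample_data(package_root: str) -> Iterator[dict]:
    """Yield documents from every *.ndjson file in the package's sample data directory.

    Nothing is installed here; the consuming tool decides what to do with the documents.
    """
    data_dir = Path(package_root) / SAMPLE_DATA_DIR
    if not data_dir.is_dir():
        return  # the package ships no sample data
    for path in sorted(data_dir.glob("*.ndjson")):
        yield from parse_ndjson(path.read_text(encoding="utf-8"))
```

Because loading is a separate, explicit step, a package that ships sample data behaves exactly like one that does not until some tool asks for the documents.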

To avoid increasing the size of packages too much, an alternative could be to include this data in a new dataset package type (like the one proposed in #346), which could include references to other packages. So if a user or feature needs sample data for apache, dataset packages containing this data could be discovered automatically.
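The discovery step could look roughly like the sketch below. It assumes a hypothetical `references` field in the package manifest listing the packages a dataset package provides data for; neither the field name nor the manifest shape comes from the package spec.

```python
from dataclasses import dataclass, field


@dataclass
class PackageManifest:
    name: str
    type: str  # e.g. "integration", or the proposed "dataset" type
    # Hypothetical field: names of packages this package provides data for.
    references: list[str] = field(default_factory=list)


def find_sample_data_packages(target: str, registry: list[PackageManifest]) -> list[str]:
    """Return the sorted names of dataset packages that reference `target`."""
    return sorted(
        pkg.name
        for pkg in registry
        if pkg.type == "dataset" and target in pkg.references
    )
```

With this direction of reference, the apache package itself would not need to change when a new dataset package for it is published.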

elastic-package could include a tool to export data in the expected format.
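As a rough idea of what such an export step would produce, here is a sketch of a hypothetical `to_ndjson` helper; it is not an existing elastic-package command, and fetching the documents from Elasticsearch is left out.

```python
import json
from typing import Iterable


def to_ndjson(docs: Iterable[dict]) -> str:
    """Serialize documents as newline-delimited JSON, one compact document per line."""
    # Compact separators and no indent keep each document on a single line
    # (newlines inside string values are escaped by json.dumps).
    # sort_keys makes the output stable, so re-exporting unchanged data yields an identical file.
    return "".join(
        json.dumps(doc, sort_keys=True, separators=(",", ":")) + "\n" for doc in docs
    )
```

Stable output matters if the files are committed to a package repository, since unchanged data then produces no diff.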

This sample data could be used for:

@jsoriano jsoriano added discuss Issue needs discussion Team:Ecosystem Label for the Packages Ecosystem team labels Jun 1, 2022
@ruflin
Contributor

ruflin commented Jun 2, 2022

Another way to think of the concept you describe above is that packages can reference each other. Basically, every package can have data, but apache-sample-data has a reference to the apache package and recommends having it installed too.

An alternative proposal is to have references to large files in packages. Either the reference could be added manually as a URL, with the developer responsible for uploading the file to the right place, or, as part of the building/publishing step, these assets could be taken out and replaced by a reference. During publishing, it could even be decided whether an asset should be taken out or not. This would mean the apache package would always reference the sample data, but as long as the sample data does not change, it is only published once.

@jsoriano
Member Author

jsoriano commented Jun 2, 2022

The problem I see with referenced files is that we would need to think about where to host them, and about what happens if they stop being available or are unexpectedly modified.
Using packages for this allows us to reuse everything we are building for packages.

@majagrubic

To reiterate the above, I don't see much benefit in hosting the data externally and including a URL in the package versus hosting the data externally and downloading it directly into Kibana on the user's request. One of the reasons we considered packages in the first place is that they solve many of the problems that hosting data that way would bring (scalability, monitoring, latency, security).

@jsoriano
Copy link
Member Author

jsoriano commented Jun 3, 2022

Closing this one for now to avoid duplicate discussions; sorry if it caused confusion.

Let's continue the discussion in #346; we can reopen this one if we find significant differences or if some follow-up is needed.

@jsoriano jsoriano closed this as completed Jun 3, 2022
3 participants