Make designing pipelines easier by providing lists of compatible components #4754
Comments
I'm not opposed to this, and I appreciate that you wrote a prototype which includes generating the documentation, which I think is a requirement to help prevent the list from ever going stale. I have concerns about the specific way the relationships are defined in the prototype (I'd want to find a way to do it using just the existing arguments/exports types instead of defining a new Metadata type), but overall I'm personally in favor of this proposal.
There are some challenges with that approach:
I think that I may be able to overcome these by implementing a method that will infer the current prototype's Thoughts?
@rfratto I've updated the prototype with an implementation that infers the
@thampiotr - thank you so much for working on this! I've thought about the need for such a thing as well. With this solution we definitely help the users more than we currently do, but when users actually try to wire up their component, I still think they will struggle with how to do it exactly. For example, users may still not know how to point
Personally, what I think might be even more useful is if each type listed in a component's doc hyperlinked to a new page that lists the exported attributes of components which can supply this type. It would probably save people from having to work out which attribute to pipe to which other attribute. Also, a note on otelcol: many otelcol components accept logs, metrics and traces, but some only accept a subset of these 3 signal types. So we would need 3 different sections listing which components can accept the outputs of an otelcol component.
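As a rough illustration of the per-type page idea above, here is a minimal Go sketch that groups exported attributes by the type they supply. Everything in it is an assumption made for illustration: `DataType`, `Export` and `suppliersByType` are hypothetical names, not part of the project's API, and the attribute names and type labels are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// DataType is a hypothetical label for the kind of value an export supplies.
type DataType string

const (
	Targets      DataType = "Targets"
	LogsReceiver DataType = "LogsReceiver"
)

// Export describes one exported attribute of a component (hypothetical model).
type Export struct {
	Component string   // component name, e.g. "discovery.kubernetes"
	Attribute string   // exported attribute name (assumed for illustration)
	Type      DataType // type of the exported value
}

// suppliersByType groups exports by type, so that a docs page for each type
// could list every component attribute able to supply it.
func suppliersByType(exports []Export) map[DataType][]string {
	index := make(map[DataType][]string)
	for _, e := range exports {
		index[e.Type] = append(index[e.Type], e.Component+" ("+e.Attribute+")")
	}
	// Sort each list so the generated page is stable between runs.
	for _, suppliers := range index {
		sort.Strings(suppliers)
	}
	return index
}

func main() {
	exports := []Export{
		{"discovery.kubernetes", "targets", Targets},
		{"discovery.relabel", "output", Targets},
		{"loki.write", "receiver", LogsReceiver},
		{"loki.process", "receiver", LogsReceiver},
	}
	index := suppliersByType(exports)
	for _, t := range []DataType{Targets, LogsReceiver} {
		fmt.Printf("%s can be supplied by: %v\n", t, index[t])
	}
}
```

For otelcol, the same index could presumably be keyed by signal type (logs, metrics, traces) to produce the three separate sections.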
There are some assumptions that can be made:
I believe these two assumptions can be combined to automatically build a component compatibility list without having to define a top-level metadata package. The most recent approach seems better. My overall concern is whether it should be out of scope for the component package to be aware of the different component namespaces; that awareness may make Flow feel more rigid, such that adding new pipelines requires updating more code than it used to. An additional twist on the implementation is to generate component schemas of arguments and exports, and then build tooling on top of those schemas, such as generating compatible-component documentation. This adds a layer of indirection to what you have now, but would allow the schemas to be used for other useful tools too, such as editors or config validators that don't import the project as a whole.
I want this to also be used with targets, which are pull-based in config though.
I think we could come up with a convention using capsules (the way you describe above) and make sure that Targets also work (they seem to be an exception?). If some capsules turn up that we don't want to be included, we could use a marker interface to exclude them?
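A marker interface for opting out could look roughly like the Go sketch below. The names `ExcludedFromCompatibility`, `InternalHandle` and `includeInCompatibility` are hypothetical, and `LogsReceiver` is only a stand-in type; this is one possible shape of the convention, not an existing API.

```go
package main

import "fmt"

// LogsReceiver is a hypothetical capsule-like type that should show up in
// compatibility lists.
type LogsReceiver struct{}

// InternalHandle is a hypothetical capsule-like type that should not.
type InternalHandle struct{}

// ExcludedFromCompatibility is a hypothetical marker interface. A type opts
// out of compatibility lists by implementing its single no-op method.
type ExcludedFromCompatibility interface {
	excludedFromCompatibility()
}

func (InternalHandle) excludedFromCompatibility() {}

// includeInCompatibility reports whether a value's type should be listed.
func includeInCompatibility(v any) bool {
	_, excluded := v.(ExcludedFromCompatibility)
	return !excluded
}

func main() {
	fmt.Println(includeInCompatibility(LogsReceiver{}))   // true
	fmt.Println(includeInCompatibility(InternalHandle{})) // false
}
```

The default is inclusion, so new capsule types would appear in the lists without any extra code, and only the exceptions need the marker.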
That's a good idea! Even if this representation for now has only the fields we need, I think it would make sense to set up foundations for it in the future. I can take a look into this when we do actual implementation.
Problem
When designing a telemetry pipeline, it is currently difficult to discover what components can connect to what other components.
Specifically, I believe we should focus on making it easier to design a telemetry pipeline, where we are concerned with the conceptually high-level data flow and transformations (e.g. discover files -> read logs -> add labels -> send logs to DB). This is in contrast to lower-level details, where component connections are used for further pipeline configuration (e.g. read a string from an env variable and set it as a username argument).
Background
My initial thought was that a better naming convention that clearly differentiates between, for example, sources, transformers and sinks would help alleviate this problem. However, it quickly led to rather rigid and verbose names while still leaving some confusion. A rename of nearly all the components would also be a large breaking change.
Our current naming convention groups components into namespaces, which in most cases makes it easier to narrow down the set of components one needs to look at. However, identifying potential links between components is frequently challenging, and uncovering connections that span namespaces is even more difficult.
What makes it even harder to navigate is that there are two ways the data can flow into the component:
Similarly, data can leave the component through an export or be written to another component's receiver that is passed to it as an argument. As a side note, this also makes the graphs in the UI somewhat confusing, because the direction of the arrows does not reflect the direction of the data flow.
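The mismatch between reference direction and data-flow direction can be sketched in Go as follows. This is an illustrative model only: `RefKind`, `Reference`, `Flow` and `dataFlow` are hypothetical names, and the classification of a reference as pull or push is an assumption of the sketch rather than something the project exposes.

```go
package main

import "fmt"

// RefKind says how a referenced export is used (hypothetical classification).
type RefKind int

const (
	Pull RefKind = iota // the referrer reads data from the export (e.g. targets)
	Push                // the referrer writes data into the export (e.g. a receiver)
)

// Reference is one config-level reference: From uses an export of To.
type Reference struct {
	From, To string
	Kind     RefKind
}

// Flow is a data-flow edge from Source to Sink.
type Flow struct{ Source, Sink string }

// dataFlow converts config references into data-flow edges. Pull references
// are reversed, because data moves from the exporter to the referrer.
func dataFlow(refs []Reference) []Flow {
	flows := make([]Flow, 0, len(refs))
	for _, r := range refs {
		switch r.Kind {
		case Pull:
			flows = append(flows, Flow{Source: r.To, Sink: r.From})
		case Push:
			flows = append(flows, Flow{Source: r.From, Sink: r.To})
		}
	}
	return flows
}

func main() {
	// Both references originate at loki.source.file in the config,
	// yet the data passes through it.
	refs := []Reference{
		{From: "loki.source.file", To: "discovery.kubernetes", Kind: Pull},
		{From: "loki.source.file", To: "loki.write", Kind: Push},
	}
	for _, f := range dataFlow(refs) {
		fmt.Printf("%s -> %s\n", f.Source, f.Sink)
	}
}
```

In this example both references point away from loki.source.file, while the derived data flow runs discovery.kubernetes -> loki.source.file -> loki.write.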
Proposal
Help to conceptually design pipelines (on a high level) by making it clear:
For example:
discovery.kubernetes or discovery.gce - accept nothing and output targets
discovery.relabel - accepts targets and outputs targets
loki.source.file - accepts targets and outputs Loki logs
loki.process or loki.relabel - accepts Loki logs and outputs Loki logs
loki.write - accepts Loki logs and outputs nothing
Use the above information to add an auto-generated section to every component's reference documentation page that will list:
Prototype
There is a prototype available here: #4753
Here's an example of what the generated docs look like: