Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Add component specification #8858

Merged
merged 19 commits into from
Aug 30, 2021
Merged

chore: Add component specification #8858

merged 19 commits into from
Aug 30, 2021

Conversation

binarylogic
Copy link
Contributor

@binarylogic binarylogic commented Aug 24, 2021

Closes #8728
Closes #8709

This introduces the beginnings of a Vector component specification. The immediate need for this is to drive improvements around Vector's internal instrumentation. Issues like #8839, #8710, #8709, and #8693 all expose inconsistencies in Vector's internal instrumentation that presents problems for future work based on this data.

Signed-off-by: Ben Johnson <[email protected]>
@netlify
Copy link

netlify bot commented Aug 24, 2021

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: 8318f12

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/612967146c5b4b0008582c71

@bruceg bruceg self-requested a review August 24, 2021 16:07
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty minimal. I think it would be worth noting the standard tags logs and metrics are annotated with through the calling context. Should we also mandate counters on events/bytes out for transforms? There is also precedent for discarded events in some components. Some discussion of standardized attributes for logs would be useful as well (ex all error logs must have an error field, etc).

docs/specs/component.md Show resolved Hide resolved
@binarylogic
Copy link
Contributor Author

@bruceg this is the very beginnings of the spec (hence the draft status). I wanted to get feedback on the structure before fleshing it out. I just wanted to make sure you were aware of that. I plan to add many more events and metrics to this spec.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good start to me. I like the direction it is heading.

docs/specs/component.md Outdated Show resolved Hide resolved
docs/specs/component.md Outdated Show resolved Hide resolved
docs/specs/component.md Show resolved Hide resolved
@binarylogic binarylogic marked this pull request as ready for review August 25, 2021 17:45
@binarylogic
Copy link
Contributor Author

@jszwedko @bruceg this is ready for review. Once we generally align, I'd like to include the rest of the team. My goal was to set the foundation so that @bruceg could take over with his upcoming observability epic. Once this is agreed upon we will need to prioritize which components we want audit and fix.

Signed-off-by: Ben Johnson <[email protected]>
Signed-off-by: Ben Johnson <[email protected]>
Signed-off-by: Ben Johnson <[email protected]>
Signed-off-by: Ben Johnson <[email protected]>
docs/specs/component.md Outdated Show resolved Hide resolved

## Configuration

### Options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is still as stated on #8877 (comment):

  • endpoint(s) for when a URL is used to configure a remote destination that vector is communicating with
  • address for when a simple ip:port combo is used to configure a remote destination
  • bind for any time we are binding to an address

I think bind (or listen) makes it clearer that Vector is binding to the port. address seems, to me, more suitable for outgoing connections.

I think having endpoint separate and only representing URLs is also clearer that you need a full URL and not just a host:port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What endpoint do we have that is totally separate from address?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What endpoint do we have that is totally separate from address?

Ah, sorry, I just mean the name should be distinct. That is we should call URLs endpoint in the config (like for the http sink). For components that send raw data to an ip:port I think address is a more suitable name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 I'd favor endpoints for both, personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I feel more strongly about using bind (or listen) for options where Vector binds the given value than endpoint vs. address. I do think the distinction is useful though, so users immediately know what is expected and are less likely to, for example, think that they should provide a URL scheme to an address value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree on the endpoint vs address distinction, I am still not a fan of either bind or listen. No existing component uses bind as the configured local address, which means all users configuring one of those components must change their config. Granted, we will be breaking metric names, but that doesn't make a component fail to work after upgrading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will be a fight to the death over this naming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've yanked the option under dispute so we can move forward on the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which option was yanked? I didn't see any new commits and it seems the same to me since last time I looked.

Granted, we will be breaking metric names, but that doesn't make a component fail to work after upgrading.

In both the rename to bind case and the metric name case, I was thinking we'd maintaining aliases for backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I pushed but missed the error message saying my HEAD was not up to date. Fixed.

docs/specs/component.md Outdated Show resolved Hide resolved
docs/specs/component.md Outdated Show resolved Hide resolved
@binarylogic
Copy link
Contributor Author

@bruceg officially transitioning off of this. Can you handle the feedback and final changes?

@bruceg
Copy link
Member

bruceg commented Aug 25, 2021

Will do. 👍🏻

@bruceg
Copy link
Member

bruceg commented Aug 25, 2021

It looks like we need to come to an agreement on the naming of the counters. The internal pattern had been THING_in/out_total, which leads to names like bytes_in_total, while other systems (notably Prometheus, as indicated above) use THING_UNITS_total, which would lead to names like received_bytes_total. I would tend to prefer to maintain our current pattern, but the argument from an external precedent is strong, and has other benefits for predictability.

@binarylogic
Copy link
Contributor Author

@bruceg, I'll defer to you and @jszwedko to settle on naming. I'm indifferent. I just wanted to give context on why I chose the names I did.

binarylogic and others added 7 commits August 26, 2021 14:56
Co-authored-by: Jesse Szwedko <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>
@bruceg bruceg force-pushed the component-specification branch from 2cc7123 to 7111bec Compare August 26, 2021 20:56
@bruceg
Copy link
Member

bruceg commented Aug 26, 2021

Hmm, @binarylogic the commit you did through GitHub is blocking the DCO check. Do you want to fix that (I tried to sign it myself, but it's not satisfied), or should I just squash it down?

@binarylogic
Copy link
Contributor Author

No need to enforce the DCO for employed Vector team members.

docs/specs/component.md Outdated Show resolved Hide resolved
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, work. This is coming together and feeling more consistent. I still have that one outstanding question about the naming of the address/endpoint/bind fields 😅 , but otherwise I think this looks good.


## Configuration

### Options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a fan of what I have above:

  • endpoint(s) for when a URL is used to configure a remote destination that vector is communicating with
  • address for when a simple ip:port combo is used to configure a remote destination
  • bind for any time we are binding to an address

I think this provides the most clarity in option names that will make it more self-evident what the options are for and what the syntax of the value should be without having to refer to docs.

docs/specs/component.md Show resolved Hide resolved
@bruceg bruceg requested a review from jszwedko August 27, 2021 21:08
Copy link
Contributor

@spencergilbert spencergilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with everything, and also agree with some consistent naming (address v endpoint) - with less personal care about what we land on there.

The discussion over this option was blocking moving forward on the rest
of this specification, which is more important than nailing down every
configuration option.

Signed-off-by: Bruce Guenter <[email protected]>
@bruceg bruceg merged commit c6c0ea7 into master Aug 30, 2021
@bruceg bruceg deleted the component-specification branch August 30, 2021 20:13
jdrouet pushed a commit that referenced this pull request Sep 6, 2021
* Kick off the specification

Signed-off-by: Ben Johnson <[email protected]>

* Add a new BytesReceived event

Signed-off-by: Ben Johnson <[email protected]>

* Better language for event implementation leeway

Signed-off-by: Ben Johnson <[email protected]>

* Final touches for specification foundation

Signed-off-by: Ben Johnson <[email protected]>

* Error logs

Signed-off-by: Ben Johnson <[email protected]>

* Comma separated

Signed-off-by: Ben Johnson <[email protected]>

* Finish scope section

Signed-off-by: Ben Johnson <[email protected]>

* Rename metric

Signed-off-by: Ben Johnson <[email protected]>

* Update endpoint option wording

Signed-off-by: Bruce Guenter <[email protected]>

* Update docs/specs/component.md

Co-authored-by: Jesse Szwedko <[email protected]>
Signed-off-by: Bruce Guenter <[email protected]>

* Wording tweaks

Signed-off-by: Bruce Guenter <[email protected]>

* Address feedback

Signed-off-by: Bruce Guenter <[email protected]>

* Address feedback

Signed-off-by: Bruce Guenter <[email protected]>

* Fix the other byte_size

Signed-off-by: Bruce Guenter <[email protected]>

* Fix counter naming pattern to `DESCRIPTOR_UNITS_total`

Signed-off-by: Bruce Guenter <[email protected]>

* Change endpoint wording and http_path

Signed-off-by: Bruce Guenter <[email protected]>

* Update docs/specs/component.md

* Remove the `address` configuration specification.

The discussion over this option was blocking moving forward on the rest
of this specification, which is more important than nailing down every
configuration option.

Signed-off-by: Bruce Guenter <[email protected]>

Co-authored-by: Bruce Guenter <[email protected]>
Co-authored-by: Jesse Szwedko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a Vector component specification Define what processed_bytes_total means for sources and sinks
4 participants