Managing schemas guide #1651

Closed
binarylogic opened this issue Jan 31, 2020 · 0 comments · Fixed by #1745

Labels: type: task (generic non-code related tasks)

binarylogic commented Jan 31, 2020

As part of completing the Initial Schema Support project, we should deliver a detailed "Managing Schemas In Vector" guide. It will live in our new /guides section and serve as the authoritative document on how to manage and enforce schemas with Vector. This is especially important for downstream sinks that require a strict schema (ClickHouse, ORC encoding, etc.). Ideally, the guide would start simple and progress to more advanced strategies. For example:

  1. Start by defining common global names for fields (Global options for schema field names (message, host, timestamp, etc) #1446). For example, a user might prefer @timestamp over timestamp.
  2. Then provide the ability to whitelist or blacklist fields. Maybe this is a transform, or maybe it is part of the sink encoding options? (Private fields used for processing that are not encoded #1448)
  3. Then provide a way to rename fields and shape events (New shape transform #750).
  4. Then introduce types, using the new coercer transform changes (feat(coercer transform): Add option to drop unspecified fields #1636).
  5. Then suggest defining custom Protobuf schemas (Support protobuf transform #1472) or even JSON schemas (New schema transform #165 (comment)).
  6. Finally, cover integration with a fully managed schema service, like AWS Glue (New aws_glue transform #751). This is a viable option when the schema is strictly managed and is also used to query the data.
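
To make steps 1 through 4 concrete, here is a minimal Python sketch of the progression: rename fields, whitelist them, then coerce types. It is only an illustration of the idea, not Vector configuration or Vector's API. `RENAMES`, `ALLOWED`, `TYPES`, and `apply_schema` are hypothetical names invented for this example.

```python
# Hypothetical schema description; none of these names are Vector options.
RENAMES = {"timestamp": "@timestamp"}                  # preferred field names
ALLOWED = {"@timestamp", "host", "message", "status"}  # whitelist of fields
TYPES = {"status": int}                                # declared field types


def apply_schema(event):
    """Return a copy of `event` shaped by the hypothetical schema above."""
    # Rename first so the later steps see the final field names.
    renamed = {RENAMES.get(key, key): value for key, value in event.items()}
    # Drop every field that is not on the whitelist.
    kept = {key: value for key, value in renamed.items() if key in ALLOWED}
    # Coerce the remaining values to their declared types.
    for key, cast in TYPES.items():
        if key in kept:
            kept[key] = cast(kept[key])
    return kept


event = {
    "timestamp": "2020-01-31T00:00:00Z",
    "host": "web-1",
    "message": "GET /",
    "status": "200",
    "_internal": "scratch value used only during processing",
}
shaped = apply_schema(event)
```

In this sketch the private `_internal` field is dropped, `timestamp` becomes `@timestamp`, and `status` becomes an integer. Ordering matters: renaming before filtering lets the whitelist refer to the final names only.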

This is just a rough example of how I see a guide like this progressing. I fully expect it to change as we work through the project.

Pro-tip: it might make sense to create this guide outline first, as a sort of specification for the work involved in this project. I'll leave that up to you, though.
