Managing schemas guide #1651

binarylogic · 2020-01-31T02:10:26Z

As part of completing the Initial Schema Support project, we should deliver a detailed "Managing Schemas In Vector" guide. It will live in our new /guides section and serve as the authoritative document on how to manage and enforce schemas with Vector. This is especially important for downstream sinks that require a strict schema (clickhouse, ORC encoding, etc.). Ideally, the guide would start simple and progress to more advanced strategies. For example:

Start with defining common global names for fields (Global options for schema field names (message, host, timestamp, etc) #1446). For example, it might make more sense for a user to use @timestamp instead of timestamp.
Then provide the ability to simply whitelist/blacklist fields. Maybe this is a transform, or maybe this is part of the sink encoding options? (Private fields used for processing that are not encoded #1448)
Then provide a way to rename fields and shape events. (New shape transform #750).
Then we could progress to introducing types and using the new coercer transform changes. (feat(coercer transform): Add option to drop unspecified fields #1636).
Then we could suggest defining custom Protobuf schemas (Support protobuf transform #1472) or even JSON schemas (New schema transform #165 (comment)).
And finally, cover integration with a fully managed schema service, like AWS Glue (New aws_glue transform #751). This is a viable option when the schema is strictly managed and also used to query the data.
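
To make the first four steps of this progression concrete, here is a minimal sketch in plain Python. It is not Vector's API or configuration format; every name in it (`FIELD_NAMES`, `ALLOWED_FIELDS`, `RENAMES`, `TYPES`, `apply_schema`) is hypothetical and exists only to illustrate the order in which a guide might introduce global field names, whitelisting, renaming, and type coercion.

```python
# Illustrative only: these tables and this function are assumptions,
# not part of Vector.
FIELD_NAMES = {"timestamp": "@timestamp"}                      # step 1: global field names
ALLOWED_FIELDS = {"@timestamp", "host", "message", "status"}   # step 2: whitelist
RENAMES = {"status": "http_status"}                            # step 3: rename / shape
TYPES = {"http_status": int}                                   # step 4: coerce types


def apply_schema(event: dict) -> dict:
    """Apply the four steps in order and return a new event."""
    # 1. Map default field names to the configured global names.
    event = {FIELD_NAMES.get(key, key): value for key, value in event.items()}
    # 2. Drop every field that is not whitelisted.
    event = {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
    # 3. Rename fields to the shape the downstream sink expects.
    event = {RENAMES.get(key, key): value for key, value in event.items()}
    # 4. Coerce the listed fields to their declared types.
    for field, cast in TYPES.items():
        if field in event:
            event[field] = cast(event[field])
    return event


raw = {
    "timestamp": "2020-01-31T02:10:26Z",
    "host": "web-1",
    "message": "GET /",
    "status": "200",
    "_internal": "dropped",
}
shaped = apply_schema(raw)
```

Here `shaped` keeps `@timestamp`, `host`, and `message`, carries `http_status` as an integer, and no longer contains `_internal`. Protobuf or JSON schemas and a managed schema service would replace the hand-written tables above with an externally defined schema.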

This is just a rough example of how I see a guide like this progressing. I fully expect this to change as we work through the project.

Pro-tip: it might make sense to create this guide outline first, as a sort of specification for the work involved in this project. I'll leave that up to you, though.

binarylogic added the type: task and domain: website labels Jan 31, 2020

binarylogic added this to the Initial schema support milestone Jan 31, 2020

binarylogic assigned Hoverbear Jan 31, 2020

binarylogic added the domain: guides label Jan 31, 2020

Hoverbear mentioned this issue Feb 7, 2020

docs(config): Schema Guide #1745

Merged

Hoverbear linked a pull request Mar 10, 2020 that will close this issue

docs(config): Schema Guide #1745

Merged

github-actions bot removed the domain: guides label Mar 26, 2020

binarylogic closed this as completed in #1745 Apr 23, 2020
