feat(codecs): Add syslog encoder #21307

polarathene · 2024-09-18T00:34:42Z

Replaces: #17668

Each commit has a scoped change. I built off the from the great work @syedriko originally put together and refactored that heavily, applying the review feedback I had provided there among other fixes.
Each commit provides additional context for the changes. Thus if something is unclear in the review diff, using git blame on my branch (Git web UI works fine for that) may provide the extra context needed 👍

NOTE:

This PR has been opened at the request of others for providing feedback on the general direction and approach.
The branch does not yet reflect the current progress. I had been working on a refactor of some changes earlier this year that ran into a blocker to completion for committing. I intended to resolve that first, but have not yet been able to afford the time.
As I have not been actively working on this feature for a while, I may not be able to respond to some feedback as well as I would like to at this stage.

I'll try to allocate time within October to get those changes applied to this branch, along with a proper PR description at that point.

The conflicts were already resolved locally IIRC, they'll be taken care of with the future update.

Original commit from syedriko

This is only a temporary change to make the diffs for future commits easier to follow.

- Introduce a `Pri` struct with fields for severity and facility as enum values. - `Pri` uses `strum` crate to parse string values into their appropriate enum variant. - Handles the responsibility of encoding the two enum values ordinal values into the `PRIVAL` value for the encoder. - As `Facility` and `Severity` enums better represent their ordinal mapping directly - The `Fixed` + `Field` subtyping with custom deserializer isn't necessary. Parsing a string that represents the enum by name or its ordinal representation is much simpler. - Likewise this removes the need for the get methods as the enum can provide both the `String` or `u8` representation as needed.

`SyslogSerializer::encode()` has been simplified. - Only matching `Event::Log` is relevant, an `if let` bind instead of `match` helps remove a redundant level of nesting. - This method only focuses on boilerplate now, delegating the rest to `ConfigDecanter` (_adapt `LogEvent` + encoder config_) and `SyslogMessage` (_encode into syslog message string_). - This removes some complexity during actual encoding logic, which should only be concerned about directly encoding from one representation to another, not complimentary features related to Vector config or it's type system. The new `ConfigDecanter` is where many of the original helper methods that were used by `SyslogSerializer::encode()` now reside. This change better communicates the scope of their usage. - Any interaction with `LogEvent` is now contained within the methods of this new struct. Likewise for the consumption of the encoder configuration (instead of queries to config throughout encoding). - The `decant_config()` method better illustrates an overview of the data we're encoding and where that's being sourced from via the new `SyslogMessage` struct, which splits off the actual encoding responsibility (see next commit).

`SyslogSerializerConfig` has been simplified. - Facility / Severity deserializer methods aren't needed, as per their prior refactor with `strum`. - The `app_name` default is set via `decant_config()` when not configured explicitly. - The other two fields calling a `default_nil_value()` method instead use an option value which encodes `None` into the expected `-` value. - Everything else does not need a serde attribute to apply a default, the `Default` trait on the struct is sufficient. - `trim_prefix` was removed as it didn't seem relevant. `tag` was also removed as it's represented by several subfields in RFC 5424 which RFC 3164 can also use. `SyslogMessage::encode()` refactors the original PR encoding logic: - Syslog Header fields focused, the PRI and final message value have already been prepared prior. They are only referenced at the end of `encode()` to combine into the final string output. - While less efficient than `push_str()`, each match variant has a clear structure returned via the array `join(" ")` which minimizes the noise of `SP` from the original PR. Value preparation prior to this is clear and better documented. - `Tag` is a child struct to keep the main logic easy to grok. `StructuredData` is a similar case.

No changes beyond relocating the code into a single file.

- Drop notes referring to original PR differences + StructuredData adaption references. None of it should be relevant going forward. - Revise some other notes. - Drop `add_log_source` method (introduced from the original PR author) in favor of using `StructuredData` support instead.

This should be simple and lightweight enough to justify for the DRY benefit? This way the method doesn't need to be duplicated redundantly. That was required because there is no trait for `FromRepr` provided via `strum`. That would require a similar amount of lines for the small duplication here. The `akin` macro duplicates the `impl` block for each value in the `&enums` array.

- `ConfigDecanter::get_message()` replaces the fallback method in favor of `to_string_lossy()` (a dedicated equivalent for converting `Value` type to a String type (_technically it is a CoW str, hence the follow-up with `to_string()`_)). - This also encodes the value better, especially for the default `log_namespace: false` as the message value (when `String`) is not quote wrapped, which matches the behaviour of the `text` encoder output. - Additionally uses the `LogEvent` method `get_message()` directly from `lib/vector-core/src/event /log_event.rs`. This can better retrieve the log message regardless of the `log_namespace` setting. - Encoding of RFC 5424 fields has changed to inline the `version` constant directly, instead of via a redundant variable. If there's ever multiple versions that need to be supported, it could be addressed then. - The RFC 5424 timestamp has a max precision of microseconds, thus this should be rounded and `AutoSi` can be used (_or `Micros` if it should have fixed padding instead of truncating trailing `000`_).

- The original PR author appears to have relied on a hard-coded timestamp key here. - `DateTime<Local>` would render the timestamp field with the local timezone offset, but other than that `DateTime<Utc>` would seem more consistent with usage in Vector, especially since any original TZ context is lost by this point? - Notes adjusted accordingly, with added TODO query for each encoding mode to potentially support configurable timezone.

- Move encoder config settings under a single `syslog` config field. This better mirrors configuration options for existing encoders like Avro and CSV. - `ConfigDecanter::value_by_key()` appears to accomplish roughly the same as the existing helper method `to_string_lossy()`. Prefer that instead. This also makes the `StructuredData` helper `value_to_string()` redundant too at a glance? - Added some reference for the priority value `PRIVAL`. - `Pri::from_str_variants()` uses the existing defaults for fallback, communicate that more clearly. Contextual note is no longer useful, removed.

To better communicate the allowed values, these two config fields can change from the `String` type to their appropriate enum type. - This relies on serde to deserialize the config value to the enum which adds a bit more noise to grok. - It does make `Pri::from_str_variants()` redundant, while the `into_variant()` methods are refactored to `deserialize()` with a proper error message emitted to match the what serde would normally emit for failed enum variant deserialization. - A drawback of this change is that these two config fields lost the ability to reference a different value path in the `LogEvent`. That'll be addressed in a future commit.

In a YAML config a string can optionally be wrapped with quotes, while a number that isn't quote wrapped will be treated as a number type. The current support was only for string numbers, this change now supports flexibility for config using ordinal values in YAML regardless of quote usage. The previous `Self::into_variant(&s)` logic could have been used instead of bringing in `serde-aux`, but the external helper attribute approach seems easier to grok/follow as the intermediary container still seems required for a terse implementation. The match statement uses a reference (_which requires a deref for `from_repr`_) to appease the borrow checker for the later borrow needed by `value` in the error message.

This seems redundant given the context? Mostly adds unnecessary noise. Could probably `impl Configurable` or similar to try workaround the requirement. The metadata description could generate the variant list similar to how it's been handled for error message handling?

Not sure if this is worthwhile, but it adopts error message convention elsewhere I've seen by managing them via Snafu.

bits-bot · 2024-09-18T00:34:46Z

All committers have signed the CLA.

gaby · 2024-11-07T03:39:39Z

@polarathene Any updates on this ?

polarathene · 2024-11-08T00:42:57Z

Any updates on this ?

No sorry. Several unexpected events happened in a short time period, I was unsuccessful at avoiding burnout which I'm still in the process of exiting from.

Context

FWIW, I over committed myself in 2023, while facing several setbacks and issues in 2024. This isn't the only work where I'm pinged for updates to move things forward that have come to a halt.

I do understand the frustration of wanting to have this become actively developed again and merged, but without any assistance it's a task I struggle to allocate time for. More on that below.

PR was opened due to demand

When I raised a concern about the lack of feedback received thus far, it was insisted that I open a PR, even if it's not in the state desired for handing to reviewers.

#17668 (comment)

Efforts are appreciated but AFAIK it is not equivalent to a PR; I'm pretty certain you are only able to comment and provide input with a pull request.

#17668 (comment)

in order for reviewers to leave proper feedback for a branch it needs to have an issue or PR ticket. This would allow reviewers to make comparison between branches and might proceed with the better one.

However opening the PR early has made no difference vs the diff link I had previously provided. Which was anticipated:

#17668 (comment)

When you say that we have nothing and making no progress without a PR, are you offering to participate in review?
So far no one has provided any feedback on the linked branch/diff, so if a new PR left in a WIP status is opened will anything change?

My previous input

None of this has changed.

I've been seeking feedback/guidance since July but none received yet, even after opening the PR as requested.
I still want to work on it, but it is difficult to prioritize in my current situation.

#17668 (comment)

I was hoping to just get general feedback if it was going in the right direction or if there was anything that stood out at a glance.

#17668 (comment)

It would probably help to know how you'd like to use/configure it.

How important is it to have config for setting default/fallback values?

How important is it to have customization for keys vs managing it all via VRL?

Feedback on the last shared iteration would also be helpful to know if that's going in the right direction or not. I might have more time spare by October or November to finish up my branch.

#17668 (comment)

I'd love to finish the PR sooner but have no employer at present, thus it's tricky to afford dedicating time to this over other priorities 😓

#17668 (comment)

My work on it is purely voluntary, no employer backing, so dedicating my spare time on it is tricky 😓

#17668 (comment)

I do want to complete the work, it's just been difficult to budget my volunteer time for.
So if nobody helps, I will eventually be able to block time out again for coming back to this. I sunk a considerable chunk of time into the branch already, would be an awful waste to leave it to rot.

Personal friction

When it comes to finding time to allocate to this PR, there's a bit more to it than covered below, but it's effectively a high cost item for me to tackle.

I need to have:

No other events with higher priorities to resolve, or that unintentionally become time sinks (ADHD related)
Sufficient time to context switch to this PR / task, and the energy to figure out and resolve any blocker or how to move forward (receiving the feedback and guidance requested would be helpful).
Sufficient hardware resources. Often I am lacking enough memory on my system to confidently work on this PR. I juggle quite a lot and experience crashes where I lose a fair amount of work (I do make an effort to save and add notes, but it's a tad complicated). I also have very low storage spare on the system which has been a point of friction in the past for this PR.

As such, the most optimal time is when I have nothing pressing elsewhere and I'm able to save everything I can to perform a system reboot (I only have a few of these a year as preparing for one can take multiple days for me). I'd much rather have projects properly isolated to individual VMs which would address quite a lot of these problems and allow me to work much more efficiently, but that requires time.

If I had a separate system temporarily dedicated for working on this PR I could tackle it much more easily. Presently my goals are:

Get a new phone (the one I've used since 2018 broke a few days ago)
Finish up a documentation PR
Prepare for system reboot and hope my system boots (sometimes it appears bricked, or it fails in other ways).
Block out time for Vector and hope nothing interrupts getting to the sync point to update this PR branch.

Optimistically the system reboot will be completed by the 17th if nothing else interferes, so there should be a new push on this branch before the end of the month.

If that happens, I'd toggle of the draft status and ping for review/feedback. If I receive feedback in a timely manner progress will be quick from then on, but if there's a notable delay (over a week) I'd likely have switched to something else and it could take a while to return back to the PR. I'm slightly concerned about timing as we approach December, but hopefully we can get this PR back on track 💪

jcantrill · 2024-11-08T14:12:49Z

@StephenWakely you commented in the previous PR, is it possible for you to provide additional feedback? My limited familiarity with RUST makes my input less then useful

StephenWakely · 2024-11-13T15:02:48Z

Sorry, I had assumed that since the PR is in draft you weren't looking for feedback yet, but I can definitely take a look

polarathene · 2024-11-13T19:30:35Z

I had assumed that since the PR is in draft you weren't looking for feedback yet

Not after anything too deep, just a glance over by someone more familiar with Vector. My local branch still has some changes to get pushed up that refactor some of the current changes, so only a light review for providing guidance / expectations is needed.

I left it in draft due to not having found time to sync my local branch to this PR yet, and that the PR itself is still not ready to merge as it's not functioning as desired. I ran into a blocker on the local branch earlier this year and will bring that up to discuss here or on Discord once the branch is sync'd (presently the local changes are only partially committed / stashed).

I'm still on schedule per my prior comment timeline for returning to this PR next week. Ideally someone can look over and provide feedback before Tuesday.

polarathene

This is some parts I can recall that I would like to receive feedback on, I'll bring up other concerns next week.

polarathene · 2024-11-13T19:35:35Z

lib/codecs/Cargo.toml

@@ -10,11 +10,13 @@ name = "generate-avro-fixtures"
 path = "tests/bin/generate-avro-fixtures.rs"

 [dependencies]
+akin = { version = "0.4", optional = true }


I'd like to know if there's any policies about the additional crate deps I've added to support the PR. They're all marked optional and assigned to the syslog feature.

I could minimize the deps but it'd make the feature code itself less maintainer friendly.

Generally if it is needed and the license is ok then we are fine to pull it in like you have done here.

However, I don't really see that this crate is adding enough to warrant it's inclusion. It's only used in one place from what I can see. There it would be more idiomatic to just use a simple macro:

macro_rules! deserialize_impl { ($enum:ty) => { impl $enum { fn deserialize<'de, D>(deserializer: D) -> Result<Self, D::Error> ... } deserialize_impl!(Facility); deserialize_impl!(Severity);

lib/codecs/src/encoding/format/syslog.rs

polarathene · 2024-11-13T21:34:40Z

lib/codecs/src/encoding/format/syslog.rs

+/// Syslog serializer options.
+#[configurable_component]
+#[derive(Clone, Debug, Default)]
+// Serde default makes all config keys optional.
+// Each field assigns either a fixed value, or field name (lookup field key to retrieve dynamic value per `LogEvent`).
+#[serde(default)]
+pub struct SyslogSerializerOptions {
+    /// RFC
+    rfc: SyslogRFC,
+    /// Facility
+    #[serde(deserialize_with = "Facility::deserialize")]
+    facility: Facility,
+    /// Severity
+    #[serde(deserialize_with = "Severity::deserialize")]
+    severity: Severity,
+
+    /// App Name
+    app_name: Option<String>,
+    /// Proc ID
+    proc_id: Option<String>,
+    /// Msg ID
+    msg_id: Option<String>,
+
+    /// Payload key
+    payload_key: String,
+
+    // Q: The majority of the fields above pragmatically only make sense as config for keys to query?
+}


This was adapted from the original PR, payload_key has since been removed in favor of the actual Vector log message when available or could be set via VRL.

In my local branch I was looking into having most of these field that have an equivalent syslog struct field to instead share the same struct:

pub struct SyslogSerializerOptions { /// RFC rfc: SyslogRFC, #[serde(flatten)] #[configurable(derived)] priority: Pri, #[serde(flatten)] #[configurable(derived)] tag: Tag, }

There was another variation that instead of configuring overrides to the defaults, would use ConfigTargetPath as a more appropriate config approach by allowing the user to adjust the VRL keys to map into the syslog struct fields:

pub struct SyslogSerializerOptions { /// Facility #[default = "facility"] facility: ConfigTargetPath, /// Severity #[default = "severity"] severity: ConfigTargetPath, /// Application Name #[default = "appname"] app_name: ConfigTargetPath, /// Process ID #[default = "procid"] proc_id: ConfigTargetPath, /// Message ID #[default = "msgid"] msg_id: ConfigTargetPath, /// Hostname #[default = "hostname"] hostname: ConfigTargetPath, /// Message message: Option<ConfigTargetPath>, /// Timestamp timestamp: Option<ConfigTargetPath>, /// Structured Data structured_data: Option<ConfigTargetPath>, }

Probably the right way to go?

Really these values should ideally be extracted by their meaning from the event. I've added some links to the comments I made on the other PR about them.

I added some explanation of what we actually mean by meaning here. The concept of meaning is in a bit of transition in Vector at the minute, hence perhaps it's not completely clear how it all fits in - but hopefully in time it will!

StephenWakely

Thanks for your efforts in putting this together! I appreciate it has been a bit of journey.

Mostly the changes needed are to make it work better when we have log_namespace: true set.

There's a few other checks that need to pass in order to merge this. See here for more details.

Any questions, please give me a shout.

StephenWakely · 2024-11-14T11:27:31Z

lib/codecs/src/encoding/format/syslog.rs

+    rfc: SyslogRFC,
+    /// Facility
+    #[serde(deserialize_with = "Facility::deserialize")]
+    facility: Facility,


What I said here should apply to this.

StephenWakely · 2024-11-14T11:27:55Z

lib/codecs/src/encoding/format/syslog.rs

+    facility: Facility,
+    /// Severity
+    #[serde(deserialize_with = "Severity::deserialize")]
+    severity: Severity,


What I said here should apply to this.

StephenWakely · 2024-11-14T11:28:16Z

lib/codecs/src/encoding/format/syslog.rs

+    msg_id: Option<String>,
+
+    /// Payload key
+    payload_key: String,


#17668 (comment)

| Remove this option. This should always be the field with the meaning message.

I would disagree with this suggestion as it reduces the flexibility of using the protocol. This implies you always want the "message" portion of the payload to be what vector determines is the "message" and not for instance how an administrator may have transformed it in a prior pipeline to be something else. The default if not specified IMO should be "message" but allow the flexibility; there is no reason not to

This implies you always want the "message" portion of the payload to be what vector determines is the "message" and not for instance how an administrator may have transformed it in a prior pipeline to be something else.

Could you elaborate why you'd have introduced a difference in semantics? Please document a valid use-case for why you'd want such functionality so we know it's not an XY problem.

When would you transform the log message to not represent the message, yet provide it as input to an encoder to but require a different key for the actual message content?

I am familiar with the original PR use with add_source to compose metadata related to k8s with the actual message. You could do that with VRL, or for the metadata leverage structured data?

Could you elaborate why you'd have introduced a difference in semantics? Please document a valid use-case for why you'd want such functionality so we know it's not an XY problem.

Vector is configurable as a transformation engine. Logs coming in do not need to equal logs going out. Consider almost any case where my desired payload is a subset of the original message, a value in the message which was a JSON string. In a prior transform, I could parse the message and now it is available to me as key/value pairs, where I only care about some portion of the message.

By removing this option does it mean in some prior transform do I always have to modify the event and set the key "message" to the desired outcome? Maybe I'm missing where the encoding ends and transforms and sinks begin 🤷‍♂️

By removing this option does it mean in some prior transform do I always have to modify the event and set the key "message" to the desired outcome? Maybe I'm missing where the encoding ends and transforms and sinks begin 🤷‍♂️

Encoding is A => B, given input A produce output B (syslog formatted log in this case).

Perhaps I should phrase this a bit differently?:

If Vector supports encoding your input to other existing log encoders the way you want, you should expect to get the same with this syslog encoder.

If it does not, then there would need to be a very valid reason why this encoder is an exception vs what you'd otherwise need to do with any other encoder already supported by Vector.

As such, you should find that what you're requesting isn't specific to this encoder implemented via the PR, but something one would expect to be encoder agnostic.

That would be a separate feature request to raise, which may be rejected on the basis that VRL would be advised for your use-case to pre-process your input for the encoder instead.

I'm not an active Vector user myself yet, but from what I do understand message is an internal convention and the encoders rely on that to be semantically correct as input, thus you'd be expected to use existing Vector features to handle that prior to encoding.

@StephenWakely will chime in if I'm mistaken, and you're also welcome to correct me too.

StephenWakely · 2024-11-14T11:28:35Z

lib/codecs/src/encoding/format/syslog.rs

+    severity: Severity,
+
+    /// App Name
+    app_name: Option<String>,


#17668 (comment)

StephenWakely · 2024-11-14T11:28:52Z

lib/codecs/src/encoding/format/syslog.rs

+    /// App Name
+    app_name: Option<String>,
+    /// Proc ID
+    proc_id: Option<String>,


#17668 (comment)

StephenWakely · 2024-11-14T13:08:22Z

lib/codecs/src/encoding/format/syslog.rs

+        let y = |v| self.replace_if_proxied_opt(v);
+        let app_name = y(&config.app_name).unwrap_or("vector".to_owned());
+        let proc_id = y(&config.proc_id);
+        let msg_id = y(&config.msg_id);


These should all be set like:

field = config.value.unwrap_or_else(|| log.get_field_by_meaning("service"))

StephenWakely · 2024-11-14T13:10:07Z

lib/codecs/src/encoding/format/syslog.rs

+        let payload = if config.payload_key.is_empty() {
+            self.log.get_message().map(|v| v.to_string_lossy().to_string() )
+        } else {
+            self.value_by_key(&config.payload_key)
+        };
+
+        payload.unwrap_or_default()


No payload key, so just something like this should do:

Suggested change

let payload = if config.payload_key.is_empty() {

self.log.get_message().map(|v| v.to_string_lossy().to_string() )

} else {

self.value_by_key(&config.payload_key)

};

payload.unwrap_or_default()

self.log.get_message().map(|v| v.to_string_lossy().to_string())

StephenWakely · 2024-11-14T13:11:58Z

lib/codecs/src/encoding/format/syslog.rs

+                // https://datatracker.ietf.org/doc/html/rfc3164#section-4.1.2
+                // https://docs.rs/chrono/latest/chrono/format/strftime/index.html
+                //
+                // TODO: Should this remain as UTC or adjust to the local TZ of the environment (or Vector config)?


Keep it UTC.

StephenWakely · 2024-11-14T13:17:22Z

lib/codecs/src/encoding/format/syslog.rs

+            &self.message,
+        ].concat()
+
+        // Q: RFC 5424 MSG part should technically ensure UTF-8 message begins with BOM?


If you want the challenge you are welcome to embark on the journey to make this comply. However, I feel that this occurs in practice rarely enough that it won't be a blocker to getting this merged.

StephenWakely · 2024-11-14T13:33:51Z

lib/codecs/src/encoding/format/syslog.rs

+    }
+
+    fn get_structured_data(&self) -> Option<StructuredData> {
+        self.log.get("structured_data")


For the Vector namespace, (when log_namespace: true) we want to get this by meaning instead of name:

Suggested change

self.log.get("structured_data")

match self.log.namespace() {

LogNamespace::Vector => self.log.get_by_meaning("structured_data")

LogNamespace::Legacy => self.log.get("structured_data")

}

polarathene

Thanks for the speedy feedback! ❤️

I'll return to this next week to go over your feedback properly then and address it, cheers 😁

polarathene · 2024-11-14T19:28:54Z

lib/codecs/src/encoding/format/syslog.rs

+    //
+    // Q: Why `$.message.` as the prefix? (Appears to be JSONPath syntax?)
+    // NOTE: Originally named in PR as: `get_field_or_config()`
+    fn replace_if_proxied(&self, value: &str) -> Option<String> {


The original PR IIRC had implemented a feature to support config that overrides a default value (such as always X severity), presumably because the log event did not provide an equivalent mapping. Or the config key would reference a field to lookup and get the value of from the event, which I think their $.message prefix was specific to their own use-case.

I'd rather drop it, but wasn't sure how much of the original PR was the correct approach as I'm not familiar enough with Vector myself. Thus this PR focused on refactoring the functionality that was already in the existing PR and moving forward from there 😅

So this method was mostly a helper:

value .strip_prefix("$.message.") .map_or( Some(value.to_owned()), |field_key| self.value_by_key(field_key), ) } fn value_by_key(&self, field_key: &str) -> Option<String> { self.log.get(field_key).map(|field_value| { field_value.to_string_lossy().to_string() }) }

Used earlier like:

fn decant_config(&self, config: &SyslogSerializerOptions) -> SyslogMessage { let y = |v| self.replace_if_proxied_opt(v); let app_name = y(&config.app_name).unwrap_or("vector".to_owned()); let proc_id = y(&config.proc_id); let msg_id = y(&config.msg_id);

Since they're options, if no lookup key or default value was configured, y would return None:

proc_id and msg_id would later be formatted as -.

app_name default is meant to fallback to something IIRC so vector was the default chosen. In my local branch I used the SmartDefault crate to add a default to the actual syslog Tag struct for the app_name field instead.

I'm not familiar with log.get_field_by_meaning or if that was available when I started working on this PR early this year. I'll look into that if it's appropriate (I'm aware of this reference you shared in another feedback comment, I haven't gone over that yet).

IIRC this logic may have been refactored on my local branch.

jcantrill · 2024-11-15T16:04:48Z

@StephenWakely @polarathene

I don't fully understand all of what this entails but my general impression is the suggested changes are moving from flexible to an opinionated implementation, relying upon the impl to assume values. Maybe I'm reading incorrectly but I am pushing to ask for flexibility and configurability. The spec defines specific attributes and layout of a message but leaves quite a bit open for interpretation by an admin. As a user of vector in a cloud containerized environment that already is forwarding logs over syslog, we have use cases where it makes sense to allow administrators to set various fields.

The original PR as a straight port of a fluend plugin and what we have now for our vector implementation. Reviewing the syslog spec I can see where several config knobs should be dropped (e.g. tag, add_source) but IMO the others should remain

polarathene · 2024-11-15T21:00:42Z

my general impression is the suggested changes are moving from flexible to an opinionated implementation, relying upon the impl to assume values.
Maybe I'm reading incorrectly but I am pushing to ask for flexibility and configurability.

I haven't had time to go over full review feedback, but I'll do so next week when I return to sync my local changes to this branch.

Any input from me thus far in the review process is mostly focused on the encoder config settings which IMO were not implemented well in the original PR.
My primary focus will be on getting the features core functionality sorted for review and approval. At which point I'd reach out for subscribed users to test it out for feedback so that we can verify the feature is useful your use-cases.

Config wise, I think any property/field mapping can be handled via VRL with remap. We can discuss that in future if it's inconvenient and could be improved, but given my limited time I hope you understand why I don't want too much bike shedding over that right now.

The spec defines specific attributes and layout of a message but leaves quite a bit open for interpretation by an admin.
As a user of vector in a cloud containerized environment that already is forwarding logs over syslog, we have use cases where it makes sense to allow administrators to set various fields.

When this PR reaches the next milestone (when I ping for proper review / testing), I'd be happy to discuss this with you here. As per my review on the original PR, it should be evident that I care about respecting the spec. Anything that should be - (NIL) when not present is handled, and you should have the ability to override each field IIRC, just via Vector remap transforms.

FWIW: My interest in Vector with this feature is also for usage in a container.

The original PR is a straight port of a fluentd plugin and what we have now for our vector implementation.

This PR adapted the original, you can follow the commit history for logical commits with descriptions that reason each change.

IIRC the original had various concerns that I've addressed or intend to (these should be detailed in the commit messages and/or inline commentary).

I remember the add_source functionality and dropping it here in one of the commits as it seemed to be redundant (could be handled via VRL), thus I think most of your concerns aren't likely to be a problem and you'd still have the control/flexibility for your requirements, it just may look a little different config wise.

StephenWakely · 2024-11-18T13:12:26Z

@StephenWakely @polarathene

I don't fully understand all of what this entails but my general impression is the suggested changes are moving from flexible to an opinionated implementation, relying upon the impl to assume values. Maybe I'm reading incorrectly but I am pushing to ask for flexibility and configurability.

It's pushing the flexibility back into VRL, which we already have. This codec should be opinionated about what it expects the log message to look like. Any flexibility you need should be handled in a remap transform.

pront · 2024-11-18T14:36:43Z

@StephenWakely @polarathene
I don't fully understand all of what this entails but my general impression is the suggested changes are moving from flexible to an opinionated implementation, relying upon the impl to assume values. Maybe I'm reading incorrectly but I am pushing to ask for flexibility and configurability.

It's pushing the flexibility back into VRL, which we already have. This codec should be opinionated about what it expects the log message to look like. Any flexibility you need should be handled in a remap transform.

This sounds like a good strategy in general. Remap can do any kind of event transformation you need. The encoder should specify the interface it expects so it can efficiently batch and send requests downstream.

syedriko and others added 16 commits March 15, 2024 17:23

feat: Add syslog codec

1ddf6af

Original commit from syedriko

chore: Split syslog encoder into separate files

7407f7b

This is only a temporary change to make the diffs for future commits easier to follow.

chore: Merge back into syslog.rs

1049ebd

No changes beyond relocating the code into a single file.

feat: Add StructuredData support to Syslog encoder

3cdc1b4

chore: Use snafu for error message

ed202bb

Not sure if this is worthwhile, but it adopts error message convention elsewhere I've seen by managing them via Snafu.

github-actions bot added domain: sinks Anything related to the Vector's sinks domain: codecs Anything related to Vector's codecs (encoding/decoding) labels Sep 18, 2024

polarathene mentioned this pull request Sep 18, 2024

feat(codecs): new syslog codec #17668

Closed

polarathene mentioned this pull request Nov 8, 2024

New syslog sink #6863

Open

pront marked this pull request as ready for review November 13, 2024 15:15

pront requested a review from a team as a code owner November 13, 2024 15:15

polarathene commented Nov 13, 2024

View reviewed changes

StephenWakely requested changes Nov 14, 2024

View reviewed changes

polarathene commented Nov 14, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(codecs): Add syslog encoder #21307

feat(codecs): Add syslog encoder #21307

polarathene commented Sep 18, 2024 •

edited

Loading

bits-bot commented Sep 18, 2024 •

edited

Loading

gaby commented Nov 7, 2024

polarathene commented Nov 8, 2024

jcantrill commented Nov 8, 2024

StephenWakely commented Nov 13, 2024

polarathene commented Nov 13, 2024 •

edited

Loading

polarathene left a comment

polarathene Nov 13, 2024

StephenWakely Nov 14, 2024

polarathene Nov 13, 2024

StephenWakely Nov 14, 2024

StephenWakely left a comment •

edited

Loading

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

jcantrill Nov 15, 2024

polarathene Nov 15, 2024

jcantrill Nov 15, 2024 •

edited

Loading

polarathene Nov 15, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

StephenWakely Nov 14, 2024

polarathene left a comment

polarathene Nov 14, 2024

jcantrill commented Nov 15, 2024

polarathene commented Nov 15, 2024

StephenWakely commented Nov 18, 2024

pront commented Nov 18, 2024

-        self.log.get("structured_data")
+  match self.log.namespace() {
+       LogNamespace::Vector => self.log.get_by_meaning("structured_data")
+       LogNamespace::Legacy => self.log.get("structured_data")
+  }

feat(codecs): Add syslog encoder #21307

Are you sure you want to change the base?

feat(codecs): Add syslog encoder #21307

Conversation

polarathene commented Sep 18, 2024 • edited Loading

bits-bot commented Sep 18, 2024 • edited Loading

gaby commented Nov 7, 2024

polarathene commented Nov 8, 2024

Context

PR was opened due to demand

My previous input

Personal friction

jcantrill commented Nov 8, 2024

StephenWakely commented Nov 13, 2024

polarathene commented Nov 13, 2024 • edited Loading

polarathene left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StephenWakely left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jcantrill Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

polarathene left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jcantrill commented Nov 15, 2024

polarathene commented Nov 15, 2024

StephenWakely commented Nov 18, 2024

pront commented Nov 18, 2024

polarathene commented Sep 18, 2024 •

edited

Loading

bits-bot commented Sep 18, 2024 •

edited

Loading

polarathene commented Nov 13, 2024 •

edited

Loading

StephenWakely left a comment •

edited

Loading

jcantrill Nov 15, 2024 •

edited

Loading