
Fixing timestamp manually seems to work in VRL but throws errors when applied to Vector config #21812

Closed
michellabbe opened this issue Nov 15, 2024 · 11 comments
Labels
type: bug A code related bug.

Comments

@michellabbe

michellabbe commented Nov 15, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I'm exporting Traefik access logs in a JSON file, and using Vector (timberio/vector:latest-alpine docker image) to forward the logs to a Graylog server.

The configuration below works fine, except that the timestamp in the access log isn't recognized.

Traefik saves an ISO 8601 timestamp in a field named time, and the final timestamp in Graylog differs from the time field:

message = [...],"time":"2024-11-15T08:53:48-05:00"}
time      = 2024-11-15 08:53:48.000  (field type = date)
timestamp = 2024-11-15 08:53:58.191  (field type = date)

While I could customize Traefik's log format to rename time to timestamp, that would force me to maintain the custom format across any change. It would be much easier to teach Vector to use the default time field, and at first that seemed very easy to do.

Copying time to timestamp in VRL seems to work:

/ # vector vrl --input /var/log/traefik/access.log
[...]

$ .timestamp = .time
"2024-11-07T22:15:13-05:00"

$ .time
"2024-11-07T22:15:13-05:00"

$ .timestamp
"2024-11-07T22:15:13-05:00"

$ .
[...], "time": "2024-11-07T22:15:13-05:00", "timestamp": "2024-11-07T22:15:13-05:00" }

When applied to the Vector config file (uncommenting the line in the Configuration section below), the Vector container log shows this error on the next access log entry:

2024-11-15T13:58:42.676883Z ERROR sink{component_kind="sink" component_id=graylog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Failed serializing frame." internal_log_rate_limit=true
2024-11-15T13:58:42.676835Z ERROR sink{component_kind="sink" component_id=graylog component_type=socket}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "null" expected type = "timestamp or integer" error_code="encoder_serialize" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true

Edit: same behavior when renaming the field with .timestamp = del(.time) instead of copying it.

The time field is defined in exactly the same way and works fine, so this looks like a bug in Vector.

There doesn't seem to be a way to specify type = timestamp, so let's try parse_timestamp instead.
Again, in VRL it seems to work:

$ .timestamp = parse_timestamp!(.time, format: "%+")
t'2024-11-08T03:15:13Z'

$ .time
"2024-11-07T22:15:13-05:00"

$ .timestamp
t'2024-11-08T03:15:13Z'

$ .
[...], "time": "2024-11-07T22:15:13-05:00", "timestamp": t'2024-11-08T03:15:13Z' }

That's a long shot, but let's try it anyway.

Also note that the parsed timestamp got converted to UTC, which doesn't seem to be reversible before sinking it (ref: #3333).
While I prefer to keep times in the local timezone as much as possible for readability, this shouldn't be an issue since the offset is included in the value.

When applied to the Vector config file, the Vector container log shows this error on the next access log entry:

2024-11-15T14:21:57.525859Z ERROR transform{component_kind="transform" component_id=cf_traefik component_type=remap}: vector::internal_events::remap: Mapping failed with event. error="function call error for \"parse_timestamp\" at (83:120): unable to convert value to timestamp" error_type="conversion_failed" stage="processing" internal_log_rate_limit=true
2024-11-15T14:21:57.525898Z ERROR transform{component_kind="transform" component_id=cf_traefik component_type=remap}: vector::internal_events::remap: Internal log [Mapping failed with event.] is being suppressed to avoid flooding.

I thought maybe the timestamp field was treated specially, so I tried the same functions with other field names, with the same results.

Configuration

# Set global options
data_dir: "/var/lib/vector"
#timezone: "local"
timezone: "America/New_York"

sources:
  traefik_logs:
    type: "file"
    include:
      - "/var/log/traefik/access.log"
    ignore_older_secs: 3600      # 1 hour

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - cf_*
    source: |
      .host_name = "${HOST_NAME}"
      .agent_type = "vector"

  cf_traefik:
    type: "remap"
    inputs:
      - traefik_logs
    source: |
      .event_kind = "traefik"
      .service_type = "traefik"
      #.timestamp = .time
      #.timestamp = parse_timestamp!(.time, format: "%+")
      #.timestamp = parse_timestamp!(.time, format: "%Y-%m-%dT%H:%M:%S%.f%:z")
      #.timestamp_msg = .time
      #.timestamp_msg = parse_timestamp!(.time, format: "%+")
      #.timestamp_msg = parse_timestamp!(.time, format: "%Y-%m-%dT%H:%M:%S%.f%:z")

sinks:
  graylog:
    type: "socket"
    inputs:
      - custom_fields
    address: "graylog.homelab.lan:12201"
    mode: "udp"
    encoding:
      codec: "gelf"
      timestamp_format: "rfc3339"

Version

0.42.0

Debug Output

No response

Example Data

{ "ClientAddr": "192.168.0.33:54978", "ClientHost": "192.168.0.33", "ClientPort": "54978", "ClientUsername": "-", "DownstreamContentSize": 569679, "DownstreamStatus": 200, "Duration": 699781673, "OriginContentSize": 569679, "OriginDuration": 699595289, "OriginStatus": 200, "Overhead": 186384, "RequestAddr": "cadvisor.docker2.mlabbe.lan:443", "RequestContentSize": 0, "RequestCount": 1, "RequestHost": "cadvisor.docker2.mlabbe.lan", "RequestMethod": "GET", "RequestPath": "/metrics", "RequestPort": "443", "RequestProtocol": "HTTP/2.0", "RequestScheme": "https", "RetryAttempts": 0, "RouterName": "cadvisor@docker", "ServiceAddr": "192.168.80.6:8080", "ServiceName": "cadvisor-service@docker", "ServiceURL": "http://192.168.80.6:8080", "SpanId": "0000000000000000", "StartLocal": "2024-11-07T22:15:13.010239743-05:00", "TLSCipher": "TLS_AES_128_GCM_SHA256", "TLSVersion": "1.3", "TraceId": "00000000000000000000000000000000", "downstream_Content-Encoding": "gzip", "downstream_Content-Type": "text/plain; version=0.0.4; charset=utf-8", "downstream_Date": "Fri, 08 Nov 2024 03:15:13 GMT", "entryPointName": "websecure", "level": "info", "msg": "", "origin_Content-Encoding": "gzip", "origin_Content-Type": "text/plain; version=0.0.4; charset=utf-8", "origin_Date": "Fri, 08 Nov 2024 03:15:13 GMT", "request_Accept": "application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2", "request_Accept-Encoding": "gzip", "request_User-Agent": "Prometheus/2.55.0", "request_X-Forwarded-Host": "cadvisor.docker2.mlabbe.lan:443", "request_X-Forwarded-Port": "443", "request_X-Forwarded-Proto": "https", "request_X-Forwarded-Server": "traefik", "request_X-Prometheus-Scrape-Timeout-Seconds": "45", "request_X-Real-Ip": "192.168.0.33", "time": "2024-11-07T22:15:13-05:00" }

Additional Context

No response

References

No response

michellabbe added the type: bug label on Nov 15, 2024
@pront
Member

pront commented Nov 15, 2024

Hi @michellabbe, thank you for providing all the details. However, I set this up locally and I cannot reproduce the issue.

# Set global options
data_dir: /path/to/data_dir_0
#timezone: "local"
timezone: "America/New_York"

sources:
  traefik_logs:
    type: "file"
    include:
      - path/to/traefik-sample.txt
#    ignore_older_secs: 3600      # 1 hour
    ignore_checkpoints: true
    offset_key: offset
    read_from: beginning

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - cf_*
    source: |
      .host_name = "foo"
      .agent_type = "vector"

  cf_traefik:
    type: "remap"
    inputs:
      - traefik_logs
    source: |
      .event_kind = "traefik"
      .service_type = "traefik"

      .time = to_unix_timestamp(parse_timestamp!(.time, format: "%+"))
      #.timestamp = to_unix_timestamp(parse_timestamp!(.time, format: "%+"))

sinks:
  graylog:
    type: "socket"
    inputs:
      - custom_fields
    address: 127.0.0.1:12201 # dummy python script
    mode: "udp"
    encoding:
      codec: "gelf"
      timestamp_format: "rfc3339"

  console:
    inputs:
      - custom_fields
    target: stdout
    type: console
    encoding:
      codec: "gelf"
    buffer:
      type: memory
      max_events: 500
      when_full: block
cargo run --color=always -- --config path/to/config.yml
/usr/bin/python3 pront/scripts/socket_sink.py 
Listening on 127.0.0.1:12201
Received message from ('127.0.0.1', 61171): {"_agent_type":"vector","_host_name":"foo","_offset":0,"_source_type":"file","file":"/Users/pavlos.rontidis/CLionProjects/vector/pront/data/traefik-sample.txt","host":"COMP-LPF0JYPP2Q","short_message":"{ \"ClientAddr\": \"192.168.0.33:54978\", \"ClientHost\": \"192.168.0.33\", \"ClientPort\": \"54978\", \"ClientUsername\": \"-\", \"DownstreamContentSize\": 569679, \"DownstreamStatus\": 200, \"Duration\": 699781673, \"OriginContentSize\": 569679, \"OriginDuration\": 699595289, \"OriginStatus\": 200, \"Overhead\": 186384, \"RequestAddr\": \"cadvisor.docker2.mlabbe.lan:443\", \"RequestContentSize\": 0, \"RequestCount\": 1, \"RequestHost\": \"cadvisor.docker2.mlabbe.lan\", \"RequestMethod\": \"GET\", \"RequestPath\": \"/metrics\", \"RequestPort\": \"443\", \"RequestProtocol\": \"HTTP/2.0\", \"RequestScheme\": \"https\", \"RetryAttempts\": 0, \"RouterName\": \"cadvisor@docker\", \"ServiceAddr\": \"192.168.80.6:8080\", \"ServiceName\": \"cadvisor-service@docker\", \"ServiceURL\": \"http://192.168.80.6:8080\", \"SpanId\": \"0000000000000000\", \"StartLocal\": \"2024-11-07T22:15:13.010239743-05:00\", \"TLSCipher\": \"TLS_AES_128_GCM_SHA256\", \"TLSVersion\": \"1.3\", \"TraceId\": \"00000000000000000000000000000000\", \"downstream_Content-Encoding\": \"gzip\", \"downstream_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"downstream_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"entryPointName\": \"websecure\", \"level\": \"info\", \"msg\": \"\", \"origin_Content-Encoding\": \"gzip\", \"origin_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"origin_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"request_Accept\": \"application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2\", \"request_Accept-Encoding\": \"gzip\", \"request_User-Agent\": \"Prometheus/2.55.0\", \"request_X-Forwarded-Host\": \"cadvisor.docker2.mlabbe.lan:443\", \"request_X-Forwarded-Port\": \"443\", \"request_X-Forwarded-Proto\": \"https\", \"request_X-Forwarded-Server\": \"traefik\", \"request_X-Prometheus-Scrape-Timeout-Seconds\": \"45\", \"request_X-Real-Ip\": \"192.168.0.33\", \"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693265.424,"version":"1.1"}
Received message from ('127.0.0.1', 55630): {"_agent_type":"vector","_event_kind":"traefik","_host_name":"foo","_offset":0,"_service_type":"traefik","_source_type":"file","file":"/Users/pavlos.rontidis/CLionProjects/vector/pront/data/traefik-sample.txt","host":"COMP-LPF0JYPP2Q","short_message":"{ \"ClientAddr\": \"192.168.0.33:54978\", \"ClientHost\": \"192.168.0.33\", \"ClientPort\": \"54978\", \"ClientUsername\": \"-\", \"DownstreamContentSize\": 569679, \"DownstreamStatus\": 200, \"Duration\": 699781673, \"OriginContentSize\": 569679, \"OriginDuration\": 699595289, \"OriginStatus\": 200, \"Overhead\": 186384, \"RequestAddr\": \"cadvisor.docker2.mlabbe.lan:443\", \"RequestContentSize\": 0, \"RequestCount\": 1, \"RequestHost\": \"cadvisor.docker2.mlabbe.lan\", \"RequestMethod\": \"GET\", \"RequestPath\": \"/metrics\", \"RequestPort\": \"443\", \"RequestProtocol\": \"HTTP/2.0\", \"RequestScheme\": \"https\", \"RetryAttempts\": 0, \"RouterName\": \"cadvisor@docker\", \"ServiceAddr\": \"192.168.80.6:8080\", \"ServiceName\": \"cadvisor-service@docker\", \"ServiceURL\": \"http://192.168.80.6:8080\", \"SpanId\": \"0000000000000000\", \"StartLocal\": \"2024-11-07T22:15:13.010239743-05:00\", \"TLSCipher\": \"TLS_AES_128_GCM_SHA256\", \"TLSVersion\": \"1.3\", \"TraceId\": \"00000000000000000000000000000000\", \"downstream_Content-Encoding\": \"gzip\", \"downstream_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"downstream_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"entryPointName\": \"websecure\", \"level\": \"info\", \"msg\": \"\", \"origin_Content-Encoding\": \"gzip\", \"origin_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"origin_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"request_Accept\": \"application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2\", \"request_Accept-Encoding\": \"gzip\", \"request_User-Agent\": \"Prometheus/2.55.0\", \"request_X-Forwarded-Host\": \"cadvisor.docker2.mlabbe.lan:443\", \"request_X-Forwarded-Port\": \"443\", \"request_X-Forwarded-Proto\": \"https\", \"request_X-Forwarded-Server\": \"traefik\", \"request_X-Prometheus-Scrape-Timeout-Seconds\": \"45\", \"request_X-Real-Ip\": \"192.168.0.33\", \"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693356.305,"version":"1.1"}

A few things to note here:

  • If the timestamp field is absent:
    • Vector will generate one, e.g. "timestamp":1731693356.305 (a sketch of setting it explicitly follows below)
    • time is a string and will be sent as-is
  • You can use the VRL playground for quick tests
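
For reference, a minimal remap sketch (my own illustration, assuming .time already exists as a top-level RFC 3339 string on the event) that sets .timestamp explicitly so the GELF encoder does not fall back to a generated value:

# assumes .time is a top-level string such as "2024-11-07T22:15:13-05:00"
.timestamp = parse_timestamp!(.time, format: "%+")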

@michellabbe
Author

Sorry for the long delay.

I'm not sure if I'm missing something from your tests.

Your test shows that a timestamp field is indeed sent, but it still doesn't match the event timestamp in the time field.

\"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693356.305
1731693356.305 = 2024-11-15T12:55:56-05:00

So that message would appear offset by several days in Graylog.

While the timestamp could be fixed using a processing pipeline in Graylog, it's best when log shippers send data already processed so we don't end up wasting a lot of resources on the Graylog server.

My testing works in the VRL playground (screenshot omitted) and also in vector vrl inside the container (as explained in my original post).

However, when I apply the same syntax in vector.yaml it throws an error, whether I'm trying to fix timestamp directly or just create a new string field.

@pront
Member

pront commented Dec 2, 2024

Your test shows that a timestamp field is indeed sent, but it still doesn't match the event timestamp in the time field.

To be clear, there is no link between these two fields. The .time field is not special; you have to set it to whatever value/format you want. However, .timestamp is a bit special because some components may set it if no value is present.

The original error was unable to convert value to timestamp, so some logs might not contain valid, parsable dates.

Since the examples with fake dates work, you will need to debug what happens with real logs.
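
A hedged debugging sketch (my suggestion, not something from this thread): use the non-aborting form of parse_timestamp so that lines with an unparsable time are logged instead of failing the whole remap:

# fallible call without `!`: on failure ts is null and err holds the error message
ts, err = parse_timestamp(.time, format: "%+")
if err != null {
  # surface the offending value in Vector's own log
  log("unparsable time field: " + encode_json(.time), level: "warn")
} else {
  .timestamp = ts
}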

@michellabbe
Author

It's not about the .time value. The very same VRL code works in the playground but not in real life, or I'm really missing how to implement it properly (which could very well be the case, as I only started using Vector recently).

Let's forget about .time and .timestamp for a minute, and simplify my example.

Here I'm only trying to duplicate a field named field_1 into cf1 (both strings, no conversion). This should be simple.

Create test.log file:

{ "field_1": "test", "field_2": "2024-11-07T22:15:13-05:00" }

Create vector.yaml file:

sources:
  test_log:
    type: "file"
    include:
      - "/var/log/test.log"

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - test_log
    source: |
      .cf1 = .field_1

sinks:
  console:
    inputs:
      - custom_fields
    target: stdout
    type: console
    encoding:
      #codec: "gelf"
      codec: "json"
    buffer:
      type: memory
      max_events: 500
      when_full: block

(I found that using the json codec makes debugging easier, as it shows actual values instead of a GELF parsing error.)

Start the Vector container, using the two files we just created:

docker run --rm \
  -v ./vector.yaml:/etc/vector/vector.yaml:ro \
  -v ./test.log:/var/log/test.log:ro \
  timberio/vector:0.43.1-alpine

The transform that works in the playground returns a null value for cf1 here, instead of the "test" string value:

2024-12-11T10:52:23.340032Z  INFO vector::app: Log level is enabled. level="info"
2024-12-11T10:52:23.340480Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2024-12-11T10:52:23.343145Z  INFO vector::topology::running: Running healthchecks.
2024-12-11T10:52:23.343202Z  INFO vector: Vector has started. debug="false" version="0.43.1" arch="x86_64" revision="e30bf1f 2024-12-10 16:14:47.175528383"
2024-12-11T10:52:23.343229Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2024-12-11T10:52:23.343234Z  INFO vector::topology::builder: Healthcheck passed.
2024-12-11T10:52:23.343352Z  INFO source{component_kind="source" component_id=test_log component_type=file}: vector::sources::file: Starting file server. include=["/var/log/test.log"] exclude=[]
2024-12-11T10:52:23.343664Z  INFO source{component_kind="source" component_id=test_log component_type=file}:file_server: file_source::checkpointer: Attempting to read legacy checkpoint files.
2024-12-11T10:52:23.344786Z  INFO source{component_kind="source" component_id=test_log component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=/var/log/test.log
{"cf1":null,"file":"/var/log/test.log","host":"8c5e7be45230","message":"{ \"field_1\": \"test\", \"field_2\": \"2024-11-07T22:15:13-05:00\" }","source_type":"file","timestamp":"2024-12-11T10:52:23.344943760Z"}

It looks like the JSON parsing of the source file isn't done before the transform is applied (JSON parsing seems to happen at the sink level only), but I'm not sure what I'm missing in the config.

I don't see anything in particular about it in your test setup file either, but you didn't seem to test with Docker (which by design eliminates "I can't reproduce on my side" issues).

I tried adding File Descriptor, but I can't figure out (yet) the proper syntax for this test case. The single real-life example provided isn't much help when trying to use this for the first time.
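
One quick way to check the hypothesis above (my own suggestion, not something from this thread) is to dump the whole event from inside the remap and see whether field_1 exists at the top level:

# temporarily add this to the transform's source; the event structure shows up in Vector's own log
log(., level: "info")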

@jszwedko
Member

jszwedko commented Dec 11, 2024

@michellabbe one issue I see here is that you are not parsing the input as JSON, so the remapping you are doing has no effect (because there is no .field_1). Instead, parse the input as JSON first and then remap the field, like:

. = merge(., object!(parse_json!(.message)))
.cf1 = .field_1
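
For context, a sketch of how this fits into the custom_fields transform from the test config above (same code, just dropped into the remap source, with comments):

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - test_log
    source: |
      # the raw line arrives in .message; parse it and merge the parsed fields into the event
      . = merge(., object!(parse_json!(.message)))
      .cf1 = .field_1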

@jszwedko
Member

vector tap can be useful for seeing the structure of events flowing into a component.

@michellabbe
Author

michellabbe commented Dec 11, 2024

 . = merge(., object!(parse_json!(.message)))
 .cf1 = .field_1

That helps a lot, thanks!

It works with my last example.

However, when applying the graylog sink, the GELF encoder now complains about wrong field types for other fields that also got parsed by the merge in the transform:

2024-12-11T13:20:59.484698Z ERROR sink{component_kind="sink" component_id=graylog component_type=socket}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "level" type = "string" expected type = "integer" error_code="encoder_serialize" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true

Is it possible to extract only the specific field I need when JSON-parsing another field (message), instead of parsing all fields and then working with the extracted fields?

something like (obviously wrong syntax):

.cf1 = parse_json!(.field_1, object!(.message))

@jszwedko
Member

Is it possible to extract only the specific field I need when JSON-parsing another field (message), instead of parsing all fields and then working with the extracted fields?

something like (obviously wrong syntax):

.cf1 = parse_json!(.field_1, object!(.message))

Yeah, that would work. I'd suggest assigning to a variable and then only extracting the fields you want like:

parsed = object!(parse_json!(.message))

.foo = parsed.bar

@pront
Member

pront commented Dec 11, 2024

It seems like Jesse already got to the bottom of this 👍

I just wanted to share a simple config that I personally find useful when debugging:

api:
  enabled: true

sources:
  demo_logs_1:
    type: demo_logs
    format: json


transforms:
  remap0:
    type: remap
    inputs: ["demo_logs_1"]
    source: |
      # modify JSON demo log
      .foo = 1

sinks:
  console:
    inputs: [ "remap0" ]
    type: console
    encoding:
      codec: json
      json:
        pretty: true

Enabling the API will allow you to use vector tap.
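
For example, with the API enabled as above, something like the following (a hedged example; remap0 is the component id from the config above, and vector tap connects to the default API address) prints events flowing out of that component:

vector tap remap0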

Based on this discussion I am confident this is not a bug in Vector, so I will close this issue. But feel free to follow up.

pront closed this as completed on Dec 11, 2024
@michellabbe
Author

Yeah, that would work. I'd suggest assigning to a variable and then only extracting the fields you want like:

parsed = object!(parse_json!(.message))
.foo = parsed.bar

I confirm it works (and it's definitely not a bug in Vector). Thanks again!

  cf_traefik:
    type: "remap"
    inputs:
      - traefik_logs
    source: |
      .event_kind = "traefik"
      .service_type = "traefik"
      parsed_message = object!(parse_json!(.message))
      .timestamp = parse_timestamp!(parsed_message.time, format: "%+")

Using the example data from my original post, the timestamp is now handled properly in Graylog:

message = [...], "time": "2024-11-07T22:15:13-05:00" }
time      = 2024-11-07 22:15:13.000  (field type = date)
timestamp = 2024-11-07 22:15:13.000  (field type = date)

@michellabbe
Author

I just wanted to share a simple config that I personally find useful when debugging:

Thanks for sharing too.

I'll need to try out demo_logs and take a look at vector tap later when I get more time, but the pretty: true part alone is already very much appreciated.

As mentioned, I only started playing with Vector recently, looking for a replacement for Filebeat. The documentation is full of resources, but implementations are not always easy to figure out as a new user (it lacks real-life examples to put things in context). However, I have to say I keep being impressed as I play with it :)
