
Support checkpointing and reading old logs with docker_logs source #7358

Open
pgassmann opened this issue May 6, 2021 · 10 comments
Labels
source: docker_logs · type: enhancement

Comments

@pgassmann

Vector Version

vector 0.13.1 (v0.13.1 x86_64-unknown-linux-gnu 2021-04-29)

Expected Behavior

  • The docker_logs source supports checkpointing and does not miss logs.

  • When starting, Vector reads all available logs as well as all new logs. If Vector is stopped and started again, the logs produced while it was stopped are read and sent to the configured sink.

Actual Behavior

Logs are only read from the moment Vector is started.
When Vector is restarted, the logs the containers produced in the meantime are missing.
When the host is rebooted and the Docker containers are started before Vector, the startup logs are missing.

Example Data

Additional Context

In the description of #1107, there is this point:

  • docker source explicitly only looks at records starting at Vector start time, so no changes required

@jszwedko This is not explicitly mentioned in the docker source documentation. https://vector.dev/docs/reference/configuration/sources/docker_logs/#how-it-works

References

Vector Configuration File

# Set global options
data_dir = "/var/lib/vector"

# Read logs from Docker API and Send to loki sink

[sources.docker-local]
  type = "docker_logs"
  docker_host = "/var/run/docker.sock"
  exclude_containers = []

  # Identify zero-width space as first line of a multiline block.
  multiline.condition_pattern = '^\x{200B}' # required
  multiline.mode = "halt_before" # required
  multiline.start_pattern = '^\x{200B}' # required
  multiline.timeout_ms = 1000 # required, milliseconds

[sinks.loki]
  # General
  type = "loki" # required
  inputs = ["docker-local"] # required
  endpoint = "https://logs.example.com:443" # required


  # Auth
  auth.strategy = "basic" # required
  auth.user = "" # required
  auth.password = "" # required

  # Encoding
  encoding.codec = "json" # required

  # Healthcheck
  healthcheck.enabled = true # optional, default


  # Labels
  labels.forwarder = 'vector'
  labels.host = '{{ host }}'
  labels.container_name = '{{ container_name }}'
  labels.compose_service = '{{ label.com\.docker\.compose\.service }}'
  labels.compose_project = '{{ label.com\.docker\.compose\.project }}'
  labels.source = '{{ stream }}'
  labels.category = 'dockerlogs'
@pgassmann pgassmann added the type: bug label on May 6, 2021
@pgassmann
Author

pgassmann commented May 6, 2021

Related to #7336

@jszwedko jszwedko added the source: docker_logs and type: enhancement labels and removed the type: bug label on May 6, 2021
@jszwedko
Member

jszwedko commented May 6, 2021

Thanks @pgassmann. I agree this would be good to do. I updated it to an enhancement as I believe the current behavior is intentional (albeit undocumented), given that the documentation also doesn't state the inverse, i.e. that it would pick up logs from before Vector started.

@jszwedko jszwedko changed the title from "docker logs are missing when restarting vector. no checkpointing for docker source" to "Support checkpointing and reading old logs with docker_logs source" on May 6, 2021
@pgassmann
Author

@jszwedko what are the plans for implementing checkpointing and history for docker logs? We are currently experiencing issues with Vector sending logs from Docker to Loki, and because of this missing feature we are losing hours of logs.

@jszwedko
Member

jszwedko commented Nov 9, 2021

@pgassmann unfortunately no movement on this yet. You could consider eschewing the docker_logs source and using the file source with the container log files. This would give you checkpointing, of course, but would be missing the container metadata that the docker_logs source adds.
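A minimal sketch of that file-based workaround, assuming the default json-file log driver and the standard container log path (the source/transform names and the remap step are illustrative, not taken from this thread):

# Sketch: tail the container log files directly. The file source checkpoints
# its read position under data_dir, so no lines are lost across Vector restarts.
[sources.docker-files]
  type = "file"
  include = ["/var/lib/docker/containers/*/*-json.log"]

# The json-file driver wraps each line as {"log": ..., "stream": ..., "time": ...},
# so the raw message needs to be parsed. Container metadata such as the container
# name or labels is not available here and would have to be added separately.
[transforms.docker-files-parsed]
  type = "remap"
  inputs = ["docker-files"]
  source = '''
  . = parse_json!(string!(.message))
  '''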

@pgassmann
Author

The latest release of Loki/promtail has support for Docker service discovery, which seems to combine service discovery through the API with reading logs from the JSON files, and supports checkpointing.
https://grafana.com/docs/loki/latest/clients/promtail/configuration/#docker_sd_config

That's a major selling point for moving back to promtail, as with Vector we currently cannot guarantee that all Docker logs are shipped to Loki.

@pgassmann
Author

pgassmann commented Jun 3, 2022

Checkpointing should be quite trivial to implement by using the since option of the logs API. Currently it is always set to "now", but it could be set to the last known timestamp, or to a timestamp calculated as now - max_lookback_duration.

self.docker.events(Some(EventsOptions {
    since: Some(self.now_timestamp),
    until: None,
    filters,
}))

The docker_logs source just has to keep track of the last read timestamp per container. This could be part of acknowledgements (#7650), i.e. only update the checkpoint timestamp once the event is confirmed by the sink.

After a restart of Vector, it should query the logs for all containers with saved checkpoints, even if they are no longer running (e.g. stopped after Vector was stopped); their remaining logs can still be queried from the Docker API.
For containers that are not yet known, Vector should also query a configurable amount of time before "now" (max_lookback_duration), e.g. for a container that was created or started shortly before Vector.
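A rough sketch of how that could look against the Docker logs API, using the bollard crate that the snippet above appears to come from; the Checkpoints store, since_for, and the hard-coded 4-hour lookback are hypothetical names for illustration, not part of Vector:

use std::collections::HashMap;

use bollard::container::{LogOutput, LogsOptions};
use bollard::errors::Error;
use bollard::Docker;
use futures_util::{pin_mut, StreamExt};

// Hypothetical per-container checkpoint store: the last acknowledged
// log timestamp (unix seconds) for each container id.
struct Checkpoints {
    last_ack: HashMap<String, i64>,
}

impl Checkpoints {
    // Resume from the last acknowledged timestamp; for containers seen for the
    // first time, fall back to now - max_lookback_secs.
    fn since_for(&self, container_id: &str, now: i64, max_lookback_secs: i64) -> i64 {
        self.last_ack
            .get(container_id)
            .copied()
            .unwrap_or(now - max_lookback_secs)
    }
}

async fn stream_container_logs(
    docker: &Docker,
    checkpoints: &mut Checkpoints,
    container_id: &str,
    now: i64,
) -> Result<(), Error> {
    // Ask the logs API for everything since the checkpoint instead of "now".
    let options = LogsOptions::<String> {
        follow: true,
        stdout: true,
        stderr: true,
        timestamps: true,
        since: checkpoints.since_for(container_id, now, 4 * 3600),
        ..Default::default()
    };

    let logs = docker.logs(container_id, Some(options));
    pin_mut!(logs);
    while let Some(line) = logs.next().await {
        let _line: LogOutput = line?;
        // Forward the line into the pipeline here. The checkpoint should only be
        // advanced once the sink has acknowledged the event (see #7650), e.g.:
        // checkpoints.last_ack.insert(container_id.to_string(), acked_timestamp);
    }
    Ok(())
}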

cc @bruceg

@pgassmann
Author

@jszwedko @bruceg Can someone please look into this and give feedback on my suggestion? Vector again lost some important logs during a migration because of this and #16806.

@jszwedko
Member

@jszwedko @bruceg Can someone please look into this and give feedback on my suggestion? Vector again lost some important logs during a migration because of this and #16806.

Ouch, that sucks. I'm sorry to hear that you lost some logs. Your suggestion for the checkpointing strategy makes sense to me. Unfortunately I don't know when exactly we would get to it, but we'd be happy to help support a PR if you (or anyone else) are motivated.

@pgassmann
Author

There hasn't been any progress on this for two years now. Our devs are getting frustrated because we lose logs during maintenance windows when we reboot the hosts.
We will now have to switch to a different log collector.

@pgassmann
Author

pgassmann commented Jul 15, 2024

We have now switched to promtail for collecting Docker container logs. Here's our promtail configuration.
You can find our full config in the Ansible role: https://github.com/teamapps-org/ansible-collection-teamapps-general/tree/main/roles/promtail

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        # filters:
        #   - name: name
        #     values: [test-container]
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_container_number']
        target_label: 'compose_container_number'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project_working_dir']
        target_label: 'compose_project_working_dir'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_oneoff']
        target_label: 'compose_oneoff'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'compose_service'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
        action: 'replace'
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container_name'
        regex: '/(.*)'
        action: 'replace'

      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source_type'
        action: 'replace'

      - target_label: 'category'
        replacement: 'dockerlogs'
      - target_label: 'job'
        replacement: 'docker'
      ## Map all labels
      # - action: labelmap
      #   regex: '__meta_docker_container_label_(.+)'
      #   replacement: 'container_labels_${1}'
    pipeline_stages:
      # Combine multiline messages such as stacktraces into one message.
      # Requires the application to prefix log lines with a zero-width space (https://unicode-explorer.com/c/200B).
      - multiline:
          firstline: '^\x{200B}'
          max_wait_time: 1s
      - drop:
          older_than: 4h
          drop_counter_reason: "line_too_old"
