
Support checkpointing and reading old logs with docker_logs source #7358

Open
pgassmann opened this issue May 6, 2021 · 10 comments
Labels
source: docker_logs · type: enhancement

Comments

@pgassmann

Vector Version

vector 0.13.1 (v0.13.1 x86_64-unknown-linux-gnu 2021-04-29)

Expected Behavior

  • The docker_logs source supports checkpointing and does not miss logs.

  • When starting, Vector reads all available logs as well as all new logs. If Vector is stopped and started again, the logs produced while it was stopped are read and sent to the configured sink.

Actual Behavior

Logs are only read from the moment Vector is started.
When Vector is restarted, the logs the containers produced in the meantime are missing.
When the host is rebooted and the Docker containers are started before Vector, the startup logs are missing.

Example Data

Additional Context

In the description of #1107, there is this point:

  • docker source explicitly only looks at records starting at Vector start time, so no changes required

@jszwedko This is not explicitly mentioned in the docker source documentation. https://vector.dev/docs/reference/configuration/sources/docker_logs/#how-it-works

References

Vector Configuration File

# Set global options
data_dir = "/var/lib/vector"

# Read logs from Docker API and Send to loki sink

[sources.docker-local]
  type = "docker_logs"
  docker_host = "/var/run/docker.sock"
  exclude_containers = []

  # Identify zero-width space as first line of a multiline block.
  multiline.condition_pattern = '^\x{200B}' # required
  multiline.mode = "halt_before" # required
  multiline.start_pattern = '^\x{200B}' # required
  multiline.timeout_ms = 1000 # required, milliseconds

[sinks.loki]
  # General
  type = "loki" # required
  inputs = ["docker-local"] # required
  endpoint = "https://logs.example.com:443" # required


  # Auth
  auth.strategy = "basic" # required
  auth.user = "" # required
  auth.password = "" # required

  # Encoding
  encoding.codec = "json" # required

  # Healthcheck
  healthcheck.enabled = true # optional, default


  # Labels
  labels.forwarder = 'vector'
  labels.host = '{{ host }}'
  labels.container_name = '{{ container_name }}'
  labels.compose_service = '{{ label.com\.docker\.compose\.service }}'
  labels.compose_project = '{{ label.com\.docker\.compose\.project }}'
  labels.source = '{{ stream }}'
  labels.category = 'dockerlogs'
@pgassmann pgassmann added the type: bug label on May 6, 2021
@pgassmann
Author

pgassmann commented May 6, 2021

Related to #7336

@jszwedko jszwedko added the source: docker_logs and type: enhancement labels and removed the type: bug label on May 6, 2021
@jszwedko
Member

jszwedko commented May 6, 2021

Thanks @pgassmann. I agree this would be good to do. I updated it to an enhancement as I believe the current behavior is intentional (albeit undocumented), given that the documentation also doesn't state the inverse, i.e. that it would pick up logs from before Vector started.

@jszwedko jszwedko changed the title from "docker logs are missing when restarting vector. no checkpointing for docker source" to "Support checkpointing and reading old logs with docker_logs source" on May 6, 2021
@pgassmann
Author

@jszwedko what are the plans for implementing checkpointing and history for docker logs? We are currently experiencing issues with Vector sending logs from Docker to Loki, and because of this missing feature we are losing hours of logs.

@jszwedko
Member

jszwedko commented Nov 9, 2021

@pgassmann unfortunately no movement on this yet. You could consider eschewing the docker_logs source and using the file source with the container log files. This would give you checkpointing, of course, but would be missing the container metadata that the docker_logs source adds.
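A minimal sketch of that file-based workaround, assuming the default json-file log driver and the standard container log path (the source/transform names and the remap step are illustrative, not taken from this thread):

# Sketch: tail the container log files directly. The file source checkpoints
# its read position under data_dir, so no lines are lost across Vector restarts.
[sources.docker-files]
  type = "file"
  include = ["/var/lib/docker/containers/*/*-json.log"]

# The json-file driver wraps each line as {"log": ..., "stream": ..., "time": ...},
# so the raw message needs to be parsed. Container metadata such as the container
# name or labels is not available here and would have to be added separately.
[transforms.docker-files-parsed]
  type = "remap"
  inputs = ["docker-files"]
  source = '''
  . = parse_json!(string!(.message))
  '''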

@pgassmann
Author

The latest release of Loki/promtail has support for Docker service discovery, which seems to combine service discovery through the API with reading logs from the JSON files, and supports checkpointing.
https://grafana.com/docs/loki/latest/clients/promtail/configuration/#docker_sd_config

That's a major selling point for moving back to promtail, as with Vector we currently cannot guarantee that all Docker logs are shipped to Loki.

@pgassmann
Author

pgassmann commented Jun 3, 2022

Checkpointing should be quite trivial to implement by using the since option of the logs API. Currently it is always set to "now", but it could be set to the last known timestamp, or to a timestamp calculated as now - max_lookback_duration.

self.docker.events(Some(EventsOptions {
    since: Some(self.now_timestamp),
    until: None,
    filters,
}))

The docker_logs source just has to keep track of the last read timestamp per container. This could be part of acknowledgements (#7650), i.e. only update the checkpoint timestamp once the event is confirmed by the sink.

After a restart of Vector, it should query the logs for all containers with saved checkpoints, even if they are no longer running (e.g. stopped after Vector was stopped); their remaining logs can still be queried from the Docker API.
For containers that are not yet known, Vector should also query a configurable amount of time before "now" (max_lookback_duration), e.g. for a container that was created or started shortly before Vector.
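A rough sketch of how that could look against the Docker logs API, using the bollard crate that the snippet above appears to come from; the Checkpoints store, since_for, and the hard-coded 4-hour lookback are hypothetical names for illustration, not part of Vector:

use std::collections::HashMap;

use bollard::container::{LogOutput, LogsOptions};
use bollard::errors::Error;
use bollard::Docker;
use futures_util::{pin_mut, StreamExt};

// Hypothetical per-container checkpoint store: the last acknowledged
// log timestamp (unix seconds) for each container id.
struct Checkpoints {
    last_ack: HashMap<String, i64>,
}

impl Checkpoints {
    // Resume from the last acknowledged timestamp; for containers seen for the
    // first time, fall back to now - max_lookback_secs.
    fn since_for(&self, container_id: &str, now: i64, max_lookback_secs: i64) -> i64 {
        self.last_ack
            .get(container_id)
            .copied()
            .unwrap_or(now - max_lookback_secs)
    }
}

async fn stream_container_logs(
    docker: &Docker,
    checkpoints: &mut Checkpoints,
    container_id: &str,
    now: i64,
) -> Result<(), Error> {
    // Ask the logs API for everything since the checkpoint instead of "now".
    let options = LogsOptions::<String> {
        follow: true,
        stdout: true,
        stderr: true,
        timestamps: true,
        since: checkpoints.since_for(container_id, now, 4 * 3600),
        ..Default::default()
    };

    let logs = docker.logs(container_id, Some(options));
    pin_mut!(logs);
    while let Some(line) = logs.next().await {
        let _line: LogOutput = line?;
        // Forward the line into the pipeline here. The checkpoint should only be
        // advanced once the sink has acknowledged the event (see #7650), e.g.:
        // checkpoints.last_ack.insert(container_id.to_string(), acked_timestamp);
    }
    Ok(())
}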

cc @bruceg

@pgassmann
Author

@jszwedko @bruceg Can someone please look into this and give feedback on my suggestion? Vector again lost some important logs during a migration because of this and #16806.

@jszwedko
Member

@jszwedko @bruceg Can someone please look into this and give feedback on my suggestion? Vector again lost some important logs during a migration because of this and #16806.

Ouch, that sucks. I'm sorry to hear that you lost some logs. Your suggestion for the checkpointing strategy makes sense to me. Unfortunately I don't know when exactly we would get to it, but we'd be happy to help support a PR if you (or anyone else) are motivated.

@pgassmann
Author

There hasn't been any progress on this for two years now. Our devs are getting frustrated because we lose logs during maintenance windows when we reboot the hosts.
We will now have to switch to a different log collector.

@pgassmann
Author

pgassmann commented Jul 15, 2024

We have now switched to promtail for collecting Docker container logs. Here's our promtail configuration.
You can find our full config in the Ansible role: https://github.com/teamapps-org/ansible-collection-teamapps-general/tree/main/roles/promtail

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        # filters:
        #   - name: name
        #     values: [test-container]
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_container_number']
        target_label: 'compose_container_number'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project_working_dir']
        target_label: 'compose_project_working_dir'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_oneoff']
        target_label: 'compose_oneoff'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'compose_service'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
        action: 'replace'
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container_name'
        regex: '/(.*)'
        action: 'replace'

      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source_type'
        action: 'replace'

      - target_label: 'category'
        replacement: 'dockerlogs'
      - target_label: 'job'
        replacement: 'docker'
      ## Map all labels
      # - action: labelmap
      #   regex: '__meta_docker_container_label_(.+)'
      #   replacement: 'container_labels_${1}'
    pipeline_stages:
      # Combine multiline messages such as stacktraces into one message.
      # Requires the application to prefix log lines with a zero-width space (https://unicode-explorer.com/c/200B).
      - multiline:
          firstline: '^\x{200B}'
          max_wait_time: 1s
      - drop:
          older_than: 4h
          drop_counter_reason: "line_too_old"
