
Vector Internal Metrics has no data for component_errors_total #10882

Closed

Matthew-Beckett opened this issue Jan 17, 2022 · 1 comment

Labels: type: bug (A code related bug.)

Matthew-Beckett commented Jan 17, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Vector Version

0.14

Vector Configuration File

---
api:
  enabled: true
  address: '127.0.0.1:8686'
transforms:
  only_fleets:
    type: filter
    inputs:
      - kubernetes_logs
    condition:
      kubernetes.pod_name.starts_with: fleet
      type: check_fields

  fleet-parser:
    inputs:
      - only_fleets
    type: remap
    source: |2-
      structured = <REGEX> ?? {}
      if structured.severity == null {
        structured = <REGEX> ?? {}
      }
      . = merge(., structured)
      if .thread != null {
        .thread = to_int!(.thread)
      }

      if .ue4time != null {
        .ue4time = parse_timestamp!(.ue4time, "%Y.%m.%d-%H.%M.%S")
      }

  drop-loggers:
    type: filter
    inputs:
      - fleet-parser
    condition: '!includes(["LogNetPackageMap","LogNetPlayerMovement"], .logger)'

  optimize-message:
    type: remap
    inputs:
      - drop-loggers
    source: |2-
        .raw_message = .message
        .message = .msg
        del(.msg)

  add-fleet-name:
    type: remap
    inputs:
      - optimize-message
    drop_on_abort: false
    source: |2-
      .fleet_name = "undefined"
      if exists(.kubernetes.pod_name) {
        split_name, err = split(.kubernetes.pod_name, "-")
        if err != null {
          abort
        }
        .fleet_name = join!([split_name[0], split_name[1], split_name[2]], "-")
      }

  add-build-metadata:
    type: remap
    inputs:
      - add-fleet-name
    drop_on_abort: false
    source: |2-
      if .kubernetes.container_image != null {
        parsed_message, err = <REGEX>
        if err != null {
          log("Unable to determine branch name, sha, or build version", level: "error", rate_limit_secs: 0)
          log(err, level: "error", rate_limit_secs: 0)
          .major = 0
          .minor = 0
          .patch = 0
          .sha = "null"
          .branch ="null/null"
        } else {
          . = merge(., parsed_message)
        }
      }

  add-severity:
    type: remap
    inputs:
      - add-build-metadata
    source: |2-
      if .severity == null || .severity == "" || .severity == "Log" || .severity == "Display" {
        .severity = "DEFAULT"
      }
      if .severity == "Verbose" || .severity == "VeryVerbose" {
        .severity = "DEBUG"
      }
      .severity = upcase(.severity)

  log2metric-severity-total:
    type: log_to_metric
    inputs:
      - add-severity
    metrics:
      - name: "gameone_ue4_logs_total"
        field: severity
        type: counter
        tags:
          fleet_name: "{{fleet_name}}"
          pod_name: "{{kubernetes.pod_name}}"
          severity: "{{severity}}"
          major_version: "{{major}}"
          minor_version: "{{minor}}"
          patch_version: "{{patch}}"
          commit_sha: "{{sha}}"
          branch_name: "{{branch}}"
          logger: "{{logger}}"

tests:
  - name: add-fleet-name-present
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          kubernetes.pod_name: "fleet-dev-test-123"
          message: >-
            [2021.12.14-14.45.39:200][392]LogModuleManager: Shutting down and abandoning module RSA (3)
    outputs:
      - extract_from: add-severity
        conditions:
          - type: check_fields
            severity.exists: true
            severity.equals: DEFAULT
            logger.equals: LogModuleManager
            message.equals: 'Shutting down and abandoning module RSA (3)'
            fleet_name.exists: true
            fleet_name.equals: 'fleet-dev-test'
  - name: add-fleet-name-not-present
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          message: >-
            [2021.12.14-14.45.39:200][392]LogModuleManager: Shutting down and abandoning module RSA (3)
    outputs:
      - extract_from: add-severity
        conditions:
          - type: check_fields
            severity.exists: true
            severity.equals: DEFAULT
            logger.equals: LogModuleManager
            message.equals: 'Shutting down and abandoning module RSA (3)'
            fleet_name.exists: true
            fleet_name.equals: 'undefined'
  - name: severity_not_present
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          message: >-
            [2021.12.14-14.45.39:200][392]LogModuleManager: Shutting down and abandoning module RSA (3)
    outputs:
      - extract_from: add-severity
        conditions:
          - type: check_fields
            severity.exists: true
            severity.equals: DEFAULT
            logger.equals: LogModuleManager
            message.equals: 'Shutting down and abandoning module RSA (3)'
  - name: severity_log_to_default
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          message: >-
            [2021.07.08-15.39.00:688][474]LogFurniture: Log: MoveItem in 775ce8ef-9445-4895-8464-be796eb398c7
    outputs:
      - extract_from: add-severity
        conditions:
          - type: check_fields
            severity.exists: true
            severity.equals: DEFAULT
            logger.equals: LogFurniture
            thread.equals: 474
            message.equals: MoveItem in 775ce8ef-9445-4895-8464-be796eb398c7
  - name: severity_warning
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          message: >-
            [2021.07.08-15.39.00:688][474]LogFurniture: Warning: MoveItem in 775ce8ef-9445-4895-8464-be796eb398c7
    outputs:
      - extract_from: add-severity
        conditions:
          - type: check_fields
            severity.exists: true
            severity.equals: WARNING
            logger.equals: LogFurniture
            message.equals: MoveItem in 775ce8ef-9445-4895-8464-be796eb398c7
  - name: drop_logger
    no_outputs_from:
      - drop-loggers
    inputs:
      - insert_at: fleet-parser
        type: log
        log_fields:
          message: '[2021.07.08-15.39.00:688][474]LogNetPackageMap: Warning: doing stuff'
sinks:
  fleet-gcp:
    type: gcp_stackdriver_logs
    inputs:
      - add-severity
    log_id: 
    project_id: '${VECTOR_SINK_GCP_PROJECT_ID:-unspecified}'
    resource:
      project_id: '${VECTOR_SINK_GCP_PROJECT_ID:-unspecified}'
      type: generic_node
      location: '${VECTOR_SINK_GCP_LOCATION:-unspecified}'
      node_id: '${VECTOR_SELF_NODE_NAME:-unspecified}'
    severity_key: severity
  prometheus_exporter:
    type: "prometheus_exporter"
    inputs:
      - log2metric-severity-total
    address: "0.0.0.0:9080"

Expected Behavior

When a component errors, it increments the component_errors_total counter, allowing dropped logs to be counted.

Actual Behavior

component_errors_total is never incremented

Example Data

Any component error will do; for instance:
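
A hypothetical input (not taken from our real logs) that should trip a runtime error: the fleet-parser remap calls to_int!(.thread), which fails if the event already carries a non-numeric thread field.

- insert_at: fleet-parser
  type: log
  log_fields:
    # merge(., structured) leaves this field untouched when the regex
    # does not match, so to_int!(.thread) then fails at runtime
    thread: not-a-number
    message: unparseable line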

Additional Context

We're attempting to count the number of dropped logs that never reach a sink because of an error. In particular, we see errors where Vector tries to enrich metadata for a terminated pod and drops its logs because it cannot attach the pod name, etc., to the event.

We would like to alert on a high rate of dropped logs and figured component_errors_total would be the way to do this.
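
A Prometheus alerting rule along these lines is what we have in mind. This is only a sketch: the vector_ prefix assumes the internal_metrics default namespace, and the exact label set (component_name here) varies by Vector version.

groups:
  - name: vector
    rules:
      - alert: VectorComponentErrors
        # per-component error rate over five minutes
        expr: sum by (component_name) (rate(vector_component_errors_total[5m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Vector component {{ $labels.component_name }} is reporting errors'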

We cannot upgrade to 0.18.x because it fails to start and log pods successfully within our Kubernetes cluster.

References

Matthew-Beckett added the type: bug label on Jan 17, 2022
jszwedko (Member) commented

Hi @Matthew-Beckett !

We are in the process of surveying all components to ensure that they are instrumented to match the component specification. That should cover this. I'll close it in favor of #9687 and #9688. Thanks for the report!

Regarding the kubernetes_logs source issues: we are also planning to replace much of our implementation with a community-supported library this quarter. We are hopeful that it will address the issues users have been seeing with that source or, at least, make them easier to track down.
