pastlimit is not getting captured properly #2355

Closed

prben opened this issue May 22, 2020 · 4 comments

Labels: area:coordinator All issues pertaining to coordinator

prben commented May 22, 2020

Filing M3 Issues

General Issues

Using the carbon ingester built on top of M3Coordinator. It is not honoring the buffer past time (bufferPastDuration) defined in the namespace.
May 22 15:23:43 svc-m3co-8138295 m3coordinator[17840]: {"level":"error","ts":1590141223.8895106,"msg":"err writing carbon metric","name":"system.ig-0320deb9-2836-1711296.all.net.icmp_outtimestampreps","error":"datapoint for aggregation too far in past: id=u'\u0006\u0000\u0006\u0000__g0__\u0006\u0000system\u0006\u0000__g1__+\u0000ig-0320deb9-2836-1711296\u0006\u0000__g2__\u0003\u0000all\u0006\u0000__g3__\u0003\u0000net\u0006\u0000__g4__\u0015\u0000icmp_outtimestampreps\u0014\u0000__option_id_scheme__\u0008\u0000graphite, off_by=17.88950451s, timestamp=22 May 20 15:23 +0530, past_limit=22 May 20 15:23 +0530, timestamp_unix_nanos=1590141191000000000, past_limit_unix_nanos=1590141208889504510","errorCauses":[{"error":"datapoint for aggregation too far in past: id=u'\u0006\u0000\u0006\u0000__g0__\u0006\u0000system\u0006\u0000__g1__+\u0000ig-0320deb9-2836-1711296\u0006\u0000__g2__\u0003\u0000all\u0006\u0000__g3__\u0003\u0000net\u0006\u0000__g4__\u0015\u0000icmp_outtimestampreps\u0014\u0000__option_id_scheme__\u0008\u0000graphite, off_by=17.88950451s, timestamp=22 May 20 15:23 +0530, past_limit=22 May 20 15:23 +0530, timestamp_unix_nanos=1590141191000000000, past_limit_unix_nanos=1590141208889504510"}]}
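For reference, the numbers in the log line above decode as follows (all values copied from it):

    off_by = past_limit_unix_nanos - timestamp_unix_nanos
           = 1590141208889504510 - 1590141191000000000
           = 17889504510 ns, i.e. 17.88950451s

    past limit vs. the coordinator's clock (the log's ts field):
    1590141223.889 - 1590141208.889 = 15.0s

So the datapoint is rejected for being roughly 18s behind a past limit that sits only about 15s behind the coordinator's own clock, nowhere near the 1h bufferPastDuration configured on the namespace below.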

  1. What service is experiencing the issue?
    M3Coordinator v0.15.0-rc.9

  2. What is the configuration of the service?

namespace config

{
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "repairEnabled": true,
    "retentionOptions": {
      "retentionPeriodDuration": "8760h",
      "blockSizeDuration": "12h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "1h"
    },
    "snapshotEnabled": true,
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "12h"
    }
  }
}

coordinator config

listenAddress:
  value: "0.0.0.0:7201"

carbon:
  ingester:
    listenAddress: "0.0.0.0:2003"
    rules:
      - pattern: system.*
        aggregation:
          type: mean
        policies:
          - resolution: 1m
            retention: 8760h

clusters:
  - namespaces:
      - namespace: system_1year
        type: aggregated
        retention: 8760h
        resolution: 1m
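To verify what the cluster actually stored for this namespace, the endpoint used to create namespaces can also be read back. A hedged sketch (depending on the coordinator version the read path may be /api/v1/services/m3db/namespace instead, and the jq filter assumes the response keeps namespaces under registry.namespaces):

curl -sS localhost:7201/api/v1/namespace | \
  jq '.registry.namespaces.system_1year.options.retentionOptions'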
  3. How are you using the service?
    telegraf output plugin via graphite

  4. Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.

  • Set up the namespace as per the config above
  • Set up the telegraf graphite output plugin
  • On the emitter system, set the clock to a past time that is still within the buffer past window (see the sketch below)
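A minimal way to emit such a datapoint without telegraf, as a hedged sketch (this assumes the carbon plaintext protocol on the ingester port configured above, and a metric name that matches the system.* rule):

# send one datapoint stamped 2 minutes in the past,
# well inside the namespace's 1h bufferPastDuration
echo "system.test-host.all.net.icmp_outtimestampreps 1 $(( $(date +%s) - 120 ))" | nc localhost 2003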
xcode03 commented Jun 3, 2020

Same problem here.

Below is my configuration:

m3coordinator:

listenAddress:
  type: "config"
  value: "0.0.0.0:7201"

logging:
  level: info

metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:7203
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

tagOptions:
  idScheme: quoted

clusters:
  - namespaces:
      - namespace: default_unaggregated
        type: unaggregated
        retention: 168h
      - namespace: metrics_aggregated_5m_720h
        type: aggregated
        retention: 720h
        resolution: 5m
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - <etcd0>
                - <etcd1>
                - <etcd2>
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
      writeTimeout: 10s
      fetchTimeout: 15s
      connectTimeout: 20s
      writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
      fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
      backgroundHealthCheckFailLimit: 4
      backgroundHealthCheckFailThrottleFactor: 0.5

namespace:

curl -X POST localhost:7201/api/v1/namespace -d '{
  "name": "default_unaggregated",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "168h",
      "blockSizeDuration": "4h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "4h"
    }
  }
}'

curl -X POST localhost:7201/api/v1/namespace -d '{
  "name": "metrics_aggregated_5m_720h",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "720h",
      "blockSizeDuration": "24h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "24h"
    }
  }
}'

errors in the m3coordinator log:

{"level":"error","ts":1591174908.5401008,"msg":"write error","remoteAddr":"****:37230","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":1,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: id=u'\u0008\u0000\u0008\u0000__name__\u0016\u0000node_cpu_seconds_total\u0003\u0000cpu\u0002\u000029\u0008\u0000instance\u0007\u00006BSWRT2\u0003\u0000job\u000c\u0000nights-watch\u0004\u0000mode\u0006\u0000system\u0004\u0000port\u0004\u00009100\n\u0000prometheus)\u0000default/prometheus-operated-m3db-k8s-auto\u0012\u0000prometheus_replica.\u0000prometheus-prometheus-operated-m3db-k8s-auto-0, off_by=6m59.290506286s, timestamp=03 Jun 20 08:54 +0000, past_limit=03 Jun 20 09:01 +0000, timestamp_unix_nanos=1591174474246000000, past_limit_unix_nanos=1591174893536506286"}

xcode03 commented Jun 4, 2020

I found a solution; in my testing, the following resolves the problem.

Add this to the m3coordinator configuration:

downsample:
  bufferPastLimits:
    - resolution: 5m
      bufferPast: 30m
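This works because the "datapoint for aggregation too far in past" error appears to come from the coordinator's downsampler, which enforces its own per-resolution buffer past limits independently of the M3DB namespace's retentionOptions.bufferPastDuration; bufferPastLimits raises that coordinator-side limit for the matching resolution. For the 1m/8760h namespace in the original report, the analogous entry would be (a sketch; the 1h value is simply chosen to match that namespace's bufferPastDuration, not a recommended default):

downsample:
  bufferPastLimits:
    - resolution: 1m
      bufferPast: 1h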

gibbscullen added the area:coordinator label on Jul 2, 2020
gibbscullen self-assigned this on Jul 2, 2020
gibbscullen (Collaborator) commented

Thanks for submitting - we'll take a look!

skupjoe commented Sep 30, 2021

Thanks, @xcode03!

@gibbscullen - I am also noting that the following block in the m3coordinator config ref is incorrect:

# Specifies a custom buffer past limit for aggregation tiles
downsample:
  bufferPastLimits:
    resolution: <duration>
    bufferPast: <duration>

The options for bufferPastLimits should be formatted as a list, as @xcode03 commented:

downsample:
  bufferPastLimits:
    - resolution: <duration>
      bufferPast: <duration>
