pastlimit is not getting captured properly #2355

Closed

prben opened this issue May 22, 2020 · 4 comments

Labels: area:coordinator All issues pertaining to coordinator

prben commented May 22, 2020

Filing M3 Issues

General Issues

Using the carbon ingester built on top of M3Coordinator. It is not honoring the buffer past time (bufferPastDuration) defined in the namespace.
May 22 15:23:43 svc-m3co-8138295 m3coordinator[17840]: {"level":"error","ts":1590141223.8895106,"msg":"err writing carbon metric","name":"system.ig-0320deb9-2836-1711296.all.net.icmp_outtimestampreps","error":"datapoint for aggregation too far in past: id=u'\u0006\u0000\u0006\u0000__g0__\u0006\u0000system\u0006\u0000__g1__+\u0000ig-0320deb9-2836-1711296\u0006\u0000__g2__\u0003\u0000all\u0006\u0000__g3__\u0003\u0000net\u0006\u0000__g4__\u0015\u0000icmp_outtimestampreps\u0014\u0000__option_id_scheme__\u0008\u0000graphite, off_by=17.88950451s, timestamp=22 May 20 15:23 +0530, past_limit=22 May 20 15:23 +0530, timestamp_unix_nanos=1590141191000000000, past_limit_unix_nanos=1590141208889504510","errorCauses":[{"error":"datapoint for aggregation too far in past: id=u'\u0006\u0000\u0006\u0000__g0__\u0006\u0000system\u0006\u0000__g1__+\u0000ig-0320deb9-2836-1711296\u0006\u0000__g2__\u0003\u0000all\u0006\u0000__g3__\u0003\u0000net\u0006\u0000__g4__\u0015\u0000icmp_outtimestampreps\u0014\u0000__option_id_scheme__\u0008\u0000graphite, off_by=17.88950451s, timestamp=22 May 20 15:23 +0530, past_limit=22 May 20 15:23 +0530, timestamp_unix_nanos=1590141191000000000, past_limit_unix_nanos=1590141208889504510"}]}
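For reference, the numbers in the log line above decode as follows (all values copied from it):

    off_by = past_limit_unix_nanos - timestamp_unix_nanos
           = 1590141208889504510 - 1590141191000000000
           = 17889504510 ns, i.e. 17.88950451s

    past limit vs. the coordinator's clock (the log's ts field):
    1590141223.889 - 1590141208.889 = 15.0s

So the datapoint is rejected for being roughly 18s behind a past limit that sits only about 15s behind the coordinator's own clock, nowhere near the 1h bufferPastDuration configured on the namespace below.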

  1. What service is experiencing the issue?
    M3Coordinator v0.15.0-rc.9

  2. What is the configuration of the service?

namespace config

{
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "repairEnabled": true,
    "retentionOptions": {
      "retentionPeriodDuration": "8760h",
      "blockSizeDuration": "12h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "1h"
    },
    "snapshotEnabled": true,
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "12h"
    }
  }
}

coordinator config

listenAddress:
  value: "0.0.0.0:7201"

carbon:
  ingester:
    listenAddress: "0.0.0.0:2003"
    rules:
      - pattern: system.*
        aggregation:
          type: mean
        policies:
          - resolution: 1m
            retention: 8760h

clusters:
  - namespaces:
      - namespace: system_1year
        type: aggregated
        retention: 8760h
        resolution: 1m
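To verify what the cluster actually stored for this namespace, the endpoint used to create namespaces can also be read back. A hedged sketch (depending on the coordinator version the read path may be /api/v1/services/m3db/namespace instead, and the jq filter assumes the response keeps namespaces under registry.namespaces):

curl -sS localhost:7201/api/v1/namespace | \
  jq '.registry.namespaces.system_1year.options.retentionOptions'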
  3. How are you using the service?
    telegraf output plugin via graphite

  4. Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.

  • Set up the namespace as per the config above
  • Set up the telegraf graphite output plugin
  • On the emitter system, set the clock to a past time that is still within the buffer past window (see the sketch below)
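A minimal way to emit such a datapoint without telegraf, as a hedged sketch (this assumes the carbon plaintext protocol on the ingester port configured above, and a metric name that matches the system.* rule):

# send one datapoint stamped 2 minutes in the past,
# well inside the namespace's 1h bufferPastDuration
echo "system.test-host.all.net.icmp_outtimestampreps 1 $(( $(date +%s) - 120 ))" | nc localhost 2003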
xcode03 commented Jun 3, 2020

Same problem here.

Below is my configuration:

m3coordinator:

listenAddress:
  type: "config"
  value: "0.0.0.0:7201"

logging:
  level: info

metrics:
  scope:
    prefix: "coordinator"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:7203
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

tagOptions:
  idScheme: quoted

clusters:
  - namespaces:
      - namespace: default_unaggregated
        type: unaggregated
        retention: 168h
      - namespace: metrics_aggregated_5m_720h
        type: aggregated
        retention: 720h
        resolution: 5m
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - <etcd0>
                - <etcd1>
                - <etcd2>
      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
      writeTimeout: 10s
      fetchTimeout: 15s
      connectTimeout: 20s
      writeRetry:
        initialBackoff: 500ms
        backoffFactor: 3
        maxRetries: 2
        jitter: true
      fetchRetry:
        initialBackoff: 500ms
        backoffFactor: 2
        maxRetries: 3
        jitter: true
      backgroundHealthCheckFailLimit: 4
      backgroundHealthCheckFailThrottleFactor: 0.5

namespace:

curl -X POST localhost:7201/api/v1/namespace -d '{
  "name": "default_unaggregated",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "168h",
      "blockSizeDuration": "4h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "4h"
    }
  }
}'

curl -X POST localhost:7201/api/v1/namespace -d '{
  "name": "metrics_aggregated_5m_720h",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "720h",
      "blockSizeDuration": "24h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "24h"
    }
  }
}'

errors in the m3coordinator log:

{"level":"error","ts":1591174908.5401008,"msg":"write error","remoteAddr":"****:37230","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":1,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: id=u'\u0008\u0000\u0008\u0000__name__\u0016\u0000node_cpu_seconds_total\u0003\u0000cpu\u0002\u000029\u0008\u0000instance\u0007\u00006BSWRT2\u0003\u0000job\u000c\u0000nights-watch\u0004\u0000mode\u0006\u0000system\u0004\u0000port\u0004\u00009100\n\u0000prometheus)\u0000default/prometheus-operated-m3db-k8s-auto\u0012\u0000prometheus_replica.\u0000prometheus-prometheus-operated-m3db-k8s-auto-0, off_by=6m59.290506286s, timestamp=03 Jun 20 08:54 +0000, past_limit=03 Jun 20 09:01 +0000, timestamp_unix_nanos=1591174474246000000, past_limit_unix_nanos=1591174893536506286"}

xcode03 commented Jun 4, 2020

I found a solution; in my testing, the following resolves the problem.

Add this to the m3coordinator configuration:

downsample:
  bufferPastLimits:
    - resolution: 5m
      bufferPast: 30m
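This works because the "datapoint for aggregation too far in past" error appears to come from the coordinator's downsampler, which enforces its own per-resolution buffer past limits independently of the M3DB namespace's retentionOptions.bufferPastDuration; bufferPastLimits raises that coordinator-side limit for the matching resolution. For the 1m/8760h namespace in the original report, the analogous entry would be (a sketch; the 1h value is simply chosen to match that namespace's bufferPastDuration, not a recommended default):

downsample:
  bufferPastLimits:
    - resolution: 1m
      bufferPast: 1h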

gibbscullen added the area:coordinator label on Jul 2, 2020
gibbscullen self-assigned this on Jul 2, 2020
gibbscullen (Collaborator) commented

Thanks for submitting - we'll take a look!

skupjoe commented Sep 30, 2021

Thanks, @xcode03!

@gibbscullen - I am also noting that the following block in the m3coordinator config ref is incorrect:

# Specifies a custom buffer past limit for aggregation tiles
downsample:
  bufferPastLimits:
    resolution: <duration>
    bufferPast: <duration>

The options for bufferPastLimits should be formatted as a list, as @xcode03 commented:

downsample:
  bufferPastLimits:
    - resolution: <duration>
      bufferPast: <duration>
