
Cluster with "placement not found" error #96

Closed
jchanam opened this issue Mar 1, 2019 · 1 comment
Closed

Cluster with "placement not found" error #96

jchanam opened this issue Mar 1, 2019 · 1 comment

Comments

@jchanam
Copy link

jchanam commented Mar 1, 2019

  • What version of Kubernetes are you running? Please include the output of kubectl version.

v1.11.6

  • What are you trying to do?

Create an M3DB cluster and send metrics to it from Prometheus.

  • What did you expect to happen?

The cluster to start accepting remote-write requests from Prometheus and store the metrics.

  • What happened?

The operator is logging errors like:

{"level":"error","ts":1551452698.3225112,"msg":"error from m3admin placement get: placement does not exist: status not found","controller":"m3db-cluster-controller"}
E0301 15:04:58.322536       1 update_cluster.go:171] error from m3admin placement get: placement does not exist: status not found
E0301 15:04:58.322550       1 controller.go:297] error syncing cluster 'm3db/m3db-cluster': error from m3admin placement get: placement does not exist: status not found

And Prometheus can't store metrics:

level=warn ts=2019-03-01T11:08:48.652639023Z caller=queue_manager.go:527 component=remote queue=0:http://m3coordinator-m3db-cluster.m3db.svc.cluster.local:7201/api/v1/prom/remote/write msg="Error sending samples to remote storage" count=100 err="server returned HTTP status 500 Internal Server Error: {\"error\":\"M3DB session not yet initialized\"}"
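
For debugging, one way to confirm that the placement really is missing is to hit the coordinator's placement endpoint directly. A minimal sketch (hypothetical, not from the report: it reuses the in-cluster service URL from the Prometheus log above, and assumes the coordinator of that era serves GET /api/v1/placement):

package main

// Hypothetical diagnostic: query the coordinator's placement API to
// confirm whether a placement exists.

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Service URL taken from the Prometheus remote-write log above;
	// the /api/v1/placement path is an assumption about this
	// coordinator version's API.
	url := "http://m3coordinator-m3db-cluster.m3db.svc.cluster.local:7201/api/v1/placement"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	// A 404 here would correspond to the operator's "placement does
	// not exist" log line.
	fmt.Println(resp.Status, string(body))
}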

Here is my cluster configuration:

apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db-cluster
  namespace: m3db
spec:
  image: quay.io/m3db/m3dbnode:latest
  replicationFactor: 3
  numberOfShards: 256
  isolationGroups:
    - name: eu-west-1b
      numInstances: 1
    - name: eu-west-1c
      numInstances: 1
    - name: eu-west-1a
      numInstances: 1
  namespaces:
    - name: short
      preset: 10s:2d
    - name: medium
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        repairEnabled: false
        snapshotEnabled: true
        retentionOptions:
          retentionPeriod: 960h
          blockSize: 12h
          bufferFuture: 10m
          bufferPast: 20m
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 10m
        indexOptions:
          enabled: true
          blockSize: 12h
    - name: long
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        repairEnabled: false
        snapshotEnabled: true
        retentionOptions:
          retentionPeriod: 9000h
          blockSize: 72h
          bufferFuture: 10m
          bufferPast: 20m
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 10m
        indexOptions:
          enabled: true
          blockSize: 72h
  configMapName: m3db-cluster-m3-configuration
  containerResources:
    requests:
      memory: 16Gi
      cpu: "1"
    limits:
      memory: 18Gi
      cpu: "4"

and the configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: m3db-cluster-m3-configuration
  namespace: m3db
data:
  m3.yml: |+
    coordinator:
      listenAddress:
        type: "config"
        value: "0.0.0.0:7201"
      local:
        namespaces:
          - namespace: short
            type: unaggregated
            retention: 48h
            resolution: 10s
          - namespace: medium
            type: aggregated
            retention: 960h
            resolution: 10m
            downsample:
              all: false
          - namespace: long
            type: aggregated
            retention: 9000h
            resolution: 1h
            downsample:
              all: false
      metrics:
        scope:
          prefix: "coordinator"
        prometheus:
          handlerPath: /metrics
          listenAddress: 0.0.0.0:7203
        sanitization: prometheus
        samplingRate: 1.0
        extended: none
      tagOptions:
        idScheme: quoted

    db:
      logging:
        level: debug

      metrics:
        prometheus:
          handlerPath: /metrics
        sanitization: prometheus
        samplingRate: 1.0
        extended: detailed

      listenAddress: 0.0.0.0:9000
      clusterListenAddress: 0.0.0.0:9001
      httpNodeListenAddress: 0.0.0.0:9002
      httpClusterListenAddress: 0.0.0.0:9003
      debugListenAddress: 0.0.0.0:9004

      hostID:
        resolver: file
        file:
          path: /etc/m3db/pod-identity/identity
          timeout: 5m

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority
        writeTimeout: 10s
        fetchTimeout: 15s
        connectTimeout: 20s
        writeRetry:
            initialBackoff: 500ms
            backoffFactor: 3
            maxRetries: 2
            jitter: true
        fetchRetry:
            initialBackoff: 500ms
            backoffFactor: 2
            maxRetries: 3
            jitter: true
        backgroundHealthCheckFailLimit: 4
        backgroundHealthCheckFailThrottleFactor: 0.5

      gcPercentage: 100

      writeNewSeriesAsync: true
      writeNewSeriesLimitPerSecond: 1048576
      writeNewSeriesBackoffDuration: 2ms

      bootstrap:
        bootstrappers:
            - filesystem
            - commitlog
            - peers
            - uninitialized_topology
        fs:
            numProcessorsPerCPU: 0.125

      commitlog:
        flushMaxBytes: 524288
        flushEvery: 1s
        queue:
            calculationType: fixed
            size: 2097152
        blockSize: 10m

      fs:
        filePathPrefix: /var/lib/m3db
        writeBufferSize: 65536
        dataReadBufferSize: 65536
        infoReadBufferSize: 128
        seekReadBufferSize: 4096
        throughputLimitMbps: 100.0
        throughputCheckEvery: 128

      repair:
        enabled: false
        interval: 2h
        offset: 30m
        jitter: 1h
        throttle: 2m
        checkInterval: 1m

      pooling:
        blockAllocSize: 16
        type: simple
        seriesPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        blockPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        encoderPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        closersPool:
            size: 104857
            lowWatermark: 0.7
            highWatermark: 1.0
        contextPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        segmentReaderPool:
            size: 16384
            lowWatermark: 0.7
            highWatermark: 1.0
        iteratorPool:
            size: 2048
            lowWatermark: 0.7
            highWatermark: 1.0
        fetchBlockMetadataResultsPool:
            size: 65536
            capacity: 32
            lowWatermark: 0.7
            highWatermark: 1.0
        fetchBlocksMetadataResultsPool:
            size: 32
            capacity: 4096
            lowWatermark: 0.7
            highWatermark: 1.0
        hostBlockMetadataSlicePool:
            size: 131072
            capacity: 3
            lowWatermark: 0.7
            highWatermark: 1.0
        blockMetadataPool:
            size: 65536
            lowWatermark: 0.7
            highWatermark: 1.0
        blockMetadataSlicePool:
            size: 65536
            capacity: 32
            lowWatermark: 0.7
            highWatermark: 1.0
        blocksMetadataPool:
            size: 65536
            lowWatermark: 0.7
            highWatermark: 1.0
        blocksMetadataSlicePool:
            size: 32
            capacity: 4096
            lowWatermark: 0.7
            highWatermark: 1.0
        identifierPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        bytesPool:
            buckets:
                - capacity: 16
                  size: 524288
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 32
                  size: 262144
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 64
                  size: 131072
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 128
                  size: 65536
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 256
                  size: 65536
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 1440
                  size: 16384
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 4096
                  size: 8192
                  lowWatermark: 0.7
                  highWatermark: 1.0
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - http://m3db-etcd-client:2379

Is this due to a misconfiguration on my side? Is there anything I can try to fix it?

schallert (Collaborator) commented:

It looks like this is a bug I introduced in #94.

Specifically, we now wrap m3admin errors to provide more context:

return nil, pkgerrors.WithMessage(ErrNotFound, errMsg)

But I forgot to unwrap those errors in the code that called it:

if err != m3admin.ErrNotFound {
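
To illustrate the failure mode, here is a minimal, self-contained sketch (ErrNotFound and the wrapping mirror the two snippets above; everything else is hypothetical). Once the sentinel is wrapped with pkgerrors.WithMessage, a direct != comparison no longer matches it, so the caller treats the expected "placement does not exist" result as a hard sync error. Unwrapping with pkgerrors.Cause before comparing restores the intended behavior:

package main

import (
	"errors"
	"fmt"

	pkgerrors "github.com/pkg/errors"
)

// Stand-in for m3admin.ErrNotFound, the sentinel for "placement
// does not exist".
var ErrNotFound = errors.New("not found")

// placementGet mimics the post-#94 behavior: the sentinel is
// wrapped with extra context before being returned.
func placementGet() error {
	return pkgerrors.WithMessage(ErrNotFound, "placement does not exist")
}

func main() {
	err := placementGet()

	// Buggy caller: the wrapped error is a different value, so this
	// comparison is always true and "not found" is mishandled.
	fmt.Println(err != ErrNotFound) // prints true

	// Fixed caller: unwrap to the root cause before comparing.
	fmt.Println(pkgerrors.Cause(err) != ErrNotFound) // prints false
}

(On Go 1.13+ one would reach for errors.Is instead, but pkg/errors was the idiom at the time of this issue.)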
