
Cluster with "placement not found" error #96

Closed
jchanam opened this issue Mar 1, 2019 · 1 comment
Closed

Cluster with "placement not found" error #96

jchanam opened this issue Mar 1, 2019 · 1 comment

Comments

@jchanam
Copy link

jchanam commented Mar 1, 2019

  • What version of Kubernetes are you running? Please include the output of kubectl version.

v1.11.6

  • What are you trying to do?

Create an M3DB cluster and send metrics to it from Prometheus.

  • What did you expect to happen?

The cluster to start accepting remote-write requests from Prometheus and store the metrics.

  • What happened?

The operator is logging errors like:

{"level":"error","ts":1551452698.3225112,"msg":"error from m3admin placement get: placement does not exist: status not found","controller":"m3db-cluster-controller"}
E0301 15:04:58.322536       1 update_cluster.go:171] error from m3admin placement get: placement does not exist: status not found
E0301 15:04:58.322550       1 controller.go:297] error syncing cluster 'm3db/m3db-cluster': error from m3admin placement get: placement does not exist: status not found

And Prometheus can't store metrics:

level=warn ts=2019-03-01T11:08:48.652639023Z caller=queue_manager.go:527 component=remote queue=0:http://m3coordinator-m3db-cluster.m3db.svc.cluster.local:7201/api/v1/prom/remote/write msg="Error sending samples to remote storage" count=100 err="server returned HTTP status 500 Internal Server Error: {\"error\":\"M3DB session not yet initialized\"}"
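
For debugging, one way to confirm that the placement really is missing is to hit the coordinator's placement endpoint directly. A minimal sketch (hypothetical, not from the report: it reuses the in-cluster service URL from the Prometheus log above, and assumes the coordinator of that era serves GET /api/v1/placement):

package main

// Hypothetical diagnostic: query the coordinator's placement API to
// confirm whether a placement exists.

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Service URL taken from the Prometheus remote-write log above;
	// the /api/v1/placement path is an assumption about this
	// coordinator version's API.
	url := "http://m3coordinator-m3db-cluster.m3db.svc.cluster.local:7201/api/v1/placement"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	// A 404 here would correspond to the operator's "placement does
	// not exist" log line.
	fmt.Println(resp.Status, string(body))
}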

Here is my cluster configuration:

apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db-cluster
  namespace: m3db
spec:
  image: quay.io/m3db/m3dbnode:latest
  replicationFactor: 3
  numberOfShards: 256
  isolationGroups:
    - name: eu-west-1b
      numInstances: 1
    - name: eu-west-1c
      numInstances: 1
    - name: eu-west-1a
      numInstances: 1
  namespaces:
    - name: short
      preset: 10s:2d
    - name: medium
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        repairEnabled: false
        snapshotEnabled: true
        retentionOptions:
          retentionPeriod: 960h
          blockSize: 12h
          bufferFuture: 10m
          bufferPast: 20m
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 10m
        indexOptions:
          enabled: true
          blockSize: 12h
    - name: long
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        repairEnabled: false
        snapshotEnabled: true
        retentionOptions:
          retentionPeriod: 9000h
          blockSize: 72h
          bufferFuture: 10m
          bufferPast: 20m
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 10m
        indexOptions:
          enabled: true
          blockSize: 72h
  configMapName: m3db-cluster-m3-configuration
  containerResources:
    requests:
      memory: 16Gi
      cpu: "1"
    limits:
      memory: 18Gi
      cpu: "4"

and the configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: m3db-cluster-m3-configuration
  namespace: m3db
data:
  m3.yml: |+
    coordinator:
      listenAddress:
        type: "config"
        value: "0.0.0.0:7201"
      local:
        namespaces:
          - namespace: short
            type: unaggregated
            retention: 48h
            resolution: 10s
          - namespace: medium
            type: aggregated
            retention: 960h
            resolution: 10m
            downsample:
              all: false
          - namespace: long
            type: aggregated
            retention: 9000h
            resolution: 1h
            downsample:
              all: false
      metrics:
        scope:
          prefix: "coordinator"
        prometheus:
          handlerPath: /metrics
          listenAddress: 0.0.0.0:7203
        sanitization: prometheus
        samplingRate: 1.0
        extended: none
      tagOptions:
        idScheme: quoted

    db:
      logging:
        level: debug

      metrics:
        prometheus:
          handlerPath: /metrics
        sanitization: prometheus
        samplingRate: 1.0
        extended: detailed

      listenAddress: 0.0.0.0:9000
      clusterListenAddress: 0.0.0.0:9001
      httpNodeListenAddress: 0.0.0.0:9002
      httpClusterListenAddress: 0.0.0.0:9003
      debugListenAddress: 0.0.0.0:9004

      hostID:
        resolver: file
        file:
          path: /etc/m3db/pod-identity/identity
          timeout: 5m

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority
        writeTimeout: 10s
        fetchTimeout: 15s
        connectTimeout: 20s
        writeRetry:
            initialBackoff: 500ms
            backoffFactor: 3
            maxRetries: 2
            jitter: true
        fetchRetry:
            initialBackoff: 500ms
            backoffFactor: 2
            maxRetries: 3
            jitter: true
        backgroundHealthCheckFailLimit: 4
        backgroundHealthCheckFailThrottleFactor: 0.5

      gcPercentage: 100

      writeNewSeriesAsync: true
      writeNewSeriesLimitPerSecond: 1048576
      writeNewSeriesBackoffDuration: 2ms

      bootstrap:
        bootstrappers:
            - filesystem
            - commitlog
            - peers
            - uninitialized_topology
        fs:
            numProcessorsPerCPU: 0.125

      commitlog:
        flushMaxBytes: 524288
        flushEvery: 1s
        queue:
            calculationType: fixed
            size: 2097152
        blockSize: 10m

      fs:
        filePathPrefix: /var/lib/m3db
        writeBufferSize: 65536
        dataReadBufferSize: 65536
        infoReadBufferSize: 128
        seekReadBufferSize: 4096
        throughputLimitMbps: 100.0
        throughputCheckEvery: 128

      repair:
        enabled: false
        interval: 2h
        offset: 30m
        jitter: 1h
        throttle: 2m
        checkInterval: 1m

      pooling:
        blockAllocSize: 16
        type: simple
        seriesPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        blockPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        encoderPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        closersPool:
            size: 104857
            lowWatermark: 0.7
            highWatermark: 1.0
        contextPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        segmentReaderPool:
            size: 16384
            lowWatermark: 0.7
            highWatermark: 1.0
        iteratorPool:
            size: 2048
            lowWatermark: 0.7
            highWatermark: 1.0
        fetchBlockMetadataResultsPool:
            size: 65536
            capacity: 32
            lowWatermark: 0.7
            highWatermark: 1.0
        fetchBlocksMetadataResultsPool:
            size: 32
            capacity: 4096
            lowWatermark: 0.7
            highWatermark: 1.0
        hostBlockMetadataSlicePool:
            size: 131072
            capacity: 3
            lowWatermark: 0.7
            highWatermark: 1.0
        blockMetadataPool:
            size: 65536
            lowWatermark: 0.7
            highWatermark: 1.0
        blockMetadataSlicePool:
            size: 65536
            capacity: 32
            lowWatermark: 0.7
            highWatermark: 1.0
        blocksMetadataPool:
            size: 65536
            lowWatermark: 0.7
            highWatermark: 1.0
        blocksMetadataSlicePool:
            size: 32
            capacity: 4096
            lowWatermark: 0.7
            highWatermark: 1.0
        identifierPool:
            size: 262144
            lowWatermark: 0.7
            highWatermark: 1.0
        bytesPool:
            buckets:
                - capacity: 16
                  size: 524288
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 32
                  size: 262144
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 64
                  size: 131072
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 128
                  size: 65536
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 256
                  size: 65536
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 1440
                  size: 16384
                  lowWatermark: 0.7
                  highWatermark: 1.0
                - capacity: 4096
                  size: 8192
                  lowWatermark: 0.7
                  highWatermark: 1.0
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - http://m3db-etcd-client:2379

Is this due to a misconfiguration on my side? Is there anything I can try to fix it?

schallert (Collaborator) commented:

It looks like this is a bug I introduced in #94.

Specifically, we now wrap m3admin errors to provide more context:

return nil, pkgerrors.WithMessage(ErrNotFound, errMsg)

But I forgot to unwrap those errors in the code that called it:

if err != m3admin.ErrNotFound {
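
To illustrate the failure mode, here is a minimal, self-contained sketch (ErrNotFound and the wrapping mirror the two snippets above; everything else is hypothetical). Once the sentinel is wrapped with pkgerrors.WithMessage, a direct != comparison no longer matches it, so the caller treats the expected "placement does not exist" result as a hard sync error. Unwrapping with pkgerrors.Cause before comparing restores the intended behavior:

package main

import (
	"errors"
	"fmt"

	pkgerrors "github.com/pkg/errors"
)

// Stand-in for m3admin.ErrNotFound, the sentinel for "placement
// does not exist".
var ErrNotFound = errors.New("not found")

// placementGet mimics the post-#94 behavior: the sentinel is
// wrapped with extra context before being returned.
func placementGet() error {
	return pkgerrors.WithMessage(ErrNotFound, "placement does not exist")
}

func main() {
	err := placementGet()

	// Buggy caller: the wrapped error is a different value, so this
	// comparison is always true and "not found" is mishandled.
	fmt.Println(err != ErrNotFound) // prints true

	// Fixed caller: unwrap to the root cause before comparing.
	fmt.Println(pkgerrors.Cause(err) != ErrNotFound) // prints false
}

(On Go 1.13+ one would reach for errors.Is instead, but pkg/errors was the idiom at the time of this issue.)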
