
Large cluster deployment with 8 GiB RAM fails #1825

Open
jefflill opened this issue Jul 21, 2023 · 2 comments
Labels: bug, cluster-setup, neon-kube

Comments

jefflill commented Jul 21, 2023

This appears to be a cluster advice issue. In this case, the tempo-ingester pods cannot be scheduled:

```
Name:                 tempo-ingester-0
Namespace:            neon-monitor
Priority:             900000000
Priority Class Name:  neon-min
Node:                 <none>
Labels:               app.kubernetes.io/component=ingester
                      app.kubernetes.io/instance=tempo
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=tempo
                      app.kubernetes.io/version=1.3.2
                      controller-revision-hash=tempo-ingester-848cd8689
                      helm.sh/chart=tempo-distributed-0.16.9
                      statefulset.kubernetes.io/pod-name=tempo-ingester-0
                      tempo-gossip-member=true
Annotations:          checksum/config: c764e248482a115a73aaa4678cf3e9a5b9ead286adccfc738cfc4a2e3f314e1c
                      sidecar.istio.io/inject: false
                      traffic.sidecar.istio.io/excludeInboundPorts: 7946
                      traffic.sidecar.istio.io/excludeOutboundPorts: 7946
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        StatefulSet/tempo-ingester
Containers:
  ingester:
    Image:       registry.neon.local/neonkube/grafana-tempo:2.0.0
    Ports:       9095/TCP, 7946/TCP, 3100/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      -target=ingester
      -config.file=/conf/tempo.yaml
      -mem-ballast-size-mbs=64
      -config.expand-env=true
    Limits:
      memory:  1Gi
    Requests:
      memory:   1Gi
    Readiness:  http-get http://:http/ready delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ACCESS_KEY_ID:      <set to the key 'accesskey' in secret 'minio'>  Optional: false
      SECRET_ACCESS_KEY:  <set to the key 'secretkey' in secret 'minio'>  Optional: false
      GOGC:               10
    Mounts:
      /conf from tempo-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xknb7 (ro)
      /var/tempo from data (rw)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-tempo-ingester-0
    ReadOnly:   false
  tempo-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tempo
    Optional:  false
  kube-api-access-xknb7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              node.neonkube.io/monitor.traces-internal=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 30s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 30s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  12m                default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't find available persistent volumes to bind. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  11m (x4 over 12m)  default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m (x5 over 11m)   default-scheduler  0/6 nodes are available: 3 Insufficient memory, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 1 Insufficient memory, 5 Preemption is not helpful for scheduling.
```
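
One way to confirm the memory pressure behind these events is to compare each node's allocatable memory against what the scheduler has already reserved; the ingester's 1Gi request has to fit in the gap. A minimal diagnostic sketch (standard kubectl; output will vary by cluster):

```
# Show what the scheduler has already accounted for on each node.
kubectl describe nodes | grep -A 8 'Allocated resources:'

# Allocatable memory per node, straight from the node status.
kubectl get nodes -o custom-columns='NAME:.metadata.name,ALLOCATABLE_MEM:.status.allocatable.memory'
```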

I've temporarily reset the node RAM for these test clusters from 8 GiB to 16 GiB.

jefflill added the bug, neon-kube, and cluster-setup labels on Jul 21, 2023
jefflill (Collaborator, Author) commented

@marcusbooyah looked at this and it's a problem with cluster advice. He hacked around this for clusters with 10 or fewer nodes, but we'll need to put more effort into how cluster advice works.
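
The shape of the problem shows up in the requests themselves. A rough sketch for tallying the memory requests cluster advice assigned across every container, assuming kubectl access to an affected cluster:

```
# Dump every container's memory request and tally the values, to see how
# the advised reservations stack up against 8 GiB of node memory.
kubectl get pods -A -o jsonpath='{range .items[*].spec.containers[*]}{.resources.requests.memory}{"\n"}{end}' \
  | sort | uniq -c | sort -rn
```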

marcusbooyah (Member) commented

For now, we should just recommend a 16 GiB minimum.
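
Assuming that recommendation, a quick pre-deployment sanity check that every node reports at least 16 GiB:

```
# Verify each node's memory capacity meets the recommended minimum.
kubectl get nodes -o custom-columns='NAME:.metadata.name,CAPACITY_MEM:.status.capacity.memory'
```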
