Waiting for k8s resources
# 🌱|help-and-getting-started
Is there a way I can explicitly define how to wait for certain resources within a kubernetes Deploy action? (Bonsai) Garden seems not to care about the completeness of Jobs or the number of running replicas in StatefulSets (for example, if pods are waiting for a volume). I would like to state that my action should be considered complete based on some value in the resource's status, if that is possible.
Garden will mark a Deploy as ready or complete once it's ready or complete on the Kubernetes side. With Kubernetes you can describe how to set and monitor your resource's state via liveness, readiness and startup probes [1]. For more advanced cases that can't be configured with Kubernetes, you could create a Run action as a dependent of the Deploy action that checks the status more granularly and completes once the desired state is reached. If there is a lot of demand for such functionality, we might consider implementing it officially in Garden to make it more convenient. [1] https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
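For reference, a readiness probe is declared on the container spec in your manifest. A minimal sketch (the container name, image, port and health endpoint are placeholders, not from your setup):

```yaml
containers:
  - name: my-app            # placeholder name
    image: my-app:latest    # placeholder image
    readinessProbe:
      httpGet:
        path: /healthz      # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```

Note that probes only gate a pod's Ready condition; they can't help with a pod that never gets created or scheduled in the first place.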
I still believe something is not working correctly, let me show an example. I've declared a StatefulSet in my Kubernetes YAML and added it to a kubernetes Deploy action. I intentionally defined a volumeMount that has no corresponding volume defined, so that the pod won't be able to start, and Garden happily reports the deployment as done:
```
garden deploy
Deploy 🚀

Garden v0.13 (Bonsai) is a major release with significant changes. Please help us improve it by reporting any issues/bugs here:
→ Run garden util hide-warning 0.13-bonsai to disable this warning.
ℹ garden               → Running in Garden environment default.default
(node:84105) ExperimentalWarning: The Fetch API is an experimental feature. This feature could change at any time
(Use `garden --trace-warnings ...` to show where the warning was created)
ℹ cloud-dashboard      → 🌸  Connected to Cloud Dashboard. View logs and command results at:
```

```
ℹ providers            → Getting status...
✔ providers            → Cached (took 2.6 sec)
ℹ providers            → Run with --force-refresh to force a refresh of provider statuses.
ℹ graph                → Resolving actions and modules...
✔ graph                → Done (took 0.7 sec)
ℹ deploy.kcp-cert-manager-crds → missing
ℹ deploy.kcp           → outdated
ℹ deploy.kcp-account   → outdated
ℹ deploy.kcp-cert      → Already deployed
ℹ deploy.kcp-cert-manager → Already deployed
ℹ build.kcpctl         → Already built
ℹ deploy.kcp           → Deploying version v-615bf9f5c5...
ℹ deploy.ingress-nginx → Already deployed
ℹ deploy.kcp           → Waiting for resources to be ready...
ℹ deploy.cert-manager  → Already deployed
ℹ deploy.kcp           → Resources ready
✔ deploy.kcp           → Done (took 2.6 sec)
ℹ deploy.kcp           → Ingress: http://kcp.playground.garden
ℹ deploy.kcp-account   → Deploying version v-5f59d745d4...
ℹ deploy.kcp-account   → Waiting for resources to be ready...
ℹ deploy.kcp-account   → Resources ready
✔ deploy.kcp-account   → Done (took 2.6 sec)
ℹ deploy.kcp-cert-manager-crds → Deploying version v-ff63574bea...
ℹ deploy.kcp-cert-manager-crds → Waiting for resources to be ready...
ℹ deploy.kcp-cert-manager-crds → Resources ready
✔ deploy.kcp-cert-manager-crds → Done (took 2.6 sec)

Done! ✔️
```
```
k get sts
NAME               READY   AGE
kcp                0/1     19s
kcp-cert-manager   1/1     4h15m
```
```
k describe sts kcp
Name:               kcp
Namespace:          default
CreationTimestamp:  Tue, 23 May 2023 14:45:27 +0200
Selector:           app=kcp
Labels:             garden.io/service=kcp
Annotations:        garden.io/manifest-hash: 09956a2fe4e9ece9e213b5821ece0e5ad1b0dd81cdfeecef8e604bbed9b0d1ba
                    garden.io/mode: default
                    garden.io/service: kcp
Replicas:           1 desired | 0 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=kcp
  Init Containers:
    Image:      kcpctl:v-79a98dd0ef
    Port:       <none>
    Host Port:  <none>
    Command:
      cat /etc/kcp-certificate/tls.crt /etc/kcp-certificate/ca.crt > /data/tls-bundle.crt || true
    Environment:  <none>
    Mounts:
      /data from data (rw)
      /etc/kcp-certificate from kcp-certificate (rw)
  Containers:
    Image:      ghcr.io/kcp-dev/kcp:530d15f
    Port:       6443/TCP
    Host Port:  0/TCP
    Environment:  <none>
    Mounts:
      /data from data (rw)
      /etc/kcp-certificate from kcp-certificate (rw)
Volumes:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kcp-server
    Optional:    false
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kcp-data
    ReadOnly:   false
Volume Claims:  <none>
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  13s (x13 over 33s)  statefulset-controller  create Pod kcp-0 in StatefulSet kcp failed error: Pod "kcp-0" is invalid: [spec.containers[0].volumeMounts[1].name: Not found: "kcp-certificate", spec.initContainers[0].volumeMounts[1].name: Not found: "kcp-certificate"]
```
this is the statefulset status:
```yaml
status:
  availableReplicas: 0
  collisionCount: 0
  currentRevision: kcp-ddc5b66b5
  observedGeneration: 1
  replicas: 0
  updateRevision: kcp-ddc5b66b5
```
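That status alone shows the problem: 0 replicas exist for 1 desired. A wait handler would essentially compare those fields. Here's a minimal sketch in Python (the field names follow the Kubernetes API; the function itself is an illustration, not Garden's actual code):

```python
def statefulset_ready(sts: dict) -> bool:
    """Return True once all desired replicas are created, ready, and on
    the latest revision (a simplified version of what
    `kubectl rollout status` checks for a StatefulSet)."""
    desired = sts.get("spec", {}).get("replicas", 1)
    status = sts.get("status", {})
    # the controller must have observed the latest spec generation
    if status.get("observedGeneration", 0) < sts.get("metadata", {}).get("generation", 0):
        return False
    return (
        status.get("replicas", 0) == desired
        and status.get("readyReplicas", 0) == desired
        and status.get("updateRevision") == status.get("currentRevision")
    )

# the status pasted above: 0 replicas created for 1 desired -> not ready
broken = {
    "metadata": {"generation": 1},
    "spec": {"replicas": 1},
    "status": {
        "availableReplicas": 0,
        "collisionCount": 0,
        "currentRevision": "kcp-ddc5b66b5",
        "observedGeneration": 1,
        "replicas": 0,
        "updateRevision": "kcp-ddc5b66b5",
    },
}
print(statefulset_ready(broken))  # → False
```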
FTR, I used this workaround:
```yaml
kind: Run
name: kcp-wait
type: exec
description: Wait for kcp to start
dependencies:
  - deploy.kcp
spec:
  command:
    - kubectl
    - rollout
    - status
    - --watch
    - --timeout=120s
    - -n
    - "${environment.namespace}"
    - statefulset/kcp
```
and for the job:
```yaml
kind: Run
name: kcp-account-wait
type: exec
description: Wait for kcp-account to be completed
dependencies:
  - deploy.kcp-account
spec:
  command:
    - kubectl
    - wait
    - --for=condition=complete
    - --timeout=60s
    - -n
    - "${environment.namespace}"
    - job/kcp-account
```
Hmm, yes, this looks like undesired behaviour. I'll get on it, thanks for bringing it up.
@icy-furniture-17516 would you mind sharing a basic repro? That would speed us up with this a lot.
of course, I created a branch that has an sts with a non-existent volume to mount: https://github.com/pepov/kcp-playground/tree/statefulset-pods-not-starting
installation is in the readme, but it requires minikube and Bonsai. To reproduce, just run:
```
garden deploy kcp
```
it will pass, but the statefulset will be unavailable
it starts in 85 sec for me, so I hope this is still easy enough to work with
thank you!
thanks for looking into this. I'm okay with the workaround for now, but I think this would be a very nice thing to have. I specifically only care about Jobs/Deployments/StatefulSets completing/starting properly, and for that we don't need anything custom, I think (although I was asking for that initially)
I can check whether it worked in 0.12 or not, but I have to prepare for this talk tonight
I've filed a pr that should fix the issue: https://github.com/garden-io/garden/pull/4430 Again, thank you for the repro!
nice, thanks!
you don't have handlers for kubernetes Jobs?
the source code references the Helm libraries, which have a handler for Jobs, but I couldn't find it in the Garden code
Currently there's no custom wait handler for Jobs, which basically means we consider them ready immediately: https://github.com/garden-io/garden/blob/k8s-sts-state/core/src/plugins/kubernetes/status/status.ts#L71
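For what it's worth, the signal such a handler would need is already in the Job's `status.conditions` (the same thing `kubectl wait --for=condition=complete` polls). A rough Python sketch of the check, not Garden's code:

```python
def job_state(job: dict) -> str:
    """Classify a Kubernetes Job by its status conditions, mirroring
    what `kubectl wait --for=condition=complete` waits for."""
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "succeeded"
        if cond.get("type") == "Failed":
            return "failed"
    # no terminal condition yet: the Job is still running (or pending)
    return "running"

done = {"status": {"conditions": [{"type": "Complete", "status": "True"}]}}
print(job_state(done))  # → succeeded
```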
okay same 🙂
can that be a possible opportunity to contribute? 🙂
first, I guess I should just create a feature request, and then if it doesn't get priority on your side I could try to give it a shot
although I don't speak typescript, but chatgpt surely does 😄
Sure, if you could create an issue that describes the expected behaviour and maybe even includes a repro we could get it done quite quickly
will do, thanks!
after submitting it I realized this is not just for 0.13; please feel free to change the label/title accordingly
or let me know if I should submit it differently