issue #4557 `[Bug]: Kubernetes deploy does not wai...
# 💻|contributing
a
Started implementing this with https://github.com/garden-io/garden/pull/4611 but I have some thoughts. This is not a bug per se, as `jobs` are not managed by Garden. A Job can create one or more Pods and will retry execution of the Pods until a specified number of them terminate successfully, or until the desired number of Pods can't be started after the specified retries. Making Garden wait for the completion of Jobs doesn't seem ideal, as the nature of a Job can range from short-lived, parallel, or sequential batch tasks within the cluster to things like background data processing. @swift-garage-61180 and I discussed this already and would argue that if you need to wait for a deploy in your pipeline, a `job` isn't the right resource and shouldn't be used; an `exec` or `pod` action would be more suitable. @icy-furniture-17516 As you originally reported the issue, what do you think?
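To make the suggestion concrete, here is a rough sketch of a one-off task expressed as a Garden Run action instead of a Job deploy. This is a hypothetical example, not taken from the thread: the action name, command, and the `spec` field names are assumptions based on the Garden 0.13 docs and should be checked against them.

```yaml
# Hypothetical sketch: a one-off task as a Garden Run action.
# Unlike a Job deploy, Garden manages this resource itself and
# waits until it completes or fails.
# Field names are assumptions from the Garden docs, not verified here.
kind: Run
name: db-migrate
type: kubernetes-pod
spec:
  # Run the command in a pod derived from an existing resource
  resource:
    kind: Deployment
    name: api
  command: ["./migrate.sh"]
```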
@big-spring-14945 @freezing-pharmacist-34446 Do you have any thoughts or comments on it?
i
I understand you don't want to make this the default, but isn't it an option to add a flag for this? I understand the reasoning, but to me this was very unintuitive.
I tried `exec` and `pod`, but I had some issues with them; don't they behave quite differently from a deploy action in some ways?
Sorry, I don't remember exactly, and I don't have much time for this at the moment, but I remember trying multiple things.
btw what is the behaviour when deploying a Helm chart?
I will give it a try tomorrow with `kubernetes-pod`.
f
> btw what is the behaviour when deploying a Helm chart?

We get the deployments/statefulsets/daemonsets that are deployed with the Helm chart and wait until they are ready. Waiting for readiness means that the readiness probes on the pods succeed. For Jobs there are two different signals:
1. The pod(s) created by a Job pass their readiness probe and are considered ready. This status can change during the lifecycle of the pod.
2. The Job finishes as either completed or failed.

From your example on the GitHub issue, it looks like Garden waits for the initial readiness probe to pass when deploying Jobs.

> This is not a bug per se, as jobs are not managed by Garden. A Job can create one or more Pods and will retry execution of the Pods until a specified number of them terminate successfully, or until the desired number of Pods can't be started after the specified retries.

I also would not see it as a bug, more as a feature. If you deploy a Job with Helm, for example, Helm also does not wait for the completion of the Job; it just takes care to deploy the Job successfully. I agree with @astonishing-tomato-18259 that when you want to run a one-off task that Garden should actually wait on until it is completed or failed, you could use a Run action: https://docs.garden.io/using-garden/runs. This resource is actually managed by Garden, and we can make sure it runs to completion.
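The two signals described above can be inspected with plain kubectl. These commands assume a Job named `myjob` already exists in the cluster; the name is a placeholder:

```shell
# 1. Pod readiness: list the pods created by the Job and their READY state.
#    The batch controller labels them with job-name=<job>.
kubectl get pods -l job-name=myjob

# 2. Job completion: block until the Job reports the Complete condition
#    (exits with an error if the Job does not finish within the timeout).
kubectl wait --for=condition=complete job/myjob --timeout=120s
```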
i
Yeah, I'm not sure why I created it as a bug in the first place; maybe I was still stuck in the mindset where I thought it was "working" in 0.12, although I proved the behaviour was the same. So I agree that it's not a bug.
And although I can accept the reasoning, I still don't think this is consistent or intuitive as long as Garden waits for statefulsets and deployments to deploy successfully. But I don't want to argue about this too much; I can also accept that you don't want to change it for the above reasons.
You are right that Helm doesn't wait for anything by default, and that makes sense.
But it has a flag to wait for resources in an opinionated way: https://github.com/helm/helm/blob/main/pkg/action/install.go#L410
```
--wait-for-jobs    if set and --wait enabled, will wait until all Jobs have been completed before marking the release as successful. It will wait for as long as --timeout
```
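For reference, the two Helm flags combine like this on the command line; the release name and chart path are placeholders:

```shell
# Install and block until all resources, including Jobs, are done.
# --wait-for-jobs only has an effect when --wait is also set.
helm install my-release ./chart --wait --wait-for-jobs --timeout 5m
```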
Again, as a summary:
- I agree this is not a bug
- I think the current solution is not intuitive
- I would be very happy with a flag that mimics the Helm way, and I can blame myself if I use it wrong
- I will give kubernetes-pod run a try to see if I can let this go completely
f
No worries about submitting this as a bug. We are always grateful for any kind of feedback, and the distinction wasn't meant as a nit-pick, only to triage it a bit more clearly.

> You are right that Helm doesn't wait for anything by default, and that makes sense.
> But it has a flag to wait for resources in an opinionated way: https://github.com/helm/helm/blob/main/pkg/action/install.go#L410

Ah nice, I did not know Helm had a flag to wait for Jobs to complete. I had looked at an old issue on GitHub where they rejected that.
i
thanks for putting effort into this!
a
@icy-furniture-17516 Thanks again. I already updated the GitHub issue to reflect this as a feature request. I was not aware of the `wait-for-jobs` flag in Helm either, so I will look into it. Meanwhile, please give the `run` action type a try and give feedback. The reason for starting this thread was mainly to discuss this instead of marking the issue as resolved, or introducing the default behaviour of waiting for Jobs, as that would affect other users.
i
I really appreciate you reaching out, will give it a try soon!
fyi I'm going on vacation and will be back on the 26th of June, so the earliest I'll most probably be able to try pod run is that week.
a
This has been implemented in https://github.com/garden-io/garden/pull/4611. A new flag `waitForJobs` has been introduced for this. Thank you so much again @icy-furniture-17516 🙂
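As a rough sketch of how the new flag might be used in a `kubernetes` Deploy action; the exact placement and spelling of `waitForJobs` should be verified against the PR, so treat every field here as an assumption:

```yaml
# Hypothetical usage sketch; verify field placement against PR #4611.
kind: Deploy
type: kubernetes
name: batch-job
spec:
  files: [job.yaml]
  # Assumption: opt in to waiting for Job completion during deploy
  waitForJobs: true
```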
i
wow, very cool, thanks a lot!