Supporting build from source and pre-built images ...
# 🌱|help-and-getting-started
f
We are starting to have enough services in our Garden stack that checking out source, copying to the cluster, building and deploying everything takes a considerable amount of time. An idea we had right when we started using Garden was to be able to switch between pre-built images and building from source, so that most services would just deploy existing images (produced by our regular CI/CD pipeline) and we'd only build from source the repos being actively developed at the time. One way this could work is via the following:
Copy code
kind: Build
type: container
name: my-service
disabled: ${local.env.GARDEN_DEV_SERVICE == "my-service"}
Then in the deploy action
Copy code
kind: Deploy
type: container
name: my-service
spec:
  image: ${local.env.GARDEN_DEV_SERVICE == "my-service" ? var.registryRoot + "/my-service:latest" : actions.build.my-service.outputs.deploymentImageId}
One question I have is what we do with the `build` attribute of the deploy. The documentation is a bit confusing about its purpose - we've been sorta using it to express the dependency on the build, but I'm not sure if that's strictly necessary. There's also no obvious way of not specifying it (would `null` or `undefined` work?). Or should we express the build dependency using the `dependencies` attribute? Any thoughts or ideas appreciated.
Reading a bit more, it looks like we could just set `disabled: true` for all builds, as they will be built if marked as a dependency even if disabled. So we need one less piece of conditional logic there. One big stumbling block seems to be that if we don't have a particular environment variable defined at all, Garden falls in a heap. There's no defaulting to an empty string:
Copy code
Invalid template string (${local.env.GARDEN_DEV_SERVICE == …): Could not find key GARDEN_DEV_SERVICE under local.env. Available keys: ...
This potential workaround doesn't seem viable:
Copy code
providers:
  - name: kubernetes
    dependencies: [exec]
  - name: exec
    initScript: "export GARDEN_DEV_SERVICE"
I guess either the local environment variables have already been captured by the time the initScript runs, or the exec provider runs in a subshell, so nothing it changes affects Garden's environment.
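That's the standard process-environment behaviour: a child process gets a copy of the parent's environment, so an `export` inside an initScript can't reach the process that spawned it. A minimal shell sketch of the mechanism (not Garden-specific):

```shell
# Make sure the variable starts out unset in this shell.
unset GARDEN_DEV_SERVICE

# A subshell inherits a *copy* of the environment; exports there
# never propagate back to the parent process.
sh -c 'export GARDEN_DEV_SERVICE=gateway'

# Back in the parent, the variable is still unset:
echo "GARDEN_DEV_SERVICE=${GARDEN_DEV_SERVICE:-<unset>}"
# prints: GARDEN_DEV_SERVICE=<unset>
```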
q
@flat-state-47578 if you're using the in-cluster image builders, teams should be able to make use of the shared image caching so that images already built are skipped.
Both `kaniko` and `cluster-buildkit` will pull caches from deployment registries
f
Sure, but in practice I get a lot of complaints from developers saying it takes a long time, so I think we just don't get enough image reuse given how fast the codebases move.
It's upwards of 25 minutes for a full build compared with perhaps 3-4 minutes from pre-built images.
And if the cached images are only considered once the sources are synced to the cluster, that's another wait: we have a lot of developers in the Pacific region connecting to the cluster in the US and having to sync very large amounts of code.
q
That's a very long time. Cached image builds are one of the big drivers of value for Garden. If not in image builds, where are you deriving value?
I'd be interested to dive more into the image builders and understand just what has them spinning for so long.
Are the assigned nodes powerful enough? How does a local image build benchmark against a remote image build?
> One question I have is what we do with the build attribute of the deploy. The documentation is a bit confusing over its purpose - we've been sorta using it as expressing the dependency on the build, but I'm not sure if that's strictly necessary. There's also no obvious way of not specifying it (would null or undefined work?). Or should we express the build dependency using the dependencies attribute?
The `build` key resolves actions from the context of the `Build` action it refers to. For strict dependencies, you'll want to use `dependencies`. See e.g. https://docs.garden.io/reference/action-types/deploy/helm#build
f
Our main use case is running a large subset of our services to allow people to work on individual services within a realistic distributed environment. So, it's not unusual to work on a single service and need a bunch of database + Kafka + ElasticSearch and other microservices around it (as well as API gateway, auth services etc) in order to do exploratory integration testing.
Well, seems like we need either `build` or `spec.image`
The nodes could probably be made more powerful, but the build takes a relatively small amount of time compared to when the stack is running, so that would be less cost-effective. I don't think they are especially low-powered (usually `r6i.xlarge`), but as you know, cluster-buildkit only processes one build at a time so we have to consider the end-to-end total build time. At least one JVM service takes perhaps 6 minutes of that, another service might take 5 minutes. It all adds up.
q
@flat-state-47578 have you tried swapping to kaniko, since it creates pods for each build?
f
We started out with Kaniko at the very beginning of using Garden, but I moved to cluster buildkit fairly early on. I can't remember what the problem was now. Let me check my notes.
My notes say that "cluster-buildkit is meant to perform better"
This is from around the end of 2021 I believe.
q
In the meantime, I'm requesting a dev take a look at this particular issue since I agree it is at the very least confusing and doesn't jibe with what the docs say.
If you find it a good use of your time, I think it may be worth benchmarking the two and seeing if we can at least unblock your serial pipeline.
f
I'll try that, thanks for the idea!
Got 12 minutes with Kaniko just a moment ago. Did you say there is zero caching with that option or does it still try to pull from the external registry if set up?
q
It still pulls from the external registry if set up. So this is 12 minutes on a build that should be cached, right? Not a cold build.
f
Yes, but at least two of the containers take 5-6 minutes to build... I only saw a couple of concurrent kaniko containers so there's some ordering dependencies there meaning we can't just have everything built in one hit. I don't think we're going to get much faster than 10-12 minutes, so I would like to use the pre-built images if possible.
q
@flat-state-47578 do you think this might be a feature request?
Roughly composed along the lines of easily swap images and source builds
f
Well, as far as I can tell the only thing preventing me from doing it is being able to optionally recognise environment variables.
q
Do you mind writing this up as a feature request to optionally recognize environment vars? I can too but I'm still in bed 😴
Or I can later, up to you!
f
it does recognise environment variables, it just throws an error if one is referenced but not defined
let me give a quick example and you can tell me if it warrants an issue
Copy code
spec:
  image: "${local.env.GARDEN_DEV_SERVICE == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
but if we don't do something like `GARDEN_DEV_SERVICE=gateway garden deploy`, i.e. we don't provide `GARDEN_DEV_SERVICE` at all, we get this:
Copy code
Invalid template string (${local.env.GARDEN_DEV_SERVICE == …): Could not find key GARDEN_DEV_SERVICE under local.env. Available keys: ...
if you just did `garden deploy` for example, without that variable in the environment
I had no luck adding it via the exec provider
a workaround would be putting it in a file that we read in as a varfile, but I feel like people will forget that
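For reference, the varfile route would look roughly like this - a sketch, assuming Garden's project-level `varfile` key (which, as I understand it, points to a dotenv-format file, `garden.env` being the default name):

```yaml
# project.garden.yml (sketch)
apiVersion: garden.io/v1
kind: Project
name: my-project
varfile: garden.env   # dotenv-format file whose entries become var.*
```

with `garden.env` containing a line like `GARDEN_DEV_SERVICE=`, so the variable always has at least an empty default - but as noted above, people would have to remember the file exists.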
q
@flat-state-47578 can you try changing it to an if/else conditional? See kapa's answer in https://discord.com/channels/817392104711651328/1199655376845557771/1199655485050208256
@flat-state-47578 I think if we set `local.env.GARDEN_DEV_SERVICE` to be an optional value it should work
I've also asked kapa a follow-up to illustrate
b
Hey @flat-state-47578 You are right that the `build` field is confusing and we're planning on deprecating and removing it. So you should definitely use `spec.image`. As for using the template string, this is a bit of an awkward limitation of the template system that you can work around by falling back to an empty string, so the full line becomes:
Copy code
spec:
  image: "${(local.env.GARDEN_DEV_SERVICE ||  == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
f
Great suggestions, I'll give that a try!
I think there's a missing closing parenthesis but I'm not sure where it's meant to go - `${(local.env.GARDEN_DEV_SERVICE || ) ...` perhaps?
oh wait, think I figured it out - it must have been swallowed by Discord
I've now got this in the project config:
Copy code
variables:
  devService: ${local.env.GARDEN_DEV_SERVICE || ''}
(using `gardenDevService` seems to make it complain, I guess `garden` prefixes are forbidden in variables?) And in the services we do this:
Copy code
spec:
  image: "${var.devService == 'myservice' ? var.registryRoot + '/myservice:latest' : actions.build.myservice.outputs.deploymentImageId }"
just so we don't repeat the same default value again and again. It's at least passing syntax checks and seems to have deployed but I'll rebuild the stack from scratch and play with it a bit to make sure it's working as expected.
🤦‍♂️ I have the ternary around the wrong way
OK, I think we are past the syntax and conditional problems now, but there's potentially a more fundamental problem here that still gets in the way. Pre-built images are fine - that functionality works. Building from source now I think suffers from a bit of a graph/causality issue.
I've synced the remote repo with `garden update-remote all` so I have the latest code for one service up to date in the `.garden` working directory. Then I try to deploy it with `GARDEN_DEV_SERVICE=myservice garden deploy myservice`, which in theory should cause it to have a reference to `actions.build.myservice.outputs.deploymentImageId`, but something here gets messed up. The build in question has `disabled: true`, but despite the reference to the build output in the deploy action, I think the source doesn't get synced to the cluster, nor does the build action actually run. Something is generating the version hash though, because then we get this error:
Copy code
Pod myservice-6958dc8bb9-m88q6: Pulling - Pulling image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac"
Pod myservice-6958dc8bb9-m88q6: Failed - Failed to pull image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": rpc error: code = NotFound desc = failed to pull and unpack image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": failed to resolve reference "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": 000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac: not found
Had a thought that maybe I needed to change the `disabled` attribute of the build, but that didn't make any difference:
Copy code
disabled: ${var.devService == 'myservice'}
I'm guessing this is because despite referring to the output of the build in the image URL, it doesn't trigger an actual build. Presumably we need the `build` attribute of the deploy action to be there to do that, but it seems it either has to be specified or left out entirely. I'll see if it is happy with null or undefined.
I dislike the amount of boilerplate but this seems to work so far:
Copy code
build:
  $if: ${var.devService == "myservice"}
  $then: myservice
q
@flat-state-47578 would you say this is a resolution despite the boilerplate?
f
It seems that way but I'll continue testing to make sure 🙂
b
> I think there's a missing closing parenthesis but I'm not sure where it's meant to go
Hmm, yeah that looks like a copy-paste fail. It's supposed to be:
Copy code
image: "${(local.env.GARDEN_DEV_SERVICE || '')  == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
Regarding:
> Then I try to deploy it with GARDEN_DEV_SERVICE=myservice garden deploy myservice which in theory should cause it to have a reference to actions.build.myservice.outputs.deploymentImageId, but something here gets messed up.
Could it be that the Deploy action is missing a dependency on the Build?
Here's a very minimal example that worked when I tested it:
Copy code
kind: Build
name: api
description: Build the backend
type: container

---
kind: Deploy
name: api
type: container

# You can set this irrespective of whether the image name is hardcoded or not
# If it is hardcoded, Garden will simply skip the build
dependencies: [build.api]

spec:
  image: "${(local.env.GARDEN_USE_PREBUILT || '' ) == 'true' ? 'my-prebuilt-image:latest' : actions.build.api.outputs.deploymentImageId}"
  # ...
If I run `GARDEN_USE_PREBUILT=true garden deploy` it skips the build and uses the prebuilt image. If I skip the flag it'll build the image from source and deploy that version. Would this approach work for your use case?
f
so adding the build to the dependencies won't trigger a disabled build?
the trouble is having the build disabled by default, but wanting it to run if we set the image URL to be the output of the build action
I understand the dependency graph has to figure out intent here, which is hard.
b
Ah right. You can also declare the dependency conditionally. So the updated example would be:
Copy code
kind: Build
name: api
description: Build the backend
type: container

---
kind: Deploy
name: api
type: container

variables:
  buildApi: ${(local.env.GARDEN_USE_PREBUILT || '') == 'true'} # <--- Set as a variable so we can re-use it

# You can set this irrespective of whether the image name is hardcoded or not
# If it is hardcoded, Garden will simply skip the build
dependencies:
  - "${var.buildApi ? 'build.api' : null }" # <--- Conditionally depend on build.api

spec:
  image: "${var.buildApi ? 'my-prebuilt-image:latest' : actions.build.api.outputs.deploymentImageId}"
  # ...
f
I got it working with $if $then in the build attribute, but I guess what you posted is more future-proof if that attribute is going away.
Thanks!
o
Just trying to understand - I don't really get the premise of the problem. Instead of using the shared builds managed by Garden, you just want to bypass that and use some other image from somewhere else? Wouldn't that essentially be a duplication of Garden, and potentially incorrect? The build action has its own detection of when a build gets invalidated, and Kaniko should handle layer caching across all builds. AFAIU Garden solves exactly this.