Supporting build from source and pre-built images ...
# 🌱|help-and-getting-started
f
We are starting to have enough services in our Garden stack that checking out source, copying to the cluster, building and deploying everything takes a considerable amount of time. An idea we had right when we started using Garden was to be able to switch between pre-built images and building from source, so that most services would just deploy existing images (produced by our regular CI/CD pipeline) and we'd only build from source the repos being actively developed at the time. One way this could work is via the following:
Copy code
kind: Build
type: container
name: my-service
disabled: ${local.env.GARDEN_DEV_SERVICE == "my-service"}
Then in the deploy action
Copy code
kind: Deploy
type: container
name: my-service
spec:
  image: ${local.env.GARDEN_DEV_SERVICE == "my-service" ? var.registryRoot + "/my-service:latest" : actions.build.my-service.outputs.deploymentImageId}
One question I have is what we do with the `build` attribute of the deploy. The documentation is a bit confusing about its purpose - we've been sorta using it to express the dependency on the build, but I'm not sure if that's strictly necessary. There's also no obvious way of not specifying it (would `null` or `undefined` work?). Or should we express the build dependency using the `dependencies` attribute? Any thoughts or ideas appreciated.
Reading a bit more, it looks like we could just set `disabled: true` for all builds, as they will be built if marked as a dependency even if disabled. So we need one less piece of conditional logic there. One big stumbling block seems to be that if we don't have a particular environment variable defined at all, Garden falls in a heap. There's no defaulting to an empty string:
Copy code
Invalid template string (${local.env.GARDEN_DEV_SERVICE == …): Could not find key GARDEN_DEV_SERVICE under local.env. Available keys: ...
This potential workaround doesn't seem viable:
Copy code
providers:
  - name: kubernetes
    dependencies: [exec]
  - name: exec
    initScript: "export GARDEN_DEV_SERVICE"
I guess either the local environment variables have already been captured by the time the initScript runs, or the exec provider runs in a subshell, so nothing it changes affects Garden's environment.
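That's the standard process-environment behaviour: a child process gets a copy of the parent's environment, so an `export` inside an initScript can't reach the process that spawned it. A minimal shell sketch of the mechanism (not Garden-specific):

```shell
# Make sure the variable starts out unset in this shell.
unset GARDEN_DEV_SERVICE

# A subshell inherits a *copy* of the environment; exports there
# never propagate back to the parent process.
sh -c 'export GARDEN_DEV_SERVICE=gateway'

# Back in the parent, the variable is still unset:
echo "GARDEN_DEV_SERVICE=${GARDEN_DEV_SERVICE:-<unset>}"
# prints: GARDEN_DEV_SERVICE=<unset>
```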
q
@flat-state-47578 if you're using the in-cluster image builders, teams should be able to make use of the shared image caching so that images already built are skipped.
Both `kaniko` and `cluster-buildkit` will pull caches from deployment registries
f
Sure, but in practice I get a lot of complaints from developers saying it takes a long time, so I think we just don't get enough image reuse given how fast the codebases move.
It's upwards of 25 minutes for a full build compared with perhaps 3-4 minutes from pre-built images.
And if the cached images are only considered once the sources are synced to the cluster, that's another wait: we have a lot of developers in the Pacific region connecting to the cluster in the US and having to sync very large amounts of code.
q
That's a very long time. Cached image builds are one of the big drivers of value for Garden. If not in image builds, where are you deriving value?
I'd be interested to dive more into the image builders and understand just what has them spinning for so long.
Are the assigned nodes powerful enough? How does a local image build benchmark against a remote image build?
> One question I have is what we do with the build attribute of the deploy. The documentation is a bit confusing over its purpose - we've been sorta using it as expressing the dependency on the build, but I'm not sure if that's strictly necessary. There's also no obvious way of not specifying it (would null or undefined work?). Or should we express the build dependency using the dependencies attribute?
The `build` key resolves actions from the context of the `Build` action it refers to. For strict dependencies, you'll want to use `dependencies`. See e.g. https://docs.garden.io/reference/action-types/deploy/helm#build
f
Our main use case is running a large subset of our services to allow people to work on individual services within a realistic distributed environment. So, it's not unusual to work on a single service and need a bunch of database + Kafka + ElasticSearch and other microservices around it (as well as API gateway, auth services etc) in order to do exploratory integration testing.
Well, seems like we need either `build` or `spec.image`
The nodes could probably be made more powerful, but the build takes a relatively small amount of time compared to when the stack is running, so that would be less cost-effective. I don't think they are especially low-powered (usually `r6i.xlarge`), but as you know, cluster-buildkit only processes one build at a time so we have to consider the end-to-end total build time. At least one JVM service takes perhaps 6 minutes of that, another service might take 5 minutes. It all adds up.
q
@flat-state-47578 have you tried swapping to kaniko, since it creates pods for each build?
f
We started out with Kaniko at the very beginning of using Garden, but I moved to cluster buildkit fairly early on. I can't remember what the problem was now. Let me check my notes.
My notes say that "cluster-buildkit is meant to perform better"
This is from around the end of 2021 I believe.
q
In the meantime, I'm requesting a dev take a look at this particular issue since I agree it is at the very least confusing and doesn't jibe with what the docs say.
If you find it a good use of your time, I think it may be worth benchmarking the two and seeing if we can at least unblock your serial pipeline.
f
I'll try that, thanks for the idea!
Got 12 minutes with Kaniko just a moment ago. Did you say there is zero caching with that option or does it still try to pull from the external registry if set up?
q
It still pulls from the external registry if set up. So this is 12 minutes on a build that should be cached, right? Not a cold build.
f
Yes, but at least two of the containers take 5-6 minutes to build... I only saw a couple of concurrent kaniko containers so there's some ordering dependencies there meaning we can't just have everything built in one hit. I don't think we're going to get much faster than 10-12 minutes, so I would like to use the pre-built images if possible.
q
@flat-state-47578 do you think this might be a feature request?
Roughly composed along the lines of easily swap images and source builds
f
Well, as far as I can tell the only thing preventing me from doing it is being able to optionally recognise environment variables.
q
Do you mind writing this up as a feature request to optionally recognize environment vars? I can too but I'm still in bed 😴
Or I can later, up to you!
f
it does recognise environment variables, it just throws an error if one is referenced but not defined
let me give a quick example and you can tell me if it warrants an issue
Copy code
spec:
  image: "${local.env.GARDEN_DEV_SERVICE == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
but if we don't do something like `GARDEN_DEV_SERVICE=gateway garden deploy`, i.e. we don't provide `GARDEN_DEV_SERVICE` at all, we get this:
Copy code
Invalid template string (${local.env.GARDEN_DEV_SERVICE == …): Could not find key GARDEN_DEV_SERVICE under local.env. Available keys: ...
if you just did `garden deploy` for example, without that variable in the environment
I had no luck adding it via the exec provider
a workaround would be putting it in a file that we read in as a varfile, but I feel like people will forget that
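For reference, the varfile route would look roughly like this - a sketch, assuming Garden's project-level `varfile` key (which, as I understand it, points to a dotenv-format file, `garden.env` being the default name):

```yaml
# project.garden.yml (sketch)
apiVersion: garden.io/v1
kind: Project
name: my-project
varfile: garden.env   # dotenv-format file whose entries become var.*
```

with `garden.env` containing a line like `GARDEN_DEV_SERVICE=`, so the variable always has at least an empty default - but as noted above, people would have to remember the file exists.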
q
@flat-state-47578 can you try changing it to an if/else conditional? See kapa's answer in https://discord.com/channels/817392104711651328/1199655376845557771/1199655485050208256
@flat-state-47578 I think if we set `local.env.GARDEN_DEV_SERVICE` to be an optional value it should work
I've also asked kapa a follow-up to illustrate
b
Hey @flat-state-47578 You are right that the `build` field is confusing and we're planning on deprecating and removing it. So you should definitely use `spec.image`. As for using the template string, this is a bit of an awkward limitation of the template system that you can work around by falling back to an empty string, so the full line becomes:
Copy code
spec:
  image: "${(local.env.GARDEN_DEV_SERVICE ||  == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
f
Great suggestions, I'll give that a try!
I think there's a missing closing parenthesis but I'm not sure where it's meant to go - `${(local.env.GARDEN_DEV_SERVICE || ) ...` perhaps?
oh wait, think I figured it out - it must have been swallowed by Discord
I've now got this in the project config:
Copy code
variables:
  devService: ${local.env.GARDEN_DEV_SERVICE || ''}
(using `gardenDevService` seems to make it complain, I guess `garden` prefixes are forbidden in variables?) And in the services we do this:
Copy code
spec:
  image: "${var.devService == 'myservice' ? var.registryRoot + '/myservice:latest' : actions.build.myservice.outputs.deploymentImageId }"
just so we don't repeat the same default value again and again. It's at least passing syntax checks and seems to have deployed but I'll rebuild the stack from scratch and play with it a bit to make sure it's working as expected.
🤦‍♂️ I have the ternary around the wrong way
OK, I think we are past the syntax and conditional problems now, but there's potentially a more fundamental problem here that still gets in the way. Pre-built images are fine - that functionality works. Building from source now I think suffers from a bit of a graph/causality issue.
I've synced the remote repo with `garden update-remote all` so I have the latest code for one service up to date in the `.garden` working directory. Then I try to deploy it with `GARDEN_DEV_SERVICE=myservice garden deploy myservice`, which in theory should cause it to have a reference to `actions.build.myservice.outputs.deploymentImageId`, but something here gets messed up. The build in question has `disabled: true`, but despite the reference to the build output in the deploy action, I think the source doesn't get synced to the cluster, nor does the build action actually run. Something is generating the version hash though, because then we get this error:
Copy code
Pod myservice-6958dc8bb9-m88q6: Pulling - Pulling image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac"
Pod myservice-6958dc8bb9-m88q6: Failed - Failed to pull image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": rpc error: code = NotFound desc = failed to pull and unpack image "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": failed to resolve reference "000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac": 000000000000.dkr.ecr.us-east-1.amazonaws.com/xxxx/garden/myservice:v-a749b1cbac: not found
Had a thought that maybe I needed to change the `disabled` attribute of the build, but that didn't make any difference:
Copy code
disabled: ${var.devService == 'myservice'}
I'm guessing this is because despite referring to the output of the build in the image URL, it doesn't trigger an actual build. Presumably we need the `build` attribute of the deploy action to be there to do that, but it seems it either has to be specified or left out entirely. I'll see if it is happy with null or undefined.
I dislike the amount of boilerplate but this seems to work so far:
Copy code
build:
  $if: ${var.devService == "myservice"}
  $then: myservice
q
@flat-state-47578 would you say this is a resolution despite the boilerplate?
f
It seems that way but I'll continue testing to make sure 🙂
b
> I think there's a missing closing parenthesis but I'm not sure where it's meant to go
Hmm, yeah that looks like a copy-paste fail. It's supposed to be:
Copy code
image: "${(local.env.GARDEN_DEV_SERVICE || '')  == 'gateway' ? var.registryRoot + '/api-gateway:latest' : actions.build.gateway.outputs.deploymentImageId }"
Regarding:
> Then I try to deploy it with GARDEN_DEV_SERVICE=myservice garden deploy myservice which in theory should cause it to have a reference to actions.build.myservice.outputs.deploymentImageId, but something here gets messed up.
Could it be that the Deploy action is missing a dependency on the Build?
Here's a very minimal example that worked when I tested it:
Copy code
kind: Build
name: api
description: Build the backend
type: container

---
kind: Deploy
name: api
type: container

# You can set this irrespective of whether the image name is hardcoded or not
# If it is hardcoded, Garden will simply skip the build
dependencies: [build.api]

spec:
  image: "${(local.env.GARDEN_USE_PREBUILT || '' ) == 'true' ? 'my-prebuilt-image:latest' : actions.build.api.outputs.deploymentImageId}"
  # ...
If I run `GARDEN_USE_PREBUILT=true garden deploy` it skips the build and uses the prebuilt image. If I skip the flag it'll build the image from source and deploy that version. Would this approach work for your use case?
f
so adding the build to the dependencies won't trigger a disabled build?
the trouble is having the build disabled by default, but wanting it to run if we set the image URL to be the output of the build action
I understand the dependency graph has to figure out intent here, which is hard.
b
Ah right. You can also declare the dependency conditionally. So the updated example would be:
Copy code
kind: Build
name: api
description: Build the backend
type: container

---
kind: Deploy
name: api
type: container

variables:
  buildApi: ${(local.env.GARDEN_USE_PREBUILT || '') == 'true'} # <--- Set as a variable so we can re-use it

# You can set this irrespective of whether the image name is hardcoded or not
# If it is hardcoded, Garden will simply skip the build
dependencies:
  - "${var.buildApi ? 'build.api' : null }" # <--- Conditionally depend on build.api

spec:
  image: "${var.buildApi ? 'my-prebuilt-image:latest' : actions.build.api.outputs.deploymentImageId}"
  # ...
f
I got it working with $if $then in the build attribute, but I guess what you posted is more future-proof if that attribute is going away.
Thanks!
o
Just trying to understand - I don't really get the premise of the problem. Instead of using the shared builds managed by Garden, you just want to bypass that and use some other image from somewhere else? Wouldn't that essentially be a duplication of Garden, and potentially incorrect? The build action has its own detection of when a build gets invalidated, and Kaniko should handle layer caching across all builds. AFAIU Garden solves exactly this.