#🌱|help-and-getting-started
What is Garden using so much upload bandwidth for when running a task?

mammoth-kilobyte-41764

01/17/2023, 11:30 PM
I’ve been working on an internet connection with very little upload bandwidth, so I noticed that running a remote Garden task (with the Garden CLI) maxed out my 1 Mbps upload from my home connection. What data is Garden sending in such high quantities during the execution of the task? Bandwidth stayed pegged the whole time the task was running, and the task itself doesn’t send any data, so I figured the upload must all be about communicating with the in-cluster garden util and checking job status. Is there more to it? It’s not that 1 Mbps is all that much data; I’m just curious what it’s being used for.

brief-restaurant-63679

01/18/2023, 12:49 PM
Hi Matt! My first guess would be that it's the build context for the in-cluster building. If a build is required for the task to run, Garden will send the build context over the wire to the cluster. You can also run Garden with a higher log level to better see what's happening under the hood. E.g.
`garden run task <my-task> -l5`
Running `garden options` shows you which log levels you can use.

mammoth-kilobyte-41764

01/18/2023, 1:58 PM
Thanks, I’ll take a closer look. The bandwidth picks up when the task starts running in its container and stays steady until the task finishes and its pod is deleted, so I suspect something beyond build context is being uploaded, given how closely the bandwidth usage coincides with task execution.
Very interesting. The only thing I noticed when logging at level 5 was tons of rapid-fire connection attempts that I traced to the log-retrieval code. If I comment out the code that fetches logs while a task is running, I see almost zero upload bandwidth, the whole process is lightning fast (on my slow internet connection, it goes from taking over a minute to register that the task has run and completed to mere seconds), and it's far more reliable (on my machine, it goes from hitting connection errors and exiting with failure 2/3 of the time to running the task successfully every time). Something about the log-tailing setup seems more network-intensive than it needs to be; there's certainly no need to send 1 Mbps of data just to poll for new logs.
```diff
diff --git a/core/src/plugins/kubernetes/logs.ts b/core/src/plugins/kubernetes/logs.ts
index 34e235e17..a32ed241b 100644
--- a/core/src/plugins/kubernetes/logs.ts
+++ b/core/src/plugins/kubernetes/logs.ts
@@ -230,9 +230,9 @@ export class K8sLogFollower<T> {
   public async followLogs(opts: LogOpts) {
     await this.createConnections(opts)
 
-    this.intervalId = setInterval(async () => {
-      await this.createConnections(opts)
-    }, this.retryIntervalMs)
+//    this.intervalId = setInterval(async () => {
+//      await this.createConnections(opts)
+//    }, this.retryIntervalMs)
 
     return new Promise((resolve, _reject) => {
       this.resolve = resolve
```
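For illustration, the fix needn't be removing the interval entirely: the interval could keep its role as a safety net while only re-dialing connections that have actually dropped. The sketch below is hypothetical (the `Connection`, `ConnStatus`, and `maintainConnections` names are mine, not Garden's real API) and just shows the idea of making the periodic tick a cheap no-op while connections are healthy:

```typescript
// Hypothetical sketch: a periodic maintenance tick that only re-dials
// connections that have actually closed, instead of unconditionally
// re-creating every connection on each interval.
type ConnStatus = "connected" | "closed"

interface Connection {
  status: ConnStatus
}

class LogFollowerSketch {
  // Counts how many re-dials actually happened, for demonstration.
  public reconnects = 0

  constructor(private connections: Connection[]) {}

  // Called on an interval. While everything is healthy this does no
  // network work at all; only dead connections get re-dialed.
  maintainConnections(): void {
    for (const conn of this.connections) {
      if (conn.status === "closed") {
        // Placeholder for the real dial/stream setup.
        conn.status = "connected"
        this.reconnects++
      }
    }
  }
}

// Usage: one healthy connection, one dropped one.
const follower = new LogFollowerSketch([
  { status: "connected" },
  { status: "closed" },
])
follower.maintainConnections() // re-dials only the closed connection
follower.maintainConnections() // no-op: everything is now healthy
```

The design point is that the cost of each tick becomes proportional to the number of *failed* connections rather than the total number of streams being tailed.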

quaint-dress-831

01/23/2023, 1:30 PM
I applaud the level of research you've done @mammoth-kilobyte-41764! Can you rephrase this as a GitHub issue and post your findings? I think our open source dev team would be very interested in any optimization that could be done from our side.

mammoth-kilobyte-41764

01/25/2023, 3:14 PM

brief-restaurant-63679

02/14/2023, 6:51 AM
Hey @mammoth-kilobyte-41764, just wanted to let you know that this is being worked on: https://github.com/garden-io/garden/pull/3730 Should be fixed in our next release.

swift-garage-61180

02/14/2023, 8:00 AM
Yeah. This is basically ready, we're just making a few final tweaks before we merge it.

flat-state-47578

02/15/2023, 6:12 AM
Maybe that also relates to a problem I've seen with tasks: they complete fairly quickly, but the Garden CLI then hangs for a long time, I guess waiting to download or process (or something) the ConfigMap in which the output from the task was stored.
Admittedly this particular task is very noisy, but I haven't seen a ConfigMap bigger than perhaps 100 KB in size, and yet this causes massive problems for the log streaming.

quaint-dress-831

02/15/2023, 2:32 PM
I had understood tasks taking a long time to be because each one is launched as a separate pod. If you use something like GKE Autopilot, those pods launch on a new VM, adding precious seconds to a task run.