"Lock file is already being held" errors when upgr...
# 🌱|help-and-getting-started
n
Hey folks, I'm not sure if this is a known problem going from 0.12 to 0.13 with a documented fix somewhere, but I can no longer run `garden deploy <thing>` without it failing with the error `Lock file is already being held`, reported for random modules. This is consistent with every run -- I haven't been able to run `garden deploy` at all since upgrading to 0.13.

I haven't yet gotten around to converting modules to actions since I wanted to see first if I could ask my team to upgrade to 0.13 while we convert things over time. I think 0.13 should work with modules to enable this, but we have too many modules (47 as of this writing) to test converting things piecemeal if I'm getting lock errors every run. I tried reinstalling Garden as well (cleaning out the existing .garden directories and starting clean), but that hasn't helped. Anyone know if there's some obvious thing I'm missing here? A lock file I should be looking for to clean up? Should it be treated as a bug? Thanks for any suggestions or time spent thinking about this.
c
@narrow-application-15594 Hey, do you get a stack trace or a somewhat longer error message with that? What system are you on? Is it possible you have multiple garden processes running at the same time?
n
@cold-jordan-68753 No stack trace -- the only extra parts of the error message (from error.log) are the module name (e.g., `red: Lock file is already being held`) and the line beforehand, `Failed resolving one or more modules:`, which is there due to the lock file error. I'm not running multiple garden processes.
c
There might be better logs in `.garden/logs/`, e.g. `deploy.silly.*.jsonl`.
n
Nothing extra in there, just logs that it's scanning modules, found files, and flushed debug logs. Oddly, only the `deploy.debug.*` log contains the lock file error message -- the `deploy.silly.*.jsonl` log does not have it.
Replaced a few paths with `$SOURCES` in the logs to avoid sharing some specific things, but otherwise if you'd like a zip of them I'm happy to send them somewhere. Can't put them on this exact machine to throw in Discord because personal / work machines are separate, but glad to email them.
Ah, here we go, running with `-l silly` produces something maybe more useful.
This snippet's small enough I should be able to dump it in a gist...
c
Awesome, thank you! We have had this issue happening every now and then with 0.13, but not in an easily reproducible way. A couple more questions to narrow this down:
- Are you using custom commands?
- Does this happen for any module? You mentioned `garden deploy <thing>` earlier -- if `thing` is a module with few or no dependencies, does it work then?
- Are you running this on a VM? Network disk? Something that could create higher latency or not provide POSIX FS semantics?
n
* On custom commands, we have a number of exec modules that we use to run `make` with specific values set, due to some clunkiness around existing build processes (roughly like the sketch after this list).
* Running `garden deploy` for something with no dependencies on other modules also fails.
* This is running out of an encrypted home directory (just regular old ecryptfs on Ubuntu 20.04 -- just my workstation, no VM), but moving the contents of `~/.garden` to an unencrypted filesystem and symlinking it doesn't seem to improve things. Haven't tried that with the project `.garden` directory yet, so I'll try moving and symlinking that next.
Still fails with the project .garden moved to an unencrypted filesystem, so I'm not sure it's that.
Removing all but two modules, a container and a persistentvolume module for MySQL, allows deploying the MySQL service to succeed without a lock error.
Going to try adding more modules back in and see if there's a point where it becomes unreliable, since narrowing it down to a specific module would be convenient.
I've managed to get to the point where adding one more exec module in causes lock file errors to start showing up. Replacing the exec module with a kubernetes module allows it to succeed again. The command the exec module uses can just be an echo and it'll still fail, so it doesn't seem to be based on how long the command takes. The only change I can make to the exec module that allows things to succeed is to remove the `repositoryUrl` from it. If I add the `repositoryUrl` back in, it fails with a lock error.
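So the smallest failing shape is roughly the sketch below -- the module name and repository URL are placeholders, not the real config, but deleting the `repositoryUrl` line from a module like this is what makes the lock error go away:

```yaml
kind: Module
type: exec
name: remote-exec            # placeholder name
# Placeholder remote source; Garden requires the #<branch|tag> suffix.
repositoryUrl: https://github.com/example/some-repo.git#main
tasks:
  - name: noop
    command: [echo, "hello"]   # even a trivial command still triggers the error
```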
Is there some part of how garden handles repositories that interferes with the lock?
Or maybe a double-lock somewhere around that code
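To make the double-lock idea concrete: the error text matches the `ELOCKED` error from the `proper-lockfile` npm package, so if two code paths in the same process try to lock the same cloned-source directory without retries, the second one fails with exactly this message. A sketch only, not Garden's actual code (the directory and call sites are invented):

```typescript
// Demonstrates how a second, non-retrying acquisition of the same lock inside
// one process produces the error text above, assuming proper-lockfile is the
// locking primitive (its ELOCKED error uses this exact wording).
import { mkdtempSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"
import { lock } from "proper-lockfile"

async function main() {
  // Stand-in for a cloned remote source directory under .garden/
  const repoDir = mkdtempSync(join(tmpdir(), "repo-"))

  // First acquisition succeeds, e.g. while cloning/updating the remote source.
  const release = await lock(repoDir, { retries: 0 })
  try {
    // If another code path (say, resolving a second module that points at the
    // same repositoryUrl) takes the same lock before release() runs,
    // proper-lockfile rejects immediately:
    await lock(repoDir, { retries: 0 })
  } catch (err) {
    console.error((err as Error).message) // "Lock file is already being held"
  } finally {
    await release()
  }
}

main()
```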
c
Haven't found a way to repro the issue with the `repositoryUrl` yet. Garden clones the repo into the project `.garden/` path, but that doesn't necessarily interact with the config file. One way forward would be for you to share a small reproducible example, if you can. Alternatively, I could prepare a branch with some more logging that you could run locally.
n
Will probably take some time to come up with a reproducible example since I have to balance that with other work, but if you know anything you want logged sooner I can also try a branch.
Sorry this took longer than intended -- ended up on jury duty, so that was a bit of a surprise time loss. Anyway, I made a public project that manages to always produce the lock file error for me. Obviously can't guarantee it'll reproduce consistently for anyone else, but it's a start:
* https://github.com/nilium/garden-locker
This still occurs with 0.13.14 as well.
You might also need to adjust the `project.garden.yaml` for your own test environment, since this is intended to be run against a local Kubernetes cluster under Rancher Desktop.
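For anyone trying it, the part to adjust is most likely just the provider context. A rough sketch of the relevant bit of `project.garden.yaml` (the repo's actual file may differ; `rancher-desktop` below is simply Rancher Desktop's default kubectl context name):

```yaml
kind: Project
name: garden-locker
defaultEnvironment: local
environments:
  - name: local
providers:
  - name: local-kubernetes
    environments: [local]
    context: rancher-desktop   # change to your own kubectl context if needed
```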
Finally got around to testing it on macOS (personal machine) and I also get the lock file errors there, which brings me a strange sense of comfort even if I was kind of hoping it was just my work machine. So, hopefully this helps with reproducing it in more places than just my machine.
c
Thanks so much for looking into this. I've tried it locally and can reproduce it 👍
@narrow-application-15594 Have a branch which should resolve the problem: https://github.com/garden-io/garden/pull/5114
n
Tried running the branch and things are working as expected (left a comment to the same effect on the PR). Also nice to see how much faster 0.13 is (after the usual 3-4s startup time) now that it's working.
c
That's great! The PR has been merged, so it should be out in the next release. Thanks again for all the help debugging this.