"Lock file is already being held" errors when upgr...
# 🌱|help-and-getting-started
Hey folks, I'm not sure if this is a known problem going from 0.12 to 0.13 with a documented fix somewhere, but I can no longer run `garden deploy <thing>` without it failing with the error `Lock file is already being held` reported for random modules. This is consistent with every run -- I haven't been able to run `garden deploy` at all since upgrading to 0.13. I haven't yet gotten around to converting modules to actions since I wanted to see first if I could ask my team to upgrade to 0.13 while we convert things over time. I think 0.13 should work with modules to enable this, but we have too many modules (47 as of this writing) to test converting things piecemeal if I'm getting lock errors every run. I tried reinstalling Garden as well (cleaning out the existing .garden directories and starting clean), but that hasn't helped. Anyone know if there's some obvious thing I'm missing here? A lock file I should be looking for to clean up? Should it be treated as a bug? Thanks for any suggestions or time spent thinking about this.
@narrow-application-15594 Hey, do you also get that with a stack trace or a bit longer error message? What system are you on? Is it possible you have multiple garden processes running at the same time?
@cold-jordan-68753 No stack trace, the only extra parts of the error message (from error.log) are the module name (e.g., `red: Lock file is already being held`) and the line beforehand, `Failed resolving one or more modules:`, due to the lock file error. I'm not running multiple garden processes.
There might be better logs in […], e.g. `deploy.silly.*.jsonl`.
Nothing extra in there, just logs that it's scanning modules, found files, and flushed debug logs. Oddly, only the deploy.debug.* log contains the lock file error message -- the deploy.silly.*.jsonl log does not have it.
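To check which log files actually contain the error text, a small script along these lines might help. It is only a sketch: the log directory is passed in by the caller (the thread does not name it), and `files_mentioning` is a hypothetical helper, not part of Garden.

```python
from pathlib import Path


def files_mentioning(log_dir, needle):
    """Return the names of files directly inside log_dir whose text contains needle."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log holds odd bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(path.name)
    return hits
```

Calling `files_mentioning(some_log_dir, "Lock file is already being held")` would list only the files that mention the error, which makes a discrepancy between the debug and silly logs easy to confirm.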
Replaced a few paths with […] in the logs to avoid sharing some specific things, but otherwise if you'd like a zip of them I'm happy to send them somewhere. Can't put them on this exact machine to throw in Discord because personal / work machines are separate, but glad to email them.
Ah, here we go, running with `-l silly` produces something maybe more useful.
This snippet's small enough I should be able to dump it in a gist...
Awesome, thank you! We have had this issue happening every now and then with 0.13, but not in an easily reproducible way. A couple more questions to narrow this down:
- Are you using custom commands?
- Does this happen for any module? You mentioned `garden deploy <thing>` earlier; if `<thing>` is a module with few or no dependencies, does it work then?
- Are you running this on a VM? Network disk? Something that could create higher latency or not provide POSIX FS semantics?
* On custom commands, we have a number of exec modules that we use to run […] with specific values set due to some clunkiness around existing build processes.
* Running `garden deploy` for something with no dependencies on other modules also fails.
* This is running out of an encrypted home directory (just regular old ecryptfs on Ubuntu 20.04 -- just my workstation, no VM), but moving the contents of `~/.garden` to an unencrypted filesystem and symlinking it doesn't seem to improve things. Haven't tried with the project `.garden` directory, so will try moving and symlinking that next.
Still fails with the project .garden moved to an unencrypted filesystem, so I'm not sure it's that.
Removing all but two modules (a container and a persistentvolume module for MySQL) allows deploying the MySQL service to succeed without a lock error.
Going to try adding more modules back in and see if there's a point where it becomes unreliable, since narrowing it down to a specific module would be convenient.
I've managed to get to the point where adding one more exec module in causes lock file errors to start showing up. Replacing the exec module with a kubernetes module allows it to succeed again. The command the exec module uses can just be an echo and it'll still fail, so it doesn't seem to be based on how long the command takes. The only change I can make to the exec module that allows things to succeed is to remove the repositoryUrl from it. If I add the repositoryUrl back in, it fails with a lock error.
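The add-back search described above can be sketched roughly as follows. Everything here is illustrative: `first_breaking_module` and `simulated_deploy_fails` are hypothetical names, the module list is made up, and the stand-in predicate simply mimics what was observed in this thread. In practice the predicate would enable the given modules and run `garden deploy`.

```python
def first_breaking_module(base, candidates, deploy_fails):
    """Add candidates to base one at a time and return the first one
    whose addition makes deploy_fails(enabled) true, or None."""
    enabled = list(base)
    for module in candidates:
        enabled.append(module)
        if deploy_fails(enabled):
            return module
    return None


def simulated_deploy_fails(enabled):
    # Stand-in only (assumption): fails when an exec module has a repositoryUrl,
    # matching the behaviour reported above. A real check would run the deploy.
    return any(m["type"] == "exec" and m.get("repositoryUrl") for m in enabled)


if __name__ == "__main__":
    base = [{"name": "mysql", "type": "container"}]
    candidates = [
        {"name": "api", "type": "kubernetes"},
        {"name": "tool", "type": "exec"},
        {"name": "vendored", "type": "exec", "repositoryUrl": "https://example.com/repo.git"},
    ]
    culprit = first_breaking_module(base, candidates, simulated_deploy_fails)
    print(culprit["name"] if culprit else "no failure found")
```

A linear walk like this matches what was done by hand; with many modules, a bisection over the candidate list would need fewer deploy runs, at the cost of assuming a single module is responsible.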
Is there some part of how garden handles repositories that interferes with the lock?
Or maybe a double-lock somewhere around that code
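To illustrate the double-lock idea in general terms: a non-reentrant lock file can report that it is already held even when only one process is involved, if a code path that holds the lock calls into another path that tries to take it again. The sketch below is not Garden's code; `held`, `update_shared_state`, and `resolve_module` are hypothetical names, and the lock is a plain exclusive-create file chosen only to keep the example self-contained.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeldError(Exception):
    """Raised when the lock file already exists."""


@contextmanager
def held(lock_path):
    """Hold a simple, non-reentrant lock file for the duration of the block."""
    try:
        # O_EXCL makes creation fail if the file exists, no matter who created it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError("Lock file is already being held") from None
    os.close(fd)
    try:
        yield
    finally:
        os.remove(lock_path)


def update_shared_state(lock_path):
    # Hypothetical helper that takes the lock on its own.
    with held(lock_path):
        pass


def resolve_module(lock_path, name, nested):
    # Hypothetical resolver: with nested=True it calls a helper that
    # tries to take the same lock while it is still held here.
    with held(lock_path):
        if nested:
            update_shared_state(lock_path)
        return f"{name}: ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "state.lock")
        print(resolve_module(lock_path, "mysql", nested=False))
        try:
            resolve_module(lock_path, "red", nested=True)
        except LockHeldError as err:
            print(f"red: {err}")
```

The same symptom can also come from two concurrent tasks in one process racing for the lock, so this shows one possible mechanism rather than a diagnosis.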
Haven't found a way to repro the issue with the repositoryUrl yet. Garden clones the repo into the project .garden/ path, but that doesn't necessarily interact with the config file. One way forward here would be if you could share a small reproducible example. Alternatively, I could prepare a branch with some more logging that you could run locally.
Will probably take some time to come up with a reproducible example since I have to balance that with other work, but if you know anything you want logged sooner I can also try a branch.
Sorry this took longer than intended -- ended up in jury duty, so that was a bit of surprise time loss. Anyway, I made a public project that manages to always produce the lock file error for me. Obviously can't guarantee it'll reproduce consistently for anyone else, but it's a start: https://github.com/nilium/garden-locker
This still occurs with 0.13.14 as well.
Might also need to adjust the project.garden.yaml for your own test env since this is intended to be run against a local Kubernetes cluster under Rancher Desktop.
Finally got around to testing it on macOS (personal machine) and I also get the lock file errors there, which brings me a strange sense of comfort even if I was kind of hoping it was just my work machine. So, hopefully this helps with reproducing it in more places than just my machine.
Thanks so much for looking into this. I've tried it locally and can reproduce it 👍
@narrow-application-15594 Have a branch which should resolve the problem: https://github.com/garden-io/garden/pull/5114
Tried running the branch and things are working as expected (left a comment to the same effect on the PR). Also nice to see how much faster 0.13 is (after the usual 3-4s startup time) now that it's working.
That's great! The PR has been merged, so it should be out in the next release. Thanks again for all the help debugging this.