Running GitHub-hosted code in our flows
We are trying to find the "Windmill way" of running the latest version of a Python tool we develop outside of Windmill in our flows. We found 2 ways but want to understand what the "Windmill way" is: one option is to rebuild a worker with the latest version on each merge (including a virtual environment in the worker image with everything from requirements.txt installed). A second option is to "git pull" & activate at the start of our workflow (basically the same steps we could do in the worker build process), so the worker instance has the latest code available "at runtime" (this feels very unpreferred & hacky). Is the first option the recommended way (aka rebuild workers with the latest version of our source code / tools on each change), or is there a better way? Thanks!
@Daan ideally you would deploy your tool to a private PyPI and import it directly in Windmill scripts
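For reference, a minimal sketch of what that could look like as a Windmill Python script, assuming the package is published to a private index the workers can reach (e.g. an extra index URL configured on the workers) and that your Windmill version supports the `#requirements:` override comment; `mytool` and its API are placeholder names:

```python
# Windmill infers pip dependencies from imports; pinning an exact
# release keeps every run reproducible.
# Assumption: "mytool" is a placeholder for your package, published
# to a private index that the workers are configured to resolve.
#requirements:
#mytool==1.4.2

import mytool


def main(input_path: str):
    # Hypothetical entry point of the tool; replace with your real API.
    return mytool.process(input_path)
```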
OK - is there a "Windmill way" to do that without extra infra / packaging, and keep workers in sync with the latest tools on a versioned system?
(currently we "apt-get install" a bunch of stuff already in our worker Docker build script, but we don't want to abuse this system & rebuild on every commit - or is that OK "by design"?)
The only lightweight alternative that I would recommend is to use the git+ syntax directly in your pip requirements
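A rough sketch of that approach in a Windmill Python script, assuming the `#requirements:` override comment is available in your version and the worker can reach GitHub; the package name, repo URL, and tag are placeholders:

```python
# Pinning to a tag (or commit SHA) keeps the git dependency immutable,
# which is what lets it be cached safely.
#requirements:
#mytool @ git+https://github.com/your-org/mytool.git@v1.4.2

import mytool


def main():
    # Hypothetical call; substitute your tool's real entry point.
    return mytool.run()
```

Pointing the requirement at a branch instead of a tag would always pull the latest code, but then you lose the immutability that makes caching safe, which is the trade-off described below.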
but the extra infra / packaging might be worth it
it gives you versioning!
Which is super important, as it ensures full reproducibility
Windmill magic assumes that it can cache a lot of things, including deps
the only way to cache those safely is that they are immutable, and they are immutable because they are versioned
if you do things that assume stuff is already on disk, it will be non-reproducible and hard to keep track of
Very clear, thanks Ruben! We will explore that route then