Get root workflow id
I have a data pipeline that normally runs in incremental mode, but sometimes we want to do a full reload. The full reload will be a workflow that reuses the normal incremental workflows with some extra config. One critical config is the target database. During the full reload we'll target a temporary database, and at the end of the flow we'll exchange tables between the old and new. This gives us a very safe mechanism for doing reloads non-destructively.
One way to achieve this would be to parameterize all the incremental loading scripts (and workflows). But this would complicate the code for something that happens rarely.
So, to achieve this I thought I could instead set some contextual config that applies to all the jobs that run as a result of the top-level reload workflow, overriding the database they connect to. My idea was to set a Windmill resource which contains the necessary config as well as the workflow id of the main reload workflow.
In practice, I have some shared code for connecting to the database that would detect the presence of this override and target the temporary reload database instead. For this to work I must be able to find the "root workflow id", i.e. if I kick off the reload and it is assigned id 123, then the shared connectivity code would check if the override resource is set, and if the workflow_id stored in it is 123 it would apply the override config. This way the normal pipelines can continue running unaffected.
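To make the idea concrete, here is a minimal sketch of what that check in the shared connectivity code could look like. The function name resolve_database and the override shape (keys workflow_id and database) are hypothetical, and how the override resource and the root workflow id are obtained is left out.

```python
from typing import Optional


def resolve_database(
    default_db: str,
    override: Optional[dict],
    root_workflow_id: Optional[str],
) -> str:
    """Pick the database to connect to.

    `override` stands in for the contents of the override resource
    (hypothetical shape: {"workflow_id": ..., "database": ...});
    None means the resource is not set.
    """
    if (
        override
        and root_workflow_id is not None
        and override.get("workflow_id") == root_workflow_id
    ):
        # This job runs under the reload workflow: use the temporary database.
        return override["database"]
    # Normal incremental runs are unaffected.
    return default_db
```

With this shape, a job whose root workflow id is 123 would get the temporary database only while the override resource also names 123; every other job keeps its default.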
But it appears that there is no variable that contains this, just WM_FLOW_PATH and WM_FLOW_JOB_ID, which contain info about the immediately enclosing flow. Since we'll be dealing with nested flows here, that won't work.
So, is there a way I'm not seeing for accessing the root flow id from anywhere "inside" the flow?
2 Replies
You can actually get it directly from any job: it's root_job, which is part of any completed or running job. I agree it might make sense to have it as one of the env variables. I'll follow up here if we add it.
The good news is that since it's stored directly in the job, it's an O(1) operation. We need that because we use the root job for many operations; we also encode the mapping from node_id to job_ids there as an optimization.
But right now you can retrieve it through the API from within the job: have the script query the get_job API for its own job id.
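A rough sketch of that API call from inside a Python script is below. Only the root_job field name comes from this thread. The endpoint path, the environment variables (BASE_INTERNAL_URL, WM_WORKSPACE, WM_JOB_ID, WM_TOKEN), and the fallback to the job's own id when root_job is empty are assumptions to verify against your Windmill instance.

```python
import json
import os
import urllib.request


def fetch_job(base_url, workspace, job_id, token):
    # Assumed endpoint path for the get_job API; check your instance's API docs.
    url = f"{base_url}/api/w/{workspace}/jobs_u/get/{job_id}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def root_job_id(job):
    # Assumption: root_job may be empty when the job is itself the root,
    # in which case the job's own id is used instead.
    return job.get("root_job") or job["id"]


def current_root_job_id():
    # Assumed environment variables; they are not mentioned in this thread.
    job = fetch_job(
        os.environ["BASE_INTERNAL_URL"],
        os.environ["WM_WORKSPACE"],
        os.environ["WM_JOB_ID"],
        os.environ["WM_TOKEN"],
    )
    return root_job_id(job)
```

The shared connectivity code could call current_root_job_id() once per job and compare the result with the workflow id stored in the override resource.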
Cool, thanks! That will do the trick for me. It seems to make a lot of sense as an env variable but I'll be fine with the api call too.