invakid404, 2mo ago

Sporadic "Flow result by id in leaf jobs not found at name ..." errors

Sometimes, specifically when running multiple flows at once, some of them fail with errors like this one:
InternalErr: Error during isolated evaluation of expression `results.a`:
Not found: Flow result by id in leaf jobs not found at name 0191c7b0-0567-020f-02b5-872fbd9297a1, a
I don't think it's an issue in the flow itself, as the other runs succeed, but I can't seem to pinpoint the reason.
The error appears to happen most often in g, which has an argument that is set to results.a: https://img.qilin-qilin.ts.net/2024-09-09_09-07-06_rkh7D.webp
If I look at the status of node g in a failed run, it says it has "No arguments": https://img.qilin-qilin.ts.net/2024-09-09_09-08-28_F7OJF.webp
Any pointers?
31 Replies
rubenf, 2mo ago
which job is 0191c7b0-0567-020f-02b5-872fbd9297a1?
invakid404, 2mo ago
0191c7b0-0567-020f-02b5-872fbd9297a1 appears to be d, i.e. the second branch in the screenshot, and the error itself happens in g
invakid404, 2mo ago
something else that I just spotted that may or may not be relevant: https://img.qilin-qilin.ts.net/2024-09-09_09-21-29_Ot8bu.webp
invakid404, 2mo ago
could it be caused by a sudden crash or something? maybe my windmill instance running out of memory or something?
rubenf, 2mo ago
would you be able to share a minimal flow that has the same issue, along with reproduction steps (e.g. run that flow x times)?
The time it spends waiting for an executor is not relevant.
The issue is that it's looking in the branch for a when it shouldn't (it should look in the root parent job), but that behavior shouldn't be random at all.
invakid404, 2mo ago
I will try to make a reproducible example later today, as I unfortunately have higher priority tasks at work :D
@rubenf I think I have a reproduction.
I have no clue how much of it is actually relevant to triggering the error, but it's a starting point.
Let me figure out a better way to share these, as they look awful when I send them as messages.
invakid404, 2mo ago
There: flow_one has the same "shape" as my original flow (a branch going into a branch), and flow_two has a single inline script that runs the first flow five times asynchronously, which is what I also do in my original flow.
On my end, all five runs failed.
invakid404, 2mo ago
(I messed up and sent the same flow twice, here's the first one)
invakid404, 2mo ago
@rubenf let me know if you need any further information ^
I'm running EE v1.390.1
rubenf, 2mo ago
On it, thanks.
Do you mind sharing your license id in DM?
invakid404, 2mo ago
for sure
I can't DM you unless I have you as a friend though
just sent it
rubenf, 2mo ago
@invakid404 I imported the flow, what should I do to reproduce the issue? (Test flow works every time for me)
invakid404, 2mo ago
all I need to reproduce is to run flow two
invakid404, 2mo ago
i cropped it kinda awkwardly
invakid404, 2mo ago
but you get the idea
on my end, flow one is u/tsvetomir/wmill_error_reproduction and flow two is u/tsvetomir/wmill_error_reproduction_runner
the number of runs seems to be completely irrelevant on my end as well
rubenf, 2mo ago
Ok, I know what the issue is. I didn't understand that you were launching them in that way.
invakid404, 2mo ago
it fails even if I make it one run
yeah, sorry, should've explained it better
rubenf, 2mo ago
runFlowAsync in this context will run those as if the flow in which you triggered it was the ultimate root job, and you have each flow rewriting the leaf jobs state at the root.
Anyway, there is a way to do what you want: set the env variable "WM_ROOT_FLOW_JOB_ID" to undefined before you run runFlowAsync. It will have them be started as independent flows, which is what you want.
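A minimal TypeScript sketch of that workaround, under some assumptions: the environment is exposed as a plain mutable object (for example process.env), and the launcher reads WM_ROOT_FLOW_JOB_ID while it is being awaited. The helper withoutRootFlowJobId and the Env type are hypothetical names for illustration, not part of windmill-client. The sketch deletes the key instead of assigning undefined, because Node's process.env coerces assigned values to strings, and it restores the previous value afterwards so that later steps in the same script still see it.

```typescript
// Hypothetical helper, not a windmill-client API.
type Env = Record<string, string | undefined>;

const ROOT_KEY = "WM_ROOT_FLOW_JOB_ID";

// Runs `launch` with WM_ROOT_FLOW_JOB_ID removed from `env`,
// then puts the previous value back, even if `launch` throws.
async function withoutRootFlowJobId<T>(
  env: Env,
  launch: () => Promise<T>,
): Promise<T> {
  const hadKey = ROOT_KEY in env;
  const previous = env[ROOT_KEY];
  delete env[ROOT_KEY]; // the launched flows should no longer see a root job
  try {
    return await launch();
  } finally {
    // Restore only if the variable existed before the call.
    if (hadKey) env[ROOT_KEY] = previous;
  }
}
```

In a Windmill script this might be called as `await withoutRootFlowJobId(process.env, () => wmill.runFlowAsync("u/tsvetomir/wmill_error_reproduction", {}))`, assuming runFlowAsync accepts a flow path and an args object. To launch several flows in parallel, wrapping a single Promise.all of the launches in one call keeps the variable unset for the whole batch; overlapping calls to the helper would race on the restore step.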
invakid404, 2mo ago
oh, I see
do I have to set WM_ROOT_FLOW_JOB_ID back to its original value afterwards? in my real flow, I do other stuff afterwards
rubenf, 2mo ago
Depending on what you do, yes, so better safe than sorry.
invakid404, 2mo ago
👍
@rubenf just following up on this: I had a chance to try your suggested solution and I can confirm it does indeed work.
Would you say it's worth documenting this, or adding some option to runFlowAsync that does this?
rubenf, 2mo ago
I'm pondering it. It's a pretty niche use case: you need to use runFlowAsync AND multiple times AND in parallel.
We want to refactor how WM_ROOT_FLOW_JOB_ID works before too long, so we might revisit it then.
invakid404, 2mo ago
To elaborate on my use case: I have a flow that extracts data from a document, then I want to process each entry from that document further and in parallel.
Then I have some wmill API stuff in my app that shows the "subflows" that are still in progress and stuff like that.
I kind of just assumed that runFlowAsync would run them as separate jobs; I wasn't aware of WM_ROOT_FLOW_JOB_ID's effect on it.
i.e. my intent was always to run them as separate jobs that are completely detached from the main flow.
rubenf, 2mo ago
Noted. I think we have to revisit the options on runFlowAsync and the benefits of still having them attached automatically.
It has benefits for Workflow-as-code, for instance.
We will revisit this holistically.
invakid404, 2mo ago
thanks for the help, I'll be on the lookout for updates
rubenf, 2mo ago
Btw this is great: https://invak.id/long-running-tasks, do you mind if we highlight this in show-and-tell?
invakid404, 2mo ago
I was considering sending it myself, but I actually forgot, so yeah, for sure :D
rubenf, 2mo ago
Then please do