ghaar00•2mo ago

Can't cancel any jobs

I've deployed Windmill (self-hosted on DigitalOcean) and the deployment went great, everything for the most part is running great. Except that every time I launch a flow that loops over some data or may contain a bug of some sort, I have no way to stop it - even after restarting the VM. Every time I've tried to hit Cancel or Force Cancel I get "could not cancel job" and I also have several "zombie jobs" stacking up now.
Here's an example Flow where I am simply trying to download a list of events from an external API, then loop through each event and query the event's Listings/Sales via API.
summary: Seatgeek
description: ""
value:
  modules:
    - id: a
      value:
        path: f/trident/fetch_events_to_sync
        type: script
        is_trigger: false
        input_transforms: {}
      summary: Fetch Eligible Events
      continue_on_error: false
    - id: b
      value:
        type: forloopflow
        modules:
          - id: c
            value:
              path: f/trident/seatgeek_event_handler
              type: script
              is_trigger: false
              input_transforms:
                event_id:
                  expr: flow_input.iter.value[0]
                  type: javascript
                operation:
                  expr: flow_input.iter.value[1]
                  type: javascript
            summary: Sync Listings / Sales
            continue_on_error: false
        iterator:
          expr: results.a.slice(0,6)
          type: javascript
        parallel: false
        parallelism: null
        skip_failures: true
      summary: For Each Event
  same_worker: false
schema:
  $schema: https://json-schema.org/draft/2020-12/schema
  properties: {}
  required: []
  type: object
  order: []

Happy to provide any additional details here. I'd love to buy licenses for Windmill, this seems to be the only real issue I'm having with it but it's a big one!
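For readers following along, here is a rough Python sketch of what the for-loop in that flow does with its JavaScript expressions. It assumes step `a` returns a list of `[event_id, operation]` pairs, which is only inferred from `flow_input.iter.value[0]` and `flow_input.iter.value[1]`; the sample data and the `iterate` helper are made up for illustration and are not Windmill code.

```python
# Hypothetical output of step `a` (f/trident/fetch_events_to_sync).
# The [event_id, operation] shape is an assumption inferred from the flow.
results = {
    "a": [
        [101, "listings"], [102, "sales"], [103, "listings"], [104, "sales"],
        [105, "listings"], [106, "sales"], [107, "listings"], [108, "sales"],
    ]
}


def iterate(results_a, limit=6):
    """Mirror the iterator `results.a.slice(0,6)` and the input transforms."""
    for value in results_a[:limit]:  # slice(0,6): at most the first 6 items
        yield {
            "event_id": value[0],    # flow_input.iter.value[0]
            "operation": value[1],   # flow_input.iter.value[1]
        }


# Inputs that step `c` would receive, one dict per iteration.
calls = list(iterate(results["a"]))
for call in calls:
    print(call)
```

Under these assumptions the loop is bounded at six iterations, run one after another since `parallel` is false.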
15 Replies
ghaar00OP•2mo ago
I think the main thing is just figuring out why I can't cancel any jobs... it's making it almost impossible to iterate on bugs since every time I make a mistake it results in a never-ending loop
rubenf•2mo ago
Normally cancelling the root flow should be enough. What's the error you see when you cancel the root flow?
ghaar00OP•2mo ago
hey @rubenf thx for getting back to me! i might not be sharing all the details needed here, but hopefully this at least shows what i'm looking at here.
https://www.loom.com/share/7fd39efc95134ff6b3394215a3aacfa0?sid=01e2a27d-43d9-484a-8998-a6152d81889b
ghaar00OP•2mo ago
i would be happy to create a login to hop on here and take a look if that would help?
rubenf•2mo ago
We will take a look when we can @Hugo
ghaar00OP•2mo ago
thank you!
Hugo•2mo ago
hey @ghaar00, in the logs of one of the iteration steps there is a log "Restarted job after not receiving job's ping for too long". i think it's not just a bug in the flow; your workers probably crashed, maybe it's OOM. i'll try to reproduce the cancel issue and fix it. also, can you try to see if you're able to cancel the iterations themselves?
ghaar00OP•2mo ago
I can't seem to cancel from anywhere, but it certainly could be poorly configured workers! My VM has 16vCPU / 32GB memory and I have 3 workers that I think are configured w 1vCPU/2GB each.
rubenf•2mo ago
It's not related to workers. Try to cancel the root job.
ghaar00OP•2mo ago
I think that's what I've been doing - here's the Flow where I've tried canceling. I've basically hit every single Cancel and Force Cancel button possible, but they all result in a "could not cancel job" toast.
I might not be in the right area though - here's another short video with some attempts. https://www.loom.com/share/c5b762be0cb643ddb4e7c074984ad52f?sid=8e5b84ac-3bc5-4776-bc99-d54d24e11524
ghaar00OP•2mo ago
Feel free to send me an email address and I can invite you as a user to take a quick look if that's easier! I appreciate any help you can offer here, everything has been very intuitive so far aside from this problem
rubenf•2mo ago
ruben@windmill.dev. Also @ghaar00, you should look into the response details of the cancel request in the network tab, but you can just invite me, that might be easier.
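One way to capture those response details outside the browser is to send the cancel request by hand and print whatever comes back. The following is a minimal sketch, not an official client: it assumes the instance exposes the cancel endpoint at `/api/w/{workspace}/jobs_u/queue/cancel/{job_id}`, that it accepts a bearer token, and that the body is a JSON object with a `reason` field. The base URL, workspace, job id, and token are placeholders, and the helper names are hypothetical; check the API reference for your Windmill version before relying on any of this.

```python
import json
import urllib.error
import urllib.request


def build_cancel_request(base_url, workspace, job_id, token,
                         reason="manual cancel for debugging"):
    """Build (but do not send) a POST request to the assumed cancel endpoint."""
    # Assumed path; verify it against your instance's API documentation.
    url = f"{base_url.rstrip('/')}/api/w/{workspace}/jobs_u/queue/cancel/{job_id}"
    body = json.dumps({"reason": reason}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_and_report(req, timeout=10):
    """Send the request and return (status, body), including on HTTP errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The error body is the useful part when cancelling fails.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `send_and_report(build_cancel_request(...))` with real values returns the status code and the raw response text, which is the same information the network tab would show for a failed cancel.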
ghaar00OP•2mo ago
OK invite sent as superadmin! I believe SMTP should be working to fire off the invite, let me know if you don't get one
rubenf•2mo ago
The job terminated so I can't reproduce unfortunately. I'm traveling the next few days, but feel free to share access with hugo@windmill.dev if you can reproduce just once, and we will log in and investigate. Actually I gave him my access, just ping us when you can reproduce. Given you are in trial, we will create a dedicated channel for you (or do you prefer Slack?). @henri-c
henri-c•2mo ago
Yep let me know if you prefer Discord or Slack, and dm your email address if Slack 🙂
