ZOMBIE APOCALYPSE

My jobs are not blocking and I have plenty of CPU/memory headroom, yet my jobs seem to get killed with:

Job timed out after no ping from job since 2025-03-03 23:28:53.888358 UTC (ZOMBIE_JOB_TIMEOUT: 60, reason: "RestartLimit (3)
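The error above can be read as two pieces: the time of the last ping recorded for the job, and a ZOMBIE_JOB_TIMEOUT of 60 (assumed here to be seconds). Below is a minimal TypeScript sketch (matching the Deno setup) that pulls both out of the message and computes the earliest moment the job could have been flagged. The regex is fitted to this one message only. `parseZombieError` and `earliestZombieTime` are hypothetical helpers, and the "last ping + timeout" rule is an assumption, not Windmill's documented logic.

```typescript
// Hypothetical types/helpers; the message format is inferred from the
// single error message quoted in this thread.
interface ZombieError {
  lastPing: Date; // "no ping from job since ..." (UTC)
  timeoutSec: number; // ZOMBIE_JOB_TIMEOUT, assumed to be in seconds
}

const ZOMBIE_RE =
  /no ping from job since (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})(?:\.(\d+))? UTC \(ZOMBIE_JOB_TIMEOUT: (\d+)/;

function parseZombieError(msg: string): ZombieError | null {
  const m = msg.match(ZOMBIE_RE);
  if (!m) return null;
  // JS Dates only keep milliseconds, so truncate any extra fractional digits.
  const millis = (m[3] ?? "").slice(0, 3).padEnd(3, "0");
  return {
    lastPing: new Date(`${m[1]}T${m[2]}.${millis}Z`),
    timeoutSec: Number(m[4]),
  };
}

// Assumed rule: a job becomes a zombie once no ping has arrived for timeoutSec.
function earliestZombieTime(e: ZombieError): Date {
  return new Date(e.lastPing.getTime() + e.timeoutSec * 1000);
}

const err = parseZombieError(
  'Job timed out after no ping from job since 2025-03-03 23:28:53.888358 UTC (ZOMBIE_JOB_TIMEOUT: 60, reason: "RestartLimit (3)',
);
if (err) {
  console.log(err.lastPing.toISOString()); // 2025-03-03T23:28:53.888Z
  console.log(earliestZombieTime(err).toISOString()); // 2025-03-03T23:29:53.888Z
}
```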

I'm running Deno scripts, and each worker has 1 CPU, a 512 memory request, and a 768 memory limit.

rubenf•6mo ago

hard to tell; the workers log when they ping, so were those jobs not pinged?

pixeleetOP•6mo ago

let me check the worker logs one sec

pixeleetOP•6mo ago

It's really strange because the logs look like the pinging is happening, but the jobs still get zombied out. Look:

rubenf•6mo ago

that's not a ping from the worker, those are your server logs

pixeleetOP•6mo ago

The last "still running" ping for the job:

INFO 2025-03-04T17:23:17.613084808Z [resource.labels.containerName: windmill-worker] job 01956200-82a9-0326-f919-71ee1d38ad87 on wk-chromium-lzj4p-qglY8 in voja still running. mem: 1995592kB, peak mem: 1995592kB

Is that the log I'm looking for?
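Because the worker's "still running" line has a fixed shape, one way to line these pings up against a zombie error is to parse them. Below is a minimal TypeScript sketch that assumes every line looks like the single example quoted in this thread. The word after "in" (here `voja`) is assumed to be the workspace, and the "kB" figures are passed through unconverted. `parsePing` and `StillRunningPing` are hypothetical names.

```typescript
// Hypothetical shape of one "still running" ping, inferred from one log line.
interface StillRunningPing {
  at: Date; // log timestamp, truncated to milliseconds
  jobId: string;
  worker: string;
  workspace: string; // assumed meaning of the word after "in"
  memKB: number;
  peakMemKB: number;
}

const PING_RE =
  /(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z.*?job (\S+) on (\S+) in (\S+) still running\. mem: (\d+)kB, peak mem: (\d+)kB/;

function parsePing(line: string): StillRunningPing | null {
  const m = line.match(PING_RE);
  if (!m) return null;
  // The log has nanosecond precision; a JS Date only keeps milliseconds.
  const millis = (m[2] ?? "").slice(0, 3).padEnd(3, "0");
  return {
    at: new Date(`${m[1]}.${millis}Z`),
    jobId: m[3],
    worker: m[4],
    workspace: m[5],
    memKB: Number(m[6]),
    peakMemKB: Number(m[7]),
  };
}

const p = parsePing(
  "INFO 2025-03-04T17:23:17.613084808Z [resource.labels.containerName: windmill-worker] job 01956200-82a9-0326-f919-71ee1d38ad87 on wk-chromium-lzj4p-qglY8 in voja still running. mem: 1995592kB, peak mem: 1995592kB",
);
console.log(p?.at.toISOString(), p?.worker); // 2025-03-04T17:23:17.613Z wk-chromium-lzj4p-qglY8
```

Collecting the `at` values for one `jobId` gives the ping history that can be compared with the "no ping since" time in the error.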

rubenf•6mo ago

yes

pixeleetOP•6mo ago

well, that was still 4 seconds after the last ping the error states. But the worker is simply not pinging, so I'll re-check the resource constraints.
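To check whether a worker really went quiet, the ping timestamps for one job (for instance parsed from the worker logs) can be scanned for silences longer than ZOMBIE_JOB_TIMEOUT. Below is a minimal TypeScript sketch, again assuming the timeout is in seconds. `gapsOverTimeout` is a hypothetical helper, and the timestamps in the usage lines are made up for illustration. Passing the time the error was raised as `until` also covers the final silence before the job was flagged.

```typescript
interface PingGap {
  from: Date;
  to: Date;
  seconds: number;
}

// Hypothetical helper: returns every silence between consecutive pings that
// exceeds timeoutSec. If `until` is given (e.g. when the zombie error was
// raised), the gap from the last ping to that moment is checked too.
function gapsOverTimeout(pings: Date[], timeoutSec: number, until?: Date): PingGap[] {
  // Copy before sorting so the caller's array is left untouched.
  const points = until ? [...pings, until] : [...pings];
  points.sort((a, b) => a.getTime() - b.getTime());
  const gaps: PingGap[] = [];
  for (let i = 1; i < points.length; i++) {
    const seconds = (points[i].getTime() - points[i - 1].getTime()) / 1000;
    if (seconds > timeoutSec) {
      gaps.push({ from: points[i - 1], to: points[i], seconds });
    }
  }
  return gaps;
}

// Made-up timestamps: pings every 5 s, then silence until the error at 17:24:20.
const pings = [0, 5, 10, 15].map((s) => new Date(Date.UTC(2025, 2, 4, 17, 23, s)));
const raised = new Date(Date.UTC(2025, 2, 4, 17, 24, 20));
console.log(gapsOverTimeout(pings, 60, raised).map((g) => g.seconds)); // one gap of 65 seconds
```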

rubenf•6mo ago

does the worker die afterwards?

pixeleetOP•6mo ago

the job is restarted on a new worker, so I assume the worker dies

rubenf•6mo ago

why do you have to assume? Do you not have access to your workers' exit times?

pixeleetOP•6mo ago

I don't see when the pod is restarted, or at least I'm not sure where to find that out 😄
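One place to look is the pod status. Kubernetes records, per container, `restartCount` and `lastState.terminated` (exit code, a reason such as `OOMKilled` or `Error`, and `finishedAt`), and both `kubectl get pod <pod> -o json` and `kubectl describe pod <pod>` show them. Below is a minimal TypeScript sketch that pulls those fields out of the JSON. The interfaces model only that subset of the Pod object, and `lastExits` and the sample pod are hypothetical. If the pod was replaced rather than restarted in place, this status goes away with it.

```typescript
// Minimal subset of the Kubernetes core/v1 Pod status used here.
interface TerminatedState {
  exitCode: number;
  reason?: string; // e.g. "OOMKilled", "Error"
  finishedAt?: string;
}
interface ContainerStatus {
  name: string;
  restartCount: number;
  lastState?: { terminated?: TerminatedState };
}
interface Pod {
  metadata: { name: string };
  status?: { containerStatuses?: ContainerStatus[] };
}

interface LastExit {
  pod: string;
  container: string;
  restartCount: number;
  exitCode: number;
  reason: string;
  finishedAt: string;
}

// Hypothetical helper: lists containers that restarted at least once and how
// their previous instance ended.
function lastExits(pod: Pod): LastExit[] {
  const out: LastExit[] = [];
  for (const cs of pod.status?.containerStatuses ?? []) {
    const t = cs.lastState?.terminated;
    if (cs.restartCount === 0 || !t) continue;
    out.push({
      pod: pod.metadata.name,
      container: cs.name,
      restartCount: cs.restartCount,
      exitCode: t.exitCode,
      reason: t.reason ?? "unknown",
      finishedAt: t.finishedAt ?? "unknown",
    });
  }
  return out;
}

// Made-up sample, shaped like `kubectl get pod <pod> -o json` output.
const samplePod: Pod = {
  metadata: { name: "example-worker-pod" },
  status: {
    containerStatuses: [{
      name: "windmill-worker",
      restartCount: 2,
      lastState: { terminated: { exitCode: 1, reason: "Error", finishedAt: "2025-03-04T17:24:00Z" } },
    }],
  },
};
for (const e of lastExits(samplePod)) {
  console.log(`${e.container} restarted ${e.restartCount}x, last exit ${e.exitCode} (${e.reason}) at ${e.finishedAt}`);
}
```

With Deno, the kubectl output could be piped in and handed to `JSON.parse` before calling `lastExits`.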

pixeleetOP•6mo ago

I'm not hitting limits, just slightly above the requested amount here and there. Anyway, I'll keep looking; I'm sure it's some resource or skill issue 😄
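To compare a worker log's memory figures with the pod's request and limit, here is a small TypeScript sketch. Both unit handling choices are assumptions that the thread does not confirm: the log's "kB" is treated as KiB (1024 bytes), and the request/limit values (512 / 768 in the original post) are treated as MiB. `memStatus` is a hypothetical helper.

```typescript
type MemStatus = "below request" | "above request" | "above limit";

// Hypothetical helper. Assumptions: usedKiB comes from the worker log's "kB"
// figure (treated as KiB), and requestMiB/limitMiB are the pod settings in MiB.
function memStatus(usedKiB: number, requestMiB: number, limitMiB: number): MemStatus {
  const usedMiB = usedKiB / 1024;
  if (usedMiB > limitMiB) return "above limit";
  if (usedMiB > requestMiB) return "above request";
  return "below request";
}

console.log(memStatus(600 * 1024, 512, 768)); // above request
```

Pass the `peak mem` value from the log line together with the request and limit of the worker group that actually ran the job.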