Scheduled script stops scheduling after wait for executor
Hi!
We are experiencing some issues on one of our Windmill instances. It runs a few scheduled jobs, and for a still unknown reason the scheduled jobs sometimes wait a very long time for an executor. Sometimes the run time is also displayed as a negative number (which I'm guessing may be some sort of overflow due to the extremely long wait times).
The jobs do seem to eventually run just fine, even if the run time is broken like that.
The problem is that after such a "broken" job runs, the schedule stops working and no further jobs are scheduled. We are trying to figure out what causes the large executor wait time, as the instance is not under heavy load at all and the jobs don't take very long to execute.
I guess the problem consists of two parts:
- The long executor wait time, for whatever reason (despite the actual worker load being low)
- The part I want to ask about here: why does the schedule seem to stop working after a job waits a while for an executor? Is this a known issue, or possibly even expected behavior?
Sadly, I have not managed to reproduce this issue on another Windmill instance..


what version is that instance on?
and what are the workers logs showing, anything suspicious?
I can't tell you the exact version for sure right now, but it was last updated March 20th around 15:00 UTC, so I'm quite sure it should be windmill-full:1.478.1
This is to be investigated, I'll check on that
Hey @rubenf, I have checked the version and it is indeed 1.478.1
I started another schedule on the same instance today, hourly this time, with the same issue occurring on the 4th run
I investigated the logs a bit, and it appears the script gets executed far too early? On an hourly schedule it got executed 4 minutes after the previous run... But I see no errors in the logs around the time this happened
Let's call the jobs:
14:03 -> "Job 4"
12:00 -> "Job 3"
11:00 -> "Job 2"
10:00 -> "Job 1"
Job 1
from frontend requests: actually fetched and ran on a worker at 2025-03-27T09:00:00.008204Z
Job 2
from frontend requests: actually fetched and ran on a worker at 2025-03-27T10:00:00.007376Z
Job 3
from frontend requests: actually fetched and ran on a worker at 2025-03-27T11:00:00.004517Z
Job 4
from frontend requests: actually fetched and ran on a worker at 2025-03-27T11:03:41.741176Z
unusual parts of the worker logs:

So it seems the job is actually executed far too early instead of too late?
do you have the scheduled_for of those jobs?
it's the most important part
the column scheduled_for in v2_job_queue
The v2_job_queue table is empty?
it's in v2_job
actually it's not ...
Yeah it's not..
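For reference, the check being attempted here is roughly the following sketch, using the v2_job_queue table and scheduled_for column named above (the id column and the ordering are my assumptions; in this case the queue had already drained, so it returned no rows):

```sql
-- Hypothetical lookup using the table/column names from this thread;
-- here the queue was already empty, so no rows came back.
SELECT id, scheduled_for
FROM v2_job_queue
ORDER BY scheduled_for DESC;
```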
however, you can find it in the logs of the previous tick
grep the previous:
actually it won't be there
Yeah it isn't there either..
at what time was the previous:
2025-03-27T11:04:04.617075Z INFO [...] Schedule ... scheduling next job for ... in [...]
the only explanation I have is that the clock of your db changed
at 11am, it started to believe it was 1pm
I searched for 27 11:00:00 and 27T11:00:00 in the whole database and the logs and could not seem to find anything relevant...
Can you search for:
scheduling next job for
also, are you on a single vps or do you have distributed workers?
2025-03-27T11:00:24.435531Z
This is a single VPS; the server, workers and the database all run there
so yeah, given the above, I think your postgresql somehow changed time
can you run select now() in your db?
I cannot run anything in the database right now sadly.. The next option for that would be Monday
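For when database access is back on Monday, a slightly fuller check than a bare select now() might look like this (all standard PostgreSQL functions, nothing Windmill-specific):

```sql
-- Compare the transaction-start clock, the wall clock, and the session timezone:
SELECT now()                       AS txn_start_time,  -- what queries see; frozen per transaction
       clock_timestamp()           AS wall_clock_now,  -- the actual current time
       current_setting('TimeZone') AS session_tz;      -- the session's timezone setting
```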
but here is why I believe that is the case
the started_at shows 13:XX
the started_at corresponds to now() at the moment the job was pulled (in Windmill, every time is relative to the database)
I have just now checked the postgres logs; at least around the time it happened, time seems to be progressing normally there, as one would expect
but that's not the postgres time
that's the localtime at which the logs were emitted
if at that point in time the database believed it was 13:XX, then it would indeed pull the job, then schedule the next one (at which point it seems to have believed it was 11:XX again), and then realize the job already existed (itself)
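As a sketch of why a db clock jump would produce exactly that sequence: a scheduler pull keyed on the database clock has roughly this shape (a simplified, hypothetical query, not Windmill's actual SQL; the running and started_at columns are assumptions):

```sql
-- Simplified sketch: if now() briefly reports 13:XX, a job with
-- scheduled_for = 12:00 becomes eligible two hours early, and its
-- started_at records the (wrong) database time.
UPDATE v2_job_queue
   SET running    = true,
       started_at = now()                -- 13:XX if the db clock jumped
 WHERE id = (
         SELECT id
           FROM v2_job_queue
          WHERE running = false
            AND scheduled_for <= now()   -- comparison against the jumped clock
          ORDER BY scheduled_for
          LIMIT 1
            FOR UPDATE SKIP LOCKED
       )
RETURNING id, started_at;
```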
I see! This is very helpful insight already, I'll look into any weird DB behaviour then
I have no other explanation for started_at suddenly being shifted so far from your worker time, compared to the other jobs
I can also tell you that scheduled_for was 12:00 because of the way waiting_time is computed
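Working that backwards (illustrative arithmetic only, assuming waiting_time = started_at - scheduled_for and the 13:XX started_at mentioned above; the exact seconds are my inference from Job 4's fetch time):

```sql
-- Illustrative: recover scheduled_for from the displayed waiting time,
-- assuming waiting_time = started_at - scheduled_for:
SELECT TIMESTAMPTZ '2025-03-27 13:03:41+00'
     - INTERVAL '1 hour 3 minutes 41 seconds' AS inferred_scheduled_for;
-- => 2025-03-27 12:00:00+00
```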
What's definitely interesting is that the shift seems to be very close to two hours, hinting it may be some timezone issue somewhere
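If it is a timezone issue, a cheap first check would be comparing the database's timezone configuration against the host's (again plain PostgreSQL):

```sql
-- Inspect the database-side timezone configuration:
SHOW timezone;
SELECT now()                    AS db_now,      -- in the session timezone
       now() AT TIME ZONE 'UTC' AS db_now_utc;  -- same instant, rendered in UTC
```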
Thank you so much for your help! I'll look into this and see if I can find the problem
adding scheduled_for to the logs for next time
Thank you!!!
I'll report back when the issue is figured out^^