Hi, I have a script that is suddenly failing quite often with the error
ExecutionErr: error during execution of the script: process terminated by signal: Some( 9, ), stopped_signal: None, core_dumped: false
In the past I've seen this with OOMs, but this script is only peaking at 116MB on a worker with 2GB of memory so I don't understand how it can be memory related.
13 Replies
Oh, turns out the worker only has 512MB of memory, but still, that's a big gap between 116 and 512?
@andness the script is only taking 116MB, but the OS, the filesystem buffers, and Windmill itself also take memory. Also, 116MB is only the last data point recorded; it wouldn't report the actual peak if the script crashed.
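For reference, signal 9 in the error is SIGKILL, which is the signal the Linux OOM killer sends. The sketch below is not Windmill code; it only shows how a parent process sees a child that was killed this way. The helper names `run_and_kill_self` and `describe_exit` are made up for illustration, and the example assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_and_kill_self() -> int:
    """Start a child Python process that sends SIGKILL to itself."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    proc = subprocess.run([sys.executable, "-c", child_code])
    # On POSIX, a negative return code -N means "terminated by signal N".
    return proc.returncode


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        signum = -returncode
        return f"terminated by signal {signum} ({signal.Signals(signum).name})"
    return f"exited with status {returncode}"
```

A SIGKILL cannot be caught by the process, so the script gets no chance to log anything; the last memory sample before the kill is all that is left.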
Yeah ok, I think that makes sense and I'm increasing the memory on the worker now. Looks like what happened was that I had 2 instances of a lower-memory worker configured and sometimes the job would run there.
When debugging this I ran into a small challenge: it was difficult to map the worker to the docker container. In the end I brute-force searched all the docker logs until I found the worker id that's printed in Windmill, and then used that to look at the container.
The worker name is derived from the hostname
Hm, this is running on ECS (EC2) so each worker is a docker container there. I tried looking at docker inspect but couldn't find the worker name anywhere
For example wk-default-1.compute.internal-2nJJA
so if you can have a mapping from hostname to containers that would make things easier. In kubernetes, the name of the worker and pod are derived from one another
Ok, so if I could somehow modify the hostname that the docker container reports in ECS then that would show up in Windmill?
that's the hostname of the container: .compute.internal-2nJJA
Yeah ok, not all that familiar with the docker commands, I'll google a bit to see if there's a way to find that hostname
Hmmm... the output from docker inspect contains this: "Hostname": "ip-172-31-37-120.eu-central-1.compute.internal", which does not match the names in the Windmill UI
it does, but we truncate the prefix
otherwise worker names would be super long
It works well for pods, less well for ecs
Yeah ok, so that 2nJJA is not something I'll find in ECS?
no, it's a unique per-worker suffix (you could have multiple workers on the same container)
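Until the full name is visible in the UI, the matching can be scripted instead of searching logs. This is a rough sketch under assumptions taken only from the examples in this thread: the worker name ends in "-" plus a random suffix, and what comes before it ends with the tail of the container hostname. The real truncation rule may differ. The function names are hypothetical; the only Docker facts relied on are that `docker ps -q` lists container ids and that `docker inspect` prints a JSON array with `Name` and `Config.Hostname` fields.

```python
import json
import subprocess


def hostnames_from_inspect(inspect_json: str) -> dict[str, str]:
    """Map container name -> hostname from `docker inspect` JSON output."""
    mapping = {}
    for container in json.loads(inspect_json):
        name = container["Name"].lstrip("/")
        mapping[name] = container["Config"]["Hostname"]
    return mapping


def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters that a and b share."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def best_matches(worker_name: str, hostnames: dict[str, str]) -> list[str]:
    """Return the container(s) whose hostname tail best matches the worker name."""
    # Assumption: the worker name ends with "-<random suffix>"; drop it.
    truncated = worker_name.rsplit("-", 1)[0]
    scores = {
        name: common_suffix_len(truncated, host)
        for name, host in hostnames.items()
    }
    best = max(scores.values(), default=0)
    return sorted(name for name, score in scores.items() if score == best and score > 0)


def inspect_running_containers() -> str:
    """Run `docker inspect` on all running containers and return the raw JSON."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return "[]"
    return subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
```

On the EC2 host this would be used as `best_matches("wk-default-1.compute.internal-2nJJA", hostnames_from_inspect(inspect_running_containers()))`. If the truncated part is short, several containers can tie, so more than one name may come back.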
ok, so my feature request then would be to be able to expand the full name in the Windmill UI, for example it could be a hidden column that you only toggle on if you need it