andness
andness6mo ago

Hi, I have a script that is suddenly failing quite often with the error:
ExecutionErr: error during execution of the script: process terminated by signal: Some( 9, ), stopped_signal: None, core_dumped: false
In the past I've seen this with OOMs, but this script is only peaking at 116MB on a worker with 2GB of memory so I don't understand how it can be memory related.
13 Replies
andness
andnessOP6mo ago
Oh, turns out the worker only has 512MB of memory, but still, that's a big gap between 116 and 512?
rubenf
rubenf6mo ago
@andness the script is only taking 116MB, but the OS, the filesystem buffer, and Windmill itself also take memory. Also, 116MB is only the last data point recorded: the actual peak wouldn't be reported if the script crashed.
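For reference, signal 9 in the error above is SIGKILL, which is what the Linux OOM killer sends to the process it picks. A minimal Python sketch of how a SIGKILLed child looks from the parent side follows. The helper names `describe_exit` and `run_and_kill` are made up for illustration, and the sketch assumes a POSIX system. A SIGKILL on its own does not prove an OOM, since anything with permission can send it.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a readable description of a subprocess return code."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status " + str(returncode)


def run_and_kill():
    """Start a sleeping child, SIGKILL it, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # sends SIGKILL on POSIX
    return child.wait()
```

Calling `describe_exit(run_and_kill())` shows the same kind of termination the worker reported, without any memory pressure involved.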
andness
andnessOP6mo ago
Yeah ok, I think that makes sense and I'm increasing the memory on the worker now. Looks like what happened was that I had 2 instances of a lower-memory worker configured and sometimes the job would run there. When debugging this I ran into a small challenge: it was difficult to map the worker to the Docker container. In the end I brute-force searched all the Docker logs until I found the worker ID that's printed in Windmill, then used that to look at the container.
rubenf
rubenf6mo ago
The worker name is derived from the hostname
andness
andnessOP6mo ago
Hm, this is running on ECS (EC2) so each worker is a Docker container there. I tried looking at docker inspect but couldn't find the worker name anywhere. For example: wk-default-1.compute.internal-2nJJA
rubenf
rubenf6mo ago
So if you can have a mapping from hostname to containers, that would make things easier. In Kubernetes, the name of the worker and pod are derived from one another.
andness
andnessOP6mo ago
Ok, so if I could somehow modify the hostname that the docker container reports in ECS then that would show up in Windmill?
rubenf
rubenf6mo ago
that's the hostname of the container: .compute.internal-2nJJA
andness
andnessOP6mo ago
Yeah ok, I'm not all that familiar with the docker commands, I'll google a bit to see if there's a way to find that hostname. Hmmm... the output from docker inspect contains this: "Hostname": "ip-172-31-37-120.eu-central-1.compute.internal", which does not match the names in the Windmill UI.
rubenf
rubenf6mo ago
It does, but we truncate the prefix, otherwise worker names would be super long. It works well for pods, less well for ECS.
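A rough Python sketch of narrowing down containers from a worker name, instead of searching the logs by hand. It assumes the name has the shape wk-<group>-<hostname tail>-<suffix>, which is only inferred from the single example in this thread and is not a documented format. The function names are hypothetical. The only Docker pieces used are `docker ps -q` and the `Config.Hostname` field of `docker inspect`.

```python
import json
import subprocess


def hostname_tail(worker_name, group="default"):
    """Extract the truncated hostname from a worker name.

    Assumes wk-<group>-<hostname tail>-<suffix>, inferred from one example.
    """
    prefix = "wk-" + group + "-"
    if not worker_name.startswith(prefix):
        raise ValueError("unexpected worker name: " + worker_name)
    # Drop the trailing "-<suffix>" that identifies the individual worker.
    tail, sep, _suffix = worker_name[len(prefix):].rpartition("-")
    if not sep or not tail:
        raise ValueError("unexpected worker name: " + worker_name)
    return tail


def candidate_containers(worker_name, containers, group="default"):
    """Return IDs of containers whose hostname ends with the truncated tail.

    `containers` is the parsed JSON list produced by `docker inspect`.
    """
    tail = hostname_tail(worker_name, group)
    return [
        c["Id"] for c in containers
        if c["Config"]["Hostname"].endswith(tail)
    ]


def inspect_running_containers():
    """Run `docker inspect` on every running container and parse the JSON."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)
```

On a host with Docker available, `candidate_containers("wk-default-1.compute.internal-2nJJA", inspect_running_containers())` would list the matching container IDs. Because the name only keeps the end of the hostname, this may return several candidates when many hosts share the same tail, so it narrows the search rather than giving an exact mapping.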
andness
andnessOP6mo ago
Yeah ok, so that 2nJJA is not something I'll find in ECS?
rubenf
rubenf6mo ago
No, it's a unique worker name (you could have multiple on the same container).
andness
andnessOP6mo ago
Ok, so my feature request then would be to be able to expand the full name in the Windmill UI. For example, it could be a hidden column that you only toggle on if you need it.