Best Practices for Windmill on AWS ECS Fargate with Autoscaling Workers
Setup:
- AWS ECS Fargate, 3-15 autoscaling workers
- EFS at /tmp/windmill/logs, ephemeral /tmp/cacheworker for pip cache
- Version: ghcr.io/windmill-labs/windmill:main
Issue:
Intermittent ModuleNotFoundError: No module named 'shopify.resources.option' when
workers scale.
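One way to narrow this down is to check, from a script running on a freshly scaled worker, which part of the import path fails to resolve. This is a minimal sketch that assumes it runs in the same Python environment as the failing job. `module_available` is a hypothetical helper name, not part of Windmill or the Shopify library.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by this interpreter's import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing or broken
        # parent (e.g. `shopify` itself) surfaces here as an ImportError.
        return False


if __name__ == "__main__":
    # Walk the dotted path so the first MISSING entry shows where resolution breaks.
    for name in ("shopify", "shopify.resources", "shopify.resources.option"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If `shopify` is found but `shopify.resources.option` is missing, that would point to an incomplete install on that worker rather than a missing dependency declaration.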
Current Config:
worker_groups = [{
  cpu          = 2048
  memory       = 4096
  min_capacity = 3
  max_capacity = 15
  environment  = { CACHE_DIR = "/tmp/cacheworker" } # Ephemeral
}]
Questions:
1. Recommended cache strategy for Fargate?
- Ephemeral /tmp vs EFS for pip cache?
- Does Windmill have locking for concurrent pip installs?
2. Can we pre-bake packages into custom worker images?
- Install all packages from .script.lock at image build time?
- Will this conflict with Windmill's package management?
3. ECS/Fargate reference architecture?
- Any documented patterns for dynamic scaling?
- Warm pool or pre-installation strategies?
4. What do production users do?
- Custom images vs shared cache vs per-worker install?
Considering: building a custom image with pre-installed packages for 5-10s startup vs the current 60-90s.
2 Replies
Hi, a shared cache on S3 is optimal and is available on EE. EFS is going to slow down all your performance and is not recommended. Proper autoscaling is also an EE feature.
So without EE:
1. Use ephemeral /tmp.
2. It's not possible.
3. Autoscaling by Windmill is EE. I'm not sure how you would autoscale on ECS, given that ECS autoscaling will only look at CPU metrics.
4. Production users use the shared S3 cache, but without it you can just rely on ephemeral folders per worker.
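To illustrate the contrast with CPU-only scaling in point 3: sizing a worker pool from queue depth would look roughly like the sketch below. This is not Windmill's EE autoscaler. `desired_workers` and `jobs_per_worker` are hypothetical, and it assumes you have some way to obtain the number of queued jobs. The 3 and 15 bounds come from the config in the question.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int = 5,
                    min_capacity: int = 3, max_capacity: int = 15) -> int:
    """Return a worker count for the given backlog, clamped to the pool bounds."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Round up so a partial batch of jobs still gets a worker.
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Never go below the floor or above the ceiling of the worker group.
    return max(min_capacity, min(max_capacity, needed))
```

An idle queue stays at the 3-worker floor and a large backlog is capped at 15, whatever the CPU load on the existing workers.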
Hi Rubenf, I appreciate your answers. Right now I am just doing a POC with Windmill, and this was an issue I was running into.