Workers show up in service logs, but not in the Workers UI
I have connected one native worker (ECS service) and one GPU worker (on-prem server, connected to the VPN/VPC). I have added routing to my VPN config and can connect to the PG DB using pgAdmin and psql when SSH-ed into both of these machines. Docker logs for the worker containers show a normal connection to the PG instance and, most importantly, the workers show up in the service logs (screenshot attached).
The problem is that the Workers screen shows 0 workers for all three worker groups (screenshot attached).
This is also preventing me from creating a root admin user, since the Deno job that (I assume) is supposed to do this can't find an available worker. I can see that my job queue has some jobs, probably because I tried to create the root user 3 times without success.
I'm curious: are there any ports I need to open besides 5432? If not, how can I debug this further? Thanks!


Here's proof that the queue in the fresh server/DB has some jobs (admin account creation attempts), plus logs from the on-prem worker showing that the DB connection succeeded. Interestingly, there are warnings about the DB being undersized. Could this be relevant?




Not sure if this console output is relevant. The first one is an EE feature, but the one below it seems suspicious... My network tab only shows a single EE endpoint failing (/list); otherwise everything is normal.

what is that bad request response exactly?
Not sure, it seems like there is only a single bad request in the whole network tab, and that's the EE /list endpoint (expected), so it seems like the second console log comes from that request? :/

actually that's from the queue drawer, it's unrelated
I'd understand if the worker couldn't talk to the server/DB at all, but the fact that I see both the native and GPU workers in the service logs tells me that at least the DB connection is working properly... Is there some kind of port I need to open in my SGs, or is it only one-way communication?
no it should work
I was hoping you wouldn't say that 😄
I will try to reproduce quickly but we would probably know if it was a common issue
I can't reproduce
the relevant api call is:
/api/workers/list?per_page=1000&ping_since=300
That returns an empty list, status 200.
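A minimal Python sketch for calling the same endpoint outside the UI, using only the standard library. The path and query parameters are the ones quoted above; the `WM_BASE_URL`/`WM_TOKEN` environment variable names are hypothetical, and sending the token as a Bearer header is an assumption about how your instance authenticates API calls. Comparing `ping_since=300` with a much wider window is the same experiment discussed in the thread.

```python
import json
import os
import urllib.parse
import urllib.request


def workers_list_url(base_url: str, per_page: int = 1000, ping_since: int = 300) -> str:
    """Build the workers-list URL (path and parameters as quoted in the thread)."""
    query = urllib.parse.urlencode({"per_page": per_page, "ping_since": ping_since})
    return f"{base_url.rstrip('/')}/api/workers/list?{query}"


def fetch_workers(base_url: str, token: str, ping_since: int = 300) -> list:
    """GET the worker list; assumes a Bearer token is accepted for auth."""
    req = urllib.request.Request(
        workers_list_url(base_url, ping_since=ping_since),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Only hits the network when both (hypothetical) env vars are set.
if __name__ == "__main__" and "WM_BASE_URL" in os.environ and "WM_TOKEN" in os.environ:
    for since in (300, 86400):  # default 5-minute window vs. one day
        workers = fetch_workers(os.environ["WM_BASE_URL"], os.environ["WM_TOKEN"], since)
        print(f"ping_since={since}: {len(workers)} worker(s)")
```

If the wide window also comes back empty while `worker_ping` has fresh rows, the problem is in what the server's DB session can see rather than in the time window.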
then I would investigate the worker_ping table in the db
your logs show the ping is sent so maybe a timezone issue, unclear to me
There's a 2-hour difference between the actual time (17:01) and the timestamp in the DB (15:01), but why would that be an issue?

we only display the last 300s worker pings
but also it's probably just the timezone difference
which means the time is correct
yup, I am in GMT+2 and the zone in that field is GMT+0, so all good there
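A small Python check of the reasoning above: timezone-aware timestamps are compared as instants, so 17:01 at GMT+2 and 15:01 at GMT+0 are the same moment and a 300s window is unaffected by the offset. The date is made up, and `is_recent` only mimics a "last 300s" filter; it is not Windmill's actual query.

```python
from datetime import datetime, timedelta, timezone

# 17:01 local time at GMT+2 vs. 15:01 stored as GMT+0 (the date is arbitrary).
local_now = datetime(2025, 5, 20, 17, 1, tzinfo=timezone(timedelta(hours=2)))
ping_at = datetime(2025, 5, 20, 15, 1, tzinfo=timezone.utc)

# Aware datetimes are compared as UTC instants, so the offsets cancel out.
age = local_now - ping_at
print(age)  # 0:00:00

# The pitfall being ruled out: dropping the offsets makes them look 2 hours apart.
naive_gap = local_now.replace(tzinfo=None) - ping_at.replace(tzinfo=None)
print(naive_gap.total_seconds())  # 7200.0


def is_recent(ping_at: datetime, now: datetime, ping_since_s: int = 300) -> bool:
    """Mimic a 'pinged within the last ping_since_s seconds' filter (not Windmill's code)."""
    return (now - ping_at).total_seconds() < ping_since_s
```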
keep your instance up, verify that ping_at is within the last 300s, then look at the workers page
just did, and they don't show up... but one question: are we sure that local time is correctly converted to GMT before comparing the difference with ping_at? If not, that would explain the empty array I'm getting back: there's a 2-hour difference, and everything greater than 300s is discarded
I'll try manually pinging the same API with very large ping_since param to see if I start getting those
if yes, that will confirm the doubt
yes i'm sure we handle timezone correctly
I believe you, but let me quickly check just in case 😄
yeah, still an empty array
https://github.com/windmill-labs/windmill/blob/422a02d8f78cae8e71ace405f1d423978054cb0b/backend/windmill-api/src/workers.rs#L98
so weird, looking into the code it should return it :/

this is the real query:

interesting, substituting the arguments (admin=TRUE, ping_since=300, offset=0, per_page=1000) returns all those native workers O.o
I am assuming the fact I have only a single default admin@windmill.dev user is not relevant?
what about the gpu worker group?
are you an admin on that workspace?
Yes, this is the default superadmin account admin@windmill.dev
I am not sure if I had that worker on when I was performing the tests
Just checked and I didn't have it running, so the output was expected.
and starting it makes it show in the query result immediately, so all good there

Do you see it in the workers page?
no
If I take exactly the same worker config and just point it at my locally hosted instance (on the same local network as the worker), it works nicely, but pointing it at the RDS database doesn't work
what's the result of:
replacing demo with your current workspace
and can you list your users in your RDS including their attributes, the equivalent of
\du+;
in psql
{"workspace_id":"admin","email":"admin@windmill.dev","username":"admin@windmill.dev","is_admin":true,"is_super_admin":true,"created_at":"2025-05-20T07:50:09.178777677Z","groups":[],"operator":false,"disabled":false,"role":"superadmin","folders_read":[],"folders":[],"folders_owners":[],"name":null}
I didn't get the chance to create any user other than the default admin one created by migrations
can you still list them with their attributes please
I want to check for Bypass RLS
also can you run the same query above but as windmill_admin instead of your db user
seems like I have windmill_user and windmill_admin groups, not users... and when I try to connect to my DB as windmill_admin/changeme it says the credentials are wrong

windmill_admin has Bypass RLS set, _user doesn't

and then the user I'm connecting to the db with is called "samantha", and that user is a member of the windmill_admin group (and windmill_user)
...so this is not possible since I don't have the windmill_admin user.
you need to set role once connected
they have no login, that's normal
got it, and I think we might be onto something 😄 when setting the role to windmill_admin I get no results back from the query! O.o
since it has bypass RLS, that doesn't make much sense
if you select * from worker_ping you also get no results?
this works
SET ROLE samantha;
select * from worker_ping;
this doesn't
SET ROLE windmill_admin;
select * from worker_ping;
this doesn't work either
SET ROLE windmill_user;
select * from worker_ping;
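The three SET ROLE checks above can be scripted in Python with psycopg 3 (only needed when `run_checks` is actually called). The role names and the `samantha` schema come from the thread; `PG_DSN` is a hypothetical variable name. The table is schema-qualified on purpose: PostgreSQL resolves the `$user` entry of `search_path` using `CURRENT_USER`, which changes after `SET ROLE`, so an unqualified `worker_ping` may not resolve to the same table for every role.

```python
import os

ROLES = ("samantha", "windmill_admin", "windmill_user")  # roles from the thread
SCHEMA = "samantha"  # the reporter's schema; replace with yours


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier (wrap in double quotes, double any inner quotes)."""
    return '"' + name.replace('"', '""') + '"'


def role_checks(schema: str, roles=ROLES) -> list:
    """Return (SET ROLE ..., SELECT count(*) ...) statement pairs, one per role."""
    table = f"{quote_ident(schema)}.worker_ping"
    return [(f"SET ROLE {quote_ident(r)}", f"SELECT count(*) FROM {table}") for r in roles]


def run_checks(dsn: str, schema: str = SCHEMA) -> None:
    import psycopg  # psycopg 3, imported lazily so the helpers above need no driver

    # autocommit: a failed statement doesn't leave the session in an aborted transaction
    with psycopg.connect(dsn, autocommit=True) as conn:
        for set_role, select in role_checks(schema):
            try:
                conn.execute(set_role)
                count = conn.execute(select).fetchone()[0]
                print(f"{set_role}: {count} row(s)")
            except psycopg.Error as exc:
                print(f"{set_role}: {type(exc).__name__}: {exc}")
            finally:
                conn.execute("RESET ROLE")  # back to the login role before the next check


if __name__ == "__main__" and "PG_DSN" in os.environ:
    run_checks(os.environ["PG_DSN"])
```

An error here (for example, permission denied) and a silent zero-row result point at different causes, so it is worth seeing which one each role gets.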
can you show all the policies and grants for windmill_admin on worker_ping
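A Python sketch that answers the "policies and grants" question directly from PostgreSQL instead of reading `\dp` output by eye. It uses psycopg 3 again (hypothetical `PG_DSN`) and queries `pg_roles.rolbypassrls`, `has_table_privilege()` for each role and privilege, and `pg_policies` for the table. Note that BYPASSRLS only skips row-level security policies; it does not stand in for a missing GRANT.

```python
import os

ROLES = ["windmill_admin", "windmill_user"]
PRIVILEGES = ["SELECT", "INSERT", "UPDATE", "DELETE"]

BYPASSRLS_SQL = "SELECT rolname, rolbypassrls FROM pg_roles WHERE rolname = ANY(%s)"
PRIV_SQL = "SELECT has_table_privilege(%s, %s, %s)"
POLICIES_SQL = (
    "SELECT policyname, roles, cmd FROM pg_policies "
    "WHERE schemaname = %s AND tablename = %s"
)


def missing_privileges(granted: dict) -> list:
    """Return the (role, privilege) pairs that were not granted, sorted."""
    return sorted(key for key, ok in granted.items() if not ok)


def report(dsn: str, schema: str = "samantha", table: str = "worker_ping") -> None:
    import psycopg  # psycopg 3, imported lazily

    qualified = f"{schema}.{table}"  # fine for plain lowercase names like these
    with psycopg.connect(dsn) as conn:
        for name, bypass in conn.execute(BYPASSRLS_SQL, (ROLES,)).fetchall():
            print(f"{name}: bypassrls={bypass}")
        granted = {
            (role, priv): conn.execute(PRIV_SQL, (role, qualified, priv)).fetchone()[0]
            for role in ROLES
            for priv in PRIVILEGES
        }
        print("missing:", missing_privileges(granted) or "none")
        print("policies:", conn.execute(POLICIES_SQL, (schema, table)).fetchall() or "none")


if __name__ == "__main__" and "PG_DSN" in os.environ:
    report(os.environ["PG_DSN"])
```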
you hit the nail on the head! there were none 🙂 After granting them it all started working! Thanks for the support Ruben, this is amazing!
I'll keep recommending Windmill to my friends and colleagues, these kind of things make so much difference!
just to be sure, what did you do to make it work?
you granted worker_ping to windmill_admin?
What's odd is that if those were missing, you should have gotten an error
yes, but I also had an error on the runs table, so I had to grant all tables to windmill_admin, and then everything started working: my user creation job executed and the whole instance became ready for use
most probably those weren't granted because I created a schema manually in a shared RDS DB, and it wasn't called "public" but "samantha". Could it be something to do with that?
did you create the schema after the initial migrations?
no, before
you're supposed to use PG_SCHEMA=samantha
when?
on every servers
that's the real fix
sorry, but where should I set that env var? in my ECS task definitions (essentially in the docker container running my server)?
yes
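For an ECS deployment, the variable goes in the server container's `environment` list in the task definition. A Python sketch that sets it on a container definition dict; the container name, image tag, and `DATABASE_URL` value below are placeholders, not the reporter's actual setup. Per the thread, the worker task definitions don't need it.

```python
import json


def with_env(container_def: dict, name: str, value: str) -> dict:
    """Return a copy of an ECS container definition with one env var set (replacing any existing entry)."""
    env = [e for e in container_def.get("environment", []) if e["name"] != name]
    env.append({"name": name, "value": value})
    return {**container_def, "environment": env}


# Hypothetical server container; only the environment list matters here.
server = {
    "name": "windmill-server",
    "image": "ghcr.io/windmill-labs/windmill:main",
    "environment": [
        {"name": "DATABASE_URL", "value": "postgres://samantha:***@my-rds-host:5432/shared_db"},
        {"name": "MODE", "value": "server"},
    ],
}

print(json.dumps(with_env(server, "PG_SCHEMA", "samantha")["environment"], indent=2))
```

Replacing rather than appending keeps the list free of duplicate `PG_SCHEMA` entries if the helper is run twice.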
ah ok
I will do that
did I miss it in the documentation?
should this be set on workers as well?
no need for the workers
amazing, thanks
unfortunately, I'm still having PG permission issues even though I have set PG_SCHEMA=samantha in my server docker container... do I need to nuke the DB to force migrations to run again in this new setup?
not too sure unfortunately
but if you can use the default schema, I would start with that
I can't :/ it is a shared DB
each product has its own schema and user
Then PG_SCHEMA=samantha and nuke the db might work
I will try that
thank you Ruben!
migrations did run (screenshot), and then I got the setup screen, and right after saving my instance settings (so before being asked to change the default superuser) I got errors on this route: apps/get/g/all/setup_app?nomenubar=true&workspace=admins (screenshot attached)
SqlErr: error returned from database: relation "app" does not exist @apps.rs:494:17
SqlErr: error returned from database: relation "script" does not exist @scripts.rs:323:16
SqlErr: error returned from database: relation "websocket_trigger" does not exist @workspaces.rs:1321:26



I can confirm that workers are pinging the new DB (I see the logs with samantha ROLE)
can you set role as windmill_user and see if you see those tables in the samantha schema
also, are you sure you passed PG_SCHEMA?
absolutely sure, checked it in the container interactive session :/
in the end I solved it just by manually granting the permissions to both _user and _admin
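The exact statements weren't posted. As a sketch of the kind of manual grants described, assuming the schema is `samantha` and that the statements are run by the user that owns the tables (the one that ran the migrations), this Python helper prints schema-wide grants for both roles. `ALL` is broader than strictly necessary, and the thread doesn't say which privileges Windmill actually expects for each role. `ALL TABLES IN SCHEMA` only covers tables that exist when it runs; `ALTER DEFAULT PRIVILEGES` covers tables created later by the same role.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier (wrap in double quotes, double any inner quotes)."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(schema: str, roles=("windmill_admin", "windmill_user")) -> list:
    """Build schema-wide GRANT statements for each role (broad by design; narrow as needed)."""
    s = quote_ident(schema)
    statements = []
    for role in roles:
        r = quote_ident(role)
        statements += [
            f"GRANT USAGE ON SCHEMA {s} TO {r};",
            f"GRANT ALL ON ALL TABLES IN SCHEMA {s} TO {r};",
            f"GRANT ALL ON ALL SEQUENCES IN SCHEMA {s} TO {r};",
            # Applies to tables created later by the role that runs this statement.
            f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT ALL ON TABLES TO {r};",
        ]
    return statements


print("\n".join(grant_statements("samantha")))
```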