wdesert
wdesert•3mo ago

Sudden and huge growth of Windmill's DB

I noticed just today that starting from about a month ago, our Windmill's CloudSQL instance started to grow in disk usage from 3-4 GBs to 90-100 GBs all of a sudden. I know, this is not entirely related to Windmill per se, but maybe, based on your experience, there are some specific things that can trigger this growth and that I should take a look at first of all? Thanks!
(screenshot attached)
35 Replies
rubenf
rubenf•3mo ago
First thing is take a look at the size of each table and what rows are taking space
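A sketch of such a query against the standard Postgres catalogs (this lists user tables only; sizes include indexes and TOAST data):

```sql
-- List the largest tables by total on-disk size
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```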
wdesert
wdesertOP•3mo ago
(screenshot attached)
rubenf
rubenf•3mo ago
And what's taking space in v2_job
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
These look like the main contributors to the party
select created_at::date, count(*) from v2_job
group by 1 order by 1 desc
created_at,count
2025-06-17,268931
2025-06-16,211101
2025-06-15,30371
2025-06-14,68562
2025-06-13,313508
2025-06-12,194162
2025-06-11,117472
2025-06-10,272306
2025-06-09,246656
2025-06-08,27860
2025-06-07,46278
2025-06-06,166531
2025-06-05,193947
2025-06-04,210364
2025-06-03,273050
2025-06-02,302842
2025-06-01,10117
2025-05-31,6905
2025-05-30,152153
2025-05-29,126787
2025-05-28,148021
2025-05-27,52726
2025-05-26,102431
2025-05-25,6218
2025-05-24,50964
2025-05-23,30185
2025-05-22,39286
2025-05-21,36791
2025-05-20,33235
2025-05-19,59970
2025-05-18,510
2025-03-13,14
rubenf
rubenf•3mo ago
Don't look at count, look at max size
wdesert
wdesertOP•3mo ago
Max size of rows, you mean?
rubenf
rubenf•3mo ago
Yes
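One way to find the heaviest rows is to sort by the stored size of the column that holds the payload. A sketch, assuming the large payloads live in an `args` column of `v2_job` (as it turns out later in the thread); `pg_column_size` reports the stored, possibly compressed size:

```sql
-- Find rows with the largest args payloads
SELECT id, created_by, created_at,
       pg_column_size(args) AS args_bytes
FROM v2_job
ORDER BY pg_column_size(args) DESC NULLS LAST
LIMIT 10;
```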
wdesert
wdesertOP•3mo ago
Hm, give me a few minutes. The DB is being pretty slow. I also see that around 50-80 MBs is written every minute. It seems that a specific kind of row is bombarding this table:
http-f/workflows/whh_or_prod
I see that rows with this as created_by appear endlessly, even though I don't see any runs in the UI for this flow.
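A quick way to confirm which creator is flooding the table is to aggregate on `created_by` (a sketch against the `v2_job` table from the thread):

```sql
-- Count jobs per creator to spot who is writing the most rows
SELECT created_by, count(*) AS jobs
FROM v2_job
GROUP BY created_by
ORDER BY jobs DESC
LIMIT 10;
```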
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
As you can see from created_at, there are really lots of them
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Even though we don't have nearly as many runs right now
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Looks like one of the values is doing something extraordinary
wdesert
wdesertOP•3mo ago
Can it be due to very inefficient loops like this in this flow:
(screenshot attached)
wdesert
wdesertOP•3mo ago
It iterates over thousands of values from an input array, and this seems to cause those endless writes to the DB... which makes me think that those jobs just have huge args (thousands of keywords listed).
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
So those large blobs are TOASTed, and this takes 78 GBs of space
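Postgres moves oversized values into a companion TOAST table, whose size can be checked via the catalog. A sketch (if `reltoastrelid` is 0, the table has no TOAST data):

```sql
-- Size of v2_job's TOAST table, where the oversized args end up
SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'v2_job';
```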
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
It seems that args is the culprit, which makes sense because some of the steps receive large arrays as arguments. @rubenf, I am really sorry for disturbing you, and I know it's not a problem with Windmill per se: we just have large JSON blobs as arguments, which you explicitly advise against in the UI. But some help would be very much appreciated. I still don't understand why so many entries are written into v2_job, and what we can do to mitigate this apart from rewriting our flows completely.
rubenf
rubenf•3mo ago
If you can't reduce your arg size, then the best option might be to use the Runtime -> Lifetime options to remove args after execution. You can also set a lower retention period.
wdesert
wdesertOP•3mo ago
Wow, that is a really cool option! If I enable a small retention period (say, 1 day) instead of the current value (30 days), will it garbage collect the old values?
rubenf
rubenf•3mo ago
yes
wdesert
wdesertOP•3mo ago
I have just set it to 4 days, but it seems that it doesn't happen instantly
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Can it be something scheduled that will kick in later, or should I do something else apart from changing the retention period in the instance settings?
rubenf
rubenf•3mo ago
Yes, it can take a bit of time, but eventually it will kick in.
wdesert
wdesertOP•3mo ago
Okay, thanks a lot! I really appreciate the help. Also, I am going to attach a GCP bucket, which should also help, right? Instead of storing large blobs that will get TOASTed by Postgres 🙂
rubenf
rubenf•3mo ago
It won't help for args
wdesert
wdesertOP•3mo ago
Ah, okay, so for args our best bets are retention period and step lifetimes
rubenf
rubenf•3mo ago
yes
wdesert
wdesertOP•3mo ago
Well, it got deleted now, as far as I see. The problem is, without a VACUUM FULL, it doesn't really help 🙂
rubenf
rubenf•3mo ago
Autovacuum would have gotten to it. Since the deletion is regular, the DB usage will be somewhat constant.
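Whether autovacuum has caught up can be checked from the statistics views; a sketch (a large `n_dead_tup` means the deleted rows haven't been vacuumed yet, though even then the space is only reused, not returned to the OS):

```sql
-- Check autovacuum progress on v2_job
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'v2_job';
```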
wdesert
wdesertOP•3mo ago
Yeah, the problem is that without VACUUM FULL (which is not scheduled regularly by CloudSQL, only plain VACUUM), space will not be reclaimed on disk. So we will have to run it manually at some point.
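For reference, the manual reclaim would look like this; note that VACUUM FULL rewrites the whole table and holds an ACCESS EXCLUSIVE lock for the duration, so it should be run in a maintenance window:

```sql
-- Rewrites the table and returns the freed space to the OS,
-- but blocks all reads and writes on v2_job while it runs.
VACUUM FULL VERBOSE v2_job;
```

An alternative worth evaluating is the pg_repack extension (supported on CloudSQL), which compacts a table without the long exclusive lock.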
