wdesert
wdesert•3mo ago

Sudden and huge growth of Windmill's DB

I noticed just today that starting from about a month ago, our Windmill's CloudSQL instance started to grow in disk usage from 3-4 GBs to 90-100 GBs all of a sudden. I know, this is not entirely related to Windmill per se, but maybe, based on your experience, there are some specific things that can trigger this growth and that I should take a look at first of all? Thanks!
(screenshot attached)
35 Replies
rubenf
rubenf•3mo ago
First thing is take a look at the size of each table and what rows are taking space
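A sketch of such a query against the standard Postgres catalogs (this lists user tables only; sizes include indexes and TOAST data):

```sql
-- List the largest tables by total on-disk size
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```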
wdesert
wdesertOP•3mo ago
(screenshot attached)
rubenf
rubenf•3mo ago
And what's taking space in v2_job
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
These look like the main contributors to the party
select created_at::date, count(*) from v2_job
group by 1 order by 1 desc
created_at,count
2025-06-17,268931
2025-06-16,211101
2025-06-15,30371
2025-06-14,68562
2025-06-13,313508
2025-06-12,194162
2025-06-11,117472
2025-06-10,272306
2025-06-09,246656
2025-06-08,27860
2025-06-07,46278
2025-06-06,166531
2025-06-05,193947
2025-06-04,210364
2025-06-03,273050
2025-06-02,302842
2025-06-01,10117
2025-05-31,6905
2025-05-30,152153
2025-05-29,126787
2025-05-28,148021
2025-05-27,52726
2025-05-26,102431
2025-05-25,6218
2025-05-24,50964
2025-05-23,30185
2025-05-22,39286
2025-05-21,36791
2025-05-20,33235
2025-05-19,59970
2025-05-18,510
2025-03-13,14
rubenf
rubenf•3mo ago
Don't look at count, look at max size
wdesert
wdesertOP•3mo ago
Max size of rows, you mean?
rubenf
rubenf•3mo ago
Yes
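One way to find the heaviest rows is to sort by the stored size of the column that holds the payload. A sketch, assuming the large payloads live in an `args` column of `v2_job` (as it turns out later in the thread); `pg_column_size` reports the stored, possibly compressed size:

```sql
-- Find rows with the largest args payloads
SELECT id, created_by, created_at,
       pg_column_size(args) AS args_bytes
FROM v2_job
ORDER BY pg_column_size(args) DESC NULLS LAST
LIMIT 10;
```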
wdesert
wdesertOP•3mo ago
Hm, give me a few minutes. The DB is being pretty slow. I also see that around 50-80 MBs is written every minute. It seems that a specific kind of row is bombarding this table:
http-f/workflows/whh_or_prod
I see that rows with this as created_by appear endlessly, even though I don't see any runs in the UI for this flow.
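A quick way to confirm which creator is flooding the table is to aggregate on `created_by` (a sketch against the `v2_job` table from the thread):

```sql
-- Count jobs per creator to spot who is writing the most rows
SELECT created_by, count(*) AS jobs
FROM v2_job
GROUP BY created_by
ORDER BY jobs DESC
LIMIT 10;
```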
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
As you can see from created_at, there are really lots of them
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Even though we don't have nearly as many runs right now
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Looks like one of the values is doing something extraordinary
wdesert
wdesertOP•3mo ago
Can it be due to very inefficient loops like this in this flow:
(screenshot attached)
wdesert
wdesertOP•3mo ago
It iterates over thousands of values from an input array, and this seems to cause those endless writes to the DB... which makes me think that those jobs just have huge args (thousands of keywords listed).
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
So those large blobs are TOASTed, and this takes 78 GBs of space
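Postgres moves oversized values into a companion TOAST table, whose size can be checked via the catalog. A sketch (if `reltoastrelid` is 0, the table has no TOAST data):

```sql
-- Size of v2_job's TOAST table, where the oversized args end up
SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'v2_job';
```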
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
It seems that args is the culprit, which makes sense because some of the steps receive large arrays as arguments. @rubenf, I am really sorry for disturbing you, and I know it's not a problem with Windmill per se: we just have large JSON blobs as arguments, which you explicitly advise against in the UI. But some help would be very much appreciated. I still don't understand why so many entries are written into v2_job, and what we can do to mitigate this apart from rewriting our flows completely.
rubenf
rubenf•3mo ago
If you can't reduce your arg size, then the best option might be to use the Runtime -> Lifetime options to remove args after execution. You can also set a lower retention period.
wdesert
wdesertOP•3mo ago
Wow, that is a really cool option! If I enable a small retention period (say, 1 day) instead of the current value (30 days), will it garbage collect the old values?
rubenf
rubenf•3mo ago
yes
wdesert
wdesertOP•3mo ago
I have just set it to 4 days, but it seems that it doesn't happen instantly
wdesert
wdesertOP•3mo ago
(screenshot attached)
wdesert
wdesertOP•3mo ago
Can it be something scheduled that will kick in later, or should I do something else apart from changing the retention period in the instance settings?
rubenf
rubenf•3mo ago
Yes, it can take a bit of time, but eventually it will kick in.
wdesert
wdesertOP•3mo ago
Okay, thanks a lot! I really appreciate the help. Also, I am going to attach a GCP bucket, which should also help, right? Instead of storing large blobs that will get TOASTed by Postgres 🙂
rubenf
rubenf•3mo ago
It won't help for args
wdesert
wdesertOP•3mo ago
Ah, okay, so for args our best bets are retention period and step lifetimes
rubenf
rubenf•3mo ago
yes
wdesert
wdesertOP•3mo ago
Well, it got deleted now, as far as I see. The problem is, without a VACUUM FULL, it doesn't really help 🙂
rubenf
rubenf•3mo ago
Autovacuum would have gotten to it. Since the deletion is regular, the DB usage will be somewhat constant.
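Whether autovacuum has caught up can be checked from the statistics views; a sketch (a large `n_dead_tup` means the deleted rows haven't been vacuumed yet, though even then the space is only reused, not returned to the OS):

```sql
-- Check autovacuum progress on v2_job
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'v2_job';
```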
wdesert
wdesertOP•3mo ago
Yeah, the problem is that without VACUUM FULL (which is not scheduled regularly by CloudSQL, only plain VACUUM), space will not be reclaimed on disk. So we will have to run it manually at some point.
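For reference, the manual reclaim would look like this; note that VACUUM FULL rewrites the whole table and holds an ACCESS EXCLUSIVE lock for the duration, so it should be run in a maintenance window:

```sql
-- Rewrites the table and returns the freed space to the OS,
-- but blocks all reads and writes on v2_job while it runs.
VACUUM FULL VERBOSE v2_job;
```

An alternative worth evaluating is the pg_repack extension (supported on CloudSQL), which compacts a table without the long exclusive lock.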
