Sudden and huge growth of Windmill's DB
I noticed just today that, starting about a month ago, our Windmill CloudSQL instance's disk usage suddenly grew from 3-4 GB to 90-100 GB.
I know this is not entirely related to Windmill per se, but based on your experience, are there specific things that can trigger this kind of growth that I should look at first?
Thanks!

First thing is to take a look at the size of each table and see what rows are taking up space
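For reference, a query along these lines lists the largest relations in the database (standard Postgres catalog functions, nothing Windmill-specific):

```sql
-- Largest relations by total on-disk size (heap + TOAST + indexes)
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;
```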

And what's taking space in v2_job

These look like the main contributors to the party
Don't look at count, look at max size
Max size of rows, you mean?
Yes
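A sketch of such a query, assuming the column names mentioned later in the thread (created_by, created_at, args); pg_column_size reports the stored, post-compression size, so it's a good proxy for what each row costs on disk:

```sql
-- Largest individual rows in v2_job, ranked by the stored size of args
SELECT created_by, created_at,
       pg_size_pretty(pg_column_size(args)::bigint) AS args_size
FROM v2_job
ORDER BY pg_column_size(args) DESC NULLS LAST
LIMIT 20;
```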
Hm, give me a few minutes
The DB is being pretty slow
I also see that around 50-80 MB is written every minute
It seems that a specific kind of row is bombarding this table
I see that rows with this as created_by appear endlessly
Even though I don't see any runs in the UI for this flow
As you can see from created_at, there are really a lot of them
Even though we don't have even nearly as many runs right now

Looks like one of the values is doing something extraordinary
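A grouping query along these lines (plain Postgres, using the columns named above) should confirm which created_by value is responsible and how much space its args take:

```sql
-- Job count and total stored args size per creator over the last day
SELECT created_by,
       count(*)                                  AS jobs,
       pg_size_pretty(sum(pg_column_size(args))) AS args_size
FROM v2_job
WHERE created_at > now() - interval '1 day'
GROUP BY created_by
ORDER BY sum(pg_column_size(args)) DESC NULLS LAST
LIMIT 20;
```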
Can it be due to very inefficient loops like this in this flow:

It iterates over thousands of values from an input array, and this seems to cause those endless writes to the DB
... which makes me think that those jobs just have huge args (thousands of keywords listed)

So those large blobs are TOASTed, and this takes 78 GB of space
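The TOAST share can be checked directly; this splits v2_job into heap, TOAST, and index size (pure catalog queries):

```sql
-- Heap vs TOAST vs index size for v2_job
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))           AS heap,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast,
       pg_size_pretty(pg_indexes_size(c.oid))            AS indexes,
       pg_size_pretty(pg_total_relation_size(c.oid))     AS total
FROM pg_class c
WHERE c.relname = 'v2_job';
```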

It seems that args is the culprit, which makes sense because some of the steps receive large arrays as arguments
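To attribute the space to args specifically, a full-table sum works, though it has to scan the whole table, so it will be slow on a database this size:

```sql
-- Total stored (compressed) bytes held by the args column
SELECT pg_size_pretty(sum(pg_column_size(args))) AS total_args_size,
       count(*)                                  AS job_count
FROM v2_job;
```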
@rubenf, I am really sorry to disturb you, and I know it's not a problem with Windmill per se; we just have large JSON blobs as arguments, which you explicitly advise against in the UI, but some help would be very much appreciated
I still don't understand why so many entries are written into v2_job
and what we can do to mitigate it apart from rewriting our flows completely
If you can't reduce your arg size, then the best option might be to use the Runtime -> Lifetime options to remove args after execution
also you can set a lower retention period
Wow, that is a really cool option
If I enable a small retention period (say, 1 day) instead of the current value (30 days), will it garbage collect the old values?
yes
I have just set it to 4 days, but it seems that it doesn't happen instantly

Can it be something scheduled that will kick in later or maybe I should do something else apart from changing the retention period in the instance settings?
Yes it can take a bit of time
But eventually it will kick in
Okay, thanks a lot!
I really appreciate the help
Also, I am going to attach a GCP bucket, which should also help, right?
Instead of storing large blobs that will get TOASTed by Postgres 🙂
It won't help for args
Ah, okay, so for args our best bets are retention period and step lifetimes
yes
Well, it got deleted now, as far as I see. The problem is, without a VACUUM FULL, it doesn't really help 🙂
autovacuum would have got to it
since it's regular, the DB usage will be somewhat constant
Yeah, the problem is that without VACUUM FULL (which is not scheduled regularly by CloudSQL, only VACUUM) space will not be reclaimed on disk
So we will have to run it manually at some point
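A minimal sketch of that manual step: VACUUM FULL rewrites the table and returns space to the OS, but it takes an ACCESS EXCLUSIVE lock, so it needs a maintenance window.

```sql
-- See how many dead tuples are waiting and when (auto)vacuum last ran
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Rewrites v2_job and shrinks the file on disk (locks the table while it runs)
VACUUM FULL VERBOSE v2_job;
```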