marko
4mo ago

Bun cache missing windmill-client

One of my flows started suddenly failing with the following error:
job=0190e46d-49f3-598f-e238-e13b0d595088 tag=bun worker=wk-default-eb4f47c94407-YMYN2 hostname=eb4f47c94407

skipping install, using cached buntar based on lockfile hash: xb6AgX-a-bSr6dsMtfSbyDy0oxGu0XSg7uiNtNx4HTs=

--- BUN CODE EXECUTION ---

error: Cannot find package "windmill-client" from "/tmp/windmill/wk-default-eb4f47c94407-YMYN2/0190e46d-49f3-598f-e238-e13b0d595088/main.ts"
Bun v1.1.18 (Linux x64 baseline)
If I test the node, it works fine. It only fails when I run the deployed flow. It looks like it's just a caching issue, but I can't find any functionality in Windmill that would allow me to wipe the build cache. I did try to create a whole new node where I copy/pasted the code, but the problem persists. Any ideas?
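For context on the "lockfile hash" in these logs: the install step is skipped whenever a cached buntar already exists for that hash, so a new node containing the same code would presumably resolve to the same (possibly broken) directory. A minimal sketch of that keying idea follows. The hash function (SHA-256), the encoding (URL-safe base64 with padding) and the helper names `lockfileCacheKey` and `cachedBuntarDir` are assumptions for illustration, not Windmill's actual implementation; the `/tmp/windmill/cache/buntar` location is the one mentioned later in this thread.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical helper: derive a cache key from the lockfile contents.
// SHA-256 + URL-safe base64 is an assumption based on the shape of the
// hash printed in the job log (44 characters, "-" allowed, trailing "=").
export function lockfileCacheKey(lockfile: string): string {
  // Node's "base64url" encoding omits padding, so it is re-added here.
  return createHash("sha256").update(lockfile).digest("base64url") + "=";
}

// Hypothetical helper: where a cache for this lockfile would live.
export function cachedBuntarDir(cacheRoot: string, lockfile: string): string {
  return path.join(cacheRoot, "buntar", lockfileCacheKey(lockfile));
}
```

Under these assumptions, identical lockfiles always map to the same directory, which would explain why copy/pasting the code into a new node does not help until the cached directory itself is removed.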
30 Replies
rubenf
4mo ago
it is indeed a caching issue. on EE you can wipe the cache on the worker groups, but if you can exec into your container with bash and investigate live, that would be great. We improved the caching mechanism for bun recently and this might be a by-product of it.
what version of windmill are you on?
invakid404
4mo ago
I just ran into the same issue today, I am running CE v1.366.6-15-g8fcda68af
job=0190e4d9-f832-e2fa-2ab9-2dd2debb229d tag=bun worker=wk-default-e82d921f019408-wm1qh hostname=e82d921f019408

skipping install, using cached buntar based on lockfile hash: tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=

--- BUN CODE EXECUTION ---

error: Cannot find module "@elastic/elasticsearch" from "/tmp/windmill/wk-default-e82d921f019408-wm1qh/0190e4d9-f832-e2fa-2ab9-2dd2debb229d/main.ts"
Bun v1.1.18 (Linux x64 baseline)
if I wipe the cache, it works for a single run, then it dies again
rubenf
4mo ago
are you using docker-compose?
invakid404
4mo ago
no, my windmill is deployed on fly.io
rubenf
4mo ago
so you have no shared volume, right?
invakid404
4mo ago
the entirety of my windmill, including workers, is running on a single node, so all workers should have the same cache is what i'd assume
rubenf
4mo ago
do you have a single container?
invakid404
4mo ago
yes
[env]
BASE_URL = 'https://xxxxx.fly.dev/'
KEEP_JOB_DIR = 'false'
NUM_WORKERS = '4'
WORKER_TAGS = 'deno,python3,go,bash,powershell,dependency,flow,hub,other,bun,php,postgresql'
RUST_LOG = 'info'

[[mounts]]
source = 'worker_cache'
destination = '/tmp/windmill/cache'
auto_extend_size_threshold = 80
auto_extend_size_increment = "1GB"
auto_extend_size_limit = "10GB"

[http_service]
internal_port = 80
force_https = true
rubenf
4mo ago
can you exec into that container?
invakid404
4mo ago
here's my fly.toml if it helps
yeah, I can
rubenf
4mo ago
can you go into /tmp/windmill/cache/buntar and ls there
invakid404
4mo ago
lemme trigger the issue again
rubenf
4mo ago
you should have a directory tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=, cd there and ls it
you do not need to trigger the issue to do the above
invakid404
4mo ago
root@e82d921f019408:/tmp/windmill/cache/buntar/tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=# ls -la
total 16
drwxr-xr-x 4 root root 4096 Jul 24 13:11 .
drwxr-xr-x 3 root root 4096 Jul 24 13:11 ..
drwxr-xr-x 3 root root 4096 Jul 24 13:11 node_modules
drwxr-xr-x 2 root root 4096 Jul 24 13:11 shared
yeah, figured
root@e82d921f019408:/tmp/windmill/cache/buntar/tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=/node_modules# ls -la
total 12
drwxr-xr-x 3 root root 4096 Jul 24 13:11 .
drwxr-xr-x 4 root root 4096 Jul 24 13:11 ..
drwxr-xr-x 2 root root 4096 Jul 24 13:11 ms
i can't fit it in a message, so here's the lockfile information for my script in a pastebin if it helps: https://pastebin.com/w5gtu2fs
it's supposed to have @elastic/elasticsearch and zod installed, but neither is present in node_modules
I believe this started happening after upgrading to 1.366.6, so it must be a fairly recent regression
rubenf
4mo ago
yes, we changed the way it behaves
you should have debug, @elastic, hpagent, ms, @opentelemetry, secure-json-parse, tslib and undici in there
do you have the issue with a script as simple as:
import * as wmill from "@elastic/elasticsearch"

export async function main(x: string) {
  return wmill
}
invakid404
4mo ago
let me see
yes, same issue
node_modules only has ms again
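The manual `ls` comparison above can also be scripted. The following is a small sketch, assuming the expected top-level entries are known in advance; `missingPackages` is a hypothetical helper name, and the list in the usage note is the one rubenf gave for this lockfile.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: report which expected top-level entries
// (packages or scopes such as "@elastic") are absent from node_modules.
export function missingPackages(nodeModulesDir: string, expected: string[]): string[] {
  return expected.filter((name) => !fs.existsSync(path.join(nodeModulesDir, name)));
}
```

Called with the cached `node_modules` path and `["debug", "@elastic", "hpagent", "ms", "@opentelemetry", "secure-json-parse", "tslib", "undici"]`, it would return every entry except `ms` for the directory listed in this thread.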
rubenf
4mo ago
do you have any persistent volume on that container?
yeah, /tmp/windmill/cache
invakid404
4mo ago
/tmp/windmill/cache is a persistent volume, yes
rubenf
4mo ago
@invakid404 could you:
remove the folder tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=
then deploy and run again, and show me the logs
in particular, look for error logs starting with "Could not create buntar:", they would be in your container logs
invakid404
4mo ago
on it
rubenf
4mo ago
also, it will recreate the folder "tIq3Yr9cAUfe4trUyu1nqBu1Ql03fZdnRQiQGxiYh8U=", if you can ls node_modules in it that would be great
invakid404
4mo ago
(currently waiting for the docker image to get pushed so i can redeploy :D)
rubenf
4mo ago
sorry, I meant deploying the script, just pressing "deploy" in the script UI
invakid404
4mo ago
oh i see
rubenf
4mo ago
I have a pretty good intuition of what the problem is: somehow you couldn't do the full copy, and we didn't handle it well if there was an error during the copy (which I just fixed)
but if I'm right, then we should see some interesting things in your container logs
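One common way to keep a half-populated cache directory from being picked up later is to fill a staging directory and only publish it once the fill has completed. The sketch below illustrates that general pattern; it is not the fix that was made in Windmill, and `populateCacheDir` and its `fill` callback are hypothetical names.

```typescript
import * as fs from "node:fs";

// Hypothetical helper, not Windmill's code: fill a cache directory all-or-nothing.
export function populateCacheDir(finalDir: string, fill: (stagingDir: string) => void): void {
  // Stage next to the final path so the rename below stays on one filesystem.
  const stagingDir = `${finalDir}.staging-${process.pid}`;
  fs.mkdirSync(stagingDir, { recursive: true });
  try {
    fill(stagingDir);
    // Publish only after the fill finished without throwing.
    fs.renameSync(stagingDir, finalDir);
  } catch (err) {
    // Discard the partial result so later runs do not treat it as a valid cache.
    fs.rmSync(stagingDir, { recursive: true, force: true });
    throw err;
  }
}
```

With this shape, a failed copy leaves no directory at the final path, so the next run would reinstall instead of reusing an incomplete `node_modules`.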
invakid404
4mo ago
2024-07-24T13:36:17Z app[e82d921f019408] ams [info]2024-07-24T13:36:17.528422Z ERROR worker:job: windmill-worker/src/bun_executor.rs:585: Could not create buntar: Invalid cross-device link (os error 18) worker=wk-default-e82d921f019408-A9Tq8 hostname=e82d921f019408 job_id=0190e4f4-f611-f561-6749-a5df8c97a1db
is this what you're looking for?
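On Linux, os error 18 is EXDEV, which is returned when a rename or hard link would cross from one mounted filesystem to another. A quick way to see whether two directories live on the same filesystem is to compare their device IDs, as in the sketch below. `onSameDevice` is a hypothetical helper, and which two paths Windmill links between is an assumption here, not something the log states.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: true when both paths are on the same mounted filesystem.
// rename(2) and link(2) only work between paths for which this holds.
export function onSameDevice(a: string, b: string): boolean {
  return fs.statSync(a).dev === fs.statSync(b).dev;
}
```

For example, `onSameDevice("/tmp/windmill", "/tmp/windmill/cache")` would be expected to return `false` when `/tmp/windmill/cache` is a separately mounted volume, as in the fly.toml earlier in this thread.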
rubenf
4mo ago
yes, very interesting
ok, so my fix does fix it
invakid404
4mo ago
so i should be fine if i update windmill?
rubenf
4mo ago
yes
but this error is kinda crazy, it would only happen if /tmp/windmill/cache is spread across different partitions
ah well, fly.io does crazy stuff usually
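When source and destination may sit on different partitions, a move is usually written as "try rename, fall back to copy and delete on EXDEV". The following is a generic sketch of that pattern using Node-compatible `fs` calls; it does not reproduce Windmill's Rust code, and `moveDir` plus its injectable `rename` parameter (there only so the fallback can be exercised without a second filesystem) are hypothetical.

```typescript
import * as fs from "node:fs";

type Rename = (from: string, to: string) => void;

// Hypothetical helper: move a directory, tolerating a cross-device destination.
export function moveDir(src: string, dest: string, rename: Rename = fs.renameSync): "renamed" | "copied" {
  try {
    rename(src, dest);
    return "renamed";
  } catch (err) {
    // Anything other than a cross-device error is a real failure.
    if ((err as NodeJS.ErrnoException).code !== "EXDEV") throw err;
    // Cross-device: copy the tree, then remove the original.
    fs.cpSync(src, dest, { recursive: true });
    fs.rmSync(src, { recursive: true, force: true });
    return "copied";
  }
}
```

The copy path is slower than a rename, but it keeps working when the cache directory is a mounted volume.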
marko
4mo ago
Sorry, I was in meetings. My instance is on docker compose on a single VPS. And yeah, the issue has now broken every node that uses Bun. I'll try to delete it as well.
Looks like deleting the cache folder worked for me as well.