Windmill Dependency Resolution taking around 3s for each script

We would like to use Azure Blob Storage for our persistent storage. As Polars and Windmill do not natively support it, we made a wrapper around the Azure Blob File System:
# extra_requirements:
# adlfs==2024.12.0

import wmill
from typing import TypedDict
from adlfs import AzureBlobFileSystem
from loguru import logger
With these imports in place, each worker resolved the dependencies at start time:

env deps from local cache: adlfs==2024.12.0, aiohappyeyeballs==2.4.4, aiohttp==3.11.11, aiosignal==1.3.2, anyio==4.8.0, attrs==25.1.0, azure-core==1.32.0, azure-datalake-store==0.0.53, azure-identity==1.19.0, azure-storage-blob==12.24.1, certifi==2024.12.14, cffi==1.17.1, charset-normalizer==3.4.1, cryptography==44.0.0, frozenlist==1.5.0, fsspec==2024.12.0, h11==0.14.0, httpcore==1.0.7, httpx==0.28.1, idna==3.10, isodate==0.7.2, msal==1.31.1, msal-extensions==1.2.0, multidict==6.1.0, polars==1.21.0, portalocker==2.10.1, propcache==0.2.1, pycparser==2.22, pyjwt==2.10.1, requests==2.32.3, six==1.17.0, sniffio==1.3.1, typing-extensions==4.12.2, urllib3==2.3.0, wmill==1.450.1, yarl==1.18.3

These are the logs from the actual execution:
2025-01-30 07:27:40.729 | INFO | f.common.storage.azure_file_system:__init__:33 - starting fs init
2025-01-30 07:27:40.859 | INFO | f.project.scripts.retrieve_project_file:by_job_id:15 - file retrieved by path 0194b1a9-c1c3-ca7e-8bff-a81afbfefca1/project.csv
2025-01-30 07:27:41.036 | INFO | f.common.storage.azure_file_system:__init__:33 - starting fs init
2025-01-30 07:27:41.073 | INFO | f.project.scripts.retrieve_resume_urls_for_job:by_job_id:20 - file successfully retrieved
You will see that the actual work takes ~400 ms (which I still think is slow), while the reported execution time is 4216 ms. What should we do to optimise and reduce the startup time? Is the Azure Blob File System necessary for Polars reads and writes of files?
rubenf
Most likely your imports are taking all that time: your imports are too heavy, Python is slow, and filesystem/disk access is slow. Dedicated workers would solve this, at the expense of having to dedicate a worker to that script.
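One way to reduce the cost of heavy module-level imports without dedicated workers is to defer them into the function that needs them, so the price is paid only on the code path that actually touches Azure, and only once per worker process. Below is a minimal runnable sketch of the pattern; `decimal` is a stdlib stand-in for the heavy `adlfs`/Azure import chain so the example runs anywhere:

```python
import sys
import time


def get_fs():
    """Defer the heavy import until the filesystem is actually needed.

    In the real script this would be `from adlfs import AzureBlobFileSystem`;
    `decimal` stands in for that heavy import chain so the sketch is
    self-contained.
    """
    start = time.perf_counter()
    import decimal  # resolved once per process, then cached in sys.modules
    elapsed = time.perf_counter() - start
    return decimal, elapsed


_, first_cost = get_fs()
_, second_cost = get_fs()

# After the first call the module is cached, so repeated calls are cheap.
assert "decimal" in sys.modules
```

Note this only moves the import cost off the module-parse path; a worker that actually uses the filesystem still pays it once.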
Stefan Stefanov (OP)
Is it any different when using Polars with the S3 file system? I would also like to understand the good practices in Windmill for handling such cases.
rubenf
If all your imports are already cached and the time spent in the main function is much less than the script execution time, then it's the imports that are slow, which dedicated workers (EE) or super-fast disks would solve. There isn't much we can do about Python being slow at imports, other than having the script pre-loaded, which is what dedicated workers do.
Stefan Stefanov (OP)
Thank you for the support! Last one: are there logs or metrics that indicate the cold start of the workers above? We are using EE.
rubenf
What you're facing is not truly a cold start that is measurable by Windmill; the script is started normally.
Stefan Stefanov (OP)
Alright, understood. Sorry for my misunderstanding. I'm trying to figure out the pros and cons of how to properly access the persistent storage with minimal dependencies, in this case excluding the Azure libraries. On Polars and Azure: when accessing wmill.polars_connection_settings().s3fs_args I get the following error:

Exception: http://localhost:44353/api/w/dev/job_helpers/v2/polars_connection_settings: 500, Bad config: Polars only works with an S3 storage, Azure Blob is not supported yet

But the Polars .write_**(storage_options) methods do accept Azure. Is there planned support to retrieve the Azure storage options from the wmill client?
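For reference, the `storage_options` Polars accepts is just a plain dict of credentials, so it can be assembled without any helper from wmill. The key names below are assumptions based on the object_store-style configuration Polars accepts (verify against the Polars docs for your version), and the account values are hypothetical:

```python
# Hypothetical credentials; in Windmill these would typically come from a
# resource rather than being hard-coded (wmill.polars_connection_settings()
# only covers S3 today, per the error above).
azure_storage_options = {
    "account_name": "mystorageaccount",  # assumed key name
    "account_key": "<secret-key>",       # assumed key name
}

# With such a dict, Polars can read and write Blob storage directly,
# without going through adlfs (calls commented out since they need
# real credentials and a reachable storage account):
# import polars as pl
# df = pl.read_csv("az://container/project.csv",
#                  storage_options=azure_storage_options)
# df.write_parquet("az://container/project.parquet",
#                  storage_options=azure_storage_options)
```

This avoids importing the adlfs/azure-identity stack entirely, which is the bulk of the startup cost discussed above.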
rubenf
Since you are an EE customer, we can add it to the backlog, yes. What's your license id so we can add it to your board?
Stefan Stefanov (OP)
Thanks for the suggestion. The topic was sent in our mutual Slack channel.
