Stefan Stefanov
Windmill
Created by Stefan Stefanov on 1/30/2025 in #help
Windmill Dependency Resolution taking around 3s for each script
We would like to use Azure Blob Storage for our persistent storage. Since neither Polars nor Windmill uses it natively, we made a wrapper around Azure Blob File System:
```python
# extra_requirements:
# adlfs==2024.12.0

import wmill
from typing import TypedDict
from adlfs import AzureBlobFileSystem
from loguru import logger
```
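If each job runs in a fresh Python process (an assumption here, not something established above), part of the per-job start time may be the cost of importing these packages rather than resolving them. A minimal sketch to check that: time the first import of each heavy module in a new interpreter. The module list is illustrative, and because shared dependencies (for example the Azure SDK) are charged to whichever module imports them first, the order changes the per-module numbers. CPython's `python -X importtime -c "import polars"` gives a finer per-module breakdown.

```python
import importlib
import sys
import time


def measure_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing module_name in this process.

    If the module is already in sys.modules the import is a cache hit, so the
    figure is only meaningful for the first import in a fresh interpreter.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative list: the heavy top-level modules used by the script above.
    for name in ["polars", "adlfs", "azure.storage.blob", "wmill", "loguru"]:
        already_loaded = name in sys.modules
        try:
            seconds = measure_import(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        note = " (already imported)" if already_loaded else ""
        print(f"{name}: {seconds * 1000:.0f} ms{note}")
```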
With these imports, every time a worker starts one of these scripts, it resolves the dependencies:

```
env deps from local cache: adlfs==2024.12.0, aiohappyeyeballs==2.4.4, aiohttp==3.11.11, aiosignal==1.3.2, anyio==4.8.0, attrs==25.1.0, azure-core==1.32.0, azure-datalake-store==0.0.53, azure-identity==1.19.0, azure-storage-blob==12.24.1, certifi==2024.12.14, cffi==1.17.1, charset-normalizer==3.4.1, cryptography==44.0.0, frozenlist==1.5.0, fsspec==2024.12.0, h11==0.14.0, httpcore==1.0.7, httpx==0.28.1, idna==3.10, isodate==0.7.2, msal==1.31.1, msal-extensions==1.2.0, multidict==6.1.0, polars==1.21.0, portalocker==2.10.1, propcache==0.2.1, pycparser==2.22, pyjwt==2.10.1, requests==2.32.3, six==1.17.0, sniffio==1.3.1, typing-extensions==4.12.2, urllib3==2.3.0, wmill==1.450.1, yarl==1.18.3
```

These are the logs from the actual execution:
```
2025-01-30 07:27:40.729 | INFO | f.common.storage.azure_file_system:__init__:33 - starting fs init
2025-01-30 07:27:40.859 | INFO | f.project.scripts.retrieve_project_file:by_job_id:15 - file retrieved by path 0194b1a9-c1c3-ca7e-8bff-a81afbfefca1/project.csv
2025-01-30 07:27:41.036 | INFO | f.common.storage.azure_file_system:__init__:33 - starting fs init
2025-01-30 07:27:41.073 | INFO | f.project.scripts.retrieve_resume_urls_for_job:by_job_id:20 - file successfully retrieved
```
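The gap between the timestamps can be checked directly. A minimal sketch, assuming the first and last log lines bracket all of the scripts' own work and that 4216 ms is the job's total reported execution time:

```python
from datetime import datetime, timedelta

# Timestamps copied from the first and last log lines above.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
first = datetime.strptime("2025-01-30 07:27:40.729", LOG_FORMAT)
last = datetime.strptime("2025-01-30 07:27:41.073", LOG_FORMAT)

# Integer milliseconds between the two log lines (timedelta // timedelta -> int).
logged_work_ms = (last - first) // timedelta(milliseconds=1)
total_ms = 4216  # reported "Actual execution time"
overhead_ms = total_ms - logged_work_ms

print(f"logged work: {logged_work_ms} ms")  # logged work: 344 ms
print(f"overhead: {overhead_ms} ms")  # overhead: 3872 ms
```

That leaves roughly 3.9 s of the total outside the logged code, i.e. in start-up and anything that runs before the first log line.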
As you can see, the work itself takes ~400 ms (which I still think is slow), while the actual execution time is 4216 ms. What should we do to optimise this and reduce the startup time? Also, is the Azure Blob File System wrapper necessary for Polars to read and write files?
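On the second question: Polars documents reading from Azure Blob Storage via cloud URIs plus a `storage_options` dict, which would take adlfs out of the read path. A minimal sketch under that assumption; the helper names, the container name, and account-key authentication are illustrative, and whether `scan_csv` handles `az://` natively should be confirmed against the installed Polars version (1.21.0 in the dependency list above). For writes, the Polars guide has shown writing through an fsspec file object (adlfs for Azure), so the wrapper may still be needed on that side depending on the version.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the return-type annotation
    import polars as pl


def azure_storage_options(account_name: str, account_key: str) -> dict[str, str]:
    """Credentials for Polars' cloud reader (object_store-style Azure keys; verify in the docs)."""
    return {"account_name": account_name, "account_key": account_key}


def az_uri(container: str, path: str) -> str:
    """Hypothetical helper: build an az://<container>/<path> URI for a blob."""
    return f"az://{container}/{path.lstrip('/')}"


def read_csv_from_azure(
    container: str, path: str, account_name: str, account_key: str
) -> pl.DataFrame:
    """Read a CSV blob with Polars directly, without an adlfs wrapper (assumed supported)."""
    # Imported here so the helpers above work even where Polars is not installed.
    import polars as pl

    lazy = pl.scan_csv(
        az_uri(container, path),
        storage_options=azure_storage_options(account_name, account_key),
    )
    return lazy.collect()
```

For example, `read_csv_from_azure("my-container", "0194b1a9-c1c3-ca7e-8bff-a81afbfefca1/project.csv", account_name, account_key)`, where `my-container` is a placeholder and the credentials would come from wherever the existing wrapper reads them.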
13 replies