AlexK
AlexK6mo ago

Python dependency management issue with PyMuPDF

I'm trying to use PyMuPDF for some PDF operations, but I get the following error:
job 018f95c5-bbf5-0662-29b8-e549ac3b7426 on worker wk-worker-239.ec2.internal-D5Wi2 (tag: python3)


--- PYTHON CODE EXECUTION ---

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/tmp/windmill/wk-worker-239.ec2.internal-D5Wi2/018f95c5-bbf5-0662-29b8-e549ac3b7426/wrapper.py", line 9, in <module>
    from u.alexkogan.pdf_url_extraction_and_classification import a as inner_script
  File "/tmp/windmill/wk-worker-239.ec2.internal-D5Wi2/018f95c5-bbf5-0662-29b8-e549ac3b7426/u/alexkogan/pdf_url_extraction_and_classification/a.py", line 2, in <module>
    import pymupdf
  File "/tmp/windmill/cache/pip/pymupdf==1.24.4/pymupdf/__init__.py", line 28, in <module>
    from . import extra
  File "/tmp/windmill/cache/pip/pymupdf==1.24.4/pymupdf/extra.py", line 10, in <module>
    from . import _extra
ImportError: libmupdf.so.24.2: cannot open shared object file: No such file or directory
I've also tried pinning the exact requirements PyMuPDF==1.24.4 and PyMuPDFb==1.24.3, but the error message did not change.
6 Replies
rubenf
rubenf6mo ago
It requires a native library, libmupdf. You will need to pre-install it on your workers using init scripts.
Tiago Serafim
Tiago Serafim6mo ago
If your project doesn't require anything too fancy, I'd go with a pure-Python PDF lib, such as https://pypi.org/project/pypdf/
AlexK
AlexK6mo ago
Thanks for the quick reply. I'll take a look at pypdf first.

Unfortunately pypdf doesn't do the job: some files have widgets that are not parsed.

@rubenf about pre-installing it: right now I'm using the Community Edition, just to get to know Windmill. I see that the UI option for init scripts is EE only. From the docs I understand that for the Community Edition the approach is to add the dependency via the Dockerfile. Is this correct? Assuming it is, do I need to mention anything in the script itself so it won't try to fetch the dependency? I'm basically asking how the package that was installed is "connected" to the script. Thanks
rubenf
rubenf6mo ago
You do not need to install the pip package beforehand, just the native library.
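
One way to confirm whether the native library is visible on a worker, before and after changing the image or init script, is to ask the dynamic loader directly. This is only a minimal sketch: the helper name check_shared_library is made up for illustration, and the soname is copied from the ImportError above. It only reports whether the loader can open the file and does not install anything.

```python
import ctypes


def check_shared_library(soname: str) -> bool:
    """Return True if the dynamic loader can open `soname`, else False.

    Hypothetical helper for illustration; not part of Windmill or PyMuPDF.
    """
    try:
        # CDLL raises OSError when the shared object cannot be found or loaded
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # soname taken from the ImportError in the traceback above
    soname = "libmupdf.so.24.2"
    print(soname, "loadable:", check_shared_library(soname))
```

Running this as a Windmill Python script on the affected worker should print False while the library is missing, which matches the "cannot open shared object file" error.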
Tiago Serafim
Tiago Serafim6mo ago
In your docker-compose.yml you should add an env var like this (NATIVELIB is a placeholder for the package name): INIT_SCRIPT="apt-get update && apt-get install -y NATIVELIB"
Jeremy Worden
Jeremy Worden6mo ago
You can also point it to a script file: INIT_SCRIPT="bash /usr/src/app/scripts/startup.sh", and then have Docker mount a shared folder containing your startup.sh.