Monday, June 6, 2022

Packaging Python

Java programmers don't know the meaning of classpath hell until they've played with Python. Here are some notes I took while ploughing through the excellent Practical MLOps (Gift & Deza). Following their instructions, I was attempting to get an ML model served using Flask in a Docker container. Spoiler: it didn't work out of the box.

Since no prebuilt ONNX Runtime wheel existed for my Python runtime, I had to build onnxruntime from source, passing --build_wheel so that the build produced a wheel.

This is where I encountered my first dependency horror:

CMake 3.18 or higher is required.  You are running version 3.10.2

when running onnxruntime/build.sh. (You can put a newer CMake first in your PATH and so avoid having to install it at the OS level.)
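To confirm which cmake the build will actually pick up, you can ask Python. This is only a minimal sketch: the helper names are mine, and the (3, 18) minimum is simply copied from the error message above.

```python
import re
import shutil
import subprocess

MINIMUM = (3, 18)  # taken from the error message above


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a version in: %r" % output)
    return tuple(int(part) for part in match.groups())


def cmake_on_path():
    """Return (path, version) for the first cmake on PATH, or None."""
    path = shutil.which("cmake")
    if path is None:
        return None
    output = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True).stdout
    return path, parse_cmake_version(output)


if __name__ == "__main__":
    found = cmake_on_path()
    if found is None:
        print("no cmake on PATH")
    else:
        path, version = found
        print(path, version, "ok" if version >= MINIMUM else "too old")
```

If the path printed is still the OS-level cmake, the newer one is not early enough in PATH.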

This finally yielded onnxruntime-1.12.0-cp36-cp36m-linux_x86_64.whl, which can be installed into an environment with pip install WHEEL_FILE, provided that the cp number in the filename corresponds to your Python version (cp36 means CPython 3.6, as in this case).
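The cp number can be checked before pip complains. The following is a rough sketch with hypothetical helper names; it assumes CPython and a plain (uncompressed) tag in the wheel filename, and for anything serious the third-party packaging library is the better tool.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout is name-version[-build]-python-abi-platform,
    # so the last three fields are always the tags.
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: %s" % filename)
    return tuple(parts[-3:])


def interpreter_tag():
    """Tag for the running interpreter, e.g. 'cp36' on CPython 3.6."""
    return "cp%d%d" % sys.version_info[:2]


if __name__ == "__main__":
    wheel = "onnxruntime-1.12.0-cp36-cp36m-linux_x86_64.whl"
    python_tag, _, _ = wheel_tags(wheel)
    print("wheel wants", python_tag, "and this interpreter is", interpreter_tag())
```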

Moving virtual environments between machines is hard. You'd be best advised to use pip freeze to capture the environment. But ignoring this advice yields an interesting insight into the Python dependency system:

The first problem is that if you've created the environment with python -m venv then the scripts have your directory structure baked into them, as a simple grep will demonstrate. Recreating the same directory structure, all the way up to the virtual environment, on the target machine solved that.
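The grep can also be done from Python. This sketch assumes the POSIX venv layout (a bin directory; on Windows it is Scripts), and the function name is my own invention.

```python
from pathlib import Path


def scripts_with_baked_in_path(venv_dir, original_path):
    """Names of files in <venv_dir>/bin that mention original_path."""
    needle = str(original_path).encode()
    hits = []
    for script in sorted(Path(venv_dir, "bin").iterdir()):
        # Read as bytes so that binaries in bin/ do not cause decode errors.
        if script.is_file() and needle in script.read_bytes():
            hits.append(script.name)
    return hits
```

Calling it with the environment's directory and the path it was originally created under should list activate, pip and friends, since their shebang lines and variables hold the absolute path.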

But running the code gave me "No module named ..." errors. My site-packages did not appear in sys.path [SO] even though I had run activate. Odd. OK, so I defined PYTHONPATH and then I could see my site-packages in sys.path.
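A few lines of standard-library Python make this kind of diagnosis quicker. This is a sketch, not a fix: the function names are mine, and it only reports what the interpreter believes about its own environment.

```python
import os
import site
import sys


def in_virtual_environment():
    """True when the interpreter was started from a venv."""
    return sys.prefix != sys.base_prefix


def site_packages_missing_from_path():
    """Existing site-packages directories that are absent from sys.path."""
    return [d for d in site.getsitepackages()
            if os.path.isdir(d) and d not in sys.path]


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("in venv:   ", in_virtual_environment())
    print("missing:   ", site_packages_missing_from_path())
    for entry in sys.path:
        print("sys.path:  ", entry)
```

If the executable printed is not the one inside the virtual environment, activate has not had the effect you expected.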

Then, you want to use exactly the same Python version. No apt-get Python for us! We have to manually install it [SO]. When doing this in a Docker container, I first had to install the build prerequisites:

RUN apt-get update && apt-get install -y wget gcc make zlib1g-dev

Note that this [SO] helped me to create a Docker container that just pauses the moment it starts. This allows you to log in and inspect it, rather than have it die instantly on a misconfiguration.

The next problem: there are many compiled binaries in your virtual environment.

# find $PYTHONPATH/ -name \*.so | wc -l
185
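The same count can be had without find. This is a minimal sketch (the function name is hypothetical) that walks a directory tree with pathlib:

```python
from pathlib import Path


def count_shared_objects(root):
    """Count entries named *.so anywhere under root, like find ... | wc -l."""
    return sum(1 for _ in Path(root).rglob("*.so"))
```

Pointing it at the environment's site-packages directory should give the same figure as the shell command above.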

Copying these between machines is theoretically possible, but "as complexity of the code increases [so does] the likelihood of being linked against a library that is not installed" [SO].

Indeed, when I ran my Python code, I got a Segmentation Fault which can happen if "there's something wrong with your Python installation." [SO]

Python builds

A quick addendum on how Python builds projects: the standard way is no longer standard. "[A]s of the last few years all direct invocations of setup.py are effectively deprecated in favor of invocations via purpose-built and/or standards-based CLI tools like pip, build and tox" [Paul Ganssle's blog]
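In practice that means asking pip (or build) to produce the wheel rather than calling setup.py yourself. A small sketch of the pip route, with helper names and the dist output directory being my own assumptions:

```python
import subprocess
import sys


def wheel_command(project_dir, out_dir="dist"):
    """Command line that asks pip to build a wheel for project_dir."""
    return [sys.executable, "-m", "pip", "wheel", "--no-deps",
            "--wheel-dir", str(out_dir), str(project_dir)]


def build_wheel(project_dir, out_dir="dist"):
    """Run the build, raising CalledProcessError if pip fails."""
    subprocess.run(wheel_command(project_dir, out_dir), check=True)
```

Using sys.executable rather than a bare pip ensures the wheel is built by the interpreter you are actually running, which matters given the cp tag issue above.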
