Monday, October 2, 2023

Packaging Python

Python build tools are unifying behind a common interface of pyproject.toml.
This and this are great guides. The gist of the former is that you create a TOML file that conforms to a specification, and then any compliant build tool can build your package from it. The latter gives an overview of the whole Python packaging ecosystem.

The salient commands for building and deploying with your TOML file are:

python3 -m build
python3 -m twine upload --repository pypi dist/*


Note that you should clean your dist directory first; otherwise dist/* will also match artifacts left over from earlier builds.
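The clean, build and upload steps can also be scripted. What follows is only a sketch, assuming the build and twine packages are installed and the script is run from the project root. The helper names clean_dist, build_command, upload_command and release are my own inventions, not part of either tool.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_dist(dist_dir: Path) -> None:
    """Delete the dist directory, if it exists, so stale artifacts are not uploaded."""
    shutil.rmtree(dist_dir, ignore_errors=True)


def build_command(dist_dir: Path) -> list[str]:
    """Equivalent of `python3 -m build`, writing artifacts to dist_dir."""
    return [sys.executable, "-m", "build", "--outdir", str(dist_dir)]


def upload_command(dist_dir: Path) -> list[str]:
    """Equivalent of `python3 -m twine upload --repository pypi dist/*`.

    The shell normally expands dist/*; here the glob is expanded in Python.
    """
    artifacts = sorted(str(path) for path in dist_dir.glob("*"))
    return [sys.executable, "-m", "twine", "upload", "--repository", "pypi", *artifacts]


def release(dist_dir: Path = Path("dist")) -> None:
    """Clean, build, then upload. Stops at the first failing step."""
    clean_dist(dist_dir)
    subprocess.run(build_command(dist_dir), check=True)
    subprocess.run(upload_command(dist_dir), check=True)
```

Calling release() from the project root runs all three steps; check=True means a failed build never reaches the upload.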

The Snag

The idea of using any Python build tool is not quite there yet. Poetry only implements a subset of the specification. Also, the specification is a leaky abstraction: the choice of backend still changes what ends up in your package. On Discord, Prof. Nick Radcliffe explains that the promise of using "any" build tool led him to naively choose setuptools.

Nick Radcliffe — 08/21/2023 2:37 PM

Also, in case anyone is interested (related to packaging, above) I'm currently in the process of packaging a fairly large Python codebase using new-style packaging (pyproject.toml rather than setup.py). It wasn't quite my first use of it, but this project is much more complex. Initially, I chose setuptools as the build backend, since (a) it didn't seem like it should matter much and (b) I didn't think I needed anything special. That was a big mistake for me: it turns out the setuptools back-end ignores almost everything except Python code in building your package. Whereas my package (which has over 10k files) also have about 1,000 non-python files (everything from .txt and .json to shape files, CSV files, and HTML and markdown and all sorts). Some of these are needed for testing (which for some reason some people think don't need to be distributed...as if people shouldn't care about whether the installed software works in situ, rather than just on the developer's machine in the CI system), but others are needed just in the ordinary course of using the software.  setuptools has a way to let you include extra stuff, but it's very manual and would be very error-prone for me. Anyway, the TL;DR is that I switched to Flit as the backend and everything "just worked". Not saying Flit will work better for you; but it sure as hell worked better for me!

Also, the reason I chose flit was that the third bullet in "Why use Flit?" is "Data files within a package directory are automatically included. Missing data files has been a common packaging mistake with other tools."

It also says: "The version number is taken from your package’s __version__ attribute, so that always matches the version that tools like pip see." Which also seems extremely sane (and probably I don't need to do the automatic updating of my pyproject.toml to do that).
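Whichever backend you pick, it is worth checking that your data files actually made it into the built package. A wheel is a zip archive, so a short script can list everything in it that is not Python source. This is a rough sketch; the function name non_python_files and the wheel filename in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def non_python_files(wheel_path: Path) -> list[str]:
    """List archive members that are not .py files.

    Directory entries and the *.dist-info metadata that every wheel
    contains are skipped, leaving only the packaged data files.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if not name.endswith("/")
            and not name.endswith(".py")
            and ".dist-info/" not in name
        )
```

For example, non_python_files(Path("dist/example_pkg-0.1.0-py3-none-any.whl")) should return your .txt, .json, CSV and other data files. An empty list for a package that needs them is the symptom Nick describes.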

Success has many parents...

... but it appears that PyPI packages have only one. Although the authors field accepts a list, any additional entries are ignored. The reason is that it's best practice to use a mailing list (see here).

And so my package to facilitate the creation of synthetic data now lives in PyPI much like my Java code is deployed to mvnrepository.
