Lockfiles
Securely locking down your third-party dependencies.
Third-party dependencies are typically specified via ranges of allowed versions, known as "requirements", one for each dependency, in a file such as requirements.txt or pyproject.toml. Examples of requirement strings include "mypy>1.0.0", "Django>=3.1.0,<4", and "pytest==7.1.1".
A dependency resolution tool like Pip then takes these initial requirements and attempts to find and download a consistent set of transitive dependencies that are mutually compatible with each other and with the target Python interpreter version.
When used naively, this dependency resolution process is unstable: if you run a resolve, and then some time later run another resolve on the same inputs, you may end up with a different resulting set of dependencies. This is because new versions of direct or transitive dependencies may have been published (or, in rare cases, yanked) between the two runs.
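The instability described above can be illustrated with a toy resolver. This is only a sketch: the helper names and the index contents are hypothetical, and real resolvers such as Pip apply far richer version rules than this dotted-integer comparison.

```python
def parse_version(text):
    """Turn '3.2.1' into (3, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_latest(available, lower, upper):
    """Return the newest version v with lower <= v < upper, or None."""
    lo, hi = parse_version(lower), parse_version(upper)
    candidates = [v for v in available if lo <= parse_version(v) < hi]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical package index contents at two points in time.
index_monday = ["3.1.0", "3.2.0", "4.0.0"]
index_friday = index_monday + ["3.2.1"]  # a new release was published

# The same requirement (>=3.1.0,<4) gives different answers on different days.
monday_pick = pick_latest(index_monday, "3.1.0", "4")
friday_pick = pick_latest(index_friday, "3.1.0", "4")
print(monday_pick, friday_pick)
```

Nothing in the requirement changed between the two runs; only the set of published versions did.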
This is an issue both for correctness (your code may not be compatible with the new versions) and security (a new version may contain a vulnerability). A further security concern is that repeatedly downloading even the same versions exposes you to greater risk if one of those versions is later compromised.
Dependency resolution can also be a performance bottleneck, with the same complex version compatibility logic running repeatedly, and unnecessarily.
Pants offers a solution to these issues that ensures stable, hermetic, secure builds over time, in the form of lockfiles.
What are lockfiles?
A lockfile is a metadata file that enumerates specific pinned versions of every transitive third-party dependency. It also provides the expected SHA256 hashes of the downloadable artifacts (sdists and wheels) for each dependency. A lockfile can contain dependency version information that is valid across multiple platforms and Python interpreter versions. Lockfiles can be large and complex, but fortunately Pants will generate them for you!
If you use lockfiles, and we highly recommend that you do, then Pants will use the locked transitive dependency versions in every build, and only change them when you deliberately update your lockfiles. Pants will also verify the downloaded artifacts against their expected hashes, to ensure that they haven't been compromised after the lockfile was generated.
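Conceptually, the hash check amounts to comparing a digest of the downloaded bytes with the value recorded in the lockfile. The following is a minimal sketch of that idea using Python's standard hashlib module; the function name and the sample bytes are illustrative assumptions, not the actual Pants or Pex implementation.

```python
import hashlib


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the artifact bytes match the locked hash."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


# Stand-in for a downloaded wheel; a real artifact would be read from disk.
artifact = b"example wheel contents"
locked_hash = hashlib.sha256(artifact).hexdigest()  # recorded at lock time

ok = verify_artifact(artifact, locked_hash)
tampered_ok = verify_artifact(artifact + b"!", locked_hash)
print(ok, tampered_ok)
```

Any change to the artifact after the lockfile was generated, even a single byte, produces a different digest and fails the check.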
Pants supports multiple lockfiles for different parts of your repo, via the mechanism of "resolves" - logical names given to lockfiles so that they are easy to reference.
Pants delegates lockfile creation and consumption to the Pex tool. So you may see standard lockfiles referred to as "Pex-style" lockfiles.
Getting started with resolves
First, you'll need to turn on the resolves functionality for the repo:
[python]
enable_resolves = true
Initially, Pants will assume a single resolve named python-default, which uses the repo's default interpreter constraints and references a lockfile at 3rdparty/python/default.lock. You can change the name of the default resolve, and/or the location of its lockfile, via:
[python]
enable_resolves = true
default_resolve = "myresolve"
[python.resolves]
myresolve = "path/to/mylockfile"
You generate the lockfile as follows:
$ pants generate-lockfiles
19:00:39.26 [INFO] Completed: Generate lockfile for python-default
19:00:39.29 [INFO] Wrote lockfile for the resolve `python-default` to 3rdparty/python/default.lock
The inputs used to generate a lockfile are third-party dependencies in your repo, expressed via python_requirement targets, or the python_requirements / poetry_requirements generator targets. In this case, since you haven't yet explicitly mapped your requirement targets to a resolve, they will all map to python-default, and so all serve as inputs to the default lockfile.
Multiple lockfiles
It's generally simpler to have a single resolve for the whole repository, if you can get away with it. But sometimes you may need more than one resolve, if you genuinely have conflicting requirements in different parts of your repo. For example, you may have both Django 3 and Django 4 projects in your repo.
If you need multiple resolves, you declare them in your config file:
[python]
enable_resolves = true
default_resolve = "data_science"
[python.resolves]
data_science = "3rdparty/python/data_science.lock"
webapps_django3 = "3rdparty/python/webapps_django3.lock"
webapps_django4 = "3rdparty/python/webapps_django4.lock"
Then, you partition your requirements targets across these resolves using the resolve field, and possibly the parametrize mechanism:
python_requirement(
    name="django3",
    requirements=["django>=3.1.0,<4"],
    resolve="webapps_django3",
)
python_requirement(
    name="django4",
    requirements=["django>=4.0.0,<5"],
    resolve="webapps_django4",
)
python_requirements(
    name="webapps_shared",
    source="webapps-shared-requirements.txt",
    resolve=parametrize("webapps_django3", "webapps_django4"),
)
poetry_requirements(
    name="data_science_requirements",
)
Any requirements targets that don't specify an explicit resolve= will be associated with the default resolve.
As before, you run pants generate-lockfiles to generate the lockfiles. You can use the --resolve flag to generate just a subset of lockfiles. E.g.,
$ pants generate-lockfiles --resolve=webapps_django3 --resolve=webapps_django4
19:00:39.26 [INFO] Completed: Generate lockfile for webapps_django3
19:00:39.29 [INFO] Completed: Generate lockfile for webapps_django4
19:00:40.02 [INFO] Wrote lockfile for the resolve `webapps_django3` to 3rdparty/python/webapps_django3.lock
19:00:40.17 [INFO] Wrote lockfile for the resolve `webapps_django4` to 3rdparty/python/webapps_django4.lock
Finally, you update your first-party code targets, such as python_sources, python_tests, and pex_binary, to set their resolve= field (which, as before, defaults to the default resolve).
python_sources(
    resolve="webapps_django3",
)
python_tests(
    name="tests",
    resolve="webapps_django3",
    # You can use `overrides` to change certain generated targets
    overrides={"test_django4.py": {"resolve": "webapps_django4"}},
)
If a first-party target is compatible with multiple resolves, e.g., shared utility code, you can use the parametrize mechanism with the resolve= field.
All transitive dependencies of a source target must use the same resolve. Pants's dependency inference already handles this for you by only inferring dependencies between targets that share the same resolve.
If you manually add a dependency across different resolves, Pants will error with a helpful message when you try to use that dependency.
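The same-resolve rule can be pictured as a walk over the dependency graph. The sketch below is not how Pants implements the check; the function name, the target addresses, and the dictionaries are all hypothetical stand-ins for the real build graph.

```python
def mismatched_dependencies(resolves, dependencies, root):
    """Return transitive dependencies of root that use a different resolve.

    resolves: maps each target address to its resolve name.
    dependencies: maps each target address to its direct dependencies.
    """
    expected = resolves[root]
    seen, stack, mismatched = {root}, [root], []
    while stack:
        target = stack.pop()
        for dep in dependencies.get(target, []):
            if dep in seen:
                continue
            seen.add(dep)
            stack.append(dep)
            if resolves[dep] != expected:
                mismatched.append(dep)
    return sorted(mismatched)


# Hypothetical targets and their resolves.
resolves = {
    "app/views.py": "webapps_django3",
    "app/models.py": "webapps_django3",
    "3rdparty#django3": "webapps_django3",
    "3rdparty#django4": "webapps_django4",
}
dependencies = {
    "app/views.py": ["app/models.py", "3rdparty#django3"],
    "app/models.py": ["3rdparty#django4"],  # manually added cross-resolve dependency
}

bad = mismatched_dependencies(resolves, dependencies, "app/views.py")
print(bad)
```

Here the manually added dependency on the Django 4 requirement is flagged because the root target belongs to webapps_django3.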
To reiterate an important distinction: The resolve= field on a third-party requirements target specifies that these requirements are inputs to the lockfile generator for that resolve. The resolve= field on a first-party source target specifies that this target will consume the generated lockfile for that resolve.
Interpreter constraints
A lockfile will contain dependencies for all requested Python versions. By default, these are the global constraints specified by the [python].interpreter_constraints option. You can override this per-lockfile using the [python].resolves_to_interpreter_constraints option.
Modifying lockfile generation behavior
You can use the following options to affect how the lockfile generator resolves dependencies for each resolve:
- [python].resolves_to_constraints_file: For each resolve, a path to a Pip constraints file to use when resolving that lockfile.
- [python].resolves_to_no_binary: For each resolve, a list of projects that must only resolve to sdists and not wheels. Use the value [":all:"] to disable wheels for all packages.
- [python].resolves_to_only_binary: For each resolve, a list of projects that must only resolve to wheels and not sdists. Use the value [":all:"] to disable sdists for all packages.
You can use the key __default__ to set the value for all resolves at once.
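One way to think about the __default__ key is as a fallback in a per-resolve lookup. The sketch below models that with a plain dictionary. It assumes that an explicit entry for a resolve takes precedence over __default__, which the text above does not state, and the function name and option values are hypothetical.

```python
def value_for_resolve(option, resolve):
    """Look up a per-resolve option value, falling back to __default__.

    Assumption: an explicit per-resolve entry overrides __default__.
    """
    if resolve in option:
        return option[resolve]
    return option.get("__default__")


# Mirrors a hypothetical [python.resolves_to_no_binary] table in pants.toml.
resolves_to_no_binary = {
    "__default__": [":all:"],
    "data_science": ["numpy"],
}

default_value = value_for_resolve(resolves_to_no_binary, "webapps_django3")
override_value = value_for_resolve(resolves_to_no_binary, "data_science")
print(default_value, override_value)
```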
Updating lockfiles
If you modify the third-party requirements of a resolve then you must regenerate its lockfile by running the generate-lockfiles goal. Pants will display an error if a lockfile is no longer compatible with its updated requirements.
You can have Pants display a useful summary of what changed between the old and new versions of the generated lockfile, by setting:
[generate-lockfiles]
diff = true
In theory, when you generate a lockfile, you should want to audit it for bugs, compliance and security concerns. In practice this is intractable to do manually. We would like to integrate with automated auditing tools and services in the future, so watch this space for updates, or feel free to reach out on Slack if this is important to you and you'd like to work on it.
Lockfile subsetting
When consuming a lockfile, Pants uses only the necessary subset of its transitive dependencies in each situation.
For example, when running a test, only the requirements actually used (transitively) by that test will be present on the sys.path. This means that a test run won't be invalidated if unrelated requirements have changed, which improves cache hit rates. The same holds true when running and packaging code.
You can override this subsetting behavior by setting the [python].run_against_entire_lockfile option.
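The subsetting described above is essentially a reachability computation over the locked dependency graph. The following sketch shows the idea only; the function name and the lockfile contents are hypothetical, and the real subsetting is performed by Pex on its own lockfile format.

```python
def transitive_subset(locked_deps, roots):
    """Return the package names reachable from roots in the locked graph."""
    needed, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(locked_deps.get(name, []))
    return needed


# Hypothetical lockfile contents: package -> packages it requires.
locked_deps = {
    "django": ["asgiref", "sqlparse"],
    "asgiref": [],
    "sqlparse": [],
    "requests": ["urllib3", "idna"],
    "urllib3": [],
    "idna": [],
}

# A test that only imports django needs none of the requests packages.
subset = transitive_subset(locked_deps, ["django"])
print(sorted(subset))
```

Because requests and its dependencies are outside the subset, changing their locked versions would not affect this test's inputs.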
Lockfiles for tools
Pants's Python support typically involves invoking underlying tools, such as pytest, mypy, black, etc., in subprocesses. Almost all these tools are themselves written in Python and thus depended on via requirement strings, just like your third-party import dependencies.
It is strongly recommended that these tools be installed from a hermetic lockfile, for the same security and stability reasons stated above. In fact, Pants ships with built-in lockfiles for every Python tool it uses, and uses them automatically.
The only time you need to think about this is if you want to customize the tool requirements that Pants uses. This might be the case if you want to modify the version of a tool or add extra requirements (for example, tool plugins).
Tools can also be installed from a specific resolve instead of from the built-in lockfile. This is useful for specifying a version of the tool and including extra packages. To do this, set install_from_resolve and requirements on the tool's config section:
[python.resolves]
pytest = "3rdparty/python/pytest.lock"
[pytest]
install_from_resolve = "pytest" # Use this resolve's lockfile.
requirements = ["//3rdparty/python:pytest"] # Use these requirements from the lockfile.
Then set up the resolve's inputs:
3rdparty/python/BUILD:
python_requirements(
    name="pytest",
    source="pytest-requirements.txt",
    resolve="pytest",
)
3rdparty/python/pytest-requirements.txt:
# The default requirements (possibly with custom versions).
pytest==7.1.1
pytest-cov>=2.12,!=2.12.1,<3.1
pytest-xdist>=2.5,<3
ipdb
# Our extra requirement.
pytest-myplugin>=1.2.0,<2
And generate its custom lockfile:
$ pants generate-lockfiles --resolve=pytest
19:00:39.26 [INFO] Completed: Generate lockfile for pytest
19:00:39.29 [INFO] Wrote lockfile for the resolve `pytest` to 3rdparty/python/pytest.lock
Note that some tools, such as Flake8 and Bandit, must run on a Python interpreter that is compatible with the code they operate on. In this case you must ensure that the interpreter constraints for the tool's resolve are the same as those for the code in question.
Invalidating tool lockfiles
Pants will verify that any requirements set in the requirements option are provided by the lockfile specified by install_from_resolve, and will error if not. This lets you ensure that you don't inadvertently use an older version of a tool if you update its requirements but forget to regenerate the lockfile.
The requirements option can either list requirement strings, such as pytest==7.3.1, or target addresses, such as //3rdparty/python:pytest (the // prefix tells Pants that these are target addresses). The latter is particularly useful as it allows you to avoid specifying the requirements redundantly in two places. Instead, the target can serve as both an input to the lockfile generator and as the requirements to verify.
Pants will only use the given requirements from the lockfile. If you don't set requirements, Pants will use the entire lockfile, and won't validate that it provides the desired tool at the desired version.
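The validation described in this section can be sketched as a comparison between the requested requirements and the versions pinned in the lockfile. This is a simplified illustration that only understands exact name==version pins; the function name and the locked versions are hypothetical, and Pants's real check handles full requirement syntax.

```python
def check_tool_requirements(requirements, locked_versions):
    """Return the requirements that the lockfile fails to provide.

    Only handles exact pins of the form name==version.
    """
    missing = []
    for requirement in requirements:
        name, _, version = requirement.partition("==")
        if locked_versions.get(name.lower()) != version:
            missing.append(requirement)
    return missing


# Hypothetical pinned versions read from the tool's lockfile.
locked_versions = {"pytest": "7.1.1", "pytest-cov": "3.0.0"}

# The requirement was bumped but the lockfile was not regenerated.
stale = check_tool_requirements(["pytest==7.3.1"], locked_versions)
fresh = check_tool_requirements(["pytest==7.1.1"], locked_versions)
print(stale, fresh)
```

A non-empty result corresponds to the case where the lockfile is stale with respect to the tool's requirements and needs to be regenerated.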
Sharing lockfiles between tools and code
In some cases a tool also provides a runtime library. For example, pytest is run as a tool in a subprocess, but your tests can also import pytest to access testing functionality.
Rather than repeat the same requirement in two different resolves, you can point the tool at an existing resolve that you also use for your code:
[pytest]
install_from_resolve = "python-default"
Of course, you have to ensure that this resolve does in fact provide appropriate versions of the tool.
As above, you will want to point requirements to the subset of targets representing the tool's requirements, so that Pants can verify that the resolve provides them, and can use just the needed subset without unnecessary invalidation:
[pytest]
install_from_resolve = "python-default"
requirements = [
    "//3rdparty/python#pytest",
    "//3rdparty/python#pytest-cov",
    "//3rdparty/python#pytest-xdist",
    "//3rdparty/python#pytest-myplugin",
    "//3rdparty/python#ipdb",
]
You can have a single resolve for all your tools, or even a single resolve for all your tools and code! This may be useful if you want to export a virtualenv that includes all your dependencies and all the tools you use.
But note that the more frequently you update a lockfile the more likely it is that unrelated updates will come along for the ride, since Pants does not yet support an "only-if-needed" upgrade strategy.
There is an older way of generating tool lockfiles, by setting the version and extra_requirements fields on a tool's config. This method is deprecated in favor of the standard one described above.
If you're using this deprecated tool lockfile generation mechanism, please switch to using the one described here as soon as possible!
Manually generating lockfiles
Rather than using generate-lockfiles to generate Pex-style lockfiles, you can generate them manually. This can be useful, for example, when adopting Pants in a repository that already uses Poetry: you can produce the lockfile by running poetry export --dev.
Manually generated lockfiles must either use Pex's JSON format or use pip's requirements.txt-style format (ideally with --hash entries for better supply chain security).
For example:
freezegun==1.2.0 \
--hash=sha256:93e90676da3... \
--hash=sha256:e19563d0b05...
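If you maintain a requirements.txt-style lockfile by hand, you may want to confirm that every entry carries at least one --hash option. The sketch below shows one way to do that with plain string handling; the function name is hypothetical, the sample hash is a made-up placeholder, and the parsing covers only simple pinned lines with backslash continuations.

```python
def entries_without_hashes(lockfile_text):
    """Return pinned requirements that have no --hash option."""
    # Join backslash-continued lines into one logical line per requirement.
    logical_lines = lockfile_text.replace("\\\n", " ").splitlines()
    unhashed = []
    for line in logical_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "--hash=" not in line:
            unhashed.append(line.split()[0])
    return unhashed


# Hypothetical lockfile text; the hash value is a placeholder, not a real digest.
sample = (
    "freezegun==1.2.0 \\\n"
    "    --hash=sha256:0000placeholder\n"
    "six==1.16.0\n"
)

unhashed = entries_without_hashes(sample)
print(unhashed)
```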
To use a manually generated lockfile for a resolve, point the resolve to that lockfile's path in [python].resolves. Then set [python].resolves_generate_lockfiles to false.
Warning: it will likely be slower to install manually generated user lockfiles than Pex ones, because Pants cannot as efficiently extract the subset of requirements used for a particular task; see the option [python].run_against_entire_lockfile.