Version: 2.23 (prerelease)

Using Pants in CI

Suggestions for how to use Pants to speed up your CI (continuous integration).


Examples

See the example-python repository for an example GitHub Actions workflow.

Directories to cache

The init-pants GitHub Action

If you're using GitHub Actions to run your CI workflows, then you can use our standard action to set up and cache the Pants bootstrap state. Otherwise, read on to learn how to configure this manually.

In your CI's config file, we recommend caching these directories:

  • $HOME/.cache/nce (Linux) or $HOME/Library/Caches/nce (macOS)
    This is the cache directory used by the Pants launcher binary to cache the assets, interpreters and venvs required to run Pants itself. Cache this against the Pants version, as specified in pants.toml. See the pantsbuild/example-python repo for an example of how to generate an effective cache key for this directory in GitHub Actions.
  • $HOME/.cache/pants/named_caches
    Caches used by some underlying tools. Cache this against the inputs to those tools. For the pants.backend.python backend, named caches are used by PEX, and therefore its inputs are your lockfiles. Again, see pantsbuild/example-python for an example.
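
As a rough sketch of keying the launcher cache on the Pants version, the shell function below reads pants_version from pants.toml and combines it with the OS name. The function name pants_launcher_cache_key and the pants-launcher- key prefix are hypothetical and chosen for illustration only; the pantsbuild/example-python repo remains the reference for GitHub Actions.

```shell
# Hypothetical helper (not part of Pants): print a cache key that changes
# whenever the pants_version value in pants.toml changes.
pants_launcher_cache_key() {
  config_file=${1:-pants.toml}
  # Pull the quoted value out of a line such as: pants_version = "2.23.0"
  version=$(sed -n 's/^pants_version[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "${config_file}" | head -n 1)
  if [ -z "${version}" ]; then
    echo "no pants_version found in ${config_file}" >&2
    return 1
  fi
  # Include the OS name, since the launcher cache is platform-specific.
  echo "pants-launcher-$(uname -s)-${version}"
}
```

A CI script could pass the printed key to whatever cache step its provider offers. A similar key for named_caches would hash the lockfiles instead of the version line.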

If you're not using a fine-grained remote caching service, then you may also want to preserve the local Pants cache at $HOME/.cache/pants/lmdb_store. This has to be invalidated on any file that can affect any process, e.g., hashFiles('**/*') on GitHub Actions.

Computing such a coarse hash, and saving and restoring large directories, can be unwieldy. So this may be impractical and slow on medium and large repos.

A remote cache service integrates with Pants's fine-grained invalidation and avoids these problems, and is recommended for the best CI performance.

See Troubleshooting for how to change these cache locations.

Nuking the cache when too big

In CI, the cache must be uploaded and downloaded every run. This takes time, so there is a tradeoff where too large a cache will slow down your CI.

You can use this script to nuke the cache when it gets too big:

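function nuke_if_too_big() {
  local path=$1
  local limit_mb=$2
  # Nothing to do if the cache directory does not exist yet.
  [[ -d "${path}" ]] || return 0
  # Total size of the directory in megabytes.
  local size_mb
  size_mb=$(du -m -d0 "${path}" | cut -f 1)
  if (( size_mb > limit_mb )); then
    echo "${path} is too large (${size_mb}mb), nuking it."
    # Move the directory aside first so the cache path is vacated quickly,
    # then delete the moved copy.
    local nuke_prefix nuke_path
    nuke_prefix="$(dirname "${path}")/$(basename "${path}").nuke"
    nuke_path=$(mktemp -d "${nuke_prefix}.XXXXXX")
    mv "${path}" "${nuke_path}/"
    # The glob must sit outside the quotes, otherwise it is never expanded.
    rm -rf "${nuke_prefix}".*
  fi
}

nuke_if_too_big ~/.cache/nce 512
nuke_if_too_big ~/.cache/pants/named_caches 1024

Tip: check cache performance with [stats].log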

Set the option [stats].log = true in pants.ci.toml for Pants to print metrics of your cache's performance at the end of the run, including the number of cache hits and the total time saved thanks to caching, e.g.:

local_cache_requests: 204
local_cache_requests_cached: 182
local_cache_requests_uncached: 22
local_cache_total_time_saved_ms: 307200

You can also add plugins = ["hdrhistogram"] to the [GLOBAL] section of pants.ci.toml for Pants to print histograms of cache performance, e.g. the size of blobs cached.
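
To turn the counters from the example output into a single number, a small filter can compute the local cache hit rate. This is only a sketch: the function name cache_hit_rate is hypothetical, and it assumes the counters appear one per line in the name: value form shown in the example, which may differ in your logs.

```shell
# Hypothetical helper: read Pants stats output on stdin and print the local
# cache hit rate as a percentage with one decimal place.
cache_hit_rate() {
  awk '
    # Default field splitting ignores any leading indentation.
    $1 == "local_cache_requests:"        { total = $2 }
    $1 == "local_cache_requests_cached:" { cached = $2 }
    END {
      if (total > 0) printf "%.1f\n", 100 * cached / total
      else print "n/a"
    }
  '
}
```

For the example counters (182 cached out of 204 requests), the function prints 89.2.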

Remote caching

Rather than storing your cache with your CI provider, remote caching stores the cache in the cloud, using gRPC and the open-source Remote Execution API for low-latency and fine-grained caching.

This brings several benefits over local caching:

  • All machines and CI jobs share the same cache.
  • Remote caching downloads precisely what is needed by your run—when it's needed—rather than pessimistically downloading the entire cache at the start of the run.
    • No download and upload stage for your cache.
    • No need to "nuke" your cache when it gets too big.

See Remote Caching and Execution for more information.

Autofixing goals

The goals fmt and fix will attempt to automatically correct your code and then exit with status zero if they were able to do so. Most CI systems treat this as "success". In contrast, the lint goal will not modify code and will instead exit with a non-zero status if any tool detects a problem. In other words, the lint goal is the "checking" version of fmt/fix. Prefer lint if you want your CI system to fail jobs in order to enforce linting and formatting rules.

With both approaches, you may want to shard the input targets into multiple CI jobs, for increased parallelism. See Advanced Target Selection. (This is typically less necessary when using remote caching.)

Approach #1: only run over changed files

Because Pants understands the dependencies of your code, you can use Pants to speed up your CI by only running tests and linters over files that actually changed.

We recommend running these commands in CI:

❯ pants --version  # Bootstrap Pants.
❯ pants \
    --changed-since=origin/main \
    tailor --check \
    update-build-files --check \
    lint
❯ pants \
    --changed-since=origin/main \
    --changed-dependents=transitive \
    check test

Because most linters do not care about a target's dependencies, we lint all changed files and targets, but not any dependents of those changes.

Meanwhile, tests should be rerun when any changes are made to the tests or to dependencies of those tests, so we use the option --changed-dependents=transitive. check should also run on any transitive changes.

See Advanced target selection for more information on --changed-since and alternative techniques to select targets to run in CI.

This will not handle all cases, like hooking up a new linter

For example, if you add a new plugin to Flake8, Pants will still only run over changed files, meaning you may miss some new lint issues.

For absolute correctness, you may want to use Approach #2. Alternatively, add conditional logic to your CI, e.g. that any changes to pants.toml trigger using Approach #2.
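
The conditional logic mentioned here could look roughly like the following sketch. The function name select_pants_targets is hypothetical, and treating pants.toml as the only trigger file is an assumption; you may want to add lockfiles or other tool configuration.

```shell
# Hypothetical helper: read the list of changed files on stdin (one path per
# line) and print the target-selection arguments for the check/test step.
select_pants_targets() {
  base_ref=$1
  if grep -qx 'pants\.toml'; then
    # Pants configuration changed: fall back to Approach #2 and run everything.
    echo "::"
  else
    # Otherwise use Approach #1 and only run over changes and their dependents.
    echo "--changed-since=${base_ref} --changed-dependents=transitive"
  fi
}
```

In a CI script, the changed-file list might come from git diff --name-only origin/main...HEAD piped into select_pants_targets origin/main, with the printed arguments passed unquoted to pants check test.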

GitHub Actions: use Checkout

To use --changed-since, you may want to use the Checkout action.

By default, Checkout will only fetch the latest commit; you likely want to set fetch-depth to fetch prior commits.

GitLab CI: disable shallow clones or fetch main branch

GitLab's merge pipelines make a shallow clone by default, which only contains recent commits for the feature branch being merged. That severely limits --changed-since. There are two possible workarounds:

  1. Clone the entire repository by going to the "CI / CD" settings and erasing the number from the "Git shallow clone" field of the "General pipelines" section. Don't forget to "Save changes". This has the advantage of cloning everything, which is also its biggest long-term disadvantage.
  2. A more targeted and hence light-weight intervention leaves the shallow clone setting at its default value and instead fetches the main branch as well:
     git branch -a
     git remote set-branches origin main
     git fetch --depth 1 origin main
     git branch -a

The git branch commands are only included to print out all available branches before and after fetching origin/main.

Using partial clones in CI

Shallow clones are fast, but have the disadvantage of breaking --changed-since if an insufficient amount of depth is fetched from remote. This is particularly acute for feature branches that are very out-of-date or have a large number of commits.

Partial clones are still quite fast, have the advantage of not breaking --changed-since, and don't require any depth setting. Unlike shallow clones, Git will fetch trees and blobs on-demand as it needs them without failing.

If your CI does not support partial clones directly, you can define your own custom checkout strategy:

  • Treeless: git clone --filter=tree:0 <repository>
  • Blobless: git clone --filter=blob:none <repository>

As a workaround to #20027 permission errors, you might need to run this after cloning the repo:

git config core.sshCommand "env SSH_AUTH_SOCK=$SSH_AUTH_SOCK ssh"

Approach #2: run over everything

Alternatively, you can simply run over all your code. Pants's caching means that you will not need to rerun on unchanged files.

❯ pants --version  # Bootstrap Pants.
❯ pants \
    tailor --check \
    update-build-files --check \
    lint check test ::

However, when the cache gets too big, it should be nuked (see "Directories to cache"), so your CI may end up doing more work than Approach #1.

This approach works particularly well if you are using remote caching.

Configuring Pants for CI: pants.ci.toml (optional)

Sometimes, you may want config specific to your CI, such as turning on test coverage reports. If you want CI-specific config, create a dedicated pants.ci.toml config file. For example:

pants.ci.toml
[GLOBAL]
# Colors often work in CI, but the shell is usually not a TTY so Pants
# doesn't attempt to use them by default.
colors = true

[stats]
log = true

[test]
use_coverage = true

[coverage-py]
report = ["xml"]
global_report = true

[pytest]
args = ["-vv", "--no-header"]

Then, in your CI script or config, set the environment variable PANTS_CONFIG_FILES=pants.ci.toml to use this new config file, in addition to pants.toml.
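
If the same script runs both locally and in CI, one option is to set the variable only when a CI environment is detected. The sketch below assumes your provider exports a non-empty CI environment variable, which many do but which you should verify; the function name use_pants_ci_config is hypothetical.

```shell
# Hypothetical helper: make Pants read pants.ci.toml in addition to
# pants.toml, but only when the CI environment variable is set
# (an assumption about the CI provider).
use_pants_ci_config() {
  if [ -n "${CI:-}" ]; then
    export PANTS_CONFIG_FILES=pants.ci.toml
  fi
}
```

Calling use_pants_ci_config before the first pants command keeps local runs on the default config.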

Tuning resource consumption (advanced)

Pants allows you to control its resource consumption. These options all have sensible defaults. In most cases, there is no need to change them. However, you may benefit from tuning these options.

Concurrency options:

Memory usage options:

  • pantsd: enable or disable the Pants daemon, which uses an in-memory cache to speed up subsequent runs after the first run in CI.
  • pantsd_max_memory_usage: reduce or increase the size of Pantsd's in-memory cache.

The default test runners for these CI providers have the following resources. If you are using a custom runner, e.g. enterprise, check with your CI provider.

CI Provider                  | Cores | RAM     | Docs
GitHub Actions, Linux        | 2     | 7 GB    | link
Travis, Linux                | 2     | 7.5 GB  | link
Circle CI, Linux, free plan  | 2     | 4 GB    | link
GitLab, Linux shared runners | 1     | 3.75 GB | link

Tip: automatically retry failed tests

Pants can automatically retry failed tests. This can help keep your builds passing even with flaky tests, like integration tests.

[test]
attempts_default = 3

Tip: store Pants logs as artifacts

We recommend that you configure your CI system to store the Pants log (.pants.d/workdir/pants.log) as a build artifact, so that it is available in case you need to troubleshoot CI issues.

Different CI providers and systems have different ways to configure build artifacts:

It's particularly useful to configure your CI to always upload the log, even if prior steps in your pipeline failed.
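
Where a provider only uploads files from a dedicated directory, a small helper can copy the log there. This is a sketch under assumptions: the function name collect_pants_log and the artifacts directory are hypothetical, and the actual upload step depends on your CI provider.

```shell
# Hypothetical helper: copy the Pants log from the build root into an
# artifacts directory. A missing log is reported on stderr but is not an
# error, so the helper never masks the real failure of the job.
collect_pants_log() {
  build_root=$1
  artifacts_dir=$2
  log_file="${build_root}/.pants.d/workdir/pants.log"
  mkdir -p "${artifacts_dir}"
  if [ -f "${log_file}" ]; then
    cp "${log_file}" "${artifacts_dir}/pants.log"
  else
    echo "no Pants log found at ${log_file}" >&2
  fi
}
```

Registering the helper with a shell trap on EXIT, for example trap 'collect_pants_log . artifacts' EXIT, is one way to make sure it runs even when an earlier command fails.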