Debugging and benchmarking
Some techniques to figure out why Pants is behaving the way it is.
Benchmarking with hyperfine
We use hyperfine to benchmark, especially comparing before and after to see the impact of a change: https://github.com/sharkdp/hyperfine.
When benchmarking, you must decide if you care about cold cache performance vs. warm cache (or both). If cold, use --no-pantsd --no-local-cache. If warm, use hyperfine's option --warmup=1.
For example:
❯ hyperfine --warmup=1 --runs=5 'pants list ::'
❯ hyperfine --runs=5 'pants --no-pantsd --no-local-cache lint ::'
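To reduce a before/after comparison to a single number, one option is to have hyperfine export its results and compare the means. The sketch below is illustrative only. It assumes both commands were benchmarked in one hyperfine invocation with its --export-json option, and that the exported file has a top-level "results" list whose entries carry "command" and "mean" (in seconds) fields. The function name, command names, and sample numbers are hypothetical.

```python
import json


def speedup(export_json: str) -> float:
    """Return how many times faster the second benchmarked command is than the first.

    Assumes `export_json` is the text of a file written by
    `hyperfine --export-json`, with one entry per command under "results".
    """
    results = json.loads(export_json)["results"]
    if len(results) != 2:
        raise ValueError("expected exactly two benchmarked commands")
    before, after = results
    return before["mean"] / after["mean"]


# Hypothetical sample data, not real measurements.
sample = json.dumps(
    {
        "results": [
            {"command": "pants-before list ::", "mean": 3.0},
            {"command": "pants-after list ::", "mean": 2.0},
        ]
    }
)
print(f"{speedup(sample):.2f}x")  # prints "1.50x"
```

A value above 1 means the second command was faster on average. With only a handful of runs, it is worth also looking at the spread hyperfine reports before trusting the ratio.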
CPU profiling with py-spy
py-spy is a sampling profiler which can also be used to compare the impact of a change before and after: https://github.com/benfred/py-spy.
To profile with py-spy:
- Activate Pants' development venv:
source ~/.cache/pants/pants_dev_deps/<your platform dir>/bin/activate
- Install py-spy into it:
pip install py-spy
- Run Pants with py-spy (be sure to disable pantsd):
PYTHONPATH=src/python NO_SCIE_WARNING=1 py-spy record --subprocesses -- python -m pants.bin.pants_loader --no-pantsd <pants args>
- If you're running Pants from sources on code in another repo, set PYTHONPATH to the src/python dir in the pants repo, and set PANTS_VERSION to the current dev version in that repo.
- On macOS you may have to run this as root, under sudo.
The default output is a flamegraph. py-spy can also output speedscope (https://github.com/jlfwong/speedscope) JSON with the --format speedscope flag. The resulting file can be uploaded to https://www.speedscope.app/ which provides a per-process, interactive, detailed UI.
Additionally, to profile the Rust code, the --native flag can be passed to py-spy as well. The resulting output will contain frames from Pants Rust code.
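If you switch between these variants often, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of Pants: the function name and its parameters are hypothetical, and it only combines the flags described in this section.

```python
import shlex


def py_spy_command(pants_args, speedscope=False, native=False):
    """Build a py-spy invocation for profiling Pants run from sources."""
    cmd = ["py-spy", "record", "--subprocesses"]
    if speedscope:
        # Emit speedscope JSON instead of the default flamegraph.
        cmd += ["--format", "speedscope"]
    if native:
        # Include frames from the Pants Rust code.
        cmd.append("--native")
    # pantsd must be disabled while profiling.
    cmd += ["--", "python", "-m", "pants.bin.pants_loader", "--no-pantsd"]
    cmd += list(pants_args)
    return cmd


print(shlex.join(py_spy_command(["list", "::"], speedscope=True, native=True)))
```

The returned list can be passed to subprocess.run together with an environment that sets PYTHONPATH=src/python and NO_SCIE_WARNING=1, as in the command shown earlier.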
Memory profiling with memray
memray is a Python memory profiler that can also track allocation in native extension modules: https://bloomberg.github.io/memray/.
To profile with memray:
- Activate Pants' development venv:
source ~/.cache/pants/pants_dev_deps/<your platform dir>/bin/activate
- Install memray into it:
pip install memray
- Run Pants with memray:
PYTHONPATH=src/python NO_SCIE_WARNING=1 memray run --native -o output.bin -m pants.bin.pants_loader --no-pantsd <pants args>
- If you're running Pants from sources on code in another repo, set PYTHONPATH to the src/python dir in the pants repo, and set PANTS_VERSION to the current dev version in that repo.
Note that in many cases it will be easier and more useful to run Pants with the --stats-memory-summary flag.
Debugging rule code with a debugger in VSCode
Running pants with the PANTS_DEBUG environment variable set will use debugpy (https://github.com/microsoft/debugpy) to start a Debug-Adapter server (https://microsoft.github.io/debug-adapter-protocol/) which will wait for a client connection before running Pants.
You can connect any Debug-Adapter-compliant editor (such as VSCode) as a client, and use breakpoints, inspect variables, run code in a REPL, and break on exceptions in your rule code.
NOTE: PANTS_DEBUG doesn't work with the pants daemon, so --no-pantsd must be specified.
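On the VSCode side, the client is typically described by an "attach" entry in .vscode/launch.json. The sketch below generates one such entry. Treat it as an assumption-laden example: the host and port are placeholders (use whatever address Pants reports when it starts waiting), the configuration name is hypothetical, and the "debugpy" type reflects recent versions of the VSCode Python debugger extension (older versions used "python").

```python
import json


def attach_config(host="localhost", port=5678):
    """Return a VSCode launch configuration that attaches to a waiting debugpy server.

    The default host and port are assumptions, not values confirmed by Pants.
    """
    return {
        "name": "Attach to Pants (hypothetical name)",
        "type": "debugpy",
        "request": "attach",
        "connect": {"host": host, "port": port},
    }


launch = {"version": "0.2.0", "configurations": [attach_config()]}
print(json.dumps(launch, indent=2))
```

With such an entry in place, you would start Pants with PANTS_DEBUG set and --no-pantsd, wait for it to pause, and then launch the attach configuration from the editor.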
Debugging rule code with a debugger in PyCharm
You'll have to follow a different procedure until PyCharm adds Debug-Adapter support:
- Add a requirement on pydevd-pycharm in your local clone of the pants source in 3rdparty/python/requirements.txt
- Add this snippet where you want to break:
import pydevd_pycharm
pydevd_pycharm.settrace('localhost', port=5000, stdoutToServer=True, stderrToServer=True)
- Start a remote debugging session
- Run pants from your clone. The build will automatically install the new requirement. For example:
example-python$ PANTS_SOURCE=<path_to_your_pants_clone> pants --no-pantsd test ::
Identifying the impact of Python's GIL (on macOS)
Obtaining Full Thread Backtraces
Pants runs as a Python program that calls into a native Rust library. In debugging locking and deadlock issues, it is useful to capture dumps of the thread stacks in order to figure out where a deadlock may be occurring.
One-time setup:
- Ensure that gdb is installed.
- Ubuntu:
sudo apt install gdb
- Ensure that the kernel is configured to allow debuggers to attach to processes that are not in the same parent/child process hierarchy.
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
- To make the change permanent, add a file to /etc/sysctl.d named 99-ptrace.conf with contents kernel.yama.ptrace_scope = 0. Note: This is a security exposure if you are not normally debugging processes across the process hierarchy.
- Ensure that the debug info for your system Python binary is installed.
- Ubuntu:
sudo apt install python3-dbg
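Before attaching gdb, it can be useful to confirm that the ptrace_scope setting from the one-time setup is actually in effect. The following is a small sketch under stated assumptions: the function name is hypothetical, it only checks for the value 0 that the setup steps write, and it treats a missing file as "no Yama restriction".

```python
from pathlib import Path

PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def ptrace_scope_is_open(scope_file: Path = PTRACE_SCOPE) -> bool:
    """Return True if ptrace_scope is 0, the value the setup steps configure."""
    try:
        value = int(scope_file.read_text().strip())
    except FileNotFoundError:
        # Assumption: no Yama setting on this system, so nothing to lift.
        return True
    return value == 0


if __name__ == "__main__":
    if ptrace_scope_is_open():
        print("ptrace_scope allows attaching across the process hierarchy")
    else:
        print("ptrace_scope is restrictive; see the one-time setup steps")
```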
Dumping thread stacks:
- Find the Pants process and its process ID (the listing may include pantsd if pantsd is enabled).
- Run:
ps -ef | grep pants
- Invoke gdb with the python binary and the process ID:
- Run:
gdb /path/to/python/binary PROCESS_ID
- Enable logging to write the thread dump to gdb.txt:
set logging on
- Dump all thread backtraces:
thread apply all bt
- If you use pyenv to manage your Python install, a gdb script will exist in the same directory as the Python binary. Source it into gdb:
source ~/.pyenv/versions/3.8.5/bin/python3.8-gdb.py
(if using version 3.8.5)
- Dump all Python stacks:
thread apply all py-bt
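The interactive steps above can also be run non-interactively using gdb's -batch and -ex options. The sketch below builds such a command line. It is an illustration rather than the documented procedure: the function name, the binary path, and the process ID are hypothetical, and you should check the output against an interactive session the first time.

```python
import shlex


def gdb_dump_command(python_binary, pid, gdb_script=None):
    """Build a gdb command line that dumps thread backtraces and exits.

    If `gdb_script` (for example the pyenv-provided python*-gdb.py) is given,
    it is sourced and Python-level stacks are dumped as well.
    """
    cmd = ["gdb", "-batch", "-ex", "thread apply all bt"]
    if gdb_script is not None:
        cmd += ["-ex", f"source {gdb_script}", "-ex", "thread apply all py-bt"]
    cmd += [python_binary, str(pid)]
    return cmd


# Hypothetical binary path and process ID, for illustration only.
print(shlex.join(gdb_dump_command("/usr/bin/python3", 12345)))
```

In batch mode gdb writes the backtraces to standard output, so redirecting that output to a file takes the place of set logging on.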