Processes
How to safely run subprocesses in your plugin.
It is not safe to use subprocess.run()
like you normally would because this can break caching and will not leverage Pants's parallelism. Instead, Pants has safe alternatives with Process
and InteractiveProcess
.
Process
Overview
Process
is similar to subprocess.Popen()
. The process will run in the background, and you can run multiple processes in parallel.
from pants.engine.process import Process, ProcessResult
from pants.engine.rules import Get, rule
@rule
async def demo(...) -> Foo:
result = await Get(
ProcessResult,
Process(
argv=["/bin/echo", "hello world"],
description="Demonstrate processes.",
)
)
logger.info(result.stdout.decode())
logger.info(result.stderr.decode())
This will return a ProcessResult
object, which has the fields stdout: bytes
, stderr: bytes
, and output_digest: Digest
.
The process will run in a temporary directory and is hermetic, meaning that it cannot read any arbitrary file from your project and that it will be stripped of environment variables. This sandbox is important for reproducibility and to allow running your Process
anywhere, such as through remote execution.
Process
Setting the --no-process-execution-cleanup-local-dirs
flag will cause the sandboxes of Process
es to be preserved and logged to the console for inspection.
It can be very helpful while editing Process
definitions!
Input Files
To populate the temporary directory with files, use the parameter input_digest: Digest
. It's common to use MergeDigests
to combine multiple Digest
s into one single input_digest
.
Environment Variables
To set environment variables, use the parameter env: Mapping[str, str]
. @rules
are prevented from accessing os.environ
(it will always be empty) because this reduces reproducibility and breaks caching. Instead, either hardcode the value or add a Subsystem
option for the environment variable in question, or request the Environment
or CompleteEnvironment
types in your @rule
.
The Environment
type contains a subset of the environment that Pants was run in, and is requested via a EnvironmentRequest
that lists the variables to consume. The CompleteEnvironment
type contains the full environment of the Pants run. The vast majority of the time, @rule
s should prefer to use Environment
, to avoid re-running for every environment change.
Requesting an Environment
subset:
from pants.engine.environment import Environment, EnvironmentRequest
from pants.engine.rules import Get, rule
@rule
async def partial_env(...) -> Foo:
relevant_env = await Get(Environment, EnvironmentRequest(["RELEVANT_VAR", "PATH"]))
..
Requesting the CompleteEnvironment
:
from pants.engine.environment import CompleteEnvironment
from pants.engine.rules import rule
@rule
async def complete_env(env: CompleteEnvironment) -> Foo:
...
Output Files
To capture output files from the process, set output_files: Iterable[str]
and/or output_directories: Iterable[str]
. Then, you can use the ProcessResult.output_digest
field to get a Digest
of the result.
Timeouts
To use a timeout, set the timeout_fields: int
field. Otherwise, the process will never time out, unless the user cancels Pants.
Process
cachingBy default, a Process
will be cached to ~/.cache/pants/lmdb_store
if the exit_code
is 0
.
If it not safe to cache your Process
—usually the case when you know that a process accesses files outside of its sandbox—you can change the cacheability of your Process
using the ProcessCacheScope
parameter:
from pants.engine.process import Process, ProcessCacheScope, ProcessResult
@rule
async def demo(...) -> Foo:
process = Process(
argv=["/bin/echo", "hello world"],
description="Never cached.",
env=env,
cache_scope=ProcessCacheScope.NEVER,
)
..
ProcessCacheScope
supports other options as well, including PER_RESTART
.
FallibleProcessResult
Normally, a ProcessResult
will raise an exception if the return code is not 0
. Instead, a FallibleProcessResult
allows for any return code.
Use Get(FallibleProcessResult, Process)
if you expect that the process may fail, such as when running a linter or tests.
Like ProcessResult
, FallibleProcessResult
has the attributes stdout: bytes
, stderr: bytes
, and output_digest: Digest
, and it adds exit_code: int
.
InteractiveProcess
InteractiveProcess
is similar to subprocess.run()
. The process will run in the foreground and it is blocking.
Because the process is blocking, you may only run an InteractiveProcess
in an @goal_rule
. Your @goal_rule
must request InteractiveRunner
, and then use its method .run()
. Typically, you should only use InteractiveProcess
for things that may require user input (such as running a REPL), or which have sideeffects in a user's environment (deploying an application to a remote service).
from pants.engine.rules import goal_rule
from pants.engine.process import InteractiveRunner, InteractiveProcess
@goal_rule
async def hello_world(interactive_runner: InteractiveRunner) -> HelloWorld:
# This demonstrates opening a Python REPL.
result = interactive_runner.run(
InteractiveProcess(argv=["/usr/bin/python"])
)
return HelloWorld(exit_code=result.exit_code)
You may either set the parameter input_digest: Digest
, or you may set run_in_workspace=True
. When running in the workspace, you will have access to any file in the build root.
To set environment variables, use the parameter env: Mapping[str, str]
, like you would with Process
. You can also set hermetic_env=False
to inherit the environment variables from the parent ./pants
process.
The method will return an InteractiveProcessResult
, which has a single field exit_code: int
.