Configure Python Pipeline

Configuration

The Python Pipeline can be customized through environment variables. The following parameters are available inside your .env file:

Security hardening

Python scripts execute as user-supplied code. Beyond the parameters below, review the security considerations on the installation page — in particular the risk of scripts reaching the cloud metadata endpoint and harvesting the host's machine-identity credentials.

Resources

Parameter	Description	Default
`PYTHON_PROCESS_TIMEOUT`	The timeout for a single script (in seconds)	`60`
`PYTHON_TRANSFER_DIRECTORY_PATH`	The directory on the host where python-starter creates a folder for each individual script job	`/tmp`
`PYTHON_RUNNER_CONTAINER_CPUS`	The number of CPUs available to each script container	`1`
`PYTHON_RUNNER_CONTAINER_MEMORY`	The amount of memory available to each script container	`1g`
`PYTHON_RUNNER_READ_ONLY_FILESYSTEM`	Whether the root filesystem should be mounted as read-only (`true` or `false`)	`true`
`PYTHON_RUNNER_TMPFS_MOUNT_SIZE_IN_BYTES`	Maximum size of the `tmpfs` mount (mounted at `/tmp` inside the container) for each script container (in bytes)	`104857600` (100 MiB)
`PYTHON_RUNNER_DROPPED_CAPABILITIES`	Comma-separated list of capabilities that should be removed from the container. Please refer to the Docker documentation for more details	`CAP_NET_RAW`
`PYTHON_RUNNER_NO_NEW_PRIVILEGES`	Whether container processes should be prevented from gaining additional privileges	`true`
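Since PYTHON_RUNNER_TMPFS_MOUNT_SIZE_IN_BYTES expects a raw byte count, it can help to compute the value instead of typing it by hand. The following is a minimal sketch; the helper name mib_to_bytes and the 256 MiB figure are illustrative assumptions, not part of the Python Pipeline.

```shell
# Hypothetical helper: convert a size in MiB to bytes
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# The documented default of 100 MiB
mib_to_bytes 100   # prints 104857600

# Example: print a line for the .env file with an assumed 256 MiB limit
echo "PYTHON_RUNNER_TMPFS_MOUNT_SIZE_IN_BYTES=$(mib_to_bytes 256)"
```

The printed line can be copied into your .env file as-is.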

Logging

Parameter	Description	Default
`PYTHON_SCHEDULER_LOG_LEVEL`	The log level for the python-scheduler (`DEBUG`, `INFO`, `WARNING`, `ERROR` or `CRITICAL`)	`WARNING`
`PYTHON_STARTER_LOG_LEVEL`	The log level for the python-starter (`DEBUG`, `INFO`, `WARNING`, `ERROR` or `CRITICAL`)	`WARNING`

Limiting Volume Size

By default, any script container can use up all of the available storage resources on your disk.

The following commands limit the size of the directory that contains all volumes by backing it with a fixed-size filesystem image. Run them on your host.

# Create an empty file
touch python-pipeline-volume

# Resize the file (e.g. 2GB)
truncate -s 2G python-pipeline-volume

# Create a new ext4 filesystem
mke2fs -t ext4 -F python-pipeline-volume

# Create a new directory which will serve as the data transfer directory
mkdir -p /opt/python-pipeline-transfer

# Mount the filesystem through a loop device (requires root privileges)
mount -o loop python-pipeline-volume /opt/python-pipeline-transfer

# Validate your changes
df -h /opt/python-pipeline-transfer

Afterwards, set the following variable in your .env file and restart the python-starter by running docker compose up -d:

PYTHON_TRANSFER_DIRECTORY_PATH='/opt/python-pipeline-transfer'
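To confirm that the transfer directory really sits on its own size-limited filesystem, you can check whether it is a mount point. This is a sketch only: check_transfer_dir is a hypothetical helper, and it assumes the mountpoint utility from util-linux is installed on the host.

```shell
# Path configured as PYTHON_TRANSFER_DIRECTORY_PATH (falls back to the example path)
transfer_dir="${PYTHON_TRANSFER_DIRECTORY_PATH:-/opt/python-pipeline-transfer}"

# Hypothetical helper: report whether a directory is its own mount point
check_transfer_dir() {
  if mountpoint -q "$1" 2>/dev/null; then
    echo "$1 is a separate mount; the size limit applies"
  else
    echo "$1 is not a mount point; the size limit does not apply"
    return 1
  fi
}

# "|| true" keeps the snippet usable in scripts that run with "set -e"
check_transfer_dir "$transfer_dir" || true
```

If the check reports that the directory is not a mount point, for example after a reboot of the host, the mount command has to be run again before the limit takes effect.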

Mounting Additional Directories

If your python-runner containers require access to an additional host directory, you can use the environment variable PYTHON_RUNNER_OTHER_OPTIONS to specify additional arguments for the generated docker run command that is used to start the python-runner containers.

However, this variable is not part of the python-pipeline.yml file by default. You should therefore create an additional custom-python-pipeline.yml file with the following contents to extend the default definition of the python-starter service. Docker Compose will automatically merge these two definitions together.

services:
  python-starter:
    environment:
      - PYTHON_RUNNER_OTHER_OPTIONS=["--volume=/path-on-the-host:/path-inside-the-container:rw"]

Issues with Quotes

Do not put additional quotes around the square brackets ([ and ]), since this will cause issues.

Afterwards, add custom-python-pipeline.yml to the COMPOSE_FILE variable inside your .env file and restart the python-starter container by running docker compose up -d.
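The new COMPOSE_FILE value can be built with a small helper that avoids listing the same file twice. This is a sketch under assumptions: append_compose_file is a hypothetical helper, base.yml is a placeholder for whatever your .env already lists, and the comma separator assumes that COMPOSE_PATH_SEPARATOR is set to a comma in your setup (pass a different separator as the third argument otherwise).

```shell
# Hypothetical helper: append a file to a separator-delimited COMPOSE_FILE value
# unless it is already listed. Arguments: current value, file to add, separator.
append_compose_file() {
  current="$1"
  extra="$2"
  sep="${3:-,}"
  case "$sep$current$sep" in
    *"$sep$extra$sep"*) printf '%s\n' "$current" ;;                    # already present
    *) printf '%s\n' "${current:+$current$sep}$extra" ;;               # append
  esac
}

# Placeholder value; your .env will list different files
append_compose_file "base.yml,python-pipeline.yml" "custom-python-pipeline.yml"
```

After updating .env, running docker compose config prints the merged definition, which is a convenient way to check that PYTHON_RUNNER_OTHER_OPTIONS appears under the python-starter service.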

Troubleshooting

Set PYTHON_STARTER_LOG_LEVEL to DEBUG inside your .env file. This will cause the full docker run command to be logged to STDOUT, which can be retrieved by running docker compose logs -f python-starter.
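With DEBUG logging enabled the output can become noisy, so it may help to filter for the generated command. The sketch below assumes that the logged line contains the literal text "docker run"; the exact log format is not specified here, and filter_docker_run is a hypothetical helper.

```shell
# Hypothetical helper: keep only log lines that mention the generated docker run command
filter_docker_run() {
  grep -F 'docker run'
}

# Usage on the host (requires the running stack):
#   docker compose logs --no-color python-starter | filter_docker_run
```

The --no-color flag removes terminal color codes so that the filtered lines are easier to copy.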