Configure Python Pipeline
Configuration¶
The Python Pipeline can be configured through environment variables for further customization. The available parameters inside your .env file are:
Resources¶
| Parameter | Description | Default |
|---|---|---|
PYTHON_PROCESS_TIMEOUT | The timeout for a single script (in seconds) | 60 |
PYTHON_TRANSFER_DIRECTORY_PATH | The directory on the host where python-starter creates a folder for each individual script job | /tmp |
PYTHON_RUNNER_CONTAINER_CPUS | The number of CPUs available to each script container | 1 |
PYTHON_RUNNER_CONTAINER_MEMORY | The amount of memory available to each script container | 1g |
PYTHON_RUNNER_READ_ONLY_FILESYSTEM | Whether the root filesystem should be mounted as read-only (true or false) | true |
PYTHON_RUNNER_TMPFS_MOUNT_SIZE_IN_BYTES | Maximum size of the tmpfs mount (mounted at /tmp inside the container) for each script container (in bytes) | 104857600 (100MB) |
PYTHON_RUNNER_DROPPED_CAPABILITIES | Comma-separated list of capabilities that should be removed from the container. Please refer to the Docker documentation for more details | CAP_NET_RAW |
PYTHON_RUNNER_NO_NEW_PRIVILEGES | Whether container processes should be prevented from gaining additional privileges | true |
Logging¶
| Parameter | Description | Default |
|---|---|---|
PYTHON_SCHEDULER_LOG_LEVEL | The log level for the python-scheduler (DEBUG, INFO, WARNING, ERROR or CRITICAL) | WARNING |
PYTHON_STARTER_LOG_LEVEL | The log level for the python-starter (DEBUG, INFO, WARNING, ERROR or CRITICAL) | WARNING |
Limiting Volume Size¶
By default, any script container can use up all of the available storage resources on your disk.
You can use the following instructions to set a limit for the directory that contains all volumes. These commands should be executed on your host.
# Create an empty file
touch python-pipeline-volume
# Resize the file (e.g. 2GB)
truncate -s 2G python-pipeline-volume
# Create a new ext4 filesystem
mke2fs -t ext4 -F python-pipeline-volume
# Create a new directory which will serve as the data transfer directory
mkdir /opt/python-pipeline-transfer
# Mount the filesystem
mount python-pipeline-volume /opt/python-pipeline-transfer
# Validate your changes
df -h /opt/python-pipeline-transfer
Afterwards, you should update your .env file and restart the python-starter by running docker compose up -d:
PYTHON_TRANSFER_DIRECTORY_PATH='/opt/python-pipeline-transfer'
Mounting Additional Directories¶
If your python-runner containers require access to an additional host directory, you can use the environment variable PYTHON_RUNNER_OTHER_OPTIONS to specify additional arguments for the generated docker run command that is used to start the python-runner containers.
However, this variable is not part of the python-pipeline.yml file by default. You should therefore create an additional custom-python-pipeline.yml file with the following contents to extend the default definition of the python-starter service. Docker Compose will automatically merge these two definitions together.
services:
python-starter:
environment:
- PYTHON_RUNNER_OTHER_OPTIONS=["--volume=/path-on-the-host:/path-inside-the-container:rw"]
Issues with Quotes
You should make sure that there are no additional quotes around the square brackets ([ and ]) since this will cause issues.
Afterwards, add custom-python-pipeline.yml to the COMPOSE_FILE variable inside your .env file and restart the python-starter container by running docker compose up -d.
Troubleshooting¶
Set PYTHON_STARTER_LOG_LEVEL to DEBUG inside your .env file. This will cause the full docker run command to be logged to STDOUT, which can be retrieved by running docker compose logs -f python-starter.