vLLM¶
vLLM is an inference and serving engine for LLMs (Large Language Models) that allows you to run AI models on your own hardware. Compared to Ollama, vLLM provides better throughput in high-concurrency scenarios.
Prerequisites¶
This guide assumes that you have a fully functioning SeaTable Server installation and have successfully installed SeaTable AI.
GPU¶
The exact requirements depend on your specific GPU. Since vLLM does not provide detailed GPU setup instructions, you can refer to Ollama's documentation instead. Depending on your GPU, this will require installing the proprietary NVIDIA drivers and the NVIDIA Container Toolkit, or adding additional arguments to the `vllm.yml` file shown below.
For Debian-based systems with NVIDIA GPUs, the following steps were carried out to successfully run vLLM:
- Install the proprietary NVIDIA drivers. The Debian Wiki contains installation instructions for the latest Debian releases.
- Remember to restart your system before proceeding with the following steps.
- Install the NVIDIA Container Toolkit.
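Before starting vLLM, it can be useful to confirm that the driver and the container toolkit are working. A minimal check could look like the following (the CUDA image tag is only an example and may need to be adjusted):

# Check that the NVIDIA driver is loaded and the GPU is visible on the host
nvidia-smi

# Check that containers can access the GPU through the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi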
HuggingFace Access Token¶
By default, vLLM will try to download models from HuggingFace. Some models are only available after accepting license or usage terms. This requires you to generate an access token that can be used to download models. You should store this access token in a safe location since it cannot be displayed again.
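If you want to verify the token before configuring vLLM, you can query the HuggingFace API with it. A minimal sketch (the token value is a placeholder for your own token):

# Returns your HuggingFace account details if the token is valid
curl -s -H "Authorization: Bearer YOUR_HUGGING_FACE_TOKEN" https://huggingface.co/api/whoami-v2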
Instructions¶
Download `.yml` files¶
Download the latest `.yml` files from GitHub by running the following command:
mkdir /opt/seatable-compose && \
cd /opt/seatable-compose && \
wget -c https://github.com/seatable/seatable-release/releases/latest/download/seatable-compose.tar.gz -O - | tar -xz -C /opt/seatable-compose && \
cp -n .env-release .env
Create `vllm.yml`¶
Create `/opt/seatable-compose/vllm.yml` with the following contents:
services:
  vllm:
    image: vllm/vllm-openai:v0.10.2
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    command: ["--model", "${VLLM_MODEL:?Variable is not set or empty}", "--enable-log-requests", "--max-model-len", "${VLLM_MAX_MODEL_LEN:-65536}"]
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_TOKEN:?Variable is not set or empty}
      - VLLM_API_KEY=${VLLM_API_KEY:?Variable is not set or empty}
      - VLLM_DEBUG_LOG_API_SERVER_RESPONSE=${VLLM_DEBUG_LOG_API_SERVER_RESPONSE:-false}
    networks:
      - frontend-net
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      start_period: 300s
    labels:
      caddy: ${VLLM_HOSTNAME:?Variable is not set or empty}
      caddy.@denied.not_0: "remote_ip ${VLLM_ALLOWED_IPS:?Variable is not set or empty} private_ranges"
      caddy.abort: "@denied"
      caddy.reverse_proxy: "{{upstreams 8000}}"
Afterwards, you should add `vllm.yml` to the `COMPOSE_FILE` variable inside your `.env` file:
sed -i "s/COMPOSE_FILE='\(.*\)'/COMPOSE_FILE='\1,vllm.yml'/" /opt/seatable-compose/.env
Make sure that `COMPOSE_FILE` already includes `caddy.yml`, since vLLM's port 8000 is not exposed by default.
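You can verify the result with grep; the variable should then contain both `caddy.yml` and `vllm.yml` (the example output below is illustrative only, the exact list of files depends on your installation):

grep COMPOSE_FILE /opt/seatable-compose/.env
# Example output (illustrative only):
# COMPOSE_FILE='caddy.yml,seatable-server.yml,vllm.yml'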
vLLM Configuration¶
Add the following configuration settings to your `.env` file to configure vLLM:
VLLM_HOSTNAME='YOUR_VLLM_HOSTNAME'
# Allowed IP addresses (multiple addresses can be separated by spaces)
VLLM_ALLOWED_IPS='SEATABLE_AI_PUBLIC_IP'
# Set this to a long random string, e.g. generated by "pwgen -s 64 1"
# This API key must then be used for any API requests
VLLM_API_KEY=''
HUGGING_FACE_TOKEN='YOUR_HUGGING_FACE_TOKEN'
# Model identifier from HuggingFace (e.g. RedHatAI/gemma-3-12b-it-quantized.w4a16)
VLLM_MODEL=''
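As a sketch, the API key can be generated with pwgen as mentioned in the comment above, or with openssl as an alternative. Once all variables are filled in, the merged compose configuration can also be validated before starting anything:

# Generate a random API key (either command works)
pwgen -s 64 1
openssl rand -hex 32

# Validate the merged compose configuration without starting any containers
cd /opt/seatable-compose && docker compose config --quiet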
SeaTable AI Configuration¶
In order to use vLLM to execute AI-based automation steps inside SeaTable, you must add the following configuration settings to the `.env` file on the host where SeaTable AI is deployed:
SEATABLE_AI_LLM_TYPE='hosted_vllm'
SEATABLE_AI_LLM_URL='https://<YOUR_VLLM_HOSTNAME>/v1'
# API key for requests to vLLM (use the same key as above)
SEATABLE_AI_LLM_KEY=''
# Model identifier from HuggingFace (e.g. RedHatAI/gemma-3-12b-it-quantized.w4a16)
SEATABLE_AI_LLM_MODEL=''
Remember to restart SeaTable AI after making any changes by running `docker compose up -d` inside `/opt/seatable-compose`.
Start vLLM¶
You can now start vLLM by running `docker compose up -d` inside `/opt/seatable-compose`.
Starting vLLM can take several minutes, depending on the model size and your computing resources. On first startup, vLLM automatically downloads the configured model from HuggingFace.
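To follow the startup progress and verify that the OpenAI-compatible endpoint is reachable, you can use the commands below. The hostname and API key are placeholders for the values from your `.env` file, and note that the reverse proxy only accepts requests from the addresses listed in `VLLM_ALLOWED_IPS` and from private ranges:

# Follow the vLLM logs while the model is downloaded and loaded
cd /opt/seatable-compose && docker compose logs -f vllm

# Once the container is healthy, list the served model via the OpenAI-compatible API
curl -H "Authorization: Bearer YOUR_VLLM_API_KEY" https://YOUR_VLLM_HOSTNAME/v1/models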
You are now able to run AI-based automation steps inside SeaTable via your local vLLM deployment!