
vLLM

vLLM is an inference and serving engine for LLMs (Large Language Models) that allows you to run AI models on your own hardware. Compared to Ollama, vLLM provides better throughput under high-concurrency scenarios.

Prerequisites

This guide assumes that you have a fully functioning SeaTable Server installation and have successfully installed SeaTable AI.

GPU

The exact requirements depend on your GPU. Since vLLM does not provide detailed setup instructions, you can refer to Ollama's documentation instead. Depending on your GPU, this will require installing the proprietary NVIDIA drivers and the NVIDIA Container Toolkit or adding additional arguments to the vllm.yml file shown below.

On Debian-based systems with NVIDIA GPUs, the following steps were necessary to run vLLM successfully:

  1. Install the proprietary NVIDIA drivers. The Debian Wiki contains installation instructions for the latest Debian releases.
  2. Restart your system before proceeding with the next step.
  3. Install the NVIDIA Container Toolkit. You can verify the setup with the commands shown below.
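
To confirm that the driver and the container toolkit are working before starting vLLM, a quick check like the following can help. The CUDA image tag is only an illustrative example, not something this guide depends on; use any recent tag that matches your driver version.

# Confirm the NVIDIA driver is loaded on the host
nvidia-smi

# Confirm containers can access the GPU (example image tag, adjust as needed)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi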

HuggingFace Access Token

By default, vLLM downloads models from HuggingFace. Some models are only available after accepting their license or usage terms; downloading these gated models requires generating an access token. Store this access token in a safe location, since it cannot be displayed again.
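
If you want to confirm that the token works before starting vLLM, one option is to query HuggingFace's whoami-v2 endpoint (the same endpoint the huggingface_hub client uses). Replace hf_xxx with your own token:

curl -s -H "Authorization: Bearer hf_xxx" https://huggingface.co/api/whoami-v2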

Instructions

Download .yml files

Download the latest .yml files from GitHub by running the following command:

mkdir /opt/seatable-compose && \
cd /opt/seatable-compose && \
wget -c https://github.com/seatable/seatable-release/releases/latest/download/seatable-compose.tar.gz -O - | tar -xz -C /opt/seatable-compose && \
cp -n .env-release .env

Create vllm.yml

Create /opt/seatable-compose/vllm.yml with the following contents:

services:
  vllm:
    image: vllm/vllm-openai:v0.10.2
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    command: ["--model", "${VLLM_MODEL:?Variable is not set or empty}", "--enable-log-requests", "--max-model-len", "${VLLM_MAX_MODEL_LEN:-65536}"]
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_TOKEN:?Variable is not set or empty}
      - VLLM_API_KEY=${VLLM_API_KEY:?Variable is not set or empty}
      - VLLM_DEBUG_LOG_API_SERVER_RESPONSE=${VLLM_DEBUG_LOG_API_SERVER_RESPONSE:-false}
    networks:
      - frontend-net
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      start_period: 300s
    labels:
      caddy: ${VLLM_HOSTNAME:?Variable is not set or empty}
      caddy.@denied.not_0: "remote_ip ${VLLM_ALLOWED_IPS:?Variable is not set or empty} private_ranges"
      caddy.abort: "@denied"
      caddy.reverse_proxy: "{{upstreams 8000}}"
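
Depending on your hardware, you may want to extend the command line in vllm.yml with additional vLLM options, for example to split a model across several GPUs or to cap GPU memory usage. --tensor-parallel-size and --gpu-memory-utilization are standard vLLM server options, but the values shown here are placeholders; the right settings depend entirely on your setup:

    # Example only: extend the command with additional vLLM options as needed
    command: ["--model", "${VLLM_MODEL:?Variable is not set or empty}",
              "--enable-log-requests",
              "--max-model-len", "${VLLM_MAX_MODEL_LEN:-65536}",
              "--tensor-parallel-size", "2",
              "--gpu-memory-utilization", "0.90"]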

Afterwards, you should add vllm.yml to the COMPOSE_FILE variable inside your .env file:

sed -i "s/COMPOSE_FILE='\(.*\)'/COMPOSE_FILE='\1,vllm.yml'/" /opt/seatable-compose/.env

Make sure that COMPOSE_FILE already includes caddy.yml since vLLM's port 8000 is not exposed by default.
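
You can double-check the result with a quick grep; the exact list of .yml files depends on your installation and will differ from this example:

grep COMPOSE_FILE /opt/seatable-compose/.env
# Example output: COMPOSE_FILE='caddy.yml,seatable-server.yml,vllm.yml'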

vLLM Configuration

Add the following configuration settings to your .env file to configure vLLM:

VLLM_HOSTNAME='YOUR_VLLM_HOSTNAME'

# Allowed IP addresses (multiple addresses can be separated by spaces)
VLLM_ALLOWED_IPS='SEATABLE_AI_PUBLIC_IP'

# Set this to a long random string, e.g. generated by "pwgen -s 64 1"
# This API key must then be used for any API requests
VLLM_API_KEY=''

HUGGING_FACE_TOKEN='YOUR_HUGGING_FACE_TOKEN'

# Model identifier from HuggingFace (e.g. RedHatAI/gemma-3-12b-it-quantized.w4a16)
VLLM_MODEL=''
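
If pwgen is not installed on your system, any other source of random characters works just as well for VLLM_API_KEY, for example:

# Generate a long random API key (alternative to pwgen)
openssl rand -hex 32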

SeaTable AI Configuration

To use vLLM for AI-based automation steps inside SeaTable, add the following configuration settings to the .env file on the host where SeaTable AI is deployed:

SEATABLE_AI_LLM_TYPE='hosted_vllm'

SEATABLE_AI_LLM_URL='https://<YOUR_VLLM_HOSTNAME>/v1'

# API key for requests to vLLM (use the same key as above)
SEATABLE_AI_LLM_KEY=''

# Model identifier from HuggingFace (e.g. RedHatAI/gemma-3-12b-it-quantized.w4a16)
SEATABLE_AI_LLM_MODEL=''

Remember to restart SeaTable AI after making any changes by running docker compose up -d inside /opt/seatable-compose.

Start vLLM

You can now start vLLM by running docker compose up -d inside /opt/seatable-compose.

Starting vLLM can take several minutes, depending on the model size and your computing resources. On first startup, vLLM automatically downloads the configured model from HuggingFace.
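
To follow the download and startup progress, you can watch the container logs; once the container reports healthy, you can also list the served models through Caddy. The hostname and API key below are placeholders for the values in your .env file, and the request must come from an IP address covered by VLLM_ALLOWED_IPS:

# Follow the vLLM startup logs
docker compose logs -f vllm

# List the served models (replace the placeholders with your values)
curl -H "Authorization: Bearer YOUR_VLLM_API_KEY" https://YOUR_VLLM_HOSTNAME/v1/models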

You are now able to run AI-based automation steps inside SeaTable via your local vLLM deployment!
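
As a final end-to-end check, you can send a request to vLLM's OpenAI-compatible chat completions endpoint from an allowed IP address; the model name, hostname, and API key below are placeholders for the values in your .env files:

curl https://YOUR_VLLM_HOSTNAME/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "YOUR_VLLM_MODEL",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'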