LLM Provider Guide

This is the canonical source for LLM provider configuration. The system supports both local and cloud-based Large Language Model (LLM) providers.

Currently supported providers:

  • Ollama — local models (default and preferred)
  • OpenAI — cloud-based models

The system uses a unified configuration approach, allowing you to switch providers without changing application code.
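As an illustration of what this unified approach enables, the sketch below (the function and defaults are hypothetical, drawn from the variable names used in the profiles later in this guide) resolves a provider purely from environment variables, so application code never branches on the provider:

```python
import os

def resolve_llm_config(env=os.environ):
    """Build a provider-neutral config dict from LLM_* variables."""
    return {
        "provider": env.get("LLM_PROVIDER", "ollama"),  # Ollama is the default
        "base_url": env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        "api_key": env.get("LLM_API_KEY", "ollama"),
        "embedding_model": env.get("LLM_EMBEDDING_MODEL"),
        "chat_model": env.get("LLM_CHAT_MODEL"),
    }

# Switching providers is just a matter of changing the environment:
cfg = resolve_llm_config({"LLM_PROVIDER": "openai",
                          "LLM_BASE_URL": "https://api.openai.com/v1",
                          "LLM_API_KEY": "sk-...",
                          "LLM_EMBEDDING_MODEL": "text-embedding-3-small",
                          "LLM_CHAT_MODEL": "gpt-4o-mini"})
print(cfg["provider"])  # -> openai
```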


Required fields

Set these values in .env and reference them from config.yaml.

.env:

LLM_PROVIDER=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your-api-key
LLM_EMBEDDING_MODEL=text-embedding-3-small
LLM_CHAT_MODEL=gpt-4o-mini
VECTOR_SIZE=1536

config.yaml:

global:
  llm:
    provider: "${LLM_PROVIDER}"
    base_url: "${LLM_BASE_URL}"
    api_key: "${LLM_API_KEY}"
    models:
      embeddings: "${LLM_EMBEDDING_MODEL}"
      chat: "${LLM_CHAT_MODEL}"
    embeddings:
      vector_size: ${VECTOR_SIZE}
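The substitution mechanism that resolves ${VAR} placeholders is implementation-specific; a minimal sketch of what that expansion step might look like (function name and error handling are assumptions, not the system's actual loader):

```python
import re

def expand_env(text, env):
    """Replace ${VAR} placeholders in a config string with environment values."""
    def sub(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"missing environment variable: {name}")
        return env[name]
    return re.sub(r"\$\{(\w+)\}", sub, text)

snippet = 'provider: "${LLM_PROVIDER}"\nvector_size: ${VECTOR_SIZE}'
print(expand_env(snippet, {"LLM_PROVIDER": "openai", "VECTOR_SIZE": "1536"}))
```

Failing loudly on a missing variable is deliberate here: a silently empty api_key or vector_size tends to surface much later as a confusing runtime error.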

Provider profiles

OpenAI

LLM_PROVIDER=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-openai-key
LLM_EMBEDDING_MODEL=text-embedding-3-small
LLM_CHAT_MODEL=gpt-4o-mini
VECTOR_SIZE=1536

Azure OpenAI

LLM_PROVIDER=azure_openai
LLM_BASE_URL=https://<resource>.openai.azure.com
LLM_API_KEY=your-azure-key
LLM_API_VERSION=2024-05-01-preview
LLM_EMBEDDING_MODEL=<embedding-deployment-name>
LLM_CHAT_MODEL=<chat-deployment-name>
VECTOR_SIZE=1536

Notes:

  • Use the resource root in LLM_BASE_URL.
  • Use deployment names for LLM_*_MODEL values.
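These two notes exist because Azure OpenAI routes requests by deployment name rather than model name, with the API version carried as a query parameter. A sketch of the resulting chat-completions URL (resource and deployment names below are placeholders):

```python
def azure_chat_url(base_url, deployment, api_version):
    """Azure OpenAI addresses a specific deployment, not a model name."""
    return (f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

print(azure_chat_url("https://myresource.openai.azure.com",
                     "my-chat-deployment", "2024-05-01-preview"))
```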

Ollama

Ollama is the default and recommended provider, especially for local development and privacy-sensitive environments.

Prerequisites

To use local models with Ollama, ensure the following:

  1. Ollama is installed and running on your machine. It must be accessible at the configured endpoint
     (default: http://localhost:11434). Installation guide: https://ollama.com/download
  2. Required models are pulled locally before running the system. For embedding generation, pull:

ollama pull argus-ai/pplx-embed-v1-0.6b:fp32

Configuration:
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama # no authentication required
LLM_EMBEDDING_MODEL=argus-ai/pplx-embed-v1-0.6b:fp32
LLM_CHAT_MODEL=llama3.1:8b
VECTOR_SIZE=1024

Notes:

  • If your deployment uses native Ollama APIs, keep model names valid for your server.
  • For local/no-auth setups, LLM_API_KEY may be ignored by backend logic.
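Before starting the system, it can help to verify that the required models are actually present on the server. The sketch below checks required names against a model list of the shape returned by Ollama's native /api/tags endpoint (the helper itself is an assumption, not part of this system):

```python
import json
from urllib.request import urlopen

def missing_models(required, available):
    """Return required model names absent from the server's model list."""
    names = {m["name"] for m in available}
    return [r for r in required if r not in names]

# Against a live server at the default endpoint, one might do:
# tags = json.load(urlopen("http://localhost:11434/api/tags"))["models"]
# print(missing_models(["argus-ai/pplx-embed-v1-0.6b:fp32", "llama3.1:8b"], tags))
```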

OpenAI-compatible endpoint

LLM_PROVIDER=openai_compat
LLM_BASE_URL=https://your-endpoint.example.com/v1
LLM_API_KEY=your-endpoint-key
LLM_EMBEDDING_MODEL=your-embedding-model
LLM_CHAT_MODEL=your-chat-model
VECTOR_SIZE=<embedding-dimension> # must match your embedding model's output size

Vector size mapping

Set global.llm.embeddings.vector_size to match embedding output dimension. Common examples:

  • text-embedding-3-small -> 1536
  • text-embedding-3-large -> 3072
  • Ollama models vary by model; verify before collection creation
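A sketch of how this mapping might be enforced in code (the table covers only the examples above; everything else, including Ollama models, should come from an explicit VECTOR_SIZE override):

```python
# Dimensions for the documented examples; other models must be set explicitly.
KNOWN_VECTOR_SIZES = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def vector_size_for(model, override=None):
    """Resolve vector_size, preferring an explicit override (e.g. VECTOR_SIZE)."""
    if override is not None:
        return int(override)
    if model in KNOWN_VECTOR_SIZES:
        return KNOWN_VECTOR_SIZES[model]
    raise ValueError(f"unknown embedding dimension for {model}; set VECTOR_SIZE")
```

Getting this wrong usually fails at collection creation or at the first upsert, so verifying the dimension up front is cheaper than rebuilding the collection later.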

Legacy compatibility

OPENAI_API_KEY is still honored in some flows, but canonical configuration should use LLM_API_KEY.
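A minimal sketch of that fallback order (the helper is hypothetical; the canonical key wins when both are set):

```python
import os

def resolve_api_key(env=os.environ):
    """Prefer the canonical LLM_API_KEY; fall back to legacy OPENAI_API_KEY."""
    key = env.get("LLM_API_KEY") or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("no API key configured (set LLM_API_KEY)")
    return key
```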