LLM Provider Guide

This is the canonical source for LLM provider configuration. The system supports both local and cloud-based Large Language Model (LLM) providers.

Currently supported providers:

  • Ollama — local models (default and preferred)
  • OpenAI — cloud-based models

The system uses a unified configuration approach, allowing you to switch providers without changing application code.
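As an illustration of what this unified approach enables, the sketch below (the function and defaults are hypothetical, drawn from the variable names used in the profiles later in this guide) resolves a provider purely from environment variables, so application code never branches on the provider:

```python
import os

def resolve_llm_config(env=os.environ):
    """Build a provider-neutral config dict from LLM_* variables."""
    return {
        "provider": env.get("LLM_PROVIDER", "ollama"),  # Ollama is the default
        "base_url": env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        "api_key": env.get("LLM_API_KEY", "ollama"),
        "embedding_model": env.get("LLM_EMBEDDING_MODEL"),
        "chat_model": env.get("LLM_CHAT_MODEL"),
    }

# Switching providers is just a matter of changing the environment:
cfg = resolve_llm_config({"LLM_PROVIDER": "openai",
                          "LLM_BASE_URL": "https://api.openai.com/v1",
                          "LLM_API_KEY": "sk-...",
                          "LLM_EMBEDDING_MODEL": "text-embedding-3-small",
                          "LLM_CHAT_MODEL": "gpt-4o-mini"})
print(cfg["provider"])  # -> openai
```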


Required fields

Set these values in .env and reference them from config.yaml.

.env:

LLM_PROVIDER=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your-api-key
LLM_EMBEDDING_MODEL=text-embedding-3-small
LLM_CHAT_MODEL=gpt-4o-mini
VECTOR_SIZE=1536

config.yaml:

global:
  llm:
    provider: "${LLM_PROVIDER}"
    base_url: "${LLM_BASE_URL}"
    api_key: "${LLM_API_KEY}"
    models:
      embeddings: "${LLM_EMBEDDING_MODEL}"
      chat: "${LLM_CHAT_MODEL}"
    embeddings:
      vector_size: ${VECTOR_SIZE}
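The substitution mechanism that resolves ${VAR} placeholders is implementation-specific; a minimal sketch of what that expansion step might look like (function name and error handling are assumptions, not the system's actual loader):

```python
import re

def expand_env(text, env):
    """Replace ${VAR} placeholders in a config string with environment values."""
    def sub(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"missing environment variable: {name}")
        return env[name]
    return re.sub(r"\$\{(\w+)\}", sub, text)

snippet = 'provider: "${LLM_PROVIDER}"\nvector_size: ${VECTOR_SIZE}'
print(expand_env(snippet, {"LLM_PROVIDER": "openai", "VECTOR_SIZE": "1536"}))
```

Failing loudly on a missing variable is deliberate here: a silently empty api_key or vector_size tends to surface much later as a confusing runtime error.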

Provider profiles

OpenAI

LLM_PROVIDER=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-openai-key
LLM_EMBEDDING_MODEL=text-embedding-3-small
LLM_CHAT_MODEL=gpt-4o-mini
VECTOR_SIZE=1536

Azure OpenAI

LLM_PROVIDER=azure_openai
LLM_BASE_URL=https://<resource>.openai.azure.com
LLM_API_KEY=your-azure-key
LLM_API_VERSION=2024-05-01-preview
LLM_EMBEDDING_MODEL=<embedding-deployment-name>
LLM_CHAT_MODEL=<chat-deployment-name>
VECTOR_SIZE=1536

Notes:

  • Use the resource root in LLM_BASE_URL.
  • Use deployment names for LLM_*_MODEL values.
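These two notes exist because Azure OpenAI routes requests by deployment name rather than model name, with the API version carried as a query parameter. A sketch of the resulting chat-completions URL (resource and deployment names below are placeholders):

```python
def azure_chat_url(base_url, deployment, api_version):
    """Azure OpenAI addresses a specific deployment, not a model name."""
    return (f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

print(azure_chat_url("https://myresource.openai.azure.com",
                     "my-chat-deployment", "2024-05-01-preview"))
```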

Ollama

Ollama is the default and recommended provider, especially for local development and privacy-sensitive environments.

Prerequisites

To use local models with Ollama, ensure the following:

  1. Ollama is installed and running on your machine. It must be accessible at the configured endpoint
     (default: http://localhost:11434). Installation guide: https://ollama.com/download
  2. Required models are pulled locally before running the system. For embedding generation, pull:

ollama pull argus-ai/pplx-embed-v1-0.6b:fp32

Configuration:
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama # no authentication required
LLM_EMBEDDING_MODEL=argus-ai/pplx-embed-v1-0.6b:fp32
LLM_CHAT_MODEL=llama3.1:8b
VECTOR_SIZE=1024

Notes:

  • If your deployment uses native Ollama APIs, keep model names valid for your server.
  • For local/no-auth setups, LLM_API_KEY may be ignored by backend logic.
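Before starting the system, it can help to verify that the required models are actually present on the server. The sketch below checks required names against a model list of the shape returned by Ollama's native /api/tags endpoint (the helper itself is an assumption, not part of this system):

```python
import json
from urllib.request import urlopen

def missing_models(required, available):
    """Return required model names absent from the server's model list."""
    names = {m["name"] for m in available}
    return [r for r in required if r not in names]

# Against a live server at the default endpoint, one might do:
# tags = json.load(urlopen("http://localhost:11434/api/tags"))["models"]
# print(missing_models(["argus-ai/pplx-embed-v1-0.6b:fp32", "llama3.1:8b"], tags))
```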

OpenAI-compatible endpoint

LLM_PROVIDER=openai_compat
LLM_BASE_URL=https://your-endpoint.example.com/v1
LLM_API_KEY=your-endpoint-key
LLM_EMBEDDING_MODEL=your-embedding-model
LLM_CHAT_MODEL=your-chat-model
VECTOR_SIZE=<embedding-dimension> # must match your embedding model's output size

Vector size mapping

Set global.llm.embeddings.vector_size to match embedding output dimension. Common examples:

  • text-embedding-3-small -> 1536
  • text-embedding-3-large -> 3072
  • Ollama models vary by model; verify before collection creation
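A sketch of how this mapping might be enforced in code (the table covers only the examples above; everything else, including Ollama models, should come from an explicit VECTOR_SIZE override):

```python
# Dimensions for the documented examples; other models must be set explicitly.
KNOWN_VECTOR_SIZES = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def vector_size_for(model, override=None):
    """Resolve vector_size, preferring an explicit override (e.g. VECTOR_SIZE)."""
    if override is not None:
        return int(override)
    if model in KNOWN_VECTOR_SIZES:
        return KNOWN_VECTOR_SIZES[model]
    raise ValueError(f"unknown embedding dimension for {model}; set VECTOR_SIZE")
```

Getting this wrong usually fails at collection creation or at the first upsert, so verifying the dimension up front is cheaper than rebuilding the collection later.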

Legacy compatibility

OPENAI_API_KEY is still honored in some flows, but canonical configuration should use LLM_API_KEY.
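A minimal sketch of that fallback order (the helper is hypothetical; the canonical key wins when both are set):

```python
import os

def resolve_api_key(env=os.environ):
    """Prefer the canonical LLM_API_KEY; fall back to legacy OPENAI_API_KEY."""
    key = env.get("LLM_API_KEY") or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("no API key configured (set LLM_API_KEY)")
    return key
```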