Model Providers and LLM Integration¶
The Juspay Agent Framework (JAF) provides a flexible and extensible model provider abstraction that enables integration with various Large Language Models (LLMs) through a unified interface. This guide covers everything you need to know about model providers, configuration, and best practices.
Table of Contents¶
- Overview
- Model Provider Interface
- LiteLLM Provider Implementation
- Model Configuration
- Environment Variables and Setup
- Supported Models and Providers
- Error Handling and Fallbacks
- Rate Limiting and Retries
- Cost Optimization
- Custom Model Provider Creation
- Debugging Model Interactions
- Examples
- Best Practices
Overview¶
JAF's model provider system abstracts away the complexity of interacting with different LLM APIs while providing:
- Unified Interface: Single API for all LLM providers
- Type Safety: Full TypeScript support with strict typing
- Flexible Configuration: Per-agent and global model settings
- Tool Support: Automatic tool schema conversion and execution
- Error Handling: Standardized error handling across providers
- Tracing: Built-in observability and debugging support
Model Provider Interface¶
The core ModelProvider interface defines the contract that all model providers must implement:
export interface ModelProvider<Ctx> {
  getCompletion: (
    state: Readonly<RunState<Ctx>>,
    agent: Readonly<Agent<Ctx, any>>,
    config: Readonly<RunConfig<Ctx>>
  ) => Promise<{
    message?: {
      content?: string | null;
      tool_calls?: Array<{
        id: string;
        type: 'function';
        function: {
          name: string;
          arguments: string;
        };
      }>;
    };
  }>;
}
Parameters¶
- state: The current run state containing messages, context, and metadata
- agent: The agent configuration including instructions, tools, and model config
- config: The run configuration with global settings and overrides
Return Value¶
The provider must return a response object with an optional message containing:
- content: The LLM's text response
- tool_calls: Array of function calls the LLM wants to execute
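For example, a response that asks JAF to execute a tool (rather than returning text) might look like this; the values are illustrative:

const toolCallResponse = {
  message: {
    content: null, // no text when the model decides to call a tool
    tool_calls: [
      {
        id: 'call_abc123',
        type: 'function' as const,
        function: {
          name: 'calculate',                   // must match a tool's schema.name
          arguments: '{"expression": "2 + 2"}' // JSON-encoded string, not an object
        }
      }
    ]
  }
};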
LiteLLM Provider Implementation¶
JAF includes a built-in LiteLLM provider that supports 100+ LLM providers through a unified interface.
Creating a LiteLLM Provider¶
import { makeLiteLLMProvider } from 'functional-agent-framework';
const modelProvider = makeLiteLLMProvider(
'http://localhost:4000', // LiteLLM server URL
'sk-your-api-key' // API key (can be "anything" for local LiteLLM)
);
LiteLLM Server Setup¶
- Install LiteLLM (typical commands are shown after this list).
- Create a configuration file (litellm.yaml):

  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: openai/gpt-4o
        api_key: os.environ/OPENAI_API_KEY
    - model_name: claude-3-sonnet
      litellm_params:
        model: anthropic/claude-3-sonnet-20240229
        api_key: os.environ/ANTHROPIC_API_KEY
    - model_name: gemini-pro
      litellm_params:
        model: gemini/gemini-pro
        api_key: os.environ/GOOGLE_API_KEY

- Start the LiteLLM server.
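The install and start commands depend on your environment; a typical local setup looks roughly like this:

# Install LiteLLM with proxy support
pip install 'litellm[proxy]'

# Start the proxy on port 4000 using the configuration above
litellm --config litellm.yaml --port 4000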
Provider Features¶
The LiteLLM provider automatically handles:
- Model Selection: Uses modelOverride, the agent's modelConfig.name, or defaults to gpt-4o
- Message Conversion: Converts JAF messages to OpenAI-compatible format
- Tool Schema Conversion: Transforms Zod schemas to JSON Schema for function calling
- Temperature Control: Applies temperature settings from agent configuration
- Token Limits: Enforces max token limits from agent configuration
- Response Format: Handles JSON mode for structured outputs
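As an example of the tool schema conversion, the calculatorTool used in the next section can be defined with a Zod parameter schema and nothing else; the provider converts it to JSON Schema before sending it to the model. The shape assumed here (schema.name, schema.description, schema.parameters, execute) matches the convertTools example later in this guide:

import { z } from 'zod';

const calculatorTool = {
  schema: {
    name: 'calculate',
    description: 'Evaluate a basic arithmetic expression',
    parameters: z.object({
      expression: z.string().describe('Expression to evaluate, e.g. "2 + 2"')
    })
  },
  execute: async (args: { expression: string }) => {
    // Toy implementation for illustration only
    return `You asked me to evaluate: ${args.expression}`;
  }
};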
Model Configuration¶
Agent-Level Configuration¶
Configure models at the agent level using the modelConfig property:
const agent: Agent<MyContext, string> = {
name: 'MathTutor',
instructions: () => 'You are a helpful math tutor.',
tools: [calculatorTool],
modelConfig: {
name: 'gpt-4o', // Model to use
temperature: 0.1, // Lower for more deterministic responses
maxTokens: 2000 // Maximum response length
}
};
Global Configuration¶
Override model settings globally in the run configuration:
const config: RunConfig<MyContext> = {
agentRegistry,
modelProvider,
modelOverride: 'claude-3-sonnet', // Override all agent model settings
maxTurns: 10,
// ... other config
};
Model Selection Priority¶
JAF follows this priority order for model selection:
1. Global Override: config.modelOverride
2. Agent Config: agent.modelConfig.name
3. Default: gpt-4o
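Inside a provider this resolution is a single expression (the same pattern appears in the custom provider example later in this guide):

const model = config.modelOverride ?? agent.modelConfig?.name ?? 'gpt-4o';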
Environment Variables and Setup¶
LiteLLM Configuration¶
# LiteLLM server configuration
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=sk-your-api-key
LITELLM_MODEL=gpt-4o
# Provider-specific API keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-api-key
AZURE_API_KEY=your-azure-key
AZURE_API_BASE=https://your-resource.openai.azure.com/
Agent-Specific Environment Variables¶
# Model configuration
RAG_TEMPERATURE=0.1
RAG_MAX_TOKENS=2000
RAG_MAX_TURNS=5
RAG_MODEL=gemini-2.5-flash-lite
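These values can be read into a modelConfig at startup; a minimal sketch with fallback defaults:

const ragModelConfig = {
  name: process.env.RAG_MODEL ?? 'gemini-2.5-flash-lite',
  temperature: Number(process.env.RAG_TEMPERATURE ?? '0.1'),
  maxTokens: Number(process.env.RAG_MAX_TOKENS ?? '2000')
};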
Complete Setup Example¶
// Load environment variables
import 'dotenv/config';
// Validate required variables
if (!process.env.LITELLM_URL) {
throw new Error('LITELLM_URL environment variable is required');
}
if (!process.env.LITELLM_API_KEY) {
throw new Error('LITELLM_API_KEY environment variable is required');
}
// Create model provider
const modelProvider = makeLiteLLMProvider(
process.env.LITELLM_URL,
process.env.LITELLM_API_KEY
);
Supported Models and Providers¶
LiteLLM supports 100+ models from major providers:
OpenAI Models¶
- gpt-4o, gpt-4o-mini
- gpt-4-turbo, gpt-4
- gpt-3.5-turbo
Anthropic Models¶
- claude-3-5-sonnet-20241022
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
Google Models¶
- gemini-2.5-flash-lite
- gemini-1.5-pro-latest
- gemini-1.5-flash-latest
Azure OpenAI¶
- azure/gpt-4o
- azure/gpt-4-turbo
AWS Bedrock¶
- bedrock/anthropic.claude-3-sonnet-20240229-v1:0
- bedrock/anthropic.claude-3-haiku-20240307-v1:0
Others¶
- Cohere, Replicate, Hugging Face, Ollama, and more
Provider Configuration Examples¶
# OpenAI
- model_name: gpt-4o
  litellm_params:
    model: openai/gpt-4o
    api_key: os.environ/OPENAI_API_KEY

# Anthropic
- model_name: claude-3-sonnet
  litellm_params:
    model: anthropic/claude-3-sonnet-20240229
    api_key: os.environ/ANTHROPIC_API_KEY

# Azure OpenAI
- model_name: azure-gpt-4
  litellm_params:
    model: azure/gpt-4
    api_key: os.environ/AZURE_API_KEY
    api_base: os.environ/AZURE_API_BASE
    api_version: "2024-02-15-preview"

# Local Ollama
- model_name: llama2-local
  litellm_params:
    model: ollama/llama2
    api_base: http://localhost:11434
Error Handling and Fallbacks¶
Built-in Error Types¶
JAF defines comprehensive error types for model interactions:
export type JAFError =
| { readonly _tag: "MaxTurnsExceeded"; readonly turns: number }
| { readonly _tag: "ModelBehaviorError"; readonly detail: string }
| { readonly _tag: "DecodeError"; readonly errors: z.ZodIssue[] }
| { readonly _tag: "InputGuardrailTripwire"; readonly reason: string }
| { readonly _tag: "OutputGuardrailTripwire"; readonly reason: string }
| { readonly _tag: "ToolCallError"; readonly tool: string; readonly detail: string }
| { readonly _tag: "HandoffError"; readonly detail: string }
| { readonly _tag: "AgentNotFound"; readonly agentName: string };
Error Handling Example¶
const result = await run(initialState, config);
if (result.outcome.status === 'error') {
const error = result.outcome.error;
switch (error._tag) {
case 'ModelBehaviorError':
console.error(`Model error: ${error.detail}`);
// Retry logic, fallback model, etc.
break;
case 'MaxTurnsExceeded':
console.error(`Conversation too long: ${error.turns} turns`);
break;
case 'ToolCallError':
console.error(`Tool ${error.tool} failed: ${error.detail}`);
break;
}
}
Model Fallback Implementation¶
class FallbackModelProvider implements ModelProvider<any> {
constructor(
private primary: ModelProvider<any>,
private fallback: ModelProvider<any>
) {}
async getCompletion(state: any, agent: any, config: any) {
try {
return await this.primary.getCompletion(state, agent, config);
} catch (error) {
console.warn('Primary model failed, trying fallback:', error);
return await this.fallback.getCompletion(state, agent, config);
}
}
}
// Usage
const primaryProvider = makeLiteLLMProvider('http://localhost:4000', 'key1');
const fallbackProvider = makeLiteLLMProvider('http://backup:4000', 'key2');
const modelProvider = new FallbackModelProvider(primaryProvider, fallbackProvider);
Rate Limiting and Retries¶
LiteLLM Built-in Features¶
LiteLLM provides built-in rate limiting and retry logic:
# litellm.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60       # Requests per minute
      tpm: 100000   # Tokens per minute

general_settings:
  max_retries: 3
  timeout: 30
  retry_delay: 1
Custom Retry Logic¶
class RetryModelProvider implements ModelProvider<any> {
constructor(
private inner: ModelProvider<any>,
private maxRetries: number = 3,
private baseDelay: number = 1000
) {}
async getCompletion(state: any, agent: any, config: any) {
let lastError: Error;
for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
try {
return await this.inner.getCompletion(state, agent, config);
} catch (error) {
lastError = error as Error;
if (attempt < this.maxRetries) {
const delay = this.baseDelay * Math.pow(2, attempt);
console.warn(`Attempt ${attempt + 1} failed, retrying in ${delay}ms:`, error);
await new Promise(resolve => setTimeout(resolve, delay));
}
}
}
throw lastError!;
}
}
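Usage mirrors the fallback example: wrap the base provider once and pass the wrapper into your run configuration.

// Usage
const baseProvider = makeLiteLLMProvider('http://localhost:4000', 'sk-your-api-key');
const modelProvider = new RetryModelProvider(baseProvider, 3, 1000); // 3 retries, 1s base delay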
Cost Optimization¶
Model Selection Strategy¶
Choose models based on your use case:
// High-reasoning tasks
const reasoningAgent = {
// ...
modelConfig: { name: 'gpt-4o', temperature: 0.1 }
};
// Simple chat tasks
const chatAgent = {
// ...
modelConfig: { name: 'gpt-3.5-turbo', temperature: 0.7 }
};
// Fast, lightweight tasks
const quickAgent = {
// ...
modelConfig: { name: 'gpt-4o-mini', temperature: 0.3 }
};
Token Management¶
const tokenOptimizedAgent = {
// ...
modelConfig: {
name: 'gpt-4o-mini',
temperature: 0.3,
maxTokens: 500 // Limit response length
}
};
// Compress conversation history
const config: RunConfig<any> = {
// ...
memory: {
provider: memoryProvider,
autoStore: true,
maxMessages: 20, // Keep only recent messages
compressionThreshold: 50 // Compress when > 50 messages
}
};
Cost Monitoring¶
class CostTrackingProvider implements ModelProvider<any> {
private totalCost = 0;
constructor(private inner: ModelProvider<any>) {}
async getCompletion(state: any, agent: any, config: any) {
const startTime = Date.now();
const result = await this.inner.getCompletion(state, agent, config);
const duration = Date.now() - startTime;
// Estimate cost based on tokens and model
const estimatedCost = this.estimateCost(agent.modelConfig?.name, result);
this.totalCost += estimatedCost;
console.log(`Model call cost: $${estimatedCost.toFixed(4)}, Total: $${this.totalCost.toFixed(4)}`);
return result;
}
private estimateCost(model: string = 'gpt-4o', result: any): number {
// Implement cost estimation logic based on your LLM pricing
return 0.001; // Placeholder
}
}
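A very rough estimator can approximate token counts from text length and apply a per-model rate. The helper name and the rates below are illustrative placeholders, not real pricing:

function estimateCompletionCost(
  model: string,
  result: { message?: { content?: string | null } }
): number {
  const outputChars = (result.message?.content ?? '').length;
  const outputTokens = Math.ceil(outputChars / 4); // rough heuristic: ~4 characters per token

  // Placeholder per-1K-output-token rates in USD -- substitute your provider's actual pricing
  const ratePer1kOutputTokens: Record<string, number> = {
    'gpt-4o': 0.015,
    'gpt-4o-mini': 0.0006
  };

  return ((ratePer1kOutputTokens[model] ?? 0.01) * outputTokens) / 1000;
}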
Custom Model Provider Creation¶
Basic Custom Provider¶
class CustomModelProvider implements ModelProvider<any> {
constructor(private apiKey: string, private baseUrl: string) {}
async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
const model = config.modelOverride ?? agent.modelConfig?.name ?? 'default-model';
// Convert JAF messages to your API format
const messages = [
{ role: 'system', content: agent.instructions(state) },
...state.messages.map(this.convertMessage)
];
// Prepare API request
const requestBody = {
model,
messages,
temperature: agent.modelConfig?.temperature ?? 0.7,
max_tokens: agent.modelConfig?.maxTokens ?? 1000,
tools: this.convertTools(agent.tools)
};
// Make API call
const response = await fetch(`${this.baseUrl}/chat/completions`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify(requestBody)
});
if (!response.ok) {
throw new Error(`API call failed: ${response.status} ${response.statusText}`);
}
const data = await response.json();
return {
message: {
content: data.choices[0]?.message?.content,
tool_calls: data.choices[0]?.message?.tool_calls
}
};
}
private convertMessage(msg: Message) {
// Convert JAF message format to your API format
return {
role: msg.role,
content: msg.content,
tool_call_id: msg.tool_call_id,
tool_calls: msg.tool_calls
};
}
private convertTools(tools?: readonly Tool<any, any>[]) {
if (!tools) return undefined;
return tools.map(tool => ({
type: 'function',
function: {
name: tool.schema.name,
description: tool.schema.description,
parameters: this.zodToJsonSchema(tool.schema.parameters)
}
}));
}
private zodToJsonSchema(schema: any): any {
// Implement Zod to JSON Schema conversion
// (JAF provides zodSchemaToJsonSchema utility)
return { type: 'object' };
}
}
Advanced Provider with Streaming¶
class StreamingModelProvider implements ModelProvider<any> {
async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
// Implement streaming response handling
const stream = await this.createStream(state, agent, config);
let content = '';
for await (const chunk of stream) {
if (chunk.choices?.[0]?.delta?.content) {
content += chunk.choices[0].delta.content;
// Optional: emit streaming events
config.onEvent?.({
type: 'llm_call_start', // Reuse existing event types or define new ones
data: { agentName: agent.name, model: 'streaming', content }
});
}
}
return {
message: { content }
};
}
private async* createStream(state: any, agent: any, config: any) {
// Implement your streaming API call
yield { choices: [{ delta: { content: 'Hello' } }] };
yield { choices: [{ delta: { content: ' World!' } }] };
}
}
Debugging Model Interactions¶
Enable Tracing¶
import { ConsoleTraceCollector } from 'functional-agent-framework';
const traceCollector = new ConsoleTraceCollector();
const config: RunConfig<any> = {
// ...
onEvent: traceCollector.collect.bind(traceCollector)
};
Trace Events¶
JAF emits detailed trace events for model interactions:
// LLM call start
{
type: 'llm_call_start',
data: { agentName: 'MathTutor', model: 'gpt-4o' }
}
// LLM call end
{
type: 'llm_call_end',
data: { choice: { message: { content: 'The answer is 42' } } }
}
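You can also consume these events directly with a custom onEvent handler, for example to log only the LLM call events (event shapes as shown above):

const config: RunConfig<any> = {
  // ...
  onEvent: (event) => {
    if (event.type === 'llm_call_start' || event.type === 'llm_call_end') {
      console.log(`[trace] ${event.type}`, JSON.stringify(event.data));
    }
  }
};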
Custom Debug Provider¶
class DebugModelProvider implements ModelProvider<any> {
constructor(private inner: ModelProvider<any>) {}
async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
console.log('🤖 Model Request:');
console.log(' Agent:', agent.name);
console.log(' Model:', config.modelOverride ?? agent.modelConfig?.name ?? 'default');
console.log(' Messages:', state.messages.length);
console.log(' Tools:', agent.tools?.length || 0);
console.log(' Context:', Object.keys(state.context));
const startTime = Date.now();
try {
const result = await this.inner.getCompletion(state, agent, config);
const duration = Date.now() - startTime;
console.log('✅ Model Response:');
console.log(' Duration:', `${duration}ms`);
console.log(' Content:', result.message?.content?.substring(0, 100) + '...');
console.log(' Tool calls:', result.message?.tool_calls?.length || 0);
return result;
} catch (error) {
const duration = Date.now() - startTime;
console.error('❌ Model Error:');
console.error(' Duration:', `${duration}ms`);
console.error(' Error:', error);
throw error;
}
}
}
Response Validation¶
class ValidatingModelProvider implements ModelProvider<any> {
constructor(private inner: ModelProvider<any>) {}
async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
const result = await this.inner.getCompletion(state, agent, config);
// Validate response structure
if (!result.message) {
throw new Error('Model provider returned invalid response: missing message');
}
// Validate tool calls if present
if (result.message.tool_calls) {
for (const toolCall of result.message.tool_calls) {
if (!toolCall.id || !toolCall.function?.name) {
throw new Error('Model provider returned invalid tool call structure');
}
// Validate tool exists
const tool = agent.tools?.find(t => t.schema.name === toolCall.function.name);
if (!tool) {
console.warn(`Model called unknown tool: ${toolCall.function.name}`);
}
}
}
return result;
}
}
Examples¶
Basic Setup¶
import 'dotenv/config';
import {
run,
RunConfig,
RunState,
createTraceId,
createRunId,
makeLiteLLMProvider
} from 'functional-agent-framework';
// Set up model provider
const modelProvider = makeLiteLLMProvider(
process.env.LITELLM_URL!,
process.env.LITELLM_API_KEY!
);
// Define agent
const agent = {
name: 'Assistant',
instructions: () => 'You are a helpful assistant.',
modelConfig: {
name: 'gpt-4o',
temperature: 0.7,
maxTokens: 1000
}
};
// Run configuration
const config: RunConfig<any> = {
agentRegistry: new Map([['Assistant', agent]]),
modelProvider,
maxTurns: 10
};
// Execute
const result = await run({
runId: createRunId(crypto.randomUUID()),
traceId: createTraceId(crypto.randomUUID()),
messages: [{ role: 'user', content: 'Hello!' }],
currentAgentName: 'Assistant',
context: {},
turnCount: 0
}, config);
Multi-Model Setup¶
// Different models for different tasks
const agents = {
reasoner: {
name: 'Reasoner',
instructions: () => 'You solve complex problems step by step.',
modelConfig: { name: 'gpt-4o', temperature: 0.1 }
},
creative: {
name: 'Creative',
instructions: () => 'You write creative content.',
modelConfig: { name: 'claude-3-sonnet', temperature: 0.9 }
},
fast: {
name: 'Fast',
instructions: () => 'You provide quick answers.',
modelConfig: { name: 'gpt-4o-mini', temperature: 0.3 }
}
};
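These agents can then be registered under their names, following the same pattern as the basic setup:

const agentRegistry = new Map(
  Object.values(agents).map(agent => [agent.name, agent] as const)
);

const config: RunConfig<any> = {
  agentRegistry,
  modelProvider,
  maxTurns: 10
};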
Production Setup with Error Handling¶
class ProductionModelProvider implements ModelProvider<any> {
private retryProvider: RetryModelProvider;
private debugProvider: DebugModelProvider;
private costTracker: CostTrackingProvider;
constructor(baseUrl: string, apiKey: string) {
const baseProvider = makeLiteLLMProvider(baseUrl, apiKey);
this.retryProvider = new RetryModelProvider(baseProvider, 3, 1000);
this.debugProvider = new DebugModelProvider(this.retryProvider);
this.costTracker = new CostTrackingProvider(this.debugProvider);
}
async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
try {
return await this.costTracker.getCompletion(state, agent, config);
} catch (error) {
// Log error to monitoring system
console.error('Model provider error:', error);
// Report to error tracking
// errorTracker.report(error, { agent: agent.name, model: agent.modelConfig?.name });
throw error;
}
}
}
Best Practices¶
1. Model Selection¶
- Use appropriate models for tasks: GPT-4o for reasoning, GPT-4o-mini for simple tasks
- Consider cost vs. quality tradeoffs: Start with smaller models and upgrade as needed
- Test different models: Benchmark performance across your specific use cases
2. Configuration Management¶
- Environment-based config: Use environment variables for different environments
- Centralized settings: Keep model configurations in a central location
- Validation: Validate all configuration values at startup
3. Error Handling¶
- Implement retries: Handle transient failures with exponential backoff
- Fallback models: Have backup models for critical applications
- Graceful degradation: Handle model failures without breaking user experience
4. Performance Optimization¶
- Token management: Monitor and optimize token usage
- Caching: Cache responses for repeated queries (see the sketch after this list)
- Parallel processing: Use concurrent processing where possible
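A simple caching wrapper follows the same decorator pattern as the other providers in this guide. This is a sketch: the cache key below ignores context and never expires, which may not suit every application.

class CachingModelProvider implements ModelProvider<any> {
  private cache = new Map<string, any>();

  constructor(private inner: ModelProvider<any>) {}

  async getCompletion(state: RunState<any>, agent: Agent<any, any>, config: RunConfig<any>) {
    // Key on agent name, resolved model, and the full message history
    const model = config.modelOverride ?? agent.modelConfig?.name ?? 'default';
    const key = JSON.stringify({ agent: agent.name, model, messages: state.messages });

    const cached = this.cache.get(key);
    if (cached) return cached;

    const result = await this.inner.getCompletion(state, agent, config);
    this.cache.set(key, result);
    return result;
  }
}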
5. Monitoring and Observability¶
- Cost tracking: Monitor usage and costs across models
- Performance metrics: Track response times and success rates
- Error analysis: Analyze error patterns and optimize accordingly
6. Security¶
- API key management: Use secure key storage and rotation
- Input validation: Validate all inputs before sending to models
- Output sanitization: Sanitize model outputs before use
7. Development Workflow¶
- Local testing: Use local models (Ollama) for development
- Staging environment: Test with production models in staging
- A/B testing: Compare model performance with real users
Remember that model providers are a critical component of your JAF application, and proper implementation ensures reliable, cost-effective, and performant AI functionality.