Juspay Agent Framework (JAF) Production Deployment Guide¶
Table of Contents¶
- Architecture Overview
- Prerequisites
- Environment Configuration
- Database Setup
- Docker Containerization
- Load Balancing and Scaling
- Monitoring and Observability
- Security Considerations
- CI/CD Pipeline
- Performance Optimization
- Troubleshooting
Architecture Overview¶
JAF is a purely functional agent framework built with TypeScript, featuring:
- Core Engine: Stateless, immutable execution engine
- Memory Providers: Pluggable conversation storage (In-Memory, Redis, PostgreSQL)
- Server Runtime: Fastify-based HTTP server with REST API
- Tool System: Composable, type-safe function calling
- LLM Integration: Model Context Protocol (MCP) and LiteLLM support
- Tracing: Built-in observability and debugging
Key Dependencies¶
- Runtime: Node.js 18+ (ES2022 target)
- Framework: Fastify 4.x for HTTP server
- Validation: Zod for schema validation
- LLM: OpenAI SDK 4.x, MCP SDK 0.4.x
- Memory: Optional node-redis 4.x or node-postgres (pg) 8.x client libraries
Prerequisites¶
System Requirements¶
- Node.js: 18.x or higher (LTS recommended)
- Memory: Minimum 512MB RAM, 2GB+ recommended for production
- Storage: 10GB+ for application and logs
- Network: HTTP/HTTPS traffic on configurable ports
External Services¶
- LLM Provider: LiteLLM server or direct OpenAI API access
- Database (optional): PostgreSQL 12+ or Redis 6+
- Load Balancer (recommended): nginx, HAProxy, or cloud LB
Environment Configuration¶
Core Environment Variables¶
# Server Configuration
PORT=3000
HOST=0.0.0.0
NODE_ENV=production
# LLM Configuration
LITELLM_URL=http://litellm:4000
LITELLM_API_KEY=your-api-key
LITELLM_MODEL=gpt-4o-mini
# Memory Provider Configuration
JAF_MEMORY_TYPE=postgres # options: memory, redis, postgres
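Since Zod is already a core dependency, it can also validate this environment at startup so misconfiguration fails at boot rather than mid-request. A minimal sketch (the schema shape is illustrative, not JAF's built-in config loader):
import { z } from 'zod';

const EnvSchema = z.object({
  PORT: z.coerce.number().int().default(3000),
  HOST: z.string().default('0.0.0.0'),
  NODE_ENV: z.enum(['development', 'production', 'test']).default('development'),
  LITELLM_URL: z.string().url(),
  LITELLM_API_KEY: z.string().min(1),
  LITELLM_MODEL: z.string().default('gpt-4o-mini'),
  JAF_MEMORY_TYPE: z.enum(['memory', 'redis', 'postgres']).default('memory')
});

// Throws with a readable error listing every invalid variable
export const env = EnvSchema.parse(process.env);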
Memory Provider Configuration¶
PostgreSQL Configuration¶
# PostgreSQL Memory Provider
JAF_MEMORY_TYPE=postgres
JAF_POSTGRES_HOST=postgres
JAF_POSTGRES_PORT=5432
JAF_POSTGRES_DB=jaf_memory
JAF_POSTGRES_USER=jaf_user
JAF_POSTGRES_PASSWORD=secure_password
JAF_POSTGRES_SSL=true
JAF_POSTGRES_TABLE=conversations
JAF_POSTGRES_MAX_CONNECTIONS=10
# Alternative: Connection String
JAF_POSTGRES_CONNECTION_STRING=postgresql://jaf_user:secure_password@postgres:5432/jaf_memory?sslmode=require
Redis Configuration¶
# Redis Memory Provider
JAF_MEMORY_TYPE=redis
JAF_REDIS_HOST=redis
JAF_REDIS_PORT=6379
JAF_REDIS_PASSWORD=secure_redis_password
JAF_REDIS_DB=0
JAF_REDIS_PREFIX=jaf:memory:
JAF_REDIS_TTL=86400 # 24 hours in seconds
# Alternative: Redis URL
JAF_REDIS_URL=redis://:secure_redis_password@redis:6379/0
In-Memory Configuration¶
# In-Memory Provider (development only)
JAF_MEMORY_TYPE=memory
JAF_MEMORY_MAX_CONVERSATIONS=1000
JAF_MEMORY_MAX_MESSAGES=1000
Database Setup¶
PostgreSQL Setup¶
1. Database Initialization¶
-- Create database and user
CREATE DATABASE jaf_memory;
CREATE USER jaf_user WITH ENCRYPTED PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE jaf_memory TO jaf_user;
-- Connect to the database
\c jaf_memory
-- Grant schema permissions
GRANT ALL ON SCHEMA public TO jaf_user;
2. Table Schema¶
The JAF PostgreSQL provider automatically creates the required schema:
CREATE TABLE IF NOT EXISTS conversations (
conversation_id VARCHAR(255) PRIMARY KEY,
user_id VARCHAR(255),
messages JSONB NOT NULL,
metadata JSONB,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
last_activity TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Performance indexes
CREATE INDEX IF NOT EXISTS idx_conversations_user_id ON conversations (user_id);
CREATE INDEX IF NOT EXISTS idx_conversations_created_at ON conversations (created_at);
CREATE INDEX IF NOT EXISTS idx_conversations_last_activity ON conversations (last_activity);
CREATE INDEX IF NOT EXISTS idx_conversations_metadata_gin ON conversations USING GIN (metadata);
CREATE INDEX IF NOT EXISTS idx_conversations_trace_id ON conversations ((metadata->>'traceId'));
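With these indexes in place, the hot lookup paths stay index-backed. Two illustrative queries (the literal values are placeholders):
-- Find a conversation by trace ID (uses idx_conversations_trace_id)
SELECT conversation_id, messages
FROM conversations
WHERE metadata->>'traceId' = 'trace-abc-123';

-- Fetch a user's most recent conversations (uses idx_conversations_user_id)
SELECT conversation_id, last_activity
FROM conversations
WHERE user_id = 'user123'
ORDER BY last_activity DESC
LIMIT 10;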
3. Production PostgreSQL Configuration¶
# postgresql.conf optimizations
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 16MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
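After restarting PostgreSQL, confirm the settings actually took effect (commands assume the jaf_user credentials from above):
psql -U jaf_user -d jaf_memory -c "SHOW shared_buffers;"
psql -U jaf_user -d jaf_memory -c "SHOW effective_cache_size;"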
Redis Setup¶
1. Redis Configuration¶
# redis.conf
bind 0.0.0.0
port 6379
requirepass secure_redis_password
maxmemory 512mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
2. Redis Persistence Strategy¶
For production, choose between:
- RDB: Point-in-time snapshots (lower overhead)
- AOF: Append-only file (better durability)
- Mixed: RDB + AOF (recommended for critical data)
# Mixed persistence (recommended)
save 300 10
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
Docker Containerization¶
Application Dockerfile¶
# Multi-stage build for production
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
# Install all dependencies (dev dependencies are required for the TypeScript build)
RUN npm ci
# Copy source code
COPY src/ ./src/
# Build application, then prune dev dependencies so the production stage copies a lean node_modules
RUN npm run build && npm prune --omit=dev && npm cache clean --force
# Production stage
FROM node:18-alpine AS production
# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init
# Create non-root user
RUN addgroup -g 1001 -S jaf && \
adduser -S jaf -u 1001
WORKDIR /app
# Copy built application and dependencies
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
# Change ownership to non-root user
RUN chown -R jaf:jaf /app
USER jaf
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:' + (process.env.PORT || 3000) + '/health', (res) => process.exit(res.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
# Expose port
EXPOSE 3000
# Use dumb-init for proper signal handling
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
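To build and smoke-test the image locally before pushing it (the tag and env file names are illustrative):
docker build -t jaf-app:local .
docker run --rm -p 3000:3000 --env-file .env.production jaf-app:local
curl -f http://localhost:3000/health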
Docker Compose for Development¶
version: '3.8'
services:
jaf-app:
build: .
ports:
- "3000:3000"
environment:
- NODE_ENV=development
- PORT=3000
- HOST=0.0.0.0
- JAF_MEMORY_TYPE=postgres
- JAF_POSTGRES_HOST=postgres
- JAF_POSTGRES_DB=jaf_memory
- JAF_POSTGRES_USER=jaf_user
- JAF_POSTGRES_PASSWORD=dev_password
- LITELLM_URL=http://litellm:4000
depends_on:
- postgres
- redis
- litellm
volumes:
- ./src:/app/src
restart: unless-stopped
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_DB=jaf_memory
- POSTGRES_USER=jaf_user
- POSTGRES_PASSWORD=dev_password
volumes:
- postgres_data:/var/lib/postgresql/data
- ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
ports:
- "5432:5432"
restart: unless-stopped
redis:
image: redis:7-alpine
command: redis-server --requirepass dev_password
volumes:
- redis_data:/data
ports:
- "6379:6379"
restart: unless-stopped
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
volumes:
- ./litellm-config.yaml:/app/config.yaml
command: ["--config", "/app/config.yaml", "--port", "4000"]
restart: unless-stopped
volumes:
postgres_data:
redis_data:
Production Docker Compose¶
version: '3.8'
services:
jaf-app:
image: your-registry/jaf-app:latest
    expose:
      - "3000"
environment:
- NODE_ENV=production
- PORT=3000
- HOST=0.0.0.0
env_file:
- .env.production
deploy:
replicas: 3
resources:
limits:
memory: 512M
reservations:
memory: 256M
restart_policy:
condition: on-failure
max_attempts: 3
healthcheck:
      # node:18-alpine ships BusyBox wget but not curl
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
depends_on:
- postgres
- redis
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./ssl:/etc/nginx/ssl:ro
depends_on:
- jaf-app
restart: unless-stopped
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_DB=${JAF_POSTGRES_DB}
- POSTGRES_USER=${JAF_POSTGRES_USER}
- POSTGRES_PASSWORD=${JAF_POSTGRES_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
- ./postgresql.conf:/etc/postgresql/postgresql.conf
command: postgres -c config_file=/etc/postgresql/postgresql.conf
restart: unless-stopped
redis:
image: redis:7-alpine
command: redis-server /etc/redis/redis.conf
volumes:
- redis_data:/data
- ./redis.conf:/etc/redis/redis.conf:ro
restart: unless-stopped
volumes:
postgres_data:
redis_data:
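A typical rollout with Compose v2, assuming the file is saved as docker-compose.prod.yml:
docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs -f jaf-app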
Load Balancing and Scaling¶
Nginx Configuration¶
upstream jaf_backend {
least_conn;
server jaf-app-1:3000 weight=1 max_fails=3 fail_timeout=30s;
server jaf-app-2:3000 weight=1 max_fails=3 fail_timeout=30s;
server jaf-app-3:3000 weight=1 max_fails=3 fail_timeout=30s;
}

# Rate limiting: limit_req_zone must be declared at the http context level, outside server blocks
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
listen 80;
server_name your-domain.com;
# Redirect HTTP to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name your-domain.com;
# SSL Configuration
ssl_certificate /etc/nginx/ssl/fullchain.pem;
ssl_certificate_key /etc/nginx/ssl/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
# Security headers
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
# Compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_types text/plain application/json application/xml;
location / {
limit_req zone=api burst=20 nodelay;
proxy_pass http://jaf_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache_bypass $http_upgrade;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
# Health check endpoint (bypass rate limiting)
location /health {
proxy_pass http://jaf_backend;
access_log off;
}
# Static files (if any)
location /static/ {
expires 1y;
add_header Cache-Control "public, immutable";
}
}
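Always validate the configuration before applying it; a reload picks it up without dropping in-flight connections:
nginx -t          # validate syntax and semantics
nginx -s reload   # apply the new configuration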
Kubernetes Deployment¶
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaf-app
labels:
app: jaf-app
spec:
replicas: 3
selector:
matchLabels:
app: jaf-app
template:
metadata:
labels:
app: jaf-app
spec:
containers:
- name: jaf-app
image: your-registry/jaf-app:latest
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: "production"
- name: PORT
value: "3000"
- name: HOST
value: "0.0.0.0"
envFrom:
- secretRef:
name: jaf-secrets
- configMapRef:
name: jaf-config
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
securityContext:
runAsNonRoot: true
runAsUser: 1001
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
---
apiVersion: v1
kind: Service
metadata:
name: jaf-app-service
spec:
selector:
app: jaf-app
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: jaf-app-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/rate-limit: "100"
spec:
tls:
- hosts:
- your-domain.com
secretName: jaf-tls
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: jaf-app-service
port:
number: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: jaf-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: jaf-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
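Applying the manifests and watching the rollout (the k8s/ paths match the CI pipeline below):
kubectl apply -f k8s/deployment.yaml -f k8s/service.yaml -f k8s/ingress.yaml
kubectl rollout status deployment/jaf-app
kubectl get hpa jaf-app-hpa --watch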
Monitoring and Observability¶
Application Metrics¶
JAF provides built-in health checks and tracing. Implement additional monitoring:
// Custom metrics for Prometheus
import promClient from 'prom-client';
// Create metrics
const httpRequestDuration = new promClient.Histogram({
name: 'jaf_http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.1, 0.5, 1, 2, 5]
});
const conversationCounter = new promClient.Counter({
name: 'jaf_conversations_total',
help: 'Total number of conversations',
labelNames: ['agent', 'memory_provider']
});
const memoryProviderLatency = new promClient.Histogram({
name: 'jaf_memory_provider_duration_seconds',
help: 'Memory provider operation duration',
labelNames: ['provider', 'operation'],
buckets: [0.01, 0.05, 0.1, 0.5, 1]
});
// Middleware for metrics collection
app.addHook('onRequest', async (request, reply) => {
request.startTime = Date.now();
});
app.addHook('onResponse', async (request, reply) => {
const duration = (Date.now() - request.startTime) / 1000;
httpRequestDuration
.labels(request.method, request.routerPath, reply.statusCode.toString())
.observe(duration);
});
// Metrics endpoint
app.get('/metrics', async (request, reply) => {
const metrics = await promClient.register.metrics();
reply.type('text/plain').send(metrics);
});
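A quick way to confirm the exporter is wired up once the server is running:
curl -s http://localhost:3000/metrics | grep jaf_http_request_duration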
Prometheus Configuration¶
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'jaf-app'
static_configs:
- targets: ['jaf-app:3000']
scrape_interval: 15s
metrics_path: /metrics
- job_name: 'postgres'
static_configs:
- targets: ['postgres-exporter:9187']
- job_name: 'redis'
static_configs:
- targets: ['redis-exporter:9121']
rule_files:
- "jaf_alerts.yml"
alerting:
alertmanagers:
- static_configs:
- targets: ['alertmanager:9093']
Alert Rules¶
# jaf_alerts.yml
groups:
- name: jaf_alerts
rules:
- alert: JAFHighErrorRate
expr: rate(jaf_http_request_duration_seconds_count{status_code=~"5.."}[5m]) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: "High error rate detected"
description: "JAF application has a high error rate: {{ $value }} errors/sec"
- alert: JAFHighLatency
expr: histogram_quantile(0.95, rate(jaf_http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High latency detected"
description: "95th percentile latency is {{ $value }}s"
      - alert: JAFApplicationDown
expr: up{job="jaf-app"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "JAF application is down"
description: "JAF application has been down for more than 1 minute"
- alert: PostgreSQLDown
expr: up{job="postgres"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "PostgreSQL is down"
description: "PostgreSQL database is not responding"
Grafana Dashboard¶
{
"dashboard": {
"title": "JAF Application Dashboard",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "rate(jaf_http_request_duration_seconds_count[5m])",
"legendFormat": "{{method}} {{route}}"
}
]
},
{
"title": "Response Time",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(jaf_http_request_duration_seconds_bucket[5m]))",
"legendFormat": "95th percentile"
},
{
"expr": "histogram_quantile(0.50, rate(jaf_http_request_duration_seconds_bucket[5m]))",
"legendFormat": "50th percentile"
}
]
},
{
"title": "Memory Provider Latency",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(jaf_memory_provider_duration_seconds_bucket[5m]))",
"legendFormat": "{{provider}} {{operation}}"
}
]
},
{
"title": "Conversation Count",
"type": "graph",
"targets": [
{
"expr": "rate(jaf_conversations_total[5m])",
"legendFormat": "{{agent}}"
}
]
}
]
}
}
Centralized Logging¶
# fluentd/fluent.conf
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<filter docker.**>
@type parser
key_name log
<parse>
@type json
time_key timestamp
time_format %Y-%m-%dT%H:%M:%S.%LZ
</parse>
</filter>
<match docker.jaf-app>
@type elasticsearch
host elasticsearch
port 9200
index_name jaf-logs
type_name _doc
<buffer>
@type file
path /var/log/fluentd-buffers/jaf.buffer
flush_mode interval
flush_interval 10s
</buffer>
</match>
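On the application side, Fastify logs through pino, which emits JSON by default; the main job is making the timestamp match the fluentd parser above. A minimal sketch (the field names are an assumption, not JAF defaults):
import Fastify from 'fastify';

const app = Fastify({
  logger: {
    level: process.env.LOG_LEVEL || 'info',
    // ISO timestamps under a "timestamp" key, matching fluentd's time_key/time_format
    timestamp: () => `,"timestamp":"${new Date().toISOString()}"`,
    formatters: {
      level: (label) => ({ level: label })
    }
  }
});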
Security Considerations¶
Application Security¶
1. Input Validation
- All user inputs are validated using Zod schemas
- Tool parameter validation is enforced
- SQL injection prevention through parameterized queries
2. Authentication & Authorization
// Example JWT middleware (assumes the jsonwebtoken package)
import jwt from 'jsonwebtoken';

app.addHook('preHandler', async (request, reply) => {
  if (request.url.startsWith('/api/')) {
    const token = request.headers.authorization?.replace('Bearer ', '');
    if (!token) {
      return reply.code(401).send({ error: 'Authentication required' });
    }
    try {
      const decoded = jwt.verify(token, process.env.JWT_SECRET);
      request.user = decoded;
    } catch (error) {
      return reply.code(401).send({ error: 'Invalid token' });
    }
  }
});
3. Rate Limiting
import rateLimit from '@fastify/rate-limit';

await app.register(rateLimit, {
  max: 100,
  timeWindow: '1 minute',
  errorResponseBuilder: function (request, context) {
    return {
      code: 429,
      error: 'Rate limit exceeded',
      message: `Rate limit exceeded, retry in ${Math.round(context.ttl / 1000)} seconds`,
      retryAfter: context.ttl
    };
  }
});
Infrastructure Security¶
1. Network Security
- Use private networks for database connections
- Implement firewall rules
- Enable VPC/security groups in cloud environments
2. Database Security
-- PostgreSQL security hardening
ALTER SYSTEM SET ssl = on;
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 1000;

-- Create limited user for application
CREATE USER jaf_app WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE jaf_memory TO jaf_app;
GRANT USAGE ON SCHEMA public TO jaf_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON conversations TO jaf_app;
3. Container Security
- Run containers as a non-root user (see the Dockerfile above)
- Use a read-only root filesystem and drop all capabilities (see the Kubernetes securityContext)
- Keep base images patched and scan them in CI
Secrets Management¶
# Kubernetes secrets
apiVersion: v1
kind: Secret
metadata:
name: jaf-secrets
type: Opaque
stringData:
LITELLM_API_KEY: "your-api-key"
JAF_POSTGRES_PASSWORD: "secure-database-password"
JAF_REDIS_PASSWORD: "secure-redis-password"
JWT_SECRET: "your-jwt-secret"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: jaf-external-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-secret-store
kind: SecretStore
target:
name: jaf-secrets
creationPolicy: Owner
data:
- secretKey: LITELLM_API_KEY
remoteRef:
key: secret/jaf
property: litellm_api_key
CI/CD Pipeline¶
GitHub Actions¶
# .github/workflows/deploy.yml
name: Build and Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: test
POSTGRES_DB: jaf_test
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
redis:
image: redis:7
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 6379:6379
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '18'
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
env:
JAF_POSTGRES_HOST: localhost
JAF_POSTGRES_DB: jaf_test
JAF_POSTGRES_USER: postgres
JAF_POSTGRES_PASSWORD: test
JAF_REDIS_HOST: localhost
- name: Type check
run: npm run typecheck
- name: Lint
run: npm run lint
build:
needs: test
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: |
ghcr.io/${{ github.repository }}:latest
ghcr.io/${{ github.repository }}:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy:
needs: build
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
environment: production
steps:
- uses: actions/checkout@v4
- name: Deploy to Kubernetes
uses: azure/k8s-deploy@v1
with:
manifests: |
k8s/deployment.yaml
k8s/service.yaml
k8s/ingress.yaml
images: |
ghcr.io/${{ github.repository }}:${{ github.sha }}
kubectl-version: 'latest'
Helm Chart¶
# Chart.yaml
apiVersion: v2
name: jaf-app
description: Juspay Agent Framework Application
version: 0.1.0
appVersion: "1.0"
# values.yaml
replicaCount: 3
image:
repository: ghcr.io/your-org/jaf-app
pullPolicy: IfNotPresent
tag: ""
service:
type: ClusterIP
port: 80
targetPort: 3000
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: jaf.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: jaf-tls
hosts:
- jaf.example.com
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 250m
memory: 256Mi
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
postgresql:
enabled: true
auth:
postgresPassword: secure-password
database: jaf_memory
redis:
enabled: true
auth:
enabled: true
password: secure-redis-password
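Installing or upgrading the release (release name, namespace, and chart path are illustrative):
helm upgrade --install jaf-app ./charts/jaf-app \
  --namespace jaf --create-namespace \
  --set image.tag=v1.0.0
helm status jaf-app -n jaf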
Performance Optimization¶
Application Optimization¶
1. Memory Management
// Limit the old-space heap; NODE_OPTIONS must be set in the environment
// before the process starts (setting process.env at runtime has no effect):
// NODE_OPTIONS=--max-old-space-size=512

// Implement conversation cleanup
async function cleanupOldConversations() {
  if (memoryProvider.cleanupOldConversations) {
    const result = await memoryProvider.cleanupOldConversations(30); // 30 days
    console.log(`Cleaned up ${result.data} old conversations`);
  }
}

// Schedule cleanup to run daily
setInterval(cleanupOldConversations, 24 * 60 * 60 * 1000);
2. Connection Pooling
// PostgreSQL connection pooling
const { Pool } = require('pg');

const pool = new Pool({
  host: process.env.JAF_POSTGRES_HOST,
  port: process.env.JAF_POSTGRES_PORT,
  database: process.env.JAF_POSTGRES_DB,
  user: process.env.JAF_POSTGRES_USER,
  password: process.env.JAF_POSTGRES_PASSWORD,
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
3. Response Caching
import { fastifyRedis } from '@fastify/redis';

await app.register(fastifyRedis, {
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT
});

// Cache conversation responses
app.addHook('onSend', async (request, reply, payload) => {
  if (request.url.includes('/conversations/')) {
    const cacheKey = `cache:${request.url}`;
    await app.redis.setex(cacheKey, 300, payload); // 5 minutes
  }
});
Database Optimization¶
1. PostgreSQL Tuning
-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM conversations
WHERE user_id = 'user123'
ORDER BY last_activity DESC
LIMIT 10;

-- Create covering indexes without blocking writes
CREATE INDEX CONCURRENTLY idx_conversations_user_activity
ON conversations (user_id, last_activity DESC)
INCLUDE (conversation_id, metadata);

-- Partition large tables (requires the parent table to be created with PARTITION BY RANGE)
CREATE TABLE conversations_y2024m01 PARTITION OF conversations
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
2. Redis Optimization
- Set maxmemory and an eviction policy suited to the workload (see redis.conf above)
- Use key TTLs (JAF_REDIS_TTL) so stale conversations expire instead of accumulating
- Watch memory and slow commands with INFO memory and SLOWLOG GET
Monitoring Performance¶
// Performance monitoring middleware
app.addHook('onRequest', async (request, reply) => {
request.startTime = process.hrtime.bigint();
});
app.addHook('onResponse', async (request, reply) => {
const duration = Number(process.hrtime.bigint() - request.startTime) / 1e6;
// Log slow requests
if (duration > 1000) {
console.warn(`Slow request: ${request.method} ${request.url} took ${duration}ms`);
}
// Add performance headers
reply.header('X-Response-Time', `${duration}ms`);
});
Troubleshooting¶
Common Issues¶
1. Memory Provider Connection Issues¶
Symptoms:
- "Failed to connect to PostgreSQL/Redis memory provider"
- Health check failures
- Connection timeouts
Solutions:
# Check connectivity
docker exec jaf-app nc -zv postgres 5432
docker exec jaf-app nc -zv redis 6379
# Verify credentials
docker exec postgres psql -U jaf_user -d jaf_memory -c "SELECT 1;"
# Check Redis auth (use the password configured for the instance)
docker exec redis redis-cli -a "$JAF_REDIS_PASSWORD" ping
2. High Memory Usage¶
Symptoms:
- Container OOM kills
- Performance degradation
- Memory leaks
Solutions:
// Monitor memory usage
setInterval(() => {
const usage = process.memoryUsage();
console.log(`Memory usage: RSS=${Math.round(usage.rss / 1024 / 1024)}MB, Heap=${Math.round(usage.heapUsed / 1024 / 1024)}MB`);
if (usage.rss > 400 * 1024 * 1024) { // 400MB threshold
console.warn('High memory usage detected');
}
}, 30000);
// Force garbage collection (global.gc is only available when node runs with --expose-gc)
if (global.gc) {
setInterval(() => {
global.gc();
}, 60000);
}
3. LLM Provider Issues¶
Symptoms:
- Chat endpoints returning 500 errors
- "Model provider not available"
- Timeout errors
Solutions:
# Test LiteLLM connectivity
curl -H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"test"}]}' \
http://litellm:4000/v1/chat/completions
# Check model availability
curl http://litellm:4000/v1/models
4. Database Performance Issues¶
Symptoms:
- Slow response times
- Connection pool exhaustion
- Lock timeouts
Solutions:
-- Check active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Identify slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Inspect column statistics for the conversations table
SELECT schemaname,tablename,attname,n_distinct,correlation
FROM pg_stats
WHERE tablename = 'conversations';
Health Check Endpoints¶
// Comprehensive health check
app.get('/health/detailed', async (request, reply) => {
const health = {
status: 'healthy',
timestamp: new Date().toISOString(),
checks: {
database: { status: 'unknown' },
memory: { status: 'unknown' },
llm: { status: 'unknown' }
}
};
// Check database
try {
const dbResult = await memoryProvider.healthCheck();
health.checks.database = {
status: dbResult.data?.healthy ? 'healthy' : 'unhealthy',
latency: dbResult.data?.latencyMs
};
} catch (error) {
health.checks.database = { status: 'error', error: error.message };
}
// Check memory usage
const memUsage = process.memoryUsage();
health.checks.memory = {
status: memUsage.rss < 400 * 1024 * 1024 ? 'healthy' : 'warning',
rss: Math.round(memUsage.rss / 1024 / 1024),
heap: Math.round(memUsage.heapUsed / 1024 / 1024)
};
// Overall status
const allHealthy = Object.values(health.checks).every(check =>
check.status === 'healthy' || check.status === 'warning'
);
if (!allHealthy) {
health.status = 'unhealthy';
return reply.code(503).send(health);
}
return reply.send(health);
});
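Probing the endpoint from a shell (jq is optional, for readable output):
curl -s http://localhost:3000/health/detailed | jq .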
Log Analysis¶
# Common log queries
# High error rate
kubectl logs -l app=jaf-app | grep "ERROR" | tail -50
# Slow requests
kubectl logs -l app=jaf-app | grep "Slow request" | tail -20
# Memory provider issues
kubectl logs -l app=jaf-app | grep "MEMORY:" | tail -30
# Connection issues
kubectl logs -l app=jaf-app | grep -E "(ECONNREFUSED|timeout|connection)" | tail -20
Performance Debugging¶
// Add request tracing
import { v4 as uuidv4 } from 'uuid';

app.addHook('onRequest', async (request, reply) => {
request.trace = {
id: uuidv4(),
start: Date.now(),
path: request.url,
method: request.method
};
console.log(`[${request.trace.id}] Started ${request.method} ${request.url}`);
});
app.addHook('onResponse', async (request, reply) => {
const duration = Date.now() - request.trace.start;
console.log(`[${request.trace.id}] Completed in ${duration}ms with status ${reply.statusCode}`);
});
// Database query timing
const originalQuery = pool.query;
pool.query = function(...args) {
const start = Date.now();
return originalQuery.apply(this, args).finally(() => {
const duration = Date.now() - start;
if (duration > 100) {
console.warn(`Slow query took ${duration}ms:`, args[0]);
}
});
};
This guide covers deploying JAF applications to production, from basic containerization through Kubernetes deployments, load balancing, monitoring, security hardening, and troubleshooting.
For additional support or specific deployment scenarios, refer to the individual component documentation and consider the specific requirements of your infrastructure and compliance needs.