Overview
The Multi-Model feature provides access to 55+ AI models across 7 major providers. This model library spans a range of model sizes, capabilities, and price points, enabling organisations to select the optimal model for each use case while keeping a single, consistent API.
Key Capabilities
55+ Production Models
From lightweight to state-of-the-art models across all AI providers.
Automatic Model Validation
Built-in controls ensure that only supported models are used, maintaining consistent AI output quality.
Model-Specific Optimisations
Automatic parameter adjustments based on model capabilities to ensure optimal performance.
Transparent Pricing
Real-time cost calculation and monitoring for all supported models in AI Gateway.
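Because every model is exposed through the same gateway interface, switching between providers is a one-line change in the request. The sketch below illustrates this, assuming the gateway exposes an OpenAI-compatible chat completions endpoint; the base URL, the AI_GATEWAY_* environment variables, and the ask helper are placeholders rather than part of the product's documented API.

```python
import os
from openai import OpenAI  # any OpenAI-compatible client can target the gateway

# Hypothetical gateway endpoint and credential; replace with your deployment's values.
client = OpenAI(
    base_url=os.environ.get("AI_GATEWAY_URL", "https://gateway.example.com/v1"),
    api_key=os.environ["AI_GATEWAY_API_KEY"],
)

def ask(model: str, prompt: str) -> str:
    """Send the same chat request to any supported model; only the model name changes."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The identical call shape works across providers; the gateway validates the model
# name and applies model-specific parameter adjustments before forwarding the request.
for model in ["gpt-4o-mini-2024-07-18", "claude-3-5-haiku-20241022", "gemini-2.5-flash"]:
    print(model, "->", ask(model, "Summarise the benefits of multi-model routing."))
```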
Business Benefits
1. Best of Breed Model Selection
- Task-Optimised Performance: Choose the ideal model for each specific use case, e.g. GPT-4o for complex reasoning, Claude for long-context analysis, Gemini for multimodal tasks.
- Cost-Performance Optimisation: Select cost-effective models for simple tasks (e.g. GPT-4o Mini, Claude Haiku) and premium models for complex operations.
- Competitive Advantage: Leverage the unique capabilities of different models to outperform competitors that rely on a single-model approach.
- Innovation Velocity: Access new models as soon as they are released, without infrastructure changes.
2. Risk Mitigation & Reliability
- Model Diversification: Avoid dependency on a single model's availability, performance, or pricing.
- Automatic Failover: Seamlessly switch to alternative models during outages or degraded performance.
- Compliance Flexibility: Use region-specific or compliance-certified models (Azure AI, AWS Bedrock, Google AI) for regulated workloads.
- Quality Assurance: A/B test different models to ensure consistent quality across providers.
3. Cost Management & Optimisation
- Dynamic Cost Control: Route requests to cheaper models based on complexity and budget constraints.
- Volume Discounts: Leverage pricing tiers across multiple providers simultaneously.
- Budget Allocation: Set model-specific budgets and automatically switch models when limits are reached.
- ROI Maximisation: Use premium models only where their advanced capabilities justify the cost.
4. Enterprise Scalability
- Load Distribution: Distribute high-volume workloads across multiple models to avoid rate limits.
- Geographic Optimisation: Use region-specific models for lower latency and data-residency compliance.
- Capacity Management: Access the combined capacity of all providers during peak demand.
- Performance Benchmarking: Compare model performance in production with real workloads.
Supported Models by Provider
OpenAI Models (20 Models)
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
GPT-5 Chat (gpt-5-chat-latest) | Snapshot used in ChatGPT. Recommended for testing latest improvements in chat use cases. | Latest | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
GPT-5 (gpt-5-2025-08-07) | Flagship model for coding, reasoning, and agentic tasks across domains. | 2025-08-07 | Text, Image | 400,000 | 128,000 | Sep 30, 2024 |
GPT-5 Mini (gpt-5-mini-2025-08-07) | Faster, more cost-efficient GPT-5 variant for well-defined tasks and precise prompts. | 2025-08-07 | Text, Image | 400,000 | 128,000 | May 31, 2024 |
GPT-5 Nano (gpt-5-nano-2025-08-07) | Cheapest, fastest GPT-5 variant. Ideal for summarization and classification. | 2025-08-07 | Text, Image | 400,000 | 128,000 | May 31, 2024 |
GPT-4.1 (gpt-4.1-2025-04-14) | Excels at instruction following and tool use. Supports 1M token context with low latency. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
GPT-4.1 Mini (gpt-4.1-mini-2025-04-14) | Smaller, faster GPT-4.1 variant. Maintains broad capabilities with 1M token context. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
GPT-4.1 Nano (gpt-4.1-nano-2025-04-14) | Ultra-light GPT-4.1 variant for efficiency with 1M token context. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
GPT-4 Preview (gpt-4-0125-preview) | Research preview of GPT-4 Turbo, an older high-intelligence model. | 2024-01-25 | Text | 128,000 | 4,096 | Dec 01, 2023 |
GPT-4 Legacy (gpt-4-0613) | Older GPT-4 model, still available for compatibility. | 2023-06-13 | Text | 8,192 | 8,192 | Dec 01, 2023 |
GPT-4 Turbo (gpt-4-turbo-2024-04-09) | Cheaper, faster variant of GPT-4. Superseded by GPT-4o. | 2024-04-09 | Text, Image | 128,000 | 4,096 | Dec 01, 2023 |
GPT-4o (gpt-4o-2024-05-13) | Versatile, high-intelligence flagship model. Multimodal (text + image). | 2024-05-13 | Text, Image | 128,000 | 4,096 | Oct 01, 2023 |
GPT-4o (gpt-4o-2024-08-06) | Updated GPT-4o snapshot. | 2024-08-06 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
GPT-4o (gpt-4o-2024-11-20) | Updated GPT-4o snapshot. | 2024-11-20 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
GPT-4o Latest (chatgpt-4o-latest) | Points to the GPT-4o snapshot used in ChatGPT. | Rolling | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
GPT-4o Mini (gpt-4o-mini-2024-07-18) | Lightweight GPT-4o variant. Fast, affordable, and fine-tuning friendly. | 2024-07-18 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
O1 (o1-2024-12-17) | RL-trained reasoning model. Thinks step-by-step before answering. | 2024-12-17 | Text, Image | 200,000 | 100,000 | Oct 01, 2023 |
O3 (o3-2025-04-16) | High-performance reasoning model for math, science, coding, and multimodal analysis. | 2025-04-16 | Text, Image | 200,000 | 100,000 | Jun 01, 2024 |
O3 Mini (o3-mini-2025-01-31) | Small reasoning model. Supports structured outputs, function calling, and batch API. | 2025-01-31 | Text | 200,000 | 100,000 | Oct 01, 2023 |
O4 Mini (o4-mini-2025-04-16) | Latest small o-series model. Optimized for fast reasoning, coding, and visual tasks. | 2025-04-16 | Text, Image | 200,000 | 100,000 | Jun 01, 2024 |
GPT-3.5 Turbo (gpt-3.5-turbo-0125) | Legacy GPT-3.5 model. Still supported, but GPT-4o Mini is recommended instead. | 2024-01-25 | Text | 16,385 | 4,096 | Sep 01, 2021 |
Anthropic Claude Models (7 Models)
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Best model for complex agents and coding. | 2025-09-29 | Text, Image (Vision), Multilingual | 200K / 1M (beta) | 64,000 | Reliable: Jan 2025 · Training data: Jul 2025 |
Claude Sonnet 4 (claude-sonnet-4-20250514) | High-performance model. | 2025-05-14 | Text, Image (Vision), Multilingual | 200K / 1M (beta) | 64,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219, alias: claude-3-7-sonnet-latest) | High-performance model with early extended thinking. | 2025-02-19 | Text, Image (Vision), Multilingual | 200K | 64,000 | Reliable: Oct 2024 · Training data: Nov 2024 |
Claude Opus 4.1 (claude-opus-4-1-20250805) | Exceptional model for specialized complex tasks. | 2025-08-05 | Text, Image (Vision), Multilingual | 200K | 32,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
Claude Opus 4 (claude-opus-4-20250514) | Previous flagship model. | 2025-05-14 | Text, Image (Vision), Multilingual | 200K | 32,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
Claude Haiku 3.5 (claude-3-5-haiku-20241022, alias: claude-3-5-haiku-latest) | Fastest Claude model. | 2024-10-22 | Text, Image (Vision), Multilingual | 200K | 8,192 | Reliable: Jul 2024 · Training data: Jul 2024 |
Claude Haiku 3 (claude-3-haiku-20240307) | Compact model for near-instant responsiveness. | 2024-03-07 | Text, Image (Vision), Multilingual | 200K | 4,096 | Reliable: 2023 · Training data: Aug 2023 |
Amazon Bedrock Models (24 Models)
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
Claude Sonnet 4.5 (anthropic.claude-sonnet-4-5-20250929-v1:0) | Latest Claude Sonnet reasoning/chat model. | 2025-09-29 | Text, Image | 200K | – | – |
Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0) | Advanced Claude Sonnet v4. | 2025-05-14 | Text, Image | 200K | – | – |
Claude Sonnet 3.7 (anthropic.claude-3-7-sonnet-20250219-v1:0) | Claude 3.7 Sonnet generation model. | 2025-02-19 | Text, Image | 200K | – | – |
Claude Sonnet 3.5 v2 (anthropic.claude-3-5-sonnet-20241022-v2:0) | Updated Claude 3.5 Sonnet. | 2024-10-22 | Text, Image | 200K | – | – |
Claude Sonnet 3.5 (anthropic.claude-3-5-sonnet-20240620-v1:0) | Standard Claude 3.5 Sonnet. | 2024-06-20 | Text, Image | 200K | – | – |
Claude Haiku 3 (anthropic.claude-3-haiku-20240307-v1:0) | Lightweight Claude model optimized for speed/cost. | 2024-03-07 | Text | 48K | – | – |
Claude Sonnet 3 (anthropic.claude-3-sonnet-20240229-v1:0) | Claude 3 Sonnet general-purpose model. | 2024-02-29 | Text, Image | 28K | – | – |
Nova Lite (amazon.nova-lite-v1:0) | Amazon Nova lightweight model. | 2025 | Text | 300K | – | – |
Nova Micro (amazon.nova-micro-v1:0) | Amazon Nova smallest variant. | 2025 | Text | 128K | – | – |
Nova Pro (amazon.nova-pro-v1:0) | Amazon Nova flagship model. | 2025 | Text | 300K | – | – |
Titan Text G1 – Express (amazon.titan-text-express-v1) | Balanced Titan LLM for text generation. | 2023 | Text | 8K | – | – |
Titan Text G1 – Lite (amazon.titan-text-lite-v1) | Lightweight Titan model. | 2023 | Text | 4K | – | – |
IBM Granite 3.2 Instruct 8B (ibm-granite-3-2-8b-instruct) | General-purpose instruct model. | 2025 | Text | – | – | – |
IBM Granite 3.0 Instruct 8B (granite-3-0-8b-instruct) | Earlier instruct model (8B params). | 2024 | Text | – | – | – |
IBM Granite 20B Code Instruct (ibm-granite-20b-code-instruct-8k) | Code-focused model (20B params). | 2024 | Text (Code) | 8K | – | – |
IBM Granite 8B Code Instruct (ibm-granite-8b-code-instruct-128k) | Code instruct model with extended context. | 2024 | Text (Code) | 128K | – | – |
IBM Granite 34B Code Instruct (ibm-granite-34b-code-instruct-8k) | Large code instruct model (34B params). | 2024 | Text (Code) | 8K | – | – |
Llama 3 8B Instruct (meta.llama3-8b-instruct-v1:0) | Meta Llama 3 instruct-tuned model. | 2024 | Text | 8K | – | – |
Llama 3 70B Instruct (meta.llama3-70b-instruct-v1:0) | Larger Meta Llama 3 instruct model. | 2024 | Text | 8K | – | – |
DeepSeek-R1 (deepseek-llm-r1) | DeepSeek reasoning-focused model. | 2025 | Text | – | – | –
DeepSeek V3.1 (deepseek.v3-v1:0) | Latest DeepSeek v3.1 model. | 2025 | Text | 163,840 | – | – |
Mistral 7B Instruct (mistral.mistral-7b-instruct-v0:2) | Instruction-tuned Mistral 7B. | 2024-03-01 | Text, Code, Classification | 32K | – | – |
Mistral Large 24.02 (mistral.mistral-large-2402-v1:0) | Large Mistral model for reasoning, text, code, RAG, and agents. | 2024-04-02 | Text, Code, RAG, Agents | 32K | – | – |
Mixtral 8x7B Instruct (mistral.mixtral-8x7b-instruct-v0:1) | Mixture-of-experts instruct model. | 2024-03-01 | Text, Code, Reasoning | 32K | – | – |
Azure OpenAI Models
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
GPT-5 (gpt-5-2025-08-07) | Flagship GPT-5 with reasoning, structured outputs, text + image processing, functions & tools. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | Sep 30, 2024 |
GPT-5 Mini (gpt-5-mini-2025-08-07) | Smaller, faster GPT-5 variant. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | May 31, 2024 |
GPT-5 Nano (gpt-5-nano-2025-08-07) | Optimized GPT-5 variant with smaller footprint. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | May 31, 2024 |
GPT-5 Chat Preview (gpt-5-chat-2025-08-07) | Chat-optimized GPT-5 (preview). | 2025-08-07 | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
GPT-5 Chat Preview (gpt-5-chat-2025-10-03) | Updated chat-optimized GPT-5 (preview). | 2025-10-03 | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
GPT-5 Codex (gpt-5-codex-2025-09-11) | GPT-5 optimized for coding and structured outputs. | 2025-09-11 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | – |
GPT-5 Pro (gpt-5-pro-2025-10-06) | GPT-5 Pro with advanced reasoning, structured outputs, functions & tools. | 2025-10-06 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | Sep 30, 2024 |
GPT-OSS 120B (gpt-oss-120b) Preview | Open-source style reasoning model. | 2025 | Text | 131,072 | 131,072 | May 31, 2024 |
GPT-OSS 20B (gpt-oss-20b) Preview | Smaller GPT-OSS variant. | 2025 | Text | 131,072 | 131,072 | May 31, 2024 |
GPT-4.1 (gpt-4.1-2025-04-14) | Multimodal model with streaming, function calling, and structured outputs. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
GPT-4.1 Nano (gpt-4.1-nano-2025-04-14) | Lightweight GPT-4.1 variant. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
GPT-4.1 Mini (gpt-4.1-mini-2025-04-14) | Smaller GPT-4.1 variant. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
Codex Mini (codex-mini-2025-05-16) | Fine-tuned o4-mini optimized for code. | 2025-05-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
O3 Pro (o3-pro-2025-06-10) | Advanced reasoning model with enhanced capabilities. | 2025-06-10 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
O4 Mini (o4-mini-2025-04-16) | Reasoning model with efficient performance. | 2025-04-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
O3 (o3-2025-04-16) | Reasoning model with tool use. | 2025-04-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
O3 Mini (o3-mini-2025-01-31) | Text-only reasoning model. | 2025-01-31 | Text | 200K in / 100K out | 100,000 | Oct 2023 |
O1 (o1-2024-12-17) | Reasoning model with structured outputs. | 2024-12-17 | Text, Image | 200K in / 100K out | 100,000 | Oct 2023 |
O1 Preview (o1-preview-2024-09-12) | Early preview release of O1. | 2024-09-12 | Text | 128K in / 32,768 out | 32,768 | Oct 2023 |
O1 Mini (o1-mini-2024-09-12) | Cost-efficient O1 variant. | 2024-09-12 | Text | 128K in / 65,536 out | 65,536 | Oct 2023 |
GPT-4o (gpt-4o-2024-11-20) | Multimodal GPT-4o with JSON mode, function calling, and strong vision support. | 2024-11-20 | Text, Image | 128,000 | 16,384 | Oct 2023 |
GPT-4o (gpt-4o-2024-08-06) | Updated GPT-4o release. | 2024-08-06 | Text, Image | 128,000 | 16,384 | Oct 2023 |
GPT-4o (gpt-4o-2024-05-13) | Early GPT-4o release (Turbo Vision parity). | 2024-05-13 | Text, Image | 128,000 | 4,096 | Oct 2023 |
GPT-4o Mini (gpt-4o-mini-2024-07-18) | Smaller, fast GPT-4o variant. | 2024-07-18 | Text, Image | 128,000 | 16,384 | Oct 2023 |
GPT-4 Turbo (gpt-4-turbo-2024-04-09) | Multimodal GPT-4 Turbo, successor to preview models. | 2024-04-09 | Text, Image | 128,000 | 4,096 | Dec 2023 |
GPT-3.5 Turbo (gpt-35-turbo-0125) | JSON mode, function calling, reproducible outputs. | 2024-01-25 | Text | 16,385 in / 4,096 out | 4,096 | Sep 2021 |
GPT-3.5 Turbo (gpt-35-turbo-1106) | Earlier GPT-3.5 Turbo variant. | 2023-11-06 | Text | 16,385 in / 4,096 out | 4,096 | Sep 2021 |
GPT-3.5 Turbo Instruct (gpt-35-turbo-instruct-0914) | Replacement for legacy Completions models. | 2023-09-14 | Text | 4,097 | 4,097 | Sep 2021 |
Azure AI Inference Models
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
AI21 Jamba 1.5 Mini (AI21-Jamba-1.5-Mini) | Tool calling: Yes; supports text, JSON, structured outputs. | – | Text | 262,144 | 4,096 | – |
AI21 Jamba 1.5 Large (AI21-Jamba-1.5-Large) | Tool calling: Yes; supports text, JSON, structured outputs. | – | Text | 262,144 | 4,096 | – |
O3 Mini (o3-mini) | OpenAI O-series; tool calling: Yes; structured outputs. | – | Text, Image | 200,000 | 100,000 | – |
O1 (o1) | OpenAI O-series; tool calling: Yes; structured outputs. | – | Text, Image | 200,000 | 100,000 | – |
O1 Preview (o1-preview) | Early O1 preview; tool calling: Yes. | – | Text | 128,000 | 32,768 | – |
O1 Mini (o1-mini) | Cost-efficient O1 variant; tool calling: No. | – | Text | 128,000 | 65,536 | – |
GPT-4o (gpt-4o) | Multimodal GPT-4o; tool calling: Yes; supports structured outputs. | – | Text, Image, Audio | 131,072 | 16,384 | – |
GPT-4o Mini (gpt-4o-mini) | Smaller GPT-4o variant; tool calling: Yes. | – | Text, Image, Audio | 131,072 | 16,384 | – |
Cohere Command A (Cohere-command-A) | Cohere instruct model; tool calling: Yes. | – | Text | 256,000 | 8,000 | – |
Cohere Command R+ (Cohere-command-r-plus-08-2024) | Optimized for reasoning and retrieval; tool calling: Yes. | 2024-08 | Text | 131,072 | 4,096 | – |
Cohere Command R (Cohere-command-r-08-2024) | Earlier R-series model; tool calling: Yes. | 2024-08 | Text | 131,072 | 4,096 | – |
JAIS 30B (jais-30b-chat) | Multilingual model; tool calling: Yes. | – | Text | 8,192 | 4,096 | – |
DeepSeek V3 (DeepSeek-V3-0324) | Latest DeepSeek v3; tool calling: No. | 2025-03 | Text | 131,072 | 131,072 | –
DeepSeek V3 (Legacy) (DeepSeek-V3-Legacy) | Earlier DeepSeek v3. | – | Text | 131,072 | 131,072 | – |
DeepSeek R1 (DeepSeek-R1) | Reasoning-focused model. | – | Text | 163,840 | 163,840 | – |
Llama 4 Scout (Llama-4-Scout-17B-16E-Instruct) | Meta Llama 4 variant; tool calling: Yes. | – | Text, Image | 128,000 | 8,192 | – |
Llama 4 Maverick (Llama-4-Maverick-17B-128E-Instruct-FP8) | Meta Llama 4 Maverick; tool calling: Yes. | – | Text, Image | 128,000 | 8,192 | – |
Llama 3.3 70B (Llama-3.3-70B-Instruct) | Meta Llama 3.3 large model. | – | Text | 128,000 | 8,192 | – |
Llama 3.2 Vision (Llama-3.2-90B-Vision-Instruct) | Meta Llama 3.2 multimodal vision model. | – | Text, Image | 128,000 | 8,192 | – |
Llama 3.2 Vision (Llama-3.2-11B-Vision-Instruct) | Smaller Meta Llama 3.2 vision variant. | – | Text, Image | 128,000 | 8,192 | – |
Llama 3.1 8B (Meta-Llama-3.1-8B-Instruct) | Meta Llama 3.1 instruct variant. | – | Text | 131,072 | 8,192 | – |
Llama 3.1 405B (Meta-Llama-3.1-405B-Instruct) | Largest Meta Llama 3.1 instruct variant. | – | Text | 131,072 | 8,192 | – |
MAI DS R1 (MAI-DS-R1) | Reasoning model. | – | Text | 163,840 | 163,840 | – |
Phi-4 (Phi-4) | Microsoft Phi-4 general-purpose. | – | Text | 16,384 | 16,384 | – |
Phi-4 Mini (Phi-4-mini-instruct) | Small Phi-4 variant. | – | Text | 131,072 | 4,096 | – |
Phi-4 Multimodal (Phi-4-multimodal-instruct) | Multimodal Phi-4 (text, image, audio). | – | Text, Image, Audio | 131,072 | 4,096 | – |
Phi-4 Reasoning (Phi-4-reasoning) | Phi-4 reasoning-focused model. | – | Text | 32,768 | 32,768 | – |
Phi-4 Mini Reasoning (Phi-4-mini-reasoning) | Lightweight reasoning variant. | – | Text | 128,000 | 128,000 | – |
Phi-3.5 Mini (Phi-3.5-mini-instruct) | Phi-3.5 small instruct model. | – | Text | 131,072 | 4,096 | – |
Phi-3.5 MoE (Phi-3.5-MoE-instruct) | Phi-3.5 mixture-of-experts variant. | – | Text | 131,072 | 4,096 | – |
Phi-3.5 Vision (Phi-3.5-vision-instruct) | Phi-3.5 multimodal variant. | – | Text, Image | 131,072 | 4,096 | – |
Phi-3 Mini 128K (Phi-3-mini-128k-instruct) | Compact Phi-3 variant with 128K context. | – | Text | 131,072 | 4,096 | – |
Phi-3 Mini 4K (Phi-3-mini-4k-instruct) | Compact Phi-3 with 4K context. | – | Text | 4,096 | 4,096 | – |
Phi-3 Small 128K (Phi-3-small-128k-instruct) | Small Phi-3 with 128K context. | – | Text | 131,072 | 4,096 | – |
Phi-3 Small 8K (Phi-3-small-8k-instruct) | Small Phi-3 with 8K context. | – | Text | 131,072 | 4,096 | – |
Phi-3 Medium 128K (Phi-3-medium-128k-instruct) | Medium Phi-3 with 128K context. | – | Text | 131,072 | 4,096 | – |
Phi-3 Medium 4K (Phi-3-medium-4k-instruct) | Medium Phi-3 with 4K context. | – | Text | 4,096 | 4,096 | – |
Codestral 2501 (Codestral-2501) | Mistral Codestral code-focused model. | – | Text | 262,144 | 4,096 | – |
Ministral 3B (Ministral-3B) | Lightweight Mistral model; tool calling: Yes. | – | Text | 131,072 | 4,096 | – |
Mistral Nemo (Mistral-Nemo) | Mistral Nemo model; tool calling: Yes. | – | Text | 131,072 | 4,096 | – |
Mistral Large 24.11 (Mistral-Large-2411) | Latest Mistral large model; tool calling: Yes. | – | Text | 128,000 | 4,096 | – |
Mistral Medium 25.05 (Mistral-medium-2505) | Balanced medium model; tool calling: No. | – | Text, Image | 128,000 | 128,000 | – |
Mistral Small 25.03 (Mistral-small-2503) | Newer small Mistral; tool calling: Yes. | – | Text, Image | 131,072 | 4,096 | – |
Mistral Small (Mistral-small) | Earlier small Mistral variant. | – | Text | 32,768 | 4,096 | – |
Tsuzumi 7B (tsuzumi-7b) | Lightweight Tsuzumi 7B model. | – | Text | 8,192 | 8,192 | – |
Google AI Models (7 Models)
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
Gemini 2.5 Pro (gemini-2.5-pro) | Most advanced model for complex reasoning and multimodal tasks. | 2025 | Text, Image, Audio, Video | – | 65,536 | Jan 2025 |
Gemini 2.5 Flash (gemini-2.5-flash) | Balanced model optimized for speed and general use. | 2025 | Text, Image, Audio, Video | – | 65,536 | Jan 2025 |
Gemini 2.5 Flash (Preview) (gemini-2.5-flash-preview-09-2025) | Preview release of Gemini 2.5 Flash. | 2025-09 | Text, Image, Audio, Video | – | 65,536 | Jan 2025 |
Gemini 2.5 Flash-Lite (gemini-2.5-flash-lite) | Lightweight, cost-efficient variant. | 2025 | Text, Image, Audio, Video | – | 65,536 | Jan 2025 |
Gemini 2.5 Flash-Lite (Preview) (gemini-2.5-flash-lite-preview-09-2025) | Preview release of Gemini 2.5 Flash-Lite. | 2025-09 | Text, Image, Audio, Video | – | 65,536 | Jan 2025 |
Gemini 2.0 Flash (gemini-2.0-flash) | Earlier generation Flash model. | 2024 | Text, Image, Audio, Video | – | 8,192 | Aug 2024 |
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Lightweight 2.0 Flash variant. | 2024 / 2025 | Text, Image, Audio | – | 8,192 | Aug 2024 |
Google Vertex AI Models
| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
Gemini 2.5 Flash (Preview) (gemini-2.5-flash) | Balanced model optimized for speed. | 2025 | Text, Image, Audio, Video | 1M | 65,536 | Jan 2025 |
Gemini 2.5 Pro (Preview) (gemini-2.5-pro) | Most advanced Gemini model. | 2025 | Text, Image, Audio, Video | 1M | 65,536 | Jan 2025 |
Gemini 2.0 Flash (gemini-2.0-flash) | Previous Flash generation. | 2024 | Text, Image, Audio, Video | – | 8,192 | Aug 2024 |
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Lightweight Flash variant. | 2024 | Text, Image, Audio | – | 8,192 | Aug 2024 |
Claude Opus 4.1 (claude-opus-4-1) | Exceptional reasoning model. | 2025 | Text, Image | 200K | 32,000 | Jan 2025 |
Claude Opus 4 (claude-opus-4) | Previous flagship Claude model. | 2025 | Text, Image | 200K | 32,000 | Jan 2025 |
Claude Sonnet 4.5 (claude-sonnet-4-5) | Best for complex agents and coding. | 2025 | Text, Image | 200K / 1M (beta) | 64,000 | Jan 2025 |
Claude Sonnet 4 (claude-sonnet-4) | High-performance Claude Sonnet model. | 2025 | Text, Image | 200K / 1M (beta) | 64,000 | Jan 2025 |
Claude 3.7 Sonnet (claude-3-7-sonnet) | High-performance with extended thinking. | 2025 | Text, Image | 200K | 64,000 | Oct 2024 |
Claude 3.5 Sonnet v2 (claude-3-5-sonnet-v2) | Updated Claude 3.5 Sonnet. | 2024 | Text, Image | 200K | 64,000 | 2024 |
Claude 3.5 Haiku (claude-3-5-haiku) | Fastest Claude model. | 2024 | Text, Image | 200K | 8,192 | Jul 2024 |
Claude 3 Haiku (claude-3-haiku) | Compact and fast Claude model. | 2024 | Text | 200K | 4,096 | Aug 2023 |
Claude 3.5 Sonnet (claude-3-5-sonnet) | Standard Claude 3.5 Sonnet. | 2024 | Text, Image | 200K | 64,000 | 2024 |
Jamba 1.5 Large (Preview) (jamba-1-5-large) | Advanced AI21 Jamba model. | 2025 | Text | – | – | – |
Jamba 1.5 Mini (Preview) (jamba-1-5-mini) | Smaller AI21 Jamba 1.5 variant. | 2025 | Text | – | – | – |
Mistral Medium 3 (mistral-medium-3) | Medium-sized Mistral model. | 2025 | Text | – | – | – |
Mistral Small 3.1 (mistral-small-3-1-25-03) | Smaller, faster Mistral. | 2025-03 | Text | – | – | – |
Mistral Large (mistral-large-24-11) | Large Mistral model. | 2024-11 | Text | – | – | – |
Mistral 7B (mistral-7b) | Base 7B model. | 2023 | Text | – | – | – |
Mixtral (mixtral) | Mixture-of-experts Mistral model. | 2024 | Text | – | – | – |
Llama 4 Maverick (llama-4-maverick-17b-128e) | Meta Llama 4 Maverick. | 2025 | Text | – | – | – |
Llama 4 Scout (llama-4-scout-17b-16e) | Meta Llama 4 Scout. | 2025 | Text | – | – | – |
Llama 4 (llama-4) | Core large Llama 4 model. | 2025 | Text | – | – | – |
Llama 3.3 (llama-3-3) | Successor to Llama 3.2. | 2025 | Text | – | – | – |
Llama 3.2 (Preview) (llama-3-2-preview) | Preview release of Llama 3.2. | 2024 | Text | – | – | – |
Llama 3.2 (llama-3-2) | Stable release of Llama 3.2. | 2024 | Text | – | – | – |
Llama 3.2 Vision (llama-3-2-vision) | Multimodal Llama 3.2. | 2024 | Text, Image | – | – | – |
Llama 3.1 (llama-3-1) | Part of Llama 3 family. | 2024 | Text | – | – | – |
Llama 3 (llama-3) | Base Llama 3 model. | 2023 | Text | – | – | – |
Qwen3-Next 80B Thinking (qwen3-next-80b-thinking) | Reasoning-focused Qwen3 variant. | 2025 | Text | – | – | – |
Qwen3-Next 80B Instruct (qwen3-next-80b-instruct) | Instruction-tuned Qwen3 variant. | 2025 | Text | – | – | – |
Qwen3 Coder (qwen3-coder) | Qwen3 code-focused model. | 2025 | Text (Code) | – | – | – |
Qwen3 235B (qwen3-235b) | Very large Qwen3 model. | 2025 | Text | – | – | – |
Qwen2 (qwen2) | Earlier Qwen release. | 2024 | Text | – | – | – |
DeepSeek V3.1 (deepseek-v3-1) | Advanced DeepSeek model. | 2025 | Text | – | – | – |
DeepSeek R1 (deepseek-r1-0528) | Reasoning-focused DeepSeek model. | 2025-05-28 | Text | – | – | – |
GPT-OSS 120B (gpt-oss-120b) | Open-weight GPT-OSS model. | 2025 | Text | – | – | – |
GPT-OSS 20B (gpt-oss-20b) | Smaller open-weight GPT-OSS model. | 2025 | Text | – | – | – |
Phi-3 (phi-3) | Microsoft Phi-3 model. | 2024 | Text | – | – | – |
Gemma 3n (gemma-3n) | Google Gemma series model. | 2025 | Text | – | – | – |
Gemma 3 (gemma-3) | Member of Gemma family. | 2025 | Text | – | – | – |
Gemma 2 (gemma-2) | Earlier Gemma generation. | 2024 | Text | – | – | – |
Gemma (gemma) | First Gemma release. | 2023 | Text | – | – | – |
Model Selection Guide
By Use Case
Complex Reasoning & Analysis
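Premium reasoning models such as GPT-5, o3, o1, Claude Opus 4.1, and Gemini 2.5 Pro suit multi-step analysis where accuracy justifies the higher cost and latency.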
High Volume Processing
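Fast, low-cost models such as GPT-5 Nano, GPT-4o Mini, Claude Haiku 3.5, and Gemini 2.5 Flash-Lite suit summarisation, classification, and other high-throughput workloads.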
Long Context Applications
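Models with very large context windows, such as GPT-4.1 (1M tokens), Claude Sonnet 4.5 (1M beta), and Gemini 2.5 Pro, suit large documents and long-running conversations.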
Multimodal Applications
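Natively multimodal models such as GPT-4o, GPT-5, the Gemini 2.5 family (text, image, audio, video), and vision-capable Claude Sonnet models suit workloads that combine text with images or other media. The example scripts below show how these selections can be automated.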
Dynamic Model Selection (Example Script)
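A minimal sketch of dynamic model selection, assuming the gateway client from the overview sketch; the MODEL_BY_USE_CASE mapping and select_model helper are illustrative names that mirror the use cases above, not a documented API.

```python
# Candidate models per use case, taken from the tables above (illustrative only).
MODEL_BY_USE_CASE = {
    "complex_reasoning": "gpt-5-2025-08-07",
    "high_volume": "gpt-4o-mini-2024-07-18",
    "long_context": "gpt-4.1-2025-04-14",
    "multimodal": "gemini-2.5-pro",
}

def select_model(use_case: str, prompt: str) -> str:
    """Pick a model from the mapping above, with a size-based override for huge inputs."""
    # Very long inputs go to a 1M-token context model regardless of the declared use case.
    if len(prompt) > 400_000:  # rough character heuristic, not a token count
        return MODEL_BY_USE_CASE["long_context"]
    return MODEL_BY_USE_CASE.get(use_case, "gpt-4o-2024-11-20")

# Example: a classification task is routed to the cheap, high-volume model.
print(select_model("high_volume", "Classify this support ticket as billing or technical."))
```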
Cost Optimised Model Routing (Example Script)
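A minimal sketch of cost-optimised routing, assuming spend is tracked against a budget elsewhere; the complexity heuristic, tier thresholds, and model choices are illustrative and should be tuned against current provider pricing.

```python
# Cost tiers derived from the comparison matrix below (illustrative; verify current pricing).
CHEAP_MODEL = "claude-3-5-haiku-20241022"
BALANCED_MODEL = "gpt-4o-mini-2024-07-18"
PREMIUM_MODEL = "gpt-5-2025-08-07"

def estimate_complexity(prompt: str) -> str:
    """Crude complexity heuristic; replace with a proper classifier in production."""
    if len(prompt) > 4_000 or "step by step" in prompt.lower():
        return "high"
    return "low"

def route_by_cost(prompt: str, spent_usd: float, budget_usd: float) -> str:
    """Use premium models for complex prompts until the budget cap is nearly reached."""
    if spent_usd >= 0.9 * budget_usd:
        return CHEAP_MODEL  # budget nearly exhausted: force the cheapest option
    if estimate_complexity(prompt) == "high":
        return PREMIUM_MODEL
    return BALANCED_MODEL

# Example: a simple request early in the billing period goes to the balanced tier.
print(route_by_cost("Classify this ticket as bug or feature.", spent_usd=12.0, budget_usd=100.0))
```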
A/B Testing Different Models
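A minimal sketch of an A/B test harness, assuming the ask helper from the overview sketch is passed in as the gateway call; the hash-based assignment and result fields are illustrative choices, not a prescribed format.

```python
import hashlib
import time
from typing import Callable

def assign_variant(user_id: str, model_a: str, model_b: str) -> str:
    """Deterministically split traffic 50/50 by hashing the user id."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
    return model_a if bucket == 0 else model_b

def run_ab_request(user_id: str, prompt: str, ask: Callable[[str, str], str]) -> dict:
    """Send the request to the assigned model and record latency for later comparison.

    `ask` is the gateway helper sketched in the overview section.
    """
    model = assign_variant(user_id, "gpt-4o-2024-11-20", "claude-sonnet-4-5-20250929")
    start = time.perf_counter()
    answer = ask(model, prompt)
    return {
        "user_id": user_id,
        "model": model,
        "latency_s": round(time.perf_counter() - start, 3),
        "answer": answer,
    }
```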
Model Comparison Matrix
| Provider | Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|---|
| OpenAI | gpt-5-2025-08-07 | 400K | Medium | Highest | Flagship coding, reasoning, agentic tasks |
| | gpt-5-mini-2025-08-07 | 400K | Fast | Medium | Cost-efficient GPT-5 for defined tasks |
| | gpt-5-nano-2025-08-07 | 400K | Fastest | Low | Fastest, cheapest GPT-5 variant |
| | gpt-4.1-2025-04-14 | 1M | Medium | High | Instruction following, tool use |
| | gpt-4o | 128K | Medium | High | Complex reasoning, multimodal |
| | gpt-4o-mini | 128K | Fast | Low | Simple tasks, high volume |
| | o3-2025-04-16 | 200K | Slow | Highest | Math, science, coding, multimodal analysis |
| | o1-2024-12-17 | 200K | Slow | High | Step-by-step reasoning |
| Anthropic | claude-3-5-sonnet | 200K | Fast | Medium | Coding, analysis |
| | claude-3-5-haiku | 200K | Fastest | Very Low | Real-time apps |
| | claude-3-opus | 200K | Slow | Very High | Complex research |
| Google | gemini-1.5-pro | 2M | Medium | Medium | Massive documents |
| | gemini-1.5-flash | 1M | Fast | Low | High-speed processing |
| Bedrock | llama3-1-70b | 8K | Medium | Low | Open-source needs |
| | titan-premier | 8K | Fast | Low | AWS integration |
| | mistral-large | 32K | Medium | Medium | European compliance |