AI Models

Use 300+ models from OpenAI, Anthropic, Google and others through a single API

346 models
30 free
58 providers
Inception: Mercury 2
inception/mercury-2
Reasoning
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...
Input
4 128/M so'm
$0.33
Output
12 383/M so'm
$0.98
128K context
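The per-million-token prices shown in each entry can be turned into a rough per-request cost by simple proportion. The sketch below is only an illustration using the Mercury 2 USD prices listed above. The function name `estimate_cost_usd` and the token counts are hypothetical, and it assumes billing is strictly linear in tokens, which the listing does not confirm.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate a request's cost in USD from per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Prices are quoted per 1,000,000 tokens, so scale the totals down accordingly.
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Mercury 2 prices from the listing: $0.33 input, $0.98 output per million tokens.
# The token counts below are made-up example values.
cost = estimate_cost_usd(10_000, 2_000, 0.33, 0.98)
print(f"${cost:.5f}")
```

The same function works with the so'm prices if both price arguments are given in so'm, in which case the result is in so'm rather than USD.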
LiquidAI: LFM2.5-1.2B-Thinking (free)
liquid/lfm-2.5-1.2b-thinking:free
Free, Reasoning
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...
Input
Free
Output
Free
33K context
Microsoft: Phi 4
microsoft/phi-4
Reasoning
[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...
Input
1 073/M so'm
$0.08
Output
2 311/M so'm
$0.18
16K context
Microsoft: Phi 4 Mini Instruct
microsoft/phi-4-mini-instruct
Reasoning
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4...
Input
1 321/M so'm
$0.10
Output
5 779/M so'm
$0.46
131K context
MiniMax: MiniMax M1
minimax/minimax-m1
Reasoning
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...
Input
6 604/M so'm
$0.52
Output
36 322/M so'm
$2.86
1,000K context
MiniMax: MiniMax M2
minimax/minimax-m2
Reasoning
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...
Input
4 210/M so'm
$0.33
Output
16 510/M so'm
$1.30
205K context
Mistral Large 2407
mistralai/mistral-large-2407
Reasoning
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....
Input
33 020/M so'm
$2.60
Output
99 060/M so'm
$7.80
131K context
Mistral: Mistral Medium 3
mistralai/mistral-medium-3
Reasoning
Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...
Input
6 604/M so'm
$0.52
Output
33 020/M so'm
$2.60
131K context
Mistral: Mistral Small 3.1 24B
mistralai/mistral-small-3.1-24b-instruct
Reasoning
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...
Input
5 795/M so'm
$0.46
Output
9 163/M so'm
$0.72
128K context
Mistral: Mistral Small 4
mistralai/mistral-small-2603
Reasoning
Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...
Input
2 477/M so'm
$0.20
Output
9 906/M so'm
$0.78
262K context
MoonshotAI: Kimi K2 Thinking
moonshotai/kimi-k2-thinking
Reasoning
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...
Input
9 906/M so'm
$0.78
Output
41 275/M so'm
$3.25
262K context
Nous: Hermes 3 405B Instruct
nousresearch/hermes-3-llama-3.1-405b
Reasoning
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Input
16 510/M so'm
$1.30
Output
16 510/M so'm
$1.30
131K context
Nous: Hermes 3 405B Instruct (free)
nousresearch/hermes-3-llama-3.1-405b:free
Free, Reasoning
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Input
Free
Output
Free
131K context
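The model identifiers in this listing appear to follow a `provider/model` pattern, with free variants carrying a `:free` suffix (compare `nousresearch/hermes-3-llama-3.1-405b` and `nousresearch/hermes-3-llama-3.1-405b:free`). The helper below is a minimal sketch of splitting an identifier along those lines. The names `ModelId` and `parse_model_id` are hypothetical, and the pattern is inferred from the entries shown here rather than from any documented specification.

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    provider: str
    model: str
    variant: Optional[str]  # e.g. "free", or None when no suffix is present


def parse_model_id(model_id: str) -> ModelId:
    """Split 'provider/model[:variant]' into its parts (inferred format)."""
    provider, sep, rest = model_id.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    # Anything after the first ':' is treated as the variant suffix.
    model, _, variant = rest.partition(":")
    if not model:
        raise ValueError(f"missing model name in {model_id!r}")
    return ModelId(provider, model, variant or None)


paid = parse_model_id("nousresearch/hermes-3-llama-3.1-405b")
free = parse_model_id("nousresearch/hermes-3-llama-3.1-405b:free")
print(paid.variant, free.variant)
```

Keeping the variant separate from the model name makes it straightforward to check whether a paid and a free entry refer to the same underlying model.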
Nous: Hermes 3 70B Instruct
nousresearch/hermes-3-llama-3.1-70b
Reasoning
Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Input
11 557/M so'm
$0.91
Output
11 557/M so'm
$0.91
131K context
Nous: Hermes 4 405B
nousresearch/hermes-4-405b
Reasoning
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...
Input
16 510/M so'm
$1.30
Output
49 530/M so'm
$3.90
131K context
Nous: Hermes 4 70B
nousresearch/hermes-4-70b
Reasoning
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...
Input
2 146/M so'm
$0.17
Output
6 604/M so'm
$0.52
131K context
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
nvidia/llama-3.3-nemotron-super-49b-v1.5
Reasoning
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
Input
6 604/M so'm
$0.52
Output
6 604/M so'm
$0.52
131K context
NVIDIA: Nemotron 3 Nano Omni (free)
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
Free, Reasoning
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
Input
Free
Output
Free
256K context
NVIDIA: Nemotron 3 Ultra
nvidia/nemotron-3-ultra-550b-a55b
Reasoning
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
Input
8 255/M so'm
$0.65
Output
36 322/M so'm
$2.86
1,000K context
NVIDIA: Nemotron 3 Ultra (free)
nvidia/nemotron-3-ultra-550b-a55b:free
Free, Reasoning
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
Input
Free
Output
Free
1,000K context
NVIDIA: Nemotron Nano 12B 2 VL (free)
nvidia/nemotron-nano-12b-v2-vl:free
Free, Reasoning
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...
Input
Free
Output
Free
128K context
NVIDIA: Nemotron Nano 9B V2 (free)
nvidia/nemotron-nano-9b-v2:free
Free, Reasoning
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
Input
Free
Output
Free
128K context
Perceptron: Perceptron Mk1
perceptron/perceptron-mk1
Reasoning
Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...
Input
2 477/M so'm
$0.20
Output
24 765/M so'm
$1.95
33K context
Perplexity: Sonar Deep Research
perplexity/sonar-deep-research
Reasoning
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...
Input
33 020/M so'm
$2.60
Output
132 080/M so'm
$10.40
128K context