AI Models — CloudAPI

Inception: Mercury 2

inception/mercury-2

Reasoning

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...

Input

4 128/M so'm

$0.33

Output

12 383/M so'm

$0.98

128K context

LiquidAI: LFM2.5-1.2B-Thinking (free)

liquid/lfm-2.5-1.2b-thinking:free

Free, Reasoning

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Input

Free

Output

Free

33K context

Microsoft: Phi 4

microsoft/phi-4

Reasoning

[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...

Input

1 073/M so'm

$0.08

Output

2 311/M so'm

$0.18

16K context

Microsoft: Phi 4 Mini Instruct

microsoft/phi-4-mini-instruct

Reasoning

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4...

Input

1 321/M so'm

$0.10

Output

5 779/M so'm

$0.46

131K context

MiniMax: MiniMax M1

minimax/minimax-m1

Reasoning

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...

Input

6 604/M so'm

$0.52

Output

36 322/M so'm

$2.86

1,000K context

MiniMax: MiniMax M2

minimax/minimax-m2

Reasoning

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...

Input

4 210/M so'm

$0.33

Output

16 510/M so'm

$1.30

205K context

Mistral Large 2407

mistralai/mistral-large-2407

Reasoning

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....

Input

33 020/M so'm

$2.60

Output

99 060/M so'm

$7.80

131K context

Mistral: Mistral Medium 3

mistralai/mistral-medium-3

Reasoning

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

Input

6 604/M so'm

$0.52

Output

33 020/M so'm

$2.60

131K context

Mistral: Mistral Small 3.1 24B

mistralai/mistral-small-3.1-24b-instruct

Reasoning

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

Input

5 795/M so'm

$0.46

Output

9 163/M so'm

$0.72

128K context

Mistral: Mistral Small 4

mistralai/mistral-small-2603

Reasoning

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

Input

2 477/M so'm

$0.20

Output

9 906/M so'm

$0.78

262K context

MoonshotAI: Kimi K2 Thinking

moonshotai/kimi-k2-thinking

Reasoning

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...

Input

9 906/M so'm

$0.78

Output

41 275/M so'm

$3.25

262K context

Nous: Hermes 3 405B Instruct

nousresearch/hermes-3-llama-3.1-405b

Reasoning

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Input

16 510/M so'm

$1.30

Output

16 510/M so'm

$1.30

131K context

Nous: Hermes 3 405B Instruct (free)

nousresearch/hermes-3-llama-3.1-405b:free

Free, Reasoning

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Input

Free

Output

Free

131K context

Nous: Hermes 3 70B Instruct

nousresearch/hermes-3-llama-3.1-70b

Reasoning

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Input

11 557/M so'm

$0.91

Output

11 557/M so'm

$0.91

131K context

Nous: Hermes 4 405B

nousresearch/hermes-4-405b

Reasoning

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Input

16 510/M so'm

$1.30

Output

49 530/M so'm

$3.90

131K context

Nous: Hermes 4 70B

nousresearch/hermes-4-70b

Reasoning

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

Input

2 146/M so'm

$0.17

Output

6 604/M so'm

$0.52

131K context

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia/llama-3.3-nemotron-super-49b-v1.5

Reasoning

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...

Input

6 604/M so'm

$0.52

Output

6 604/M so'm

$0.52

131K context

NVIDIA: Nemotron 3 Nano Omni (free)

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free

Free, Reasoning

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Input

Free

Output

Free

256K context

NVIDIA: Nemotron 3 Ultra

nvidia/nemotron-3-ultra-550b-a55b

Reasoning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Input

8 255/M so'm

$0.65

Output

36 322/M so'm

$2.86

1,000K context

NVIDIA: Nemotron 3 Ultra (free)

nvidia/nemotron-3-ultra-550b-a55b:free

Free, Reasoning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Input

Free

Output

Free

1,000K context

NVIDIA: Nemotron Nano 12B 2 VL (free)

nvidia/nemotron-nano-12b-v2-vl:free

Free, Reasoning

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

Input

Free

Output

Free

128K context

NVIDIA: Nemotron Nano 9B V2 (free)

nvidia/nemotron-nano-9b-v2:free

Free, Reasoning

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Input

Free

Output

Free

128K context

Perceptron: Perceptron Mk1

perceptron/perceptron-mk1

Reasoning

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...

Input

2 477/M so'm

$0.20

Output

24 765/M so'm

$1.95

33K context

Perplexity: Sonar Deep Research

perplexity/sonar-deep-research

Reasoning

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...

Input

33 020/M so'm

$2.60

Output

132 080/M so'm

$10.40

128K context
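
Estimating request cost

The prices in this listing can be turned into a rough per-request estimate. The sketch below assumes that "/M" means "per one million tokens" and that input and output tokens are billed separately at the listed USD rates; the listing itself does not state billing rules, so treat both as assumptions. The `PRICES_USD_PER_M` table and the `estimate_cost_usd` helper are hypothetical names used for illustration only, not part of any CloudAPI client, and the table copies just three entries from the listing.

```python
from decimal import Decimal

# USD prices per one million tokens as (input, output), copied from the listing.
# Assumption: "/M" denotes "per 1,000,000 tokens".
PRICES_USD_PER_M = {
    "inception/mercury-2": (Decimal("0.33"), Decimal("0.98")),
    "minimax/minimax-m1": (Decimal("0.52"), Decimal("2.86")),
    "perplexity/sonar-deep-research": (Decimal("2.60"), Decimal("10.40")),
}

TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts.

    Hypothetical helper: it ignores any fees, discounts, or rounding
    that an actual bill might apply.
    """
    input_price, output_price = PRICES_USD_PER_M[model_id]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 10,000 prompt tokens and 2,000 completion tokens on Mercury 2.
    print(estimate_cost_usd("inception/mercury-2", 10_000, 2_000))
```

`Decimal` is used instead of `float` so that small per-token amounts are not distorted by binary rounding. The same arithmetic applies to the so'm column if those figures are substituted for the USD ones.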