AI Models

Use 300+ models from OpenAI, Anthropic, Google and others through a single API

346 models
30 free
58 providers
NVIDIA: Nemotron 3.5 Content Safety (free)
nvidia/nemotron-3.5-content-safety:free
Free Vision
NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...
Input
Free
Output
Free
128K context
OpenAI GPT Latest
~openai/gpt-latest
Vision
This model always redirects to the latest model in the OpenAI GPT family.
Input
82 550/M so'm
$6.50
Output
495 300/M so'm
$39.00
1,050K context
OpenAI GPT Mini Latest
~openai/gpt-mini-latest
Vision
This model always redirects to the latest model in the OpenAI GPT Mini family.
Input
12 383/M so'm
$0.98
Output
74 295/M so'm
$5.85
400K context
Perplexity: Sonar
perplexity/sonar
Vision
Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...
Input
16 510/M so'm
$1.30
Output
16 510/M so'm
$1.30
127K context
Qwen: Qwen2.5 VL 72B Instruct
qwen/qwen2.5-vl-72b-instruct
Vision
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
Input
13 208/M so'm
$1.04
Output
16 510/M so'm
$1.30
131K context
Qwen: Qwen3 VL 235B A22B Instruct
qwen/qwen3-vl-235b-a22b-instruct
Vision
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
Input
3 302/M so'm
$0.26
Output
14 529/M so'm
$1.14
262K context
Qwen: Qwen3 VL 30B A3B Instruct
qwen/qwen3-vl-30b-a3b-instruct
Vision
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
Input
2 146/M so'm
$0.17
Output
8 585/M so'm
$0.68
262K context
Qwen: Qwen3.5 397B A17B
qwen/qwen3.5-397b-a17b
Vision
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...
Input
6 356/M so'm
$0.50
Output
40 450/M so'm
$3.19
256K context
Qwen: Qwen3.5 Plus 2026-02-15
qwen/qwen3.5-plus-02-15
Vision
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
Input
4 293/M so'm
$0.34
Output
25 756/M so'm
$2.03
1,000K context
Qwen: Qwen3.5 Plus 2026-04-20
qwen/qwen3.5-plus-20260420
Vision
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...
Input
4 953/M so'm
$0.39
Output
29 718/M so'm
$2.34
1,000K context
Qwen: Qwen3.5-122B-A10B
qwen/qwen3.5-122b-a10b
Vision
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...
Input
4 293/M so'm
$0.34
Output
34 341/M so'm
$2.70
262K context
Qwen: Qwen3.5-27B
qwen/qwen3.5-27b
Vision
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...
Input
3 219/M so'm
$0.25
Output
25 756/M so'm
$2.03
262K context
Qwen: Qwen3.5-35B-A3B
qwen/qwen3.5-35b-a3b
Vision
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...
Input
2 311/M so'm
$0.18
Output
16 510/M so'm
$1.30
262K context
Qwen: Qwen3.5-Flash
qwen/qwen3.5-flash-02-23
Vision
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...
Input
1 073/M so'm
$0.08
Output
4 293/M so'm
$0.34
1,000K context
Qwen: Qwen3.6 27B
qwen/qwen3.6-27b
Vision
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...
Input
4 763/M so'm
$0.38
Output
52 337/M so'm
$4.12
262K context
Qwen: Qwen3.6 35B A3B
qwen/qwen3.6-35b-a3b
Vision
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...
Input
2 311/M so'm
$0.18
Output
16 510/M so'm
$1.30
262K context
Qwen: Qwen3.6 Flash
qwen/qwen3.6-flash
Vision
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...
Input
3 096/M so'm
$0.24
Output
18 574/M so'm
$1.46
1,000K context
Qwen: Qwen3.6 Plus
qwen/qwen3.6-plus
Vision
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
Input
5 366/M so'm
$0.42
Output
32 195/M so'm
$2.54
1,000K context
Qwen: Qwen3.7 Plus
qwen/qwen3.7-plus
Vision
Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...
Input
5 283/M so'm
$0.42
Output
21 133/M so'm
$1.66
1,000K context
Reka Edge
rekaai/reka-edge
Vision
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...
Input
1 651/M so'm
$0.13
Output
1 651/M so'm
$0.13
16K context
StepFun: Step 3.7 Flash
stepfun/step-3.7-flash
Vision
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...
Input
3 302/M so'm
$0.26
Output
18 987/M so'm
$1.50
256K context
xAI: Grok 4.20 Multi-Agent
x-ai/grok-4.20-multi-agent
Vision
Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...
Input
20 638/M so'm
$1.63
Output
41 275/M so'm
$3.25
2,000K context
xAI: Grok Build 0.1
x-ai/grok-build-0.1
Vision
Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...
Input
16 510/M so'm
$1.30
Output
33 020/M so'm
$2.60
256K context
Xiaomi: MiMo-V2.5
xiaomi/mimo-v2.5
Vision
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...
Input
2 311/M so'm
$0.18
Output
4 623/M so'm
$0.36
1,049K context
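
Estimating request cost

Prices in this listing are quoted per million tokens ("/M"), separately for input and output. The sketch below shows one way to turn those figures into a rough per-request estimate. The `Price` class, the `PRICES` table, and the `estimate_cost_usd` function are illustrative names, not part of any API described on this page; the USD figures are copied from the cards above. It assumes a flat rate and ignores any tiered pricing a model may apply at long context lengths.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Price:
    """USD price per one million tokens (hypothetical helper type)."""
    input_usd_per_m: float
    output_usd_per_m: float


# USD prices copied from the listing; the table itself is an illustration.
PRICES = {
    "nvidia/nemotron-3.5-content-safety:free": Price(0.0, 0.0),
    "~openai/gpt-latest": Price(6.50, 39.00),
    "perplexity/sonar": Price(1.30, 1.30),
    "qwen/qwen3.6-flash": Price(0.24, 1.46),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat-rate estimate: tokens times per-million price, divided by one million."""
    price = PRICES[model]
    total = (input_tokens * price.input_usd_per_m
             + output_tokens * price.output_usd_per_m)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Compare the same hypothetical workload across the sample models.
    for name in PRICES:
        cost = estimate_cost_usd(name, input_tokens=10_000, output_tokens=2_000)
        print(f"{name}: ${cost:.4f}")
```

Because output tokens are often priced several times higher than input tokens, the output count tends to dominate the estimate for generation-heavy workloads.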