AI Models

Use 300+ models from OpenAI, Anthropic, Google and others through a single API

346 models
30 free
58 providers
Qwen: Qwen3.5-Flash
qwen/qwen3.5-flash-02-23
Vision
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...
Input
1 073/M so'm
$0.08
Output
4 293/M so'm
$0.34
1,000K context
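Prices in this listing are quoted per million tokens, so the cost of one request can be estimated from its token counts. The following is a minimal sketch using the USD prices shown for Qwen3.5-Flash; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of any provider API.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Estimate the USD cost of one request from per-million-token prices."""
    total = (input_tokens * input_price_per_m
             + output_tokens * output_price_per_m)
    return total / 1_000_000


# Listed USD prices for qwen/qwen3.5-flash-02-23:
# $0.08 per million input tokens, $0.34 per million output tokens.
# The token counts below are made up for illustration.
cost = estimate_cost_usd(
    input_tokens=20_000,
    output_tokens=1_500,
    input_price_per_m=0.08,
    output_price_per_m=0.34,
)
print(f"Estimated cost: ${cost:.5f}")
```

The same function works with the so'm prices, as long as both prices passed in use the same currency.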
Qwen: Qwen3.6 27B
qwen/qwen3.6-27b
Vision
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...
Input
4 763/M so'm
$0.38
Output
52 337/M so'm
$4.12
262K context
Qwen: Qwen3.6 35B A3B
qwen/qwen3.6-35b-a3b
Vision
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...
Input
2 311/M so'm
$0.18
Output
16 510/M so'm
$1.30
262K context
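For a sparse mixture-of-experts model, the share of parameters used per token follows from the two counts given in the description. A small sketch of that arithmetic, using the 35 billion total and 3 billion active figures for Qwen3.6-35B-A3B; the helper name `active_fraction` is hypothetical.

```python
def active_fraction(active_params_b, total_params_b):
    """Return the fraction of parameters activated per token."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Figures from the Qwen3.6-35B-A3B description: 3B active of 35B total.
frac = active_fraction(3, 35)
print(f"{frac:.1%} of parameters active per token")  # prints "8.6% of parameters active per token"
```

This ratio describes compute per token only; it says nothing about how the listed prices are set.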
Qwen: Qwen3.6 Flash
qwen/qwen3.6-flash
Vision
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...
Input
3 096/M so'm
$0.24
Output
18 574/M so'm
$1.46
1,000K context
Qwen: Qwen3.6 Max Preview
qwen/qwen3.6-max-preview
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...
Input
17 170/M so'm
$1.35
Output
103 022/M so'm
$8.11
262K context
Qwen: Qwen3.6 Plus
qwen/qwen3.6-plus
Vision
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
Input
5 366/M so'm
$0.42
Output
32 195/M so'm
$2.54
1,000K context
Qwen: Qwen3.7 Max
qwen/qwen3.7-max
Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...
Input
20 638/M so'm
$1.63
Output
61 913/M so'm
$4.88
1,000K context
Qwen: Qwen3.7 Plus
qwen/qwen3.7-plus
Vision
Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...
Input
5 283/M so'm
$0.42
Output
21 133/M so'm
$1.66
1,000K context
Reka Edge
rekaai/reka-edge
Vision
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...
Input
1 651/M so'm
$0.13
Output
1 651/M so'm
$0.13
16K context
Reka Flash 3
rekaai/reka-flash-3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a...
Input
1 651/M so'm
$0.13
Output
3 302/M so'm
$0.26
66K context
Relace: Relace Apply 3
relace/relace-apply-3
Code
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at...
Input
14 034/M so'm
$1.11
Output
20 638/M so'm
$1.63
256K context
Relace: Relace Search
relace/relace-search
The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return the files relevant to the user request. In contrast to RAG, relace-search performs agentic...
Input
16 510/M so'm
$1.30
Output
49 530/M so'm
$3.90
256K context
ReMM SLERP 13B
undi95/remm-slerp-l2-13b
An experimental recreation of the original MythoMax-L2-B13, rebuilt with updated models. #merge
Input
7 430/M so'm
$0.59
Output
10 732/M so'm
$0.85
6K context
Sao10K: Llama 3 8B Lunaris
sao10k/l3-lunaris-8b
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....
Input
660/M so'm
$0.05
Output
826/M so'm
$0.07
8K context
Sao10K: Llama 3.1 70B Hanami x1
sao10k/l3.1-70b-hanami-x1
This is [Sao10K](/sao10k)'s experiment built on [Euryale v2.2](/sao10k/l3.1-euryale-70b).
Input
49 530/M so'm
$3.90
Output
49 530/M so'm
$3.90
16K context
Sao10K: Llama 3.1 Euryale 70B v2.2
sao10k/l3.1-euryale-70b
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10K](https://ko-fi.com/sao10k). It is the successor to [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).
Input
14 034/M so'm
$1.11
Output
14 034/M so'm
$1.11
131K context
Sao10K: Llama 3.3 Euryale 70B
sao10k/l3.3-euryale-70b
Euryale L3.3 70B is a model focused on creative roleplay from [Sao10K](https://ko-fi.com/sao10k). It is the successor to [Euryale L3.1 70B v2.2](/models/sao10k/l3-euryale-70b).
Input
10 732/M so'm
$0.85
Output
12 383/M so'm
$0.98
131K context
StepFun: Step 3.5 Flash
stepfun/step-3.5-flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....
Input
1 486/M so'm
$0.12
Output
4 953/M so'm
$0.39
262K context
StepFun: Step 3.7 Flash
stepfun/step-3.7-flash
Vision
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...
Input
3 302/M so'm
$0.26
Output
18 987/M so'm
$1.50
256K context
Switchpoint Router
switchpoint/router
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...
Input
14 034/M so'm
$1.11
Output
56 134/M so'm
$4.42
131K context
Tencent: Hunyuan A13B Instruct
tencent/hunyuan-a13b-instruct
Reasoning
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...
Input
2 311/M so'm
$0.18
Output
9 411/M so'm
$0.74
131K context
Tencent: Hy3 preview
tencent/hy3-preview
Reasoning
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...
Input
1 040/M so'm
$0.08
Output
3 467/M so'm
$0.27
262K context
TheDrummer: Cydonia 24B V4.1
thedrummer/cydonia-24b-v4.1
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
Input
4 953/M so'm
$0.39
Output
8 255/M so'm
$0.65
131K context
TheDrummer: Rocinante 12B
thedrummer/rocinante-12b
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:
- Expanded vocabulary with unique and expressive word choices
- Enhanced creativity for vivid narratives
-...
Input
2 807/M so'm
$0.22
Output
7 099/M so'm
$0.56
33K context
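A listing like this one can be narrowed down programmatically by context size and price. The sketch below hard-codes a handful of entries copied from the USD prices and context sizes above and picks the cheapest models that meet a minimum context window. The `ModelEntry` type, the `cheapest_models` helper, and the choice to sort by output price first are all assumptions made for illustration; nothing here queries a real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    model_id: str            # "provider/model" slug as shown in the listing
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens
    context_k: int           # context window in thousands of tokens


# A few entries copied by hand from the listing (not the full catalog).
CATALOG = [
    ModelEntry("qwen/qwen3.5-flash-02-23", 0.08, 0.34, 1000),
    ModelEntry("tencent/hy3-preview", 0.08, 0.27, 262),
    ModelEntry("stepfun/step-3.5-flash", 0.12, 0.39, 262),
    ModelEntry("sao10k/l3-lunaris-8b", 0.05, 0.07, 8),
    ModelEntry("rekaai/reka-edge", 0.13, 0.13, 16),
]


def cheapest_models(catalog, min_context_k, limit=3):
    """Return up to `limit` entries with enough context, cheapest output first."""
    eligible = [m for m in catalog if m.context_k >= min_context_k]
    ranked = sorted(eligible, key=lambda m: (m.output_usd_per_m, m.input_usd_per_m))
    return ranked[:limit]


for entry in cheapest_models(CATALOG, min_context_k=200):
    # Split the slug into its provider and model parts.
    provider, _, name = entry.model_id.partition("/")
    print(f"{provider:10s} {name:24s} "
          f"in ${entry.input_usd_per_m:.2f}  out ${entry.output_usd_per_m:.2f}  "
          f"{entry.context_k}K")
```

Sorting by output price first is only one reasonable choice; a workload dominated by long prompts would more sensibly rank by input price.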