Google: Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Although it has 25.2B total parameters, only 3.8B are activated per token during inference, delivering near-31B quality at a fraction of the compute cost. It accepts multimodal input spanning text, images, and video (up to 60 s at 1 fps), and offers a 256K-token context window, native function calling, a configurable thinking/reasoning mode, and structured output support. Released under the Apache 2.0 license.
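The MoE efficiency claim above follows directly from the two figures given (25.2B total parameters, 3.8B active per token). A minimal sketch of that arithmetic:

```python
# Sketch: fraction of parameters active per token, from the figures
# quoted in the description above (25.2B total, 3.8B active).
TOTAL_PARAMS_B = 25.2    # total parameters, in billions
ACTIVE_PARAMS_B = 3.8    # parameters activated per token, in billions

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active per token: {active_fraction:.1%} of total parameters")
# → Active per token: 15.1% of total parameters
```

That is, roughly one-seventh of the network participates in any single forward pass, which is where the compute savings over a dense 25B model come from.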

Architecture

  • Modality: text+image+video->text
  • InputModalities: image, text, video
  • OutputModalities: text
  • Tokenizer: Gemma

ContextAndLimits

  • ContextLength: 262144 Tokens
  • MaxResponseTokens: 262144 Tokens
  • Moderation: Disabled
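Since ContextLength and MaxResponseTokens are both 262,144, a caller still has to budget the window. A minimal sketch, assuming the common convention that prompt and completion share the context window (the listing above does not state this explicitly), with a hypothetical helper name:

```python
# Limits taken from the ContextAndLimits section above.
CONTEXT_LENGTH = 262_144
MAX_RESPONSE_TOKENS = 262_144

def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Cap a requested completion budget so prompt + response fit the window.

    Assumption: prompt and completion share the 262,144-token context,
    which is typical but not confirmed by the listing.
    """
    available = max(CONTEXT_LENGTH - prompt_tokens, 0)
    return min(requested, MAX_RESPONSE_TOKENS, available)

print(clamp_max_tokens(200_000, 100_000))  # → 62144
print(clamp_max_tokens(0, 500_000))        # → 262144
```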

Pricing

  • Prompt1KTokens: 0.00000013 ₽
  • Completion1KTokens: 0.0000004 ₽
  • InternalReasoning: ₽
  • Request: ₽
  • Image: ₽
  • WebSearch: ₽
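The per-1K-token prices above can be turned into a per-request cost estimate. A minimal sketch; the helper name is illustrative, not part of any official SDK, and only the prompt/completion rates listed above are charged:

```python
# Sketch: estimate request cost from the per-1K-token prices listed above.
PROMPT_PRICE_PER_1K = 1.3e-07      # ₽ per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 4e-07    # ₽ per 1K completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in ₽ for one request (hypothetical helper)."""
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K + \
           (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

# Example: a full 262,144-token prompt with a 4,096-token reply.
print(f"{estimate_cost(262_144, 4_096):.10f} ₽")  # → 0.0000357171 ₽
```

Even a maximum-length prompt costs well under a kopeck at these rates, which is the practical upshot of the pricing table.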