Gemma 4 on-device performance on iPhone

April 5, 2026
Rohan Paul, Ethan Mollick, Wes Roth

Wes Roth claims Google's Gemma 4 E2B runs fully on-device on an iPhone 17 Pro at roughly 40 tokens per second using Apple's MLX framework, reinforcing the push toward local agentic models.
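To put the reported throughput in perspective, here is a minimal sketch of the arithmetic, assuming a steady decode rate of 40 tokens per second and ignoring prompt processing and thermal throttling. The function name and the response lengths are hypothetical, chosen only for illustration.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 40.0) -> float:
    """Estimate time to generate num_tokens at a constant decode rate.

    Assumption: the rate is steady; real devices may slow down over long runs.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response lengths: a short reply, a long answer, a long reasoning trace.
for n in (100, 500, 2000):
    print(f"{n} tokens -> {generation_seconds(n):.1f} s")
```

At this rate a short reply arrives in a few seconds, while a multi-thousand-token thinking trace takes closer to a minute, which is relevant to the question of whether agentic workflows are practical on device.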

QUOTES

Incredible possibilities for on-device small models.

Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro.

~40tk/s with MLX optimized for Apple Silicon

Fully offline with thinking mode.

I am impressed by Gemma 4, there’s a lot of power for an on-device model at fast speeds.

But I am not convinced you can get real agentic workflows out of a small model on device.

Gemma 4 E2B hits 40 tokens/sec natively on iPhone 17 Pro.

The model is running entirely on-device, leveraging Apple's MLX framework

VOICES

Rohan Paul

Ethan Mollick

Wes Roth

RELATED TERMS

on device, performance, google, gemma, ios, token throughput, context window, open models
