Gemma 4 on-device performance on iPhone

April 5, 2026
Rohan Paul, Ethan Mollick, Wes Roth

Wes Roth claims Gemma 4 E2B runs fully on-device on an iPhone 17 Pro at about 40 tokens per second using Apple's MLX framework, reinforcing the push toward local agentic models.

Incredible possibilities for on-device small models.
Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro.
~40tk/s with MLX optimized for Apple Silicon
Fully offline with thinking mode.
I am impressed by Gemma 4, there’s a lot of power for an on-device model at fast speeds.
But I am not convinced you can get real agentic workflows out of a small model on device.
Gemma 4 E2B hits 40 tokens/sec natively on iPhone 17 Pro.
The model is running entirely on-device, leveraging Apple's MLX framework
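
To put the quoted throughput in perspective, here is a minimal sketch that converts roughly 40 tokens per second into wall-clock generation time. The response sizes are hypothetical values chosen for illustration, not measurements from the demo, and the calculation assumes a steady decode rate and ignores prompt processing time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 40.0) -> float:
    """Return the time to decode num_tokens at a steady throughput.

    Assumes a constant decode rate and ignores prompt processing,
    so real latency on a phone would likely be somewhat higher.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response sizes, for illustration only.
examples = [
    ("short reply", 100),
    ("long answer", 500),
    ("thinking trace plus answer", 2000),
]

for label, tokens in examples:
    print(f"{label}: {tokens} tokens -> {generation_seconds(tokens):.1f} s")
```

Under these assumptions, a multi-step agentic task that generates thousands of tokens would take on the order of a minute or more per step. That is one way to read the skepticism about agentic workflows on a small on-device model, although the quoted posts do not say whether speed or capability is the concern.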
on device, performance, google, gemma, ios, token throughput, context window, open models