Wes Roth claims Gemma 4 E2B runs fully on device on iPhone 17 Pro at 40 tokens per second using Apple's MLX framework, reinforcing the push toward local agentic models.
Incredible possibilities for on-device small models.
Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro.
~40 tokens/s with MLX, optimized for Apple Silicon.
Fully offline with thinking mode.
I am impressed by Gemma 4, there’s a lot of power for an on-device model at fast speeds.
But I am not convinced you can get real agentic workflows out of a small model on device.
Gemma 4 E2B hits 40 tokens/sec natively on iPhone 17 Pro.
The model is running entirely on-device, leveraging Apple's MLX framework.