Rohan Paul and Ethan Mollick highlight Gemma 4 running fully offline on phones at high token rates, while Mollick cautions that small on-device models may still struggle with true agentic workflows requiring judgment and self-correction.
Incredible possibilities for on-device small models.
Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro.
~40 tok/s with MLX, optimized for Apple Silicon
Fully offline with thinking mode.
I am impressed by Gemma 4; there is a lot of power in an on-device model running at these speeds.
But I am not convinced you can get real agentic workflows out of a small on-device model.