llama.cpp runtime tuning for current models

April 4, 2026 · r/LocalLLaMA

In r/LocalLLaMA, practical command-level tweaks are circulating for running newer models in llama.cpp, including overriding the min-p sampling default and reducing llama-server slots to avoid wasting VRAM.

llama.cpp defaults to a min-p of 0.05, but current models want min-p disabled, so you need to add --min-p 0.0 to your command explicitly.
llama-server defaults to 4 slots. Unless you have friends over, you probably only want 1 slot, because each slot consumes VRAM: pass -np 1. A combined command is sketched below.
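Putting both tweaks together, a minimal llama-server invocation might look like the following. The model path and context size are placeholders for illustration, not values from the original post:

    # model path and -c (context size) are placeholder values
    llama-server -m ./model.gguf -c 8192 --min-p 0.0 -np 1

With -np 1, the configured context size goes to a single slot instead of being divided among parallel slots, so one request gets the full context window.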
Tags: llama.cpp, inference, LLM, open models, llama-server
