In r/LocalLLaMA, practical command-line tweaks are circulating for running newer models in llama.cpp, including changing the min-p default and reducing server slots to avoid wasting VRAM.
llama.cpp defaults to min-p 0.05. Many current models expect --min-p 0.0, so you need to add that flag to your command explicitly.
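As a hedged illustration (the model path is a placeholder, and other flags are whatever you already use), the tweak is just appending the sampling flag to your llama-server command:

llama-server -m ./model.gguf --min-p 0.0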
llama-server defaults to 4 slots. Unless you have friends over, you probably only want one slot, because each slot takes its own share of VRAM: pass -np 1.
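Putting both tweaks together, an example invocation (context size and port are illustrative, not recommendations from the thread) might look like:

llama-server -m ./model.gguf -c 8192 --min-p 0.0 -np 1 --port 8080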