Most system prompts you find online were written against frontier models. They assume the model can hold a multi-step plan in its head, infer the right tool from a loose description, and recover gracefully from its own mistakes. Small local models, the 7B–8B class you run on your own hardware, do none of those things reliably. Give them the frontier prompt and they drift: they narrate instead of acting, invent tool arguments, or hallucinate a tool result and keep going.
Same task, different failure modes
The interesting part isn't that small models are "worse." It's that they fail differently, and the failure modes are predictable:
- They plan too far ahead. Ask for a plan and a 7B model will happily write five steps, then execute step one against an imagined result of a tool it never called.
- They wrap tool calls in chatter. A frontier model emits clean structured output; a small model adds "Sure! Let me run that for you:" and breaks the parser.
- They guess arguments. Faced with a required parameter it doesn't know, a small model fills in something plausible instead of going to find the real value.
A prompt that fixes a frontier model's style does nothing for these. You have to write against the specific ways small models go wrong.
What actually helps
Tuning Crush for small models, a few rules did most of the work:
- One tool call per turn. Remove the temptation to plan. Call, observe, decide.
- Output the call and nothing else. No greeting, no markdown; the runtime parses output literally.
- Never fabricate a result. Name the rule explicitly; small models need it said.
- If a value is unknown, call a tool to find it, don't guess.
None of this is clever. It's just rigid, and rigidity is exactly what a small model needs and a large one finds patronizing. The same prompt that makes a 7B model dependable would make a frontier model worse.
Why we tag prompts by model size
This is why OpenPrompts now has a Model size filter. A prompt isn't universally "good"; it's good for a class of model. A system prompt tuned for an 8B local model and one written for a frontier API are different artifacts solving different problems, and you should be able to find the right one.
Browse the Offline & Local LLMs category, or filter the catalog by Size: Small to see prompts written for models that run on your own machine.