Small Language Models (SLMs): The Complete Guide

Small language models trade breadth for speed, privacy, and cost. A well-trained 30M-parameter model can match or beat a giant LLM on a narrow task, and it runs locally on a CPU at effectively zero inference cost. This guide covers what SLMs are, when they win, how to train one, and how to deploy them on-device.

What is a small language model?

A small language model is a model compact enough to run without datacenter-grade GPUs. That covers anywhere from about 10M to a few billion parameters, depending on who you ask. The practical question isn't "how small is small?" It's how small you can go for your task. For many real workloads (classification, extraction, routing, structured output, agent steps) the answer is surprisingly small: tens to low hundreds of millions of parameters.

SLM vs LLM: the trade-off

                  Large LLM                      Small Language Model
Parameters        tens to hundreds of billions   ~10M to a few billion (Databiomes: 30M)
Hardware          cloud GPUs                     CPU / on-device
Inference cost    per-token, ongoing             ~$0 (runs locally)
Latency           network + queue                local, low
Privacy           data leaves device             data stays local
Breadth           very wide                      narrow but deep on-task
Best for          open-ended generation          specific, repeatable tasks

Why SLMs matter now

Four reasons. Cost first: once a model is trained, inference is essentially free on hardware you already own, with no per-token bill. Privacy and compliance follow, because data never leaves the device, which matters in healthcare, finance, and on-prem deployments. Latency drops too, since there's no cloud round-trip. And on a narrow task, a fine-tuned small model often outperforms a prompted giant at a fraction of the cost.

How small can you actually go?

Smaller than most people assume. A model that does one job well doesn't need world knowledge, so scoping the task tightly is the lever that lets you shrink it. Databiomes focuses on the 30M-parameter range, which is small enough to run on any CPU but big enough to learn a real task from modest data. Paired with the Flora inference engine, these models reach roughly 400 tokens per second on consumer CPUs with no GPU required.
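To make those numbers concrete, here is a rough sizing sketch. It is plain arithmetic, not a description of how Databiomes or Flora store weights: the byte widths per data type are standard, but the choice of data types and the 200-token response length are assumptions made for illustration.

```python
# Back-of-the-envelope sizing for a small model (illustrative arithmetic only).

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    params = 30_000_000  # the 30M-parameter size discussed above
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_megabytes(params, dtype):.0f} MB")
    # 400 tokens/s is the figure quoted above; 200 tokens is a hypothetical response length.
    print(f"{generation_seconds(200, 400):.2f} s")
```

Under these assumptions the weights come to about 120 MB in 32-bit floats and about 30 MB in 8-bit integers, and a 200-token response takes about half a second. The estimate covers weights only and ignores activations and runtime overhead.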

Training a custom SLM

  1. Define the task narrowly. The narrower the task, the smaller the model can be.
  2. Gather focused data. Quality and relevance beat volume, and a few hundred to a few thousand good examples are often enough.
  3. Train or fine-tune. Efficient methods make this fast and cheap.
  4. Evaluate on a blind set, on your metric, not a generic leaderboard.
  5. Deploy on-device. Ship the weights and run on CPU.
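
Step 4 is the one most often skipped, so here is a minimal sketch of it. It assumes a classification-style task scored by exact-match accuracy. The sample data, the 20% blind fraction, and `keyword_baseline` (a trivial rule standing in for a trained model) are all hypothetical and exist only to make the example runnable.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, expected label)

# Hypothetical on-task examples; a real set would have hundreds or more.
SAMPLE: List[Example] = [
    ("Invoice is wrong", "billing"),
    ("App crashes on login", "support"),
    ("Need a copy of my invoice", "billing"),
    ("Password reset fails", "support"),
    ("Charged twice on invoice", "billing"),
]


def split_blind(examples: List[Example], blind_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle once, then hold out a blind set that training never sees."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_blind = max(1, int(len(shuffled) * blind_fraction))
    return shuffled[n_blind:], shuffled[:n_blind]


def accuracy(predict: Callable[[str], str], blind: List[Example]) -> float:
    """Fraction of blind examples where the prediction equals the label."""
    correct = sum(1 for text, label in blind if predict(text) == label)
    return correct / len(blind)


def keyword_baseline(text: str) -> str:
    """Hypothetical stand-in for a trained model: a trivial keyword rule."""
    return "billing" if "invoice" in text.lower() else "support"


if __name__ == "__main__":
    train, blind = split_blind(SAMPLE)
    print(f"train={len(train)} blind={len(blind)}")
    print(f"blind accuracy: {accuracy(keyword_baseline, blind):.2f}")
```

Fixing the seed keeps the blind set stable between runs, so scores from different training attempts stay comparable. Swapping `accuracy` for your own metric is the point of step 4.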

With Databiomes you can train a custom model in as little as 24 hours.

Deploying SLMs on-device (CPU)

No GPU. No cloud bill. Quantization and a tuned inference engine keep things fast on commodity CPUs, which is what makes $0 inference practical rather than theoretical.
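
The text does not say which quantization scheme is used, so the following is only a sketch of one common approach: symmetric per-tensor int8 quantization. It is written in plain Python to show the idea and is not how Flora or any particular engine implements it. The function names and sample weights are made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # one float scale for the whole tensor
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, f"scale={scale:.6f}", f"worst error={worst:.6f}")
```

Each weight is stored in one byte instead of four, at the price of a rounding error of at most about half the scale per weight. Production engines typically quantize per channel or per block and run integer kernels directly, which this sketch does not attempt.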

SLMs and AI agents

Agents don't need a 400-billion-parameter model to click a button, fill a form, or route a request. A swarm of small, fast, local models can run an agentic workflow on-device, cheaper and more private than calling a cloud LLM at every step. That's the idea behind Databiomes' CPU AI agents.
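
The shape of such a workflow can be sketched as a router plus a set of specialists. Everything below is hypothetical: each "model" is an ordinary Python function standing in for a small task-specific local model, and none of the names come from Databiomes' products.

```python
from typing import Callable, Dict

# Each "model" is a hypothetical stand-in: a plain function from text to text.
# In a real system each would wrap a small, task-specific local model.
Model = Callable[[str], str]


def classify_intent(text: str) -> str:
    """Hypothetical router model: picks which specialist handles the request."""
    return "extract" if "@" in text else "summarize"


def extract_email(text: str) -> str:
    """Hypothetical extraction specialist: returns the first token containing '@'."""
    return next(tok.strip(".,;") for tok in text.split() if "@" in tok)


def summarize(text: str) -> str:
    """Hypothetical summarization specialist: keeps the first sentence."""
    return text.split(".")[0].strip() + "."


SPECIALISTS: Dict[str, Model] = {"extract": extract_email, "summarize": summarize}


def run_step(text: str) -> str:
    """One agent step: route locally, then call the chosen specialist locally."""
    return SPECIALISTS[classify_intent(text)](text)


if __name__ == "__main__":
    print(run_step("Contact me at ana@example.com, thanks."))
    print(run_step("The build failed. Logs attached."))
```

Every call in this loop stays on the device. Because each specialist has one narrow job, each can be trained and evaluated on its own.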

Limitations

SLMs are not for open-ended, world-knowledge tasks. If you need a model to write essays on any topic or reason across broad domains, use a large one. SLMs win when the task is defined and repeatable, which covers a large share of real production workloads.

Frequently asked questions

Are small language models as good as ChatGPT? Not for open-ended chat. On a specific, scoped task, a fine-tuned SLM can match or beat a large model at far lower cost.

What's the smallest useful language model? It depends on the task. Narrow jobs run well at 30M parameters.

Can SLMs run without a GPU? Yes, that's the point. They run on ordinary CPUs, including those in phones and laptops.

How much data do I need to train one? Often a few hundred to a few thousand high-quality, on-task examples.

