Vortex Two – d0a1.es

Choosing the right Ollama model for your WordPress site is not trivial. There are 50+ options in the official registry, and each makes a different trade-off between quality, speed, and RAM consumption. This guide helps you decide based on your actual use case.

Methodology

I tested 8 representative models on a 4 vCPU / 16 GB RAM VPS without a GPU, running real WordPress tasks:

Meta-description generation (10 articles, 155 char max)
Draft rewriting (8 articles ~800 words)
SEO analysis (keyword research + optimized titles)
Comment summarization (50 real comments)
Comment moderation (spam/legit classification)

Each model was evaluated on 3 axes:

Quality (1-5): human-rated output. Publishable as-is or needs editing?
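Speed (tokens/second): generation throughput, taken from the eval rate that ollama run --verbose prints after each response (Ollama has no dedicated benchmark subcommand).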
RAM: GB consumption of the loaded model (model size + KV cache).
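
As a minimal sketch of where a tokens/second figure can come from: a non-streaming reply from Ollama's /api/generate endpoint includes eval_count (generated tokens) and eval_duration (nanoseconds). The helper name is hypothetical, and the sample values are made up for illustration, not measurements from this test.

```javascript
// Hypothetical helper: derive generation speed from an Ollama /api/generate reply.
// eval_count = generated tokens, eval_duration = generation time in nanoseconds.
function tokensPerSecond(response) {
  const { eval_count: tokens, eval_duration: nanos } = response;
  if (!Number.isFinite(tokens) || !Number.isFinite(nanos) || nanos <= 0) {
    throw new Error('response lacks eval_count / eval_duration');
  }
  return tokens / (nanos / 1e9);
}

// Illustrative values only (not measured): 450 tokens generated in 10 seconds.
const sample = { eval_count: 450, eval_duration: 10e9 };
console.log(tokensPerSecond(sample).toFixed(1)); // 45.0
```

Averaging this over several prompts per task gives a steadier number than a single call.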

Comparison table

Model	Parameters	RAM	tok/s	Quality	Best for
llama3.1:8b	8B	~5 GB	~45	4.0	General, first deploy
qwen2.5:14b	14B	~10 GB	~28	4.4	Multilingual EN/ES, SEO
mistral:7b	7B	~4.5 GB	~52	3.6	Speed, low consumption
codellama:13b	13B	~9 GB	~25	4.2	Code generation, scripts
phi3:14b	14B	~9 GB	~32	4.1	Structured tasks, JSON
gemma2:9b	9B	~6 GB	~40	4.0	Good all-round balance
llama3.2:3b	3B	~2.5 GB	~80	3.2	Trivial tasks, max speed
deepseek-r1:8b	8B	~5.5 GB	~38	4.3	Reasoning, debugging

Analysis by use case

General WordPress operation (recommended starting point)

llama3.1:8b is the sensible default. It generates acceptable meta-descriptions, summarizes comments correctly, and at ~45 tok/s a typical 500-token response takes about 11 seconds (500 / 45). For basic SEO tasks its output is usually publishable as-is.

If you have 16 GB of RAM and can trade some speed for quality (~28 tok/s instead of ~45), move up to qwen2.5:14b. It is especially good in Spanish, since it was trained on a larger multilingual corpus than Llama.
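
The latency figures in this section are simple throughput arithmetic. A small sketch, using the approximate tok/s values from the comparison table (the function name is hypothetical, and prompt processing and model load time are ignored, so real calls take longer):

```javascript
// Rough generation time: output tokens divided by throughput.
// Ignores prompt processing and model load time.
function estimateSeconds(outputTokens, tokPerSec) {
  return outputTokens / tokPerSec;
}

// Throughput figures copied from the comparison table (approximate).
const tokPerSec = { 'llama3.1:8b': 45, 'qwen2.5:14b': 28, 'llama3.2:3b': 80 };

for (const [model, speed] of Object.entries(tokPerSec)) {
  console.log(`${model}: ~${estimateSeconds(500, speed).toFixed(1)} s for 500 tokens`);
}
```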

Code generation (WP-CLI scripts, PHP snippets, SQL)

codellama:13b is the strongest option here. It understands WordPress context (hooks, filters, SQL queries), usually generates code that runs without syntax errors, and follows the conventions of the existing code base. For tasks like “give me a WP-CLI script that exports all posts to JSON with their SEO metadata” it produces functional code in 1-2 iterations.

Structured tasks and JSON (classification, extraction, RAG)

phi3:14b shines here. Its tuning for structured instructions makes outputs like “give me a JSON with {category, priority, action}” consistently valid without extra characters. For workflows that feed Qdrant, phi3 is more reliable than llama.
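
Even with a reliable model, it is worth checking the reply before it reaches the next workflow step. The following is a minimal sketch of such a check for the {category, priority, action} shape mentioned above. The function name is hypothetical, and treating all three fields as non-empty strings is an assumption. Ollama's API also accepts "format": "json" in the request to constrain the output to valid JSON.

```javascript
// Hypothetical validator for a {category, priority, action} reply.
// Returns the parsed object, or null when the text does not have that shape.
function parseTriage(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // prose around the JSON, truncated output, etc.
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) return null;
  const required = ['category', 'priority', 'action'];
  // Assumption: every field is a non-empty string.
  const ok = required.every((key) => typeof data[key] === 'string' && data[key] !== '');
  return ok ? data : null;
}

console.log(parseTriage('{"category":"billing","priority":"high","action":"reply"}'));
console.log(parseTriage('Sure! Here is the JSON: {"category":"billing"}')); // null
```

A null result can trigger a retry or a fallback to a larger model instead of passing malformed data downstream.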

Reasoning and debugging (log analysis, diagnosis)

deepseek-r1:8b stands out. It does explicit chain-of-thought: it writes out its reasoning and weighs hypotheses before proposing a solution, which makes its log analysis more accurate. If your main use will be “I have this error, what is going on?”, deepseek-r1 is your model.

Raw speed and low consumption

llama3.2:3b is the pick for trivial tasks like “classify this comment as spam/legit”. At ~80 tok/s, you can moderate 100 comments in about 4 minutes. Its 2.5 GB RAM footprint makes it ideal for running in parallel with a larger model.
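
A small model may not always answer with a bare label, so a strict post-processing step helps. This is a sketch under the assumption that the prompt asks for a one-word answer; the function name and the 'unknown' fallback are hypothetical.

```javascript
// Hypothetical post-processor: map a raw model reply to 'spam', 'legit' or 'unknown'.
// Only the first word counts, so "Not spam" becomes 'unknown' rather than 'spam'.
function normalizeLabel(raw) {
  const firstWord = String(raw).toLowerCase().split(/[^a-z]+/).filter(Boolean)[0];
  if (firstWord === 'spam') return 'spam';
  if (firstWord === 'legit' || firstWord === 'legitimate') return 'legit';
  return 'unknown'; // route to manual review
}

console.log(normalizeLabel('Spam.'));      // spam
console.log(normalizeLabel(' LEGIT\n'));   // legit
console.log(normalizeLabel('Not spam'));   // unknown
```

Sending 'unknown' to manual review keeps an ambiguous reply from silently approving or deleting a comment.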

Multi-model strategy (recommended)

In production, the best approach is to combine models per task rather than pick a single one:

# /opt/d0a1/compose/ollama/.env
# Main model (quality, multilingual)
OLLAMA_MODEL_QUALITY=qwen2.5:14b

# Fast model (classification, simple JSON)
OLLAMA_MODEL_FAST=llama3.2:3b

# Code model (scripts, SQL)
OLLAMA_MODEL_CODE=codellama:13b

# Reasoning model (analysis, debug)
OLLAMA_MODEL_REASON=deepseek-r1:8b
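
One way to consume these variables from JavaScript, sketched with a hypothetical loadModels helper. The fallbacks repeat the models named in this guide, and how the .env file gets loaded into the process environment depends on your deployment.

```javascript
// Sketch: read the four model names from an environment object,
// falling back to the models named in this guide when a variable is unset.
function loadModels(env) {
  return {
    quality: env.OLLAMA_MODEL_QUALITY || 'qwen2.5:14b',
    fast: env.OLLAMA_MODEL_FAST || 'llama3.2:3b',
    code: env.OLLAMA_MODEL_CODE || 'codellama:13b',
    reason: env.OLLAMA_MODEL_REASON || 'deepseek-r1:8b',
  };
}

// In Node.js the variables are available on process.env once the .env file is loaded.
const models = loadModels(typeof process !== 'undefined' ? process.env : {});
console.log(models);
```

Keeping the names in one place means a model swap is a one-line change in the .env file.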

n8n workflows can route based on task type. Example:

// In an n8n Code node (JavaScript):
const task = $json.task_type;

const modelMap = {
  'seo': 'qwen2.5:14b',
  'classify': 'llama3.2:3b',
  'code': 'codellama:13b',
  'analyze': 'deepseek-r1:8b',
  'moderate': 'llama3.2:3b',
  'generate': 'qwen2.5:14b'
};

// n8n items carry their data under the json key
return { json: { model: modelMap[task] || 'llama3.1:8b' } };
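
Once a model is selected, the call to Ollama is a single POST to /api/generate. The sketch below only builds the request so that it stays runnable without a server. The helper name and the base URL are assumptions (11434 is Ollama's default port).

```javascript
// Sketch: turn a routed model and a prompt into a non-streaming Ollama request.
// baseUrl is an assumption; adjust it to wherever your Ollama container listens.
function buildGenerateRequest(model, prompt, baseUrl = 'http://localhost:11434') {
  return {
    url: `${baseUrl}/api/generate`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

const request = buildGenerateRequest('llama3.2:3b', 'Classify this comment as spam/legit: Great post!');
console.log(request.url);

// Usage (needs a running Ollama server, so it is left commented out):
// const reply = await (await fetch(request.url, request.options)).json();
// console.log(reply.response);
```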

Hardware requirements

Setup	Concurrent models	Recommended RAM	Recommended vCPU
Minimum	1 (one model at a time)	8 GB	2
Standard	2 (fast + quality)	16 GB	4
Pro	3-4 (with Q4 quantization)	32 GB	8
With GPU	Any + 10x speed	16 GB + 8+ GB VRAM	4
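
To check whether a given combination fits a setup, add up the RAM figures from the comparison table and leave some headroom. A minimal sketch: the function name is hypothetical, and the 3 GB reserve for the OS, WordPress, and the database is an assumption to adjust for your stack.

```javascript
// Approximate loaded-model RAM in GB, copied from the comparison table.
const ramGb = {
  'llama3.1:8b': 5, 'qwen2.5:14b': 10, 'mistral:7b': 4.5, 'codellama:13b': 9,
  'phi3:14b': 9, 'gemma2:9b': 6, 'llama3.2:3b': 2.5, 'deepseek-r1:8b': 5.5,
};

// reserveGb is an assumed allowance for everything else running on the VPS.
function fitsInRam(models, totalGb, reserveGb = 3) {
  const needed = models.reduce((sum, name) => {
    if (!(name in ramGb)) throw new Error(`unknown model: ${name}`);
    return sum + ramGb[name];
  }, 0);
  return { needed, fits: needed + reserveGb <= totalGb };
}

console.log(fitsInRam(['qwen2.5:14b', 'llama3.2:3b'], 16));   // needs 12.5 GB, fits
console.log(fitsInRam(['qwen2.5:14b', 'codellama:13b'], 16)); // needs 19 GB, does not fit
```

This matches the Standard row: a fast model plus a quality model fits in 16 GB, while two 13-14B models do not.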

Quantization: how it affects things

By default, Ollama downloads most models in a 4-bit quantization (Q4_0 or Q4_K_M, depending on the model). Compared with FP16, this reduces RAM usage by roughly 70-75% with <5% quality loss. For more quality, you can pull other variants:

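# Default tag (4-bit, best balance)
ollama pull qwen2.5:14b

# Q5_K_M (more quality, roughly +20-30% RAM)
# Published as a separate tag; check the model's Tags page for the exact name
ollama pull qwen2.5:14b-instruct-q5_K_M

# Q8_0 (max CPU quality, roughly +90% RAM)
ollama pull qwen2.5:14b-instruct-q8_0

# FP16 (original quality, roughly 3.5x the RAM of 4-bit, GPU only)
# Don't use on CPU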

On CPU, stay with the default 4-bit quantization. The improvement from Q5 is barely noticeable and not worth the extra RAM.
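
The RAM cost of each quantization level follows from bits per weight. A rough sketch for a nominal 14B model: in llama.cpp, Q4_0 and Q8_0 store a 16-bit scale per block of 32 weights, which gives 4.5 and 8.5 bits per weight. The function name is hypothetical, and the result covers weights only, so the loaded model uses more once the KV cache is added.

```javascript
// Bits per weight: Q4_0 and Q8_0 add a 16-bit scale per 32-weight block,
// hence 4.5 and 8.5 rather than 4 and 8.
const bitsPerWeight = { Q4_0: 4.5, Q8_0: 8.5, FP16: 16 };

// Weight size in GB (10^9 bytes). Excludes KV cache and runtime overhead.
function weightGb(paramsBillions, quant) {
  if (!(quant in bitsPerWeight)) throw new Error(`unknown quantization: ${quant}`);
  return (paramsBillions * bitsPerWeight[quant]) / 8;
}

for (const quant of Object.keys(bitsPerWeight)) {
  console.log(`14B at ${quant}: ~${weightGb(14, quant).toFixed(1)} GB`);
}
```

On a 16 GB VPS, only the 4-bit build of a 14B model leaves room for anything else.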

Conclusion

If you are only going to use one model, start with llama3.1:8b or qwen2.5:14b (the latter if you need strong EN/ES). As your usage grows, add llama3.2:3b for classification tasks and codellama:13b if you generate scripts regularly.

And remember: no local model charges you per token. The only cost is the VPS.

2026 comparison: the best Ollama models for WordPress