Jolly

Benchmarks

Each model was tested on 8 cases — 4 English, 4 German — spanning short sentences, medium paragraphs, and email-length texts with intentional typos.

English

ModelExact Match (%)Errors Fixed (%)Time (ms)Memory (MB)
OpenRouter gpt-4o-mini75%100%1,526225
GRMR V3 3B50%98%2,7463,496
GRMR V3 4B75%98%3,9104,118
Gemma 3 4B Instruct25%92%4,4114,119
Mistral 7B Instruct v0.350%92%6,5177,666
Harper50%90%100187

German

ModelExact Match (%)Errors Fixed (%)Time (ms)Memory (MB)
OpenRouter gpt-4o-mini75%100%1,501226
Mistral 7B Instruct v0.350%95%9,3957,666
GRMR V3 4B0%68%4,5324,118
GRMR V3 3B0%32%4,1573,496
Gemma 3 4B Instruct0%32%4,7284,119
Harper0%0%278202
Exact Match — corrected output matched the expected text character-for-character Errors Fixed — percentage of individual typos the model caught and corrected Time — wall-clock milliseconds from input to corrected output Memory — resident set size in megabytes while the model is loaded

What this means

The models fall into three categories. Harper is a rule-based grammar checker — it matches words against a dictionary and applies fixes instantly, but it only knows English and struggles with context-dependent errors. Is a linter at the end. The GRMR models are small LLMs fine-tuned specifically for grammar correction — they understand context and fix more errors than Harper, but they were trained primarily on English data, so German accuracy is limited. The general-purpose models (Gemma, Mistral) are larger instruction-following LLMs that use a system prompt to correct text — they aren't specialized for grammar but their broader training data gives them better multilingual support.

OpenRouter's GPT-4o-mini fixes every single error across both languages because it is a much larger model running on powerful remote hardware. The tradeoff is that your text leaves your device and you need an API key.

For local English-only use, GRMR V3 4B (2.5 GB) is the recommended choice — it fixes 87% of errors and is fast. For multilingual use, Mistral 7B (4.7 GB) is the only local model that handles German well. For the best results with no hardware constraints, OpenRouter is unbeatable.